POST /v1/videos/generations
Kling v3 Video Generation
curl --request POST \
  --url https://toapis.com/v1/videos/generations \
  --header 'Authorization: <authorization>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "<string>",
  "prompt": "<string>",
  "mode": "<string>",
  "duration": 123,
  "aspect_ratio": "<string>",
  "reference_images": [
    "<string>"
  ],
  "image_with_roles": [
    {
      "url": "<string>",
      "role": "<string>"
    }
  ],
  "audio": true,
  "metadata": {
    "negative_prompt": "<string>",
    "watermark": true
  }
}
'
  • Async task API, returns a task ID after submission
  • Supports text-to-video, image-to-video, explicit first/last frame control, and video with audio
  • mode=std maps to 720P, mode=pro maps to 1080P
  • audio=true generates a video with audio and is billed as Sound
  • Text-to-video supports up to 15 seconds; image-to-video supports up to 10 seconds
Use publicly accessible image URLs. Do not pass base64 image data. Upload local images with the Upload Image API first.
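
The URL requirement above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the helper name looks_like_public_image_url is hypothetical, and the check only confirms that a value is an http(s) URL with a host. It cannot confirm that the image is actually reachable from the public internet.

```python
from urllib.parse import urlparse


def looks_like_public_image_url(value: str) -> bool:
    """Return True for http(s) URLs that include a host.

    Data URIs, local file paths, and raw base64 strings return False.
    This is a syntactic check only; it does not fetch the URL.
    """
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


print(looks_like_public_image_url("https://example.com/reference.jpg"))
print(looks_like_public_image_url("data:image/png;base64,iVBORw0KGgo="))
```

Values that fail this check would need to go through the Upload Image API first.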

Authorization

Authorization
string
required
All endpoints require Bearer Token authentication.
Authorization: Bearer YOUR_API_KEY
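
As an illustration of the header format, the sketch below builds the same request as the curl example using only the Python standard library. The function name build_generation_request is hypothetical, and the request is constructed but not sent, since the response format is not described in this section.

```python
import json
import urllib.request

API_URL = "https://toapis.com/v1/videos/generations"


def build_generation_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a Bearer token."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


request = build_generation_request(
    "YOUR_API_KEY",
    {"model": "kling-v3", "prompt": "A golden cat running on a sunlit meadow"},
)
# Passing `request` to urllib.request.urlopen would submit the task.
```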

Request Parameters

model
string
required
Video generation model name. Must be kling-v3.
prompt
string
required
Text prompt. Describe the subject, action, scene, camera movement, and style.
mode
string
default:"std"
Generation mode.
  • std - standard mode, 720P
  • pro - professional mode, 1080P
duration
integer
default:"5"
Video duration in seconds. Options: 5, 10, 15
15 seconds is text-to-video only. Requests with input images support up to 10 seconds.
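
The duration rule can be enforced locally before submission. This is a minimal sketch under two assumptions: the helper name check_duration is hypothetical, and "input images" is taken to mean a non-empty reference_images or image_with_roles field. The server's actual validation and error messages may differ.

```python
ALLOWED_DURATIONS = (5, 10, 15)


def check_duration(payload: dict) -> None:
    """Raise ValueError if the duration breaks the documented limits."""
    duration = payload.get("duration", 5)  # documented default is 5
    # Assumption: either image field counts as "input images".
    has_images = bool(payload.get("reference_images")) or bool(
        payload.get("image_with_roles")
    )
    if duration not in ALLOWED_DURATIONS:
        raise ValueError(f"duration must be one of {ALLOWED_DURATIONS}")
    if has_images and duration > 10:
        raise ValueError("requests with input images support up to 10 seconds")


check_duration({"model": "kling-v3", "prompt": "A quiet lake", "duration": 15})
```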
aspect_ratio
string
default:"16:9"
Video aspect ratio. Common values: 16:9, 9:16, 1:1
reference_images
string[]
Normal reference images.
  • These images are treated as references only
  • They are not automatically converted into first/last frames
  • Use image_with_roles for explicit frame control
image_with_roles
object[]
Explicit image-role array for frame control and mixed inputs.
last_frame is only sent when explicitly declared in image_with_roles. The system no longer infers the last frame from reference_images[1].
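
Because the last frame is never inferred, a client has to declare each frame role itself. A small sketch of one way to do that follows. The helper name frame_roles is hypothetical, and it does not check whether the API accepts a last frame without a first frame.

```python
from typing import Optional


def frame_roles(
    first_frame: Optional[str] = None, last_frame: Optional[str] = None
) -> list:
    """Build an image_with_roles array from explicit frame URLs."""
    roles = []
    if first_frame:
        roles.append({"url": first_frame, "role": "first_frame"})
    if last_frame:
        roles.append({"url": last_frame, "role": "last_frame"})
    return roles


image_with_roles = frame_roles(
    first_frame="https://example.com/day.jpg",
    last_frame="https://example.com/night.jpg",
)
```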
audio
boolean
default:"false"
Whether to generate the video with audio.
metadata
object
Extended parameters. The request body above shows negative_prompt (string) and watermark (boolean).

Input Rules

Input shape | Behavior
reference_images only | Normal references
image_with_roles with only first_frame / last_frame | Frame control
Both fields used together, or roles include both frame and reference semantics | Mixed mode
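
The rules above can be restated as a small classifier, which may help when debugging why a request was treated a certain way. This is a sketch, not the server's logic. The function name classify_input is hypothetical, a request with no image fields is treated as text-to-video, and any role other than first_frame or last_frame is assumed to carry reference semantics, since this page does not list other role names.

```python
FRAME_ROLES = {"first_frame", "last_frame"}


def classify_input(payload: dict) -> str:
    """Map a request body to the behavior described in the Input Rules."""
    references = payload.get("reference_images") or []
    roles = [item.get("role") for item in payload.get("image_with_roles") or []]
    has_frame = any(role in FRAME_ROLES for role in roles)
    # Assumption: non-frame roles carry reference semantics.
    has_reference_role = any(role not in FRAME_ROLES for role in roles)

    if not references and not roles:
        return "text-to-video"
    if references and not roles:
        return "normal references"
    if references or (has_frame and has_reference_role):
        return "mixed mode"
    if has_frame:
        return "frame control"
    return "unspecified"  # only non-frame roles: not covered by the rules


print(classify_input({"reference_images": ["https://example.com/reference.jpg"]}))
```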

Examples

Text-to-Video

{
  "model": "kling-v3",
  "prompt": "A golden cat running on a sunlit meadow, slow motion, cinematic quality",
  "mode": "std",
  "duration": 5,
  "aspect_ratio": "16:9"
}

Image Reference

{
  "model": "kling-v3",
  "prompt": "Use the reference character and animate a subtle smile",
  "reference_images": ["https://example.com/reference.jpg"],
  "mode": "std",
  "duration": 5
}

First and Last Frame Control

{
  "model": "kling-v3",
  "prompt": "The city naturally transitions from day to night",
  "image_with_roles": [
    { "url": "https://example.com/day.jpg", "role": "first_frame" },
    { "url": "https://example.com/night.jpg", "role": "last_frame" }
  ],
  "mode": "pro",
  "duration": 5
}

Mixed Reference and Frame Input

{
  "model": "kling-v3",
  "prompt": "Keep the character identity consistent while transitioning scenes",
  "reference_images": ["https://example.com/character-reference.jpg"],
  "image_with_roles": [
    { "url": "https://example.com/start-scene.jpg", "role": "first_frame" },
    { "url": "https://example.com/end-scene.jpg", "role": "last_frame" }
  ],
  "mode": "pro",
  "duration": 5
}

Audio Video

{
  "model": "kling-v3",
  "prompt": "A singer performing on stage, crowd cheering, flashing lights",
  "mode": "std",
  "duration": 5,
  "audio": true
}
Video generation is asynchronous. Use the Get Video Task Status endpoint to query progress and results.
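
Since results arrive asynchronously, a client typically polls until the task finishes. The sketch below shows a generic polling loop only. The status endpoint's URL and response fields are not described on this page, so the caller supplies both the fetch function and the completion check. The name poll_task and the "status" values in the usage lines are hypothetical.

```python
import time
from typing import Callable


def poll_task(
    fetch_status: Callable[[], dict],
    is_finished: Callable[[dict], bool],
    interval_seconds: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status until is_finished returns True or attempts run out."""
    for attempt in range(max_attempts):
        result = fetch_status()
        if is_finished(result):
            return result
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"task not finished after {max_attempts} attempts")


# Usage with a stand-in fetcher; a real one would call the status endpoint.
fake_responses = iter([{"status": "running"}, {"status": "done"}])
final = poll_task(
    fetch_status=lambda: next(fake_responses),
    is_finished=lambda r: r["status"] == "done",  # hypothetical field and value
    sleep=lambda seconds: None,  # skip waiting in this demonstration
)
```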