POST /v1/videos/generations
Kling v2.6 Video Generation
curl --request POST \
  --url https://toapis.com/v1/videos/generations \
  --header 'Authorization: <authorization>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "<string>",
  "prompt": "<string>",
  "mode": "<string>",
  "duration": 123,
  "aspect_ratio": "<string>",
  "reference_images": [
    "<string>"
  ],
  "image_with_roles": [
    {
      "url": "<string>",
      "role": "<string>"
    }
  ],
  "negative_prompt": "<string>",
  "audio": true,
  "watermark": true
}
'
  • Asynchronous processing: the request returns a task ID for subsequent status queries
  • Supports text-to-video, image-to-video, explicit first/last frame control, and video with audio
  • Supports standard mode (720P) and professional mode (1080P)
  • Professional mode supports automatic audio generation

Authorization

Authorization
string
required
All API endpoints require Bearer Token authentication.
Authorization: Bearer YOUR_API_KEY
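
As a minimal sketch of the header format above, the following Python builds (but does not send) an authenticated request using only the standard library. The URL comes from the curl example on this page; the helper name build_generation_request is hypothetical and not part of the API.

```python
import json
import urllib.request

# Endpoint taken from the curl example on this page.
API_URL = "https://toapis.com/v1/videos/generations"


def build_generation_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build a POST request carrying the Bearer token; nothing is sent here."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to urllib.request.urlopen would send it. The response body is not parsed here because its shape is not described on this page.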

Request Parameters

model
string
required
Video generation model name. Must be kling-v2-6.
prompt
string
required
Text prompt, maximum 2500 characters.
mode
string
default:"std"
Generation mode.
  • std - standard mode (720P, silent video only)
  • pro - professional mode (1080P, supports automatic audio generation)
duration
integer
default:"5"
Video duration in seconds. Options: 5 or 10.
aspect_ratio
string
default:"16:9"
Video aspect ratio. Common values: 16:9, 9:16, 1:1
reference_images
string[]
Normal reference images.
  • These images are treated as references only
  • They are not automatically converted into first/last frames
  • Use image_with_roles for explicit frame control
image_with_roles
object[]
Explicit image-role array for frame control and mixed inputs.
last_frame is only sent when explicitly declared in image_with_roles. The system no longer infers the last frame from reference_images[1].
negative_prompt
string
Negative prompt to exclude unwanted content.
audio
boolean
default:"false"
Whether to automatically generate audio.
Available only in mode: "pro".
watermark
boolean
Whether to add watermark.
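
The constraints listed above can be checked on the client before a request is submitted. The sketch below is one possible pre-flight check, assuming the documented rules are the only ones that apply. The function validate_payload and its messages are hypothetical, and the server may enforce additional rules.

```python
from typing import List

ALLOWED_MODES = {"std", "pro"}
ALLOWED_DURATIONS = {5, 10}
MAX_PROMPT_CHARS = 2500


def validate_payload(payload: dict) -> List[str]:
    """Return the documented rules that the payload violates (empty list if none)."""
    problems = []
    if payload.get("model") != "kling-v2-6":
        problems.append("model must be 'kling-v2-6'")

    prompt = payload.get("prompt")
    if not isinstance(prompt, str) or not prompt:
        problems.append("prompt is required")
    elif len(prompt) > MAX_PROMPT_CHARS:
        problems.append("prompt exceeds 2500 characters")

    # Omitted fields fall back to the documented defaults.
    mode = payload.get("mode", "std")
    if mode not in ALLOWED_MODES:
        problems.append("mode must be 'std' or 'pro'")
    if payload.get("duration", 5) not in ALLOWED_DURATIONS:
        problems.append("duration must be 5 or 10")
    if payload.get("audio", False) and mode != "pro":
        problems.append("audio is available only in mode 'pro'")
    return problems
```

Returning a list instead of raising on the first failure lets a caller report every problem at once.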

Input Rules

Input shape | Behavior
reference_images only | Normal references
image_with_roles with only first_frame / last_frame | Frame control
Both fields used together, or roles include both frame and reference semantics | Mixed mode
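
The rules above can be read as a small decision procedure. The sketch below mirrors them on the client side and is an illustration only, not the service's actual logic. The function classify_input and its return labels are hypothetical, and it assumes any role other than first_frame or last_frame carries reference semantics, which this page does not spell out.

```python
FRAME_ROLES = {"first_frame", "last_frame"}


def classify_input(payload: dict) -> str:
    """Map a request payload to the behavior described in the Input Rules."""
    reference_images = payload.get("reference_images") or []
    roles = [item.get("role") for item in payload.get("image_with_roles") or []]

    has_frame_role = any(role in FRAME_ROLES for role in roles)
    # Assumption: every non-frame role is treated as a reference role.
    has_other_role = any(role not in FRAME_ROLES for role in roles)

    if not reference_images and not roles:
        return "text-to-video"
    if reference_images and roles:
        return "mixed"
    if reference_images:
        return "reference"
    if has_frame_role and has_other_role:
        return "mixed"
    if has_frame_role:
        return "frame-control"
    # image_with_roles with no frame roles is not covered by the rules.
    return "unspecified"
```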

Examples

Text-to-Video

{
  "model": "kling-v2-6",
  "prompt": "A golden cat running on a sunlit meadow, slow motion, cinematic quality",
  "mode": "std",
  "duration": 5,
  "aspect_ratio": "16:9"
}

Image Reference

{
  "model": "kling-v2-6",
  "prompt": "Animate the referenced portrait with a subtle smile",
  "reference_images": ["https://example.com/reference.jpg"],
  "mode": "std",
  "duration": 5
}

First and Last Frame Control

{
  "model": "kling-v2-6",
  "prompt": "City timelapse transitioning from day to night",
  "image_with_roles": [
    { "url": "https://example.com/day-city.jpg", "role": "first_frame" },
    { "url": "https://example.com/night-city.jpg", "role": "last_frame" }
  ],
  "mode": "pro",
  "duration": 5
}

Mixed Reference and Frame Input

{
  "model": "kling-v2-6",
  "prompt": "Keep the character identity while moving from one scene to another",
  "reference_images": ["https://example.com/character-reference.jpg"],
  "image_with_roles": [
    { "url": "https://example.com/start-scene.jpg", "role": "first_frame" },
    { "url": "https://example.com/end-scene.jpg", "role": "last_frame" }
  ],
  "mode": "pro",
  "duration": 5
}

Pro Mode + Auto Audio

{
  "model": "kling-v2-6",
  "prompt": "Waves crashing against rocks, seagulls circling in the sky, lighthouse in the distance",
  "mode": "pro",
  "duration": 10,
  "audio": true,
  "aspect_ratio": "16:9"
}
Video generation is an async task. Use the Get Video Task Status endpoint to query progress and results.
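
Because the task is asynchronous, a client typically polls until the task finishes. The status endpoint's URL and response fields are not described on this page, so the sketch below takes them as injected callables. The names wait_for_task, fetch_status, and is_finished are all hypothetical, as are the default interval and timeout.

```python
import time
from typing import Any, Callable


def wait_for_task(
    fetch_status: Callable[[], Any],
    is_finished: Callable[[Any], bool],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call fetch_status repeatedly until is_finished accepts the result."""
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch_status()
        if is_finished(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not finish before the timeout")
        sleep(interval_s)
```

Here fetch_status would wrap a call to the Get Video Task Status endpoint for the returned task ID, and is_finished would inspect whatever completion field that endpoint reports.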