Kling v2.6 Video Generation - ToAPIs Documentation

Asynchronous processing: the request returns a task ID for subsequent status queries
Supports text-to-video, image-to-video, explicit first/last frame control, and video with audio
Supports standard mode (720P) and professional mode (1080P)
Professional mode supports automatic audio generation

Authorization

string

required

All API endpoints require Bearer Token authentication.

Authorization: Bearer YOUR_API_KEY
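As a minimal sketch of the header above, the following Python builds an authenticated request without sending it. The endpoint URL, the POST method, the JSON content type, and the helper name `build_request` are all assumptions made for illustration, since this page does not state them; substitute the real endpoint before use.

```python
import json
import urllib.request

# Placeholder endpoint (assumption): this page does not state the real URL.
API_URL = "https://api.example.com/v1/videos"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build a request carrying the Bearer token; nothing is sent here."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumption: the API accepts a JSON body.
            "Content-Type": "application/json",
        },
        method="POST",  # Assumption: task creation uses POST.
    )
```

Passing the resulting object to `urllib.request.urlopen` would send it; that step is left out because the real endpoint is not given here.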

Request Parameters

model

string

required

Video generation model name, fixed as kling-v2-6.

prompt

string

required

Text prompt, maximum 2500 characters.

mode

string

default:"std"

Generation mode.

std - standard mode (720P, silent video only)
pro - professional mode (1080P, supports automatic audio generation)

duration

integer

default:"5"

Video duration in seconds. Options: 5 or 10.

aspect_ratio

string

default:"16:9"

Video aspect ratio. Common values: 16:9, 9:16, 1:1

reference_images

string[]

Normal reference images.

These images are treated as references only
They are not automatically converted into first/last frames
Use image_with_roles for explicit frame control

image_with_roles

object[]

Explicit image-role array for frame control and mixed inputs.

image_with_roles object fields

url

string

required

Publicly accessible image URL.

role

string

required

Image role. Supported values:

first_frame
last_frame
reference
reference_image

last_frame is only sent when explicitly declared in image_with_roles. The system no longer infers the last frame from reference_images[1].
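The role list above can be checked on the client before a request is built. This is a sketch only: the helper name `image_with_role` is hypothetical, and whether the server rejects unknown roles is not stated on this page.

```python
# Role values copied from the list of supported values.
SUPPORTED_ROLES = {"first_frame", "last_frame", "reference", "reference_image"}


def image_with_role(url: str, role: str) -> dict:
    """Return one image_with_roles entry, rejecting unknown roles early."""
    if not url:
        raise ValueError("url is required")
    if role not in SUPPORTED_ROLES:
        raise ValueError(f"unsupported role: {role!r}")
    return {"url": url, "role": role}


# The last frame must be declared explicitly; it is never inferred.
frames = [
    image_with_role("https://example.com/day-city.jpg", "first_frame"),
    image_with_role("https://example.com/night-city.jpg", "last_frame"),
]
```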

negative_prompt

string

Negative prompt to exclude unwanted content.

audio

boolean

default:"false"

Whether to automatically generate audio.

Available only in mode: "pro".

watermark

boolean

Whether to add watermark.
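The constraints stated for the parameters above (fixed model name, 2500-character prompt limit, `std`/`pro` modes, 5 or 10 second duration, audio only in `pro`) can be checked locally before submitting. The function name `validate_payload` and the error messages are illustrative; the server's own validation behavior is not described on this page.

```python
def validate_payload(payload: dict) -> list[str]:
    """Return a list of problems found in a request body (empty if none)."""
    errors = []
    if payload.get("model") != "kling-v2-6":
        errors.append("model must be 'kling-v2-6'")

    prompt = payload.get("prompt")
    if not isinstance(prompt, str) or not prompt:
        errors.append("prompt is required")
    elif len(prompt) > 2500:
        errors.append("prompt exceeds 2500 characters")

    # Documented defaults are applied when a field is omitted.
    mode = payload.get("mode", "std")
    if mode not in ("std", "pro"):
        errors.append("mode must be 'std' or 'pro'")
    if payload.get("duration", 5) not in (5, 10):
        errors.append("duration must be 5 or 10")
    if payload.get("audio", False) and mode != "pro":
        errors.append("audio requires mode 'pro'")
    return errors
```

Returning a list rather than raising on the first problem lets a caller report every issue in one pass.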

Input Rules

Input shape	Behavior
`reference_images` only	Normal references
`image_with_roles` with only `first_frame` / `last_frame`	Frame control
Both fields used together, or roles include both frame and reference semantics	Mixed mode
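The table can be read as a small decision procedure. The sketch below encodes it under two assumptions that the table does not cover: a request with no images is treated as plain text-to-video, and `image_with_roles` entries that carry only reference roles are treated like normal references. The function name and return labels are illustrative.

```python
FRAME_ROLES = {"first_frame", "last_frame"}
REFERENCE_ROLES = {"reference", "reference_image"}


def input_shape(payload: dict) -> str:
    """Classify a request body according to the Input Rules table."""
    refs = payload.get("reference_images") or []
    roles = {item["role"] for item in payload.get("image_with_roles") or []}
    has_frames = bool(roles & FRAME_ROLES)
    has_role_refs = bool(roles & REFERENCE_ROLES)

    # Both fields together, or frame and reference roles together: mixed mode.
    if (refs and roles) or (has_frames and has_role_refs):
        return "mixed"
    if has_frames:
        return "frame_control"
    if refs or has_role_refs:
        # Assumption: reference-only roles behave like reference_images.
        return "normal_references"
    # Assumption: no image input means plain text-to-video.
    return "text_only"
```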

Examples

Text-to-Video

{
  "model": "kling-v2-6",
  "prompt": "A golden cat running on a sunlit meadow, slow motion, cinematic quality",
  "mode": "std",
  "duration": 5,
  "aspect_ratio": "16:9"
}

Image Reference

{
  "model": "kling-v2-6",
  "prompt": "Animate the referenced portrait with a subtle smile",
  "reference_images": ["https://example.com/reference.jpg"],
  "mode": "std",
  "duration": 5
}

First and Last Frame Control

{
  "model": "kling-v2-6",
  "prompt": "City timelapse transitioning from day to night",
  "image_with_roles": [
    { "url": "https://example.com/day-city.jpg", "role": "first_frame" },
    { "url": "https://example.com/night-city.jpg", "role": "last_frame" }
  ],
  "mode": "pro",
  "duration": 5
}

Mixed Reference and Frame Input

{
  "model": "kling-v2-6",
  "prompt": "Keep the character identity while moving from one scene to another",
  "reference_images": ["https://example.com/character-reference.jpg"],
  "image_with_roles": [
    { "url": "https://example.com/start-scene.jpg", "role": "first_frame" },
    { "url": "https://example.com/end-scene.jpg", "role": "last_frame" }
  ],
  "mode": "pro",
  "duration": 5
}

Pro Mode + Auto Audio

{
  "model": "kling-v2-6",
  "prompt": "Waves crashing against rocks, seagulls circling in the sky, lighthouse in the distance",
  "mode": "pro",
  "duration": 10,
  "audio": true,
  "aspect_ratio": "16:9"
}

Video generation is an async task. Use the Get Video Task Status endpoint to query progress and results.
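Because the task is asynchronous, a client typically polls until the task reaches a terminal state. The status endpoint's URL and response fields are not given on this page, so the sketch below takes them as caller-supplied functions: `fetch_status` and `is_finished` are hypothetical hooks, and the interval and timeout defaults are arbitrary.

```python
import time
from typing import Callable


def wait_for_task(
    fetch_status: Callable[[], dict],
    is_finished: Callable[[dict], bool],
    interval: float = 5.0,
    timeout: float = 600.0,
) -> dict:
    """Poll fetch_status until is_finished accepts the result or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        task = fetch_status()
        if is_finished(task):
            return task
        # Stop if the next poll would land after the deadline.
        if time.monotonic() + interval > deadline:
            raise TimeoutError("task did not finish before the timeout")
        time.sleep(interval)
```

In practice `fetch_status` would call the Get Video Task Status endpoint with the returned task ID, and `is_finished` would inspect whatever status field that endpoint returns.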
