Kling v3 Video Generation
Generate videos with official Kling v3 using explicit reference-image and frame-role semantics
POST
- Async task API, returns a task ID after submission
- Supports text-to-video, image-to-video, explicit first/last frame control, and audio video
- `mode=std` maps to 720P; `mode=pro` maps to 1080P
- `audio=true` generates an audio video and is billed as Sound
- Text-to-video supports 15 seconds; image-to-video supports up to 10 seconds
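
The mode and duration constraints above can be checked on the client before submitting a task. The following is a minimal sketch, not part of the API: the helper names are hypothetical, and it assumes that any request carrying image input counts as image-to-video for the purpose of the 10-second limit.

```python
# Mode-to-resolution mapping as stated in the overview.
MODE_RESOLUTION = {"std": "720P", "pro": "1080P"}


def max_duration_seconds(has_image_input: bool) -> int:
    """Longest duration the overview allows for the given input type.

    Assumption: any image input makes the request image-to-video.
    """
    return 10 if has_image_input else 15


def check_request(mode: str, duration: int, has_image_input: bool) -> str:
    """Validate mode and duration; return the resolution the mode maps to."""
    if mode not in MODE_RESOLUTION:
        raise ValueError(f"unknown mode: {mode!r}")
    limit = max_duration_seconds(has_image_input)
    if duration > limit:
        raise ValueError(f"duration {duration}s exceeds the {limit}s limit")
    return MODE_RESOLUTION[mode]
```

For example, `check_request("pro", 15, has_image_input=True)` raises `ValueError` under these assumptions, because image input is limited to 10 seconds.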
Authorization
All endpoints require Bearer Token authentication.
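
A minimal sketch of building the request headers, assuming the conventional `Authorization: Bearer <token>` form and a JSON request body. The `KLING_API_TOKEN` environment variable and the `auth_headers` helper are hypothetical names used for illustration only.

```python
import os


def auth_headers(token: str) -> dict:
    """Build request headers carrying the Bearer token."""
    if not token:
        raise ValueError("an API token is required")
    return {
        "Authorization": f"Bearer {token}",
        # Assumption: the endpoint accepts a JSON request body.
        "Content-Type": "application/json",
    }


# KLING_API_TOKEN is a hypothetical variable name; "demo-token" is a placeholder.
headers = auth_headers(os.environ.get("KLING_API_TOKEN", "demo-token"))
```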
Request Parameters
Video generation model name, fixed as `kling-v3`.
Text prompt. Describe the subject, action, scene, camera movement, and style.
Generation mode.
- `std`: standard mode, 720P
- `pro`: professional mode, 1080P
Video duration in seconds. Options: 5, 10, 15
Video aspect ratio. Common values: 16:9, 9:16, 1:1
Normal reference images.
- These images are treated as references only
- They are not automatically converted into first/last frames
- Use `image_with_roles` for explicit frame control
Explicit image-role array for frame control and mixed inputs.
Whether to generate an audio video.
Extended parameters.
Input Rules
| Input shape | Behavior |
|---|---|
| `reference_images` only | Normal references |
| `image_with_roles` with only `first_frame` / `last_frame` | Frame control |
| Both fields used together, or roles include both frame and reference semantics | Mixed mode |
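
The table can be read as a small decision procedure. The sketch below is illustrative, not the service's actual logic: it assumes each `image_with_roles` entry is an object with a `role` key, and it treats any role other than `first_frame` / `last_frame` as a reference role. Two cases are not covered by the table and are filled in here as assumptions: no images at all (treated as text-to-video) and `image_with_roles` containing only reference roles (treated as normal references).

```python
FRAME_ROLES = {"first_frame", "last_frame"}


def classify_inputs(reference_images=None, image_with_roles=None) -> str:
    """Return the behavior the input-rules table assigns to a request."""
    refs = reference_images or []
    roled = image_with_roles or []
    # Assumption: each entry is a dict with a "role" key.
    roles = {item["role"] for item in roled}
    has_frame = bool(roles & FRAME_ROLES)
    has_reference_role = bool(roles - FRAME_ROLES)

    if not refs and not roled:
        return "text-to-video"  # assumption: not listed in the table
    if refs and roled:
        return "mixed mode"
    if refs:
        return "normal references"
    if has_frame and has_reference_role:
        return "mixed mode"
    if has_frame:
        return "frame control"
    return "normal references"  # assumption: not listed in the table
```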
Examples
Text-to-Video
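
A minimal sketch of a text-to-video request body. The field names `model`, `prompt`, `duration`, and `aspect_ratio` are assumptions, since this page does not name those parameters; `mode` and the value `kling-v3` come from the text above. Whether `duration` is sent as a number or a string is also not specified here.

```python
import json

# Hypothetical request body; field names other than "mode" are assumed.
text_to_video = {
    "model": "kling-v3",
    "prompt": "A red fox running through snowy woods, slow tracking shot, cinematic",
    "mode": "std",           # 720P
    "duration": 5,           # one of 5, 10, 15
    "aspect_ratio": "16:9",
}

body = json.dumps(text_to_video)
```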
Image Reference
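
A minimal sketch of a request that passes normal reference images. It assumes `reference_images` is a list of image URLs; the accepted image format is not stated on this page, and the URLs below are placeholders. The field names `model`, `prompt`, `duration`, and `aspect_ratio` are likewise assumed.

```python
import json

# Hypothetical request body; the images act as references only,
# not as first/last frames.
image_reference = {
    "model": "kling-v3",
    "prompt": "The character from the reference walks along a beach at sunset",
    "mode": "std",
    "duration": 5,  # image-to-video supports up to 10 seconds
    "aspect_ratio": "16:9",
    "reference_images": [
        "https://example.com/character-front.png",
        "https://example.com/character-side.png",
    ],
}

body = json.dumps(image_reference)
```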
First and Last Frame Control
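
A minimal sketch of explicit first/last frame control. The role names `first_frame` and `last_frame` come from the input rules above, but the shape of each `image_with_roles` entry (here an object with `url` and `role` keys) is an assumption, as are the `model`, `prompt`, and `duration` field names. The URLs are placeholders.

```python
import json

# Hypothetical request body; entry keys "url" and "role" are assumed.
frame_control = {
    "model": "kling-v3",
    "prompt": "The flower opens slowly as morning light spreads across the field",
    "mode": "pro",   # 1080P
    "duration": 5,
    "image_with_roles": [
        {"url": "https://example.com/bud.png", "role": "first_frame"},
        {"url": "https://example.com/bloom.png", "role": "last_frame"},
    ],
}

body = json.dumps(frame_control)
```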
Mixed Reference and Frame Input
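
A minimal sketch of mixed mode, where `reference_images` and `image_with_roles` are used together. As in the other examples, the entry shape (`url`, `role`), the URL-based image format, and the `model`, `prompt`, and `duration` field names are assumptions; the URLs are placeholders.

```python
import json

# Hypothetical request body combining a fixed first frame with
# an ordinary style reference.
mixed_input = {
    "model": "kling-v3",
    "prompt": "The camera pulls back from the doorway to reveal the whole street",
    "mode": "std",
    "duration": 10,
    "reference_images": ["https://example.com/street-style.png"],
    "image_with_roles": [
        {"url": "https://example.com/doorway.png", "role": "first_frame"},
    ],
}

body = json.dumps(mixed_input)
```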
Audio Video
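
A minimal sketch of an audio video request. The `audio` flag comes from the overview (`audio=true`, billed as Sound); the `model`, `prompt`, `duration`, and `aspect_ratio` field names are assumed.

```python
import json

# Hypothetical request body; audio=True requests a video with sound.
audio_video = {
    "model": "kling-v3",
    "prompt": "Rain falls on a tin roof while a kettle whistles in the kitchen",
    "mode": "pro",
    "duration": 5,
    "aspect_ratio": "9:16",
    "audio": True,
}

body = json.dumps(audio_video)
```

Serialized with `json.dumps`, the Python value `True` becomes the JSON literal `true`.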
Video generation is asynchronous. Use the Get Video Task Status endpoint to query progress and results.
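
Because the task is asynchronous, a client usually polls until the task finishes. The sketch below shows a generic polling loop under stated assumptions: `fetch_status` stands in for a call to the Get Video Task Status endpoint, and `is_finished` decides when a response is final. Both are supplied by the caller, since the status endpoint's path and response fields are not described on this page.

```python
import time


def wait_for_task(fetch_status, task_id, is_finished,
                  interval=5.0, timeout=600.0, sleep=time.sleep):
    """Poll fetch_status(task_id) until is_finished(response) is true.

    fetch_status and is_finished are caller-supplied callables; this
    helper makes no assumption about the response format.
    """
    deadline = time.monotonic() + timeout
    while True:
        response = fetch_status(task_id)
        if is_finished(response):
            return response
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish within {timeout}s")
        sleep(interval)
```

A caller would pass a function that queries the status endpoint with the task ID returned at submission, plus a predicate matching whatever terminal states that endpoint reports.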