Supports three modes: Text-to-Video, Image-to-Video, and Reference Video (r2v)
Server automatically routes to the appropriate upstream model based on your request parameters
Supports 720p and 1080p resolution, and durations of 5, 10, or 15 seconds
Audio is always included in the generated output
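The supported resolutions and durations listed above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the helper name check_output_options is hypothetical, and it assumes the values are passed in fields named resolution (a string such as "1080p") and duration (an integer number of seconds), as in the request examples in this section.

```python
# Allowed values taken from the feature list above.
ALLOWED_RESOLUTIONS = ("720p", "1080p")
ALLOWED_DURATIONS = (5, 10, 15)


def check_output_options(payload: dict) -> list[str]:
    """Return a list of problems found in the resolution/duration fields.

    Hypothetical client-side helper; the server remains the source of truth.
    Fields that are absent are not reported, since defaults are not
    documented here.
    """
    problems = []
    resolution = payload.get("resolution")
    if resolution is not None and resolution not in ALLOWED_RESOLUTIONS:
        problems.append(f"unsupported resolution: {resolution!r}")
    duration = payload.get("duration")
    if duration is not None and duration not in ALLOWED_DURATIONS:
        problems.append(f"unsupported duration: {duration!r}")
    return problems
```

An empty list means both fields, if present, fall within the documented options.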
Important Change: For better performance and cost control, we no longer support passing base64 image data directly in image_urls. Please use the Upload Image API first to upload images and get URLs, then call this endpoint.
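Because base64 image data is no longer accepted in image_urls, a client may want to catch inline data before calling the endpoint. The sketch below is illustrative only: find_inline_images is a hypothetical helper, and it assumes that every valid image_urls entry is an http:// or https:// URL, which this section states explicitly only for reference videos.

```python
def find_inline_images(image_urls):
    """Return the entries that do not look like http(s) URLs.

    Hypothetical pre-flight check: data: URIs, raw base64 strings and
    non-string values are all reported so they can be uploaded through
    the Upload Image API first.
    """
    rejected = []
    for entry in image_urls:
        is_url = isinstance(entry, str) and entry.lower().startswith(
            ("http://", "https://")
        )
        if not is_url:
            rejected.append(entry)
    return rejected
```

If the returned list is non-empty, those images would need to be uploaded and replaced by their URLs before the request is submitted.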
Video content description
Required for text-to-video mode; optional for image-to-video and reference video modes (describe expected motion or style)
Example: "A cute cat stretching in the sunshine"
Reference Video (r2v) mode: array of reference video URLs
When provided, the server routes to the reference video upstream model (wan2.6-r2v). The model uses these videos to guide generation of consistent characters or scenes.
Each entry must be a publicly accessible video URL (http:// or https://)
Several reference videos can be provided in one request
Example: ["https://cdn.example.com/ref1.mp4"]
Note: Cannot be combined with image_urls
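The rules above (prompt requirement, reference URL format, and the exclusion between reference videos and image_urls) can be mirrored in a small client-side check. This is a rough sketch of how a client might predict the mode the server will choose, not the server's actual routing logic. The function name infer_mode is hypothetical, the reference URLs are read from metadata.reference_urls as in the request example in this section, and placing image_urls at the top level of the payload is an assumption.

```python
from urllib.parse import urlparse


def infer_mode(payload: dict) -> str:
    """Guess which generation mode a request payload will trigger.

    Hypothetical helper. Raises ValueError when the payload breaks one of
    the documented rules.
    """
    metadata = payload.get("metadata") or {}
    reference_urls = metadata.get("reference_urls") or []
    image_urls = payload.get("image_urls") or []  # assumed top-level field

    if reference_urls and image_urls:
        raise ValueError("reference_urls cannot be combined with image_urls")

    if reference_urls:
        for url in reference_urls:
            parsed = urlparse(url)
            # Each entry must be an http:// or https:// URL with a host.
            if parsed.scheme not in ("http", "https") or not parsed.netloc:
                raise ValueError(f"not a valid http(s) URL: {url!r}")
        return "reference-video"

    if image_urls:
        return "image-to-video"

    # Only text-to-video requires a prompt.
    if not payload.get("prompt"):
        raise ValueError("prompt is required for text-to-video mode")
    return "text-to-video"
```

The check cannot confirm that a URL is publicly accessible; it only rejects entries that are malformed or use another scheme.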
{
  "model": "wan2.6",
  "prompt": "The character waves and smiles at the camera",
  "metadata": {
    "reference_urls": ["https://cdn.example.com/ref-character.mp4"]
  },
  "shot_type": "single",
  "resolution": "1080p",
  "duration": 5
}

{
  "model": "wan2.6",
  "prompt": "A golden retriever runs through a field of sunflowers",
  "negative_prompt": "blurry, low quality, deformed",
  "aspect_ratio": "16:9",
  "resolution": "1080p",
  "duration": 10,
  "seed": 12345,
  "prompt_extend": true,
  "shot_type": "multi",
  "watermark": false
}
Querying Task Results
Video generation is an asynchronous task. After submission, a task_id is returned. Use the Get Task Status interface to query generation progress and results.
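Since results are retrieved by polling, a client typically wraps the status query in a loop with a timeout. The sketch below shows one possible shape for that loop. It deliberately takes the status call as a parameter (fetch_status, a hypothetical callable that wraps the Get Task Status interface and returns a dict), because the status endpoint and its response schema are not described in this section. The status field name and the terminal values "completed" and "failed" are assumptions to be checked against the Get Task Status documentation.

```python
import time


def wait_for_task(task_id, fetch_status, interval=5.0, timeout=600.0,
                  sleep=time.sleep):
    """Poll fetch_status(task_id) until the task reaches a terminal state.

    fetch_status is a caller-supplied function (hypothetical) returning a
    dict with a "status" key. The terminal status names are assumptions.
    """
    terminal_states = ("completed", "failed")  # assumed values
    deadline = time.monotonic() + timeout
    while True:
        task = fetch_status(task_id)
        if task.get("status") in terminal_states:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish in {timeout}s")
        sleep(interval)
```

Passing the status call and the sleep function as arguments keeps the loop independent of any HTTP library and makes it easy to exercise with a stub.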