Seedance 2.0 Video Generation

ByteDance’s next-generation video generation models
Supports doubao-seedance-2-0 and doubao-seedance-2-0-fast
Supports text-to-video, first-frame image-to-video, first-and-last-frame image-to-video, and multimodal reference-to-video
Supports reference images, reference videos, and reference audio
Supports synced audio generation, web search tools, and returning the last frame
Asynchronous task workflow with status queries by task ID

Authorizations

Authorization

string

required

All endpoints require Bearer token authentication.

Get your API key from the API Key Management page.

Add it to the request header:

Authorization: Bearer YOUR_API_KEY

Body

model

string

default:"doubao-seedance-2-0"

required

Video generation model name

Available models:

doubao-seedance-2-0 - Standard version focused on higher quality, supports 4-15 second output
doubao-seedance-2-0-fast - Faster version for preview and iteration, supports 4-12 second output

prompt

string

Video prompt

Supports both Chinese and English input. Describe the scene, camera movement, subject actions, style, and audio atmosphere as clearly as possible.

Recommendations:

Keep Chinese prompts within about 500 characters
Keep English prompts within about 1000 words
When referring to uploaded assets, use labels like “image 1”, “video 1”, or “audio 1”

Example:

"Use video 1 for the POV composition throughout, start from image 1, end on image 2, and preserve the rhythm and mood from audio 1"

duration

integer

default:5

Video duration in seconds

Allowed values:

doubao-seedance-2-0: 4-15
doubao-seedance-2-0-fast: 4-12
-1: automatic duration selected by the model

doubao-seedance-2-0-fast does not support durations longer than 12 seconds.
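
The duration limits above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the names DURATION_LIMITS and check_duration are hypothetical, and it assumes the server rejects values outside the listed ranges.

```python
# Duration limits in seconds, copied from the allowed values above.
DURATION_LIMITS = {
    "doubao-seedance-2-0": (4, 15),
    "doubao-seedance-2-0-fast": (4, 12),
}
AUTO_DURATION = -1  # lets the model choose the duration


def check_duration(model: str, duration: int = 5) -> int:
    """Return duration unchanged if the model accepts it, else raise ValueError."""
    if model not in DURATION_LIMITS:
        raise ValueError(f"unknown model: {model!r}")
    if duration == AUTO_DURATION:
        return duration
    low, high = DURATION_LIMITS[model]
    if not low <= duration <= high:
        raise ValueError(
            f"{model} accepts {low}-{high} seconds or {AUTO_DURATION}, got {duration}"
        )
    return duration


check_duration("doubao-seedance-2-0", 15)       # accepted
check_duration("doubao-seedance-2-0-fast", -1)  # accepted, automatic duration
```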

aspect_ratio

string

default:"adaptive"

Video aspect ratio

Options:

21:9
16:9
4:3
1:1
3:4
9:16
adaptive

adaptive behavior:

Text-to-video: the model chooses the most suitable ratio from the prompt
First-frame or first-and-last-frame image-to-video: derived from the first frame
Multimodal reference-to-video: typically prioritizes reference video, then reference image

image_urls

string[]

Image URL array in compatibility mode

image_with_roles is recommended for explicit control. Do not use image_urls and image_with_roles together.

image_with_roles

array

Image array with roles

Supported patterns:

First-frame image-to-video: one first_frame
First-and-last-frame image-to-video: one first_frame plus one last_frame
Multimodal reference-to-video: reference_image entries, 1-9 items

url

string

required

Image URL, Base64 data, or uploaded asset URI

Supported formats:

https://...
data:image/<format>;base64,...
asset://<ASSET_ID>
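
For the Base64 form, a local image has to be encoded into a data URI first. The sketch below shows one way to do that with the Python standard library. The helper name image_data_uri is hypothetical, and the format check assumes the format token in the URI matches the names listed under Image requirements.

```python
import base64

# Formats listed under "Image requirements" in this section.
IMAGE_FORMATS = {"jpeg", "png", "webp", "bmp", "tiff", "gif"}


def image_data_uri(raw: bytes, image_format: str) -> str:
    """Encode raw image bytes as a data:image/<format>;base64,... string."""
    if image_format not in IMAGE_FORMATS:
        raise ValueError(f"unsupported image format: {image_format!r}")
    encoded = base64.b64encode(raw).decode("ascii")
    return f"data:image/{image_format};base64,{encoded}"


# Typical use: read a local file and attach it as the first frame.
# with open("frame.png", "rb") as fh:   # hypothetical file name
#     entry = {"url": image_data_uri(fh.read(), "png"), "role": "first_frame"}
```

Base64 encoding grows the payload by roughly one third, which matters when staying within the recommended total request size.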

role

string

required

Image role

Options:

first_frame
last_frame
reference_image

Image requirements:

Formats: jpeg, png, webp, bmp, tiff, gif
Per-image size: less than 30MB
Total request size: recommended within 64MB
Aspect ratio: about 0.4 to 2.5
Dimensions: about 300px to 6000px

First-frame and first-and-last-frame modes cannot be mixed with reference_image, reference_video, or reference_audio
Only one first_frame and one last_frame are allowed
In multimodal reference mode, all images should use reference_image
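
The role rules above can be expressed as a small client-side check. This is a sketch under assumptions: validate_image_roles is a hypothetical helper, and treating a last_frame without a first_frame as invalid is inferred from the supported patterns rather than stated outright.

```python
IMAGE_ROLES = {"first_frame", "last_frame", "reference_image"}
MAX_REFERENCE_IMAGES = 9


def validate_image_roles(image_with_roles: list[dict]) -> None:
    """Raise ValueError if the role mix breaks the rules listed above."""
    roles = [item["role"] for item in image_with_roles]
    unknown = set(roles) - IMAGE_ROLES
    if unknown:
        raise ValueError(f"unknown image roles: {sorted(unknown)}")

    first = roles.count("first_frame")
    last = roles.count("last_frame")
    references = roles.count("reference_image")

    if (first or last) and references:
        raise ValueError("first_frame/last_frame cannot be mixed with reference_image")
    if first > 1 or last > 1:
        raise ValueError("only one first_frame and one last_frame are allowed")
    if last and not first:
        # Assumption: this pattern is not among the supported ones.
        raise ValueError("last_frame without first_frame is not a listed pattern")
    if references > MAX_REFERENCE_IMAGES:
        raise ValueError(f"at most {MAX_REFERENCE_IMAGES} reference images are allowed")
```

The check covers roles only. Format, file size, aspect ratio, and dimensions would need the image data itself.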

video_with_roles

array

Video array with roles

Currently only reference_video is supported, for use in multimodal reference mode.

url

string

required

Video URL or uploaded asset URI

Supported formats:

https://...
asset://<ASSET_ID>

role

string

required

Fixed value:

reference_video

Video requirements:

Formats: mp4, mov
Resolution: 480p or 720p
Per-video duration: 2-15 seconds
Maximum count: 3
Total reference video duration: no more than 15 seconds
Per-video size: less than 50MB
Frame rate: about 24-60 FPS
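
The count and duration limits above interact: three clips are allowed, but only if together they stay within 15 seconds. A minimal sketch of that check follows. The helper check_reference_videos is hypothetical and assumes the caller has already measured each clip's duration in seconds; it does not inspect format, resolution, file size, or frame rate.

```python
MAX_CLIPS = 3
MIN_CLIP_SECONDS = 2
MAX_CLIP_SECONDS = 15
MAX_TOTAL_SECONDS = 15


def check_reference_videos(durations: list[float]) -> None:
    """Check clip count and durations (seconds) against the limits above."""
    if len(durations) > MAX_CLIPS:
        raise ValueError(f"at most {MAX_CLIPS} reference videos are allowed")
    for seconds in durations:
        if not MIN_CLIP_SECONDS <= seconds <= MAX_CLIP_SECONDS:
            raise ValueError(
                f"each clip must be {MIN_CLIP_SECONDS}-{MAX_CLIP_SECONDS} seconds, got {seconds}"
            )
    total = sum(durations)
    if total > MAX_TOTAL_SECONDS:
        raise ValueError(
            f"total duration {total} exceeds {MAX_TOTAL_SECONDS} seconds"
        )


check_reference_videos([5, 5, 5])  # accepted: three clips, 15 seconds in total
```

For example, two 10-second clips each pass the per-clip rule but fail the total-duration rule.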

audio_with_roles

array

Audio array with roles

Currently only reference_audio is supported, for use in multimodal reference mode.

url

string

required

Audio URL, Base64 data, or uploaded asset URI

Supported formats:

https://...
data:audio/<format>;base64,...
asset://<ASSET_ID>

role

string

required

Fixed value:

reference_audio

Audio requirements:

Formats: wav, mp3
Per-audio duration: 2-15 seconds
Maximum count: 3
Total reference audio duration: no more than 15 seconds
Per-audio size: less than 15MB

audio_with_roles cannot be used alone. At least one image or video reference is also required.

metadata

object

Extended parameters

resolution

string

default:"720p"

Video resolution

Options:

480p
720p

generate_audio

boolean

default:true

Whether to generate synced audio

true: returns a video with audio
false: returns a silent video

The model can generate dialogue, sound effects, and background music from the prompt and visual cues.

return_last_frame

boolean

default:false

Whether to also return the last frame image in the result

tools

array

Tool configuration

Currently supported:

[{ "type": "web_search" }]

Notes:

Best suited for text-to-video requests
The model will search only when needed, which may improve freshness but also add latency
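
The metadata fields described above can be assembled in one place. The sketch below is illustrative only: build_metadata and its web_search flag are hypothetical names, and it assumes that sending the documented default values explicitly is equivalent to omitting them.

```python
RESOLUTIONS = {"480p", "720p"}


def build_metadata(
    resolution: str = "720p",
    generate_audio: bool = True,
    return_last_frame: bool = False,
    web_search: bool = False,
) -> dict:
    """Build the metadata object using the documented defaults."""
    if resolution not in RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(RESOLUTIONS)}")
    metadata = {
        "resolution": resolution,
        "generate_audio": generate_audio,
        "return_last_frame": return_last_frame,
    }
    if web_search:
        # The only tool type listed in this section.
        metadata["tools"] = [{"type": "web_search"}]
    return metadata


# A silent 480p preview that also returns the last frame.
preview = build_metadata(resolution="480p", generate_audio=False, return_last_frame=True)
```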

seed

integer

Random seed for generation control

Input Combination Rules

Typical supported combinations:

Text only: text-to-video
Text + one first-frame image: first-frame image-to-video
Text + first-frame image + last-frame image: first-and-last-frame image-to-video
Text + reference images: multimodal reference-to-video
Text + reference videos: video-guided reference-to-video
Text + reference images + reference audio: multimodal reference-to-video
Text + reference images + reference videos + reference audio: multimodal reference-to-video

These three modes are mutually exclusive:

First-frame image-to-video
First-and-last-frame image-to-video
Multimodal reference-to-video

If you need strict first-frame and last-frame control, prefer first_frame and last_frame. If you need broader reference guidance, use reference_image, reference_video, and reference_audio.
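
The combination rules above amount to a small decision procedure over the request body. The following sketch classifies a body into one of the modes. The function classify_request and the returned labels are hypothetical, and the compatibility field image_urls is ignored because this section does not say how it maps to roles.

```python
FRAME_ROLES = {"first_frame", "last_frame"}
KNOWN_ROLES = FRAME_ROLES | {"reference_image"}


def classify_request(body: dict) -> str:
    """Return a label for the generation mode implied by a request body."""
    image_roles = {item["role"] for item in body.get("image_with_roles", [])}
    if image_roles - KNOWN_ROLES:
        raise ValueError(f"unknown image roles: {sorted(image_roles - KNOWN_ROLES)}")

    has_frames = bool(image_roles & FRAME_ROLES)
    has_visual_refs = "reference_image" in image_roles or bool(body.get("video_with_roles"))
    has_audio = bool(body.get("audio_with_roles"))

    if has_frames and (has_visual_refs or has_audio):
        raise ValueError("frame-controlled modes cannot be mixed with reference inputs")
    if has_audio and not has_visual_refs:
        raise ValueError("reference audio needs at least one image or video reference")
    if image_roles == FRAME_ROLES:
        return "first_and_last_frame"
    if image_roles == {"first_frame"}:
        return "first_frame"
    if has_frames:
        # Assumption: last_frame alone is not among the listed combinations.
        raise ValueError("last_frame without first_frame is not a listed combination")
    if has_visual_refs:
        return "multimodal_reference"
    return "text_to_video"
```

Running a check like this before submission surfaces mixed-mode mistakes locally instead of after a round trip to the API.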

Resolution and Aspect Ratio Pixel Map

Resolution	Aspect Ratio	Pixel Size
`480p`	`16:9`	`864x496`
`480p`	`4:3`	`752x560`
`480p`	`1:1`	`640x640`
`480p`	`3:4`	`560x752`
`480p`	`9:16`	`496x864`
`480p`	`21:9`	`992x432`
`720p`	`16:9`	`1280x720`
`720p`	`4:3`	`1112x834`
`720p`	`1:1`	`960x960`
`720p`	`3:4`	`834x1112`
`720p`	`9:16`	`720x1280`
`720p`	`21:9`	`1470x630`
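
When downstream code needs the output dimensions in advance, the table above can be held as a lookup. This is a sketch with hypothetical names (PIXEL_SIZES, pixel_size). It cannot resolve adaptive, since that ratio is chosen at generation time.

```python
# (resolution, aspect_ratio) -> (width, height), copied from the table above.
PIXEL_SIZES = {
    ("480p", "16:9"): (864, 496),
    ("480p", "4:3"): (752, 560),
    ("480p", "1:1"): (640, 640),
    ("480p", "3:4"): (560, 752),
    ("480p", "9:16"): (496, 864),
    ("480p", "21:9"): (992, 432),
    ("720p", "16:9"): (1280, 720),
    ("720p", "4:3"): (1112, 834),
    ("720p", "1:1"): (960, 960),
    ("720p", "3:4"): (834, 1112),
    ("720p", "9:16"): (720, 1280),
    ("720p", "21:9"): (1470, 630),
}


def pixel_size(resolution: str, aspect_ratio: str) -> tuple[int, int]:
    """Look up the output width and height for a fixed aspect ratio."""
    if aspect_ratio == "adaptive":
        raise ValueError("adaptive is resolved during generation; no fixed size")
    try:
        return PIXEL_SIZES[(resolution, aspect_ratio)]
    except KeyError:
        raise ValueError(f"no entry for {resolution} at {aspect_ratio}") from None


width, height = pixel_size("720p", "9:16")  # 720 wide, 1280 high
```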

Capabilities and Constraints

Item	Seedance 2.0	Seedance 2.0 Fast
Positioning	Higher quality	Faster generation and lower cost
Duration	`4-15` seconds, or `-1` auto	`4-12` seconds, or `-1` auto
Resolution	`480p` / `720p`	`480p` / `720p`
Image roles	`first_frame` / `last_frame` / `reference_image`	`first_frame` / `last_frame` / `reference_image`
Video roles	`reference_video`	`reference_video`
Audio roles	`reference_audio`	`reference_audio`
Audio generation	`metadata.generate_audio`	`metadata.generate_audio`
Tools	`metadata.tools`	`metadata.tools`
Return last frame	`metadata.return_last_frame`	`metadata.return_last_frame`

Usage is billed per second. Displayed prices may vary by model version, resolution, and marketplace strategy; refer to the model pricing page.

Response

string

Unique task identifier for status queries

object

string

Object type, always generation.task

model

string

Model name used

status

string

Task status

queued - Queued
in_progress - Processing
completed - Completed successfully
failed - Failed

progress

integer

Task progress percentage (0-100)

created_at

integer

Task creation timestamp (Unix timestamp)

metadata

object

Task metadata

Video generation is asynchronous. The create call returns a task ID, and you can use Get Video Task Status to poll for progress and results.
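
A polling loop over the statuses listed above might look like the sketch below. The status endpoint is documented separately under Get Video Task Status, so the sketch takes a caller-supplied fetch_status function (hypothetical) rather than guessing its URL. The interval and timeout values are arbitrary choices, not recommendations from the API.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_task(
    task_id: str,
    fetch_status: Callable[[str], dict],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the task is completed or failed, then return the last response."""
    deadline = time.monotonic() + timeout
    while True:
        task = fetch_status(task_id)  # expected to return the task object as a dict
        if task["status"] in TERMINAL_STATUSES:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {task['status']} after {timeout}s")
        sleep(interval)
```

The caller still has to inspect the returned status, since a failed task also ends the loop.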

curl --request POST \
  --url https://toapis.com/v1/videos/generations \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "doubao-seedance-2-0",
    "prompt": "Use video 1 for the POV composition throughout, use audio 1 as the background bed, start from image 1, and end on image 2 with a fresh commercial tone.",
    "duration": 11,
    "aspect_ratio": "16:9",
    "image_with_roles": [
      {"url": "https://example.com/ref-image-1.jpg", "role": "reference_image"},
      {"url": "https://example.com/ref-image-2.jpg", "role": "reference_image"}
    ],
    "video_with_roles": [
      {"url": "https://example.com/ref-video-1.mp4", "role": "reference_video"}
    ],
    "audio_with_roles": [
      {"url": "https://example.com/ref-audio-1.mp3", "role": "reference_audio"}
    ],
    "metadata": {
      "resolution": "720p",
      "generate_audio": true
    }
  }'
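
The same request can be sent from Python with only the standard library. This is a sketch rather than an official client: build_request and create_task are hypothetical names, the 30-second timeout is arbitrary, and the response is assumed to be the JSON task object described under Response.

```python
import json
import urllib.request

API_URL = "https://toapis.com/v1/videos/generations"


def build_request(body: dict, api_key: str) -> urllib.request.Request:
    """Prepare the POST request without sending it."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_task(body: dict, api_key: str) -> dict:
    """Send the request and return the parsed task object."""
    with urllib.request.urlopen(build_request(body, api_key), timeout=30) as response:
        return json.load(response)


# A text-to-video body using only fields documented above.
body = {
    "model": "doubao-seedance-2-0-fast",
    "prompt": "A paper boat drifting down a rain-soaked street at dusk",
    "duration": 5,
    "aspect_ratio": "16:9",
    "metadata": {"resolution": "480p", "generate_audio": False},
}
# task = create_task(body, "YOUR_API_KEY")  # uncomment to call the API
```

Separating build_request from create_task keeps the payload inspectable without network access.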
