Kling 3.0 Omni T2V API
All models are called through the Unified Async API endpoint POST /v1/tasks; only the input fields differ (see the request parameters below).
Model summary
| Model name | kling-3.0-omni/text-to-video |
|---|---|
| Type | Video generation (text-to-video) |
| Endpoint | POST /v1/tasks |
| Pricing | See HiAPI Pricing |
Kling 3.0 Omni text-to-video API by Kuaishou Kling. Generate cinematic clips from a text prompt with native audio (multilingual dialogue and lip sync), multi-shot storytelling, 720p / 1080p / 4K resolutions, and 3-15 second durations.
Production guidance
- For production, pass callback.url at the top level of the request body so HiAPI can notify your service when the task reaches a terminal state.
- GET /v1/tasks/:id is better for local debugging, low-volume jobs, or fallback reconciliation if a callback is missed.
- Use callback.when=final. Both success and fail are terminal states, so your service should deduplicate by taskId.
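The deduplication advice above can be sketched as a small handler. This is an illustration only: it assumes the callback body carries the same taskId, status, and output[].url fields as the task object described on this page, which is not confirmed here, and the in-memory set stands in for whatever durable store your service would actually use.

```python
from typing import Any


def handle_callback(payload: dict[str, Any], seen: set[str]) -> tuple[str, list[str]]:
    """Handle one terminal notification, deduplicating by taskId.

    Returns ("duplicate", []) if this taskId was already processed,
    ("success", urls) on a first successful notification,
    and ("fail", []) on a first failed one.
    Field names (taskId, status, output[].url) are assumed to match
    the task object; check the real callback payload before relying on them.
    """
    task_id = payload["taskId"]
    if task_id in seen:
        return "duplicate", []
    seen.add(task_id)

    if payload.get("status") == "success":
        urls = [item["url"] for item in payload.get("output", [])]
        return "success", urls
    return "fail", []
```

Keying on taskId means a repeated notification for the same task is a no-op, whichever terminal state it reports.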
Best suited for
- Generate visuals and synchronized native audio in one call (multilingual dialogue with lip sync), skipping post-production scoring. Good for social shorts and ads. Parameters: prompt, sound.
- Produce coherent narratives with shot transitions from a single prompt, ideal for story-driven shorts, storyboard checks, and creative drafts. Parameters: prompt, duration.
- Choose among 720p / 1080p / 4K tiers: pick 4K for high-resolution delivery or big-screen display (billed per second; 4K costs the most). Parameters: resolution.
- Adapt to placements with aspect ratio and 3-15 second durations; one prompt covers 16:9 feeds and 9:16 vertical shorts. Parameters: aspect_ratio, duration.
Request parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | required | Fixed value kling-3.0-omni/text-to-video. |
| input | object | required | Model-specific parameters. Put Kling 3.0 Omni T2V configuration here. |
| input.prompt | string | required | Text prompt describing the video to generate. |
| input.resolution | enum | optional | Output resolution: 720p, 1080p, or 4K. Higher resolution costs more. |
| input.aspect_ratio | enum | optional | Output video aspect ratio, for example 16:9 or 9:16. |
| input.duration | integer | optional | Clip length in seconds (3-15, default 5). Cost scales with duration. |
| input.sound | boolean | optional | Generate synchronized audio (effects/ambience). Costs more when enabled. |
| callback | object | optional | Callback configuration. When set, HiAPI notifies your service when the task reaches a terminal state. |
| callback.url | string | required if callback is set | HTTPS URL that receives terminal task notifications. |
| callback.when | enum | optional | Callback trigger timing. Use final. |
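As a rough illustration of how the parameters above fit together, the following sketch assembles a request body and rejects values this page rules out before anything is sent. The helper name build_request is hypothetical, and the client-side checks are a convenience rather than part of the API. aspect_ratio is passed through unchecked because the full list of accepted values is not given here.

```python
from typing import Any, Optional

MODEL = "kling-3.0-omni/text-to-video"
RESOLUTIONS = {"720p", "1080p", "4K"}  # tiers named on this page


def build_request(
    prompt: str,
    *,
    resolution: Optional[str] = None,
    aspect_ratio: Optional[str] = None,
    duration: Optional[int] = None,
    sound: Optional[bool] = None,
    callback_url: Optional[str] = None,
) -> dict[str, Any]:
    """Build a POST /v1/tasks body (hypothetical helper, not an official client)."""
    if not prompt.strip():
        raise ValueError("prompt is required")
    if resolution is not None and resolution not in RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(RESOLUTIONS)}")
    if duration is not None and not 3 <= duration <= 15:
        raise ValueError("duration must be between 3 and 15 seconds")
    if callback_url is not None and not callback_url.startswith("https://"):
        raise ValueError("callback url must be HTTPS")

    # Only include optional fields the caller actually set.
    input_fields: dict[str, Any] = {"prompt": prompt}
    optional = {
        "resolution": resolution,
        "aspect_ratio": aspect_ratio,
        "duration": duration,
        "sound": sound,
    }
    input_fields.update({k: v for k, v in optional.items() if v is not None})

    body: dict[str, Any] = {"model": MODEL, "input": input_fields}
    if callback_url is not None:
        # callback sits at the top level of the body, not inside input.
        body["callback"] = {"url": callback_url, "when": "final"}
    return body
```

Omitting unset optional fields leaves defaults to the server instead of hard-coding them in the client.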
Example requests
1080p / 16:9 / 5s with native audio, for narrative shots.
{
  "model": "kling-3.0-omni/text-to-video",
  "input": {
    "prompt": "A red fox sprints across a windswept snowy ridge at golden hour, powder snow flying, cinematic side light, smooth tracking shot",
    "aspect_ratio": "16:9",
    "resolution": "1080p",
    "duration": 5,
    "sound": true
  }
}
4K / 16:9 / 5s ultra-high-resolution clip for big screens and HD delivery.
{
  "model": "kling-3.0-omni/text-to-video",
  "input": {
    "prompt": "Aerial over an Iceland black-sand beach, waves crashing against basalt columns, mist in the morning light, slow push-in, cinematic cool tones",
    "aspect_ratio": "16:9",
    "resolution": "4K",
    "duration": 5,
    "sound": false
  }
}
720p / 9:16 / 8s vertical short at lower cost.
{
  "model": "kling-3.0-omni/text-to-video",
  "input": {
    "prompt": "A neon street at night after rain, shimmering reflections on the ground, cyberpunk mood, slow pan",
    "aspect_ratio": "9:16",
    "resolution": "720p",
    "duration": 8,
    "sound": true
  }
}
Getting the result
- The response returns a taskId immediately without waiting for generation to finish.
- In production, prefer waiting for callback.url to receive the terminal notification. For local debugging, poll GET /v1/tasks/:id.
- When status=success, download the generated video from output[].url.
- When status=fail, fix the request based on the returned error instead of retrying the same invalid payload.
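For the polling path, a minimal sketch of the loop might look like the following. The fetch_task callable is a stand-in for your own GET /v1/tasks/:id call, since the base URL and authentication scheme are not covered on this page. The non-terminal status values, the default interval, and the error field name are likewise assumptions.

```python
import time
from typing import Any, Callable

TERMINAL_STATES = {"success", "fail"}  # the two terminal states named on this page


def wait_for_task(
    fetch_task: Callable[[str], dict[str, Any]],
    task_id: str,
    *,
    interval: float = 5.0,
    max_attempts: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll until the task reaches a terminal state or attempts run out.

    fetch_task is a caller-supplied function wrapping GET /v1/tasks/:id.
    """
    for attempt in range(max_attempts):
        task = fetch_task(task_id)
        if task.get("status") in TERMINAL_STATES:
            return task
        if attempt < max_attempts - 1:
            sleep(interval)
    raise TimeoutError(f"task {task_id} not finished after {max_attempts} polls")


def video_urls(task: dict[str, Any]) -> list[str]:
    """Return output[].url for a successful task; raise if the task failed."""
    if task.get("status") != "success":
        # "error" is an assumed field name for the returned error details.
        raise RuntimeError(f"task did not succeed: {task.get('error')}")
    return [item["url"] for item in task.get("output", [])]
```

Raising on a failed task instead of retrying matches the guidance above: a fail status calls for fixing the payload, not resubmitting it unchanged.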
FAQ
Which resolutions and durations does Kling 3.0 Omni text-to-video support?
It supports resolutions of 720p / 1080p / 4K and durations of 3-15 seconds (default 5). Billing depends on resolution, duration (per second), and whether native audio is enabled; 4K costs the most, and sound-on tiers cost more than sound-off. See the live pricing page for current rates.
Does the generated video include audio?
This is controlled by the sound parameter. When enabled, the model generates native audio synchronized with the visuals (multilingual dialogue and lip sync); when disabled, it outputs visuals only. Sound-on tiers cost more.
How do I get the generated video?
The response returns a taskId immediately. When the task reaches a terminal state, download the video from output[].url. In production, pass callback.url at the top level to receive terminal notifications and avoid polling.
Does it support image-to-video?
This model is text-to-video only. Use kling-3.0-omni/image-to-video to drive generation from a first-frame image.