Skip to content
English

Kling 3.0 Omni T2V API

POST /v1/tasks

All models are called through the Unified Async API POST /v1/tasks endpoint; only the input fields differ (see input parameters below).

Model summary

Model name kling-3.0-omni/text-to-video
Type Video generation (text-to-video)
Endpoint POST /v1/tasks
Pricing See HiAPI Pricing

Kling 3.0 Omni text-to-video API by Kuaishou Kling. Generate cinematic clips from a text prompt with native audio (multilingual dialogue and lip sync), multi-shot storytelling, 720p / 1080p / 4K resolutions, and 3-15 second durations.

Production guidance

Production guidance
  • For production, pass callback.url at the top level of the request body so HiAPI can notify your service when the task reaches a terminal state.
  • GET /v1/tasks/:id is better for local debugging, low-volume jobs, or fallback reconciliation if a callback is missed.
  • Use callback.when=final. Both success and fail are terminal states, so your service should deduplicate by taskId.

Best suited for

Sound-on finished clips

Generate visuals and synchronized native audio in one call — multilingual dialogue with lip sync — skipping post-production scoring. Good for social shorts and ads.

promptsound
Multi-shot storytelling

Produce coherent narratives with shot transitions from a single prompt, ideal for story-driven shorts, storyboard checks, and creative drafts.

promptduration
4K high-resolution output

720p / 1080p / 4K tiers — pick 4K for high-resolution delivery or big-screen display (billed per second; 4K costs the most).

resolution
Ratio and duration fit

Adapt to placements with aspect ratio and 3-15 second durations — one prompt covers 16:9 feeds and 9:16 vertical shorts.

aspect_ratioduration

Request parameters

model string required

Fixed value kling-3.0-omni/text-to-video.

example kling-3.0-omni/text-to-video
input object required

Business parameters. Put Kling 3.0 Omni T2V-specific configuration here.

prompt string required

Text prompt describing the video to generate.

resolution enum optional

Output resolution. Higher resolution costs more.

default 1080p enum: 720p1080p4K
aspect_ratio enum optional

Output video aspect ratio.

default 16:9 enum: 16:99:161:1
duration integer optional

Clip length in seconds (3-15). Cost scales with duration.

default 5
sound boolean optional

Generate synchronized audio (effects/ambience). Costs more when enabled.

default false
callback object optional

Optional callback configuration. When set, HiAPI notifies your service when the task reaches a terminal state.

url string required

Required when callback is set; HTTPS URL that receives terminal task notifications.

example https://your-domain.com/hiapi/callback
when enum optional

Callback trigger timing. Use final.

default final enum: final

Example requests

Cinematic snow fox

1080p / 16:9 / 5s with native audio, for narrative shots.

Request body
{
  "model": "kling-3.0-omni/text-to-video",
  "input": {
    "prompt": "A red fox sprints across a windswept snowy ridge at golden hour, powder snow flying, cinematic side light, smooth tracking shot",
    "aspect_ratio": "16:9",
    "resolution": "1080p",
    "duration": 5,
    "sound": true
  }
}
4K natural wonder

4K / 16:9 / 5s ultra-high-resolution clip for big screens and HD delivery.

Request body
{
  "model": "kling-3.0-omni/text-to-video",
  "input": {
    "prompt": "Aerial over an Iceland black-sand beach, waves crashing against basalt columns, mist in the morning light, slow push-in, cinematic cool tones",
    "aspect_ratio": "16:9",
    "resolution": "4K",
    "duration": 5,
    "sound": false
  }
}
Vertical city night

720p / 9:16 / 8s vertical short — lower cost.

Request body
{
  "model": "kling-3.0-omni/text-to-video",
  "input": {
    "prompt": "A neon street at night after rain, shimmering reflections on the ground, cyberpunk mood, slow pan",
    "aspect_ratio": "9:16",
    "resolution": "720p",
    "duration": 8,
    "sound": true
  }
}

Getting the result

  1. The response returns a taskId immediately without waiting for generation to finish.
  2. In production, prefer waiting for callback.url to receive the terminal notification. For local debugging, poll GET /v1/tasks/:id.
  3. When status=success, download the generated video from output[].url.
  4. When status=fail, fix the request based on the returned error instead of retrying the same invalid payload.

FAQ

Which resolutions and durations does Kling 3.0 Omni text-to-video support?

Resolutions 720p / 1080p / 4K and durations 3-15 seconds (default 5). Pricing is billed by resolution, duration (per second), and whether native audio is enabled; 4K costs the most and sound-on tiers cost more than sound-off. See the live pricing page for current rates.

Does the generated video include audio?

Controlled by the sound parameter. When enabled it generates native audio synchronized with the visuals (multilingual dialogue and lip sync); when disabled it outputs visuals only. Sound-on tiers cost more.

How do I get the generated video?

The response returns a taskId immediately. When the task reaches a terminal state, download the video from output[].url. In production, pass callback.url at the top level to receive terminal notifications and avoid polling.

Does it support image-to-video?

This model is text-to-video only. Use kling-3.0-omni/image-to-video to drive generation from a first-frame image.

Next steps