Create a video
Creates an avatar talking-head video asynchronously. The call returns the Video object with status: "queued"; poll GET /v1/videos/{video_id} until the job finishes, then download the rendered MP4 from GET /v1/videos/{video_id}/content. Provide a reference image plus exactly one driving input: input (audio-to-video) or input_tts (text-to-video). The body may be JSON or multipart/form-data; with multipart, upload ref_image and input as raw files.
Authorizations
Your Boson API key, sent as Authorization: Bearer $BOSON_API_KEY.
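As a minimal sketch of the header above, the following Python builds (but does not send) authenticated requests for the two GET endpoints named on this page. BASE_URL is a placeholder assumption, since this page does not state the API host, and the helper names are hypothetical.

```python
import os
import urllib.request

# Assumption: placeholder host. This page does not state the API base URL.
BASE_URL = "https://api.example.com"


def auth_headers(api_key=None):
    """Return the Authorization header, reading BOSON_API_KEY if no key is passed."""
    key = api_key or os.environ.get("BOSON_API_KEY")
    if not key:
        raise ValueError("set BOSON_API_KEY or pass api_key")
    return {"Authorization": f"Bearer {key}"}


def status_request(video_id, api_key=None):
    """Build a GET /v1/videos/{video_id} request (not sent here)."""
    url = f"{BASE_URL}/v1/videos/{video_id}"
    return urllib.request.Request(url, headers=auth_headers(api_key), method="GET")


def content_request(video_id, api_key=None):
    """Build a GET /v1/videos/{video_id}/content request (not sent here)."""
    url = f"{BASE_URL}/v1/videos/{video_id}/content"
    return urllib.request.Request(url, headers=auth_headers(api_key), method="GET")
```

Passing either request to urllib.request.urlopen would perform the call; building the request separately keeps the key handling in one place.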
Body
Provide a ref_image plus exactly one driving input: input (audio-to-video) or input_tts (text-to-video).
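The "exactly one driving input" rule can be checked on the client before sending anything. The sketch below assembles a JSON body from ref_image and either input or input_tts; the function name and its audio / tts parameter names are hypothetical, and the contents of the input_tts dictionary are left to the caller because the speech request fields are not listed on this page.

```python
import json


def build_video_body(ref_image, audio=None, tts=None):
    """Assemble a create-video JSON body with exactly one driving input.

    audio maps to the `input` field, tts maps to the `input_tts` field.
    """
    if not ref_image:
        raise ValueError("ref_image is required")
    # True when both are given or both are missing: either case is invalid.
    if (audio is None) == (tts is None):
        raise ValueError("provide exactly one of input / input_tts")

    body = {"ref_image": ref_image}
    if audio is not None:
        body["input"] = audio
    else:
        # The nested stream field is documented as unsupported.
        if "stream" in tts:
            raise ValueError("input_tts must not contain a stream field")
        body["input_tts"] = dict(tts)
    return body


if __name__ == "__main__":
    # Placeholder URLs, for illustration only.
    body = build_video_body(
        "https://example.com/face.png",
        audio="https://example.com/speech.wav",
    )
    print(json.dumps(body, indent=2))
```

Validating locally is optional; it only surfaces the same mistake earlier than a round trip to the API would.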
Reference image (the face to animate): an http(s) URL, data URI, or base64-encoded raw image bytes. Supported formats: PNG, JPEG, WEBP. Inline (base64 / data-URI) payloads: max 10 MB.
Avatar model ID / public alias.
higgs-avatar
Audio-to-video: the driving speech audio as an http(s) URL, data URI, or base64-encoded raw audio bytes. Supported formats: AAC, WAV, MP3, FLAC, OPUS. Max duration: 60 s (it sets the output video length). Provide exactly one of input / input_tts.
Text-to-video: a speech request (the same body as POST /v1/audio/speech). The gateway synthesizes the voice and the avatar lip-syncs to it. The nested stream field is not supported. Provide exactly one of input / input_tts.
Output video size (WxH): square 640x640, landscape 640x480, or portrait 480x640.
640x640, 640x480, 480x640
Response
The created Video object.
A video generation job (the create / retrieve response).
Video ID, e.g. video_8a1f....
"video_8a1f2c3d4e5f6a7b8c9d0e1f"
video
Job status.
queued, in_progress, completed, failed
Completion percentage (0–100).
Output size (WxH), e.g. 640x640.
Unix timestamp (seconds).
Error message when status is failed.
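Because the job is asynchronous, a client typically polls the Video object until its status reaches completed or failed. The sketch below shows one way to do that; the function name, the polling interval, and the timeout are assumptions rather than documented values, and fetch is any caller-supplied function that performs GET /v1/videos/{video_id} and returns the parsed JSON as a dict.

```python
import time

# Status values listed in this reference.
TERMINAL_STATUSES = {"completed", "failed"}
KNOWN_STATUSES = {"queued", "in_progress"} | TERMINAL_STATUSES


def wait_for_video(fetch, video_id, interval=2.0, timeout=600.0, sleep=time.sleep):
    """Poll fetch(video_id) until the job is completed or failed.

    interval and timeout are illustrative defaults, not documented limits.
    Returns the final Video dict; the caller inspects its status.
    """
    deadline = time.monotonic() + timeout
    while True:
        video = fetch(video_id)
        status = video.get("status")
        if status in TERMINAL_STATUSES:
            return video
        if status not in KNOWN_STATUSES:
            raise ValueError(f"unexpected status: {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{video_id} still {status} after {timeout} s")
        sleep(interval)
```

Injecting fetch and sleep keeps the loop independent of any particular HTTP client and makes it easy to exercise with canned responses. Once the returned status is completed, the MP4 can be downloaded from GET /v1/videos/{video_id}/content.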