Create speech

Authorizations

Authorization

string

header

required

Your Boson API key, sent as Authorization: Bearer $BOSON_API_KEY.
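As a minimal sketch of the header described above, the following builds (but does not send) a POST request that carries the key as a bearer token. The endpoint URL is a placeholder, since this section does not state one, and `build_speech_request` is a hypothetical helper name rather than part of any Boson SDK.

```python
import json
import urllib.request

# Placeholder only: the real endpoint URL is not given in this section.
SPEECH_URL = "https://api.example.invalid/v1/audio/speech"


def build_speech_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build (without sending) a POST request with a bearer token and JSON body."""
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

In practice the key would typically be read from the environment, for example `os.environ["BOSON_API_KEY"]`, and the request sent with `urllib.request.urlopen(req)` once the real URL is known.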

Body

application/json

input

string

required

Text to convert to speech. May contain inline tags. Inputs longer than 5000 characters are rejected with HTTP 400 and the error code input_too_long.

Required string length: 1 - 5000

Example:

"Hello, this is a test."
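The length limit can be checked on the client before sending, which avoids a round trip that would end in input_too_long. This is a sketch only: `check_input` is a hypothetical helper, and it assumes the server counts characters the same way Python's `len` does (Unicode code points), which this section does not confirm.

```python
MAX_INPUT_CHARS = 5000  # documented upper bound for `input`


def check_input(text: str) -> str:
    """Return text unchanged if its length is within 1..5000, else raise."""
    # len() counts Unicode code points; the server may count differently.
    n = len(text)
    if n < 1:
        raise ValueError("input must contain at least 1 character")
    if n > MAX_INPUT_CHARS:
        raise ValueError(f"input has {n} characters; the limit is {MAX_INPUT_CHARS}")
    return text
```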

model

enum<string>

default: higgs-audio-v3-tts

TTS model ID or public alias. The server resolves it to the model that is actually served.

Available options:

higgs-audio-v3-tts

voice

string

default: default

Preset voice name or custom voice ID. If voice is set explicitly, ref_audio and ref_text must not be provided; the two are mutually exclusive.

response_format

enum<string>

default: mp3

Output audio format. Streaming requires pcm.

Available options:

mp3, opus, pcm, wav, aac, flac

stream

boolean

default: false

If true, stream raw PCM chunks as they are decoded. Requires response_format to be pcm. Speed adjustment is not supported when streaming.
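The constraint between stream and response_format can be enforced while assembling the request body. The sketch below uses a hypothetical `build_body` helper; only the field names, the allowed formats, and the defaults come from this page.

```python
ALLOWED_FORMATS = {"mp3", "opus", "pcm", "wav", "aac", "flac"}


def build_body(text: str, response_format: str = "mp3", stream: bool = False) -> dict:
    """Assemble a JSON-serializable body, rejecting invalid format/stream pairs."""
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    # Streaming delivers raw PCM chunks, so any other format is rejected here.
    if stream and response_format != "pcm":
        raise ValueError("stream=True requires response_format='pcm'")
    return {"input": text, "response_format": response_format, "stream": stream}
```

For example, `build_body("Hello", "pcm", stream=True)` returns a body suitable for streaming, while `build_body("Hello", "mp3", stream=True)` raises `ValueError` before any request is made.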

ref_audio

string | null

Inline reference audio for one-off voice cloning: an http(s) URL, a data URI, or base64-encoded raw audio bytes. Supported formats: AAC, WAV, MP3, FLAC, OPUS. Inline payloads (base64 or data URI) are limited to 10 MB.

ref_text

string | null

Transcript of the speech in ref_audio. Optional, but recommended when ref_audio is provided.
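For one-off cloning from local audio, the bytes can be base64-encoded into ref_audio. The sketch below makes two assumptions that this page does not settle: that "10 MB" means 10 * 1024 * 1024 bytes, and that the limit applies to the raw audio rather than to the encoded string. `cloning_fields` is a hypothetical helper; its result would be merged into the request body with voice left unset.

```python
import base64
from typing import Optional

# Assumption: "10 MB" is read as 10 MiB and applied to the raw audio bytes.
MAX_INLINE_BYTES = 10 * 1024 * 1024


def cloning_fields(audio: bytes, transcript: Optional[str] = None) -> dict:
    """Return ref_audio (base64) and, if given, ref_text for a request body."""
    if len(audio) > MAX_INLINE_BYTES:
        raise ValueError("reference audio exceeds the 10 MB inline limit")
    fields = {"ref_audio": base64.b64encode(audio).decode("ascii")}
    if transcript is not None:
        fields["ref_text"] = transcript
    return fields
```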

Response

Generated audio. The content type depends on response_format.

The response body is a binary audio file.