Create speech
Generate speech audio from text. Returns an audio file, or a stream of raw PCM chunks when stream is true.
Authorizations
Your Boson API key, sent as Authorization: Bearer $BOSON_API_KEY.
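A minimal sketch of attaching the key as a bearer token, using only the Python standard library. The endpoint URL below is a placeholder (this page does not state the real one), and the JSON content type is an assumption; only the `Authorization: Bearer` header and the `BOSON_API_KEY` variable name come from the description above.

```python
import os
import urllib.request

# Placeholder URL for illustration only; substitute the real endpoint.
SPEECH_URL = "https://api.example.com/v1/audio/speech"


def api_key_from_env() -> str:
    """Read the API key from the BOSON_API_KEY environment variable."""
    key = os.environ.get("BOSON_API_KEY")
    if not key:
        raise RuntimeError("BOSON_API_KEY is not set")
    return key


def build_request(body: bytes, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the bearer token."""
    return urllib.request.Request(
        SPEECH_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumed: the body is sent as JSON.
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send the request; the sketch stops short of that so the key handling can be checked offline.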
Body
Text to convert to speech. May contain inline tags. Inputs longer than 5000 characters return a 400 input_too_long.
String length: 1 - 5000
Example: "Hello, this is a test."
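The length limit can be checked client-side before sending, which avoids a round trip that would end in a 400 input_too_long. This is a sketch under one assumption: that the server counts characters the way Python's `len()` does (Unicode code points, inline tags included). The page does not say how the count is made, so treat the check as approximate near the boundary.

```python
MAX_INPUT_CHARS = 5000  # upper bound stated in the field description


def check_input_text(text: str) -> str:
    """Return text unchanged, or raise ValueError if it is outside 1-5000 characters."""
    # Assumption: len() (code points) matches the server's character count.
    if not 1 <= len(text) <= MAX_INPUT_CHARS:
        raise ValueError(
            f"input must be 1-{MAX_INPUT_CHARS} characters, got {len(text)}"
        )
    return text
```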
TTS model ID / public alias. Resolved to the served model server-side.
higgs-audio-v3-tts
Preset voice name or custom voice ID. Mutually exclusive with ref_audio / ref_text when explicitly provided.
Output audio format. Streaming requires pcm.
Available options: mp3, opus, pcm, wav, aac, flac
If true, stream raw PCM chunks as they are decoded. Requires response_format to be pcm. Speed adjustment is not supported when streaming.
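The format and streaming constraints can be enforced when the request body is assembled. In this sketch, `response_format` and `stream` are the field names used above; `input`, `model`, and `speed` are assumed names, since this page does not show the keys for the text, the model, or the speed adjustment.

```python
ALLOWED_FORMATS = ("mp3", "opus", "pcm", "wav", "aac", "flac")


def build_payload(text, model, response_format, stream=False, speed=None):
    """Assemble a request body, rejecting combinations the description rules out."""
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if stream and response_format != "pcm":
        raise ValueError("streaming requires response_format='pcm'")
    if stream and speed is not None:
        raise ValueError("speed adjustment is not supported when streaming")

    # "input", "model", and "speed" are assumed key names (see lead-in).
    payload = {
        "input": text,
        "model": model,
        "response_format": response_format,
        "stream": stream,
    }
    if speed is not None:
        payload["speed"] = speed
    return payload
```

For example, `build_payload("Hi", "higgs-audio-v3-tts", "pcm", stream=True)` passes, while the same call with `"mp3"` raises before any request is made.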
Inline reference audio for one-off cloning: an http(s) URL, a data URI, or base64-encoded raw audio bytes. Supported formats: AAC, WAV, MP3, FLAC, OPUS. Inline payloads (base64 or data URI) are limited to 10 MB.
Recommended transcript of ref_audio.
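A sketch of preparing the voice and reference fields, assuming the preset or custom voice is sent under a key named `voice` (the key name is not shown on this page). Two further assumptions are made conservatively: the 10 MB limit is read as 10,000,000 characters of the encoded string, since the page does not say whether it applies before or after base64 encoding, and a `ref_text` without `ref_audio` is rejected as a client-side choice of this sketch.

```python
import base64

# Conservative reading of "10 MB", applied to the encoded payload (assumption).
MAX_INLINE_CHARS = 10_000_000


def reference_fields(voice=None, ref_audio_bytes=None, ref_text=None):
    """Return the voice-selection part of the request body as a dict."""
    has_ref = ref_audio_bytes is not None or ref_text is not None
    if voice is not None and has_ref:
        raise ValueError("voice is mutually exclusive with ref_audio / ref_text")
    if voice is not None:
        return {"voice": voice}  # "voice" is an assumed key name
    if ref_audio_bytes is None:
        if ref_text is not None:
            raise ValueError("ref_text given without ref_audio")
        return {}

    encoded = base64.b64encode(ref_audio_bytes).decode("ascii")
    if len(encoded) > MAX_INLINE_CHARS:
        raise ValueError("inline ref_audio exceeds the 10 MB limit; use a URL instead")

    fields = {"ref_audio": encoded}
    if ref_text is not None:
        fields["ref_text"] = ref_text
    return fields
```

Audio that is too large to inline can be passed as an http(s) URL string in `ref_audio` instead, which the size limit above does not cover.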
Response
Generated audio. The content type depends on response_format.
The response is of type file.