POST /v1/audio/speech
Create speech
curl --request POST \
  --url https://api.boson.ai/v1/audio/speech \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --output speech.mp3 \
  --data '
{
  "input": "Hello, this is a test.",
  "model": "higgs-audio-v3-tts",
  "voice": "default",
  "response_format": "mp3",
  "stream": false
}
'

Authorizations

Authorization (string, header, required)

Your Boson API key, sent as Authorization: Bearer $BOSON_API_KEY.
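
As a minimal sketch of the header described above, the following Python helper builds the request headers. It assumes the key is stored in an environment variable named BOSON_API_KEY (the name used in the description); the helper name build_headers is hypothetical and not part of any Boson SDK.

```python
import os


def build_headers(api_key=None):
    """Return HTTP headers for a JSON request to the speech endpoint."""
    # Fall back to the BOSON_API_KEY environment variable when no key is passed.
    key = api_key or os.environ.get("BOSON_API_KEY")
    if not key:
        raise ValueError("no API key: pass api_key or set BOSON_API_KEY")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Reading the key from the environment keeps it out of source files and shell history.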

Body

application/json
input (string, required)

Text to convert to speech. May contain inline tags. Inputs longer than 5000 characters are rejected with a 400 input_too_long error.

Required string length: 1 - 5000
Example:

"Hello, this is a test."

model (enum<string>, default: higgs-audio-v3-tts)

TTS model ID or public alias. The server resolves it to the model actually served.

Available options: higgs-audio-v3-tts

voice (string, default: default)

Preset voice name or custom voice ID. An explicitly provided voice is mutually exclusive with ref_audio / ref_text.

response_format (enum<string>, default: mp3)

Output audio format. Streaming requires pcm.

Available options: mp3, opus, pcm, wav, aac, flac

stream (boolean, default: false)

If true, stream raw PCM chunks as they are decoded. Requires response_format to be pcm. Speed adjustment is not supported when streaming.
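
A rough Python sketch of the streaming constraint: the request body pairs stream with response_format pcm, and the client reads the response in chunks. The helper names streaming_body and iter_chunks and the 4096-byte chunk size are illustrative assumptions. This section does not state the PCM sample rate or bit depth, so playback is left out.

```python
import io


def streaming_body(text):
    """Build a request body for streaming; stream=True requires pcm output."""
    return {"input": text, "response_format": "pcm", "stream": True}


def iter_chunks(stream, chunk_size=4096):
    """Yield successive byte chunks from a file-like object until EOF."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


# Stand-in for a network response: any object with read(n) works here.
fake_response = io.BytesIO(b"abcdefgh")
chunks = list(iter_chunks(fake_response, chunk_size=3))
```

The object returned by urllib.request.urlopen also exposes read(n), so it could be passed to iter_chunks in place of the BytesIO stand-in.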

ref_audio (string | null)

Inline reference audio for one-off cloning: an http(s) URL, data URI, or base64-encoded raw audio bytes. Supported formats: AAC, WAV, MP3, FLAC, OPUS. Inline (base64 / data-URI) payloads: max 10 MB.

ref_text (string | null)

Transcript of ref_audio. Optional but recommended.
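
One way to assemble a one-off cloning request in Python is sketched below. The helper name cloning_body is hypothetical. The sketch assumes the 10 MB limit applies to the raw audio bytes and equals 10 * 1024 * 1024 bytes; the description does not say whether the limit is measured before or after base64 encoding. The voice field is left out because an explicit voice is mutually exclusive with ref_audio / ref_text.

```python
import base64

# Assumption: "10 MB" is read as 10 MiB of raw (pre-encoding) audio bytes.
MAX_INLINE_BYTES = 10 * 1024 * 1024


def cloning_body(text, audio_bytes, transcript=None):
    """Build a request body that clones the voice in audio_bytes."""
    if len(audio_bytes) > MAX_INLINE_BYTES:
        raise ValueError("inline reference audio exceeds the 10 MB limit")
    body = {
        "input": text,
        # base64-encoded raw audio bytes, one of the accepted ref_audio forms
        "ref_audio": base64.b64encode(audio_bytes).decode("ascii"),
    }
    if transcript is not None:
        body["ref_text"] = transcript
    return body
```

For larger reference clips, passing an http(s) URL as ref_audio avoids the inline size limit.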

Response

Generated audio. The content type depends on response_format.

On success, the response body is the binary audio file.
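
Putting the pieces together, here is a minimal end-to-end sketch using only the Python standard library. The function names build_request and synthesize_to_file are hypothetical. The client-side length check uses Python's len, which counts Unicode code points and may not match how the server counts characters.

```python
import json
import urllib.request

API_URL = "https://api.boson.ai/v1/audio/speech"
FORMATS = {"mp3", "opus", "pcm", "wav", "aac", "flac"}
MAX_INPUT_CHARS = 5000  # documented upper bound for `input`


def build_request(api_key, text, response_format="mp3"):
    """Validate the arguments and return an unsent POST request."""
    if not 1 <= len(text) <= MAX_INPUT_CHARS:
        raise ValueError("input must be 1 to 5000 characters long")
    if response_format not in FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")
    body = {"input": text, "response_format": response_format}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(api_key, text, path, response_format="mp3"):
    """Send the request and write the returned audio bytes to path."""
    req = build_request(api_key, text, response_format)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(req) as resp, open(path, "wb") as out:
        out.write(resp.read())
    return path
```

A call such as synthesize_to_file(key, "Hello, this is a test.", "speech.mp3") would then save the audio; choose a file extension that matches response_format.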