
Features

  • Chat-native, low-latency streaming: the model begins speaking before the full input is finalized.
  • 100 languages with single-digit WER/CER (word/character error rate). See Languages.
  • Instant voice cloning: zero-shot cloning from a short reference clip and, optionally, its transcript. See Voices.
  • Inline control tags: shape emotion, style, prosody, and sound effects with <|emotion:…|>, <|style:…|>, <|prosody:…|>, and <|sfx:…|>. See Tags.

Try it in the playground

The fastest way to hear the model is the playground. Pick a voice, paste text, and press play.

Generate speech with the API

Higgs Audio TTS is in public preview. API usage is currently free and rate-limited while we improve reliability, latency, and model quality.
Set the API key in your shell for the current session:
export BOSON_API_KEY=bai-xxxx
A minimal request needs the Authorization header plus the model and input fields. Everything else is optional.
curl https://api.boson.ai/v1/audio/speech \
  -H "Authorization: Bearer $BOSON_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "higgs-audio-v3-tts",
    "input": "Hello, this is a test."
  }' \
  --output out.mp3
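The same minimal request can be sent from Python with only the standard library. This is a sketch, not an official SDK: build_speech_request and synthesize are hypothetical helper names, and the code relies only on the endpoint, headers, and fields shown in the cURL request above.

```python
import json
import os
import urllib.request

API_URL = "https://api.boson.ai/v1/audio/speech"


def build_speech_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the speech endpoint."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(payload: dict, out_path: str) -> None:
    """Send the request and write the returned audio bytes to out_path."""
    # Reads the key exported earlier with `export BOSON_API_KEY=...`.
    api_key = os.environ["BOSON_API_KEY"]
    request = build_speech_request(payload, api_key)
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses,
    # so an error body is never written to out_path.
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as f:
        f.write(response.read())
```

For example, synthesize({"model": "higgs-audio-v3-tts", "input": "Hello, this is a test."}, "out.mp3") mirrors the cURL command. Keeping request construction separate from sending lets you inspect the request without a network call.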

Use a preset voice

Use voice to choose a preset speaker.
curl https://api.boson.ai/v1/audio/speech \
  -H "Authorization: Bearer $BOSON_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "higgs-audio-v3-tts",
    "input": "Hello, this is a test.",
    "voice": "jake"
  }' \
  --output out.mp3
See Voices for more preset speakers and samples.

Use reference audio

Use ref_audio to clone a voice from a short reference clip; it accepts a URL, a base64 string, or a data URI. Passing the clip's transcript in ref_text often improves the quality of the generated audio.
curl https://api.boson.ai/v1/audio/speech \
  -H "Authorization: Bearer $BOSON_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "higgs-audio-v3-tts",
    "input": "Hello, this is a test.",
    "ref_audio": "https://docs.boson.ai/public/audio/sample.mp3",
    "ref_text": "Same voice, same words, and uh, a completely different presence. I was built for chat native voice, real-time, expressive, and controllable."
  }' \
  --output out.mp3
You must have the right to clone the voice.
See Voices for best practices and reusable custom voices.
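Because ref_audio also accepts a data URI, a local reference clip can be sent inline instead of hosting it at a URL. A minimal sketch, assuming the service accepts a standard data URI of the form data:<mime>;base64,<payload>; the helper names audio_bytes_to_data_uri and audio_file_to_data_uri are hypothetical.

```python
import base64
import mimetypes
from typing import Optional


def audio_bytes_to_data_uri(data: bytes, mime_type: str) -> str:
    """Wrap raw audio bytes in a base64 data URI (data:<mime>;base64,<payload>)."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def audio_file_to_data_uri(path: str, mime_type: Optional[str] = None) -> str:
    """Read a local audio file and return it as a data URI for ref_audio.

    If mime_type is omitted, it is guessed from the file extension
    (for example, .mp3 usually maps to audio/mpeg).
    """
    if mime_type is None:
        mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("audio/"):
        raise ValueError(f"cannot determine an audio MIME type for {path!r}")
    with open(path, "rb") as f:
        return audio_bytes_to_data_uri(f.read(), mime_type)
```

The result goes straight into the request body, for example "ref_audio": audio_file_to_data_uri("my_voice.mp3") alongside "ref_text" (my_voice.mp3 is a placeholder file name). Base64 makes the payload about a third larger than the clip, which is one more reason to keep reference clips short.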

Fine-grained control

Inline tags control emotion, style, prosody, and sound effects in the generated audio. Add them to input, and the model adjusts the surrounding speech. For example:
Sample input (voice: "jake"):
<|emotion:enthusiasm|>Welcome to the show! <|prosody:pause|>Let's get started!
See Tags for the complete list and sample audio.
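Because tags are plain text inside input, they can be composed in code. A minimal sketch: the tag helper below is hypothetical and only enforces the <|category:value|> shape and the four categories named on this page. It does not check whether a value such as enthusiasm or pause is supported; consult Tags for valid values.

```python
TAG_CATEGORIES = {"emotion", "style", "prosody", "sfx"}


def tag(category: str, value: str) -> str:
    """Return an inline control tag such as <|emotion:enthusiasm|>."""
    if category not in TAG_CATEGORIES:
        raise ValueError(f"unknown tag category: {category!r}")
    # Reject characters that would break the <|...|> delimiters (an assumption).
    if not value or any(ch in value for ch in "<|>"):
        raise ValueError(f"invalid tag value: {value!r}")
    return f"<|{category}:{value}|>"


# Rebuild the sample input from this section.
sample_input = (
    tag("emotion", "enthusiasm") + "Welcome to the show! "
    + tag("prosody", "pause") + "Let's get started!"
)
```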

Streaming response

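When stream is true, set response_format to "pcm". The curl -N (--no-buffer) flag writes audio bytes as soon as they arrive instead of buffering them.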
curl -N https://api.boson.ai/v1/audio/speech \
  -H "Authorization: Bearer $BOSON_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "higgs-audio-v3-tts-4b",
    "input": "Hello, this is a streaming PCM test.",
    "response_format": "pcm",
    "stream": true
  }' \
  --output out.pcm
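Raw PCM has no header, so most audio players cannot play the streamed bytes directly. The sketch below reads the stream in chunks and wraps it in a WAV container with Python's wave module. This page does not state the PCM sample rate, sample width, or channel count: the values below (24 kHz, 16-bit, mono) are placeholders, so confirm them in the API reference. The function names are hypothetical.

```python
import json
import os
import urllib.request
import wave
from typing import BinaryIO, Iterable, Union

# PLACEHOLDER format values: not documented on this page; confirm before use.
SAMPLE_RATE = 24_000  # Hz (assumed)
SAMPLE_WIDTH = 2      # bytes per sample, i.e. 16-bit (assumed)
CHANNELS = 1          # mono (assumed)


def pcm_chunks_to_wav(
    chunks: Iterable[bytes],
    dest: Union[str, BinaryIO],
    sample_rate: int = SAMPLE_RATE,
    sample_width: int = SAMPLE_WIDTH,
    channels: int = CHANNELS,
) -> int:
    """Write streamed PCM chunks to a WAV file and return the frame count.

    Network chunks can end mid-sample, so leftover bytes are carried into
    the next chunk; an incomplete frame at the very end is dropped.
    """
    frame_size = sample_width * channels
    leftover = b""
    frames = 0
    with wave.open(dest, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        for chunk in chunks:
            data = leftover + chunk
            usable = len(data) - len(data) % frame_size
            if usable:
                wav.writeframes(data[:usable])
                frames += usable // frame_size
            leftover = data[usable:]
    return frames


def stream_speech_to_wav(text: str, wav_path: str, chunk_size: int = 4096) -> int:
    """Request streamed PCM for text and save it as a WAV file."""
    body = {
        "model": "higgs-audio-v3-tts-4b",  # model name from the streaming example
        "input": text,
        "response_format": "pcm",
        "stream": True,
    }
    request = urllib.request.Request(
        "https://api.boson.ai/v1/audio/speech",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['BOSON_API_KEY']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        # read() returns b"" once the stream ends, which stops iter().
        chunks = iter(lambda: response.read(chunk_size), b"")
        return pcm_chunks_to_wav(chunks, wav_path)
```

Carrying leftover bytes between chunks matters because HTTP chunk boundaries have no relation to sample boundaries; writing a half sample would shift every later sample by one byte.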

API reference

Full request body:
{
  "model": "higgs-audio-v3-tts",
  "input": "Text to synthesize.",
  "voice": "default",
  "response_format": "mp3",
  "stream": false,

  "ref_audio": "base64 | data URI | URL",
  "ref_text": "Transcript of the reference audio."
}
See the API reference for field details and additional options.
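When assembling the body in code, it helps to omit unset optional fields and to enforce the streaming rule from Streaming response before sending. A minimal sketch: the field names and defaults mirror the body above, while the helper name build_body and its checks are assumptions drawn from this page, not the service's own validation.

```python
from typing import Any, Dict, Optional


def build_body(
    input_text: str,
    model: str = "higgs-audio-v3-tts",
    voice: Optional[str] = None,
    response_format: Optional[str] = None,
    stream: bool = False,
    ref_audio: Optional[str] = None,
    ref_text: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble a speech request body, leaving unset optional fields out."""
    if not input_text:
        raise ValueError("input is required")
    # Streaming responses use raw PCM (see Streaming response).
    if stream and response_format not in (None, "pcm"):
        raise ValueError('stream=True requires response_format "pcm"')
    body: Dict[str, Any] = {"model": model, "input": input_text}
    if stream:
        body["stream"] = True
        body["response_format"] = "pcm"
    elif response_format is not None:
        body["response_format"] = response_format
    optional = {"voice": voice, "ref_audio": ref_audio, "ref_text": ref_text}
    body.update({key: value for key, value in optional.items() if value is not None})
    return body
```

Omitting unset fields keeps requests minimal and lets the server apply its own defaults (for example, mp3 output when response_format is absent).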

Alternative ways to use the model

Beyond the hosted API, you can run the model yourself: