Higgs TTS 3 - Boson

Features

Chat-native, low-latency streaming — begin speaking before the full input is finalized.
100 languages — single-digit WER/CER coverage. See Languages.
Instant voice cloning — zero-shot from a short reference clip and its transcript. See Voices.
Inline control tags — shape emotion, style, prosody, and sound effects with <|emotion:…|>, <|style:…|>, <|prosody:…|>, and <|sfx:…|>. See Tags.

Try it in the playground

The fastest way to hear the model is the playground. Pick a voice, paste text, and press play.

Generate speech with the API

Higgs TTS is in public preview. API usage is currently free and rate-limited while we improve reliability, latency, and model quality.

Set the API key in your shell for the current session:

export BOSON_API_KEY=bai-xxxx

A minimal request needs an Authorization header plus the model and input fields. Everything else is optional.

curl https://api.boson.ai/v1/audio/speech \
  -H "Authorization: Bearer $BOSON_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "higgs-tts-3",
    "input": "Hello, this is a test."
  }' \
  --output out.mp3
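
The same minimal request can be sketched in Python with only the standard library. This assumes the response body is the audio itself, as the curl `--output` usage suggests; the function names `build_speech_request` and `synthesize` are illustrative and not part of any Boson SDK.

```python
import json
import os
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://api.boson.ai/v1/audio/speech"


def build_speech_request(text, api_key, model="higgs-tts-3"):
    """Build (but do not send) the minimal POST request."""
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def synthesize(text, out_path="out.mp3"):
    """Send the request and write the returned bytes to out_path.

    Assumes BOSON_API_KEY is set in the environment and that the
    response body is raw audio (an assumption based on the curl example).
    """
    request = build_speech_request(text, os.environ["BOSON_API_KEY"])
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

Calling `synthesize("Hello, this is a test.")` would then mirror the curl command. Building the request separately from sending it keeps the payload easy to inspect.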

Use a preset voice

Use voice to choose a preset speaker.

curl https://api.boson.ai/v1/audio/speech \
  -H "Authorization: Bearer $BOSON_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "higgs-tts-3",
    "input": "Hello, this is a test.",
    "voice": "jake"
  }' \
  --output out.mp3

See Voices for more preset speakers and samples.

Use reference audio

Use ref_audio to clone a voice from a short reference clip. Passing the clip's transcript as ref_text often improves the quality of the generated audio.

curl https://api.boson.ai/v1/audio/speech \
  -H "Authorization: Bearer $BOSON_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "higgs-tts-3",
    "input": "Hello, this is a test.",
    "ref_audio": "https://docs.boson.ai/public/audio/sample.mp3",
    "ref_text": "Same voice, same words, and uh, a completely different presence. I was built for chat native voice, real-time, expressive, and controllable."
  }' \
  --output out.mp3

To clone from a local file, either encode the file as a base64 string or send it as `multipart/form-data`. The example below shows the latter.

curl https://api.boson.ai/v1/audio/speech \
  -H "Authorization: Bearer $BOSON_API_KEY" \
  -F model=higgs-tts-3 \
  -F input="Hello, this is a test." \
  -F ref_audio=@/path/to/reference.mp3 \
  -F ref_text="Transcript of the reference clip." \
  --output out.mp3
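
The base64 route could look roughly like the sketch below. It relies on the API reference entry stating that ref_audio accepts base64; the helper name `build_clone_body` is hypothetical, and whether a plain base64 string or a data URI is preferable is not specified here.

```python
import base64
import json


def build_clone_body(audio_path, ref_text, text, model="higgs-tts-3"):
    """Return a JSON request body with the local clip inlined as base64."""
    with open(audio_path, "rb") as f:
        # Plain base64 string; the data URI form is not covered by this sketch.
        encoded = base64.b64encode(f.read()).decode("ascii")
    return json.dumps({
        "model": model,
        "input": text,
        "ref_audio": encoded,
        "ref_text": ref_text,
    })
```

The returned string can be sent as the body of a JSON request in the same way as the earlier examples.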

You must have the right to clone any voice you submit.

See Voices for best practices and reusable custom voices.

Fine-grained control

Inline tags control emotion, style, prosody, and sound effects in the generated audio. Add them to input, and the model adjusts the surrounding speech. For example:

| Sample input | Sample audio |
| --- | --- |
| `<\|emotion:enthusiasm\|>Welcome to the show! <\|prosody:pause\|>Let's get started!` | `voice: "jake"` |

See Tags for the complete list and sample audio.
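
When input text is assembled programmatically, a small helper can keep the tag syntax consistent. This is a minimal sketch: `tag` is a hypothetical helper, and it only checks the four tag families named on this page, not the individual values, which are listed in Tags.

```python
def tag(kind, value):
    """Format one inline control tag, e.g. tag("emotion", "enthusiasm")."""
    allowed = {"emotion", "style", "prosody", "sfx"}  # families named on this page
    if kind not in allowed:
        raise ValueError(f"unknown tag family: {kind!r}")
    return f"<|{kind}:{value}|>"


# Rebuilds the sample input shown above.
text = (
    tag("emotion", "enthusiasm") + "Welcome to the show! "
    + tag("prosody", "pause") + "Let's get started!"
)
```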

Streaming response

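Set stream to true to receive audio as it is generated. Streaming requires response_format to be "pcm". In the example below, -N disables curl's output buffering.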

curl -N https://api.boson.ai/v1/audio/speech \
  -H "Authorization: Bearer $BOSON_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "higgs-tts-3",
    "input": "Hello, this is a streaming PCM test.",
    "response_format": "pcm",
    "stream": true
  }'
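
Raw PCM has no header, so most players cannot open it directly. The sketch below wraps received chunks in a WAV container using Python's standard `wave` module. The sample rate, channel count, and sample width are assumptions (24 kHz, mono, 16-bit); this page does not state the actual PCM format, so check the API reference before relying on them.

```python
import wave


def write_pcm_as_wav(chunks, path, sample_rate=24000, channels=1, sample_width=2):
    """Wrap raw PCM chunks in a WAV container.

    The defaults (24 kHz, mono, 16-bit) are assumptions, not documented values.
    """
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # bytes per sample
        wav.setframerate(sample_rate)
        for chunk in chunks:
            # writeframesraw appends data; the header is fixed up on close.
            wav.writeframesraw(chunk)
```

Here `chunks` can be any iterable of bytes, for example successive fixed-size reads from a streamed HTTP response.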

API reference

Full request body:

{
  "model": "higgs-tts-3",
  "input": "Text to synthesize.",
  "voice": "default",
  "response_format": "mp3",
  "stream": false,

  "ref_audio": "base64 | data URI | URL",
  "ref_text": "Transcript of the reference audio."
}

See the API reference for field details and additional options.
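
One way to assemble this body in client code is sketched below. It sends only the fields that were explicitly set and enforces the streaming rule from the previous section. The function name `build_body` is illustrative, and the sketch makes no claim about which field combinations the server accepts.

```python
import json


def build_body(text, model="higgs-tts-3", voice=None, response_format=None,
               stream=None, ref_audio=None, ref_text=None):
    """Return a JSON request body containing only the fields that were set."""
    if stream and response_format != "pcm":
        # Rule taken from the streaming section of this page.
        raise ValueError('stream=True requires response_format="pcm"')
    optional = {
        "voice": voice,
        "response_format": response_format,
        "stream": stream,
        "ref_audio": ref_audio,
        "ref_text": ref_text,
    }
    body = {"model": model, "input": text}
    body.update({k: v for k, v in optional.items() if v is not None})
    return json.dumps(body)
```

Leaving unset fields out of the payload lets the server apply its own defaults rather than hard-coding them in the client.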

Alternative ways to use the model

Beyond the hosted API, you can run the model yourself:

Hugging Face — open model weights at bosonai/higgs-tts-v3-4b.
SGLang — serve the model locally for high-throughput inference. See the Higgs TTS cookbook.
