Features
- Chat-native, low-latency streaming — begin speaking before the full input is finalized.
- 100 languages — single-digit WER/CER coverage. See Languages.
- Instant voice cloning — zero-shot from a short reference clip and its transcript. See Voices.
- Inline control tags — shape emotion, style, prosody, and sound effects with
<|emotion:…|>, <|style:…|>, <|prosody:…|>, and <|sfx:…|>. See Tags.
Try it in the playground
The fastest way to hear the model is the playground. Pick a voice, paste text, and press play.
Generate speech with the API
Higgs Audio TTS is in public preview. API usage is currently free and rate-limited while we improve reliability, latency, and model quality.
Each request needs three things: Authorization, model, and input. Everything else is optional.
Use a preset voice
Use voice to choose a preset speaker.
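As a rough sketch of such a request, the Python helper below assembles the headers and JSON body from the three required pieces plus voice. The helper name build_preset_request, the "Bearer" authorization scheme, and the placeholder model id are assumptions for illustration and are not confirmed by this page. The endpoint URL is deliberately left out; take it from the API reference.

```python
import json


def build_preset_request(api_key: str, model: str, text: str, voice: str):
    """Return (headers, body_bytes) for a preset-voice speech request.

    Assumption: the Authorization header uses the common "Bearer <key>" scheme.
    """
    headers = {
        "Authorization": f"Bearer {api_key}",  # assumed scheme
        "Content-Type": "application/json",
    }
    payload = {
        "model": model,  # required
        "input": text,   # required
        "voice": voice,  # optional: preset speaker
    }
    return headers, json.dumps(payload).encode("utf-8")


headers, body = build_preset_request(
    api_key="YOUR_API_KEY",
    model="YOUR_MODEL_ID",  # placeholder; use the id given in the API reference
    text="Welcome to the show!",
    voice="jake",  # preset name taken from the sample table on this page
)
```

The helper only builds the request; sending it is left to whichever HTTP client you already use.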
Use reference audio
Use ref_audio to clone a voice from a short reference clip. Passing the clip's transcript through ref_text can often improve the quality of the generated audio.
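A minimal sketch of a cloning request body follows. This page does not say how ref_audio is transported (base64 string, URL, or file upload), so the base64 encoding here is an assumption, as are the helper name build_clone_payload and the placeholder model id.

```python
import base64
from typing import Optional


def build_clone_payload(model: str, text: str, ref_audio: bytes,
                        ref_text: Optional[str] = None) -> dict:
    """Build a request body that clones the voice heard in ref_audio."""
    payload = {
        "model": model,
        "input": text,
        # Assumption: the reference clip is sent as a base64 string.
        "ref_audio": base64.b64encode(ref_audio).decode("ascii"),
    }
    if ref_text is not None:
        # Transcript of the reference clip; optional.
        payload["ref_text"] = ref_text
    return payload


# Placeholder bytes keep the sketch runnable; in practice, read a real clip,
# e.g. open("reference.wav", "rb").read().
clip = b"\x00\x01\x02\x03"
payload = build_clone_payload(
    model="YOUR_MODEL_ID",  # placeholder
    text="Hello from a cloned voice.",
    ref_audio=clip,
    ref_text="Transcript of the reference clip.",
)
```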
Fine-grained control
Inline tags control emotion, style, prosody, and sound effects in the generated audio. Add them to input, and the model adjusts the surrounding speech. For example:
| Sample input | Sample audio |
|---|---|
| <|emotion:enthusiasm|>Welcome to the show! <|prosody:pause|>Let's get started! | voice: "jake" |
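Tagged input can also be composed programmatically. The sketch below rebuilds the sample input above from a small helper. The function name tag is hypothetical, and the check covers only the four tag kinds named on this page, not which values the model accepts.

```python
TAG_KINDS = {"emotion", "style", "prosody", "sfx"}


def tag(kind: str, value: str) -> str:
    """Format one inline control tag, e.g. tag("emotion", "enthusiasm")."""
    if kind not in TAG_KINDS:
        raise ValueError(f"unknown tag kind: {kind!r}")
    return f"<|{kind}:{value}|>"


# Rebuild the sample input from the table.
text = (
    tag("emotion", "enthusiasm") + "Welcome to the show! "
    + tag("prosody", "pause") + "Let's get started!"
)
```

The resulting string is passed as input like any other text.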
Streaming response
When stream: true, set response_format: "pcm".
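Raw PCM has no header, so a client usually wraps the streamed chunks in a container before saving them. The sketch below does this with Python's standard wave module. The defaults of 24 kHz, 16-bit, mono audio are assumptions, since this page does not state the stream's sample format; check the API reference before relying on them.

```python
import io
import wave
from typing import Iterable


def pcm_chunks_to_wav(chunks: Iterable[bytes], sample_rate: int = 24000,
                      channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM chunks in a WAV container and return the file bytes.

    Assumed defaults: 24 kHz, mono, 16-bit samples (2 bytes per sample).
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        for chunk in chunks:
            # Append each chunk as it arrives; the header is fixed up on close.
            wav.writeframes(chunk)
    return buffer.getvalue()


# Stand-in for chunks read from a streaming HTTP response.
fake_chunks = [b"\x00\x01" * 10, b"\x02\x03" * 5]
wav_bytes = pcm_chunks_to_wav(fake_chunks)
```

For playback during generation, the same chunks can be fed to an audio output device as they arrive instead of being buffered.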
API reference
Full request body:
Alternative ways to use the model
Beyond the hosted API, you can run the model yourself:
- Hugging Face — open model weights at bosonai/higgs-audio-v3-tts-4b.
- SGLang — serve the model locally for high-throughput inference. See the Higgs TTS cookbook.