Higgs TTS 3

Features

Chat-native, low-latency streaming — begin speaking before the full input is finalized.
100 languages — single-digit WER/CER coverage. See Languages.
Instant voice cloning — zero-shot from a short reference clip and its transcript. See Voices.
Inline control tags — shape emotion, style, prosody, and sound effects with <|emotion:…|>, <|style:…|>, <|prosody:…|>, and <|sfx:…|>. See Tags.

Try it in the playground

The fastest way to hear the model is the playground. Pick a voice, paste text, and press play.

Generate speech with the API

Higgs TTS is in public preview. API usage is currently free and rate-limited while we improve reliability, latency, and model quality.

Set the API key in your shell for the current session:

export BOSON_API_KEY=bai-xxxx

A minimal request needs Authorization, model, and input. Everything else is optional.

curl https://api.boson.ai/v1/audio/speech \
  -H "Authorization: Bearer $BOSON_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "higgs-tts-3",
    "input": "Hello, this is a test."
  }' \
  --output out.mp3

import os
import requests

resp = requests.post(
    "https://api.boson.ai/v1/audio/speech",
    headers={"Authorization": f"Bearer {os.environ['BOSON_API_KEY']}"},
    json={
        "model": "higgs-tts-3",
        "input": "Hello, this is a test.",
    },
)
resp.raise_for_status()
with open("out.mp3", "wb") as f:
    f.write(resp.content)

import { writeFile } from "node:fs/promises";

const res = await fetch("https://api.boson.ai/v1/audio/speech", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.BOSON_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "higgs-tts-3",
    input: "Hello, this is a test.",
  }),
});
if (!res.ok) throw new Error(`HTTP ${res.status}`);
await writeFile("out.mp3", Buffer.from(await res.arrayBuffer()));
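
Because the preview API is rate-limited, a client may want to retry a request that was rejected for that reason. The sketch below is one way to do it, not a documented client behavior. It assumes the service signals rate limiting with HTTP 429, which this page does not confirm, and `post_with_retry`, `max_attempts`, and `base_delay` are hypothetical names chosen for illustration.

```python
import time


def post_with_retry(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-429 response or attempts run out.

    send is any zero-argument callable that returns an object with a
    status_code attribute, for example a lambda wrapping requests.post.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        resp = send()
        # ASSUMPTION: the API reports rate limiting with HTTP 429.
        if resp.status_code != 429:
            return resp
        if attempt < max_attempts - 1:
            # Exponential backoff: base_delay, then 2x, then 4x, ...
            sleep(base_delay * 2 ** attempt)
    # Still rate-limited after the last attempt; let the caller decide.
    return resp
```

With `requests`, the call could look like `post_with_retry(lambda: requests.post(url, headers=headers, json=body))`, followed by the usual `raise_for_status()`.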

Use a preset voice

Use voice to choose a preset speaker.

curl https://api.boson.ai/v1/audio/speech \
  -H "Authorization: Bearer $BOSON_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "higgs-tts-3",
    "input": "Hello, this is a test.",
    "voice": "jake"
  }' \
  --output out.mp3

import os
import requests

resp = requests.post(
    "https://api.boson.ai/v1/audio/speech",
    headers={"Authorization": f"Bearer {os.environ['BOSON_API_KEY']}"},
    json={
        "model": "higgs-tts-3",
        "input": "Hello, this is a test.",
        "voice": "jake",
    },
)
resp.raise_for_status()
with open("out.mp3", "wb") as f:
    f.write(resp.content)

import { writeFile } from "node:fs/promises";

const res = await fetch("https://api.boson.ai/v1/audio/speech", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.BOSON_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "higgs-tts-3",
    input: "Hello, this is a test.",
    voice: "jake",
  }),
});
if (!res.ok) throw new Error(`HTTP ${res.status}`);
await writeFile("out.mp3", Buffer.from(await res.arrayBuffer()));

See Voices for more preset speakers and samples.

Use reference audio

Use ref_audio to clone a voice from a short reference clip. Supplying the clip's transcript in ref_text often improves the quality of the generated audio.

curl https://api.boson.ai/v1/audio/speech \
  -H "Authorization: Bearer $BOSON_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "higgs-tts-3",
    "input": "Hello, this is a test.",
    "ref_audio": "https://docs.boson.ai/public/audio/sample.mp3",
    "ref_text": "Same voice, same words, and uh, a completely different presence. I was built for chat native voice, real-time, expressive, and controllable."
  }' \
  --output out.mp3

import os
import requests

resp = requests.post(
    "https://api.boson.ai/v1/audio/speech",
    headers={"Authorization": f"Bearer {os.environ['BOSON_API_KEY']}"},
    json={
        "model": "higgs-tts-3",
        "input": "Hello, this is a test.",
        "ref_audio": "https://docs.boson.ai/public/audio/sample.mp3",
        "ref_text": "Same voice, same words, and uh, a completely different presence. I was built for chat native voice, real-time, expressive, and controllable.",
    },
)
resp.raise_for_status()
with open("out.mp3", "wb") as f:
    f.write(resp.content)

import { writeFile } from "node:fs/promises";

const res = await fetch("https://api.boson.ai/v1/audio/speech", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.BOSON_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "higgs-tts-3",
    input: "Hello, this is a test.",
    ref_audio: "https://docs.boson.ai/public/audio/sample.mp3",
    ref_text:
      "Same voice, same words, and uh, a completely different presence. I was built for chat native voice, real-time, expressive, and controllable.",
  }),
});
if (!res.ok) throw new Error(`HTTP ${res.status}`);
await writeFile("out.mp3", Buffer.from(await res.arrayBuffer()));

To clone from a local file, either encode the file as a base64 string or send it as multipart/form-data. The examples below show the latter.

curl https://api.boson.ai/v1/audio/speech \
  -H "Authorization: Bearer $BOSON_API_KEY" \
  -F model=higgs-tts-3 \
  -F input="Hello, this is a test." \
  -F ref_audio=@voice.wav \
  -F ref_text="Transcript of the reference clip." \
  --output out.mp3

import os
import requests

with open("voice.wav", "rb") as ref_audio:
    resp = requests.post(
        "https://api.boson.ai/v1/audio/speech",
        headers={"Authorization": f"Bearer {os.environ['BOSON_API_KEY']}"},
        data={
            "model": "higgs-tts-3",
            "input": "Hello, this is a test.",
            "ref_text": "Transcript of the reference clip.",
        },
        files={"ref_audio": ref_audio},
    )

resp.raise_for_status()
with open("out.mp3", "wb") as f:
    f.write(resp.content)

import { readFile, writeFile } from "node:fs/promises";

const form = new FormData();
form.set("model", "higgs-tts-3");
form.set("input", "Hello, this is a test.");
form.set("ref_text", "Transcript of the reference clip.");
form.set("ref_audio", new Blob([await readFile("voice.wav")]), "voice.wav");

const res = await fetch("https://api.boson.ai/v1/audio/speech", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.BOSON_API_KEY}`,
  },
  body: form,
});
if (!res.ok) throw new Error(`HTTP ${res.status}`);
await writeFile("out.mp3", Buffer.from(await res.arrayBuffer()));
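
The base64 route mentioned above can be sketched as follows. The request body listed under API reference says ref_audio accepts a base64 string, so this helper reads a local clip and embeds it in an ordinary JSON body. `build_clone_payload` is a hypothetical helper name, and any size limit on the encoded clip is not stated on this page.

```python
import base64
from pathlib import Path


def build_clone_payload(audio_path, ref_text, text):
    """Build a JSON body that carries the reference clip as a base64 string."""
    # Read the raw file bytes and encode them as ASCII-safe base64 text.
    audio_b64 = base64.b64encode(Path(audio_path).read_bytes()).decode("ascii")
    return {
        "model": "higgs-tts-3",
        "input": text,
        "ref_audio": audio_b64,
        "ref_text": ref_text,
    }
```

The returned dictionary can be passed as `json=` to the same `requests.post` call used in the URL-based example.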

You must have the right to clone any voice you submit.

See Voices for best practices and reusable custom voices.

Fine-grained control

Inline tags control emotion, style, prosody, and sound effects in the generated audio. Add them to input, and the model adjusts the surrounding speech. For example:

Sample input	Sample audio
<|emotion:enthusiasm|>Welcome to the show! <|prosody:pause|>Let's get started!	voice: "jake"
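
When the input is assembled in code, a small helper keeps the tag syntax consistent. This is a minimal sketch: `tag` is a hypothetical helper, the four tag families come from the Features list, and the values `enthusiasm` and `pause` are taken from the sample row above. Other values are listed on the Tags page and are not checked here.

```python
TAG_KINDS = {"emotion", "style", "prosody", "sfx"}


def tag(kind, value):
    """Format one inline control tag, e.g. tag("emotion", "enthusiasm")."""
    if kind not in TAG_KINDS:
        raise ValueError(f"unknown tag kind: {kind!r}")
    return f"<|{kind}:{value}|>"


# Rebuild the sample input from the table above.
text = (
    tag("emotion", "enthusiasm")
    + "Welcome to the show! "
    + tag("prosody", "pause")
    + "Let's get started!"
)
```

The resulting `text` goes in the request's input field unchanged.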

See Tags for the complete list and sample audio.

Streaming response

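Set stream: true to receive audio as it is generated. Streaming requires response_format: "pcm"; the response body is then a raw 16-bit, 24 kHz, mono PCM byte stream.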

curl -N https://api.boson.ai/v1/audio/speech \
  -H "Authorization: Bearer $BOSON_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "higgs-tts-3",
    "input": "Hello, this is a streaming PCM test.",
    "response_format": "pcm",
    "stream": true
  }' \
  --output out.pcm

import os
import requests

BASE_URL = "https://api.boson.ai/v1"
API_KEY = os.environ["BOSON_API_KEY"]

payload = {
    "model": "higgs-tts-3",
    "input": "Hello, this is a streaming PCM test.",
    "voice": "default",
    "response_format": "pcm",
    "stream": True,
}

# The response body is a raw 16-bit / 24kHz / mono PCM byte stream — collect chunks directly.
pcm = bytearray()
with requests.post(
    f"{BASE_URL}/audio/speech",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    json=payload,
    stream=True,
    timeout=180,
) as r:
    r.raise_for_status()
    for chunk in r.iter_content(chunk_size=4096):
        if chunk:                       # first non-empty chunk == time-to-first-audio
            pcm.extend(chunk)

with open("out.pcm", "wb") as f:
    f.write(pcm)

import { writeFile } from "node:fs/promises";

const res = await fetch("https://api.boson.ai/v1/audio/speech", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.BOSON_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "higgs-tts-3",
    input: "Hello, this is a streaming PCM test.",
    response_format: "pcm",
    stream: true,
  }),
});

if (!res.ok || !res.body) throw new Error(`HTTP ${res.status}`);

// Raw 16-bit / 24kHz / mono PCM byte stream — no "data:" lines, just collect the bytes.
const reader = res.body.getReader();
const pcmChunks = [];
while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  if (value) pcmChunks.push(value);     // first chunk == time-to-first-audio
}

await writeFile("out.pcm", Buffer.concat(pcmChunks));
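
Raw PCM has no header, so most players cannot open out.pcm directly. The sketch below wraps the bytes in a WAV container with Python's standard `wave` module, using the 16-bit, 24 kHz, mono layout stated in the comments above. It assumes the samples are signed little-endian, which this page does not state, and `pcm_to_wav` is a hypothetical helper name.

```python
import wave

SAMPLE_RATE = 24_000   # 24 kHz, per the stream description
SAMPLE_WIDTH = 2       # 16-bit samples are 2 bytes each
CHANNELS = 1           # mono


def pcm_to_wav(pcm, wav_path):
    """Write raw PCM bytes to a WAV file and return the duration in seconds."""
    # ASSUMPTION: samples are signed 16-bit little-endian, as WAV expects.
    with wave.open(wav_path, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
    return len(pcm) / (SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS)
```

For the file written by the examples above, this would be called as `pcm_to_wav(open("out.pcm", "rb").read(), "out.wav")`.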

API reference

Full request body:

{
  "model": "higgs-tts-3",
  "input": "Text to synthesize.",
  "voice": "default",
  "response_format": "mp3",
  "stream": false,

  "ref_audio": "base64 | data URI | URL",
  "ref_text": "Transcript of the reference audio."
}

See the API reference for field details and additional options.
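
The fields above can be assembled with a small builder that omits unset options and enforces the streaming rule from the Streaming response section. This is an illustrative sketch, not part of any official SDK: `build_speech_request` is a hypothetical name, and it does not check which field combinations the service actually accepts.

```python
def build_speech_request(text, voice=None, response_format=None,
                         stream=False, ref_audio=None, ref_text=None):
    """Return a request body dict containing only the fields that were set."""
    # Streaming responses must use raw PCM, per the streaming section.
    if stream and response_format not in (None, "pcm"):
        raise ValueError('streaming requires response_format="pcm"')

    body = {"model": "higgs-tts-3", "input": text}
    if stream:
        body["stream"] = True
        body["response_format"] = "pcm"
    elif response_format is not None:
        body["response_format"] = response_format

    # Add optional fields only when the caller supplied them.
    optional = {"voice": voice, "ref_audio": ref_audio, "ref_text": ref_text}
    body.update({key: value for key, value in optional.items() if value is not None})
    return body
```

For example, `build_speech_request("Hello, this is a test.", voice="jake")` produces the same body as the preset voice example.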

Alternative ways to use the model

Beyond the hosted API, you can run the model yourself:

Hugging Face — open model weights at bosonai/higgs-tts-v3-4b.
SGLang — serve the model locally for high-throughput inference. See the Higgs TTS cookbook.
