Use `ref_audio` to clone a voice from a short reference clip. Passing the clip's transcript in `ref_text` often improves the quality of the generated audio.
curl https://api.boson.ai/v1/audio/speech \
  -H "Authorization: Bearer $BOSON_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "higgs-tts-3",
    "input": "Hello, this is a test.",
    "ref_audio": "https://docs.boson.ai/public/audio/sample.mp3",
    "ref_text": "Same voice, same words, and uh, a completely different presence. I was built for chat native voice, real-time, expressive, and controllable."
  }' \
  --output out.mp3
To clone from a local file, either encode the file as a base64 string or send it as `multipart/form-data`. The example below shows the latter.
curl https://api.boson.ai/v1/audio/speech \
  -H "Authorization: Bearer $BOSON_API_KEY" \
  -F model=higgs-tts-3 \
  -F input="Hello, this is a test." \
  -F ref_audio=@<path-to-reference-clip> \
  -F ref_text="Transcript of the reference clip." \
  --output out.mp3
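The base64 route mentioned above is not shown in the original example, so the following is only a rough sketch. It assumes that `ref_audio` accepts a plain base64 string in the JSON body (the exact encoding the API expects, such as a data-URI prefix, is not stated here), and `build_payload` is a hypothetical helper name introduced for illustration.

```shell
# Hypothetical helper: print a JSON request body with the reference clip
# embedded as base64.
#   $1 = path to the local audio file
#   $2 = text to synthesize
#   $3 = transcript of the reference clip
# Assumes the two text arguments contain no double quotes or backslashes,
# which would need JSON escaping.
build_payload() {
  # Strip newlines because some base64 implementations wrap their output.
  ref_b64=$(base64 < "$1" | tr -d '\n')
  printf '{"model":"higgs-tts-3","input":"%s","ref_audio":"%s","ref_text":"%s"}' \
    "$2" "$ref_b64" "$3"
}

# To send the request (requires network access and a valid key):
#   build_payload clip.mp3 "Hello, this is a test." "Transcript of the reference clip." |
#     curl https://api.boson.ai/v1/audio/speech \
#       -H "Authorization: Bearer $BOSON_API_KEY" \
#       -H "Content-Type: application/json" \
#       -d @- --output out.mp3
```

Piping the body into `curl -d @-` avoids placing a long base64 string on the command line. If the API rejects a bare base64 string, check its reference for the expected format before relying on this sketch.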
You must own the right to clone the voice.
See Voices for best practices and reusable custom voices.
Inline tags control emotion, style, prosody, and sound effects in the generated audio. Add them to input, and the model adjusts the surrounding speech. For example:
Sample input:
<|emotion:enthusiasm|>Welcome to the show! <|prosody:pause|>Let's get started!
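As a minimal sketch of how tagged text might be sent, the snippet below assembles the sample input and a JSON body in shell. The `tag` helper is a hypothetical convenience, not part of the API, and the sketch assumes a request without `ref_audio` is accepted. Building the body with `printf` also sidesteps the quoting problem that the apostrophe in "Let's" causes inside a single-quoted `-d '...'` argument.

```shell
# Hypothetical helper: wrap a category and value in the <|category:value|>
# tag syntax used by the sample input.
tag() {
  printf '<|%s:%s|>' "$1" "$2"
}

# Compose the sample input with an emotion tag and a pause tag.
input="$(tag emotion enthusiasm)Welcome to the show! $(tag prosody pause)Let's get started!"

# Build the JSON body. Assumes the text contains no double quotes or
# backslashes, which would need JSON escaping.
payload=$(printf '{"model":"higgs-tts-3","input":"%s"}' "$input")

# To send the request (requires network access and a valid key):
#   printf '%s' "$payload" | curl https://api.boson.ai/v1/audio/speech \
#     -H "Authorization: Bearer $BOSON_API_KEY" \
#     -H "Content-Type: application/json" \
#     -d @- --output out.mp3
```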