Features
- Understands speech and text — reasons over prosody, emotion, speaker, and content together instead of transcribing first.
- Follows instructions — keeps to the instructions it was given across turns, including instructions issued mid-conversation.
- Audio-native tool calling — uses filler speech to hide tool latency, handles async results, and cleanly cancels or ignores stale work.
- Interruption-aware — tracks conversational state and resumes cleanly when the user cuts in.
- Intelligent text output — supports multi-speaker routing and turn-by-turn state tracking, so text replies stay grounded in the conversation and are ready to act on.
- Multi-turn — sustains context across long calls and multi-step workflows without losing the thread.
Higgs Audio Instruct will be available soon via a hosted API and a Voice-Agent SDK.
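
The tool-calling behavior in the feature list (filler speech while a tool runs, async results, cancelling stale work) can be illustrated with a generic pattern. The hosted API and SDK are not yet published, so the sketch below does not use them: it relies only on Python's standard `asyncio`, and every name in it (`slow_tool`, `call_tool_with_filler`, `handle_turns`, `FILLER`) is a hypothetical stand-in. It is one way an application might approximate the pattern, not a description of how Higgs Audio Instruct implements it.

```python
import asyncio

# Hypothetical filler line; a real agent would synthesize this as speech.
FILLER = "One moment while I look that up."


async def slow_tool(query: str, delay: float) -> str:
    """Hypothetical tool: stands in for any slow backend lookup."""
    await asyncio.sleep(delay)
    return f"result for {query!r}"


async def call_tool_with_filler(
    query: str, delay: float, filler_after: float, spoken: list[str]
) -> str:
    """Run the tool; if it is still pending after `filler_after` seconds,
    emit a filler line so the caller does not hear dead air."""
    task = asyncio.create_task(slow_tool(query, delay))
    try:
        done, _ = await asyncio.wait({task}, timeout=filler_after)
        if not done:
            spoken.append(FILLER)
        result = await task
    except asyncio.CancelledError:
        # The turn was interrupted: cancel the tool call so its stale
        # result is never spoken, then let the cancellation propagate.
        task.cancel()
        raise
    spoken.append(result)
    return result


async def handle_turns() -> list[str]:
    """Simulate a user who asks a slow question, then cuts in with a new one."""
    spoken: list[str] = []
    stale = asyncio.create_task(
        call_tool_with_filler("old order", 1.0, 0.05, spoken)
    )
    await asyncio.sleep(0.2)  # the filler line has been emitted by now
    stale.cancel()            # the user interrupts; the old request is stale
    try:
        await stale
    except asyncio.CancelledError:
        pass
    # The new request returns quickly, so no filler is needed.
    await call_tool_with_filler("new order", 0.01, 0.05, spoken)
    return spoken


if __name__ == "__main__":
    for line in asyncio.run(handle_turns()):
        print(line)
```

In this sketch the stale request contributes only its filler line; its result is dropped because the task is cancelled before it completes. The timings are arbitrary values chosen to keep the example short.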