Features

  • Understands speech and text — reasons over prosody, emotion, speaker, and content together instead of transcribing first.
  • Follows instructions — holds the instruction frame across turns, even mid-conversation.
  • Audio-native tool calling — uses filler speech to hide tool latency, handles async results, and cleanly cancels or ignores stale work.
  • Interruption-aware — tracks conversational state and resumes cleanly when the user cuts in.
  • Intelligent text output — supports multi-speaker routing and turn-by-turn state tracking for grounded, decision-ready replies.
  • Multi-turn — sustains context across long calls and multi-step workflows without losing the thread.

Higgs Audio Instruct will be available soon via a hosted API and a Voice-Agent SDK.
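
The tool-calling behavior described in the feature list (filler speech while a tool runs, and dropping stale work when the user moves on) can be illustrated with a small asyncio sketch. This is not the Higgs Audio Instruct API or SDK, neither of which has been published. `TurnManager`, `lookup`, and the filler phrase are hypothetical names chosen for illustration, and a plain list of strings stands in for spoken output.

```python
import asyncio
from typing import List, Optional

FILLER = "One moment while I check that."  # hypothetical filler phrase


async def lookup(query: str, delay: float) -> str:
    """Hypothetical slow tool; the sleep stands in for network latency."""
    await asyncio.sleep(delay)
    return f"result for {query}"


class TurnManager:
    """Illustrative only: tracks the newest tool call and drops older ones."""

    def __init__(self, filler_after: float = 0.05) -> None:
        self.filler_after = filler_after      # seconds to wait before speaking filler
        self.spoken: List[str] = []           # stand-in for the audio output stream
        self._current: Optional[asyncio.Task] = None

    async def handle(self, query: str, delay: float) -> None:
        # A new request makes any in-flight tool call stale, so cancel it.
        if self._current is not None and not self._current.done():
            self._current.cancel()

        task = asyncio.create_task(lookup(query, delay))
        self._current = task

        # Give the tool a short grace period; if it is still running, speak filler.
        done, _ = await asyncio.wait({task}, timeout=self.filler_after)
        if not done:
            self.spoken.append(FILLER)

        try:
            result = await task
        except asyncio.CancelledError:
            return  # superseded by a newer request: say nothing further
        self.spoken.append(result)


async def demo() -> List[str]:
    mgr = TurnManager()
    # First request is slow, so filler is spoken while it runs.
    first = asyncio.create_task(mgr.handle("weather in Paris", 1.0))
    await asyncio.sleep(0.2)                   # the user cuts in with a new request
    await mgr.handle("weather in Rome", 0.01)  # fast tool: no filler needed
    await first                                # the stale call ends without output
    return mgr.spoken


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch the slow first call triggers the filler line and is then cancelled when the second request arrives, so only the second result is spoken. A real voice agent would also need to stop audio playback that is already in progress, which the sketch leaves out.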