Briefs

Google released Gemini 3.1 Flash Live, its most capable real-time audio model, now live in Gemini Live and the API with 90+ language support, improved noise rejection, and SynthID watermarking.
Google's Gemini 3.1 Flash Live is the company's sharpest upgrade to real-time audio AI since the original Gemini Live launch. Released March 26, the model is now available across Gemini Live, Search Live, and the Gemini API in Google AI Studio — extending to over 200 countries and territories.
Real-time voice AI has long struggled with two failure modes: unnatural pauses and poor noise rejection. Gemini 3.1 Flash Live addresses both by better modeling pitch, pace, and conversational context, letting it distinguish relevant speech from background noise like traffic or TV. For developers building voice agents (customer service bots, accessibility tools, real-time translation), these are the improvements that separate a voice agent that works in a demo from one that holds up in production.
The model scores 90.8% on ComplexFuncBench Audio, a benchmark measuring multi-step function calling in voice contexts, the top mark in its class. Its extended context window lets it follow a conversational thread twice as long as the previous 2.5 Flash Native Audio model could. All audio output is watermarked with SynthID, Google's provenance system, to help detect AI-generated speech in the wild.
Google's timing is deliberate. OpenAI's Voice Mode and ElevenLabs have defined the high end of real-time audio AI; Gemini 3.1 Flash Live targets the developer API market where speed and multilingual coverage matter most. At 90+ languages and sub-300ms latency in benchmark conditions, it directly competes with the audio tier of OpenAI's Realtime API, which currently supports a narrower language set.