
How Flux is tackling one of the biggest challenges in Voice AI: Insights from the Deepgram CEO
Oct 2, 2025
This week, our team got exclusive early access to Flux, Deepgram’s newest transcription model. We benchmark a lot of models, but Flux stood out immediately - far from an incremental update, its speed, seamless turn detection, and real-time streaming felt like a genuine step change for Voice AI.
Impressed by the model’s performance, our CEO, Brooke Hopkins, sat down with the Deepgram CEO, Scott Stephenson, to dig into what makes Flux different, why it matters, and what’s next on the horizon for Deepgram.
The end of the interruption vs. latency trade-off?
If you’ve ever been on the other side of a customer service call with a voice agent, you’ve experienced it. One of the hardest challenges in conversational AI is turn-taking - the human intuition of when to pause, when to continue, and how to keep the cadence of a conversation natural.
Historically, systems have struggled to reduce latency without worsening interruption, and vice versa. Gains in one often come at the expense of the other. Models that rely only on silence detection, text output, or VAD (voice activity detection) quickly run into edge cases - awkward pauses, premature cut-offs, or clumsy handoffs between the human and the agent.
Flux’s streaming-first architecture changes that equation. With start-of-turn/end-of-turn labeling and extremely fast incremental updates, Flux eliminates the trade-off between interruption and latency, enabling dialogue flow that feels much closer to human conversation.
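To make the architectural difference concrete, here’s a minimal sketch of what consuming a stream with turn labels can look like on the agent side. The event names and handler below are purely illustrative, not Deepgram’s actual API:

```python
from dataclasses import dataclass, field

@dataclass
class TranscriptEvent:
    kind: str        # "StartOfTurn", "Update", or "EndOfTurn" (illustrative names)
    text: str = ""

@dataclass
class Agent:
    speaking: bool = True                         # agent TTS currently playing
    partials: list = field(default_factory=list)  # low-latency partial transcripts
    replies: list = field(default_factory=list)   # finalized turns to respond to

    def handle(self, event: TranscriptEvent) -> None:
        if event.kind == "StartOfTurn":
            self.speaking = False            # barge-in: stop TTS immediately
        elif event.kind == "Update":
            self.partials.append(event.text) # streaming partial, keep listening
        elif event.kind == "EndOfTurn":
            self.replies.append(event.text)  # the model itself closed the turn

agent = Agent()
for ev in [TranscriptEvent("StartOfTurn"),
           TranscriptEvent("Update", "book a"),
           TranscriptEvent("Update", "book a table"),
           TranscriptEvent("EndOfTurn", "book a table for two")]:
    agent.handle(ev)
```

Because start of turn arrives as an explicit event rather than being inferred from silence, barge-in handling becomes a one-line branch instead of a separate VAD subsystem.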
Check out Flux's speed on https://benchmarks.coval.ai/playground.

Putting Flux to the test: What our benchmarks revealed
Flux isn’t just faster - its architecture is fundamentally different. Designed as a streaming-first model, it’s optimized for real-time use cases where milliseconds define user experience. When we ran it through our benchmark suite, the results confirmed our intuition:
Dramatically reduced latency to first token compared to prior models (over 50% lower than Nova-3)
Seamless end-of-turn handling without separate detectors
No accuracy trade-off, even under high throughput (equivalent WER in both Flux and Nova-3)
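For reference, here’s roughly how the two headline metrics can be measured. This is a simplified sketch, not our actual benchmark harness:

```python
import time

def latency_to_first_token(token_stream):
    """Elapsed wall-clock time until the first transcript token arrives."""
    start = time.perf_counter()
    for _ in token_stream:              # stream yields tokens as they arrive
        return time.perf_counter() - start
    return float("inf")                 # stream ended with no tokens

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # One-row dynamic-programming edit distance over words.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1,          # deletion
                                   d[j - 1] + 1,      # insertion
                                   prev + (r != h))   # substitution/match
    return d[-1] / max(len(ref), 1)
```

Latency to first token captures how quickly an agent can start reacting, while WER confirms that the speed doesn’t come at the cost of transcription quality.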


The next step on the horizon: Neuroplex
Flux is just the first milestone in Deepgram’s broader Neuroplex architecture. With the interruption challenge largely solved, Deepgram is already focusing on the next frontier in Voice AI: context.
Until now, voice AI has passed information primarily through text, flattening context and leaving out crucial elements like tone, empathy, and intent. Neuroplex aims to fix this by mimicking the structure of the human brain - acting like the white matter that connects specialized regions to seamlessly pass context. For Voice AI, that means:
More tunable, lifelike agent behaviors
Multi-dimensional context flowing through STT, LLM, and TTS systems
A path toward agent conversations that feel indistinguishable from human-to-human ones
What this means for builders
Flux signals the start of a new standard for speech models: real-time, context-aware, and streaming-native. Engineering leaders should start considering:
Architecture fit: How will streaming-first STT reshape buffering, error handling, and orchestration?
Turn detection: Can you simplify your stack by removing external detectors?
Context pipelines: How will you adapt for Neuroplex-style systems that move beyond text-only context?
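On the turn-detection question, the simplification is easy to picture. Below is a hedged before/after sketch (all names illustrative): a silence-window heuristic driven by an external VAD versus a stream that labels end of turn itself.

```python
def legacy_turn_end(frames, vad, silence_ms=700, frame_ms=20):
    """Silence heuristic: end the turn only after N ms of non-speech frames."""
    quiet = 0
    for i, frame in enumerate(frames):
        quiet = 0 if vad(frame) else quiet + frame_ms
        if quiet >= silence_ms:
            return i          # turn ends only after the full silence window
    return None

def streaming_turn_end(events):
    """Model-native: the STT stream itself emits the end-of-turn label."""
    for i, ev in enumerate(events):
        if ev == "EndOfTurn":
            return i
    return None
```

The silence heuristic can never fire before the full window elapses - that window is exactly the latency floor a model-native end-of-turn label removes, along with the external detector itself.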
Learn more
We’ll continue tracking Flux and other models as they redefine the voice AI stack. See how Flux compares across providers at benchmarks.coval.ai, and try Flux directly on Deepgram.
For more, listen to the full conversation between Scott and Brooke on YouTube.