One of the biggest challenges in building Voice AI Agents is response time—users expect natural, real-time conversations, but every millisecond counts. That’s why we’re thrilled to announce the general availability of partials on Gladia’s real-time API, a feature now open to all developers.
After an early access phase working closely with Voice AI builders, we’ve validated what many had been asking for: the ability to stream partial transcripts word by word, rather than waiting for the final output. By acting on these partials, agents can understand intent faster, interrupt at the right moment, and deliver a smoother, more responsive user experience.
With Gladia's partials, the first few words in an utterance are emitted exceptionally fast (<100ms, up to 2x faster than other providers offering partial transcripts), allowing voice agents to achieve industry-leading ultra-low latency.
In this blog, we dive into how partials work, how we tailored them to Voice AI agent flows, and how you can leverage partial transcripts together with LLMs to improve latency while ensuring fast, fluid, and natural customer-agent interactions.
What partials are and why they matter
Partials (short for “partial transcripts”) provide a word-by-word streaming transcription of spoken words as they are received. While partials are subject to change until the final transcription is complete, they deliver immediate, actionable insights.
For Voice AI platforms, this is a game-changer: even if partials are less accurate than finals, they can be sent to the LLM instantly, allowing the agent to begin formulating a response without delay. The result is a smoother, more natural call experience—a must-have for building truly fluid voice interactions.
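To make this concrete, here is a minimal Python sketch of a consumer that distinguishes partials from finals over a WebSocket stream. The message shape (type, data.is_final, data.utterance.text) is an assumption for illustration only; see the API reference for the real schema.

```python
import asyncio
import json

import websockets  # third-party client: pip install websockets

async def consume(ws_url: str) -> None:
    # Hypothetical message shape: partials revise the current utterance word
    # by word and may still change; only messages flagged final are stable.
    async with websockets.connect(ws_url) as ws:
        async for raw in ws:
            msg = json.loads(raw)
            if msg.get("type") != "transcript":
                continue
            text = msg["data"]["utterance"]["text"]
            if msg["data"]["is_final"]:
                print("final  :", text)
            else:
                print("partial:", text)  # act immediately, e.g. feed the LLM

# asyncio.run(consume("wss://example.gladia.io/session"))  # placeholder URL
```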
Gladia’s partials tailored for Voice AI Agents
Our team crafted partials specifically for Voice AI Agent use cases. This means they are designed for ultra-low latency on the initial words of an utterance, with a regular pace for the remainder.
To see why this matters, recall the three stages of a typical voice agent pipeline:
STT (Speech-to-Text): Transcribes the user's speech into text (e.g., Gladia).
LLM (Large Language Model): Reads the transcript and crafts an answer.
TTS (Text-to-Speech): Converts the answer back into voice (e.g., Cartesia, ElevenLabs).
The total time taken for this entire process, starting from audio input and speech-to-text, significantly impacts the overall quality of the call.
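As a toy illustration of that serial cost, here is a stub pipeline in Python. All stage timings are made-up placeholders, not benchmarks; the point is only that the user-perceived latency is the serial sum of STT, LLM, and TTS.

```python
import time

# Stub stages standing in for real STT, LLM, and TTS calls.
def speech_to_text(audio: bytes) -> str:
    time.sleep(0.10)  # pretend STT needs ~100 ms
    return "yes, I want to order"

def generate_answer(text: str) -> str:
    time.sleep(0.35)  # pretend the LLM's first response needs ~350 ms
    return "Sure, what would you like to order?"

def text_to_speech(answer: str) -> bytes:
    time.sleep(0.15)  # pretend TTS needs ~150 ms
    return answer.encode()

t0 = time.perf_counter()
text_to_speech(generate_answer(speech_to_text(b"<pcm audio>")))
print(f"turn latency: {(time.perf_counter() - t0) * 1000:.0f} ms")  # ~600 ms
```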
That's where partials come in.
For an AI agent, knowing when a user begins to speak and what they are saying is crucial; the regular speech_start event is often insufficient. The first few words of an utterance carry vital information:
Interruption: Signifies that the user has started speaking.
Intention: Reveals what the user is trying to communicate.
Consider these examples:
“Yes, I want to order…”
“Stop, this is not what I said…”
In both scenarios, the first word alone is often enough to understand the user's intent and to stop the AI from continuing to speak unnecessarily.
This is why, with our partials:
The first words are emitted exceptionally fast.
The rest of the utterance, up to the final transcript, is emitted at a regular pace.
By enabling partials, voice agent flows can achieve ultra-low latency in the STT phase. The time saved can be reinvested in a more sophisticated LLM for crafting answers, or simply used to cut total response time and improve the user experience.
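Here is a minimal barge-in sketch built on that idea. It assumes a hypothetical stop_playback() hook into your audio stack and an on_partial callback wired to the transcript stream; both names are illustrative.

```python
agent_is_speaking = True  # tracks whether TTS playback is in progress

def stop_playback() -> None:
    # Hypothetical hook into your audio stack.
    global agent_is_speaking
    agent_is_speaking = False
    print("TTS playback interrupted")

def on_partial(text: str) -> None:
    # The first partial word both signals an interruption and often reveals
    # intent, so we act on it without waiting for the final transcript.
    if agent_is_speaking:
        stop_playback()
    words = text.split()
    first_word = words[0].strip(",.!?").lower() if words else ""
    if first_word in {"stop", "no", "wait"}:
        print("early objection detected:", text)

on_partial("Stop, this is not what I said")
```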
Partials + LLMs: picking the right configuration for optimal performance
To achieve superior latency for partials, our real-time API utilizes a smaller model than the one used for final transcriptions. In this context, "smaller" directly translates to "faster," requiring fewer milliseconds for inference.
While finals rely on endpointing to determine whether an utterance is complete, partials are time-based: words are emitted as soon as they are processed.
A significant advantage is that Large Language Models are adept at understanding imperfect or incomplete sentences. This means that the slightly lower accuracy of partials compared to finals is not a major impediment. LLMs can still effectively grasp the user's intent and generate appropriate responses.
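A sketch of that pattern: begin drafting a response from partials, then confirm on the final. Here llm_complete is a stand-in for whatever model call your agent uses; the flow, not the function, is the point.

```python
from typing import Optional

def llm_complete(prompt: str) -> str:
    # Stand-in for a real LLM call.
    return f"(draft answer for: {prompt!r})"

draft: Optional[str] = None

def on_transcript(text: str, is_final: bool) -> None:
    global draft
    if not is_final:
        # Partials are slightly less accurate, but LLMs handle incomplete
        # sentences well enough to start formulating a response early.
        draft = llm_complete(text)
    else:
        # Re-run (or validate the draft) against the stable final transcript.
        print(llm_complete(text))

on_transcript("yes I want to", is_final=False)
on_transcript("Yes, I want to order a pizza.", is_final=True)
```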
For optimal results, we recommend explicitly selecting the target language via language_config rather than relying on auto-detection: partials carry very little audio context, which makes language detection less reliable.
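For example, in a session configuration like the one below. Only the language_config key itself is confirmed in this post; the nested field names are assumptions, so check the API reference for the exact schema.

```python
# Illustrative configuration dictionary for a real-time session.
config = {
    "language_config": {
        "languages": ["en"],      # pin the target language instead of auto-detect
        "code_switching": False,  # assumed flag: no mid-call language switching
    },
    "messages_config": {"receive_partial_transcripts": True},
}
```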
Experiment with partials today!
The new developer playground now returns partials, allowing you to experience this feature firsthand.
Partials are live on all Gladia accounts. Please note that the real-time API does not return them by default. To enable this feature, simply use the messages_config.receive_partial_transcripts option when making your request.
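A minimal sketch of enabling partials at session creation is shown below. The endpoint URL, header, and audio fields are assumptions for illustration; messages_config.receive_partial_transcripts is the documented option from this post.

```python
import requests  # pip install requests

config = {
    "encoding": "wav/pcm",  # assumed audio settings
    "sample_rate": 16000,
    "messages_config": {
        "receive_partial_transcripts": True,  # off by default: opt in here
    },
}

response = requests.post(
    "https://api.gladia.io/v2/live",  # assumed endpoint
    headers={"X-Gladia-Key": "YOUR_API_KEY"},
    json=config,
)
response.raise_for_status()
ws_url = response.json()["url"]  # WebSocket URL to stream audio and read partials
```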
For a quick start, refer to our comprehensive code samples, with TypeScript and Python snippets.