Gladia and Pipecat partner to push the boundaries of real-time voice AI

Published on May 14, 2025

We’re thrilled to announce a strategic partnership between Gladia and Daily, the team behind Pipecat, aimed at revolutionizing real-time conversational AI. This collaboration combines our cutting-edge audio intelligence capabilities with their flexible, 100% open-source framework, empowering developers to create more dynamic, multilingual, and context-aware voice AI applications.

Pipecat is a vendor-neutral framework designed to simplify the creation of voice and multimodal conversational agents. It lets developers orchestrate LLMs and other AI services to build video and voice applications such as personal coaches, meeting assistants, and customer support bots.

Pipecat is maintained by Daily with the support of the global developer community. Daily has been a leader in developer tooling and global WebRTC infrastructure since 2016. Earlier this year, Daily announced Pipecat Cloud, the first open-source voice AI cloud.

About the partnership

At Gladia, we believe the future of human-AI interaction lies in systems that understand and respond in real time, just like humans do. Pushing the boundaries of ultra-low-latency conversational AI is key to bridging the gap between humans and machines, enabling more natural, intuitive communication. In today’s world, where seamless interactions are crucial, AI that can understand diverse languages and contexts is essential for real collaboration in customer support, meetings, and beyond.

This partnership with Pipecat goes beyond technology: it empowers developers to easily create intelligent, adaptable, multilingual voice AI applications that break down barriers and foster meaningful interactions. By combining Gladia's language processing with Pipecat's flexible framework, we can enable the creation of robust voice platforms that serve a wide range of use cases.

A shared vision for the future

This partnership is more than just a technical integration; it's a shared commitment to pushing the boundaries of what's possible in real-time conversational AI.

Jean-Louis Queguiner, CEO of Gladia, shared his excitement for the partnership: "At Gladia, we believe in pushing the boundaries of what's possible in real-time conversational AI. Partnering with Pipecat allows us to extend that vision even further, combining our advanced language processing capabilities with Pipecat’s open-source platform to help developers create truly innovative, scalable voice AI solutions. This collaboration is about more than just technology; it's about shaping the future of human-AI interaction."

What this means for developers

Developers can now leverage the combined strengths of Pipecat and Gladia to build more sophisticated voice AI applications. Whether you're creating a multilingual customer support bot, a real-time meeting assistant, or an interactive storytelling agent, this partnership provides the tools and flexibility needed to bring your vision to life.

To get started, visit pipecat.ai to explore the framework, and sign up for the Gladia Playground to try our newest STT model, Solaria, first-hand.

Stay tuned for more updates as we continue to innovate and expand the possibilities of real-time conversational AI.
