Gladia and Pipecat partner to push the boundaries of real-time voice AI
Published on May 14, 2025
We’re thrilled to announce a strategic partnership between Gladia and Daily, the team behind Pipecat, aimed at revolutionizing real-time conversational AI. This collaboration combines our cutting-edge audio intelligence capabilities with their flexible 100% open-source framework, empowering developers to create more dynamic, multilingual, and context-aware voice AI applications.
Pipecat is a vendor-neutral framework designed to simplify the creation of voice and multimodal conversational agents. It lets developers orchestrate LLMs and other AI services to build video and voice applications such as personal coaches, meeting assistants, and customer support bots.
Pipecat is maintained by Daily with the support of the global developer community. Daily has been a leader in developer tooling and global WebRTC infrastructure since 2016. Earlier this year, Daily announced Pipecat Cloud, the first open-source voice AI cloud.
About the partnership
At Gladia, we believe the future of human-AI interaction lies in systems that understand and respond in real time, just as humans do. Pushing the boundaries of ultra-low-latency conversational AI is key to bridging the gap between humans and machines and enabling more natural, intuitive communication. Where seamless interactions are crucial, AI that can understand diverse languages and contexts is essential for real collaboration in customer support, meetings, and beyond.
This partnership with Pipecat goes beyond technology: it empowers developers to easily create intelligent, adaptable, multilingual voice AI applications that break down barriers and foster meaningful interactions. By combining Gladia's language processing with Pipecat's flexible framework, we can enable robust voice platforms that serve a wide range of use cases.
A shared vision for the future
This partnership is more than just a technical integration; it's a shared commitment to pushing the boundaries of what's possible in real-time conversational AI. In the words of Daily's co-founder:
Jean-Louis Queguiner, CEO of Gladia, also shared his excitement for the partnership: "At Gladia, we believe in pushing the boundaries of what's possible in real-time conversational AI. Partnering with Pipecat allows us to extend that vision even further—combining our advanced language processing capabilities with Pipecat’s open-source platform to help developers create truly innovative, scalable voice AI solutions. This collaboration is about more than just technology; it's about shaping the future of human-AI interaction."
What this means for developers
Developers can now leverage the combined strengths of Pipecat and Gladia to build more sophisticated voice AI applications. Whether you're creating a multilingual customer support bot, a real-time meeting assistant, or an interactive storytelling agent, this partnership provides the tools and flexibility needed to bring your vision to life.
To get started, visit pipecat.ai to explore the framework, and sign up for the Gladia Playground to try our newest STT model, Solaria, first-hand.
Stay tuned for more updates as we continue to innovate and expand the possibilities of real-time conversational AI.