How Aircall cut transcription time by 95% with Gladia

Published on Oct 9, 2025

The contact center is transforming. Long defined by manual workflows, siloed data, and reactive customer service, it is entering a new era driven by real-time AI and automation, and today's Contact Center as a Service (CCaaS) platforms are leading the way.

Transcription lies at the core of this transformation. Converting voice to text with speed and precision unlocks a cascade of next-gen capabilities: automated summaries, sentiment detection, agent coaching, CRM enrichment, and more. But many legacy or in-house solutions fall short because they are too slow, too inaccurate, or too resource-heavy to scale.

Aircall, the leading AI-powered voice platform for growing businesses, recognized this inflection point early. To meet the growing demand for fast, intelligent insights from customer conversations, Aircall turned to Gladia’s speech-to-text API.

Here’s how Aircall reduced transcription time by 95%, empowered its users with near-instant insights, and laid the groundwork for a smarter, AI-driven CCaaS future.

About Aircall

Aircall is an integrated customer communications and intelligence platform. It unifies voice and digital channels into one seamless platform, offering one-click integrations with leading CRMs and over 250 business tools. With a strong focus on cloud-based voice solutions, Aircall helps teams streamline conversations, improve customer support, and drive sales efficiency.

Farid Issabhai, Staff Engineer at Aircall, is at the forefront of Aircall’s AI and transcription initiatives. He played a key role in integrating cutting-edge technologies, including Gladia’s speech-to-text API, into Aircall’s workflows.

Challenge: More accurate, fast, and scalable transcription for global telephony

As a leading voice platform, Aircall processes thousands of calls every day across diverse languages and use cases, from customer support to sales interactions. Initially, Aircall developed an in-house transcription engine, but maintaining and improving it proved challenging.

Solution: Gladia’s speech-to-text API

After evaluating different STT API vendors, Aircall chose Gladia for its strong performance in transcription accuracy, especially for key strategic languages.

Gladia’s API allowed Aircall to:

✓ Transcribe calls across multiple languages like Spanish, German, and Italian.

✓ Process over 1M transcriptions per week.

✓ Deliver transcripts significantly faster than their previous solutions.

How Aircall uses transcription

Aircall uses Gladia’s transcriptions as the foundational layer for advanced features across its CCaaS platform:

  • Searchability: Users can search for keywords across calls
  • AI-generated insights: Summaries, key topics, and sentiment analysis are built on top of the transcripts
  • Agent coaching: Aircall’s coaching features assess calls for compliance and training, evaluating factors like greetings or responses to objections
  • CRM integration: Transcriptions aren’t logged directly into CRMs like HubSpot or Salesforce, but summaries and AI insights are pushed to them via webhooks
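The case study doesn't describe how Aircall implements keyword search. As a rough illustration of why transcripts make calls searchable, here is a minimal in-memory inverted index in Python. The class name `TranscriptIndex`, the call IDs and the sample transcripts are all hypothetical. A production system would more likely rely on a dedicated search engine.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase and split on non-word characters so "Refund," matches "refund".
    return re.findall(r"\w+", text.lower())


class TranscriptIndex:
    """Hypothetical in-memory inverted index mapping keyword -> set of call IDs."""

    def __init__(self) -> None:
        self._postings: dict[str, set[str]] = defaultdict(set)

    def add(self, call_id: str, transcript: str) -> None:
        # Record every token of the transcript against this call.
        for token in tokenize(transcript):
            self._postings[token].add(call_id)

    def search(self, query: str) -> set[str]:
        # Return calls containing every keyword in the query (AND semantics).
        tokens = tokenize(query)
        if not tokens:
            return set()
        # .get() avoids inserting empty entries into the defaultdict.
        result = set(self._postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self._postings.get(token, set())
        return result


# Illustrative data only; not real Aircall calls.
index = TranscriptIndex()
index.add("call-001", "Hi, I'd like a refund for my last invoice.")
index.add("call-002", "Can you walk me through the pricing tiers?")
index.add("call-003", "The refund on invoice 42 never arrived.")
print(sorted(index.search("refund invoice")))  # ['call-001', 'call-003']
```

AND semantics keeps the result set narrow as users add keywords. An OR query would instead take the union of the postings sets.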

Farid explains,

Why Aircall chose Gladia

Aircall’s decision to partner with Gladia was driven by:

  • Accuracy: High performance across key languages, benchmarked on internal datasets of phone-call audio
  • Speed: Drastically reduced transcription delays
  • Developer Experience: A well-designed API that simplified integration
  • Cost-Effectiveness: A solution that balances performance with the economics of scaling
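The case study doesn't say how Aircall scored accuracy on its internal phone-call datasets. Word error rate (WER) is the standard metric for this kind of speech-to-text benchmark, so here is a minimal Python sketch of it. It assumes simple lowercase, whitespace-split tokenization. Real benchmarks usually apply heavier text normalization (punctuation, numbers) before scoring. The sample sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # dist[i][j] = minimum word edits to turn ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
    return dist[len(ref)][len(hyp)] / len(ref)


# Illustrative pair: one substitution ("the" -> "a") and one insertion ("today").
reference = "please transfer me to the billing team"
hypothesis = "please transfer me to a billing team today"
print(f"{word_error_rate(reference, hypothesis):.3f}")  # 0.286 (2 edits / 7 words)
```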

Results: Faster insights, smoother operations

Since switching to Gladia:

  • Transcription times have dropped from up to 30 minutes to under 1.5 minutes
  • Aircall processes around 1M calls weekly, enabling scalable AI features
  • User satisfaction has improved thanks to faster insights
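The headline 95% figure follows from the first result above. It compares the old upper bound of 30 minutes with the new bound of 1.5 minutes. Because older jobs took up to 30 minutes, this is the reduction at the slow end of the range:

```latex
\text{reduction} = \frac{30 - 1.5}{30} = \frac{28.5}{30} = 0.95 = 95\%
```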

Farid highlights,

Looking ahead

Aircall is exploring new frontiers with real-time transcription and AI voice agents. While asynchronous transcription currently meets most needs, the team is actively experimenting with new features like real-time assistance during sales calls, where AI can suggest responses based on conversation context.

Farid shares,

Final thoughts

Farid’s advice for companies looking to integrate speech-to-text AI:

About Gladia

Gladia provides a speech-to-text and audio intelligence API for building virtual meeting and note-taking apps, call center platforms, and media products. It delivers transcription, translation, and insights powered by best-in-class ASR, LLMs, and GenAI models.

After reading this case study, do you think Gladia could be the right fit for your business?

Don't hesitate to contact our sales team to explore this in more detail.
