TL;DR: Broad language coverage in an ASR model does not mean it can handle mid-sentence language changes. Code-switching breaks standard monolingual models, causing significant WER degradation relative to single-language baselines on well-studied pairs like Hindi-English, with even higher failure rates on lower-resource combinations. We built Solaria-1 to natively detect and process code-switching across 100+ languages without requiring developers to pre-specify languages in the request. Pricing is public and usage-based, with audio intelligence features included on paid plans rather than sold as per-feature add-ons.
Your QA pipeline passed on English audio. The transcripts fall apart when your bilingual users mix languages mid-sentence, which is exactly how they communicate.
When your users switch from Hindi to English mid-sentence, or from French to English mid-meeting, the transcript often breaks. This article maps which language pairs support true code-switching, why low-resource pairs remain challenging, and how to evaluate ASR systems for real-world multilingual environments.
Defining code-switching for ASR systems
Code-switching: definition and use cases
Code-switching is the practice of alternating between two or more languages within a single conversation. It breaks down into two patterns that matter for ASR design.
Intrasentential code-switching is a language change within a single sentence or clause, following grammatical rules from both languages simultaneously. A contact center example: "Can we review the Q3 forecast, s'il vous plaît?" or "Yaar, can you just reschedule the meeting to 4 baje?" The speaker uses one language's syntax while inserting vocabulary from another.
Intersentential code-switching is a language change at sentence boundaries, where each complete utterance is in one language but the conversation alternates. A team meeting example: "The latest build failed. Kya hua?" Each sentence is internally consistent, but the language flips between them.
Both patterns are normal in bilingual communities and common in the product contexts where ASR matters most: contact centers serving bilingual populations, meeting assistants used by international teams, and voice interfaces embedded in global consumer apps. Gladia's code-switching documentation describes this specifically as scenarios where "the language is changed multiple times throughout the audio," such as a conversation where two people each speak a different language.
Monolingual ASR's WER challenge
Standard ASR models fail at code-switching for a straightforward reason: they were trained on monolingual corpora and optimized for clean, single-language audio. When a speaker switches languages mid-sentence, the model's acoustic and language models are simultaneously operating outside their training distribution.
The WER impact is measurable. For Hindi-English code-switching, research documents substantial WER degradation compared to monolingual baselines. Even for structurally related Romance language pairs, the degradation is severe: code-switching evaluations show baseline models often exceeding 50% WER, with significant error rates persisting even after fine-tuning on code-switched test sets. These are not edge-case failures on unusual audio. They reflect what happens to most deployed models when real bilingual speakers use your product.
Multilingual and code-switching scenarios push models to switch between languages mid-sentence, something rarely reflected in standard benchmarks. A model evaluated only on monolingual test sets can claim accurate multilingual support while failing completely in production.
Implementing code-switching ASR: requirements
Building an ASR system that handles code-switching reliably requires training on audio containing actual language mixing, not separate monolingual datasets that happen to cover the same language pair. The deeper constraint is annotated training corpora of real code-switched speech. Code-switched speech tends to be underrepresented in many ASR training pipelines, making it a challenging scenario for models to handle effectively.
A model that has never been trained on Hinglish will not learn to handle it just because it has seen a lot of Hindi and a lot of English. The mixing patterns, syntactic borrowings, and phonological adaptations that emerge when languages blend require the model to learn from examples of the blend itself.
Which language pairs have native code-switching support?
Spanish-English code-switching metrics
Spanish-English is one of the most studied code-switching pairs, driven by the scale of bilingual populations in the United States and Latin America. Research benchmarks such as LinCE include Spanish-English code-switching data across multiple tasks and corpora, reflecting how extensively this language pair has been studied. Even so, specific WER metrics in public ASR provider documentation remain limited, and production variance is high enough that only real-world testing on your specific audio reveals how well a given system handles your users' switching patterns.
Hindi-English ASR code-switching capabilities
Hindi-English code-switching (Hinglish) dominates contact center and Business Process Outsourcing (BPO) workloads at scale, a direct result of India's large bilingual population. Despite this scale, standard ASR models fail at Hinglish because of compounding gaps in acoustic modeling, vocabulary, language modeling, and training data. Gladia's BPO use case page addresses this context directly, and Solaria-1's coverage of South Asian languages including Bengali, Punjabi, Tamil, Urdu, Marathi, and Hindi matters for Contact Center as a Service (CCaaS) platforms serving these markets.
French-English ASR: production readiness
French-English code-switching occurs in various contexts, including Canadian markets, francophone African communities, and European contexts where English is the dominant business language but French is the primary community language. Research datasets include French-English combinations, but commercial ASR systems vary significantly in how well they handle regional phonological patterns versus textbook European French.
Evaluating Tagalog/Cantonese code-switching
Tagalog-English (Taglish) and Cantonese-English code-switching expose the largest gap between "supported language" marketing claims and actual production capability. Both appear constantly in contact centers across the Philippines, Hong Kong, and diaspora communities globally, yet both involve typological distance from English that makes end-to-end model training significantly harder.
Across the research literature, a substantial proportion of CS-ASR benchmarks concern Mandarin-English, Hindi-English, and Arabic-English, with combinations like Malay-English and Tagalog-English representing the frontier of current model capability. Any vendor claiming strong performance on Taglish or Cantonese-English should be asked specifically for WER benchmarks on code-switched test sets for those pairs, not aggregated multilingual accuracy figures.
ASR limitations for emerging language pairs
The gap between high-resource and low-resource code-switching pairs is structural, not a function of model size or compute. Dedicated CS datasets are scarce because most models rely on monolingual or mixed-language corpora that fail to reflect real-world code-switching patterns. You cannot close this gap by fine-tuning on more monolingual data for either language in the pair. For product teams building for these populations, the practical implication is direct: evaluate on real-world audio samples from your actual users, not synthetic test sets.
Why language coverage doesn't guarantee code-switching capability
Scarcity of code-switching training data
Code-switching happens in conversation, not in written text, which means it is underrepresented in virtually every large-scale training corpus that ASR models learn from. Building a competitive code-switching model requires curating, annotating, and training on actual recordings of bilingual speakers switching languages, at scale, across diverse audio conditions. This is expensive and slow, which is why most providers have not done it for more than a handful of pairs.
Why model architecture limits code-switching
The dominant architectural pattern for multilingual ASR uses a Language Identification (LID) model to detect the spoken language, then routes the audio to a separate monolingual ASR engine trained for that language. This design treats language detection as a preprocessing step, not as an integrated part of transcription.
The LID-routing model creates an inherent problem for code-switching because the language identification step operates on a window of audio, and mid-sentence language changes produce ambiguous or incorrect routing signals. The model either assigns the whole sentence to the wrong language engine or triggers a routing switch mid-utterance that disrupts transcription continuity. Gladia's automatic language detection takes a fundamentally different approach: continuous language detection within the transcription pipeline itself rather than an upfront routing decision.
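The routing failure mode is easy to simulate. The sketch below is a deliberately naive illustration, not any provider's actual pipeline: the per-window language labels are hypothetical, and the router uses a simple majority vote, which is one common way LID-routing systems collapse a mixed-language utterance onto a single engine.

```python
from collections import Counter

def route_by_majority_lid(window_labels):
    """Naive LID routing: classify each audio window, then send the
    whole utterance to the engine for the majority language."""
    counts = Counter(window_labels)
    winner, _ = counts.most_common(1)[0]
    return winner

# Hypothetical per-window LID labels for
# "Can we review the Q3 forecast, s'il vous plait?"
windows = ["en", "en", "en", "en", "fr", "fr"]

engine = route_by_majority_lid(windows)
# The French tail is routed to the English engine, so "s'il vous plait"
# is decoded against an English vocabulary and comes out garbled.
print(engine)  # prints: en
```

The alternative, per-window routing, avoids the majority-vote failure but introduces a different one: a routing switch mid-utterance that breaks transcription continuity, exactly as described above.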
ASR performance: monolingual vs. code-switched
| Scenario | Documented WER impact | Source basis |
| --- | --- | --- |
| Hindi-English code-switched | 30–50% relative WER increase over Hindi baseline | Research on Hinglish ASR |
| Catalan-Spanish code-switched | 51–63% WER on baseline models, 51.29% on fine-tuned | CS-ASR evaluation research |
| Low-resource pairs (e.g., Malay-English) | Highly variable, with substantial performance gaps | CS benchmark experiments documented in the literature |
The cost to your product goes beyond the transcript. Text-based sentiment inference and named entity recognition both operate on the transcript, not the audio signal, so their accuracy is ceiling-bounded by transcription quality. Higher WER on code-switched audio degrades the transcript and can reduce the reliability of downstream analysis built on top of it.
Selecting multilingual code-switching ASR for your product
Define your code-switching language scope
Before you evaluate any vendor, map which code-switching pairs your users actually produce. Pull support ticket language data, review session recordings, and segment user satisfaction scores by language and region. This is not market research. It is a precision requirement, because "multilingual support" varies by orders of magnitude across language pairs, and performance characteristics can differ significantly between different code-switching combinations.
Your evaluation matrix should list every language pair in priority order, weighted by user volume and the severity of a transcription failure in that context. The stakes for transcription accuracy will vary based on your specific use case and user base.
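One way to make that weighting concrete is a simple priority score per pair. The volumes, severity scale, and field names below are hypothetical placeholders, just to sketch the idea:

```python
# Hypothetical evaluation matrix: weight each code-switching pair by
# monthly audio volume and the severity of a transcription failure
# in that context (1 = cosmetic, 5 = compliance/revenue impact).
pairs = [
    {"pair": "hi-en", "monthly_hours": 12000, "failure_severity": 5},
    {"pair": "es-en", "monthly_hours": 8000,  "failure_severity": 3},
    {"pair": "fr-en", "monthly_hours": 1500,  "failure_severity": 4},
]

def priority_score(entry):
    # Volume times severity; refine the weighting to fit your product.
    return entry["monthly_hours"] * entry["failure_severity"]

ranked = sorted(pairs, key=priority_score, reverse=True)
for entry in ranked:
    print(entry["pair"], priority_score(entry))
```

The ranked list becomes the order in which you demand vendor evidence: the top pair gets the deepest benchmark scrutiny.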
Request WER on code-switched test sets
When you ask a vendor for accuracy benchmarks, ask specifically for WER on code-switched test sets, not aggregated multilingual accuracy figures. Ask for code-switched evaluation methodology, dataset coverage, and production-like test conditions, then compare those results with Gladia’s current benchmark framework covering 8 providers, 7 datasets, and 74+ hours of audio. A vendor that cannot provide WER evidence for your target language pairs under clearly defined test conditions has not tested them in a meaningful way.
Also ask whether the evaluation audio includes natural conversations with mixed languages or only clean studio recordings. The benchmark condition should match your production environment.
Assess production code-switching accuracy
Once integrated, measure WER separately for code-switched segments versus monolingual segments in your production traffic. Build language-specific accuracy monitoring into your QA pipeline from day one so you catch regressions before they reach support ticket volume. Track your code-switching distribution over time, as the language pairs in your production traffic may shift.
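Segment-level WER monitoring can be sketched in a few lines. This is a minimal from-scratch implementation for illustration; in production you would typically use an established library such as jiwer, and the segment schema here (`ref`, `hyp`, `code_switched`) is a hypothetical shape for your QA records:

```python
def wer(reference, hypothesis):
    """Word error rate: (substitutions + insertions + deletions)
    divided by reference word count, via word-level edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[-1][-1] / max(len(ref), 1)

def wer_by_segment_type(segments):
    """Average WER separately for code-switched vs monolingual segments.
    Each segment: {"ref": str, "hyp": str, "code_switched": bool}."""
    buckets = {True: [], False: []}
    for seg in segments:
        buckets[seg["code_switched"]].append(wer(seg["ref"], seg["hyp"]))
    return {
        "code_switched": sum(buckets[True]) / len(buckets[True]) if buckets[True] else None,
        "monolingual": sum(buckets[False]) / len(buckets[False]) if buckets[False] else None,
    }
```

Feeding this a daily sample of human-reviewed transcripts gives you the two numbers that matter: WER on code-switched segments and WER on monolingual segments, tracked separately so a regression in one does not hide inside an aggregate.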
Gladia's solution for multilingual code-switching
Current code-switching language pairs
We built Solaria-1 to cover 100+ languages and dialects with native code-switching support, including 42 languages you cannot get from other API-level speech-to-text (STT) providers. That coverage includes high-demand BPO and contact center languages: Tagalog, Bengali, Punjabi, Tamil, Urdu, Persian, and Marathi, as well as languages like Haitian Creole, Maori, and Javanese, where most providers have no production-grade support at all.
When you enable code-switching in our API, the model continuously detects the spoken language and switches transcription accordingly, without requiring callers to announce which language they are using or developers to route audio through a separate language identification step.
Instead of stitching together separate vendors for transcription, diarization, enrichment, and downstream structured outputs, teams can handle the audio pipeline through one API.
How Gladia measures code-switching
We evaluate Solaria-1 against other providers across 7 datasets and over 74 hours of audio, covering diverse languages, accents, and audio conditions. Our benchmark methodology is open and reproducible. Within this evaluation scope, on multilingual datasets with accented speech and challenging acoustic conditions, Solaria-1 achieves up to 29% lower WER and up to 3x lower diarization error rate (DER) compared to leading alternatives. These results reflect the specific language pairs and conditions covered by these benchmarks and may vary across other language combinations. Our diarization is powered by pyannoteAI's Precision-2 model.
On paid plans, customer data is not used for model training by default. For teams handling sensitive audio data, Gladia provides deployment flexibility including cloud hosting options and enterprise deployment models.
Real-world code-switching API demo
Code-switching detection can be configured in the API request:
```json
{
  "audio_url": "<your-bilingual-audio-file-url>",
  "detect_language": true,
  "enable_code_switching": true,
  "diarization": true
}
```
The structured transcript response includes word-level timestamps, speaker labels, and language tags, making it easier to filter or route content by language:
```json
{
  "transcription": {
    "utterances": [
      {
        "speaker": "A",
        "text": "Can we review the Q3 forecast, s'il vous plaît?",
        "words": [
          {"word": "Can", "language": "en", "start": 0.0, "end": 0.3},
          {"word": "s'il", "language": "fr", "start": 4.1, "end": 4.4},
          {"word": "vous", "language": "fr", "start": 4.4, "end": 4.6},
          {"word": "plaît", "language": "fr", "start": 4.6, "end": 4.9}
        ]
      }
    ]
  }
}
```
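With word-level language tags, filtering or routing content by language is a simple traversal of the response. A minimal sketch, assuming a response shaped like the example above:

```python
def words_by_language(response, language):
    """Collect (speaker, word, start_time) tuples for every word tagged
    with the given language code in a transcript response."""
    matches = []
    for utterance in response["transcription"]["utterances"]:
        for word in utterance["words"]:
            if word["language"] == language:
                matches.append((utterance["speaker"], word["word"], word["start"]))
    return matches

response = {
    "transcription": {
        "utterances": [
            {
                "speaker": "A",
                "text": "Can we review the Q3 forecast, s'il vous plaît?",
                "words": [
                    {"word": "Can", "language": "en", "start": 0.0, "end": 0.3},
                    {"word": "s'il", "language": "fr", "start": 4.1, "end": 4.4},
                    {"word": "vous", "language": "fr", "start": 4.4, "end": 4.6},
                    {"word": "plaît", "language": "fr", "start": 4.6, "end": 4.9},
                ],
            }
        ]
    }
}

french = words_by_language(response, "fr")
print(french)  # [('A', "s'il", 4.1), ('A', 'vous', 4.4), ('A', 'plaît', 4.6)]
```

The same traversal can drive per-language routing, for example sending French spans to a French-speaking review queue.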
Evaluating multilingual ASR in production
Code-switching ASR: WER benchmarks
Production results from teams using Gladia demonstrate the system's real-world performance across multilingual scenarios.
The summarization quality, named entity recognition output, and sentiment signals your downstream AI pipeline depends on are all ceiling-bounded by transcript accuracy. A 1-3% WER in production is not just a transcription metric. It is a data quality metric for every NLP task built on top of that transcript.
For multilingual products, performance matters most on messy production audio with accents, multiple speakers, overlap, and background noise, not only on clean benchmark samples.
Supported code-switching language pairs
The table below compares code-switching support, language coverage, and pricing model, including whether diarization and NER are bundled. Pricing structure is based on published information as of April 2026.
| Provider | Code-switching support | Unique language coverage | Pricing model |
| --- | --- | --- | --- |
| Gladia (Solaria-1) | Native, automatic, across 100+ languages | 100 languages, 42 exclusive to Gladia | All-inclusive: diarization, NER, sentiment, and translation included at no additional cost |
| Deepgram (Nova-3 Multilingual) | Real-time code-switching across 10 languages (English, Spanish, French, German, Hindi, Russian, Portuguese, Japanese, Italian, Dutch) | 45+ languages | Base rate varies by model; features metered separately as add-ons |
| AssemblyAI | Code-switching capability varies by model | Language coverage not independently verified | Base rate plus per-feature add-ons; total cost varies by feature selection |
| Google Cloud STT | Supports alternative language detection via alternativeLanguageCodes parameter | 125 languages | Base rate plus additional charges depending on model and feature selection |
Vendor add-on pricing compounds across high-volume workloads: when each feature is metered separately, the total bill becomes significantly harder to model. At 10,000 hours per month with diarization, NER, and sentiment enabled, the gap between all-inclusive and add-on-priced models is material. Our pricing page shows broad feature availability across paid plans.
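The compounding effect is simple arithmetic to model. The per-hour rates below are purely illustrative placeholders, not any vendor's published pricing; substitute real rate cards before drawing conclusions:

```python
def monthly_cost(hours, base_rate, addon_rates=()):
    """Total monthly bill: hours times (base rate plus any per-hour add-ons)."""
    return hours * (base_rate + sum(addon_rates))

HOURS = 10_000  # monthly audio volume

# Illustrative per-hour rates only.
all_inclusive = monthly_cost(HOURS, base_rate=0.60)
with_addons = monthly_cost(HOURS, base_rate=0.45,
                           addon_rates=(0.10, 0.08, 0.06))  # diarization, NER, sentiment

print(round(all_inclusive, 2), round(with_addons, 2))
```

Note that the add-on total also changes every time you toggle a feature, which is the forecasting burden the all-inclusive model removes.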
Handling code-switching in async and real-time transcription
For async workloads, post-call analysis, meeting transcription, and contact center audio, processing the full recording gives Solaria-1 complete utterance context before output is finalized.
Language detection runs across the full audio span rather than on short leading chunks, which reduces mid-transcript language switches caused by ambiguous acoustic segments at segment boundaries.
Diarization accuracy also benefits: speaker turn boundaries are resolved against the full recording rather than inferred from a rolling buffer, which reduces speaker confusion in overlapping or accented speech.
For teams that need live transcription, voice agents, live captions, or live assist workflows, Solaria-1 delivers 103ms partial transcript latency and 270ms final transcript latency when measured over WebSocket connections with 16kHz audio at 30-second chunk lengths under production network conditions.
Language detection runs continuously within the transcription pipeline rather than as a preprocessing step, which removes additional routing logic at the orchestration layer. Teams building on real-time voice frameworks such as Pipecat (open-source voice AI pipeline framework), LiveKit (open-source WebRTC media infrastructure), or Vapi (commercial voice AI orchestration service) can pass audio directly without a separate language identification step upstream.
Test your own multilingual audio to see how Gladia handles automatic language detection and code-switching in production. Review pricing and start building, with a typical integration running in less than a day, or review our full benchmark methodology to evaluate performance on your specific language pairs before committing.
FAQs
What is code-switching in ASR?
Code-switching in ASR is a model's ability to accurately transcribe audio when a speaker changes languages mid-conversation or mid-sentence, covering both intrasentential switches (within a single sentence) and intersentential switches (at sentence boundaries). Standard monolingual models fail at this because they were trained on single-language corpora and cannot handle the acoustic and linguistic patterns that emerge when languages blend.
How does Gladia handle code-switching?
Solaria-1 automatically detects and processes language changes across 100+ languages using a single end-to-end multilingual architecture rather than a language identification routing model. You enable it with a single parameter in the API request.
Do you charge extra for code-switching?
No. On Gladia’s paid plans, code-switching is available within the pricing model rather than as a separate per-feature add-on. Check the current pricing page for exact plan-level terms and enterprise packaging details.
Which language pairs have the most mature code-switching support in ASR?
Code-switching performance varies significantly across providers for all language pairs. Test with audio samples from your specific language combination to evaluate whether the system meets your accuracy requirements.
How do I evaluate whether an ASR vendor actually supports code-switching for my language pair?
Ask for WER benchmarks specifically on code-switched test sets rather than monolingual accuracy figures, request the benchmark methodology and dataset names, and test with real audio samples from your own users rather than clean studio recordings. Refer to benchmark methodology, dataset coverage, and code-switched evaluation conditions, then compare those results with Gladia’s benchmark framework covering 8 providers, 7 datasets, and 74+ hours of audio.
Key terms glossary
Intrasentential code-switching: Changing languages within a single sentence or clause, following the grammatical rules of both languages simultaneously.
Intersentential code-switching: Changing languages between sentence boundaries, where each complete utterance is internally consistent but the conversation alternates languages.
Word error rate (WER): The standard metric for ASR accuracy, calculated by adding substitutions, insertions, and deletions, then dividing by the total number of spoken words. Lower is better.
Language identification (LID) routing: An ASR architecture where a language detection model classifies audio first, then routes it to a monolingual ASR engine. This design creates accuracy and latency challenges for code-switching because mid-sentence language changes produce ambiguous routing signals.
Diarization error rate (DER): The standard metric for speaker diarization accuracy, measuring the proportion of audio incorrectly attributed to the wrong speaker. Lower is better.
Hinglish: A hybrid language variety combining Hindi and English, widely spoken across India and the South Asian diaspora. Hinglish speakers fluidly mix vocabulary, grammar, and phonology from both languages within conversations and often within individual sentences, making it a common and linguistically complex case study in automatic speech recognition and multilingual transcription systems.
Data Processing Agreement (DPA): A legally binding contract specifying how a vendor handles and protects customer data, required for GDPR compliance and typically reviewed during enterprise procurement.