
Migrating from Speechmatics to Gladia: step-by-step integration guide

Published on Apr 30, 2026
by Ani Ghazaryan

This article explains how to migrate from Speechmatics to Gladia with minimal downtime. It covers the key API changes, staging tests, rollout steps, rollback planning, and troubleshooting needed for a safe production switch.

TL;DR

  • Most teams complete this migration in under a day of active engineering work.
  • There are three core changes: replace the Authorization: Bearer YOUR_API_KEY header with x-gladia-key: YOUR_API_KEY, swap the POST /v2/jobs endpoint for POST /v2/pre-recorded, and flatten the transcription_config payload structure.
  • Run Gladia in a parallel staging environment first, then execute a canary rollout before cutting over production traffic.
  • Run the parallel staging test in this guide to validate accuracy on your own audio before committing.
  • Diarization, translation, and NER are available as part of the standard API without requiring separate feature flags in the request.

For most teams, the hardest part of switching STT providers is not the code changes. It is validating that the new model handles your messy, real-world audio better than the old one. This guide gives you the exact API mappings, authentication changes, and testing strategies to migrate your audio pipeline from Speechmatics to Gladia with zero production downtime.

Configure your Gladia migration environment

Audit your current Speechmatics integration

Before writing a single line of migration code, map out exactly what you have running in production. Answer these four questions:

  • Batch or real-time? Identify whether you're using batch (async) or real-time (WebSocket) transcription. We separate these as /v2/pre-recorded (a REST endpoint for async) and a V2 WebSocket path for live transcription (see the live STT migration guide for the current endpoint).
  • Which add-ons are active? Note specifically whether diarization, translation, or sentiment analysis are in your current payload, since these affect both your request structure and your invoice.
  • Callback or polling? Document your current callback mechanism. We support polling via GET /v2/transcription/{id} status checks, a callback_url in the request body, and webhook configuration at the account level.
  • Language configuration? Document how you currently handle multilingual audio and code-switching. We handle code-switching automatically via language_config with code_switching: true; no pre-specification of language pairs is required.

Obtain your Gladia API keys

Generate your Gladia API key from the dashboard. The Starter plan includes 10 free hours per month, enough to run a full parallel test before committing.

Baseline your Speechmatics performance

Pull your last 30 days of data and record these four baselines. You will not cut over production traffic until Gladia matches or beats all four in staging:

  • WER: Calculate on at least 50 representative audio samples using your own reference transcripts.
  • P95 latency: Time from job submission to transcript available.
  • Monthly spend: Total including all add-on charges (translation is a separate service in Speechmatics).
  • Error rate: Percentage of jobs that returned a non-200 response.
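WER is worth computing yourself rather than trusting provider dashboards. A minimal sketch using a standard word-level edit distance (the sample strings are illustrative, not from real calls):

```python
# Sketch: WER = (substitutions + insertions + deletions) / reference word count,
# computed via Levenshtein distance over word tokens.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("please confirm the account number", "please confirm account number"))  # 0.2
```

Run this over all 50+ samples and aggregate errors over total reference words, and remember to normalize casing and punctuation consistently before comparing providers.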

Key API differences: Speechmatics vs. Gladia

Gladia vs. Speechmatics authentication

The authentication change is a single header replacement. Consult each provider's documentation for the specific header format.

Provider Header Value
Speechmatics Authorization Bearer YOUR_API_KEY
Gladia x-gladia-key YOUR_API_KEY

Update your HTTP client configuration globally rather than per-request, so no code path keeps sending the old header during the migration window.
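In Python, one way to do this is a shared requests.Session so the header is set in exactly one place (the environment-variable fallback here is a placeholder for illustration):

```python
import os
import requests

# Sketch: configure the new header once on a shared session so every request
# made through it carries the Gladia credential.
session = requests.Session()
session.headers.update({
    "x-gladia-key": os.environ.get("GLADIA_API_KEY", "your_gladia_api_key_here"),
    "Content-Type": "application/json",
})
# All calls through `session` now authenticate with the Gladia header, e.g.:
# session.post("https://api.gladia.io/v2/pre-recorded", json=payload)
```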

API endpoint design & naming

Operation Speechmatics Gladia
Submit batch job POST /v2/jobs POST /v2/pre-recorded
Get job result GET /v2/jobs/{id}/transcript GET /v2/transcription/{id}
List jobs GET /v2/jobs GET /v2/transcription
Real-time stream WebSocket (see Speechmatics RT docs) WebSocket (V2 path, see the live STT docs)

Per the batch V2 transcription API reference, requests use Content-Type: application/json.

Gladia request payload schema

We infer the job type from the endpoint. There is no transcription_config wrapper or type field. You toggle audio intelligence features like diarization, translation, and NER with boolean flags at the top level of the payload.

Speechmatics vs. Gladia output formats

Both APIs return JSON, but the response shapes differ. Consult each provider's API documentation for specific response structure details.

Field Speechmatics Gladia
Transcript path results result.transcription.utterances
Word-level results type: "word" words
Speaker ID speaker speaker
Timestamps start_time, end_time start, end
Confidence score confidence confidence

WebSocket vs. REST: when to use

Gladia is async-first. For meeting assistants, post-call analytics, and Contact Center as a Service (CCaaS) workflows, use the POST /v2/pre-recorded REST endpoint for async transcription. The async pipeline processes audio at high speed and runs full-context analysis that produces higher-accuracy diarization and multilingual output. For voice agents and live captions where you need low-latency output, use the WebSocket path. Note that some audio intelligence features, such as full-context diarization accuracy, may perform differently in real-time mode; check the live STT docs for the current feature matrix.

Establishing Gladia API connectivity

Set up Gladia auth variables

Store your Gladia API key as an environment variable, not in source code:

# .env
GLADIA_API_KEY=your_gladia_api_key_here

Configure Gladia header authentication

Python:

import os
import requests

headers = {
    "x-gladia-key": os.environ["GLADIA_API_KEY"],
    "Content-Type": "application/json"
}

JavaScript/TypeScript:

const headers = {
  "x-gladia-key": process.env.GLADIA_API_KEY,
  "Content-Type": "application/json"
};

Validate Gladia auth in staging

Run this cURL command to confirm your key is active before writing any migration logic:

curl --request GET \
  --url https://api.gladia.io/v2/transcription \
  --header "x-gladia-key: $GLADIA_API_KEY"

A successful response confirms authentication is working. An authentication error means the key is wrong or inactive. Check your compliance hub configuration if you are using an on-premises or air-gapped deployment, since those endpoints differ.

Mapping Gladia API request payloads

Gladia audio input parameter mapping

The API accepts audio via URL or direct file upload. Map your Speechmatics fetch_data.url to our top-level audio_url parameter (see the pre-recorded init reference at https://docs.gladia.io/api-reference/v2/pre-recorded/init). YouTube URLs are supported directly.

Set Gladia language & model parameters

Configure language_config based on your audio characteristics. If the object is omitted or languages is left empty, we auto-detect a single language from the audio.

Mode language_config value Behavior
Manual "languages": ["en"] Transcribe using the specified language only
Automatic single language "languages": [] (or omitted) Auto-detect a single language from the audio
Automatic multiple languages "languages": [], "code_switching": true Auto-detect and transcribe mid-conversation code-switching

For CCaaS and contact center audio where speakers may switch languages mid-call, set language_config to {"languages": [], "code_switching": true} for automatic multilingual detection. We detect and transcribe code-switching across all 100+ supported languages without requiring you to pre-specify expected language pairs.
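A minimal multilingual request body, for example (the audio URL is a placeholder), looks like:

```json
{
  "audio_url": "https://your-audio-host.com/call.wav",
  "language_config": {
    "languages": [],
    "code_switching": true
  }
}
```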

API parameters for diarization & flags

We offer speaker diarization in async transcription, powered by pyannoteAI's Precision-2 model. Enable it with a single boolean in async transcription requests:

{
  "audio_url": "https://your-audio-host.com/audio.wav",
  "diarization": true,
  "diarization_config": {
    "number_of_speakers": 2,
    "min_speakers": 1,
    "max_speakers": 5
  }
}

If you omit diarization_config (or the number_of_speakers field within it), we infer the speaker count from the audio. See what is diarization for more detail on how this works.

PII redaction must be explicitly configured in the request payload and is not enabled by default. Never assume transcripts are anonymized unless you have explicitly enabled redaction in your integration.

Speechmatics to Gladia: code changes

Speechmatics batch request:

import os
import requests

API_KEY = os.environ["SPEECHMATICS_API_KEY"]
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}
url = "https://asr.api.speechmatics.com/v2/jobs"
data = {
    "fetch_data": {
        "url": "https://your-audio-host.com/audio.wav"
    },
    "transcription_config": {
        "language": "en",
        "diarization": "speaker",
        "operating_point": "enhanced"
    }
}
response = requests.post(url, json=data, headers=headers)

Equivalent Gladia request:

import os
import requests

API_KEY = os.environ["GLADIA_API_KEY"]
headers = {"x-gladia-key": API_KEY, "Content-Type": "application/json"}
url = "https://api.gladia.io/v2/pre-recorded"
data = {
    "audio_url": "https://your-audio-host.com/audio.wav",
    "diarization": True,
    "named_entity_recognition": True,
    "language_behaviour""language_config""language_config""language_config": "automatic multiple {"languages"{: [], "code_switching": True
    }True}True}
}
response = requests.post(url, json=data, headers=headers)

The key structural changes: the authentication configuration, the endpoint path, and the payload structure. The async endpoint accepts WAV, M4A, FLAC, and AAC formats in addition to direct audio URLs.
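After submission, retrieve the result by polling GET /v2/transcription/{id}, as noted earlier. A minimal polling loop, written against an injectable fetch callable so the terminal-status logic stands alone (the "status"/"result" field names follow the response shape used in this guide):

```python
import time

# Sketch: poll an async transcription job until it reaches a terminal status.
# `fetch` is any zero-argument callable returning the parsed status body, e.g.:
#   lambda: requests.get(f"https://api.gladia.io/v2/transcription/{job_id}",
#                        headers=headers).json()
def wait_for_result(fetch, interval: float = 2.0, timeout: float = 600.0) -> dict:
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        body = fetch()
        if body.get("status") == "done":
            return body["result"]
        if body.get("status") == "error":
            raise RuntimeError(f"Transcription failed: {body}")
        time.sleep(interval)
    raise TimeoutError("Transcription did not complete in time")
```

Using a callback_url in the request body avoids polling entirely; this loop is a fallback for environments where inbound webhooks are not an option.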

Consuming Gladia's ASR results & metadata

Gladia transcript data structure

Gladia's async response wraps the transcription inside a result object. Your response parser needs to consume result.transcription.utterances, not transcription.utterances:

{
  "result": {
    "transcription": {
      "utterances": [
        {
          "speaker": 0,
          "start": 0.48,
          "end": 4.12,
          "transcript": "The contract renews on the 15th.",
          "words": [
            {
              "word": "contract",
              "start": 0.96,
              "end": 1.44,
              "confidence": 0.99
            }
          ]
        }
      ]
    }
  }
}

Pull start and end from each word object for word-level alignment. The confidence field sits directly on the word object, unlike Speechmatics' alternatives array pattern. This flatter structure simplifies downstream parsing for CRM field population where you want to flag low-confidence entity extractions.
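A small parser that flags low-confidence words for review before a CRM write might look like this (the 0.85 threshold is an illustrative choice, not a recommendation):

```python
# Sketch: collect words below a confidence threshold from a parsed Gladia
# response so entity extractions can be routed to human review.
def low_confidence_words(response: dict, threshold: float = 0.85) -> list[dict]:
    flagged = []
    for utterance in response["result"]["transcription"]["utterances"]:
        for word in utterance.get("words", []):
            if word["confidence"] < threshold:
                flagged.append({
                    "word": word["word"],
                    "speaker": utterance["speaker"],
                    "start": word["start"],
                    "confidence": word["confidence"],
                })
    return flagged
```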

Connect Gladia output to your pipeline

Summaries, action items, named entities, and sentiment scores all return in the same response object, so one API call gives your CRM webhook or coaching scorecard everything it needs.

Verifying Gladia performance post-migration

Validate Gladia in staging alongside Speechmatics

Run both APIs in parallel on the same audio before touching production traffic. Log the full response from each provider against a shared reference transcript for each audio file. Your comparison metrics:

  • WER per language segment
  • Speaker diarization error rate (DER) on multi-speaker calls
  • Named entity extraction recall on domain-specific terms (account numbers, product names, agent IDs)
  • P95 latency from submission to result available

Benchmark Gladia WER with custom data

Do not rely solely on published benchmarks. Run Solaria-1 on your own audio and compute WER against hand-corrected references: 50+ calls spanning your actual language distribution, including your noisiest recordings and the languages where you have had the most churn or support tickets.

The Gladia benchmark methodology evaluated Solaria-1 against 8 providers across 7 datasets and 74+ hours of audio, showing up to 29% lower WER on conversational speech and up to 3x lower DER vs. alternatives. That data gives you a directional signal, but your evaluation on your audio is the only number that matters for your go/no-go decision.

Validate cost against usage peaks

At 10,000 hours/month, the cost structure difference between Speechmatics with diarization and translation enabled versus Gladia's Growth tier is significant:

Provider / Plan Rate 10,000 hrs/month
Gladia Starter $0.61/hr $6,100/month (all-in)
Gladia Growth From $0.20/hr From $2,000/month (all-in, with commitment)
Speechmatics Pro From $0.24/hr + add-ons From ~$2,400/month base, add-ons priced separately

At production volumes with diarization, translation, and NER active, Gladia's pricing provides a materially simpler cost model where all audio intelligence features are bundled in the base rate on Starter and higher plans.

Gladia's approach to data residency

On Growth and Enterprise plans, your audio is not used to retrain Gladia's models. On the Starter plan, audio may be used for model training by default. If you process regulated audio (financial conversations, health data, legal calls), you should upgrade to Growth or Enterprise. The compliance hub documents SOC 2 Type II, ISO 27001, HIPAA, and GDPR certifications.

If air-gapped deployment is a hard requirement rather than a preference, evaluate Gladia's Enterprise on-premises option against your specific deployment constraints before committing.

Gladia go-live: production traffic cutover

Configure feature flags for Gladia rollout

Wrap your STT client call in a feature flag before touching production:

def transcribe(audio_url: str) -> dict:
    # Use your feature flag service: LaunchDarkly, Flagsmith, etc.
    if feature_flags.get("use_gladia"):
        return gladia_client.transcribe(audio_url)
    return speechmatics_client.transcribe(audio_url)

This gives you an instant rollback path without a code deploy. Store the flag in your feature flag service so you can flip it per-user or per-traffic-percentage without redeploying.

Execute a canary rollout

Route a small percentage of your production audio to Gladia while keeping the majority on Speechmatics. Monitor error rates, latency, and transcription accuracy before expanding. Define your own go/no-go criteria based on your baseline metrics and business requirements.

Incrementally route traffic to Gladia

Follow this ramp schedule, holding at each stage until metrics are stable:

  1. 5% - Monitor until metrics stabilize
  2. 25% - Continue observation
  3. 50% - Extended monitoring to catch edge cases
  4. 100% - Full cutover

Do not skip stages. The 50% stage is where you are most likely to surface edge cases that didn't appear in the canary: unusual audio formats, very short calls, or specific language combinations that were statistically rare in the earlier slice.
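The per-stage split works well as deterministic hash bucketing on a stable key, so each account stays on one provider for the whole stage rather than flipping between requests (a sketch; the routing key choice is yours):

```python
import hashlib

# Sketch: deterministic percentage-based routing. The same key always lands
# in the same bucket, so a customer never alternates providers mid-stage.
def use_gladia(routing_key: str, rollout_percent: int) -> bool:
    digest = hashlib.sha256(routing_key.encode()).digest()
    bucket = int.from_bytes(digest[:2], "big") % 100  # bucket in 0..99
    return bucket < rollout_percent
```

At the 25% stage, roughly a quarter of keys route to Gladia, and it is always the same quarter.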

Migration safety: fallback and contingency plans

Ensure Speechmatics rollback capability

Keep your Speechmatics API keys active and your Speechmatics client code in the codebase for the full observation period after cutover. Maintain rollback capability until you have completed a sufficient period of stable production operation on Gladia.

Set rollback triggers

Define automated rollback conditions before the canary goes live:

  • Gladia API error rate exceeds your acceptable threshold
  • P95 transcription latency degrades beyond your service level requirements
  • WER on sampled calls shows meaningful degradation vs. baseline
  • Any compliance or data residency alerts from your monitoring systems

When a rollback trigger fires, your feature flag flips 100% of traffic back to Speechmatics without a code deploy. After a sufficient period of production traffic on Gladia with no rollback triggers firing, revoke your Speechmatics API keys, remove the Speechmatics client from your codebase, and update your infrastructure cost model to reflect the new per-hour billing.
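The triggers above can be encoded as a single check fed by your monitoring pipeline; the thresholds here are illustrative placeholders, not recommendations:

```python
# Sketch: evaluate rollback conditions against rolling production metrics.
# Wire a True result to your feature flag service to flip traffic back.
def should_rollback(metrics: dict, baseline: dict) -> bool:
    return (
        metrics["error_rate"] > baseline["max_error_rate"]
        or metrics["p95_latency_s"] > baseline["max_p95_latency_s"]
        or metrics["sampled_wer"] > baseline["wer"] * 1.10  # >10% relative WER regression
        or metrics.get("compliance_alerts", 0) > 0
    )
```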

Troubleshooting your Gladia migration

Preventing shadow traffic write duplication

The most common issue with parallel staging is duplicate downstream writes. If your transcription callback triggers a CRM update or a database write, Gladia responses in staging will attempt to execute the same downstream logic as the Speechmatics response. Fix this by wrapping shadow responses in a flag check that prevents any write operation when the source is the shadow provider. Log shadow results to a separate store for comparison only.
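A guard clause in the callback handler is usually enough; crm_client and shadow_store below stand in for your own integrations:

```python
# Sketch: prevent shadow (staging) transcription results from triggering
# downstream writes; log them to a separate store for comparison only.
def handle_transcript(result: dict, source: str, crm_client, shadow_store) -> None:
    if source == "shadow":
        shadow_store.append(result)  # comparison only, no side effects
        return
    crm_client.update_from_transcript(result)  # primary provider only
```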

Preventing in-flight data loss

During the cutover window, some jobs may be submitted to Speechmatics and complete after you have flipped the feature flag to Gladia. Handle this by:

  • Tagging each job with the provider it was submitted to at submission time.
  • Routing result callbacks to the correct response handler based on the provider tag, not the current flag state.
  • Timing the feature flag change for a low-submission window, end of business or off-peak hours, and waiting until your internal records show no Speechmatics job IDs submitted in the last N minutes, where N equals your P95 job completion time from your baseline metrics.
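The tag-then-route pattern above can be as simple as the following (the handler callables and the jobs mapping are placeholders for your own persistence layer):

```python
# Sketch: route a result callback by the provider recorded at submission time,
# never by the current feature-flag state.
def route_callback(job_id: str, payload: dict, jobs: dict,
                   handle_speechmatics, handle_gladia):
    provider = jobs.get(job_id)  # jobs maps job_id -> "speechmatics" | "gladia"
    if provider == "speechmatics":
        return handle_speechmatics(payload)
    if provider == "gladia":
        return handle_gladia(payload)
    raise KeyError(f"No provider recorded for job {job_id}")
```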

Historical audio reprocessing strategy

To reprocess historical audio through Gladia and normalize transcript quality across your dataset, batch-submit your archived audio URLs to Gladia's /v2/pre-recorded async endpoint. We process audio at approximately 60x realtime speed in async mode (meaning 1 hour of audio takes roughly 1 minute to process), so 1,000 hours of historical recordings processes in approximately 17 hours at standard concurrency.
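Batch submission with bounded concurrency keeps the reprocessing run within rate limits; a sketch where the submit callable wraps your POST /v2/pre-recorded client and the worker count is illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch: submit archived audio URLs with a bounded worker pool and collect
# the returned job IDs for later result retrieval.
def reprocess_archive(audio_urls: list[str], submit, max_workers: int = 5) -> list:
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(submit, audio_urls))  # preserves input order
```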

Start your migration

Migration checklist:

  • Audit current Speechmatics integration (batch vs. real-time, active features, callback setup)
  • Document WER, P95 latency, monthly spend, and error rate baseline
  • Create Gladia account and generate API key
  • Set up parallel staging environment with audio router
  • Update authentication headers from Authorization: Bearer to x-gladia-key
  • Map API endpoints from /v2/jobs to /v2/pre-recorded
  • Update request payload structure: remove the transcription_config wrapper and use top-level boolean flags
  • Configure language_config: {"languages": [], "code_switching": true} for multilingual audio
  • Enable diarization: true if speaker attribution is required
  • Configure callback_url or webhook in Gladia account settings
  • Update response parser to consume Gladia's result.transcription.utterances response structure
  • Run parallel calls in staging and compare WER, DER, and latency
  • Wrap STT client in a feature flag before touching production
  • Execute canary rollout and hold at each traffic percentage until metrics are stable
  • Keep Speechmatics keys active for observation period
  • Deprecate Speechmatics code and credentials after stable observation window

Start with 10 free hours to test Solaria-1 on your own audio before migrating production traffic. Run the parallel staging test outlined in this guide, validate WER on your noisiest recordings, and execute the canary rollout when your metrics prove Gladia handles your real-world conditions better than Speechmatics.

FAQs

How long does a Speechmatics to Gladia migration actually take?

The code changes themselves are straightforward. The full cutover timeline depends on your canary rollout schedule and observation period before you fully deprecate Speechmatics.

Does Gladia support the same on-premises deployment options as Speechmatics?

We offer deployment flexibility at the Enterprise tier. If air-gapped deployment is a hard requirement, evaluate both providers' current on-premises offerings against your specific security and deployment constraints before committing.

Are diarization and translation included in Gladia's base price?

Yes, on Starter ($0.61/hr async) and Growth (from $0.20/hr async) plans, diarization, translation, NER, and sentiment analysis are all included in the per-hour rate with no add-on charges. Enterprise pricing is custom and can be debundled if needed.

Is Gladia's code-switching detection automatic or does it require pre-configuration?

We detect mid-conversation language changes automatically when language_config is set to {"languages": [], "code_switching": true}. You do not need to specify expected language pairs in advance.

What happens to customer audio on the Gladia Starter plan?

On the Starter plan, audio may be used for model training by default. On Growth and Enterprise plans, customer audio is not used for model retraining. If data privacy is a hard requirement, upgrade to Growth or above before processing production audio.

How do I handle Speechmatics jobs that are still in flight when I flip the feature flag?

Tag each job with the provider at submission time and route the callback to the correct handler based on the tag rather than the current feature flag state. This prevents in-flight jobs from routing their results to the wrong response parser during the cutover window.

What file formats and sizes does Gladia accept for async transcription?

We accept various audio formats including WAV, M4A, FLAC, and AAC. YouTube URLs are supported directly. Consult the API reference for specific file size and duration limits.

Key terms glossary

WER (word error rate): The percentage of words in a transcript that differ from the reference transcription, calculated as substitutions plus insertions plus deletions divided by total reference words. Lower is better.

DER (diarization error rate): The percentage of audio time incorrectly attributed to a speaker, measuring how accurately a model separates multi-speaker audio into speaker turns.

Code-switching: When a speaker changes languages mid-conversation, often within a single sentence. Requires the ASR model to detect the language boundary and switch language contexts without breaking the output.

Canary deployment: A release strategy where a small percentage of production traffic routes to the new system while the majority stays on the existing system, allowing rollback before exposing all users to a change.

Language configuration (language_config): Gladia's V2 API object controlling language handling. Set "languages": ["en"] to pin a specific language, omit languages or leave it empty for single-language auto-detection, or add "code_switching": true alongside an empty languages array to enable automatic mid-conversation language detection across all 100+ supported languages.

Diarization: The process of attributing transcript segments to individual speakers. Gladia provides diarization in async transcription, powered by pyannoteAI's Precision-2 model.

Feature flag: A configuration toggle that routes application logic to different code paths without requiring a deployment. Used in this migration to switch traffic between Speechmatics and Gladia without a code push.
