
Migrating from Speechmatics to Gladia: step-by-step integration guide

Published on Apr 30, 2026
by Ani Ghazaryan

This article explains how to migrate from Speechmatics to Gladia with minimal downtime. It covers the key API changes, staging tests, rollout steps, rollback planning, and troubleshooting needed for a safe production switch.

TL;DR

  • Most teams complete this migration in under a day of active engineering work.
  • There are three core changes: replace the Authorization: Bearer YOUR_API_KEY header with x-gladia-key: YOUR_API_KEY, swap the POST /v2/jobs endpoint for POST /v2/pre-recorded, and flatten the transcription_config payload structure.
  • Run Gladia in a parallel staging environment first, then execute a canary rollout before cutting over production traffic.
  • Run the parallel staging test in this guide to validate accuracy on your own audio before committing.
  • Diarization, translation, and NER are available as part of the standard API without requiring separate feature flags in the request.

For most teams, the hardest part of switching STT providers is not the code changes. It is validating that the new model handles your messy, real-world audio better than the old one. This guide gives you the exact API mappings, authentication changes, and testing strategies to migrate your audio pipeline from Speechmatics to Gladia with zero production downtime.

Configure your Gladia migration environment

Audit your current Speechmatics integration

Before writing a single line of migration code, map out exactly what you have running in production. Answer these four questions:

  • Batch or real-time? Identify whether you're using batch (async) or real-time (WebSocket) transcription. We separate these as /v2/pre-recorded (a REST endpoint for async) and a V2 WebSocket path for live transcription (see the live STT migration guide for the current endpoint).
  • Which add-ons are active? Note specifically whether diarization, translation, or sentiment analysis are in your current payload, since these affect both your request structure and your invoice.
  • Callback or polling? Document your current callback mechanism. We support polling via GET /v2/transcription/{id} status checks, a callback_url in the request body, and webhook configuration at the account level.
  • Language configuration? Document how you currently handle multilingual audio and code-switching. We handle code-switching automatically via language_config with code_switching: true; no pre-specification of language pairs is required.

Obtain your Gladia API keys

Generate your Gladia API key from the dashboard. The Starter plan includes 10 free hours per month, enough to run a full parallel test before committing.

Baseline your Speechmatics performance

Pull your last 30 days of data and record these four baselines. You will not cut over production traffic until Gladia matches or beats all four in staging:

  • WER: Calculate on at least 50 representative audio samples using your own reference transcripts.
  • P95 latency: Time from job submission to transcript available.
  • Monthly spend: Total including all add-on charges (translation is a separate service in Speechmatics).
  • Error rate: Percentage of jobs that returned a non-200 response.
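WER is worth computing yourself rather than trusting provider dashboards. A minimal sketch using a standard word-level edit distance (the sample strings are illustrative, not from real calls):

```python
# Sketch: WER = (substitutions + insertions + deletions) / reference word count,
# computed via Levenshtein distance over word tokens.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("please confirm the account number", "please confirm account number"))  # 0.2
```

Run this over all 50+ samples and aggregate errors over total reference words, and remember to normalize casing and punctuation consistently before comparing providers.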

Key API differences: Speechmatics vs. Gladia

Gladia vs. Speechmatics authentication

The authentication change is a single header replacement. Consult each provider's documentation for the specific header format.

Provider Header Value
Speechmatics Authorization Bearer YOUR_API_KEY
Gladia x-gladia-key YOUR_API_KEY

Update your HTTP client configuration globally rather than per-request, so no code path keeps sending the old header during the migration window.
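In Python, one way to do this is a shared requests.Session so the header is set in exactly one place (the environment-variable fallback here is a placeholder for illustration):

```python
import os
import requests

# Sketch: configure the new header once on a shared session so every request
# made through it carries the Gladia credential.
session = requests.Session()
session.headers.update({
    "x-gladia-key": os.environ.get("GLADIA_API_KEY", "your_gladia_api_key_here"),
    "Content-Type": "application/json",
})
# All calls through `session` now authenticate with the Gladia header, e.g.:
# session.post("https://api.gladia.io/v2/pre-recorded", json=payload)
```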

API endpoint design & naming

Operation Speechmatics Gladia
Submit batch job POST /v2/jobs POST /v2/pre-recorded
Get job result GET /v2/jobs/{id}/transcript GET /v2/transcription/{id}
List jobs GET /v2/jobs GET /v2/transcription
Real-time stream WebSocket (see Speechmatics RT docs) WebSocket (V2 path, see the live STT docs)

Per the batch V2 transcription API reference, requests use Content-Type: application/json.

Gladia request payload schema

We infer the job type from the endpoint. There is no transcription_config wrapper or type field. You toggle audio intelligence features like diarization, translation, and NER with boolean flags at the top level of the payload.

Speechmatics vs. Gladia output formats

Both APIs return JSON, but the response shapes differ. Consult each provider's API documentation for specific response structure details.

Field Speechmatics Gladia
Transcript path results result.transcription.utterances
Word-level results type: "word" words
Speaker ID speaker speaker
Timestamps start_time, end_time start, end
Confidence score confidence confidence

WebSocket vs. REST: when to use

Gladia is async-first. For meeting assistants, post-call analytics, and Contact Center as a Service (CCaaS) workflows, use the POST /v2/pre-recorded REST endpoint for async transcription. The async pipeline processes audio at high speed and runs full-context analysis that produces higher-accuracy diarization and multilingual output. For voice agents and live captions where you need low-latency output, use the WebSocket path. Note that some audio intelligence features, such as full-context diarization accuracy, may perform differently in real-time mode; check the live STT docs for the current feature matrix.

Establishing Gladia API connectivity

Set up Gladia auth variables

Store your Gladia API key as an environment variable, not in source code:

# .env
GLADIA_API_KEY=your_gladia_api_key_here

Configure Gladia header authentication

Python:

import os
import requests

headers = {
    "x-gladia-key": os.environ["GLADIA_API_KEY"],
    "Content-Type": "application/json"
}

JavaScript/TypeScript:

const headers = {
  "x-gladia-key": process.env.GLADIA_API_KEY,
  "Content-Type": "application/json"
};

Validate Gladia auth in staging

Run this cURL command to confirm your key is active before writing any migration logic:

curl --request GET \
  --url https://api.gladia.io/v2/transcription \
  --header "x-gladia-key: $GLADIA_API_KEY"

A successful response confirms authentication is working. An authentication error means the key is wrong or inactive. Check your compliance hub configuration if you are using an on-premises or air-gapped deployment, since those endpoints differ.

Mapping Gladia API request payloads

Gladia audio input parameter mapping

The API accepts audio via URL or direct file upload. Map your Speechmatics fetch_data.url to our top-level audio_url parameter (see the pre-recorded init reference at https://docs.gladia.io/api-reference/v2/pre-recorded/init). YouTube URLs are supported directly.

Set Gladia language & model parameters

Configure language_config based on your audio characteristics. If the object is omitted or languages is left empty, we auto-detect a single language from the audio.

Mode language_config value Behavior
Manual "languages": ["en"] Transcribe using the specified language only
Automatic single language "languages": [] (or omitted) Auto-detect a single language from the audio
Automatic multiple languages "languages": [], "code_switching": true Auto-detect and transcribe mid-conversation code-switching

For CCaaS and contact center audio where speakers may switch languages mid-call, set language_config to {"languages": [], "code_switching": true} for automatic multilingual detection. We detect and transcribe code-switching across all 100+ supported languages without requiring you to pre-specify expected language pairs.
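A minimal multilingual request body, for example (the audio URL is a placeholder), looks like:

```json
{
  "audio_url": "https://your-audio-host.com/call.wav",
  "language_config": {
    "languages": [],
    "code_switching": true
  }
}
```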

API parameters for diarization & flags

We offer speaker diarization in async transcription, powered by pyannoteAI's Precision-2 model. Enable it with a single boolean in async transcription requests:

{
  "audio_url": "https://your-audio-host.com/audio.wav",
  "diarization": true,
  "diarization_config": {
    "number_of_speakers": 2,
    "min_speakers": 1,
    "max_speakers": 5
  }
}

If you omit diarization_config (or the number_of_speakers field within it), we infer the speaker count from the audio. See what is diarization for more detail on how this works.

PII redaction must be explicitly configured in the request payload and is not enabled by default. Never assume transcripts are anonymized unless you have explicitly enabled redaction in your integration.

Speechmatics to Gladia: code changes

Speechmatics batch request:

import os
import requests

API_KEY = os.environ["SPEECHMATICS_API_KEY"]
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}
url = "https://asr.api.speechmatics.com/v2/jobs"
data = {
    "fetch_data": {
        "url": "https://your-audio-host.com/audio.wav"
    },
    "transcription_config": {
        "language": "en",
        "diarization": "speaker",
        "operating_point": "enhanced"
    }
}
response = requests.post(url, json=data, headers=headers)

Equivalent Gladia request:

import os
import requests

API_KEY = os.environ["GLADIA_API_KEY"]
headers = {"x-gladia-key": API_KEY, "Content-Type": "application/json"}
url = "https://api.gladia.io/v2/pre-recorded"
data = {
    "audio_url": "https://your-audio-host.com/audio.wav",
    "diarization": True,
    "named_entity_recognition": True,
    "language_behaviour""language_config""language_config""language_config": "automatic multiple {"languages"{: [], "code_switching": True
    }True}True}
}
response = requests.post(url, json=data, headers=headers)

The key structural changes: the authentication configuration, the endpoint path, and the payload structure. The async endpoint accepts WAV, M4A, FLAC, and AAC formats in addition to direct audio URLs.
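After submission, retrieve the result by polling GET /v2/transcription/{id}, as noted earlier. A minimal polling loop, written against an injectable fetch callable so the terminal-status logic stands alone (the "status"/"result" field names follow the response shape used in this guide):

```python
import time

# Sketch: poll an async transcription job until it reaches a terminal status.
# `fetch` is any zero-argument callable returning the parsed status body, e.g.:
#   lambda: requests.get(f"https://api.gladia.io/v2/transcription/{job_id}",
#                        headers=headers).json()
def wait_for_result(fetch, interval: float = 2.0, timeout: float = 600.0) -> dict:
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        body = fetch()
        if body.get("status") == "done":
            return body["result"]
        if body.get("status") == "error":
            raise RuntimeError(f"Transcription failed: {body}")
        time.sleep(interval)
    raise TimeoutError("Transcription did not complete in time")
```

Using a callback_url in the request body avoids polling entirely; this loop is a fallback for environments where inbound webhooks are not an option.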

Consuming Gladia's ASR results & metadata

Gladia transcript data structure

Gladia's async response wraps the transcription inside a result object. Your response parser needs to consume result.transcription.utterances, not transcription.utterances:

{
  "result": {
    "transcription": {
      "utterances": [
        {
          "speaker": 0,
          "start": 0.48,
          "end": 4.12,
          "transcript": "The contract renews on the 15th.",
          "words": [
            {
              "word": "contract",
              "start": 0.96,
              "end": 1.44,
              "confidence": 0.99
            }
          ]
        }
      ]
    }
  }
}

Pull start and end from each word object for word-level alignment. The confidence field sits directly on the word object, unlike Speechmatics' alternatives array pattern. This flatter structure simplifies downstream parsing for CRM field population where you want to flag low-confidence entity extractions.
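A small parser that flags low-confidence words for review before a CRM write might look like this (the 0.85 threshold is an illustrative choice, not a recommendation):

```python
# Sketch: collect words below a confidence threshold from a parsed Gladia
# response so entity extractions can be routed to human review.
def low_confidence_words(response: dict, threshold: float = 0.85) -> list[dict]:
    flagged = []
    for utterance in response["result"]["transcription"]["utterances"]:
        for word in utterance.get("words", []):
            if word["confidence"] < threshold:
                flagged.append({
                    "word": word["word"],
                    "speaker": utterance["speaker"],
                    "start": word["start"],
                    "confidence": word["confidence"],
                })
    return flagged
```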

Connect Gladia output to your pipeline

Summaries, action items, named entities, and sentiment scores all return in the same response object, so one API call gives your CRM webhook or coaching scorecard everything it needs.

Verifying Gladia performance post-migration

Validate Gladia in staging alongside Speechmatics

Run both APIs in parallel on the same audio before touching production traffic. Log the full response from each provider against a shared reference transcript for each audio file. Your comparison metrics:

  • WER per language segment
  • Speaker diarization error rate (DER) on multi-speaker calls
  • Named entity extraction recall on domain-specific terms (account numbers, product names, agent IDs)
  • P95 latency from submission to result available

Benchmark Gladia WER with custom data

Do not rely solely on published benchmarks. Run Solaria-1 on your own audio and compute WER against hand-corrected references: 50+ calls spanning your actual language distribution, including your noisiest recordings and the languages where you have had the most churn or support tickets.

The Gladia benchmark methodology evaluated Solaria-1 against 8 providers across 7 datasets and 74+ hours of audio, showing up to 29% lower WER on conversational speech and up to 3x lower DER vs. alternatives. That data gives you a directional signal, but your evaluation on your audio is the only number that matters for your go/no-go decision.

Validate cost against usage peaks

At 10,000 hours/month, the cost structure difference between Speechmatics with diarization and translation enabled versus Gladia's Growth tier is significant:

Provider / Plan Rate 10,000 hrs/month
Gladia Starter $0.61/hr $6,100/month (all-in)
Gladia Growth From $0.20/hr From $2,000/month (all-in, with commitment)
Speechmatics Pro From $0.24/hr + add-ons From ~$2,400/month base, add-ons priced separately

At production volumes with diarization, translation, and NER active, Gladia's pricing provides a materially simpler cost model where all audio intelligence features are bundled in the base rate on Starter and higher plans.

Gladia's approach to data residency

On Growth and Enterprise plans, your audio is not used to retrain Gladia's models. On the Starter plan, audio may be used for model training by default. If you process regulated audio (financial conversations, health data, legal calls), you should upgrade to Growth or Enterprise. The compliance hub documents SOC 2 Type II, ISO 27001, HIPAA, and GDPR certifications.

If air-gapped deployment is a hard requirement rather than a preference, evaluate Gladia's Enterprise on-premises option against your specific deployment constraints before committing.

Gladia go-live: production traffic cutover

Configure feature flags for Gladia rollout

Wrap your STT client call in a feature flag before touching production:

def transcribe(audio_url: str) -> dict:
    # Use your feature flag service: LaunchDarkly, Flagsmith, etc.
    if feature_flags.get("use_gladia"):
        return gladia_client.transcribe(audio_url)
    return speechmatics_client.transcribe(audio_url)

This gives you an instant rollback path without a code deploy. Store the flag in your feature flag service so you can flip it per-user or per-traffic-percentage without redeploying.

Execute a canary rollout

Route a small percentage of your production audio to Gladia while keeping the majority on Speechmatics. Monitor error rates, latency, and transcription accuracy before expanding. Define your own go/no-go criteria based on your baseline metrics and business requirements.

Incrementally route traffic to Gladia

Follow this ramp schedule, holding at each stage until metrics are stable:

  1. 5% - Monitor until metrics stabilize
  2. 25% - Continue observation
  3. 50% - Extended monitoring to catch edge cases
  4. 100% - Full cutover

Do not skip stages. The 50% stage is where you are most likely to surface edge cases that didn't appear in the canary: unusual audio formats, very short calls, or specific language combinations that were statistically rare in the earlier slice.
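The per-stage split works well as deterministic hash bucketing on a stable key, so each account stays on one provider for the whole stage rather than flipping between requests (a sketch; the routing key choice is yours):

```python
import hashlib

# Sketch: deterministic percentage-based routing. The same key always lands
# in the same bucket, so a customer never alternates providers mid-stage.
def use_gladia(routing_key: str, rollout_percent: int) -> bool:
    digest = hashlib.sha256(routing_key.encode()).digest()
    bucket = int.from_bytes(digest[:2], "big") % 100  # bucket in 0..99
    return bucket < rollout_percent
```

At the 25% stage, roughly a quarter of keys route to Gladia, and it is always the same quarter.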

Migration safety: fallback and contingency plans

Ensure Speechmatics rollback capability

Keep your Speechmatics API keys active and your Speechmatics client code in the codebase for the full observation period after cutover. Maintain rollback capability until you have completed a sufficient period of stable production operation on Gladia.

Set rollback triggers

Define automated rollback conditions before the canary goes live:

  • Gladia API error rate exceeds your acceptable threshold
  • P95 transcription latency degrades beyond your service level requirements
  • WER on sampled calls shows meaningful degradation vs. baseline
  • Any compliance or data residency alerts from your monitoring systems

When a rollback trigger fires, your feature flag flips 100% of traffic back to Speechmatics without a code deploy. After a sufficient period of production traffic on Gladia with no rollback triggers firing, revoke your Speechmatics API keys, remove the Speechmatics client from your codebase, and update your infrastructure cost model to reflect the new per-hour billing.
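The triggers above can be encoded as a single check fed by your monitoring pipeline; the thresholds here are illustrative placeholders, not recommendations:

```python
# Sketch: evaluate rollback conditions against rolling production metrics.
# Wire a True result to your feature flag service to flip traffic back.
def should_rollback(metrics: dict, baseline: dict) -> bool:
    return (
        metrics["error_rate"] > baseline["max_error_rate"]
        or metrics["p95_latency_s"] > baseline["max_p95_latency_s"]
        or metrics["sampled_wer"] > baseline["wer"] * 1.10  # >10% relative WER regression
        or metrics.get("compliance_alerts", 0) > 0
    )
```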

Troubleshooting your Gladia migration

Preventing shadow traffic write duplication

The most common issue with parallel staging is duplicate downstream writes. If your transcription callback triggers a CRM update or a database write, Gladia responses in staging will attempt to execute the same downstream logic as the Speechmatics response. Fix this by wrapping shadow responses in a flag check that prevents any write operation when the source is the shadow provider. Log shadow results to a separate store for comparison only.
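A guard clause in the callback handler is usually enough; crm_client and shadow_store below stand in for your own integrations:

```python
# Sketch: prevent shadow (staging) transcription results from triggering
# downstream writes; log them to a separate store for comparison only.
def handle_transcript(result: dict, source: str, crm_client, shadow_store) -> None:
    if source == "shadow":
        shadow_store.append(result)  # comparison only, no side effects
        return
    crm_client.update_from_transcript(result)  # primary provider only
```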

Preventing in-flight data loss

During the cutover window, some jobs may be submitted to Speechmatics and complete after you have flipped the feature flag to Gladia. Handle this by:

  • Tagging each job with the provider it was submitted to at submission time.
  • Routing result callbacks to the correct response handler based on the provider tag, not the current flag state.
  • Timing the feature flag change for a low-submission window, end of business or off-peak hours, and waiting until your internal records show no Speechmatics job IDs submitted in the last N minutes, where N equals your P95 job completion time from your baseline metrics.
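The tag-then-route pattern above can be as simple as the following (the handler callables and the jobs mapping are placeholders for your own persistence layer):

```python
# Sketch: route a result callback by the provider recorded at submission time,
# never by the current feature-flag state.
def route_callback(job_id: str, payload: dict, jobs: dict,
                   handle_speechmatics, handle_gladia):
    provider = jobs.get(job_id)  # jobs maps job_id -> "speechmatics" | "gladia"
    if provider == "speechmatics":
        return handle_speechmatics(payload)
    if provider == "gladia":
        return handle_gladia(payload)
    raise KeyError(f"No provider recorded for job {job_id}")
```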

Historical audio reprocessing strategy

To reprocess historical audio through Gladia and normalize transcript quality across your dataset, batch-submit your archived audio URLs to Gladia's /v2/pre-recorded async endpoint. We process audio at approximately 60x realtime speed in async mode (meaning 1 hour of audio takes roughly 1 minute to process), so 1,000 hours of historical recordings processes in approximately 17 hours at standard concurrency.
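Batch submission with bounded concurrency keeps the reprocessing run within rate limits; a sketch where the submit callable wraps your POST /v2/pre-recorded client and the worker count is illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch: submit archived audio URLs with a bounded worker pool and collect
# the returned job IDs for later result retrieval.
def reprocess_archive(audio_urls: list[str], submit, max_workers: int = 5) -> list:
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(submit, audio_urls))  # preserves input order
```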

Start your migration

Migration checklist:

  • Audit current Speechmatics integration (batch vs. real-time, active features, callback setup)
  • Document WER, P95 latency, monthly spend, and error rate baseline
  • Create Gladia account and generate API key
  • Set up parallel staging environment with audio router
  • Update authentication headers from Authorization: Bearer to x-gladia-key
  • Map API endpoints from /v2/jobs to /v2/pre-recorded
  • Update request payload structure: remove the transcription_config wrapper and use top-level boolean flags
  • Configure language_config: {"languages": [], "code_switching": true} for multilingual audio
  • Enable diarization: true if speaker attribution is required
  • Configure callback_url or webhook in Gladia account settings
  • Update response parser to consume Gladia's result.transcription.utterances response structure
  • Run parallel calls in staging and compare WER, DER, and latency
  • Wrap STT client in a feature flag before touching production
  • Execute canary rollout and hold at each traffic percentage until metrics are stable
  • Keep Speechmatics keys active for observation period
  • Deprecate Speechmatics code and credentials after stable observation window

Start with 10 free hours to test Solaria-1 on your own audio before migrating production traffic. Run the parallel staging test outlined in this guide, validate WER on your noisiest recordings, and execute the canary rollout when your metrics prove Gladia handles your real-world conditions better than Speechmatics.

FAQs

How long does a Speechmatics to Gladia migration actually take?

The code changes themselves are straightforward. The full cutover timeline depends on your canary rollout schedule and observation period before you fully deprecate Speechmatics.

Does Gladia support the same on-premises deployment options as Speechmatics?

We offer deployment flexibility at the Enterprise tier. If air-gapped deployment is a hard requirement, evaluate both providers' current on-premises offerings against your specific security and deployment constraints before committing.

Are diarization and translation included in Gladia's base price?

Yes, on Starter ($0.61/hr async) and Growth (from $0.20/hr async) plans, diarization, translation, NER, and sentiment analysis are all included in the per-hour rate with no add-on charges. Enterprise pricing is custom and can be debundled if needed.

Is Gladia's code-switching detection automatic or does it require pre-configuration?

We detect mid-conversation language changes automatically when language_config is set to {"languages": [], "code_switching": true}. You do not need to specify expected language pairs in advance.

What happens to customer audio on the Gladia Starter plan?

On the Starter plan, audio may be used for model training by default. On Growth and Enterprise plans, customer audio is not used for model retraining. If data privacy is a hard requirement, upgrade to Growth or above before processing production audio.

How do I handle Speechmatics jobs that are still in flight when I flip the feature flag?

Tag each job with the provider at submission time and route the callback to the correct handler based on the tag rather than the current feature flag state. This prevents in-flight jobs from routing their results to the wrong response parser during the cutover window.

What file formats and sizes does Gladia accept for async transcription?

We accept various audio formats including WAV, M4A, FLAC, and AAC. YouTube URLs are supported directly. Consult the API reference for specific file size and duration limits.

Key terms glossary

WER (word error rate): The percentage of words in a transcript that differ from the reference transcription, calculated as substitutions plus insertions plus deletions divided by total reference words. Lower is better.

DER (diarization error rate): The percentage of audio time incorrectly attributed to a speaker, measuring how accurately a model separates multi-speaker audio into speaker turns.

Code-switching: When a speaker changes languages mid-conversation, often within a single sentence. Requires the ASR model to detect the language boundary and switch language contexts without breaking the output.

Canary deployment: A release strategy where a small percentage of production traffic routes to the new system while the majority stays on the existing system, allowing rollback before exposing all users to a change.

Language configuration (language_config): Gladia's V2 API object controlling language handling. Set "languages": ["en"] to pin a specific language, omit languages or leave it empty for single-language auto-detection, or add "code_switching": true alongside an empty languages array to enable automatic mid-conversation language detection across all 100+ supported languages.

Diarization: The process of attributing transcript segments to individual speakers. Gladia provides diarization in async transcription, powered by pyannoteAI's Precision-2 model.

Feature flag: A configuration toggle that routes application logic to different code paths without requiring a deployment. Used in this migration to switch traffic between Speechmatics and Gladia without a code push.
