
Gladia async API for meeting transcription: integration guide and best practices

Published on Apr 17, 2026 by Ani Ghazaryan

Build a meeting assistant with Gladia async API: authentication, upload, webhooks, diarization, and LLM integration in under a day.

TL;DR: Gladia's async API gives you a single endpoint that uploads audio, transcribes it with Solaria-1, and returns LLM-ready structured data at roughly 60 seconds of processing per hour of audio. Authentication takes minutes and most teams ship a working integration in under a day. The Starter plan gives you 10 free monthly hours to evaluate against your own audio. Growth and Enterprise plans never use your audio for model retraining, with no opt-out required. You can submit files up to 1000MB and 135 minutes per request. Diarization uses pyannoteAI's Precision-2 model and works in async workflows only.

Self-hosting an open-source model for meeting transcription usually starts as a weekend project and ends as a part-time infrastructure job. One or two engineers who should be shipping product spend their time instead on GPU provisioning, version management, file-size constraints, and silent accuracy regressions that only surface in production. Managed infrastructure eliminates all of these problems.

This guide gives you the exact steps to integrate Gladia's async API into a meeting assistant: authentication, audio submission, webhook handling, polling fallback, diarization, error handling, cost modeling, and production architecture patterns. Python and Node.js code examples are included throughout.

Building production meeting assistants: async STT

Async vs. real-time: key differences

The choice between async and real-time transcription determines your entire downstream architecture. Here are the relevant trade-offs:

  • Latency: async processes roughly 60 seconds per hour of audio; real-time streams with Solaria-1 at 270ms latency.
  • Input method: async accepts a file upload or URL via REST; real-time uses WebSocket streaming.
  • Diarization: async includes pyannoteAI Precision-2; real-time recommends handling diarization in post-processing.
  • Best use cases: async for meeting post-analysis, note generation, and compliance archival; real-time for live captions, voice agents, and agent assist.
  • Output: async returns a full structured transcript with speaker labels, word timestamps, entities, and sentiment; real-time emits incremental partial and final transcripts.
  • Accuracy ceiling: async is higher thanks to full-context processing; real-time is competitive but constrained by the streaming window.

For meeting assistants, choose async as your default. Post-meeting notes, action items, CRM entries, and coaching scores all generate after the meeting ends, so a few seconds of processing delay are irrelevant. What matters is getting the transcript right the first time, because every downstream system inherits errors from the audio layer. Our async transcription architecture guide covers the full pipeline design in detail.

Ideal use cases for async API

Use the async endpoint for any workflow where you capture audio first and process it immediately after:

  • Post-meeting note generation: Upload the recording the moment a call ends, receive the full transcript with speaker diarization, and pipe the structured output to your LLM for summarization and action item extraction.
  • Contact center post-call analytics: Process call recordings in batch for sentiment, entity recognition, and compliance flagging without affecting the live call experience. Aircall runs more than 1M calls per week through this pattern.
  • Meeting BaaS integrations: Teams using connectors like Recall.ai or MeetingBaaS capture the recording via a bot, then route the file directly to Gladia's upload endpoint. The meeting bot STT guide covers this integration pattern in detail.
  • Multilingual meeting intelligence: Any team with speakers switching languages mid-call, where full-context code-switching accuracy after the fact matters more than real-time speed.

Maximizing transcription accuracy

Accuracy at the transcription layer sets a ceiling for everything downstream. A wrong name in a transcript becomes a wrong name in the CRM entry, the coaching score, and the AI summary, and by the time the error surfaces, it has already corrupted three systems.

Gladia's current production model, Solaria-1, covers 100+ languages with native code-switching support. On our async STT benchmark, evaluated across 8 providers, 7 datasets, and 74+ hours of audio, Solaria-1 achieves up to 29% lower WER than alternatives on conversational speech and up to 3x lower DER. The methodology is open and reproducible so you can cross-reference against your own audio.

Most APIs handle code-switching poorly. Gladia's code-switching detection works natively across all 100+ supported languages in both async and real-time modes, with no configuration required. If your meetings include bilingual participants, this prevents the silent accuracy degradation that only surfaces through support tickets after non-English users churn. The multilingual meeting transcription guide covers language coverage and accuracy trade-offs in depth.

Secure API key setup for meeting assistants

Obtaining your Gladia API key

Get your API key in three steps:

  1. Sign up at app.gladia.io
  2. Click Home then Generate new API key
  3. Copy the key and store it as an environment variable (GLADIA_API_KEY)

Never hard-code the key into application code or commit it to version control. Full authentication details are in the Gladia authentication reference.

The Starter plan includes 10 free monthly hours, refreshed automatically, which gives you enough volume to run a full proof-of-concept against real meeting audio before committing to a paid tier. All API requests authenticate with a single header: x-gladia-key: YOUR_GLADIA_API_KEY.

Data policy for your test audio: On the Starter plan, your audio data is used for model training by default. On Growth and Enterprise plans, your audio is never used for model retraining and no opt-out action is required. If your test audio contains sensitive conversations, start your evaluation on a Growth plan or use synthetic recordings.

Prepare for Gladia API integration

The async upload endpoint accepts WAV, M4A, FLAC, and AAC files, up to 1000MB and 135 minutes per submission. For recordings that exceed either limit, split the audio before uploading.

If your recordings land in cloud storage, you can skip the upload step and pass a publicly accessible or pre-signed URL directly to the transcription endpoint. The URL must remain accessible at the time Gladia fetches the file during processing.

"Gladia AI impresses with its speed and transcription accuracy... you simply upload an audio or video file, and within seconds, you receive a clear, well-organized transcript." - Mohamed M. on G2

Managing API concurrency and throughput

Concurrency limits depend on your plan and are documented in detail at concurrency and rate limits. Paid plans support significantly higher concurrent requests than the Starter plan.

If you process burst meeting volumes at end-of-day (common in CCaaS integrations), model your peak concurrency against these limits before launching. Gladia can increase concurrency for enterprise volumes based on your throughput requirements.

Initiating your transcription requests

API endpoint for direct file upload

The async workflow uses two sequential endpoints:

  1. Upload: POST https://api.gladia.io/v2/upload. Send the audio file via multipart/form-data to the upload endpoint and receive an audio_url.
  2. Transcribe: POST https://api.gladia.io/v2/transcription. Submit the audio_url as JSON to the transcription endpoint and receive a job id and result_url.

Full request body schema, including all optional intelligence parameters, is in the API reference.

Python guide: uploading audio

import requests
import os
import time

GLADIA_UPLOAD_URL = 'https://api.gladia.io/v2/upload'
GLADIA_TRANSCRIPTION_URL = 'https://api.gladia.io/v2/transcription'

def upload_audio(file_path: str, api_key: str) -> str:
    with open(file_path, 'rb') as f:
        response = requests.post(
            GLADIA_UPLOAD_URL,
            files={'audio': f},
            headers={'x-gladia-key': api_key}
        )
    response.raise_for_status()
    return response.json()['audio_url']

def start_transcription(audio_url: str, api_key: str) -> tuple[str, str]:
    payload = {
        'audio_url': audio_url,
        'diarization': True,
        'detect_language': True
    }
    response = requests.post(
        GLADIA_TRANSCRIPTION_URL,
        json=payload,
        headers={
            'x-gladia-key': api_key,
            'Content-Type': 'application/json'
        }
    )
    response.raise_for_status()
    data = response.json()
    return data['id'], data['result_url']

def poll_for_transcript(result_url: str, api_key: str) -> dict:
    headers = {'x-gladia-key': api_key}
    delay = 3
    max_delay = 30
    while True:
        response = requests.get(result_url, headers=headers)
        response.raise_for_status()
        data = response.json()
        if data['status'] == 'done':
            return data['result']
        elif data['status'] == 'error':
            raise RuntimeError(f"Transcription failed: {data.get('error')}")
        time.sleep(delay)
        delay = min(delay * 1.5, max_delay)

# Usage
api_key = os.environ['GLADIA_API_KEY']
audio_url = upload_audio('meeting.wav', api_key)
job_id, result_url = start_transcription(audio_url, api_key)
print(f"Job submitted: {job_id}")
result = poll_for_transcript(result_url, api_key)
print(result['transcription']['full_transcript'])

Store result_url immediately after calling the transcription endpoint. You need it for both polling and as a fallback if webhook delivery fails.

JavaScript: POST audio for Gladia

const axios = require('axios');
const fs = require('fs');
const FormData = require('form-data');

const GLADIA_UPLOAD_URL = 'https://api.gladia.io/v2/upload';
const GLADIA_TRANSCRIPTION_URL = 'https://api.gladia.io/v2/transcription';

async function uploadAudio(filePath, apiKey) {
  const form = new FormData();
  form.append('audio', fs.createReadStream(filePath));
  const response = await axios.post(
    GLADIA_UPLOAD_URL,
    form,
    { headers: { ...form.getHeaders(), 'x-gladia-key': apiKey } }
  );
  return response.data.audio_url;
}

async function startTranscription(audioUrl, apiKey) {
  const response = await axios.post(
    GLADIA_TRANSCRIPTION_URL,
    { audio_url: audioUrl, diarization: true, detect_language: true },
    { headers: { 'x-gladia-key': apiKey, 'Content-Type': 'application/json' } }
  );
  return { id: response.data.id, resultUrl: response.data.result_url };
}

async function pollWithBackoff(resultUrl, apiKey) {
  let delay = 3000;
  const maxDelay = 30000;
  while (true) {
    const response = await axios.get(resultUrl, {
      headers: { 'x-gladia-key': apiKey }
    });
    const { status, result, error } = response.data;
    if (status === 'done') return result;
    if (status === 'error') throw new Error(`Job failed: ${error}`);
    await new Promise(resolve => setTimeout(resolve, delay));
    delay = Math.min(delay * 1.5, maxDelay);
  }
}

(async () => {
  const apiKey = process.env.GLADIA_API_KEY;
  const audioUrl = await uploadAudio('meeting.wav', apiKey);
  const { id, resultUrl } = await startTranscription(audioUrl, apiKey);
  console.log(`Job submitted: ${id}`);
  const result = await pollWithBackoff(resultUrl, apiKey);
  console.log(result.transcription.full_transcript);
})();

The initiate transcription reference documents the full request body schema including all optional intelligence parameters.

Integrating webhooks for data delivery

Registering webhook listener URLs

You can configure your webhook endpoint in the Gladia dashboard at app.gladia.io/account, or pass a callback_url directly in the transcription request body. The callback_url approach is useful when you need per-job routing to different downstream services:

{
  "audio_url": "https://your-storage.com/meeting.wav",
  "diarization": true,
  "detect_language": true,
  "callback_url": "https://your-api.example.com/webhooks/gladia"
}

Your endpoint must respond with a 200 OK. Gladia sends a POST request with the full result payload when processing completes.

Webhook JSON payload schema

Success payload structure:

{
  "id": "45463597-20b7-4af7-b3b3-f5fb778203ab",
  "status": "done",
  "result": {
    "transcription": {
      "full_transcript": "Hello, this is the meeting transcript...",
      "utterances": [
        {
          "speaker": 0,
          "start": 0.0,
          "end": 5.5,
          "text": "Hello, this is the meeting transcript...",
          "words": [{ "word": "Hello", "start": 0.1, "end": 0.5, "confidence": 0.95 }],
          "confidence": 0.93,
          "language": "en"
        }
      ]
    }
  }
}

Error payload structure:

{
  "id": "45463597-20b7-4af7-b3b3-f5fb778203ab",
  "event": "transcription.error",
  "error": { "code": 400, "message": "Bad Request" },
  "custom_metadata": {}
}

Parse the status field first. Route done payloads to your transcript processing pipeline and error payloads to your retry queue. The audio-to-LLM pipeline docs cover structuring the downstream pipeline once the transcript arrives.
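
The routing described above can be sketched as a small pure function. This is illustrative, not part of any Gladia SDK; the return convention is an assumption for your own handler code:

```python
def route_webhook_payload(payload: dict):
    """Classify an incoming webhook payload by its status field.

    Returns ("process", result) for completed jobs and ("retry", error)
    for anything else, matching the two payload shapes shown above.
    """
    if payload.get("status") == "done":
        return "process", payload["result"]
    # Error payloads carry an 'error' object and an 'event' field
    # instead of a result
    return "retry", payload.get("error", {})
```

Your webhook endpoint then dispatches on the first element: "process" feeds the transcript pipeline, "retry" feeds the retry queue.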

Preventing duplicate events with idempotency

Network conditions occasionally cause duplicate webhook deliveries. Prevent duplicate processing with these steps:

  • Use the id field as an idempotency key
  • Check whether you have already processed a transcript with that id before starting work
  • Implement a unique constraint on the id column in your database
  • Do not rely on delivery order because retries can arrive out of sequence
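
A minimal sketch of the idempotency check, using an in-memory set as a stand-in for the database unique constraint described above:

```python
class WebhookDeduplicator:
    """In-memory idempotency check keyed on the Gladia job id.

    A production version would replace the set with a unique
    constraint on the id column in your database, as noted above.
    """

    def __init__(self):
        self._seen = set()

    def should_process(self, job_id: str) -> bool:
        # True only the first time a given id is seen
        if job_id in self._seen:
            return False
        self._seen.add(job_id)
        return True
```

Call should_process with the id field of every delivery before doing any work; duplicate and out-of-order retries are then safe to drop.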

Ensuring production webhook reliability

Two practices eliminate the gap between webhook delivery and reliable processing:

  1. Store result_url immediately after job submission, before processing completes. If your webhook endpoint is unavailable during delivery, your polling fallback retrieves the result from result_url without resubmitting the audio.
  2. Run a polling reconciliation job on a schedule to check the status of any job whose webhook was not acknowledged within your threshold window.

Polling for transcription results (no webhooks)

Use polling in two situations: during local development where you lack a public webhook endpoint, and as a fallback for jobs whose webhook delivery fails. For production traffic above moderate volume, webhooks eliminate the overhead of repeated GET requests against incomplete jobs.

When polling, use exponential backoff. Start at a few seconds, back off progressively, and cap at 30 seconds. Most async jobs complete in well under 60 seconds per hour of audio, so aggressive short-interval polling wastes request quota. The Python and JavaScript examples above both implement this pattern.

Setting up advanced diarization options

Gladia diarization API settings

Diarization identifies which speaker said what in a multi-participant recording and works in async workflows only. Gladia integrates pyannoteAI's Precision-2 model natively as part of the base transcription pipeline. Across Gladia's async STT benchmark, evaluated across 8 providers, 7 datasets, and 74+ hours of audio, Solaria-1 achieves up to 3x lower DER than alternatives.

Enable diarization with the full configuration object:

{
  "audio_url": "YOUR_AUDIO_URL",
  "diarization": true,
  "diarization_config": {
    "enhanced": true,
    "number_of_speakers": 4,
    "min_speakers": 2,
    "max_speakers": 8
  }
}

Set number_of_speakers if you know the exact count. If you don't (the common case for meeting assistants with variable attendance), use min_speakers and max_speakers to bound the search space instead. For a deep-dive on DER sources and how to interpret diarization results in production, see our diarization guide.

Aircall processes more than 1M calls per week through Gladia and has cut transcription time by 95% (from 30 minutes to 1.5 minutes per call); these outcomes are verifiable in their published case study.

Accurate language detection and custom vocabulary

Pass "detect_language": true to let Solaria-1 identify the spoken language automatically. This detects language correctly even with strong accents, avoiding a common ASR failure mode. For meetings with a known language, specify it explicitly to reduce detection overhead. Gladia supports 100+ languages, including 42 that no other API-level STT competitor supports, among them Tagalog, Bengali, Punjabi, Tamil, Urdu, Persian, and Marathi.

Custom vocabulary lets you register product names, industry terms, and proper nouns that would otherwise generate high-confidence wrong transcriptions:

{
  "audio_url": "YOUR_AUDIO_URL",
  "custom_vocabulary": true,
  "custom_vocabulary_config": {
    "vocabulary": [
      "Solaria-1",
      { "value": "LLM pipeline", "intensity": 0.8 },
      { "value": "pyannoteAI", "intensity": 0.9 }
    ],
    "default_intensity": 0.5
  }
}

The intensity value (0.0 to 1.0) weights how strongly the model biases toward that term. Our fintech customer reports 98.5% numerical accuracy using this approach. The meeting transcription mistakes guide covers common accuracy pitfalls in production.

Precise timestamps for meeting AI

Every word in the transcript response carries start, end, and confidence values. The utterances array also includes utterance-level timestamps alongside the speaker identifier. This precision lets downstream systems generate chapter markers, jump-to-moment search, and time-anchored action items without guessing.
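
As one illustration of what utterance-level timestamps enable, here is a simple heuristic that derives coarse chapter markers from silence gaps between utterances. The function name and the gap threshold are illustrative, not a Gladia feature:

```python
def chapter_markers(utterances, gap_s=30.0):
    """Derive coarse chapter boundaries from utterance timestamps:
    start a new chapter whenever the silence between consecutive
    utterances exceeds gap_s seconds.
    """
    chapters = []
    for utt in utterances:
        if not chapters or utt["start"] - chapters[-1]["end"] > gap_s:
            chapters.append({"start": utt["start"], "end": utt["end"]})
        else:
            # Extend the current chapter to cover this utterance
            chapters[-1]["end"] = utt["end"]
    return chapters
```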

Implementing robust API error handling

Gladia API error codes and responses

Handle these status codes explicitly rather than catching generic exceptions:

  • 400 Bad request: missing audio_url or malformed payload. Validate your payload schema before submission.
  • 415 Unsupported media type: Content-Type header mismatch or unsupported format. Verify your Content-Type header and audio format match supported types.
  • 422 Unprocessable content: request syntax is correct but the server cannot process the contained instructions. Check the API response body for validation details and correct your request parameters.
  • 429 Rate limit exceeded: concurrent requests exceeding your plan limit. Implement exponential backoff and queue excess jobs.
  • 5xx Server error: infrastructure issue on Gladia's side. Retry with backoff and monitor the status page.
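
One way to encode this guidance in application code is a small classifier that maps status codes to handling strategies. The strategy names are illustrative; wire them to whatever retry and queueing machinery you use:

```python
def recommended_action(status_code: int) -> str:
    """Map an HTTP status from the Gladia API to a handling strategy,
    following the guidance above."""
    if status_code == 400:
        return "validate_payload"       # fix the request before resubmitting
    if status_code == 415:
        return "check_media_type"       # transcode or fix the Content-Type header
    if status_code == 422:
        return "inspect_response_body"  # read validation details, correct params
    if status_code == 429:
        return "backoff_and_queue"      # rate limited: slow down and retry
    if 500 <= status_code < 600:
        return "retry_with_backoff"     # transient server-side issue
    return "raise"                      # unexpected: surface to the caller
```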

Troubleshooting transcription failures

The most common failure modes in production:

  • Inaccessible audio URL (422): Pre-signed URLs expiring before Gladia fetches the file. Ensure your URL remains valid for the duration of the job's processing window.
  • Format validation failures (415): Some container formats (particularly MKV and WEBM) require transcoding to a supported codec before submission. Validate the audio stream format, not just the file extension.
  • Partial uploads (400): Network interruptions during large file uploads produce a 400 on the subsequent transcription call. Check the upload response status before proceeding to the transcription step.

For recurring unexplained errors, the async STT getting started docs and direct Slack access to Gladia engineers resolve issues faster than a ticket queue.

Key metrics for API health

Track these signals in your observability stack:

  • Job completion rate: Ratio of status: done to status: error responses across all submitted jobs
  • P95 processing latency: Time from job submission to webhook delivery or polling completion (expect roughly 60 seconds per hour of audio at standard load)
  • Webhook delivery failure rate: Count of jobs where your polling fallback fired because the webhook did not arrive
  • 429 rate: Indicates you hit concurrency limits and need either backpressure on your queue or a plan upgrade

Gladia maintains 99.9%+ uptime. Subscribe to the status page for incident notifications rather than discovering outages through application errors.

Optimizing and predicting Gladia API costs

Calculating per-hour API costs

Gladia charges per hour of audio duration. Diarization, language detection, sentiment analysis, NER, translation, and summarization carry no per-feature charges on Starter and Growth plans; they are all included in the base rate.

For cost modeling, the math is straightforward. At 10,000 hours/month on the Growth tier at $0.20/hr, your total cost is $2,000/month, inclusive of diarization, sentiment, entities, and translation. Compare this to providers that meter features separately and add those charges to the base transcription rate.
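
That arithmetic can be wrapped in a small helper for budget modeling. The default rate mirrors the Growth-tier figure quoted above; treat it as an assumption and check current pricing before relying on it:

```python
def monthly_cost(audio_hours: float, rate_per_hour: float = 0.20) -> float:
    """Estimate monthly spend from billable audio hours.

    Rate default is the Growth-tier figure quoted in this guide;
    verify against current pricing.
    """
    return audio_hours * rate_per_hour

monthly_cost(10_000)  # 2000.0, with diarization, sentiment, entities, and translation included
```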

Optimize audio input for lower spend

Two audio handling choices reduce billable duration without affecting transcript quality:

  1. Trim silence before submission. Long silences at the start and end of meeting recordings (hold music, pre-meeting buffer) count toward billable duration. Use a lightweight pre-processing step with pydub or ffmpeg to strip leading and trailing silence.
  2. Downsample before uploading. Gladia's model performs well on 16kHz mono audio. Submitting 48kHz stereo recordings from Zoom or Teams increases file size and upload time without improving transcript accuracy. Convert to 16kHz mono before uploading.
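
Both optimizations can be combined in a single ffmpeg invocation. Here is a sketch of the command builder; the silence thresholds are illustrative and should be tuned for your own recordings:

```python
def build_preprocess_cmd(src: str, dst: str) -> list:
    """Build an ffmpeg command that strips leading and trailing silence
    and downsamples to 16 kHz mono, per the two optimizations above.
    """
    return [
        "ffmpeg", "-i", src,
        # Trim leading silence, reverse, trim (now-leading) trailing
        # silence, reverse back (a common ffmpeg silence-trim idiom)
        "-af",
        "silenceremove=start_periods=1:start_threshold=-50dB,"
        "areverse,silenceremove=start_periods=1:start_threshold=-50dB,areverse",
        "-ar", "16000",  # 16 kHz sample rate
        "-ac", "1",      # mono
        dst,
    ]

# Run it with: subprocess.run(build_preprocess_cmd("meeting_48k.wav", "meeting_16k.wav"), check=True)
```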

Architecting your meeting assistant for production

Implementing queues for Gladia API calls

Production meeting assistants do not call the Gladia API synchronously in the request path. Use a queue-based architecture:

  1. Meeting recording saves to storage (S3, GCS, or local disk)
  2. Worker enqueues a transcription job message containing the storage URL and meeting metadata
  3. Worker service picks up the message, calls POST /v2/transcription, stores the returned id and result_url in the database with status pending
  4. Gladia processes the audio and delivers the result to your webhook endpoint
  5. Webhook handler updates the database record and routes the structured transcript to your LLM pipeline

This decouples meeting capture from transcription processing, handles burst volumes without blocking the main application, and provides natural retry points at each stage. The AI note-taker architecture guide documents this pattern with code examples.
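
The five steps above can be sketched with Python's standard library, using a dict as a stand-in for the database and a callable in place of the real Gladia client:

```python
import queue

jobs_db = {}               # stand-in for your database (canonical job state)
job_queue = queue.Queue()  # stand-in for SQS, Pub/Sub, RabbitMQ, etc.

def enqueue_transcription(storage_url: str, meeting_id: str):
    """Step 2: enqueue a job message once the recording lands in storage."""
    job_queue.put({"storage_url": storage_url, "meeting_id": meeting_id})

def worker_tick(submit_fn):
    """Step 3: pick up one message, submit it (submit_fn wraps the real
    POST /v2/transcription call and returns (id, result_url)), and
    record the job as pending."""
    msg = job_queue.get()
    job_id, result_url = submit_fn(msg["storage_url"])
    jobs_db[job_id] = {
        "meeting_id": msg["meeting_id"],
        "result_url": result_url,  # kept for the polling fallback
        "status": "pending",
    }
    return job_id

def on_webhook(payload: dict):
    """Step 5: update job state from the webhook; 'done' transcripts
    would then be routed onward to the LLM pipeline."""
    job = jobs_db.get(payload["id"])
    if job:
        job["status"] = payload["status"]
```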

Ensuring compliance: data residency options

Gladia holds SOC 2 Type II, ISO 27001, HIPAA, and GDPR certifications. Infrastructure runs in France (EU-west) by default, with US-west also available. On-premises and air-gapped deployment options exist for organizations with strict data residency requirements.

On Growth and Enterprise plans, Gladia never uses your audio to retrain models, with no opt-out action or enterprise contract clause required. On the Starter plan, audio can be used for model training by default. PII redaction is available but requires explicit configuration in the transcription request body and does not activate automatically.

Gladia offers configurable data retention policies: 1-month, 1-week, 1-day, and zero retention. Zero retention means Gladia deletes audio and transcripts immediately after processing completes. All data is encrypted at rest and in transit.

Architecting for API resilience

Three patterns make the integration resilient to transient failures:

  1. Dead letter queue: Route failed jobs (after a maximum retry count) to a DLQ for manual inspection rather than silently dropping them. A delayed transcript is better than a lost one in most meeting assistant workflows.
  2. Database-as-source-of-truth: The canonical state of every transcription job lives in your database, not in Gladia's system. Your application manages pending, processing, done, and failed states via webhooks and polling, not by inferring state from API responses alone.
  3. Circuit breaker: If your error rate for Gladia API calls exceeds a threshold over a rolling window, stop sending new jobs for a short back-off period and alert your on-call team. This prevents thundering-herd retries from consuming your quota during a transient outage.
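
A minimal in-process sketch of the circuit breaker pattern follows. Thresholds are illustrative, and production systems often get this from a resilience library or service mesh instead:

```python
import time
from collections import deque

class CircuitBreaker:
    """Stop submitting new jobs when recent failures cross a threshold,
    then recover after a cooldown period."""

    def __init__(self, max_failures=5, window_s=60, cooldown_s=30):
        self.max_failures = max_failures
        self.window_s = window_s
        self.cooldown_s = cooldown_s
        self.failures = deque()
        self.opened_at = None

    def record_failure(self, now=None):
        now = time.monotonic() if now is None else now
        self.failures.append(now)
        # Keep only failures inside the rolling window
        while self.failures and now - self.failures[0] > self.window_s:
            self.failures.popleft()
        if len(self.failures) >= self.max_failures:
            self.opened_at = now  # trip the breaker; alert on-call here

    def allow_request(self, now=None):
        now = time.monotonic() if now is None else now
        if self.opened_at is None:
            return True
        if now - self.opened_at >= self.cooldown_s:
            self.opened_at = None  # cooldown elapsed, close the breaker
            self.failures.clear()
            return True
        return False
```

Check allow_request before each submission; on False, leave the job in the queue and retry after the cooldown.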

The code-switching in speech recognition guide gives a detailed breakdown of how language detection works in production audio, and the Solaria live demo webinar shows model behavior under realistic meeting conditions including noisy environments and accented speech.

Start with 10 free monthly hours and run the upload and polling code above against your own meeting recordings. If your test audio includes sensitive conversations, upgrade to Growth first, since Starter trains on audio by default.

FAQs

What are the file size and duration limits for the async upload endpoint?

The upload endpoint accepts files up to 1000MB and 135 minutes per submission. For recordings exceeding either limit, split the audio into segments before uploading and merge the resulting transcripts using word-level timestamps.
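
A sketch of that merge step, assuming each segment's transcript follows the utterance schema shown earlier and you track each segment's offset into the original recording:

```python
def merge_segment_transcripts(segments):
    """Merge per-segment transcripts back into one timeline.

    `segments` is a list of (offset_seconds, utterances) pairs, where
    offset_seconds is where the segment starts in the original recording
    and utterances are dicts with 'start', 'end', 'text', etc.
    """
    merged = []
    for offset, utterances in segments:
        for utt in utterances:
            shifted = dict(utt)
            # Re-anchor timestamps to the original recording's timeline
            shifted["start"] = utt["start"] + offset
            shifted["end"] = utt["end"] + offset
            merged.append(shifted)
    merged.sort(key=lambda u: u["start"])
    return merged
```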

Does diarization work in real-time transcription mode?

No. Diarization via pyannoteAI Precision-2 works in async workflows only. For real-time pipelines that need speaker attribution, handle diarization in post-processing after the session completes.

Does Gladia use my audio to train its models?

On the Starter plan, yes, by default. On Growth and Enterprise plans, your audio is never used for model retraining, with no opt-out action required. If you are processing sensitive conversations, use a paid tier from the start of your evaluation.

Can I pass a cloud storage URL instead of uploading a file?

Yes. Pass a publicly accessible or pre-signed URL as audio_url in the transcription request body and skip the upload endpoint entirely. The URL must remain accessible when Gladia fetches the file during processing, so ensure your pre-signed URL does not expire before the job completes.

Is PII redaction enabled by default?

No. PII redaction requires explicit configuration in the transcription request body. It does not activate automatically regardless of plan tier or compliance settings.

Key terms glossary

Solaria-1: Gladia's current production ASR model, released January 2026. It achieves up to 3x lower DER and up to 29% lower WER than alternatives on conversational speech.

Diarization: The process of segmenting audio by speaker and labeling which speaker said which words. In Gladia's async API, diarization uses pyannoteAI Precision-2 and works only in async workflows.

Code-switching: Mid-conversation language changes where a speaker alternates between two or more languages within a single session. Gladia detects this natively across 100+ languages without explicit configuration.

Webhook: An HTTP callback that Gladia calls on your endpoint when a transcription job completes. Delivers the full structured transcript payload including speaker labels, timestamps, and intelligence outputs.

Idempotency: The property of an operation that produces the same result when applied multiple times. In webhook handling, using the job id as an idempotency key prevents duplicate processing when delivery retries occur.

WER (word error rate): The standard metric for transcription accuracy, calculated as (substitutions + insertions + deletions) / total reference words. Lower is better. Gladia publishes WER benchmarks by language and audio condition.

DER (diarization error rate): The percentage of audio incorrectly attributed to the wrong speaker. Gladia achieves up to 3x lower DER than alternatives, benchmarked across 8 providers, 7 datasets, and 74+ hours of audio.

Async transcription: Processing pre-recorded audio files after capture rather than streaming in real time. Full-context processing enables higher accuracy for diarization and multilingual audio. Gladia's primary workflow, used by meeting assistants and CCaaS platforms.
