
Transcribing audio with Gladia's async SDK

Apr 21, 2026
By Ani Ghazaryan

Transcribing an audio file should take one call. In practice, it usually takes five or six steps: upload the file, create a job, poll the endpoint until it's done, parse the response, and wrap the whole thing in retry logic for when something fails midway. It's not hard work, but it's the kind of repetitive plumbing that ends up in every project that touches speech-to-text.

We built the SDK to remove it. You pass in a file and get back a transcript, with the uploading, job handling, and polling managed internally. The same applies to the features built on top of transcription: diarization, translation, summarization, and custom vocabulary are all accessible through the same interface, and documented so you can tell what each one does before you use it.

Here's how it works.

Installation

The SDK is available for JavaScript and Python. For JavaScript projects:

npm install @gladiaio/sdk

One-call transcription

The headline feature is the transcribe() method, which handles the entire pipeline end-to-end. You can pass it a local file path, binary data, or a remote URL, and it takes care of the rest:

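import { GladiaClient } from "@gladiaio/sdk";

// Create a client with your API key.
const gladiaClient = new GladiaClient({ apiKey: "YOUR_GLADIA_API_KEY" });

// Uploading, job handling, and polling all happen inside transcribe().
const transcription = await gladiaClient
  .preRecorded()
  .transcribe("YOUR_AUDIO_URL_OR_LOCAL_PATH");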

That's the minimum viable example. For real-world use, you'll likely want to pass configuration options — languages, code-switching, custom vocabulary, and so on:

const transcription = await gladiaClient
  .preRecorded()
  .transcribe("YOUR_AUDIO_URL_OR_LOCAL_PATH", {
    // Languages expected in the audio, with code-switching between them enabled.
    language_config: {
      languages: ["en", "fr"],
      code_switching: true,
    },
    // Domain-specific terms to pass as custom vocabulary.
    custom_vocabulary: true,
    custom_vocabulary_config: {
      vocabulary: ["Gladia", "Solaria", "Salesforce"],
    },
  });
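
The options object above can also be assembled programmatically. The sketch below is a hypothetical helper (buildTranscriptionOptions is not part of the SDK) that reuses only the field names shown above. It assumes code-switching is only worth enabling when more than one language is listed, and custom vocabulary only when terms are supplied:

```javascript
// Hypothetical helper, not part of the SDK: builds the options object
// passed as the second argument to transcribe().
function buildTranscriptionOptions({ languages = [], vocabulary = [] } = {}) {
  const options = {};

  if (languages.length > 0) {
    options.language_config = {
      languages,
      // Assumption: only enable code-switching when several languages are expected.
      code_switching: languages.length > 1,
    };
  }

  if (vocabulary.length > 0) {
    options.custom_vocabulary = true;
    options.custom_vocabulary_config = { vocabulary };
  }

  return options;
}

const options = buildTranscriptionOptions({
  languages: ["en", "fr"],
  vocabulary: ["Gladia", "Solaria", "Salesforce"],
});
console.log(JSON.stringify(options, null, 2));
```

Called with those arguments, the helper produces the same object as the hand-written example, which keeps per-customer settings in data rather than in code.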

When you need more control

If you'd rather manage each step yourself — for example, to upload once and run multiple jobs against the same file, or to integrate with your own job queue — the SDK exposes the individual building blocks behind transcribe():

  1. Upload the audio with uploadFile(), which returns an audio_url along with metadata (duration, channels, file size, etc.).
  2. Create a transcription job with createUntyped(), passing the audio_url and any transcription options.
  3. Retrieve the result using one of three mechanisms: polling, webhooks, or a callback URL.
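
The three steps above can be sketched as follows for the "upload once, run multiple jobs" case. The method names uploadFile(), createUntyped(), and poll() come from the list above. Their argument shapes and return shapes (an id on the created job, poll() accepting that id) are assumptions to check against the SDK reference. The function takes the client object as a parameter, so it can be dry-run here with a stand-in:

```javascript
// Sketch only: `preRecorded` is any object exposing uploadFile(), createUntyped()
// and poll(). The argument and return shapes used here are assumptions.
async function transcribeWithVariants(preRecorded, file, variants) {
  // 1. Upload once and reuse the returned audio_url for every job.
  const { audio_url } = await preRecorded.uploadFile(file);

  // 2. Create one job per set of options, all against the same upload.
  const jobs = await Promise.all(
    variants.map((options) => preRecorded.createUntyped({ audio_url, ...options }))
  );

  // 3. Wait for each job to finish (assumes poll() accepts a job id).
  return Promise.all(jobs.map((job) => preRecorded.poll(job.id)));
}

// Stand-in client for a dry run; replace it with the real pre-recorded client.
const fakePreRecorded = {
  async uploadFile() {
    return { audio_url: "https://example.invalid/audio.wav" };
  },
  async createUntyped(body) {
    return { id: body.language_config.languages.join("-") };
  },
  async poll(id) {
    return { id, status: "done" };
  },
};

transcribeWithVariants(fakePreRecorded, "call.wav", [
  { language_config: { languages: ["en"] } },
  { language_config: { languages: ["fr"] } },
]).then((results) => console.log(results.map((r) => r.status)));
```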

For polling, the SDK offers poll(), or a combined createAndPollUntyped() in JavaScript (create_and_poll() in Python) that fires off the job and waits on the result in a single call. The polling helpers handle the "wait until status is done" loop automatically, so you don't have to write it yourself.
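
For a sense of what the polling helpers save you, here is a minimal hand-written version of the "wait until status is done" loop. It is not the SDK's implementation: pollUntilDone and fetchStatus are hypothetical names, and the "error" status, the interval, and the timeout defaults are all assumptions.

```javascript
// Hypothetical stand-in for the SDK's polling helpers.
// fetchStatus is any async function that returns the job's current state,
// e.g. { status: "processing" } or { status: "done", result: ... }.
async function pollUntilDone(fetchStatus, { intervalMs = 1000, timeoutMs = 60000 } = {}) {
  const deadline = Date.now() + timeoutMs;

  while (true) {
    const job = await fetchStatus();
    if (job.status === "done") return job;
    // Assumption: a failed job reports status "error".
    if (job.status === "error") throw new Error("Transcription job failed");
    // Give up if the next wait would run past the deadline.
    if (Date.now() + intervalMs > deadline) throw new Error("Timed out waiting for the job");
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Dry run with a fake job that finishes on the third check.
let checks = 0;
const fakeFetchStatus = async () =>
  ++checks < 3 ? { status: "processing" } : { status: "done", result: "hello" };

pollUntilDone(fakeFetchStatus, { intervalMs: 10 }).then((job) => console.log(job.status));
```

Even this small version needs a failure branch and a timeout, which is the plumbing the built-in helpers are meant to absorb.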

If you'd rather not keep a process alive waiting for results, you can either configure webhooks in the Gladia dashboard or include a callback_config on the job itself, pointing to your own endpoint.
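
On the receiving side, a callback endpoint mostly needs to parse the request body and acknowledge it quickly. The sketch below is framework-agnostic: handleCallback is a hypothetical function you would call from whichever HTTP server you use. The payload shape it expects (an id string and an event string ending in ".success" or ".error") is an assumption rather than the documented schema, so check it against the callback reference before relying on it.

```javascript
// Hypothetical handler: takes the raw request body (a string) and returns
// the HTTP status to send back plus a summary for your own job queue.
function handleCallback(rawBody) {
  let body;
  try {
    body = JSON.parse(rawBody);
  } catch {
    return { status: 400, job: null }; // not valid JSON: reject
  }

  // Assumed payload shape: { id, event, payload }.
  if (!body || typeof body.id !== "string" || typeof body.event !== "string") {
    return { status: 400, job: null };
  }

  return {
    status: 200, // acknowledge first, do heavy processing asynchronously
    job: {
      id: body.id,
      succeeded: body.event.endsWith(".success"),
      payload: body.payload ?? null,
    },
  };
}

const sample = JSON.stringify({ id: "job-123", event: "transcription.success", payload: {} });
console.log(handleCallback(sample).job.succeeded);
```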

Beyond basic transcription

Once the transcription pipeline is in place, Gladia layers in a handful of audio intelligence features that can be enabled per-job: speaker diarization, translation into 100+ target languages, PII redaction for GDPR-sensitive workflows, and sentiment analysis covering up to 25 emotions. These are opt-in flags on the same job payload you already send to transcribe().
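
As an illustration of what opt-in flags on the same payload can look like, the sketch below merges a few feature flags into an existing options object. withAudioIntelligence is a hypothetical helper, and the field names it sets (diarization, translation, translation_config.target_languages, sentiment_analysis) are assumptions not confirmed by this article, so verify them against the API reference before use.

```javascript
// Hypothetical helper: adds opt-in audio intelligence flags to existing options.
// Field names are assumptions; confirm them in the API reference.
function withAudioIntelligence(
  baseOptions,
  { diarize = false, translateTo = [], sentiment = false } = {}
) {
  const options = { ...baseOptions }; // leave the caller's object untouched

  if (diarize) options.diarization = true;

  if (translateTo.length > 0) {
    options.translation = true;
    options.translation_config = { target_languages: translateTo };
  }

  if (sentiment) options.sentiment_analysis = true;

  return options;
}

const jobOptions = withAudioIntelligence(
  { language_config: { languages: ["en"] } },
  { diarize: true, translateTo: ["fr", "es"] }
);
console.log(jobOptions);
```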

Sample code

Gladia maintains a samples repository on GitHub with runnable examples in Python, JavaScript, and TypeScript, organized into core-concepts/ (basic pre-recorded and live STT) and examples/ (applied use cases). The applied samples are picked to mirror the segments most teams building on Gladia fall into:

  • Call center sentiment analysis with diarization: combines speaker separation with per-utterance sentiment extraction, so instead of a flat transcript you can see how each speaker's tone shifts across a call.
  • Anonymized calls with PII redaction: transcribes calls without retaining client data. Gladia offers 13 redaction presets covering GDPR, HIPAA Safe Harbor, and health information categories (see the PII redaction documentation for the full list).
  • Meeting summaries: aimed at note-takers and meeting recording tools, showing how to go from raw audio to a structured summary.
  • YouTube video translation: built for media workflows, demonstrating custom vocabulary (one of the stronger accuracy levers) and code-switching together. The same pipeline extends to subtitles, named entity recognition, and other intelligence features.
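
To make the first sample concrete, here is one way to track how each speaker's tone shifts across a call once you have per-utterance results. The input shape (speaker, start, sentiment) is a simplified, hypothetical stand-in for the real response, and sentimentTimeline is not an SDK function.

```javascript
// Groups utterances by speaker and orders each speaker's sentiments by time.
// Input records use a simplified, assumed shape: { speaker, start, sentiment }.
function sentimentTimeline(utterances) {
  const bySpeaker = new Map();

  for (const { speaker, start, sentiment } of utterances) {
    if (!bySpeaker.has(speaker)) bySpeaker.set(speaker, []);
    bySpeaker.get(speaker).push({ start, sentiment });
  }

  const timeline = {};
  for (const [speaker, entries] of bySpeaker) {
    entries.sort((a, b) => a.start - b.start);
    timeline[speaker] = entries.map((entry) => entry.sentiment);
  }
  return timeline;
}

const timeline = sentimentTimeline([
  { speaker: 0, start: 0.0, sentiment: "neutral" },
  { speaker: 1, start: 4.2, sentiment: "negative" },
  { speaker: 0, start: 9.8, sentiment: "positive" },
  { speaker: 1, start: 15.1, sentiment: "neutral" },
]);
console.log(timeline);
```

Here speaker 0 moves from neutral to positive while speaker 1 moves from negative to neutral, which is the kind of per-speaker view a flat transcript does not give you.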

There are also integration examples for Discord, Google Meet, LiveKit, OBS, Pipecat, and Twilio.

Wrapping up

The SDK gives you two modes: a one-liner when you just want a transcript, and a step-by-step API when you need finer control. For most applications, transcribe() is enough, and the individual building blocks are there if you outgrow it.
