Transcribing audio with Gladia's async SDK

Apr 21, 2026

By Ani Ghazaryan

Transcribing an audio file should take one call. In practice, it usually takes five or six: upload the file, create a job, poll the endpoint until it's done, parse the response, and wrap the whole thing in retry logic for when something fails midway. It's not hard work, but it's the kind of repetitive plumbing that ends up in every project that touches speech-to-text.

We built the SDK to remove it. You pass in a file and get back a transcript, with the uploading, job handling, and polling managed internally. The same applies to the features built on top of transcription: diarization, translation, summarization, and custom vocabulary are all accessible through the same interface, and documented so you can tell what each one does before you use it.

Here's how it works.

Installation

The SDK is available for JavaScript and Python. For JavaScript projects:

npm install @gladiaio/sdk

One-call transcription

The headline feature is the transcribe() method, which handles the entire pipeline end-to-end. You can pass it a local file path, binary data, or a remote URL, and it takes care of the rest:

import { GladiaClient } from "@gladiaio/sdk";

// Create a client with your API key.
const gladiaClient = new GladiaClient({ apiKey: "YOUR_GLADIA_API_KEY" });

// Upload, job creation, and polling all happen inside this one call.
const transcription = await gladiaClient
  .preRecorded()
  .transcribe("YOUR_AUDIO_URL_OR_LOCAL_PATH");

That's the minimum viable example. For real-world use, you'll likely want to pass configuration options — languages, code-switching, custom vocabulary, and so on:

const transcription = await gladiaClient
  .preRecorded()
  .transcribe("YOUR_AUDIO_URL_OR_LOCAL_PATH", {
    // Expected languages, with switching between them allowed mid-recording.
    language_config: {
      languages: ["en", "fr"],
      code_switching: true,
    },
    // Terms the transcription should recognize.
    custom_vocabulary: true,
    custom_vocabulary_config: {
      vocabulary: ["Gladia", "Solaria", "Salesforce"],
    },
  });

When you need more control

If you'd rather manage each step yourself — for example, to upload once and run multiple jobs against the same file, or to integrate with your own job queue — the SDK exposes the individual building blocks behind transcribe():

1. Upload the audio with uploadFile(), which returns an audio_url along with metadata (duration, channels, file size, etc.).
2. Create a transcription job with createUntyped(), passing the audio_url and any transcription options.
3. Retrieve the result using one of three mechanisms: polling, webhooks, or a callback URL.
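
The steps above can be sketched as follows for the "upload once, run multiple jobs" case. This is a minimal sketch, not documented SDK usage. The method names uploadFile(), createUntyped(), and poll() come from this article, but the argument and return shapes used here (an object carrying audio_url, a job with an id, a result with a status) are assumptions. The hypothetical makeFakeClient() stands in for gladiaClient.preRecorded() so the snippet runs without network access.

```javascript
// Sketch of "upload once, run several jobs against the same file".
// ASSUMPTION: method names come from the article; the argument and return
// shapes below are guesses. Check the SDK reference before relying on them.
async function transcribeTwice(client, filePath) {
  // Step 1: upload once and keep the returned audio_url.
  const { audio_url } = await client.uploadFile(filePath);

  // Step 2: create two jobs that reuse the same uploaded file.
  const jobs = await Promise.all([
    client.createUntyped({ audio_url, language_config: { languages: ["en"] } }),
    client.createUntyped({ audio_url, language_config: { languages: ["fr"] } }),
  ]);

  // Step 3: retrieve each result by polling (webhooks or a callback URL
  // are the alternatives).
  return Promise.all(jobs.map((job) => client.poll(job.id)));
}

// Hypothetical in-memory stand-in for the real client, for illustration only.
function makeFakeClient() {
  let uploads = 0;
  let nextId = 1;
  const jobs = new Map();
  return {
    get uploads() {
      return uploads;
    },
    async uploadFile(path) {
      uploads += 1;
      return { audio_url: `fake://uploads/${path}` };
    },
    async createUntyped(options) {
      const id = `job-${nextId++}`;
      jobs.set(id, options);
      return { id };
    },
    async poll(id) {
      return { id, status: "done", options: jobs.get(id) };
    },
  };
}
```

Passing the client in as a parameter keeps the flow testable: the same function can run against the fake above or against the real pre-recorded client, provided the real shapes match the assumptions.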

For polling, the SDK offers poll(), or a combined createAndPollUntyped() in JavaScript (create_and_poll() in Python) that fires off the job and waits on the result in a single call. The polling helpers handle the "wait until status is done" loop automatically, so you don't have to write it yourself.
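
For reference, here is roughly what a "wait until status is done" loop looks like when written by hand. It is a generic sketch of the pattern, not the SDK's internal implementation: getStatus, the "done" and "error" status values, and the intervalMs and maxAttempts options are all hypothetical names chosen for illustration.

```javascript
// Generic "poll until done" loop, the pattern a polling helper automates.
// ASSUMPTION: getStatus, the status strings, and the option names are
// hypothetical; they are not taken from the SDK.
async function pollUntilDone(getStatus, { intervalMs = 1000, maxAttempts = 60 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const job = await getStatus();
    if (job.status === "done") return job;
    if (job.status === "error") throw new Error("Transcription job failed");
    // Wait before asking again so the endpoint is not hammered.
    if (attempt < maxAttempts) {
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error(`Job not done after ${maxAttempts} attempts`);
}
```

The attempt cap matters as much as the interval: without it, a job that never finishes keeps the calling process alive indefinitely.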

If you'd rather not keep a process alive waiting for results, you can either configure webhooks in the Gladia dashboard or include a callback_config on the job itself, pointing to your own endpoint.
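
On the receiving side, a callback endpoint can be as small as the sketch below, which uses only Node's built-in http module. The request body shape is an assumption: the fields id, event, and payload are illustrative and are not specified in this article, so adjust parseCallbackBody() to whatever the callback actually delivers. The function names are hypothetical as well.

```javascript
// Minimal callback receiver sketch using Node's built-in http module.
// ASSUMPTION: the JSON body shape ({ id, event, payload }) is hypothetical.
function parseCallbackBody(rawBody) {
  let data;
  try {
    data = JSON.parse(rawBody);
  } catch {
    return { ok: false, reason: "invalid JSON" };
  }
  if (!data || typeof data.id !== "string") {
    return { ok: false, reason: "missing job id" };
  }
  return {
    ok: true,
    jobId: data.id,
    event: data.event ?? "unknown",
    payload: data.payload ?? null,
  };
}

// Request handler compatible with http.createServer.
async function handleCallback(req, res) {
  if (req.method !== "POST") {
    res.writeHead(405).end();
    return null;
  }
  // Collect the request body chunk by chunk.
  const chunks = [];
  for await (const chunk of req) chunks.push(chunk);
  const parsed = parseCallbackBody(Buffer.concat(chunks).toString("utf8"));
  // Acknowledge quickly; hand the parsed result to your own queue or store here.
  res.writeHead(parsed.ok ? 200 : 400).end();
  return parsed;
}

// Call this from your own entry point to start listening.
async function startCallbackServer(port = 3000) {
  const http = await import("node:http");
  return http.createServer(handleCallback).listen(port);
}
```

Keeping the parsing in a pure function separate from the HTTP handler makes it easy to unit test against recorded payloads before pointing a real job at the endpoint.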

Beyond basic transcription

Once the transcription pipeline is in place, Gladia layers in a handful of audio intelligence features that can be enabled per-job: speaker diarization, translation into 100+ target languages, PII redaction for GDPR-sensitive workflows, and sentiment analysis covering up to 25 emotions. These are opt-in flags on the same job payload you already send to transcribe().
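
One way to keep opt-in features tidy is a small helper that layers them onto the options you already pass to transcribe(). This is a sketch under assumptions: the option keys diarization, translation, translation_config, and sentiment_analysis are illustrative and are not listed in this article, so confirm the exact names against the API reference. PII redaction is left out because its option name is not given here. The helper name withAudioIntelligence is hypothetical.

```javascript
// Layer opt-in features onto an existing options object without mutating it.
// ASSUMPTION: the option keys written below are illustrative, not confirmed
// by this article.
function withAudioIntelligence(
  baseOptions,
  { diarization = false, translateTo = [], sentiment = false } = {}
) {
  const options = { ...baseOptions };
  if (diarization) options.diarization = true;
  if (translateTo.length > 0) {
    options.translation = true;
    options.translation_config = { target_languages: [...translateTo] };
  }
  if (sentiment) options.sentiment_analysis = true;
  return options;
}

// Example: a call-center job with speaker separation and sentiment enabled.
const base = { language_config: { languages: ["en"] } };
const callCenterOptions = withAudioIntelligence(base, {
  diarization: true,
  sentiment: true,
});
```

Because each feature is a flag on the same payload, turning one on or off is a change to the options object and leaves the rest of the transcription call untouched.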

Sample code

Gladia maintains a samples repository on GitHub with runnable examples in Python, JavaScript, and TypeScript, organized into core-concepts/ (basic pre-recorded and live STT) and examples/ (applied use cases). The applied samples are picked to mirror the segments most teams building on Gladia fall into:

- Call center sentiment analysis with diarization: combines speaker separation with per-utterance sentiment extraction, so instead of a flat transcript you can see how each speaker's tone shifts across a call.
- Anonymized calls with PII redaction: transcribes calls without retaining client data. Gladia offers 13 redaction presets covering GDPR, HIPAA Safe Harbor, and health information categories.
- Meeting summaries: aimed at note-takers and meeting recording tools, showing how to go from raw audio to a structured summary.
- YouTube video translation: built for media workflows, demonstrating custom vocabulary (one of the stronger accuracy levers) and code-switching together. The same pipeline extends to subtitles, named entity recognition, and other intelligence features.

There are also integration examples for Discord, Google Meet, LiveKit, OBS, Pipecat, and Twilio.

Wrapping up

The SDK gives you two modes: a one-liner when you just want a transcript, and a step-by-step API when you need finer control. For most applications, transcribe() is enough, and the individual building blocks are there if you outgrow it.
