

Transcribing audio with Gladia's async SDK

Apr 21, 2026
By Ani Ghazaryan

Transcribing an audio file should take one call. In practice, it usually takes five or six: upload the file, create a job, poll the endpoint until it's done, parse the response, and wrap the whole thing in retry logic for when something fails midway. It's not hard work, but it's the kind of repetitive plumbing that ends up in every project that touches speech-to-text.

We built the SDK to remove it. You pass in a file and get back a transcript, with the uploading, job handling, and polling managed internally. The same applies to the features built on top of transcription: diarization, translation, summarization, and custom vocabulary are all accessible through the same interface, and documented so you can tell what each one does before you use it.

Here's how it works.

Installation

The SDK is available for JavaScript and Python. For JavaScript projects:

npm install @gladiaio/sdk

One-call transcription

The headline feature is the transcribe() method, which handles the entire pipeline end-to-end. You can pass it a local file path, binary data, or a remote URL, and it takes care of the rest:

import { GladiaClient } from "@gladiaio/sdk";

const gladiaClient = new GladiaClient({ apiKey: "YOUR_GLADIA_API_KEY" });

const transcription = await gladiaClient
  .preRecorded()
  .transcribe("YOUR_AUDIO_URL_OR_LOCAL_PATH");

That's the minimum viable example. For real-world use, you'll likely want to pass configuration options — languages, code-switching, custom vocabulary, and so on:

const transcription = await gladiaClient.preRecorded().transcribe(
  "YOUR_AUDIO_URL_OR_LOCAL_PATH",
  {
    language_config: {
      languages: ["en", "fr"],
      code_switching: true,
    },
    custom_vocabulary: true,
    custom_vocabulary_config: {
      vocabulary: ["Gladia", "Solaria", "Salesforce"],
    },
  }
);

When you need more control

If you'd rather manage each step yourself — for example, to upload once and run multiple jobs against the same file, or to integrate with your own job queue — the SDK exposes the individual building blocks behind transcribe():

  1. Upload the audio with uploadFile(), which returns an audio_url along with metadata (duration, channels, file size, etc.).
  2. Create a transcription job with createUntyped(), passing the audio_url and any transcription options.
  3. Retrieve the result using one of three mechanisms: polling, webhooks, or a callback URL.
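
Stitched together, the three steps look roughly like this. This is a sketch only: the client below is a local stub that mimics the method names the SDK exposes (uploadFile, createUntyped, poll) so the control flow is runnable; the exact signatures and response shapes are assumptions, so check the SDK reference before relying on them:

```javascript
// Stub standing in for the SDK client, kept only so the three-step
// flow below is runnable. Swap in a real GladiaClient in practice.
const client = {
  preRecorded() {
    return {
      // Step 1: upload returns an audio_url plus file metadata.
      async uploadFile(path) {
        return { audio_url: `https://api.example/files/${path}`, audio_metadata: { duration: 12.3 } };
      },
      // Step 2: create a transcription job against that audio_url.
      async createUntyped(options) {
        return { id: "job-1", status: "queued", options };
      },
      // Step 3: poll until the job reaches a terminal status.
      async poll(job) {
        return { id: job.id, status: "done", result: { transcription: { full_transcript: "hello world" } } };
      },
    };
  },
};

async function transcribeManually(path) {
  const pre = client.preRecorded();
  const upload = await pre.uploadFile(path);                            // 1. upload
  const job = await pre.createUntyped({ audio_url: upload.audio_url }); // 2. create job
  const done = await pre.poll(job);                                     // 3. wait for result
  return done.result.transcription.full_transcript;
}
```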

For polling, the SDK offers poll(), or a combined createAndPollUntyped() in JavaScript (create_and_poll() in Python) that fires off the job and waits on the result in a single call. The polling helpers handle the "wait until status is done" loop automatically, so you don't have to write it yourself.
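
For reference, the loop those helpers replace looks roughly like this. It's a generic sketch, not the SDK's internals: getJob stands in for whatever call fetches the job's current state, and the status strings are illustrative:

```javascript
// Generic "wait until status is done" loop with a fixed delay between
// checks and a cap on total attempts.
async function waitUntilDone(getJob, jobId, { intervalMs = 1000, maxAttempts = 60 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const job = await getJob(jobId);
    if (job.status === "done") return job;                        // finished: hand back the job
    if (job.status === "error") throw new Error(`job ${jobId} failed`);
    await new Promise((resolve) => setTimeout(resolve, intervalMs)); // still running: wait, retry
  }
  throw new Error(`job ${jobId} did not finish within ${maxAttempts} checks`);
}
```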

If you'd rather not keep a process alive waiting for results, you can either configure webhooks in the Gladia dashboard or include a callback_config on the job itself, pointing to your own endpoint.
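
A callback-based job payload might look like the sketch below. The callback_config key comes from the text above, but the field names inside it are an assumption here; consult the API reference for the exact shape:

```javascript
// Hypothetical job options with a callback: Gladia posts the finished
// result to your endpoint, so no process has to stay alive polling.
const jobOptions = {
  audio_url: "YOUR_AUDIO_URL",
  callback_config: {
    url: "https://your-app.example.com/gladia-webhook", // your own endpoint
  },
};
```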

Beyond basic transcription

Once the transcription pipeline is in place, Gladia layers in a handful of audio intelligence features that can be enabled per-job: speaker diarization, translation into 100+ target languages, PII redaction for GDPR-sensitive workflows, and sentiment analysis covering up to 25 emotions. These are opt-in flags on the same job payload you already send to transcribe().
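
As a sketch, opting into those features follows the same flag pattern as the custom_vocabulary example earlier. The flag names below are assumptions modeled on that pattern, so verify them against the API reference:

```javascript
// Hypothetical: intelligence features as opt-in flags on the same job
// payload passed to transcribe(). Exact field names may differ.
const intelligenceOptions = {
  diarization: true,          // speaker separation
  sentiment_analysis: true,   // per-utterance sentiment/emotion
  translation: true,
  translation_config: {
    target_languages: ["es", "de"], // translate the transcript into these
  },
};
```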

Sample code

Gladia maintains a samples repository on GitHub with runnable examples in Python, JavaScript, and TypeScript, organized into core-concepts/ (basic pre-recorded and live STT) and examples/ (applied use cases). The applied samples are chosen to reflect the use cases most teams building on Gladia fall into:

  • Call center sentiment analysis with diarization — combines speaker separation with sentiment extraction per utterance, so you get more than a flat transcript: you can see how each speaker's tone shifts across a call.
  • Anonymized calls with PII redaction — transcribes calls without retaining client data. Gladia offers 13 redaction presets covering GDPR, HIPAA Safe Harbor, and health information categories (the full list is in the documentation).
  • Meeting summaries — aimed at note-takers and meeting recording tools, showing how to go from raw audio to a structured summary.
  • YouTube video translation — built for media workflows, demonstrating custom vocabulary (one of the stronger accuracy levers) and code-switching together. The same pipeline extends to subtitles, named entity recognition, and other intelligence features.

There are also integration examples for Discord, Google Meet, LiveKit, OBS, Pipecat, and Twilio.

Wrapping up

The SDK's design essentially gives you two modes: a one-liner when you just want a transcript, and a step-by-step API when you need finer control. For most applications, transcribe() is probably enough — but it's nice to know the pieces are there if you ever outgrow it.
