We built the SDK to remove it. You pass in a file and get back a transcript, with the uploading, job handling, and polling managed internally. The same applies to the features built on top of transcription: diarization, translation, summarization, and custom vocabulary are all accessible through the same interface, and documented so you can tell what each one does before you use it.
Here's how it works.
Installation
The SDK is available for JavaScript and Python. For JavaScript projects:
```shell
npm install @gladiaio/sdk
```
One-call transcription
The headline feature is the transcribe() method, which handles the entire pipeline end-to-end. You can pass it a local file path, binary data, or a remote URL, and it takes care of the rest:
```javascript
import { GladiaClient } from "@gladiaio/sdk";

const gladiaClient = new GladiaClient({ apiKey: "YOUR_GLADIA_API_KEY" });

const transcription = await gladiaClient
  .preRecorded()
  .transcribe("YOUR_AUDIO_URL_OR_LOCAL_PATH");
```
That's the minimum viable example. For real-world use, you'll likely want to pass configuration options — languages, code-switching, custom vocabulary, and so on:
```javascript
const transcription = await gladiaClient.preRecorded().transcribe(
  "YOUR_AUDIO_URL_OR_LOCAL_PATH",
  {
    language_config: {
      languages: ["en", "fr"],
      code_switching: true,
    },
    custom_vocabulary: true,
    custom_vocabulary_config: {
      vocabulary: ["Gladia", "Solaria", "Salesforce"],
    },
  }
);
```
When you need more control
If you'd rather manage each step yourself — for example, to upload once and run multiple jobs against the same file, or to integrate with your own job queue — the SDK exposes the individual building blocks behind transcribe():
- Upload the audio with uploadFile(), which returns an audio_url along with metadata (duration, channels, file size, etc.).
- Create a transcription job with createUntyped(), passing the audio_url and any transcription options.
- Retrieve the result using one of three mechanisms: polling, webhooks, or a callback URL.
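Put together, the three steps above look roughly like this. The method names come from the SDK; the response field names (audio_url on the upload result, id on the job) and the exact poll() signature are assumptions here, so check the SDK reference for the precise shapes:

```javascript
import { GladiaClient } from "@gladiaio/sdk";

const gladiaClient = new GladiaClient({ apiKey: "YOUR_GLADIA_API_KEY" });
const preRecorded = gladiaClient.preRecorded();

// 1. Upload once — the returned audio_url can back several jobs.
const upload = await preRecorded.uploadFile("./call-recording.wav");

// 2. Create a transcription job against the uploaded file,
//    with any per-job options.
const job = await preRecorded.createUntyped({
  audio_url: upload.audio_url,
  language_config: { languages: ["en"] },
});

// 3. Wait for the result (field names illustrative).
const result = await preRecorded.poll(job.id);
```

Splitting the pipeline this way is what lets you reuse one upload across multiple jobs, or hand the job off to your own queue between steps 2 and 3.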
For polling, the SDK offers poll(), or a combined createAndPollUntyped() in JavaScript (create_and_poll() in Python) that fires off the job and waits on the result in a single call. The polling helpers handle the "wait until status is done" loop automatically, so you don't have to write it yourself.
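For context, here is what that loop looks like if you write it by hand — this is illustrative, not the SDK's internals; getStatus stands in for whatever call fetches the job's current state:

```javascript
// Hand-rolled "wait until status is done" loop, the kind the SDK's
// polling helpers save you from writing. Retries on an interval,
// gives up after maxAttempts.
async function waitUntilDone(getStatus, { intervalMs = 1000, maxAttempts = 60 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const job = await getStatus();
    if (job.status === "done") return job;   // finished: hand back the result
    if (job.status === "error") throw new Error("transcription failed");
    // Not done yet: wait before asking again.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("timed out waiting for transcription");
}
```

The helpers also spare you the smaller decisions buried in this loop — interval, timeout, and error statuses — which is why reaching for them first is usually the right call.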
If you'd rather not keep a process alive waiting for results, you can either configure webhooks in the Gladia dashboard or include a callback_config on the job itself, pointing to your own endpoint.
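As a sketch, attaching a callback looks something like the following — the nesting of callback_config and its field names are assumptions here, so verify them against the API reference:

```javascript
// Create the job with a callback_config so Gladia delivers the result
// to your endpoint, instead of you keeping a process alive to poll.
// Field names inside callback_config are illustrative.
const job = await gladiaClient.preRecorded().createUntyped({
  audio_url: "YOUR_UPLOADED_AUDIO_URL",
  callback_config: {
    url: "https://your-app.example.com/gladia-callback",
  },
});
```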
Beyond basic transcription
Once the transcription pipeline is in place, Gladia layers in a handful of audio intelligence features that can be enabled per-job: speaker diarization, translation into 100+ target languages, PII redaction for GDPR-sensitive workflows, and sentiment analysis covering up to 25 emotions. These are opt-in flags on the same job payload you already send to transcribe().
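A sketch of what enabling several of these on one job might look like — the option names below are assumptions based on the features described above, not confirmed parameter names, so check the API schema before relying on them:

```javascript
// Opt-in intelligence flags on the same transcribe() payload
// (names illustrative — see the API reference for the exact schema).
const transcription = await gladiaClient.preRecorded().transcribe(
  "YOUR_AUDIO_URL_OR_LOCAL_PATH",
  {
    diarization: true,                                   // who spoke when
    translation: true,
    translation_config: { target_languages: ["es", "de"] },
    sentiment_analysis: true,                            // per-utterance tone
  }
);
```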
Sample code
Gladia maintains a samples repository on GitHub with runnable examples in Python, JavaScript, and TypeScript, organized into core-concepts/ (basic pre-recorded and live STT) and examples/ (applied use cases). The applied samples are chosen to mirror the use cases most teams building on Gladia fall into:
- Call center sentiment analysis with diarization — combines speaker separation with sentiment extraction per utterance, so you get more than a flat transcript: you can see how each speaker's tone shifts across a call.
- Anonymized calls with PII redaction — transcribes calls without retaining client data. Gladia offers 13 redaction presets covering GDPR, HIPAA Safe Harbor, and health information categories (full list here).
- Meeting summaries — aimed at note-takers and meeting recording tools, showing how to go from raw audio to a structured summary.
- YouTube video translation — built for media workflows, demonstrating custom vocabulary (one of the stronger accuracy levers) and code-switching together. The same pipeline extends to subtitles, named entity recognition, and other intelligence features.
There are also integration examples for Discord, Google Meet, LiveKit, OBS, Pipecat, and Twilio.
Wrapping up
The SDK's design essentially gives you two modes: a one-liner when you just want a transcript, and a step-by-step API when you need finer control. For most applications, transcribe() is probably enough — but it's nice to know the pieces are there if you ever outgrow it.