Gladia

Changelog

Audio to LLM is now generally available

Voice content keeps growing, but product teams still lose time wiring transcription to LLMs by hand. Audio to LLM closes that gap: one API path from audio to transcript to insight, so you spend less time on glue code and more time on the product experience your users see.

Audio to LLM graduates from alpha to stable and is supported for pre-recorded transcription.

Highlights

  • Ship faster on voice data: Turn calls, meetings, and interviews into summaries, follow-ups, compliance checks, or CRM-ready notes without building a separate LLM pipeline.
  • Prompts you control: Ask for bullet takeaways, action items, tone checks, red flags, or anything else you would ask an analyst. Run multiple prompts at once and get one answer per question.
  • Built-in intelligence, default speed: Out of the box, responses use a fast, efficient model suited to high-volume workloads. Enterprise accounts can optionally choose from 700+ models when they need heavier models or a specific vendor.
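As a rough illustration of running several prompts in one request, the sketch below builds a request body in Python. The field names audio_to_llm, audio_to_llm_config, and prompts are assumptions made for this example and should be checked against the API reference; build_audio_to_llm_request is a hypothetical helper, not part of any Gladia library.

```python
import json
from typing import Dict, List


def build_audio_to_llm_request(audio_url: str, prompts: List[str]) -> Dict:
    """Build a request body that asks one question per prompt.

    The audio_to_llm* field names are assumptions for illustration only.
    """
    if not prompts:
        raise ValueError("provide at least one prompt")
    return {
        "audio_url": audio_url,
        "audio_to_llm": True,  # assumed flag name
        "audio_to_llm_config": {"prompts": list(prompts)},  # assumed shape
    }


body = build_audio_to_llm_request(
    "YOUR_AUDIO_URL",
    ["Summarize the call in three bullets.", "List the action items."],
)
print(json.dumps(body, indent=2))
```

Each entry in the prompts list is meant to map to one answer in the response, matching the "one answer per question" behavior described above.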


Asynchronous SDK

Version 1.0.0 of Gladia's SDK covers the Asynchronous Speech-to-Text API for TypeScript/JavaScript and Python.

Using the SDK for Gladia's Asynchronous STT API brings these key advantages:

  • Zero Boilerplate: Abstracts the manual "Upload → Poll → Retrieve" cycle into a single, clean function call.
  • Error Resilience: Includes built-in retry logic and type-safety, handling network hiccups and API errors out-of-the-box.
  • Minimal Code: Reduces hundreds of lines of complex "plumbing" to a few robust, highly readable lines of code.
  • Accelerated Time-to-Market: Requires significantly less specialized API knowledge, allowing teams to ship features in hours, not weeks.
  • Native Ecosystems: Fully optimized for our customers' stacks, with dedicated libraries available in Python and TypeScript.
  • Full Feature Parity: Provides instant access to the complete suite of Gladia's intelligence features, including Speaker Diarization, Sentiment Analysis, Summarization, and PII Redaction.
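To make the "Upload → Poll → Retrieve" cycle concrete, here is a minimal sketch of the polling step that such an SDK hides. It is not the SDK's implementation: wait_for_result, fetch_status, and the status values are hypothetical names chosen for this example, and the job is simulated so the code runs offline.

```python
import time
from typing import Callable, Dict


def wait_for_result(
    fetch_status: Callable[[], Dict],
    poll_interval: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll a job until it finishes, then return its final payload.

    fetch_status is a hypothetical callable returning a dict such as
    {"status": "queued" | "processing" | "done" | "error", ...}.
    """
    for _ in range(max_attempts):
        job = fetch_status()
        if job["status"] == "done":
            return job
        if job["status"] == "error":
            raise RuntimeError(f"transcription failed: {job}")
        sleep(poll_interval)
    raise TimeoutError(f"job not finished after {max_attempts} polls")


# Simulated job that completes on the third poll.
responses = iter([
    {"status": "queued"},
    {"status": "processing"},
    {"status": "done", "full_transcript": "hello world"},
])
final = wait_for_result(lambda: next(responses), sleep=lambda _: None)
print(final["full_transcript"])
```

A hand-written integration would also need the upload call, retries on transient network errors, and response typing, which is the plumbing the list above refers to.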

The packages are available on PyPI and npm:

  • pip install gladiaio-sdk for Python.
  • npm install @gladiaio/sdk for TypeScript/JavaScript.

The easiest way to transcribe a URL in Python:

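from gladiaio_sdk import GladiaClient

# Replace {API_KEY} with your Gladia API key.
client = GladiaClient(api_key="{API_KEY}")
# The SDK handles upload, polling, and retrieval behind this call.
transcription = client.pre_recorded_v2().transcribe(
    "https://github.com/gladiaio/gladia-samples/blob/main/data/anna-and-sasha-16000.wav?raw=true"
)
print(transcription.result.transcription.full_transcript)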


Open Benchmark for Speech-to-Text — 2026

Gladia has published a fully open, reproducible benchmark comparing Solaria-1 against 8 leading speech recognition providers across 7 datasets and 74+ hours of audio in 6 languages. The full methodology and evaluation framework are open-sourced.

  • Transparent & Reproducible: Every audio file is sent to every provider's production API with default settings: no custom tuning or prompt engineering. All results can be independently verified.
  • Standardized Normalization: Transcripts are normalized using gladia-normalization (open-source Python package) before WER computation, eliminating formatting differences that inflate error rates.
  • Broad Domain Coverage: Evaluation spans conversational telephone speech (Switchboard), multilingual reading (Common Voice 24, MLS), financial calls (Earnings22), parliamentary speech (VoxPopuli), and streaming scenarios (Pipecat).
  • 6 Languages Evaluated: English, French, German, Spanish, Italian, and Portuguese.
  • 8 Providers Compared: Gladia Solaria-1, AssemblyAI (U3 Pro & U2), ElevenLabs Scribe V2, Deepgram Nova-3, Speechmatics Enhanced, Soniox V4, and Mistral Voxtral Mini Transcribe 2.
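The normalize-then-score step can be sketched as follows. This is an illustration only: the normalize function below is a toy stand-in (lowercasing and punctuation stripping), not the gladia-normalization package, and wer is a plain word-level edit distance rather than the benchmark's actual evaluation code.

```python
import re
from typing import List


def normalize(text: str) -> List[str]:
    """Toy normalizer: lowercase, drop punctuation, split on whitespace."""
    return re.sub(r"[^\w\s']", " ", text.lower()).split()


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate = word-level edit distance / reference word count."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Two-row dynamic programming for Levenshtein distance over words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


# Formatting-only differences no longer count as errors after normalization.
print(wer("Hello, world!", "hello world"))
print(wer("the cat sat", "the cat sit"))
```

Without the normalization step, "Hello," and "hello" would count as a substitution, which is the kind of inflated error rate the benchmark's normalization is meant to remove.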


Longer Login Sessions

The automatic logout behavior on the Gladia Playground (app.gladia.io) has been removed. Sessions now persist much longer, so you stay authenticated throughout your work.

  • Google SSO: Sessions stay active without forced re-login.
  • Email / Password: Same improvement — no more hourly disconnects.
  • Seamless Workflow: Keep working across long transcription sessions, dashboard reviews, or API key management — your session stays active throughout.


Hebrew Transcription — Major Accuracy Upgrade

Gladia's Asynchronous API now delivers a 3x accuracy improvement on Hebrew transcription, powered by Solaria-1. The Word Error Rate drops from 27.1% down to 7.5%.

  • 3x More Accurate: WER reduced from 27.1% to 7.5%, bringing Hebrew on par with top-tier language support.
  • Robust in the Real World: The model handles a wide range of Hebrew accents, speaking styles, and audio conditions with high reliability.
  • Simple Activation: Set the language to he in your request and the accuracy gain applies automatically.

Code switching is not supported: exactly one language must be specified in the languages configuration.

Configuration example:

"language_config": {
  "languages": ["he"],
  "code_switching": false
}
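A small Python sketch of building that request body, with a guard for the single-language rule. build_request is a hypothetical helper written for this example; the audio_url field follows the request examples elsewhere in this changelog.

```python
import json
from typing import Dict, List


def build_request(audio_url: str, languages: List[str]) -> Dict:
    """Build a request body with code switching disabled."""
    if len(languages) != 1:
        raise ValueError("specify exactly one language when code switching is off")
    return {
        "audio_url": audio_url,
        "language_config": {
            "languages": list(languages),
            "code_switching": False,
        },
    }


print(json.dumps(build_request("YOUR_AUDIO_URL", ["he"]), indent=2))
```

Validating on the client side is a design choice of this sketch, so a misconfigured request fails before any audio is sent.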

ISO 27001 & ISO 27701 Certification

Gladia is now officially ISO 27001 and ISO 27701 certified. Our information security management system is built, audited, and continuously maintained in line with these internationally recognized standards.



PII Redaction

Gladia's Pre-recorded API now supports automatic detection and redaction of Personally Identifiable Information (PII) in transcripts.

Handling audio data often involves processing conversations that contain sensitive information. PII Redaction helps you:

  • Privacy Compliance: Comply with regulations like GDPR, CCPA/CPRA, HIPAA, and APPI out-of-the-box.
  • Data Protection: Automatically replace sensitive entities (names, emails, phone numbers, addresses, financial details) with safe markers or masks.
  • Consistent Entity Tracking: An entity mentioned multiple times receives the same marker ID each time (e.g. "John Smith" becomes [NAME_1] everywhere), enabling downstream LLM reasoning without exposing raw PII.
  • Flexible Output Modes: Choose between MASK (character-level masking: #### #####) or MARKER (labeled placeholders: [NAME_1], [EMAIL_1]).
  • Preset Entity Groups: Use built-in presets like GDPR, HIPAA_SAFE_HARBOR, PCI, CPRA, or specify individual entity types for fine-grained control.
  • Broad Entity Coverage: Supports 40+ entity types across Core PII, Financial/PCI, Sensitive/GDPR Article 9, and Healthcare categories.

Enable PII Redaction with a single parameter:

{
  "audio_url": "YOUR_AUDIO_URL",
  "pii_redaction": true
}

Customize behavior with pii_redaction_config:

{
  "audio_url": "YOUR_AUDIO_URL",
  "pii_redaction": true,
  "pii_redaction_config": {
    "entity_types": ["GDPR"],
    "processed_text_type": "MARKER"
  }
}

Example output with MARKER mode:

Original: Hi, I'm calling about the order for John Smith. Can you confirm the delivery to john.smith@company.com?
Redacted: Hi, I'm calling about the order for [NAME_1]. Can you confirm the delivery to [EMAIL_1]?
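To illustrate consistent marker IDs and the difference between MARKER and MASK output, here is a toy Python sketch. It is not Gladia's implementation: redact is a hypothetical function, and the entities are passed in by hand, whereas the real feature detects them automatically.

```python
import re
from typing import Dict


def redact(text: str, entities: Dict[str, str], mode: str = "MARKER") -> str:
    """Replace known entity strings with markers or masks.

    entities maps a raw string to its type, e.g. {"John Smith": "NAME"}.
    """
    if not entities:
        return text
    types = {raw.lower(): etype for raw, etype in entities.items()}
    counters: Dict[str, int] = {}
    markers: Dict[str, str] = {}

    def replace(match: "re.Match[str]") -> str:
        raw = match.group(0)
        if mode == "MASK":
            # Mask every non-space character, keeping word shapes.
            return re.sub(r"\S", "#", raw)
        key = raw.lower()
        if key not in markers:
            etype = types[key]
            counters[etype] = counters.get(etype, 0) + 1
            markers[key] = f"[{etype}_{counters[etype]}]"
        return markers[key]  # same entity, same marker ID

    # Longest entities first so a longer entity wins over one it contains.
    ordered = sorted(entities, key=len, reverse=True)
    pattern = "|".join(re.escape(raw) for raw in ordered)
    return re.sub(pattern, replace, text, flags=re.IGNORECASE)


text = ("Hi, I'm calling about the order for John Smith. "
        "Please email john.smith@company.com, then call John Smith back.")
entities = {"John Smith": "NAME", "john.smith@company.com": "EMAIL"}
print(redact(text, entities))
print(redact("John Smith", entities, mode="MASK"))
```

Because both mentions of the name map to the same marker, a downstream LLM can still tell that the caller and the person to call back are the same individual without seeing the raw value.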
