Blog

Technical guides, customer stories, and product updates

Speech-To-Text

OpenAI Whisper vs Google STT vs Amazon Transcribe: the ASR rundown (2026 edition)

Speech recognition has always been a crowded space. But in the last few years, the models have gotten faster, cheaper, and smarter. New architectures have entered the picture. And the baseline expectation for what "good enough" looks like has shifted dramatically.

Speech-To-Text

Best open-source speech-to-text models in 2026

TL;DR: The open-source ASR landscape has shifted dramatically in the last few years. DeepSpeech is discontinued, Kaldi is legacy, and a new generation of models (NVIDIA Canary-Qwen, Qwen3-ASR, Parakeet, and Moonshine) now competes with or surpasses commercial APIs on standard accuracy benchmarks. But benchmark WER and production performance are not the same thing, especially for conversational audio. This guide covers the 8 best open-source speech-to-text models in 2026, with benchmarks, architecture details, and honest deployment considerations.

Case Studies

How Gladia's multilingual audio-to-text API supercharges Carv's AI for recruiters

In today's professional landscape, a recruiter's average workday is a perpetual cycle of administrative tasks, alternating with intake calls with hiring managers and interviews with candidates. And while recruiters enjoy connecting with hiring managers and candidates, there’s an almost universal disdain for the administrative side of the job.

Speech-To-Text

What is ASR & how do speech recognition models work?

Automatic speech recognition (ASR) is a cornerstone of many business applications in domains ranging from call centers to smart device engineering. At their core, ASR models, also referred to as Speech-to-Text (STT) models, recognize human speech and convert it into written text.

Speech-To-Text

Fine-tuning ASR models: Key definitions, mechanics, and use cases

Many modern AI models are built for general-purpose applications and require fine-tuning for domain-specific tasks. The fine-tuning process involves taking an existing model and training it further on domain-specific data. The additional training allows the model to understand the new data and improve its performance in a particular field.

Tutorials

Building a song transcription system with profanity filter using Whisper, GPT 3.5 and Spleeter

Music streaming first gained popularity in 1999 with the founding of Napster, one of the pioneering streaming platforms. Millions of songs were available to listen to and download for free over the internet through the platform. One no longer needed to buy pre-recorded tapes, go to live shows, or tune into radio stations to listen to music.

Case Studies

AI-powered healthcare assistant enhances medical transcription by 120% with Gladia

Medical transcription is among the most critical and challenging verticals for ASR models to date.

Product News

What is summarization?

Summarization in speech-to-text (STT) AI is a popular feature that streamlines the extraction of essential information from spoken content. By condensing lengthy audio recordings or live conversations into concise summaries, STT summarization improves the user experience and helps end users understand content and make decisions more quickly.

Case Studies

Opening up new markets for a sales meeting and CRM enrichment platform: Spoke's success story with Gladia

In the past, sales teams around the world were presented with a twofold challenge. In addition to showcasing their products in the best light to prospects, they needed to take detailed notes during the call and fill their CRM software manually afterward.