Blog

Technical guides, customer stories, and product updates

Speech-To-Text

The evolution and impact of Speech AI: An in-depth conversation with Gladia's CEO Jean-Louis

Once in a while, we like to zoom out from our day-to-day work to reflect on the bigger trends affecting our customers and, ultimately, adapt our product accordingly. Today, what are the key shifts happening in voice-first platforms, and how can speech recognition help them navigate these shifts?

Speech-To-Text

AI Model Biases: What went wrong with Whisper by OpenAI?

When you start working with an AI model, however powerful, you can never be 100% sure of what will happen with it in practice. We've worked with Whisper ASR by OpenAI since its release in 2022 – and what we discovered is nothing short of surprising.

Speech-To-Text

Enhancing CX with AI: Key trends to watch in 2024

AI is transforming contact centers at an accelerating pace. Speech AI technologies are at the forefront of this revolution, enabling companies to provide better customer experiences through a combination of advanced agent-assist techniques and fully automated interactions that feel natural and human-like.

Case Studies

How VEED is streamlining video editing and subtitles with AI transcription

User-generated content has become a cornerstone of the internet-driven economy. As part of this shift, various platforms have emerged to provide easy-to-use tools to create high-quality video content in a matter of minutes — with AI transcription playing a foundational role in their product development.

Tutorials

How to build a speaker identification system for recorded online meetings

Virtual meeting recordings are increasingly used as a source of valuable business knowledge. However, given the large amount of audio data that companies produce in meetings, getting the full value out of these recordings can be tricky.

Speech-To-Text

OpenAI Whisper vs Google STT vs Amazon Transcribe: the ASR rundown (2026 edition)

Speech recognition has always been a crowded space. But in the last few years, the models have gotten faster, cheaper, and smarter. New architectures have entered the picture. And the baseline expectation for what "good enough" looks like has shifted dramatically.

Speech-To-Text

Best open-source speech-to-text models in 2026

TL;DR: The open-source ASR landscape has shifted dramatically in the last few years. DeepSpeech is discontinued, Kaldi is legacy, and a new generation of models — NVIDIA Canary-Qwen, Qwen3-ASR, Parakeet, and Moonshine — now compete with or surpass commercial APIs on standard accuracy benchmarks. But benchmark WER and production performance are not the same thing, especially for conversational audio. This guide covers the 8 best open-source speech-to-text models in 2026, with benchmarks, architecture details, and honest deployment considerations.

Case Studies

How Gladia's multilingual audio-to-text API supercharges Carv's AI for recruiters

In today's professional landscape, the average workday of a recruiter is a perpetual cycle of administrative tasks, interspersed with intake calls with hiring managers and interviews with candidates. And while recruiters enjoy connecting with hiring managers and candidates, there’s an almost universal disdain for the administrative side of the job.

Speech-To-Text

What is ASR & how do speech recognition models work?

Automatic speech recognition (ASR) is a cornerstone of many business applications in domains ranging from call centers to smart device engineering. At their core, ASR models, also referred to as speech-to-text (STT) models, recognize human speech and convert it into written text.