Blog

Technical guides, customer stories, and product updates

Case Studies

How VEED is streamlining video editing and subtitles with AI transcription

User-generated content has become a cornerstone of the internet-driven economy. As part of this shift, various platforms have emerged to provide easy-to-use tools to create high-quality video content in a matter of minutes — with AI transcription playing a foundational role in their product development.

Tutorials

How to build a speaker identification system for recorded online meetings

Recordings of virtual meetings are increasingly used as a source of valuable business knowledge. However, given the large volume of audio data that company meetings produce, getting the full value out of those recordings can be tricky.

Speech-To-Text

OpenAI Whisper vs Google STT vs Amazon Transcribe: the ASR rundown (2026 edition)

Speech recognition has always been a crowded space. But in the last few years, the models have gotten faster, cheaper, and smarter. New architectures have entered the picture. And the baseline expectation for what "good enough" looks like has shifted dramatically.

Speech-To-Text

Best open-source speech-to-text models in 2026

TL;DR: The open-source ASR landscape has shifted dramatically in the last few years. DeepSpeech is discontinued, Kaldi is legacy, and a new generation of models — NVIDIA Canary-Qwen, Qwen3-ASR, Parakeet, and Moonshine — now compete with or surpass commercial APIs on standard accuracy benchmarks. But benchmark WER and production performance are not the same thing, especially for conversational audio. This guide covers the 8 best open-source speech-to-text models in 2026, with benchmarks, architecture details, and honest deployment considerations.

Case Studies

How Gladia's multilingual audio-to-text API supercharges Carv's AI for recruiters

In today's professional landscape, the average workday of a recruiter is characterized by a perpetual cycle of administrative tasks, alternating with intake calls with hiring managers and interviews with candidates. And while recruiters enjoy connecting with hiring managers and candidates, there’s an almost universal disdain for the administrative side of the job.

Speech-To-Text

What is ASR & how do speech recognition models work?

Automatic speech recognition (ASR) is a cornerstone of many business applications in domains ranging from call centers to smart device engineering. At their core, ASR models, also referred to as Speech-to-Text (STT), intelligently recognize human speech and convert it into a written format.

Speech-To-Text

Fine-tuning ASR models: Key definitions, mechanics, and use cases

Many modern AI models are built for general-purpose applications and require fine-tuning for domain-specific tasks. The fine-tuning process involves taking an existing model and training it further on domain-specific data. The additional training allows the model to understand the new data and improve its performance in a particular field.

Tutorials

Building a song transcription system with profanity filter using Whisper, GPT 3.5 and Spleeter

Music streaming first gained popularity in 1999 with the founding of Napster, one of the pioneering streaming platforms. Millions of songs were available to listen to and download for free over the internet through the platform. One no longer needed to buy pre-recorded tapes, go to live shows, or tune into radio stations to listen to music.

Case Studies

AI-powered healthcare assistant enhances medical transcription by 120% with Gladia

Medical transcription is among the most critical and challenging verticals for ASR models to date.