Open benchmark for speech-to-text

We evaluated Gladia Solaria against 8 leading providers across 7 datasets and 74 hours of audio. The full methodology is open-sourced so results can be independently reproduced.


ALL RESULTS AT A GLANCE

WER comparison across datasets

Word error rate (WER) counts word substitutions, deletions, and insertions relative to the length of the reference transcript. Lower WER is better.

OPEN METHODOLOGY

How we benchmark

7 evaluation datasets

74+ hours of audio

8 providers compared against Gladia Solaria

Each audio file was sent to every provider's production API using default settings. No custom model tuning or prompt engineering was applied. All providers were tested on identical audio files.

Transcription outputs were normalized using the OpenAI Whisper text normalizer before WER computation. Diarization Error Rate (DER) is measured on the DIHARD III challenge datasets using standard protocols.
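
As a rough illustration of the normalize-then-score step, here is a minimal Python sketch. It is not the benchmark's actual code: the `normalize` function below is a simplified, hypothetical stand-in for the OpenAI Whisper text normalizer (it only lowercases, strips punctuation, and collapses whitespace), and `word_error_rate` is an illustrative helper that computes WER as word-level edit distance divided by the reference length.

```python
import re


def normalize(text: str) -> str:
    """Simplified stand-in for a transcript normalizer (assumption, not Whisper's)."""
    text = text.lower()
    # Replace anything that is not a word character, whitespace, or apostrophe.
    text = re.sub(r"[^\w\s']", " ", text)
    # Collapse runs of whitespace into single spaces.
    return " ".join(text.split())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = normalize(reference).split()
    hyp = normalize(hypothesis).split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Normalizing both sides first means differences in casing and punctuation are not counted as errors, so the score reflects only the words themselves.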

The full benchmarking framework is open-sourced to enable transparent, reproducible evaluation of speech recognition systems.

Transparent benchmarks, open source

Full methodology and evaluation framework available. Reproduce every result independently.

