Benchmarks

Open benchmark for speech-to-text

We evaluated Gladia Solaria against 8 leading providers across 7 datasets and more than 74 hours of audio. The full methodology is open source, so every result can be independently reproduced.

ALL RESULTS AT A GLANCE

WER comparison across datasets

Lower WER is better. Filter by dataset to focus on what matters to you.

OPEN METHODOLOGY

How we benchmark

7 evaluation datasets · 74+ hours of audio · 8 providers compared

Each audio file was sent to every provider's production API using default settings: all providers were tested on identical audio files, and no custom model tuning or prompt engineering was applied.
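
In code, the collection step amounts to the loop below. This is a minimal sketch, not the actual framework: fake_provider() is a hypothetical stand-in for each vendor's real SDK or REST client, and the datasets directory layout is assumed for illustration.

```python
# Sketch of the collection loop. The provider clients here are hypothetical
# stand-ins, not any vendor's real API; each would wrap one production
# endpoint called with default settings.
from pathlib import Path
from typing import Callable, Dict

def fake_provider(name: str) -> Callable[[bytes], str]:
    """Stand-in for one provider's production API client (default settings)."""
    def transcribe(audio: bytes) -> str:
        return f"<transcript from {name}>"
    return transcribe

PROVIDERS: Dict[str, Callable[[bytes], str]] = {
    "provider_a": fake_provider("provider_a"),
    "provider_b": fake_provider("provider_b"),
}

results: Dict[tuple, str] = {}
for audio_path in sorted(Path("datasets").rglob("*.wav")):
    audio = audio_path.read_bytes()  # identical bytes go to every provider
    for name, transcribe in PROVIDERS.items():
        # Default settings only: no model tuning, no prompt engineering.
        results[(name, str(audio_path))] = transcribe(audio)
```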

Before computing WER, transcription outputs were normalized with the OpenAI Whisper text normalizer so that punctuation, casing, and formatting differences are not counted as errors. Diarization Error Rate (DER) was measured on the DIHARD III challenge datasets using standard protocols.
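
The WER side of the scoring step can be reproduced with the openly available Whisper normalizer and the jiwer package; the reference and hypothesis strings below are illustrative.

```python
from whisper.normalizers import EnglishTextNormalizer
import jiwer

normalizer = EnglishTextNormalizer()

reference = "The quick brown fox, jumps over the lazy dog!"
hypothesis = "the quick brown fox jumps over a lazy dog"

# Normalizing both sides keeps punctuation and casing differences
# from being counted as word errors.
wer = jiwer.wer(normalizer(reference), normalizer(hypothesis))
# WER = (substitutions + deletions + insertions) / reference word count
print(f"WER: {wer:.3f}")
```

DER can be computed with pyannote.metrics, whose defaults (no forgiveness collar, overlapping speech included) match the DIHARD scoring convention; the segments and speaker labels here are made up for illustration.

```python
from pyannote.core import Annotation, Segment
from pyannote.metrics.diarization import DiarizationErrorRate

# Ground-truth speaker turns.
reference = Annotation()
reference[Segment(0.0, 5.0)] = "spk_a"
reference[Segment(5.0, 9.0)] = "spk_b"

# System output; labels need not match the reference, since DER
# finds the optimal speaker mapping before counting errors.
hypothesis = Annotation()
hypothesis[Segment(0.0, 4.5)] = "s1"
hypothesis[Segment(4.5, 9.0)] = "s2"

metric = DiarizationErrorRate()
print(f"DER: {metric(reference, hypothesis):.3f}")
```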

The full benchmarking framework is open-sourced to enable transparent, reproducible evaluation of speech recognition systems.

Transparent benchmarks, open source

The full methodology and evaluation framework are available. Reproduce every result independently.