We evaluated Gladia Solaria against 8 leading providers across 7 datasets and 74 hours of audio. The full methodology is open-sourced so results can be independently reproduced.
Lower WER (Word Error Rate) is better.
Each audio file was sent to every provider's production API using default settings. No custom model tuning or prompt engineering was applied. All providers were tested on identical audio files.
Transcription outputs were normalized using the OpenAI Whisper text normalizer before WER computation. Diarization Error Rate (DER) is measured on the DIHARD III challenge datasets using standard protocols.
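To make the scoring step concrete, here is a minimal sketch of normalization followed by WER computation. The `normalize` function is a simplified stand-in for the OpenAI Whisper English text normalizer (lowercasing, punctuation stripping, whitespace collapsing), not the actual normalizer used in the benchmark.

```python
import re

def normalize(text: str) -> str:
    # Simplified stand-in for the Whisper text normalizer:
    # lowercase, strip punctuation, collapse whitespace.
    text = text.lower()
    text = re.sub(r"[^\w\s]", "", text)
    return re.sub(r"\s+", " ", text).strip()

def wer(reference: str, hypothesis: str) -> float:
    # WER = word-level edit distance / number of reference words.
    ref = normalize(reference).split()
    hyp = normalize(hypothesis).split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("Hello, world!", "hello word"))  # 0.5 (one substitution over two words)
```

Normalizing both reference and hypothesis before scoring prevents punctuation and casing differences between providers from inflating their error rates.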
The full benchmarking framework is open-sourced to enable transparent, reproducible evaluation of speech recognition systems.