We evaluated Solaria-1 against 8 leading providers across 8 datasets and 74 hours of audio. The full methodology is open-sourced so results can be independently reproduced.
Lower word error rate (WER) is better. Filter by dataset to focus on what matters to you.
Each audio file is sent to every provider's production API with default settings. Solaria-3, Solaria-1, and every competitor are tested on identical files — no custom tuning or prompt engineering.
Transcription outputs are normalized using the OpenAI Whisper text normalizer before WER computation. Diarization Error Rate (DER) is measured on DIHARD III challenge datasets using standard protocols.
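As a rough illustration of the normalize-then-score step, here is a minimal sketch of WER as word-level edit distance divided by the number of reference words. The `normalize` function is a simplified stand-in (lowercasing and punctuation stripping only) and is not the OpenAI Whisper text normalizer; the names `normalize`, `word_edit_distance`, and `wer` are hypothetical and are not taken from the open-sourced framework.

```python
import re


def normalize(text: str) -> str:
    """Simplified stand-in normalizer (assumption: NOT the Whisper normalizer).

    Lowercases, replaces punctuation with spaces, and collapses whitespace.
    """
    text = text.lower()
    text = re.sub(r"[^\w\s']", " ", text)  # keep word characters and apostrophes
    return " ".join(text.split())


def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance over words: substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # distance from empty reference prefix
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate after normalizing both strings."""
    ref_words = normalize(reference).split()
    hyp_words = normalize(hypothesis).split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(ref_words, hyp_words) / len(ref_words)


if __name__ == "__main__":
    # One dropped word out of six reference words.
    print(wer("The cat sat on the mat.", "the cat sat on mat"))
```

Normalizing both sides first means that differences in casing or punctuation do not count as errors, so the score reflects only the words themselves.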
The full framework is open-sourced on GitHub. Real customer audio is Gladia's internal production dataset, annotated by humans. Soniox and Pipecat STT Benchmark are excluded on some datasets pending data availability.
Full methodology, evaluation framework, and benchmark report. Reproduce every result independently.