Best on real English audio
The truest test of a speech model isn't a curated benchmark; it's real customer calls. On Gladia's internal English dataset, drawn from real production recordings and annotated by humans, Solaria-3 reaches a 9.6% word error rate (WER), at the very top of the field and 26% better than Solaria-1.
"hello ladies and gentlemen thank you for standing by for cugen's third quarter twenty twenty one earnings…"
"Hello, ladies and gentlemen. Thank you for standing by. Qudian's third quarter 2021 earnings conference…"
Company name mangled, numbers written as words. On a 15-minute call, errors compound.
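For readers who want to see how a WER figure like the one above is typically computed, here is a minimal sketch: the word-level edit distance (substitutions, deletions, insertions) between a reference and a hypothesis, divided by the number of reference words. The `normalize` helper and its rules (lowercasing, stripping punctuation) are assumptions made for illustration, not Gladia's evaluation pipeline, and treating the second transcript above as the reference is likewise an assumption.

```python
import re


def normalize(text: str) -> list[str]:
    """Lowercase, replace punctuation (except apostrophes) with spaces, split into words.

    This is an assumed, deliberately simple normalizer; real evaluations
    usually also canonicalize numbers, spellings, and contractions.
    """
    return re.sub(r"[^\w\s']", " ", text.lower()).split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Classic Levenshtein dynamic programme over words, one row at a time.
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # The two excerpts from the page, with the trailing ellipsis dropped.
    reference = ("Hello, ladies and gentlemen. Thank you for standing by. "
                 "Qudian's third quarter 2021 earnings")
    hypothesis = ("hello ladies and gentlemen thank you for standing by for "
                  "cugen's third quarter twenty twenty one earnings")
    print(f"WER on this excerpt: {word_error_rate(reference, hypothesis):.1%}")
```

Note that with this simple normalizer "2021" and "twenty twenty one" count as errors even though they mean the same thing, so the choice of normalization materially affects any reported WER.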