Solaria-3 ASR Benchmark Results — Word Error Rate (WER %, lower is better)

Methodology: benchmarks run on the same framework as the Solaria-1 campaign. Each provider tested on identical audio files via production APIs with default settings. Real customer audio = Gladia internal production dataset, annotated by humans. Soniox and Pipecat STT Benchmark excluded pending data availability. Full open benchmark methodology: https://github.com/gladiaio/normalization

Columns: WER = Word Error Rate (%, lower is better) · Rank = position among tested providers on that dataset

Real customer audio — English (Gladia internal production dataset)

Provider              WER (%)  Rank
Solaria-3             9.6      #1
ElevenLabs Scribe v2  9.9      #2
AssemblyAI            10.0     #3
Deepgram Nova-3       10.7     #4
Mistral Voxtral       12.2     #5
Solaria-1             12.9     #6

Earnings22 Cleaned AA — Financial Calls (Curated by Artificial Analysis)

Provider              WER (%)  Rank
Solaria-3             6.4      #1
AssemblyAI            6.9      #2
ElevenLabs Scribe v2  7.7      #3
Speechmatics          7.8      #4
Mistral Voxtral       7.9      #5
Solaria-1             8.1      #6
Deepgram Nova-3       12.0     #7

Switchboard — Conversational Speech

Provider              WER (%)  Rank
Solaria-3             33.9     #1
Solaria-1             37.3     #2
AssemblyAI            42.3     #3
Speechmatics          46.0     #4
Mistral Voxtral       48.1     #5
Deepgram Nova-3       49.8     #6
ElevenLabs Scribe v2  55.2     #7

Noisy audio — Degraded production audio

Provider              WER (%)  Rank
Mistral Voxtral       1.0      #1
Solaria-3             1.4      #2
Solaria-1             1.9      #3
Speechmatics          1.9      #3
AssemblyAI            2.1      #5
Deepgram Nova-3       3.2      #6
ElevenLabs Scribe v2  4.0      #7
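The tie at #3 followed by #5 is consistent with standard competition ranking, where tied scores share a rank and the next rank skips ahead. The page does not state its tie rule, so the sketch below is an assumption that merely reproduces the ranks shown for the noisy-audio dataset; the function name `competition_rank` is hypothetical.

```python
def competition_rank(scores):
    """Rank providers by WER, ascending. Tied scores share a rank and the
    following rank is skipped (assumed tie rule, not stated on the page)."""
    ordered = sorted(scores.items(), key=lambda item: item[1])
    ranks = {}
    prev_wer, prev_rank = None, 0
    for position, (name, wer) in enumerate(ordered, start=1):
        # Reuse the previous rank on a tie, otherwise use the 1-based position.
        rank = prev_rank if wer == prev_wer else position
        ranks[name] = rank
        prev_wer, prev_rank = wer, rank
    return ranks


# WER values copied from the noisy-audio table above.
noisy_audio_wer = {
    "Mistral Voxtral": 1.0,
    "Solaria-3": 1.4,
    "Solaria-1": 1.9,
    "Speechmatics": 1.9,
    "AssemblyAI": 2.1,
    "Deepgram Nova-3": 3.2,
    "ElevenLabs Scribe v2": 4.0,
}
ranks = competition_rank(noisy_audio_wer)
```

With this rule, Solaria-1 and Speechmatics both land at rank 3 and AssemblyAI at rank 5, matching the table.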

Solaria-3 vs Solaria-1 — European language WER improvement (negative = Solaria-3 better)

Language      Real customer audio  Common Voice 24
English (EN)  −26%                 −16%
French (FR)   −18%                 −19%
Italian (IT)  −10%                 −12%
Spanish (ES)  −9%                  ≈ flat
German (DE)   −3%                  −13%

Solaria-3 vs Solaria-1 — accuracy by benchmark (WER %, lower is better)

Benchmark                 Solaria-3 WER (%)  Solaria-1 WER (%)  Solaria-3 rank / note
Earnings22 Cleaned AA     6.4                8.1                #1 (−21% vs Solaria-1)
Switchboard               33.9               37.3               #1 (−9% vs Solaria-1)
Noisy audio               1.4                1.9                #2 (−26% vs Solaria-1)
Common Voice 24           6.9                8.2                −16% vs Solaria-1
FLEURS                    3.7                3.9                −5% vs Solaria-1
VoxPopuli                 2.9                2.2                Regression: +32% vs Solaria-1 (formal parliamentary speech)
Multilingual LibriSpeech  8.0                5.9                Regression: +36% vs Solaria-1 (clean read-speech)

New model

Solaria-3

Solaria-3 is built for production audio — noisy, fast-paced, and conversational. Best-in-class on real customer recordings in English and core European languages, with higher precision on the names, terms, and entities that matter most in business scenarios.

$200 worth of hours on us with TRY-SOLARIA-3 *

Try Solaria-3 for free

#1 on English production audio with real customer calls
#1 on business audio & conversational call center speech
5 languages with superior performance: EN, FR, DE, ES, IT

Why Solaria-3

Accurate where it counts

Solaria-3 excels with the challenging production audio that breaks other models.

01

Best on real English audio

The truest test of a speech model isn't a curated benchmark — it's real customer calls. On Gladia's internal English dataset, drawn from real production recordings annotated by humans, Solaria-3 hits 9.6% WER — at the very top of the field, and 26% better than Solaria-1.

Deepgram — 10.7% WER

"hello ladies and gentlemen thank you for standing by for cugen's third quarter twenty twenty one earnings…"

Solaria-3 — 4.2% WER

"Hello, ladies and gentlemen. Thank you for standing by. Qudian's third quarter 2021 earnings conference…"

Company name mangled, numbers written as words. On a 15-minute call, errors compound.

02

#1 on business calls and telephone speech

Production audio is never clean. Call center recordings, mobile interviews, field audio — they all carry background noise, compression artefacts, and variable microphone quality. Solaria-3 handles them all, with leading WER on Earnings22 and Switchboard.

Solaria-1, Speechmatics, ElevenLabs — 100% WER

"Yeah, not even that much, probably. Well, that would be a good time."

Solaria-3 — 0.0% WER

"Yeah, not not even that much probably. Yeah."

On degraded phone audio, three providers hallucinate an entire sentence that was never spoken.

03

The most accurate model for European languages

Multilingual accuracy has been core to Gladia since day one. Solaria-3 delivers on that commitment with its strongest European performance yet, tested on real customer calls across English, French, German, Spanish, and Italian.

Solaria-1 & Mistral — 50% WER

"Thus the bison tens were focused to fight lone."

Solaria-3 — 0.0% WER

"Thus, the Byzantines were forced to fight alone."

Three independent errors on an 8-word sentence. A proper noun, a verb, and an adverb — all wrong at once.
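As a rough illustration of how the 50% figure can arise, the sketch below computes WER as word-level edit distance (substitutions, insertions, deletions) divided by the number of reference words. It assumes that the Solaria-3 output is identical to the human reference (it is reported at 0.0% WER) and uses a simplified normalization (lowercasing, punctuation stripped) that stands in for, and is not, the normalization used in the benchmark. The helper names `normalize` and `word_error_rate` are hypothetical.

```python
import re


def normalize(text):
    """Lowercase and drop punctuation, then split into words.
    Simplified stand-in for a real ASR text normalizer (assumption)."""
    return re.sub(r"[^\w\s']", "", text.lower()).split()


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


reference = "Thus, the Byzantines were forced to fight alone."
hypothesis = "Thus the bison tens were focused to fight lone."
wer = word_error_rate(reference, hypothesis)  # 4 edits / 8 words = 0.5
```

Under these assumptions, the three error sites cost four word-level edits: "Byzantines" becoming "bison tens" counts as one substitution plus one insertion, and "focused" and "lone" are one substitution each. Four edits over eight reference words gives 50%.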

Benchmarks

The numbers, unfiltered

WER by language. Lower is better. Results include regressions — we publish the full picture.

Solaria-3 vs. Solaria-1 — all European languages

Relative WER improvement. Negative = better.

Language              Real customer audio  Common Voice 24
English (EN), launch  −26%                 −16%
French (FR), launch   −18%                 −19%
Italian (IT)          −10%                 −12%
Spanish (ES)          −9%                  ≈ flat
German (DE)           −3%                  −13%

Launch languages: EN and FR. Real customer audio = Gladia's internal production dataset, annotated by humans.

Solaria-1

Where Solaria-1 is still stronger

Solaria-3 steps back on two benchmarks vs. Solaria-1: VoxPopuli Cleaned AA (+32% — formal parliamentary speech) and Multilingual LibriSpeech (+36% — clean read-speech). We're publishing this openly. If formal institutional speech or clean read-speech is your primary use case, Solaria-1 remains the better choice.

Compare

Solaria-1 vs. Solaria-3 — Which one is right for you

Two models, two jobs. Solaria-3 for real-world European audio quality. Solaria-1 for full breadth — 100+ languages, code-switching, formal speech.

                          Solaria-3                                                                   Solaria-1
Best for
Primary use case          Highest accuracy on European real-world audio                               Maximum language coverage across any domain
Recommended for           Business audio, call centers, noisy recordings, real-world European speech  Global multilingual, rare languages, clean read-speech, formal/institutional audio
Language coverage
Supported languages       Optimized for EN, FR, DE, ES, IT                                            100+ languages incl. 42 exclusive to Gladia
Code-switching            Limited                                                                     Supported
Auto language detection   Supported                                                                   Supported
Accuracy (WER, lower is better)
Earnings22 Cleaned AA     #1, 6.4% (−21%)                                                             8.1%
Switchboard               #1, 33.9% (−9%)                                                             37.3%
Noisy audio               #2, 1.4% (−26%)                                                             1.9%
Common Voice 24           6.9% (−16%)                                                                 8.2%
FLEURS                    3.7% (−5%)                                                                  3.9%
VoxPopuli                 2.9% (+32%)                                                                 2.2%
Multilingual LibriSpeech  8.0% (+36%)                                                                 5.9% (stronger)
Architecture & performance
On-premise deployment     Available                                                                   Available
Real-time streaming       Async only, for now                                                         <103 ms partials
Availability
Status                    Free for 5 days → GA                                                        Generally available

VoxPopuli and Multilingual LibriSpeech regressions published openly. For formal institutional speech or clean read-speech, Solaria-1 remains the better choice.
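The percentages in the comparison above read as relative changes in WER, not percentage-point differences. A minimal sketch, assuming the figures are computed as (Solaria-3 WER − Solaria-1 WER) / Solaria-1 WER and rounded to the nearest whole percent, reproduces every value in the table. The function name `relative_change_pct` is hypothetical.

```python
def relative_change_pct(new_wer, old_wer):
    """Relative WER change in percent, rounded to a whole number.
    Negative means the new model has lower (better) WER."""
    return round((new_wer - old_wer) / old_wer * 100)


# (Solaria-3 WER, Solaria-1 WER) pairs copied from the comparison table.
benchmarks = {
    "Earnings22 Cleaned AA": (6.4, 8.1),
    "Switchboard": (33.9, 37.3),
    "Noisy audio": (1.4, 1.9),
    "Common Voice 24": (6.9, 8.2),
    "FLEURS": (3.7, 3.9),
    "VoxPopuli": (2.9, 2.2),
    "Multilingual LibriSpeech": (8.0, 5.9),
}
changes = {
    name: relative_change_pct(solaria_3, solaria_1)
    for name, (solaria_3, solaria_1) in benchmarks.items()
}
```

For example, VoxPopuli moves from 2.2% to 2.9%, only 0.7 percentage points in absolute terms, yet that is a +32% relative regression because the baseline is so low.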

Try Solaria-3

$200 worth of hours on us with TRY-SOLARIA-3 *

Try it now

* TRY-SOLARIA-3 is redeemable once per account for $200 worth of hours of async transcription and must be used by June 21st at the latest. Make sure to select Solaria-3 as the model of choice. Full pricing applies after.