Transforming note-taking for students with AI transcription

Published on Nov 6, 2024

In recent years, fuelled by advancements in LLMs, the number of AI note-takers has skyrocketed. These apps are increasingly tailored to meet the unique needs of specific user groups, such as doctors, sales teams and project managers.

Coconote is focusing specifically on students. Leveraging the power of LLMs, their note-taking app can instantly transcribe, summarize, and contextualize notes from lectures.

Read on to find out why they chose Gladia as the primary transcription provider to power their fast-growing project as they expand globally.

About Coconote

Coconote provides an AI-powered note-taking and study assistance app designed for students.

The company focuses on enhancing accessibility to academic resources by offering features like lecture recording, instant transcription, organized notes, summaries, flashcards, and quizzes.

Their primary target audience is college students, with some reach in high school and continuing education.

Available for mobile and web users, the app first launched in the US in April 2024, reaching over 400k downloads in less than 6 months, and is now expanding to other geographies, starting with Japan.

The project's ultimate goal is to democratize access to academic support tools, making high-quality study assistance available to people from varied socioeconomic backgrounds.

Challenge

The Coconote app addresses the challenge of manual note-taking while in class, which has traditionally distracted students from actively listening and learning in lectures.

By automating the capture and organization of lecture content, Coconote allows students to engage more deeply during class without the worry of missing key points or struggling to keep up with fast-paced lectures.

To succeed in this mission, Coconote uses transcription as the backbone of the app. The app transcribes and organizes recorded lectures (in both audio and video formats) into structured notes, summaries, and study aids.

Gladia serves as the main transcription provider, handling recordings from diverse academic subjects in multiple languages.

Objectives

The goal was to deploy a top-tier transcription API to power the Coconote app, with the following characteristics:

  • Ability to effectively capture speech under varying audio conditions, such as echo-laden and noisy lecture halls;
  • Transcription accuracy and adaptability to specific terminologies across various academic disciplines;
  • A truly multilingual approach to asynchronous transcription and translation, with core features and add-ons all available in over 100 languages.

Solution

With Gladia, the Coconote team was able to implement:

  • Instant transcription for lecture recordings and URLs;
  • Translation to and from 100 languages;
  • Organized notes per subject matter;
  • Summaries, flashcards, and quizzes, generated automatically based on the notes.

Impact

By working with the Gladia team to iterate and scale up, the Coconote team saw a noticeable impact on their rapidly growing user base and retention, with hundreds of happy users reporting remarkable performance from the app’s note-taking capabilities.

  • 4.8 star rating on App Store
  • 420,000+ signups
  • 300,000+ Instagram followers, including celebrity parents like Joe Rogan and Patrick Dempsey

The team at Coconote is just starting to explore the possibilities that Gladia’s transcription brings to their users, and is already considering how to use insights from data to tailor the app even more closely to those users' needs.


About Gladia

Gladia provides a speech-to-text and audio intelligence API for building virtual meeting and note-taking apps, call center platforms, and media products. It offers transcription, translation, and insights powered by best-in-class ASR, LLMs and GenAI models.

Having read this case study, do you feel like Gladia could be the right fit for your business too?

Don't hesitate to contact our sales team to explore this in more detail, and follow us on X and LinkedIn.
