Blog

Technical guides, customer stories, and product updates

Case Studies

How VEED is streamlining video editing and subtitles with AI transcription

User-generated content has become a cornerstone of the internet-driven economy. As part of this shift, various platforms have emerged that offer easy-to-use tools for creating high-quality video content in a matter of minutes, with AI transcription playing a foundational role in their product development.

Tutorials

How to build a speaker identification system for recorded online meetings

Virtual meeting recordings are increasingly used as a source of valuable business knowledge. However, given the sheer volume of audio that companies produce in meetings, getting the full value out of recorded meetings can be tricky.

Speech-To-Text

Word error rate (WER): Definition, & can you trust this metric?

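Word Error Rate (WER) is a metric that evaluates the performance of ASR systems by measuring the accuracy of speech-to-text results: it counts the word substitutions, deletions, and insertions needed to turn a system's transcript into a reference transcript, then divides that count by the number of words in the reference. The WER metric allows developers, scientists, and researchers to assess ASR performance. A lower WER indicates better ASR performance, and a higher WER indicates worse performance. This assessment makes it possible to optimize ASR technologies over time and to compare speech-to-text models and providers for commercial use.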
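To make the definition concrete, here is a minimal sketch of WER computed as a word-level edit distance. It assumes both transcripts are already normalized (same casing, punctuation removed) and simply splits on whitespace; the function name `word_error_rate` is hypothetical and not part of any particular library.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: WER = (substitutions + deletions + insertions) / reference words.

    Assumes pre-normalized text; tokenizes by splitting on whitespace only.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the minimum number of edits turning ref[:i-1] into hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # turning ref[:i] into an empty hypothesis costs i deletions
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr

    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

In the example, the hypothesis needs one substitution and one deletion to match the six-word reference, so the WER is 2/6, or about 0.333. Because insertions are counted too, WER is not capped at 1: a short reference paired with a long, wrong hypothesis can score above 100%.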

Speech-To-Text

OpenAI Whisper vs Google Speech-to-Text vs Amazon Transcribe: The ASR rundown

Speech recognition models and APIs are crucial for building apps in various industries, including healthcare, customer service, online meetings, and entertainment.

Speech-To-Text

Best open-source speech-to-text models

Automatic speech recognition, also known as speech-to-text (STT), has been around for decades, but advances over the last two decades in both hardware and software, especially in artificial intelligence, have made the technology more robust and accessible than ever before.

Case Studies

How Gladia's multilingual audio-to-text API supercharges Carv's AI for recruiters

In today's professional landscape, a recruiter's average workday is a perpetual cycle of administrative tasks, interspersed with intake calls with hiring managers and interviews with candidates. And while recruiters enjoy connecting with hiring managers and candidates, there’s an almost universal disdain for the administrative side of the job.

Speech-To-Text

What is ASR & how do speech recognition models work?

Automatic speech recognition (ASR) is a cornerstone of many business applications in domains ranging from call centers to smart device engineering. At their core, ASR models, also referred to as speech-to-text (STT) models, recognize human speech and convert it into written text.

Speech-To-Text

Fine-tuning ASR models: Key definitions, mechanics, and use cases

Many modern AI models are built for general-purpose applications and require fine-tuning for domain-specific tasks. The fine-tuning process involves taking an existing model and training it further on domain-specific data. This additional training allows the model to adapt to the new data and improve its performance in a particular field.

Tutorials

Building a song transcription system with a profanity filter using Whisper, GPT-3.5, and Spleeter

Music streaming first gained popularity in 1999 with the founding of Napster, one of the pioneering streaming platforms. Through the platform, millions of songs were available to listen to and download for free over the internet. One no longer needed to buy pre-recorded tapes, go to live shows, or tune into radio stations to listen to music.