Blog

Tutorials

Integrating Gladia audio transcription API with Make for workflow automation

Streamline your workflow by integrating Gladia with Make through Eden AI. This step-by-step guide walks you through the setup so you can automate audio transcription in your tasks.

Speech-To-Text

A review of the best ASR engines and the models powering them in 2024

Automatic Speech Recognition (ASR), also known as speech-to-text or audio transcription, is a technology that converts spoken language stored in an audio or video file into written text.

Case Studies

Mastering AI transcription for social media captions: Mojo's success story with Gladia

From Reels to ads to YouTube Shorts, vertical, bite-size video on social media is becoming one of the primary ways we interact with the world, for both leisure and business.

Tutorials

Transcribing long audios with Whisper using Python and Gladia API

The Whisper ASR model released by OpenAI produces strong transcriptions from audio files, but it doesn't come without challenges. In addition to high computational requirements and costs, Whisper comes with a limit of 25 MB and 30 seconds in duration on input audio files, so larger audio files usually have to be split into chunks before they can be transcribed.
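To make the chunking idea concrete, here is a minimal sketch of how the chunk boundaries for a long recording could be computed. The function name `chunk_spans` and the 30-second default are illustrative assumptions taken from the limit mentioned above; this is not Gladia's or OpenAI's implementation, and it only computes offsets rather than cutting any audio.

```python
def chunk_spans(total_seconds: float, chunk_seconds: float = 30.0) -> list[tuple[float, float]]:
    """Return (start, end) offsets in seconds that cover a recording.

    Hypothetical helper for illustration: every span is at most
    `chunk_seconds` long, and the last span may be shorter.
    """
    if total_seconds < 0 or chunk_seconds <= 0:
        raise ValueError("total_seconds must be >= 0 and chunk_seconds must be > 0")

    total = float(total_seconds)
    spans: list[tuple[float, float]] = []
    start = 0.0
    while start < total:
        # Clamp the final chunk to the end of the recording.
        end = min(start + chunk_seconds, total)
        spans.append((start, end))
        start = end
    return spans


if __name__ == "__main__":
    # A 75-second recording yields two full chunks and one shorter one.
    for start, end in chunk_spans(75):
        print(f"{start:.1f}s -> {end:.1f}s")
```

Each span could then be passed to an audio tool to cut the file before transcription. Cutting at fixed offsets can split a word in half, so overlapping the chunks slightly or cutting at silences is a common refinement.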

Speech-To-Text

Maximizing CRM enrichment with AI audio transcription

In today's fast-paced commercial environment, Customer Relationship Management (CRM) systems like Salesforce and HubSpot have become the backbone of successful customer success and sales strategies. Yet keeping CRMs up to date and in sync with the vast volumes of customer information generated daily remains a challenge.

Product News

Introducing Whisper-Zero

Today, we're thrilled to release Whisper-Zero, a new breakthrough ASR system. It is a complete rework of Whisper combined with multiple state-of-the-art models, using over 1.5 million hours of diverse audio, including phone-quality and noisy data from real-life environments.