Use case

Content & Media

Enhanced content creation and viewing experience

Create, edit, and distribute audio and video content more efficiently. Gladia API unlocks a number of features that optimize editing and subtitle creation, while improving content searchability, SEO rankings, and moderation.

podcasting
social media
video production
news outlets
media conglomerates
sports networks
educational media

Top features

Audio Indexing & NER

Index every transcribed audio and video file in your content library by topic and keyword for easy searchability and accessibility. Invaluable for companies that produce and distribute a large volume of content.
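As a rough illustration of keyword indexing, the sketch below builds a simple inverted index over transcripts that have already been produced. It does not call the Gladia API; the `transcripts` mapping, the media IDs, and the stopword list are all hypothetical and chosen only for the example.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stopword list for the example.
STOPWORDS = frozenset({"the", "a", "an", "and", "of", "to", "in", "for"})


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphanumeric tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z0-9']+", text.lower()) if t not in STOPWORDS]


def build_keyword_index(transcripts: dict[str, str]) -> dict[str, set[str]]:
    """Map each keyword to the set of media IDs whose transcript contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for media_id, text in transcripts.items():
        for token in tokenize(text):
            index[token].add(media_id)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the media IDs that contain every keyword in the query."""
    matches = [index.get(term, set()) for term in tokenize(query)]
    if not matches:
        return set()
    return set.intersection(*matches)


transcripts = {
    "ep1": "The budget for the marketing campaign",
    "ep2": "Marketing and growth strategy",
}
index = build_keyword_index(transcripts)
print(search(index, "marketing growth"))  # {'ep2'}
```

A production index would normally add stemming, phrase matching, and the topic labels returned by the transcription service, none of which are modeled here.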

Transcription

Transcribe podcasts and video content quickly and accurately to streamline editing and improve SEO scores. A variety of output formats optimized for subtitles is available. The word-level timestamp add-on is recommended for high-precision editing.
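To show why word-level timestamps matter for subtitles, here is a minimal sketch that groups timed words into SRT cues. The input shape (a list of dicts with `word`, `start`, and `end` in seconds) is an assumption made for this example, not the documented Gladia response format, and `max_words` is an arbitrary cue length.

```python
def format_srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: list[dict], max_words: int = 7) -> str:
    """Group timed words into numbered SRT cues of at most max_words words."""
    cues = []
    for index, offset in enumerate(range(0, len(words), max_words), start=1):
        chunk = words[offset:offset + max_words]
        text = " ".join(w["word"] for w in chunk)
        begin = format_srt_time(chunk[0]["start"])  # cue starts with its first word
        end = format_srt_time(chunk[-1]["end"])     # cue ends with its last word
        cues.append(f"{index}\n{begin} --> {end}\n{text}\n")
    return "\n".join(cues)


# Hypothetical word-level timestamps, for illustration only.
words = [
    {"word": "Hello", "start": 0.0, "end": 0.4},
    {"word": "world", "start": 0.5, "end": 1.2},
]
print(words_to_srt(words))
```

Splitting on a fixed word count keeps the sketch short; a real subtitle pipeline would more likely break cues on pauses, punctuation, or a maximum line length.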

Translation

Reach a truly global audience with built-in translation to and from 99 languages. Invaluable for dubbing and subtitles. Multi-language live transcription will be available soon. A must-have feature for any global media company.

Moderation

Identify and flag hate speech or other inappropriate and offensive verbal content according to pre-determined parameters, internal protocols, and external regulations.

Speech Analytics

Analyze speech patterns in audio and video content to identify keywords, topics, and themes. Gain in-depth insights into audience behavior and interests to optimize content creation and marketing strategies. Especially useful for companies that create and distribute large volumes of content.

Some stats on performance

41% boost in sales
746 hours saved processing calls
54% more informed decisions

Customized for your needs

Transcription

Gladia API uses automatic speech recognition technology to convert audio files, video files, or URLs to text. It transcribes 1 hour of audio in less than 60 seconds.

Diarization

Based on a proprietary algorithm, diarization automatically partitions an audio recording into segments corresponding to different speakers.

Topic classification

Categorizes content into one of 698 predefined topic categories for content indexing.

Sentiment analysis

Determines the sentiment or opinion behind a piece of audio, such as a conversation or dialogue, using natural language processing.

Speech moderation

Automatically identifies and flags hate speech or other inappropriate and offensive verbal content according to pre-determined parameters.

Emotion detection

Our emotion recognition system is built upon the latest research and aims to accurately identify and distinguish between 27 human emotions.

Pricing

Free

Perfect for developers, early-stage startups, and individuals

$0/month (10h/month included)

Pro

Designed to grow with scaling digital companies

$0.612/hour

+ $0.144/hour for live transcription

Enterprise

Custom plan tailored to the modern enterprise

Contact us
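For budgeting, the Pro rates above can be turned into a quick estimate. The sketch below assumes that live transcription is billed at the base rate plus the $0.144 surcharge; that reading of the pricing line, along with the function and constant names, is an assumption for illustration and should be checked against an actual quote.

```python
PRO_RATE_PER_HOUR = 0.612        # USD per hour, from the Pro plan above
LIVE_SURCHARGE_PER_HOUR = 0.144  # USD per hour, added for live transcription


def estimate_pro_cost(batch_hours: float, live_hours: float = 0.0) -> float:
    """Estimate a Pro bill in USD, rounded to cents.

    Assumes live hours cost the base rate plus the live surcharge.
    """
    if batch_hours < 0 or live_hours < 0:
        raise ValueError("hours must be non-negative")
    batch_cost = batch_hours * PRO_RATE_PER_HOUR
    live_cost = live_hours * (PRO_RATE_PER_HOUR + LIVE_SURCHARGE_PER_HOUR)
    return round(batch_cost + live_cost, 2)


# 100 hours of recorded audio plus 50 hours of live transcription.
print(estimate_pro_cost(100, live_hours=50))  # 99.0
```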

We initially attempted to host Whisper AI, which required significant effort to scale. Switching to Gladia's transcription service brought a welcome change.

Robin Lambert, CPO, Livestorm

Read more

Speech-To-Text

Should you trust WER?

Word Error Rate (WER) is a metric that evaluates the performance of ASR systems by measuring the accuracy of speech-to-text results. It allows developers, scientists, and researchers to assess ASR performance: the lower the WER, the better the performance. This assessment helps optimize ASR technologies over time and compare speech-to-text models and providers for commercial use.
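For readers who want to see the metric concretely, WER is conventionally computed as the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference transcript. The sketch below is a minimal version under simple assumptions: lowercasing and whitespace splitting stand in for the more careful text normalization that real benchmarks apply.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("the quick brown fox", "the quick brown box"))  # 0.25
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.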

Speech-To-Text

OpenAI Whisper vs Google Speech-to-Text vs Amazon Transcribe: The ASR rundown

Speech recognition models and APIs are crucial in building apps for various industries, including healthcare, customer service, online meetings, and entertainment.

Speech-To-Text

Best open-source speech-to-text models

Automatic speech recognition, also known as speech-to-text (STT), has been around for some decades, but the advances of the last two decades in both hardware and software, especially for artificial intelligence, made the technology more robust and accessible than ever before.