How Gladia's multilingual audio-to-text API supercharges Carv's AI for recruiters

Published on
May 2024

In today's professional landscape, a recruiter's average workday is a perpetual cycle of administrative tasks, punctuated by intake calls with hiring managers and interviews with candidates. And while recruiters enjoy connecting with hiring managers and candidates, there’s an almost universal disdain for the administrative side of the job.

Taking interview notes, writing job descriptions, drafting candidate profiles: these are just a few of the many admin tasks that demand precious time and attention, leaving recruiters with little opportunity to prioritize more candidate-centric initiatives.

And that’s where Carv comes in. The company aims to revolutionize recruiting by integrating AI into the interview process, saving recruiters hours that can now be spent on more impactful initiatives.

A key component in this workflow is capturing relevant data and insights from recruitment calls, for which Carv relies on multilingual AI transcription provided by Gladia.

About Carv

Carv is AI for recruiters, purpose-built to take over admin tasks related to intake calls & interviews. Their mission is to unburden recruiters by eliminating admin so they can prioritize the human aspect of hiring. Carv listens in on job intake calls and interviews and uses the context of those meetings to fully automate admin tasks, reducing time spent on tedious tasks from hours to minutes. 

Founded in Amsterdam in 2022, the company of 25 employees targets recruiting teams and staffing agencies around the globe.


As the recruitment landscape evolves, the need for efficient and streamlined hiring processes grows increasingly important. After all, good talent is hard to find, and providing recruiters with the tools necessary to build meaningful connections with the right candidates is paramount in getting the right people in.

Carv’s founders recognized the challenges faced by recruiters – the significant amount of time spent on repetitive administrative tasks rather than focusing on the human being in front of them. They asked themselves the question: How can we unburden the recruiter so that they can focus on the interactions that truly matter?

Leveraging the momentum created by the emergence of generative AI, the company enlisted the help of state-of-the-art Large Language Models (LLMs) and prompt engineering to address this very issue.

The result is a versatile AI platform that accompanies recruiters in every intake conversation and candidate interview.

Transcription, which serves to convert these meetings into input for LLMs, plays a key role in Carv's mission to optimize productivity for hiring teams. After all, without the right context and accurate data for the LLM, generating top-quality insights for recruiters to work with efficiently is next to impossible. 
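To make the idea of a transcript serving as LLM input more concrete, here is a minimal sketch in Python. It is purely illustrative: the `Utterance` shape and the `build_prompt` helper are hypothetical names chosen for this example, and they represent neither Gladia's response schema nor Carv's actual pipeline. It only shows how speaker-labelled text could be combined with a task instruction before being sent to a model.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    """One transcribed segment (hypothetical shape, not a real API schema)."""
    speaker: str
    text: str


def build_prompt(task: str, utterances: list[Utterance]) -> str:
    """Render the transcript as speaker-labelled lines and prepend the task."""
    transcript = "\n".join(f"{u.speaker}: {u.text.strip()}" for u in utterances)
    return f"{task}\n\nTranscript:\n{transcript}"


# Illustrative intake call; the content is made up for this example.
call = [
    Utterance("Hiring manager", "We need a backend engineer with Go experience."),
    Utterance("Recruiter", "Is the role remote or on-site?"),
    Utterance("Hiring manager", "Hybrid, two days a week in Amsterdam."),
]

prompt = build_prompt("Draft a job description from this intake call.", call)
print(prompt)
```

The sketch also illustrates why transcript quality matters: any word the transcription gets wrong ends up verbatim in the prompt the model works from.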

Choosing the right provider to establish this foundation was therefore paramount for Carv to ensure no nuance is lost in the process. 


Before switching to Gladia, Carv used another API provider that didn’t provide the range of language support necessary to serve Carv’s international client base. With a user base originating from over 90 countries, flawless multilingual support was critical.

Gladia met their needs based on the following criteria:

  • High-quality transcription at a scalable cost;
  • Language support for transcription and audio intelligence in 99+ languages, with enhanced sensitivity to accents and top accuracy in less widely spoken languages like Dutch, Albanian and Kazakh;
  • Code-switching, i.e., the ability to detect and transcribe a meeting where multiple languages are used interchangeably.
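As a rough illustration of what code-switching looks like in transcript data, the Python sketch below groups consecutive segments by language. The `(language code, text)` tuples and the `language_runs` helper are assumptions made for this example only; they do not reflect the format Gladia's API actually returns.

```python
from itertools import groupby

# Hypothetical segments as (language code, text); illustrative data only.
segments = [
    ("en", "Thanks for joining the call."),
    ("en", "Can you tell me about your last role?"),
    ("nl", "Ja, ik werkte als accountmanager."),
    ("en", "Great, and why did you leave?"),
]


def language_runs(segs):
    """Collapse consecutive same-language segments into (language, count) runs."""
    return [(lang, len(list(group))) for lang, group in groupby(segs, key=lambda s: s[0])]


runs = language_runs(segments)
print(runs)  # [('en', 2), ('nl', 1), ('en', 1)]
```

A meeting with more than one run is one where speakers switched languages mid-conversation, which is the situation a code-switching transcription engine has to handle.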


Enter Gladia! Using our speech-to-text and audio intelligence API, the Carv team was able to implement and improve must-have features like:

  • AI-generated job descriptions based on intake calls with the hiring manager;
  • AI-generated candidate profiles, based on interviews with candidates;
  • Meeting notes, so recruiters can focus on the interview;
  • Free format prompting, for highly specific use cases.
Preview of Carv's core features powered by Gladia


By working with the Gladia team to iterate and scale up, the Carv team saw a noticeable improvement in how quickly they could ship and iterate, with the underlying transcription engine enabling more and more valuable capabilities for Carv's AI for Recruiters.

The team at Carv is already looking forward to implementing more features at the intersection of Gladia APIs and Carv’s proprietary know-how. According to Carv’s VP of Product & Engineering, Valentijn van Gastel, Carv’s future vision involves addressing specific use cases and problems for customers in different types of recruitment, where Gladia's expanding audio intelligence offering could be of service.

We're thrilled to be part of this amazing journey with them, and thank Carv for putting their trust in us! As we move forward, we're excited to team up with more clients, tackle new challenges, and make speech AI more accessible to virtual meeting companies worldwide.

Having read this case study, do you feel like Gladia can be the right fit for your business too?

Don't hesitate to contact our sales team to discuss this in more detail, or sign up directly below. Beyond virtual meetings, we cater to a range of use cases, including content & media, call centers, workspace collaboration, and more, with tailored solutions.

About Gladia

At Gladia, we built an optimized version of Whisper in the form of an API, adapted to real-life professional use cases and distinguished by exceptional accuracy, speed, extended multilingual capabilities, and state-of-the-art features, including speaker diarization and word-level timestamps. Our latest model, Whisper-Zero, which removes hallucinations and improves accuracy across languages, is available now.

