
How to extract buyer intent and sales objections from calls using Gladia and Claude

Published on May 8, 2026
by Ani Ghazaryan

TL;DR: Sales teams are sitting on recorded calls that could populate CRMs automatically, but the most common failure mode is the STT layer dropping words, misattributing speakers, or degrading silently on accented audio. Pairing Gladia's async transcription (Solaria-1) with Claude's strict JSON output mode fixes this, delivering full-context accuracy and diarization that streaming can't match, with on average 29% lower WER and 3x lower DER vs. alternatives so Claude receives a cleaner transcript and produces fewer false signals.

Your sales team is sitting on hundreds of recorded calls, but extracting reliable buyer intent at scale means solving the audio infrastructure problem first. When transcription drops a single "not" or misattributes a speaker, every downstream LLM extraction breaks. The bottleneck in AI sales analysis isn't the prompt, it's the speech-to-text (STT) layer.

To extract reliable buyer intent and objections at scale, you need an audio infrastructure layer that handles messy, multilingual, multi-speaker sales audio without degrading silently, and an LLM configured to output strict, CRM-ready structured data. This guide walks through how to pipe Gladia's async transcription API into Claude, complete with Python code, JSON schemas, and a cost model for production scale.

Uncover buyer needs from call data with AI

AI sales coaching, conversation intelligence, and sales call analysis all describe the same fundamental job: turning raw call recordings into structured pipeline data that a CRM or coaching dashboard can act on. The architecture is consistent across these labels: an audio capture layer, a speech-to-text layer, and an LLM layer that reads the transcript and extracts structured insights.

JSON schema for sales call routing

CRMs can't parse unstructured LLM text output. If Claude returns a paragraph describing a budget objection instead of a structured object with an objection_type, severity, and quoted_text field, RevOps can't route it and Salesforce can't ingest it. The answer is strict JSON schemas that constrain Claude to return only fields your downstream systems can parse.

A production-ready schema for sales call routing covers five objects: buying_signals, objections, decision_makers, next_steps, and summary. Claude's structured outputs documentation covers how to pass a JSON schema in the API request to enforce that format. The full schema appears in Step 2.

AI accuracy vs. manual call errors

Manual CRM entry is where data degrades. Reps log deal stages, next steps, and objection notes from memory, often hours after the call ends. An AI pipeline that reads the actual transcript has no recall problem, but it has an accuracy dependency: the transcript ceiling sets the extraction ceiling.

Production implementations using Gladia's async transcription report WER as low as 1-3%, with transcripts typically returned in under a minute per hour of audio. At that error rate, a prospect saying "we have budget approved for this quarter" arrives in Claude's context window intact. At higher WER levels common with some self-hosted solutions, transcription errors can alter meaning and every downstream extraction suffers. The word error rate explainer covers how to evaluate STT vendors on this metric.

Target users for call analysis AI

  • Sales managers and RevOps (Revenue Operations) who need structured call data in Salesforce or HubSpot without relying on rep self-reporting
  • Product teams building AI sales agents or coaching tools who need a reliable speech layer as the first stage of their pipeline
  • CCaaS (Contact Center as a Service) platforms processing high volumes of calls across multiple languages and needing structured post-call analytics at scale

Steps 1–4 cover the Python implementation. If you're evaluating fit rather than building directly, the real-world example in Step 4 and the FAQ section are the most relevant entry points.

Configure your stack for intent extraction

The full pipeline follows four stages:

  1. Sales call recording: WAV, M4A, FLAC, AAC, or URL
  2. Gladia async API: Diarization, language detection, code-switching enabled
  3. Claude API: JSON schema extraction with strict structured output
  4. CRM integration: Salesforce or HubSpot custom field population

How to set up Gladia API for sales process automation

Gladia's API is REST-based and designed for async workflows where the caller submits an audio job and receives a webhook or polls when the transcript is ready. The getting started documentation covers authentication and your first API call.

The async transcription endpoint provides speaker diarization, automatic language detection, and code-switching support in a single request, all configured through the payload shown in Step 1.

Configure Claude for structured intent extraction

Claude's role is to read the formatted Gladia transcript and return a JSON object matching your schema. Claude's large context window handles typical sales call transcripts in a single call. For longer recordings or multi-call batches, split by call and process each separately to maintain extraction accuracy. The critical configuration is the output_config block with "type": "json_schema", which constrains Claude to return only valid JSON rather than prose.

Test with your sales call data

Don't test on clean studio recordings. Sales calls have background noise, overlapping speech, accented speakers, and code-switching. Solaria-1 is benchmarked against 8 providers across 7 datasets and 74+ hours of audio, with benchmark methodology and results that are open-source and reproducible. For teams switching from other providers, most migrations complete in under a day. The Deepgram migration guide and AssemblyAI migration guide map the exact parameter and endpoint differences so you're not guessing what changed.

Step 1: Convert sales calls to text with Gladia

Send sales call audio to Gladia

The Python implementation below uploads an audio file and submits an async transcription job with diarization and code-switching enabled:

import requests
import json

GLADIA_API_KEY = "your-api-key"
GLADIA_BASE_URL = "https://api.gladia.io"

def submit_audio_for_transcription(audio_file_path):
    # Upload audio file
    with open(audio_file_path, 'rb') as audio_file:
        upload_response = requests.post(
            f"{GLADIA_BASE_URL}/v2/upload",
            headers={"x-gladia-key": GLADIA_API_KEY},
            files={'file': audio_file}
        )

    audio_url = upload_response.json()['file_url']

    # Submit transcription job with diarization and code-switching
    payload = {
        "audio_url": audio_url,
        "diarization": True,
        "detect_language": True,
        "enable_code_switching": True
    }

    response = requests.post(
        f"{GLADIA_BASE_URL}/v2/pre-recorded",
        headers={
            "x-gladia-key": GLADIA_API_KEY,
            "Content-Type": "application/json"
        },
        json=payload
    )

    if response.ok:  # accept any 2xx success status from the API
        result = response.json()
        return {"job_id": result['id'], "result_url": result['result_url']}
    else:
        raise Exception(f"Submission failed: {response.text}")

In production, poll /pre-recorded/{job_id} every 5 seconds until status returns done, or configure a webhook for async notification via the transcription init endpoint.
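The polling loop can be sketched as follows. The status values ("done", "error") and the result URL follow the submission code above; the `poll_for_transcript` name and the injectable `fetch` parameter are conveniences of this sketch, not part of any Gladia SDK:

```python
import time

GLADIA_API_KEY = "your-api-key"

def _default_fetch(url):
    # Imported lazily so the polling logic stays testable without a network client
    import requests
    return requests.get(url, headers={"x-gladia-key": GLADIA_API_KEY}).json()

def poll_for_transcript(result_url, fetch=None, interval_seconds=5, max_attempts=120):
    """Poll the result_url returned at submission until the job finishes."""
    fetch = fetch or _default_fetch
    for _ in range(max_attempts):
        result = fetch(result_url)
        status = result.get("status")
        if status == "done":
            return result
        if status == "error":
            raise RuntimeError(f"Transcription failed: {result}")
        time.sleep(interval_seconds)
    raise TimeoutError("Transcription did not complete within the polling window")
```

In practice, prefer the webhook configuration for high volumes; polling is simpler to debug when you're getting the pipeline working.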

Set up speaker diarization

Diarization is the most critical configuration for sales call analysis. When you can't separate the rep's voice from the prospect's voice, Claude attributes objections to the wrong person, which corrupts coaching scores and CRM entries downstream.

Gladia's speaker diarization is available in async mode only. If you're evaluating real-time transcription for a live-assist use case, handle speaker attribution in post-processing for higher accuracy. For post-call analysis, async diarization produces superior results according to Gladia's benchmarks.

Extract sales call transcript JSON

Gladia's response includes an utterances array where each element carries the speaker ID, transcript text, language tag, word-level timestamps, and confidence scores. The speaker field in each utterance maps to "Rep" and "Prospect" in your downstream schema.

Step 2: Define buyer intent signals and objections

Buying signals are specific, observable behaviors in transcript text that indicate a prospect is evaluating or advancing toward a purchase. Define these in your Claude extraction schema as your signal taxonomy:

  • budget_mentioned: Prospect states a specific approved figure or budget ceiling
  • timeline_inquiry: Prospect asks about implementation timelines or sets a hard deadline
  • authority_confirmed: Prospect names additional stakeholders or explains the buying committee
  • pain_point_identified: Prospect articulates a specific gap in their current setup
  • competitor_comparison: Prospect names a competitor they're currently using or evaluating
  • implementation_question: Prospect asks about technical integration, onboarding, or migration steps
  • trial_request: Prospect asks about POCs (Proof of Concept), pilots, or free tiers

In this extraction framework, you can design each signal to carry a confidence_score between 0 and 1, where scores closer to 1 typically reflect explicitly stated intent and lower scores reflect inferred signals requiring rep review.

Sales call objection taxonomy

Categorize objections by root cause rather than surface language, because the same underlying concern surfaces in a dozen different phrasings:

  • budget_constraint: "That's above what we budgeted," "we need to stay under $X"
  • competitor_lock_in: "We're already paying for Y through Q3 (third quarter)," "we have a contract"
  • feature_gap: "We need X and I don't see it," "does it do Y?"
  • implementation_risk: "Our last migration took six months," "IT won't approve this quickly"
  • vendor_fatigue: "We've evaluated three tools already," "our team is tool-saturated"
  • timing_objection: "We're not looking to decide until H2 (second half)," "revisit us next quarter"

Implement schema-compliant JSON

Pass this schema to Claude's output_config to enforce structured output. The buying_signals and objections arrays are the core objects. Expanding to include decision_makers, next_steps, and summary follows the same pattern:

{
  "type": "object",
  "properties": {
    "buying_signals": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "signal_type": {
            "type": "string",
            "enum": ["budget_mentioned", "timeline_inquiry", "authority_confirmed",
                     "pain_point_identified", "competitor_comparison",
                     "implementation_question", "trial_request"]
          },
          "confidence_score": {"type": "number", "minimum": 0, "maximum": 1},
          "speaker_id": {"type": "string"},
          "quoted_text": {"type": "string"},
          "timestamp_seconds": {"type": "number"}
        },
        "required": ["signal_type", "quoted_text"]
      }
    },
    "objections": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "objection_type": {
            "type": "string",
            "enum": ["budget_constraint", "competitor_lock_in", "feature_gap",
                     "implementation_risk", "vendor_fatigue", "timing_objection"]
          },
          "severity": {"type": "string", "enum": ["low", "medium", "high"]},
          "speaker_id": {"type": "string"},
          "objection_summary_text": {"type": "string"},
          "quoted_text": {"type": "string"}
        },
        "required": ["objection_type", "objection_summary_text"]
      }
    }
  },
  "required": ["buying_signals", "objections"]
}
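With the schema defined, the request body can be assembled as below. This is a minimal sketch: the output_config shape follows this guide's description (verify the exact parameter name against Claude's structured outputs documentation), and the model ID and `build_claude_request` helper are placeholders. The schema here is trimmed to buying_signals for brevity:

```python
# Trimmed version of the schema above (buying_signals only), for illustration
EXTRACTION_SCHEMA = {
    "type": "object",
    "properties": {
        "buying_signals": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "signal_type": {"type": "string"},
                    "quoted_text": {"type": "string"}
                },
                "required": ["signal_type", "quoted_text"]
            }
        }
    },
    "required": ["buying_signals"]
}

def build_claude_request(transcript_text, system_prompt, schema):
    """Assemble a Messages API request body with strict JSON output enforced."""
    return {
        "model": "claude-sonnet-4-5",  # placeholder model ID; use your target model
        "max_tokens": 4096,
        "system": system_prompt,
        "messages": [{"role": "user", "content": transcript_text}],
        "output_config": {"type": "json_schema", "schema": schema},
    }
```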

Step 3: Claude extracts intent from call transcripts

Crafting Claude's extraction prompt

Keep the system prompt explicit about what counts as evidence for each signal type:

system_prompt = """You are an expert sales intelligence analyst specializing in
conversation analysis. Extract structured buyer intent signals and objections
from sales call transcripts.

Your analysis must:
1. Identify buying signals: budget discussions, timeline questions, authority
   confirmations, pain point admissions, competitor comparisons,
   implementation questions, trial requests.
2. Categorize objections: budget constraints, competitor lock-in, feature gaps,
   implementation risks, vendor fatigue, timing objections.
3. Assign confidence scores (0-1) based on how explicitly the signal is stated.
4. Include only direct quotes from the transcript as evidence.

Exclude signals not present in the transcript. Output only valid JSON."""

Prepare transcripts for Claude AI

Before sending to Claude, convert Gladia's utterances array to labeled plain text. Each line follows Speaker {ID}: {text} so Claude can attribute signals to the correct participant:

def format_gladia_transcript_for_llm(gladia_response):
    transcript_lines = []
    if 'utterances' in gladia_response:
        # Field names per Gladia async transcription response schema: docs.gladia.io/api-reference/v2/pre-recorded/get
        for utterance in gladia_response['utterances']:
            speaker_id = utterance.get('speaker', 'Unknown')
            text = utterance.get('text', '')
            transcript_lines.append(f"Speaker {speaker_id}: {text}")
    return "\n".join(transcript_lines)
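A quick sanity check of the formatter against a mock response (the logic is repeated inline so this snippet runs standalone; utterance field names follow the response-schema comment above):

```python
def format_gladia_transcript_for_llm(gladia_response):
    # Same logic as the formatter above, repeated so the snippet is self-contained
    lines = []
    for u in gladia_response.get('utterances', []):
        lines.append(f"Speaker {u.get('speaker', 'Unknown')}: {u.get('text', '')}")
    return "\n".join(lines)

mock_response = {
    "utterances": [
        {"speaker": 0, "text": "We need this live by Q3."},
        {"speaker": 1, "text": "What does your timeline look like internally?"},
    ]
}
formatted = format_gladia_transcript_for_llm(mock_response)
# Expected:
# Speaker 0: We need this live by Q3.
# Speaker 1: What does your timeline look like internally?
```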

Get reliable JSON buyer signals

Gladia's transcript accuracy determines whether Claude's JSON output is trustworthy. A signal extraction prompt operating on a low-quality transcript produces hallucinated signals because the underlying text is wrong. With Solaria-1 delivering on average 29% lower WER on conversational speech vs. alternatives (benchmarks), Claude receives text that matches what was actually said, and the structured output reflects real prospect intent rather than transcription artifacts.

Higher transcript accuracy at the STT layer directly reduces false signals in downstream extraction. Those false signals don't fail loudly: they corrupt CRM records, inflate pipeline forecasts, and produce misleading coaching scores weeks before anyone notices.

Step 4: Categorize and distribute findings

Validate JSON structure and schema

Before writing Claude's output to your CRM, run programmatic schema validation using Python's jsonschema library. This catches the rare cases where Claude returns a malformed object and routes those calls to a manual review queue rather than silently corrupting your database.
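A minimal validation gate using jsonschema might look like this. The `validate_or_queue` name and the trimmed top-level schema are illustrative; in production you would pass the full schema from Step 2:

```python
from jsonschema import ValidationError, validate

# Trimmed top-level schema; substitute the full Step 2 schema in production
SIGNAL_SCHEMA = {
    "type": "object",
    "properties": {
        "buying_signals": {"type": "array"},
        "objections": {"type": "array"},
    },
    "required": ["buying_signals", "objections"],
}

def validate_or_queue(claude_output, schema=SIGNAL_SCHEMA):
    """Return True if schema-compliant; False means route to manual review."""
    try:
        validate(instance=claude_output, schema=schema)
        return True
    except ValidationError:
        return False
```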

Integrate intent data with your CRM

Map the JSON fields from Claude's output to your CRM's custom objects. A budget signal with a high confidence_score and a quoted_text field can populate a Salesforce custom field called confirmed_budget_amount. An objection with severity: high can trigger a Slack alert to the account manager or create a follow-up task. For teams building on the Gladia meeting assistants use case architecture, the same webhook pattern used for meeting notes applies directly here.
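The routing logic described above can be sketched as a pure function, kept separate from the CRM client so it's easy to test. The field names (`confirmed_budget_amount`) and thresholds follow the examples in this guide; adapt them to your own Salesforce or HubSpot custom objects:

```python
def route_extraction(extraction):
    """Map Claude's JSON output to CRM field updates and alert messages."""
    crm_updates = {}
    alerts = []

    # High-confidence budget signals populate a CRM custom field
    for signal in extraction.get("buying_signals", []):
        if signal["signal_type"] == "budget_mentioned" and signal.get("confidence_score", 0) >= 0.9:
            crm_updates["confirmed_budget_amount"] = signal["quoted_text"]

    # High-severity objections trigger an alert to the account manager
    for objection in extraction.get("objections", []):
        if objection.get("severity") == "high":
            alerts.append(f"High-severity {objection['objection_type']}: "
                          f"{objection['objection_summary_text']}")

    return crm_updates, alerts
```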

Prevent parsing failures in production

The biggest source of pipeline brittleness isn't Claude's extraction, it's the upstream audio layer. When diarization fails and speaker labels collapse into a single speaker, Claude can't separate rep language from prospect language and every signal is misattributed. When transcription degrades on accented or multilingual audio, Claude extracts noise.

Gladia's managed infrastructure removes the DevOps overhead of maintaining a self-hosted STT stack. Aircall processes 1M+ calls per week through Gladia, which gives you a concrete production-scale reference for infrastructure reliability at enterprise volume.

Real-world example: detecting budget objections and urgency signals

Here is a hypothetical sales call snippet featuring code-switching between English and Spanish. Gladia handles the mid-sentence language switch without breaking diarization or requiring a new session, a capability covered in depth in the code-switching guide:

Speaker 1: We need to get this done by Q3 (third quarter), entiendes? But honestly, the budget
is my main concern. Our CFO (Chief Financial Officer) has approved $50K, but I'm not sure if that's realistic.

Speaker 2: That's a solid starting point. Who else needs to sign off?

Speaker 1: It's primarily me and the VP (Vice President) of Operations, but we'll need IT's blessing too.
Basicamente, you're talking to three decision-makers.

Running this through the Claude extraction pipeline produces:

{
  "buying_signals": [
    {
      "signal_type": "timeline_inquiry",
      "confidence_score": 0.98,
      "speaker_id": "1",
      "quoted_text": "We need to get this done by Q3",
      "timestamp_seconds": 68
    },
    {
      "signal_type": "budget_mentioned",
      "confidence_score": 0.92,
      "speaker_id": "1",
      "quoted_text": "Our CFO has approved $50K, but I'm not sure if that's realistic",
      "timestamp_seconds": 72
    },
    {
      "signal_type": "authority_confirmed",
      "confidence_score": 0.96,
      "speaker_id": "1",
      "quoted_text": "you're talking to three decision-makers",
      "timestamp_seconds": 108
    }
  ],
  "objections": [
    {
      "objection_type": "budget_constraint",
      "severity": "medium",
      "speaker_id": "1",
      "objection_summary_text": "CFO approval limited to $50K; prospect uncertain about fit",
      "quoted_text": "the budget is my main concern. Our CFO has approved $50K, but I'm not sure if that's realistic"
    }
  ]
}

Product teams can extend the system prompt to detect industry-specific objections. For SaaS deals, adding "Flag any mention of SOC 2 compliance requirements or GDPR obligations as implementation_risk objections" captures the security review signals that frequently stall enterprise deals. Gladia's automatic language detection across 100+ supported languages enables consistent extraction regardless of what language the prospect spoke.

On Growth and Enterprise plans, customer audio is never used to retrain models, no opt-out required. Gladia holds SOC 2 Type II (Service Organization Control), ISO 27001 (International Organization for Standardization), HIPAA (Health Insurance Portability and Accountability Act), and GDPR (General Data Protection Regulation) certifications, all documented at the Gladia compliance hub.

Start with 10 free hours and build your sales intent pipeline. The async transcription documentation covers every parameter used in the code above.

FAQs

What is Gladia's production WER on sales calls?

Gladia's Solaria-1 delivers on average 29% lower WER on conversational speech than alternatives. Production implementations report WER as low as 1-3% on their async transcription pipelines.

Can you extract multilingual buyer intent with Gladia?

Yes, Gladia supports 100+ languages with native code-switching detection. Claude processes the multilingual transcript and outputs the JSON analysis in English regardless of the source language.

Should I use real-time or async transcription for sales call AI?

Async batch processing is the right default for post-call intent extraction because it provides superior diarization accuracy and full-context processing that real-time streaming doesn't offer. Gladia supports real-time transcription at approximately 300ms latency for live-assist use cases, but async is the recommended path for summaries, objection detection, and CRM routing.

What does Gladia cost at different call volumes?

Per-hour pricing based on audio duration makes cost modeling straightforward — diarization, NER, and sentiment analysis are included in the base rate, no add-on fees. At Starter ($0.61/hr), 100 hours runs $61/month and 1,000 hours runs $610/month. Growth pricing starts from $0.20/hr based on commitment tier, bringing those same volumes down to $20 and $200/month respectively. Enterprise is annual with custom models and debundled pricing.
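The per-hour rates above make the cost model a one-liner (rates as listed at the time of writing; the helper name is ours):

```python
STARTER_RATE = 0.61  # $/hour of audio, Starter plan
GROWTH_RATE = 0.20   # $/hour of audio, Growth entry tier (commitment-based)

def monthly_cost(hours_per_month, rate_per_hour):
    """Estimated monthly spend in dollars for a given audio volume."""
    return round(hours_per_month * rate_per_hour, 2)
```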

Does Gladia use my sales call audio to train its models?

On the Starter plan, customer audio can be used for model training by default. On Growth and Enterprise plans, customer audio is never used for training, no opt-out required. Teams processing sensitive prospect conversations should review the privacy and compliance details for each plan at the Gladia compliance hub and pricing page.

Key terms glossary

Word Error Rate (WER): The percentage of words in a transcript that differ from the actual spoken words, calculated as substitutions plus deletions plus insertions divided by total word count. Understanding WER in production contexts is the first step in evaluating any STT vendor for downstream LLM extraction.
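The formula in that definition, as a trivial sketch:

```python
def word_error_rate(substitutions, deletions, insertions, reference_word_count):
    # WER = (S + D + I) / N
    return (substitutions + deletions + insertions) / reference_word_count

# e.g. 2 substitutions + 1 deletion + 0 insertions over 150 reference words
# gives a WER of 0.02, i.e. 2%
```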

Diarization Error Rate (DER): A measure of how accurately a speech system attributes segments of audio to the correct speaker, accounting for missed speech, false alarms, and speaker confusion. Higher DER means more utterances are assigned to the wrong speaker, which breaks signal attribution in sales call analysis.

Code-switching: The phenomenon where a speaker alternates between two or more languages within a single conversation, often mid-sentence. Most STT APIs fail silently on code-switching in contact centers, returning garbled output where the language switches.

Async transcription: A batch processing workflow where an audio file is submitted to the STT API, processed offline with access to the full recording context, and the completed transcript is returned via polling or webhook. Async delivers higher accuracy and better diarization than streaming because the model can use future context to resolve ambiguous speech.

JSON schema: A declarative format for describing the structure of a JSON object, specifying required fields, data types, and allowed values. Passing a JSON schema to Claude's API constrains the model to return only valid structured output, which is required for reliable CRM integration.
