Most automated lead scoring models fail not because the LLM prompt is wrong, but because the underlying transcript is full of hallucinated words and broken speaker labels. When Claude can't tell who said "I have $50K budgeted for Q2," the scoring logic falls apart before it even starts. The foundation of any reliable pipeline is accurate, speaker-attributed transcription, and that's the layer most teams underestimate.
This guide walks through the exact architecture for converting sales call recordings into hot, warm, or cold CRM leads using Gladia's async transcription API and Claude. We cover audio ingestion, diarized transcript generation via Solaria-1, Claude's scoring logic, database storage, and CRM routing in a pipeline you can ship in under a day.
Turn call recordings into CRM-ready lead scores
Call transcript lead qualification analyzes a recorded sales conversation to determine a prospect's readiness for sales engagement. Instead of relying on what a sales rep remembers or chooses to log, the system reads exactly what the prospect said, identifies BANT signals (budget, authority, need, timeline), and writes a standardized score back to your CRM.
This matters because CRM data decay starts the moment a rep manually qualifies a call. By the time they've finished three more calls, written up their notes, and finally updated the contact record, key signals are paraphrased, missed, or gone entirely. An automated pipeline captures everything from the source audio and makes qualification consistent across every rep on your team.
Building the lead scoring model
The build-versus-buy decision here is straightforward. Building a custom acoustic scoring model requires ML expertise, labeled training data, and ongoing infrastructure you'll need to maintain. The utility function, converting a call recording into a qualified lead record, isn't your core differentiator. Buying the transcription and LLM reasoning layers from Gladia and Claude means you wire together two APIs rather than hiring a data science team.
This pattern mirrors what we cover in the async transcription architecture guide: the async transcription layer handles the hard part (accurate, speaker-attributed conversion of messy audio), and the LLM handles reasoning over clean structured text. Teams that skip the accurate transcription layer and feed poor-quality STT output into an LLM end up in what founders call "pilot purgatory," where demos work but production doesn't.
Efficiency: manual vs. AI scoring
Manual call review is slow and subjective. A rep who listens to a 30-minute call, takes notes, and logs a CRM entry spends 15-20 minutes per call. Multiply that across your team and the opportunity cost is significant; worse, scoring criteria vary from rep to rep.
The automated pipeline on Gladia's Starter plan costs $0.61/hr for async transcription. For a 30-minute call (0.5 hours), that's $0.31 on the transcription side. Growth pricing drops async transcription to as low as $0.20/hr, bringing a 30-minute call to $0.10. Both plans include diarization, sentiment, and named entity recognition at no additional cost.
| Method | Time per call | Gladia cost per 30-min call | Monthly cost (100 calls) |
| --- | --- | --- | --- |
| Manual SDR review | 15-20 min | n/a | SDR time at scale |
| AI pipeline (Starter) | under 4 min | $0.31 | $31 |
| AI pipeline (Growth) | under 4 min | $0.10 | $10 |
Beyond cost, standardized scoring eliminates the bus factor risk of having one senior SDR whose qualification criteria lives only in their head.
Automating the call lead scoring workflow
We built a five-step pipeline. Audio goes in, a scored lead record comes out, and the only human review required is for calls where Claude returns a confidence score below your defined threshold. Here's the full architecture before we build each layer.
Step 1: Ingesting sales call audio
Your pipeline can ingest audio from three common sources: telephony platforms like Twilio (which Gladia integrates with directly; see the Twilio integration docs), meeting recording tools like Recall or MeetingBaaS, or direct file uploads from your call storage bucket. For batch processing, a common setup is a cloud storage bucket with event notifications that trigger your scoring service whenever a new recording lands. Gladia's async API accepts most common audio formats including .wav, .mp3, .m4a, .flac, and .aac (see the supported formats page for the full list) up to 135 minutes and 1,000MB, so most telephony formats work without re-encoding.
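If your recordings sit in a private bucket, Gladia needs a URL it can fetch, so mint a short-lived presigned link before submitting the job. Here's a minimal sketch using boto3, assuming AWS S3 (the bucket and key names are placeholders):

import boto3

def presigned_audio_url(bucket: str, key: str, expires_in: int = 3600) -> str:
    # Private objects aren't directly fetchable by Gladia, so generate a
    # time-limited presigned GET link for the recording.
    s3 = boto3.client("s3")
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )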
Step 2: Accurate call transcripts via Gladia
We recommend async batch processing for post-call lead scoring because it analyzes the full recording before generating output. This full-context approach improves accuracy, speaker attribution consistency, and multilingual handling compared to streaming. Our async endpoint processes recordings significantly faster than real time, so a 30-minute sales call typically returns a complete, diarized transcript in well under a minute.
Our production model, Solaria-1, delivers on average 29% lower WER on conversational speech and on average 3x lower DER compared to alternative providers, per our published async benchmark covering 7 datasets and 74+ hours of audio. For lead scoring, that accuracy gap compounds: a wrong name, a missed budget number, or a misattributed speaker corrupts the LLM's BANT analysis.
Step 3: Automate hot/warm/cold scoring
Hot, warm, and cold classifications map to BANT criteria scored from the transcript:
- Hot: Three or more BANT criteria confirmed. Budget allocated and specific, decision-maker on the call, explicit timeline ("by Q2," "in 30 days"), and a clearly articulated problem. Route immediately to an Account Executive.
- Warm: Two BANT criteria present. Budget likely exists but isn't approved, or the contact is an influencer rather than the final decision-maker. Schedule a follow-up and add to a nurture sequence.
- Cold: Fewer than two BANT criteria confirmed. Research-phase inquiry with no allocated budget, no urgency, and no identified authority. Long-term nurture and quarterly re-engagement.
Claude reads the diarized transcript JSON and applies these criteria, returning a structured JSON object with the score, confidence (0.0-1.0), BANT breakdown, identified pain points, and recommended next steps.
Step 4: Log all scoring events
Before pushing anything to the CRM, your pipeline should write the full scoring event to a database. You need the raw transcript JSON, Claude's complete reasoning, the final score, confidence, and timestamps stored together. This gives you an audit trail when a rep challenges a score and the dataset to refine Claude's prompt over time as you validate scores against actual closed deals.
Step 5: Automate lead scores to CRM
Once the score is stored, a webhook triggers a PATCH request to your CRM's contacts API. In HubSpot, this updates custom properties on the contact record and fires enrollment workflows tied to the ai_lead_score property. In Salesforce, the score maps to a Lead field and triggers assignment rules. The pipeline from audio upload to CRM update typically completes in a few minutes, depending on call length and CRM API response time.
Configure Gladia for accurate call transcripts
You can integrate Gladia's async transcription API in under 24 hours. Here's what to build first.
Generate your Gladia API key
Sign up for Gladia and create your API key from the dashboard. The Starter plan includes 10 free hours per month at $0.61/hr for async transcription after that, with all features included. If you're processing sensitive sales calls in production, move to Growth or Enterprise before launch, as those plans never use your audio for model training.
Get your first call transcript
Submit a call recording to the async endpoint with diarization, sentiment analysis, and named entity recognition enabled. All three are included in the base rate on Starter and Growth plans.
import requests
import time

def submit_call_to_gladia(audio_url: str, api_key: str) -> dict:
    headers = {
        "x-gladia-key": api_key,  # Gladia expects the API key in this header
        "Content-Type": "application/json"
    }
    payload = {
        "audio_url": audio_url,
        "diarization": True,              # Speaker attribution (async only)
        "detect_language": True,          # Auto-detect language
        "enable_code_switching": True,    # Handle mid-call language switches
        "sentiment_analysis": True,       # Enable sentiment (must be explicitly set)
        "named_entity_recognition": True
    }
    response = requests.post(
        "https://api.gladia.io/v2/pre-recorded",
        json=payload,
        headers=headers
    )
    response.raise_for_status()
    return response.json()  # includes the job id and result_url for polling

def poll_for_transcript(result_url: str, api_key: str) -> dict:
    headers = {"x-gladia-key": api_key}
    while True:
        response = requests.get(result_url, headers=headers)
        data = response.json()
        if data.get("status") == "done":
            return data["result"]
        elif data.get("status") == "error":
            raise Exception(f"Gladia error: {data.get('error_message')}")
        time.sleep(2)
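Wiring the two functions together, you submit the recording and poll the returned result_url until the job completes. A short usage sketch (the audio URL and the GLADIA_API_KEY variable are placeholders):

# Submit the recording, then poll until Gladia finishes processing it
job = submit_call_to_gladia("https://storage.example.com/calls/demo-call.mp3", GLADIA_API_KEY)
transcript = poll_for_transcript(job["result_url"], GLADIA_API_KEY)
# transcript now holds the diarized output shown in the next section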
You can also enable translation across 100+ supported languages in the same API call, so a single request handles a French-English code-switching conversation without any extra processing step.
Ensure speaker identification accuracy
We built this pipeline around diarization because it's the feature that makes LLM scoring reliable. Without it, Claude receives a wall of text with no attribution and cannot distinguish the prospect saying "we have $50K allocated" from the sales rep asking "what's your budget?" Our speaker diarization is powered by pyannoteAI's Precision-2 model and is available in async workflows only, returning speaker-labeled utterances with word-level timestamps.
The diarized output looks like this:
{
  "transcription": [
    {
      "speaker": 1,
      "text": "We have $50K budgeted for this quarter.",
      "confidence": 0.97
    },
    {
      "speaker": 2,
      "text": "And you're the final decision-maker on this?",
      "confidence": 0.98
    }
  ]
}
Claude needs this structure to apply BANT criteria accurately. The Gladia x pyannoteAI webinar covers the technical approach to speaker separation in detail if you're evaluating diarization quality for your specific call format. Our Audio to LLM documentation shows you how to structure Gladia's JSON output as input to your LLM pipeline, including how to flatten the utterance array into a prompt-ready format.
Define Claude's lead scoring criteria
Claude needs explicit, data-driven criteria to produce consistent scores. Vague instructions produce inconsistent outputs. The system prompt below defines BANT thresholds, output format, and confidence calibration rules so the scoring logic is reproducible and auditable.
Define Claude's scoring logic
Before writing the prompt, define what each tier means in terms you can verify from transcript text:
- Budget: Confirmed dollar amount allocated this quarter or next. Phrases like "we have budget" without a number or timeline are warm indicators, not hot confirmations.
- Authority: The person on the call has final yes/no power. "I'll need to check with my manager" disqualifies authority regardless of how interested the prospect sounds.
- Need: A specific, named problem the prospect's current system isn't solving, with evidence of pain ("our team spends 3 hours a week on this manually").
- Timeline: An explicit purchase window. "Soon" and "probably Q3" are warm. "We need this live before our April launch" is hot.
HubSpot's lead scoring guide recommends building scores based on record actions and properties rather than rep intuition, which is exactly what this pipeline implements.
Build Claude's lead scoring prompt
Pass the diarized Gladia JSON directly into the system prompt alongside the scoring criteria. The prompt below returns structured JSON that maps cleanly to your database schema and CRM properties.
You are a sales analyst scoring leads from real sales call transcripts.
Classify prospects as "hot," "warm," or "cold" based on BANT criteria.
HOT: 3+ criteria confirmed. Budget stated, decision-maker present,
explicit timeline, articulated pain.
WARM: 2 criteria confirmed, or strong intent without timeline/budget specifics.
COLD: Fewer than 2 criteria, or research-phase inquiry only.
Identify which speaker is the prospect and which is the sales rep.
Only score signals from prospect utterances.
Return ONLY valid JSON:
{
  "score": "hot|warm|cold",
  "confidence": 0.0-1.0,
  "bant_analysis": {
    "budget": {"present": true/false, "evidence": "..."},
    "authority": {"present": true/false, "evidence": "..."},
    "need": {"present": true/false, "evidence": "..."},
    "timeline": {"present": true/false, "evidence": "..."}
  },
  "pain_points": ["..."],
  "buying_signals": ["..."],
  "next_steps": "...",
  "reasoning": "..."
}
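To turn that prompt into a scoring call, here's a minimal sketch using the Anthropic Python SDK. The model name is illustrative (pin whichever Claude model you've validated), and SCORING_PROMPT stands in for the system prompt above:

import json
import anthropic

SCORING_PROMPT = "..."  # the BANT system prompt above

def score_transcript(transcript_json: dict) -> dict:
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    message = client.messages.create(
        model="claude-sonnet-4-20250514",  # illustrative; pin your validated model
        max_tokens=1024,
        system=SCORING_PROMPT,
        messages=[{"role": "user", "content": json.dumps(transcript_json)}],
    )
    try:
        return json.loads(message.content[0].text)
    except (json.JSONDecodeError, IndexError):
        # Unparseable output goes to manual review, never to the CRM
        raise ValueError("Claude returned invalid scoring JSON")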
Actionable confidence for lead scores
Ask Claude to return a confidence score between 0.0 and 1.0 alongside every hot/warm/cold classification. Set a threshold below which calls get flagged for human review rather than automatically routed; a reasonable starting point is 60–75% confidence for immediate sales attention. This catches edge cases where the transcript is ambiguous or the prospect's intent is genuinely unclear, and it gives your sales managers a queue of calls worth reviewing rather than a random sample.
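As a sketch, the routing gate itself is only a few lines; the threshold value here is a starting point to tune against your close data:

REVIEW_THRESHOLD = 0.70  # starting point; tune against actual close rates

def route_score(claude_score: dict) -> str:
    # Low-confidence calls are queued for human review instead of auto-routing
    if claude_score["confidence"] < REVIEW_THRESHOLD:
        return "manual_review"
    return claude_score["score"]  # "hot" | "warm" | "cold"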
Record lead scores for CRM integration
The middleware layer between Gladia's output and your CRM needs to store three things: the raw transcript, Claude's full reasoning, and the final score. Don't discard Claude's intermediate BANT analysis, because that's what your sales managers will use to coach reps and refine scoring criteria over time.
Postgres schema for lead scores
CREATE TABLE lead_scores (
    score_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    call_id VARCHAR(255) NOT NULL,
    contact_email VARCHAR(255) NOT NULL,
    lead_score VARCHAR(10) NOT NULL CHECK (lead_score IN ('hot', 'warm', 'cold')),
    confidence_score FLOAT8 NOT NULL CHECK (confidence_score BETWEEN 0 AND 1),
    bant_budget BOOLEAN,
    bant_authority BOOLEAN,
    bant_need BOOLEAN,
    bant_timeline BOOLEAN,
    pain_points TEXT[],
    buying_signals TEXT[],
    claude_reasoning TEXT,
    raw_transcript_json JSONB,
    raw_claude_response JSONB,
    crm_synced BOOLEAN DEFAULT FALSE,
    crm_contact_id VARCHAR(255),
    scored_at TIMESTAMP DEFAULT NOW()
);
When inserting Claude's response, parse the JSON arrays for pain_points and buying_signals into PostgreSQL array format using your database driver's array handling (e.g., psycopg2 in Python automatically converts Python lists to PostgreSQL arrays).
Storing raw_transcript_json and raw_claude_response as JSONB gives you queryable audit records. When a rep disputes a cold score, you can pull the exact transcript and Claude's reasoning without re-listening to the call.
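Here's a minimal insert sketch with psycopg2 against the schema above; Json adapts Python dicts to JSONB, and psycopg2 converts Python lists to TEXT[] automatically:

import psycopg2
from psycopg2.extras import Json

def store_score(conn, call_id: str, contact_email: str, score: dict, transcript: dict):
    bant = score["bant_analysis"]
    with conn.cursor() as cur:
        cur.execute(
            """
            INSERT INTO lead_scores (
                call_id, contact_email, lead_score, confidence_score,
                bant_budget, bant_authority, bant_need, bant_timeline,
                pain_points, buying_signals, claude_reasoning,
                raw_transcript_json, raw_claude_response
            ) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)
            """,
            (
                call_id, contact_email, score["score"], score["confidence"],
                bant["budget"]["present"], bant["authority"]["present"],
                bant["need"]["present"], bant["timeline"]["present"],
                score["pain_points"],      # Python list -> TEXT[]
                score["buying_signals"],   # Python list -> TEXT[]
                score["reasoning"],
                Json(transcript),          # dict -> JSONB
                Json(score),               # dict -> JSONB
            ),
        )
    conn.commit()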
Google Sheets setup for lead scores
For teams not yet running Postgres, Google Sheets with Zapier or Make works as a lightweight alternative. Use these column headers:
call_id | contact_email | lead_score | confidence | bant_budget | bant_authority | bant_need | bant_timeline | pain_points | buying_signals | next_steps | scored_at | crm_synced
Zapier's "Create Spreadsheet Row" action handles the write from a webhook payload quickly, letting you validate scoring quality for a few weeks before committing to a full database layer.
Push scored leads to CRM for fast follow-up
The final step converts a scored transcript into a triggered sales workflow. Automating the CRM push means hot leads reach an Account Executive immediately rather than sitting in a queue, which is where the pipeline delivers its fastest return.
Automate lead scoring in HubSpot
Create three custom contact properties in HubSpot Settings to store the scoring data: a lead score field (dropdown or text: hot/warm/cold), a confidence field (number), and a reasoning field (single-line text). Then use a PATCH request to update them after Claude scores the call.
def sync_score_to_hubspot(contact_email: str, claude_score: dict, hubspot_token: str):
    headers = {
        "Authorization": f"Bearer {hubspot_token}",
        "Content-Type": "application/json"
    }
    # Look up the contact by email via the CRM search endpoint (a POST, not a GET)
    contact_lookup = requests.post(
        "https://api.hubapi.com/crm/v3/objects/contacts/search",
        headers=headers,
        json={
            "limit": 1,
            "filterGroups": [
                {"filters": [{"propertyName": "email", "operator": "EQ", "value": contact_email}]}
            ]
        }
    )
    contact_lookup.raise_for_status()
    results = contact_lookup.json()["results"]
    if not results:
        raise ValueError(f"No HubSpot contact found for {contact_email}")
    contact_id = results[0]["id"]
    update_payload = {
        "properties": {
            "ai_lead_score": claude_score["score"],
            "ai_confidence": round(claude_score["confidence"], 2),
            "ai_reasoning": claude_score["reasoning"][:500]  # truncate for the property
        }
    }
    requests.patch(
        f"https://api.hubapi.com/crm/v3/objects/contacts/{contact_id}",
        headers=headers,
        json=update_payload
    )
Build a workflow in HubSpot Automation that enrolls contacts when ai_lead_score is known and re-enrolls on value changes. Add actions to assign to an Account Executive from a round-robin queue, send an internal Slack notification, and add the contact to your "Hot Leads AI Scored" list.
Routing scored leads in Salesforce
In Salesforce, map Claude's JSON output to a custom Lead field (picklist with Hot/Warm/Cold values) and create an assignment rule that routes Hot leads to your fastest-response queue. Add a time-based workflow to alert the assigned rep if no call activity is logged within 2 hours of receiving a hot lead assignment.
Set up custom CRM webhooks
Trigger the CRM push from your backend via a webhook fired when Gladia's polling endpoint returns status: done. This decouples the CRM sync from the transcription polling loop and lets you retry CRM failures independently without re-triggering the Gladia job. Your webhook payload should contain the identifiers and scored data the CRM sync service needs to complete the push. Wrap every external API call in exponential backoff with jitter, retrying on 500-level errors and timeouts, and avoid retrying on 400, 401, or 403 responses since those are client errors that retrying won't fix. Handle edge cases where Claude returns invalid JSON or incomplete analysis by logging the error and routing those calls to manual review.
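A minimal sketch of that retry wrapper (the function name and retry counts are illustrative, not a library API):

import random
import time
import requests

def request_with_backoff(method: str, url: str, max_retries: int = 5, **kwargs):
    # Retry 5xx responses and network timeouts with exponential backoff plus
    # jitter; 4xx client errors are returned immediately since retries won't fix them.
    for attempt in range(max_retries):
        try:
            response = requests.request(method, url, timeout=10, **kwargs)
            if response.status_code < 500:
                return response
        except requests.exceptions.RequestException:
            pass  # timeout or connection error: fall through to the retry sleep
        time.sleep((2 ** attempt) + random.uniform(0, 1))
    raise RuntimeError(f"{method} {url} failed after {max_retries} retries")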
Operationalizing your lead scoring model
Shipping the pipeline is day one. Day two is making it reliable at volume.
Optimizing WER on challenging audio
Real sales calls have background noise, accents, cross-talk, and code-switching. Solaria-1 delivers on average 29% lower WER on conversational speech versus competing APIs, per our published async benchmark, which matters when a prospect's French-accented English or a mid-call switch to Spanish needs to score cleanly.
Use Gladia's custom_vocabulary parameter to lock in product names, competitor names, and internal acronyms that generic STT models frequently misrecognize:
payload = {
    "audio_url": audio_url,
    "diarization": True,
    "custom_vocabulary": ["Salesforce", "HubSpot", "MQL", "SDR", "PostgreSQL"]
}
The custom_vocabulary parameter helps Gladia recognize domain-specific terms that generic models often misrecognize. Check the custom vocabulary documentation for matching behavior and case sensitivity rules.
Without this, sales-specific acronyms may transcribe incorrectly, causing Claude to miss important signals entirely.
Managing costs for high-volume calls
Model costs before you scale. On Growth pricing at as low as $0.20/hr for async transcription, a 30-minute call costs $0.10 on the transcription side. At 10,000 calls per month totaling approximately 5,000 hours of audio, the Gladia transcription cost is $1,000/month on Growth, with all features included and no add-on fees. The per-hour billing model means costs scale linearly with audio volume, so your 3-year TCO model doesn't break when you grow, and there are no invoice surprises from per-feature billing.
Validating Claude for lead scoring
Run Claude's scores against your historical close data after the first month. Pull every call Claude scored as hot and check the close rate. If hot leads are closing well below expectations, your BANT thresholds are too loose. If almost every hot lead closes, you may be scoring too conservatively and leaving qualified warm leads in the wrong nurture sequence. Sales managers can use the stored Claude reasoning to identify which BANT signals are most predictive for your specific product and market, then update the system prompt accordingly.
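With scores in the lead_scores table above, a query like this surfaces close rate by tier; the deals table and closed_won flag are hypothetical placeholders for wherever your closed-deal data lives:

-- Close rate by AI score tier; "deals" and "closed_won" are hypothetical
-- stand-ins for your closed-deal data source.
SELECT ls.lead_score,
       COUNT(*) AS scored_calls,
       AVG(CASE WHEN d.closed_won THEN 1.0 ELSE 0.0 END) AS close_rate
FROM lead_scores ls
LEFT JOIN deals d ON d.contact_email = ls.contact_email
GROUP BY ls.lead_score;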
Gladia maintains a 4.8/5 rating on G2. The full compliance picture, including SOC 2 Type II, HIPAA, ISO 27001, and data residency configuration, is documented at the Gladia compliance hub. Start with 10 free hours and have your integration in production in under a day.
FAQs
How is call data handled post-transcription?
On Gladia's Growth and Enterprise plans, your audio data is never used to train models, with no opt-out required and no enterprise clause to find. On the Starter plan, data can be used for model training by default, so move to Growth before processing sensitive sales conversations. Gladia is SOC 2 Type II, HIPAA, ISO 27001, and GDPR certified, with full details at the compliance hub.
How long does end-to-end processing take?
Gladia's async endpoint processes recordings significantly faster than real time, so a 30-minute call typically returns a transcript in well under a minute. Add Claude's API response time and the database write and CRM PATCH, and the full pipeline typically completes in a few minutes from audio upload to a triggered CRM workflow, depending on call length and system load.
Is diarization available for real-time transcription?
No. Gladia's speaker diarization, powered by pyannoteAI's Precision-2 model, is available in async (batch) workflows only. For post-call lead scoring, this is the correct architecture: async processing uses full audio context for higher diarization accuracy, which directly improves the quality of Claude's BANT analysis.
What happens if Claude returns a low-confidence score?
Set a confidence threshold below which calls are flagged for human review rather than automatically routed; a reasonable starting point is 60–75% confidence for immediate sales attention. Store every Claude response, including its full BANT breakdown, so reviewers have context to make a fast manual decision without re-listening to the call.
Key terms glossary
Word error rate (WER): The percentage of words in a transcript that differ from the correct transcription, measured as substitutions, deletions, and insertions divided by total words. Lower is better.
Diarization error rate (DER): The percentage of audio time incorrectly attributed to a speaker, covering missed speech, speaker confusion, and false alarms.
Diarization: The process of segmenting audio by speaker identity, answering "who spoke when" across a multi-speaker recording. Gladia's implementation is async-only.
Code-switching: When speakers alternate between two or more languages within a single conversation. Gladia's Solaria-1 model handles code-switching natively across 100+ languages without breaking the transcript into separate language sessions.