Build a voice agent that can make outbound calls with AssemblyAI
Mart Schweig · 2026-05-08 · via DEV Community

Why outbound voice agents matter

A voice agent that can dial out, not just answer, unlocks workflows where text channels fall short:

| Use case | What the agent does | Why outbound beats inbound |
| --- | --- | --- |
| Appointment reminders | Calls the patient 24 h before, confirms or reschedules | Reaches people who never read the SMS |
| Lead qualification | Calls a fresh inbound lead, qualifies, books with sales | Engages while interest is still hot |
| Survey + NPS | Reads the prompt, captures freeform answers | Higher response rate than email |
| Past-due collections | Calls account, takes payment via tool call | Lower agent cost than a human dialer |
| Recall and renewal | Notifies of a recall, prescription refill, or expiring policy | Cuts through inbox noise |
| Customer winback | Reaches lapsed customers with a personalized offer | More personal than a marketing email |

In every case the win is the same: the agent reaches the customer through the channel they actually pick up, holds a real conversation, and writes the outcome to your system of record.

Architecture

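The system has three components connected by two WebSockets: Twilio, which carries the phone call; your Node.js server, which bridges the audio; and AssemblyAI's Voice Agent API, which runs the agent. Twilio Media Streams opens one WebSocket to the server, and the server opens the other to AssemblyAI.

The turn-detection parameters used in the tuning section are:

| Parameter | Type | Description |
| --- | --- | --- |
| vad_threshold | 0.0–1.0 | Voice activity detection sensitivity. Raise for noisy phone lines. |
| min_silence | ms | Minimum silence before the end-of-turn check fires. Raise for deliberate speech. |
| max_silence | ms | Hard cap on silence before forcing end-of-turn. |
| interrupt_response | boolean | Set to false to disable barge-in entirely. |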

The key insight: both legs use audio/pcmu (G.711 μ-law at 8 kHz). Twilio Media Streams already deliver base64-encoded μ-law audio, and the Voice Agent API accepts and emits the same format natively. That means zero resampling — bytes pass through end-to-end.
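
Because μ-law at 8 kHz stores one byte per sample, the size of a payload tells you how much audio it holds. The sketch below is only an illustration of that arithmetic; the helper name mulawPayloadDurationMs is hypothetical and is not part of the sample repo.

```javascript
// G.711 mu-law stores one byte per sample at 8,000 samples per second.
const MULAW_SAMPLE_RATE_HZ = 8000;

// Return the playback duration, in milliseconds, of a base64 mu-law payload.
// (Hypothetical helper for illustration; not taken from server.js.)
function mulawPayloadDurationMs(base64Payload) {
  const byteCount = Buffer.from(base64Payload, "base64").length;
  return (byteCount * 1000) / MULAW_SAMPLE_RATE_HZ;
}

// Example: 160 bytes of mu-law silence (0xFF) holds 20 ms of audio.
const frame = Buffer.alloc(160, 0xff).toString("base64");
console.log(mulawPayloadDurationMs(frame)); // 20
```

A helper like this can be handy for logging how much audio has flowed in each direction without decoding anything.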

Prerequisites

  • Node.js 18+ and npm
  • An AssemblyAI API key — free tier available
  • A Twilio account plus a voice-capable phone number in your console
  • ngrok (or any public HTTPS tunnel) so Twilio can reach your dev machine

Consent matters. Automated outbound calls are regulated almost everywhere — TCPA in the US, the various state DNC registries, GDPR in the EU, two-party-consent rules for recording, and more. Disclose that the call is automated in the opener, honor “remove me from the list” requests, and consult counsel before dialing real prospects.

Quick start

1. Clone and install

 git clone https://github.com/kelsey-aai/voice-agent-outbound-calls
cd voice-agent-outbound-calls
npm install


2. Configure your environment

 cp .env.example .env
# Fill in:
#   ASSEMBLYAI_API_KEY     — from the AssemblyAI dashboard
#   TWILIO_ACCOUNT_SID     — from console.twilio.com
#   TWILIO_AUTH_TOKEN      — from console.twilio.com
#   TWILIO_FROM_NUMBER     — your Twilio voice number, e.g. +15551234567
#   PUBLIC_URL             — leave blank for now; we'll fill it after ngrok


3. Run the server

 npm start
# → Listening on http://localhost:3000


4. Expose it with ngrok

In a second terminal:

ngrok http 3000


Copy the HTTPS forwarding URL (e.g. https://ab12cd34.ngrok-free.app), paste it into .env as PUBLIC_URL, then stop the server and run npm start again so it picks up the new value.
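
A quick startup check can catch a missing variable or a PUBLIC_URL that Twilio cannot reach before you place a call. This is a minimal sketch, assuming the five variable names from .env above; the function findEnvProblems is hypothetical and does not come from the sample repo.

```javascript
// The variables listed in .env.example above.
const REQUIRED_KEYS = [
  "ASSEMBLYAI_API_KEY",
  "TWILIO_ACCOUNT_SID",
  "TWILIO_AUTH_TOKEN",
  "TWILIO_FROM_NUMBER",
  "PUBLIC_URL",
];

// Return a list of human-readable problems; an empty list means the env looks usable.
// (Hypothetical helper for illustration.)
function findEnvProblems(env) {
  const problems = [];
  for (const key of REQUIRED_KEYS) {
    if (!env[key] || env[key].trim() === "") problems.push(`${key} is missing`);
  }
  const url = env.PUBLIC_URL || "";
  if (url && !url.startsWith("https://")) {
    problems.push("PUBLIC_URL must start with https://");
  }
  if (/localhost|127\.0\.0\.1/.test(url)) {
    problems.push("PUBLIC_URL points at localhost; Twilio cannot reach it");
  }
  return problems;
}

// Example: print whatever is wrong with the current process environment.
console.log(findEnvProblems(process.env));
```

Calling it once at startup and refusing to listen when the list is non-empty turns a silent TwiML fetch failure into an immediate, readable error.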

5. Place a call

curl -X POST http://localhost:3000/call \
  -H 'content-type: application/json' \
  -d '{"to":"+15551234567"}'


Use your own phone number for the first call so you become the prospect. The phone rings, the agent greets you with the disclosure, and you can talk to it as you would to a person.

How it works

1. Place the call

POST /call receives a JSON body with the target number and asks Twilio to dial it. Twilio's Calls API does the actual dialing and, when the recipient picks up, fetches the URL we passed as url: to get TwiML instructions for the call.

const call = await twilioClient.calls.create({
  to,
  from: TWILIO_FROM_NUMBER,
  url: `${PUBLIC_URL}/twiml`,
});

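
Before handing the number to Twilio, it can help to confirm it is in E.164 form, as in the +15551234567 example. The sketch below is one possible validation step; normalizeE164 is a hypothetical helper, and the source does not show whether server.js validates its input.

```javascript
// E.164: a plus sign, then up to 15 digits, with a non-zero leading digit.
const E164_PATTERN = /^\+[1-9]\d{1,14}$/;

// Strip common formatting characters and return the E.164 string,
// or null when the input cannot be a valid number.
// (Hypothetical helper for illustration.)
function normalizeE164(input) {
  if (typeof input !== "string") return null;
  const compact = input.replace(/[\s().-]/g, "");
  return E164_PATTERN.test(compact) ? compact : null;
}

console.log(normalizeE164("+1 (555) 123-4567")); // "+15551234567"
console.log(normalizeE164("5551234567"));        // null (no country code)

// Possible use inside the POST /call handler (assumes Express JSON parsing):
//   const to = normalizeE164(req.body.to);
//   if (!to) return res.status(400).json({ error: "to must be an E.164 number" });
```

Rejecting bad input with a 400 keeps malformed numbers from ever reaching the Calls API.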

2. Return TwiML that opens a media stream

When Twilio fetches /twiml, the server returns a small piece of XML that wraps the live call in a <Connect><Stream> verb. That verb tells Twilio to open a WebSocket back to our server and pipe the call audio over it.

app.post("/twiml", (_req, res) => {
  const wsUrl = PUBLIC_URL.replace(/^http/, "ws") + "/twilio-stream";
  res.type("text/xml").send(`<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Connect>
    <Stream url="${wsUrl}" />
  </Connect>
</Response>`);
});

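
The TwiML step can also be written as a pure function, which makes it easy to test without a running server. This is a sketch under a few assumptions: buildStreamTwiml and escapeXml are hypothetical names, and the stricter https check and XML escaping are additions that the handler above does not perform.

```javascript
// Escape the five XML special characters so the URL is safe inside an attribute.
function escapeXml(value) {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Build the <Connect><Stream> TwiML for a given public HTTPS base URL.
// (Hypothetical helper for illustration.)
function buildStreamTwiml(publicUrl) {
  const base = new URL(publicUrl);
  if (base.protocol !== "https:") {
    throw new Error("PUBLIC_URL must be https so the stream can use wss");
  }
  // Using base.host drops any trailing slash or path on PUBLIC_URL.
  const wsUrl = `wss://${base.host}/twilio-stream`;
  return `<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Connect>
    <Stream url="${escapeXml(wsUrl)}" />
  </Connect>
</Response>`;
}

console.log(buildStreamTwiml("https://ab12cd34.ngrok-free.app"));
```

Parsing with URL means a PUBLIC_URL that ends in a slash still produces a single-slash /twilio-stream path.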

3. Bridge two WebSockets

When Twilio connects to /twilio-stream, we open a second WebSocket to AssemblyAI and shuttle messages between them. The first message we send to AssemblyAI is session.update — it configures the agent's personality, voice, and audio formats.

aaiWs.send(JSON.stringify({
  type: "session.update",
  session: {
    system_prompt: SYSTEM_PROMPT,
    greeting: GREETING,
    input:  { format: { encoding: "audio/pcmu" } },
    output: { voice: "ivy", format: { encoding: "audio/pcmu" } },
  },
}));


Both formats are audio/pcmu. Twilio Media Streams already deliver base64-encoded μ-law 8 kHz audio. AssemblyAI accepts that format natively and can emit it back, which means we never decode, resample, or re-encode any audio in this server.

Greeting is set in session.update. Outbound calls need the agent to speak first — the prospect has no idea why their phone is ringing. Setting session.greeting makes the agent open the conversation as soon as the session is ready.
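
One practical detail when bridging: Twilio may start sending messages before the AssemblyAI socket has finished opening. The source does not show how server.js deals with this, so the following is only one possible approach, with a hypothetical createBufferedSender helper that queues messages until the upstream socket is ready and preserves their order.

```javascript
// Queue outgoing messages until the upstream socket reports it is open,
// then flush them in the order they were queued.
// (Hypothetical helper; rawSend is whatever function actually writes to the socket.)
function createBufferedSender(rawSend) {
  let open = false;
  const pending = [];
  return {
    send(message) {
      if (open) rawSend(message);
      else pending.push(message);
    },
    markOpen() {
      open = true;
      while (pending.length > 0) rawSend(pending.shift());
    },
    pendingCount() {
      return pending.length;
    },
  };
}

// Example with a fake socket that records what it was asked to send.
const sent = [];
const sender = createBufferedSender((m) => sent.push(m));
sender.send("session.update"); // queued: socket not open yet
sender.send("input.audio #1"); // queued
sender.markOpen();             // flushes both, in order
sender.send("input.audio #2"); // sent immediately
console.log(sent);
```

With the ws package, calling markOpen() from the socket's "open" event and queuing session.update first keeps the configuration message ahead of any audio.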

4. Forward audio in both directions

The Twilio side emits connected, start, media, and stop events. We capture streamSid from start, forward media payloads to AssemblyAI as input.audio events, and close the AAI socket on stop.

case "media": {
  const payload = msg.media.payload;  // already base64 μ-law 8 kHz
  aaiWs.send(JSON.stringify({ type: "input.audio", audio: payload }));
  break;
}


Each reply.audio chunk from AssemblyAI is base64 μ-law that we wrap in a Twilio media event and ship straight back to the call:

case "reply.audio":
  twilioWs.send(JSON.stringify({
    event: "media",
    streamSid,
    media: { payload: evt.data },
  }));
  break;


5. Handle barge-in cleanly

When the user speaks while the agent is talking, AssemblyAI emits reply.done with status: "interrupted". On a phone call we also need to flush whatever audio Twilio still has buffered. Twilio supports a clear event for exactly this:

case "reply.done":
  if (evt.status === "interrupted" && streamSid) {
    twilioWs.send(JSON.stringify({ event: "clear", streamSid }));
  }
  break;

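
Steps 4 and 5 amount to two small message translations, one per direction. The sketch below pulls them into pure functions so they can be tested without sockets. The function names twilioToAai and aaiToTwilio are hypothetical; the event names and fields are the ones shown in the snippets above.

```javascript
// Twilio -> AssemblyAI: only "media" events carry audio to forward.
function twilioToAai(msg) {
  if (msg.event === "media") {
    return { type: "input.audio", audio: msg.media.payload };
  }
  return null; // connected, start, and stop are handled elsewhere
}

// AssemblyAI -> Twilio: forward audio, and flush Twilio's buffer on barge-in.
function aaiToTwilio(evt, streamSid) {
  if (!streamSid) return null; // no start event seen yet, nowhere to send
  switch (evt.type) {
    case "reply.audio":
      return { event: "media", streamSid, media: { payload: evt.data } };
    case "reply.done":
      return evt.status === "interrupted" ? { event: "clear", streamSid } : null;
    default:
      return null;
  }
}

// Example: an interrupted reply becomes a Twilio "clear" message.
console.log(aaiToTwilio({ type: "reply.done", status: "interrupted" }, "MZ123"));
```

In the socket handlers, each non-null result would be passed through JSON.stringify and sent on the opposite WebSocket.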

6. Echo cancellation is the carrier's job

On a phone call you don't have to think about acoustic echo cancellation — the carrier and the handset handle it. That's a meaningful difference from terminal-based clients, which need headphones to keep the agent from interrupting itself.

Tuning the agent

Voice

Drop any voice ID from the Voices catalog into session.output.voice. There are 18 English voices and 16 multilingual voices; the multilingual voices code-switch with English automatically.

output: { voice: "james",  format: { encoding: "audio/pcmu" } }
output: { voice: "sophie", format: { encoding: "audio/pcmu" } }
output: { voice: "diego",  format: { encoding: "audio/pcmu" } }


System prompt and greeting

Both live near the top of server.js. Keep them short — phone-call replies should be one or two sentences. Always disclose that the call is automated in the first sentence; several US states require it.

Turn detection

Outbound calls often run on noisier lines than browser-based agents. The defaults in server.js are tuned a little tighter:

turn_detection: {
  vad_threshold: 0.5,        // 0.0–1.0; raise for noisy lines
  min_silence: 400,          // ms; raise for deliberate speech
  max_silence: 1200,         // ms; max wait before forcing end-of-turn
  interrupt_response: true,  // false to disable barge-in
}


See the session configuration reference for every available knob.
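
When you start adjusting these values, a small sanity check can catch typos before they reach the API. This is a sketch only: checkTurnDetection is a hypothetical helper, and the rule that min_silence should not exceed max_silence is an assumption based on how the two parameters are described, not a documented constraint.

```javascript
// Return a list of problems with a turn_detection config; empty means it looks sane.
// (Hypothetical helper for illustration.)
function checkTurnDetection(cfg) {
  const errors = [];
  const { vad_threshold, min_silence, max_silence, interrupt_response } = cfg;
  if (!Number.isFinite(vad_threshold) || vad_threshold < 0 || vad_threshold > 1) {
    errors.push("vad_threshold must be between 0.0 and 1.0");
  }
  if (!Number.isFinite(min_silence) || min_silence < 0) {
    errors.push("min_silence must be a non-negative number of ms");
  }
  if (!Number.isFinite(max_silence) || max_silence < 0) {
    errors.push("max_silence must be a non-negative number of ms");
  }
  // Assumption: the minimum wait should never be longer than the hard cap.
  if (Number.isFinite(min_silence) && Number.isFinite(max_silence) && min_silence > max_silence) {
    errors.push("min_silence must not exceed max_silence");
  }
  if (typeof interrupt_response !== "boolean") {
    errors.push("interrupt_response must be true or false");
  }
  return errors;
}

// The values shown above pass the check.
console.log(checkTurnDetection({
  vad_threshold: 0.5,
  min_silence: 400,
  max_silence: 1200,
  interrupt_response: true,
}));
```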

Recording, machine detection, and time limits

Twilio's Calls API takes optional flags that you almost certainly want in production:

const call = await twilioClient.calls.create({
  to,
  from: TWILIO_FROM_NUMBER,
  url: `${PUBLIC_URL}/twiml`,
  record: true,
  machineDetection: "Enable",
  timeLimit: 600,  // hard cap in seconds
});


record: true saves the call to Twilio's media store. machineDetection: "Enable" lets you branch on voicemail vs. live human. timeLimit puts a ceiling on a single call so a stuck LLM can't burn budget.
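
To branch on the machine-detection result, one option is to check Twilio's AnsweredBy request parameter when it fetches /twiml. The sketch below assumes that parameter arrives with values such as human, machine_start, fax, or unknown, and that the server parses form-encoded bodies (for example with express.urlencoded()). The helper twimlForAnsweredBy is hypothetical, and hanging up on voicemail is just one policy you might choose.

```javascript
// TwiML that ends the call immediately.
const HANGUP_TWIML = `<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Hangup/>
</Response>`;

// Choose the TwiML to return based on Twilio's AnsweredBy value.
// Anything that is not clearly a machine or fax is treated as a live person.
// (Hypothetical helper for illustration.)
function twimlForAnsweredBy(answeredBy, liveTwiml) {
  const isMachine =
    typeof answeredBy === "string" &&
    (answeredBy.startsWith("machine_") || answeredBy === "fax");
  return isMachine ? HANGUP_TWIML : liveTwiml;
}

// Possible use inside the /twiml handler (assumes form-body parsing is enabled):
//   res.type("text/xml").send(twimlForAnsweredBy(req.body.AnsweredBy, streamTwiml));
console.log(twimlForAnsweredBy("machine_start", "<Response/>") === HANGUP_TWIML); // true
```

Treating unknown as live errs on the side of talking to a person; a stricter policy could hang up on unknown as well.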

Tools (Function Calling)

Once the conversation works, add tools to let the agent do things — book a meeting, look up an account, mark the lead as DNC. Tools register on the same session.update you already send. The full pattern is covered in the tool-calling guide.
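
On the server side, tools usually come down to a lookup table from tool name to handler. The sketch below shows only that local dispatch, using a do-not-call example. Every name in it (toolHandlers, mark_dnc, runTool, canDial) is hypothetical, and it deliberately does not show how tools are declared in session.update or how tool-call events arrive, since that wire format is covered in the tool-calling guide rather than here.

```javascript
// In-memory do-not-call list; a real system would persist this.
const doNotCall = new Set();

// Map of tool name -> handler. (All names here are hypothetical.)
const toolHandlers = {
  mark_dnc({ phone }) {
    if (typeof phone !== "string" || phone === "") {
      throw new Error("phone is required");
    }
    doNotCall.add(phone);
    return { ok: true };
  },
};

// Look up and run a tool, returning an error object instead of throwing.
function runTool(name, args) {
  if (!Object.hasOwn(toolHandlers, name)) {
    return { error: `unknown tool: ${name}` };
  }
  try {
    return toolHandlers[name](args || {});
  } catch (err) {
    return { error: String(err.message || err) };
  }
}

// Check the list before dialing so opt-outs are honored on future calls.
function canDial(phone) {
  return !doNotCall.has(phone);
}

console.log(runTool("mark_dnc", { phone: "+15551234567" })); // { ok: true }
console.log(canDial("+15551234567")); // false
```

Returning an error object rather than throwing lets the agent hear about a failed tool call and respond gracefully instead of dropping the session.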

Troubleshooting

The phone rings but the call drops immediately. Check the Twilio console call log. Most often it's a TwiML fetch failure — Twilio couldn't reach PUBLIC_URL/twiml because ngrok died, the URL still says localhost, or the protocol is http:// instead of https://.

Twilio connects but the agent never speaks. Look for [aai] session.ready in your server logs. If you see UNAUTHORIZED, your AssemblyAI key is wrong. If you see no AAI logs at all, your environment variables aren't loaded — confirm .env is next to server.js.

The agent's voice sounds chipmunky or muffled. Both session.input.format.encoding and session.output.format.encoding must be audio/pcmu. If either is left at the default audio/pcm (24 kHz), the formats won't match and Twilio will play the audio at the wrong rate.

The agent keeps talking over me after I interrupt. Make sure you forward the clear event to Twilio when you receive reply.done with status: "interrupted". Without it, Twilio plays out the rest of its buffered audio.

Twilio trial accounts only call verified numbers. That's a Twilio limitation, not a bug in this code. Verify the recipient number in the Twilio console, or upgrade the account.

The full troubleshooting guide is in the Voice Agent API docs.

Frequently asked questions

How do I make a voice agent that places outbound phone calls?

Use Twilio's Calls API to dial the target number and pass it TwiML that opens a <Connect><Stream> to your server. On your server, accept the resulting Media Streams WebSocket and bridge it to AssemblyAI's Voice Agent API. Configure both session.input.format.encoding and session.output.format.encoding as audio/pcmu so Twilio's μ-law 8 kHz audio passes through without resampling.

What audio format should I use for a Twilio voice agent?

Use audio/pcmu (G.711 μ-law, 8 kHz) on both the input and output of the Voice Agent API. Twilio Media Streams emit base64-encoded μ-law 8 kHz audio natively, and the Voice Agent API accepts and emits the same format. That means no decoding, no resampling, and no re-encoding.

How does the Voice Agent API handle barge-in over a phone call?

When the user speaks while the agent is talking, the Voice Agent API emits reply.done with status: "interrupted". On a Twilio call you also need to flush Twilio's outbound buffer by sending {event: "clear", streamSid} over the Media Streams WebSocket.

Do I need separate STT, LLM, and TTS for an outbound voice agent?

No. The AssemblyAI Voice Agent API bundles speech recognition, the language model, and text-to-speech behind a single WebSocket. You stream telephony audio in and get the agent's spoken audio back, with neural turn detection, barge-in, and tool calling built in.

How do I authenticate from a Node.js server?

Pass your AssemblyAI API key as a Bearer token in the Authorization HTTP header on the WebSocket upgrade request: new WebSocket(url, { headers: { Authorization: "Bearer YOUR_API_KEY" } }).

Is it legal to call prospects with an AI voice agent?

It depends on jurisdiction and use case. In the US, the TCPA and state DNC registries restrict automated calls. Several states require AI disclosure in the opener. The EU's GDPR and ePrivacy rules add their own requirements. Disclose that the call is automated, honor opt-out requests, and consult counsel before dialing real prospects.

How much does it cost?

AssemblyAI offers a free tier so you can prototype without a credit card. For current pricing, see the AssemblyAI pricing page. Twilio bills separately for outbound minutes and the phone number itself.