Turn Your Phone Into Voice Input for Any React Text Field
Gabor Tatar · 2026-05-25 · via DEV Community

Every time I needed voice input in a React app, I ended up wiring it from scratch (with an AI agent's help). Web Speech API setup, browser inconsistencies, a relay server for the phone-to-desktop connection, then QR pairing, Chrome killing recognition mid-sentence, partial vs. final transcript logic. A day of annoying plumbing before you get to the actual feature.

I couldn't find a ready-made solution for this, so I built one. Install it, add three files, and you have voice input that works, without the day of debugging browser quirks.

Voicefield — one hook, any text field, your phone as the mic. No audio leaves the device, no API keys to start. The phone page at voicefield.dev is a static SPA you can use as-is if you don't want to build your own frontend — it's open source, no data passes through it, and no audio or text is stored or logged.

How it works

  1. Your desktop app shows a QR code
  2. User scans it with their phone
  3. Phone runs speech-to-text locally (Web Speech API, no key needed)
  4. Only the transcribed text gets relayed to the desktop
  5. The desktop app streams the transcript directly into whichever input field currently has focus

Audio never leaves the phone. Your server never sees or stores any audio data. It only relays text.
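To make step 5 concrete, here is a minimal sketch of how partial and final transcripts could be merged into a field's value. The `TranscriptMessage` shape and the function names are assumptions for illustration, not Voicefield's actual wire format or API.

```typescript
// Hypothetical message shape; the real relay payload may differ.
type TranscriptMessage = { text: string; isFinal: boolean };

interface FieldState {
  committed: string; // text confirmed by final transcripts
  interim: string;   // latest partial transcript, replaced on every update
}

// Join two fragments with a single space, skipping empty parts.
function joinWords(a: string, b: string): string {
  if (a === "") return b;
  if (b === "") return a;
  return `${a} ${b}`;
}

// Partials overwrite the interim text; finals are appended to the
// committed text and clear the interim text.
function applyTranscript(state: FieldState, msg: TranscriptMessage): FieldState {
  const text = msg.text.trim();
  if (msg.isFinal) {
    return { committed: joinWords(state.committed, text), interim: "" };
  }
  return { committed: state.committed, interim: text };
}

// What the input field would show at any moment.
function displayValue(state: FieldState): string {
  return joinWords(state.committed, state.interim);
}
```

Keeping committed and interim text separate means a revised partial ("john" becoming "John Smith") replaces the earlier guess instead of being appended to it.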

The architecture

Phone (STT)              Your Server             Desktop Browser
+-----------+  text only +--------------+  SSE   +--------------+
| Web Speech| ---------> | Relay        | -----> | useVoicefield|
| API       |  POST /txt | (in-memory   | stream | () hook      |
| (browser) |            |  sessions)   |        |              |
+-----------+            +--------------+        +--------------+
      ^                         ^                       |
      |        QR scan          |    creates session    |
      +-------------------------+-----------------------+


The phone and desktop find each other through cryptographic pairing — a 256-bit secret is embedded in the QR code, and the phone gets a 384-bit session token after pairing. Sessions live in memory with a 30-minute sliding TTL. No database needed.
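A rough sketch of what such an in-memory store could look like, assuming Node's built-in `node:crypto` module. The `SessionStore` class, its method names, and the session id length are hypothetical and not taken from the Voicefield source. Only the 256-bit secret, the 384-bit token, and the 30-minute sliding TTL come from the description above.

```typescript
import { randomBytes, timingSafeEqual } from "node:crypto";

const SESSION_TTL_MS = 30 * 60 * 1000; // 30-minute sliding window

interface Session {
  secret: string;       // 256-bit pairing secret, embedded in the QR code
  token: string | null; // 384-bit session token, issued once the phone pairs
  expiresAt: number;    // epoch milliseconds
}

// Compare secrets without leaking timing information.
function safeEqual(a: string, b: string): boolean {
  const bufA = Buffer.from(a);
  const bufB = Buffer.from(b);
  return bufA.length === bufB.length && timingSafeEqual(bufA, bufB);
}

class SessionStore {
  private sessions = new Map<string, Session>();
  private now: () => number;

  // The clock is injectable so expiry can be tested without waiting.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  create(): { id: string; secret: string } {
    const id = randomBytes(8).toString("hex");      // illustrative id length
    const secret = randomBytes(32).toString("hex"); // 32 bytes = 256 bits
    this.sessions.set(id, { secret, token: null, expiresAt: this.now() + SESSION_TTL_MS });
    return { id, secret };
  }

  // Returns the live session and pushes its expiry forward (sliding TTL).
  touch(id: string): Session | undefined {
    const session = this.sessions.get(id);
    if (!session) return undefined;
    if (session.expiresAt <= this.now()) {
      this.sessions.delete(id);
      return undefined;
    }
    session.expiresAt = this.now() + SESSION_TTL_MS;
    return session;
  }

  // Exchanges the QR secret for a session token, once per session.
  pair(id: string, secret: string): string | null {
    const session = this.touch(id);
    if (!session || session.token !== null) return null;
    if (!safeEqual(session.secret, secret)) return null;
    session.token = randomBytes(48).toString("hex"); // 48 bytes = 384 bits
    return session.token;
  }
}
```

Because every successful lookup resets the expiry, an active dictation session stays alive, while an abandoned one disappears 30 minutes after its last use.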

Speech recognition defaults to the browser's built-in Web Speech API, which means zero API keys to get started. If you need better accuracy or more languages, you can plug in Soniox — the hook abstracts over the provider.

3-file integration

Voicefield integration in a Next.js app boils down to three files.

1. API route: app/api/voice/[...voicefield]/route.ts

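import { createVoicefieldHandler } from "@voicefield/server"

// The catch-all segment [...voicefield] sends every request under /api/voice/ to this handler.
const { GET, POST, OPTIONS } = createVoicefieldHandler({
  // "*" accepts any origin; consider listing specific origins in production.
  cors: { origins: ["*"] },
})

export { GET, POST, OPTIONS }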


That's your relay server. It handles session creation, pairing, transcript forwarding, and SSE streaming.
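For readers new to SSE, the framing on the wire is plain text: an optional `event:` line, one or more `data:` lines, and a blank line to end the message. A small sketch of that framing follows. The `transcript` event name and the payload shape are assumptions, not necessarily what the Voicefield handler emits.

```typescript
// Headers a server-sent events response typically carries.
const SSE_HEADERS: Record<string, string> = {
  "Content-Type": "text/event-stream",
  "Cache-Control": "no-cache",
};

// Serialize one event. JSON.stringify escapes newlines inside strings,
// so the payload always fits on a single "data:" line.
function formatSseEvent(event: string, data: object): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// Lines starting with ":" are comments that clients ignore; they are
// commonly sent as keep-alives so proxies don't close an idle stream.
function formatSseComment(text: string): string {
  return `: ${text}\n\n`;
}
```

On the desktop side, a browser `EventSource` would receive the first kind of message through a listener registered for the matching event name.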

2. Phone page: app/mic/page.tsx

"use client"
export { Mic as default } from "@voicefield/react/phone"


This is the page the phone loads after scanning the QR code. It handles microphone access, STT, and sending transcripts.

3. Your component — wherever you want voice input

import { useVoicefield, QRPopup } from "@voicefield/react"
import { useRef } from "react"

function SearchBar() {
  const inputRef = useRef<HTMLInputElement>(null)

  const vf = useVoicefield({
    serverUrl: "/api/voice",
    language: "en",
  })

  vf.register("search", "Search", inputRef)

  return (
    <>
      <input ref={inputRef} placeholder="Search..." />
      <button onClick={() => vf.showQR()}>Pair phone</button>
      <QRPopup
        pairingCode={vf.pairingCode}
        secret={vf.secret}
        serverUrl={vf.serverUrl}
        phoneUrl={vf.phoneUrl}
        isVisible={vf.isQRVisible}
        onClose={vf.hideQR}
      />
    </>
  )
}


Register fields, switch between them on focus, done.
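The routing idea behind "switch between them on focus" can be sketched without React. The `FieldRouter` class below is a hypothetical illustration of how incoming text might be delivered to whichever registered field was focused last. It is not the hook's real implementation, and its `register` signature differs from `vf.register`.

```typescript
type FieldSetter = (text: string) => void;

class FieldRouter {
  private fields = new Map<string, { label: string; setValue: FieldSetter }>();
  private activeId: string | null = null;

  // Register a field under an id; the first one becomes the default target.
  register(id: string, label: string, setValue: FieldSetter): void {
    this.fields.set(id, { label, setValue });
    if (this.activeId === null) this.activeId = id;
  }

  // Call this from the input's focus handler.
  focus(id: string): void {
    if (this.fields.has(id)) this.activeId = id;
  }

  // Deliver text to the active field; returns the id that received it.
  deliver(text: string): string | null {
    if (this.activeId === null) return null;
    const field = this.fields.get(this.activeId);
    if (!field) return null;
    field.setValue(text);
    return this.activeId;
  }
}

// Usage: two fields, dictation follows focus.
const values: Record<string, string> = {};
const router = new FieldRouter();
router.register("name", "Name", (text) => { values["name"] = text; });
router.register("note", "Note", (text) => { values["note"] = text; });
router.deliver("John Smith");                 // goes to "name", the default
router.focus("note");
router.deliver("I'd like to schedule a demo."); // goes to "note"
```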

Why this matters for privacy

Most voice-to-text solutions work like this: capture audio, send it to a server, get text back. That means someone's server has a recording of everything your user said.

Voicefield flips it. The Web Speech API runs entirely in the phone's browser. The relay server only ever sees the resulting text — short strings like "John Smith" or "I'd like to schedule a demo." No audio buffers, no recordings, no stored voice data.

This matters for medical forms, legal intake, financial applications — anywhere users are dictating sensitive information.

Try it

Voicefield is MIT licensed, works with Next.js App Router, and doesn't require any API keys to get started.

Install it, add three files, scan a QR code, and your forms suddenly support voice input.

npm install @voicefield/react @voicefield/server


Repo: github.com/tatargabor/voicefield
Docs: voicefield.dev

If you build something with it, I'd genuinely love to hear about it.