I built a free music tool
tagmybeat · 2026-05-25 · via DEV Community

Building TagMyBeat: An AI Producer Tag Generator and a Browser-Based BPM & Key Finder That Never Uploads Your Audio

If you produce music, you know the drill: you finish a beat, export it, and then need a vocal tag to brand it. You also need to know the BPM and key of every sample in your library so everything fits together. These are two different problems, but they share the same DNA: audio processing, fast turnaround, and privacy for unreleased material.

I built TagMyBeat to solve both, and in this post I want to walk through how they work under the hood.


What Is TagMyBeat?

TagMyBeat is two things:

  1. A Producer Tag Generator — type text, pick a voice and an effect preset, and get a professional vocal tag (think "DJ Khaled!" but yours).
  2. A Free BPM and Key Finder — drop in an MP3, WAV, or FLAC, and get tempo, musical key, and Camelot code back, entirely in your browser. No upload. No server.

The tag generator is the main product (3 free generations per day, no login). The BPM/Key Finder is a free engineering-as-marketing tool that brings producers into the ecosystem. Both are open about how they work, so let's get into the tech.


Tech Stack at a Glance

Layer | Technology
Frontend | Next.js 16 (App Router), React 19, TypeScript, Tailwind CSS 4, Zustand 5
Backend | FastAPI (Python), FFmpeg
Auth | Browser fingerprinting (FingerprintJS); no accounts, no passwords
Audio Analysis (Client) | essentia.js (WASM-compiled C++), Web Workers, Web Audio API
Deployment | Cloudflare (frontend), VPS (backend)

Part 1: How the Producer Tag Generator Works

The generation pipeline is a classic producer-consumer pattern:

User types text → quota check → task queue (asyncio.Queue) → worker pool (8 workers)
→ TTS generates speech → FFmpeg applies audio effects → file saved → frontend polls status
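
The queue-and-workers part of this pipeline can be sketched in a few lines of asyncio. This is a minimal illustration, not the production code: `synthesize` and `apply_effects` are hypothetical stand-ins for the TTS and FFmpeg steps, the quota check is left out, and the `status` dictionary is an assumed stand-in for whatever the frontend polls.

```python
import asyncio

NUM_WORKERS = 8  # worker pool size mentioned in the post


async def synthesize(text: str) -> bytes:
    """Hypothetical stand-in for the TTS call."""
    await asyncio.sleep(0)
    return text.encode("utf-8")


async def apply_effects(speech: bytes, preset: str) -> bytes:
    """Hypothetical stand-in for the FFmpeg effects step."""
    await asyncio.sleep(0)
    return preset.encode("utf-8") + b":" + speech


async def worker(queue: asyncio.Queue, status: dict, results: dict) -> None:
    """Consumer: pull tasks off the queue until cancelled."""
    while True:
        task_id, text, preset = await queue.get()
        try:
            status[task_id] = "processing"
            speech = await synthesize(text)
            results[task_id] = await apply_effects(speech, preset)
            status[task_id] = "done"
        except Exception:
            status[task_id] = "failed"
        finally:
            queue.task_done()


async def run_pipeline(jobs: list[tuple[str, str]]) -> tuple[dict, dict]:
    """Producer: enqueue (text, preset) jobs and wait for the pool to drain."""
    queue: asyncio.Queue = asyncio.Queue()
    status: dict = {}
    results: dict = {}
    workers = [
        asyncio.create_task(worker(queue, status, results))
        for _ in range(NUM_WORKERS)
    ]
    for task_id, (text, preset) in enumerate(jobs):
        status[task_id] = "queued"
        queue.put_nowait((task_id, text, preset))
    await queue.join()  # returns once every task has called task_done()
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return status, results
```

In a real service the workers would run for the lifetime of the app and a status endpoint would read from `status`; here `run_pipeline` shuts them down so the sketch can be run with `asyncio.run(run_pipeline([("my tag", "hype")]))`.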


Audio Effects with FFmpeg

After the voice is generated, FFmpeg steps in. We have 5 built-in effect presets:

  • Clean — light compression and normalization
  • Hype — pitch shift up, short reverb, stereo widening
  • Chill — pitch shift down, large reverb, warm EQ
  • Cinematic — deep reverb, delay, dramatic EQ
  • Retro — bitcrushing, tape saturation, vinyl noise

Each preset is a chain of FFmpeg audio filters (atempo, aecho, equalizer, etc.) composed programmatically. Advanced mode lets you dial in 6 custom parameters: stutter speed, reverb size, reverb wet, delay time, delay feedback, and pitch shift.
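
To make "composed programmatically" concrete, here is a rough sketch of how a preset could be turned into an FFmpeg `-af` filter string. The filter names (`asetrate`, `aresample`, `atempo`, `aecho`, `equalizer`, `acompressor`, `loudnorm`) are real FFmpeg filters, but the preset contents, parameter values, and sample rate below are made up for illustration and are not the actual TagMyBeat presets.

```python
SAMPLE_RATE = 44100  # assumed output sample rate


def pitch_shift(semitones: float, sample_rate: int = SAMPLE_RATE) -> list[str]:
    """Shift pitch while keeping duration: change the declared sample rate,
    resample back, then use atempo to undo the resulting speed change."""
    factor = 2 ** (semitones / 12)
    return [
        f"asetrate={round(sample_rate * factor)}",
        f"aresample={sample_rate}",
        f"atempo={1 / factor:.4f}",
    ]


# Illustrative presets only; every value here is an assumption.
PRESETS = {
    "clean": ["acompressor=threshold=0.125:ratio=3", "loudnorm"],
    "hype": pitch_shift(2) + ["aecho=0.8:0.7:40:0.3", "loudnorm"],
    "chill": pitch_shift(-2)
    + ["aecho=0.8:0.8:120:0.4", "equalizer=f=200:t=q:w=1:g=3", "loudnorm"],
}


def build_ffmpeg_command(src: str, dst: str, preset: str) -> list[str]:
    """Return an argv list suitable for subprocess.run()."""
    return ["ffmpeg", "-y", "-i", src, "-af", ",".join(PRESETS[preset]), dst]
```

Building the command as a list rather than a shell string avoids quoting problems when user-supplied file names are involved.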


Part 2: Deep Dive — How the BPM and Key Finder Works (Entirely Client-Side)

This is the part I'm most excited about. The BPM/Key Finder runs 100% in the browser using WebAssembly and Web Workers. Your audio files never leave your device.

The Core Engine: essentia.js

Under the hood, we use essentia.js, a WebAssembly port of Essentia, an open-source C++ library for audio analysis developed by the Music Technology Group at Universitat Pompeu Fabra. Essentia has been battle-tested in academic research and production systems (Spotify uses it internally).

Loading essentia.js happens lazily — the ~2MB WASM binary is fetched from our CDN only when you first open the BPM/Key Finder. We use a Web Worker to keep the main thread responsive during initialization and analysis.

Step 1: Smart Windowing

Not all parts of a track are equally useful for analysis. Intros often have no drums, outros fade out, and breakdowns can confuse tempo detection. So we don't feed the entire track to the analyzer.

Instead, we use a smart windowing strategy that varies by track length:

Track Length | BPM Windows | Key Windows
< 45 seconds | Single window covering the full track | Same full-track window
45–90 seconds | One 20s window starting at 15% in | One 40s window starting at 15% in
> 90 seconds | Three 20s windows at 25%, 50%, and 70% | Two 40s windows at 35% and 60%

Except for short tracks, which are analyzed in full, windows are positioned away from the first and last 10% of the track (the edge guard) to dodge intros and outros. This might seem like a small detail, but it dramatically improves accuracy on real-world tracks.
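
The windowing table translates into a small planning function. The sketch below is written in Python for brevity (the tool itself runs in the browser) and makes two assumptions that the post does not state: the percentages are window start positions, and a window that would run into the trailing edge guard is clamped there.

```python
EDGE_GUARD = 0.10  # fraction of the track skipped at each end


def plan_windows(duration: float) -> dict:
    """Return (start, end) analysis windows in seconds for BPM and key."""

    def window(start_frac: float, length: float) -> tuple:
        start = duration * start_frac
        # Assumption: clamp the window so it never enters the last 10%.
        end = min(start + length, duration * (1 - EDGE_GUARD))
        return (round(start, 2), round(end, 2))

    if duration < 45:
        full = (0.0, round(duration, 2))  # short tracks are analyzed in full
        return {"bpm": [full], "key": [full]}
    if duration <= 90:
        return {"bpm": [window(0.15, 20)], "key": [window(0.15, 40)]}
    return {
        "bpm": [window(f, 20) for f in (0.25, 0.50, 0.70)],
        "key": [window(f, 40) for f in (0.35, 0.60)],
    }
```

For a 200-second track this yields BPM windows at 50–70s, 100–120s, and 140–160s, and key windows at 70–110s and 120–160s.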

Step 2: BPM Detection with PercivalBpmEstimator

For tempo detection, we use essentia's PercivalBpmEstimator, which implements the tempo estimator described by Percival and Tzanetakis. Here's how it works at a high level:

  1. Onset Strength Signal: The algorithm computes spectral flux over short frames, producing a curve that peaks wherever a note or drum hit begins.
  2. Autocorrelation: It autocorrelates that curve to find candidate beat periods. Regular rhythms produce peaks at the tempo period and its multiples.
  3. Pulse-Train Scoring: Each candidate period is cross-correlated with an ideal pulse train, and the best-aligned candidates score highest.
  4. Octave Decision: Per-frame estimates are accumulated into one overall tempo, and a final heuristic decides whether to double it, which is how half-time and double-time ambiguity gets resolved.
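
To give a feel for how a periodicity search turns an onset curve into a tempo, here is a toy autocorrelation estimator in plain Python. It is not essentia's PercivalBpmEstimator and skips the scoring and octave-decision stages entirely; the 50–210 BPM search range and the function name are assumptions chosen for the example.

```python
def estimate_bpm(onset_strength, frame_rate, min_bpm=50.0, max_bpm=210.0):
    """Pick the lag (in frames) with the strongest autocorrelation and
    convert it to beats per minute.

    onset_strength: sequence of non-negative onset-strength values
    frame_rate: number of onset-strength frames per second
    """
    min_lag = int(round(60.0 * frame_rate / max_bpm))
    max_lag = int(round(60.0 * frame_rate / min_bpm))
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag.
        score = sum(a * b for a, b in zip(onset_strength, onset_strength[lag:]))
        if score > best_score:  # strict '>' keeps the shortest winning lag
            best_lag, best_score = lag, score
    return 60.0 * frame_rate / best_lag


# Synthetic 10-second envelope at 100 frames/s with a hit every 0.5 s.
envelope = [1.0 if i % 50 == 0 else 0.0 for i in range(1000)]
bpm = estimate_bpm(envelope, frame_rate=100)  # 120.0
```

Note that the lag for 60 BPM (100 frames) also lines up with the hits, just with one fewer overlap. On real audio that margin is much thinner, which is why production estimators add extra scoring and an explicit octave decision.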

But here is the key insight: for longer tracks, we run this algorithm on three different 20-second windows at different positions and then aggregate the results. This is critical because a single window might hit a breakdown or a double-time section.

Step 3: Key Detection with KeyExtractor

Key detection uses essentia's KeyExtractor, which works as follows:

  1. Chroma Feature Extraction: The audio is converted into a chromagram, a 12-bin representation of how much energy is present at each pitch class (C, C#, D, ..., B), collapsed across all octaves. This is computed on 4096-sample frames with a 4096-sample hop size, so consecutive frames do not overlap.
  2. Key Profile Correlation: The averaged chroma vector is correlated against 24 key profiles (12 major + 12 minor keys using the Krumhansl-Kessler tonal hierarchy). Each profile represents the expected chroma distribution for a given key.
  3. Strength and Confidence: The algorithm returns not just the best-matching key and scale, but also a strength score (0–1) indicating how strongly the chroma matches the profile. Low strength often means the track modulates or has an ambiguous tonal center.
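
The profile-correlation step is easy to sketch. The code below correlates a 12-bin chroma vector against rotated copies of the published Krumhansl-Kessler major and minor profiles and returns the best match. It is a simplified illustration, not essentia's KeyExtractor: the pitch-class spelling, the use of plain Pearson correlation as the strength value, and the function names are all assumptions.

```python
from math import sqrt

PITCH_CLASSES = ["C", "C#", "D", "Eb", "E", "F", "F#", "G", "Ab", "A", "Bb", "B"]

# Krumhansl-Kessler probe-tone profiles, tonic first.
MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def rotate(profile, tonic):
    """Shift a tonic-first profile so that index `tonic` becomes the tonic."""
    return [profile[(i - tonic) % 12] for i in range(12)]


def detect_key(chroma):
    """Return (key, scale, strength) for an averaged 12-bin chroma vector
    whose bin 0 is pitch class C."""
    candidates = []
    for tonic in range(12):
        for scale, profile in (("major", MAJOR), ("minor", MINOR)):
            strength = pearson(chroma, rotate(profile, tonic))
            candidates.append((strength, PITCH_CLASSES[tonic], scale))
    strength, key, scale = max(candidates)
    return key, scale, strength
```

Feeding in a chroma vector shaped exactly like the A minor profile, `detect_key(rotate(MINOR, 9))`, returns A minor with a strength of about 1.0; real chroma vectors are noisier and score lower.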

For tracks longer than 90 seconds, we sample two 40-second windows and aggregate with a simple agreement rule: if both windows agree on the key, we return it with the averaged strength. If they disagree, we pick the one with higher strength. This helps with tracks where the verse and chorus are in different keys.
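
The two-window rule described above fits in a few lines. This sketch assumes each window result is a `(key, scale, strength)` tuple; the function name is hypothetical.

```python
def aggregate_key(windows):
    """Combine per-window (key, scale, strength) results.

    One window: return it unchanged.
    Two windows that agree: return the key with the averaged strength.
    Two windows that disagree: return the higher-strength result.
    """
    if len(windows) == 1:
        return windows[0]
    (key1, scale1, strength1), (key2, scale2, strength2) = windows
    if (key1, scale1) == (key2, scale2):
        return (key1, scale1, (strength1 + strength2) / 2)
    return max(windows, key=lambda w: w[2])
```

For example, `aggregate_key([("A", "minor", 0.75), ("C", "major", 0.5)])` returns the A minor result because it has the higher strength.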

Step 4: Aggregation and Normalization

Raw BPM estimates need cleanup. Here is our aggregation pipeline:

Raw BPM estimates from each window
→ Normalize into 80-200 BPM range (doubling/halving)
→ Cluster analysis: group estimates within ±2 BPM of each other
→ If largest cluster has ≥ 2 estimates: return median of that cluster
→ Otherwise: return median of all estimates
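
Here is the same pipeline as a runnable sketch, written in Python for readability. The post does not say exactly how "within ±2 BPM of each other" is evaluated, so this version makes an assumption: estimates are sorted and each cluster spans at most 2 BPM from its lowest member.

```python
from statistics import median

BPM_MIN, BPM_MAX = 80.0, 200.0
CLUSTER_TOLERANCE = 2.0  # BPM


def normalize_bpm(bpm):
    """Fold an estimate into the 80-200 BPM range by doubling or halving."""
    if bpm <= 0:
        raise ValueError("BPM must be positive")
    while bpm < BPM_MIN:
        bpm *= 2
    while bpm > BPM_MAX:
        bpm /= 2
    return bpm


def aggregate_bpm(estimates):
    """Normalize, cluster, and reduce per-window BPM estimates to one value."""
    if not estimates:
        raise ValueError("need at least one estimate")
    normalized = sorted(normalize_bpm(b) for b in estimates)
    clusters = []
    for bpm in normalized:
        # Assumption: a cluster holds values within 2 BPM of its lowest member.
        if clusters and bpm - clusters[-1][0] <= CLUSTER_TOLERANCE:
            clusters[-1].append(bpm)
        else:
            clusters.append([bpm])
    largest = max(clusters, key=len)
    if len(largest) >= 2:
        return median(largest)
    return median(normalized)
```

With estimates of 128, 64, and 96, normalization gives 128, 128, and 96; the two 128s form the largest cluster, so the result is 128 rather than the overall median.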


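The cluster approach is important. Imagine a trap beat at 140 BPM. Window 1 might detect 140, Window 2 detects 70 (half-time feel during a sparse section), and Window 3 detects 140. A simple average of the raw estimates gives 116.7, which is useless. In our pipeline, normalization first folds the 70 up to 140; clustering then guards against any estimate that is still off for reasons other than an octave error, so the 140 group wins and we get the correct answer.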

The normalization step (multiply by 2 if below 80, divide by 2 if above 200) relies on the fact that most music sits in 80–200 BPM. A 65 BPM estimate is almost certainly half of 130. The range cannot resolve every octave error, though: a drum and bass track at ~174 BPM that is detected at 87 stays at 87, because 87 is already inside the range.

Step 5: Camelot Notation

For DJs, we convert the detected key to Camelot notation, which is the standard harmonic mixing system used in Pioneer DJ gear, rekordbox, and Mixed In Key:

Key | Camelot | Key | Camelot
Ab minor | 1A | B major | 1B
Eb minor | 2A | F# major | 2B
Bb minor | 3A | Db major | 3B
F minor | 4A | Ab major | 4B
C minor | 5A | Eb major | 5B
G minor | 6A | Bb major | 6B
D minor | 7A | F major | 7B
A minor | 8A | C major | 8B
E minor | 9A | G major | 9B
B minor | 10A | D major | 10B
F# minor | 11A | A major | 11B
Db minor | 12A | E major | 12B

The rule of thumb: adjacent Camelot numbers mix harmonically (8A → 9A, 8A → 7A), and same-number cross-scale transitions work (8A → 8B).
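
The table and the rule of thumb can be captured as a lookup plus a small compatibility check. This is an illustrative sketch: the enharmonic alias map and the function names are assumptions, and the wrap-around from 12 back to 1 follows from the Camelot wheel being circular.

```python
CAMELOT = {
    ("Ab", "minor"): "1A", ("B", "major"): "1B",
    ("Eb", "minor"): "2A", ("F#", "major"): "2B",
    ("Bb", "minor"): "3A", ("Db", "major"): "3B",
    ("F", "minor"): "4A", ("Ab", "major"): "4B",
    ("C", "minor"): "5A", ("Eb", "major"): "5B",
    ("G", "minor"): "6A", ("Bb", "major"): "6B",
    ("D", "minor"): "7A", ("F", "major"): "7B",
    ("A", "minor"): "8A", ("C", "major"): "8B",
    ("E", "minor"): "9A", ("G", "major"): "9B",
    ("B", "minor"): "10A", ("D", "major"): "10B",
    ("F#", "minor"): "11A", ("A", "major"): "11B",
    ("Db", "minor"): "12A", ("E", "major"): "12B",
}

# Assumed aliases so that sharp spellings map onto the table's spellings.
ENHARMONIC = {"G#": "Ab", "D#": "Eb", "A#": "Bb", "C#": "Db", "Gb": "F#"}


def to_camelot(key: str, scale: str) -> str:
    """Convert a key name and scale ('major' or 'minor') to a Camelot code."""
    return CAMELOT[(ENHARMONIC.get(key, key), scale)]


def mixes_well(a: str, b: str) -> bool:
    """True for the same code, adjacent numbers on the same letter,
    or the same number across letters (e.g. 8A and 8B)."""
    num_a, letter_a = int(a[:-1]), a[-1]
    num_b, letter_b = int(b[:-1]), b[-1]
    if letter_a == letter_b:
        return (num_a - num_b) % 12 in (0, 1, 11)  # wheel wraps 12 -> 1
    return num_a == num_b
```

So `to_camelot("A", "minor")` gives `"8A"`, and `mixes_well("8A", "9A")` and `mixes_well("8A", "8B")` are both true while `mixes_well("8A", "10A")` is not.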

Why No Server Upload?

Three reasons:

  1. Privacy — producers work with unreleased material. They should not have to trust a server with their beats.
  2. Instant UX — no upload progress bar, no network errors, no timeouts. Files are decoded locally in milliseconds.
  3. Zero server cost — audio analysis is CPU-intensive. Offloading it to the client means we can offer this as a free tool indefinitely.

The trade-off is that we cannot analyze YouTube or Spotify links (the browser would need to download the audio first, which raises copyright issues). That feature is on the roadmap, but for now the tool is file-based.


What I Learned Building This

essentia.js is incredible but heavy. The WASM binary is ~2MB, and initialization takes 1-3 seconds on a cold load. The lazy-loading pattern (only fetch when the BPM/Key Finder page opens) was essential to keep the main tag generator fast.

Smart windowing matters more than the algorithm. Switching from full-track analysis to multi-window analysis with edge guards improved BPM accuracy by ~15-20% on real-world test tracks. The algos are mature; the preprocessing is where you win.

Web Workers are underrated for audio. Running essentia on the main thread caused 200-500ms UI freezes per file. Moving to a Worker made batch analysis of 20+ files feel smooth. The message-passing overhead is negligible compared to the analysis time.

Browser fingerprinting is a pragmatic middle ground. No one wants to create an account for a free tool. Fingerprinting gives us abuse prevention and data isolation without signup friction. The trade-off is that clearing browser data resets your quota, but for a free tier that is acceptable.


Try It Yourself

The BPM/Key Finder source is visible in the browser (it is all client-side JS). The backend for the tag generator is in Python/FastAPI. If you are curious about any specific part, drop a comment — I am happy to dive deeper into the FFmpeg effects chains, the Edge TTS integration, or the essentia.js setup.


TagMyBeat is built with Next.js 16, FastAPI, essentia.js, Edge TTS, FFmpeg, and Tailwind CSS. Deployed on Cloudflare + a VPS. No user accounts, no audio uploads for the BPM/Key Finder.