babbled notes: a sound-to-music agent for people who could not make music before
BABBLED77 · 2026-05-25 · via DEV Community

💎 You make a sound. Any sound. The agent hears it. Music comes back.


๐•“๐•’๐•“๐•“๐•๐•–๐•• ๐•Ÿ๐• ๐•ฅ๐•–๐•ค

Hum into a microphone. Tap your desk. Exhale slowly. Click your tongue. Whistle once.

A Gemma 4 agent reads what you made, decides what music lives inside it, and plays it back as piano, cello, marimba, or drums.

You chose nothing. The agent chose everything.

Built for people who have never been able to make music before -- people who are non-verbal, people with ALS, cerebral palsy, locked-in syndrome, quadriplegia, Parkinson's. People who have always heard music inside them and had no way to get it out.

🔗 GitHub: https://github.com/brookehoward2008-droid/Babbled-notes-v2
🎵 Agent architecture: HERMES.md


◈ Why this is an agent, not a tool

A tool does what you tell it. You configure it. You choose the settings. You push the button.

An agent perceives its environment, reasons about what it observes, and takes action on its own judgment.

babbled notes runs a full agent loop on every sound:

| Component | What it does |
|---|---|
| Perceive | Web Audio API reads the mic: FFT pitch analysis, RMS amplitude, onset detection. Outputs a structured DspDigest. |
| Reason | Gemma 4 (gemma-4-26b-a4b-it) receives the raw audio AND the DspDigest. Decides mood, instrument voice, articulation, and note timing. |
| Act | Web Audio API synthesizer plays the composition. Real instruments. Real time. |
| Reflect | User edits the Lilt score. Agent re-renders without re-recording. |

The user never chooses a key, a tempo, a voice, or a mood. The agent reads the sound and decides all of it.
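
To make the Act step concrete, here is a minimal sketch of how agent-chosen notes could be turned into frequencies, gains, and start/stop times before they are handed to a Web Audio synthesizer. The field names follow the JSON shown later in the post; the gain values and the function names `noteToFrequency` and `schedule` are illustrative assumptions, not the project's actual code.

```typescript
// Hypothetical types; field names mirror the agent output shown in the post.
type Velocity = "soft" | "normal" | "accent";

interface PlannedNote {
  note: string;     // e.g. "A3"
  duration: number; // seconds
  velocity: Velocity;
  time: number;     // seconds from the start of the piece
}

interface ScheduledTone {
  frequency: number; // Hz
  start: number;     // seconds on the audio clock
  stop: number;
  gain: number;
}

const SEMITONES = new Map<string, number>([
  ["C", 0], ["D", 2], ["E", 4], ["F", 5], ["G", 7], ["A", 9], ["B", 11],
]);

// Assumed velocity-to-gain mapping; the real synthesizer may use other values.
const GAIN: Record<Velocity, number> = { soft: 0.3, normal: 0.6, accent: 0.9 };

// Convert a note name such as "A3" or "C#4" to Hz (equal temperament, A4 = 440 Hz).
function noteToFrequency(name: string): number {
  const match = /^([A-G])([#b]?)(-?\d+)$/.exec(name);
  const semitone = match ? SEMITONES.get(match[1] ?? "") : undefined;
  if (!match || semitone === undefined) {
    throw new Error(`Unrecognized note name: ${name}`);
  }
  const shift = match[2] === "#" ? 1 : match[2] === "b" ? -1 : 0;
  const midi = 12 * (Number(match[3]) + 1) + semitone + shift;
  return 440 * Math.pow(2, (midi - 69) / 12);
}

// Offset every note by the clock time at which playback should begin.
function schedule(notes: PlannedNote[], startAt = 0): ScheduledTone[] {
  return notes.map((n) => ({
    frequency: noteToFrequency(n.note),
    start: startAt + n.time,
    stop: startAt + n.time + n.duration,
    gain: GAIN[n.velocity],
  }));
}
```

In a browser, `startAt` would typically be `audioContext.currentTime`, so that every tone is scheduled relative to the moment playback begins.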


💎 The NeuralGem

The agent communicates its state through the NeuralGem -- a canvas visualizer with no text labels:

IDLE       ->  breathing silver ring. waiting for input.

RECORDING  ->  crystallizing polygon. sides grow as your audio level rises.
              color shifts purple to cyan as the sound builds.

PROCESSING ->  hexagon forming. the agent is reading your sound.

LOCKED     ->  hexagon. facets lit in the mood color the agent chose.
              the agent has heard you. music is loading.

For users who are non-verbal, have cognitive differences, or who cannot read: shape and color carry all the information. No labels to parse. No configuration panel to navigate. Tap once to start. Tap once to stop.
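
The RECORDING state described above (sides grow with the audio level, color shifts from purple to cyan) could be computed per animation frame roughly as follows. This is a sketch under assumptions: the level is taken to be normalized to 0..1, the polygon is assumed to grow from 3 to 6 sides, and the exact RGB values are placeholders. `recordingFrame` is a hypothetical name.

```typescript
interface GemFrame {
  sides: number; // polygon side count to draw on the canvas
  color: string; // CSS color string
}

// Assumed endpoint colors; the real palette is not given in the post.
const PURPLE = { r: 128, g: 0, b: 255 };
const CYAN = { r: 0, g: 255, b: 255 };

// Clamp to [0, 1]; non-finite input is treated as silence.
function clamp01(x: number): number {
  return Number.isFinite(x) ? Math.min(1, Math.max(0, x)) : 0;
}

// Linear interpolation between a and b.
function lerp(a: number, b: number, t: number): number {
  return a + (b - a) * t;
}

// Map a normalized audio level to the shape and color of the gem.
function recordingFrame(level: number): GemFrame {
  const t = clamp01(level);
  const sides = 3 + Math.round(t * 3); // triangle when quiet, hexagon when loud
  const r = Math.round(lerp(PURPLE.r, CYAN.r, t));
  const g = Math.round(lerp(PURPLE.g, CYAN.g, t));
  const b = Math.round(lerp(PURPLE.b, CYAN.b, t));
  return { sides, color: `rgb(${r}, ${g}, ${b})` };
}
```

Keeping this mapping a pure function means the canvas drawing code only has to render whatever `GemFrame` it is given.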


◈ How the agent reasons

The agent sends two things to Gemma 4 simultaneously:

1. Raw audio (base64 WebM)
The actual sound. Gemma 4 can hear the texture -- a tremor in a hum, the scrape of a breath, the sharp crack of a tongue click. These textures do not survive FFT analysis. They live in the audio.

2. DspDigest (structured JSON)
What the perception layer already calculated precisely:

{
  "duration": 3.2,
  "averageEnergy": 0.11,
  "peakOnsetCount": 2,
  "events": [
    { "time": 0.0,  "frequency": 220, "pitchName": "A3", "amplitude": 0.11 },
    { "time": 1.6,  "frequency": 261, "pitchName": "C4", "amplitude": 0.13 }
  ]
}

Two onsets. A3 moving to C4. 1.6 seconds apart. Average energy 0.11 -- a soft sound.
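
For readers who want to see the digest as a type, here is a sketch of a TypeScript shape matching the JSON above, plus one way to derive `pitchName` from `frequency`. The interface names and `frequencyToPitchName` are assumptions for illustration; the real schema is documented in HERMES.md.

```typescript
// Field names are taken from the JSON example; the types are inferred.
interface DspEvent {
  time: number;      // seconds from the start of the recording
  frequency: number; // Hz
  pitchName: string; // e.g. "A3"
  amplitude: number; // RMS level of the event
}

interface DspDigest {
  duration: number;
  averageEnergy: number;
  peakOnsetCount: number;
  events: DspEvent[];
}

const NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"];

// Nearest equal-tempered note name for a frequency (A4 = 440 Hz).
function frequencyToPitchName(hz: number): string {
  if (!(hz > 0)) throw new Error("Frequency must be positive");
  const midi = Math.round(69 + 12 * Math.log2(hz / 440));
  const pitchClass = ((midi % 12) + 12) % 12;
  // The fallback only satisfies strict index checks; pitchClass is always 0..11.
  const name = NOTE_NAMES[pitchClass] ?? "C";
  const octave = Math.floor(midi / 12) - 1;
  return `${name}${octave}`;
}

// The example digest from the post, rebuilt with derived pitch names.
const exampleDigest: DspDigest = {
  duration: 3.2,
  averageEnergy: 0.11,
  peakOnsetCount: 2,
  events: [
    { time: 0.0, frequency: 220, pitchName: frequencyToPitchName(220), amplitude: 0.11 },
    { time: 1.6, frequency: 261, pitchName: frequencyToPitchName(261), amplitude: 0.13 },
  ],
};
```

Rounding to the nearest MIDI note is why a measured 261 Hz is still labeled C4, whose exact frequency is about 261.63 Hz.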

Gemma 4 reads both and decides: this is a sustained hum that rose in pitch. Mood: pensive. Voice: cinematic cello. Articulation: legato. Two melody notes, one drone pad underneath. Timestamps aligned to the 1.6-second interval in the digest.

The agent's output:

{
  "mood": "pensive",
  "articulation": "legato",
  "voice": "cinematic cello",
  "liltCode": "A3 ! soft @ 0.00s\nC4 ! normal @ 1.60s",
  "notes": [
    { "note": "A3", "duration": 1.4, "velocity": "soft",   "time": 0.0 },
    { "note": "C4", "duration": 1.2, "velocity": "normal", "time": 1.6 },
    { "note": "A2", "duration": 3.5, "velocity": "soft",   "time": 0.0, "voice": "synthesizer ambient" }
  ],
  "explanation": "A rising hum -- two tones, a minor third apart. The cello holds the first note soft, lifts into the second. The drone underneath gives it weight."
}

The agent turned a three-second hum into a composition: a two-note cello melody over an ambient drone. The user made one sound. The agent made the music.
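
Language models often wrap JSON in extra prose, so the server needs a JSON extraction step before it can trust a reply. The following is a minimal sketch of one way to do that, assuming the reply contains a single top-level object shaped like the example above. `extractComposition` and the type guards are hypothetical names, not the project's code.

```typescript
interface AgentNote {
  note: string;
  duration: number;
  velocity: string;
  time: number;
  voice?: string;
}

interface AgentComposition {
  mood: string;
  articulation: string;
  voice: string;
  liltCode: string;
  notes: AgentNote[];
  explanation: string;
}

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null;
}

function isNote(v: unknown): v is AgentNote {
  if (!isRecord(v)) return false;
  const voice = v["voice"];
  return (
    typeof v["note"] === "string" &&
    typeof v["duration"] === "number" &&
    typeof v["velocity"] === "string" &&
    typeof v["time"] === "number" &&
    (voice === undefined || typeof voice === "string")
  );
}

function isComposition(v: unknown): v is AgentComposition {
  if (!isRecord(v)) return false;
  const notes = v["notes"];
  const textFields = ["mood", "articulation", "voice", "liltCode", "explanation"];
  return (
    textFields.every((key) => typeof v[key] === "string") &&
    Array.isArray(notes) &&
    notes.every(isNote)
  );
}

// Take the span from the first "{" to the last "}" and validate its shape.
function extractComposition(reply: string): AgentComposition {
  const start = reply.indexOf("{");
  const end = reply.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("No JSON object found in reply");
  }
  const parsed: unknown = JSON.parse(reply.slice(start, end + 1));
  if (!isComposition(parsed)) {
    throw new Error("Reply JSON does not match the expected shape");
  }
  return parsed;
}
```

Validating the shape before playback keeps a malformed reply from reaching the synthesizer; what the real server does on failure is not described in the post.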


◈ The Lilt Contract

The agent's reasoning follows a set of guidelines built into the system prompt. These are not hardcoded rules -- Gemma 4 interprets them against what it actually heard:

Slow, soft, or hummed sounds:
  mood = "gentle" or "pensive"
  voice = "cinematic cello" or "grand piano"
  articulation = "legato"

Sharp, rhythmic, or tapped sounds:
  mood = "energetic" or "tight"
  voice = "marimba" or "drum kit"
  articulation = "staccato"

Always keep pitches harmonious (C major, A minor, or pentatonic).
Timestamps must align with DSP onsets but feel musically polished.
Always include a drone layer using "synthesizer ambient" voice.

Some input does not fit cleanly into either category. A tremor-affected tap can read as closer to a soft sound than a sharp one, and a Parkinson's tremor in a hum becomes vibrato in the cello voice. A Morse-style rhythm gets staccato articulation, but the agent may still choose "grand piano" if the pattern feels musical rather than percussive.

The agent makes judgment calls. That is the point.
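
The one guideline in the contract that is mechanical rather than interpretive is keeping pitches inside a scale. In the post that job is given to Gemma 4 through the prompt; the sketch below only illustrates what "harmonious" means numerically, by snapping a MIDI note number to the nearest pitch in C major (the same pitch classes as A natural minor). `snapToScale` is a hypothetical helper, not part of the project.

```typescript
// Pitch classes (semitones above C) of C major, shared with A natural minor.
const C_MAJOR: readonly number[] = [0, 2, 4, 5, 7, 9, 11];
// C major pentatonic, a subset of the scale above.
const C_PENTATONIC: readonly number[] = [0, 2, 4, 7, 9];

// Return the in-scale MIDI note closest to `midi` (an integer note number).
// Ties resolve downward because candidates are scanned from low to high.
function snapToScale(midi: number, scale: readonly number[] = C_MAJOR): number {
  let best = midi;
  let bestDistance = Infinity;
  for (let candidate = midi - 6; candidate <= midi + 6; candidate++) {
    const pitchClass = ((candidate % 12) + 12) % 12;
    const distance = Math.abs(candidate - midi);
    if (scale.includes(pitchClass) && distance < bestDistance) {
      best = candidate;
      bestDistance = distance;
    }
  }
  return best;
}
```

For example, MIDI 61 (C#4) is not in C major and has in-scale neighbors one semitone away on both sides, so it snaps down to 60 (C4).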


◈ The Lilt format

The agent outputs in Lilt -- a flat timestamp-based musical notation:

A3 ! soft   @ 0.00s
C4 ! normal @ 1.60s
E4 ! accent @ 2.80s
A2 ! soft   @ 0.00s   [synthesizer ambient]

Each line: pitch, velocity flag, timestamp, optional voice override.

The piano roll renders from this. The code is editable live. Change a velocity, shift a timestamp, swap a pitch, add a note. The synthesizer re-renders immediately. No new recording. No new API call.

This is the feedback loop. The agent interprets. The user adjusts. The agent re-renders.
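
Because every Lilt line has the same four parts, the live editor can re-parse the text on each change without calling the model. Here is a minimal sketch of such a parser, inferred only from the four example lines above: the velocity set (`soft`, `normal`, `accent`) and the grammar are assumptions, and `parseLilt` is a hypothetical name.

```typescript
interface LiltNote {
  pitch: string;                          // e.g. "A3"
  velocity: "soft" | "normal" | "accent"; // assumed complete set of flags
  time: number;                           // seconds
  voice?: string;                         // optional override in [brackets]
}

// pitch ! velocity @ <seconds>s [optional voice]
const LILT_LINE =
  /^([A-G][#b]?-?\d+)\s*!\s*(soft|normal|accent)\s*@\s*(\d+(?:\.\d+)?)s(?:\s*\[([^\]]+)\])?$/;

// Parse a block of Lilt text into notes; blank lines are skipped.
function parseLilt(code: string): LiltNote[] {
  const notes: LiltNote[] = [];
  for (const raw of code.split("\n")) {
    const line = raw.trim();
    if (line === "") continue;
    const m = LILT_LINE.exec(line);
    if (!m) throw new Error(`Invalid Lilt line: ${line}`);
    const note: LiltNote = {
      pitch: m[1] ?? "",
      velocity: (m[2] ?? "normal") as LiltNote["velocity"],
      time: Number(m[3]),
    };
    const voice = m[4];
    if (voice !== undefined) note.voice = voice.trim();
    notes.push(note);
  }
  return notes;
}
```

Throwing on an invalid line is one possible choice; an editor might instead prefer to highlight the line and keep playing the last valid score.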


💎 Who the agent serves

| Profile | What they give | What the agent produces |
|---|---|---|
| 💜 Non-verbal autism | Sustained hum, single tone | Cello or piano melody in that pitch |
| 💙 Cerebral palsy | Tremor-affected taps | Percussive or piano rhythm |
| 🤍 ALS | Minimal breath control | Ambient drone with gentle melody over it |
| 💛 Locked-in syndrome | Single eye-blink switch click | One-trigger composition, loops |
| 💚 Quadriplegia | Hard puff / soft puff contrast | Two-dynamic melody: accent and soft |
| 🧡 Parkinson's | Tremor vocal hum | Cello composition that treats tremor as vibrato |
| 🩷 Apraxia of speech | Broken phonation bursts | Legato phrase bridging the silence between bursts |
| 💎 AAC / pre-verbal | Rising or falling hum | Interval-based melodic response |
| 🔵 Spinal cord injury C4 | Head tap on mic | Beat-based composition from impact events |
| ⚪ Selective mutism | Barely audible breath | Gentle drone that validates the smallest input |

The agent does not have a "minimum input" requirement. A breath at 0.02 RMS amplitude -- almost nothing -- produces a composition. This was a deliberate design decision. The quietest input a person can give must be enough.
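
For reference, RMS amplitude is the square root of the mean of the squared samples. A minimal sketch, assuming the samples arrive as a `Float32Array` in the range -1..1 (the format Web Audio's analyser nodes produce); the function name `rms` is illustrative.

```typescript
// Root-mean-square level of a block of audio samples.
function rms(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  samples.forEach((s) => {
    sumSquares += s * s;
  });
  return Math.sqrt(sumSquares / samples.length);
}
```

A signal that alternates between +0.02 and -0.02 has an RMS of about 0.02, which is the "almost nothing" level the paragraph above refers to.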


◈ 32 profiles tested

The agent was validated against 32 real DSP profiles representing the disability communities it was built for.

Three difficulty levels:

Beginner     -- one event, one sound. proves the agent handles the minimum.
Intermediate -- 2-3 events, some rhythm or pitch shift.
Advanced     -- 4+ events, dynamics, intentional pattern.

Results across all 32 profiles: 32 passed. 0 failed.

Every result is a live Gemma 4 response -- no simulated data, no hardcoded fallback. The test suite fires real DSP payloads at the running Express server and logs every decision the agent made.

node test-runner.mjs   # run all 32 profiles yourself

Full results in test-results.json on GitHub.
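
The post does not say what counts as a pass. One plausible check, derived purely from the Lilt Contract rather than from test-runner.mjs, is that every response includes a drone layer and has a non-drone note near each DSP onset. The sketch below is that assumption written out; `checkAgainstContract` and the 0.25-second tolerance are hypothetical.

```typescript
interface Onset {
  time: number; // seconds, from the DspDigest events
}

interface ScoreNote {
  time: number;
  voice?: string;
}

const DRONE_VOICE = "synthesizer ambient";

// Return a list of contract violations; an empty list means the check passed.
function checkAgainstContract(
  onsets: Onset[],
  notes: ScoreNote[],
  toleranceSec = 0.25, // assumed slack for "musically polished" timing
): string[] {
  const problems: string[] = [];
  if (!notes.some((n) => n.voice === DRONE_VOICE)) {
    problems.push("missing drone layer");
  }
  for (const onset of onsets) {
    const aligned = notes.some(
      (n) => n.voice !== DRONE_VOICE && Math.abs(n.time - onset.time) <= toleranceSec,
    );
    if (!aligned) problems.push(`no melody note near onset at ${onset.time}s`);
  }
  return problems;
}
```

Run against the example from earlier in the post (onsets at 0.0 s and 1.6 s, melody notes at the same times, one drone), this check reports no problems.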


◈ Technical stack

Gemma 4 (gemma-4-26b-a4b-it)   reasoning engine
Web Audio API                   perception layer + action layer (synthesis)
React + Vite + TypeScript       frontend / state machine
Express + @google/genai SDK     backend agent server

The API key stays server-side. The browser never sees it.


◈ How to run it

git clone https://github.com/brookehoward2008-droid/Babbled-notes-v2.git
cd Babbled-notes-v2
npm install

Add a free Gemini API key to .env.local:

GEMINI_API_KEY=your_key_here

npm run dev

Open http://localhost:3000. Allow microphone access. Tap the silver ring. Make any sound. Wait 30-60 seconds for Gemma 4 to reason. The music plays.

No API key? The app runs in simulation mode -- the full UI and audio play back immediately.


◈ Agent architecture (detailed)

Full technical breakdown in HERMES.md:

  • Perception layer: FFT signal chain, onset detector, DspDigest schema
  • Reasoning layer: dual-input Gemma 4 call, Lilt Contract, JSON extraction
  • Action layer: per-voice synthesis chains, scheduling via AudioContext
  • Feedback loop: live Lilt editor, re-render without re-recording
  • State machine: idle / recording / processing / playing
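
The four states in the last bullet, combined with the "tap once to start, tap once to stop" interaction, suggest a small transition function. This is a sketch only: the event names (`tap`, `analysisDone`, `playbackEnded`) are assumptions, and the actual state machine is described in HERMES.md.

```typescript
type AgentState = "idle" | "recording" | "processing" | "playing";
type AgentEvent = "tap" | "analysisDone" | "playbackEnded";

// Pure transition function: events that do not apply leave the state unchanged.
function nextState(state: AgentState, event: AgentEvent): AgentState {
  switch (state) {
    case "idle":
      return event === "tap" ? "recording" : state;
    case "recording":
      return event === "tap" ? "processing" : state;
    case "processing":
      return event === "analysisDone" ? "playing" : state;
    case "playing":
      return event === "playbackEnded" ? "idle" : state;
  }
}
```

Ignoring inapplicable events means a stray tap during processing cannot interrupt the agent, which suits input from users with tremor.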

💎 The gem crystallizes. The music plays. You made that.
You made that with a breath.


GitHub: https://github.com/brookehoward2008-droid/Babbled-notes-v2
Agent docs: https://github.com/brookehoward2008-droid/Babbled-notes-v2/blob/main/HERMES.md

by Brooke Chauntel