DEMON: Diffusion Engine for Musical Orchestrated Noise
ryanontheins · 2026-05-28 · via Hacker News: Show HN

Streaming Diffusion Engine for Real-Time Music Generation

Abstract

DEMON is a streaming diffusion engine for music, built on ACE-Step v1.5 (2 B turbo and 5 B XL turbo). A ring buffer of in-flight generations, each carrying its own per-slot timestep schedule, is advanced by one batched decoder forward pass per tick; after warmup, at full pipeline depth, every tick produces a finished song latent. Every solver-side parameter (per-frame source preservation, velocity scaling, ODE noise injection, classifier-free guidance, x0-target morphing, channel gain) accepts a scalar or a per-frame curve at the latent's 25 Hz frame resolution, and a shared-mutable-curve lane propagates parameter changes to every in-flight slot on the next tick, independent of pipeline depth. Native TensorRT engines cover 60, 120, and 240 s song lengths; longer songs route through a sliding 60 s window. The decoder runs under TensorRT with a refit-enabled engine that hot-swaps LoRAs without rebuild.
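The "scalar or per-frame curve" convention can be pictured with a small helper. This is a minimal sketch, not DEMON's API: the names `frames_for` and `resolve_per_frame` are hypothetical, and the only facts taken from the text are the 25 Hz frame rate and the rule that a parameter may be either a scalar or a curve with one value per latent frame.

```python
from typing import List, Sequence, Union

FRAME_RATE_HZ = 25  # latent frame rate stated in the text (40 ms per frame)

Param = Union[float, Sequence[float]]


def frames_for(duration_s: float, frame_rate_hz: int = FRAME_RATE_HZ) -> int:
    """Number of latent frames covering a song of the given length."""
    return int(round(duration_s * frame_rate_hz))


def resolve_per_frame(value: Param, n_frames: int) -> List[float]:
    """Expand a scalar to a flat curve, or validate an explicit curve.

    Hypothetical helper: the real engine may resolve parameters differently.
    """
    if isinstance(value, (int, float)):
        return [float(value)] * n_frames
    curve = [float(v) for v in value]
    if len(curve) != n_frames:
        raise ValueError(f"curve has {len(curve)} values, expected {n_frames}")
    return curve


n = frames_for(60.0)                      # 60 s song at 25 Hz
flat = resolve_per_frame(0.7, n)          # scalar broadcast to every frame
ramp = resolve_per_frame([i / (n - 1) for i in range(n)], n)  # explicit curve
```

Under these assumptions a 60 s song has 1500 frames, so a curve carries 1500 values and a scalar is simply the constant curve.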

Samples

Four captures of the running engine under different configurations: live timbre and denoise under a blend between two prompts, live timbre/structure/denoise on the 5 B XL model, a live LoRA refit, and an LLM agent driving the controls through the MCP server. Each clip plays the audio the engine generated.

TIMBRE + DENOISE

Prompt blend with live control

Timbre and denoise driven live, with the conditioning blended between two text prompts: acoustic deep house and a daft-punk four-to-the-floor.

XL TURBO · 5B

Live control on the 5 B model

The XL-turbo (5 B) checkpoint with timbre, structure, and denoise manipulated live as the song plays.

LIVE LORA REFIT

Genre transfer via LoRA

Alternative-rock and funk LoRAs refit live into the running TensorRT decoder, with no engine rebuild.

MCP CONTROL

Agentic control via MCP

An LLM agent driving the engine through DEMON's MCP server: it reads the live control values, then writes denoise, structure, timbre, and prompt-blend updates on a two-bar cadence to evolve the remix in real time.

Experiments

These are the experiments whose results you have to hear. The full evaluation (latency, throughput, cross-GPU benchmarks, quality metrics) is in the paper; a handful of findings, though, only really land as audio. Each isolates one property of the engine: streaming parity, per-frame SDE source preservation on a shared asymptotic curve, per-tick scalar denoise on the same curve, per-slot continuity vs. a global-reset baseline, and a per-frame latent morph driven through the shared mutable state.

Stream vs. Batch parity

Streaming pipeline does not degrade quality

Bit-identical 8-step latents decoded two ways. The batch path runs a single full 60 s VAE decode; the stream path replays the same latents tick-by-tick through a 5 s windowed decode, mirroring the live pipeline. Same fixture (low-fi loop), deathstep LoRA active in both.

Batch — 8-step sequential, full 60 s decode

Stream — depth = 8 ring buffer, 5 s windowed decode

SDE source blending

Per-frame source preservation on a 1 − t³ curve

The shared asymptotic curve below is driven into the SDE step's per-frame source-preservation parameter at the latent's 25 Hz frame resolution. The model runs free for most of the clip, then lands back on the source-anchored side in the final seconds. One generation per fixture, each with its paired LoRA.

Figure: the shared 1 − t³ curve, holding near 1.0 for most of the timeline and then dropping to 0 at the end. It is applied per-frame in the SDE demo above and per-tick in the denoise sweep below.
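To make the shape of the curve concrete, here is a minimal sketch that samples 1 − t³ at the 25 Hz frame rate for a 60 s clip. The function name and the normalization of t to [0, 1] over the clip are assumptions for illustration; how the engine maps curve values onto its source-preservation parameter is not specified here.

```python
from typing import List

FRAME_RATE_HZ = 25  # latent frame rate stated in the text


def one_minus_t_cubed(duration_s: float, frame_rate_hz: int = FRAME_RATE_HZ) -> List[float]:
    """Sample 1 - t^3 once per latent frame, with t running from 0 to 1.

    Assumption: t is normalized clip time; the engine may define it differently.
    """
    n_frames = int(round(duration_s * frame_rate_hz))
    if n_frames < 2:
        return [1.0] * n_frames
    return [1.0 - (i / (n_frames - 1)) ** 3 for i in range(n_frames)]


curve = one_minus_t_cubed(60.0)

# First frame at which the curve has fallen below one half.
half_index = next(i for i, v in enumerate(curve) if v < 0.5)
half_time_s = half_index / FRAME_RATE_HZ
```

Under these assumptions the curve is still above 0.87 at the midpoint of the clip and only crosses 0.5 at about t = 0.5^(1/3) ≈ 0.79, roughly 47.6 s into a 60 s clip, which is why the return to the source is concentrated in the final seconds.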

Low-fi · deathstep LoRA

Inside Confusion · acoustic LoRA

Live control surface

Per-tick scalar denoise on the same 1 − t³ curve

Same trajectory, different lane: the streaming pipeline's per-tick scalar denoise input is driven along 1 − (k/N)³ instead of the per-frame SDE parameter. One fresh 0.3 s playback chunk per tick. The sweep holds the model's free response (denoise ≈ 1.0) for most of the run and collapses back to the source as it ends.

Low-fi · deathstep LoRA

Inside Confusion · acoustic LoRA

Per-slot continuity

Heterogeneous per-slot scheduling vs. global reset

Illustrative pair built around a denoise switch (1.0 → 0.5). Under DEMON's per-slot scheduling the output stays continuous across the drain; a StreamDiffusion-style global reset incurs ~648 ms of dead air while the depth = 8 ring buffer refills. Matches the 60/60 vs. 1/60 completion-rate result in the paper.
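The dead-air figure is consistent with a simple refill count. The sketch below assumes that, after a global reset, one new generation is admitted per tick, each generation needs 8 ticks (one per denoising step), and each tick costs the 81.1 ms reported for depth = 8 in the Performance section. The function name is hypothetical and the model ignores any speed-up from partially filled batches.

```python
def ticks_until_first_output(steps: int) -> int:
    """Ticks between an empty ring and its first finished generation.

    Assumption: one generation is admitted per tick and every in-flight
    generation advances by exactly one step per tick.
    """
    remaining = []  # steps left for each in-flight generation
    ticks = 0
    while True:
        remaining.append(steps)               # admit one new generation
        ticks += 1
        remaining = [r - 1 for r in remaining]  # one batched step for all
        if any(r == 0 for r in remaining):
            return ticks


STEPS = 8          # denoising steps, from the Performance section
TICK_MS = 81.1     # per-tick latency at depth = 8, from the Performance section

silent_ticks = ticks_until_first_output(STEPS)
dead_air_ms = silent_ticks * TICK_MS
```

With these numbers the ring needs 8 ticks before it emits again, or about 648.8 ms, in line with the ~648 ms quoted above. Per-slot scheduling avoids the gap because slots already in flight keep their own schedules and continue to finish while new submissions enter.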

Per-slot heterogeneous scheduling

Global reset — ~648 ms dead air

Shared-state x0 target morph

Per-frame latent morph between two cover variants

Two cover variants of one source, A (deathcore) and B (ambient), share seed and structure, so their latents stay aligned. The x0_target_strength field lives in the same shared mutable registry as the SDE curve, which every slot consults on each step, and is driven as the per-frame swell below: it blends each frame's x0 prediction toward B's precomputed latent, gated to the refinement half of the schedule. The song swells from A into B and back in a single generation. Because the blend is convex between two clean latents, it stays inside the manifold: no re-noising, no artifact.

Figure: per-frame x0_target_strength swell, rising from 0 (cover A) to 1 (target B) at mid-song and back to 0. It is written once into the shared registry and read by every in-flight slot on every step, giving a convex blend toward target B.
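The morph can be sketched as a per-frame convex blend. Everything here is illustrative: the raised-cosine swell is an assumed shape (the text only says it rises from 0 to 1 at mid-song and returns to 0), "refinement half" is assumed to mean the second half of the step schedule, and each frame is reduced to a single float rather than a multi-channel latent.

```python
import math
from typing import List


def swell(n_frames: int) -> List[float]:
    """Assumed swell shape: 0 at both ends, 1 at the middle (raised cosine)."""
    if n_frames < 2:
        return [0.0] * n_frames
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * i / (n_frames - 1)))
            for i in range(n_frames)]


def morph_x0(x0_pred: List[float], target: List[float], strength: List[float],
             step: int, num_steps: int) -> List[float]:
    """Blend each frame's x0 prediction toward the target latent.

    Assumption: the gate opens for steps in the second half of the schedule.
    With strength in [0, 1] the result is a convex combination per frame.
    """
    if step < num_steps // 2:
        return list(x0_pred)  # gate closed: leave the prediction untouched
    return [(1.0 - s) * a + s * b for a, b, s in zip(x0_pred, target, strength)]


strength = swell(5)
cover_a = [0.0] * 5     # stand-in for A's x0 prediction
cover_b = [10.0] * 5    # stand-in for B's precomputed latent

early = morph_x0(cover_a, cover_b, strength, step=1, num_steps=8)
late = morph_x0(cover_a, cover_b, strength, step=6, num_steps=8)
```

In this toy run the early step returns A unchanged, while the late step lands on B at the middle frame and on A at both ends, with intermediate frames lying between the two.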

A — deathcore cover (endpoint)

B — ambient cover (x0 target)

A → B → A morph — per-frame x0_target_strength swell

Performance

Measured on RTX 5090 (32 GB), ACE-Step v1.5 turbo (2 B), 8 denoising steps, flow shift 3.0, windowed VAE decode at 3 s, 60 s source. Pipeline depth trades end-to-end throughput against control latency.

Metric                                  depth = 1   depth = 4   depth = 8
Throughput (gen/s)                      8.9         11.3        12.3
Per-tick latency                        14.0 ms     42.8 ms     81.1 ms
Submission-time parameter convergence   112 ms      471 ms      649 ms
Shared-curve latency                    1 tick      1 tick      1 tick
Per-frame control resolution            25 Hz (40 ms)
VAE windowed decode (3 s)               7 ms
LoRA refit                              1.2 s, no engine rebuild

Architecture

DEMON is the runtime, control surface, and acceleration layer; the diffusion model is ACE-Step v1.5, released by the ACE-Step team under MIT. The engine maintains a ring buffer of in-flight generations at staggered denoising stages. Crucially, each in-flight slot carries its own denoise scalar and its own timestep schedule: one batched decoder forward pass per tick advances slots that are simultaneously at different stages of different schedules. Native TensorRT engines cover 60, 120, and 240 s song lengths; longer songs route through a sliding 60 s window that advances at chunk boundaries.
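The ring buffer with per-slot schedules can be illustrated with a toy scheduler. This is a sketch under stated assumptions, not DEMON's implementation: `Slot`, `Ring`, and `toy_decoder` are hypothetical names, latents are plain lists of floats, and the decoder is a stand-in function that receives every active slot's latent and current timestep in one call.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slot:
    """One in-flight generation carrying its own timestep schedule."""
    latent: List[float]
    schedule: List[float]
    step: int = 0

    @property
    def done(self) -> bool:
        return self.step >= len(self.schedule)


def toy_decoder(latents: List[List[float]], timesteps: List[float]) -> List[List[float]]:
    """Stand-in for the batched decoder: one call covers every active slot."""
    return [[x * (1.0 - 0.1 * t) for x in lat] for lat, t in zip(latents, timesteps)]


class Ring:
    def __init__(self, depth: int) -> None:
        self.slots: List[Optional[Slot]] = [None] * depth
        self.forward_calls = 0

    def submit(self, latent: List[float], schedule: List[float]) -> bool:
        """Place a new generation in the first empty slot, if there is one."""
        for i, slot in enumerate(self.slots):
            if slot is None:
                self.slots[i] = Slot(list(latent), list(schedule))
                return True
        return False

    def tick(self) -> List[List[float]]:
        """Advance every active slot by one step with a single batched call."""
        active = [s for s in self.slots if s is not None]
        if not active:
            return []
        timesteps = [s.schedule[s.step] for s in active]  # may differ per slot
        outputs = toy_decoder([s.latent for s in active], timesteps)
        self.forward_calls += 1
        for slot, out in zip(active, outputs):
            slot.latent = out
            slot.step += 1
        finished = []
        for i, slot in enumerate(self.slots):
            if slot is not None and slot.done:
                finished.append(slot.latent)
                self.slots[i] = None  # free the slot for the next submission
        return finished


ring = Ring(depth=4)
schedule = [1.0, 0.75, 0.5, 0.25]
finished_per_tick = []
for _ in range(10):
    ring.submit([1.0], schedule)
    finished_per_tick.append(len(ring.tick()))
```

In this toy run the first three ticks produce nothing while the ring fills, and every tick after that yields one finished latent from exactly one decoder call, which mirrors the warmup and steady-state behaviour described above. Because each slot indexes its own schedule, slots submitted with different schedules can share the same batched call.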

Two control lanes coexist. Submission-time parameters (text conditioning, source audio, denoise) enter the pipeline when a new request is submitted and reach the next emptied slot within one tick; they then take effect over that slot's remaining schedule. Step-time parameters (per-frame source preservation, x0-target morph, velocity scaling, ODE noise injection, channel gain, guidance, CFG rescale, APG momentum, DCW scalers) live in a shared mutable registry that every slot reads on every forward pass; writing to that registry takes effect on the next tick for every in-flight slot at once, regardless of pipeline depth. The decoder runs under TensorRT with refit enabled, so LoRA deltas are written into the live engine without a rebuild. The paper covers the SDE derivation behind the per-frame source-preservation curve, the windowed-decode receptive-field analysis, and the TensorRT precision recipe.
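The difference between the two lanes can be shown with a toy model. The class and field names below (`SharedRegistry`, `Slot`, `guidance_seen`) are hypothetical; the sketch only encodes what the paragraph states: submission-time values are fixed when a slot is filled, while step-time values are read from one shared registry on every pass.

```python
from dataclasses import dataclass, field
from typing import Dict, List


class SharedRegistry:
    """Single mutable store that every slot reads on every forward pass."""

    def __init__(self, **params: float) -> None:
        self._params: Dict[str, float] = dict(params)

    def write(self, name: str, value: float) -> None:
        self._params[name] = value

    def read(self, name: str) -> float:
        return self._params[name]


@dataclass
class Slot:
    denoise: float                 # submission-time: fixed when the slot is filled
    steps_left: int
    guidance_seen: List[float] = field(default_factory=list)


def tick(slots: List[Slot], registry: SharedRegistry) -> None:
    """One pass: every active slot reads the current step-time value."""
    guidance = registry.read("guidance")
    for slot in slots:
        if slot.steps_left > 0:
            slot.guidance_seen.append(guidance)
            slot.steps_left -= 1


registry = SharedRegistry(guidance=1.0)
slots = [Slot(denoise=1.0, steps_left=6),
         Slot(denoise=1.0, steps_left=3),
         Slot(denoise=1.0, steps_left=1)]

tick(slots, registry)                        # third slot finishes on this tick
registry.write("guidance", 2.5)              # step-time change: shared lane
slots[2] = Slot(denoise=0.5, steps_left=8)   # submission-time change: new slot only
tick(slots, registry)
```

After the second tick all three slots have seen the new guidance value, whatever stage they are at, while the new denoise value applies only to the freshly submitted slot. That asymmetry is why shared-curve latency stays at one tick while submission-time convergence grows with pipeline depth.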

Acknowledgments

ACE-Step v1.5 — the base diffusion model, VAE, text encoder, and semantic LM. Architecture, training, weights, and turbo distillation are the work of the ACE-Step team, released under MIT.

StreamDiffusion — ring-buffer streaming pattern for image diffusion (Kodaira et al., 2023), adapted here for long music latents.

DCW — Differential Correction in Wavelet domain, a post-step correction for flow-matching samplers (Yu et al., CVPR 2026), ported from ACE-Step 1.5 v0.1.7.

BibTeX

If you use DEMON, please cite both DEMON and the underlying ACE-Step model.

DEMON

@software{fosdick2026demon,
  author = {Fosdick, Ryan},
  title  = {DEMON: Diffusion Engine for Musical Orchestrated Noise},
  year   = {2026},
  url    = {https://github.com/daydreamlive/DEMON}
}

ACE-Step

@article{acestep2026,
  title   = {ACE-Step 1.5: Pushing the Boundaries of Open-Source Music Generation},
  author  = {Gong and others},
  journal = {arXiv preprint arXiv:2602.00744},
  year    = {2026}
}