My Friend Had a Cameras-On Problem. I Wrote Him a Solution.


Heiner Palmen · 2026-05-13 · via Show HN

ScrumSurvivor — a real-time Wav2Lip lip-sync avatar for mandatory camera meetings.

A few months ago a friend called me, frustrated. His company had just introduced a cameras-on policy for all internal meetings. The justification was engagement. The actual effect, as he described it, was 45 minutes a day of staring at a grid of tired faces in bad lighting while somebody narrated a PowerPoint everyone had already received by email.

He wasn’t looking to skip meetings or disappear. He attends every one, speaks when spoken to, and does his job. What he objected to was the compulsory performance of presence — the idea that a camera pointed at your face is evidence of engagement.

He asked if there was anything clever to be done about it.

I spent a few weekends finding out. The result is ScrumSurvivor — an open source Windows application that replaces your webcam feed with a photorealistic AI avatar of yourself, rendered in real time. When you speak, Wav2Lip lip-syncs the avatar to your actual voice. When you’re silent, it breathes, blinks, and fidgets — generated procedurally so it never looks like a loop. Everything runs locally on a consumer NVIDIA GPU. No cloud, no subscription, no data leaving the machine.

Requirements: Windows 10 or 11 · NVIDIA GPU with CUDA support (4 GB VRAM minimum; RTX 3050 Laptop GPU tested) · Python 3.10+ · OBS Studio · VB-Audio Virtual Cable

How it actually works

The core idea is simple: put virtual devices between the user and Teams. Teams never sees the real webcam or microphone; it receives a synthesised video feed and a slightly delayed, processed copy of the microphone audio.

Microphone ──▶ Speech Detector ──────────────────────────────────────▶ VB-Cable ──▶ Teams (audio)
                                │  silent                │  speaking
                                ▼                        ▼
                         Idle Compositor           Wav2Lip Engine
                   (clips + breathing             (lip-syncs face crop
                    + head sway + blink            to mic audio,
                    + sensor noise)                265 ms delayed)
                                │                        │
                                └──────────┬─────────────┘
                                           ▼
                                  Frame Compositor
                             (overlay avatar on background
                              + smoothstep crossfade)
                                           │
                                           ▼
                               OBS Virtual Camera ──▶ Teams (video)

For video: OBS Virtual Camera acts as a perfectly normal webcam from Teams’ perspective. What it actually delivers is a 1280×720 composited frame at 25 fps — a static background photo of the user’s desk, plus an animated avatar layer on top.

For audio: VB-Audio Virtual Cable creates a virtual audio loopback. Teams records from “CABLE Output”; the application writes the processed microphone audio to “CABLE Input”, 265 ms after capturing it. That delay is the window in which the matching video frame has to be produced, and it is enough to run Wav2Lip inference on a mid-range laptop GPU.

For the face animation: Wav2Lip takes an 80-bin mel spectrogram window and a 96×96 face crop and produces an animated face with lips matching the audio. On an RTX 3050 Laptop GPU (4 GB VRAM) this runs in about 20 ms per frame — fast enough for 25 fps with headroom.
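To make the timing concrete, here is a rough sketch of how a mel spectrogram could be sliced per video frame, and how the 20 ms figure compares with the frame budget. The sample rate (16 kHz), hop size (200 samples) and 16-column window are the defaults of the reference Wav2Lip code and are assumptions here; ScrumSurvivor may use different values. The function name is hypothetical.

```python
# Assumed defaults from the reference Wav2Lip code, not confirmed for ScrumSurvivor.
SAMPLE_RATE = 16_000      # Hz
HOP_SIZE = 200            # samples between mel columns
MEL_STEP_SIZE = 16        # mel columns fed to the model per video frame
FPS = 25                  # output frame rate stated in the article
INFER_MS = 20.0           # per-frame inference time stated in the article

MEL_COLS_PER_SEC = SAMPLE_RATE / HOP_SIZE   # 80 columns per second


def mel_window_for_frame(frame_index: int) -> tuple[int, int]:
    """Return the [start, end) mel-column range that drives one video frame."""
    start = int(frame_index * MEL_COLS_PER_SEC / FPS)
    return start, start + MEL_STEP_SIZE


frame_budget_ms = 1000 / FPS                          # time available per frame
headroom_ms = frame_budget_ms - INFER_MS              # what is left after inference
window_ms = MEL_STEP_SIZE / MEL_COLS_PER_SEC * 1000   # audio covered by one window

print(mel_window_for_frame(0), mel_window_for_frame(1), mel_window_for_frame(25))
print(frame_budget_ms, headroom_ms, window_ms)
```

Under these assumptions each frame has a 40 ms budget, of which inference uses about half, and each mel window spans roughly 200 ms of audio. That audio context may be part of why the audio path is delayed by a few hundred milliseconds rather than by the inference time alone.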

MediaPipe detects the face in a base photo once at startup. All subsequent inference uses that fixed crop — no per-frame face detection required. The face crop is composited back into the full frame with the background image as the backdrop.
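The fixed-crop idea can be sketched as follows: the face box is found once, cached, and every generated face is written back into that same rectangle. This is an illustration only. Images are plain nested lists to keep the example dependency-free (a real pipeline would more likely use NumPy arrays), the box values are made up, and `paste_crop` is a hypothetical helper, not a function from the project.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]   # (x0, y0, x1, y1), end-exclusive
Image = List[List[int]]           # rows of grayscale pixels, for illustration only


def paste_crop(frame: Image, crop: Image, box: Box) -> Image:
    """Return a copy of `frame` with `crop` written into the fixed `box`."""
    x0, y0, x1, y1 = box
    if len(crop) != y1 - y0 or any(len(row) != x1 - x0 for row in crop):
        raise ValueError("crop size does not match the cached box")
    out = [row[:] for row in frame]          # leave the background untouched
    for dy, crop_row in enumerate(crop):
        out[y0 + dy][x0:x1] = crop_row
    return out


# In the real tool the box would come from one face-detection pass at startup.
FACE_BOX: Box = (2, 1, 4, 3)                 # made-up coordinates
background = [[0] * 6 for _ in range(4)]     # stand-in for the desk photo
generated_face = [[9, 9], [9, 9]]            # stand-in for the model output
composited = paste_crop(background, generated_face, FACE_BOX)
```

The sketch leaves out resizing the 96×96 model output to the size of the cached box, which a real compositor would need to do before pasting.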

The idle state

When the user isn’t speaking, the pipeline plays pre-recorded short video clips of them sitting quietly at their desk, cycling through them with randomised pauses. Clips alone would be unconvincing — a looping video is easy to spot.

On top of each clip, four independent procedural layers run simultaneously:

Breathing — a subtle vertical oscillation of the body region at ~0.25 Hz. Barely perceptible. Consistently present.

Head sway — a sum of two independent sinusoids per axis, giving a non-repeating micro-motion. No two seconds look the same.

Blink — a fast eyelid-close/reopen animation triggered every 4–8 seconds at a randomised interval.

Sensor noise — per-pixel Gaussian noise added to every frame to simulate the organic texture of a live camera sensor. Flat digital video has an uncanny stillness to it. This removes that.
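The four layers can be sketched as small functions of time and a random source. Only the 0.25 Hz breathing rate and the 4–8 second blink interval come from the description above; the amplitudes, sway frequencies, noise sigma and all function names are assumptions chosen for illustration.

```python
import math
import random

BREATH_HZ = 0.25            # from the article
BREATH_AMPLITUDE_PX = 1.5   # assumed
SWAY = {                    # assumed (amplitude px, frequency Hz) pairs per axis
    "x": ((1.2, 0.11), (0.6, 0.29)),
    "y": ((0.8, 0.07), (0.4, 0.23)),
}


def breathing_offset(t: float) -> float:
    """Vertical shift of the body region, in pixels, at time t (seconds)."""
    return BREATH_AMPLITUDE_PX * math.sin(2 * math.pi * BREATH_HZ * t)


def head_sway(t: float) -> tuple[float, float]:
    """(dx, dy) head offset: a sum of two sinusoids per axis."""
    def axis(components):
        return sum(a * math.sin(2 * math.pi * f * t) for a, f in components)
    return axis(SWAY["x"]), axis(SWAY["y"])


def next_blink_delay(rng: random.Random) -> float:
    """Seconds until the next blink, drawn uniformly from 4 to 8."""
    return rng.uniform(4.0, 8.0)


def add_sensor_noise(pixels: list[int], rng: random.Random, sigma: float = 2.0) -> list[int]:
    """Add Gaussian noise to each 8-bit pixel value and clamp to 0..255."""
    return [min(255, max(0, round(p + rng.gauss(0.0, sigma)))) for p in pixels]


rng = random.Random(0)
noisy_row = add_sensor_noise([0, 128, 255], rng)
```

With the sway frequencies picked here, each axis only repeats after 100 seconds. Frequencies whose ratio is not a simple fraction stretch that period further, which is what makes the motion look non-repeating in practice.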

All transitions between clips use a smoothstepped crossfade — no visible cuts.
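For reference, smoothstep is the standard cubic ease `3x² − 2x³`, and a crossfade uses it as the blend weight. The sketch below works on a single pixel value; the function names are illustrative and not taken from the project.

```python
def smoothstep(x: float) -> float:
    """Cubic ease from 0 to 1 with zero slope at both ends."""
    x = min(1.0, max(0.0, x))
    return x * x * (3.0 - 2.0 * x)


def crossfade(pixel_a: float, pixel_b: float, progress: float) -> float:
    """Blend from pixel_a (progress 0) to pixel_b (progress 1)."""
    w = smoothstep(progress)
    return (1.0 - w) * pixel_a + w * pixel_b


quarter = crossfade(0.0, 100.0, 0.25)   # still mostly the outgoing clip
halfway = crossfade(0.0, 100.0, 0.5)    # exactly between the two
```

Because the weight starts and ends with zero slope, the blend eases in and out instead of changing at a constant rate, which is why the cut is harder to notice than with a linear fade.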

My friend has been running this for several months. Nobody has said anything.

The hard part: audio sync

Getting audio and video to stay in sync was the most interesting engineering problem.

Wav2Lip takes ~20 ms of GPU time per frame. The audio is already playing by the time the video frame appears. Without compensation, the lips are always slightly behind the voice — an uncanny valley version of an already uncanny valley.

The solution: delay the audio output by exactly the same amount as the video processing latency. The AudioPresentationScheduler maintains a ring buffer of incoming microphone audio and schedules each chunk to be written to VB-Cable exactly audio_delay_ms milliseconds in the future. The video pipeline runs concurrently and produces frames corresponding to the same audio window — so both arrive at Teams simultaneously.
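A minimal sketch of the delay idea, assuming each chunk is tagged with its capture time and released once a fixed delay has passed. This is not the project's AudioPresentationScheduler: the class name, method names and the use of a deque instead of a ring buffer are all choices made for this example, and timestamps are passed in explicitly so the logic can be followed without real audio hardware.

```python
from collections import deque
from typing import Deque, List, Tuple


class DelayedAudioQueue:
    """Hold audio chunks and release each one a fixed delay after capture."""

    def __init__(self, delay_ms: float) -> None:
        self.delay_s = delay_ms / 1000.0
        self._pending: Deque[Tuple[float, bytes]] = deque()

    def push(self, chunk: bytes, captured_at: float) -> None:
        """Queue a chunk; it becomes due at captured_at + delay."""
        self._pending.append((captured_at + self.delay_s, chunk))

    def pop_due(self, now: float) -> List[bytes]:
        """Return, in capture order, every chunk whose due time has passed."""
        due = []
        while self._pending and self._pending[0][0] <= now:
            due.append(self._pending.popleft()[1])
        return due


queue = DelayedAudioQueue(delay_ms=265)
queue.push(b"chunk-0", captured_at=0.00)
queue.push(b"chunk-1", captured_at=0.04)
early = queue.pop_due(now=0.20)     # nothing is due yet
first = queue.pop_due(now=0.265)    # chunk-0 becomes due
second = queue.pop_due(now=0.31)    # chunk-1 follows
```

In a real pipeline the output loop would call something like `pop_due` with the current clock value and write the returned chunks to the virtual cable, while the video thread renders the frame for the same capture window.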

The cold GPU startup problem

There’s a subtlety. The first Wav2Lip inference on a cold CUDA GPU can take 1–2 seconds, because the first forward pass pays one-time costs such as initialising the CUDA context and loading or JIT-compiling kernels for your specific hardware. If audio scheduling starts before that one-time work is done, the audio backlog explodes to 18+ seconds and stays there permanently.

The fix is a warmup loop at startup that runs up to 20 inferences using the actual face crop and monitors latency. Once inference time drops below 50 ms (typically 5–7 iterations), the pipeline opens for business.

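import time

MAX_WARMUP_ITERS = 20    # upper bound from the text above
WARMUP_TARGET_MS = 50    # latency threshold from the text above
STABLE_REQUIRED = 3      # assumed value: consecutive fast runs required

# Warmup loop: runs until inference stabilises below 50 ms.
# wav2lip_engine, face_crop and mel_chunk come from the running pipeline.
stable_count = 0
for _ in range(MAX_WARMUP_ITERS):
    t0 = time.perf_counter()
    wav2lip_engine.infer(face_crop, mel_chunk)
    elapsed_ms = (time.perf_counter() - t0) * 1000
    if elapsed_ms < WARMUP_TARGET_MS:
        stable_count += 1
        if stable_count >= STABLE_REQUIRED:
            break
    else:
        stable_count = 0   # a slow run resets the streak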
Any remaining backlog from slow CUDA init is automatically discarded by a 2-second cap in the scheduler — it resets the pointer rather than letting stale audio pile up indefinitely.
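One way such a cap could work is sketched below, assuming pending audio is held as `(due_time, chunk)` pairs in a queue, oldest first. When the oldest entry is more than two seconds overdue, the whole backlog is dropped so playback resumes from fresh audio. The function name and queue layout are hypothetical; only the 2-second figure comes from the text.

```python
from collections import deque

MAX_BACKLOG_S = 2.0   # cap from the article


def drop_stale(pending: deque, now: float, max_backlog_s: float = MAX_BACKLOG_S) -> int:
    """Clear the queue if its oldest entry is overdue by more than the cap.

    Returns the number of chunks discarded (0 if the backlog is acceptable).
    """
    if not pending or now - pending[0][0] <= max_backlog_s:
        return 0
    dropped = len(pending)
    pending.clear()
    return dropped


pending = deque([(0.0, b"old-0"), (0.04, b"old-1")])
kept = drop_stale(pending, now=1.0)      # 1 s overdue: still within the cap
dropped = drop_stale(pending, now=2.5)   # 2.5 s overdue: backlog is discarded
```

Dropping everything at once trades a short audible gap for a guarantee that latency cannot stay high, which matches the behaviour described above better than trying to catch up chunk by chunk.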

What I deliberately did not build

No cloud inference. Everything runs locally. No face or voice data leaves the machine.

No real-time face re-enactment. Tools like Deep Live Cam do full face replacement and require a live webcam feed as input. ScrumSurvivor uses a static photo — simpler, more stable, no additional hardware dependency.

No identity swap. The avatar is always the user themselves, rendered from a photo they took. This is the most important design decision: the tool does not impersonate anyone. It is your face, your voice, your machine, your camera output.

Get it

ScrumSurvivor is open source (MIT). Setup takes about an hour — mostly recording your idle clips and installing OBS and VB-Cable.

→ View on GitHub

Windows 10 / 11 only · NVIDIA GPU with CUDA required (4 GB VRAM minimum) · MIT License
