Indexing a year of video locally on a 5-year-old M1 Max with Gemma 4 31B
asenna · 2026-05-21 · via Hacker News: Front Page

While I slept, my 5-year-old MacBook ran Gemma 4 locally and indexed a year of video

I'm in the Maasai Mara about half the year, in three-month stretches. Animals out the front of the lodge, motorcycles, friends in the Maasai villages, kids who think a drone is the funniest thing they have ever seen. That's one half of my year. The other half is sixteen-hour days in front of a terminal, Silicon Valley hacker brain on Africa time. Both real, both consuming attention.

The first half is a constant flood of footage from the iPhone, the DJI Pocket, the drone, the Nikon Z8, and lately the Ray-Ban Metas too. There's always something being recorded. Every photographer or videographer I know is sitting on the same problem: an archive that grows faster than they can edit it. The second half is why mine never gets touched.

[Image: two airport-security trays overflowing with a Nikon DSLR, an action cam, headphones, a sports watch, SSDs, batteries, and a tangle of cables.]
Airport security somewhere between Nairobi and Spain. Two trays of cameras, headphones, drone bits, batteries, SSDs, more cables than anyone needs. Most of it records something. Almost none of what they record gets touched again any time soon.

Three months ago the lodge's social channels went dark. Not for lack of content; the lodge has years of raw footage across multiple SSDs. The bottleneck was editing time, and my time disappeared. Claude Code with Opus 4.5 (and then 4.6) hit the point in February where you could leave agents running for hours and come back to merged PRs. KaribuKit was going live with its first paying property in the same window. I stopped sleeping properly, started running three or four agents in parallel in the background, and the months when I would have cut reels turned into months when I shipped software instead.

So one weekend I sat down to fix it. The first thing I tried was wrong.

The wrong layer

The initial pitch (to myself, after about an hour of research) was a SaaS stack: Eddie AI for iterative editing, Higgsfield MCP for generative B-roll, Submagic for captions, Buffer for cross-posting. About $140 a month, slick on paper.

Two problems showed up before I ran any of it.

First, generative AI video has no place on a real travel brand. Guests pay $300 a night and up to see the actual place, and mislabeled AI shots equal TripAdvisor crucifixion. Higgsfield out.

Second, 3-5 posts a week was aggressive for me, and the realistic floor was more like 2-3. The pitch was optimistic in a way that would have me failing by week two.

Then I remembered I already own DaVinci Resolve Studio, and Resolve 21 ships IntelliSearch (semantic clip search), Smart Bins (auto-organizing folders), and Voice to Subtitle that produces 90-95% accurate captions on the timeline. That's roughly 70% of what Eddie sells, so Eddie was out too.

What I was left with was Claude Code driving Resolve via the open-source DaVinci Resolve MCP, with ElevenLabs handling voiceover on informational clips where it earned its place, and the cost had dropped from $140 a month to $22.

But the deeper thing only landed once I tried to actually use any of this. Every AI video editor on the market assumes your footage is already labeled. Mine is IMG_*.mov and DJI_*.mp4 across folders with names like Mara june 2024 backup final FINAL. Eddie can search by transcript, but none of these tools can find "the elephant on the hill at golden hour" against an unlabeled archive.

The AI editor is solving the wrong problem. Or more precisely, it's solving the second problem; the first problem is the index.

The question

I asked it out loud: how does the agent know what's in each clip?

There's no answer for an unlabeled archive. You can throw transcripts at it, GPS coordinates, filenames, parent folders. None of that gives you "the wide shot at sunrise with the giraffe in the frame" unless something has actually looked at the pixels.

The leverage is upstream. Build the index first, make the archive queryable in English, and the editor on top becomes a thin layer doing what it was designed to do.

So I built the index, locally.

The build

This is the kind of AI-native build I do for clients at SimbaStack, except I was both the client and the engineer this time, which made the decision tree a lot shorter.

Four constraints set the shape:

  • Local-first. The Mara Hilltop archive is on physical SSDs, and most of the personal stuff is on my laptop. Cloud upload was a non-starter both for cost (thousands of files, many gigabytes per clip) and for not handing the entire visual record of my life to a third party.
  • Sidecars, not a central database. A .description.md per clip, living next to it, plain text and grep-able. Survives if my indexer breaks tomorrow, and travels with the data when files move between drives.
  • One vision call captures everything. The expensive operation is the vision pass over the extracted frames, so anything I might want to know about a clip later has to come out of that one call. The schema is exhaustive on day one: rating, technical quality, lighting, time of day, color palette, audio quality, people count, keywords, faces, location, transcript, prose description. All of it in one shot.
  • Three vision backends. Claude via my Max subscription's CLI as default (zero marginal cost), the Anthropic API for speed when I need it, and a local backend pointed at LM Studio for the bulk pass. The local one is the one that matters.
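The sidecar constraint can be sketched in a few lines. This is a minimal illustration, not the skill's actual code: the naming rule (clip file name plus `.description.md`), the flat `key: value` frontmatter, and the helper names `sidecar_path`, `render_sidecar`, and `write_sidecar` are all assumptions made for the example.

```python
from pathlib import Path


def sidecar_path(clip: Path) -> Path:
    # Assumed naming: IMG_1103.MOV -> IMG_1103.MOV.description.md, same folder.
    return clip.with_name(clip.name + ".description.md")


def render_sidecar(fields: dict, description: str) -> str:
    # Flat "key: value" frontmatter only; lists are written inline as [a, b].
    lines = ["---"]
    for key, value in fields.items():
        if isinstance(value, (list, tuple)):
            value = "[" + ", ".join(str(v) for v in value) + "]"
        lines.append(f"{key}: {value}")
    lines += ["---", "", "## Description", "", description.strip(), ""]
    return "\n".join(lines)


def write_sidecar(clip: Path, fields: dict, description: str) -> Path:
    target = sidecar_path(clip)
    # Write to a temp file first so an interrupted run never leaves
    # a half-written sidecar next to the clip.
    tmp = target.with_name(target.name + ".tmp")
    tmp.write_text(render_sidecar(fields, description), encoding="utf-8")
    tmp.replace(target)
    return target
```

Because the sidecar sits beside the clip and is plain text, moving a folder to another drive moves the index with it, and `grep -l "golden_hour" *.description.md` works with no indexer installed.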

The per-clip pipeline:

  1. ffprobe for metadata.
  2. exiftool for GPS lat/lon/altitude. Works on iPhone, DJI Pocket, drone footage, all the same.
  3. Reverse-geocode via Nominatim. Free, rate-limited, no API key.
  4. ffmpeg extracts five evenly-spaced frames at 1920px.
  5. WhisperX transcribes with word-level alignment and pyannote speaker diarization. Hindi, English, Swahili, 97 languages.
  6. insightface detects faces and stores 512-dim ArcFace embeddings in a centralized SQLite face DB for cross-archive person queries later.
  7. Vision model reads the frames, transcript snippet, and folder context, and returns YAML frontmatter plus a prose description.
  8. Sidecar written to disk.
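Step 4 of the pipeline above can be sketched as follows. This is a hedged example under two assumptions that the text does not spell out: "evenly-spaced" is taken to mean the midpoints of five equal segments, and "1920px" is taken to mean output width. The function names are hypothetical; the ffmpeg options used (`-ss`, `-frames:v`, `-vf scale`) are standard.

```python
def frame_timestamps(duration_s: float, count: int = 5) -> list[float]:
    # Sample at the midpoint of each of `count` equal segments, which
    # skips the very first and last frames of the clip.
    if duration_s <= 0 or count <= 0:
        return []
    step = duration_s / count
    return [round(step * (i + 0.5), 3) for i in range(count)]


def ffmpeg_frame_cmd(clip: str, t: float, out_jpg: str, width: int = 1920) -> list[str]:
    return [
        "ffmpeg", "-hide_banner", "-loglevel", "error", "-y",
        "-ss", f"{t:.3f}",            # placed before -i for fast input seeking
        "-i", clip,
        "-frames:v", "1",             # write exactly one frame
        "-vf", f"scale={width}:-2",   # fixed width, even height, aspect ratio kept
        out_jpg,
    ]
```

Each command list can be passed to `subprocess.run(cmd, check=True)`; the clip duration would come from the ffprobe step.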

Here's what that looks like on a real clip from the Mara Hilltop archive.

[Image: a frame from IMG_1103.MOV, with Ellie on the deck of a Mara Hilltop luxury tent at midday, savanna behind her.]
One frame from IMG_1103.MOV. Ellie on the deck of one of the luxury tents at the lodge, midday. None of that context lives in the filename.

[Image: the sidecar file Gemma wrote for IMG_1103.MOV, showing YAML schema and a Description block.]
The sidecar Gemma wrote for the same clip. YAML on top (lighting enum, time-of-day enum, color palette, face embeddings, GPS), prose ## Description below. It picked up the safari-tent setting, the camera pan from interior to savanna, the shot type, and suggested two use cases (marketing reels and travel-vlog B-roll). The filename had IMG_1103.MOV; the sidecar has the rest of what I needed to find it again.

The whole thing is a Claude Code skill at ~/.claude/skills/video-index/, about 1,400 lines of Python. Claude Code wrote almost all of it. My work was the architecture, the prompts, the schema design, and the bug triage when things went wrong.

The absurdity

This is the part that actually surprised me.

I bought a 16-inch MacBook Pro M1 Max with 64GB of RAM in 2021, and the reason had nothing to do with LLMs. I'd been hitting 32GB limits on my previous machine for a while. A messy hacker brain running hundreds of Chrome tabs alongside DaVinci Resolve, Slack, Discord, and Drive was too much for pre-unified-memory hardware to handle without paging constantly. I maxed out the RAM on the new M1 Max because the old one wouldn't stop killing my workflow and I had the money to fix it.

Five years later, that same laptop is running Gemma 4 31B Q4 in LM Studio against a year of video footage.

[Image: LM Studio Developer view with gemma-4-31b loaded, 28.40 GB, REST API at 127.0.0.1:1234, server logs showing image encoding.]
LM Studio with Gemma 4 31B Q4 loaded. 28.40 GB of model in memory, REST API at 127.0.0.1:1234. The bottom panel is the server log during a real bulk run, encoding frames one clip at a time.

The bulk run pushed the laptop past where 64GB of RAM alone would carry it. Activity Monitor reported 50.89 GB of swap at the peak.

[Image: macOS Activity Monitor showing 64GB physical RAM, 50.89GB swap used during the indexing run, memory pressure in the yellow band.]
64 GB of physical RAM, 50.89 GB of swap used. Memory pressure in the yellow band, the kind of state you absolutely should not run on a normal Tuesday. Apple's swap is designed for it, and the fans were loud.

I Googled whether that would damage the SSD, and apparently for a day or two it's fine. Don't make it your normal operating state, but a weekend of pushing the machine hard is well within tolerance. My laptop ran hot, the fans spun up, and it kept producing sidecars while I worked on other things.

The M1 Max 16-inch is, honestly, legendary. People in the Mac community talk about it that way for good reason: five years on, it's running 31B-parameter models at usable speed with the kind of headroom that should not exist on hardware this old. I expect another three to five years out of this thing, comfortably, because local LLMs only get more efficient and the hardware is the floor, not the ceiling.

I bought it for Chrome. It's running a model that didn't exist when I bought it.

Four bugs, four lessons

The build was mostly Claude Code holding the pen. The interesting work was the four times it almost shipped something wrong.

WhisperX 3.8 broke its diarization API between when I last touched it and now. Two breaking changes had landed: whisperx.DiarizationPipeline moved to the whisperx.diarize submodule, and the constructor kwarg use_auth_token was renamed to token (inherited from pyannote 3.x). The fix was a try-and-fall-back constructor call: the script tries token= first and falls back to use_auth_token= if the constructor raises a TypeError, so it works on either side of the rename. Lesson: when calling into AI libraries that move fast, defensive constructor calls are cheap insurance.
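The token fallback described above can be sketched generically. `build_with_token` is a hypothetical helper, not part of WhisperX; in the real script the `factory` argument would presumably be the `DiarizationPipeline` class. The one addition beyond what the text describes is the check on the error message, so that an unrelated TypeError from inside the constructor is not swallowed.

```python
def build_with_token(factory, token, **kwargs):
    """Call factory(token=...), falling back to the older use_auth_token=... name."""
    try:
        return factory(token=token, **kwargs)
    except TypeError as err:
        # Fall back only when the complaint is about the keyword itself;
        # any other TypeError raised inside the constructor is re-raised.
        if "token" not in str(err):
            raise
        return factory(use_auth_token=token, **kwargs)
```

Trying the new keyword first means the common path costs nothing extra on current versions, and the old name is only attempted after a real failure.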

The Claude CLI returns permission errors as successful responses. On the first test of the CLI backend, all four sidecars came back identical with the text "I need permission to read the image frames...", and the script's success check passed because exit code was 0 and the output wasn't empty. The cause was that in non-interactive mode without --permission-mode bypassPermissions, the CLI returns the permission-denial text as the response body instead of prompting, which means the failure mode looks exactly like success unless you string-match for it. The fix was adding the flag plus a defensive check that flags any short response containing "I need permission" as an error rather than a description. Lesson: when scripting AI tools, the non-interactive permission flow is where silent failures hide.
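A minimal sketch of that defensive check, assuming the script already has the CLI's exit code and captured stdout in hand. The function names and the 600-character cutoff for "short" are assumptions for illustration; only the "I need permission" marker comes from the incident described above.

```python
DENIAL_MARKERS = ("i need permission",)
MAX_DENIAL_LEN = 600  # assumed cutoff: a real description runs much longer than a refusal


def is_permission_denial(text: str, max_len: int = MAX_DENIAL_LEN) -> bool:
    body = text.strip().lower()
    return len(body) < max_len and any(marker in body for marker in DENIAL_MARKERS)


def check_cli_output(returncode: int, stdout: str) -> str:
    # Exit code 0 and non-empty output are necessary but not sufficient.
    if returncode != 0:
        raise RuntimeError(f"vision CLI exited with code {returncode}")
    if not stdout.strip():
        raise RuntimeError("vision CLI returned an empty response")
    if is_permission_denial(stdout):
        raise RuntimeError("vision CLI returned a permission denial, not a description")
    return stdout
```

The length cutoff keeps a long, legitimate description that happens to quote the phrase from being rejected.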

Gemma returned people_count: "many" instead of an integer. My vision prompt literally said integer or the string "many" if >10. Gemma followed instructions correctly; the bug was schema design. The fix was a stricter prompt (integer 0-99 with explicit guidance to estimate) plus a coercion in the parser for the legacy "many" responses. Don't union-type schema fields. Pick always-int or always-string, never "int or this one specific string," because every downstream consumer pays for the choice.
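The parser-side coercion might look like the sketch below. The function name is hypothetical, and mapping the legacy "many" to 11 is an assumption (the smallest value consistent with "more than 10"); the 0-99 clamp follows the stricter prompt described above.

```python
MANY_FALLBACK = 11  # assumed stand-in: the old prompt used "many" for more than 10


def coerce_people_count(value) -> int:
    """Return an int in 0..99, accepting legacy "many" responses."""
    if isinstance(value, bool):
        raise ValueError("people_count must be a number, not a boolean")
    if isinstance(value, int):
        n = value
    elif isinstance(value, str):
        s = value.strip().strip("\"'").lower()
        # int() raises ValueError on anything that is not a plain integer string.
        n = MANY_FALLBACK if s == "many" else int(s)
    else:
        raise ValueError(f"unsupported people_count: {value!r}")
    return max(0, min(99, n))
```

Downstream code then only ever sees an integer, so a filter such as `people_count > 0` needs no special cases.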

The motorcycle clip that shouldn't have been culled. My initial cull prompt was photographer-portfolio-shaped: heavy motion blur, soft focus, and jittery stability got rated cull. Technically correct. Then I tested it on a handheld nighttime motorcycle clip from a Spain trip and it culled it. I caught it: that's a fun memory, the blur is the vibe. I reframed the cull criteria to "not a real recording" only (lens cap, pocket footage, two-second test clips, fully clipped exposure), not "imperfect capture." Lesson: photo archives cull aggressively, video memories cull permissively. Same schema, different criteria; be explicit about which mode you're in.

The actual take

Three things I now believe more strongly than I did a week ago.

Enum constraints beat instructions for confabulation prevention. I tested Gemma 4 E4B on a coworking-space photo I'd taken at night, and it described the scene as "brightly lit, abundant natural light, floor-to-ceiling windows", except the windows were pitch black, because it was night. Then I tested 31B with a structured schema prompt that forces the model to pick from golden_hour | bright_daylight | overcast | dim_interior | nighttime | mixed | unclear, and both thinking-off and thinking-on recovered nighttime correctly. A model can confabulate freely in open-ended prose, but against an enum the worst it can do is mis-pick, and a value from outside the list is easy to reject at parse time. Use schemas, not instructions.
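One way to hold a model to that list is to define the enum once and use it both to build the prompt and to validate the reply. This is a sketch under assumptions: the class name `Lighting` and both helper names are made up for the example; only the seven values come from the prompt quoted above.

```python
from enum import Enum


class Lighting(str, Enum):
    GOLDEN_HOUR = "golden_hour"
    BRIGHT_DAYLIGHT = "bright_daylight"
    OVERCAST = "overcast"
    DIM_INTERIOR = "dim_interior"
    NIGHTTIME = "nighttime"
    MIXED = "mixed"
    UNCLEAR = "unclear"


def prompt_options(enum_cls) -> str:
    # Render the allowed values exactly as they should appear in the prompt.
    return " | ".join(member.value for member in enum_cls)


def parse_label(raw, enum_cls):
    cleaned = str(raw).strip().strip("\"'").lower().replace(" ", "_").replace("-", "_")
    # Raises ValueError for anything outside the enum, so an invented
    # value fails loudly instead of landing in a sidecar.
    return enum_cls(cleaned)
```

A rejected value can then trigger a retry or a fallback to `unclear`, which is a policy choice left to the caller.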

Local 31B with structured prompts closes most of the gap to cloud. Gemma 4 31B Q4 thinking-off against a structured schema produces output that's hard to distinguish from Sonnet 4.6 on most of my test clips. The cloud premium earns its keep on the hard 10-20%. Bulk indexing at scale (thousands of clips overnight) should run local; cloud is the re-rate pass on the clips the local pass flagged for review. That two-tier setup is the one that scales.

AI video editors are pitched one layer too high. The valuable layer is the index. Once your archive is queryable in plain English ("show me handheld interior clips from Mara, golden hour, with people, longer than 8 seconds"), the editor on top is straightforward. Most of the AI-editor space is competing for the surface above an index that doesn't exist, and the index is the prerequisite they're all skipping past.
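To make "queryable" concrete, the example query above can be expressed as a filter over sidecar frontmatter. This is a sketch only: the field names (`stability`, `setting`, `location`, `lighting`, `people_count`, `duration_s`) are hypothetical, and the tiny parser handles flat `key: value` lines rather than full YAML.

```python
from pathlib import Path


def parse_frontmatter(text: str) -> dict:
    # Minimal reader for flat "key: value" lines between two --- markers.
    # Nested YAML would need a proper YAML parser instead.
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def matches(fields: dict) -> bool:
    # "handheld interior clips from Mara, golden hour, with people, longer than 8 seconds"
    try:
        return (
            fields.get("stability") == "handheld"
            and fields.get("setting") == "interior"
            and "mara" in fields.get("location", "").lower()
            and fields.get("lighting") == "golden_hour"
            and int(fields.get("people_count", 0)) > 0
            and float(fields.get("duration_s", 0)) > 8
        )
    except ValueError:
        return False  # malformed numbers count as a non-match


def find_clips(root):
    for sidecar in sorted(Path(root).rglob("*.description.md")):
        if matches(parse_frontmatter(sidecar.read_text(encoding="utf-8"))):
            yield sidecar
```

In practice an agent would translate the English request into a filter like `matches`; the point is that once the fields exist, the lookup itself is trivial.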

What's next

Looking back, the thing that kept this from getting fixed sooner wasn't really time. I had every AI superpower currently available pointed at the work side of my life: Claude Code refactoring codebases overnight, Codex writing most of my pull requests, the agentic stack I'd just spent three months using to ship KaribuKit. On the editing side, I was using none of it. The not-getting-to-it had become its own small, low-grade frustration that lived in the back of my head all year, the kind of thing you notice every time you open a folder on the SSD and close it again without doing anything. What clicked one Saturday wasn't that I needed to find time. It was that the editing problem was a tooling problem, and tooling is the one kind of problem I happen to be well-equipped to fix right now.

This weekend I'm building the editor: Claude Code as the orchestrator, DaVinci Resolve MCP for the cuts, ElevenLabs for voiceover on informational clips. There's one hard rule baked into the tooling: the voice clone is for utility content only. Directions, room descriptions, multilingual versions, factual stuff I'd say in person anyway. Never for testimonials or founder messages. Disclosure laws are real in 2026, and trust in a hospitality brand is too easy to lose.

The index makes all of that tractable. Without it, I would still be scrubbing through 47GB of DJI Pocket footage looking for the sunrise wide.

For now: a year of Mara Hilltop footage is queryable in English on a five-year-old laptop. Cost was a weekend of my time and 50GB of swap. The remaining years across older SSDs are next.

A fair check on all of this: Mara Hilltop's social channels are still dead today. The indexer solves only half the problem (finding the right clip); the editor that turns those clips into finished reels is the other half, and that's the part I'm building this weekend. If it works, the channels light back up and I write part two. If it doesn't, I write about why.

In all honesty, the right answer here might be to hire someone. Finding an editor with the right sensibility for Mara Hilltop (warm, observational, no over-cut MTV-energy reels) is harder than writing another skill. If you know someone who works in that register, send them my way.

The skill is open at ~/.claude/skills/video-index/. If you're working on something similar (indexing personal archives, getting a local model to do real archival work, building agents that drive editing tools), I'd be glad to compare notes.

— NJ

Building KaribuKit (AI-native PMS for hospitality), running Mara Hilltop (eco-lodge in the Maasai Mara), and consulting through SimbaStack.

#local-llm #claude-code #video-archive #mara-hilltop #simbastack