How We Built an AI That Evolves Alongside a Creator Through Memory
JDeep · 2026-05-22 · via DEV Community

Let me tell you about the moment I knew we had a problem. We'd just shipped our content repurposing tool. A fitness YouTuber pasted in a video URL. Out came a LinkedIn post that opened with "In today's fast-paced digital landscape..." The man deadlifts 200kg for a living. That's when we decided our AI needed to actually learn who it was writing for, not just parrot generic marketing speak into a differently shaped box.

Most AI tools for content creators work like a photocopier with a thesaurus. You paste text in, pick a platform, and out comes something that sounds like it was written by a committee of people who've never watched a YouTube video. We wanted something different. We wanted an AI that gets better at sounding like you the more you use it. Not because we fine-tuned a model (we don't have that kind of GPU budget, and frankly, neither do you), but because the system actually remembers what you do and why.

This is the story of how we built that system, what broke along the way (spoiler: a lot), and why the combination of Hindsight's agent memory and cascadeflow's multi-model orchestration turned out to be the two pieces we didn't know we needed until 3 AM on a Tuesday when everything else had failed.

What the System Actually Does

The elevator pitch: you give our system a YouTube URL. It downloads the video, transcribes it, finds the strongest 30 to 90 second moments, and generates production-ready briefs for Instagram Reels, YouTube Shorts, and LinkedIn. Each brief includes a hook, a full spoken script with editing cues like [CUT] and [PAUSE], a platform-native caption, and even a visual prompt for AI-generated b-roll. Basically, it does what a $5,000/month content team does, except it doesn't take Fridays off.

But here's the thing. Plenty of tools can chop a video into clips. That's table stakes. The interesting part is what happens after the creator starts reviewing those briefs. Every edit, every approval, every rejection, every "regenerate this but make it less corporate" gets silently observed and stored as a memory. The next time the system generates content, it recalls those memories and adjusts.

The creator never fills out a settings page and checks boxes next to adjectives like "casual" and "bold." (We all know nobody reads those forms honestly anyway. Everyone thinks they're "authentic and relatable.") The system just watches what they actually do and converges on their personality over time.

Think of it like hiring a new editor. Day one, they're guessing. They write your hooks like a BuzzFeed intern from 2014. By week three, they know you hate exclamation marks, you always cut filler words, and your hooks work best when they open with a question, not a command. That's the loop we built. Except this editor doesn't ask for a raise and doesn't have opinions about your Slack status.

What you're looking at above: The pipeline flows left to right, from a YouTube URL through seven stages (Ingest, Transcribe, Recall, Extract, Clip, Generate, Review). The key detail is the feedback loop at the bottom, the red arrow that makes the whole thing worth building. Every editing action in the Review stage feeds observations back into Hindsight's memory bank. On the next pipeline run, the Recall stage pulls those memories and injects them into both moment extraction and content generation. The system literally gets smarter with each review cycle. It's like git for personality, except you never have to write a commit message.

The Memory Loop: Where Hindsight Fits

Here's a confession. The first version of our system had no memory at all. Zero. Zilch. The kind of amnesia that makes Memento look like a documentary about a guy with excellent recall. It generated the same generic hooks regardless of who was using it. A fitness creator and a fintech newsletter writer got output in the same tone. That's obviously wrong, but the fix isn't obvious.

The naive approach: build a preferences form. Let the creator pick "casual" or "professional," choose their platforms, list words to avoid. We built that too (it helps with the cold start problem, and honestly it makes the onboarding screen look impressive in screenshots). But it turns out people are terrible at describing their own style. A creator will tell you "I'm casual and direct" and then consistently reject every draft that doesn't open with a specific data point. Ask anyone what kind of music they like. Now watch what they actually play in the car. Two completely different playlists.

Their stated preferences and their revealed preferences are two different universes. We needed to observe the second one.

That's where Hindsight enters the picture. Instead of asking creators to describe themselves (an exercise roughly as accurate as asking a cat to describe its relationship with furniture), we observe what they do and store those observations as memories.


The loop in plain English: The creator reviews drafts and edits/approves/rejects them. Each action fires retain_diff_observation(), which stores a tagged observation in Hindsight's memory bank. On the next pipeline run, recall() fetches relevant memories and reflect() synthesizes them into a compact paragraph. That paragraph gets injected into Claude's generation prompt. The drafts come back better. The creator edits less. The loop tightens. Rinse and repeat until the AI sounds more like the creator than the creator does on a Monday morning.

Here's the function that fires every time someone saves an edited draft. It's not clever. It doesn't need to be:

async def retain_diff_observation(
    before: str, after: str, platform: str, content_type: str
) -> None:
    # Skip empty drafts and saves that changed nothing.
    if not before or not after or before.strip() == after.strip():
        return

    before_words = before.split()
    after_words  = after.split()
    delta        = len(after_words) - len(before_words)
    pct          = abs(delta) / max(len(before_words), 1) * 100

    # Check the direction first, so a large expansion is never
    # labelled as a shortening just because pct is above 30.
    if delta < 0 and (delta < -10 or pct > 30):
        action = "significantly shortened"
    elif delta < 0:
        action = "trimmed"
    elif delta > 10 or pct > 30:
        action = "significantly expanded"
    else:
        action = "rewrote (same length)"

    # Only the first 160 characters of each version are stored.
    observation = (
        f"Creator {action} a {platform} {content_type} draft. "
        f'Before ({len(before_words)} words): "{before[:160]}" '
        f'After ({len(after_words)} words): "{after[:160]}"'
    )

    await retain_observation(
        observation,
        tags=["editing-behaviour", "draft-edit", platform, content_type],
    )


Nothing fancy happening here. No PhD required. We compute a rough diff, classify whether it was a trim, expansion, or rewrite, and store a human-readable observation via Hindsight's retain API. The tags are important, though. They let us query memories by type later without building our own taxonomy from scratch. (We tried building our own taxonomy once. It lasted two days before we set it on fire.)
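If you want to sanity-check the classification on its own, here is a minimal sketch that pulls the branching into a pure function. The name `classify_edit` is hypothetical and not part of our codebase; it uses the same 10-word and 30 percent thresholds and looks at the direction of the change before its size.

```python
def classify_edit(before: str, after: str) -> str:
    """Label an edit by the direction and size of its word-count change."""
    before_count = len(before.split())
    after_count = len(after.split())
    delta = after_count - before_count
    pct = abs(delta) / max(before_count, 1) * 100

    # "Big" means more than 10 words or more than 30 percent of the draft.
    big = abs(delta) > 10 or pct > 30
    if delta < 0:
        return "significantly shortened" if big else "trimmed"
    if delta > 0 and big:
        return "significantly expanded"
    # Same length, or a small growth that we treat as a rewrite.
    return "rewrote (same length)"


if __name__ == "__main__":
    draft = (
        "This is essentially a very simple tip that will "
        "really help you lift more weight safely"
    )
    print(classify_edit(draft, "A simple tip to lift more weight safely"))
```

Keeping this part free of I/O means the thresholds can be tuned with plain unit tests, with no memory backend involved.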

The real magic shows up on the next pipeline run. Before we generate any content, we call two functions that sound like they belong in a therapist's office:

recall_result = await recall_memories(
    query="How does this creator prefer their content? "
          "Hook styles, editing preferences."
)
reflection = await reflect_on_creator(
    query="Summarise this creator's content preferences, "
          "voice, and style."
)


recall fetches the raw observations. reflect synthesizes them into a compact paragraph, like a friend who can summarize your entire personality in three sentences (and is somehow right about all of them). That paragraph gets injected straight into the generation prompt as a ## Creator Voice & Preferences section. Claude doesn't need fine-tuning. It just needs good context, and Hindsight provides exactly that.
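The injection step itself is small. Here is a rough sketch of how it could look, assuming the reflection arrives as a plain string; `build_generation_prompt` is a hypothetical name used for illustration, and only the section heading comes from our actual prompt.

```python
from typing import Optional


def build_generation_prompt(base_prompt: str, reflection: Optional[str]) -> str:
    """Append the creator reflection as its own prompt section, if any."""
    reflection = (reflection or "").strip()
    if not reflection:
        # Cold start: no memories yet, so the base prompt goes out unchanged.
        return base_prompt
    return (
        f"{base_prompt.rstrip()}\n\n"
        "## Creator Voice & Preferences\n"
        f"{reflection}\n"
    )


if __name__ == "__main__":
    print(
        build_generation_prompt(
            "Write a LinkedIn hook for this moment.",
            "Avoids exclamation marks. Prefers hooks that open with a question.",
        )
    )
```

Returning the base prompt untouched when there is no reflection keeps the cold-start path identical to a memory-free run.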

The result is genuinely satisfying to watch. After three or four review sessions, the system stops generating hooks with exclamation marks for creators who always delete them. It starts opening LinkedIn posts with data points for creators who approve those. It learns that one creator shortens every tweet to under 120 characters and begins generating tighter drafts automatically. No retraining, no config files, no "please describe your brand voice in 500 words" forms. Just memory doing what memory does.

Keeping It Cheap: Where cascadeflow Fits

Here's a problem we didn't anticipate, which in retrospect we absolutely should have. After a creator completes a review session, we need to analyze all their editing events and extract style observations. That's an LLM call. We also use LLM calls during moment extraction from transcripts, and for the actual brief generation. Each project can easily rack up 15 to 20 LLM calls. Our moment extraction needs to be accurate (you can't miss the best 45 seconds of a 30-minute video, that's literally the whole point), but our synthesis calls are simple text classification that a slightly motivated intern could do.

Using the same expensive model for everything felt like hiring a Michelin-star chef to make toast. Sure, the toast would be excellent. But your budget would be gone by Tuesday. That's where cascadeflow saved us real money and possibly our sanity.

cascadeflow lets you define a drafter model (fast and cheap) and a verifier model (slower and accurate), and it routes each call to whichever model meets a quality threshold you set. The key insight is that different parts of the pipeline have different accuracy requirements:

def get_extraction_agent() -> CascadeAgent:
    """Higher quality: moment extraction needs accuracy."""
    return build_agent(quality_threshold=0.8)

def get_generation_agent() -> CascadeAgent:
    """Lower threshold: drafter handles most generation cheaply."""
    return build_agent(quality_threshold=0.65)

def get_synthesis_agent() -> CascadeAgent:
    """Lowest threshold: the drafter handles almost all synthesis calls."""
    return build_agent(quality_threshold=0.5)


Three agents, three thresholds, same two underlying models. It's like having a junior dev and a senior dev, and only paging the senior when the junior says "I'm not sure about this one." The synthesis agent (which extracts observations like "creator always removes filler words") runs on the drafter almost exclusively because the task is straightforward. The extraction agent, which needs to identify the strongest 30 to 90 second moments from a transcript, escalates to the verifier more often because getting that wrong means the whole project is useless. With cascadeflow in place, we no longer maintain routing logic or if/else branches. We just set a quality number and cascadeflow handles the rest.
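For intuition, the drafter/verifier pattern can be sketched in a few lines. This is not cascadeflow's implementation or API. It is a toy version that assumes each model returns a confidence score alongside its text, and every name in it (`run_cascade`, `CascadeResult`, the stub models) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Assumption: a model is any callable returning (text, confidence in [0, 1]).
Model = Callable[[str], Tuple[str, float]]


@dataclass
class CascadeResult:
    text: str
    model_used: str
    escalated: bool


def run_cascade(
    prompt: str, drafter: Model, verifier: Model, quality_threshold: float
) -> CascadeResult:
    """Try the cheap model first; escalate only if it misses the threshold."""
    draft_text, confidence = drafter(prompt)
    if confidence >= quality_threshold:
        return CascadeResult(draft_text, "drafter", escalated=False)
    verified_text, _ = verifier(prompt)
    return CascadeResult(verified_text, "verifier", escalated=True)


if __name__ == "__main__":
    def cheap(prompt: str) -> Tuple[str, float]:
        return f"draft for: {prompt}", 0.7  # stub confidence

    def strong(prompt: str) -> Tuple[str, float]:
        return f"verified for: {prompt}", 0.95  # stub confidence

    for threshold in (0.5, 0.65, 0.8):
        result = run_cascade("Summarise the edits", cheap, strong, threshold)
        print(threshold, result.model_used)
```

With the stub scores above, only the 0.8 threshold escalates, which is the behaviour we want from the extraction agent.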

We also built a CostAccumulator that tracks every call and computes what we would have spent if everything went to the expensive model. Think of it as the "what if we hadn't been smart about this" meter:

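from dataclasses import dataclass
from typing import Any

@dataclass
class CostAccumulator:
    total_calls: int = 0
    drafter_calls: int = 0
    verifier_calls: int = 0
    total_cost_usd: float = 0.0

    def record(self, result: Any) -> None:
        """Tally one result by cost and by which model answered."""
        self.total_calls += 1
        # Results may lack these attributes, so fall back to empty/zero.
        model_used = getattr(result, "model_used", "") or ""
        cost = getattr(result, "total_cost", 0.0) or 0.0
        self.total_cost_usd += cost
        # `settings` is our app config object. Anything that is not the
        # drafter (including an unknown model name) counts as the verifier.
        if settings.cascade_drafter_model in model_used:
            self.drafter_calls += 1
        else:
            self.verifier_calls += 1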


This surfaces in the UI as a cost breakdown per project. Creators don't care about it (they shouldn't have to), but we stare at it like it's a stock ticker. It tells us that roughly 70 to 80 percent of calls get handled by the drafter, which means cascadeflow is doing exactly what we hoped: using the expensive model only when it actually matters. The other 20 to 30 percent? Those are the calls where quality genuinely required the bigger model. We sleep better knowing we're not burning money on tasks a smaller model handles perfectly.
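The "what would this have cost on the expensive model" figure is simple arithmetic once you have token counts. Here is a minimal sketch under stated assumptions: `SavingsTracker` is a hypothetical name, it assumes each call exposes input and output token counts, and the prices in the demo are placeholders, not real rates.

```python
from dataclasses import dataclass


@dataclass
class SavingsTracker:
    # Verifier prices in USD per million tokens (placeholders you supply).
    verifier_input_price: float
    verifier_output_price: float
    actual_cost_usd: float = 0.0
    baseline_cost_usd: float = 0.0

    def record(
        self, actual_cost_usd: float, input_tokens: int, output_tokens: int
    ) -> None:
        """Add one call's real cost and its verifier-only equivalent."""
        self.actual_cost_usd += actual_cost_usd
        self.baseline_cost_usd += (
            input_tokens * self.verifier_input_price
            + output_tokens * self.verifier_output_price
        ) / 1_000_000

    @property
    def savings_pct(self) -> float:
        """Percentage saved versus sending every call to the verifier."""
        if self.baseline_cost_usd == 0:
            return 0.0
        return (1 - self.actual_cost_usd / self.baseline_cost_usd) * 100


if __name__ == "__main__":
    tracker = SavingsTracker(verifier_input_price=2.0, verifier_output_price=10.0)
    tracker.record(actual_cost_usd=0.001, input_tokens=1000, output_tokens=500)
    print(f"baseline ${tracker.baseline_cost_usd:.4f}, saved {tracker.savings_pct:.1f}%")
```

The baseline is computed per call, so calls that did escalate contribute roughly zero savings and the percentage stays honest.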

The Seven-Stage Pipeline

The full pipeline is a background task triggered when a creator submits a YouTube URL. Each stage updates the project status in Supabase so the frontend can show a live progress stream (because nothing says "your money's being well spent" like a progress bar that actually moves). Here's the sequence:

  1. Ingest: Detect input type, try YouTube auto-captions first (faster than transcription).
  2. Transcribe: If no captions exist, download audio via yt-dlp and run Groq Whisper. Files over 25 MB get automatically chunked.
  3. Recall: Pull creator memory from Hindsight. If no memories exist yet (cold start), the system falls back to universal heuristics.
  4. Extract: Claude Haiku 4.5 analyzes the transcript in parallel chunks and identifies the 3 to 5 strongest moments. Each moment is scored on hook quality, narrative arc, and standalone clarity.
  5. Clip: yt-dlp downloads just the relevant segments (no full-video download), ffmpeg crops to 9:16 vertical. All clips extracted in parallel.
  6. Generate: Claude Sonnet 4.5 produces a full production brief per moment per platform, with the creator's memory reflection injected into every prompt.
  7. Review: The creator edits, approves, or rejects. Every action feeds back into Hindsight. The loop closes.
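One detail of the Extract stage is worth sketching: parallel chunks can surface the same moment twice, so candidates have to be ranked and de-overlapped before picking the top few. The version below is an illustration, not our production code. The equal weighting of the three scores, the overlap rule, and all names (`Moment`, `select_top_moments`) are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Moment:
    start_s: float
    end_s: float
    hook: float      # each score is assumed to lie in [0, 1]
    arc: float
    clarity: float

    @property
    def score(self) -> float:
        # Assumption: the three criteria are weighted equally.
        return (self.hook + self.arc + self.clarity) / 3


def overlaps(a: Moment, b: Moment) -> bool:
    """True if the two time ranges share any span."""
    return a.start_s < b.end_s and b.start_s < a.end_s


def select_top_moments(candidates: List[Moment], limit: int = 5) -> List[Moment]:
    """Keep the best-scoring, non-overlapping moments, in timeline order."""
    chosen: List[Moment] = []
    for moment in sorted(candidates, key=lambda m: m.score, reverse=True):
        if len(chosen) == limit:
            break
        if not any(overlaps(moment, kept) for kept in chosen):
            chosen.append(moment)
    return sorted(chosen, key=lambda m: m.start_s)


if __name__ == "__main__":
    found = [
        Moment(0, 45, hook=0.9, arc=0.8, clarity=0.7),
        Moment(30, 80, hook=0.5, arc=0.5, clarity=0.5),   # overlaps the first
        Moment(100, 150, hook=0.6, arc=0.9, clarity=0.9),
    ]
    for moment in select_top_moments(found):
        print(moment.start_s, moment.end_s, round(moment.score, 2))
```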

The pipeline runs entirely as an async background task. If clip extraction fails for a particular moment (network issues, YouTube deciding it doesn't like you today), it falls back to embedding a YouTube player with start/end timestamps. The frontend never shows a broken state. This is important. Nothing kills user trust faster than a loading spinner that never stops.
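The clip fallback can be sketched as a single wrapper. This is an illustration under assumptions: `clip_or_embed` and the injected `extract_clip` coroutine are hypothetical names, and the return shape is made up for the example. The `start` and `end` query parameters are the standard YouTube embedded-player parameters, given in whole seconds.

```python
import asyncio
import math
from typing import Awaitable, Callable, Dict
from urllib.parse import urlencode

ClipExtractor = Callable[[str, float, float], Awaitable[str]]


def youtube_embed_url(video_id: str, start_s: float, end_s: float) -> str:
    """Build an embed URL that plays only the chosen segment."""
    # Round outward so the embed never cuts the moment short.
    params = urlencode({"start": math.floor(start_s), "end": math.ceil(end_s)})
    return f"https://www.youtube.com/embed/{video_id}?{params}"


async def clip_or_embed(
    video_id: str, start_s: float, end_s: float, extract_clip: ClipExtractor
) -> Dict[str, str]:
    """Return a rendered clip if possible, otherwise a timestamped embed."""
    try:
        path = await extract_clip(video_id, start_s, end_s)
        return {"kind": "clip", "src": path}
    except Exception:
        # Any download or ffmpeg failure degrades to the embed.
        return {"kind": "embed", "src": youtube_embed_url(video_id, start_s, end_s)}


if __name__ == "__main__":
    async def throttled(video_id: str, start_s: float, end_s: float) -> str:
        raise RuntimeError("simulated download failure")

    print(asyncio.run(clip_or_embed("VIDEO_ID", 12.7, 58.2, throttled)))
```

Passing the extractor in as an argument keeps the fallback testable without touching the network.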

The Intelligence Graph: Making Memory Visible

Here's something we learned the hard way: if the system is learning from you, you have to let the user see what it learned. Nobody trusts a black box. Especially not creators who've spent years building a personal brand and have strong opinions about whether they're "witty" or "sarcastic" (it's always sarcastic, by the way).


So we built a knowledge graph visualization. It runs three parallel Hindsight recall queries across different memory domains (style/tone, editing behavior, profile preferences), deduplicates by text, and classifies each memory into a node type using the tags that Hindsight already stores:

def _classify_memory_with_tags(text: str, tags: list[str]) -> tuple[str, str, float]:
    """Classify using real Hindsight tags first, fall back to keyword heuristic."""
    for tag in (tags or []):
        kind = _TAG_KIND.get(tag.lower().strip())
        if kind:
            return kind, _short_label(text), 0.80
    return _classify_memory(text)


Tags from Hindsight act as a first-class classification signal. We only fall back to keyword heuristics when tags are missing (which is like using a map when your GPS dies, except the map is made of regex and tears). The graph renders with five node types: root (the creator identity), traits (tone, style), platforms, preferences (editing patterns), and topics (niche). Edges encode semantic similarity, temporal proximity, and causal relationships (platform preferences causing editing patterns).
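The fan-out and dedup step before classification looks roughly like this. It is a sketch, not our actual module: `gather_unique_memories` is a hypothetical name, the `recall` callable stands in for whatever wrapper you have around the memory store, the memory shape is assumed, and normalising case and whitespace before comparing text is our choice for the example.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

# Assumed memory shape: {"text": str, "tags": list[str]}
Memory = Dict[str, object]
Recall = Callable[[str], Awaitable[List[Memory]]]


async def gather_unique_memories(recall: Recall, queries: List[str]) -> List[Memory]:
    """Run all recall queries concurrently and drop duplicate texts."""
    batches = await asyncio.gather(*(recall(query) for query in queries))

    seen = set()
    unique: List[Memory] = []
    for batch in batches:
        for memory in batch:
            # Normalise case and whitespace so near-identical texts collapse.
            key = " ".join(str(memory["text"]).lower().split())
            if key not in seen:
                seen.add(key)
                unique.append(memory)
    return unique


if __name__ == "__main__":
    async def fake_recall(query: str) -> List[Memory]:
        # Stand-in for a real recall call: one shared and one unique memory.
        return [
            {"text": "Creator removes exclamation marks", "tags": ["editing-behaviour"]},
            {"text": f"Memory about {query}", "tags": []},
        ]

    domains = ["style and tone", "editing behaviour", "profile preferences"]
    for memory in asyncio.run(gather_unique_memories(fake_recall, domains)):
        print(memory["text"])
```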

The creator can hover over any node and see the full observation text. It's the difference between "the AI is learning" and "here's exactly what the AI thinks it knows about you, and you can see it evolving in real time." One tester told us it felt like reading their own therapy notes. We're choosing to take that as a compliment.

Lessons Learned (a.k.a. Things We Wish Someone Had Told Us)

Memory is more useful than configuration. We have a preferences page. Creators fill it out during onboarding. But the observations extracted from actual editing behaviour are consistently more specific and more accurate than what people self-report. "I prefer casual tone" is less useful than "Creator consistently removes the word 'essentially' and shortens hooks to under 8 words." If you're building a personalization system, observe behavior first and ask questions second. People don't know what they want until they see what they don't want.

Quality thresholds beat manual routing. We initially wrote if task_type == "synthesis": use_cheap_model() branching logic. It was ugly. It was fragile. It was the kind of code that makes future-you send angry Slack messages to past-you. Replacing that with cascadeflow's quality threshold was simpler and more robust. The threshold is a single number, and the system figures out when to escalate. We spent less time debugging routing decisions and more time tuning the threshold values themselves, which is the right knob to turn.

Tag your memories at write time. We initially stored observations as plain text and tried to classify them at read time using keyword heuristics. It worked about as well as you'd expect, which is to say it didn't. Switching to Hindsight's tag parameter (e.g., tags=["editing-behaviour", "draft-edit", "linkedin"]) meant that recall queries and graph construction could use structured metadata instead of parsing free text. The 30 seconds of effort at write time saved us hours of heuristic maintenance and approximately three existential crises.

Show the user what you learned. The intelligence graph isn't a gimmick. In our early testing, creators would see observations they disagreed with ("Creator prefers formal tone" when they thought they were casual) and immediately edit a few more drafts to correct the signal. The memory system self-corrected because the creator could see the model's assumptions and naturally generated counter-evidence. Transparency creates a better feedback loop than any amount of prompt engineering. Your users will train your system for free if you just show them what it thinks.

Fail gracefully at every stage. YouTube throttles downloads. Whisper sometimes hallucinates timestamps (it once confidently transcribed silence as a TED talk). Claude occasionally returns malformed JSON that would make a parser weep. Every stage of our pipeline has a fallback: failed clips become YouTube embeds, missing memories trigger universal heuristics, broken JSON gets regex-parsed as a last resort. The system never shows a blank screen. It always shows something, and that something improves as the infrastructure cooperates. Ship the 80% solution. The remaining 20% will fix itself when you're not looking.
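The "regex-parse as a last resort" fallback can be sketched like this, assuming the model was asked for a single JSON object. The function name `parse_json_lenient` is hypothetical, and the regex simply grabs everything from the first opening brace to the last closing brace.

```python
import json
import re
from typing import Any, Optional


def parse_json_lenient(raw: str) -> Optional[Any]:
    """Parse strict JSON first, then try the outermost {...} span."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass

    # Last resort: the model wrapped the object in chatter or code fences.
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match is None:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        # Still broken: the caller decides what to show instead.
        return None


if __name__ == "__main__":
    reply = 'Sure! Here is the brief: {"hook": "Why do your deadlifts stall?"} Hope that helps.'
    print(parse_json_lenient(reply))
```

Returning `None` instead of raising lets the calling stage pick its own fallback, which is what keeps the blank screen away.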