
Building a Self-Hosted AI WhatsApp Agent for Structured Invoice Extraction
Rohan · 2026-05-22 · via DEV Community

As an engineering manager and developer, I constantly look for ways to eliminate repetitive business friction using automation. One of the most common manual bottlenecks is bookkeeping—specifically, reading utility bills or vendor invoices and logging them into financial trackers.

To solve this, I built a self-hosted AI agent proof of concept on a pure automation stack. It lets users snap a photo of an invoice, send it over WhatsApp, and have the structured data extracted and logged automatically in seconds.

Here is a breakdown of the system architecture, the system prompt, and the n8n Code node that ties them together.


The System Architecture

A robust automation pipeline requires strict separation of concerns. This entire Proof of Concept (POC) runs without an external Node.js server, relying entirely on a self-hosted orchestration engine:

  • Ingress: WhatsApp Business Cloud API webhooks (configured with a temporary developer test number).
  • Orchestration & Data Flow: Self-hosted n8n environment.
  • Cognitive Layer: Gemini 1.5 Flash (chosen for its native multimodal capabilities, large context window, and fast inference speeds).
  • Data Formatting: Native n8n Code Nodes running isolated JavaScript.

[WhatsApp Client] ──> [Cloud Webhook] ──> [n8n HTTP Request (media download)] ──> [Gemini 1.5 Flash] ──> [n8n Code Node (JS)] ──> [Database/Sheet]


Prerequisites & WhatsApp Developer Setup

To build this yourself, you don't need a paid enterprise WhatsApp account immediately. You can start building today using Meta's developer ecosystem:

  1. Create a Developer Account: Head to the Meta for Developers Portal and register.
  2. App Creation: Follow the Meta App Setup Guide to create a new App. Select Other -> Business as your app type.
  3. Add WhatsApp Product: Inside your app dashboard, add the WhatsApp product. Meta will instantly provision a free temporary test number and a test WhatsApp Business Account (WABA) for development.
  4. Configure Webhooks: Point Meta's webhooks to your n8n production webhook URL to start capturing real-time message payloads.
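
Before Meta delivers any messages, it verifies the webhook URL with a GET request carrying hub.mode, hub.verify_token, and hub.challenge, and expects the challenge echoed back. Here is a minimal sketch of that check as a plain function. The function name and the token value are hypothetical, and how you wire it into n8n (for example a Webhook node that accepts GET and responds with the challenge) is an assumption about your setup, not something from my workflow export.

```javascript
// Sketch of Meta's webhook verification handshake.
// `query` is the parsed query string of the incoming GET request.
// `expectedToken` is the verify token you entered in the Meta dashboard
// (a secret you choose yourself; the value below is a placeholder).
function verifyWebhookSubscription(query, expectedToken) {
  const mode = query['hub.mode'];
  const token = query['hub.verify_token'];
  const challenge = query['hub.challenge'];

  if (mode === 'subscribe' && token === expectedToken) {
    // Echo the challenge back as the response body to confirm the subscription
    return { status: 200, body: String(challenge) };
  }
  // Wrong mode or token: refuse the subscription
  return { status: 403, body: 'Forbidden' };
}

// Usage example with placeholder values
const verification = verifyWebhookSubscription(
  { 'hub.mode': 'subscribe', 'hub.verify_token': 'my-verify-token', 'hub.challenge': '1158201444' },
  'my-verify-token'
);
console.log(verification.status, verification.body);
```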

Step 1: Handling Multimodal Input via Webhooks

When a user uploads an invoice image or PDF via WhatsApp, the webhook doesn't deliver the file directly; it delivers a media ID.
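
To make that concrete, here is a rough sketch of pulling the media ID out of the webhook body. The payload shape follows the Cloud API message notification format (entry, changes, value, messages); the function name and the sample values are hypothetical, and in n8n you would more likely do this with expressions on the Webhook node's output.

```javascript
// Pull the first image or document reference out of a WhatsApp Cloud API
// webhook payload. Returns null for text messages, status updates, and
// anything else that carries no media.
function extractMediaReference(payload) {
  const value = payload?.entry?.[0]?.changes?.[0]?.value;
  const message = value?.messages?.[0];
  if (!message) return null;

  // Photos arrive under `image`; PDFs and other files under `document`
  let media = null;
  if (message.type === 'image') media = message.image;
  else if (message.type === 'document') media = message.document;
  if (!media || !media.id) return null;

  return {
    mediaId: media.id,
    mimeType: media.mime_type,
    from: message.from,
    kind: message.type,
  };
}

// Usage example with a trimmed, made-up payload
const samplePayload = {
  object: 'whatsapp_business_account',
  entry: [{
    changes: [{
      field: 'messages',
      value: {
        messages: [{
          from: '15551234567',
          type: 'image',
          image: { id: '1234567890', mime_type: 'image/jpeg' },
        }],
      },
    }],
  }],
};
console.log(extractMediaReference(samplePayload));
```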

We use HTTP Request nodes to first exchange the media ID for a download URL using your Meta access token, then pull the binary data from that URL with the same token and hold it in workflow memory to pass directly to our AI node.
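
Outside n8n, the same exchange looks roughly like this: one authenticated GET to resolve the media ID into a download URL, then a second authenticated GET against that URL for the bytes. This is a sketch under assumptions: the Graph API version in GRAPH_BASE is a placeholder you should match to your app, the function names are hypothetical, and it relies on the global fetch available in Node 18 or later.

```javascript
// Graph API base URL. The version segment is an assumption; use your app's.
const GRAPH_BASE = 'https://graph.facebook.com/v19.0';

// Build the request that resolves a media ID into a download URL.
function buildMediaLookupRequest(mediaId, accessToken) {
  return {
    url: `${GRAPH_BASE}/${encodeURIComponent(mediaId)}`,
    headers: { Authorization: `Bearer ${accessToken}` },
  };
}

// Resolve the media ID, then download the binary with the same token.
// `fetchImpl` is injectable so the function can be exercised with a stub.
async function downloadWhatsAppMedia(mediaId, accessToken, fetchImpl = fetch) {
  const lookup = buildMediaLookupRequest(mediaId, accessToken);

  // Step 1: the lookup returns JSON that includes `url` and `mime_type`
  const metaRes = await fetchImpl(lookup.url, { headers: lookup.headers });
  if (!metaRes.ok) throw new Error(`Media lookup failed: ${metaRes.status}`);
  const meta = await metaRes.json();

  // Step 2: the download URL also requires the bearer token
  const fileRes = await fetchImpl(meta.url, { headers: lookup.headers });
  if (!fileRes.ok) throw new Error(`Media download failed: ${fileRes.status}`);

  const bytes = Buffer.from(await fileRes.arrayBuffer());
  return { bytes, mimeType: meta.mime_type };
}
```

The download URL is short-lived, so it makes sense to fetch the file right away instead of storing the URL for later.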


Step 2: Designing the System Prompt for Document Extraction

Instead of relying on fragile, traditional OCR software that breaks if a vendor moves a logo, we pass the raw image straight to Gemini 1.5 Flash. The magic lies in instructing the model to act as a strict data parser.

Here is the sample System Prompt structure used in the model configuration:

"You are an expert financial data extraction AI. Your sole task is to analyze the provided invoice or utility bill image and extract data with absolute accuracy.

Follow these strict formatting rules:

  1. Extract the primary Vendor Name.
  2. Extract the Invoice Date and normalize it into standard YYYY-MM-DD format.
  3. Extract the Grand Total Amount as a pure floating-point number.
  4. Extract individual line items into an array containing description, quantity, and total price.

Do not include any conversational text, markdown formatting, or wrappers. Your response must be a single, raw, valid JSON object."
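
If you call Gemini over plain HTTP instead of through n8n's model node, the prompt and the image travel together in one generateContent request body. The sketch below only builds that body. Treat the details as assumptions: the function name is hypothetical, the temperature and JSON response type are my suggestions, and the user instruction names the keys vendor_name, invoice_date, grand_total, and line_items because the Code node in Step 3 reads exactly those, even though the prompt above does not spell them out.

```javascript
// Build a generateContent request body that pairs the system prompt with
// the invoice image sent inline as base64.
function buildGeminiRequest(systemPrompt, imageBytes, mimeType) {
  return {
    systemInstruction: { parts: [{ text: systemPrompt }] },
    contents: [{
      role: 'user',
      parts: [
        // The raw image goes inline; no OCR step happens beforehand
        { inlineData: { mimeType, data: Buffer.from(imageBytes).toString('base64') } },
        // Key names match what the downstream Code node expects (assumption)
        { text: 'Return JSON with keys vendor_name, invoice_date, grand_total, line_items.' },
      ],
    }],
    generationConfig: {
      temperature: 0,                       // favor repeatable output
      responseMimeType: 'application/json', // ask for JSON instead of prose
    },
  };
}

// Usage example with placeholder bytes standing in for a real photo
const requestBody = buildGeminiRequest(
  'You are an expert financial data extraction AI.',
  Buffer.from('fake-image-bytes'),
  'image/jpeg'
);
console.log(JSON.stringify(requestBody, null, 2));
```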


Step 3: Formatting Data with n8n Code Nodes (JavaScript)

Once Gemini returns the text string, we avoid spinning up an external server or application layer. Instead, we pipe the model's output directly into an internal n8n Code Node configured to execute JavaScript.

This node isolates the parsing logic, falls back to safe defaults when a field is missing, and shapes the properties for our target database or sheet.

Here is the exact layout of the JavaScript snippet inside the n8n Code Node:

// Loop through incoming items from the Gemini node
for (const item of $input.all()) {
  try {
    // Parse the raw string response from the AI model
    const extractedData = JSON.parse(item.json.output);

    // Format and return the details explicitly for downstream nodes
    item.json.formattedInvoice = {
      vendor: extractedData.vendor_name || 'Unknown Vendor',
      date: extractedData.invoice_date || new Date().toISOString().split('T')[0],
      total: parseFloat(extractedData.grand_total) || 0.0,
      lineItems: extractedData.line_items || []
    };
  } catch (error) {
    // Handle parsing errors safely if the response was malformed
    item.json.formattedInvoice = {
      vendor: 'Parsing Error',
      error: error.message,
      rawOutput: item.json.output
    };
  }
}

return $input.all();

By keeping this step inside the internal JavaScript node, the data mapping stays fast and self-contained, and it is easy to debug directly from the execution logs.
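
One thing worth hardening: models sometimes wrap JSON in a markdown code fence despite being told not to, and a silent default can hide a bad extraction. Here is a sketch of a stricter parser you could adapt inside the Code node. It is not part of my workflow as shown above; the function name, the problems list, and the needsReview flag are all hypothetical.

```javascript
// Three backticks, built at runtime so this snippet can sit inside a fence
const FENCE = '`'.repeat(3);

// Parse the model output, tolerate a markdown fence around it, and report
// which fields look wrong instead of silently substituting defaults.
function parseInvoiceResponse(raw) {
  const cleaned = String(raw).trim()
    .replace(new RegExp('^' + FENCE + '(?:json)?\\s*', 'i'), '')
    .replace(new RegExp('\\s*' + FENCE + '$'), '');

  const data = JSON.parse(cleaned); // still throws on malformed JSON
  if (data === null || typeof data !== 'object') {
    throw new Error('Expected a JSON object');
  }

  const problems = [];
  if (typeof data.vendor_name !== 'string' || data.vendor_name.trim() === '') {
    problems.push('vendor_name missing');
  }
  if (!/^\d{4}-\d{2}-\d{2}$/.test(String(data.invoice_date))) {
    problems.push('invoice_date not YYYY-MM-DD');
  }
  if (!Number.isFinite(parseFloat(data.grand_total))) {
    problems.push('grand_total not numeric');
  }
  if (!Array.isArray(data.line_items)) {
    problems.push('line_items not an array');
  }
  return { data, problems, needsReview: problems.length > 0 };
}

// Usage example: a fenced response that is otherwise valid
const fencedReply = FENCE + 'json\n{"vendor_name":"Acme","invoice_date":"2026-05-01","grand_total":"19.99","line_items":[]}\n' + FENCE;
console.log(parseInvoiceResponse(fencedReply).needsReview);
```

Rows flagged for review can go to a separate sheet tab, so a questionable total never lands in the books unnoticed.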

Handling Edge Cases in Production

Building a POC is easy; making it production-ready is where the real engineering begins. When deploying this system, you must design around:

  1. Rate Limiting: Managing concurrent incoming webhooks from active users by placing a lightweight Redis queue ahead of the API calls.

  2. Data Security: Ensuring that processed invoice data is deleted from local temporary server storage immediately after database insertion to protect PII.
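
A real Redis queue needs a running server and a client library, so it does not fit in a short snippet. As a stand-in, here is an in-process token bucket that answers the question a queue worker has to ask before each API call: may I send this one now, or should it wait? To be clear, this is a swapped-in technique for illustration, not the Redis setup itself, and every name in it is hypothetical.

```javascript
// In-memory token bucket. `capacity` is the burst size, `refillPerSecond`
// is the sustained rate, and `now` is injectable so tests can use a fake clock.
function createTokenBucket({ capacity, refillPerSecond, now = () => Date.now() }) {
  let tokens = capacity;
  let last = now();

  return {
    // Returns true if a call may proceed now, false if it should stay queued
    tryRemove() {
      const current = now();
      const elapsedSeconds = (current - last) / 1000;
      // Refill in proportion to elapsed time, never above capacity
      tokens = Math.min(capacity, tokens + elapsedSeconds * refillPerSecond);
      last = current;

      if (tokens >= 1) {
        tokens -= 1;
        return true;
      }
      return false;
    },
  };
}

// Usage example: allow bursts of 5 calls, then 2 calls per second
const geminiBucket = createTokenBucket({ capacity: 5, refillPerSecond: 2 });
const pendingInvoices = ['inv-1', 'inv-2', 'inv-3', 'inv-4', 'inv-5', 'inv-6'];
for (const invoiceId of pendingInvoices) {
  console.log(invoiceId, geminiBucket.tryRemove() ? 'send now' : 'keep queued');
}
```

With several n8n workers, the counter would have to live in shared storage such as Redis, since an in-process bucket only limits the process it runs in.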

What's Next?

This pattern proves that you can build highly sophisticated AI Agents entirely inside a visual workflow manager when paired with robust system prompts and minor inline JavaScript engineering. Next up, I am scaling this to run automated cross-matching against incoming banking transactions.

The full workflow JSON schema will be open-sourced on my GitHub profile soon. Follow along as I map out more production-ready AI engineering architectures.