Tile Extractor
somyabhalani · 2026-05-23 · via DEV Community

Parsing the Unparsable: Building a Layout-Aware Computer Vision Pipeline for 50,000+ Stone SKUs

Executive Summary

The stone and marble industry operates on visual catalogs. Manufacturers publish hundreds of pages of PDF catalogs showing marble slabs, tile patterns, texture variations, and dimension tables. For digital inventory platforms and wholesalers, extracting these products to populate databases is a massive bottleneck.

Standard OCR (Optical Character Recognition) tools break down on these catalogs because the pages are highly visual: they use complex grid structures in which product images are only loosely aligned with text descriptions, dimensions, and SKU codes. Ananta Labs was hired to design a layout-aware computer vision and text parsing pipeline that could ingest multi-page catalogs, segment individual product tiles, extract their corresponding text details, and output clean, database-ready JSON arrays. The target was 95%+ extraction accuracy across more than 50,000 unique marble and stone SKUs.


The Architecture: Segmentation-First Parsing

Traditional text extraction tools parse documents top-to-bottom, left-to-right. In a product catalog laid out as a grid, this reading order interleaves neighbouring tiles, so the text of Slab A gets merged with the dimensions of Slab B.

To prevent data mismatch, we implemented a segmentation-first approach. Instead of reading the document as text, we treat each catalog page as an image canvas, locate the individual physical grid cells (tiles), isolate them, and then run OCR within the boundaries of each isolated cell.
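The difference between the two strategies can be illustrated with a toy example. This is only a sketch: the `Word` and `Cell` types and the coordinates are hypothetical, and it mimics the effect with pre-positioned words rather than cropping and running OCR as the pipeline does.

```python
from typing import NamedTuple


class Word(NamedTuple):
    text: str
    x: int
    y: int


class Cell(NamedTuple):
    x: int
    y: int
    w: int
    h: int


def reading_order(words):
    """Top-to-bottom, left-to-right, as a layout-unaware extractor reads."""
    return [w.text for w in sorted(words, key=lambda w: (w.y, w.x))]


def group_by_cell(words, cells):
    """Attach each word to the first cell whose rectangle contains it."""
    grouped = {cell: [] for cell in cells}
    for word in sorted(words, key=lambda w: (w.y, w.x)):
        for cell in cells:
            inside_x = cell.x <= word.x < cell.x + cell.w
            inside_y = cell.y <= word.y < cell.y + cell.h
            if inside_x and inside_y:
                grouped[cell].append(word.text)
                break
    return [grouped[cell] for cell in cells]


# Two side-by-side tiles, each with a name line and a size line.
words = [
    Word("Slab A", 20, 300), Word("Slab B", 520, 300),
    Word("60x120cm", 20, 340), Word("30x60cm", 520, 340),
]
cells = [Cell(0, 0, 500, 400), Cell(500, 0, 500, 400)]

print(reading_order(words))         # names and sizes of both tiles interleaved
print(group_by_cell(words, cells))  # one list of text per tile
```

In reading order, "Slab B" lands directly next to the dimensions of Slab A, while grouping by cell keeps each tile's name and size together.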

Project Metrics & Impact

  • Throughput: Processing a standard 100-page catalog (containing roughly 1,200 product variations) took less than 180 seconds.
  • Accuracy: Out of 50,000+ processed stone tiles, our layout segmentation maintained an extraction accuracy of 96.4%.
  • Human Verification: Reduced manual data entry time by 94%, shifting the operator's role from manual transcription to reviewing extracted records on a visual validation screen in the admin UI.

Step 1: Document Rasterization and Pre-processing

We use PyMuPDF to rasterize incoming PDF pages into high-resolution PNG images (300 DPI) so that fine print stays legible. The document is converted page by page, with each page rendered at a higher zoom so that small characters are sharp before OCR runs.
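A minimal sketch of this step, assuming PyMuPDF is installed and importable as `fitz`. The function names, the output file naming, and the directory layout are illustrative choices, not details from the original pipeline. PDF coordinates are defined at 72 units per inch, so 300 DPI corresponds to a zoom factor of 300/72.

```python
from pathlib import Path

PDF_UNITS_PER_INCH = 72  # PDF user space is defined at 72 units per inch


def zoom_for_dpi(dpi):
    """Zoom factor that renders a PDF page at the requested DPI."""
    return dpi / PDF_UNITS_PER_INCH


def rasterize_pdf(pdf_path, out_dir, dpi=300):
    """Render every page of a PDF to a PNG file and return the paths."""
    import fitz  # PyMuPDF, imported lazily so zoom_for_dpi works without it

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    zoom = zoom_for_dpi(dpi)
    paths = []
    with fitz.open(str(pdf_path)) as doc:
        for page in doc:
            # Scale both axes equally so the page keeps its aspect ratio.
            pix = page.get_pixmap(matrix=fitz.Matrix(zoom, zoom))
            target = out / f"page_{page.number + 1:04d}.png"
            pix.save(str(target))
            paths.append(target)
    return paths
```

Calling `rasterize_pdf("catalog.pdf", "pages")` (both names hypothetical) would write one PNG per page into the `pages` directory.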

Step 2: Contour Detection & Grid Cell Isolation

Catalog pages usually group slab images and SKU data inside visual grid cells or boxes. We use computer vision (OpenCV) to detect these bounding boxes:

  • Binarization: Convert the page image to grayscale and apply adaptive thresholding to isolate boundaries.
  • Morphological Operations: Apply vertical and horizontal kernels to detect solid horizontal and vertical grid lines, creating a clean binary mask of the catalog layout.
  • Contour Extraction: Find contours on the grid mask and filter out shapes that are too small (noise) or too large (page borders).
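The three steps above can be sketched with OpenCV roughly as follows. The kernel lengths, threshold parameters, and area fractions are assumed values chosen for illustration, not the ones used in the original pipeline, and would need tuning per catalog.

```python
def filter_boxes(boxes, page_w, page_h, min_frac=0.005, max_frac=0.5):
    """Drop boxes that are too small (noise) or too large (page borders)."""
    page_area = page_w * page_h
    return [
        (x, y, w, h)
        for (x, y, w, h) in boxes
        if min_frac * page_area <= w * h <= max_frac * page_area
    ]


def detect_grid_cells(gray):
    """Return candidate cell boxes (x, y, w, h) from an 8-bit grayscale page."""
    import cv2  # imported lazily so filter_boxes can be used without OpenCV

    page_h, page_w = gray.shape[:2]

    # Binarization: inverted so dark grid lines become white foreground.
    binary = cv2.adaptiveThreshold(
        gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY_INV, 15, 10
    )

    # Opening with long, thin kernels keeps only line-like runs of pixels.
    h_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (max(page_w // 30, 1), 1))
    v_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, max(page_h // 30, 1)))
    h_lines = cv2.morphologyEx(binary, cv2.MORPH_OPEN, h_kernel)
    v_lines = cv2.morphologyEx(binary, cv2.MORPH_OPEN, v_kernel)
    grid_mask = cv2.bitwise_or(h_lines, v_lines)

    # Contour extraction: [-2] selects the contour list on OpenCV 3 and 4.
    contours = cv2.findContours(
        grid_mask, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE
    )[-2]
    boxes = [cv2.boundingRect(c) for c in contours]
    return filter_boxes(boxes, page_w, page_h)
```

With `RETR_TREE`, a drawn box can yield both its outer outline and its inner hole as separate contours, so near-duplicate rectangles may still need to be merged before cropping.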

Step 3: Isolated OCR and Data Normalization

Once we have the coordinates (x, y, w, h) of each tile cell, we crop the image of the stone slab from the top half of the cell, crop the text area from the bottom half, and run OCR exclusively on the cropped text area.

By running OCR on a small, isolated box rather than the whole page, we ensure that the extracted SKU, finish (polished/honed), and size parameters can only come from the same cell as the cropped stone slab image.
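A minimal sketch of the crop-then-OCR step. The source does not name the OCR engine, so `ocr` here is a hypothetical callable passed in by the caller; the fixed 50/50 split follows the description above, and `image_ratio` is an assumed parameter for catalogs with a different layout. The page is treated as a list of pixel rows to keep the example dependency-free; with a NumPy image the crop would be `page[y:y + h, x:x + w]`.

```python
def split_cell(cell, image_ratio=0.5):
    """Split a cell (x, y, w, h) into a slab-image box and a text box."""
    x, y, w, h = cell
    image_h = int(h * image_ratio)
    image_box = (x, y, w, image_h)
    text_box = (x, y + image_h, w, h - image_h)
    return image_box, text_box


def crop(page, box):
    """Crop a page given as a list of pixel rows."""
    x, y, w, h = box
    return [row[x:x + w] for row in page[y:y + h]]


def extract_tile(page, cell, ocr):
    """Crop the slab image and run OCR only on the text area of one cell."""
    image_box, text_box = split_cell(cell)
    return {
        "cell": cell,
        "slab_image": crop(page, image_box),
        "text": ocr(crop(page, text_box)),
    }
```

Because `ocr` only ever sees the pixels inside `text_box`, text from a neighbouring tile cannot leak into the record.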


Key Engineering Challenges Solved

1. The Borderless Grid Problem

Some catalogs have no visible grid lines: product images float on a white page with text underneath. When morphological grid detection returns zero cells, the pipeline switches to a clustering-based layout analyzer. It uses projection profiles (scanning rows and columns for white-space gaps) to compute virtual grid lanes, from which the bounding box of each tile is derived dynamically.
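The projection-profile idea can be sketched in pure Python as below. This is an illustrative simplification, not the pipeline's analyzer: `ink` is assumed to be a binarized page given as rows of 0/1 values, `min_gap` is an assumed tuning parameter, and on real page images the profiles would more likely be computed with NumPy sums along each axis.

```python
def content_spans(profile, min_gap=2):
    """Return (start, end) spans of content separated by gaps >= min_gap.

    `end` is exclusive. Gaps shorter than min_gap are bridged.
    """
    spans = []
    start = None
    gap = 0
    for i, value in enumerate(profile):
        if value > 0:
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                spans.append((start, i - gap + 1))
                start = None
                gap = 0
    if start is not None:
        spans.append((start, len(profile) - gap))
    return spans


def virtual_grid(ink, min_gap=2):
    """Compute virtual cell boxes (x, y, w, h) from white-space gaps."""
    cells = []
    row_profile = [sum(row) for row in ink]
    for y0, y1 in content_spans(row_profile, min_gap):
        # Column profile is taken inside the row band only, so each band
        # can have its own column layout.
        band = ink[y0:y1]
        col_profile = [sum(col) for col in zip(*band)]
        for x0, x1 in content_spans(col_profile, min_gap):
            cells.append((x0, y0, x1 - x0, y1 - y0))
    return cells


# Two borderless tiles side by side, separated by a three-column gap.
ink = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]
print(virtual_grid(ink))
```

Setting `min_gap` high enough keeps the small gap between a slab image and its caption from splitting one tile into two lanes.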

2. Text-to-Data Normalization

OCR outputs raw string data like "Volacas Wt (Pol) 60x120cm - SKU9087". We run the OCR output through a regex parser and a lightweight local dictionary-matching layer. The parser strips punctuation, standardizes measurements (for example, both 600x1200mm and 60x120 are converted to floats in a single metric unit), and maps stone colors and finishes to database-ready enumerations (Material: Marble, Color: White, Finish: Polished).
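A minimal sketch of such a parser, using the example string above. The abbreviation dictionaries, the field names, the choice of millimetres as the target unit, and the centimetre default for sizes without a unit are all assumptions made for illustration; the original parser's rules are not given in the source.

```python
import re

# Hypothetical abbreviation dictionaries; a real catalog needs its own.
FINISHES = {"pol": "Polished", "hon": "Honed"}
COLORS = {"wt": "White", "blk": "Black"}
UNIT_TO_MM = {"mm": 1.0, "cm": 10.0}

SIZE_RE = re.compile(
    r"(\d+(?:\.\d+)?)\s*[x×]\s*(\d+(?:\.\d+)?)\s*(mm|cm)?", re.IGNORECASE
)
SKU_RE = re.compile(r"\bSKU[-\s]?(\d+)\b", re.IGNORECASE)


def parse_tile_text(raw, default_unit="cm"):
    """Turn one OCR string into a normalized record (sizes in millimetres)."""
    record = {
        "sku": None,
        "width_mm": None,
        "height_mm": None,
        "finish": None,
        "color": None,
    }

    sku = SKU_RE.search(raw)
    if sku:
        record["sku"] = f"SKU{sku.group(1)}"

    size = SIZE_RE.search(raw)
    if size:
        # Sizes without an explicit unit fall back to default_unit.
        factor = UNIT_TO_MM[(size.group(3) or default_unit).lower()]
        record["width_mm"] = float(size.group(1)) * factor
        record["height_mm"] = float(size.group(2)) * factor

    # Dictionary matching on alphabetic tokens, ignoring punctuation.
    for token in re.findall(r"[A-Za-z]+", raw):
        key = token.lower()
        if key in FINISHES:
            record["finish"] = FINISHES[key]
        elif key in COLORS:
            record["color"] = COLORS[key]

    return record


print(parse_tile_text("Volacas Wt (Pol) 60x120cm - SKU9087"))
```

With these assumptions, "60x120cm" and "600x1200mm" normalize to the same width and height, which is what lets records from differently formatted catalogs be compared directly.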


Conclusion

Parsing highly visual document layouts requires moving beyond raw character recognition. By combining traditional computer vision techniques (adaptive thresholding, morphological operations, contour detection) with targeted, localized OCR, Tile Extractor transformed chaotic catalogs into clean, standardized commercial APIs. Building systems that bridge the gap between unstructured visual media and structured databases is at the core of what we do at Ananta Labs.