Condition-Based vs Time-Based Maintenance: Making the Switch
Guatu · 2026-05-23 · via DEV Community

I spent a weekend reviewing a maintenance log for a conveyor system that was costing thousands in "preventative" parts replacements every quarter, only to find that the technicians were throwing away bearings that had 60% of their life left. At the same time, a motor had burned out three weeks before its scheduled service because it had been running hot for a month, but the calendar said it wasn't time to check it yet.

Time-based maintenance is a gamble where you bet that the average failure rate of a component matches the actual failure rate of your specific machine. In the real world, that bet usually loses.

If you're managing industrial assets, you've likely lived through this. You either over-maintain, wasting money and introducing "infant mortality" failures by disturbing a working system, or you under-maintain and deal with unplanned downtime. The move to Condition-Based Maintenance (CBM) is the only way out, but the gap between the theory of "predictive maintenance" and a working system on the factory floor is massive.

What I tried first

My first attempt at CBM was naive. I thought I could just slap a few sensors on the equipment, pipe the data into a dashboard, and let the operators decide when to perform maintenance. I set up a basic MQTT pipeline using Mosquitto (which I've written about before regarding broker selection) and pushed raw vibration and temperature data to a Grafana dashboard.

It failed miserably.

First, I created a "noise apocalypse." I had alerts firing every time a sensor spiked for a millisecond due to electrical noise. The operators started ignoring the alerts entirely. Second, I didn't define what "bad" actually looked like. I was giving them raw data, not actionable intelligence. An operator doesn't care if a motor is at 72 degrees Celsius; they care if 72 degrees is a 10% increase over the baseline for that specific load.

I also tried to automate the ticketing system with simple cron jobs that checked thresholds every hour. This produced a flood of "HEARTBEAT_OK" messages in the logs and a pile of redundant tickets. I was essentially building a more expensive version of a time-based system with different triggers.

The actual solution

The shift happens when you stop treating sensors as "alarms" and start treating them as "state providers." You need a pipeline that filters noise, establishes a baseline, and triggers actions based on deviations rather than arbitrary numbers.
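The "establish a baseline, trigger on deviation" idea fits in a few lines of Python. This is a minimal sketch, not the author's implementation. The `LoadAwareBaseline` class, the 10%-wide load bands, and learning the baseline as a plain mean of known-good readings are all assumptions made for the example. The class turns "the motor is at 66 degrees Celsius" into "the motor is 10% hotter than it normally runs at this load", which is the question an operator actually cares about.

```python
from statistics import mean
from typing import Dict, List, Optional


class LoadAwareBaseline:
    """Hypothetical helper: learns what 'normal' looks like per load band
    during a known-good period, then reports percent deviation from it."""

    def __init__(self, band_width: float = 10.0):
        self.band_width = band_width  # width of each load band, in % of rated load
        self.samples: Dict[int, List[float]] = {}  # band index -> healthy readings

    def _band(self, load_pct: float) -> int:
        return int(load_pct // self.band_width)

    def learn(self, load_pct: float, value: float) -> None:
        """Record a reading taken while the asset is known to be healthy."""
        self.samples.setdefault(self._band(load_pct), []).append(value)

    def deviation_pct(self, load_pct: float, value: float) -> Optional[float]:
        """Percent above (+) or below (-) the baseline for this load band,
        or None if no baseline has been learned for that band yet."""
        healthy = self.samples.get(self._band(load_pct))
        if not healthy:
            return None
        baseline = mean(healthy)
        return (value - baseline) / baseline * 100.0


baseline = LoadAwareBaseline()
for temp_c in (59.0, 60.0, 61.0):  # sample healthy readings at ~80% load
    baseline.learn(80.0, temp_c)

print(baseline.deviation_pct(82.0, 66.0))  # same 80-90% band: compared to its baseline
print(baseline.deviation_pct(40.0, 66.0))  # 40-50% band has no baseline yet
```

Bucketing by load matters because a motor at full load legitimately runs hotter than one idling. Comparing both against a single global number recreates the raw-threshold problem.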

1. Filtering the Noise

Instead of triggering on raw readings, I implemented a sliding-window (moving) average. If you're using Python for your edge processing, don't just trigger on `val > threshold`. Use a buffer and compare the average of the last few readings against the threshold.

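```python
import collections


class SensorMonitor:
    def __init__(self, threshold, window_size=10):
        self.threshold = threshold
        self.window = collections.deque(maxlen=window_size)

    def is_anomaly(self, current_value):
        self.window.append(current_value)
        # Don't judge anything until the window is full
        if len(self.window) < self.window.maxlen:
            return False

        # Moving average smooths out short transient spikes
        avg = sum(self.window) / len(self.window)
        return avg > self.threshold


def trigger_maintenance_alert(message):
    # Placeholder: hook this up to your ticketing or paging system
    print(message)


# Example: trigger maintenance only if the average
# vibration stays high over 10 readings
monitor = SensorMonitor(threshold=10.5)
readings = [10.0] * 5 + [11.5] * 10  # sample data standing in for a live sensor feed
for current_vibration in readings:
    if monitor.is_anomaly(current_vibration):
        trigger_maintenance_alert("Sustained high vibration detected")
        break  # one alert per episode is enough for this example
```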

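The moving average above has one caveat: a large enough single spike can still pull the mean over the threshold. With a 10-reading window, nine readings of 9.9 plus one 40.0 spike average to about 12.9, which exceeds the 10.5 threshold. A rolling median is a common, more outlier-resistant alternative. The sketch below swaps it in. This is my substitution, not the original design, and `MedianSensorMonitor` and `mean_of` are hypothetical names.

```python
import collections
import statistics


class MedianSensorMonitor:
    """Variant of SensorMonitor that compares the rolling *median* of the
    last `window_size` readings against `threshold` instead of the mean."""

    def __init__(self, threshold, window_size=10):
        self.threshold = threshold
        self.window = collections.deque(maxlen=window_size)

    def is_anomaly(self, current_value):
        self.window.append(current_value)
        if len(self.window) < self.window.maxlen:
            return False  # not enough history yet
        # The median ignores a minority of outliers, so one huge noise
        # spike cannot drag the statistic over the threshold on its own.
        return statistics.median(self.window) > self.threshold


def mean_of(values):
    return sum(values) / len(values)


readings = [9.9] * 9 + [40.0]  # nine normal readings plus one noise spike
print("mean:", mean_of(readings))              # pulled above 10.5 by the spike
print("median:", statistics.median(readings))  # stays at 9.9
```

The trade-off is that the median only moves once roughly half the window is elevated, so it reacts somewhat later to a genuine step change than a mean would.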

2. Condition-Based Escalation Rules

Once the data is clean, you can't just send an email. You need escalation logic that understands the context. I moved away from simple cron-based alerts to a condition-based rule engine. This is similar to how I handle equipment health scoring, where we consolidate multiple signals into one status.

Here is how I structured the escalation logic in the configuration:

```yaml
# Condition-based escalation rules for maintenance tickets
escalation_rules:
  - condition: "sensor.vibration > 12.0 AND asset.criticality == 'high'"
    action: "immediate_dispatch"
    priority: 1
  - condition: "sensor.temp_deviation > 15% AND ticket.age > 4h"
    action: "notify_maintenance_lead"
    priority: 2
  - condition: "sensor.vibration > 8.0 AND ticket.age > 24h"
    action: "schedule_inspection_next_shift"
    priority: 3
```

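The rule syntax above is the author's own configuration format, and the post does not show the engine that reads it. As a hedged sketch of how such rules could be evaluated, here they are re-expressed as Python predicates over a flat context dict. The `Rule` dataclass, the `escalate` function, and the context keys (`vibration`, `criticality`, `temp_deviation_pct`, `ticket_age_h`) are hypothetical. Thresholds are copied from the config, with percentages written as plain numbers and ticket ages in hours.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    priority: int                     # lower number = more urgent
    action: str
    condition: Callable[[dict], bool]


# Hypothetical Python equivalent of the YAML escalation rules
RULES = [
    Rule(1, "immediate_dispatch",
         lambda c: c["vibration"] > 12.0 and c["criticality"] == "high"),
    Rule(2, "notify_maintenance_lead",
         lambda c: c["temp_deviation_pct"] > 15 and c["ticket_age_h"] > 4),
    Rule(3, "schedule_inspection_next_shift",
         lambda c: c["vibration"] > 8.0 and c["ticket_age_h"] > 24),
]


def escalate(context: dict) -> Optional[str]:
    """Return the action of the most urgent matching rule, or None."""
    for rule in sorted(RULES, key=lambda r: r.priority):
        if rule.condition(context):
            return rule.action
    return None


print(escalate({"vibration": 12.5, "criticality": "high",
                "temp_deviation_pct": 3.0, "ticket_age_h": 30.0}))
```

The conditions overlap: a high-criticality asset vibrating at 12.5 also satisfies the priority-3 rule once its ticket is a day old. Checking rules in priority order and returning only the first match means one reading produces one action rather than several.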

3. Optimizing the Alerting Pipeline

To stop the "alert fatigue" I mentioned earlier, I overhauled the cron jobs that monitored the system health. I stopped the unconditional "Everything is OK" messages and moved to a "silent success" model.

```yaml
# Optimized payload for condition-based alerting
# Only sends a notification if the status is not 'success'
payload:
  message: "Asset {{ asset_id }} monitoring failed with status: {{ status }}"
  condition: "status != 'success'"
  # On success the condition is false, so nothing is sent at all
  # (no "All assets healthy" reply): that is the "silent success" model
```

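The same "silent success" rule, written as plain Python so the logic is explicit. This is a sketch, not the author's alerting code: `build_notification`, `notifications_for`, and the asset IDs in the example are hypothetical. Only a non-success status produces a message, and the message template matches the payload above.

```python
from typing import Iterable, List, Optional, Tuple


def build_notification(asset_id: str, status: str) -> Optional[str]:
    """Silent-success model: only a non-'success' status produces a message."""
    if status == "success":
        return None  # nothing to say; avoid 'everything is OK' noise
    return f"Asset {asset_id} monitoring failed with status: {status}"


def notifications_for(results: Iterable[Tuple[str, str]]) -> List[str]:
    """Turn (asset_id, status) check results into the messages worth sending."""
    messages = []
    for asset_id, status in results:
        msg = build_notification(asset_id, status)
        if msg is not None:
            messages.append(msg)
    return messages


checks = [("conveyor-01", "success"), ("pump-07", "timeout"), ("fan-12", "success")]
for msg in notifications_for(checks):
    print(msg)  # only the pump-07 failure is reported
```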

Why it works

Time-based maintenance assumes a linear degradation of parts. In reality, degradation is stochastic. A bearing might last 10,000 hours or 100 hours depending on the lubrication quality and the load it carries.
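A toy simulation makes the point about stochastic degradation concrete. This is an illustration under stated assumptions, not data from the post. Lifetimes are drawn from a Weibull distribution (a common choice for modeling component life) with an arbitrary scale of 5,000 h and shape 1.5, and every part is swapped on a fixed 3,000 h schedule. The `fixed_interval_outcomes` helper is hypothetical.

```python
import random
from typing import List, Tuple


def fixed_interval_outcomes(lifetimes: List[float], interval: float) -> Tuple[int, float]:
    """For parts replaced every `interval` hours, count how many fail before
    their scheduled replacement, and the average fraction of lifetime thrown
    away by the ones that were replaced while still healthy."""
    failures = 0
    wasted_fractions = []
    for life in lifetimes:
        if life < interval:
            failures += 1  # broke before the calendar said to check it
        else:
            wasted_fractions.append((life - interval) / life)  # still had life left
    mean_wasted = sum(wasted_fractions) / len(wasted_fractions) if wasted_fractions else 0.0
    return failures, mean_wasted


# Illustrative only: assumed Weibull parameters, not measured data
random.seed(42)
lifetimes = [random.weibullvariate(5000, 1.5) for _ in range(1000)]
failures, wasted = fixed_interval_outcomes(lifetimes, interval=3000)
print(f"failed before service: {failures} of {len(lifetimes)}")
print(f"average life discarded by scheduled swaps: {wasted:.0%}")
```

With any spread in lifetimes, a single fixed interval produces both failure modes from the opening story: some parts fail before their service date while others are discarded with much of their life remaining. Moving the interval only trades one outcome for the other.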

By moving to CBM, you're monitoring the actual degradation. When you track vibration (using the architecture I've detailed in a previous post), you see the physical manifestation of wear (pitting, spalling, or misalignment) long before the part actually fails.

The logic of using a sliding window and deviation-based thresholds works because it separates the signal from the noise. In an industrial environment, electrical interference is a constant. A single high reading is usually a fluke; a sustained increase in the moving average is a mechanical reality.

Also, the condition-based escalation rules prevent the "crying wolf" effect. By tying the action to both the sensor value and the asset's criticality, you ensure that the maintenance team only drops what it's doing when it actually matters.

Lessons learned

The biggest surprise was that the hardware wasn't the hard part. Getting the sensors to talk via MQTT is trivial. The hard part is the cultural shift. Operators who have spent twenty years changing oil every three months don't trust a dashboard telling them they can wait another two months.

If I did this again, I'd start with a "shadow period." I would run the CBM system in parallel with the time-based schedule for six months. I'd log every time the CBM system predicted a failure and every time the time-based schedule replaced a perfectly good part. Having that data is the only way to convince a skeptical plant manager to change the schedule.
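A shadow period is only persuasive if the outcomes are logged consistently. Here is a minimal sketch of such a ledger, assuming each CBM alert and each scheduled replacement is followed by a technician recording whether the part was actually worn. The `ShadowEvent` record, the `summarize` function, and the counter names are hypothetical. Counting false alarms alongside confirmed alerts keeps the comparison honest.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class ShadowEvent:
    asset_id: str
    source: str          # "cbm_alert" or "scheduled_replacement"
    part_was_worn: bool  # what the technician actually found on inspection


def summarize(events: Iterable[ShadowEvent]) -> Dict[str, int]:
    """Count the outcomes that make the case for (or against) switching."""
    counts = {
        "cbm_alert_confirmed": 0,    # CBM flagged it and the part really was worn
        "cbm_false_alarm": 0,        # CBM flagged it but the part was fine
        "scheduled_swap_needed": 0,  # calendar replaced a worn part
        "scheduled_swap_wasted": 0,  # calendar replaced a perfectly good part
    }
    for e in events:
        if e.source == "cbm_alert":
            key = "cbm_alert_confirmed" if e.part_was_worn else "cbm_false_alarm"
        elif e.source == "scheduled_replacement":
            key = "scheduled_swap_needed" if e.part_was_worn else "scheduled_swap_wasted"
        else:
            raise ValueError(f"unknown event source: {e.source!r}")
        counts[key] += 1
    return counts


events = [
    ShadowEvent("conveyor-01", "scheduled_replacement", part_was_worn=False),
    ShadowEvent("motor-03", "cbm_alert", part_was_worn=True),
]
print(summarize(events))
```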

A few other caveats:

  • Sensor Drift: Sensors fail too. If you rely solely on CBM, a failing sensor can look like a failing motor. You still need a basic time-based schedule for sensor calibration.
  • The "Silent" Trap: When you move to a "silent success" alerting model, you run the risk of not knowing if your monitoring system has died. I fixed this by implementing a dead-man's switch (heartbeat) that alerts if the monitoring service itself stops reporting.
  • Data Overload: Don't try to monitor everything. Pick the top 20% of assets that cause 80% of your downtime. Trying to implement CBM on every single small fan in the building is a waste of engineering hours.
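The dead-man's switch from the second caveat is what makes "silent success" safe. A minimal sketch, assuming the watchdog runs in a separate process or on a separate host from the monitoring service (otherwise both die together). The `DeadMansSwitch` class and the 5-minute timeout in the usage lines are hypothetical. The clock is injectable so the timing logic can be tested without waiting.

```python
import time
from typing import Callable, Optional


class DeadMansSwitch:
    """Trips when the monitoring service *stops* reporting.

    The monitored service calls `beat()` on every successful cycle; a separate
    watchdog calls `is_tripped()` periodically and alerts when it returns True.
    """

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock  # injectable so the logic is easy to test
        self.last_beat: Optional[float] = None

    def beat(self) -> None:
        self.last_beat = self.clock()

    def is_tripped(self) -> bool:
        if self.last_beat is None:
            return True  # never heard from the service at all
        return self.clock() - self.last_beat > self.timeout_s


# Usage sketch with a fake clock (seconds) standing in for real time
fake_now = [0.0]
switch = DeadMansSwitch(timeout_s=300, clock=lambda: fake_now[0])
switch.beat()               # monitoring service checks in at t=0
fake_now[0] = 600.0         # ten minutes of silence
print(switch.is_tripped())  # True: alert that monitoring itself is down
```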

For those looking to implement this at scale, you can check my services page for consulting on predictive maintenance and IIoT infrastructure. Moving from a calendar to a condition is a steep climb, but it's the only way to stop wasting money on parts that aren't broken.