Everyone talks about deploying ML on edge devices. Very few people show what happens when you actually try.
I ran a full benchmark of four lightweight transformer models (DistilBERT, MobileBERT, TinyBERT-6L, and TinyBERT-4L) against traditional ML baselines on three real-world fault detection datasets.
The Setup
- NASA C-MAPSS: Turbofan engine degradation (20,631 samples, 15% failure rate)
- SECOM: Semiconductor manufacturing (1,567 samples, 6.6% failure rate)
- UCI Predictive Maintenance: Industrial machine failure (10,000 samples, 3.4% failure rate)
All experiments ran on a T4 GPU with consistent hyperparameters.
The Results
| Model | F1 | Size | CPU Latency |
|---|---|---|---|
| XGBoost | 87.9% | 0.5 MB | 0.002 ms |
| TinyBERT-4L | 87.8% | 55 MB | 18 ms |
| DistilBERT | 87.6% | 255 MB | 138 ms |
MobileBERT: The Surprise Failure
MobileBERT — specifically designed for mobile deployment — scored 0% F1 on every dataset. It predicted the majority class for every sample across all configurations.
“Designed for mobile” does not mean “works for your use case.”
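To see why always predicting the majority class gives exactly 0% F1 even though accuracy looks healthy, here is a minimal sketch in plain Python. The label counts mirror the UCI dataset described above (10,000 samples, 3.4% failures); the helper functions are illustrative and are not taken from the benchmark code.

```python
def f1_score(y_true, y_pred):
    """Binary F1 for the positive class (label 1); returns 0.0 when undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def accuracy(y_true, y_pred):
    """Fraction of samples where the prediction matches the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Labels shaped like the UCI dataset: 10,000 samples, 3.4% failures.
y_true = [1] * 340 + [0] * 9660
# A collapsed model that always predicts "no failure".
y_pred = [0] * len(y_true)

collapsed_accuracy = accuracy(y_true, y_pred)
collapsed_f1 = f1_score(y_true, y_pred)
print(f"accuracy={collapsed_accuracy:.3f}  f1={collapsed_f1:.3f}")
# prints: accuracy=0.966  f1=0.000
```

With no true positives, both precision and recall for the failure class are zero, so F1 is zero no matter how many healthy samples are classified correctly. That is why accuracy alone would have hidden this failure.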
The Adaptive Pipeline
The most promising result came from combining models:
- Quantized TinyBERT-4L handles confident predictions
- DistilBERT steps in only for uncertain cases
- 87.6% F1 with 97.9% of samples handled by the lightweight model
- 19.5 ms average latency instead of 138 ms
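The routing idea can be sketched as a confidence-gated cascade. This is a rough illustration, not the benchmark's implementation: the `route` function, the 0.9 threshold, and the stub models are all hypothetical, and the latency formula assumes every sample pays for the small model while escalated samples also pay for the large one.

```python
from typing import Callable, Sequence, Tuple

# A model here is any callable returning class probabilities for one sample.
Model = Callable[[str], Sequence[float]]


def route(sample: str, fast: Model, slow: Model, threshold: float) -> Tuple[int, str]:
    """Return (predicted_class, model_used) for one sample."""
    probs = fast(sample)
    if max(probs) >= threshold:
        # The fast model is confident enough: accept its answer.
        return max(range(len(probs)), key=probs.__getitem__), "fast"
    # Otherwise escalate to the larger model.
    probs = slow(sample)
    return max(range(len(probs)), key=probs.__getitem__), "slow"


def expected_latency_ms(fast_ms: float, slow_ms: float, escalation_rate: float) -> float:
    """Average latency when all samples run the fast model and
    escalated samples additionally run the slow one."""
    return fast_ms + escalation_rate * slow_ms


# Stand-in models for illustration only (hypothetical outputs).
def fast_stub(sample: str) -> Sequence[float]:
    return [0.55, 0.45] if "noisy" in sample else [0.97, 0.03]


def slow_stub(sample: str) -> Sequence[float]:
    return [0.20, 0.80]


decisions = [
    route(s, fast_stub, slow_stub, threshold=0.9)
    for s in ["clean reading", "noisy reading"]
]
print(decisions)  # [(0, 'fast'), (1, 'slow')]
print(round(expected_latency_ms(18.0, 138.0, 0.021), 1))  # 20.9
```

Plugging in the unquantized 18 ms and 138 ms from the results table with a 2.1% escalation rate gives roughly 20.9 ms, in the same range as the reported 19.5 ms. The gap would be consistent with the quantized TinyBERT-4L running somewhat faster than 18 ms, though that latency is not listed here.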
Key Takeaways
- Start with XGBoost for tabular data: a 0.5 MB model that edges out a 255 MB transformer (87.9% vs. 87.6% F1) is hard to ignore.
- TinyBERT-4L is the edge sweet spot: the smallest transformer, at 55 MB and 18 ms CPU latency, scored 87.8% F1, within 0.1 points of XGBoost.
- Quantize aggressively: INT8 cuts size significantly with minimal loss.
- Use adaptive pipelines: route easy predictions through small models and escalate only when needed.
- Class imbalance is still unsolved: SECOM remained extremely difficult across all models.
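To make the quantization takeaway concrete, here is a toy sketch of symmetric INT8 quantization on a handful of made-up weights. It only illustrates the mechanism (float32 values mapped to 8-bit integers plus one scale factor); the post does not say which quantization method was used, and in practice you would rely on your framework's quantization tooling rather than code like this.

```python
from array import array


def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    q = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the stored integers."""
    return [v * scale for v in q]


# Made-up float32 weights, for illustration only.
weights = array("f", [0.82, -0.41, 0.05, -1.27, 0.33, 0.0])
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# float32 uses 4 bytes per value, int8 uses 1.
size_ratio = (weights.itemsize * len(weights)) / (q.itemsize * len(q))
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(f"size ratio: {size_ratio:.0f}x, max abs error: {max_error:.4f}")
```

Storage for the weights shrinks by 4x, and each restored weight is off by at most half a quantization step (`scale / 2`). Whether that rounding error translates into "minimal loss" on a real model still has to be checked by measuring F1 after quantization.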
Code
All code and results:
https://github.com/disha8611/edge-fault-detection-benchmark
Previous research on LLM-based anomaly detection:
https://arxiv.org/abs/2604.12218
Disha Patel — Software Engineer & ML Researcher. I write about engineering, on-device ML, and building systems that work in the real world.