Last Updated on June 14, 2026 by
Author(s): Sai Bhargav Rallapalli
Originally published on Towards AI.
How I built a 98.8% accurate prediction model — and discovered that the “cleanest” fuel is hiding a dirty secret
When the Global Automotive Council wants to reduce vehicle emissions, where do they start? Do they target fuel types? Engine sizes? Vehicle classes? The answer, it turns out, is not as straightforward as you’d think — and the data tells a story that completely contradicts common intuition.

The article walks through a CO₂ emissions data science workflow. Starting from a dataset of more than 7,000 vehicles, the author removes duplicates and checks the distribution of the target, then addresses multicollinearity by dropping redundant fuel-consumption columns, using the variance inflation factor (VIF) as a diagnostic and Ridge regression for stability. The author argues against removing high-emission outliers, because those “top 1%” vehicles are the category policymakers most need to regulate. The core result overturns the raw fuel-type averages, an instance of Simpson’s Paradox: ethanol (E85) looks worst when emissions are simply averaged by fuel type, but once the model controls for engine size and fuel consumption, ethanol is the cleanest fuel in the dataset. Its benefit is “hidden” because it is used in larger, higher-consuming engines. The author then builds a scikit-learn pipeline with one-hot encoding, reports very high R² and low error, shows that the model’s weaknesses are concentrated in rare alternative-fuel categories, and proposes policy recommendations that tie fuel mandates to vehicle and engine constraints and target super-emitters directly.
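The Simpson’s Paradox reversal described above can be illustrated with a small sketch. Everything here is an assumption made for illustration: the numbers are synthetic (not the article’s dataset), the constants `BASE`, `PER_LITRE`, and `ETHANOL_EFFECT` are hypothetical, and the fit uses plain least squares from NumPy rather than the author’s scikit-learn pipeline. It only shows how a fuel that is cleaner at equal engine size can still look worse in a raw average when it is concentrated in larger engines.

```python
import numpy as np

# Synthetic, made-up data: NOT the article's dataset.
# Ethanol vehicles are deliberately given larger engines (litres).
engine_gas = np.array([1.5, 2.0, 2.5, 3.0])
engine_eth = np.array([4.0, 4.5, 5.0, 6.0])

# Hypothetical constants chosen only to make the effect visible.
BASE = 100.0            # baseline CO2 (g/km)
PER_LITRE = 40.0        # CO2 added per litre of engine size
ETHANOL_EFFECT = -25.0  # ethanol is cleaner at equal engine size

co2_gas = BASE + PER_LITRE * engine_gas
co2_eth = BASE + PER_LITRE * engine_eth + ETHANOL_EFFECT

# Naive comparison: difference in raw group means.
raw_gap = co2_eth.mean() - co2_gas.mean()

# Controlled comparison: regress CO2 on engine size plus an ethanol dummy.
engine = np.concatenate([engine_gas, engine_eth])
is_ethanol = np.concatenate([np.zeros(len(engine_gas)), np.ones(len(engine_eth))])
co2 = np.concatenate([co2_gas, co2_eth])

# Design matrix columns: intercept, engine size, ethanol indicator.
X = np.column_stack([np.ones_like(engine), engine, is_ethanol])
coef, *_ = np.linalg.lstsq(X, co2, rcond=None)
adjusted_gap = coef[2]  # ethanol effect with engine size held fixed

print(f"raw mean gap (ethanol - gasoline): {raw_gap:+.1f}")
print(f"gap after controlling for engine size: {adjusted_gap:+.1f}")
```

With these made-up numbers the raw gap is positive (ethanol looks dirtier), while the coefficient on the ethanol indicator is negative (ethanol is cleaner once engine size is held fixed). The sign flip comes entirely from how the toy data were constructed, so it says nothing about the size of the effect in the article’s real data.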
Read the full blog for free on Medium.
Published via Towards AI
Note: Article content contains the views of the contributing authors and not Towards AI.