A Curious Journey Into Reverse Engineering an AI-Generated Python .exe
Umitomo · 2026-05-26 · via DEV Community

Introduction

I usually post weekly learning and development updates on Dev.to📝

This time, however, I decided to write a standalone article about something a little different — my first attempt at reverse engineering🦾

What started as simple curiosity quickly turned into an exciting journey of uncovering how a modern AI-generated Python application was actually structured internally🔎

TL;DR

  • Reverse engineered a PyInstaller-based Python .exe
  • Reconstructed a surprisingly large portion of the application's architecture from the packaged .exe
  • Analyzed .pyc files using tools like strings, pycdc, and pycdas
  • Learned how React/Vite frontend assets can be bundled into a standalone executable
  • Realized how difficult production frontend bundles are to understand without the original source code
  • Thought deeply about maintainability in the age of AI-generated applications

What I Reverse Engineered

As someone who works in IT administration and internal tooling, I often become curious about how applications are actually built under the hood.

This time, a coworker showed me a PDF-processing desktop application that had been created with the help of generative AI.

The overall architecture had already been explained to me verbally beforehand.
However, that led me to a simple but exciting question:

How much of an application's internal structure can actually be reconstructed just by reverse engineering the final .exe file?

That curiosity became the starting point of this exploration.

The application itself was a harmless internal utility designed for local use, and this investigation was performed purely within an authorized and educational context.

Rather than trying to analyze malware or bypass protections, I wanted to understand:

  • What information remains inside packaged executables
  • How modern Python applications are bundled
  • Whether frontend/backend structures could still be inferred after packaging
  • How much architectural detail could realistically be reconstructed from compiled artifacts alone

What made the process especially exciting was slowly piecing together the architecture from small clues hidden inside the executable.


Reverse Engineering Environment Setup

Since I was using Kali Linux on WSL for this experiment, I first prepared a small reverse engineering workspace.

Creating a Python Virtual Environment

mkdir -p ~/reverse
cd ~/reverse

python3 -m venv venv
source venv/bin/activate

At first, creating the virtual environment failed because the venv package was missing, so I installed the one matching my Python version:

sudo apt install python3.13-venv

After that, I recreated the environment successfully.


Installing Basic Analysis Tools

I installed a few basic tools for inspecting the executable and analyzing Python bytecode. PyInstaller came first, because it ships the pyi-archive_viewer utility I used later:

pip install pyinstaller

I also installed binutils so I could use strings:

sudo apt install binutils


Building pycdc

To inspect .pyc files more deeply, I built pycdc from source:

sudo apt install -y cmake g++ git

mkdir -p ~/reverse/tools
cd ~/reverse/tools

git clone https://github.com/zrax/pycdc.git

cd pycdc
mkdir build
cd build

cmake ..
make -j4

This generated:

pycdc
pycdas

which I later used to inspect Python bytecode files.


Extracting the PyInstaller Executable

After confirming the executable was likely packaged with PyInstaller, I used pyinstxtractor to extract its contents:

cd ~/reverse
git clone https://github.com/extremecoders-re/pyinstxtractor.git

Then:

cd ~/reverse/pdf_exe

python ~/reverse/pyinstxtractor/pyinstxtractor.py PDF.exe

This generated a directory like:

PDF.exe_extracted/

Inside the extracted directory, I was finally able to inspect files such as:

app.pyc
pdf_stamp_processor.pyc
pdf-stamp-frontend/dist

This was the point where the application's overall structure started becoming much clearer.


How I Reverse Engineered It

Step 1 — Running strings

I started with:

strings PDF.exe

Very quickly, I noticed Python-related strings:

python313.dll
pyi-python-flag
...

This strongly suggested that the application had been packaged using PyInstaller.
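
The same check can be sketched in Python. This is a minimal illustration, not what I actually ran: the function names are hypothetical, the first two markers are the strings shown above, and the third is the magic value PyInstaller writes into its embedded archive.

```python
import re

# Markers that commonly show up in PyInstaller-built executables.
# PYTHON_DLL and PYI_FLAG mirror the strings output shown above;
# CARCHIVE_MAGIC is the magic value of PyInstaller's embedded archive.
PYTHON_DLL = re.compile(rb"python3\d+\.dll", re.IGNORECASE)
PYI_FLAG = b"pyi-python-flag"
CARCHIVE_MAGIC = b"MEI\x0c\x0b\x0a\x0b\x0e"


def printable_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of printable ASCII, roughly what `strings` prints."""
    return re.findall(rb"[\x20-\x7e]{%d,}" % min_len, data)


def pyinstaller_hints(data: bytes) -> dict:
    """Report which PyInstaller-related markers appear in the raw bytes."""
    return {
        "python_dll": PYTHON_DLL.search(data) is not None,
        "pyi_flag": PYI_FLAG in data,
        "carchive_magic": CARCHIVE_MAGIC in data,
    }
```

To try it on a real file, read the executable with `pathlib.Path("PDF.exe").read_bytes()` and pass the result to `pyinstaller_hints`. Any single marker is only a hint; several together make PyInstaller a reasonable guess.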


Step 2 — Inspecting the PyInstaller Archive

Next, I used:

pyi-archive_viewer PDF.exe

This helped confirm that the executable had been packaged using PyInstaller and allowed me to inspect the internal archive structure.
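
To give a feel for what such a viewer reads first, here is a rough sketch that locates the archive's trailing "cookie" record and decodes it. The field layout is an assumption based on the format used by recent PyInstaller releases (older or future versions may differ), and the names `Cookie` and `find_cookie` are hypothetical.

```python
import struct
from typing import NamedTuple, Optional, Tuple

MAGIC = b"MEI\x0c\x0b\x0a\x0b\x0e"
# Assumed big-endian layout: magic, package length, TOC offset,
# TOC length, Python version, Python library name.
COOKIE_FORMAT = "!8sIIii64s"
COOKIE_SIZE = struct.calcsize(COOKIE_FORMAT)  # 88 bytes


class Cookie(NamedTuple):
    package_length: int
    toc_offset: int
    toc_length: int
    python_version: Tuple[int, int]
    python_lib: str


def find_cookie(data: bytes) -> Optional[Cookie]:
    """Find the last archive cookie in the file and decode its fields."""
    pos = data.rfind(MAGIC)
    if pos < 0 or pos + COOKIE_SIZE > len(data):
        return None
    _, pkg_len, toc_off, toc_len, pyver, lib = struct.unpack_from(
        COOKIE_FORMAT, data, pos
    )
    # The version is stored as one integer, e.g. 313 for Python 3.13.
    version = divmod(pyver, 100) if pyver >= 100 else divmod(pyver, 10)
    lib_name = lib.rstrip(b"\0").decode("ascii", "replace")
    return Cookie(pkg_len, toc_off, toc_len, version, lib_name)
```

If the layout assumption holds, the decoded Python version and library name should agree with the python313.dll string seen earlier.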


Step 3 — Analyzing .pyc Files

I then used:

pycdc
pycdas

to inspect the extracted Python bytecode files.

However, when running pycdc, I noticed that some parts of the bytecode could not be fully reconstructed.

In many cases, the output stopped after displaying messages like:

Unsupported opcode: CALL_KW (247)
from fastapi import FastAPI, File, UploadFile, Form, HTTPException
...
# WARNING: Decompyle incomplete

Instead of fully recovering the original source code, I had to combine multiple fragmented clues together:

  • Partial pycdc output
  • pycdas disassembly output
  • Extracted strings
  • Module names
  • API route names
  • Library imports

I also used generative AI to help interpret and organize those fragmented technical details while reconstructing the application's architecture.

Even with incomplete reconstruction, I was still able to identify:

  • FastAPI routes
  • PDF processing logic
  • OpenCV-based blank-space detection
  • PyMuPDF page rendering
  • Automatic browser launching
  • Local API endpoints such as:
/api/scan
/api/stamp_and_merge
/api/shutdown
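
When a decompiler gives up, clues such as imports and route strings can still be pulled from the bytecode with the standard library. The sketch below is not the method I used. It assumes the .pyc file carries the standard 16-byte header and that the interpreter running the script is the same Python version that produced the file (3.13 here), because marshal data is version-specific. The helper names are hypothetical.

```python
import dis
import marshal
import re
from types import CodeType

PYC_HEADER_SIZE = 16  # magic (4) + flags (4) + mtime/size or hash (8), Python 3.7+


def load_code(pyc_bytes: bytes) -> CodeType:
    """Unmarshal the code object stored after the .pyc header.

    Only works when the running interpreter matches the Python version
    that produced the file.
    """
    return marshal.loads(pyc_bytes[PYC_HEADER_SIZE:])


def walk(code: CodeType):
    """Yield a code object and every code object nested inside it."""
    yield code
    for const in code.co_consts:
        if isinstance(const, CodeType):
            yield from walk(const)


def collect_clues(code: CodeType) -> dict:
    """Collect imported module names and /api/... string constants."""
    imports, routes = set(), set()
    for obj in walk(code):
        for ins in dis.get_instructions(obj):
            if ins.opname == "IMPORT_NAME":
                imports.add(ins.argval)
        for const in obj.co_consts:
            if isinstance(const, str) and re.fullmatch(r"/api/[\w/-]+", const):
                routes.add(const)
    return {"imports": imports, "routes": routes}
```

For an extracted file this would be called as `collect_clues(load_code(open("app.pyc", "rb").read()))`. Nothing is executed; the code object is only inspected.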


Step 4 — Investigating the Frontend

The frontend bundle was much harder to understand.

The built JavaScript looked like this:

var e=Object.create,t=Object.defineProperty,...

At first, it felt almost impossible to read. I could not tell what kind of frontend structure had existed before the code was bundled.

Things became clearer once I combined several sources of information:

  • The extracted dist/ directory structure
  • The bundled JavaScript files
  • API communication behavior observed in the browser developer tools
  • Explanations generated through conversations with AI

From these, I gradually worked out how the frontend had likely been built and bundled, and concluded that the application was probably using a modern frontend workflow such as React with Vite.
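
Even when the bundle itself is unreadable, the backend routes it calls can often be recovered mechanically. The following is a small sketch with hypothetical function names. It assumes the route paths survive minification as plain string literals starting with /api/, which is common but not guaranteed.

```python
import re
from pathlib import Path

# String literals (single, double, or backtick quoted) that begin with /api/.
API_LITERAL = re.compile(r"""["'`](/api/[A-Za-z0-9_\-/]+)""")


def api_routes_in_js(js_text: str) -> list:
    """Return the distinct /api/... paths mentioned in a JavaScript source."""
    return sorted(set(API_LITERAL.findall(js_text)))


def scan_dist(dist_dir: str) -> dict:
    """Map each .js file under dist_dir to the API routes it mentions."""
    found = {}
    for path in sorted(Path(dist_dir).rglob("*.js")):
        text = path.read_text(encoding="utf-8", errors="replace")
        routes = api_routes_in_js(text)
        if routes:
            found[str(path)] = routes
    return found
```

Running `scan_dist` on the extracted dist/ directory and comparing the result with the routes found on the Python side is a quick way to cross-check both halves of the application.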

At the same time, I also realized that the original frontend source structure itself was no longer included inside the executable.


Reconstructing the Architecture

By combining clues from strings, embedded .pyc files, frontend assets, and API routes, I was eventually able to reconstruct a rough picture of the application's architecture:

PDF.exe
    ↓
Launch FastAPI server
    ↓
Open browser automatically
    ↓
Serve React frontend
    ↓
React sends API requests
    ↓
Python processes PDFs locally

The application was not rendering a desktop GUI directly.

Instead:

  • FastAPI served static frontend files
  • React rendered the UI inside the browser
  • Python handled backend processing
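
The overall pattern can be sketched with nothing but the standard library. To be clear, this is a stand-in and not the application's code: the real app used FastAPI, while this sketch swaps in http.server so it runs without extra packages. The port number, the placeholder JSON response, and the names `AppHandler`, `make_server`, and `main` are all assumptions for illustration.

```python
import json
import threading
import webbrowser
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class AppHandler(SimpleHTTPRequestHandler):
    """Serve the built frontend via GET and answer JSON API calls via POST."""

    def do_POST(self):
        if self.path == "/api/scan":
            length = int(self.headers.get("Content-Length", 0))
            self.rfile.read(length)  # a real handler would parse the upload here
            self._send_json({"status": "ok"})  # placeholder result
        else:
            self.send_error(404)

    def _send_json(self, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet


def make_server(dist_dir: str, port: int = 0) -> ThreadingHTTPServer:
    """Create a local-only server whose static root is the frontend build."""
    handler = partial(AppHandler, directory=dist_dir)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)


def main(dist_dir: str = "pdf-stamp-frontend/dist") -> None:
    """Start the server in the background, then open the UI in a browser."""
    server = make_server(dist_dir, port=8000)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    webbrowser.open("http://127.0.0.1:8000/")
    try:
        threading.Event().wait()  # block until interrupted
    except KeyboardInterrupt:
        server.shutdown()
```

Calling `main()` starts the server and opens the browser. Binding to 127.0.0.1 keeps the API reachable only from the local machine, which fits a tool designed for local use.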

What fascinated me most was not simply discovering the architecture itself, but realizing how much of it could still be reconstructed purely from packaged artifacts.


What I Learned

Reverse Engineering Can Reveal More Than I Expected

Before starting this experiment, I assumed that most of an application's architecture would disappear once everything had been packaged into a standalone .exe.

However, I was surprised by how many clues still remained inside the executable:

  • Python runtime artifacts
  • PyInstaller structures
  • Embedded .pyc files
  • Frontend build outputs
  • API routes
  • Localhost references
  • Technology-specific strings

By connecting those small clues together step by step, I was able to reconstruct a surprisingly large portion of the application's overall architecture.

That process itself was one of the most exciting parts of the experience.


“Working Software” and “Understandable Software” Are Different Things

This experience also made me think deeply about AI-generated applications and software maintainability.

Generative AI can absolutely help create working applications quickly.
However, once only compiled artifacts remain, reconstructing the original design and development intent becomes much harder.

Even after reverse engineering the executable, I still could not fully reconstruct the original frontend source code or understand every implementation detail.

That limitation itself became an important lesson for me.

It reminded me that understanding software architecture and preserving maintainable source structures are just as important as making software work, especially in the age of AI-assisted development.


Final Thoughts

This reverse engineering journey was honestly a lot of fun.

What made the experience especially enjoyable was gradually reconstructing the application's architecture from small technical clues hidden inside the executable.

At the same time, the experience gave me a deeper appreciation for software architecture, maintainability, and the importance of preserving understandable source code alongside AI-generated applications.