I Trained a Markdown File to Boost GPT-5.5 by 23 Points — It Shouldn't Work

Last Updated on June 18, 2026 by

Author(s): Chew Loong Nian – AI ENGINEER

Originally published on Towards AI.

I did not fine-tune anything. I did not touch a single weight. I ran a training loop whose only output was a 1,400-token Markdown file, dropped that file into the context window of a frozen GPT-5.5, and watched its six-benchmark average jump from 58.8 to 82.3. That is +23.5 points from a text file you can open in any editor.

After the intro, the article explains the core idea behind SkillOpt: treat the skill document (a Markdown “skill file”) like trainable state while keeping the target model frozen, and use a stronger optimizer model during training to propose bounded edits (add/delete/replace) that are only accepted if they strictly improve a held-out validation score—mirroring concepts from gradient descent stability in text space. It then summarizes the reported results across 52 (model, benchmark, harness) combinations, highlighting that SkillOpt is best-or-tied-best everywhere and that GPT-5.5 in direct chat rises from 58.8 to 82.3 (+23.5), with especially large gains on procedural, format-checked tasks like SpreadsheetBench. The author shows what the resulting “trained” skills look like—rules for checking structure, writing explicitly evaluated values, tracking state in embodied navigation, and anchoring answers to the correct table row—and notes that improvements can come from just a few accepted edits and a small artifact size. It provides a practical setup for reproducing the workflow quickly (installing SkillOpt, configuring backends, running the loop, and deploying by prepending the learned Markdown to the model’s context). The piece also covers SkillOpt-Sleep, a plugin-style extension that learns from a user’s own past transcripts with a review-and-adopt, validation-gated offline consolidation loop. Finally, it addresses two limitations—reliance on automatic scoring judges and the fact it optimizes one document at a time—before concluding that for procedural, checkable agent tasks, training the document instead of the model is a more reliable and cheaper optimization than fine-tuning.

Read the full blog for free on Medium.

Published via Towards AI

Towards AI Academy

We Build Enterprise-Grade AI. We'll Teach You to Master It Too.

15 engineers. 100,000+ students. Towards AI Academy teaches what actually survives production.

Start free — no commitment:

→ 6-Day Agentic AI Engineering Email Guide — one practical lesson per day

→ Agents Architecture Cheatsheet — 3 years of architecture decisions in 6 pages

Our courses:

→ AI Engineering Certification — 90+ lessons from project selection to deployed product. The most comprehensive practical LLM course out there.

→ Agent Engineering Course — Hands on with production agent architectures, memory, routing, and eval frameworks — built from real enterprise engagements.

→ AI for Work — Understand, evaluate, and apply AI for complex work tasks.

Note: Article content contains the views of the contributing authors and not Towards AI.

推荐订阅源

Towards AI

Frequently Used, Contextual References

Resources

Author(s): Chew Loong Nian – AI ENGINEER

Towards AI Academy

We Build Enterprise-Grade AI. We'll Teach You to Master It Too.