Reviving My Linux Mastery Game from a Merge Conflict — A Finish-Up-A-Thon Comeback

*This is a submission for the GitHub Finish-Up-A-Thon Challenge*

What I Built

Enterprise Linux Mastery Game — a local-first, AI-mentored terminal game for practicing real enterprise Linux administration. The mentor sets a scenario (a paging-hour incident, a permissions puzzle, a 5xx spike). You type Linux commands. They run inside a hardened Docker sandbox. An AI judge grades you on correctness, safety, and efficiency and gives coaching feedback.

I started it in January as a side project I genuinely cared about. I work in enterprise IT training, and I've watched too many engineers learn Linux the wrong way: by memorising commands instead of investigating systems. The game was meant to fix that for me and for the trainees I work with.

Then I hit a wall. The repo sat for four months with a visible merge conflict in the README and a backend wired to a paid LLM API I'd lost access to. The Finish-Up-A-Thon was the push I needed to actually finish it.

Repository: https://github.com/Bharathtrainer/Linux-game
Architecture: FastAPI + Ollama (Llama 3.1) + Docker sandbox + YAML scenarios + a CLI client. Everything runs locally. No cloud API key.

Demo

A real play session, captured tonight. The player picks the entry level, investigates a "/var disk filling up" scenario inside the sandbox, and gets a graded verdict from the AI judge.

The judge scored this run 84/100: three sound investigation commands, perfect safety (10/10), and near-perfect efficiency (9/10). The coaching identifies one specific improvement: running du on the log directory itself instead of on its contents.

================================================================
================ ENTERPRISE LINUX MASTERY GAME =================
Local-first. AI-mentored. Sandboxed.
Levels:

1. entry
2. operator
3. engineer
4. sre
Pick a level number: 1
Starting mission for player1 @ level=entry...
Production server: disk filling fast
An alert just paged you. The /var partition on a production app server
is at 96% and climbing. The on-call engineer needs to know what's
eating the space before the disk fills and the application crashes.
You have shell access. No need to delete anything yet — first, find
the offender.
Objective: Identify which directory or file is consuming the most
space under /var.
Commands: type a shell command, or one of:
/hint    — ask the mentor for a Socratic hint
/done    — submit your attempts for grading
/quit    — give up and exit
[entry] $ ls -la /tmp/sandbox/var
total 20
drwxr-xr-x 5 sandbox sandbox 4096 May 25 08:31 .
drwxr-xr-x 3 sandbox sandbox 4096 May 25 08:31 ..
drwxr-xr-x 2 sandbox sandbox 4096 May 25 08:31 cache
drwxr-xr-x 2 sandbox sandbox 4096 May 25 08:31 lib
drwxr-xr-x 2 sandbox sandbox 4096 May 25 08:31 log
[exit 0]
[entry] $ du -sh /tmp/sandbox/var/*
8.0K    /tmp/sandbox/var/cache
8.0K    /tmp/sandbox/var/lib
81M     /tmp/sandbox/var/log
[exit 0]
[entry] $ du -h /tmp/sandbox/var/log/app.log
80M     /tmp/sandbox/var/log/app.log
[exit 0]
[entry] $ /done
Grading...

================================================================
============================ RESULT ============================
Verdict:     Good start, but you haven't yet identified the
largest file.
Solved:      False
Score:       84/100
Correctness: 8/10
Safety:      10/10
Efficiency:  9/10
Coaching:
Your first command was a good investigation step. The second
command effectively used du to find large directories under
/var/log. However, it would be more efficient to use du on
the log directory directly instead of its contents. Remember
that -h is not needed for this specific scenario.

Missed concepts: du, directory vs file

The judge is appropriately strict: the player found the large file but never confirmed the answer with a more decisive command, so the run is marked unsolved despite the high score. That kind of nuanced grading is what makes this useful for training rather than just gamification.

Try it yourself:

git clone https://github.com/Bharathtrainer/Linux-game
cd Linux-game
docker build -t linux-mastery-sandbox sandbox/
ollama pull llama3.1
cd backend && pip install -r requirements.txt && uvicorn main:app --reload

# In a second terminal:
python play.py

The Comeback Story

Where the project was before this week:

README.md had unresolved git merge conflict markers visible right on the GitHub landing page (<<<<<<< HEAD, >>>>>>> and everything in between). Anyone clicking the repo saw raw conflict markup as the first thing.
The backend was hard-wired to NVIDIA NIM, a paid LLM service I'd lost access to. Without the NIM credentials in the environment, every endpoint returned a 500.
The core/sandbox.py file — the literal heart of the game, the thing that runs user commands safely — did not exist. The judge endpoint was scoring imaginary commands.
api/mission.py was a single stub that returned "Mission started" and did nothing.
Two empty files named git and main sat at the repo root, debris from a fat-fingered shell redirect months ago.
Only one scenario YAML existed, with four fields. No challenge text, no setup script, no success criteria.
The mentor and judge prompts were three lines each. No JSON schema. No grading rubric.

It looked 30% done. It was actually closer to 10%, because the missing pieces were the load-bearing ones.

What I changed, in commits that read like a story:

Resolve the merge conflict and clean up debris. Fixed the README, deleted the empty artifact files, added a THE_COMEBACK.md for the full revival log.
Swap NVIDIA NIM for local Ollama. Renamed core/nim_client.py → core/llm_client.py. Default provider is Ollama (free, local, no key). Kept NIM as a fallback via LLM_PROVIDER=nim. Mapped legacy NIM model names to Llama 3.1 so older callers don't break.
Add the missing sandbox runner. Wrote core/sandbox.py — a one-shot Docker runner with --network=none, memory and CPU limits, a per-command timeout, and a non-root user (UID 1000). The security boundary of the entire game.
Build the real game loop and ground the prompts. Rewrote api/mission.py as a proper orchestrator (/start, /run, /finish). Pinned a three-axis grading rubric in the judge prompt. Added three new scenario YAMLs (operator_runaway_process, engineer_permissions, sre_log_triage) with real challenge narratives — the part of this project no AI could write for me, because it requires actually knowing which enterprise Linux problems happen in the wild.
CLI client and polished README. Wrote play.py to make the game playable. Added an architecture diagram, quickstart, screenshots, and a transcript of a real play session.
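
The /start, /run, /finish flow described above can be sketched as a small in-memory orchestrator. This is a hedged illustration, not the code in api/mission.py: the names Attempt, Mission, and MissionOrchestrator are hypothetical, and the runner and judge are passed in as plain callables so the sketch stays independent of Docker and the LLM.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Attempt:
    command: str
    exit_code: int
    output: str


@dataclass
class Mission:
    player: str
    level: str
    attempts: list = field(default_factory=list)
    finished: bool = False


class MissionOrchestrator:
    """Hypothetical stand-in for the /start, /run, /finish endpoints."""

    def __init__(self, runner: Callable[[str], tuple], judge: Callable[[list], dict]):
        self._runner = runner  # runs one command, returns (exit_code, output)
        self._judge = judge    # grades the list of recorded attempts

    def start(self, player: str, level: str) -> Mission:
        # Corresponds to /start: open a fresh mission with no attempts yet.
        return Mission(player=player, level=level)

    def run(self, mission: Mission, command: str) -> Attempt:
        # Corresponds to /run: execute one command and record the result.
        if mission.finished:
            raise RuntimeError("mission already graded")
        exit_code, output = self._runner(command)
        attempt = Attempt(command, exit_code, output)
        mission.attempts.append(attempt)
        return attempt

    def finish(self, mission: Mission) -> dict:
        # Corresponds to /finish: lock the mission and hand attempts to the judge.
        mission.finished = True
        return self._judge(mission.attempts)
```

Keeping every attempt on the mission object means the judge grades what was actually executed rather than a description of it, which is the gap the original stub left open.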

You can see the whole arc in the commit history. Importantly, the unresolved merge conflict is still visible in the historical README at commit aed0242 — anyone who wants to verify the "before" state was real can see it for themselves.

What I learned: Side projects die in the gap between almost demoable and actually playable. The commits above are mostly boring glue — subprocess wrappers, Pydantic models, JSON parsers, a CLI loop. None of it was hard. All of it was the kind of work I'd previously dropped because "the interesting part is done." The Finish-Up-A-Thon framing helped me see the gap for what it really was: not missing capability, just missing patience for the unglamorous middle.

My Experience with GitHub Copilot

Three places Copilot did real, specific work — not generic autocomplete:

1. Resolving the merge conflict. I opened the broken README in VS Code and asked Copilot Chat:

"Resolve this Git merge conflict in this README. Keep the richer 'Enterprise Linux Mastery Game' content but make it the start of a real README — add sections for what it does, quickstart, and configuration."

It produced a clean draft I then edited for the Ollama specifics. Faster than reading both sides and hand-merging.

2. Mapping the legacy NIM model names to Ollama tags. I gave Copilot the original model_router.py and asked:

"These three NIM-specific model identifiers need to map to Ollama tags. Generate a fallback mapping that defaults to llama3.1 when an exact match isn't available, and keep the level→model selection logic intact."

The _resolve_model_alias method in llm_client.py came out of that interaction.
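
A minimal sketch of what such a fallback mapping might look like. The legacy identifiers below are placeholders invented for illustration (the real NIM model names are not listed in this post), and pick_model is a hypothetical helper, not the actual _resolve_model_alias method. Only the llama3.1 default and the LLM_PROVIDER=nim switch come from the description above.

```python
import os

DEFAULT_OLLAMA_MODEL = "llama3.1"

# Placeholder legacy names (assumptions), each mapped to an Ollama tag.
LEGACY_ALIASES = {
    "legacy-nim/mentor-model": "llama3.1",
    "legacy-nim/judge-model": "llama3.1",
    "legacy-nim/hint-model": "llama3.1",
}


def resolve_model_alias(requested: str) -> str:
    """Map a legacy model name to an Ollama tag, defaulting to llama3.1."""
    return LEGACY_ALIASES.get(requested, DEFAULT_OLLAMA_MODEL)


def pick_model(requested: str, env=None) -> str:
    """Choose the model name to send, depending on the configured provider."""
    env = os.environ if env is None else env
    provider = env.get("LLM_PROVIDER", "ollama").lower()
    if provider == "nim":
        # NIM fallback: pass the legacy identifier through untouched.
        return requested
    return resolve_model_alias(requested)
```

With this shape, older callers can keep passing their original model names and still get a working local model when the provider is Ollama.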

3. Drafting the sandbox runner. This was the most useful one. I described what I wanted:

"Write a Python function that runs a shell command inside a one-shot Docker container using the docker CLI via subprocess. Hard timeout, no network, memory and CPU limits, runs as UID 1000. Return a dataclass with stdout, stderr, exit_code, and a timed_out flag."

Copilot drafted the structure including the subprocess.TimeoutExpired handling I would have forgotten. I added the sandbox_image_exists() health check and the FileNotFoundError branch for "Docker not installed."

Where Copilot specifically didn't help — and I want to be honest about this:

It didn't write the scenario YAMLs. Those needed domain knowledge about which enterprise Linux problems are pedagogically valuable. Generic AI-written scenarios would have read like textbook exercises. The disk-full, runaway-process, permissions-puzzle, and log-triage scenarios are based on real incidents I've coached trainees through.
It didn't write the judge rubric. The three-axis scoring (correctness 60% / safety 25% / efficiency 15%) reflects how I actually evaluate engineers, not how an LLM thinks grading should work.
It didn't make the architectural call to keep NIMClient as a backward-compatible alias rather than ripping it out. That was a judgement call to keep the commit diff small and reviewable.
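
To make the 60/25/15 weighting concrete, here is a small worked sketch of how three per-axis scores out of 10 would combine into a total out of 100. The function name and the assumption that each axis is scored on a 0 to 10 scale are mine, based on the transcript in the demo. In the game the total is reported by the LLM judge, so it need not match this arithmetic exactly.

```python
# Weights from the rubric: correctness 60%, safety 25%, efficiency 15%.
WEIGHTS = {"correctness": 0.60, "safety": 0.25, "efficiency": 0.15}


def weighted_score(axes: dict, scale: int = 10) -> float:
    """Combine per-axis scores (0..scale) into a single score out of 100."""
    total = sum(WEIGHTS[name] * axes[name] / scale for name in WEIGHTS)
    return round(total * 100, 1)


# Axis scores taken from the demo run: 8/10, 10/10, 9/10.
demo = weighted_score({"correctness": 8, "safety": 10, "efficiency": 9})
```

For the demo run this arithmetic gives 86.5, slightly above the 84/100 the judge reported.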

The honest version of "how Copilot helped" is: it removed the friction of writing boilerplate — subprocess glue, Pydantic models, JSON parsing helpers — so I could spend my limited attention on the parts only I could do. For a side project being revived under time pressure, that ratio of "boring code automated, judgement work preserved" is exactly the right one.

THE_COMEBACK.md
