From Half‑dead Prototype to Local‑Only AI Medical Assistant: Rewiring MedClinic with GitHub Copilot

Posted on May 25 • Edited on May 27

This is a submission for the GitHub Finish‑Up‑A‑Thon Challenge

What I Built

I built MedClinic, a fully local AI‑powered medical assistant that runs on a MedGamma‑2B‑class model without any third‑party APIs or cloud services.

Instead of slapping a shiny frontend on an off‑the‑shelf API, I:

Wrote the entire orchestration layer by hand (no pre‑trained wrappers).
Built a pure inference pipeline: plain user text → MedGamma‑2B inference → structured JSON response.
Did not use any external API — everything lives on‑device.

The abandoned prototype (3 months ago)

Demo

Link: https://github.com/pulipatikeerthana9-wq/medclinic-voice-scribe

Now changed to

The Comeback Story

MedClinic started as a half‑dead prototype buried in a forgotten branch. The older version had:

Basic voice‑to‑text, which I struggled to get working at all without much prior experience.
A single monolithic function.
A 90‑second pause before every answer due to unoptimized inference.

I had just one ingredient: a local MedGamma‑2B‑like model sitting idle on my machine. No Play‑Cloud, no “API magic” — just raw model weights and a stubborn idea that a local‑only doctor‑in‑your‑laptop is possible.

What changed everything was GitHub Copilot:

Copilot became my architect for the pipeline.
My job was to sanity‑check the model design, trim the boilerplate, and own the safety guardrails.

In under a month, the MedClinic branch went from “proof of concept” to a hands‑on assistant that gives coherent, structured medical‑style answers — all without a single API call.

GitHub Copilot’s role (how it changed everything)

Here is where Copilot stepped in:

Pipeline design

I asked:

“How do I structure a voice‑input → MedGamma‑2B inference → structured JSON medical‑assistant pipeline?”

Copilot returned three layers:

input‑sanitizer
inference‑router
JSON‑formatter

I kept all three and wired them around MedGamma‑2B.
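As a rough illustration of how three layers like these can be chained, here is a minimal sketch. The function names, the `fake_model` stand-in, and the fallback behaviour are assumptions made for the example, not the code in the MedClinic repository; a real setup would pass a callable that wraps the local model.

```python
import json
import re
from typing import Callable, Dict


def sanitize_input(text: str) -> str:
    """Input-sanitizer: drop control characters and collapse whitespace."""
    text = re.sub(r"[\x00-\x08\x0b\x0c\x0e-\x1f]", "", text)
    return re.sub(r"\s+", " ", text).strip()


def route_inference(prompt: str, model: Callable[[str], str]) -> str:
    """Inference-router: pass the cleaned prompt to the configured model callable."""
    return model(prompt)


def format_json(raw_reply: str) -> Dict[str, object]:
    """JSON-formatter: parse the reply, falling back to an empty structure."""
    empty = {"diagnosis": "", "symptoms": [], "next_steps": []}
    try:
        parsed = json.loads(raw_reply)
    except json.JSONDecodeError:
        return empty
    if not isinstance(parsed, dict):
        return empty
    # Keep only the expected keys, filling in defaults for missing ones.
    return {key: parsed.get(key, default) for key, default in empty.items()}


def run_pipeline(user_text: str, model: Callable[[str], str]) -> Dict[str, object]:
    """Chain the three layers: sanitize, infer, format."""
    return format_json(route_inference(sanitize_input(user_text), model))


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for the local model; echoes the prompt as a symptom."""
    return json.dumps(
        {"diagnosis": "unknown", "symptoms": [prompt], "next_steps": ["see a clinician"]}
    )
```

Keeping the model behind a plain callable means the sanitizer and formatter can be exercised without loading any weights.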

Model‑context scaffolding

Copilot generated:

Prompt templates
Role‑system messages
Safety guardrails

that were tailored to MedGamma‑2B’s capabilities.
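To make the idea concrete, a prompt builder along these lines could combine a system message, a guardrail, and the patient text. The wording of the messages and the `[system]`/`[patient]` tags are assumptions for illustration only; a real local model usually expects its own chat template.

```python
from typing import List, Optional

# Hypothetical system message and guardrail text, not taken from MedClinic.
SYSTEM_MESSAGE = (
    "You are a medical assistant. Reply only with JSON containing the keys "
    "diagnosis, symptoms, and next_steps."
)
GUARDRAIL = "Do not prescribe medication; recommend seeing a clinician when unsure."


def build_prompt(user_text: str, history: Optional[List[str]] = None) -> str:
    """Assemble one prompt string from the system message, history, and new input."""
    parts = [f"[system] {SYSTEM_MESSAGE} {GUARDRAIL}"]
    for turn in history or []:
        parts.append(f"[history] {turn}")
    parts.append(f"[patient] {user_text}")
    # End with the assistant tag so the model continues from there.
    parts.append("[assistant]")
    return "\n".join(parts)
```

For example, `build_prompt("I have a cough")` yields a three-line prompt that starts with the system line and ends with the assistant tag.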

Token‑aware logic

Copilot reminded me to:

Chunk user input
Trim old context
Stay under MedGamma‑2B’s context window

This is critical when you have no API retries and must avoid timeouts.
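A minimal sketch of that kind of token-aware logic might look like the following. It counts whitespace-separated words as a crude proxy for tokens, which is an assumption; a real pipeline would use the model's own tokenizer, and the helper names here are hypothetical.

```python
from typing import List


def estimate_tokens(text: str) -> int:
    """Crude token estimate: one token per whitespace-separated word (assumption)."""
    return len(text.split())


def chunk_text(text: str, max_tokens: int) -> List[str]:
    """Split long input into chunks of at most max_tokens estimated tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    words = text.split()
    return [" ".join(words[i:i + max_tokens]) for i in range(0, len(words), max_tokens)]


def trim_history(history: List[str], new_input: str, budget: int) -> List[str]:
    """Drop the oldest turns until history plus the new input fits the budget."""
    kept = list(history)
    needed = estimate_tokens(new_input)
    while kept and sum(estimate_tokens(turn) for turn in kept) + needed > budget:
        kept.pop(0)  # the oldest turn goes first
    return kept
```

Dropping from the front keeps the most recent turns, which are usually the ones the next answer depends on.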

Testing scripts

Copilot wrote unit‑style tests that simulate patient‑style input and validate MedClinic’s JSON output shapes.
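A shape check of this kind could be sketched as below. The three keys come from the structured output described in this post; the expected types (a string for diagnosis, lists for the other two) and the function names are assumptions, and the tests are written pytest-style as plain functions.

```python
import json

# Expected keys and their assumed types.
EXPECTED_TYPES = {"diagnosis": str, "symptoms": list, "next_steps": list}


def has_expected_shape(raw_reply: str) -> bool:
    """Return True when the reply is a JSON object with the expected keys and types."""
    try:
        data = json.loads(raw_reply)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    return all(isinstance(data.get(key), kind) for key, kind in EXPECTED_TYPES.items())


def test_valid_reply_passes():
    reply = json.dumps(
        {"diagnosis": "tension headache", "symptoms": ["headache"], "next_steps": ["rest"]}
    )
    assert has_expected_shape(reply)


def test_free_text_reply_fails():
    assert not has_expected_shape("You probably have a cold.")


def test_missing_key_fails():
    assert not has_expected_shape(json.dumps({"diagnosis": "cold"}))
```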

Where I pushed back

Copilot once suggested serializing the entire conversation into every call, a 10k‑token drag. I made it keep only the last 3 turns to stay under budget.
Early templates were too verbose; I cut about 40% of the prompt after reviewing Copilot’s own “better‑prompt” suggestions.
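One simple way to enforce a "last 3 turns" rule is a bounded `collections.deque`, sketched here. The `ConversationWindow` class and its `Patient:`/`Assistant:` rendering are hypothetical names for illustration, not the MedClinic implementation.

```python
from collections import deque


class ConversationWindow:
    """Keeps only the most recent turns; older ones are discarded automatically."""

    def __init__(self, max_turns: int = 3):
        # A deque with maxlen drops the oldest item when a new one is appended.
        self._turns = deque(maxlen=max_turns)

    def add_turn(self, patient: str, assistant: str) -> None:
        self._turns.append((patient, assistant))

    def __len__(self) -> int:
        return len(self._turns)

    def render(self) -> str:
        """Serialize the retained turns for inclusion in the next prompt."""
        return "\n".join(f"Patient: {p}\nAssistant: {a}" for p, a in self._turns)
```

With this in place the prompt size is bounded by the length of three turns, no matter how long the session runs.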

Before vs After

Aspect	Before Copilot & MedGamma‑2B	After Copilot‑Rewired MedClinic
Source code	Single file, spaghetti inference	Modular: voice → parser → inference → JSON formatter
Model usage	Raw prompt, no context-window awareness	Context-aware; trims history to stay under MedGamma‑2B’s token budget
Response format	Free-text paragraph	Structured JSON: diagnosis, symptoms, next_steps
Token pressure	No control, often past window	Token-sensitive trimming, pre-compressed chunks
UI feel	10s delays, no structure	Fast, structured, feels like talking to a junior doctor

SOAP Note transcription

My Experience with GitHub Copilot

Ease

Copilot removed the design friction, not the code‑writing.

I still write the HTML/CSS myself.
But whenever I touched MedGamma‑2B orchestration logic, Copilot sketched the architecture and I polished it.

Power amplified by tokens

MedGamma‑2B’s context window is the hard limit — no retries.

Copilot helped me design a pipeline that never spills tokens:

Automatically summarize long patient histories.
Drop irrelevant context before sending to the model.
Pre‑compress repeated info into short tags.

In practice:

A 2‑minute patient voice transcript → ~1.2k tokens sent to MedGamma‑2B.
Copilot‑generated logic trimmed ~400 useless tokens just by removing filler and rephrasing.
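Filler removal of this sort can be approximated with a small regular expression, as in the sketch below. The filler list is an assumption chosen for the example and is deliberately short; the rephrasing step mentioned above is not shown.

```python
import re

# Hypothetical filler list; a real transcript cleaner would need a longer, tested one.
FILLER_PATTERN = re.compile(
    r"\b(?:um+|uh+|erm+|you know|i mean)\b,?\s*",
    re.IGNORECASE,
)


def strip_fillers(transcript: str) -> str:
    """Remove common spoken fillers and normalize whitespace."""
    cleaned = FILLER_PATTERN.sub("", transcript)
    return re.sub(r"\s+", " ", cleaned).strip()


def words_saved(transcript: str) -> int:
    """Rough count of words removed, as a proxy for tokens saved."""
    return len(transcript.split()) - len(strip_fillers(transcript).split())
```

The word boundaries in the pattern keep it from touching words that merely contain a filler, such as "umbrella".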

MedClinic stays under budget while giving answers that feel like a human‑style consultation, not a chatbot‑style dump.

Copilot as co‑founder

GitHub Copilot didn’t just speed up my development — it rewired MedClinic’s brain.

Before: a local‑model prototype that felt like a toy.
After: a token‑aware, structured, local‑only AI physician assistant that I can run on my laptop with zero cloud dependencies.
