This is a submission for the GitHub Finish‑Up‑A‑Thon Challenge
What I Built
I built MedClinic, a fully local AI‑powered medical assistant that runs on a MedGamma‑2B‑class model without any third‑party APIs or cloud services.
Instead of slapping a shiny frontend on an off‑the‑shelf API, I:
- Wrote the entire orchestration layer by hand (no pre‑trained wrappers).
- Pipelined plain user text → MedGamma‑2B inference → structured JSON response as a pure inference pipeline.
- Did not use any external API — everything lives on‑device.
The abandoned prototype (3 months ago)
Demo
Link: https://github.com/pulipatikeerthana9-wq/medclinic-voice-scribe
Now changed to
The Comeback Story
MedClinic started as a half‑dead prototype buried in a forgotten branch. The older version had:
- Basic voice‑to‑text that I struggled to build without much prior experience, and it felt extremely hard to even get working.
- A single monolithic function.
- A 90‑second pause before every answer due to unoptimized inference.
I had just one ingredient: a local MedGamma‑2B‑like model sitting idle on my machine. No Play‑Cloud, no “API magic” — just raw model weights and a stubborn idea that a local‑only doctor‑in‑your‑laptop is possible.
What changed everything was GitHub Copilot:
- Copilot became my architect for the pipeline.
- My job was to sanity‑check the model design, trim the boilerplate, and own the safety guardrails.
In under a month, the MedClinic branch went from “proof of concept” to a hands‑on assistant that gives coherent, structured medical‑style answers — all without a single API call.
GitHub Copilot’s role (how it changed everything)
Here is where Copilot stepped in:
Pipeline design
I asked:
“How do I structure a voice‑input → MedGamma‑2B inference → structured JSON medical‑assistant pipeline?”
Copilot returned three layers:
- input‑sanitizer
- inference‑router
- JSON‑formatter
I kept all three and wired them around MedGamma‑2B.
Model‑context scaffolding
Copilot generated:
- Prompt templates
- Role‑system messages
- Safety guardrails
that were tailored to MedGamma‑2B’s capabilities.
Token‑aware logic
Copilot reminded me to:
- Chunk user input
- Trim old context
- Stay under MedGamma‑2B’s context window
This is critical when you have no API retries and must avoid timeouts.
Testing scripts
Copilot wrote unit‑style tests that simulate patient‑style input and validate MedClinic’s JSON output shapes.
Where I pushed back
- Copilot once suggested serializing the entire conversation into every call — a 10k‑token‑drag. I forced it to keep only the last 3 turns to stay under budget.
- Early templates were too verbose; I cut about 40% of the prompt after reviewing Copilot’s own “better‑prompt” suggestions.
BEFORE VS AFTER
| Aspect | Before Copilot & MedGamma‑2B | After Copilot‑Rewired MedClinic |
|---|---|---|
| Source code | Single file, spaghetti inference | Modular: voice → parser → inference → JSON formatter |
| Model usage | Raw prompt, no context-window awareness | Context-aware; trims history to stay under MedGamma‑2B’s token budget |
| Response format | Free-text paragraph | Structured JSON: diagnosis, symptoms, next_steps |
| Token pressure | No control, often past window | Token-sensitive trimming, pre-compressed chunks |
| UI feel | 10s delays, no structure | Fast, structured, feels like talking to a junior doctor |
SOAP Note transcription
My Experience with GitHub Copilot
Ease
Copilot removed the design friction, not the code‑writing.
- I keep writing HTML/CSS myself, just like the e‑commerce example from the challenge.
- But whenever I touched MedGamma‑2B orchestration logic, Copilot sketched the architecture and I polished it.
Power amplified by tokens
MedGamma‑2B’s context window is the hard limit — no retries.
Copilot helped me design a pipeline that never spills tokens:
- Automatically summarize long patient histories.
- Drop irrelevant context before sending to the model.
- Pre‑compress repeated info into short tags.
In practice:
- A 2‑minute patient voice transcript → ~1.2k tokens sent to MedGamma‑2B.
- Copilot‑generated logic trimmed ~400 useless tokens just by removing filler and rephrasing.
MedClinic stays under budget while giving answers that feel like a human‑style consultation, not a chat‑bot‑style dump.
Copilot as co‑founder
GitHub Copilot didn’t just speed up my development — it rewired MedClinic’s brain.
- Before: a local‑model prototype that felt like a toy.
- After: a token‑aware, structured, local‑only AI physician assistant that I can run on my laptop with zero cloud dependencies.



































