Agent-driven development in Copilot Applied Science

The GitHub Blog

Tyler McGoffin · 2026-04-01 · via The GitHub Blog

I may have just automated myself into a completely different job…

This is a familiar pattern among software engineers, who often, through inspiration, frustration, or sometimes even laziness, build systems to remove toil and focus on more creative work. We then end up owning and maintaining those systems, unlocking that automated goodness for everyone around us.

As an AI researcher, I recently took this beyond what was previously possible and have automated away my intellectual toil. And now I find myself maintaining this tool to enable all my peers on the Copilot Applied Science team to do the same.

During this process, I learned a lot about how to effectively create and collaborate using GitHub Copilot. Applying these learnings has unlocked an incredibly fast development loop for me and has enabled my teammates to build solutions that fit their needs.

Before I get into explaining how I made this possible, let me set the stage for what spawned this project so you better understand the scope of what you can do with GitHub Copilot.

The impetus

A large part of my job involves analyzing coding agent performance as measured against standardized evaluation benchmarks, like TerminalBench2 or SWEBench-Pro. This often involves poring over tons of what are called trajectories, which are essentially lists of the thought processes and actions agents take while performing tasks.

Each task in an evaluation dataset produces its own trajectory, showing how the agent attempted to solve that task. These trajectories are often .json files with hundreds of lines of code. Multiply that over dozens of tasks in a benchmark set and again over the many benchmark runs needing analysis on any given day, and we’re talking hundreds of thousands of lines of code to analyze.

It’s an impossible task to do alone, so I would typically turn to AI to help. When analyzing new benchmark runs, I found that I kept repeating the same loop: I used GitHub Copilot to surface patterns in the trajectories, then investigated them myself. That reduced the number of lines I had to read from hundreds of thousands to a few hundred.
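As a rough illustration of the "surface patterns first, read later" loop, here is a minimal Python sketch that tallies action types across a directory of trajectory files. It is not the eval-agents code. The schema (a top-level `steps` list whose entries carry an `action` field) and the function name `summarize_trajectories` are assumptions made for the example; real benchmark trajectories may be shaped quite differently.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_trajectories(directory: Path) -> Counter:
    """Count how often each action type appears across all trajectory files."""
    counts: Counter = Counter()
    for path in sorted(directory.glob("*.json")):
        # Assumed (hypothetical) schema: {"steps": [{"action": "...", ...}, ...]}
        trajectory = json.loads(path.read_text(encoding="utf-8"))
        for step in trajectory.get("steps", []):
            # Steps without an "action" field are grouped under "unknown".
            counts[step.get("action", "unknown")] += 1
    return counts
```

Calling `summarize_trajectories(Path("runs/latest")).most_common(5)` (the path is a placeholder) would give a short list of the most frequent actions, which is the kind of summary that narrows down which trajectories are worth reading in full.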

However, the engineer in me saw this repetitive task and said, “I want to automate that.” Agents provide us with the means to automate this kind of intellectual work, and thus eval-agents was born.

The plan

Engineering and science teams work better together. That was my guiding principle as I set about solving this new challenge.

Thus, I approached the design and implementation strategy of this project with three goals in mind:

Make these agents easy to share and use
Make it easy to author new agents
Make coding agents the primary vehicle for contributions

Goals one and two are part of GitHub’s lifeblood, and they reflect values and skills I’ve gained throughout my career, especially during my stint as an OSS maintainer on the GitHub CLI.

However, goal three shaped the project the most. I noticed that when I set GitHub Copilot up to help me build the tool effectively, it also made the project easier to use and collaborate on. That experience taught me a few key lessons, which ultimately helped push the first and second goals forward in ways I didn’t expect.

Making coding agents your primary contributor

I’ll start by describing my agentic coding setup:

Coding agent: Copilot CLI
Model used: Claude Opus 4.6
IDE: VS Code

It’s also noteworthy that I leveraged the Copilot SDK, which is powered under the hood by the Copilot CLI, to accelerate agent creation. This gave me access to existing tools and MCP servers, a way to register new tools and skills, and a whole bunch of other agentic goodness out of the box that I didn’t have to reinvent myself.

With that out of the way, I could streamline the whole development process very quickly by following a few core principles:

Prompting strategies: agents work best when you’re conversational and verbose, and when you leverage planning modes before agent modes.
Architectural strategies: refactor often, update docs often, clean up often.
Iteration strategies: “trust but verify” is now “blame process, not agents.”

Uncovering and following these strategies led to an incredible phenomenon: adding new agents and features was fast and easy. We had five folks jump into the project for the first time, and we created a total of 11 new agents, four new skills, and the concept of eval-agent workflows (think scientist streams of reasoning) in less than three days. That amounted to a change of +28,858/-2,884 lines of code across 345 files.

Holy crap!

Below, I’ll go into detail about these three principles and how they enabled this amazing feat of collaboration and innovation.

Prompting strategies

We know that AI coding agents are really good at solving well-scoped problems but need handholding for the more complex problems you’d only entrust to your more senior engineers.

So, if you want your agent to act like an engineer, treat it like one. Guide its thinking, over-explain your assumptions, and leverage its research speed to plan before jumping into changes. I found it far more effective to put some stream-of-consciousness musings about a problem I was chewing on into a prompt and work with Copilot in planning mode than to give it a terse problem statement or solution.

Here’s an example of a prompt I wrote to add more robust regression tests to the tool:

> /plan I've recently observed Copilot happily updating tests to fit its new paradigms even though those tests shouldn't be updated. How can I create a reserved test space that Copilot can't touch or must reserve to protect against regressions?

This resulted in a back and forth that ultimately led to a series of guardrails akin to contract testing that can only be updated by humans. I had an idea of what I wanted, and through conversation, Copilot helped me get to the right solution.

It turns out that the things that make human engineers the most effective at doing their jobs are the same things that make these agents effective at doing theirs.

Architectural strategies

Engineers, rejoice! Remember all those refactors you wanted to do to make the codebase more readable, the tests you never had time to write, and the docs you wish had existed when you onboarded? They’re now the most important thing you can be working on when building an agent-first repository.

Gone are the days when deprioritizing this work in favor of new feature work was necessary, because delivering features with Copilot becomes trivial when you have a well-maintained, agent-first project.

I’ve spent most of my time on this project refactoring names and file structures, documenting new features or patterns, and adding test cases for problems that I’ve uncovered as I go. I’ve even spent a few cycles cleaning up the dead code that the agents (like your junior engineers) may have missed while implementing all these new features and changes.

This work makes it easy for Copilot to navigate the codebase and understand the patterns, just like it would for any other engineer.

I can even ask, “Knowing what I know now, how would I design this differently?” And I can then justify actually going back and rearchitecting the whole project (with the help of Copilot, of course).

It’s a dream come true!

And this leads me to my last bit of guidance.

Iteration strategies

As agents and models have improved, I have moved from a “trust but verify” mindset to one that is more trusting than doubtful. This mirrors how the industry treats human teams: “blame process, not people.” It’s how the most effective teams operate, because people make mistakes, so we build systems around that reality.

This idea of blameless culture provides psychological safety for teams to iterate and innovate, knowing that they won’t be blamed if they make a mistake. The core principle is that we implement processes and guardrails to protect against mistakes, and if a mistake does happen, we learn from it and introduce new processes and guardrails so that our teams won’t make the same mistake again.

Applying this same philosophy to agent-driven development has been fundamental to unlocking this incredibly rapid iteration pipeline. That means we add processes and guardrails to help prevent the agent from making mistakes, but when it does make a mistake, we add additional guardrails and processes—like more robust tests and better prompts—so the agent can’t make the same mistake again. Taking this one step further means that practicing good CI/CD principles is a must.

Practices like strict typing ensure the agent conforms to interfaces. Robust linters impose implementation rules on the agent that keep it following good patterns and practices. And integration, end-to-end, and contract tests—which can be expensive to build manually—become much cheaper to implement with agent assistance, while giving you confidence that new changes don’t break existing features.

When Copilot has these tools available in its development loop, it can check its own work. You’re setting it up for success, much in the same way you’d set up a junior engineer for success in your project.
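To make the typing and contract-test ideas concrete, here is a small Python sketch. Every name in it (`Finding`, `EvalAgent`, `TimeoutAgent`, `check_contract`) is hypothetical and chosen for illustration, as is the trajectory shape. The point is only that a typed interface plus a cheap contract check gives an agent something mechanical to validate its work against.

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass(frozen=True)
class Finding:
    task_id: str
    summary: str


@runtime_checkable
class EvalAgent(Protocol):
    """Interface every analysis agent is expected to satisfy (hypothetical)."""

    name: str

    def analyze(self, trajectory: dict) -> list[Finding]: ...


class TimeoutAgent:
    """Example implementation: flags trajectories that contain a timeout step."""

    name = "timeout-detector"

    def analyze(self, trajectory: dict) -> list[Finding]:
        steps = trajectory.get("steps", [])
        if any(step.get("action") == "timeout" for step in steps):
            return [Finding(trajectory.get("task_id", "unknown"), "agent hit a timeout")]
        return []


def check_contract(agent: EvalAgent) -> None:
    """Contract check: right shape, and no findings for an empty trajectory."""
    assert isinstance(agent, EvalAgent)
    assert agent.analyze({"task_id": "t0", "steps": []}) == []
```

A static type checker would catch a new agent whose `analyze` signature drifts from the protocol, and `check_contract` can be run against every registered agent in the test suite. Note that the runtime `isinstance` check only confirms that the attributes exist, not that their types match, which is why the static check still matters.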

Putting it all together

Here’s what all this means for your development loop when you’ve got your codebase set up for agent-driven development:

1. Plan a new feature with Copilot using /plan.
   - Iterate on the plan.
   - Ensure that testing is included in the plan.
   - Ensure that docs updates are included in the plan and done before code is implemented. These can serve as additional guidelines that live beside your plan.
2. Let Copilot implement the feature on /autopilot.
3. Prompt Copilot to initiate a review loop with the Copilot Code Review agent. For me, it’s often something like: request Copilot Code Review, wait for the review to finish, address any relevant comments, and then re-request review. Continue this loop until there are no more relevant comments.
4. Human review. This is where I enforce the patterns I discussed in the previous sections.
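The review loop in the steps above is simple control flow, and it can be sketched in Python. This is a schematic only: the three callables stand in for whatever actually requests a review, addresses a comment, and judges relevance, and none of them correspond to a real Copilot API. The `max_rounds` cap is also an addition for the sketch, so the loop hands off to a human instead of spinning forever.

```python
from typing import Callable


def review_loop(
    request_review: Callable[[], list[str]],
    address: Callable[[str], None],
    is_relevant: Callable[[str], bool],
    max_rounds: int = 5,
) -> int:
    """Request review until a round yields no relevant comments.

    Returns the number of review rounds that were needed.
    """
    for round_number in range(1, max_rounds + 1):
        # Keep only the comments worth acting on in this round.
        comments = [c for c in request_review() if is_relevant(c)]
        if not comments:
            return round_number
        for comment in comments:
            address(comment)
    raise RuntimeError("review did not converge; escalate to a human reviewer")
```

Filtering for relevance before deciding whether to stop matters here: a review that only returns comments you have chosen to ignore counts as a clean round.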

Additionally, outside of your feature loop, be sure you’re prompting Copilot early and often with the following:

/plan Review the code for any missing tests, any tests that may be broken, and dead code
/plan Review the code for any duplication or opportunities for abstraction
/plan Review the documentation and code to identify any documentation gaps. Be sure to update the copilot-instructions.md to reflect any relevant changes

I have these run automatically once a week, but I often find myself running them throughout the week as new features and fixes go in to maintain my agent-driven development environment.

Take this with you

What started as a frustration with an impossibly repetitive analysis task turned into something far more interesting: a new way of thinking about how we build software, how we collaborate, and how we grow as engineers.

Building agents with a coding agent-first mindset has fundamentally changed how I work. It’s not just about the automation wins—though watching four scientists ship 11 agents, four skills, and a brand-new concept in under three days is nothing short of remarkable. It’s about what this style of development forces you to prioritize: clean architecture, thorough documentation, meaningful tests, and thoughtful design—the things we always knew mattered but never had time for.

The analogy to a junior engineer keeps proving itself out. You onboard them well, give them clear context, build guardrails so their mistakes don’t become disasters, and then trust them to grow. If something goes wrong, you blame the process. Not the agent. If there’s one thing I want you to take away from this, it’s that the skills that make you a great engineer and a great teammate are the same skills that make you great at building with Copilot. The technology is new. The principles aren’t.

So go clean up that codebase, write that documentation you’ve been putting off, and start treating your Copilot like the newest member of your team. You might just automate yourself into the most interesting work of your career.

Think I’m crazy? Well, try this:

Download Copilot CLI
Activate Copilot CLI in any repo: cd <repo_path> && copilot
Paste in the following prompt: /plan Read <link to this blog post> and help me plan how I could best improve this repo for agent-first development

Written by

Tyler is a Sr. Applied Researcher on the Copilot Applied Science team. He has an eclectic background in scientific research, education, game design/development, and software. His favorite part about his current position is accelerating learning and research for his team and the organization as a whole.
