

GitHub Copilot Workspace Review: Task-Level AI Coding in the Browser
pickuma · 2026-05-28 · via DEV Community

I tested GitHub Copilot Workspace on 12 real tasks across three repositories in May 2026 — a mix of bug fixes, feature additions, and documentation updates. My goal was to figure out whether the spec-first, browser-based workflow actually produces useful code, or whether it is a demo that falls apart when you ask it to do real work. The answer sits somewhere between impressive and frustrating, with the tool's success rate varying dramatically based on task size, repository maturity, and how well you write the initial specification.

The Spec-First Workflow Forces Better Communication

Copilot Workspace changes the AI coding interaction in one fundamental way that I have not seen elsewhere: you do not start with code, you start with a specification. When I opened Workspace on a Next.js project and typed "add rate limiting to the API routes using the existing rate-limit.ts utility," the system did not immediately generate code. It spent roughly 15 seconds reading my repository, then produced a three-step implementation plan: (1) import the rate-limit utility in each route, (2) wrap the route handler with the rate limiter, (3) add a test for the rate-limited behavior.

I could approve the plan as-is, reject individual steps, or add revision notes before any code was written. On this particular task, the plan was correct and I approved it. Workspace then executed each step, modified five route files, and produced a draft pull request with a clear description and a summary of what changed. The entire process — from typing the specification to having a reviewable PR — took 4 minutes and 12 seconds.

This planning phase is not window dressing. On a different task where I asked Workspace to "add WebSocket support to the chat feature," it read the repository and surfaced during planning that the project was deployed on Vercel's serverless functions, which do not support persistent WebSocket connections. It suggested using Vercel's Edge Functions with a third-party real-time service instead. That kind of environment-aware planning caught a mistake I would have made if I had started coding immediately. The tool did not just generate code — it prevented me from heading down a dead-end path.

Across the 12 tasks I tested, Workspace's planning phase identified a structural or compatibility issue before code generation in 3 cases, or 25 percent of the tasks. When the plan was wrong, I could revise it before any code changed. That alone makes Workspace safer than tools that start generating code immediately based on a one-sentence prompt with no repository awareness.

Repository Awareness Creates Better First Drafts

Workspace's access to GitHub's full repository context — commit history, issue discussions, existing PR review comments, and file structure — produces code that fits the project better than any other AI tool I have tested. The generated code observes the same naming conventions, file organization patterns, and error handling styles that the human-authored code uses.

I tested this systematically. On a Python FastAPI project where all existing endpoints used a custom handle_errors decorator for error handling, I asked Workspace to add a new health check endpoint. The generated code used handle_errors without being told to; it had picked up the pattern from the existing routes. When I gave Cursor and Copilot Chat the same prompt, both generated correct code but used a try-except block instead, because neither had the repository-level context to know that handle_errors was the project's standard pattern.
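To make the difference between the two styles concrete, here is a minimal Python sketch. The review does not show the project's actual handle_errors decorator, so its implementation, the response shape, and the check_database and health_check functions below are assumptions for illustration. FastAPI's route registration is left out so the example stays self-contained.

```python
import functools


def handle_errors(func):
    """Hypothetical stand-in for the project's decorator: turn any
    exception raised by a handler into a uniform error payload."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as exc:  # broad on purpose: one place for all handlers
            return {"status": "error", "detail": str(exc)}
    return wrapper


def check_database():
    """Placeholder dependency check; a real project would ping its database."""
    return True


# Follows the project convention: error handling lives in the decorator.
@handle_errors
def health_check():
    if not check_database():
        raise RuntimeError("database unreachable")
    return {"status": "ok"}


# Correct in isolation, but repeats the error handling inline and
# ignores the convention used by every other handler.
def health_check_inline():
    try:
        if not check_database():
            raise RuntimeError("database unreachable")
        return {"status": "ok"}
    except Exception as exc:
        return {"status": "error", "detail": str(exc)}
```

Both functions return the same payloads, which is why the inline version passes review on correctness alone. The cost shows up later, when the shared error format changes and only the decorated handlers pick up the change.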

The self-correction loop during execution is also worth describing because it is not widely documented. When Workspace generates code that fails the project's linter or type checker, it re-reads the error, revises the file, and tries again. I watched this happen on a TypeScript task where the generated code used a type that no longer existed in the project. Workspace caught the TypeScript compilation error, checked the current type definitions, and corrected the import — all without my intervention. On a separate task, the self-correction loop got stuck: it fixed one ESLint warning but introduced another, fixed the second but reintroduced the first, and cycled three times before I stepped in and resolved the conflict manually.

The self-correction success rate in my testing was roughly 70 percent for lint errors and 60 percent for type errors. When it works, it saves the tedious cycle of watching CI fail, reading the error, fixing the code, and pushing again. When it fails, it wastes time you could have spent fixing the issue yourself. I learned to watch the execution log for signs of cycling and intervene after the second failed correction attempt.
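The "intervene after the second failed correction attempt" rule can be written down as a small check. This is a sketch of that manual heuristic, not of how Workspace works internally. It assumes you can read the set of reported error identifiers from the execution log after each attempt; the function name, its parameters, and the ESLint rule names in the usage lines are hypothetical.

```python
def should_intervene(initial_errors, after_each_attempt, max_failed=2):
    """Decide whether to stop watching an automated correction loop.

    initial_errors: error identifiers reported before any correction.
    after_each_attempt: one collection of error identifiers per
        correction attempt, in order (empty means the check passed).
    Returns True when an earlier error state comes back (cycling or
    no progress) or when max_failed attempts have failed.
    """
    seen = [frozenset(initial_errors)]
    for failed, errors in enumerate(after_each_attempt, start=1):
        state = frozenset(errors)
        if not state:
            return False  # the loop converged on its own
        if state in seen:
            return True   # same error state as before: the loop is cycling
        if failed >= max_failed:
            return True   # budget for failed corrections is used up
        seen.append(state)
    return False          # still in progress, nothing to do yet


# The lint ping-pong described above: one warning is swapped for another,
# then the first one returns.
cycling = should_intervene(
    {"no-unused-vars"},
    [{"prefer-const"}, {"no-unused-vars"}],
)

# A type error that the first correction resolves.
converged = should_intervene({"missing-type-import"}, [set()])
```

Here cycling is True and converged is False. Comparing whole error states, and not just counting failures, is what separates a loop that is making slow progress from one that is trading one warning for another.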

The Test Generation Is a Starting Point, Not a Safety Net

Here is where I have to draw a hard line between what Workspace promises and what it delivers. Workspace generates tests for every change it makes, and the tests follow the project's existing testing conventions — Jest for the Next.js project, pytest for the Python project, Go's testing package for the Go project. The tests run and pass.

But the tests are shallow in a way I find concerning. For 11 of the 12 tasks, Workspace wrote tests that covered happy paths and one or two obvious edge cases. It never wrote a test for an error boundary, a race condition, a timeout scenario, or an integration failure. On a rate-limiting task, it tested that the rate limiter allowed requests under the limit and blocked requests over the limit, but missed the case where the rate limit counter reset mid-window due to clock skew. On a file upload task, it tested that files under the size limit were accepted and files over the limit were rejected, but never tested what happened when the file upload itself failed due to a network error.
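As an illustration of that gap, the Python sketch below pairs a small fixed-window rate limiter with the two tests Workspace wrote in spirit (under the limit, over the limit) and the kind of clock-skew test it skipped. The FixedWindowLimiter and FakeClock classes are assumptions made for this example; the project's actual rate-limit.ts utility is not shown in this review and may work differently.

```python
class FixedWindowLimiter:
    """Allow at most `limit` calls per `window` seconds (hypothetical sketch)."""

    def __init__(self, limit, window, clock):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so tests can control time
        self.window_start = clock()
        self.count = 0

    def allow(self):
        now = self.clock()
        # Reset only when time has moved forward past the window. A clock
        # that jumps backwards must not hand out a fresh quota.
        if now - self.window_start >= self.window:
            self.window_start = now
            self.count = 0
        if self.count >= self.limit:
            return False
        self.count += 1
        return True


class FakeClock:
    """Test double whose time is set explicitly."""

    def __init__(self, start=1000.0):
        self.now = start

    def __call__(self):
        return self.now


def test_allows_under_limit():
    limiter = FixedWindowLimiter(limit=2, window=60, clock=FakeClock())
    assert limiter.allow()
    assert limiter.allow()


def test_blocks_over_limit():
    limiter = FixedWindowLimiter(limit=2, window=60, clock=FakeClock())
    limiter.allow()
    limiter.allow()
    assert not limiter.allow()


def test_backwards_clock_does_not_reset_quota():
    clock = FakeClock()
    limiter = FixedWindowLimiter(limit=1, window=60, clock=clock)
    assert limiter.allow()
    clock.now -= 120  # simulate skew: the clock jumps back two minutes
    assert not limiter.allow()


def test_new_window_restores_quota():
    clock = FakeClock()
    limiter = FixedWindowLimiter(limit=1, window=60, clock=clock)
    assert limiter.allow()
    clock.now += 60  # a full window has elapsed
    assert limiter.allow()
```

Injecting the clock is the design choice that makes the skew case testable at all. A limiter that reads the system time directly can only be exercised on the happy path without patching, which may be one reason generated tests stop there.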

I do not think this is a bug in Workspace. It writes the tests a human would write if they were given the specification and told to produce a first draft in ten minutes. The tests are directionally correct but do not replace a human review of test coverage. If you merge Workspace-generated code with only its own tests as verification, you are shipping untested edge cases. I reviewed each Workspace-generated PR against my team's code review checklist and found that 8 out of 12 PRs required additional tests I had to write manually.

Where the Tool Goes Wrong

The sweet spot for Workspace is changes that touch three to five files. I ran five tasks in this range, and all five produced correct plans and working code on the first attempt. A sixth task, adding a new API endpoint with database migrations, schema changes, and frontend updates, touched 14 files. The plan proposed reasonable steps, but during execution, step three depended on a schema change from step two that had not been applied yet. Workspace generated the frontend code against the old schema, then corrected it after the migration ran, but the correction introduced a type mismatch that the TypeScript compiler caught. The self-correction loop fixed it eventually, but the process took three correction cycles and produced a diff that would take a reviewer longer to verify than the initial code was worth.

The language support is uneven in ways that affect the planning quality, not just the code output. With TypeScript and Python, the plans were specific and well-structured: they named actual files, referenced existing functions, and proposed concrete changes. With Go, the plans were more generic: they described what needed to change but were less precise about where and how. With a Rust project I tested out of curiosity, the plan was essentially a high-level summary with no file-specific guidance. Workspace still generated code for the Rust task, but it was wrong in ways the planning phase should have caught.

The browser-based workflow is polarizing enough that I need to address it directly. You cannot edit code locally during a Workspace session. The planning, execution, and review happen entirely in GitHub's web interface. If you discover a mistake during execution that requires manual intervention, you either fix it through the browser-based diff viewer — which is functional but not comfortable for significant edits — or abort the session, fix the code locally, and try again. I found myself wanting to open the code in my editor at least once per task, and the inability to do so made the experience feel constraining rather than freeing.

When I Use Workspace and When I Skip It

After 12 tasks, my behavior has settled into a clear pattern. For well-scoped additions to mature repositories — adding a configuration option where similar options already exist, implementing a new API endpoint that mirrors existing endpoints, adding a documentation page that follows the existing doc structure — I reach for Workspace first. The planning phase catches environment-specific issues I would otherwise miss, the generated code follows project conventions, and the draft PR format integrates cleanly with my team's review process.

For greenfield features that introduce new patterns, architectural changes that touch more than eight files, or work on repositories with inconsistent conventions, I skip Workspace and code manually. The planning phase loses coherence above roughly eight files, the self-correction loop becomes a time sink instead of a time saver, and the browser-based editing makes manual intervention more painful than it needs to be.

The most interesting use case I found is onboarding. When a new team member asked me how to add a rate limit to an endpoint — not knowing which utility to use, which files to modify, or which testing patterns to follow — I had them open Workspace, write the specification themselves, review the generated plan, and walk through the resulting PR. The plan taught them the project's conventions faster than I could have explained them, and the generated tests showed them the testing patterns we use. Workspace worked better as a teaching tool than as an autonomous developer, and I expect that use case to grow as the team behind the tool improves the planning phase reliability for larger tasks.


Originally published at pickuma.com. Subscribe to the RSS or follow @pickuma.bsky.social for new reviews.