Analyzing 1,000 Engineering Problems Through GitHub Data
Pavanipriya · 2026-05-26 · via DEV Community

In my previous article, I wrote about GitHub mining as a user research method and explained how it can benefit engineers, organizations, and open-source communities.

In this article, I am going to explain the step-by-step process for conducting the research and creating a report.

Step 1: Define Research Questions

Before starting GitHub mining research, first define the problem you want to understand. Ask yourself: What challenge am I trying to investigate? This step helps make the research focused and prevents collecting unnecessary data. For example, in a project like KServe, you may want to understand why users struggle to deploy models, what usability issues make onboarding difficult, which workflows create the most developer frustration, or what causes deployment failures. These questions help identify the user experience (UX) and developer experience (DX) problems that need deeper investigation.

After identifying the problem area, create clear research questions to guide the study. Start with one primary research question that represents the main goal of the research.

Example Primary Research Question:
“What usability barriers do users experience while deploying models using KServe?”

Then create several sub-questions to explore specific areas in more detail, such as:

  • Which tasks generate the most confusion?
  • What documentation gaps exist?
  • Which error messages appear repeatedly?
  • How do developers solve issues?

These smaller questions help organize data collection and analysis.

At the end of this stage, your output should include three things:

  • Research Objective — explains the overall purpose of the study.
  • Research Questions — define what you want to investigate.
  • Scope — sets boundaries for the research by describing what data, users, repositories, workflows, or time period will be included.

Having these clearly defined creates a strong foundation before moving into GitHub data collection and analysis.
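The three outputs of this step can be written down in a small structure so that they travel with the dataset. This is only a sketch: the `ResearchPlan` class, its field names, and the repository name are my own assumptions for illustration, not part of any tool.

```python
from dataclasses import dataclass, field


@dataclass
class ResearchPlan:
    """Hypothetical container for the Step 1 outputs."""
    objective: str
    primary_question: str
    sub_questions: list[str] = field(default_factory=list)
    scope: dict[str, str] = field(default_factory=dict)

    def is_complete(self) -> bool:
        # The plan is usable once objective, question, and scope are all filled in.
        return bool(self.objective and self.primary_question and self.scope)


plan = ResearchPlan(
    objective="Understand usability barriers in KServe model deployment",
    primary_question=(
        "What usability barriers do users experience while deploying "
        "models using KServe?"
    ),
    sub_questions=[
        "Which tasks generate the most confusion?",
        "What documentation gaps exist?",
        "Which error messages appear repeatedly?",
        "How do developers solve issues?",
    ],
    scope={
        "repository": "kserve/kserve",  # assumed repository name
        "data": "issues and their comments",
        "time_period": "January to June 2026",
    },
)
print(plan.is_complete())
```

Keeping the plan in one place makes it easy to check later whether a collected record actually falls inside the scope.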

Step 2: Select the GitHub Data Sources

When conducting GitHub mining for usability research, it is important to understand that GitHub contains multiple sources of data, and each source provides a different perspective on the user experience (UX) and developer experience (DX). Instead of collecting information from everywhere, researchers should prioritize sources that help answer their research questions.

For example:

  • Issues help uncover user pain points and reveal where users struggle.
  • Pull Requests (PRs) show how problems are fixed and provide insight into engineering decision-making.
  • Discussions help researchers understand user expectations, questions, and community needs.
  • Commits reveal engineering priorities by showing what teams spend time improving.
  • Documentation helps evaluate communication quality and whether instructions are clear enough for users.
  • Labels provide structure by categorizing issues into themes or problem areas.
  • Comments often contain valuable insights into user emotions, frustrations, collaboration patterns, and real-world workarounds.

By combining these sources, researchers can build a more complete understanding of both technical challenges and usability challenges that users experience.
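If you plan to collect these sources programmatically, one possible starting point is to map each source to its GitHub REST API path and build only the URLs your research questions need. The sketch below makes no network calls. The helper name `source_urls` and the source keys are assumptions of mine; the paths are the standard repository endpoints as far as I know, so check them against the current GitHub documentation before relying on them.

```python
API_ROOT = "https://api.github.com"

# REST paths for some of the sources listed above.
# Note: the issues endpoint also returns pull requests, and repository
# Discussions are, to my knowledge, exposed through the GraphQL API instead.
SOURCE_PATHS = {
    "issues": "/repos/{owner}/{repo}/issues",
    "pull_requests": "/repos/{owner}/{repo}/pulls",
    "commits": "/repos/{owner}/{repo}/commits",
    "labels": "/repos/{owner}/{repo}/labels",
    "comments": "/repos/{owner}/{repo}/issues/comments",
}


def source_urls(owner, repo, sources):
    """Build the API URL for each selected source (hypothetical helper)."""
    unknown = [s for s in sources if s not in SOURCE_PATHS]
    if unknown:
        raise ValueError(f"Unsupported sources: {unknown}")
    return {
        s: API_ROOT + SOURCE_PATHS[s].format(owner=owner, repo=repo)
        for s in sources
    }


# Only request the sources that answer the research questions.
for name, url in source_urls("kserve", "kserve", ["issues", "comments"]).items():
    print(name, url)
```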

Step 3: Create Inclusion and Exclusion Criteria

When conducting GitHub mining for usability research, it is important to create clear inclusion and exclusion criteria before collecting data. This step makes your study systematic, transparent, and reproducible, which means another researcher could follow the same process and reach similar results. Inclusion criteria define what data should be included in the study, while exclusion criteria define what should be removed to avoid irrelevant or low-quality information.

For example, you may decide to include:

  • Open and closed issues because both contain valuable user feedback and problem history.
  • Deployment-related issues to understand deployment challenges.
  • User questions to identify confusion and unmet needs.
  • Documentation complaints to evaluate communication and onboarding quality.
  • Feature requests to understand user expectations and improvement opportunities.

At the same time, you should exclude:

  • Spam issues because they do not provide useful research insights.
  • Empty issues with no meaningful information.
  • Duplicate issues that repeat the same problem.
  • Pure backend implementation discussions that do not relate to user experience or usability goals.

To make your research process clear and repeatable, document:

  • Time range of the data collection.
  • Number of issues selected.
  • Selection logic used to choose the dataset.

For example:

Dataset Definition:
“Issues created between January and June 2026.”

Recording these decisions helps explain how the data was collected and increases the credibility and reliability of your research findings.
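Criteria like these are easier to reproduce when they are written as code that records why each item was dropped. Here is a minimal sketch. The sample issues, field names (`created`, `duplicate_of`), and the `spam` label are made up for illustration and do not come from a real repository.

```python
from datetime import date

# Hypothetical issue records; the field names are assumptions for this sketch.
issues = [
    {"id": 1, "title": "Deployment stuck", "body": "Pod never becomes ready",
     "created": date(2026, 2, 10), "labels": ["deployment"], "duplicate_of": None},
    {"id": 2, "title": "", "body": "",
     "created": date(2026, 3, 1), "labels": [], "duplicate_of": None},
    {"id": 3, "title": "Deployment stuck again", "body": "Same as #1",
     "created": date(2026, 3, 5), "labels": ["deployment"], "duplicate_of": 1},
    {"id": 4, "title": "Docs unclear", "body": "YAML example fails",
     "created": date(2025, 11, 2), "labels": ["documentation"], "duplicate_of": None},
    {"id": 5, "title": "Buy followers", "body": "cheap",
     "created": date(2026, 4, 1), "labels": ["spam"], "duplicate_of": None},
]

START, END = date(2026, 1, 1), date(2026, 6, 30)  # documented time range
EXCLUDED_LABELS = {"spam"}


def exclusion_reason(issue):
    """Return why an issue is excluded, or None if it meets the criteria."""
    if not (START <= issue["created"] <= END):
        return "outside time range"
    if not issue["body"].strip():
        return "empty"
    if issue["duplicate_of"] is not None:
        return "duplicate"
    if EXCLUDED_LABELS & set(issue["labels"]):
        return "spam"
    return None


selected = [i for i in issues if exclusion_reason(i) is None]
excluded = {i["id"]: exclusion_reason(i) for i in issues if exclusion_reason(i)}
print(f"Selected {len(selected)} of {len(issues)} issues")
print(excluded)
```

The `excluded` dictionary doubles as documentation of the selection logic: it shows how many issues were removed and for what reason.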

Step 4: Data Collection (Mining)

After defining your research questions and selecting your data sources, the next step is to collect the data in a structured way. A simple and effective approach is to create a spreadsheet where each row represents one GitHub issue or record. Organizing data in a spreadsheet makes analysis easier, helps identify patterns, and ensures that findings can be traced back to the original source.

Typical columns may include ID, Issue Title, Link, Date, Type, Description, Comments, and Labels.

During data collection, capture both metadata and content details.

Metadata includes information such as:

  • Issue number
  • Whether the issue is open or closed
  • Author
  • Any labels attached to the issue

Then collect the content itself by recording:

  • Issue description
  • Steps to reproduce the problem
  • Error messages
  • Resolution or final outcome (if available)

It is also important to analyze the conversation around the issue, including:

  • Maintainer responses
  • Community suggestions
  • Workarounds shared by users

These interactions often reveal usability challenges and show how users overcome problems.

For example, imagine a GitHub issue titled:

“Deployment stuck after InferenceService creation”

Instead of only recording the issue title, document:

  • User goal — what they were trying to achieve
  • Failure point — where the process broke
  • Fix — what solved the issue
  • Resolution time — how long it took to resolve

Capturing data this way transforms GitHub issues into research evidence that helps uncover user behavior, developer pain points, and opportunities to improve usability and developer experience (DX).
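As a rough illustration, one spreadsheet row can be modeled as a record that also derives the resolution time from the open and close dates. Everything specific here is invented for the example: the `IssueRecord` class, the issue number, the link, the dates, and the fix text.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields
from datetime import datetime


@dataclass
class IssueRecord:
    """One spreadsheet row (hypothetical column set)."""
    issue_id: int
    title: str
    link: str
    state: str
    opened_at: str
    closed_at: str
    user_goal: str
    failure_point: str
    fix: str

    def resolution_days(self):
        """Days between opening and closing, or None while still open."""
        if not self.closed_at:
            return None
        opened = datetime.fromisoformat(self.opened_at)
        closed = datetime.fromisoformat(self.closed_at)
        return (closed - opened).days


record = IssueRecord(
    issue_id=101,                           # made-up issue number
    title="Deployment stuck after InferenceService creation",
    link="https://example.com/issues/101",  # placeholder link
    state="closed",
    opened_at="2026-02-01",
    closed_at="2026-02-08",
    user_goal="Deploy a model",
    failure_point="InferenceService never becomes ready",
    fix="Corrected the YAML configuration",
)

# Write the row as CSV; an in-memory buffer stands in for a real file.
columns = [f.name for f in fields(IssueRecord)] + ["resolution_days"]
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=columns)
writer.writeheader()
writer.writerow({**asdict(record), "resolution_days": record.resolution_days()})
print(buffer.getvalue())
```

Keeping the link in every row is what lets a finding be traced back to the original issue later.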

Step 5: Clean and Prepare the Data

GitHub data is often messy and unstructured, so it needs to be cleaned before analysis. First, remove irrelevant data such as empty reports, duplicate issues, and non-user-related discussions, since these do not help in understanding usability problems. Cleaning the data ensures that only meaningful information is included in the study.

Next, normalize the data by standardizing how issues are written. This means converting different phrases that describe the same problem into a consistent format. For example, “deployment broken” can be normalized to “Deployment Failure”. This makes it easier to compare and analyze similar issues across the dataset.

After normalization, create clear categories or themes to group similar issues together. For example, different original texts like “Deployment failed,” “Cannot start,” and “Confusing YAML” can be grouped under standard themes such as Deployment or Configuration. This helps researchers identify patterns, understand common problem areas, and analyze usability issues more effectively.
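Normalization and theming can start as a simple phrase lookup before moving to anything more sophisticated. The sketch below assumes a hand-written rule table. The rules, the `Startup Failure` and `YAML Confusion` labels, and the fallback values are my own choices for illustration.

```python
# Hypothetical rules: (phrase to look for, normalized label, theme).
NORMALIZATION_RULES = [
    ("deployment broken", "Deployment Failure", "Deployment"),
    ("deployment failed", "Deployment Failure", "Deployment"),
    ("cannot start", "Startup Failure", "Deployment"),
    ("confusing yaml", "YAML Confusion", "Configuration"),
]


def normalize(text):
    """Map free-form issue text to a (normalized label, theme) pair."""
    lowered = text.lower()
    for phrase, label, theme in NORMALIZATION_RULES:
        if phrase in lowered:
            return label, theme
    # Anything unmatched is kept visible so it can be reviewed by hand.
    return "Unclassified", "Other"


for raw in ["Deployment broken", "Deployment failed", "Cannot start", "Confusing YAML"]:
    print(raw, "->", normalize(raw))
```

A lookup like this will miss phrasing it has not seen, so the "Unclassified" rows still need a manual pass.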

Step 6: Qualitative Analysis (The Most Important Step)

This is where UX research actually begins. At this stage, you read GitHub issues carefully, line by line, and start applying a process called coding, which helps turn raw text into meaningful insights.

The first step is open coding, where you highlight important observations from each issue. For example, if an issue says, “I followed docs but model never becomes ready,” you break it into simple codes like documentation confusion, deployment failure, and missing feedback. These codes describe what is really happening in the user’s experience.

Next is axial coding, where you group similar codes together to find patterns. For example, codes like missing instructions, YAML confusion, and setup unclear can all be grouped under the theme Documentation. This step helps organize individual problems into broader categories.

Finally, you do selective coding, where you connect themes to form deeper insights. For example, under the theme Documentation, you might conclude that users frequently struggle during deployment because the setup instructions do not match real cluster behavior. This final step transforms raw GitHub issues into clear UX insights that explain user problems and system gaps.
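The reading and judgment in coding are manual, but the bookkeeping can be scripted. The sketch below assumes the open codes have already been assigned by hand and shows how a codebook groups them into themes (the axial step). The issue identifiers, the codebook, and the `System Feedback` theme are illustrative assumptions.

```python
from collections import Counter, defaultdict

# Open coding: codes assigned by hand while reading each issue (made-up IDs).
open_codes = {
    "issue-a": ["documentation confusion", "deployment failure", "missing feedback"],
    "issue-b": ["missing instructions", "YAML confusion"],
    "issue-c": ["setup unclear", "deployment failure"],
}

# Axial coding: a hypothetical codebook that maps each code to a theme.
CODE_TO_THEME = {
    "documentation confusion": "Documentation",
    "missing instructions": "Documentation",
    "YAML confusion": "Documentation",
    "setup unclear": "Documentation",
    "deployment failure": "Deployment",
    "missing feedback": "System Feedback",
}

# Collect the distinct issues behind each theme.
theme_issues = defaultdict(set)
for issue_id, codes in open_codes.items():
    for code in codes:
        theme_issues[CODE_TO_THEME[code]].add(issue_id)

theme_counts = Counter({theme: len(ids) for theme, ids in theme_issues.items()})
for theme, count in theme_counts.most_common():
    print(f"{theme}: {count} issue(s)")
```

Counting distinct issues per theme, rather than raw codes, avoids inflating a theme just because one issue received several related codes. Selective coding, the step that connects themes into an insight, remains an interpretive task for the researcher.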

Step 7: Quantitative Analysis

To understand usability issues in GitHub mining research, it is important to measure patterns in the data. This helps researchers move from individual issues to broader insights about user experience (UX) and system behavior. Different metrics are used to analyze the data in a structured way.

For example, the number of issues shows the overall volume of problems, while resolution time indicates how much effort or friction is needed to fix them. The comment count helps measure complexity, since more discussion often means more complicated issues. Label frequency highlights common problem areas or hotspots, and repeat reports show how severe or recurring a problem is.

After collecting these metrics, researchers can create simple visualizations to understand trends more clearly. For example, they can analyze issues by category, track issue trends over time, measure the average time to close issues, and identify the top recurring pain points.

For instance, after analyzing the data, you might find that most issues fall into a few key areas such as Deployment (42%), Documentation (28%), Configuration (20%), and Other (10%). This kind of breakdown helps researchers clearly see where users struggle the most and where improvements are needed.
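A breakdown like this can be computed directly from the cleaned records. The following sketch uses a tiny invented dataset, so its numbers will not match the percentages above. The field names and the two helper functions are assumptions for illustration.

```python
from collections import Counter
from statistics import mean

# Invented records; resolution_days is None for issues that are still open.
records = [
    {"theme": "Deployment", "resolution_days": 7, "comments": 12},
    {"theme": "Deployment", "resolution_days": 3, "comments": 4},
    {"theme": "Documentation", "resolution_days": 2, "comments": 3},
    {"theme": "Configuration", "resolution_days": None, "comments": 6},
]


def theme_shares(rows):
    """Percentage of issues per theme, rounded to one decimal place."""
    counts = Counter(r["theme"] for r in rows)
    total = sum(counts.values())
    return {theme: round(100 * n / total, 1) for theme, n in counts.items()}


def average_resolution_days(rows):
    """Mean time to close, ignoring issues that are still open."""
    closed = [r["resolution_days"] for r in rows if r["resolution_days"] is not None]
    return mean(closed) if closed else None


print(theme_shares(records))
print(average_resolution_days(records))
print(mean(r["comments"] for r in records))
```

Open issues are left out of the average on purpose; counting them as zero days would make resolution look faster than it is.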

Step 8: Convert Findings into UX Insights

In UX research, especially in GitHub mining, it is important to move beyond simple counts like “20 users reported errors.” Numbers alone do not explain the real problem. Instead, the focus should be on understanding why users are struggling and what it means for their experience. For example, instead of just reporting issues, we can say: “Users struggle because system feedback is unclear.”

To do this, researchers follow a simple thinking process: Observation → Pattern → Insight → Recommendation. First, you make an observation by looking at raw data, such as issues or comments. Then you identify a pattern by grouping similar observations together. After that, you form an insight, which explains what the pattern means in terms of user behavior. Finally, you create a recommendation that suggests how to improve the system.

For example, an observation might be that many users ask the same deployment question. The pattern could be identified as documentation gaps. From this, the insight becomes that users cannot predict system behavior during deployment. The final recommendation would be to improve the deployment guide so users have clearer instructions and better understanding.


Reporting the Findings

Reporting findings in a GitHub mining UX research report is not just about listing issues—it’s about turning raw GitHub data into clear, evidence-based insights and actionable recommendations.

A good findings section should answer:

  • What is happening?
  • Why is it happening?
  • Why does it matter?
  • What should be done about it?

The sections below explain the entire process of reporting research findings.

1. Structure of the Findings & How to Write Each Finding (Step-by-Step Template)

When writing each finding in GitHub mining research, it is helpful to follow a clear step-by-step structure. First, start by naming the theme clearly. The theme should describe a real UX problem, not just a GitHub issue name or generic label. For example, instead of “Issue 23” or “Bug problems,” use meaningful titles like “Deployment Configuration Confusion” or “Lack of Clear Error Messages during InferenceService Setup”. This helps readers immediately understand the user experience problem.

Next, describe what you found at a high level. This is a short summary of the main issue, such as users struggling with model deployment in KServe, especially during YAML configuration and InferenceService creation. This gives an overall picture of the problem before going into details.

After that, add strong evidence from GitHub data, such as issue numbers and short user quotes. For example, users might say “I followed the documentation but the InferenceService never becomes ready” or “The YAML example does not work in my cluster.” This evidence can include direct quotes, issue references, or summarized patterns from multiple users.

Then, identify the pattern by combining similar observations. For example, multiple issues may show that users struggle with YAML configuration and unclear deployment steps. This helps turn individual data points into a broader trend.

After identifying the pattern, explain the impact on users. This describes how the issue affects their experience, such as failed deployments, repeated retries, increased onboarding time, confusion, frustration, or even abandonment of the tool.

The next and most important step is writing the insight, which explains why the problem is happening. For example, users may not receive clear feedback from the system during deployment, making it hard to know whether errors come from configuration or system issues. This step goes beyond the data and provides interpretation.

Finally, add a clear and actionable recommendation to improve the system. For example, improving documentation with validated YAML examples and adding real-time status messages for InferenceService creation can help users understand and resolve issues more easily.

2. Full Example of a Well-Written Finding

Finding 1: Deployment Configuration Confusion

Description: Many users experienced difficulties deploying models using KServe, particularly during YAML configuration and InferenceService setup.

Evidence:

  • “The InferenceService never becomes ready after applying YAML.” (Issue #1245)
  • “Example YAML does not work in my Kubernetes cluster.” (Issue #1310)
  • “I am not sure what fields are required.” (Issue #1288)

Pattern: Multiple users reported inconsistent or unclear YAML configuration requirements during deployment.

Impact: This results in failed deployments, repeated debugging attempts, and delays in onboarding new users.

Insight: The system does not provide sufficient validation or feedback during deployment, leading to uncertainty in diagnosing configuration issues.

Recommendation:

  • Provide validated, version-specific YAML templates.
  • Add real-time deployment status and error messages.
  • Improve documentation with a step-by-step deployment flow.
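If you write several findings, a small template keeps them consistent. The sketch below models the structure used in the example finding and renders it as plain text. The `Finding` class and its `render` method are hypothetical names of mine; the content is taken from the example above.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """Hypothetical template mirroring the finding structure described above."""
    theme: str
    description: str
    evidence: list[tuple[str, str]]  # (user quote, issue reference)
    pattern: str
    impact: str
    insight: str
    recommendations: list[str]

    def render(self, number):
        lines = [
            f"Finding {number}: {self.theme}",
            f"Description: {self.description}",
            "Evidence:",
        ]
        lines += [f'  "{quote}" ({ref})' for quote, ref in self.evidence]
        lines += [
            f"Pattern: {self.pattern}",
            f"Impact: {self.impact}",
            f"Insight: {self.insight}",
            "Recommendations:",
        ]
        lines += [f"  - {item}" for item in self.recommendations]
        return "\n".join(lines)


finding = Finding(
    theme="Deployment Configuration Confusion",
    description="Many users experienced difficulties deploying models using KServe.",
    evidence=[
        ("The InferenceService never becomes ready after applying YAML.", "Issue #1245"),
        ("Example YAML does not work in my Kubernetes cluster.", "Issue #1310"),
        ("I am not sure what fields are required.", "Issue #1288"),
    ],
    pattern="Unclear YAML configuration requirements during deployment.",
    impact="Failed deployments, repeated debugging, and slower onboarding.",
    insight="The system gives too little validation or feedback during deployment.",
    recommendations=[
        "Provide validated, version-specific YAML templates",
        "Add real-time deployment status and error messages",
        "Improve documentation with step-by-step deployment flow",
    ],
)
report = finding.render(1)
print(report)
```

Because every field is required, a finding without evidence or without a recommendation cannot be created by accident.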

3. How Many Findings Should You Include?

For a strong GitHub mining research report, it is better to focus on a small number of meaningful themes instead of too many small issues. Usually, 3 to 6 major themes is ideal because it keeps the analysis clear, focused, and easy to understand. If you include too many themes or individual issues, the report can become confusing and lose its main message.

Common examples of good themes include areas like deployment issues, documentation gaps, error handling problems, configuration complexity, and performance concerns. These themes represent repeated patterns across many GitHub issues and help explain the main usability and developer experience challenges in a structured way.

4. Visual Ways to Strengthen Findings

To make patterns in GitHub mining research easy to see, present the findings with visual summaries. You can include a theme frequency chart, which shows how often each problem appears in the data, such as Deployment issues (40%), Documentation (25%), and Errors (20%). This helps readers quickly identify which areas users struggle with the most.

You can also include an issue timeline, which shows when problems occur most frequently and helps reveal spikes or trends over time. Another useful element is quote highlights, where you include 2–3 strong user quotes for each theme to support your findings and give real evidence of user experience.

Together, these elements make your analysis clearer, more visual, and easier to understand.
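Before reaching for a charting library, a quick text version of the frequency chart and the timeline is often enough to spot the pattern. This is a minimal sketch: the theme percentages are the ones mentioned above, while the issue dates and both helper functions are invented for illustration.

```python
from collections import Counter

theme_share = {"Deployment": 40, "Documentation": 25, "Errors": 20}

# Invented issue creation dates in ISO format.
created_dates = ["2026-01-14", "2026-02-03", "2026-02-21", "2026-02-27", "2026-03-09"]


def bar_chart(shares, width=40):
    """Render one text bar per theme, largest first."""
    lines = []
    for theme, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
        bar = "#" * round(width * pct / 100)
        lines.append(f"{theme:<15}{bar} {pct}%")
    return lines


def monthly_timeline(dates):
    """Count issues per month (YYYY-MM), in chronological order."""
    return dict(sorted(Counter(d[:7] for d in dates).items()))


print("\n".join(bar_chart(theme_share)))
for month, count in monthly_timeline(created_dates).items():
    print(month, "#" * count)
```

In this made-up timeline, February stands out, which is the kind of spike worth checking against release dates or documentation changes.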


Conclusion

In this article, I explained the complete step-by-step process of GitHub mining for UX research, from defining research questions to converting raw GitHub data into meaningful insights. We saw how issues, pull requests, and documentation can be used to understand real user and developer experiences, and how structured methods like coding, categorization, and pattern analysis help transform messy data into clear findings. The goal of this process is not just to analyze issues, but to understand why users struggle and how systems can be improved. By following this approach, researchers and engineers can identify real usability problems, improve developer experience, and build better open-source tools through evidence-based insights.