In my previous article, I wrote about the GitHub mining process in user research and explained how it can benefit engineers, organizations, and open-source communities. Here is the link:
In this article, I am going to explain the step-by-step process for conducting the research and creating a report.
Step 1: Define Research Questions
Before starting GitHub mining research, first define the problem you want to understand. Ask yourself: What challenge am I trying to investigate? This step helps make the research focused and prevents collecting unnecessary data. For example, in a project like KServe, you may want to understand why users struggle to deploy models, what usability issues make onboarding difficult, which workflows create the most developer frustration, or what causes deployment failures. These questions help identify the user experience (UX) and developer experience (DX) problems that need deeper investigation.
After identifying the problem area, create clear research questions to guide the study. Start with one primary research question that represents the main goal of the research.
Example Primary Research Question:
“What usability barriers do users experience while deploying models using KServe?”
Then create several sub-questions to explore specific areas in more detail, such as:
- Which tasks generate the most confusion?
- What documentation gaps exist?
- Which error messages appear repeatedly?
- How do developers solve issues?
These smaller questions help organize data collection and analysis.
At the end of this stage, your output should include three things:
- Research Objective — explains the overall purpose of the study.
- Research Questions — define what you want to investigate.
- Scope — sets boundaries for the research by describing what data, users, repositories, workflows, or time period will be included.
Having these clearly defined creates a strong foundation before moving into GitHub data collection and analysis.
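One way to keep these three outputs together is to record them as a small data structure before any data is collected. The sketch below is only an illustration: the `ResearchPlan` class, its field names, and the `kserve/kserve` repository slug are assumptions I introduce here, and the time period simply reuses the example range from Step 3.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class ResearchPlan:
    """Hypothetical container for the three outputs of Step 1."""
    objective: str                      # Research Objective
    primary_question: str               # main Research Question
    sub_questions: List[str] = field(default_factory=list)
    repositories: List[str] = field(default_factory=list)  # Scope: repositories
    start: Optional[date] = None        # Scope: start of the time period
    end: Optional[date] = None          # Scope: end of the time period

    def summary(self) -> str:
        """Return a short plain-text summary of the plan."""
        period = f"{self.start} to {self.end}" if self.start and self.end else "unbounded"
        return (
            f"Objective: {self.objective}\n"
            f"Primary question: {self.primary_question}\n"
            f"Sub-questions: {len(self.sub_questions)}\n"
            f"Scope: {', '.join(self.repositories)} ({period})"
        )


plan = ResearchPlan(
    objective="Identify usability barriers in KServe model deployment",
    primary_question=(
        "What usability barriers do users experience while deploying "
        "models using KServe?"
    ),
    sub_questions=[
        "Which tasks generate the most confusion?",
        "What documentation gaps exist?",
        "Which error messages appear repeatedly?",
        "How do developers solve issues?",
    ],
    repositories=["kserve/kserve"],     # assumed repository slug
    start=date(2026, 1, 1),
    end=date(2026, 6, 30),
)

print(plan.summary())
```

Writing the plan down in this form makes it easy to paste the same objective, questions, and scope into the final report.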
Step 2: Select the GitHub Data Sources
When conducting GitHub mining for usability research, it is important to understand that GitHub contains multiple sources of data, and each source provides a different perspective on the user experience (UX) and developer experience (DX). Instead of collecting information from everywhere, researchers should prioritize sources that help answer their research questions.
For example:
- Issues help uncover user pain points and reveal where users struggle.
- Pull Requests (PRs) show how problems are fixed and provide insight into engineering decision-making.
- Discussions help researchers understand user expectations, questions, and community needs.
- Commits reveal engineering priorities by showing what teams spend time improving.
- Documentation helps evaluate communication quality and whether instructions are clear enough for users.
- Labels provide structure by categorizing issues into themes or problem areas.
- Comments often contain valuable insights into user emotions, frustrations, collaboration patterns, and real-world workarounds.
By combining these sources, researchers can build a more complete understanding of both technical challenges and usability challenges that users experience.
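If you plan to collect the data programmatically, it can help to map each chosen source to the GitHub REST API path that serves it. The following is a minimal sketch, not a complete client: the helper names `build_url` and `urls_for` and the `example-org/example-repo` repository are hypothetical, and Discussions are left out because repository discussions are served by the GraphQL API rather than by these REST paths.

```python
API_ROOT = "https://api.github.com"

# REST API path templates for the data sources listed above.
# Note: the issues endpoint also returns pull requests, because the
# REST API treats every pull request as an issue.
SOURCE_ENDPOINTS = {
    "issues": "/repos/{owner}/{repo}/issues",
    "pull_requests": "/repos/{owner}/{repo}/pulls",
    "commits": "/repos/{owner}/{repo}/commits",
    "labels": "/repos/{owner}/{repo}/labels",
    "issue_comments": "/repos/{owner}/{repo}/issues/comments",
    "readme": "/repos/{owner}/{repo}/readme",
}


def build_url(source: str, owner: str, repo: str) -> str:
    """Return the REST URL for one data source of one repository."""
    return API_ROOT + SOURCE_ENDPOINTS[source].format(owner=owner, repo=repo)


def urls_for(sources, owner: str, repo: str) -> dict:
    """Build URLs only for the sources that answer the research questions."""
    return {source: build_url(source, owner, repo) for source in sources}


# Hypothetical repository; only three sources are prioritized here.
selected = urls_for(["issues", "issue_comments", "labels"],
                    "example-org", "example-repo")
for source, url in selected.items():
    print(f"{source}: {url}")
```

Listing only the prioritized sources keeps the collection step aligned with the research questions instead of pulling everything.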
Step 3: Create Inclusion and Exclusion Criteria
Before collecting any data, define clear inclusion and exclusion criteria. This step makes your study systematic, transparent, and reproducible, which means another researcher could follow the same process and reach similar results. Inclusion criteria define what data should be included in the study, while exclusion criteria define what should be removed to avoid irrelevant or low-quality information.
For example, you may decide to include:
- Open and closed issues because both contain valuable user feedback and problem history.
- Deployment-related issues to understand deployment challenges.
- User questions to identify confusion and unmet needs.
- Documentation complaints to evaluate communication and onboarding quality.
- Feature requests to understand user expectations and improvement opportunities.
At the same time, you should exclude:
- Spam issues because they do not provide useful research insights.
- Empty issues with no meaningful information.
- Duplicate issues that repeat the same problem.
- Pure backend implementation discussions that do not relate to user experience or usability goals.
To make your research process clear and repeatable, document:
- Time range of the data collection.
- Number of issues selected.
- Selection logic used to choose the dataset.
For example:
Dataset Definition:
“Issues created between January and June 2026.”
Recording these decisions helps explain how the data was collected and increases the credibility and reliability of your research findings.
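The criteria above can also be written as an explicit filter, so the selection logic is documented in the same place it is applied. This is a sketch under stated assumptions: the sample issues are invented, the excluded label names (`spam`, `duplicate`, `invalid`) are guesses that will differ per repository, and issues are simplified to dictionaries with a list of label strings.

```python
from datetime import date

# Dataset definition from the example: issues created January to June 2026.
START, END = date(2026, 1, 1), date(2026, 6, 30)
EXCLUDED_LABELS = {"spam", "duplicate", "invalid"}  # assumed label names


def is_included(issue: dict) -> bool:
    """Apply the inclusion and exclusion criteria to one issue."""
    created = date.fromisoformat(issue["created_at"][:10])
    if not (START <= created <= END):
        return False                      # outside the time range
    if not (issue.get("body") or "").strip():
        return False                      # empty issue
    labels = {name.lower() for name in issue.get("labels", [])}
    return not (labels & EXCLUDED_LABELS)  # spam or duplicate


# Invented sample data for illustration only.
issues = [
    {"number": 1, "created_at": "2026-02-14T09:30:00Z",
     "body": "InferenceService never becomes ready", "labels": ["bug"]},
    {"number": 2, "created_at": "2025-12-30T12:00:00Z",
     "body": "Old report", "labels": []},
    {"number": 3, "created_at": "2026-03-01T08:00:00Z",
     "body": "   ", "labels": []},
    {"number": 4, "created_at": "2026-04-10T15:45:00Z",
     "body": "Same as another report", "labels": ["Duplicate"]},
    {"number": 5, "created_at": "2026-05-02T11:20:00Z",
     "body": "The deployment guide skips a step", "labels": ["documentation"]},
]

selected = [issue for issue in issues if is_included(issue)]

# Record the decisions so the dataset can be reproduced later.
dataset_definition = {
    "time_range": f"{START} to {END}",
    "issues_selected": len(selected),
    "selection_logic": "created in range, non-empty body, no excluded label",
}
print(dataset_definition)
```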
Step 4: Data Collection (Mining)
After defining your research questions and selecting your data sources, the next step is to collect the data in a structured way. A simple and effective approach is to create a spreadsheet where each row represents one GitHub issue or record. Organizing data in a spreadsheet makes analysis easier, helps identify patterns, and ensures that findings can be traced back to the original source.
Typical columns may include ID, Issue Title, Link, Date, Type, Description, Comments, and Labels.
During data collection, capture both metadata and content details.
Metadata includes information such as:
- Issue number
- Whether the issue is open or closed
- Author
- Any labels attached to the issue
Then collect the content itself by recording:
- Issue description
- Steps to reproduce the problem
- Error messages
- Resolution or final outcome (if available)
It is also important to analyze the conversation around the issue, including:
- Maintainer responses
- Community suggestions
- Workarounds shared by users
These interactions often reveal usability challenges and show how users overcome problems.
For example, imagine a GitHub issue titled:
“Deployment stuck after InferenceService creation”
Instead of only recording the issue title, document:
- User goal — what they were trying to achieve
- Failure point — where the process broke
- Fix — what solved the issue
- Resolution time — how long it took to resolve
Capturing data this way transforms GitHub issues into research evidence that helps uncover user behavior, developer pain points, and opportunities to improve usability and developer experience (DX).
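As a rough illustration of the spreadsheet approach, the sketch below turns one issue record into a row and writes it as CSV. The field names (`number`, `title`, `html_url`, `created_at`, `closed_at`, `state`, `user.login`, `labels[].name`, `comments`, `body`) follow the shape of issue objects returned by the GitHub REST API; the sample issue itself, the extra columns, and the `to_row` helper are assumptions made for this example. Research fields such as user goal, failure point, and fix still need to be filled in by a human reader.

```python
import csv
import io
from datetime import datetime

COLUMNS = ["ID", "Issue Title", "Link", "Date", "Type", "State", "Author",
           "Labels", "Comments", "Resolution Days", "Description"]


def parse_ts(value: str) -> datetime:
    """Parse a GitHub timestamp such as 2026-02-01T10:00:00Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def to_row(issue: dict) -> dict:
    """Flatten one issue record into a spreadsheet row."""
    created = parse_ts(issue["created_at"])
    closed = parse_ts(issue["closed_at"]) if issue.get("closed_at") else None
    return {
        "ID": issue["number"],
        "Issue Title": issue["title"],
        "Link": issue["html_url"],
        "Date": created.date().isoformat(),
        # Pull requests returned by the issues endpoint carry this key.
        "Type": "pull request" if "pull_request" in issue else "issue",
        "State": issue["state"],
        "Author": issue["user"]["login"],
        "Labels": ";".join(label["name"] for label in issue["labels"]),
        "Comments": issue["comments"],
        "Resolution Days": (closed - created).days if closed else "",
        "Description": (issue.get("body") or "").strip(),
    }


# Invented sample record for illustration only.
sample_issue = {
    "number": 1,
    "title": "Deployment stuck after InferenceService creation",
    "html_url": "https://github.com/example-org/example-repo/issues/1",
    "created_at": "2026-02-01T10:00:00Z",
    "closed_at": "2026-02-04T16:30:00Z",
    "state": "closed",
    "user": {"login": "example-user"},
    "labels": [{"name": "bug"}, {"name": "deployment"}],
    "comments": 6,
    "body": "I followed docs but model never becomes ready.\n",
}

row = to_row(sample_issue)
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
writer.writeheader()
writer.writerow(row)
csv_text = buffer.getvalue()
print(csv_text)
```

Keeping the link in every row is what lets each later finding be traced back to its original source.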
Step 5: Clean and Prepare the Data
GitHub data is often messy and unstructured, so it needs to be cleaned before analysis. First, remove irrelevant data such as empty reports, duplicate issues, and non-user-related discussions, since these do not help in understanding usability problems. Cleaning the data ensures that only meaningful information is included in the study.
Next, normalize the data by standardizing how issues are written. This means converting different phrases that describe the same problem into a consistent format. For example, “deployment broken” can be normalized to “Deployment Failure”. This makes it easier to compare and analyze similar issues across the dataset.
After normalization, create clear categories or themes to group similar issues together. For example, different original texts like “Deployment failed,” “Cannot start,” and “Confusing YAML” can be grouped under standard themes such as Deployment or Configuration. This helps researchers identify patterns, understand common problem areas, and analyze usability issues more effectively.
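The cleaning, normalization, and grouping steps can be sketched in a few lines. The phrase-to-label and label-to-theme mappings below are assumptions built from the examples in this step; in a real study they come from your own reading of the data and will grow as you go.

```python
import re

# Assumed mappings for illustration; extend them from your own dataset.
NORMALIZED = {
    "deployment broken": "Deployment Failure",
    "deployment failed": "Deployment Failure",
    "cannot start": "Deployment Failure",
    "confusing yaml": "YAML Confusion",
}
THEMES = {
    "Deployment Failure": "Deployment",
    "YAML Confusion": "Configuration",
}


def clean(text: str) -> str:
    """Collapse whitespace, lowercase, and drop trailing punctuation."""
    return re.sub(r"\s+", " ", text).strip().lower().rstrip(".!?")


def prepare(raw_texts):
    """Remove empty and duplicate entries, then normalize and categorize."""
    seen, rows = set(), []
    for text in raw_texts:
        key = clean(text)
        if not key or key in seen:
            continue                      # empty report or duplicate
        seen.add(key)
        label = NORMALIZED.get(key, "Uncategorized")
        rows.append({
            "original": text.strip(),
            "normalized": label,
            "theme": THEMES.get(label, "Other"),
        })
    return rows


raw = ["deployment broken", "Deployment failed", "  deployment   FAILED. ",
       "Cannot start", "Confusing YAML", ""]
rows = prepare(raw)
for r in rows:
    print(f"{r['original']!r} -> {r['normalized']} -> {r['theme']}")
```

Anything that lands in "Uncategorized" is a prompt to revisit the mapping, not something to discard.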
Step 6: Qualitative Analysis (The Most Important Step)
This is where UX research actually begins. At this stage, you read GitHub issues carefully, line by line, and start applying a process called coding, which helps turn raw text into meaningful insights.
The first step is open coding, where you highlight important observations from each issue. For example, if an issue says, “I followed docs but model never becomes ready,” you break it into simple codes like documentation confusion, deployment failure, and missing feedback. These codes describe what is really happening in the user’s experience.
Next is axial coding, where you group similar codes together to find patterns. For example, codes like missing instructions, YAML confusion, and setup unclear can all be grouped under the theme Documentation. This step helps organize individual problems into broader categories.
Finally, you do selective coding, where you connect themes to form deeper insights. For example, under the theme Documentation, you might conclude that users frequently struggle during deployment because the setup instructions do not match real cluster behavior. This final step transforms raw GitHub issues into clear UX insights that explain user problems and system gaps.
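Coding is a human judgment task, but the bookkeeping around it can be kept in simple structures. The sketch below assumes you have already assigned open codes by hand; the issue identifiers, the "System Feedback" theme, and the code-to-theme mapping are invented for illustration. It groups codes into themes (axial coding) and then lists issues where two themes co-occur, which is one possible input to selective coding.

```python
from collections import defaultdict

# Open coding: codes a researcher assigned to each issue (invented examples).
open_codes = {
    "issue-a": ["documentation confusion", "deployment failure", "missing feedback"],
    "issue-b": ["missing instructions", "YAML confusion"],
    "issue-c": ["setup unclear", "deployment failure"],
}

# Axial coding: the researcher's mapping from codes to broader themes.
code_to_theme = {
    "documentation confusion": "Documentation",
    "missing instructions": "Documentation",
    "YAML confusion": "Documentation",
    "setup unclear": "Documentation",
    "deployment failure": "Deployment",
    "missing feedback": "System Feedback",
}


def group_by_theme(codes_by_issue: dict, mapping: dict) -> dict:
    """Return each theme with the sorted list of issues that touch it."""
    themes = defaultdict(set)
    for issue_id, codes in codes_by_issue.items():
        for code in codes:
            themes[mapping.get(code, "Unassigned")].add(issue_id)
    return {theme: sorted(ids) for theme, ids in themes.items()}


themes = group_by_theme(open_codes, code_to_theme)

# Input for selective coding: issues where two themes appear together.
overlap = sorted(set(themes["Documentation"]) & set(themes["Deployment"]))

print(themes)
print("Documentation and Deployment co-occur in:", overlap)
```

A co-occurrence like this does not prove the insight on its own; it points to the issues worth rereading before you write one.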
Step 7: Quantitative Analysis
To understand usability issues in GitHub mining research, it is important to measure patterns in the data. This helps researchers move from individual issues to broader insights about user experience (UX) and system behavior. Different metrics are used to analyze the data in a structured way.
For example, the number of issues shows the overall volume of problems, while resolution time indicates how much effort or friction is needed to fix them. The comment count helps measure complexity, since more discussion often means more complicated issues. Label frequency highlights common problem areas or hotspots, and repeat reports show how severe or recurring a problem is.
After collecting these metrics, researchers can create simple visualizations to understand trends more clearly. For example, they can analyze issues by category, track issue trends over time, measure the average time to close issues, and identify the top recurring pain points.
For instance, after analyzing the data, you might find that most issues fall into a few key areas such as Deployment (42%), Documentation (28%), Configuration (20%), and Other (10%). This kind of breakdown helps researchers clearly see where users struggle the most and where improvements are needed.
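A breakdown like this can be computed directly from the cleaned spreadsheet. The sketch below uses five invented rows, so its percentages are not the ones quoted above; the row fields (`theme`, `created`, `closed`, `comments`) and the function names are assumptions for this example.

```python
from collections import Counter
from datetime import date
from statistics import mean

# Invented rows for illustration; "closed" is None for open issues.
rows = [
    {"theme": "Deployment", "created": "2026-01-05", "closed": "2026-01-09", "comments": 7},
    {"theme": "Deployment", "created": "2026-02-01", "closed": "2026-02-11", "comments": 12},
    {"theme": "Documentation", "created": "2026-02-03", "closed": "2026-02-04", "comments": 2},
    {"theme": "Documentation", "created": "2026-03-10", "closed": None, "comments": 4},
    {"theme": "Configuration", "created": "2026-04-01", "closed": "2026-04-06", "comments": 5},
]


def theme_share(data) -> dict:
    """Percentage of issues per theme, most frequent first."""
    counts = Counter(r["theme"] for r in data)
    total = sum(counts.values())
    return {theme: round(100 * n / total) for theme, n in counts.most_common()}


def mean_days_to_close(data):
    """Average resolution time in days, ignoring issues that are still open."""
    durations = [
        (date.fromisoformat(r["closed"]) - date.fromisoformat(r["created"])).days
        for r in data if r["closed"]
    ]
    return mean(durations) if durations else None


def mean_comments(data) -> float:
    """Average comment count, a rough proxy for issue complexity."""
    return mean(r["comments"] for r in data)


print(theme_share(rows))
print(mean_days_to_close(rows))
print(mean_comments(rows))
```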
Step 8: Convert Findings into UX Insights
In UX research, especially in GitHub mining, it is important to move beyond simple counts like “20 users reported errors.” Numbers alone do not explain the real problem. Instead, the focus should be on understanding why users are struggling and what it means for their experience. For example, instead of just reporting issues, we can say: “Users struggle because system feedback is unclear.”
To do this, researchers follow a simple thinking process: Observation → Pattern → Insight → Recommendation. First, you make an observation by looking at raw data, such as issues or comments. Then you identify a pattern by grouping similar observations together. After that, you form an insight, which explains what the pattern means in terms of user behavior. Finally, you create a recommendation that suggests how to improve the system.
For example, an observation might be that many users ask the same deployment question. The pattern could be identified as documentation gaps. From this, the insight becomes that users cannot predict system behavior during deployment. The final recommendation would be to improve the deployment guide so users have clearer instructions and better understanding.
Report the Findings
Reporting findings in a GitHub mining UX research report is not just about listing issues—it’s about turning raw GitHub data into clear, evidence-based insights and actionable recommendations.
A good findings section should answer:
- What is happening?
- Why is it happening?
- Why does it matter?
- What should be done about it?
I am going to explain the entire process of reporting research findings.
1. Structure of the Findings & How to Write Each Finding (Step-by-Step Template)
When writing each finding in GitHub mining research, it is helpful to follow a clear step-by-step structure. First, start by naming the theme clearly. The theme should describe a real UX problem, not just a GitHub issue name or generic label. For example, instead of “Issue 23” or “Bug problems,” use meaningful titles like “Deployment Configuration Confusion” or “Lack of Clear Error Messages during InferenceService Setup”. This helps readers immediately understand the user experience problem.
Next, describe what you found at a high level. This is a short summary of the main issue, such as users struggling with model deployment in KServe, especially during YAML configuration and InferenceService creation. This gives an overall picture of the problem before going into details.
After that, add strong evidence from GitHub data, such as issue numbers and short user quotes. For example, users might say “I followed the documentation but the InferenceService never becomes ready” or “The YAML example does not work in my cluster.” This evidence can include direct quotes, issue references, or summarized patterns from multiple users.
Then, identify the pattern by combining similar observations. For example, multiple issues may show that users struggle with YAML configuration and unclear deployment steps. This helps turn individual data points into a broader trend.
After identifying the pattern, explain the impact on users. This describes how the issue affects their experience, such as failed deployments, repeated retries, increased onboarding time, confusion, frustration, or even abandonment of the tool.
The next and most important step is writing the insight, which explains why the problem is happening. For example, users may not receive clear feedback from the system during deployment, making it hard to know whether errors come from configuration or system issues. This step goes beyond the data and provides interpretation.
Finally, add a clear and actionable recommendation to improve the system. For example, improving documentation with validated YAML examples and adding real-time status messages for InferenceService creation can help users understand and resolve issues more easily.
2. Full Example of a Well-Written Finding
Finding 1: Deployment Configuration Confusion
Description: Many users experienced difficulties deploying models using KServe, particularly during YAML configuration and InferenceService setup.
Evidence:
“The InferenceService never becomes ready after applying YAML.” — Issue #1245
“Example YAML does not work in my Kubernetes cluster.” — Issue #1310
“I am not sure what fields are required.” — Issue #1288
Pattern: Multiple users reported inconsistent or unclear YAML configuration requirements during deployment.
Impact: This results in failed deployments, repeated debugging attempts, and delays in onboarding new users.
Insight: The system does not provide sufficient validation or feedback during deployment, leading to uncertainty in diagnosing configuration issues.
Recommendation:
- Provide validated, version-specific YAML templates.
- Add real-time deployment status and error messages.
- Improve documentation with a step-by-step deployment flow.
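If you write several findings, keeping them in one structure helps ensure that none of the template parts is skipped. The sketch below is one possible way to do that; the `Finding` class and its `render` and `is_complete` methods are hypothetical helpers, and the content is copied from the example finding in this section.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Finding:
    """Hypothetical record holding every part of the finding template."""
    title: str
    description: str
    evidence: List[Tuple[str, str]]   # (quote, source reference)
    pattern: str
    impact: str
    insight: str
    recommendations: List[str]

    def is_complete(self) -> bool:
        """True when no part of the template has been left empty."""
        return all([self.title, self.description, self.evidence, self.pattern,
                    self.impact, self.insight, self.recommendations])

    def render(self, number: int) -> str:
        """Return the finding as plain text in template order."""
        lines = [f"Finding {number}: {self.title}",
                 f"Description: {self.description}",
                 "Evidence:"]
        lines += [f'"{quote}" ({source})' for quote, source in self.evidence]
        lines += [f"Pattern: {self.pattern}",
                  f"Impact: {self.impact}",
                  f"Insight: {self.insight}",
                  "Recommendation:"]
        lines += [f"- {item}" for item in self.recommendations]
        return "\n".join(lines)


finding = Finding(
    title="Deployment Configuration Confusion",
    description="Many users experienced difficulties deploying models using KServe.",
    evidence=[
        ("The InferenceService never becomes ready after applying YAML.", "Issue #1245"),
        ("Example YAML does not work in my Kubernetes cluster.", "Issue #1310"),
        ("I am not sure what fields are required.", "Issue #1288"),
    ],
    pattern="Unclear YAML configuration requirements during deployment.",
    impact="Failed deployments, repeated debugging, and slower onboarding.",
    insight="The system gives too little validation or feedback during deployment.",
    recommendations=[
        "Provide validated, version-specific YAML templates",
        "Add real-time deployment status and error messages",
        "Improve documentation with a step-by-step deployment flow",
    ],
)

report_text = finding.render(1)
print(report_text)
```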
3. How Many Findings Should You Include?
For a strong GitHub mining research report, it is better to focus on a small number of meaningful themes instead of too many small issues. Usually, 3 to 6 major themes are ideal because this keeps the analysis clear, focused, and easy to understand. If you include too many themes or individual issues, the report can become confusing and lose its main message.
Common examples of good themes include areas like deployment issues, documentation gaps, error handling problems, configuration complexity, and performance concerns. These themes represent repeated patterns across many GitHub issues and help explain the main usability and developer experience challenges in a structured way.
4. Visual Ways to Strengthen Findings
To understand patterns in GitHub mining research, it is important to present the findings in a clear and structured way using visual summaries. You can include a theme frequency chart, which shows how often each problem appears in the data, such as Deployment issues (40%), Documentation (25%), and Errors (20%). This helps quickly identify which areas users struggle with the most.
You can also include an issue timeline, which shows when problems occur most frequently and helps reveal spikes or trends over time. Another useful element is quote highlights, where you include 2–3 strong user quotes for each theme to support your findings and give real evidence of user experience.
Together, these elements make your analysis clearer, more visual, and easier to understand.
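Before reaching for a charting tool, you can preview the issue timeline and theme frequencies as plain text. The sketch below is only a quick check using the standard library; the dates and themes are invented, and the same counts could be passed to any plotting library you prefer.

```python
from collections import Counter


def monthly_counts(created_dates) -> dict:
    """Count issues per month from ISO dates such as 2026-01-05."""
    return dict(sorted(Counter(d[:7] for d in created_dates).items()))


def text_bar_chart(counts: dict, width: int = 20) -> str:
    """Render counts as '#' bars scaled so the largest value fills the width."""
    if not counts:
        return ""
    peak = max(counts.values())
    lines = []
    for label, n in counts.items():
        bar = "#" * max(1, round(width * n / peak))
        lines.append(f"{label:<14} {bar} {n}")
    return "\n".join(lines)


# Invented data for illustration only.
created = ["2026-01-05", "2026-01-20", "2026-02-03", "2026-04-11"]
themes = ["Deployment", "Deployment", "Documentation", "Errors"]

timeline = monthly_counts(created)
timeline_chart = text_bar_chart(timeline)
theme_chart = text_bar_chart(dict(Counter(themes).most_common()))

print("Issue timeline")
print(timeline_chart)
print("Theme frequency")
print(theme_chart)
```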
Conclusion:
In this article, I explained the complete step-by-step process of GitHub mining for UX research, from defining research questions to converting raw GitHub data into meaningful insights. We saw how issues, pull requests, and documentation can be used to understand real user and developer experiences, and how structured methods like coding, categorization, and pattern analysis help transform messy data into clear findings. The goal of this process is not just to analyze issues, but to understand why users struggle and how systems can be improved. By following this approach, researchers and engineers can identify real usability problems, improve developer experience, and build better open-source tools through evidence-based insights.