Augmenting Penetration Testing Methodology with Artificial Intelligence

Black Hills Information Security, Inc.


Augmenting Penetration Testing Methodology with Artificial Intelligence – Part 1: Burpference

BHIS · 2025-05-07 · via Black Hills Information Security, Inc.

Craig is a former software developer and red teamer. He has been pentesting at Black Hills Infosec since 2018.

Artificial Intelligence (AI) has been a hot topic in information technology and information security since before I entered the industry. Developments in AI are something that I had been aware of, but I hadn’t chosen to really dive into the subject in terms of leveraging AI as part of my job as a penetration tester. I gave a webcast on penetration testing methodology a while back, and someone asked me afterward how I use AI in my methodology/workflow. At the time, my answer was “I don’t.”

For a long time, I considered AI to be interesting but not particularly useful. However, progress has been made, technology has improved, and it has become clear that AI has matured to the point where we absolutely can use it to help us with our jobs as penetration testers. So, what does that look like? This blog post will be the first in a series of posts where I will describe my initial experiences trying to integrate AI into my penetration testing methodology.

When exploring new technology and incorporating it into your methodology, it’s always a good idea to start by examining what other folks in your space are already doing with that technology. When I initially started going down this path, my BHIS colleague Derek Banks introduced me to a project called burpference. Burpference is a Burp Suite plugin that takes requests and responses to and from in-scope web applications and sends them off to an LLM for inference. In the context of artificial intelligence, inference is taking a trained model, providing it with new information, and asking it to analyze this new information based on its training.

Installing the burpference extension in Burp Suite is a straightforward task. The extension runs on the Jython standalone JAR. Once I downloaded the JAR, I configured the Burp Suite Python environment to point to it. This setting can be found by opening “Extensions settings” in the Extensions tab.

Once the Python environment was configured, I downloaded and unzipped the latest burpference release. Burpference generates log files in the extension directory, so I needed to ensure that Burp Suite had write permissions to that location. Next, I opened the “Installed” page of the Extensions tab, clicked the “Add” button, and selected the burpference.py file from the extension directory.

I checked the Output section of the Burp Suite extension loader to ensure no errors occurred. Once the extension was loaded, I opened the new burpference tab and selected a configuration file that pointed to my LLM. For my initial experimentation with burpference, I set up a small (7 billion parameter) deepseek-r1 model in Ollama on an older gaming PC in my lab.

**Burpference Configuration File Pointing to Local LLM**

**Configuration File Selected in Burpference Tab**

To test the extension functionality, I installed and ran a local instance of OWASP’s intentionally vulnerable Juice Shop application.

To cut down on noise and unnecessary load on the LLM, burpference only sends in-scope requests and responses. So, I added the Juice Shop application to the project scope in Burp Suite. This can be done from the Target tab by right-clicking the application and selecting “Add to scope”.

**Juice Shop Application Added to Scope**

I encountered two pitfalls that I had to troubleshoot when configuring the extension:

- I was running the model on a physically separate host in my lab. By default, Ollama binds to localhost, and I was initially unable to communicate with the model from my testing host where I was running burpference. I was able to fix this by setting the OLLAMA_HOST environment variable to 0.0.0.0 on the host running Ollama.
- Once I was able to communicate with the model, burpference started logging “General Error: cannot make memory view because object does not have the buffer interface” error messages. This is a known issue with the extension, and I was able to fix it by updating my Jython standalone JAR from version 2.7.3 to version 2.7.4.
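
A quick way to confirm that the first fix took effect is to query Ollama from the testing host. The following is a minimal sketch, not part of burpference: it assumes Ollama is listening on its default port (11434) and uses Ollama's model-listing endpoint, `/api/tags`. The function names and the example IP address are hypothetical.

```python
import json
import urllib.error
import urllib.request


def ollama_tags_url(host, port=11434):
    """Build the URL of Ollama's model-listing endpoint for a given host."""
    return "http://{}:{}/api/tags".format(host, port)


def list_remote_models(host, port=11434, timeout=5):
    """Return the model names a remote Ollama instance reports, or None if unreachable."""
    try:
        with urllib.request.urlopen(ollama_tags_url(host, port), timeout=timeout) as resp:
            body = json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Refused or timed-out connections usually mean Ollama is still bound to localhost.
        return None
    return [model.get("name") for model in body.get("models", [])]
```

Calling `list_remote_models("192.168.1.50")` from the testing host (substituting the lab host's real address) should return a list of model names once OLLAMA_HOST is set to 0.0.0.0, and `None` while Ollama is still only reachable locally.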

With the extension successfully communicating with the model, I began manually browsing the Juice Shop application. As I browsed the application, I noticed that burpference was sending each request and response to the LLM with the following prompt:

You are a web application penetration tester conducting a comprehensive operation on an application in the offensive stage of the engagement and focused on leveraging security flaws.

Your objective is to examine the HTTP requests and responses that are available through the burp suite proxy history from the web application as we test the application.

This analysis will focus on:

- Request and Response Evaluation: Scrutinizing HTTP requests and responses for security misconfigurations, sensitive data exposure, and other vulnerabilities.
- Authentication and Session Management: Assessing the effectiveness of authentication mechanisms and session handling practices.
- Input Validation and Output Encoding: Identifying weaknesses related to input validation that may lead to injection attacks or cross-site scripting (XSS).

Use reasoning and context to find potential flaws in the application by providing example payloads and PoCs that could lead to a successful exploit.

If you deem any vulnerabilities, include the severity of the finding as prepend (case-sensitive) in your response with any of the levels:

"CRITICAL"
"HIGH"
"MEDIUM"
"LOW"
"INFORMATIONAL" 

for any informational-level findings or observations, for example of a "secure" flag missing from a cookie.

Not every request and response may have any indicators, be concise yet deterministic and creative in your approach.

The HTTP request and and response pair are provided below this line:

[request and response JSON below]

**Burpference Prompt (formatted for readability)**
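
To make the flow concrete, here is a rough sketch of how a prompt like this could be combined with a single request/response pair and submitted to a local Ollama instance. It is not burpference's actual code. It assumes Ollama's `/api/generate` endpoint, and the function names, the `deepseek-r1:7b` model tag, and the JSON layout of the pair are illustrative assumptions.

```python
import json
import urllib.request


def build_inference_payload(system_prompt, request_text, response_text,
                            model="deepseek-r1:7b"):
    """Combine the analyst prompt with one HTTP request/response pair."""
    pair = {"request": request_text, "response": response_text}
    return {
        "model": model,  # assumed model tag; use whatever is pulled locally
        "prompt": system_prompt + "\n\n" + json.dumps(pair, indent=2),
        "stream": False,  # ask for one JSON reply instead of a token stream
    }


def run_inference(base_url, payload, timeout=300):
    """POST the payload to Ollama's /api/generate endpoint and return the reply text."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

The generous default timeout is deliberate: a small model on older hardware can take a long time to answer each request.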

The first thing I noticed was that the model responded slowly. This was likely due to the hardware limitations of the host where I was running the model. I decided I would later try the extension with a more powerful remote OpenAI model. The extension sends full requests and responses that will almost certainly contain sensitive information like credentials, session tokens, response data, etc. When performing a penetration test, maintaining the confidentiality of customer data is a high priority, and that makes using remote models that you do not have full control over a serious concern. So, I wanted to verify the extension’s functionality and evaluate its performance with a local, on-premises model first. After browsing the application for a bit, I took some time to review the inference results in the burpference logging page.
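
If a remote model has to be used, one hypothetical way to reduce exposure is to mask sensitive header values before traffic leaves the testing host. The sketch below is an assumption-laden illustration, not a feature of burpference described in this post. The header list and function name are made up for the example, and it only touches headers, so credentials in URLs or message bodies would still be sent.

```python
# Header names treated as sensitive in this sketch (an assumed, incomplete list).
SENSITIVE_HEADERS = ("authorization", "cookie", "set-cookie", "x-api-key")


def redact_headers(raw_message, placeholder="[REDACTED]"):
    """Mask the values of sensitive headers in a raw HTTP message."""
    # Split the header block from the body at the first blank line.
    head, sep, body = raw_message.partition("\r\n\r\n")
    redacted = []
    for line in head.split("\r\n"):
        name, colon, _value = line.partition(":")
        if colon and name.strip().lower() in SENSITIVE_HEADERS:
            line = "{}: {}".format(name, placeholder)
        redacted.append(line)
    # The body is passed through unchanged.
    return "\r\n".join(redacted) + sep + body
```

The trade-off is that masked cookies and tokens can no longer be assessed by the model, so session-handling observations would be lost along with the secrets.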

While slow, the extension appeared to be successfully communicating with the model and logging the inference results. I observed that the LLM reviewed the request verb, parameters, headers, cookies, etc., and evaluated what it could tell about the application from a security perspective. Ultimately, it did not report anything that I would not have identified during a manual review of the requests and responses. However, it did identify an interesting cookie called welcomebanner_status whose value was set to dismiss, and it even brainstormed a possible attack vector!

**Burpference Inference Response – Interesting Cookie Identified**
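
Because the prompt asks the model to prepend a case-sensitive severity level to its findings, logged replies can be triaged mechanically. The following is a small sketch of that idea, assuming the reply is available as plain text. The function name is hypothetical, and a model that ignores the formatting instruction would simply yield no match.

```python
import re

# Levels from the burpference prompt, ordered from most to least severe.
SEVERITIES = ("CRITICAL", "HIGH", "MEDIUM", "LOW", "INFORMATIONAL")
_SEVERITY_RE = re.compile(r"\b(" + "|".join(SEVERITIES) + r")\b")


def highest_severity(model_reply):
    """Return the most severe level named in a reply, or None if none appears."""
    found = set(_SEVERITY_RE.findall(model_reply))  # case-sensitive match
    for level in SEVERITIES:
        if level in found:
            return level
    return None
```

Sorting or filtering log entries by this value would surface the replies most worth reading first, which matters when every proxied request produces its own analysis.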

Even with a small local model running on less-than-stellar hardware, I could already see some value in the extension at the very least functioning as a second set of eyes. I proceeded to reconfigure the extension to use a remote OpenAI gpt-4o-mini model. As you might expect, I saw much better performance with the larger model. In addition to identifying issues related to CORS and security header configurations, it also identified a request parameter it thought was vulnerable to cross-site scripting (XSS). The model even provided a proof-of-concept payload.

**Potential Cross-Site Scripting Identified with Burpference**

I tried the proof-of-concept request in a browser. While the XSS payload did not fire, the application returned an HTTP 500 Internal Server Error.

Observing this error response through the eyes of an experienced web application tester, it seemed obvious that I should look for a SQL injection vulnerability here, but what about our AI assistant? I was pleased to find that burpference identified SQL syntax in another more verbose error message that I had initially overlooked. It determined that this same parameter was likely vulnerable to SQL injection and provided another proof-of-concept exploit.

**SQL Injection Vulnerability Reported by Burpference**

I tried this PoC in a browser and the application responded with JSON containing all application product information. This was an indication that the payload was successful, and the application was vulnerable to SQL injection.

One thing I noticed while evaluating burpference is that the context for each inference request consisted of only a single request and response. I think this could be a limiting factor in the usefulness of the extension as it currently exists. The smaller local model’s responses plainly stated that it might be able to tell me more useful information if it was provided more context. I think there is likely an opportunity to extend the extension’s functionality to selectively send a series of requests and responses to the model in the same inference request to provide it with more useful context.
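
As a rough illustration of that idea, the sketch below packs the most recent request/response pairs into a single prompt until a size budget is reached. This is a hypothetical design rather than existing burpference functionality. The function name, the dictionary layout of each pair, and the character budget (a crude stand-in for a model's context limit) are all assumptions.

```python
import json


def build_batched_prompt(system_prompt, pairs, max_chars=12000):
    """Append as many of the most recent request/response pairs as fit in max_chars."""
    selected = []
    used = 0
    for pair in reversed(pairs):  # walk from newest to oldest
        encoded = json.dumps(pair)
        if used + len(encoded) > max_chars:
            break
        selected.append(pair)
        used += len(encoded)
    selected.reverse()  # restore chronological order for the model
    return system_prompt + "\n\n" + json.dumps(selected, indent=2)
```

Keeping the newest pairs and dropping the oldest is only one possible selection rule. Grouping by endpoint or by session would be reasonable alternatives.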

Overall, I found the extension useful as a second set of eyes looking over my web traffic, and it successfully put me on the path to discovering a valid vulnerability. I liked that it works passively in the background, and I can definitely see myself leveraging this extension with an on-premises model in my web application penetration testing methodology. Specifically, I think it would be useful to have burpference enabled when performing manual enumeration at the beginning of a new web application penetration test.

Read part 2 of this series here: Part 2 – Copilot

Want to keep learning about this topic?
Register now for next week’s webcast taking place Thursday, May 22nd, at 1:00pm EDT:

Using AI to Augment Pentesting Methodologies
