Augmenting Penetration Testing Methodology with Artificial Intelligence

Black Hills Information Security, Inc.

Augmenting Penetration Testing Methodology with Artificial Intelligence – Part 2: Copilot

BHIS · 2025-05-14 · via Black Hills Information Security, Inc.

Craig is a former software developer and red teamer. He has been pentesting at Black Hills Infosec since 2018.

Read part 1 of this series here: Part 1 – Burpference

A common use case for LLMs is rapid software development. One of the first ways I used AI in my penetration testing methodology was for payload generation. For example, I wanted to create an exhaustive list of out-of-band (OOB) command injection payloads. I started by collecting command injection payloads from sources such as SecLists, PayloadsAllTheThings, and Payload Box. Many of these payloads needed modification because they contained IP addresses and domains for out-of-band interaction that I did not control.

**Command Injection Payloads with Unknown Hosts for Interaction**

Ideally, I would replace these IPs and domains with URIs for a Burp Suite Collaborator server that I can poll for interactions. So, I opened Visual Studio Code, which has built-in GitHub Copilot integration, and instructed Copilot to write a Python script that would read a file line by line and replace each instance of an IP address or URL with the placeholder text {{}}.

**Prompting Copilot to Write Python Script to Replace IPs and URLs**

A few seconds later, I had a Python script that I could save and run on my list of payloads.

┌──(root㉿kali)-[/home/kali/Desktop/blog]
└─# python ./url_replacer.py sample-payloads.txt
Processing complete. Modified file saved as: modified_sample-payloads.txt

**Running AI-Generated Python Script**

I reviewed the modified payloads, and it appeared that the script worked.
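The exact script Copilot generated isn't shown here, so the following is only a minimal sketch of what such a script might look like, not the generated code. Assumptions: it matches URLs by their `://` prefix and swaps only the host, so an exfiltration path such as `/$(whoami)` survives, and it replaces bare dotted-quad IPv4 addresses. Bare hostnames (for example `nslookup attacker.example`) and IPv6 addresses are deliberately not matched. The output file name and message mirror the terminal output above.

```python
#!/usr/bin/env python3
"""Sketch of url_replacer.py: swap hosts in command injection payloads for a {{}} placeholder."""
import re
import sys
from pathlib import Path

PLACEHOLDER = "{{}}"

# Host part of a URL: everything after "://" up to a path, port, quote, or shell metacharacter.
HOST_IN_URL_RE = re.compile(r"""(?<=://)[^/\s:'"`;|&<>)]+""")

# Bare dotted-quad IPv4 addresses (a trailing :port is left untouched).
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def replace_hosts(line: str) -> str:
    """Return the payload line with every URL host and IPv4 address replaced by PLACEHOLDER."""
    line = HOST_IN_URL_RE.sub(PLACEHOLDER, line)  # URL hosts first, including IP hosts
    return IPV4_RE.sub(PLACEHOLDER, line)         # then any remaining bare IPs


def process_file(src: Path) -> Path:
    """Write modified_<name> next to the input file and return its path."""
    dst = src.with_name(f"modified_{src.name}")
    # surrogateescape round-trips any odd bytes in the payload list unchanged
    with src.open(encoding="utf-8", errors="surrogateescape") as fin, \
         dst.open("w", encoding="utf-8", errors="surrogateescape") as fout:
        for line in fin:
            fout.write(replace_hosts(line))
    return dst


if __name__ == "__main__":
    if len(sys.argv) == 2:
        out = process_file(Path(sys.argv[1]))
        print(f"Processing complete. Modified file saved as: {out}")
    else:
        print(f"Usage: {sys.argv[0]} <payload_file>", file=sys.stderr)
```

Whatever the script looks like, a quick manual review of the output (as described above) is still worthwhile, since regex-based host matching will miss or over-match some payload formats.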

Next, I asked Copilot to write a Python script that would read two files line by line, replace each instance of the placeholder text in the first file with the next line of the second file, and write the results to a third file.

I saved this new script along with a text file where I pasted some Burp Suite Collaborator URIs. I ran the new script with my list of payloads containing placeholders and my Collaborator file. I reviewed the file generated by the script and confirmed that my Collaborator URIs had been successfully inserted in the correct locations.
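A minimal sketch of that second script follows. It is not the Copilot output; the file name `fill_placeholders.py`, the argument order `<payload file> <Collaborator file> <output file>`, and the one-URI-per-line format are assumptions. Each `{{}}` consumes a fresh URI, so every payload gets its own Collaborator subdomain and an interaction can be traced back to the payload that triggered it. Running out of URIs raises an error instead of silently leaving placeholders in the output; that is a design choice of this sketch.

```python
#!/usr/bin/env python3
"""Sketch of a placeholder filler: put one Collaborator URI into each {{}} placeholder."""
import sys
from pathlib import Path
from typing import Iterable, Iterator, List

PLACEHOLDER = "{{}}"


def fill_placeholders(payload_lines: Iterable[str], uris: Iterator[str]) -> List[str]:
    """Replace each PLACEHOLDER, in file order, with the next URI from `uris`."""
    filled = []
    for line in payload_lines:
        head, *rest = line.split(PLACEHOLDER)
        pieces = [head]
        for tail in rest:
            uri = next(uris, None)
            if uri is None:
                raise ValueError("Ran out of Collaborator URIs; add more lines to the URI file")
            pieces += [uri, tail]
        filled.append("".join(pieces))
    return filled


def main(payload_path: Path, uri_path: Path, out_path: Path) -> None:
    # One URI per line; blank lines and surrounding whitespace are ignored.
    uris = iter([u.strip() for u in uri_path.read_text(encoding="utf-8").splitlines() if u.strip()])
    with payload_path.open(encoding="utf-8", errors="surrogateescape") as fin:
        filled = fill_placeholders(fin, uris)
    out_path.write_text("".join(filled), encoding="utf-8", errors="surrogateescape")
    print(f"Wrote {len(filled)} payloads to {out_path}")


if __name__ == "__main__":
    if len(sys.argv) == 4:
        main(*(Path(arg) for arg in sys.argv[1:]))
    else:
        print(f"Usage: {sys.argv[0]} <payload_file> <collaborator_file> <output_file>", file=sys.stderr)
```

Usage would look something like `python fill_placeholders.py modified_sample-payloads.txt collaborator.txt intruder-payloads.txt` (file names hypothetical), after which the output file can be loaded into Intruder as a simple list.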

**Placeholders Successfully Replaced with Collaborator URIs**

Now, any time I want to test for command injection, I can save new Collaborator URIs to a file and run the second script again to quickly generate a fresh set of unique payloads to feed to Intruder. This isn’t super flashy, but it is a good example of how AI-assisted rapid development can streamline potentially time-consuming penetration testing tasks.

LLMs can also be helpful for brainstorming ideas during a penetration test, but they can be touchy about what you ask them. For example, say I wanted to review the main.js file from OWASP Juice Shop for potential vulnerabilities. I asked Copilot for examples of potentially dangerous JavaScript methods, but it told me that it was unable to assist with information that could be used maliciously.

**Copilot Refusal to Describe Dangerous JavaScript Functions**

You can sometimes talk LLMs into cooperating with a bit of prompt manipulation or jailbreaking. I was able to get the model to list some dangerous JavaScript methods by explaining that I was an ethical security researcher.

**Model Response with Some Dangerous JavaScript Methods**

I proceeded to ask it about the use of potentially dangerous JavaScript methods in the Juice Shop main.js file. Copilot responded with some information about references to eval() functions in comments and the use of innerHTML. This was somewhat helpful, but I thought it was possible to do better.

**Initial Response to Query About Dangerous Methods in JavaScript File**
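A plain pattern search over the bundle is a cheap way to locate the same kinds of sinks Copilot mentioned (eval() and innerHTML) before or alongside asking an LLM. The sketch below is an illustration, not part of the workflow described in this post: the sink list is an assumption and far from exhaustive. Because Juice Shop is an Angular app, a template binding such as `[innerHTML]` may appear in the compiled bundle only as an `"innerHTML"` string literal, so that is searched for separately. Matches are reported with line numbers and surrounding context, since minified bundles pack a lot of code onto each line.

```python
#!/usr/bin/env python3
"""Grep-style sketch: flag eval-like calls and HTML-injection sinks in a JavaScript bundle."""
import re
import sys
from pathlib import Path
from typing import List, Tuple

# Illustrative, non-exhaustive sink list (an assumption, not an authoritative catalogue).
SINK_PATTERNS = {
    "eval": r"\beval\s*\(",
    "Function constructor": r"\bnew\s+Function\s*\(",
    "string-argument timer": r"""\bset(?:Timeout|Interval)\s*\(\s*['"`]""",
    "innerHTML/outerHTML assignment": r"\.(?:inner|outer)HTML\s*\+?=(?!=)",
    "innerHTML binding (string literal)": r"""['"]innerHTML['"]""",
    "insertAdjacentHTML": r"\.insertAdjacentHTML\s*\(",
    "document.write": r"\bdocument\.write(?:ln)?\s*\(",
    "Angular sanitizer bypass": r"\bbypassSecurityTrust(?:Html|Script|Style|Url|ResourceUrl)\s*\(",
}
COMPILED = {name: re.compile(pattern) for name, pattern in SINK_PATTERNS.items()}


def find_sinks(source: str, context: int = 40) -> List[Tuple[str, int, str]]:
    """Return (sink name, 1-based line number, surrounding snippet) for every match."""
    hits = []
    for name, regex in COMPILED.items():
        for m in regex.finditer(source):
            line_no = source.count("\n", 0, m.start()) + 1
            start = max(0, m.start() - context)
            snippet = source[start:m.end() + context].replace("\n", " ")
            hits.append((name, line_no, snippet))
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


if __name__ == "__main__":
    if len(sys.argv) == 2:
        text = Path(sys.argv[1]).read_text(encoding="utf-8", errors="replace")
        for name, line_no, snippet in find_sinks(text):
            print(f"{line_no:>6}  {name:<36} ...{snippet}...")
    else:
        print(f"Usage: {sys.argv[0]} <main.js>", file=sys.stderr)
```

A hit only means a sink is present; whether attacker-controlled data reaches it still has to be confirmed by hand (or with the model's help).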

While experimenting with Copilot as a hacking assistant, I had it configured to use the Claude 3.5 Sonnet model. I came across a blog post by Joseph Thacker (AKA rez0__) that included a jailbreak prompt for this model, originally shared by another hacker, Pliny. I submitted this jailbreak to the model, and it responded with more defense-centric information about the JavaScript file.

**Jailbreak Submitted – Start of Response**

However, the model continued with an additional response separated by the “LIBERATING…GODMODE: ENABLED…I’M FREE!” text from the jailbreak. This second response had more detailed information about potentially exploitable vulnerabilities in the JavaScript.

**Jailbroken Response with Vulnerability Details**

If I were doing this as part of an actual penetration test, this is where I would stop using the AI and switch to my normal workflow of trying to exploit these vulnerabilities. However, I noticed that the model appeared to have redacted some potential proof-of-concept (PoC) code, so I set out to coax more out of it. I prompted it again, specifying that it should “provide proof of concept exploit code for each identified vulnerability.”

Copilot gave me the same spiel about only helping with secure coding patterns, but after a couple of rounds of asking for detailed PoCs, telling it not to redact its responses, and reapplying the jailbreak, I was able to get it to respond with some fairly specific exploit instructions.

**Copilot Response with Proof-of-Concept Exploits**

Keep in mind that Juice Shop is an intentionally vulnerable application. Real-world penetration tests involve client information that must be protected, so if I were to use AI this way during an actual engagement, I would use an on-premises, locally hosted LLM.

I hope this exploration of how we can leverage AI to become better, more efficient penetration testers was helpful!

Want to keep learning about this topic?
Register now for next week’s webcast taking place Thursday, May 22nd, at 1:00pm EDT:

Using AI to Augment Pentesting Methodologies
