Leaking Secrets in the Age of AI

Shay Berkovich, Rami McCarthy · 2025-06-18 · via Wiz Blog | RSS feed

Motivation

In a rush to adopt and experiment with AI, developers and other technology practitioners are willing to cut corners. This is evident from multiple recent security incidents, such as:

Platform resource abuses (attackers hijack cloud infrastructure to power their own LLM applications)
Vendors offering unsafe 3rd-party model execution (Probllama)
Model escape vulnerabilities in hosting services (Replicate, HuggingFace and SAP-AI vulnerabilities)

Yet another side-effect of these hasty practices is the leakage of AI-related secrets in public code repositories. Secrets in public code repositories are nothing new. What’s surprising is the fact that after years of research, numerous security incidents, millions of dollars in bug bounty hunters' pockets, and general awareness of the risk, it is still painfully easy to find valid secrets in public repositories.

TL;DR

In this blog we present the results of a simple, month-long side quest scanning for active secrets in public code repositories. After analyzing the resulting dataset, we were surprised to learn that AI-related secret instances constitute a disproportionate share of the findings (4 out of the top 5 secrets found were AI-related). This prompted further investigation, which distilled three distinct patterns of AI secret leakage:

Python notebook .ipynb files as a secrets goldmine.
Secrets in mcp.json, .env and AI agent config files. Vibe coders are not familiar with secrets management best practices, and neither are their AI coding assistants.
New secret types belonging to emerging AI vendors are pervasive and the secrets scanning industry doesn't seem to be keeping up.

We were able to find valid secrets belonging to over 30 companies and startups, among them multiple Fortune 100 companies. Hopefully this blog will serve as a wake-up call for the AI and data science communities to urgently improve their development practices.

Background and Approach

Secrets in public repositories are an established attack vector. Uber (2016), Scotiabank (2019), Mercedes-Benz (2024), and the most recent xAI secret leak incident are just a few of the notable instances. In fact, stolen or leaked secrets are a major attack vector in many widely known supply-chain attacks (e.g. the Codecov incident). GitHub, being the most popular code hosting platform (our State of Code Security Report puts the share of repositories hosted on GitHub at 81%), naturally gets the most interest from malicious actors and security researchers alike.

Wiz Code includes a secrets scanner, offering customers protection. Wiz Research supports the product through ongoing investigations of public repositories and emergent patterns in secrets leakage. Unlike the State of Code Security Report, this time we focused on public environments, casting a wide net. Unlike some secrets research, we are specifically interested in validated secrets. This automatically filters out the false positives and testing patterns and as such yields higher-quality signals.

In short, we scanned thousands of repositories and found hundreds of validated secrets, many of which are still active. Our focus in this blog is on the high-level trends causing these exposures. While we won’t share the full methodology for choosing the scanning targets, suffice it to say that focusing on development activity showed significant improvements over the naive focus on repository popularity used in most research on secrets.

Overall Trends

In terms of secret occurrence, we found wide differences among the secret types.

Four of the top five, and half of the top ten, most common validated secrets are in some way tied to AI. This result is notable because we did not purposefully target AI-related repositories, yet AI secrets still constitute a large proportion of the exposed secrets we identified. This discovery motivated us to dive deeper into AI-related secrets leakage.

Additionally, we wanted to understand where secrets live, to answer the following question: which file types include the most secrets (and thus deserve special attention from scanners and security policies)? It turns out that one file type stands out in particular: notebook (ipynb) files.

Secrets in notebooks are not a new finding. There have been a couple of good publications on the topic since 2020 (for example here), yet it has not received wide exposure. That is unfortunate, because the dominance of notebooks as a source of leaked secrets compared to other file types is remarkable.

Another interesting question would be the correlation between the file types and the secret types. What kind of secret would you expect to see in this file? Such a determination can also be helpful to adjust the secret scanners and scanning policies. The relationships are intriguing:

File	1st most common secret	2nd most common secret	3rd most common secret
ipynb	HuggingFace	AzureOpenAI	WeightsAndBiases
python	HuggingFace
.env	HuggingFace
yaml	WeightsAndBiases
json	AzureOpenAI
md	Postgres
sh	WeightsAndBiases
ts	AlgoliaAdminKey
js	AlgoliaAdminKey

Only for md files, which rank sixth, is the most common leaked secret a conventional Postgres credential.

Patterns in Secrets Leakage

With AI secrets being responsible for such an overwhelming majority of the findings, naturally we wanted to better understand the use cases that lead to secrets leaking.

Python notebooks

As mentioned earlier, ipynb files are by far the most leak-prone file type. This is because they pack a unique combination of code, code output, and descriptive elements. According to the JupyterLab docs, a notebook is a “shareable document that combines computer code, plain language descriptions, data, rich visualizations like 3D models, charts, graphs and figures, and interactive controls”. As a result, there is natural confusion as to how to treat these files: as logs, as code, or as text. This is exacerbated by the fact that .gitignore files are not expressive enough to distinguish between these use cases: one can either allow checking in ipynb files or not, regardless of whether the file contains execution output.
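Since .gitignore cannot tell an executed notebook from a clean one, that distinction has to be made by looking inside the file. The following is a minimal sketch, assuming the nbformat 4 JSON layout (a top-level "cells" list in which code cells carry an "outputs" list); the function name has_execution_output and the sample notebooks are hypothetical and not part of the original research.

```python
import json


def has_execution_output(notebook_json: str) -> bool:
    """Return True if any code cell in the notebook carries saved output."""
    nb = json.loads(notebook_json)
    for cell in nb.get("cells", []):
        # Only code cells have outputs; markdown and raw cells do not.
        if cell.get("cell_type") == "code" and cell.get("outputs"):
            return True
    return False


# Two tiny in-memory notebooks (illustrative only).
clean_nb = json.dumps({
    "nbformat": 4, "nbformat_minor": 5, "metadata": {},
    "cells": [{"cell_type": "code", "source": ["x = 1"],
               "outputs": [], "execution_count": None, "metadata": {}}],
})
executed_nb = json.dumps({
    "nbformat": 4, "nbformat_minor": 5, "metadata": {},
    "cells": [{"cell_type": "code", "source": ["config"],
               "outputs": [{"output_type": "execute_result",
                            "data": {"text/plain": ["{'api_key': '...'}"]},
                            "metadata": {}, "execution_count": 1}],
               "execution_count": 1, "metadata": {}}],
})

print(has_execution_output(clean_nb))
print(has_execution_output(executed_nb))
```

A check like this could back a policy that accepts notebooks into a repository only when they carry no saved output.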

There are several distinct leak patterns that we can learn from. The most obvious is straight-up secret usage in source code; this can be a secret embedded in a Python snippet or in a comment.

Another common usage pattern is dumping secrets in the execution output. Obviously, the print() function will do the job; however, due to the interactive nature of notebooks, simply typing the variable name at the end of a cell will print it as well. In the case we observed, even though the developer used the proper way to load the API key from the environment (load_dotenv(); os.environ["AZURE_OPENAI_API_KEY"] = os.environ.get("API_KEY")), the subsequent action of printing the loaded config renders the code insecure.
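One way to keep this habit from leaking anything is to display only a redacted copy of the configuration. The sketch below is an illustration under stated assumptions: the helper name redact, the list of sensitive name fragments, and the sample values are all hypothetical, and the name-based heuristic will miss secrets stored under unrelated key names.

```python
import re

# Assumption: a setting is sensitive if its name contains one of these words.
SENSITIVE_NAME = re.compile(r"key|token|secret|password", re.IGNORECASE)


def redact(config: dict) -> dict:
    """Return a copy of config that is safe to display in a notebook cell."""
    safe = {}
    for name, value in config.items():
        if SENSITIVE_NAME.search(name) and value:
            safe[name] = "***REDACTED***"
        else:
            safe[name] = value
    return safe


# Placeholder values, not real credentials or endpoints.
config = {
    "AZURE_OPENAI_API_KEY": "not-a-real-key-1234",
    "AZURE_OPENAI_ENDPOINT": "https://example.invalid",
}
print(redact(config))
```

Displaying redact(config) instead of config keeps the saved cell output free of the key while still showing which settings were loaded.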

In addition, there is a more subtle way to unintentionally disclose secrets: via debug and diagnostics functions. In the examples we observed, the functions show() and list() output, among other things, API keys.

Finally, given the large number of failed outputs in notebooks, it is also common to observe sensitive details pertaining to the local development setup, filesystem layout, and networking in error messages.

Beyond just secrets, code execution results in Python notebooks should be generally treated as sensitive. Their content, if correlated to a developer’s organization, can provide reconnaissance details for malicious actors.
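Treating execution results as sensitive suggests removing them before a notebook is committed. The following is a minimal sketch, assuming the nbformat 4 JSON layout; strip_outputs is a hypothetical helper. Existing tools such as jupyter nbconvert with its --clear-output option serve the same purpose.

```python
import json


def strip_outputs(notebook_json: str) -> str:
    """Return the notebook JSON with all code-cell outputs removed."""
    nb = json.loads(notebook_json)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drop printed values, tracebacks, etc.
            cell["execution_count"] = None  # drop the execution counter as well
    return json.dumps(nb, indent=1)


# Illustrative notebook whose output contains a (fake) secret.
executed = json.dumps({
    "nbformat": 4, "nbformat_minor": 5, "metadata": {},
    "cells": [{"cell_type": "code", "source": ["print(api_key)"],
               "outputs": [{"output_type": "stream", "name": "stdout",
                            "text": ["not-a-real-key\n"]}],
               "execution_count": 3, "metadata": {}}],
})

stripped = strip_outputs(executed)
print("not-a-real-key" in stripped)
```

Note that this only addresses output cells: a secret hardcoded in a cell's source survives stripping and still needs a scanner to catch it.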

Vibe-coding secrets into mcp.json

AI-assisted code generation is known to favor hardcoding secrets. This forms a toxic combination with the emergence of Model Context Protocol servers. MCP is a rapidly developing technology, with thousands of servers launched in the months after release.

Unfortunately, many MCP servers favor configuration through hardcoded credentials within the mcp.json configuration file. Take, for example, the instructions for an unofficial Perplexity MCP server.

Taskmaster AI is another example of a broadly popular server (over 11k stars) that makes this unsafe recommendation, impacting customer secrets for AI providers like OpenAI and Anthropic.

Unfortunately, this pattern is pervasive, impacting even the official GitHub MCP server. Once these secrets are hardcoded into mcp.json, it’s easy to see how they end up leaked publicly.
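A simple guard is to flag literal values in the env blocks of an mcp.json file before it is committed. The sketch below assumes the commonly seen layout with a top-level "mcpServers" object whose entries have an "env" mapping. The function find_hardcoded_env, the placeholder conventions it accepts (${VAR}, <...>, YOUR_...), and the sample server names are hypothetical. Whether a given MCP client actually expands ${VAR} references depends on the client and is not something this sketch verifies.

```python
import json
import re

# Assumption: values shaped like these are references or placeholders,
# not literal secrets. The empty alternative also accepts empty strings.
PLACEHOLDER = re.compile(r"^(\$\{?\w+\}?|<.*>|YOUR_.*|)$")


def find_hardcoded_env(config_text: str) -> list[tuple[str, str]]:
    """Return (server, variable) pairs whose env value looks like a literal."""
    config = json.loads(config_text)
    findings = []
    for server, spec in config.get("mcpServers", {}).items():
        for var, value in spec.get("env", {}).items():
            if not PLACEHOLDER.match(str(value)):
                findings.append((server, var))
    return findings


# Illustrative config; the key value is fake.
sample = json.dumps({
    "mcpServers": {
        "example-search": {
            "command": "npx",
            "args": ["-y", "example-mcp-server"],
            "env": {"PERPLEXITY_API_KEY": "hardcoded-not-a-real-key"},
        },
        "example-other": {
            "command": "npx",
            "args": [],
            "env": {"API_KEY": "${API_KEY}"},
        },
    }
})
print(find_hardcoded_env(sample))
```

The check reports only names, never values, so its own output is safe to log in CI.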

Gaps in current tooling

The ease with which you can still find secrets with a simple GitHub search is a cause for concern. AI is accelerating the process of writing code, and increased leakage of secrets appears to be a byproduct.

GitHub’s secrets scanning was transformational – the auto-remediation for supported platforms has meaningfully cut down on major incidents. But the scanning relies on the relevant platforms integrating with GitHub’s scanner, leading to limited coverage. Additionally, most integrations do not auto-revoke secrets, to avoid disrupting customer operations. The tradeoff is that we see numerous secrets that should have been detected and alerted on by GitHub’s scanner but don’t end up remediated, or at least not promptly.

In looking at popular secrets scanning tools, we observe that despite supporting hundreds of types of secrets, they’re unable to keep pace with the rate of innovation. Pattern-based matching will always lag behind new secret types and isn’t suitable for all kinds of secrets.
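The lag of pattern-based matching can be illustrated with a toy scanner: a vendor-specific regex only fires for vendors someone has already written a rule for, so a generic fallback is needed for everything else. The sketch below is not the approach used by Wiz or by any specific tool. The hf_ prefix rule, the assignment regex, the 3.5 bits-per-character entropy threshold, and all sample tokens are illustrative assumptions, and entropy heuristics are known to produce both false positives and misses.

```python
import math
import re
from collections import Counter

# Assumption: one illustrative vendor rule keyed on a known token prefix.
KNOWN_PATTERNS = {"HuggingFace": re.compile(r"\bhf_[A-Za-z0-9]{30,}\b")}

# Generic fallback: a quoted value assigned to a secret-sounding name.
ASSIGNMENT = re.compile(
    r"""(\w*(?:key|token|secret)\w*)\s*[:=]\s*["']([^"']{16,})["']""",
    re.IGNORECASE,
)


def shannon_entropy(s: str) -> float:
    """Shannon entropy of s in bits per character."""
    counts = Counter(s)
    return -sum((n / len(s)) * math.log2(n / len(s)) for n in counts.values())


def scan(text: str, min_entropy: float = 3.5) -> list[tuple[str, str]]:
    """Return (rule, value) findings from vendor rules, then the fallback."""
    findings = []
    for vendor, pattern in KNOWN_PATTERNS.items():
        findings += [(vendor, m.group(0)) for m in pattern.finditer(text)]
    known_values = {value for _, value in findings}
    for m in ASSIGNMENT.finditer(text):
        value = m.group(2)
        if value not in known_values and shannon_entropy(value) >= min_entropy:
            findings.append(("generic-high-entropy", value))
    return findings


# All tokens below are fake and exist only to exercise the rules.
sample = '''
hf_token = "hf_abcdefghijklmnopqrstuvwxyzABCDEFGH"
newvendor_api_key = "zq8Xv2Lr9TtB4mWc7KpY1sNd"
api_key = "xxxxxxxxxxxxxxxxxxxx"
'''
for rule, value in scan(sample):
    print(rule, value)
```

Here the second assignment belongs to a vendor with no dedicated rule and is caught only by the fallback, while the low-entropy placeholder on the third line is ignored.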

This is why Wiz Research takes a diverse approach to secrets detection, such as our recent session at BSidesSF on Enhancing Secret Detection in Cybersecurity with Small LMs.

Take China’s AI Tigers: immensely popular AI platforms, but overlooked by Western-centric platforms and tools. GitHub has a huge user base shared with these platforms. The result? Dramatic volumes of unaddressed credential leakage for Chinese AI platforms, relative to Western platforms where leaks are automatically detected and reported.

However, even more common AI services are often missed by secrets scanners.

We have compiled a list of AI secrets that are missed by one or more popular secret scanning tools, informed by our search of validated secrets in code:

Most common	Less common	AI Tigers
Perplexity, WeightsAndBiases, Groq, NVIDIA API	Tavily, Langchain, NVIDIA-NGC, Cohere, Pinecone, Clarifai, Gemini, AI21 Labs, IBM Watsonx AI, Cerebras, FriendliAI, FireworksAI, TogetherAI	Zhipu AI, Moonshot AI, Baichuan Intelligence, 01.AI, StepFun, MiniMax

As part of Wiz Code and Wiz Cloud, our secret scanning module now detects the vast majority of the above secret types. In addition, work is underway to add the rest under an AI-based classification.

Impact on Companies

About a third of the secrets we found belong to personal projects, while the rest is divided equally between companies, company employees, startups, open source, and research/university projects:

This means that about 40% of discovered secrets can have a real impact on companies. We were able to find valid secrets belonging to over 30 companies and startups, including multiple Fortune 100 companies. To take one example: one of the secrets reported to MSRC was assigned Critical severity and could have led to the disclosure of sensitive HR data.

Another particularly interesting finding: 56% of the detected secrets with company impact were found in personal repositories of company employees, rather than in the company organizations themselves. This highlights the dangers of adjacent discovery, but that's a story for another blog.

Since we did not perform a systematic study, we cannot state the overall percentage of vulnerable companies. What we can say is that we found exposed secrets in around 20% of the organizations we checked. This suggests that these initial findings are just the tip of the iceberg.

Takeaways and Disclosures

To conclude, our research highlights several important messages, all of them underscored by the crazy pace of AI adoption and evolution: AI providers multiply, and with them new secret types and new secret usage cases. On the flip side, the progress in AI-for-code brings an “AI-automated” way to set up the dev environment. This setup is often leak-prone.

The good news is that the mitigations have for the most part remained the same: secret scanning pre-commit hooks, periodic scans, CI/CD pipeline integration, and git history scanning. This assumes, of course, that the existing secret scanners will close the gaps. The integration of new secret types and usage patterns into the existing protection and detection flows must be accompanied by a scrupulous inspection of the use cases. For example, how does your org use ipynb files? Is there a policy preventing check-ins of notebooks with execution output? Does your secret scanner scan non-renderable or large ipynb files?

In the aftermath of this project, our research team disclosed the most prominent findings to customers, partners, and 3rd-party companies. The most difficult part turned out to be reporting what seem like production-level secret leaks to small startups without a dedicated security team. The overwhelming majority of initial contact attempts to founders or GitHub org members via LinkedIn / X / email were left unanswered; therefore, we do not disclose the findings in detail. Hopefully, we will be able to talk more about that and expand on the hunting methodology at one of the upcoming conferences.
