Anthropic Google and Microsoft paid AI agent bug bounties then kept quiet about the flaws

In short:Security researcher Aonan Guan hijacked AI agents from Anthropic, Google, and Microsoft via prompt injection attacks on their GitHub Actions integrations, stealing API keys and tokens in each case. All three companies paid bug bounties quietly, $100 from Anthropic, $500 from GitHub, an undisclosed amount from Google, but none published public advisories or assigned CVEs, leaving users on older versions unaware of the risk.

Security researchers have demonstrated that AI agents from Anthropic, Google, and Microsoft can be hijacked through prompt injection attacks to steal API keys, GitHub tokens, and other secrets, and all three companies quietly paid bug bounties without publishing public advisories or assigning CVEs.

The vulnerabilities, disclosed by researcher Aonan Guan over several months, affect AI tools that integrate with GitHub Actions: Anthropic's Claude Code Security Review, Google's Gemini CLI Action, and GitHub's Copilot Agent. Each tool reads GitHub data, including pull request titles, issue bodies, and comments, processes it as task context, and then takes actions. The problem is that none of them reliably distinguish between legitimate content and injected instructions.

How the attacks work

The core technique is indirect prompt injection. Rather than attacking the AI model directly, the researcher embedded malicious instructions in places the agents were designed to trust: PR titles, issue descriptions, and comments. When the agent ingested that content as part of its workflow, it executed the injected commands as though they were legitimate instructions.

The 💜 of EU tech

The latest rumblings from the EU tech scene, a story from our wise ol' founder Boris, and some questionable AI art. It's free, every week, in your inbox. Sign up now!

Against Anthropic's Claude Code Security Review, which scans pull requests for vulnerabilities, Guan crafted a PR title containing a prompt injection payload. Claude executed the embedded commands and included the output, including leaked credentials, in its JSON response, which was then posted as a PR comment for anyone to read. The attack could exfiltrate the Anthropic API key, GitHub access tokens, and other secrets exposed in the GitHub Actions runner environment.

The Gemini attack followed a similar pattern. By injecting a fake “trusted content section” after legitimate content in a GitHub issue, Guan overrode Gemini's safety instructions and tricked the agent into publishing its own API key as an issue comment. Google's Gemini CLI Action, which integrates Gemini into GitHub issue workflows, treated the injected text as authoritative.

The Copilot attack was subtler. Guan hid malicious instructions inside an HTML comment in a GitHub issue, making the payload invisible in the rendered Markdown that humans see but fully visible to the AI agent parsing the raw content. When a developer assigned the issue to Copilot Agent, the bot followed the hidden instructions without question.

The quiet fix

What happened next is as revealing as the vulnerabilities themselves. Anthropic received Guan's submission on its HackerOne bug bounty platform in October 2025. The company asked whether the technique could also steal more sensitive data such as GitHub tokens, confirmed it could, and in November paid a $100 bounty while upgrading the critical severity rating from 9.3 to 9.4. Anthropic updated a “security considerations” section in its documentation but did not publish a public advisory or assign a CVE.

GitHub initially dismissed the Copilot finding as a “known issue” that it “could not reproduce,” but ultimately paid a $500 bounty in March. Google paid an undisclosed amount for the Gemini vulnerability. None of the three vendors assigned CVEs or published advisories that would alert users pinned to vulnerable versions.

For Guan, this is the crux of the problem. Users running older versions of these AI agent integrations may never learn they are exposed. Without a CVE, vulnerability scanners will not flag the issue. Without an advisory, security teams have no artefact to track.

A structural problem, not a one-off bug

The attacks exploit a fundamental weakness in how AI agents process context. Large language models cannot reliably separate data from instructions. When an agent reads a GitHub issue, it treats the text as input to reason about, but a well-crafted prompt injection can make that input function as a command. Every data source that feeds an AI agent's reasoning, whether it is an email, a calendar invite, a Slack message, or a code comment, is a potential attack vector.

This is not a theoretical concern. In January 2026, researchers from Miggo Security demonstrated that Google Gemini could be weaponised through calendar invitations containing hidden instructions. Days later, the “Reprompt” attack against Microsoft Copilot showed that injected prompts could hijack entire user sessions. Anthropic's own Git MCP server was found to harbour three CVEs that allowed attackers to inject backdoors through repositories the server processed. A systematic analysis of 78 studies published in January found that every tested coding agent, including Claude Code, GitHub Copilot, and Cursor, was vulnerable to prompt injection, with adaptive attack success rates exceeding 85%.

The supply chain dimension makes it worse. A security audit of nearly 4,000 agent skills on the ClawHub marketplace found that more than a third contained at least one security flaw, and 13.4% had critical-level issues. When AI agents pull in third-party tools and data sources with the same level of trust they extend to their own instructions, a single compromised component can cascade across an entire development pipeline.

The disclosure gap

The vendors' reluctance to publish advisories reflects an uncomfortable reality: there is no established framework for disclosing AI agent vulnerabilities. Traditional software bugs get CVEs, patches, and coordinated disclosure timelines. Prompt injection flaws sit in a grey zone. They are not bugs in the code so much as emergent behaviours of the model, and the mitigations, stronger system prompts, input sanitisation, output filtering, are partial at best.

But the consequences are indistinguishable from those of a conventional security flaw. An attacker who exfiltrates a GitHub token through a prompt injection can do exactly the same damage as one who exploits a buffer overflow. The argument that AI safety requires new frameworks does not excuse the absence of disclosure for vulnerabilities that are already being exploited in the wild.

Zenity Labs research published this month found that most agent-building frameworks, including those from OpenAI, Google, and Microsoft, lack appropriate guardrails, putting the burden of managing risk on the companies deploying them. In one documented case, attackers manipulated an AI procurement agent's memory so it believed it had authority to approve purchases up to $500,000, when the real limit was $10,000. The agent approved $5 million in fraudulent purchase orders before anyone noticed.

For organisations that have integrated AI agents into their CI/CD pipelines, the message is stark. These tools are powerful precisely because they have access to sensitive systems and data. That same access makes them high-value targets, and the industry has not yet built the disclosure infrastructure to match the risk.

Also tagged with