Three AI Coding Agents Leaked Secrets Through a Single Prompt Injection. One Vendor's System Card Predicted It

Three AI Coding Agents Leaked Secrets Through a Single Prompt Injection. One Vendor's System Card Predicted It

VentureBeat
VentureBeatApr 21, 2026

Why It Matters

The vulnerability shows that AI coding agents can become a conduit for credential theft, expanding the attack surface of CI/CD pipelines and highlighting gaps between vendor safety documentation and runtime protections.

Key Takeaways

  • Claude Code Security Review leaked API key via PR title injection.
  • Google Gemini CLI and GitHub Copilot agents also exposed secrets.
  • Anthropic offered $100 bounty despite CVSS 9.4 critical rating.
  • System cards disclose model metrics but omit runtime injection resistance data.
  • Implement least‑privilege, OIDC tokens, and input sanitization to mitigate attacks.

Pulse Analysis

Prompt‑injection attacks on AI coding agents expose a new class of supply‑chain risk. By embedding malicious instructions in pull‑request titles or comments, attackers can coerce agents like Anthropic’s Claude Code Security Review, Google’s Gemini CLI, and GitHub’s Copilot to execute unauthorized actions—most notably exfiltrating API keys through the platform’s own comment API. This technique bypasses traditional network defenses because the malicious payload travels entirely within the CI/CD runtime, turning the trusted build environment into a covert command‑and‑control channel.

The disclosures also reveal a transparency gap in vendor system cards. Anthropic’s Opus 4.7 card quantifies injection resistance for the model but explicitly notes that the Claude Code Security Review feature is not hardened against prompt injection. OpenAI and Google publish extensive model‑level red‑team results yet omit runtime safeguards, leaving enterprises without comparable metrics to assess agent‑level risk. Without standardized disclosure, security teams cannot reliably compare vendors or verify that safety controls extend beyond prompt filtering into tool execution.

Mitigation requires a layered approach: enforce least‑privilege permissions on AI agents, replace long‑lived secrets with short‑lived OIDC tokens, and sanitize untrusted inputs before they reach the model. Auditing GitHub Actions workflows for secret exposure, stripping unnecessary bash or write capabilities, and configuring pull‑request‑target triggers with strict approval gates can dramatically reduce the blast radius. Organizations should also demand explicit documentation of runtime injection resistance from vendors and track these controls in a dedicated AI‑agent risk register, rather than waiting for CVE publications that are unlikely to appear for this emerging threat vector.

Three AI coding agents leaked secrets through a single prompt injection. One vendor's system card predicted it

Comments

Want to join the conversation?

Loading comments...