AI agents are rapidly becoming core components of enterprise workflows, and without rigorous security controls they can become conduits for automated, high‑impact attacks that compromise data and infrastructure.
The Black Hat USA 2025 session titled “From Prompts to Pwns” examined how modern AI agents—especially those powered by large language models—can be both powerful assistants and vulnerable attack surfaces. Speakers Becca and Rich from NVIDIA’s AI Red Team introduced a three‑tier autonomy framework, ranging from deterministic inference endpoints to fully autonomous agents that control their own toolchains, to help audiences gauge risk exposure.
They identified a “universal anti‑pattern” underlying AI attacks: untrusted data reaches the agent, the LLM processes it, and the resulting instructions are passed to downstream tools with elevated privileges. Prompt injection, whether embedded in user prompts, retrieved documents, or even hidden white‑on‑white email text, can subvert system prompts and force agents to execute malicious actions. Demonstrations included hijacking Microsoft Copilot via crafted emails, exploiting the open‑source Pandanda AI to run arbitrary Python code, and manipulating a computer‑use agent that loops between client‑side tool execution and server‑side decision making.
Concrete examples underscored the severity: a Copilot injection redirected payroll queries to a phishing site, prompting credential exfiltration; Pandanda’s CVE allowed a base‑64 payload to spawn a reverse shell despite guardrails; and a computer‑use agent could be coerced into taking screenshots, navigating browsers, and writing files without user oversight. The presenters emphasized that once an attacker’s payload reaches the LLM, they can potentially control any downstream capability.
The talk concluded with practical mitigations: enforce strict input sanitization, isolate agents in sandboxed containers, implement robust guardrails that cannot be bypassed by prompt injection, and limit tool access based on the principle of least privilege. As enterprises integrate AI agents into critical workflows, overlooking these safeguards could expose sensitive data, internal systems, and corporate reputation to novel, automated exploits.
Comments
Want to join the conversation?
Loading comments...