Enterprise AI Cybersecurity

Black Hat USA 2025 | From Prompts to Pwns: Exploiting and Securing AI Agents

•February 20, 2026

0

Black Hat

Black Hat•Feb 20, 2026

Why It Matters

AI agents are rapidly becoming core components of enterprise workflows, and without rigorous security controls they can become conduits for automated, high‑impact attacks that compromise data and infrastructure.

Key Takeaways

•Define AI agent autonomy levels to assess risk.
•Prompt injection exploits untrusted data feeding LLM agents.
•RAG and email pipelines can be hijacked via hidden payloads.
•Open‑source tools like Pandanda AI allow arbitrary code execution.
•Securing agents requires sandboxing, guardrails, and strict input validation.

Summary

The Black Hat USA 2025 session titled “From Prompts to Pwns” examined how modern AI agents—especially those powered by large language models—can be both powerful assistants and vulnerable attack surfaces. Speakers Becca and Rich from NVIDIA’s AI Red Team introduced a three‑tier autonomy framework, ranging from deterministic inference endpoints to fully autonomous agents that control their own toolchains, to help audiences gauge risk exposure.

They identified a “universal anti‑pattern” underlying AI attacks: untrusted data reaches the agent, the LLM processes it, and the resulting instructions are passed to downstream tools with elevated privileges. Prompt injection, whether embedded in user prompts, retrieved documents, or even hidden white‑on‑white email text, can subvert system prompts and force agents to execute malicious actions. Demonstrations included hijacking Microsoft Copilot via crafted emails, exploiting the open‑source Pandanda AI to run arbitrary Python code, and manipulating a computer‑use agent that loops between client‑side tool execution and server‑side decision making.

Concrete examples underscored the severity: a Copilot injection redirected payroll queries to a phishing site, prompting credential exfiltration; Pandanda’s CVE allowed a base‑64 payload to spawn a reverse shell despite guardrails; and a computer‑use agent could be coerced into taking screenshots, navigating browsers, and writing files without user oversight. The presenters emphasized that once an attacker’s payload reaches the LLM, they can potentially control any downstream capability.

The talk concluded with practical mitigations: enforce strict input sanitization, isolate agents in sandboxed containers, implement robust guardrails that cannot be bypassed by prompt injection, and limit tool access based on the principle of least privilege. As enterprises integrate AI agents into critical workflows, overlooking these safeguards could expose sensitive data, internal systems, and corporate reputation to novel, automated exploits.

Original Description

The flexibility and power of large language models (LLMs) are now well understood, driving their integration into a wide array of real-world applications. Early use cases, such as retrieval-augmented generation (RAG), followed rigid, predictable workflows where models interacted with external systems in tightly controlled sequences. While these systems were easier to optimize and secure, they often resulted in inflexible, single-purpose tools.

In contrast, modern agentic systems leverage expanded input modalities, such as speech and vision, and use more sophisticated inference strategies, such as dynamic chain-of-thought reasoning. These advancements allow them to act independently on users' behalf to automate increasingly complex workflows, often involving sensitive data and systems. As their utility increases, so too does their attack surface: more usability means broader access to data, greater ability to execute actions, and significantly more opportunity for exploitation.

In this talk, we will explore the emerging security challenges posed by agentic AI systems. We demonstrate the implications of this significant shift through internal assessments and proof-of-concept exploits developed by our AI Red Team, targeting a range of agentic applications, from popular open-source tools to enterprise systems. These exploits all leverage the same core finding: that LLMs are uniquely vulnerable to malicious input, and exposure to such input can have a significant impact on the trust of downstream actions. In short, we lay out what can go wrong when agentic systems vulnerable to adversarial inputs are deployed within enterprise environments. We conclude by discussing how NVIDIA addresses the security of emerging agentic workflows, and our principles for designing agent interactions in ways that mitigate risk, emphasizing a security-first foundation for safe and scalable adoption.

By:

Rebecca Lynch | Offensive Security Researcher, NVIDIA

Rich Harang | Principal Security Architect, NVIDIA

Presentation Materials Available at:

https://blackhat.com/us-25/briefings/schedule/?#from-prompts-to-pwns-exploiting-and-securing-ai-agents-46681

0

Comments

Want to join the conversation?

Loading comments...