The Coming AI Security Crisis (and What to Do About It) | Sander Schulhoff

•December 21, 2025

0

Lenny Rachitsky

Lenny Rachitsky•Dec 21, 2025

Why It Matters

If AI guardrails remain ineffective, the rapid rollout of autonomous agents could expose businesses to severe data breaches, operational sabotage, and regulatory penalties, making AI security a top priority for any organization adopting generative AI.

Summary

The podcast episode features Sander Schulhoff, a leading researcher in AI adversarial robustness, discussing the looming AI security crisis. Schulhoff argues that current AI guardrails—systems designed to filter malicious prompts—are fundamentally ineffective, especially against determined attackers who can bypass them with prompt injection or jailbreak techniques. He emphasizes that the lack of large‑scale attacks so far is due to the early stage of AI adoption, not because the technology is secure, and warns that as AI agents become more autonomous, the risk will accelerate dramatically.

Schulhoff outlines two primary attack vectors: jailbreaks, where a user directly tricks a language model into disobeying its safety policies, and prompt injection, where a malicious user exploits the developer‑provided system prompt in an application to make the model perform unintended actions. He cites real‑world examples, including a ServiceNow Assist AI breach that leveraged a benign agent to recruit higher‑privilege agents for database manipulation, a remote‑work chatbot that was hijacked to issue threats, and a math‑solver site that was used to exfiltrate API keys. These incidents demonstrate that even modestly powered models can cause tangible damage when integrated into production tools.

Key quotes reinforce the severity of the problem: Alex Komorosky notes that “none of the problems have any meaningful mitigation” and that the only reason we haven’t seen a massive attack is “how early the adoption is, not because it’s secured.” Schulhoff adds that “you can patch a bug, but you can’t patch a brain,” underscoring the difficulty of fixing inherent model vulnerabilities. He also points out that the industry’s reliance on guardrails is a “complete lie,” as they fail to catch sophisticated prompt manipulations.

The implications are stark for enterprises deploying AI‑driven agents, browsers, or robotics. Without robust, provable defenses, organizations risk data exfiltration, unauthorized actions, and regulatory fallout. Schulhoff suggests interim mitigations—such as layered monitoring, stricter access controls, and continuous red‑team testing—but cautions that these are stop‑gap measures. The conversation calls for a coordinated effort among AI labs, policymakers, and security firms to develop fundamentally safer model architectures before AI tools become ubiquitous in critical workflows.

Original Description

Sander Schulhoff is an AI researcher specializing in AI security, prompt injection, and red teaming. He wrote the first comprehensive guide on prompt engineering and ran the first-ever prompt injection competition, working with top AI labs and companies. His dataset is now used by Fortune 500 companies to benchmark their AI systems security, he’s spent more time than anyone alive studying how attackers break AI systems, and what he’s found isn’t reassuring: the guardrails companies are buying don’t actually work, and we’ve been lucky we haven’t seen more harm so far, only because AI agents aren’t capable enough yet to do real damage.

We discuss:

1. The difference between jailbreaking and prompt injection attacks on AI systems

2. Why AI guardrails don’t work

3. Why we haven’t seen major AI security incidents yet (but soon will)

4. Why AI browser agents are vulnerable to hidden attacks embedded in webpages

5. The practical steps organizations should take instead of buying ineffective security tools

6. Why solving this requires merging classical cybersecurity expertise with AI knowledge

Brought to you by:

Datadog—Now home to Eppo, the leading experimentation and feature flagging platform: https://www.datadoghq.com/lenny

Metronome—Monetization infrastructure for modern software companies: https://metronome.com/

GoFundMe Giving Funds—Make year-end giving easy: http://gofundme.com/lenny

Transcript: https://www.lennysnewsletter.com/p/the-coming-ai-security-crisis

My biggest takeaways (for paid newsletter subscribers): https://www.lennysnewsletter.com/i/181089452/my-biggest-takeaways-from-this-conversation

Where to find Sander Schulhoff:

• X: https://x.com/sanderschulhoff

• LinkedIn: https://www.linkedin.com/in/sander-schulhoff

• Website: https://sanderschulhoff.com

• AI Red Teaming and AI Security Masterclass on Maven: https://bit.ly/44lLSbC

Where to find Lenny:

• Newsletter: https://www.lennysnewsletter.com

• X: https://twitter.com/lennysan

• LinkedIn: https://www.linkedin.com/in/lennyrachitsky/

In this episode, we cover:

(00:00) Introduction to Sander Schulhoff and AI security

(05:14) Understanding AI vulnerabilities

(11:42) Real-world examples of AI security breaches

(17:55) The impact of intelligent agents

(19:44) The rise of AI security solutions

(21:09) Red teaming and guardrails

(23:44) Adversarial robustness

(27:52) Why guardrails fail

(38:22) The lack of resources addressing this problem

(44:44) Practical advice for addressing AI security

(55:49) Why you shouldn’t spend your time on guardrails

(59:06) Prompt injection and agentic systems

(01:09:15) Education and awareness in AI security

(01:11:47) Challenges and future directions in AI security

(01:17:52) Companies that are doing this well

(01:21:57) Final thoughts and recommendations

Referenced:

• AI prompt engineering in 2025: What works and what doesn’t | Sander Schulhoff (Learn Prompting, HackAPrompt): https://www.lennysnewsletter.com/p/ai-prompt-engineering-in-2025-sander-schulhoff

• The AI Security Industry is Bullshit: https://sanderschulhoff.substack.com/p/the-ai-security-industry-is-bullshit

• The Prompt Report: Insights from the Most Comprehensive Study of Prompting Ever Done: https://learnprompting.org/blog/the_prompt_report?srsltid=AfmBOoo7CRNNCtavzhyLbCMxc0LDmkSUakJ4P8XBaITbE6GXL1i2SvA0

• OpenAI: https://openai.com

• Scale: https://scale.com

• Hugging Face: https://huggingface.co

• Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs through a Global Scale Prompt Hacking Competition: https://www.semanticscholar.org/paper/Ignore-This-Title-and-HackAPrompt%3A-Exposing-of-LLMs-Schulhoff-Pinto/f3de6ea08e2464190673c0ec8f78e5ec1cd08642

• Simon Willison’s Weblog: https://simonwillison.net

• ServiceNow: https://www.servicenow.com

• ServiceNow AI Agents Can Be Tricked Into Acting Against Each Other via Second-Order Prompts: https://thehackernews.com/2025/11/servicenow-ai-agents-can-be-tricked.html

• Alex Komoroske on X: https://x.com/komorama

• Twitter pranksters derail GPT-3 bot with newly discovered “prompt injection” hack: https://arstechnica.com/information-technology/2022/09/twitter-pranksters-derail-gpt-3-bot-with-newly-discovered-prompt-injection-hack

• MathGPT: https://math-gpt.org

• 2025 Las Vegas Cybertruck explosion: https://en.wikipedia.org/wiki/2025_Las_Vegas_Cybertruck_explosion

• Disrupting the first reported AI-orchestrated cyber espionage campaign: https://www.anthropic.com/news/disrupting-AI-espionage

...References continued at: https://www.lennysnewsletter.com/p/the-coming-ai-security-crisis

_Production and marketing by https://penname.co/._

_For inquiries about sponsoring the podcast, email podcast@lennyrachitsky.com._

Lenny may be an investor in the companies discussed.

0

Comments

Want to join the conversation?

Loading comments...