AI Videos

All News Deals Social Blogs Videos Podcasts Digests

AI Cybersecurity

SecTor 2025 | Investigate & Respond to Attacks on GenAI Chatbots

•April 18, 2026

Black Hat

Black Hat•Apr 18, 2026

Why It Matters

Chatbot vulnerabilities can transform harmless user interactions into data leaks, legal exposure, or full system compromise, making proactive incident‑response frameworks vital for protecting brand reputation and operational security.

Key Takeaways

•Classify chatbot risk levels to prioritize incident response.
•Log prompts, IDs, timestamps, model version for forensic analysis.
•Use rule‑based filters and LLM judges as layered guardrails.
•Monitor guardrail scores and entropy to detect bypass attempts.
•Implement robust system prompts to enforce intended chatbot behavior.

Summary

The SecTor 2025 session, led by Airbnb senior engineer Alan Sto, examined how organizations can investigate and respond to attacks on generative‑AI chatbots. Sto framed the discussion around a practical incident‑response playbook, emphasizing the need to understand chatbot architecture, threat vectors, and the escalating risk as bots move from informational to action‑driven roles. Key insights included a three‑tier risk classification (low, medium, high), the critical importance of comprehensive logging—capturing user prompts, thread IDs, timestamps, model versions, and guardrail scores—and the deployment of layered defenses. Simple rule‑based filters were shown to be easily bypassed, prompting the use of LLM‑based judges that score inputs and outputs against policy criteria. Robust system prompts were recommended to give the model higher‑priority instructions, while monitoring score distributions and entropy helps surface covert bypass attempts. Sto illustrated these concepts with vivid examples: a New York City chatbot that inappropriately advised on “human meat” sales, a car‑dealer bot that accepted absurd offers, and a weather bot that began delivering Taylor Swift‑themed forecasts after malicious reinforcement‑learning feedback. He also highlighted a remote‑code‑execution exploit in the Vanna SQL‑to‑Python pipeline, where dynamically generated Plotly code was weaponized. The takeaway for enterprises is clear: chatbot incidents can quickly evolve from brand embarrassment to legal liability or system compromise. Building a dedicated playbook, instrumenting detailed logs, and applying both static and adaptive guardrails are essential steps to detect, contain, and remediate attacks before they impact customers or expose sensitive data.

Original Description

It's coming, and you aren't ready—your first generative AI chatbot incident. GenAI chatbots, leveraging LLMs, are revolutionizing customer engagement by providing real-time, automated 24/7 chat support. But when your company's virtual agent starts responding inappropriately to requests and handing out customer PII to anyone who asks nicely, who are they going to call? You.

You've seen the cool prompt injection attack demos and may even be vaguely aware of preventions like LLM guardrails; but are you ready to investigate and respond when those preventions inevitably fail? Would you even know where to start? It's time to connect traditional investigation and response procedures with the exciting new world of GenAI chatbots.

In this talk, you'll learn how to investigate and respond to the unique threats targeting these systems. You'll discover new methods for isolating attacks, gathering information, and getting to the root cause of an incident using AI defense tooling and LLM guardrails. You'll come away from this talk with a playbook for investigating and responding to this new class of GenAI incidents and the preparation steps you'll need to take before your company's chatbot responses start going viral—for the wrong reasons.

By: Allyn Stott | Senior Staff Engineer, Airbnb

Presentation Materials Available at:

https://blackhat.com/sector/2025/briefings/schedule/?#tinker-tailor-llm-spy-investigate--respond-to-attacks-on-genai-chatbots-47094

Comments

Want to join the conversation?

Loading comments...