Black Hat USA 2025 | AI Agents for Offsec with Zero False Positives

Black Hat | Apr 5, 2026

Why It Matters

Reducing AI‑generated false positives restores trust in automated vulnerability discovery, enabling scalable, accurate bug‑bounty programs and protecting organizations from wasted remediation effort.

Key Takeaways

  • Naïvely asking LLMs to flag vulnerabilities produces overwhelming false‑positive rates that drown out real findings.
  • Because true vulnerabilities are rare, even a highly accurate detector's alerts are mostly false positives (the base‑rate fallacy).
  • Deterministic canary flags can validate true exploits without target cooperation.
  • Automated Docker container setups enable scalable, low‑false‑positive vulnerability scanning.
  • Future AI agents may self‑verify, but deterministic methods remain essential today.

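The base-rate point can be made concrete with a quick Bayes calculation (the 99% accuracy figure comes from the talk; the 1-in-10,000 vulnerability rate is an illustrative assumption):

```python
def positive_predictive_value(sensitivity, specificity, base_rate):
    """P(real vulnerability | detector flags it), via Bayes' rule."""
    true_pos = sensitivity * base_rate
    false_pos = (1 - specificity) * (1 - base_rate)
    return true_pos / (true_pos + false_pos)

# A detector that is "99% accurate" both ways, applied where only
# 1 in 10,000 inspected locations is actually vulnerable (assumed rate):
ppv = positive_predictive_value(0.99, 0.99, 1e-4)
print(f"{ppv:.2%}")  # ~0.98% -- almost every flagged finding is a false positive
```

Under these assumptions, roughly 99 out of every 100 flagged findings are spurious, which is why raw model output cannot feed a bounty queue directly.
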
Summary

Brendan Dolan‑Gavitt opened his Black Hat USA 2025 talk by warning that the promise of AI‑driven offensive security is haunted by a spectre of false positives. Drawing on his decade‑long experience in software security and recent work on GitHub Copilot, he highlighted how chat‑based models routinely flag benign code as vulnerable, flooding bug‑bounty platforms with spurious reports. He explained the statistical root of the problem with a classic base‑rate fallacy example: even a 99% accurate test is misleading when true vulnerabilities are rare among millions of lines of code. AI models, trained to be helpful, often over‑interpret minor anomalies, such as quoting errors or network latency, as exploitable flaws, producing a stream of convincing yet bogus findings.

To combat this, Dolan‑Gavitt advocated deterministic validation techniques. He demonstrated how planting unguessable canary strings (CTF‑style flags) in Docker containers, file systems, or databases provides concrete evidence when an AI agent truly exploits a vulnerability. He also described using evidence‑based proofs, such as captured tokens, to force models to prove their claims, turning the vulnerability hunt into a capture‑the‑flag challenge with a guaranteed solution.

The broader implication is clear: while future AI agents may eventually self‑verify, today's security teams must augment language models with deterministic tools and automated canary deployment to keep false‑positive rates low, protect bounty‑program integrity, and scale reliable offensive testing across open‑source ecosystems.
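
The canary-flag idea can be sketched in a few lines (a minimal illustration, not XBOW's implementation; the file path in the example is hypothetical): plant a random, unguessable token inside the target, and accept an agent's exploit claim only if the agent can read that token back.

```python
import secrets

def plant_canary() -> str:
    """Generate a CTF-style flag to hide inside the target (e.g. written
    to a protected file or database row the app should never expose)."""
    return f"FLAG{{{secrets.token_hex(16)}}}"

def validate_exploit(agent_output: str, canary: str) -> bool:
    """Deterministic check: the claim counts only if the agent actually
    exfiltrated the planted canary -- no human triage required."""
    return canary in agent_output

canary = plant_canary()
# A real exploit that read the protected resource surfaces the flag:
assert validate_exploit(f"dumped /etc/secret: {canary}", canary)
# A hallucinated finding cannot guess a 128-bit random token:
assert not validate_exploit("SQL injection confirmed in /login!", canary)
```

Because the token is drawn from 128 bits of randomness, a passing check is essentially conclusive evidence of exploitation rather than a plausible-sounding claim.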

Original Description

Large language models are increasingly helping to automate vulnerability discovery and exploit development in real-world software. However, naïvely asking LLMs to identify vulnerabilities leads to a deluge of false positives that can drown out real findings. In this talk, we will present techniques that enable AI agents to find vulnerabilities at scale, fully autonomously and with zero false positives. The key to our approach is developing robust exploit validators that can conclusively determine whether an exploit claimed by the agent is real, allowing the agent to make arbitrarily many attempts without increasing the amount of human effort needed to review the results. Using these techniques, we were able to test thousands of web apps found on Docker Hub, identifying over 200 zero days and obtaining multiple CVEs.
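
The "arbitrarily many attempts" property described above follows from the validator being deterministic: the agent can retry until it either produces the canary or exhausts its budget, and humans review only validated findings. A hypothetical sketch (the `attempt_exploit` stub stands in for running the real agent against a target):

```python
import secrets

def attempt_exploit(attempt: int, canary: str) -> str:
    """Stand-in for one agent attempt; a real harness would run the
    agent against the target container and capture its output."""
    return canary if attempt == 3 else "no luck this time"

def hunt(canary: str, budget: int = 10) -> dict:
    """Let the agent retry freely; only a validated exploit escapes the
    loop, so human effort is constant no matter how many tries it takes."""
    for i in range(budget):
        output = attempt_exploit(i, canary)
        if canary in output:  # deterministic validator, zero false positives
            return {"validated": True, "attempts": i + 1}
    return {"validated": False, "attempts": budget}

canary = f"FLAG{{{secrets.token_hex(16)}}}"
print(hunt(canary))  # {'validated': True, 'attempts': 4}
```

Scaling this loop across thousands of containerized targets is what lets every reported finding arrive pre-verified.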
By:
Brendan Dolan-Gavitt | AI Researcher, XBOW
Presentation Materials Available at:
