
Token Is All You Need: Finding 0days with LLMs and Agentic AI

Key Takeaways
- Claude Code Security found 22 Firefox bugs in Feb 2026
- AISLE identified 13 of 14 OpenSSL CVEs in 2025
- Independent researcher earned $2,418 bounty with $5 prompt
- RAPTOR combines LLM reasoning with Semgrep, CodeQL, AFL++
- OpenAnt’s five‑stage funnel cuts false positives to 0.02%
Pulse Analysis
The emergence of large language models has turned vulnerability discovery into a high‑throughput, low‑skill activity. Researchers like Nicholas Carlini showed that a simple for‑loop that prompts the model with each source file individually (now known as the Carlini Loop) lets LLMs apply adversarial reasoning without the context‑window limits that plague full‑repo analysis. Because each file is a separate request, the method scales linearly across massive codebases, delivering fresh, unbiased reviews that human auditors can’t sustain, and it has already exposed legacy flaws such as a 23‑year‑old Linux NFS heap overflow.
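The per‑file loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Carlini's actual script: the prompt wording, the helper names `load_sources` and `audit_each_file`, and the `ask_model` callable (a stand‑in for whatever LLM client is used) are all hypothetical.

```python
from pathlib import Path
from typing import Callable, Dict, Mapping, Tuple

# Hypothetical prompt; the wording used in real audits is not given in the article.
PROMPT_TEMPLATE = (
    "You are auditing a single source file for security vulnerabilities.\n"
    "File: {name}\n\n{code}\n\n"
    "Describe any suspected vulnerability, or reply NONE."
)


def load_sources(root: str, suffixes: Tuple[str, ...] = (".c", ".py")) -> Dict[str, str]:
    """Read every file under `root` whose extension is in `suffixes`."""
    sources = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            sources[str(path)] = path.read_text(encoding="utf-8", errors="replace")
    return sources


def audit_each_file(
    sources: Mapping[str, str],
    ask_model: Callable[[str], str],
) -> Dict[str, str]:
    """Send one independent prompt per file and keep the non-empty replies."""
    findings = {}
    for name, code in sources.items():
        # Each request carries only one file, so the prompt stays small and
        # the number of requests grows linearly with the number of files.
        reply = ask_model(PROMPT_TEMPLATE.format(name=name, code=code))
        if reply.strip().upper() != "NONE":
            findings[name] = reply
    return findings


def fake_model(prompt: str) -> str:
    """Toy stand-in for an LLM call (assumption): flags any use of strcpy."""
    return "possible overflow via strcpy" if "strcpy(" in prompt else "NONE"
```

In practice `fake_model` would be swapped for a real client call, and the replies would still need independent verification before being treated as vulnerabilities.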
Commercial and open‑source teams have quickly industrialized the approach. Anthropic’s Claude Code Security, built on Claude Opus 4.6, reported over 500 zero‑days with a false‑positive rate under 5%, including 22 high‑severity bugs in Firefox in a single month. OpenAI’s Codex Security scanned 1.2 million commits in 30 days, surfacing more than 11,000 high‑impact findings. The AISLE autonomous auditor alone accounted for 13 of 14 OpenSSL CVEs in 2025, underscoring that even the most scrutinized libraries still yield bugs under AI‑driven analysis. Notably, a lone researcher leveraged a $5 API budget to uncover multiple Django and FastAPI issues, earning $2,418 in bounty payouts.
Hybrid frameworks like RAPTOR illustrate the next evolution: they orchestrate LLM reasoning with deterministic tools such as Semgrep, CodeQL and AFL++, and embed rigorous verification pipelines to filter hallucinations. OpenAnt’s five‑stage funnel further reduces false positives to 0.02%, countering the tendency of LLMs to agree with whatever a prompt suggests. As AI accelerates bug discovery, organizations must rethink threat modeling, integrate automated verification, and prepare for a market where zero‑day research is as accessible as a cloud subscription.
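A verification funnel of this kind can be pictured as a chain of filters, each of which discards candidate findings that fail a deterministic check. The sketch below is a generic illustration only: the article does not describe RAPTOR's pipeline or OpenAnt's five stages, so the `Finding` fields, the `run_funnel` helper, and the idea of plugging in stage predicates (for example "a static analyzer flags the same line" or "a fuzzer reproduces the crash") are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence, Tuple


@dataclass(frozen=True)
class Finding:
    """One LLM-proposed vulnerability (hypothetical minimal shape)."""
    file: str
    line: int
    claim: str


# A stage is a label plus a predicate that returns True to keep a finding.
Stage = Tuple[str, Callable[[Finding], bool]]


def run_funnel(
    candidates: Iterable[Finding],
    stages: Sequence[Stage],
) -> Tuple[List[Finding], List[Tuple[str, int]]]:
    """Apply each stage in order; return survivors and per-stage counts."""
    survivors = list(candidates)
    counts = [("candidates", len(survivors))]
    for name, keep in stages:
        # A finding must pass every stage, so unverified claims drop out
        # even when the model sounded confident about them.
        survivors = [finding for finding in survivors if keep(finding)]
        counts.append((name, len(survivors)))
    return survivors, counts
```

The per‑stage counts make it easy to see where candidates are dropped, which is the figure a false‑positive rate is ultimately computed from.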