AI dramatically boosts the efficiency and coverage of Web3 audits, but human insight is still crucial for detecting complex business‑logic vulnerabilities and for building a zero‑trust security posture that protects against non‑code threats.
Philip, co‑founder of Oak Security, outlines how artificial intelligence is reshaping Web3 security audits. He traces Oak’s evolution from a boutique firm in 2017 to a 52‑researcher operation that has completed over 600 audits, and he explains the rise of “vibe‑coded” smart contracts—AI‑generated code that is fast but often opaque, under‑tested, and riddled with hidden complexities.
The talk highlights a stark performance gap between naïve LLM prompting and specialized AI audit pipelines. Single‑shot prompts to models like ChatGPT achieve only about 40% precision and recall: they generate many false positives and miss the majority of vulnerabilities. In contrast, multi‑agent frameworks and machine‑learning classifiers can exceed 90% on both metrics, outperforming traditional analyzers such as Slither (static analysis) and Mythril (symbolic execution). Yet these AI tools excel mainly at finding known Solidity flaws; they struggle with novel logic, cross‑chain exploits, and Rust‑based code, for which training data is scarce.
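To make the 40% versus 90% comparison concrete: precision is the share of reported findings that are real vulnerabilities, TP / (TP + FP), and recall is the share of real vulnerabilities that get reported, TP / (TP + FN). The Python sketch below computes both from hypothetical counts for an imaginary codebase with 20 real bugs; the numbers were chosen only so the results land near the figures quoted in the talk and do not reproduce any benchmark from it.

```python
from dataclasses import dataclass


@dataclass
class AuditResult:
    true_positives: int   # reported findings that are real vulnerabilities
    false_positives: int  # reported findings that turn out not to be real
    false_negatives: int  # real vulnerabilities the tool never reported

    @property
    def precision(self) -> float:
        # Of everything the tool reported, how much was real?
        reported = self.true_positives + self.false_positives
        return self.true_positives / reported if reported else 0.0

    @property
    def recall(self) -> float:
        # Of all real vulnerabilities, how many did the tool find?
        actual = self.true_positives + self.false_negatives
        return self.true_positives / actual if actual else 0.0


# Hypothetical counts for a codebase containing 20 real vulnerabilities.
naive_prompt = AuditResult(true_positives=8, false_positives=12, false_negatives=12)
pipeline = AuditResult(true_positives=19, false_positives=1, false_negatives=1)

for name, result in [("single-shot prompt", naive_prompt), ("audit pipeline", pipeline)]:
    print(f"{name}: precision={result.precision:.0%}, recall={result.recall:.0%}")
```

With these made‑up counts the single‑shot prompt reports 20 findings of which only 8 are real, so it is wrong on most of what it reports and still misses 12 of the 20 bugs; that is what "about 40% on both metrics" means in practice.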
Philip backs these claims with case studies. A Cosmos SDK audit uncovered a compute‑exhaustion bug that could cause a denial of service (DoS) and was detectable only through deep business‑logic analysis. The XBOW agent, which has filed over 1,000 zero‑day reports on HackerOne, illustrates AI's strength in surface‑level vulnerability hunting but also its limits on complex, context‑dependent issues. He stresses that human auditors remain essential for interpreting economics, game theory, and nuanced protocol interactions, while AI handles repetitive checks, fuzzing setup, and report generation, dramatically reducing audit fatigue.
The broader implication is a shift toward a hybrid security model in which AI augments, rather than replaces, human expertise. Oak advocates zero‑trust, "Swiss‑cheese" architectures: no single layer is assumed to be sound, and several imperfect defenses are stacked so that a flaw slipping through one is likely to be stopped by another. It encourages clients to run AI tools internally, preferably on self‑hosted open‑source models so that confidential code stays in‑house and NDAs are respected, and to layer on additional defenses such as rate limiting and circuit breakers. This approach promises faster, cheaper audits and greater confidence once the audit is done, but it also underscores the need for a security culture that extends beyond code reviews.
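As a rough illustration of one extra slice of cheese, the Python sketch below models a withdrawal guard that combines a sliding‑window rate limit with a circuit breaker. It is an off‑chain model under stated assumptions: the class, its parameters, and its thresholds are hypothetical, not Oak's recommendation or any protocol's API, and a real deployment would enforce the same logic inside the contract itself. The idea is that if a bug slips past the audit, the guard caps how much can leave per window and halts everything once outflow looks anomalous.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class RateLimited(Exception):
    """The request would push accepted outflow past the per-window cap."""


class CircuitOpen(Exception):
    """The breaker has tripped; withdrawals stay paused until a manual reset."""


class WithdrawalGuard:
    """Hypothetical defense layer: sliding-window rate limit plus circuit breaker.

    All names and thresholds are illustrative assumptions, not any protocol's API.
    """

    def __init__(self, window_seconds: float, max_outflow: int, trip_outflow: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.window_seconds = window_seconds
        self.max_outflow = max_outflow    # normal cap on accepted outflow per window
        self.trip_outflow = trip_outflow  # attempted outflow per window that trips the breaker
        self.clock = clock
        self.accepted: Deque[Tuple[float, int]] = deque()  # (time, amount) of accepted withdrawals
        self.attempts: Deque[Tuple[float, int]] = deque()  # (time, amount) of every attempt
        self.tripped = False

    def _expire(self, now: float) -> None:
        # Drop entries that have aged out of the sliding window.
        for queue in (self.accepted, self.attempts):
            while queue and now - queue[0][0] >= self.window_seconds:
                queue.popleft()

    def withdraw(self, amount: int) -> None:
        if self.tripped:
            raise CircuitOpen("withdrawals paused pending manual review")
        now = self.clock()
        self._expire(now)
        # Rejected attempts count too: a burst of oversized requests is itself a signal.
        self.attempts.append((now, amount))
        if sum(a for _, a in self.attempts) > self.trip_outflow:
            self.tripped = True
            raise CircuitOpen("anomalous outflow; breaker tripped")
        if sum(a for _, a in self.accepted) + amount > self.max_outflow:
            raise RateLimited("per-window outflow cap exceeded")
        self.accepted.append((now, amount))

    def reset(self) -> None:
        # Meant for a human operator after investigating the trip; clears the
        # attempt history that caused it but keeps the accepted-outflow cap in force.
        self.attempts.clear()
        self.tripped = False


# Usage with a deterministic clock so the example does not depend on wall time.
now = [0.0]
guard = WithdrawalGuard(window_seconds=3600, max_outflow=100, trip_outflow=250,
                        clock=lambda: now[0])
guard.withdraw(60)          # accepted: 60 <= 100
try:
    guard.withdraw(60)      # rejected: 120 would exceed the 100 cap
except RateLimited:
    pass
try:
    guard.withdraw(200)     # attempted outflow 320 > 250: breaker trips
except CircuitOpen:
    pass
now[0] = 7200.0             # even after the window has passed...
try:
    guard.withdraw(10)      # ...withdrawals stay paused until reset()
except CircuitOpen:
    print("paused")
```

The breaker only clears through an explicit `reset()`, so a person stays in the loop after an incident; that mirrors the hybrid model above, where automation contains damage and humans judge what actually happened.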