Black Hat USA 2025 | LLM-Driven Reasoning for Automated Vulnerability Discovery Behind Hall-of-Fame

Black Hat | Apr 8, 2026

Why It Matters

Automating binary‑level vulnerability discovery with LLMs dramatically speeds up security testing and lowers reliance on labor‑intensive reverse engineering, reshaping how firms protect software ecosystems.

Key Takeaways

  • LLM‑based tool “Whisper” automates vulnerability discovery in binaries.
  • Earned Rank 1 in Samsung Mobile Security's 2024 Hall of Fame.
  • Pipeline combines human selection with AI‑driven code analysis.
  • Reconstructs data structures from stripped ARM64 binaries for accurate checks.
  • Model router balances heavy LLM tasks and lightweight JSON fixing.

Summary

The Black Hat USA 2025 talk introduced “Whisper,” a large‑language‑model‑driven system that automatically discovers vulnerabilities in stripped ARM64 binaries. The presenter, a researcher guiding an undergraduate team, explained how the tool earned a place in Samsung Mobile Security's 2024 Hall of Fame by uncovering a critical RTCP buffer‑overflow bug.

Whisper’s architecture fuses human oversight—selecting target processes and validating results—with a cascade of AI agents that decompile binaries, rebuild global call graphs, and reconstruct data structures in the absence of source symbols. By feeding precise pre‑conditions and value ranges into the LLM, the system can answer binary‑level “yes/no” vulnerability queries with high confidence, eliminating the ambiguous “maybe” responses that plagued earlier chatbots.
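To make the pre‑condition mechanism concrete, here is a minimal sketch of how such a yes/no query might be assembled. All names (`PreCondition`, `build_vuln_query`) are hypothetical illustrations, not Whisper's actual API; the point is that pinning down attacker‑controlled fields and their value ranges in the prompt forces the model toward a definitive answer.

```python
from dataclasses import dataclass

@dataclass
class PreCondition:
    """A fact the pipeline has already established about a function's inputs."""
    field: str
    value_range: tuple  # (min, max), inclusive

def build_vuln_query(func_name: str, decompiled_code: str,
                     preconditions: list) -> str:
    """Assemble a binary-level yes/no vulnerability prompt: explicit
    pre-conditions and value ranges leave the model no room for 'maybe'."""
    facts = "\n".join(
        f"- {p.field} is attacker-controlled, range "
        f"{p.value_range[0]}..{p.value_range[1]}"
        for p in preconditions
    )
    return (
        f"Function: {func_name}\n"
        f"Known pre-conditions:\n{facts}\n\n"
        f"Decompiled code:\n{decompiled_code}\n\n"
        "Question: given ONLY these pre-conditions, can a buffer overflow "
        "be triggered? Answer strictly 'yes' or 'no', then explain."
    )
```

The prompt would then be sent to the heavyweight reasoning model, with the structured answer fed back into the pipeline's report stage.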

A concrete example highlighted CVE‑2024‑34587, where the model identified an attacker‑controlled length field leading to a buffer overflow in Samsung’s video engine service. The pipeline generated a JSON report detailing the bug, a confidence score, and step‑by‑step reasoning, and even repaired malformed JSON outputs via a lightweight model router that balances cost and accuracy.
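The routing idea can be sketched as a cost‑ordered escalation: try a direct parse, then a cheap repair pass, and only then a heavy model. This is an illustrative assumption, not Whisper's implementation; in particular, `cheap_json_repair` here is a deterministic stand‑in for the lightweight fixer model.

```python
import json
import re

def cheap_json_repair(text: str) -> str:
    """Stand-in for the lightweight repair step: strip common LLM
    artifacts such as Markdown code fences and trailing commas."""
    text = re.sub(r"^```(?:json)?\s*|\s*```$", "", text.strip())
    text = re.sub(r",\s*([}\]])", r"\1", text)  # remove trailing commas
    return text

def route_report(raw: str, heavy_model_fix=None) -> dict:
    """Parse an LLM-generated JSON report, escalating in cost order:
    direct parse -> cheap repair -> (optionally) a heavy model call."""
    for candidate in (raw, cheap_json_repair(raw)):
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            continue
    if heavy_model_fix is not None:
        return json.loads(heavy_model_fix(raw))
    raise ValueError("report unrecoverable")
```

Because most malformed outputs are only superficially broken, the cheap path handles the bulk of failures and the expensive model is invoked only as a last resort.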

The broader implication is a shift toward AI‑augmented security testing: routine reverse‑engineering tasks become scalable, human analysts focus on strategic decisions, and organizations can integrate continuous, automated code review into their development lifecycles, potentially reducing time‑to‑patch and exposure to zero‑day exploits.

Original Description

Vulnerability discovery traditionally relies on two primary approaches: manual auditing and fuzzing. Each method possesses distinct strengths and inherent limitations. Manual auditing is good at identifying complex logic flaws due to its reliance on deep contextual understanding and expert insight, ensuring comprehensive analysis; however, this method is labor-intensive, time-consuming, and heavily dependent on specialized knowledge. Conversely, fuzzing offers automation, scalability, and efficiency, yet it may overlook vulnerabilities that require intricate semantic comprehension or encounter limitations in scenarios where fuzzing is infeasible.
Recent advancements in artificial intelligence have created opportunities to bridge the gap between the precision of manual auditing and the scalability of fuzzing, paving the way for more sophisticated vulnerability discovery tools. In this presentation, we will introduce our LLM-powered automated binary vulnerability discovery tool, which integrates LLM reasoning capabilities with established static analysis and dynamic debugging methods. Despite its experimental approach, our tool demonstrates exceptional efficiency and effectiveness in identifying vulnerabilities.
We will illustrate the effectiveness of this approach through our application to Samsung's remote attack surface, successfully uncovering multiple sophisticated memory corruption vulnerabilities. This significant achievement secured us the Rank 1 position in the 2024 Hall of Fame for vulnerability research.
By:
Qinrun Dai | Independent Researcher,
Yifei Xie | Independent Security Researcher/Student
Presentation Materials Available at:
