SecTor 2025 | Security and Safety Testing for Agentic AI
Why It Matters
Without stateful, context‑aware testing, agentic AI deployments risk catastrophic, hard‑to‑detect failures that can undermine business operations and erode trust.
Key Takeaways
- •AI adoption surges; over 80% of large firms use it
- •Agentic systems expand threat surface beyond simple input-output
- •Current testing remains stateless, missing stateful attack vectors
- •Context‑agnostic benchmarks fail to reflect real‑world risks accurately
- •Map‑test‑promote framework needed for continuous AI security assessment
Summary
The SecTor 2025 talk highlighted the urgent need for robust security and safety testing of agentic AI systems. Presented by a ServiceNow AI R&D leader, the speaker framed the discussion around the explosive growth of AI adoption—200 million weekly ChatGPT users, half of professional developers using coding assistants, and over 80% of large enterprises integrating AI into core functions—while warning that the complexity of modern agentic architectures is outpacing traditional evaluation methods.
Key insights emphasized that agentic AI no longer operates as a simple input‑output chatbot; instead, it incorporates memory, tool use, and real‑time data streams, creating a vastly larger attack surface. Current testing practices remain largely stateless, focusing on the "front door" of user prompts and ignoring side‑door vectors such as poisoned memory, malicious tool interactions, and environmental manipulation. Moreover, public benchmarks are context‑agnostic, leading teams to overestimate security and underestimate functional degradation when hardening systems.
The speaker illustrated these points with analogies—comparing front‑door testing to a house’s main entrance while side doors remain unsecured—and outlined a five‑area threat‑modeling framework (outcomes, architecture, users/roles, surface vectors, invariance). He advocated for a "map, test, promote" workflow: map risks via detailed threat modeling, test using contextualized benchmarks and automated red‑team exploration, then promote validated findings into regression suites without over‑fitting to specific attack patterns.
Implications are clear: enterprises must shift from static, benchmark‑driven validation to continuous, stateful security assessments that balance safety with functional utility. Adopting the map‑test‑promote methodology will help organizations anticipate tail‑risk scenarios, integrate security into the development lifecycle, and sustain AI deployment at scale.
Comments
Want to join the conversation?
Loading comments...