Why AI Agents Shouldn't Replace Your Fraud Models

MLOps Community
May 11, 2026

Why It Matters

By running agents on top of an isolated feature platform, firms can iterate on fraud defenses faster without risking production outages or audit failures, gaining both agility and reliability in high‑stakes environments.

Key Takeaways

  • Chronon unifies feature engineering for real‑time fraud and trust models.
  • A single API cut the time to ship a new feature or model from months to days.
  • Agentic experimentation must remain reviewable, safe, and production‑ready.
  • Branch‑based resource isolation prevents agents from impacting live traffic.
  • Cached partial aggregates enable compute reuse while adding new features.

Summary

The talk centered on why AI agents should augment, not replace, fraud detection and other high‑stakes ML systems. Varant Zanoyan presented Chronon, an open‑source feature platform originally built at Airbnb to streamline feature engineering, and described how it now powers real‑time decision systems at Stripe, Netflix, OpenAI and other enterprises.

Chronon’s single‑API approach replaced the fragmented, engineering‑heavy pipelines that slowed fraud model updates. Because it automates both offline training data generation and online serving pipelines, the time to ship a new feature or model shrank from months to days, enabling rapid response to evolving attack patterns across payments, trust‑and‑safety, personalization and customer‑support use cases. As one example, 100% of Stripe’s charge‑path models rely on Chronon.

For agents, two mechanisms matter most. Branch‑based resource isolation lets agents experiment on separate compute and storage without taxing production systems. Cached partial aggregates let agents add new windows, such as a 7‑day signal, while reusing existing computations, which keeps experiments efficient and consistent with what is already in production.

The broader implication: AI agents can accelerate feature creation and model iteration, but only when the underlying infrastructure guarantees auditability, safety, and production readiness. Enterprises that adopt such automated, isolated pipelines can stay ahead of fraudsters while preserving regulatory compliance and operational stability.
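
The compute reuse behind cached partial aggregates can be pictured with a small sketch. This is not Chronon's implementation or API; the class `PartialAggregateCache`, its methods, and the sample key `card_123` are hypothetical names chosen for illustration. The assumption is that raw events are reduced once into per‑day partial sums, and any window (30 days, or a newly added 7 days) is answered by merging those cached partials rather than rescanning raw events.

```python
from collections import defaultdict
from datetime import date, timedelta


class PartialAggregateCache:
    """Hypothetical cache of per-day partial sums, keyed by (entity key, day)."""

    def __init__(self):
        self._daily = defaultdict(int)
        self.events_scanned = 0  # counts raw events processed, to make reuse visible

    def ingest(self, key, day, amount):
        # Each raw event is folded into its day's partial sum exactly once.
        self._daily[(key, day)] += amount
        self.events_scanned += 1

    def window_sum(self, key, end_day, days):
        # Merge cached daily partials; no raw events are rescanned here.
        return sum(
            self._daily.get((key, end_day - timedelta(days=i)), 0)
            for i in range(days)
        )


cache = PartialAggregateCache()
today = date(2026, 5, 11)
for offset, amount in [(0, 10), (1, 20), (5, 30), (20, 40)]:
    cache.ingest("card_123", today - timedelta(days=offset), amount)

sum_30d = cache.window_sum("card_123", today, 30)  # existing window
sum_7d = cache.window_sum("card_123", today, 7)    # new window, same cached partials
```

Adding the 7‑day window costs only a merge over seven cached entries; `events_scanned` stays at the number of events originally ingested.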

Original Description

Varant Zanoyan, Co-founder & CEO of Zipline AI and original author of Chronon — the open-source feature platform built at Airbnb that now powers Stripe's charge path, OpenAI's Sora 2 personalization, and Netflix content ranking — explains why AI agents should NOT make high-stakes decisions directly, and what to do instead.
This talk introduces "agentic experimentation": a pattern where agents iterate on production ML systems (creating features, training new model versions, deploying to dev) while a human reviews and ships — without ever touching live infrastructure. Varant breaks down the three challenges that kill most agent-on-prod-ML projects: infrastructure sprawl, safety, and reproducibility, and shows how branch-based isolation + semantic hashing + compute reuse make it actually work.
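
To make the semantic hashing idea concrete, here is a minimal sketch under stated assumptions. The description does not say how Chronon computes its hashes, so everything here is illustrative: the function `semantic_hash`, the field names, and the choice of which fields count as non‑semantic are all hypothetical. The idea shown is that two definitions that would produce the same feature values hash identically, regardless of key order, whitespace, or who wrote them, while a change in meaning (such as a different window) changes the hash.

```python
import hashlib
import json

# Assumption: these fields never change the computed feature values.
NON_SEMANTIC_FIELDS = {"author", "description"}


def semantic_hash(definition: dict) -> str:
    """Hash only the parts of a feature definition that affect its output."""
    semantic = {k: v for k, v in definition.items() if k not in NON_SEMANTIC_FIELDS}
    if "expression" in semantic:
        # Collapse runs of whitespace so formatting changes do not alter the hash.
        semantic["expression"] = " ".join(semantic["expression"].split())
    # Sorted keys and fixed separators give one canonical serialization.
    canonical = json.dumps(semantic, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


agent_version = {
    "source": "payments.charges",
    "expression": "SUM(amount)",
    "window_days": 7,
    "author": "agent-17",
}
human_version = {
    "author": "reviewer",
    "window_days": 7,
    "expression": "  SUM(amount)  ",
    "source": "payments.charges",
}
wider_window = dict(agent_version, window_days=30)

same = semantic_hash(agent_version) == semantic_hash(human_version)
different = semantic_hash(agent_version) != semantic_hash(wider_window)
```

A hash like this can serve as a cache key and as a reproducibility check: if a reviewer regenerates an agent's pipeline and the hash matches, the definitions are semantically the same under the normalization rules chosen.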
Topics covered:
- Why fraud detection, search ranking, and underwriting CAN'T tolerate full agentic decisioning
- The difference between agents replacing models vs. agents improving models
- How Chronon went from Airbnb payments fraud to powering Stripe, OpenAI Sora 2, Netflix, Uber, and Roku
- Branch-based resource isolation: keeping agent experiments off production compute
- Partial aggregate caching and compute reuse so agents don't blow up your infra bill
- Semantic hashing for reproducible agent-generated pipelines
- Data isolation without losing cross-team feature sharing
- Resource limits as the real organizational guardrail when running 2,000+ experiments
- Why agent-written SQL across Spark, Flink, Kafka, and Airflow is unreviewable
- The handoff: what an agent should produce so a human can actually ship it to prod
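
Two of the topics above, branch‑based resource isolation and resource limits, can be sketched together. This is an illustration only, not Chronon's API: `BranchWorkspace`, the `main` branch name, the `branch__table` naming scheme, and the compute‑hour budget are all assumptions made for the example. The point is that an experiment branch writes to its own namespaced tables and is refused work once it would exceed its budget, so it cannot touch or starve production.

```python
from dataclasses import dataclass

PRODUCTION_BRANCH = "main"  # assumed name for the production branch


@dataclass
class BranchWorkspace:
    branch: str
    max_compute_hours: float  # hypothetical per-branch budget
    used_compute_hours: float = 0.0

    def table_name(self, logical_name: str) -> str:
        # Production keeps the plain name; every other branch gets its own namespace.
        if self.branch == PRODUCTION_BRANCH:
            return logical_name
        return f"{self.branch}__{logical_name}"

    def reserve(self, hours: float) -> None:
        # Refuse work that would push the branch past its budget.
        if self.used_compute_hours + hours > self.max_compute_hours:
            raise RuntimeError(f"branch {self.branch!r} would exceed its compute budget")
        self.used_compute_hours += hours


experiment = BranchWorkspace("agent_exp_42", max_compute_hours=5.0)
experiment_table = experiment.table_name("fraud_features")
experiment.reserve(4.0)  # fits within the 5-hour budget
```

A further `experiment.reserve(2.0)` would raise `RuntimeError`, which is the kind of hard stop that matters when thousands of agent experiments share one platform.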
For ML engineers, data platform teams, and anyone building agentic systems on top of business-critical pipelines.
Links and Resources:
- Zipline AI: https://zipline.ai/
- Chronon (open source): https://github.com/airbnb/chronon
- Chronon docs: https://chronon.ai/
- Varant Zanoyan on LinkedIn: https://www.linkedin.com/in/vzanoyan/
- MLOps Community: https://mlops.community/
Timestamps (approximate):
00:00 Intro: building agents for high-stakes systems
01:20 Chronon origin story at Airbnb payments fraud
03:30 From fraud to search ranking, trust and safety, customer support
05:15 Stripe partnership and going fully open source
06:00 OpenAI Sora 2, Netflix, and the high-stakes use case pattern
07:30 Why full agentic decisioning breaks high-stakes systems
08:45 Agentic experimentation: agents that improve, not replace, models
10:30 What "production ready" actually means for agent output
12:00 Challenge 1: infrastructure sprawl across Spark, Flink, Kubernetes, Airflow
14:00 Chronon's semantic API and infrastructure automation
15:30 Challenge 2: safety and branch-based resource isolation
17:30 Compute reuse via partial aggregate caching
19:30 Shared feature repository and the economics of agent collaboration
21:00 Challenge 3: reproducibility and semantic hashing
22:30 Summary: data foundation for high-stakes agentic workflows
23:30 Q&A: data isolation, the agent layer above Chronon, and scaling to 2,000 experiments
#AgenticAI #FeatureStore #AIAgents
