AI Dev 26 X SF | Adit Abraham: Better Agents with Better Data

DeepLearning.AI
May 20, 2026

Why It Matters

Accurate, deterministic document extraction eliminates silent errors, enabling AI agents to execute high‑value, compliance‑sensitive tasks at scale.

Key Takeaways

  • Better data directly improves AI agent performance and reliability.
  • PDFs remain a major bottleneck due to complex layouts.
  • Combining traditional CV with vision‑language models yields deterministic extraction.
  • Agentic OCR uses speculative decoding for high‑accuracy, traceable outputs.
  • Choosing the output format per table (markdown for simple tables, HTML for merged cells) trades token efficiency against structural fidelity.

Summary

In this talk, Adit Abraham of Reducto outlines the company’s mission to turn raw documents into reliable inputs for next‑generation AI agents. He explains that while large language models have matured, their real‑world utility still hinges on the quality of the data they ingest, especially when that data lives in PDFs and other unstructured formats.

Reducto identifies a universal enterprise bottleneck: extracting structured, deterministic information from heterogeneous sources. PDFs pose particular challenges: irregular layouts, scanned images, and ambiguous reading order often cause silent failures that can be costly in regulated domains like healthcare or finance. By pairing traditional computer‑vision pipelines (object detection, layout analysis) with modern vision‑language models, Reducto achieves both high accuracy and repeatable outputs.

A standout innovation is “agentic OCR,” which applies speculative decoding to generate token‑level edits while preserving bounding‑box metadata, so the review step stays deterministic and needs no human in the loop. The company also tailors output formats: simple tables become markdown for token efficiency, while complex tables with merged cells are emitted as HTML to retain their structure. Reducto has processed over three billion documents for Fortune‑10 firms, hedge funds, and legal‑tech startups, and is backed by Andreessen Horowitz and Benchmark.
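The format choice described above can be illustrated with a minimal sketch. It is not Reducto's implementation: the `Cell` type and the `render_table` rule (any rowspan or colspan greater than 1 switches the output to HTML) are assumptions made for illustration only.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Cell:
    # Hypothetical cell model: text plus how many rows/columns it spans.
    text: str
    rowspan: int = 1
    colspan: int = 1


def has_merged_cells(rows):
    """Return True if any cell spans more than one row or column."""
    return any(c.rowspan > 1 or c.colspan > 1 for row in rows for c in row)


def to_markdown(rows):
    """Render a simple table as a pipe table; the first row is the header."""
    def line(cells):
        # Escape literal pipes so they do not break the column layout.
        return "| " + " | ".join(c.text.replace("|", "\\|") for c in cells) + " |"

    header, *body = rows
    separator = "| " + " | ".join("---" for _ in header) + " |"
    return "\n".join([line(header), separator, *(line(r) for r in body)])


def to_html(rows):
    """Render a table as HTML, keeping rowspan/colspan attributes."""
    out = ["<table>"]
    for i, row in enumerate(rows):
        tag = "th" if i == 0 else "td"
        cells = []
        for c in row:
            attrs = ""
            if c.rowspan > 1:
                attrs += f' rowspan="{c.rowspan}"'
            if c.colspan > 1:
                attrs += f' colspan="{c.colspan}"'
            cells.append(f"<{tag}{attrs}>{escape(c.text)}</{tag}>")
        out.append("<tr>" + "".join(cells) + "</tr>")
    out.append("</table>")
    return "\n".join(out)


def render_table(rows):
    """Assumed rule: markdown for simple tables, HTML when cells are merged."""
    return to_html(rows) if has_merged_cells(rows) else to_markdown(rows)
```

A two‑column table with no spans comes back as a compact pipe table, while a table whose header cell spans two columns comes back as HTML with the `colspan` attribute intact. The markdown form spends fewer tokens on markup, which is the trade‑off the talk points to.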

The implications are clear: as AI agents shift from chat‑only interfaces to end‑to‑end task execution, reliable document extraction becomes a prerequisite. Reducto’s hybrid approach reduces costly hallucinations, improves compliance, and accelerates deployment of action‑based AI systems across high‑stakes industries.

Original Description

As AI agents become more capable, their performance is increasingly bottlenecked not by model quality but by the quality of data they consume.
This talk by Adit Abraham, co-founder and CEO of Reducto, explores how leading AI teams across startups and Fortune 10 enterprises tackle the challenge of ingesting unstructured data at scale, from complex PDFs and scanned documents to messy real-world files, and shares practical patterns for building more reliable agents through better data pipelines.
