AI Dev 26 X SF | Adit Abraham: Better Agents with Better Data
Why It Matters
Accurate, deterministic document extraction eliminates silent errors, enabling AI agents to execute high‑value, compliance‑sensitive tasks at scale.
Key Takeaways
- Better data directly improves AI agent performance and reliability.
- PDFs remain a major bottleneck due to complex layouts.
- Combining traditional CV with vision‑language models yields deterministic extraction.
- Agentic OCR uses speculative decoding for high‑accuracy, traceable outputs.
- Choosing output format (markdown vs HTML) optimizes token efficiency and reasoning.
Summary
In this talk, Adit Abraham of Reducto outlines the company’s mission to turn raw documents into reliable inputs for next‑generation AI agents. He explains that while large language models have matured, their real‑world utility still hinges on the quality of the data they ingest, especially when that data lives in PDFs and other unstructured formats.
Reducto identifies a universal enterprise bottleneck: extracting structured, deterministic information from heterogeneous sources. PDFs pose particular challenges: irregular layouts, scanned images, and ambiguous reading order often cause silent failures that can be costly in regulated domains like healthcare or finance. By pairing traditional computer‑vision pipelines (object detection, layout analysis) with modern vision‑language models, Reducto aims for both high accuracy and repeatable outputs.
A standout innovation is “agentic OCR,” which applies speculative decoding to generate token‑level edits while preserving bounding‑box metadata, so that corrections stay traceable to their location on the page and review can run deterministically without a human in the loop. The company also tailors output formats: simple tables become markdown for token efficiency, while complex merged‑cell tables are emitted as HTML to retain structural nuance. Reducto has processed over three billion documents for Fortune‑10 firms, hedge funds, and legal‑tech startups, and is backed by Andreessen Horowitz and Benchmark.
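The table-format choice described above can be illustrated with a minimal sketch. This is not Reducto's implementation; the `Cell` type and the `render_table` rule (any merged cell forces HTML, otherwise markdown) are hypothetical names and assumptions made only for this example.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class Cell:
    """One table cell. Spans greater than 1 mark a merged cell (hypothetical type)."""
    text: str
    rowspan: int = 1
    colspan: int = 1


def has_merged_cells(table):
    # Markdown pipe tables cannot express row or column spans.
    return any(c.rowspan > 1 or c.colspan > 1 for row in table for c in row)


def to_markdown(table):
    # Compact pipe table; the first row is treated as the header.
    def line(row):
        return "| " + " | ".join(c.text.replace("|", "\\|") for c in row) + " |"

    header, *body = table
    separator = "| " + " | ".join("---" for _ in header) + " |"
    return "\n".join([line(header), separator, *(line(r) for r in body)])


def to_html(table):
    # More verbose, but rowspan/colspan attributes keep the merged structure.
    rows = []
    for i, row in enumerate(table):
        tag = "th" if i == 0 else "td"
        cells = []
        for c in row:
            attrs = ""
            if c.rowspan > 1:
                attrs += f' rowspan="{c.rowspan}"'
            if c.colspan > 1:
                attrs += f' colspan="{c.colspan}"'
            cells.append(f"<{tag}{attrs}>{escape(c.text)}</{tag}>")
        rows.append("<tr>" + "".join(cells) + "</tr>")
    return "<table>" + "".join(rows) + "</table>"


def render_table(table):
    """Assumed rule: HTML when any cell is merged, markdown otherwise."""
    if not table:
        raise ValueError("table must contain at least a header row")
    return to_html(table) if has_merged_cells(table) else to_markdown(table)


simple = [[Cell("Item"), Cell("Qty")], [Cell("Bolt"), Cell("4")]]
merged = [[Cell("Region", colspan=2)], [Cell("North"), Cell("South")]]
print(render_table(simple))
print(render_table(merged))
```

The first call prints a three-line pipe table, while the second falls back to HTML because the header cell spans two columns. A production system would likely weigh more signals than span attributes alone, such as nested headers or table size.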
The implications are clear: as AI agents shift from chat‑only interfaces to end‑to‑end task execution, reliable document extraction becomes a prerequisite. Reducto’s hybrid approach is presented as a way to reduce costly hallucinations, improve compliance, and accelerate deployment of action‑based AI systems across high‑stakes industries.