Tutorial: @Landingai Pipelines That Self-Improve | Future of Data and AI | Agentic AI Conference

Data Science Dojo, Apr 16, 2026

Why It Matters

Automating complex document extraction reduces labor costs and error rates, giving businesses a competitive edge in data‑driven operations.

Key Takeaways

  • Landing AI’s DPT models extract structured data from messy documents.
  • APIs provide parse, split, and extract functions with layout awareness.
  • Multi‑agent orchestration enables self‑improving document pipelines at scale.
  • The new extract API supports very large ("infinite") schemas, covering hundreds of fields in a single request.
  • A visual playground lets users test document extraction without a credit card.

Summary

Andrea Kropp of Landing AI opened the session by framing "agentic document extraction" as a purpose‑built alternative to OCR and vision‑language models. The talk highlighted Landing AI’s Document Pre‑trained Transformers (DPT), which ingest real‑world, multi‑language, handwritten, and diagram‑rich files and output fully auditable, layout‑aware structured data.

The core offering consists of three APIs—parse, split, and extract—each designed for production‑scale throughput. The parse API returns markdown and JSON with cell‑level grounding; split breaks long documents into logical chunks; and the newly released extract API can handle "infinite" schemas, enabling extraction of hundreds of fields from a single request. Multi‑agent orchestration ties these steps together, allowing the pipeline to learn from feedback and improve autonomously.
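To make the parse, split, and extract sequence concrete, here is a minimal Python sketch of the data flow. It does not call Landing AI's API: the functions `parse`, `split`, and `extract`, the `Chunk` class, the `==` section marker, and the sample pages are all hypothetical local stand-ins, assumed only for illustration. The page number carried on each chunk plays the role of grounding.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """A piece of parsed text plus the page it came from (a toy form of grounding)."""
    text: str
    page: int


def parse(pages: list[str]) -> list[Chunk]:
    # Hypothetical parse step: one chunk per non-empty line, tagged with its page number.
    return [
        Chunk(line.strip(), page_no)
        for page_no, page in enumerate(pages, start=1)
        for line in page.splitlines()
        if line.strip()
    ]


def split(chunks: list[Chunk], marker: str = "==") -> list[list[Chunk]]:
    # Hypothetical split step: start a new section at every line that begins with the marker.
    sections: list[list[Chunk]] = []
    for chunk in chunks:
        if chunk.text.startswith(marker) or not sections:
            sections.append([])
        sections[-1].append(chunk)
    return sections


def extract(section: list[Chunk], schema: list[str]) -> dict[str, dict]:
    # Hypothetical extract step: pick up "field: value" lines named in the schema
    # and keep the source page alongside each value.
    found: dict[str, dict] = {}
    for chunk in section:
        key, sep, value = chunk.text.partition(":")
        field = key.strip().lower()
        if sep and field in schema:
            found[field] = {"value": value.strip(), "page": chunk.page}
    return found


# Made-up sample document: two pages, one section each.
pages = ["== Patient ==\nName: Jane Doe", "== Results ==\nHemoglobin: 13.5 g/dL"]
sections = split(parse(pages))
results = [extract(section, ["name", "hemoglobin"]) for section in sections]
```

In this toy version each extracted value can be traced back to a page; a production system would return finer-grained grounding, such as the cell-level grounding described above.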

During the live demo, Kropp processed a 12‑page lab report using DPT‑2, showcasing real‑time markdown rendering, color‑coded chunk ontologies, and precise grounding of a hemoglobin value. She emphasized that the system is "agentic by design" and referenced founder Dr. Andrew Ng’s vision of purpose‑built models superseding one‑size‑fits‑all AI.

For enterprises, the technology promises to replace manual data entry on mortgage applications, tax forms, and healthcare records, while also enriching retrieval‑augmented generation pipelines with figures and charts previously ignored by OCR. As purpose‑built transformers become mainstream, Landing AI’s self‑improving pipeline positions it to capture a growing market for scalable, accurate document intelligence.

Original Description

Andrea Kropp, Applied AI Engineer at @landingai, walks developers through a production-grade document extraction architecture that doesn’t just process; it learns. Using LandingAI’s Agentic Document Extraction API and modern multi-agent frameworks, you’ll see how to build a pipeline that measures its own accuracy, identifies failures, and refines itself automatically across high volumes, multi-page layouts, and edge cases.
In this session, you’ll learn to:
- Design an end-to-end extraction pipeline from raw documents to structured outputs with automated routing, evaluation, and feedback loops built in.
- Build systems that measure accuracy against benchmark datasets, identify failure points, and drive targeted improvements using evidence instead of guesswork.
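
As an illustration of measuring accuracy against a benchmark and locating failure points, the following Python sketch compares extracted fields with labeled ground truth. It is not taken from the session: the `evaluate` function, the exact-match scoring rule, and the sample records are assumptions made for this example.

```python
from collections import Counter


def evaluate(predictions: dict, ground_truth: dict) -> tuple[float, Counter]:
    """Return overall field-level accuracy and a count of misses per field."""
    correct = total = 0
    errors: Counter = Counter()
    for doc_id, expected in ground_truth.items():
        predicted = predictions.get(doc_id, {})
        for field, want in expected.items():
            total += 1
            if predicted.get(field) == want:  # assumed scoring rule: exact string match
                correct += 1
            else:
                errors[field] += 1
    accuracy = correct / total if total else 0.0
    return accuracy, errors


# Made-up benchmark labels and pipeline outputs for two documents.
ground_truth = {
    "doc1": {"name": "Jane Doe", "hemoglobin": "13.5"},
    "doc2": {"name": "John Roe", "hemoglobin": "14.1"},
}
predictions = {
    "doc1": {"name": "Jane Doe", "hemoglobin": "13.5"},
    "doc2": {"name": "John Roe", "hemoglobin": "141"},  # dropped decimal point
}

accuracy, errors = evaluate(predictions, ground_truth)
worst_field, miss_count = errors.most_common(1)[0]
print(f"accuracy={accuracy:.2f}, worst field={worst_field} ({miss_count} miss)")
```

Ranking fields by miss count is one simple way to point a feedback loop at the weakest part of a schema; real pipelines may prefer tolerant comparisons (for example, numeric normalization) over exact matching.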
_____
Learn data science, AI, and machine learning through our hands-on training programs: https://www.youtube.com/@Datasciencedojo/courses
Check our latest Future of Data and AI Conference: https://www.youtube.com/playlist?list=PL8eNk_zTBST9Wkc6-bczfbClBbSKnT2nI
Subscribe to our newsletter for data science content & infographics: https://datasciencedojo.com/newsletter/
Love podcasts? Check out our Future of Data and AI Podcast with industry-expert guests: https://www.youtube.com/playlist?list=PL8eNk_zTBST_jMlmiokwBVfS_BqbAt0z2
