Is RAG Dead? Not If Accuracy Matters [Alex Bowcut] - 769

TWiML AI (This Week in Machine Learning & AI)
Jun 9, 2026

Why It Matters

For regulated businesses, auditability and exact sourcing matter more than raw language-model fluency, so RAG-based systems that surface citations will remain necessary to manage legal risk and support scalable international expansion. Firms that rely solely on large-context LLMs risk compliance errors and slower regulatory adoption because they cannot prove where an answer came from.

Summary

As context windows expand, Alex Bowcut of Sphere argues that retrieval-augmented generation (RAG) remains essential for high-stakes, accuracy-sensitive domains like sales tax and VAT compliance. Sphere built TRAM, a document-centric system that combines retrieval, OCR, and expert workflows to speed up tax review by nearly two orders of magnitude while preserving precise citations and provenance. Bowcut says pure LLM ingestion risks losing the verifiable source links that legal and regulatory answers depend on, and that messy, heterogeneous government documents still require targeted retrieval and human-in-the-loop validation. Sphere has scaled this approach through engineering and raised a Series A from a16z to tackle global tax complexity.

Original Description

As context windows grow into the millions of tokens, many AI practitioners are questioning whether retrieval-augmented generation (RAG) is still necessary. If modern models can ingest entire libraries of documents, why bother with retrieval at all?
In this episode, Alex Bowcut, Head of Engineering at Sphere, explains why the answer depends on the application. Sphere uses AI to automate global tax compliance—an environment where getting the answer right isn’t enough. Every conclusion must be backed by the correct legal citation, and every decision must withstand expert review.
We explore how Sphere built TRAM (Tax Review and Assessment Model), a production AI system that combines retrieval, reasoning models, legal review workflows, reinforcement learning, and deterministic systems to help tax experts move nearly two orders of magnitude faster while maintaining accuracy.
Along the way, we discuss why RAG remains critical in high-stakes domains, how Sphere processes legal and regulatory documents from jurisdictions around the world, retrieval architectures, semantic chunking, dense versus sparse retrieval, expert feedback loops, and the challenges of building AI systems that people can actually trust.
🗒️ Full show notes: https://twimlai.com/go/769.
🔔 Subscribe to our channel for more great content just like this: https://youtube.com/twimlai?sub_confirmation=1
📖 CHAPTERS
===============================
00:00 Intro: Is RAG Obsolete?
01:24 Meet Sphere and TRAM
02:02 Why Tax Content Is Hard
03:54 How AI Supercharges Tax Experts
05:14 Alex Background Story
07:03 Messy Legal Data Ingestion
08:58 How TRAM Review Works
11:19 What Triggers Updates
15:13 Legal Review Not Labeling
16:08 Chunking and Indexing Law
21:21 Dense Versus Sparse Search
25:07 Taxonomy Driven Queries
27:55 RAG Is Dead Debate
29:55 Citations and Traceability
31:22 RFT for Accuracy Gains
34:50 Evals and Model Drift
37:47 LLM Reranking and Expansion
40:28 Chasing Nines Accuracy
42:39 Context Windows Impact
44:46 Costs and Latency Reality
45:40 Future Roadmap for TRAM
48:57 Personal AI Workflow Tools
50:30 Closing
🗣️ CONNECT WITH US!
===============================
Subscribe to the TWIML AI Podcast: https://twimlai.com/podcast/twimlai/
Follow us on Twitter: https://twitter.com/twimlai
Join our Slack Community: https://twimlai.com/community/
Subscribe to our newsletter: https://twimlai.com/newsletter/
Want to get in touch? Send us a message: https://twimlai.com/contact/
