Understanding the context‑window limitation is crucial for businesses building conversational AI, as it drives product design, user experience strategies, and the need for sophisticated retrieval systems to sustain engagement.
The video explains why large language models (LLMs) like ChatGPT appear to “forget” earlier parts of a conversation: they simply lack a true memory and are constrained by a fixed context window of only a few thousand tokens.
When a dialogue exceeds this limit, engineers must employ techniques such as summarization, embedding compression, and retrieval‑augmented generation to fit the most relevant information into the model’s context. These methods reconstruct prior exchanges on the fly rather than providing the model with continuous, unbounded recall.
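The reconstruction step described above can be sketched as a simple token budget: keep the newest turns verbatim and compress the overflow into a summary. Everything here is illustrative, not the actual ChatGPT implementation: `count_tokens` uses a rough characters-per-token heuristic, and `summarize` is a stub where a real system would call the LLM itself.

```python
# Sketch of fitting a long chat into a fixed context window.
# Hypothetical helpers: a real system would use a proper tokenizer
# and an LLM-generated summary instead of these stand-ins.

def count_tokens(text: str) -> int:
    # Rough heuristic: roughly 4 characters per token for English text.
    return max(1, len(text) // 4)

def summarize(messages: list[str]) -> str:
    # Placeholder: a production system would ask the LLM to compress
    # these turns; here we just keep the first sentence of each.
    return " ".join(m.split(".")[0] + "." for m in messages)

def build_context(history: list[str], budget: int) -> list[str]:
    """Keep the newest messages verbatim; summarize whatever overflows."""
    kept, used = [], 0
    for msg in reversed(history):  # walk from newest to oldest
        cost = count_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    overflow = history[: len(history) - len(kept)]
    context = list(reversed(kept))
    if overflow:
        context.insert(0, "Summary of earlier turns: " + summarize(overflow))
    return context
```

The key design choice is walking the history from newest to oldest, so recent turns are always preserved intact and only the oldest material is lossy-compressed.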
Louis‑François, CTO and co‑founder of Towards AI, illustrates this in practice, advising users to start a new thread when shifting topics or when the model’s responses become erratic. He notes that, for long chats, the system retrieves the “closest embeddings” to surface the most useful past messages, a workaround that mimics memory but is entirely engineered.
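The “closest embeddings” workaround amounts to nearest-neighbor search over vector representations of past messages. The toy sketch below uses made-up 3-dimensional vectors and cosine similarity; a real system would embed each message with an embedding model and search a vector index rather than a plain list.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: 1.0 means identical direction, 0.0 unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec: list[float], store, k: int = 2) -> list[str]:
    """Return the k stored messages whose vectors are closest to the query."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]

# Hypothetical message store: (text, embedding) pairs with invented vectors.
store = [
    ("We discussed the Q3 budget.",       [0.9, 0.1, 0.0]),
    ("You asked about vacation policy.",  [0.1, 0.9, 0.0]),
    ("We compared two database designs.", [0.0, 0.2, 0.9]),
]
query = [0.85, 0.15, 0.05]  # pretend embedding of a budget follow-up question
print(retrieve(query, store, k=1))  # → ['We discussed the Q3 budget.']
```

Only the retrieved messages are placed back into the context window, which is why the model appears to remember a long conversation while actually seeing only a relevant slice of it.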
The limitation has direct business implications: product teams must design interfaces that manage context length, educate users on conversation hygiene, and invest in retrieval infrastructure to maintain conversational coherence. Failure to address these constraints can degrade user satisfaction and limit the commercial viability of AI‑driven chat services.