
The video argues that large language models (LLMs) do not think like humans; they are trained to predict the next token in a sequence, not to understand meaning or intent. Luis Frana explains that while both humans and machines learn from patterns, the purpose of prediction differs: for humans it is a by‑product of comprehension, for LLMs it is the sole objective. Frana highlights that LLMs operate by minimizing prediction error across trillions of token‑level guesses, treating language as a series of numerical identifiers. In contrast, human writers imagine scenes, emotions, and narratives, often omitting words without harming the story. This fundamental distinction explains why LLMs can produce superficially logical output yet fail in unexpected ways, as they lack true reasoning. He uses the analogy of painting: a machine copies every brushstroke, while an artist internalizes technique, composition, and intent, reproducing only the final effect. Notable quotes include, “The words are the thinking,” underscoring that LLMs generate text that merely resembles reasoning. He also offers to explore chain‑of‑thought prompting, which can mask but not eliminate these limitations. The implication is clear for businesses and developers: relying on LLMs for tasks requiring genuine understanding or nuanced decision‑making carries risk. Prompt engineering may improve surface performance, but the underlying gap between pattern prediction and comprehension remains a strategic limitation.

The video explains that large language models (LLMs) are inherently limited—hallucinating facts, faltering on complex reasoning, inheriting biases, and being bound by a static knowledge cutoff. It argues that recognizing these constraints is the first step toward building dependable AI...

You're probably already using AI agents without realizing they're making decisions you can't trace back. And if something goes wrong, you won't know until it's too late. AI agents aren't chatbots waiting for prompts. They set goals, plan multiple steps ahead, and take...

The video explains how AI developers use model ensembles—multiple models or versions working together—to cut errors that single models inevitably make. By aggregating diverse outputs and merging them intelligently, teams can achieve more reliable, stable results in high‑stakes environments. Three primary...
Something funny happened during our recent AI training at NYPL. We had ~30 professionals in the room. Developers, IT folks, managers. Different stacks. Different backgrounds. And suddenly… they were all vibe coding. People who: didn’t have Python installed a week ago hadn’t touched frontend in years were used to...

The video introduces self-consistency, a technique that transforms the inherent randomness of large language models into a reliability boost by generating several independent answers and aggregating them. Instead of forcing a single deterministic response, the model is run multiple times...

Most AI projects don’t fail because of bad models. They fail because the wrong decisions are made before implementation even begins. Here are 12 questions we always ask new clients about our AI projects before we even begin work, so you don't...

The video introduces Reinforcement Learning from Verifiable Reward (RLVR), a framework that replaces human or model‑based preference judgments with an automated verifier that checks factual correctness. By tying rewards directly to objective outcomes—such as passing unit tests, solving equations, or...

The video introduces Reinforcement Learning from AI Feedback (RLAF), a method that replaces costly human reviewers with an AI “judge” to evaluate and rank model outputs, enabling small teams to scale alignment work. Human feedback is slow, expensive, and inconsistent, limiting...

The video introduces preference tuning as the next step after instruction‑following models, focusing on shaping responses to sound helpful, clear, and human‑like. Rather than merely judging right or wrong answers, developers present paired outputs and label the one people prefer,...

The video explains that large language models (LLMs) are vulnerable to two distinct attack vectors—prompt injection and prompt hacking—where malicious text can override system instructions or bypass safety filters. Prompt injection occurs when an LLM consumes external content, such as a...

This couple asked AI to generate a family for them. The result wasn’t what they asked for... This wasn’t AI judging them. It wasn’t choosing values. And it wasn’t preferring one family over another. It happened because of what AI is trained on. Image models learn...

Some questions are easy. Others need real reasoning. If a model jumps straight to the final answer, it can easily make mistakes. That’s where reasoning and chain-of-thought matter. Instead of guessing, the model breaks a problem into steps before reaching a conclusion. Once the prompt is set,...

The video clarifies the often‑confused terminology around AI‑driven workflows, agents, tools and multi‑agent systems, warning that many clients overengineer solutions by mis‑labeling simple pipelines as complex agents. The presenter draws a clear line: workflows are deterministic sequences you predefine, while agents...

Ever noticed how an AI suddenly forgets what you were talking about? That’s not a mistake. It’s the context window. A model can only see a limited amount of text at once. Once that window fills up, older context drops out. Inside that window, prompting decides...

Please stop considering LLM-based systems like bulletproof super humans. It’s powerful but has as much if not more vulnerabilities than one individual could. https://t.co/sYCgwmBDUO

The video explains a growing solution to a fundamental bottleneck in AI development: evaluating model outputs at scale. Traditional human review of thousands of conversational turns is impossible, so researchers are turning to a technique called “LLM-as-judge,” where a state‑of‑the‑art...

Our book Building LLMs for Production is now being cited by research papers.📚✨ This book came from real work - building systems, fixing failures, rewriting chapters and learning what actually matters in production. It covers how to design, evaluate, and deploy reliable LLM...

When choosing between LLMs such as GPT‑5, LLaMA or Claude, the video stresses that objective comparison hinges on benchmarks—standardized tests that quantify raw capabilities across diverse tasks. By applying the same evaluation suite, practitioners can rank models and pinpoint strengths...

The video explains how to choose between reasoning models and compact instruct models, emphasizing that architectural labels alone don’t guarantee suitability. Reasoning models are a newer class of large language models built to handle multi‑step problem solving by taking a...

For the past 29 days, I’ve been posting one short video every day explaining an AI term🎯 It’s part of a series I’m calling: “Introduction to AI in 42 Terms.” Each video explains one AI concept in simple language - no jargon, no hype,...

The video explains that most existing AI systems are limited to a single modality—typically text—meaning they cannot directly interpret images or audio. This constraint hampers their usefulness when users pose questions that involve visual or auditory data, such as asking...

Let's finally make LLMs work for you instead of against you, so your drafts stop sounding generic and start sounding like you. We’ll break down how to spot and remove “AI slop,” fix the generated-looking structure that gives it away, and...

The video explains how to edit AI‑generated text so it reads like a human author rather than a generic LLM output. Drawing on two years of experience at TORZI, the presenter outlines concrete techniques and a prompt template that keep...

Your Prompts Aren’t the Problem❌ A full guide to getting consistently better answers from AI👇 When you use an AI assistant and think “Why is it suddenly confused?” Or “Why did this work 5 minutes ago but not now?” It’s rarely about wording. It’s about context. What the...

Distillation is the core method for turning massive, high‑performing AI models into compact, fast‑running versions without sacrificing much capability. By treating a large pretrained model as a teacher and a smaller model as a student, developers let the student mimic...

One model calling 10 APIs is NOT a multi-agent system.❌ This is one of the most common mistakes I see. Tools are capabilities. Agents are decision-makers. If one model decides what to do next and calls multiple tools, you still have one agent, not many. Misunderstanding this...

Do you know why a model feels “better” after fine-tuning? And why a very smart model can still give unsafe or confusing answers? In this post, we break down Fine-tuning and Alignment. This is part of Introduction to AI in 42 terms (we’ve covered...

The video explains that an application programming interface (API) is the conduit through which software interacts with large language models, whether the model is proprietary, open‑weight, or open‑source. When a developer sends a prompt, the API forwards it to the provider’s...

The video examines how developers must decide which class of large‑language model to adopt when moving from experimentation to production. It outlines three categories—proprietary models such as OpenAI’s GPT‑5 or Google’s Gemini, open‑weight models like Meta’s Llama 3.1, Mistral, and Google’s Gemma,...

If you’re a student or a professional using AI daily, you’ve seen this happen. A prompt works great today. Tomorrow it gives a weird answer. Next week it breaks after a model update 😅 A prompt that works once for one model isn’t reliable. A...

The video examines the emerging class of agentic AI systems and warns against indiscriminate deployment. Unlike traditional reactive chatbots that wait for a prompt and return a single answer, agentic models can formulate plans, execute multiple actions, and deliver complex...

LLMs don’t wake up smart. They’re trained into it. Before a model can answer questions, follow instructions, or sound helpful, it goes through a long phase called pre-training. This is where: • random parameters • massive amounts of text and code • and one simple task come...
We created a free Agents Architecture Cheatsheet. Here’s why 👇 A lot of people are building agent systems without a clear reason to do so. They mix tools with agents, over-complicate architectures, and struggle to move from demos to production. This cheatsheet is designed to be...

The video explains grounding – the practice of constraining large language model (LLM) responses to information drawn from verifiable external sources – as a core strategy to curb hallucinations. By forcing the model to rely on trusted data rather than...

The video explains how the temperature parameter governs the randomness of token selection in large language models, shaping whether outputs are deterministic or stochastic.\n\nA temperature of zero forces the model to pick the single most probable token, producing identical responses...

I’ve done quite a few AI workshops recently, and I keep getting the same questions 👇 “Where do I start?” “How long does it really take to learn AI?” “Can I actually become job-ready?” So to clear the confusion, I put all the resources...

In this talk Luis Franis, CTO of TORZI, explains how AI engineers decide between workflows, single agents, and multi‑agent systems when building client solutions. He frames AI engineering as a bridge between model development and product integration, emphasizing constraints such...
People say “LLMs learn like humans, we both copy patterns.” Sounds right. It’s also misleading. LLMs don’t learn language to understand meaning. They learn to predict the next token. Not the next word. Tokens. IDs. Math. Over and over, trillions of times,...

The video explains how large language models (LLMs) often stumble on multi‑step questions because they attempt to jump straight to a final answer, leading to logical slips and hallucinations. To mitigate this, practitioners employ a prompt‑engineering technique called chain‑of‑thought (CoT),...

My 2025 wrapped: - released our first ever course & product 🚀 - followed up with 3 more courses (and a 4th coming soon with a great friend, @pauliusztin_) - invited to NVIDIA GTC and briefly met Jensen + many amazing people - landed...

The video explains two foundational prompting strategies—zero-shot and few-shot learning—used to shape large language model outputs. Zero-shot prompting presents a plain instruction without any exemplars, trusting the model’s pre‑trained knowledge to generate an answer, such as asking a general‑purpose assistant...