Running LLMs locally empowers businesses to lower AI operating costs, safeguard proprietary data, and deliver faster, on‑premise inference, making advanced language capabilities accessible without reliance on third‑party APIs.
The video provides a step‑by‑step guide for developers who want to run large language models (LLMs) on their own hardware, focusing on two primary approaches: the open‑source Ollama tool and Docker's Model Runner. It begins by positioning local inference as a solution for the speed, privacy, and cost concerns that arise when relying on hosted services like ChatGPT, then walks viewers through downloading, installing, and verifying the Ollama client on macOS, Windows, and Linux.
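The install-and-verify step can be sketched as a short shell session. The one-line Linux installer is the one published on the Ollama site; the verification is guarded so the sketch runs safely even on a machine where Ollama is not yet installed.

```shell
# Linux one-line installer from the official site (macOS and Windows
# use the downloadable installer from ollama.com instead):
#   curl -fsSL https://ollama.com/install.sh | sh

# Verify the client only if the binary is present, so this sketch
# degrades gracefully on machines without Ollama.
if command -v ollama >/dev/null 2>&1; then
  ollama --version
else
  echo "ollama not installed"
fi
```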
Key insights include the mechanics of pulling models—using commands such as "ollama pull"—and the importance of matching model size to hardware capabilities. The presenter demonstrates running a tiny 271 MB model (SmolLM2) interactively, highlights the latency advantage of local execution, and shows how to expose the model via an HTTP REST API (default port 11434) for programmatic access. Python examples illustrate both raw HTTP calls and the convenience of the "ollama" Python package, while Docker's Model Runner is presented as a more robust, GPU‑accelerated alternative that listens on port 12434 and integrates seamlessly with containerized workflows.
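The raw-HTTP flow described above can be sketched with only the Python standard library. The model tag and prompt are illustrative, and the network call is made only when the script is run directly, so it degrades gracefully when no local server is running (the "ollama" Python package wraps the same endpoint more conveniently).

```python
import json
import urllib.request

# Ollama's default REST endpoint; Docker's Model Runner instead exposes
# an OpenAI-compatible API on port 12434.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "smollm2:135m") -> urllib.request.Request:
    """Build a non-streaming generate request for the Ollama REST API.

    The model tag is an assumption for illustration; use whatever tag
    `ollama list` shows on your machine.
    """
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Requires a running local server (`ollama serve`).
    req = build_request("Why is the sky blue?")
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["response"])
```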
Notable examples feature the model incorrectly answering a factual question (the capital of Canada) to underscore the limitations of very small models, and a successful generation of a 500‑word essay on the fall of Rome, retrieved via both Ollama and Docker endpoints. The speaker also points out practical UI differences—Ollama’s command‑line interface versus Docker Desktop’s graphical model browser—and provides concrete commands for listing, running, and inspecting models in both environments.
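The model-management commands mentioned above can be sketched side by side. Both blocks are guarded so the sketch is safe to run where one or both tools are absent; the Docker subcommand names are assumptions based on Docker Desktop's Model Runner CLI and may differ by version.

```shell
# Ollama CLI: list pulled models and inspect one (model tag illustrative).
if command -v ollama >/dev/null 2>&1; then
  ollama list                # models available locally
  ollama show smollm2:135m   # parameters, quantization, license
fi

# Docker Model Runner equivalents (subcommand names assumed):
if command -v docker >/dev/null 2>&1; then
  docker model list 2>/dev/null || true   # models managed by Docker Desktop
fi
```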
The implications are clear: developers can replace external API calls with locally hosted LLMs, cutting subscription fees and eliminating data‑exfiltration risks while achieving near‑zero network latency. By leveraging either Ollama for quick CLI‑based experimentation or Docker for production‑grade container deployment, teams gain flexibility to integrate AI capabilities into existing stacks, from custom back‑end services to LangChain pipelines, fostering greater control over cost, compliance, and performance.