Open Models Coding Essentials – Running LLMs Locally and in the Cloud Course

freeCodeCamp
freeCodeCampMay 7, 2026

Why It Matters

Understanding the trade‑offs between open‑source LLMs, hardware requirements, and cloud subscriptions helps businesses decide whether to invest in self‑hosted AI coding assistants or rely on established SaaS providers.

Key Takeaways

  • Gemma 4 runs locally with low memory but needs large VRAM.
  • Kimmy 2.5 outperforms other open models for coding tasks.
  • Claude Code remains most reliable tool‑aware coding harness.
  • Olama Cloud subscription offers stable API access for open models.
  • GPU rental remains costly and unpredictable for production deployments.

Summary

The video introduces Andrew Brown’s "Open Models Coding Essentials" course, which examines how to run open‑source large language models (LLMs) both on‑premises and in the cloud for software‑development tasks. Brown outlines the landscape of available models—Gemma, GLM, Kimmy, Quen—and the coding harnesses such as Claude Code, Cloud Code, and Codeex that can host them.

Key findings include hardware limits: Gemma 4’s tiny footprint is appealing, yet it demands a 32,000‑token context window and 24‑32 GB of VRAM, far beyond the presenter’s RTX 4060. Kimmy 2.5 emerged as the top performer for code generation, while Quen struggled with tool‑calling. Claude Code proved the most dependable tool‑aware harness, and Olama Cloud’s $20‑$30 subscription delivered a smooth, integrated API experience.

Brown highlights practical frustrations: lack of standardized benchmarks, flaky GPU rental markets, and the steep cost of building custom rigs. He notes, “I was absolutely impressed with Kimmy,” and points out that Quen “half the time it doesn’t call tool use.” These anecdotes underscore the experimental nature of the tests, which were limited to smoke‑tests like building a Flappy‑Bird app.

The implications are clear for enterprises and developers: open‑source models can be viable alternatives to proprietary APIs if hardware budgets allow, but reliable tool‑aware performance still leans toward established services like Claude Code. Subscription platforms such as Olama Cloud simplify access, while GPU‑as‑a‑service remains a barrier for scalable, cost‑effective deployment.

Original Description

Learn how to work with a wide range of open large language models (LLMs) such as Gemma, Kimmy, and GLM across various local and cloud-based environments. This comprehensive guide by Andrew Brown explores how to use coding harnesses like Claude Code and Pi Agent to build real-world agentic workflows while benchmarking model performance and hardware requirements.
✏️ Course created by @ExamProChannel
⭐️ Chapters ⭐️
00:00 Introduction
Foundations of Open Coding Systems
00:01:17 Exploration of Coding Harnesses and Open Models
00:09:19 Open Models Selection
00:13:21 Coding Harness Selection
Local Models and Ollama Setup
00:19:57 Install Ollama Serve Model
00:35:58 ClaudeCode Gemma4 Local
00:55:51 Codex Gemma GPT OSS Part 1
01:01:25 Codex Gemma GPT OSS Part 2
Cloud Models and Coding Agents
01:17:16 Claude Code Gemma4 Cloud
01:27:22 Claude Code Kimi 2 5
01:33:33 Claude Code GLM 5
01:36:36 Claude Code MiniMax 2 7
01:41:49 Claude Code Qwen 3 5
01:46:06 Pi Coding Agent Ollama Cloud
02:03:04 Droid CLI Ollama Cloud
02:10:54 Open Code Ollama
❤️ Support for this channel comes from our friends at Scrimba – the coding platform that's reinvented interactive learning: https://scrimba.com/freecodecamp
🎉 Thanks to our Champion and Sponsor supporters:
👾 @omerhattapoglu1158
👾 @goddardtan
👾 @akihayashi6629
👾 @kikilogsin
👾 @anthonycampbell2148
👾 @tobymiller7790
👾 @rajibdassharma497
👾 @CloudVirtualizationEnthusiast
👾 @adilsoncarlosvianacarlos
👾 @martinmacchia1564
👾 @ulisesmoralez4160
👾 @_Oscar_
👾 @jedi-or-sith2728
👾 @justinhual1290
--
Learn to code for free and get a developer job: https://www.freecodecamp.org
Read hundreds of articles on programming: https://freecodecamp.org/news

Comments

Want to join the conversation?

Loading comments...