
Kronk AI: Building a Basic AI Chat Agent with Message History Pt. 1
The video walks through building a rudimentary chat agent in Go, leveraging Kronk AI’s model server and server‑side events (SSE) to exchange messages with an OpenAI‑compatible backend. It starts by defining constants for the model endpoint, then creates a simple stdin scanner to capture user input. A factory function builds an agent that wraps an SSE client, writes output to stdout, and maintains a slice of message maps as the conversation history. Each user turn is added with role = "user", packaged into a chat‑completion request, and sent with `stream:true` so the server streams partial tokens back. During execution the code prints colored chunks—regular content in default color and reasoning in red—demonstrating how to differentiate response parts. The presenter shows a live session: greeting the model, asking it to write a Go “Hello World”, then requesting the same in Rust, confirming that the stored history informs follow‑up answers. While the prototype successfully preserves context, the host notes that each request clears the KV cache, forcing full re‑decoding and degrading performance as the dialogue grows. This highlights the need for caching optimizations before scaling the agent to more complex tool‑calling scenarios.

Invest in Yourself & Master AI Agents
The speaker urges professionals to invest in themselves by mastering AI agents, dedicating a portion of weekly work time to develop the skill set needed to harness these tools effectively. Key recommendations include allocating roughly 20 % of one’s weekly schedule to...

Kronk AI: Hugging Face & Vision Model File Formats
The video walks through the file structure required for vision‑oriented models hosted on Hugging Face, emphasizing that unlike pure‑text models they ship two distinct artifacts: the core model binary and a companion projection file. The projection file is consumed by Llama‑CPP’s...

Kronk AI: Understanding GGUF & Jinja Chat Templates
The video walks viewers through the GGUF model format and Jinja‑based chat templates, showing how to locate, download, and run large language models from Hugging Face. It highlights Unsloth as the go‑to provider for GGUF files and advises checking each...

VLLM Vs. Kronk: Choosing the Best AI Engine for Your App
The video contrasts two local model inference engines—VLLM and Kron—explaining their distinct design philosophies and target use‑cases. VLLM is presented as the leading production‑grade server for deploying large language models at scale, engineered to handle thousands of concurrent users and...

Understanding Context Limits in Kronk: GGUF, Truncation, & Summarization
The video explains how Kronk determines a model’s context window using the GGUF file and why respecting that limit is essential for stable operation. For example, the presenter points to a model with a 256 K token limit, noting that the limit...

Optimizing Local AI: Kronk + Metrics for Gauging Performance
The video introduces Kronk’s new “playground” tool for locally running AI models, showing how it automatically evaluates multiple configuration combos to identify optimal settings for a given machine. The presenter argues that traditional tokens-per-second (TPS) numbers are misleading, emphasizing that the...

Rethinking AI Deployment: Self Contained AI with Go and Kronk
The video introduces Kron SDK, a Go‑based toolkit that lets developers embed the model serving logic directly into their applications, removing the traditional separate model server. By compiling the entire RAG stack—including a vector database—into one Go binary, developers can deploy...

Turn Plain English Into SQL Queries with Go and LLMs
The video walks through a Go‑based prototype that lets users ask plain‑English questions about a DuckDB database and have a large language model generate the corresponding SQL, execute it, and return a natural‑language answer. The implementation follows a two‑prompt workflow: the...

Bill Kennedy at FOSDEM'26: Directly Integrating LLM Models Into Go Applications
At FOSDEM ‘26 Bill Kennedy unveiled a new approach for integrating large‑language‑model inference directly into Go applications, bypassing traditional model‑server architectures. He explained how licensing costs and the need to run separate C‑or‑Python services have hampered Go developers. Ron Evans’ pure‑Go FFI...