Open‑source LLMs now deliver comparable or superior performance to proprietary models at dramatically lower latency and cost, enabling developers to build scalable AI‑coding tools with full control over reliability and economics.
Alex Ker, a growth software engineer at Baseten, delivered a deep‑dive on how open‑source large language models (LLMs) are now powering AI‑assisted coding at scale, challenging the dominance of closed‑source offerings like GPT‑5 and Claude. He framed the talk around three core pillars—latency, reliability, and cost—arguing that open‑source models increasingly match or exceed proprietary benchmarks while giving developers granular control over performance knobs, enabling the real‑time, low‑latency experiences essential for developer tooling.
Ker highlighted three state‑of‑the‑art open‑source models: GLM 4.6, a general‑purpose model that consumes 30% fewer tokens than its predecessor; Qwen3 Coder, a specialist coding model from Alibaba suited to high‑volume token tasks; and Kimi K2 Thinking, a trillion‑parameter model that leads on both Humanity's Last Exam and the Tau2 tool‑use benchmark, thanks to its interleaved thinking architecture and a five‑step tool‑calling training pipeline. He contrasted these with closed‑source models, noting that while they remain "smart," the quality gap is narrowing, and Kimi's performance on complex, multi‑step tasks—such as solving a PhD‑level geometry problem with 23 interleaved reasoning cycles—demonstrates its strength in tool use and hallucination mitigation.
The presentation moved from theory to practice, showing how Baseten integrates open‑source LLMs into developer workflows in under ten minutes. Ker demonstrated a lightweight LLM proxy that reroutes API calls from cloud services to open‑source endpoints, achieving a 167% throughput boost and a 5‑7× cost reduction with GLM 4.6. He also surveyed tooling options—from the minimally opinionated OpenRouter to the more integrated Vercel AI SDK, LangChain, LlamaIndex, and the Cline IDE extension, which offers bring‑your‑own‑key model selection and built‑in guardrails. A case study on Sourcegraph's autocomplete service illustrated three inference optimizations—KV‑cache reuse, KV‑aware routing, and n‑gram speculation—that collectively deliver sub‑200 ms latency while maintaining developer productivity at scale.
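The n‑gram speculation technique mentioned in the Sourcegraph case study can be illustrated with a toy sketch: instead of generating one token per model pass, the decoder matches the trailing n‑gram of the current text against earlier occurrences in the context, proposes the tokens that followed that match as a cheap "draft," and then verifies the draft against the model, accepting tokens until one disagrees. The `model_next` function below is a hypothetical stand‑in for a real model's greedy next‑token prediction, and the function names are illustrative, not Sourcegraph's actual API; real systems verify all draft positions in a single batched forward pass.

```python
def propose_draft(tokens, n=2, max_draft=4):
    """Find the most recent earlier occurrence of the trailing n-gram
    and propose the tokens that followed it as a draft continuation."""
    if len(tokens) < n:
        return []
    tail = tokens[-n:]
    for i in range(len(tokens) - n - 1, -1, -1):
        if tokens[i:i + n] == tail:
            return tokens[i + n:i + n + max_draft]
    return []

def verify_draft(draft, model_next, context):
    """Accept draft tokens greedily until one disagrees with the model.
    A real verifier scores every draft position in one forward pass,
    so each accepted token costs far less than a full decode step."""
    accepted = []
    for tok in draft:
        if tok != model_next(context + accepted):
            break
        accepted.append(tok)
    return accepted

# Toy "model": deterministically continues a repeating code pattern,
# standing in for greedy decoding from a real LLM.
PATTERN = ["for", "i", "in", "range", "(", "n", ")", ":"]
def model_next(ctx):
    return PATTERN[len(ctx) % len(PATTERN)]

prompt = ["for", "i", "in", "range", "(", "n", ")", ":", "for", "i", "in"]
draft = propose_draft(prompt)                       # tokens after the earlier "i in"
accepted = verify_draft(draft, model_next, prompt)  # all four drafts agree here
print(draft, accepted)
```

Because repeated code (loops, boilerplate, edits to existing files) is common in autocomplete workloads, drafts are accepted often, which is why this trick helps hit sub‑200 ms latency targets without changing model output.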
Ker concluded that developers who remain tethered to proprietary APIs risk missing out on the rapid advancements and economic benefits of open‑source AI. By experimenting with these models and leveraging the emerging ecosystem of tools, engineers can build faster, more reliable, and cost‑effective AI‑driven products. The broader implication is a shift in the AI market toward democratized, high‑performance models that empower companies to own their inference stack and tailor experiences without sacrificing quality.