Cohere Open-Sources a Coding Agent that Runs on a Single H100

Cohere Open-Sources a Coding Agent that Runs on a Single H100

VentureBeat
VentureBeatJun 9, 2026

Why It Matters

Enterprises can now choose a high‑throughput, open‑source coding agent that avoids per‑token cloud fees and preserves data sovereignty, reshaping the economics of agentic development pipelines.

Key Takeaways

  • 30B MoE model activates 3B parameters per token
  • 256k token context window handles multi‑file codebases
  • Runs locally on a Mac Studio with ~20 GB RAM
  • Delivers 210 tokens/sec, first token in 0.25 s
  • Generates three times more output tokens than comparable models

Pulse Analysis

North Mini Code marks a pivotal shift in the coding‑assistant market by delivering a purpose‑built, open‑source large language model that rivals proprietary offerings while running on a single H100 GPU. Its 30‑billion‑parameter mixture‑of‑experts architecture, with only eight of 128 experts active per token, keeps inference compute comparable to a 3 billion‑parameter dense model. The massive 256,000‑token context window enables the model to ingest entire multi‑file projects, making it suitable for architecture mapping, code review, and terminal‑based agentic tasks that require sustained, multi‑step reasoning.

Performance benchmarks place North Mini Code among the fastest open‑weight models, achieving 210 tokens per second and a sub‑0.3‑second first‑token latency. Compared with Mistral’s Devstral Small 2, Cohere reports a 2.8× throughput advantage and 30% lower inter‑token latency on identical hardware. However, the model’s verbosity—producing roughly three times more output tokens than peers—introduces hidden inference costs in high‑volume pipelines. For organizations that process millions of tokens daily, this trade‑off between speed and token efficiency becomes a critical factor in total cost of ownership.

For enterprises, the release crystallizes a strategic decision point: adopt a managed service like Claude Fable 5, which charges about $50 per million output tokens, or deploy North Mini Code on‑premises to eliminate per‑token fees and retain full data control. The open‑source Apache 2.0 license, combined with demonstrated local deployment on modest hardware (a Mac Studio with ~20 GB RAM), lowers barriers to entry for teams seeking sovereign AI solutions. Companies should model both cost pathways against their expected workload, accounting for the model’s higher token output, to determine the most economical and secure architecture for agentic coding pipelines.

Cohere open-sources a coding agent that runs on a single H100

Comments

Want to join the conversation?

Loading comments...