Alex J. Champandard

Building tools and teams where humans ≫ machines. AI, ML, research & development. Co-founded #CreativeAI #⚘

Cursor’s Unlicensed K2.5 Model Signals Strategic Misstep
Social · Mar 20, 2026

A member of the Moonshot team claimed (in now-deleted tweets) they confirmed Cursor's new model is based on K2.5 without a license. Hard not to see this as a massive strategic mistake by Cursor's management and the board. If you're...

By Alex J. Champandard
LLM‑Driven RL Environment Learns During Interaction
Social · Mar 16, 2026

Currently writing an article on my work where the RL environment (or agent harness) learns back while the LLM interacts with it... If you have any references or pointers for related work on this topic, I'd welcome them! https://t.co/DGUCi3PLpv

By Alex J. Champandard
GLM‑5 Shows Regression in Python Coding Tasks
Social · Feb 25, 2026

Alright, I'm calling it: GLM-5 is a regression from GLM-4.7 for Python coding. 🫠 Subscribed to Z(.)ai on the basis of 4.7, as it reliably took over all my devops too, and I've been using GLM-5 since launch. But with...

By Alex J. Champandard
2.6k‑parameter RL Model Matches 62B Baseline on GSM8k
Social · Dec 15, 2025

Oh My! I just built a new RL-native system with 2.6k parameters that matches a specialized 62B model from a 2024 paper, specifically on GSM8k. That's over 20,000,000x smaller than the baseline — but the approach is so different the traditional...

By Alex J. Champandard
Intellect-3 Beats Larger Models with 34% Gain
Social · Dec 1, 2025

If the initial benchmark scores (and graphs used for PR) had showcased a 3x reduction in size for the same performance, I think the broader public reception would have been less tepid. Looking at this, it just seems 7% behind other...

By Alex J. Champandard
Tiny 6k Model Replicates Behaviors of Massive Systems
Social · Nov 27, 2025

A latent process that operates over plans? I've been working on this recently! What's most fascinating to me: my 6k parameter system can match aspects of models 100,000x bigger. Scaling laws apply very differently too... 🤔

By Alex J. Champandard
Opus 4.5 Mitigates Cursor AI Bugs, Not Solely Responsible
Social · Nov 25, 2025

For Gemini 3, I don't rule out bugs in @cursor_ai — as many new features don't work, worktrees getting trashed or even renamed (!) mid-way through agent working. But since Opus 4.5 manages around those bugs, it can't...

By Alex J. Champandard
New Model Outperforms Gemini 3 with Greater Polish
Social · Nov 25, 2025

My verdict is that it's significantly better than Gemini 3. It's at least as smart and just got more polish to it. Alignment on little details also significantly higher. Gemini 3 gets many things mixed up after a half-dozen messages, and...

By Alex J. Champandard
Opus 4.5 Executes Tasks Seamlessly Beyond Token Limits
Social · Nov 25, 2025

With Opus 4.5, it seems you don't need to ask multiple times or ORDER it to do work, it just gets stuff done, even beyond 50% of the token limit and after chat compaction! This kind of message is a thing...

By Alex J. Champandard
Benchmarks Mislead; Human Review Is the Real Bottleneck
Social · Nov 20, 2025

These kinds of benchmarks are misleading without a joint metric showing how much work humans had to do after the fact. How much time did it take to clean up that 2h42m of code? Style and architecture need to make sense, not just pass tests. That's...

By Alex J. Champandard
LLMs Lose Context After 100k Tokens, Need Frequent Resets
Social · Nov 19, 2025

People who work on basic code and reset their Agent chats every 4-5 replies: I envy you. I have to do deep-context design work, and at about 100k tokens LLMs start to get lazy / confused. I resorted to giving them...

By Alex J. Champandard
Gemini 3: Fast but Unreliable, Files Get Corrupted
Social · Nov 19, 2025

Gemini 3 review: it's fast, it's not dumb, but it's completely unusable in practice. It will get lost after a few edits then completely trash the file: issuing patch commands that include line numbers at best, and at worst it will...

By Alex J. Champandard
Mid-Tier Language Models Only Hit 75‑85% on Basic Math
Social · Oct 31, 2025

Language models perform poorly on high-school math? 🙄 You don't want to hear this, but the problems started in grade school. The moment we (collectively) found it acceptable that mid-tier models could score only 75%-85% on a GSM test set of 1.32k straightforward...

By Alex J. Champandard
Fast Coding Model Feels Overpriced Despite Performance Gains
Social · Oct 30, 2025

The speed of a faster coding model is worth it, but it seems mispriced. C1 gobbles through files, reasons more, and expects extra feedback to reach a similar place as slower models do with less of everything. Intuitively it feels more expensive "the...

By Alex J. Champandard