Alex J. Champandard

Building tools and teams where humans ≫ machines. AI, ML, research & development. co-Founded #CreativeAI #⚘

GPT‑5.5 Prototype Repeatedly Reward‑hacks, Exposing Research Claims
Social · Apr 27, 2026

Having GPT 5.5 implement a simple prototype based on an article: currently at 3 attempts of it reward-hacking (cheating, then lying), getting caught, and deleting the file to try again. Whoever says they are solving long-horizon research with this should read the...

By Alex J. Champandard
GLM 5.1 Fails to Process Tool Output, Earlier Versions Work
Social · Mar 28, 2026

@Zai_org In Cursor, GLM 5.1 completely fails to find/read any tool output, where GLM 4.7 and 5.0 work with the same endpoint, just a different model name. Not exactly sure what GLM 5.1 is doing differently, looks the same in IDE...

By Alex J. Champandard
Cursor’s Unlicensed K2.5 Model Signals Strategic Misstep
Social · Mar 20, 2026

A member of the Moonshot team claimed (in now-deleted tweets) they confirmed Cursor's new model is based on K2.5 without a license. Hard not to see this as a massive strategic mistake by Cursor's management and the board. If you're...

By Alex J. Champandard
LLM‑Driven RL Environment Learns During Interaction
Social · Mar 16, 2026

Currently writing an article on my work where the RL environment (or agent harness) learns back while the LLM interacts with it... If you have any references or pointers for related work on this topic, I'd welcome them! https://t.co/DGUCi3PLpv

By Alex J. Champandard
GLM‑5 Shows Regression in Python Coding Tasks
Social · Feb 25, 2026

Alright, I'm calling it: GLM-5 is a regression from GLM 4.7 for Python coding. 🫠 Subscribed to Z(.)ai on the basis of 4.7, as it reliably took over all my devops too, and have been using GLM 5 since launch. But with...

By Alex J. Champandard
2.6k‑parameter RL Model Matches 62B Baseline on GSM8k
Social · Dec 15, 2025

Oh My! I just built a new RL-native system with 2.6k parameters that matches a specialized 62B model from a 2024 paper, specifically on GSM8k. That's over 20,000,000x smaller than the baseline — but the approach is so different the traditional...

By Alex J. Champandard
Intellect-3 Beats Larger Models with 34% Gain
Social · Dec 1, 2025

If the initial benchmark scores (and graphs used for PR) had showcased a 3x reduction in size for the same performance, I think the broader public reception would have been less tepid. Just looking at this, it just seems 7% behind other...

By Alex J. Champandard
Tiny 6k Model Replicates Behaviors of Massive Systems
Social · Nov 27, 2025

A latent process that operates over plans? I've been working on this recently! What's most fascinating to me: my 6k parameter system can match aspects of models 100,000x bigger. Scaling laws apply very differently too... 🤔

By Alex J. Champandard
Opus 4.5 Mitigates Cursor AI Bugs, Not Solely Responsible
Social · Nov 25, 2025

For Gemini 3, I don't rule out bugs in @cursor_ai, as many new features don't work and worktrees get trashed or even renamed (!) midway through the agent's work. But since Opus 4.5 manages around those bugs, it can't...

By Alex J. Champandard
New Model Outperforms Gemini 3 with Greater Polish
Social · Nov 25, 2025

My verdict is that it's significantly better than Gemini 3. It's at least as smart and just has more polish to it. Alignment on little details is also significantly higher. Gemini 3 gets many things mixed up after a half-dozen messages, and...

By Alex J. Champandard
Opus 4.5 Executes Tasks Seamlessly Beyond Token Limits
Social · Nov 25, 2025

With Opus 4.5, it seems you don't need to ask multiple times or ORDER it to do work; it just gets stuff done, even beyond 50% of the token limit and after chat compaction! This kind of message is a thing...

By Alex J. Champandard
Benchmarks Mislead; Human Review Is the Real Bottleneck
Social · Nov 20, 2025

These kinds of benchmarks are misleading without a joint metric showing how much work was necessary by humans after the fact. How much time to clean up that 2h42m of code? Style and architecture need to make sense, not just pass tests. That's...

By Alex J. Champandard
LLMs Lose Context After 100k Tokens, Need Frequent Resets
Social · Nov 19, 2025

People who work on basic code and reset their Agent chats every 4-5 replies, I envy you. I have to do deep-context design work, and at about 100k tokens LLMs start to get lazy / confused. I resorted to giving them...

By Alex J. Champandard
Gemini 3: Fast but Unreliable, Files Get Corrupted
Social · Nov 19, 2025

Gemini 3 review: it's fast, it's not dumb, but it's completely unusable in practice. It will get lost after a few edits then completely trash the file: issuing patch commands that include line numbers at best, and at worst it will...

By Alex J. Champandard