Alex J. Champandard

Creator


Building tools and teams where humans ≫ machines. AI, ML, research & development. co-Founded #CreativeAI #⚘

Social•May 22, 2026

GLM 5.1 Leads Open-Source Coding Models, Beats K2.

This graph confirms what I also found: GLM 5.1 is the best open model for coding right now. Outperforms K2.6 here and also much better than Composer 2.5 for me. (Excited about the Qwen3.7 open models and hoping for the full suite incl. 0.8B / 2B!)

By Alex J. Champandard

Social•Apr 27, 2026

GPT‑5.5 Prototype Repeatedly Reward‑hacks, Exposing Research Claims

Having GPT 5.5 implement a simple prototype based on an article, currently at 3x attempts of it reward-hacking (cheating then lying), getting caught, deleting the file to try again. Whoever says they are solving long-horizon research with this should read the...

By Alex J. Champandard

Social•Mar 28, 2026

GLM 5.1 Fails to Process Tool Output, Earlier Versions Work

@Zai_org In Cursor, GLM 5.1 completely fails to find/read any tool output, where GLM 4.7 and 5.0 work with the same endpoint -- just a different model name. Not exactly sure what GLM 5.1 is doing differently, looks the same in IDE...

By Alex J. Champandard

Social•Mar 20, 2026

Cursor’s Unlicensed K2.5 Model Signals Strategic Misstep

A member of the Moonshot team claimed (in now deleted tweets) they confirmed Cursor's new model is based on K2.5 without a license. Hard to not see this as a massive strategic mistake of Cursor's management and the board. If you're...

By Alex J. Champandard

Social•Mar 16, 2026

LLM‑Driven RL Environment Learns During Interaction

Currently writing an article on my work where the RL environment (or agent harness) learns back while the LLM interacts with it... If you have any references or pointers for related work on this topic, I'd welcome them! https://t.co/DGUCi3PLpv

By Alex J. Champandard

Social•Feb 25, 2026

GLM‑5 Shows Regression in Python Coding Tasks

Alright, I'm calling it: GLM-5 is a regression from GLM 4.7 for Python coding. 🫠 Subscribed to Z(.)ai on the basis of 4.7 as it reliably took over all my devops too, and been using GLM 5 since launch. But with...

By Alex J. Champandard

Social•Dec 15, 2025

2.6k‑parameter RL Model Matches 62B Baseline on GSM8k

Oh My! I just built a new RL-native system with 2.6k parameters that matches a specialized 62B model from a 2024 paper, specifically on GSM8k. That's over 20,000,000x smaller than the baseline — but the approach is so different the traditional...

By Alex J. Champandard
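As a quick sanity check on the size ratio quoted in the post above, here is a minimal sketch of the arithmetic. It assumes "2.6k" means 2,600 parameters and "62B" means 62 billion parameters. The variable names are illustrative and do not come from the post.

```python
# Assumed readings of the figures quoted in the post (not confirmed by the author):
# "2.6k" -> 2,600 parameters, "62B" -> 62,000,000,000 parameters.
small_params = 2_600
baseline_params = 62_000_000_000

# How many times larger the baseline is than the small system.
ratio = baseline_params / small_params

print(f"{ratio:,.0f}x")  # prints "23,846,154x"
```

Under those assumptions the ratio is roughly 23.8 million, which is consistent with the "over 20,000,000x" figure in the post.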

Social•Dec 1, 2025

Intellect-3 Beats Larger Models with 34% Gain

If the initial benchmark scores (and graphs used for PR) had showcased a 3x reduction in size for the same performance, I think the broader public reception would have been less tepid. Just looking at this, it just seems 7% behind other...

By Alex J. Champandard

Social•Nov 27, 2025

Tiny 6k Model Replicates Behaviors of Massive Systems

A latent process that operates over plans? I've been working on this recently! What's most fascinating to me: my 6k parameter system can match aspects of models 100,000x bigger. Scaling laws apply very differently too... 🤔

By Alex J. Champandard

Social•Nov 25, 2025

Opus 4.5 Mitigates Cursor AI Bugs, Not Solely Responsible

For Gemini 3, I don't rule out bugs in @cursor_ai: many new features don't work, and worktrees get trashed or even renamed (!) midway through an agent's run. But since Opus 4.5 manages around those bugs, it can't...

By Alex J. Champandard

Social•Nov 25, 2025

New Model Outperforms Gemini 3 with Greater Polish

My verdict is that it's significantly better than Gemini 3. It's at least as smart and just has more polish to it. Alignment on little details is also significantly higher. Gemini 3 gets many things mixed up after a half-dozen messages, and...

By Alex J. Champandard

Social•Nov 25, 2025

Opus 4.5 Executes Tasks Seamlessly Beyond Token Limits

With Opus 4.5, it seems you don't need to ask multiple times or ORDER it to do work, it just gets stuff done, even beyond 50% of the token limit and after chat compaction! This kind of message is a thing...

By Alex J. Champandard

Social•Nov 20, 2025

Benchmarks Mislead; Human Review Is the Real Bottleneck

These kinds of benchmarks are misleading without a joint metric showing how much work was necessary by humans after the fact. How much time to clean up that 2h42m of code? Style and architecture need to make sense, not just pass tests. That's...

By Alex J. Champandard

Social•Nov 19, 2025

LLMs Lose Context After 100k Tokens, Need Frequent Resets

People who work on basic code and reset their Agent chats every 4-5 replies: I envy you. I have to do deep-context design work, and at about 100k tokens, LLMs start to get lazy / confused. I resorted to giving them...

By Alex J. Champandard

Social•Nov 19, 2025

Gemini 3: Fast but Unreliable, Files Get Corrupted

Gemini 3 review: it's fast, it's not dumb, but it's completely unusable in practice. It will get lost after a few edits then completely trash the file: issuing patch commands that include line numbers at best, and at worst it will...

By Alex J. Champandard
