
🤖 AI Agents Weekly: Claude Fable 5, Kimi K2.7-Code, NotebookLM Goes Agentic, DiffusionGemma, MiMo Code, and More
Anthropic unveiled Claude Fable 5, its first publicly available Mythos‑class model, positioning it as the most capable system the company has released for general use. The model achieved state‑of‑the‑art results on a wide range of benchmarks, excelling in long‑form, multi‑step reasoning and autonomous task execution. Notably, it compressed a 50‑million‑line code migration from two months of human effort into a single day, and completed the Pokémon FireRed game from raw screenshots without assistance. Pricing is set at $10 per million input tokens and $50 per million output tokens, with rollout through June 22.

🤖 AI Agents Weekly: Microsoft's Seven MAI Models, Gemma 4 12B, NVIDIA Nemotron 3 Ultra, Agents' Last Exam, Devin Desktop,...
Microsoft unveiled seven new in‑house MAI models, highlighted by the 35B MAI‑Thinking‑1 reasoning model that hit 97% on the AIME benchmark and outperformed Claude Sonnet 4.6 in early tests. All models were trained on commercially licensed data, avoiding third‑party distillation...

🤖 AI Agents Weekly: Claude Opus 4.8, Claude Code Dynamic Workflows, Chrome DevTools for Agents 1.0, DeepSWE, Agent Harness Scaling...
Anthropic unveiled Claude Opus 4.8, an incremental upgrade that boosts agentic judgment, self‑correction, and honesty while keeping token pricing unchanged. The model hits 84% on the Online‑Mind2Web benchmark and is four times less likely to miss code flaws than its 4.7...

🤖 AI Agents Weekly: Gemini 3.5 Flash, Antigravity 2.0, Codex Thursday, Cohere Command A+, Qwen3.7-Max, and More
Google unveiled Gemini 3.5 Flash, a frontier model tuned for AI agents and coding, alongside Managed Agents that spin up an isolated Linux sandbox with each API call. Flash achieved 76.2% on Terminal‑Bench 2.1, 83.6% on MCP Atlas, and 1656...

🤖 AI Agents Weekly: Thinking Machines Interaction Models, Is Grep All You Need?, Codex Mobile + Hooks, Cursor Cloud Agents,...
Thinking Machines Lab unveiled its first interaction model, a 276 billion‑parameter mixture‑of‑experts system that processes audio, video and text in continuous 200 ms micro‑turns. The model achieved a 77.8 score on the new FD‑bench v1.5, outpacing competing turn‑based agents. A separate study,...

🤖 AI Agents Weekly: Meta FAIR Autodata, ZAYA1-8B, SubQ 12M Context, Natural Language Autoencoders, Claude Managed Agents Dreaming, and More
Meta FAIR unveiled Autodata, an agentic data‑scientist that autonomously creates, critiques, and refines training and evaluation datasets. Using a planner‑executor self‑instruct loop, the system replaces static seed sets with a continuously hardening data pipeline. In a computer‑science research QA benchmark,...

🤖 AI Agents Weekly: Codex for Everyday Work, Cursor SDK, Mistral Workflows, LLM Knowledge Bases, Agentic Harness Engineering, and More
OpenAI has expanded its Codex agent from a pure coding tool into a general‑purpose work assistant. The new version offers role‑based onboarding for finance, marketing, data‑science and operations teams, and integrates directly with Google Docs, Sheets and Slides. Codex’s computer‑use...

🤖 AI Agents Weekly: GPT-5.5, DeepSeek-V4 Preview, Kimi K2.6 Agent Swarm, Diversity Collapse, Sakana Fugu, and More
OpenAI unveiled GPT-5.5, a new class of model built for agentic work, emphasizing multi‑step planning, tool use, and self‑verification. The model delivers the biggest performance jumps in coding, computer‑use tasks, knowledge work, and early scientific research. GPT-5.5 now powers ChatGPT...

🤖 AI Agents Weekly: Claude Opus 4.7, Codex Everywhere, Claude Design, Windsurf 2.0, Qwen3.6-35B-A3B, AiScientist, and More
Anthropic unveiled Claude Opus 4.7, its most capable Opus model to date, optimized for long‑running, agentic workflows. The upgrade adds self‑verification, higher‑resolution vision capabilities, and a new xhigh effort level for finer latency‑quality control. Developers gain beta task‑budget tools to...

🤖 AI Agents Weekly: Claude Managed Agents, Muse Spark, Project Glasswing, Advisor Strategy, GLM-5.1, Memento, and More
Anthropic has opened Claude Managed Agents to the public in beta, delivering a suite of composable APIs that let developers launch cloud‑hosted AI agents in days rather than months. The platform provides production‑grade sandboxing, secure tool orchestration, and persistent state,...

🤖 AI Agents Weekly: Claude Code Review, AutoHarness, Perplexity Personal Computer, Cloudflare /Crawl, Context7 CLI, and More
Anthropic unveiled Claude Code Review, a multi‑agent system that simultaneously scans, verifies, and prioritizes pull‑request issues, delivering both summary comments and inline annotations. The service flags problems in 84% of large PRs, averaging 7.5 bugs per review, with less than...
