The New Stack
DevOps, open source, and cloud native news with resources and insights for developers
Why GPT-5.4, Claude, and Gemini Can’t Agree on Basic, Real-World Facts
A Lenz analysis of 1,000 real‑user fact‑check claims found that five leading LLMs—GPT‑5.4, Claude Opus 4.7, Gemini 3 Pro (with and without Search), and Sonar Pro—disagreed on 67% of the items. The split includes 34% of claims with substantial disagreement and 21% where models gave opposite True versus False verdicts. The study highlights that middle‑ground categories (Mostly True, Misleading) are used unevenly, with Gemini assigning only 6% of claims to those buckets versus 45% for Claude Opus. Researchers plan follow‑up work with human‑labelled data to map systematic divergence.
The Fix for Soaring AI Cloud Bills Exists — so Why Won’t We Trust It?
CloudBolt’s COO Yasmin Rajabi warns that a trust gap is stalling automated right‑sizing for AI‑heavy Kubernetes workloads, even as cloud bills surge. While 89% of organizations say cost‑optimization is a priority, 71% of engineers still demand human review and only...
Why AWS Scrapped OpenSearch’s Architecture to Chase Agent Workloads
AWS has rebuilt its managed OpenSearch Serverless service, separating storage from compute to create a truly serverless platform that can shrink to zero when idle. The new architecture promises up to 60% cost savings versus peak‑capacity provisioned clusters and an...
Claude Opus 4.8 Is Here: Effort Controls, Dynamic Workflows, Cheaper Fast Mode, Better Honesty, Less Deception
Anthropic launched Claude Opus 4.8, adding effort controls, dynamic workflow capabilities, and a fast mode that is three times cheaper. Anthropic says the model is more honest, less deceptive, and more strongly aligned with user interests. Benchmarks show Opus 4.8 outperforming its predecessor and edging out...
Claw-Style AI Agents Are Coming to the Enterprise. The Governance Infrastructure Is Still Catching Up.
Automation Anywhere unveiled EnterpriseClaw, a platform that wraps Nvidia’s OpenShell runtime into a governed, "claw‑style" AI agent capable of device‑level access, dynamic tool creation, and screen interaction. The solution integrates security from Cisco, identity management from Okta, and GPT‑5.5 from...
“There Is No Accountability”: AI Coding Agents Are Installing Packages No One Owns
AI coding assistants such as GitHub Copilot, Claude Code and Cursor are increasingly installing packages and dependencies without clear ownership, creating a security accountability gap across enterprises. Aikido Security’s new Endpoint product monitors and blocks unknown installations for a configurable...
Who’s Monitoring the Agents?
AI agent frameworks such as CrewAI, AutoGen and LangGraph have moved from demos to production, powering incident response, internal copilots and automation pipelines. While composition is now easy, operators lack the visibility needed to monitor these multi‑agent systems at scale....
What ClickHouse Learned From a Year of Coding with AI Agents
ClickHouse spent 2025‑2026 integrating AI coding agents into its massive C++ codebase, moving from occasional chat‑based snippets to daily CLI‑embedded assistance after Anthropic’s Claude Opus 4.5 arrived. The team identified three maturity levels—copy‑paste, IDE agents, and autonomous loops—and found agents excel...
I Buried 20 Problems in a Fake P&L to See if Claude for Small Business Could Find Them
Anthropic unveiled Claude for Small Business, embedding native connectors to tools such as QuickBooks, HubSpot, Canva, and Google Workspace. In a hands‑on test, the AI scanned a fabricated seven‑month P&L with twenty deliberately hidden issues and produced an executive summary,...
Why Enterprise AI Keeps Stalling — and How Data Streaming Could Unlock It
Enterprise AI projects are hitting a wall not because of model quality but due to fragmented, batch‑oriented data infrastructures. Confluent launched Confluent Intelligence and new Cloud capabilities in London on May 19 to make real‑time data streaming the secure foundation for...
Why Six AI Labs Built the Same Product for Knowledge Workers in Four Months
In the first four months of 2026, six AI labs released remarkably similar agents aimed at knowledge workers. Anthropic’s Claude Cowork debuted in January, prompting Perplexity, Microsoft, OpenAI, Google, and Amazon to launch their own orchestrators by late April. The products...
LLMs Were Trained on an Inaccessible Web — AudioEye Data Shows AI Is Still Building One
Large language models (LLMs) are generating web code that inherits the accessibility flaws of the public web they were trained on, according to AudioEye’s chief accessibility officer Mike Paciello. The 2026 WebAIM Million report shows 95.9% of the top‑million homepages...
Google Launches $100 AI Ultra Plan and Cuts Top Tier to $200
Google introduced a $100‑per‑month AI Ultra subscription that slots between its existing $20 and $200 plans, and it switched to a compute‑used metering model that charges based on token complexity rather than daily prompt limits. At the same time, the...
Google Now Lets Developers Use GPT and Claude in Android Studio
Google announced at I/O that Android Studio now lets developers pick from Gemini, OpenAI’s GPT, or Anthropic’s Claude for AI‑assisted coding, and lets them download local Gemma 4 models directly in the IDE. The Android CLI has been promoted to a stable...
Google Wants to Make the Web Agent-Ready
At Google I/O 2026, the company unveiled a suite of tools to make the web "agent‑ready," centering on the new WebMCP standard that lets AI agents invoke JavaScript functions and HTML forms directly. Chrome 149 beta will host an origin...