The New Stack - Latest News and Information

The New Stack

Publication

DevOps, open source, and cloud native news with resources and insights for developers

Why GPT-5.4, Claude, and Gemini Can’t Agree on Basic, Real-World Facts

News•May 30, 2026

A Lenz analysis of 1,000 real-user fact-check claims found that five leading LLM configurations (GPT-5.4, Claude Opus 4.7, Gemini 3 Pro with and without Search, and Sonar Pro) disagreed on 67% of the items. The split includes 34% of claims with substantial disagreement and 21% where models gave opposite True versus False verdicts. The study also found that models use the middle-ground categories (Mostly True, Misleading) unevenly: Gemini assigned only 6% of claims to those buckets, versus 45% for Claude Opus. Researchers plan follow-up work with human-labelled data to map systematic divergence.
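
As a rough illustration of how figures like these could be tallied, the sketch below counts claims where models disagree at all, claims with opposite True and False verdicts, and the share of middle-ground labels in one model's output. It is not the Lenz methodology: the function names, the input layout, and the exact label strings are assumptions made for this example.

```python
# Hypothetical sketch: the label set and data layout are assumptions,
# not taken from the Lenz analysis.
MIDDLE_GROUND = {"Mostly True", "Misleading"}


def disagreement_rates(verdicts_by_claim):
    """Return the share of claims with any disagreement and with opposite verdicts.

    verdicts_by_claim: one inner list per claim, holding one verdict
    string per model.
    """
    total = len(verdicts_by_claim)
    if total == 0:
        return {"any_disagreement": 0.0, "opposite_verdicts": 0.0}

    any_split = 0
    opposite = 0
    for verdicts in verdicts_by_claim:
        labels = set(verdicts)
        if len(labels) > 1:
            # At least two models gave different labels for this claim.
            any_split += 1
        if "True" in labels and "False" in labels:
            # One model said True while another said False.
            opposite += 1

    return {
        "any_disagreement": any_split / total,
        "opposite_verdicts": opposite / total,
    }


def middle_ground_share(model_verdicts):
    """Return the fraction of one model's verdicts that use a middle-ground label."""
    if not model_verdicts:
        return 0.0
    hits = sum(1 for verdict in model_verdicts if verdict in MIDDLE_GROUND)
    return hits / len(model_verdicts)
```

Called on a list of per-claim verdict lists, `disagreement_rates` returns two fractions between 0 and 1; `middle_ground_share` takes a single model's verdicts across all claims. How "substantial disagreement" was defined in the study is not stated here, so the sketch does not attempt to reproduce that category.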

By The New Stack

The Fix for Soaring AI Cloud Bills Exists — so Why Won’t We Trust It?

News•May 29, 2026

CloudBolt’s COO Yasmin Rajabi warns that a trust gap is stalling automated right‑sizing for AI‑heavy Kubernetes workloads, even as cloud bills surge. While 89% of organizations say cost‑optimization is a priority, 71% of engineers still demand human review and only...

By The New Stack

Why AWS Scrapped OpenSearch’s Architecture to Chase Agent Workloads

News•May 28, 2026

AWS has rebuilt its managed OpenSearch Serverless service, separating storage from compute to create a truly serverless platform that can shrink to zero when idle. The new architecture promises up to 60% cost savings versus peak‑capacity provisioned clusters and an...

By The New Stack

Claude Opus 4.8 Is Here: Effort Controls, Dynamic Workflows, Cheaper Fast Mode, Better Honesty, Less Deception

News•May 28, 2026

Anthropic launched Claude Opus 4.8, adding effort controls, dynamic workflow capabilities, and a faster fast mode that is three times cheaper. Anthropic claims the model is more honest, less deceptive, and better aligned with user interests. Benchmarks show Opus 4.8 outperforming its predecessor and edging out...

By The New Stack

Claw-Style AI Agents Are Coming to the Enterprise. The Governance Infrastructure Is Still Catching Up.

News•May 28, 2026

Automation Anywhere unveiled EnterpriseClaw, a platform that wraps Nvidia’s OpenShell runtime into a governed, "claw‑style" AI agent capable of device‑level access, dynamic tool creation, and screen interaction. The solution integrates security from Cisco, identity management from Okta, and GPT‑5.5 from...

By The New Stack

“There Is No Accountability”: AI Coding Agents Are Installing Packages No One Owns

News•May 27, 2026

AI coding assistants such as GitHub Copilot, Claude Code and Cursor are increasingly installing packages and dependencies without clear ownership, creating a security accountability gap across enterprises. Aikido Security’s new Endpoint product monitors and blocks unknown installations for a configurable...

By The New Stack

Who’s Monitoring the Agents?

News•May 24, 2026

AI agent frameworks such as CrewAI, AutoGen and LangGraph have moved from demos to production, powering incident response, internal copilots and automation pipelines. While composition is now easy, operators lack the visibility needed to monitor these multi‑agent systems at scale....

By The New Stack

What ClickHouse Learned From a Year of Coding with AI Agents

News•May 24, 2026

ClickHouse spent 2025‑2026 integrating AI coding agents into its massive C++ codebase, moving from occasional chat‑based snippets to daily CLI‑embedded assistance after Anthropic’s Claude Opus 4.5 arrived. The team identified three maturity levels—copy‑paste, IDE agents, and autonomous loops—and found agents excel...

By The New Stack

I Buried 20 Problems in a Fake P&L to See if Claude for Small Business Could Find Them

News•May 22, 2026

Anthropic unveiled Claude for Small Business, embedding native connectors to tools such as QuickBooks, HubSpot, Canva, and Google Workspace. In a hands‑on test, the AI scanned a fabricated seven‑month P&L with twenty deliberately hidden issues and produced an executive summary,...

By The New Stack

Why Enterprise AI Keeps Stalling — and How Data Streaming Could Unlock It

News•May 22, 2026

Enterprise AI projects are hitting a wall not because of model quality but due to fragmented, batch‑oriented data infrastructures. Confluent launched Confluent Intelligence and new Cloud capabilities in London on May 19 to make real‑time data streaming the secure foundation for...

By The New Stack

Why Six AI Labs Built the Same Product for Knowledge Workers in Four Months

News•May 20, 2026

In the first four months of 2026, six AI labs released remarkably similar agents aimed at knowledge workers. Anthropic’s Claude Cowork debuted in January, prompting Perplexity, Microsoft, OpenAI, Google, and Amazon to launch their own orchestrators by late April. The products...

By The New Stack

LLMs Were Trained on an Inaccessible Web — AudioEye Data Shows AI Is Still Building One

News•May 20, 2026

Large language models (LLMs) are generating web code that inherits the accessibility flaws of the public web they were trained on, according to AudioEye’s chief accessibility officer Mike Paciello. The 2026 WebAIM Million report shows 95.9% of the top‑million homepages...

By The New Stack

Google Launches $100 AI Ultra Plan and Cuts Top Tier to $200

News•May 19, 2026

Google introduced a $100‑per‑month AI Ultra subscription that slots between its existing $20 and $200 plans, and it switched to a compute‑used metering model that charges based on token complexity rather than daily prompt limits. At the same time, the...

By The New Stack

Google Now Lets Developers Use GPT and Claude in Android Studio

News•May 19, 2026

Google announced at I/O that Android Studio now lets developers pick from Gemini, OpenAI’s GPT, or Anthropic’s Claude for AI-assisted coding, and adds local Gemma 4 model downloads directly in the IDE. The Android CLI has been promoted to a stable...

By The New Stack

Google Wants to Make the Web Agent-Ready

News•May 19, 2026

At Google I/O 2026, the company unveiled a suite of tools to make the web "agent‑ready," centering on the new WebMCP standard that lets AI agents invoke JavaScript functions and HTML forms directly. Chrome 149 beta will host an origin...

By The New Stack
