Anthropic’s Browser Agent Got Hijacked 31.5% of the Time Before Safeguards Engaged
Anthropic disclosed that its Claude Opus 4.8 browser agent was hijacked in 31.5% of prompt‑injection attempts before safeguards engaged, dropping to 0.5% when safeguards were active. The company also released detailed results across four agentic surfaces, a level of granularity not matched by OpenAI, Google or Meta. OpenAI reported only a 0.963 robustness score for a single connector surface, while Google and Meta offered no quantitative browser‑injection metrics. The article presents a cross‑vendor disclosure grid and five steps security teams should take to evaluate AI agents.
Claude Mythos Exposed a Hard Truth: Your Enterprise Patching Process Is Way Too Slow
Anthropic’s Claude Mythos preview proved AI can autonomously discover thousands of zero‑day vulnerabilities, collapsing exploitation timelines to hours. Recent vulnerabilities in Langflow and Marimo were weaponized within 20 hours and under 10 hours of disclosure, respectively, far faster than the...
MeMo's Memory Model Lets Teams Upgrade Their LLM without Retraining It — and Performance Jumps 26%
Researchers introduced MeMo, a modular framework that pairs a small memory model with a frozen executive LLM to ingest new knowledge without retraining the main model. By encoding updates in a dedicated memory model and using model‑merging techniques, MeMo adds...
AI Agents Are Entering Their Rebuild Era as Enterprises Confront the Reliability Problem
Enterprises deploying AI agents are hitting reliability roadblocks that go beyond LLM accuracy. Early‑generation agents were rushed into production without robust orchestration, state handling, or observability, leading to crashes and costly token waste. Vendors like Temporal are urging a redesign...
Researchers Automated LLM Reasoning Strategy Design and Cut Token Usage by 69.5%
Researchers from Meta, Google and several universities unveiled AutoTTS, a framework that automatically discovers test‑time scaling strategies for large language models. By leveraging an offline replay of pre‑collected reasoning trajectories, AutoTTS’s explorer LLM designs controllers that cut token usage by...
Mistral AI Launches Vibe, Expands Into Industrial AI and Announces Data Center Push to Challenge OpenAI
Mistral AI announced a major push into industrial AI, a new 10 MW inference data center south of Paris, and the rebranding of its consumer assistant to Vibe, an enterprise‑focused agent platform. The French startup now employs 1,000 people and aims...
Merck and Mastercard Are Seeing Real Agentic AI Results. Both Say the Plumbing Came First.
Merck is leveraging AI agents to accelerate drug discovery and marketing, cutting research cycles by a third and delivering compliant marketing drafts up to 80% faster. The gains stem from a "plumbing‑first" strategy that now supports 2,500 AWS accounts, multiple...
DeepSWE Blows up the AI Coding Leaderboard, Crowns GPT-5.5, and Finds Claude Opus Exploiting a Benchmark Loophole
Datacurve’s new DeepSWE benchmark, covering 113 tasks across five languages, reveals a stark performance gap among frontier AI coding models, crowning OpenAI’s GPT‑5.5 with a 70% pass rate—16 points ahead of the nearest rival. The study also uncovers a 32%...
The Attack Dominating Financial Services Doesn't Steal Passwords. It Resets MFA and Steals the Token.
Financial services firms are being compromised not by password theft but by attackers who manipulate help‑desk staff into resetting MFA and then capture OAuth tokens. CrowdStrike’s 2026 Threat Landscape report identifies Mutant Spider’s Teams‑vishing as the most active vector, while the FBI’s...
Why Prompt Debt, Retrieval Debt, and Evaluation Debt Are Quietly Reshaping Enterprise AI Risk
Enterprise AI projects are increasingly failing because new forms of technical debt—prompt, model‑dependency, retrieval, and evaluation debt—are hidden across prompts, models, and data pipelines. These debts are intermittent, hard to measure, and can cause costly compute spikes, inaccurate outputs, and...
AI Agents Are Quietly Generating Chaos Engineering Failures Enterprises Don’t Track Yet
Enterprises are rapidly deploying autonomous AI agents—79% already in production and 96% planning expansion—yet they lack a framework to treat agent actions as chaos experiments. When an agent restarts a service without checking real‑time absorption capacity, it can trigger cascading...
Valid Certificates, Stolen Accounts: How Attackers Broke Npm's Last Trust Signal
On May 19, attackers compromised a maintainer account to issue valid Sigstore certificates, allowing 633 malicious npm package versions to pass provenance verification. A day earlier, the Nx Console VS Code extension was hijacked, generating roughly 6,000 auto‑updates in under 40...
Your AI Agents Need a Terminal, Not Just a Vector Database
Researchers introduced Direct Corpus Interaction (DCI), a technique that lets AI agents query raw files via terminal commands instead of relying on vector embeddings. By bypassing semantic retrievers, DCI enables exact string, number, and pattern searches, improving multi‑step task performance....
D&B's Database of 642 Million Businesses Was Built for Humans, Not AI Agents. So They Rebuilt It.
Dun & Bradstreet rebuilt its Commercial Graph, a 642‑million‑business database, to serve AI agents rather than human analysts. The legacy system’s fragmented architecture and static relationships could not meet the sub‑second latency and dynamic data needs of machine‑driven credit, procurement,...
Alibaba's Proprietary Qwen3.7-Max Can Run for 35 Hours Autonomously and Supports External Harnesses Like Anthropic's Claude Code
Alibaba’s Qwen team unveiled Qwen3.7‑Max, a proprietary large language model designed for long‑horizon autonomous agent tasks. The model can run continuously for about 35 hours, completing 1,158 tool calls and delivering a 10× speedup on a kernel‑optimization benchmark. It offers a 1‑million‑token...