
Artificial Analysis released version 2.0 of its AA‑WER speech‑to‑text benchmark, ranking ElevenLabs' Scribe v2 as the most accurate model with a 2.3% word error rate. Google’s Gemini 3 Pro follows at 2.9% and Mistral’s Voxtral Small at 3.0%, while OpenAI’s Whisper Large v3 sits at 4.2%. In the specialized AA‑AgentTalk test for voice‑assistant queries, Scribe v2 and Gemini 3 Pro again lead with error rates of 1.6% and 1.7% respectively. The results highlight how quickly multimodal AI models are gaining transcription accuracy without dedicated speech training.
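For readers unfamiliar with the metric, word error rate is conventionally the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of reference words. The sketch below illustrates that textbook definition only. The function name `word_error_rate` and the simple lowercase-and-split normalization are assumptions made for illustration; the scoring and text normalization that Artificial Analysis actually uses for AA‑WER are not described here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Textbook WER: word-level edit distance divided by reference length.

    Normalization here (lowercasing, whitespace split) is an illustrative
    assumption, not the benchmark's actual procedure.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One wrong word out of five reference words.
    print(word_error_rate("turn on the kitchen lights",
                          "turn on the kitchen light"))
```

Under this definition, a 2.3% rate corresponds to roughly 23 word-level errors per 1,000 reference words.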

Researchers from the University of Maryland and MBZUAI conducted the first large‑scale study of Moltbook, a Reddit‑style platform populated solely by over 2.6 million autonomous LLM agents. Analyzing 290 000 posts and 1.8 million comments, they found the AI community to be socially...

OpenAI signed a Pentagon contract within hours of Anthropic being barred from federal use, agreeing to provide its models for “all lawful purposes” while drawing three red‑line restrictions on domestic mass surveillance, autonomous weapons, and high‑risk automated decisions. The agreement’s...

Researchers led by Philippe Laban evaluated frontier large language models from GPT‑5 onward across six diverse tasks and found that spreading a request over multiple conversation turns reduces accuracy by up to 33 %. While newer models shrink the degradation from...

Researchers from Apple, Stanford, and the University of Washington discovered that the choice of HTML extraction tool dramatically influences which web pages enter large language model training sets. Their analysis of three popular extractors—Resiliparse, Trafilatura, and JusText—found that only 39%...

Claude Code introduced an auto‑memory feature that automatically records debugging patterns, project context, and user preferences in a per‑project MEMORY.md file. The system recalls these details in subsequent sessions, eliminating the need for manual logging or the /init command. The...

Suno, the AI‑generated music platform, has reached $300 million in annualized revenue and 2 million paying subscribers in under two years. Investor C.C. Gong publicly said she shifted most of her listening from Spotify to Suno, claiming AI music offers a personalized, infinite...

Anthropic announced that Claude can now switch autonomously between Excel and PowerPoint, allowing users to run data analyses and instantly generate presentation decks. The capability is released as a research preview on all paid plans. At the same time, Anthropic...

Inception Labs unveiled Mercury 2, the first diffusion‑based language reasoning model, claiming dramatic speed and cost advantages over leading models. The model generates 1,009 tokens per second with 1.7‑second end‑to‑end latency, beating Gemini 3 Flash and Claude Haiku on latency while delivering comparable benchmark...

DeepMind researchers propose an "intelligent AI delegation" framework to govern how autonomous AI agents assign tasks to each other and to humans. The model adapts organizational theory, treating AI delegation as a principal‑agent problem and emphasizing verifiable outcomes, decentralized smart‑contract...

OpenAI released two API upgrades for developers: the gpt‑realtime‑1.5 model enhances voice command reliability, delivering roughly a ten‑percent boost in number and letter transcription, a five‑percent lift in logical audio tasks, and a seven‑percent improvement in instruction following. The audio...

Anthropic has uncovered a coordinated distillation attack by three Chinese AI labs (DeepSeek, Moonshot AI, and MiniMax) targeting its Claude model. Over 24,000 fabricated accounts generated more than 16 million queries to extract reasoning, programming, and tool‑usage capabilities. The labs employed proxy services...

OpenAI announced that the SWE‑bench Verified coding benchmark has lost its credibility, saying that roughly 59.4% of its tasks are flawed and enforce overly specific implementation details. The company also highlighted data contamination, noting that leading models such as GPT‑5.2,...

NewsGuard evaluated the audio output of OpenAI’s ChatGPT Voice, Google’s Gemini Live, and Amazon’s Alexa+ by feeding each bot 20 false claims across health, politics, and world news. In neutral prompts, ChatGPT and Gemini reproduced falsehoods about 22‑23 percent of the...

OpenAI disclosed that it is preparing for an initial public offering in the fourth quarter of 2026, targeting a valuation of $830 billion and a raise of over $100 billion. The startup is in informal talks with Wall Street banks and has...

Baidu's AI chip division Kunlunxin has confidentially filed for an IPO in Hong Kong, submitting its application on Jan 1. A recent financing round values the unit at roughly $3 billion, though the final offering size remains undetermined. The filing adds Kunlunxin...

Chinese AI startup Moonshot AI announced a $500 million Series C round that values the company at $4.3 billion. The round was led by IDG with $150 million and included Alibaba, Tencent and individual investor Wang Huiwen. The capital will fund Kimi‑K3 development and...