OpenAI Now Accepting ChatGPT App Submissions From Third-Party Devs, Launches App Directory
OpenAI announced that third‑party developers can now submit apps for inclusion in a new ChatGPT App Directory, accessible from the sidebar and chat interface. The submission process went live on December 17, with approved apps slated to roll out to users beginning in early 2026. The directory expands the ecosystem beyond the original pilot partners, offering interactive UI elements and a broader range of services. OpenAI also said that monetization is initially limited to physical‑goods purchases and detailed privacy and review requirements for developers.
Enterprise AI Coding Grows Teeth: GPT‑5.2‑Codex Weaves Security Into Large-Scale Software Refactors
OpenAI released GPT‑5.2‑Codex, an agentic coding model built on GPT‑5.2 with enhanced cybersecurity capabilities. The model achieved top scores on Capture‑the‑Flag and CVE‑Bench (87%) and a 72.7% pass rate on Cyber Range tests, demonstrating improved long‑horizon code understanding. Enterprise users can...
JP Morgan’s AI Adoption Hit 50% of Employees. The Secret? A Connectivity-First Architecture
JPMorgan Chase rolled out an internal LLM‑powered assistant suite two‑and‑a‑half years ago, and adoption surged to over 60% of its 250,000‑plus workforce without mandates. The rapid, organic uptake stemmed from a connectivity‑first architecture that embeds AI into existing data, CRM,...
AI Agents Fail 63% of the Time on Complex Tasks. Patronus AI Says Its New 'Living' Training Worlds Can Fix...
Patronus AI, backed by $20 million, unveiled Generative Simulators—a dynamic training architecture that creates adaptive, continuously evolving environments for AI agents. The platform aims to replace static benchmarks, which have struggled to predict real‑world performance, by generating on‑the‑fly challenges and feedback....
AI Is Moving to the Edge – and Network Security Needs to Catch Up
Small and mid‑size businesses are rapidly deploying AI at the edge, moving workloads from centralized data centers to retail stores, clinics, and remote sites. This shift delivers real‑time insights, resilience, and faster deployment but strains network bandwidth and security controls....
Zoom Says It Aced AI’s Hardest Exam. Critics Say It Copied Off Its Neighbors.
Zoom announced that its federated AI system scored 48.1% on Humanity's Last Exam, surpassing Google’s Gemini 3 Pro score. The approach routes queries to multiple external models and selects the best output via a proprietary Z‑scorer. Critics...
With 91% Accuracy, Open Source Hindsight Agentic Memory Provides 20/20 Vision for AI Agents Stuck on Failing RAG
Vectorize.io’s open‑source Hindsight memory architecture outperforms traditional retrieval‑augmented generation (RAG) by organizing agent knowledge into four specialized networks. The system achieved a record 91.4% accuracy on the LongMemEval benchmark, dramatically boosting multi‑session recall, temporal reasoning, and knowledge‑update scores. Hindsight’s TEMPR...
Echo Raises $35M to Secure the Enterprise Cloud's Base Layer — Container Images — with Autonomous AI Agents
Israeli startup Echo raised a $35 million Series A to overhaul container base images, the hidden OS layer of cloud workloads, with a secure‑by‑design approach. The company rebuilds images from source, hardens them to SLSA Level 3, and uses autonomous AI agents to monitor and...
Zencoder Drops Zenflow, a Free AI Orchestration Tool that Pits Claude Against OpenAI’s Models to Catch Coding Errors
Zencoder unveiled Zenflow, a free desktop AI orchestration tool that coordinates multiple AI agents—such as Claude and OpenAI models—to plan, implement, test, and review code in structured workflows. The platform replaces ad‑hoc prompting with repeatable sequences, spec‑driven development, multi‑agent verification,...
Korean AI Startup Motif Reveals 4 Big Lessons for Training Enterprise LLMs
Korean startup Motif Technologies released Motif-2-12.7B-Reasoning, an open‑weight model that outperforms many larger U.S. and European counterparts on benchmark tests. The company also published a reproducible training recipe that isolates the real drivers of reasoning performance in enterprise LLMs. Four...
Bolmo’s Architecture Unlocks Efficient Byte‑level LM Training without Sacrificing Quality
The Allen Institute for AI unveiled Bolmo, a family of open‑source byte‑level language models (7B and 1B) built by "bytefying" its Olmo 3 architecture. By operating directly on raw UTF‑8 bytes, Bolmo eliminates the need for tokenizers, improving robustness to misspellings,...
Why Agentic AI Needs a New Category of Customer Data
Twilio argues that the data infrastructure behind most enterprises was built for batch‑oriented marketing, not the millisecond‑level, context‑rich interactions demanded by agentic AI. Conversational AI needs a new category of customer data—real‑time conversational memory that captures tone, intent, and sentiment...
Ai2's New Olmo 3.1 Extends Reinforcement Learning Training for Stronger Reasoning Benchmarks
The Allen Institute for AI unveiled Olmo 3.1, an upgraded 32‑billion‑parameter family that extends the original Olmo 3 models through an additional 21‑day reinforcement‑learning run on 224 GPUs. The Think 32B variant shows 5‑plus point gains on the AIME math benchmark and strong...
Marble Enters the Race to Bring AI to Tax Work, Armed with $9 Million and a Free Research Tool
Marble, a startup developing AI agents for tax professionals, announced a $9 million seed round led by Susa Ventures. The funding will support its free AI‑powered tax research tool and future agents that can analyze compliance scenarios and automate parts of...
Nous Research Just Released Nomos 1, an Open-Source AI that Ranks Second on the Notoriously Brutal Putnam Math Exam
Nous Research released Nomos 1, an open‑source AI mathematician that scored 87 out of 120 on the 2024 Putnam Competition, which would place it second among 3,988 participants. The system achieves this performance with a 30‑billion‑parameter mixture‑of‑experts model, activating only...
Cohere’s Rerank 4 Quadruples the Context Window over 3.5 to Cut Agent Errors and Boost Enterprise Search Accuracy
Cohere has released Rerank 4, expanding its context window to 32 K tokens—four times larger than Rerank 3.5—and promising higher ranking accuracy for enterprise search. The model arrives in Fast and Pro variants, targeting speed‑critical and deep‑reasoning workloads respectively. Rerank 4 also introduces self‑learning...
The 70% Factuality Ceiling: Why Google’s New ‘FACTS’ Benchmark Is a Wake-Up Call for Enterprise AI
Google’s FACTS Benchmark Suite, released with Kaggle, evaluates large language models on factuality across four real‑world scenarios—parametric knowledge, search‑augmented retrieval, multimodal interpretation, and text grounding. The initial leaderboard shows Gemini 3 Pro topping the chart with a 68.8% overall score, while GPT‑5...
The AI that Scored 95% — Until Consultants Learned It Was AI
SAP secretly tested its AI co‑pilot Joule with five consultant teams, telling four of them that the answers came from junior interns. Those teams rated the output about 95% accurate, while the fifth team, told the answers were AI‑generated, rejected...
Quilter's AI Just Designed an 843‑part Linux Computer that Booted on the First Try. Hardware Will Never Be the Same.
Quilter, a San Francisco AI startup, used a physics‑driven system to design a two‑board Linux computer with 843 components in just one week, cutting human effort from an estimated 428 hours to 38.5. The AI generated a layout with 98%...
Mistral Launches Powerful Devstral 2 Coding Model Including Open Source, Laptop-Friendly Version
Mistral AI unveiled Devstral 2, a 123‑billion‑parameter coding model with a 256K‑token context window, alongside a 24‑billion‑parameter Devstral Small 2 that runs on a single laptop. Both models are open‑weight and available free for a limited time via API and...
Databricks' OfficeQA Uncovers Disconnect: AI Agents Ace Abstract Tests but Stall at 45% on Enterprise Docs
Databricks introduced OfficeQA, a benchmark that tests AI agents on document‑heavy enterprise tasks using 89,000 pages of U.S. Treasury Bulletins. Tests show top agents such as Claude Opus 4.5 and GPT‑5.1 achieve only 37‑44% accuracy on raw PDFs, rising to 68%...
Brand-Context AI: The Missing Requirement for Marketing AI
Marketing teams are adopting generative AI, but outputs often miss brand, audience, and strategic alignment because models lack contextual intelligence. BlueOcean argues that the missing ingredient is structured brand‑context, which unifies vertical data streams into a horizontal view for decision‑quality...
Z.ai Debuts Open Source GLM-4.6V, a Native Tool-Calling Vision Model for Multimodal Reasoning
Zhipu AI’s Z.ai has launched the GLM-4.6V series, an open‑source vision‑language model family featuring a 106‑billion‑parameter flagship and a 9‑billion‑parameter Flash variant. Both models introduce native multimodal function calling, allowing visual inputs to be passed directly to tools such as...
Booking.com’s Agent Strategy: Disciplined, Modular and Already Delivering 2× Accuracy
Booking.com has turned its homegrown conversational recommendation system into a disciplined, modular AI agent stack that blends small travel‑specific models with larger LLMs and in‑house evaluations. This hybrid approach has doubled accuracy on key retrieval, ranking and customer‑interaction tasks while...
Design in the Age of AI: How Small Businesses Are Building Big Brands Faster
Generative AI has turned design from a late‑stage expense into a front‑line capability for small businesses. Since 2022, searches for AI‑powered naming, logo and website generators have surged 700‑1,600%, indicating rapid adoption. Unified platforms like Design.com now deliver naming, logo...
Why AI Coding Agents Aren’t Production-Ready: Brittle Context Windows, Broken Refactors, Missing Operational Awareness
AI coding agents can generate snippets quickly, but they falter in enterprise settings due to limited context windows, service limits, and lack of hardware awareness. Indexing caps at 2,500 files and 500 KB per file leave large monorepos partially invisible, forcing...
Inside NetSuite’s Next Act: Evan Goldberg on the Future of AI-Powered Business Systems
Oracle NetSuite unveiled NetSuite Next at SuiteWorld 2025, branding it as the platform’s biggest product evolution. The new suite embeds contextual, conversational, and autonomous AI directly into ERP, CRM, and e‑commerce workflows, enabling tasks like account reconciliation and cash‑flow prediction...
Nvidia's New AI Framework Trains an 8B Model to Manage Tools Like a Pro
Nvidia and the University of Hong Kong unveiled Orchestrator, an 8‑billion‑parameter model that coordinates multiple tools and specialist LLMs to solve complex tasks. Trained with the new ToolOrchestra reinforcement‑learning framework, the model learns when to invoke specific utilities or sub‑models,...
Gemini 3 Pro Scores 69% Trust in Blinded Testing up From 16% for Gemini 2.5: The Case for Evaluating AI...
Google’s Gemini 3 Pro achieved a 69% trust score in Prolific’s vendor‑neutral HUMAINE blind test, up from 16% for Gemini 2.5. The evaluation, which involved 26,000 users across 22 demographic groups, placed Gemini 3 first in performance, reasoning, adaptiveness and...
Tariff Turbulence Exposes Costly Blind Spots in Supply Chains and AI
Tariff volatility forces companies to react within 48 hours, prompting a shift toward process intelligence (PI) and AI‑driven supply‑chain orchestration. At Celosphere 2025, Vinmar, Florida Crystals and ASOS demonstrated how Celonis’ PI platform creates real‑time digital twins that cut expedites,...
Workspace Studio Aims to Solve the Real Agent Problem: Getting Employees to Use Them
Google has made Workspace Studio generally available, letting employees design, manage, and share AI agents directly within Google Workspace. The platform, powered by Gemini 3, targets business teams rather than developers and offers templates that automate routine tasks across Docs, Sheets,...
AWS Claims 90% Vector Cost Savings with S3 Vectors GA, Calls It 'Complementary' - Analysts Split on What It Means...
Amazon Web Services announced the general availability of Amazon S3 Vectors, a native vector storage and similarity‑search capability built directly into its S3 object storage service. The GA release expands capacity to 2 billion vectors per index and up to 20 trillion...
Ascentra Labs Raises $2 Million to Help Consultants Use AI Instead of All-Night Excel Marathons
London‑based Ascentra Labs closed a $2 million seed round led by Berlin VC NAP to automate survey analysis in private‑equity due diligence. The platform ingests raw survey data and generates traceable Excel workbooks, promising 60‑80% time savings for consulting teams. Early...
New Training Method Boosts AI Multimodal Reasoning with Smaller, Smarter Datasets
Researchers at MiroMind AI and partner universities introduced OpenMMReasoner, a two‑stage training framework that first fine‑tunes a base vision‑language model on a curated, high‑quality dataset and then applies reinforcement learning to sharpen multimodal reasoning. The approach achieves state‑of‑the‑art performance on...
AWS Goes Beyond Prompt-Level Safety with Automated Reasoning in AgentCore
At re:Invent, AWS announced major upgrades to its Bedrock AgentCore platform, adding policy enforcement, episodic memory, and evaluation tools powered by automated reasoning. The new policy layer sits between agents and external tools, allowing enterprises to enforce guardrails after an...
With Nova Forge, AWS Gives Companies a Path to Build Foundation-Class Models without GPUs
AWS unveiled Nova Forge, a new service that lets enterprises fine‑tune its Nova 2 foundation models with proprietary data without needing costly GPU clusters. The offering creates custom “Novellas” that retain core reasoning abilities while gaining domain‑specific knowledge, and these...
Arcee Aims to Reboot U.S. Open Source AI with New Trinity Models Released Under Apache 2.0
Arcee AI unveiled Trinity Mini (26B parameters) and Trinity Nano (6B parameters) as the first U.S.-trained open‑weight Mixture‑of‑Experts models released under an Apache 2.0 license. The models are available for free download on Hugging Face and can be accessed via a...
DeepSeek Just Dropped Two Insanely Powerful AI Models that Rival GPT-5 and They're Totally Free
DeepSeek, a Chinese AI startup, unveiled two 685‑billion‑parameter models—DeepSeek‑V3.2 and the high‑performance DeepSeek‑V3.2‑Speciale—under an MIT open‑source license. The models employ a novel Sparse Attention architecture that halves inference costs for long‑context tasks, supporting 128,000‑token windows at roughly $0.70 per million...
AI Models Block 87% of Single Attacks, but Just 8% when Attackers Persist
Cisco’s AI Threat Research team discovered that open‑weight large language models block 87% of single‑turn malicious prompts but see attack success soar to 92% when adversaries persist across multiple turns. The study evaluated eight popular models and found multi‑turn success...
OpenAGI Emerges From Stealth with an AI Agent that It Claims Crushes OpenAI and Anthropic
OpenAGI, a stealth startup founded by MIT researcher Zengyi Qin, unveiled Lux, an AI foundation model that autonomously controls computers. Lux achieved an 83.6% success rate on the Online‑Mind2Web benchmark, outpacing OpenAI’s Operator (61.3%) and Anthropic’s Claude Computer Use (56.3%)....
Capture the Full Value of Your Technology with Financial Intelligence
Apptio’s Technology Business Management (TBM) platform adds a Financial Intelligence Layer that unifies data from ERP, cloud, ITSM, HR and other systems. By normalizing and enriching these inputs, the solution enables FinOps, IT financial management and strategic portfolio management teams...
Agent Coordination Is the Missing Piece in AI Commerce — New AWS and Visa Blueprints Target the Gap
AWS has added Visa’s Intelligent Commerce platform to its Marketplace, pairing Visa’s Trusted Agent Protocol tools with Amazon Bedrock and AgentCore. The joint effort includes publicly available blueprints that streamline multi‑agent workflows such as travel booking and B2B payment reconciliation....
Ontology Is the Real Guardrail: How to Stop AI Agents From Misunderstanding Your Business
Enterprises pour billions into AI agents, yet real‑world deployments falter because agents lack true understanding of siloed business data. The article proposes an ontology‑based single source of truth—defining concepts, hierarchies, and relationships—to bridge this gap, enabling agents to interpret context,...
What to Be Thankful for in AI in 2025
2025 marks a turning point for generative AI as the ecosystem diversifies beyond a handful of cloud‑only giants. OpenAI launched GPT‑5, GPT‑5.1, Atlas, Sora 2 and open‑weight models, while enterprises report over‑50% ticket‑resolution gains using the new models. China’s open‑source wave,...
Prompt Security's Itamar Golan on Why Generative AI Security Requires Building a Category, Not a Feature
Prompt Security, founded by Itamar Golan in August 2023, built a full‑stack GenAI security platform and was acquired by SentinelOne for an estimated $250 million in August 2025. The company pioneered runtime protection, shadow‑AI discovery, and real‑time data sanitization, moving beyond...
Black Forest Labs Launches Flux.2 AI Image Models to Challenge Nano Banana Pro and Midjourney
Black Forest Labs unveiled FLUX.2, a new family of image‑generation and editing models that includes four variants—Pro, Flex, Dev and the upcoming Klein—plus an open‑source VAE released under Apache 2.0. The models add multi‑reference conditioning, higher‑fidelity 4‑megapixel outputs, and markedly better...
What Enterprises Should Know About The White House's New AI 'Manhattan Project' The Genesis Mission
President Trump announced the Genesis Mission, an AI‑focused "Manhattan Project" that directs the Department of Energy to create a closed‑loop AI experimentation platform linking the nation’s 17 national labs, federal supercomputers and decades of government scientific data. The initiative aims...
OpenAI Now Lets Enterprises Choose Where to Host Their Data
OpenAI has broadened its data residency options for ChatGPT Enterprise, Edu, and approved API customers, now allowing data at rest to be stored and processed in ten regions—including the EU, UK, US, Canada, Japan, South Korea, Singapore, India, Australia, and...
DeepSeek Injects 50% More Security Bugs when Prompted with Chinese Political Triggers
CrowdStrike researchers found that the Chinese AI model DeepSeek‑R1 injects up to 50% more insecure code when prompts contain politically sensitive terms such as "Falun Gong," "Uyghurs" or "Tibet." The vulnerability stems from an embedded censorship mechanism in the model’s...
Microsoft’s Fara-7B Is a Computer-Use AI Agent that Rivals GPT-4o and Works Directly on Your PC
Microsoft unveiled Fara-7B, a 7‑billion‑parameter computer‑use AI agent that runs locally on a PC and interacts with web interfaces via pixel‑level visual input. In benchmark tests on WebVoyager, it achieved a 73.5% task‑success rate, surpassing larger models such as GPT‑4o...