Nemotron ColEmbed V2: Raising the Bar for Multimodal Retrieval with ViDoRe V3’s Top Model
NVIDIA unveiled the Nemotron ColEmbed V2 family, a set of late‑interaction multimodal embedding models available in 3B, 4B and 8B sizes. The models achieve state‑of‑the‑art results on the ViDoRe V1‑V3 benchmarks, with the 8B variant ranking #1 on ViDoRe V3 (NDCG@10 63.42). They extend the ColBERT‑style MaxSim interaction to text‑image token pairs, delivering finer semantic matching at the cost of higher storage. The release targets researchers and enterprises building high‑accuracy visual document retrieval and RAG pipelines.
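To make the MaxSim interaction mentioned above concrete, here is a minimal sketch of late-interaction scoring in plain Python. It is not NVIDIA's implementation: the function names, the toy 2-dimensional embeddings, and the use of a raw dot product as the similarity are all assumptions chosen for illustration. The idea is that every query token is matched against its best document token, and the per-token maxima are summed.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_tokens, doc_tokens):
    """Late-interaction score: for each query token embedding, take the
    maximum similarity over all document token embeddings, then sum."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


# Toy embeddings (hypothetical): one vector per token or image patch.
query = [[1.0, 0.0], [0.0, 1.0]]
page_a = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
page_b = [[0.3, 0.3], [0.4, 0.1]]

scores = {
    "page_a": maxsim_score(query, page_a),
    "page_b": maxsim_score(query, page_b),
}
# Rank pages from highest to lowest MaxSim score.
ranking = sorted(scores, key=scores.get, reverse=True)
```

Because each page keeps one vector per token or patch rather than a single pooled vector, the index grows with document length, which is the storage cost the summary refers to.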
Community Evals: Because We're Done Trusting Black-Box Leaderboards over the Community
Hugging Face launched decentralized evaluation reporting, enabling benchmark datasets to host leaderboards and models to store evaluation scores in .eval_results YAML files. Community members can submit results via pull requests, which appear alongside author scores and are aggregated on dataset...

H Company's New Holo2 Model Takes the Lead in UI Localization
H Company unveiled the Holo2‑235B‑A22B Preview, its largest UI‑localization model to date, achieving a new state‑of‑the‑art 78.5% accuracy on the ScreenSpot‑Pro benchmark and 79.0% on OSWorld‑G. The model is released on Hugging Face as a research preview focused on UI‑element grounding....
Training Design for Text-to-Image Models: Lessons From Ablations
The PRX Part 2 post documents a systematic series of ablations on training a 1.2 B‑parameter text‑to‑image diffusion model. Adding representation alignment (REPA) with frozen vision teachers lowered FID by up to three points, while latent‑space alignment (REPA‑E) and the Flux2‑AE tokenizer halved...
Nemotron-Personas-Brazil: Co-Designed Data for Sovereign AI
NVIDIA released Nemotron‑Personas‑Brazil, an open dataset of six million fully synthetic Brazilian personas grounded in official IBGE census and labor statistics. The collection spans 20 fields, 1.5 k occupation categories, and covers every Brazilian state, delivering culturally authentic Portuguese narratives. Built...
NVIDIA Earth-2 Open Models Span the Whole Weather Stack
NVIDIA announced three new open‑source models (StormScope, Atlas, and HealDA) under its Earth‑2 portfolio, covering nowcasting, medium‑range forecasting, and data assimilation. StormScope delivers kilometer‑scale, zero‑to‑six‑hour storm forecasts that outperform traditional physics models, while Atlas provides high‑accuracy 15‑day global predictions across more than 70...
AssetOpsBench: Bridging the Gap Between AI Agent Benchmarks and Industrial Reality
AssetOpsBench is a new benchmark that evaluates agentic AI in industrial asset‑lifecycle management using 2.3 M sensor points, 140+ curated scenarios, 4.2 K work orders and 53 structured failure modes. It scores agents across six qualitative dimensions—task completion, retrieval accuracy, result verification,...
Differential Transformer V2
The Differential Transformer V2 (DIFF V2) introduces a differential attention operation that doubles query heads while keeping key‑value heads unchanged, eliminating the need for custom attention kernels. By projecting a per‑token, per‑head λ and applying a sigmoid‑scaled subtraction, DIFF V2 removes the...
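The sigmoid-scaled subtraction can be sketched for a single token as follows. This is an illustration under stated assumptions, not the DIFF V2 reference code: it assumes the two query heads in a pair share the same keys and values and that the combined output is `out1 - sigmoid(lam) * out2`. The helper names are hypothetical, and `lam` is passed in as a plain number rather than being projected from the token's hidden state.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(q, keys, values):
    """Standard scaled dot-product attention for one query vector."""
    scale = 1.0 / math.sqrt(len(q))
    logits = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    weights = softmax(logits)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def diff_attention(q_pair, keys, values, lam):
    """Assumed combination: two query heads attend over the SAME keys and
    values, and the second output is subtracted after sigmoid(lam) scaling."""
    q1, q2 = q_pair
    out1 = attend(q1, keys, values)
    out2 = attend(q2, keys, values)
    gate = sigmoid(lam)
    return [a - gate * b for a, b in zip(out1, out2)]


keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0]]
q = [1.0, 0.0]

plain = attend(q, keys, values)
# With identical query heads and lam = 0, the gate is 0.5, so half is removed.
halved = diff_attention((q, q), keys, values, lam=0.0)
```

Since both heads in a pair call ordinary attention over shared keys and values, a sketch like this needs no special kernel, which matches the motivation given in the summary.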

Introducing OptiMind, a Research Model Designed for Optimization
Microsoft Research unveiled OptiMind, a specialized language model that converts natural‑language optimization problems into solver‑ready mathematical formulations. The model is released as an experimental offering on Hugging Face, allowing developers and researchers to test it directly in the platform’s playground. OptiMind...
Open Responses: What You Need to Know
Open Responses is an open‑source inference standard that extends OpenAI’s Responses API, aiming to replace the legacy Chat Completion format for agentic workloads. It unifies text, image, JSON, and video generation while enabling provider‑side tool execution and autonomous sub‑agent loops....
Small Yet Mighty: Improve Accuracy In Multimodal Search and Visual Document Retrieval with Llama Nemotron RAG Models
NVIDIA introduced two compact Llama Nemotron models—an image‑text embedding encoder and a cross‑encoder reranker—tailored for multimodal retrieval over visual documents. Both run on typical NVIDIA GPUs, emit a single dense vector per page, and integrate seamlessly with existing vector databases. Benchmarks...

Generalist Robot Policy Evaluation in Simulation with NVIDIA Isaac Lab-Arena and LeRobot
NVIDIA and Hugging Face have merged NVIDIA Isaac Lab‑Arena with the LeRobot EnvHub, creating an open‑source pipeline for evaluating vision‑language‑action (VLA) robot policies in simulation. The integration gives developers access to pre‑trained GR00T N models, a library of 250+ Lightwheel tasks, and...

Introducing Falcon H1R 7B
The Technology Innovation Institute unveiled Falcon H1R 7B, a decoder‑only 7‑billion‑parameter LLM that rivals much larger reasoning models. Leveraging a two‑stage pipeline of curated supervised fine‑tuning and reinforcement learning with the GRPO algorithm, the model excels on math, code, and general...
Introducing Falcon-H1-Arabic: Pushing the Boundaries of Arabic Language AI with Hybrid Architecture
Falcon‑H1‑Arabic introduces a family of 3B, 7B and 34B parameter models that merge Mamba state‑space modules with Transformer attention in a hybrid block design. The architecture expands context windows to 128K tokens for the 3B model and 256K tokens for...
NVIDIA Brings Agents to Life with DGX Spark and Reachy Mini
At CES 2026 NVIDIA demonstrated how its DGX Spark platform can power a personal AI assistant built on the Reachy Mini robot. Using open‑source Nemotron 3 Nano for reasoning and Nemotron Nano 2 VL for vision, the demo combined NVIDIA’s NeMo Agent Toolkit with ElevenLabs TTS to...
Tokenization in Transformers V5: Simpler, Clearer, and More Modular
Transformers v5 introduces a major redesign of tokenizers, consolidating each model’s tokenizer into a single, transparent file and exposing the full architecture—normalizer, pre‑tokenizer, model, post‑processor, and decoder—as inspectable properties. The library now defaults to a Rust‑based backend for speed, while...

The Open Evaluation Standard: Benchmarking NVIDIA Nemotron 3 Nano with NeMo Evaluator
NVIDIA released the Nemotron 3 Nano 30B A3B model alongside a fully open evaluation recipe built with the NeMo Evaluator library. The blog details how developers can reproduce the model‑card results, inspect structured logs, and run the same benchmarks on any inference endpoint. Published YAML...
New in llama.cpp: Model Management
The llama.cpp server now includes a router mode that enables dynamic loading, unloading, and switching among multiple LLM models without restarting the service. Models are auto‑discovered from the default cache or a user‑specified directory and are launched in separate processes,...
Apriel-1.6-15B-Thinker: Cost-Efficient Frontier Multimodal Performance
ServiceNow released Apriel-1.6-15B-Thinker, a 15‑billion‑parameter multimodal reasoning model that rivals the performance of models ten times larger. Built on the Apriel‑1.5 foundation, it boosts text and vision reasoning while cutting reasoning token usage by over 30%. Trained on NVIDIA GB200...

Introducing Swift-Huggingface: The Complete Swift Client for Hugging Face
Mattt announced swift‑huggingface, a new Swift package that delivers a full‑featured client for the Hugging Face Hub, slated to replace the existing HubApi in swift‑transformers. The library adds reliable, resumable downloads with progress tracking, a Python‑compatible cache, and a flexible...

DeepMath: A Lightweight Math Reasoning Agent with SmolAgents
DeepMath is a math‑reasoning agent built on the Qwen‑3‑4B Thinking model and fine‑tuned with Group Relative Policy Optimization (GRPO). It replaces verbose chain‑of‑thought text with tiny Python snippets that run in a sandboxed executor, then folds the results back into the...
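The "group relative" part of GRPO can be illustrated with a short sketch. It shows only the advantage computation, in a commonly described simplified form (reward minus the group mean, divided by the group standard deviation), and is not DeepMath's training code. The function name, the `eps` constant, and the example rewards are assumptions.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each sampled completion's reward against the other
    completions drawn for the same prompt (the "group")."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumption here
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one math problem:
# 1.0 if the final answer was correct, 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Completions that beat their group's average get a positive advantage and the rest get a negative one, so no separate value model is needed to provide a baseline.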
We Got Claude to Fine-Tune an Open Source LLM
Anthropic’s Claude Code now leverages a new Hugging Face Skills plugin to fine‑tune open‑source large language models end‑to‑end. The skill generates training scripts, selects appropriate cloud GPUs, submits jobs to Hugging Face Jobs, monitors progress via Trackio, and pushes the finished model to the Hub....
Custom Policy Enforcement with Reasoning: Faster, Safer AI Applications
NVIDIA unveiled Nemotron Content Safety Reasoning, a model that blends dynamic policy reasoning with production‑grade latency. The system lets developers load natural‑language policies at inference time, enabling nuanced content moderation across e‑commerce, telecom, and healthcare use cases. It achieves low...
Transformers V5 RC Launches with Seamless Ecosystem Interoperability
Today we are releasing the Transformers v5 release candidate! 🤗 It enables end‑to‑end interoperability with our friends across the ecosystem, makes it easier to add new models, and simplifies the library 🙌🏻 Read our blog to learn more: https://t.co/ysZW3btRgR https://t.co/PVKFRsH9Z2
SARLO-80: Worldwide Slant SAR Language Optic Dataset at 80 cm Resolution
The SARLO‑80 dataset aggregates roughly 2,500 Umbra synthetic aperture radar (SAR) scenes and aligns them with high‑resolution optical imagery at a uniform 80 cm slant‑range resolution. Each SAR patch (1,024 × 1,024 px) is co‑registered with an optical counterpart and enriched with English natural‑language...
Transformers V5: Simple Model Definitions Powering the AI Ecosystem
Hugging Face released Transformers v5.0.0rc‑0, five years after v4’s first release candidate. Daily pip installs have surged to over 3 million, pushing total installations past 1.2 billion. The library now supports more than 400 model architectures and hosts upwards of 750 k checkpoints,...
New Profile Status Launched—Showcase Your Weekend Projects
We've recently shipped profile status 🫡 Drop below what you've built on the Hugging Face Hub this weekend!
Flux.1-dev Ranks #2, Eagerly Awaiting Flux.2-dev
Flux.1-dev has been the second most liked model on Hugging Face, just after DeepSeek R1, so it is super exciting to see the release of Flux.2-dev by @bfl_ml today! Download the weights or try the model (thanks to @fal) on @huggingface: https://t.co/kdmVlvdLZh Read the...
Diffusers Welcomes FLUX-2
The post introduces FLUX.2, Black Forest Labs' latest open‑source image generation model, highlighting its new architecture—including a single Mistral Small 3.1 text encoder and a re‑engineered DiT transformer with more single‑stream blocks and bias‑free layers. It offers practical guidance for...

Building Deep Research: How We Achieved State of the Art
Tavily’s team detailed how they rebuilt their deep‑research AI agent to achieve state‑of‑the‑art performance. By designing a lightweight agent harness, leveraging evolving model tool‑calling abilities, and integrating an advanced search tool, they streamlined orchestration and context handling. Their context‑engineering approach...
OVHcloud on Hugging Face Inference Providers 🔥
OVHcloud is now an official Inference Provider on the Hugging Face Hub, enabling serverless AI model calls directly from model pages. The service offers pay‑per‑token pricing starting at €0.04 per million tokens and runs on secure European data centers for...
Open ASR Leaderboard: Trends and Insights with New Multilingual & Long-Form Tracks
The post introduces new multilingual and long‑form tracks on the Open ASR Leaderboard, highlighting recent trends across 60+ models. It finds that Conformer encoders paired with LLM decoders achieve the best English accuracy, while CTC/TDT decoders offer the highest speed,...
20x Faster TRL Fine-Tuning with RapidFire AI
The post announces the integration of RapidFire AI with Hugging Face TRL, enabling up to 20× faster fine‑tuning and post‑training experiments by running multiple configurations concurrently on a single GPU through adaptive chunk‑based scheduling. It highlights drop‑in TRL wrappers, real‑time...
Olmo 3 Launch Live: Join the Celebration
BOOM! Olmo 3 has just landed. Join us in this livestream to learn more about the release 🤗💗

Introducing AnyLanguageModel: One API for Local and Remote LLMs on Apple Platforms
The post announces AnyLanguageModel, a Swift package that lets Apple developers swap the Foundation Models import for a unified API supporting local (Core ML, MLX, llama.cpp, Ollama) and cloud (OpenAI, Anthropic, Gemini, Hugging Face) LLM providers with minimal code changes. By...

Apriel-H1: The Surprising Key to Distilling Efficient Reasoning Models
ServiceNow‑AI converted its 15 B attention‑based reasoning model into a hybrid Mamba architecture, achieving 2.1× throughput with negligible quality loss. The breakthrough came from distilling on the teacher’s high‑quality SFT reasoning traces rather than generic pretraining data, and using reverse KL...
The Pharmome Map: A Comprehensive Public Dataset for Drug-Target Interaction Modeling
EvE Bio has released the “pharmome map,” the largest public drug‑target interaction dataset to date. It measures 1,397 FDA‑approved small‑molecule drugs against nuclear receptors, GPCRs, and protein kinases using standardized high‑throughput assays. The dataset is openly accessible, refreshed bi‑monthly, and...
Easily Build and Share ROCm Kernels with Hugging Face
The post explains how to build, test, and share ROCm‑compatible GPU kernels—specifically a high‑performance FP8 GEMM kernel—using Hugging Face’s kernel‑builder and kernels libraries, with a focus on reproducible builds via a flake.nix environment. It walks through project layout, configuration files...

World's Largest AI Hackathon Launches with $20K Prizes
This might be the biggest AI hackathon ever: more than 6,300 registrants; runs for 2 weeks (Nov. 14-30); open to anyone, anywhere, virtually; $20,000 in cash prizes plus $3.5M+ in sponsor credits. Hosted by @Anthropic and @Gradio, along with 10 sponsors, join...

Join the AMD Open Robotics Hackathon
AMD, Hugging Face, and Data Monsters are launching the AMD Open Robotics Hackathon with in‑person events in Tokyo (December 5‑7, 2025) and Paris (December 12‑14, 2025). Teams of up to four participants will complete two missions—setting up the LeRobot development environment and creating a...
Building for an Open Future - Our New Partnership with Google Cloud
The post announces a deeper partnership between Hugging Face and Google Cloud aimed at making it easier for companies to build and customize AI using open models. It highlights integrated services such as Vertex AI Model Garden, GKE, Cloud Run,...
Essential Testing Tips for Your Gradio App
Best practices for testing your Gradio app 👇
Building a Healthcare Robot From Simulation to Deployment with NVIDIA Isaac
NVIDIA released Isaac for Healthcare v0.4 with an end‑to‑end SO‑ARM starter workflow that takes developers from mixed simulation and real‑world data collection through fine‑tuning and real‑time deployment of surgical assistant robots. The pipeline fine‑tunes GR00T N1.5 on predominantly synthetic data...
Voice Cloning with Consent
Hugging Face researchers propose a “voice consent gate” to allow voice cloning only after an explicit, context‑specific spoken consent, and provide a demo and modular code to demonstrate the approach. The system combines autogenerated consent sentences, automatic speech recognition to...
huggingface_hub v1.0: Five Years of Building the Foundation of Open Machine Learning
Hugging Face has released huggingface_hub v1.0 after five years of development, positioning the library as the mature Python backbone for its Hub and the broader ML ecosystem. The package now powers roughly 200,000 dependent libraries and provides access to more...

Streaming Datasets: 100x More Efficient
Hugging Face announced major backend improvements to its datasets and huggingface_hub libraries that make streaming multi‑TB training data far more efficient and reliable using the same load_dataset(streaming=True) API. Changes—including a persistent data files cache, optimized resolution logic, Parquet prefetching and...
LeRobot v0.4.0: Super Charging OSS Robotics Learning
Hugging Face released LeRobot v0.4.0, a major upgrade to its open‑source robotics stack that introduces LeRobotDataset v3.0 (chunked episodes and streaming to handle OXE‑scale datasets >400 GB), new VLA models (PI0.5 and GR00T N1.5), and a plugin system for easier...
Building the Open Agent Ecosystem Together: Introducing OpenEnv
Meta and Hugging Face launched the OpenEnv Hub, an open community repository and 0.1 RFC standard for “agentic environments” that package tools, APIs, credentials and execution context into secure, sandboxed interfaces for training and deployment. The Hub—seeded with initial environments...
Sentence Transformers Is Joining Hugging Face!
Hugging Face announced it is formally taking stewardship of the popular open‑source Sentence Transformers library — maintained by Tom Aarsen since 2023 — transitioning the project from TU Darmstadt’s UKP Lab to Hugging Face while retaining its Apache 2.0 license...
Hugging Face and VirusTotal Collaborate to Strengthen AI Security
Hugging Face has partnered with VirusTotal to continuously scan all 2.2M+ public model and dataset repositories on the Hugging Face Hub, checking file hashes against VirusTotal’s threat‑intelligence database to surface prior detections and related metadata. The integration retrieves status (clean...
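The hash-lookup idea can be sketched locally as below. This does not call VirusTotal or the Hugging Face Hub. The choice of SHA-256, the function names, and the `known_detections` dictionary (a stand-in for a threat-intelligence database) are all assumptions made for illustration.

```python
import hashlib


def sha256_of_bytes(data: bytes) -> str:
    """Hex digest that identifies a file by its contents."""
    return hashlib.sha256(data).hexdigest()


def lookup_status(data: bytes, known_detections: dict) -> str:
    """Return the recorded status for this content hash, or "unknown"
    when the hash has never been seen by the (hypothetical) database."""
    return known_detections.get(sha256_of_bytes(data), "unknown")


# Hypothetical database mapping content hashes to prior verdicts.
known_detections = {
    sha256_of_bytes(b"harmless weights"): "clean",
    sha256_of_bytes(b"suspicious payload"): "malicious",
}

status_known = lookup_status(b"suspicious payload", known_detections)
status_new = lookup_status(b"never seen before", known_detections)
```

Looking files up by hash means only the digest needs to be compared, not the file contents, so a check of this kind can surface prior detections without re-uploading the file.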