Hugging Face

Company-Unified Profile

17 followers

The AI community building the future. https://t.co/VkRPD0Vclr

Blog•Mar 13, 2026

Beyond Semantic Similarity: Introducing NVIDIA NeMo Retriever’s Generalizable Agentic Retrieval Pipeline

NVIDIA’s NeMo Retriever team unveiled an agentic retrieval pipeline that topped the ViDoRe v3 leaderboard and placed second on the reasoning‑heavy BRIGHT benchmark. The system replaces static semantic‑similarity search with a ReAct‑style loop in which an LLM agent iteratively plans, retrieves, and refines queries. Engineering advances, notably an in‑process singleton retriever, cut latency and GPU overhead, making the approach viable at leaderboard scale. Ablation studies show the pipeline is robust across models and embeddings, while highlighting trade‑offs in speed and cost.

By Hugging Face
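
The plan, retrieve, refine loop described in the summary above can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the corpus, the word-overlap `retrieve` function, and `naive_plan` (a stand-in for an LLM call) are hypothetical and are not taken from the NeMo Retriever pipeline.

```python
from typing import Callable, Optional

# Toy corpus; the contents are made up for illustration.
CORPUS = {
    "d1": "the retriever returns passages ranked by embedding similarity",
    "d2": "agents rewrite queries after reading retrieved passages",
    "d3": "rewritten queries can surface passages that need multi-step reasoning",
}

def retrieve(query: str, k: int) -> list[str]:
    """Rank documents by word overlap with the query (stand-in for a real retriever)."""
    q = set(query.lower().split())
    ranked = sorted(CORPUS, key=lambda d: (-len(q & set(CORPUS[d].split())), d))
    return ranked[:k]

def naive_plan(question: str, evidence: list[str]) -> Optional[str]:
    """Hypothetical stand-in for an LLM: refine the query with the latest evidence."""
    return question + " " + evidence[-1]

def agentic_search(question: str,
                   plan: Callable[[str, list[str]], Optional[str]],
                   max_steps: int = 3) -> list[str]:
    """Iteratively retrieve one new document, then let the planner refine the query."""
    seen: list[str] = []
    query = question
    for _ in range(max_steps):
        # Take the best-ranked document that has not been read yet.
        hits = [d for d in retrieve(query, k=len(CORPUS)) if d not in seen][:1]
        if not hits:
            break
        seen.extend(hits)
        refined = plan(question, [CORPUS[d] for d in seen])
        if refined is None:  # planner decides it has enough evidence
            break
        query = refined
    return seen

found = agentic_search("how do agents rewrite queries", naive_plan)
print(found)
```

In this sketch the refined query pulls in a document that shares only one word with the original question, which is the kind of behaviour a single static similarity lookup would rank lower.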

Social•Mar 12, 2026

Lead Paris AI Community with Hugging Face Builders

Hugging Face Builders is a global community program that puts local leaders at the center of the open-source AI movement 🤗 If you're passionate about open AI and love bringing people together, this is your invitation to lead ✉️ Apply to build the Paris chapter today ➡️ https://t.co/ONVBZdxRdc

By Hugging Face

Blog•Mar 11, 2026

Code Concepts: A Large-Scale Synthetic Dataset Generated From Programming Concept Seeds

Researchers introduced a concept‑driven workflow that produces synthetic code data aligned with specific programming skills. Using a taxonomy of 91 Python concepts, they generated roughly 15 million Python problems and incorporated 10 billion tokens into the final 100 billion‑token pretraining of Nemotron‑Nano‑v3. The...

By Hugging Face

Social•Mar 10, 2026

Storage Buckets: Fast, Cheap, Mutable AI Data Storage

🪣 We just shipped Storage Buckets: S3-like mutable storage, cheaper & faster. Git falls short for everything on the high-throughput side of AI (checkpoints, processed data, agent traces, logs, etc.). Buckets fixes that: fast writes, overwrites, directory sync 💨 All powered by...

By Hugging Face

Blog•Mar 9, 2026

Granite 4.0 1B Speech: Compact, Multilingual, and Built for the Edge

IBM released Granite 4.0 1B Speech, a compact multilingual speech‑language model aimed at resource‑constrained enterprise devices. The 1‑billion‑parameter model halves the size of its predecessor while delivering higher English transcription accuracy and faster inference via speculative decoding. It adds Japanese ASR and keyword‑list...

By Hugging Face

Blog•Mar 9, 2026

Ulysses Sequence Parallelism: Training with Million-Token Contexts

Ulysses Sequence Parallelism, part of Snowflake AI's Arctic Long Sequence Training protocol, distributes transformer attention across multiple GPUs by sharding both the input sequence and attention heads. The method replaces the quadratic memory bottleneck with two all‑to‑all communications per layer,...

By Hugging Face
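
The data movement behind the "two all‑to‑all communications per layer" in the summary above can be sketched on plain Python lists. This is a simulation under stated assumptions, not the Arctic Long Sequence Training code: the helper names (`all_to_all`, `to_head_shards`, `to_seq_shards`) are hypothetical, ranks are simulated as list entries, and the attention computation itself is omitted.

```python
def all_to_all(send):
    """send[src][dst] is what rank src sends to rank dst; returns recv[dst][src]."""
    n = len(send)
    return [[send[src][dst] for src in range(n)] for dst in range(n)]

def to_head_shards(seq_shards, num_heads):
    """First all-to-all: sequence-sharded (all heads) -> head-sharded (full sequence)."""
    world = len(seq_shards)
    per = num_heads // world
    send = [[[tok[dst * per:(dst + 1) * per] for tok in shard]
             for dst in range(world)] for shard in seq_shards]
    recv = all_to_all(send)
    # Each rank concatenates the token chunks it received, in rank order.
    return [[tok for chunk in recv[dst] for tok in chunk] for dst in range(world)]

def to_seq_shards(head_shards, seq_len):
    """Second all-to-all: head-sharded (full sequence) -> sequence-sharded (all heads)."""
    world = len(head_shards)
    per = seq_len // world
    send = [[shard[dst * per:(dst + 1) * per] for dst in range(world)]
            for shard in head_shards]
    recv = all_to_all(send)
    # Stitch the per-rank head slices back together for every local token.
    return [[sum((recv[dst][src][i] for src in range(world)), [])
             for i in range(per)] for dst in range(world)]

SEQ, HEADS, WORLD = 4, 4, 2
# x[t][h] is a label for (token t, head h); real tensors would hold vectors here.
x = [[(t, h) for h in range(HEADS)] for t in range(SEQ)]
seq_shards = [x[r * (SEQ // WORLD):(r + 1) * (SEQ // WORLD)] for r in range(WORLD)]

head_shards = to_head_shards(seq_shards, HEADS)   # attention would run here, per head
restored = to_seq_shards(head_shards, SEQ)
print(head_shards[0])
```

After the first exchange each simulated rank holds every token but only its own subset of heads, so attention for those heads can run locally; the second exchange restores the original sequence sharding.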

Blog•Mar 5, 2026

Introducing Modular Diffusers - Composable Building Blocks for Diffusion Pipelines

Modular Diffusers launches a composable framework that breaks diffusion pipelines into interchangeable blocks such as text encoding, denoising, and decoding. Developers can assemble, replace, or run individual blocks, enabling lazy loading, memory‑efficient inference, and easy experimentation with models like FLUX.2‑Klein 4B....

By Hugging Face

Blog•Feb 26, 2026

Mixture of Experts (MoEs) in Transformers

Mixture‑of‑Experts (MoE) Transformers replace dense feed‑forward layers with multiple lightweight experts, activating only a few per token to keep inference cost low while preserving the capacity of much larger models. The Hugging Face transformers library introduced a WeightConverter that merges and splits...

By Hugging Face
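
The idea of "activating only a few experts per token" from the summary above can be sketched with top‑k routing. This is a minimal illustration under assumptions: the function names are hypothetical, the experts are toy scalar functions, and renormalising the softmax over only the selected experts is one common convention rather than something the summary specifies. It does not use the transformers library or its WeightConverter.

```python
import math

def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and softmax-normalise their weights."""
    top = sorted(range(len(router_logits)), key=lambda i: -router_logits[i])[:k]
    m = max(router_logits[i] for i in top)            # subtract max for stability
    exps = [math.exp(router_logits[i] - m) for i in top]
    z = sum(exps)
    return [(i, e / z) for i, e in zip(top, exps)]

def moe_forward(x, router_logits, experts, k=2):
    """Run only the selected experts and mix their outputs by routing weight."""
    routes = top_k_route(router_logits, k)
    y = sum(w * experts[i](x) for i, w in routes)
    return y, [i for i, _ in routes]

# Four toy "experts"; a real layer would use small feed-forward networks.
experts = [lambda v: 1.0 * v, lambda v: 2.0 * v, lambda v: 3.0 * v, lambda v: 4.0 * v]

y, used = moe_forward(1.0, router_logits=[0.1, 2.0, -1.0, 2.0], experts=experts, k=2)
print(y, used)
```

Only two of the four experts are evaluated for this input, which is where the inference savings relative to a dense layer of the same total size come from.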

Blog•Feb 24, 2026

Deploying Open Source Vision Language Models (VLM) on Jetson

NVIDIA’s Cosmos Reason 2B vision‑language model can now be deployed on the Jetson family using the vLLM inference engine. The tutorial walks through installing the NGC CLI, pulling FP8‑quantized weights, and running device‑specific Docker containers for AGX Thor, AGX Orin, and Orin Nano Super. After...

By Hugging Face

Blog•Feb 20, 2026

Train AI Models with Unsloth and Hugging Face Jobs for FREE

The blog shows how Unsloth paired with Hugging Face Jobs lets developers fine‑tune the 1.2 B‑parameter LFM2.5‑Instruct model in half the usual time while using roughly 60 % less VRAM. By invoking a single `hf jobs` command, users can launch a managed GPU job,...

By Hugging Face

Blog•Feb 20, 2026

GGML and llama.cpp Join HF to Ensure the Long-Term Progress of Local AI

Hugging Face announced that GGML and its llama.cpp project are joining the company. Georgi Gerganov and his team will continue full‑time maintenance, retaining autonomy while receiving HF resources. Integration aims to streamline model deployment via the transformers library and improve...

By Hugging Face

Blog•Feb 13, 2026

Custom Kernels for All From Codex and Claude

Hugging Face released a 550‑token CUDA‑kernel agent skill that equips coding agents like Claude and Codex with architecture‑aware optimization knowledge. The skill was used to generate production‑ready RMSNorm, RoPE, GEGLU and AdaLN kernels for a diffusers video pipeline and a...

By Hugging Face

Blog•Feb 12, 2026

OpenEnv in Practice: Evaluating Tool-Using Agents in Real-World Environments

OpenEnv, an open‑source framework from Meta and Hugging Face, lets AI agents interact with real‑world tools through a standardized gym‑style API. Turing contributed a production‑grade Calendar Gym that mimics authentic calendar systems with access controls, partial visibility, and multi‑step workflows....

By Hugging Face

Social•Feb 6, 2026

New Features: Community Eval Integration & Enhanced Data Tools

We have been shipping 🛳️❤️ 📦 Community Evals & Benchmark Datasets: Benchmark datasets host benchmark leaderboards. You can now contribute eval results by opening a PR to a model repository, and all PRs are fed to benchmark datasets 📦 Chat with datasets: agents...

By Hugging Face

Blog•Feb 5, 2026

Introducing SyGra Studio

SyGra 2.0.0 launches Studio, a visual IDE for building synthetic data generation workflows. The canvas lets users configure models, data sources, and prompts via drag‑and‑drop, automatically generating the underlying YAML/JSON graph. Studio provides live execution monitoring, token‑cost tracking, and inline...

By Hugging Face

Technology Pulse