The partnership expands low‑latency, sovereign AI infrastructure in Europe, giving developers a competitive alternative to US‑centric cloud providers. It also strengthens Hugging Face’s ecosystem by diversifying inference options and pricing models.
The addition of OVHcloud to Hugging Face’s Inference Provider roster marks a strategic shift toward a more geographically diversified AI inference market. European enterprises, long concerned about data residency, now have a native, serverless option that keeps traffic within EU borders while still accessing cutting‑edge models like GPT‑OSS, Qwen3, and Llama. By embedding OVHcloud directly into the Hub’s UI and SDKs, Hugging Face reduces friction for developers seeking to experiment or deploy at scale, reinforcing its role as the central marketplace for open‑weight models.
Technically, OVHcloud AI Endpoints deliver a compelling blend of performance and flexibility. Sub‑200 ms first‑token latency meets the demands of real‑time chatbots and agentic workflows, while structured outputs, function calling, and multimodal capabilities broaden the range of supported use cases. The pay‑per‑token model, starting at €0.04 per million tokens, offers transparent cost control, and the dual billing pathways (direct provider keys or routing through Hugging Face) let users choose the arrangement that suits them. Integration with both Python’s `huggingface_hub` and JavaScript’s `@huggingface/inference` libraries ensures seamless adoption across development stacks.
From a business perspective, this collaboration strengthens the competitive landscape against dominant US cloud players. European data sovereignty, combined with low latency and competitive pricing, positions OVHcloud as a viable alternative for regulated industries such as finance, healthcare, and the public sector. The move also hints at future revenue‑sharing models, potentially unlocking new monetization streams for both Hugging Face and its provider partners. For developers, the added inference option expands the toolkit for rapid prototyping, while Hugging Face PRO users benefit from monthly inference credits that can be applied across providers, further lowering barriers to entry.
Authors: Gilles Closset, Fabien Ric, Elias Tourneux · November 24, 2025
We’re thrilled to share that OVHcloud is now a supported Inference Provider on the Hugging Face Hub! OVHcloud joins our growing ecosystem, enhancing the breadth and capabilities of serverless inference directly on the Hub’s model pages. Inference Providers are also seamlessly integrated into our client SDKs (for both JS and Python), making it super easy to use a wide variety of models with your preferred providers.
This launch makes it easier than ever to access popular open‑weight models like gpt‑oss, Qwen3, DeepSeek R1, and Llama — right from Hugging Face. You can browse OVHcloud’s org on the Hub at https://huggingface.co/ovhcloud and try trending supported models at https://huggingface.co/models?inference_provider=ovhcloud&sort=trending.
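If you prefer to discover these models programmatically, the same filter is available through the Hub API. A minimal sketch, assuming a recent `huggingface_hub` release that supports the `inference_provider` argument on `list_models`:

```python
from huggingface_hub import list_models

# List models that OVHcloud can serve (same filter as the Hub URL above).
for model in list_models(inference_provider="ovhcloud", limit=10):
    print(model.id)
```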
OVHcloud AI Endpoints is a fully managed, serverless service that provides access to frontier AI models from leading research labs via simple API calls. The service offers competitive pay‑per‑token pricing starting at €0.04 per million tokens.
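As a back‑of‑the‑envelope illustration of what pay‑per‑token billing means in practice (using the advertised €0.04 starting rate; actual per‑model rates vary, see the catalog linked below):

```python
# Illustrative cost estimate for token-based billing.
PRICE_PER_MILLION_TOKENS_EUR = 0.04  # starting rate; actual rates vary by model

def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimated cost in EUR for a single request."""
    return (prompt_tokens + completion_tokens) / 1_000_000 * PRICE_PER_MILLION_TOKENS_EUR

# e.g. a 1,500-token prompt with a 500-token completion:
print(f"{estimate_cost(1_500, 500):.6f} EUR")  # 0.000080 EUR
```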
The service runs on secure infrastructure located in European data centers, ensuring data sovereignty and low latency for European users. The platform supports advanced features including structured outputs, function calling, and multimodal capabilities for both text and image processing.
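To give a flavor of function calling, here is a minimal sketch using the OpenAI‑compatible chat API exposed by `huggingface_hub`; the `get_weather` tool schema is purely illustrative:

```python
import os
from huggingface_hub import InferenceClient

client = InferenceClient(api_key=os.environ["HF_TOKEN"])

# A hypothetical tool the model may decide to call.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

completion = client.chat.completions.create(
    model="openai/gpt-oss-120b:ovhcloud",
    messages=[{"role": "user", "content": "What's the weather in Paris right now?"}],
    tools=tools,
    tool_choice="auto",
)

# If the model opts to call the tool, the arguments arrive here.
print(completion.choices[0].message.tool_calls)
```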
Built for production use, OVHcloud’s inference infrastructure delivers sub‑200 ms response times for first tokens, making it ideal for interactive applications and agentic workflows. The service supports both text generation and embedding models. You can learn more about OVHcloud’s platform and infrastructure at https://www.ovhcloud.com/en/public-cloud/ai-endpoints/catalog/.
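For interactive applications, you can stream tokens and measure first‑token latency yourself. A minimal sketch; note that measured times include network and routing overhead on top of the provider’s inference latency:

```python
import os
import time
from huggingface_hub import InferenceClient

client = InferenceClient(api_key=os.environ["HF_TOKEN"])

start = time.perf_counter()
stream = client.chat.completions.create(
    model="openai/gpt-oss-120b:ovhcloud",
    messages=[{"role": "user", "content": "Write a haiku about Paris."}],
    stream=True,
)

first_token_time = None
for chunk in stream:
    if first_token_time is None:
        first_token_time = time.perf_counter() - start
        print(f"First token after {first_token_time * 1000:.0f} ms")
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```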
Read more about how to use OVHcloud as an Inference Provider in its dedicated documentation page.
See the list of supported models here.
In your user account settings, you are able to:
Set your own API keys for the providers you’ve signed up with. If no custom key is set, your requests will be routed through HF.
Order providers by preference. This applies to the widget and code snippets in the model pages.
As mentioned, there are two modes when calling Inference Providers (see the code sketch after this list):
Custom key – calls go directly to the inference provider, using your own API key for that provider.
Routed by HF – you don’t need a token from the provider, and the charges are applied directly to your HF account rather than the provider’s account.
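In code, the two modes differ only in which key you pass to the client. A minimal sketch, where the `OVHCLOUD_API_KEY` environment variable name is illustrative:

```python
import os
from huggingface_hub import InferenceClient

# Routed by HF: authenticate with your HF token; usage is billed to your HF account.
routed_client = InferenceClient(
    provider="ovhcloud",
    api_key=os.environ["HF_TOKEN"],
)

# Custom key: pass your own OVHcloud key; calls go directly to the provider.
direct_client = InferenceClient(
    provider="ovhcloud",
    api_key=os.environ["OVHCLOUD_API_KEY"],  # illustrative variable name
)
```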
Model pages showcase third‑party inference providers (the ones that are compatible with the current model, sorted by user preference).
From Python, using `huggingface_hub`:
```python
import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    api_key=os.environ["HF_TOKEN"],
)

# The ":ovhcloud" suffix selects OVHcloud as the inference provider for this call.
completion = client.chat.completions.create(
    model="openai/gpt-oss-120b:ovhcloud",
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ],
)

print(completion.choices[0].message)
```
From JS, using `@huggingface/inference`:
```js
import { InferenceClient } from "@huggingface/inference";

const client = new InferenceClient(process.env.HF_TOKEN);

// The ":ovhcloud" suffix selects OVHcloud as the inference provider for this call.
const chatCompletion = await client.chatCompletion({
  model: "openai/gpt-oss-120b:ovhcloud",
  messages: [
    {
      role: "user",
      content: "What is the capital of France?",
    },
  ],
});

console.log(chatCompletion.choices[0].message);
```
Here is how billing works:
For direct requests (using the key from an inference provider), you are billed by the corresponding provider. For instance, if you use an OVHcloud API key you’re billed on your OVHcloud account.
For routed requests (authenticating via the Hugging Face Hub), you’ll only pay the standard provider API rates. There’s no additional markup from us; we just pass through the provider costs directly. (In the future, we may establish revenue‑sharing agreements with our provider partners.)
Important Note ‼️ PRO users get $2 worth of Inference credits every month. You can use them across providers. 🔥
Subscribe to the Hugging Face PRO plan to get access to Inference credits, ZeroGPU, Spaces Dev Mode, 20× higher limits, and more.
We also provide free inference with a small quota for our signed‑in free users, but please upgrade to PRO if you can!
We would love to get your feedback! Share your thoughts and/or comments here: https://huggingface.co/spaces/huggingface/HuggingDiscussions/discussions/49