
Introducing Fireworks AI on Microsoft Foundry: Bringing High‑Performance, Low‑Latency Open‑Model Inference to Azure
Why It Matters
Enterprises gain a scalable, secure way to run open‑source LLMs at production speed, reducing lock‑in and infrastructure complexity. This accelerates AI adoption across industries while preserving compliance and cost control.
Key Takeaways
- Fireworks AI's inference engine, sustaining roughly 180K requests per second, now runs on Azure
- Foundry unifies model evaluation, deployment, and governance
- Supports bring‑your‑own‑weights for custom models
- Serverless and provisioned throughput unit (PTU) pricing give flexible cost options
- Open models now run with enterprise‑grade security
Pulse Analysis
Open‑source large language models are becoming the backbone of enterprise AI strategies, offering flexibility, cost efficiency, and the ability to fine‑tune models for specific workloads. However, many organizations struggle with fragmented tooling, inconsistent governance, and the operational overhead of building custom serving stacks. Microsoft Foundry addresses these pain points by providing a single control plane that centralizes model cataloging, evaluation, deployment, and observability, enabling teams to move from experimentation to production without reinventing infrastructure.
The integration of Fireworks AI into Foundry brings industry‑leading inference performance to Azure’s cloud ecosystem. Fireworks’ engine, already handling more than 13 trillion tokens daily and sustaining roughly 180 thousand requests per second, now powers open models such as DeepSeek V3.2, OpenAI gpt‑oss‑120b, Kimi K2.5, and MiniMax M2.5. Developers can choose serverless, pay‑per‑token execution for rapid prototyping or provisioned throughput units for predictable, high‑volume workloads, all while leveraging Azure’s security, compliance, and monitoring capabilities.
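As a rough illustration of the serverless, pay‑per‑token path, a deployed open model can be called through Azure's model inference API. The sketch below uses the azure-ai-inference Python SDK; the endpoint URL, API key, and deployment name (DeepSeek‑V3.2 is borrowed from the catalog above) are placeholders, and the exact deployment names exposed through Foundry may differ.

```python
# Minimal sketch: calling an open model deployed serverless through
# Microsoft Foundry, via the azure-ai-inference SDK.
# Endpoint, key, and model name are placeholders -- substitute the
# values from your own Foundry project.
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint="https://<your-resource>.services.ai.azure.com/models",  # placeholder
    credential=AzureKeyCredential("<your-api-key>"),                  # placeholder
)

# Pay-per-token, serverless call; the model name must match your deployment.
response = client.complete(
    model="DeepSeek-V3.2",  # hypothetical deployment name
    messages=[
        SystemMessage(content="You are a concise assistant."),
        UserMessage(content="Summarize the benefits of serverless inference."),
    ],
    max_tokens=256,
)

print(response.choices[0].message.content)
```

In Azure's model, the serverless-versus-PTU decision is typically made at deployment time, so application code along these lines would stay the same when a workload graduates from prototyping to provisioned throughput.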
For businesses, this partnership translates into faster time‑to‑value and reduced operational risk. By consolidating model lifecycle management, from day‑zero access and custom weight uploads to governance and scaling, enterprises can focus on building differentiated AI applications rather than managing disparate infrastructure. The combined offering positions Azure as a premier destination for open‑model AI, encouraging broader adoption and fostering innovation across sectors that demand both performance and enterprise‑grade control.