How We Built a Distributed Work Scheduling System for Pulumi Cloud

Pulumi Blog • February 26, 2026

Why It Matters

The system delivers reliable, scalable workflow execution without external queue dependencies, reducing operational overhead for Pulumi and its self‑hosted customers. It ensures consistent performance and observability across heterogeneous environments, a critical advantage in multi‑cloud management.

Key Takeaways

•Built on the existing relational database, avoiding external queue dependencies
•Lease-based optimistic concurrency ensures only one worker processes an activity at a time
•Supports both hosted and customer‑managed runners via pull‑only agents
•Handles retries, priorities, and dependency DAGs natively
•Extensible design adds new workflow types without extra plumbing

Pulse Analysis

Pulumi’s shift from a single‑purpose deployment queue to a generic background activity platform reflects a broader industry trend toward unified orchestration layers. By anchoring the scheduler in the existing relational database, Pulumi eliminates the need for separate message‑queue services, a boon for self‑hosted deployments that often operate in air‑gapped or firewalled environments. This architectural choice also simplifies compliance and reduces the operational surface area, allowing teams to focus on core product features rather than maintaining ancillary infrastructure.

At the heart of the system lies a lease‑based state machine that coordinates work across distributed agents without a central coordinator. Agents poll for work, atomically acquire a lease token, and periodically renew it, ensuring that only one worker processes a given activity at any time. If a lease expires because of a crash or network loss, the activity automatically transitions to a restarting state, which makes it available for another agent to pick up. The model also embeds priority handling, rate‑limit awareness, and dependency tracking, enabling complex DAG‑style workflows such as Insight scans that trigger downstream policy evaluations.
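The lease cycle described above can be illustrated with a minimal sketch. It is not Pulumi's implementation: it uses SQLite as a stand-in for the relational database, and the table layout, column names, status values, and function names (`claim`, `renew`, `complete`) are all hypothetical. The idea it shows is that a conditional `UPDATE` acts as the atomic lease acquisition, an expired lease moves the activity to a restarting state, and a blocked dependency keeps a higher-priority activity from being claimed.

```python
import sqlite3
import uuid

# Hypothetical schema: one row per activity, with a single optional dependency.
SCHEMA = """
CREATE TABLE activities (
    id            INTEGER PRIMARY KEY,
    priority      INTEGER NOT NULL DEFAULT 0,
    status        TEXT NOT NULL DEFAULT 'ready',
    lease_token   TEXT,
    lease_expires REAL,
    depends_on    INTEGER REFERENCES activities(id)
);
"""

def open_db():
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db

def add_activity(db, priority=0, depends_on=None):
    cur = db.execute(
        "INSERT INTO activities (priority, depends_on) VALUES (?, ?)",
        (priority, depends_on),
    )
    db.commit()
    return cur.lastrowid

def expire_leases(db, now):
    # A lease that was not renewed in time puts the activity into 'restarting'.
    db.execute(
        "UPDATE activities SET status = 'restarting', lease_token = NULL, "
        "lease_expires = NULL WHERE status = 'running' AND lease_expires < ?",
        (now,),
    )
    db.commit()

def claim(db, now, ttl=30.0):
    """Lease the highest-priority runnable activity; return (id, token) or None."""
    expire_leases(db, now)
    row = db.execute(
        "SELECT a.id FROM activities a "
        "LEFT JOIN activities d ON d.id = a.depends_on "
        "WHERE a.status IN ('ready', 'restarting') "
        "AND (a.depends_on IS NULL OR d.status = 'done') "
        "ORDER BY a.priority DESC, a.id ASC LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    token = uuid.uuid4().hex
    # The status check in the WHERE clause makes the claim optimistic:
    # if another agent got there first, no row is updated.
    cur = db.execute(
        "UPDATE activities SET status = 'running', lease_token = ?, "
        "lease_expires = ? WHERE id = ? AND status IN ('ready', 'restarting')",
        (token, now + ttl, row[0]),
    )
    db.commit()
    return (row[0], token) if cur.rowcount == 1 else None

def renew(db, activity_id, token, now, ttl=30.0):
    # Only the current, unexpired lease holder can extend the lease.
    cur = db.execute(
        "UPDATE activities SET lease_expires = ? WHERE id = ? "
        "AND lease_token = ? AND status = 'running' AND lease_expires >= ?",
        (now + ttl, activity_id, token, now),
    )
    db.commit()
    return cur.rowcount == 1

def complete(db, activity_id, token, now):
    # A worker holding a stale token cannot mark the activity done.
    cur = db.execute(
        "UPDATE activities SET status = 'done', lease_token = NULL, "
        "lease_expires = NULL WHERE id = ? AND lease_token = ? "
        "AND status = 'running' AND lease_expires >= ?",
        (activity_id, token, now),
    )
    db.commit()
    return cur.rowcount == 1
```

In this sketch, time is passed in explicitly as `now` so the behavior is easy to follow: a worker that stops calling `renew` loses its lease once `now` passes `lease_expires`, and the next `claim` by any agent picks the activity up again with a fresh token. A real scheduler would also need retry limits and rate-limit checks, which are left out here.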

For businesses, this design translates into faster feature rollout and lower total cost of ownership. New workflow types inherit scheduling, retry, and observability capabilities out of the box, accelerating time‑to‑value for Pulumi’s customers. The two execution paths are symmetric (direct for Pulumi‑hosted workers, REST‑based for customer‑managed runners), so behavior and monitoring stay consistent regardless of where the code runs. As cloud environments grow more heterogeneous, Pulumi’s extensible, lease‑driven scheduler positions it as a reliable backbone for multi‑cloud automation and compliance at scale.
