DevOps News - Page 21

News•Feb 26, 2026

How We Built a Distributed Work Scheduling System for Pulumi Cloud

Pulumi Cloud needed a unified scheduler to orchestrate deployments, Insights scans, and policy evaluations across both its own infrastructure and customer‑managed runners. The team built a database‑backed background activity system that treats each workflow as a typed, persistent activity with priority, routing, and retry metadata. A lease‑based optimistic concurrency model guarantees exactly‑once execution and automatic recovery from crashes or network failures. The design supports pull‑only agents, dependency DAGs, and a single handler interface for both hosted and remote execution modes, enabling rapid addition of new workflow types.

By Pulumi Blog
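
The lease-based optimistic concurrency model mentioned in the Pulumi summary above can be illustrated with a minimal sketch. This is not Pulumi's implementation: the table layout, the column names, and the functions `make_store`, `try_claim`, and `complete` are all hypothetical, and SQLite stands in for whatever database the real system uses. A worker claims an activity only if no unexpired lease exists, and every write is conditioned on the row version it read, so a worker whose lease has lapsed cannot overwrite a newer owner's result.

```python
import sqlite3


def make_store():
    # Hypothetical schema: one row per activity, with lease and version columns.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE activities ("
        " id INTEGER PRIMARY KEY,"
        " status TEXT NOT NULL DEFAULT 'pending',"
        " lease_owner TEXT,"
        " lease_expires REAL NOT NULL DEFAULT 0,"
        " version INTEGER NOT NULL DEFAULT 0)"
    )
    return db


def try_claim(db, activity_id, worker, now, lease_seconds=30.0):
    """Try to take the lease; return the new row version, or None on failure."""
    row = db.execute(
        "SELECT status, lease_expires, version FROM activities WHERE id = ?",
        (activity_id,),
    ).fetchone()
    if row is None:
        return None
    status, lease_expires, version = row
    if status == "done" or lease_expires > now:
        return None  # already finished, or another worker holds a live lease
    # Optimistic write: succeeds only if nobody changed the row since we read it.
    cur = db.execute(
        "UPDATE activities SET status = 'running', lease_owner = ?,"
        " lease_expires = ?, version = version + 1"
        " WHERE id = ? AND version = ?",
        (worker, now + lease_seconds, activity_id, version),
    )
    db.commit()
    return version + 1 if cur.rowcount == 1 else None


def complete(db, activity_id, worker, version):
    """Mark the activity done only if this worker still owns the claimed version."""
    cur = db.execute(
        "UPDATE activities SET status = 'done', lease_owner = NULL,"
        " version = version + 1"
        " WHERE id = ? AND lease_owner = ? AND version = ?",
        (activity_id, worker, version),
    )
    db.commit()
    return cur.rowcount == 1
```

In this sketch, a crashed worker simply stops renewing its lease. Once the lease expires, another worker can claim the activity, and the version check in `complete` rejects any late result from the original worker.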

News•Feb 26, 2026

Deep Dive: How Linkerd-Destination Works in the Linkerd Service Mesh

The article dissects linkerd-destination, the core component of Linkerd’s control plane that drives service discovery, policy distribution, and service‑profile enforcement. It explains how the service uses Kubernetes watches and EndpointSlices to translate cluster events into real‑time gRPC streams for proxies....

By Linkerd Blog

News•Feb 25, 2026

6,000 AWS Accounts, Three People, One Platform: Lessons Learned

ProGlove runs a SaaS platform on AWS using an account-per-tenant architecture, currently operating about 6,000 tenant accounts—half active—with over 120,000 service instances and a million Lambda functions. The approach gives each customer isolated compute, storage, and IAM boundaries, simplifying security,...

By AWS Architecture Blog

News•Feb 25, 2026

Fix Cypress CI Failures Caused by No Spec Files Found

Cypress 15.11.0 introduces the --pass-with-no-tests CLI flag, allowing test runs that find zero spec files to exit with a zero status code instead of failing the CI pipeline. The failure previously occurred when configuration patterns like specPattern or --spec matched no files, often due to mis‑configured...

By Cypress – Blog

News•Feb 25, 2026

Percona Operator for MongoDB 1.22.0: Automatic Storage Resizing, Vault Integration, Service Mesh Support, and More!

Percona released Operator for MongoDB version 1.22.0, adding automatic Persistent Volume Claim resizing, HashiCorp Vault integration for system user credentials, and native service‑mesh compatibility via the appProtocol field. The update also expands backup and restore capabilities, including replica‑set name remapping,...

By Percona Blog

News•Feb 25, 2026

Rafay Joins VAST Cosmos to Enable Governed GPU-Powered AI Services

Rafay has joined the VAST Cosmos Community as a Technology Partner, aligning its AI‑native cloud control plane with VAST Data’s AI Operating System. The collaboration integrates Rafay’s orchestration platform with VAST’s governed storage services, creating a unified, multi‑tenant AI service...

By Rafay – Blog

News•Feb 25, 2026

Maintaining Compliance when Adopting AI in Regulated Industries

Regulated firms can integrate AI without sacrificing compliance by leveraging automated testing. Continuous validation mitigates risks from non‑deterministic model behavior, frequent updates, and limited explainability. The approach preserves audit‑readiness, traceability, and documented evidence across frameworks such as SOX, HIPAA, and...

By SmartBear – Blog

News•Feb 25, 2026

Cagent: Docker's Newest Low-Code Agentic Platform

Docker unveiled Cagent, an open‑source, low‑code framework that lets developers launch AI agents using a single YAML file instead of extensive code. The platform integrates the Model Context Protocol (MCP) and Docker Model Runner to support multiple LLM providers and...

By DZone – DevOps & CI/CD

News•Feb 25, 2026

CloudCasa Expands Red Hat OpenShift Data Protection Across Edge and Hybrid Cloud

CloudCasa has upgraded its backup and recovery platform to better serve Red Hat OpenShift deployments across core, edge, and hybrid cloud environments. The update adds native SMB protocol support as a backup target, letting customers use existing SMB storage or operator‑deployed...

By Help Net Security

News•Feb 25, 2026

Crossplane & AI: The Case for API-First Infrastructure

AI‑assisted development has moved the primary bottleneck from writing code to the myriad tasks that follow a git push, such as provisioning, policy enforcement, and drift remediation. Most existing platforms keep the desired state in Git while the actual state...

By Crossplane Blog

News•Feb 25, 2026

Lightrun Debuts Real-Time AI Site Reliability Engineer for Autonomous Software Remediation

Lightrun Inc. unveiled an AI‑powered Site Reliability Engineer that can generate missing runtime evidence on‑the‑fly, eliminating the need for redeployments. The tool leverages the company’s patented Sandbox and Runtime Context engine to capture live, line‑level execution data, prove root causes,...

By SiliconANGLE

News•Feb 25, 2026

Sauce Labs Launches Industry’s First Programmable Mobile Device Cloud for the AI Era

Sauce Labs announced the Real Device Access API, the first programmable mobile device cloud designed for the AI era. The API lets developers control real Android and iOS devices through HTTP, issuing ADB or xcrun commands, streaming video, and accessing...

By SD Times

News•Feb 25, 2026

What AX Can Do to Deliver Cohesion and Uniformity to AI Agents

The article introduces the concept of Agent Experience (AX), a discipline for preparing enterprise systems so AI agents can discover, invoke, and manage tools reliably. It stresses that agents require precise, structured documentation, robust API specifications, and context‑engineering such as...

By CIO.com

News•Feb 25, 2026

Rootly | The Unofficial KubeCon EU '26 SRE Track

Rootly has published an unofficial KubeCon Europe 2026 SRE track, hand‑picking six sessions that focus on reliability, observability, incidents, and chaos engineering. The guide highlights high‑impact talks such as Airbnb’s zero‑downtime migration of 1,000 services, AI‑enabled control planes for alert fatigue,...

By Rootly – Blog

News•Feb 25, 2026

New GitLab Metrics and Registry Features Help Reduce CI/CD Bottlenecks

GitLab announced two beta features aimed at easing CI/CD bottlenecks: job‑level performance metrics and a Container Virtual Registry. The job metrics panel, available to Premium and Ultimate customers, displays median and 95th‑percentile durations, failure rates, and sortable tables directly in...

By GitLab Blog

News•Feb 25, 2026

MCP Security: The Current Situation

The Model Context Protocol (MCP) standardizes LLM integration with external tools, but recent flaws expose enterprises to serious threats. A prompt‑injection bug in GitHub's MCP client leaked private repository data, while Anthropic's Filesystem server suffered CVE‑2025‑53109 and CVE‑2025‑53110 sandbox‑escape vulnerabilities....

DevOps News and Headlines

Bi-Directional Sync for ServiceNow and Azure DevOps

How to Integrate an AI Chatbot Into Your Application: A Practical Engineering Guide

Integration Reliability for AI Systems: A Framework for Detecting and Preventing Interface Mismatch at Scale

Most Platform Teams Build Products, but They Don’t Know It

Terraform Enterprise 1.2 Upgrades Workflows, Visibility, and Brownfield Migration

Building Event-Driven Data Pipelines in GCP

Kilo Launches KiloClaw, Allowing Anyone to Deploy Hosted OpenClaw Agents Into Production in 60 Seconds

TASKING Integrates Modern AI Technology to Enable Robust Software Verification and Validation (V&V)

How to Set Up Credentials for Windows to Use DigiCert KeyLocker & SMCTL?

7 Ways to Tame Multicloud Chaos with Generative AI

From Days to Minutes: How Omnisend Embedded AI Into the Data Lifecycle

Composio Open Sources Agent Orchestrator to Help AI Developers Build Scalable Multi-Agent Workflows Beyond the Traditional ReAct Loops

Global Scale, Local Presence: Fastly Object Storage Expands to New Regions

Enhancing Security and Transparency: Introducing Private Notifications for Fastly Maintenance and Incidents

Predictable AI: Announcing the January and February Validated Model Batches

Observability Vs. Monitoring: What's the Difference?

Red Hat AI Enterprise: Bridging the Gap From Experimentation to Production Scale

Migrate Your VMs Faster with the Migration Toolkit for Virtualization 2.11

Agentic SDLC: GitLab and TCS Deliver Intelligent Orchestration Across the Enterprise

AI Infrastructure Cost Optimization for Scaling Teams

Researchers Baked 3x Inference Speedups Directly Into LLM Weights — without Speculative Decoding

Getting Started with Gemini and CircleCI

BMC Expands Collaboration with AWS to Accelerate Intelligent Automation

The Rise of Infrastructure as Code in Live Production: Are You Ready?

Kubernetes as AI’s Operating System: 1.35 Release Signals

RemotiveLabs Joins HERE and AWS SDV Accelerator Programme

A Coding Guide to Instrumenting, Tracing, and Evaluating LLM Applications Using TruLens and OpenAI Models

AI & Data Security: Insights From IBM’s Chief Architect

Metrics that Matter: How to Prove the Business Value of DevEx

Move Harness Projects Between Orgs Without Starting Over

Cloudflare’s Markdown for Agents Automatically Makes Websites Agent-Ready

Anthropic Unveils New AI Feature to Scan Codebases, Suggest Patches Within Claude Code

Guide to the Top 20 QA Metrics that Matter

Hubert 'Depesz' Lubaczewski: Per-Worker, and Global, IO Bandwidth in Explain Plans
