The Linux Foundation

Creator

Open source/cloud-native ecosystem, compliance initiatives

Video•Mar 17, 2026

LF Live Webinar: Context Engineering for Self-Healing AI SRE

The LF Live webinar featured Assaf Saf Salvich, AI Engineering Group Manager at Commodore, who outlined the company's path toward self-healing, AI-driven Site Reliability Engineering (SRE). Commodore has collected close to two million real-world Kubernetes incidents. It first tried to handle them with deterministic runbooks, then found that the approach could not scale.

The initial taxonomy of six broad incident categories quickly proved insufficient: it grew to dozens of sub-categories, each demanding its own remediation logic. In response, Commodore introduced a “context engine” that aggregates organizational, cluster, cloud, and historical incident data and feeds it into machine-learning models that generate dynamic, situation-specific runbooks.

Two examples showed the risk of shallow analysis. In the first, two services, Cash Loader and Event Processor, both crashed with out-of-memory errors, yet one needed only a higher memory limit while the other had a memory leak that the same fix would have made worse. In the second, an order-processing chain and a data-analytics pipeline showed identical storage-service symptoms but had different root causes. Both cases underscore the need for deep contextual signals.

The broader implication is a move from static, one-size-fits-all runbooks to adaptive, AI-powered incident remediation. By automating root-cause identification and prescribing context-aware fixes, Commodore aims to shorten mean time to recovery (MTTR) for SRE teams operating at large scale in cloud-native environments.
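The out-of-memory example can be made concrete with a small sketch. This is not Commodore's implementation, which the summary does not describe. It is a hypothetical heuristic, assuming the context includes memory-usage samples taken since the last restart. The names `OomContext`, `looks_like_leak`, and `recommend` are invented for illustration, as are the sample values. The summary also does not say which of the two services had the leak, so the assignment below is arbitrary.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OomContext:
    """Hypothetical context bundle for one out-of-memory incident."""
    service: str
    memory_limit_mib: int
    # Memory usage (MiB) sampled at equal intervals since the last restart.
    usage_mib: List[float]


def looks_like_leak(samples: List[float], min_rising_ratio: float = 0.9) -> bool:
    """Assumed heuristic: usage rises in nearly every interval and never plateaus."""
    if len(samples) < 3:
        return False  # too little history to judge a trend
    rising = sum(1 for a, b in zip(samples, samples[1:]) if b > a)
    return rising / (len(samples) - 1) >= min_rising_ratio


def recommend(ctx: OomContext) -> str:
    """Choose a remediation from context, not from the OOM symptom alone."""
    if looks_like_leak(ctx.usage_mib):
        return f"{ctx.service}: investigate memory leak; a higher limit only delays the crash"
    return f"{ctx.service}: raise memory limit above {ctx.memory_limit_mib} MiB"


# Illustrative data only: which service leaks is an assumption, not from the webinar.
cash_loader = OomContext("cash-loader", 512, [300, 480, 500, 495, 505, 498])
event_processor = OomContext("event-processor", 512, [100, 180, 260, 340, 420, 500])

print(recommend(cash_loader))
print(recommend(event_processor))
```

Both incidents report the same symptom, but the usage history separates a workload that plateaus near its limit from one that grows without bound. A real context engine would presumably combine many more signals than this single trend check.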

By The Linux Foundation

Video•Mar 11, 2026

AI Runs on Open Source & Real Humans: Why You Need Linux & Cloud Native Skills to Power AI at...

AI adoption is accelerating, but high‑performing models depend on open‑source foundations such as Linux, Kubernetes, and cloud‑native infrastructure. Without this stack, AI systems struggle to scale, deploy reliably, and move beyond experimental phases. The video highlights a growing talent gap:...

By The Linux Foundation

Video•Feb 19, 2026

Why Half of All Kubernetes Clusters Are About to Become Vulnerable | Kat Cosgrove & Tabitha Sable

The Kubernetes Steering Committee announced that the Ingress NGINX controller – a core ingress solution for roughly half of cloud‑native deployments – will be officially retired at the end of March, six weeks from the announcement. After that date the...

By The Linux Foundation

Video•Feb 18, 2026

Sponsored Session: Installing OpenTelemetry, Today and Tomorrow - Ted Young, Grafana
