MLOps Community

Publication

Independent community publication on MLOps practices, tooling, and production ML.

Production Sub-Agents for LLM Post Training
Video · Apr 10, 2026

The talk introduced a new production workflow for post‑training large language models, championed by Pinterest’s growth AI lead. Traditional pipelines required a linear, manual sequence—data cleaning, model selection, hyper‑parameter tuning, evaluation loops, and reinforcement learning—taking four to six weeks. By...

By MLOps Community
Fixing GPU Starvation in Large-Scale Distributed Training
Video · Apr 10, 2026

The video examines a pervasive problem in large‑scale distributed machine learning: GPUs sit idle because the data pipeline cannot feed them fast enough. Engineers at Uber and former Google staff explain that the bottleneck is not model architecture or quantization,...

By MLOps Community
Practical Security for AI-Generated Code
Video · Apr 3, 2026

Milan Williams, product manager at Semgrep, opened the session by warning that AI‑driven code generators are no longer limited to single‑line suggestions; they now produce thousands of lines of code and execute shell commands with elevated credentials. He framed the...

By MLOps Community
MCP Dev Summit [Day 1] Ft. Anthropic, Hugging Face, OpenAI & Microsoft
Video · Apr 2, 2026

The third MCP Dev Summit kicked off in New York, showcasing the rapid maturation of the Agentic AI Foundation (AAIF) and its flagship Model Context Protocol (MCP). Organizers highlighted a surge in community participation, a slate of global events, and a...

By MLOps Community
Choosing the Right Model Is Hard. Maintaining Accuracy Is Harder.
Video · Apr 1, 2026

Ash Lewis, founder and CEO of Fast Labs, opened the session by highlighting a growing pain point for AI product teams: picking the right large language model (LLM) and keeping its performance steady once it’s in production. He noted that the...

By MLOps Community
Stop Shipping on Vibes — How to Build Real Evals for Coding Agents
Video · Mar 31, 2026

At the Coding Agents Conference, Braintrust’s developer advocate Jessica Wang warned that many AI coding teams are “shipping on vibes,” deploying agents without solid evaluation frameworks. She emphasized that without real eval datasets, scoring systems, and controlled experiments, organizations are...

By MLOps Community
Decomposing the Agent Orchestration System: Lessons Learned
Video · Mar 31, 2026

At the Coding Agents Conference, Union.ai’s chief ML engineer Niels Bantilan warned that building agents is less about novel features and more about resilient infrastructure. He emphasized that durable, self‑healing, and easily debuggable systems prevent costly downtime. Bantilan highlighted Flyte’s...

By MLOps Community
How to Make a Coding Agent a General Purpose Agent - Harrison Chase
Video · Mar 31, 2026

At the Coding Agents Conference on March 3, 2026, LangChain CEO Harrison Chase and Arcade AI CTO Sam Partee delivered a keynote arguing that the real barrier to scaling AI agents is not model intelligence but foundational infrastructure. They highlighted...

By MLOps Community
How AI Agents Store Memories
Video · Mar 26, 2026

The video explores how artificial‑intelligence agents manage memory, contrasting traditional file‑system storage with newer, more dynamic approaches. It highlights the distinction between personalization memory and task‑execution memory, and why the choice of storage architecture matters for different agent designs. For agents...

By MLOps Community
Your Code Remembers Where It Broke
Video · Mar 25, 2026

The video introduces Temporal’s ability to remember exactly where a piece of code failed and resume execution once the error is fixed. This feature eliminates the traditional need to restart a server or rewrite logic after a syntax or runtime...

By MLOps Community
Lessons From 25 Trillion Tokens — Scaling AI-Assisted Development at Kilo
Video · Mar 24, 2026

Kilo’s co‑founder and CEO Scott outlined how the company has processed more than 25 trillion tokens since its May launch and used that data to reshape software engineering. By treating 2027‑level AI tools as core collaborators, Kilo shifted developers from manual coders...

By MLOps Community
Performance Optimization and Software/Hardware Co-Design Across PyTorch, CUDA, and NVIDIA GPUs
Video · Mar 23, 2026

The conversation centers on performance optimization and software‑hardware co‑design spanning PyTorch, CUDA, and NVIDIA GPUs, highlighted by the launch of SageMaker HyperPod—a service that keeps GPUs pre‑warmed for instant swapping. The speaker also promotes his new O'Reilly book that stitches...

By MLOps Community
Explaining Durable Execution
Video · Mar 23, 2026

The video explains Temporal’s durable execution model, emphasizing that workflow code must be deterministic. By restricting programs to repeatable logic—no random number generators or external nondeterministic calls—Temporal ensures that rerunning a workflow with identical inputs yields the same results. Key insights...

By MLOps Community
The Promise of Serverless
Video · Mar 21, 2026

The video revisits the original promise of serverless computing, explaining how the term emerged organically as developers imagined writing code, uploading it to a massive cloud, and letting the platform handle execution without manual server management. It highlights key attributes such...

By MLOps Community