Machine learning at scale

Machine learning systems in the real world.

Production ML: A Reality Check on MLOps
Blog, Apr 22, 2026

A UC Berkeley study of 18 machine‑learning engineers reveals a stark gap between MLOps hype and day‑to‑day practice. The authors introduce a "Three Vs" framework—Velocity, Validation, Versioning—to describe mature production pipelines. They argue that the oft‑cited 85‑90% model‑to‑production failure rate actually...

By Machine learning at scale
LinkedIn’s MixLM: 10x Faster LLM Ranking via Embedding Injection
Blog, Apr 19, 2026

LinkedIn unveiled MixLM, a production ranking system that replaces full job descriptions with pre‑computed soft‑embedding tokens, shrinking context from roughly 900 tokens to just 1‑2 per item. This compression lets the Ranker LLM process queries with minimal item overhead, enabling...
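
The context saving described above can be put in rough numbers. This is a back-of-the-envelope sketch only: the per-item token counts come from the summary, while the candidate count of 50 and the helper name are hypothetical, and the token ratio should not be read as a latency figure.

```python
# Per-item token counts taken from the summary above.
TOKENS_PER_ITEM_FULL_TEXT = 900  # "roughly 900 tokens" per job description
TOKENS_PER_ITEM_EMBEDDED = 2     # upper end of "1-2 per item"


def item_context_tokens(num_items, tokens_per_item):
    """Total context tokens spent on candidate items for one ranking request."""
    return num_items * tokens_per_item


num_items = 50  # hypothetical number of candidates ranked per request

full_text_tokens = item_context_tokens(num_items, TOKENS_PER_ITEM_FULL_TEXT)
embedded_tokens = item_context_tokens(num_items, TOKENS_PER_ITEM_EMBEDDED)

print(full_text_tokens, embedded_tokens)
```

Under these assumptions the item portion of the prompt shrinks from 45,000 tokens to 100; the query and instruction tokens are not counted here.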

By Machine learning at scale
How xAI's Recommendation System Actually Works
Blog, Apr 18, 2026

The post delivers a detailed technical teardown of xAI’s recommendation system, outlining a two‑stage retrieval and ranking pipeline, the signals that feed the model, and the re‑ranking layer that leverages large language models. It highlights the strategic bets xAI is...

By Machine learning at scale
$220K Lost to a Fraud Model That Passed a 0.82 Accuracy Check [Edition #5]
Blog, Apr 18, 2026

FinFlow AI, a Series B fintech processing 15 million daily transactions, lost $220,000 after a schema change rendered the merchant_zip feature null. The XGBoost fraud model still met its 0.82 accuracy threshold, so the corrupted data went undetected and fraud capture...
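
A feature-level null-rate check is one way a failure like this could be caught even while an aggregate metric still passes. The following is a minimal sketch, assuming rows arrive as dictionaries; the function names, the 5% threshold, and the sample batch are hypothetical and not taken from the post.

```python
def null_rate(rows, feature):
    """Fraction of rows where `feature` is missing or None."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(feature) is None)
    return missing / len(rows)


def features_over_null_threshold(rows, features, max_null_rate=0.05):
    """Return the features whose null rate exceeds the (assumed) threshold."""
    return [f for f in features if null_rate(rows, f) > max_null_rate]


# Hypothetical batch after a schema change: merchant_zip is null everywhere.
batch = [
    {"amount": 12.5, "merchant_zip": None},
    {"amount": 80.0, "merchant_zip": None},
    {"amount": 3.2, "merchant_zip": None},
]

flagged = features_over_null_threshold(batch, ["amount", "merchant_zip"])
print(flagged)
```

A check of this kind runs on the input data rather than on model output, so it does not depend on labels arriving or on the accuracy threshold being sensitive to one feature.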

By Machine learning at scale
Pruning LLMs for Retrieval: Why Attention Matters and MLPs Don't
Blog, Apr 12, 2026

The paper introduces EffiR, a pruning framework that flips conventional LLM pruning wisdom for dense retrieval tasks. By aggressively removing MLP layers while preserving attention heads, the authors cut Mistral‑7B’s parameters by roughly 50% and doubled inference speed with minimal...

By Machine learning at scale
A $27K/Month Ranking System That Silently Buried 45,000 New Listings Daily [Edition #4]
Blog, Apr 11, 2026

SwiftMarket, a Series B e‑commerce marketplace, raised $45 million to scale its discovery engine, processing 520 million search requests and adding 45,000 new listings daily. Its new learning‑to‑rank system, an XGBoost model refreshed weekly, has lifted search click‑through rate by 12% while costing...

By Machine learning at scale
Deep Neural Networks for YouTube Recommendations
Blog, Apr 5, 2026

The 2016 Google paper introduced a two‑stage "funnel" architecture that now underpins YouTube’s massive‑scale recommender system. A Candidate Generation network treats recommendation as extreme multiclass classification, using negative sampling and approximate nearest‑neighbor search to retrieve a few hundred videos from...

By Machine learning at scale
The $5800 FAISS Index That Was Stale for 168 Hours Straight [Edition #3]
Blog, Apr 4, 2026

LexiFeed’s discovery engine relies on a flat FAISS index rebuilt only once a week and a two‑tower model trained on six‑month‑old engagement data. This architecture makes every article up to 168 hours stale, contributing to a flat 4.2% click‑through rate despite...
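
The 168-hour figure follows directly from the weekly rebuild: 7 days times 24 hours. A small sketch of that arithmetic and of an index-age check, assuming the rebuild time is recorded somewhere; the helper names and the example timestamps are hypothetical.

```python
from datetime import datetime, timedelta

REBUILD_INTERVAL = timedelta(weeks=1)  # weekly rebuild, per the summary


def worst_case_staleness_hours(rebuild_interval):
    """Oldest an item can be in the index just before the next rebuild."""
    return rebuild_interval.total_seconds() / 3600


def index_age_hours(last_rebuild, now):
    """Hours elapsed since the index was last rebuilt."""
    return (now - last_rebuild).total_seconds() / 3600


# Hypothetical timestamps for illustration only.
last_rebuild = datetime(2026, 3, 30, 0, 0)
now = datetime(2026, 4, 4, 12, 0)

worst_case = worst_case_staleness_hours(REBUILD_INTERVAL)
age = index_age_hours(last_rebuild, now)
print(worst_case, age)
```

With these example timestamps the index is 132 hours old, against a worst case of 168 hours.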

By Machine learning at scale
ML@Scale Is Leveling up (and Your Window to Lock in at 7 CHF / Month Closes in 48h)
Blog, Apr 1, 2026

Machine Learning at Scale (ML@Scale) announced a 2026 content schedule featuring four weekly formats, including a new Zürich Feed that curates Swiss machine‑learning job listings with compensation estimates. The newsletter offers a limited‑time early‑bird subscription at $15 per month (≈ 13 CHF)...

By Machine learning at scale
The Modern LLM Optimization Stack: A Field Guide
Blog, Mar 29, 2026

Gauri Gupta’s LLM optimization notes map the current distributed training and inference landscape, emphasizing that naive implementations quickly hit memory limits. The guide details advanced parallelism techniques—ZeRO data parallelism, tensor and pipeline parallelism—and memory‑saving methods like Flash Attention. It also...

By Machine learning at scale
800ms Latency Spikes From A $45K Redis Cluster That Looked Healthy [Edition #2]
Blog, Mar 28, 2026

Fintech firm Veritas Pay, processing 800 million transactions annually, saw its real‑time fraud detection engine exceed the 150 ms SLA, with P99 latency spiking to 800 ms during peak loads. The root causes include Redis write saturation during six‑hour batch syncs, a Python...
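
A system can look healthy on averages while its tail breaches the SLA, which is why P99 is the number quoted here. The sketch below uses a nearest-rank percentile; the 150 ms SLA comes from the summary, while the sample latencies and function name are hypothetical.

```python
SLA_MS = 150  # SLA stated in the summary


def percentile(values, pct):
    """Nearest-rank percentile for an integer pct in 1..100."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# Hypothetical sample: most requests are fast, a few are very slow.
latencies_ms = [40] * 98 + [800] * 2

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)

print(mean_ms, p50_ms, p99_ms, p99_ms > SLA_MS)
```

In this made-up sample the mean (55.2 ms) and median (40 ms) sit well under the SLA, while the P99 (800 ms) does not.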

By Machine learning at scale
Evolutionary Code Optimization: How Datadog Automates Low-Level Performance Tuning
Blog, Mar 22, 2026

Datadog engineers moved from hand‑tuning Go assembly to an automated system called BitsEvolve that leverages large language models and evolutionary algorithms to optimize low‑level code. Manual removal of redundant bounds checks alone delivered a 25% CPU reduction on targeted functions....

By Machine learning at scale
VectoScale Is Paying $237k/Month to Hide a Bad Architectural Decision [Edition #1]
Blog, Mar 21, 2026

VectoScale, a Series B AI‑infrastructure startup handling 500 million daily queries, spends $237,000 a month on GPU inference and vector storage. Their hybrid retrieval pipeline suffers from an O(N) cross‑encoder reranker, unquantized 768‑dimensional vectors, and a one‑size‑fits‑all HNSW index, leading to p99...
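
To see why unquantized vectors matter for storage cost, it helps to work out the raw footprint. This sketch assumes "unquantized" means 32-bit floats and compares that with 8-bit scalar quantization; the corpus size of 100 million vectors is hypothetical, and HNSW graph overhead is deliberately left out.

```python
DIM = 768  # vector dimensionality stated in the summary

BYTES_FLOAT32 = 4  # assumption: unquantized vectors are stored as float32
BYTES_INT8 = 1     # assumption: 8-bit scalar quantization as the comparison


def raw_vector_bytes(num_vectors, dim, bytes_per_component):
    """Raw storage for the vectors alone, excluding any index structure."""
    return num_vectors * dim * bytes_per_component


num_vectors = 100_000_000  # hypothetical corpus size

float32_bytes = raw_vector_bytes(num_vectors, DIM, BYTES_FLOAT32)
int8_bytes = raw_vector_bytes(num_vectors, DIM, BYTES_INT8)

print(float32_bytes / 1e9, int8_bytes / 1e9)  # sizes in GB
```

Under these assumptions the raw vectors take 307.2 GB as float32 and 76.8 GB as int8, a 4x reduction before any effect on recall is considered.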

By Machine learning at scale
Meta's GEM: Bringing LLM-Scale Architectures to Ads Recommendation
Blog, Mar 18, 2026

Meta introduced GEM (Generative Ads Model), a foundation‑model approach that treats ad recommendation like a large language model. The architecture separates sequence and non‑sequence features, uses an InterFormer to handle long user histories, and adds a Student Adapter to keep...

By Machine learning at scale
The Industrialization of Algorithm Design: AI-Driven Research for Systems
Blog, Mar 15, 2026

UC Berkeley researchers introduced AI‑Driven Research for Systems (ADRS), a closed‑loop framework where large language models iteratively generate and refine system algorithms using simulators as hard verifiers. The approach treats code generation as an evolutionary search, allowing the LLM to...

By Machine learning at scale
Engineering Airbnb’s Embedding-Based Retrieval System
Blog, Mar 8, 2026

Airbnb introduced an Embedding‑Based Retrieval (EBR) system to sharpen the candidate pool for its search experience. The model uses a two‑tower architecture, with offline‑precomputed listing embeddings and real‑time query embeddings, trained on session‑based hard negatives rather than random samples. For...
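
The serving side of a two-tower setup can be sketched as a lookup over precomputed listing embeddings scored against a query embedding computed at request time. This is an illustration under stated assumptions, not Airbnb's implementation: the dot-product similarity, the brute-force scan (a production system would more likely use approximate nearest-neighbor search), and all names and vectors are hypothetical.

```python
import heapq


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def retrieve_top_k(query_embedding, listing_embeddings, k):
    """Score every precomputed listing embedding and return the k best ids."""
    scored = (
        (dot(query_embedding, emb), listing_id)
        for listing_id, emb in listing_embeddings.items()
    )
    return [listing_id for _, listing_id in heapq.nlargest(k, scored)]


# Stand-in for the offline listing tower output (hypothetical 2-d vectors).
listing_embeddings = {
    "listing_a": [0.9, 0.1],
    "listing_b": [0.1, 0.9],
    "listing_c": [0.7, 0.6],
}

# Stand-in for the real-time query tower output.
query_embedding = [1.0, 0.2]

top = retrieve_top_k(query_embedding, listing_embeddings, k=2)
print(top)
```

Only the query embedding has to be computed per request; the listing side is a static table that can be refreshed offline.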

By Machine learning at scale
Continual Learning via Sparse Memory Finetuning
Blog, Mar 4, 2026

Continual learning for large language models (LLMs) is hampered by catastrophic forgetting when traditional fine‑tuning updates all parameters. A new approach replaces transformer feed‑forward layers with sparse memory layers, updating only a handful of key‑value slots identified via TF‑IDF. Experiments...

By Machine learning at scale
A Real Day in the Life of a ML Engineer.
Blog, Mar 2, 2026

The post demystifies a machine‑learning engineer’s routine, showing it’s less about glamorous model training and more about a disciplined workflow. The author starts early, clears the email inbox, applies a five‑minute rule for quick actions, and parks larger tasks in a physical...

By Machine learning at scale