
Machine Learning at Scale (ML@Scale) announced a 2026 content schedule featuring four weekly formats, including a new Zürich Feed that curates Swiss machine‑learning job listings with compensation estimates. The newsletter offers a limited‑time early‑bird subscription at $15 per month (≈ 13 CHF) or $109 per year, after which the price rises to $20 per month (≈ 22 CHF) or $149 per year. Subscribers lock in the discounted rate permanently, while the free tier remains available. The promotional window closes in 48 hours.

Gauri Gupta’s LLM optimization notes map the current distributed training and inference landscape, emphasizing that naive implementations quickly hit memory limits. The guide details advanced parallelism techniques—ZeRO data parallelism, tensor and pipeline parallelism—and memory‑saving methods like Flash Attention. It also...
**800ms Latency Spikes From A $45K Redis Cluster That Looked Healthy [Edition #2]**
Fintech firm Veritas Pay, processing 800 million transactions annually, saw its real‑time fraud detection engine exceed the 150 ms SLA, with P99 latency spiking to 800 ms during peak loads. The root causes include Redis write saturation during six‑hour batch syncs, a Python...
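
A common mitigation for the kind of batch-sync write saturation described above is to break the sync into small, optionally throttled chunks so foreground reads are never starved by one giant write. A minimal sketch (the `write_chunk` callback and parameter values are hypothetical, not Veritas Pay's code):

```python
import time

def chunked(items, size):
    """Yield fixed-size chunks of a batch."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def throttled_sync(records, write_chunk, chunk_size=500, pause_s=0.0):
    """Push a large batch in small chunks, pausing between them so a
    multi-hour sync shares the store with latency-sensitive reads."""
    written = 0
    for chunk in chunked(records, chunk_size):
        write_chunk(chunk)   # e.g. one redis-py pipeline.execute() per chunk
        written += len(chunk)
        if pause_s:
            time.sleep(pause_s)
    return written
```

Tuning `chunk_size` and `pause_s` trades total sync duration against tail latency on the serving path.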

Datadog engineers moved from hand‑tuning Go assembly to an automated system called BitsEvolve that leverages large language models and evolutionary algorithms to optimize low‑level code. Manual removal of redundant bounds checks alone delivered a 25% CPU reduction on targeted functions....
**VectoScale Is Paying $237k/Month to Hide a Bad Architectural Decision [Edition #1]**
VectoScale, a Series B AI‑infrastructure startup handling 500 million daily queries, spends $237,000 a month on GPU inference and vector storage. Their hybrid retrieval pipeline suffers from an O(N) cross‑encoder reranker, unquantized 768‑dimensional vectors, and a one‑size‑fits‑all HNSW index, leading to p99...
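
To make the unquantized-vector cost concrete: symmetric int8 quantization shrinks 768-dimensional float32 vectors to roughly a quarter of their size at a small reconstruction error. A minimal sketch, not VectoScale's pipeline:

```python
import numpy as np

def quantize_int8(vecs):
    """Symmetric per-vector int8 quantization: store one float32 scale
    per vector plus int8 components, ~4x smaller than float32 storage."""
    scales = np.abs(vecs).max(axis=1, keepdims=True) / 127.0
    scales[scales == 0] = 1.0          # avoid divide-by-zero on empty vectors
    q = np.round(vecs / scales).astype(np.int8)
    return q, scales.astype(np.float32)

def dequantize(q, scales):
    """Recover approximate float vectors for scoring or reranking."""
    return q.astype(np.float32) * scales
```

At 500 million stored vectors, that 4x reduction translates directly into storage and memory-bandwidth savings, usually with negligible recall loss.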

Meta introduced GEM (Generative Ads Model), a foundation‑model approach that treats ad recommendation like a large language model. The architecture separates sequence and non‑sequence features, uses an InterFormer to handle long user histories, and adds a Student Adapter to keep...

UC Berkeley researchers introduced AI‑Driven Research for Systems (ADRS), a closed‑loop framework where large language models iteratively generate and refine system algorithms using simulators as hard verifiers. The approach treats code generation as an evolutionary search, allowing the LLM to...
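
The generate-and-refine loop can be sketched generically; here `propose` stands in for the LLM mutation step and `score` for the simulator that verifies each candidate (both are hypothetical placeholders, not the ADRS implementation):

```python
import random

def evolve(seed, propose, score, rounds=300, pop=8):
    """Skeleton of a verifier-in-the-loop evolutionary search: keep a
    small population of candidates, mutate a sampled parent, and let
    the simulator's score decide what survives."""
    population = [(score(seed), seed)]
    for _ in range(rounds):
        parent = max(random.sample(population, min(3, len(population))),
                     key=lambda sc: sc[0])[1]   # tournament selection
        child = propose(parent)                 # LLM-generated rewrite
        population.append((score(child), child))
        population.sort(key=lambda sc: sc[0], reverse=True)
        population = population[:pop]           # keep only the elite
    return population[0]
```

Because the simulator is a hard verifier, a hallucinated or broken candidate simply scores poorly and is discarded rather than propagated.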

Airbnb introduced an Embedding‑Based Retrieval (EBR) system to sharpen the candidate pool for its search experience. The model uses a two‑tower architecture, with offline‑precomputed listing embeddings and real‑time query embeddings, trained on session‑based hard negatives rather than random samples. For...
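
At serving time, the two-tower split reduces retrieval to a matrix-vector product over the offline-precomputed listing embeddings; a minimal sketch (illustrative, not Airbnb's code):

```python
import numpy as np

def top_k(query_emb, listing_embs, k=3):
    """Two-tower retrieval at serving time: listing embeddings are
    precomputed offline, so ranking a query against the whole corpus
    is one matrix-vector product plus a partial sort."""
    scores = listing_embs @ query_emb
    idx = np.argpartition(-scores, k)[:k]     # top-k, unordered
    return idx[np.argsort(-scores[idx])]      # order the k winners
```

The training-time choice of session-based hard negatives matters because random negatives are too easy: the dot-product scores above only discriminate well if the model was forced to separate plausible-but-unbooked listings.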

Continual learning for large language models (LLMs) is hampered by catastrophic forgetting when traditional fine‑tuning updates all parameters. A new approach replaces transformer feed‑forward layers with sparse memory layers, updating only a handful of key‑value slots identified via TF‑IDF. Experiments...
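
The slot-selection step can be illustrated with a tiny sketch: score a new document's tokens by TF-IDF against the existing corpus and update only the top-scoring slots, leaving the rest of the memory frozen (the token-to-slot mapping here is an illustrative assumption, not the paper's exact mechanism):

```python
import math
from collections import Counter

def tfidf_top_slots(doc_tokens, corpus, k=3):
    """Pick the k slot keys most specific to a new document via TF-IDF,
    so a continual-learning update touches only those key-value slots."""
    n = len(corpus)
    df = Counter(t for d in corpus for t in set(d))   # document frequency
    tf = Counter(doc_tokens)
    scores = {t: (tf[t] / len(doc_tokens)) * math.log((1 + n) / (1 + df[t]))
              for t in tf}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Tokens that appear everywhere (low IDF) score near zero, so generic slots stay untouched and only document-specific knowledge is written, which is what limits catastrophic forgetting.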

The post demystifies a machine‑learning engineer’s routine, showing it’s less about glamorous model training and more about disciplined workflow. The author starts early, clears their email inbox, applies a five‑minute rule for quick actions, and parks larger tasks in a physical...