AI Judges Will Filter the Coming Content Deluge
One AI use case that is only getting more popular: LLM as a judge. Everyone still talks about AI generating more content, but not enough people are talking about: 1) the horrific deluge of noise we're going to have to deal with in the next 12 months when everyone nanobananas a million data reports, and 2) how impactful a 900x cost drop is when we're trying to start with a goal and find the best answers and best ideas. So how can AI help us filter that content? The real workflow to me looks more like extreme expansion, then extreme synthesis, both powered by AI. The winners will figure out AI-powered evaluation and curation (think: scoring, ranking, evaluating, finetuning models, prompting with examples) as fast as they figured out AI-powered generation. Infographic prompted by: @alliekmiller. Created by: @NanoBanana
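The "extreme expansion, then extreme synthesis" workflow can be sketched as a score-and-rank loop. This is a minimal illustration, not the author's pipeline: `rank_candidates` and `keyword_overlap_judge` are hypothetical names, and the keyword-overlap function is only a stand-in for a real LLM judge call so the example runs without any API.

```python
from typing import Callable, List, Tuple


def rank_candidates(
    candidates: List[str],
    judge: Callable[[str, str], float],
    goal: str,
    top_k: int = 3,
) -> List[Tuple[float, str]]:
    """Score every candidate against the goal and keep the top_k best."""
    scored = [(judge(goal, text), text) for text in candidates]
    # Highest score first; the sort is stable, so ties keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def keyword_overlap_judge(goal: str, text: str) -> float:
    """Hypothetical stand-in for an LLM judge: fraction of goal words found in text."""
    goal_words = set(goal.lower().split())
    text_words = set(text.lower().split())
    return len(goal_words & text_words) / len(goal_words) if goal_words else 0.0


goal = "reduce customer churn"
candidates = [
    "Offer annual plans to reduce customer churn",
    "Redesign the logo",
    "Survey users who churn",
    "Reduce onboarding steps for each customer",
]
best = rank_candidates(candidates, keyword_overlap_judge, goal, top_k=2)
for score, text in best:
    print(f"{score:.2f}  {text}")
```

Swapping `keyword_overlap_judge` for a function that prompts a model with a scoring rubric keeps the rest of the loop unchanged, which is the main reason to pass the judge in as a callable.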
Image Generation Has Leaped Forward Since Summer 2022
it is wild how far we’ve come since the hot image gen summer of 2022. image cred: @bfl_ml https://t.co/i5KbszlFDM
FLUX.2 Launches, Upgrading Community’s Top Image Model
the community’s favorite image creation and editing model just got better: welcome, FLUX.2 by @bfl_ml 🤩 https://t.co/iLrbYYK4bd
Scaling Boosts Benchmarks, Not Genuine Problem‑solving Ability
I think it is somewhat true, though, that scaling helps with benchmark performance but not necessarily with new model capabilities. Like the example he mentioned: > U: "Please code xyz." > M: "Ok here is xyz." > U: "You have a bug." >...

CV Engineers Should Master YOLO for Object Detection
So you are a CV engineer, what do you know about Computer Vision? I have used YOLO for... https://t.co/lX4OrFFYqE
New Research Paper Now Available Online
Here's our paper: https://t.co/RmNft3zU5Z
AI Control Remains Unsolved; Top Oversight Fails 92%
Excited to present our new AI paper as a @NeurIPSConf spotlight next week: we find that the problem of controlling artificial superintelligence remains unsolved. With simulations and scaling laws, we find that an implementation of the least unpromising...
OpenAI Positions Codex as Team Member, Not Tool
OpenAI is very deliberate about how they talk about Codex. It's not positioned as an operating system. It's heavily positioned as a teammate. Their site says: "Your new coding partner", "accelerates your team". Their job postings say: "we're building an AI software...
Beyond Scaling: Engineering Tricks Now Drive AI Progress
@dwarkesh_sp @ilyasut “The Age of Scaling is over.” I agree with that. Basically, since GPT 4.5 a lot of the perceived real-world progress was driven by clever engineering wrappers (context filtering, inference scaling, multi-turn tricks, retrieval, tool use, etc).

SAM 3D Enables Data‑Driven, Personalized Rehabilitation Insights
SAM 3D is helping advance the future of rehabilitation. See how researchers at @carnegiemellon are using SAM 3D to capture and analyze human movement in clinical settings, opening the doors to personalized, data-driven insights in the recovery process. 🔗 Learn more about...
Open Collaboration Fuels AI Boom and Scientific Innovation
Excited about the Genesis mission - congrats to @POTUS @SecretaryWright @ScienceUnderSec @mkratsios47 @sriramk! We've experienced first-hand how more openness and collaboration in the US can massively accelerate progress. In my opinion, that's what led to the current AI boom and US...
15 Essential Architectural Traits for Building Robust AI Agents
Just shared this brilliant mind map on the 15 key architectural characteristics of AI agents — absolutely packed with insights! Modularity, evolvability, context awareness, security compliance… everything you need to design robust agents. Huge thanks to @Python_Dv for creating this gem
Seeing Benchmaxxing, Ilya Launches Company for Proper LLM Development
Ok, so what Ilya saw was extreme benchmaxxing, which in turn prompted him to create his own company to do LLM development the proper way?! Makes sense, I sympathize with that.
Machines Exploit Shortcuts, Creating More Correct‑unintended Rules than Humans
@giffmana @dileeplearning the "correct-unintended" rules were just that -- correct on the demonstrations but using "shortcuts" (e.g., the numerical value of a color). We also saw a small percentage of "correct-unintended" rules that humans generated, but much less...
Adaptive 2D Gaussians Redefine Image Compression and Restoration
📢 Image-GS: Content-Adaptive Image Reconstruction using 2D Gaussians In this week’s deep dive, we explore Image-GS, a groundbreaking framework that reimagines how images can be represented, compressed, restored, and upsampled using adaptive 2D Gaussian splats. Unlike traditional codecs or neural...
Machines Craft Meaningful Unintended Rules; Humans Produce Nonsense
@giffmana @dileeplearning There was a big difference between "not classified" rules generated by humans and "correct-unintended" rules generated by machines. For humans, the "not classified" rules were generally humans writing nonsensical things like ⬇️
New AI Tests Need Fresh Images; Recipe Finally Clarified
One more comment is that giving this image to an AI and asking about it is not sufficient to show the diff because it's all over the training data by now. You'd have to use a new, very recent image,...
Pretrain, Fine‑tune, and Let Big AI Solve Tasks
@matejhladky_dev AI has crushed it since this post way beyond expectation. I made the same category of mistake all of AI was making, of thinking we have to discover and write the algorithm. You don't. You pretrain and then finetune...
LLMs Know Popular APIs, Need Docs for Obscure Ones
I've had medium success asking LLMs if a thing exists, it works out of the box for some of the more well-known things (e.g. both GPT 5.1 and Gemini 3 know about this function if you describe the tensor transformation...
AI and Energy Tech Converge to Transform Industries
@UmmayHabiba0 @SchneiderNA Certainly. We’re witnessing a major shift in real time. AI and energy tech are finally converging in ways that will reshape how industries operate and how infrastructure is built. Here’s the video if you’d like to take a look: 📺...
AI, Energy, Infrastructure Converge to Transform U.S. Economy
@the_AI_girl @SchneiderNA Absolutely, the momentum building across AI, energy, and infrastructure is setting the stage for a major transformation in the U.S. economy. I just shared more of my insights here on @LinkedIn: https://t.co/WwaOkGdcNm Big shifts ahead.
Discover PyTorch’s pixel_unshuffle: Skip Custom Tensor Hacks
Always a slightly mixed feeling to write pretty good first-principles code to do some tensor rearrangement, only to find that PyTorch has a built in function that does it faster. I had made a point of at least skimming the docs...
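For readers who have not met the rearrangement in question, here is a pure-Python reference sketch of what a pixel-unshuffle does: it moves each r x r spatial block into the channel dimension, turning a (C, H*r, W*r) image into (C*r*r, H, W). It is written against nested lists so it runs without PyTorch. The name `pixel_unshuffle_reference` is hypothetical, and the channel ordering follows the documented behaviour of `torch.nn.functional.pixel_unshuffle` as I understand it; this is not the code from the post.

```python
from typing import List

Image = List[List[List[int]]]  # indexed as [channel][row][col]


def pixel_unshuffle_reference(x: Image, r: int) -> Image:
    """Rearrange (C, H*r, W*r) into (C*r*r, H, W), one output channel per block offset."""
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    if height % r or width % r:
        raise ValueError("height and width must be divisible by r")
    out_h, out_w = height // r, width // r
    out: Image = []
    for c in range(channels):
        for i in range(r):      # row offset inside each r x r block
            for j in range(r):  # column offset inside each r x r block
                out.append([
                    [x[c][h * r + i][w * r + j] for w in range(out_w)]
                    for h in range(out_h)
                ])
    return out


# One channel, 4 x 4 image holding the values 0..15 in row-major order.
image = [[[row * 4 + col for col in range(4)] for row in range(4)]]
result = pixel_unshuffle_reference(image, r=2)
for channel in result:
    print(channel)
```

With PyTorch installed, the built-in call on a tensor of the same shape should produce the same layout far faster, which is the point the post makes about checking the docs first.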
Clear AI Rules Accelerate Trust and Innovation
85% of organizations believe responsible AI is a top management issue. Yet only 25% have governance mechanisms in place to address it. This trust gap is costing companies dearly. In Europe alone, 68% of companies don't understand their EU AI...
Hybrid Search on 1.2M Samples: BM25 & Embeddings
1.2 million samples. BM25, Embeddings and Hybrid search. Tutorial and code come tomorrow! Stay tuned! https://t.co/FlmaDlpASR
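The post does not say how the BM25 and embedding results are combined. One common way to build a hybrid ranking is reciprocal rank fusion (RRF), sketched below as an assumption rather than as the tutorial's method. The document IDs and the two input rankings are made up, and `k = 60` is the constant conventionally used with RRF.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists: each list adds 1 / (k + rank) to a document's score."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical outputs of a BM25 retriever and an embedding retriever.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
embedding_ranking = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([bm25_ranking, embedding_ranking])
print(fused)
```

RRF only needs ranks, not raw scores, so it sidesteps the problem that BM25 scores and cosine similarities live on different scales.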
Embeddings Beat LLMs for Fast, Cheap Classification
totally forgot about this experiment where i found it was faster and cheaper to do classification via embeddings vs using the fastest/cheapest llm (at the time)
Embedding Classification Beats Fastest Model on Speed, Cost
@jasonth0 did an experiment a bit back, and found that embedding based classification seemed consistently faster and cheaper than using the cheapest/fastest model (at the time) https://t.co/uuEPwu88cg
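The experiment itself is behind the link, so the following is only a sketch of one way embedding-based classification can work: average the embeddings of a few labeled examples into one centroid per class, then assign a new item to the class whose centroid is most similar. The three-dimensional vectors are invented placeholders for real embedding-model output, and nearest-centroid is an assumed technique, not necessarily the one used in the post.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a list of equal-length vectors."""
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def classify(query: Vector, centroids: Dict[str, Vector]) -> str:
    """Return the label whose centroid is most similar to the query."""
    return max(centroids, key=lambda label: cosine(query, centroids[label]))


# Placeholder embeddings; in practice these come from an embedding model.
labeled = {
    "billing": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "shipping": [[0.1, 0.9, 0.2], [0.0, 0.8, 0.3]],
}
centroids = {label: centroid(vecs) for label, vecs in labeled.items()}

prediction = classify([0.7, 0.3, 0.1], centroids)
print(prediction)
```

The speed and cost advantage described in the posts is plausible under this design because each classification is one embedding call plus a handful of dot products, with no text generation step.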
LLM Names Clusters Better Than vec2text
kinda like this, but instead of using vec2text - i found grabbing a few samples from each cluster and feeding into an llm came up with better names (not surprisingly) https://t.co/K8phyyDFdR
Clustering Embeddings Drives Dynamic Ontology Exploration
last weekend i went down the rabbit hole of how to build dynamic ontologies, and kept coming back to clustering of embeddings. curious if anyone has cool experiments i could look at around this
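A rough sketch of the two ideas in these posts, clustering embeddings and then pulling a few samples per cluster to hand to an LLM for naming. The posts do not say which clustering algorithm was used, so plain k-means with a naive "first k points" initialization is an assumption here. The two-dimensional vectors and texts are invented, and the LLM naming call is left out.

```python
import math
from typing import Dict, List

Vector = List[float]


def nearest(point: Vector, centers: List[Vector]) -> int:
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def kmeans(points: List[Vector], k: int, iterations: int = 10) -> List[int]:
    """Tiny k-means: returns the cluster index assigned to each point."""
    centers = [list(p) for p in points[:k]]  # naive init: first k points
    assignment = [0] * len(points)
    for _ in range(iterations):
        assignment = [nearest(p, centers) for p in points]
        for i in range(k):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:  # keep the old center if a cluster goes empty
                centers[i] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment


def samples_per_cluster(
    texts: List[str], assignment: List[int], n: int = 2
) -> Dict[int, List[str]]:
    """Collect up to n example texts per cluster, e.g. to put in a naming prompt."""
    picked: Dict[int, List[str]] = {}
    for text, cluster in zip(texts, assignment):
        bucket = picked.setdefault(cluster, [])
        if len(bucket) < n:
            bucket.append(text)
    return picked


texts = ["refund request", "late delivery", "invoice error", "lost package"]
embeddings = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]  # placeholder vectors

assignment = kmeans(embeddings, k=2)
examples = samples_per_cluster(texts, assignment)
print(examples)
```

Each value in `examples` is what would be pasted into a prompt asking a model to name that cluster; a production setup would more likely use a library clustering implementation with a better initialization.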
AI's Temporal Hacking Threatens Autonomy and Democracy
AI manipulation techniques revealed. For our free newsletter this week, we cover temporal hacking: AI systems that game human attention over months. @IrenaCronin and I write this newsletter every week. Temporal hacking describes AI systems that optimize for long term outcomes by subtly...
AI Landing Page Tools Score Poorly in Real Audits
We scored every major AI landing page analyzer across the same criteria we use for real CRO audits. Comprehensiveness. Specificity. Originality. Realistic implementation. Correctness. The highest score was 5/15. Several landed at zero. This isn’t a knock on AI. It’s a reflection of...
Anthropic Takes Lead in AI Coding Competition
Anthropic won the AI coding race 😏
Modern VLMs Struggle with Long‑Horizon Household Tasks
Our most recent work benchmarks modern VLMs and their efficacy for long-horizon household activities in robotic learning, using the BEHAVIOR benchmark environment.👇
Chat UI Limits AI; Context‑First Unlocks Power
Chat made AI feel real for the first time. A blank box. A question. A response that sounded alive. It created the belief that conversation was the natural interface for intelligence. But the chat window is the smallest view of...
Evolving Guardrails Needed for Autonomous AI Accountability
Generative, agentic, and autonomous AI raise new challenges that affect both decisions and careers. • Who is accountable when automated advice causes harm • How do we manage control when agents act across multiple systems • How do we sustain...

OpenAI Publishes Essential Guide for Developers
Link to the OpenAI guide here: →https://t.co/9pQoaiMq5i https://t.co/M1STn4lgjt
OpenAI's Playbook Enables AI-Run Full SDLC
OpenAI just dropped an ace playbook for AI-native engineering teams! It explains how coding agents can now run the entire SDLC (planning, coding, testing) while humans make the high-level decisions. A clear, practical framework for hybrid engineering. Free download in 🧵↓ https://t.co/N2vSskm1Pr
NanoBanana Pro Powers Gemini 3 Prompting Best‑practice Whiteboard
NanoBanana Pro is insane. I love how @_philschmid used it to create a whiteboard that breaks down Gemini 3’s prompting best practices 🔥 https://t.co/8Wia9rJR02
Opus 4.5 Works Around Cursor Bugs; Cursor Not Solely Responsible
For Gemini 3, I don't rule out bugs in @cursor_ai — as many new features don't work, worktrees getting trashed or even renamed (!) mid-way through agent working. But since Opus 4.5 manages around those bugs, it can't...
New Model Outperforms Gemini 3 with Greater Polish
My verdict is that it's significantly better than Gemini 3. It's at least as smart and just got more polish to it. Alignment on little details also significantly higher. Gemini 3 gets many things mixed up after a half-dozen messages, and...
Opus 4.5 Executes Tasks Seamlessly Beyond Token Limits
With Opus 4.5, it seems you don't need to ask multiple times or ORDER it to do work, it just gets stuff done, even beyond 50% of the token limit and after chat compaction! This kind of message is a thing...
LLMs Still Far From Their Intelligence Ceiling
LLMs haven't hit their "smartness" limits yet.
Self‑Driving Transit Beats Cars: Faster, Cheaper, Higher Throughput
#3 reduces utility because of congestion. Self-driving public transit like @glydways is a much better answer that reduces costs, reduces time to destination, and increases throughput 10X in humans transited per hour for any given street width. Time...
NVIDIA's Jetson Shortage Highlights Soaring Demand Across All Tiers
Matt is right about NVIDIA's competition. No one can meet demand. We need all hands on deck. And Jensen didn't even talk about its Jetson product line in the latest earnings report. Entrepreneurs tell me they have to wait eight weeks...
AI Bubble Exists, Real Value Firms Will Prevail
Are we in an AI bubble? 🤖💭 There is more talk than ever about AI hype, inflated valuations, and whether the bubble will burst. In this video I break down what is really happening in the AI market, why both...
AI Uses Embeddings, Not Memory, to Understand Context
Ever wondered how AI “remembers” your question… without having memory? 🤔 Every time you chat with an LLM, it somehow knows what you said before. But here’s the secret: It doesn’t remember your words. It understands meaning through something called embeddings. Embeddings are how machines...
Genspark’s Token‑Heavy Model Fuels Unicorn‑Speed Growth
You get Opus. And you get Opus. And you get Opus. Genspark uses more tokens than almost any company. At its press event last week the founder proudly held up a trophy from OpenAI proving so. Out of all the American companies...
Six-Layer Stack Powers Autonomous Generative AI Agents
The Generative AI ecosystem is evolving into a full tech stack — powering autonomous AI agents. From infrastructure and LLMs to RAG pipelines, agent behaviors and orchestration layers, this framework shows the 6 layers driving next-gen AI systems. Credit: @goyalshalini #AI #GenerativeAI #AgenticAI...
First EU AI Audit Forces Real Traceability, Not Slides
Some moments in tech feel like déjà vu, you can sense the shift coming long before it hits. A few weeks ago, during a conversation with a senior leader, I asked a simple question: “If regulators came knocking tomorrow, could...
AI Agents Will Revolutionize Business Workflows by 2026
5 Amazing AI Agent Use Cases That Will Transform Any Business In 2026 #AIagents represent the evolution beyond simple #chatbots, capable of autonomously planning and executing complex business workflows from start to finish. This article explores five practical applications,...