Large Language Models Do Science

Science Magazine
Mar 10, 2026

Why It Matters

AI’s ability to accelerate discovery promises competitive advantage for firms and labs, yet unchecked errors risk costly setbacks and erode trust in scientific outputs.

Key Takeaways

  • Google DeepMind's Gemini solved five of six IMO problems, earning a gold medal
  • Meta’s Llama-powered KMA optimized conditions for a novel pharmaceutical synthesis
  • Google AI co‑scientist identified existing compounds for liver fibrosis
  • AI agents authored and reviewed papers at Open Agents for Science
  • Researchers warn AI can misinterpret methods and fabricate citations

Summary

The video highlights how large language models are increasingly being deployed as active participants in scientific discovery, from solving competition‑level math problems to designing experiments in chemistry and biology.

Google DeepMind’s Gemini model answered five of six International Math Olympiad questions, a performance once thought to be decades away. Meta’s Llama‑based KMA assistant pinpointed optimal conditions for a previously unreported organic synthesis crucial to many drugs, while Google’s AI co‑scientist flagged repurposing candidates for liver fibrosis from existing compound libraries. At the inaugural Open Agents for Science conference, AI agents acted as both authors and peer reviewers alongside human collaborators.

Proponents praised the speed gains, noting that AI can draft manuscript sections, generate code, and surface relevant literature in minutes. Critics, however, warned that the models sometimes misinterpret experimental methods, produce code that crashes without human debugging, and cite or outright fabricate nonexistent papers, underscoring the need for rigorous validation.

These developments suggest AI could dramatically shorten research cycles, but the mixed results also highlight the necessity of human oversight and new standards for AI‑generated scholarship to prevent misinformation and maintain scientific integrity.

Original Description

When Google DeepMind unveiled its protein-structure predictor AlphaFold2 in 2020, it upended expectations for what AI could accomplish in science. Few imagined that general-purpose large language models (LLMs), trained on trillions of words and optimized simply to regurgitate humanlike text, might follow suit.
That view is now shifting—tectonically—as LLMs scale up. Last year, they showcased Ph.D.-caliber acumen across a wide swath of science.
CREDITS: (FOOTAGE) AGENTS4SCIENCE; GOOGLE DEEPMIND; THEVISUALMD/SCIENCE SOURCE; (RESEARCH) GUAN ET AL./ADVANCED SCIENCE; ZHANG ET AL./NATURE MACHINE INTELLIGENCE; (VIDEO PRODUCTION) M. CANTWELL/SCIENCE
#AI #LLMs #Science #ScienceShorts
