With Evo 2, AI Can Model and Design the Genetic Code for All Domains of Life

Phys.org – Biotechnology, Mar 4, 2026

Why It Matters

Evo 2 provides a universal genomic language model that accelerates disease‑gene discovery, synthetic biology and targeted therapies, reshaping R&D efficiency across biotech and pharma. Its open‑source release democratizes high‑scale AI tools, fostering rapid innovation while addressing biosecurity through built‑in safeguards.

Key Takeaways

  • Trained on 9.3 trillion nucleotides from 128,000 genomes
  • Predicts BRCA1 mutation pathogenicity with >90% accuracy
  • Designs synthetic bacteriophages to combat antibiotic resistance
  • Open‑source code integrated into NVIDIA BioNeMo framework
  • Safety filters exclude human pathogens from training data

Pulse Analysis

Evo 2 marks a turning point in computational biology, extending the foundation‑model paradigm that reshaped natural‑language processing to the entire tree of life. Trained on 9.3 trillion nucleotides from over 128,000 genomes, the model captures evolutionary signals that span bacteria, archaea, plants and humans. Its underlying StripedHyena 2 architecture, optimized for NVIDIA’s DGX H100 cloud, processes up to one million bases in a single pass, a scale previously unattainable for genomic AI. This breadth enables Evo 2 to act as a universal “genomic language model,” learning patterns that individual labs would need years to discover.

The immediate utility of Evo 2 lies in precision medicine and synthetic biology. In benchmark tests on BRCA1 variants, the system exceeded 90% accuracy in distinguishing benign from pathogenic mutations, offering a rapid pre‑screen for clinical genetics pipelines. Researchers have already leveraged the model to design functional bacteriophages, a promising avenue against multidrug‑resistant infections. Moreover, the ability to predict cell‑type‑specific regulatory elements opens new routes for targeted gene‑therapy vectors, reducing off‑target effects. By automating sequence‑function inference, Evo 2 can accelerate drug target validation and reduce R&D costs across biotech firms.

Arc Institute’s decision to release the full code, weights and training data through GitHub and NVIDIA’s BioNeMo framework democratizes access to large‑scale genomics AI, fostering a collaborative ecosystem akin to open‑source software. Built‑in safety filters that omit human pathogens address biosecurity concerns, while the mechanistic visualizer provides interpretability for regulators and scientists alike. As downstream developers build specialized applications on top of this “genomic operating system,” the industry can expect a surge of niche models for agriculture, environmental monitoring and personalized therapeutics. Evo 2 therefore not only expands scientific capability but also reshapes the business landscape of biotech innovation.
