AI Trained on Bacterial Genomes Produces Never-Before-Seen Proteins

Ars Technica AI
Nov 21, 2025

Why It Matters

Evo shows that large‑scale genomic language models can discover functional proteins beyond existing databases, opening a new route for biotechnological innovation and accelerating the design of enzymes, therapeutics, and synthetic biology tools.

Summary

Stanford researchers have built a genomic language model called Evo, trained on millions of bacterial genomes, that can predict and generate novel protein-coding sequences directly from DNA context. In benchmark tests, Evo accurately completed partial gene sequences and restored missing genes in operons. When prompted with toxin or CRISPR‑related genes, it produced antitoxins and CRISPR inhibitors that were functionally active yet bore little similarity to known proteins. Of 17 synthesized CRISPR‑inhibiting proteins, two were completely unprecedented and confused structure‑prediction software, demonstrating Evo’s ability to create entirely new functional proteins without explicit structural design. The team has now generated 120 billion base pairs of AI‑derived DNA from 1.7 million bacterial genes, offering a massive library of potentially novel bio‑parts.
