Could Data From 100 Million Species Help Cure Disease? One Startup Is Betting on It

Fortune, Mar 19, 2026

Why It Matters

By creating a massive, ethically sourced biological dataset, Basecamp could dramatically speed AI‑driven drug development while setting a new standard for data provenance and benefit‑sharing in the biotech‑AI ecosystem.

Key Takeaways

  • Basecamp aims to map 100M species' genomes.
  • Trillion Gene Atlas targets a trillion‑gene‑scale dataset.
  • Partnerships include Anthropic, Ultima Genomics, PacBio, Nvidia.
  • $85M raised; likened to Human Genome Project.
  • Royalty system tracks origins, pays 60 groups across 21 countries.

Pulse Analysis

The Trillion Gene Atlas represents a bold escalation in biodiversity sequencing, moving beyond the single‑species focus of the original Human Genome Project toward a planetary catalog of life. By targeting more than 100 million species, Basecamp hopes to capture evolutionary signals accumulated over billions of years, providing a richer substrate for machine‑learning models. Collaboration with industry partners such as Anthropic and Nvidia is intended to supply the computational horsepower needed to process petabytes of sequence data and turn it into actionable insights.

Basecamp’s Eden models illustrate how AI can translate raw genomic information into predictive tools for drug discovery. Unlike large language models trained on scraped internet text, these models ingest curated, high‑resolution biological data, allowing them to identify gene‑function relationships and metabolic pathways that are difficult for human researchers to discern. This science‑first approach could shorten the timeline from target identification to clinical trials, offering a competitive edge in a market where speed and precision are paramount.

Equally noteworthy is Basecamp’s royalty‑tracking framework, which tags each DNA sample to its geographic and community origin and allocates payments when the data contributes to downstream value. This model directly addresses longstanding concerns about biopiracy and data colonialism, offering a template for responsible data stewardship in AI. If adopted broadly, such provenance mechanisms could reshape how the AI industry negotiates data rights, fostering greater trust and unlocking new sources of high‑quality, ethically sourced information for future innovations.
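
To make the tag-then-allocate idea concrete, here is a minimal Python sketch. It is not Basecamp's implementation, which the article does not describe. The `Sample` fields, the `allocate_royalties` function, and the pro-rata split by number of contributing samples are all assumptions chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Sample:
    """A DNA sample tagged with its origin (field names are hypothetical)."""
    sample_id: str
    country: str
    community: str


def allocate_royalties(samples, contributing_ids, royalty_pool):
    """Split royalty_pool across origin communities.

    Assumption: each community's share is proportional to how many of its
    samples contributed to the downstream result. A real scheme could weight
    contributions quite differently.
    """
    by_id = {s.sample_id: s for s in samples}
    # Count contributing samples per (country, community) origin tag.
    counts = Counter(
        (by_id[sid].country, by_id[sid].community) for sid in contributing_ids
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    # Fractions keep the split exact, so the shares always sum to the pool.
    return {
        origin: Fraction(royalty_pool) * n / total
        for origin, n in counts.items()
    }


# Placeholder data: two samples from one community, one from another.
samples = [
    Sample("s1", "Country A", "Community X"),
    Sample("s2", "Country A", "Community X"),
    Sample("s3", "Country B", "Community Y"),
]
payouts = allocate_royalties(samples, ["s1", "s2", "s3"], 900)
```

With this placeholder data, Community X contributed two of the three samples and so receives two thirds of the pool. The point of the sketch is only that payment follows from the origin tag attached at collection time, which is the property the article highlights.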
