News Publishers Are Trying To Prevent AI Scraping, But They’re Killing A Valuable History Service

ArtsJournal, Apr 26, 2026

Why It Matters

Restricting web archives jeopardizes historical transparency and hampers researchers, while also raising questions about how to balance copyright concerns with the need for an immutable public record.

Key Takeaways

  • NYT, Guardian, USA Today block Wayback to curb AI scraping
  • Over 120 journalists signed petition supporting Wayback Machine preservation
  • EFF urges lawsuits against AI firms instead of blocking archives
  • Wayback links 2.6 million news articles in 249 languages
  • Blocking archives could cause irreversible loss of public record

Pulse Analysis

Publishers’ decision to block the Wayback Machine stems from a growing fear that AI developers will harvest copyrighted news content in bulk to train large language models. The concern is legitimate, since unrestricted scraping could undermine revenue streams, but blocking a public archive creates collateral damage. The Wayback Machine serves as a de facto historical ledger, preserving original reporting that might later be altered or removed. That function becomes critical when journalists need to verify changes or expose misinformation.

In response, a coalition of journalists, media scholars, and digital-rights nonprofits has mobilized to defend the archive. A petition organized by Fight for the Future, now signed by over 120 journalists including high-profile figures like Cory Doctorow and Rachel Maddow, calls on publishers to lift the restrictions. The Electronic Frontier Foundation backs this effort, urging publishers to pursue litigation against AI firms that violate copyright rather than erasing the public record. Their stance underscores a strategic shift: protect intellectual property through the courts, not by dismantling a tool that underpins transparency.

The broader implications extend beyond the media industry. Researchers, historians, and the public rely on archived web content to trace the evolution of news narratives, policy debates, and cultural moments. Removing that safety net could create gaps in the digital memory, making it harder to hold power to account. As AI continues to reshape content consumption, a balanced approach—combining robust copyright enforcement with open archival access—will be essential to safeguard both creators’ rights and the collective historical record.
