Porn, Dog Poo and Social Media Snaps: The ‘Taskers’ Scraping the Internet for Meta-Owned AI Firm

The Guardian · Media · Apr 7, 2026

Why It Matters

The data‑harvesting practices could expose Meta to legal liability and erode public trust in AI, while highlighting broader industry challenges around ethical training data.

Key Takeaways

  • Scale AI, 49% owned by Meta, relies on gig workers.
  • Workers scrape Instagram posts, copyrighted media, and porn soundtracks.
  • Platform Outlier markets “expert” training for AI systems.
  • Practices raise privacy, copyright, and labor concerns.
  • Regulatory scrutiny could reshape AI data sourcing.

Pulse Analysis

Scale AI is a data‑labeling and model‑training service that leverages a distributed workforce of freelancers, or “taskers,” to collect and curate massive datasets for artificial‑intelligence projects. Backed by a 49% stake from Meta, the company promotes its Outlier platform as a way for highly credentialed individuals—doctors, physicists, economists—to become the “expert that AI learns from.” In practice, the platform assigns mundane but high‑volume chores: crawling public Instagram accounts, downloading copyrighted images, and transcribing audio, including explicit adult content, in exchange for modest per‑task payments.

The operational model raises a cascade of ethical and legal red flags. By pulling personal profile data and copyrighted works without explicit consent, Scale AI risks running afoul of privacy regulations such as the EU’s GDPR and U.S. state‑level data‑protection statutes. The inclusion of pornographic soundtracks adds another layer of risk, potentially violating age‑verification laws and community standards. Moreover, the gig‑worker arrangement sidesteps traditional employment protections, leaving thousands of taskers without benefits or job security while they handle sensitive content for a Meta‑linked AI pipeline.

Industry observers warn that such data‑sourcing practices could trigger heightened regulatory scrutiny and class‑action lawsuits, pressuring Meta and its affiliates to overhaul training pipelines. Companies may be forced to adopt transparent data‑auditing tools, secure licensing agreements, and stricter worker‑rights frameworks to mitigate reputational damage. At the same time, the demand for high‑quality, domain‑specific training data remains intense, pushing firms to balance speed with compliance. How Scale AI navigates these pressures will likely set a benchmark for the broader AI ecosystem’s approach to ethical data collection.

