Aspen Digital - 2025 State of AI Evaluations for Philanthropy

Aspen Institute
May 6, 2026

Why It Matters

Philanthropic investment in AI evaluations can steer the development of safer, more transparent systems, amplifying public oversight and aligning technology with societal values.

Key Takeaways

  • Benchmarks have surged into mainstream AI discourse this year.
  • Evaluations guide model selection, deployment, monitoring, and regulatory oversight.
  • Philanthropic funders can steer AI safety by financing eval frameworks.
  • Cost, time, and data access dictate which evaluation methods are feasible.
  • Red‑team exercises, audits, and synthetic‑data checks address bias and reliability.

Summary

Aspen Digital’s 2025 State of AI Evaluations for Philanthropy describes how AI benchmarking has moved from niche research labs into public conversation, noting a rapid rise in awareness over recent months. B Cavello, Aspen Digital’s Director of Emerging Technologies, frames evaluations as essential tools for comparing models, setting performance thresholds, and monitoring systems over time, and emphasizes their relevance to academia, industry, regulators, journalists, users, and especially funders.

The presentation maps the AI lifecycle (goal definition, data collection, model training, deployment, and post‑deployment monitoring) and identifies evaluation touchpoints at each stage. It highlights benchmarks, red‑team exercises, interpretability analyses, efficiency tests, and data provenance checks. Examples such as OpenAI’s staged release of Sora and DeepMind’s AlphaFold in the CASP protein‑structure prediction challenge illustrate how targeted evaluations can drive both innovation and safety. Cavello cites organizations such as the Humane AI team and the Collective Intelligence Project, with its WeVal platform, as pioneers of red‑team and user‑centric assessments.

Cavello stresses that the choice of evaluation method depends on cost, time horizon, and data access: longitudinal studies can be expensive and slow, while quicker proxy measures may sacrifice precision. For philanthropy, the talk argues that funders have unique leverage. They can define aspirational benchmarks, finance accountability mechanisms, and support open‑source evaluation infrastructure, helping to shape a responsible AI future in which emerging technologies align with societal goals.

Original Description

Aspen Digital, in partnership with the Siegel Family Endowment, convened leaders from philanthropy to discuss how the philanthropic ecosystem can support AI benchmarks and evaluations. To frame the discussion, Aspen Digital's Director of Emerging Technologies, B Cavello, presented an overview of the state of AI evaluations: what they are, who they serve, and how they can shape the development of artificial intelligence.