An Honest Reflection on the Integration of LLMs Into Open Data Portals

Open Knowledge Foundation Blog, Apr 7, 2026

Key Takeaways

  • LLMs hallucinate; commercial models return correct SQL results only about 50% of the time.
  • Transparency lacking; reasoning not reproducible.
  • Proposed pilots combine data engineering with LLM presentation.
  • Trust requires human supervision and controlled data sources.
  • Open Knowledge Foundation leads 2026 trust initiative.

Pulse Analysis

Open data portals have become essential for government transparency, yet the rush to embed large language models threatens their credibility. Recent benchmarks reveal that commercial LLMs produce correct SQL results only about half the time, and their probabilistic nature leads to frequent hallucinations. This unreliability is especially problematic when citizens and policymakers rely on precise statistics for decisions. By highlighting these technical shortcomings, the Open Knowledge Foundation (OKFN) report underscores the urgent need for a disciplined, verification‑first approach before AI can be safely layered onto public datasets.
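
The post does not say how the cited benchmarks decided that a result was "correct". Text-to-SQL evaluation commonly uses execution accuracy, and that convention is assumed here rather than taken from the source: the generated query and a reference query run against the same database, and their outputs are compared. The sketch uses Python's standard `sqlite3` module with a hypothetical `budgets` table. The function names are illustrative, and the toy score tells us nothing about the real benchmarks.

```python
import sqlite3
from collections import Counter


def execution_match(conn, predicted_sql, reference_sql):
    """True if both queries return the same rows, ignoring row order.

    A predicted query that fails to execute counts as incorrect.
    """
    try:
        predicted = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False
    reference = conn.execute(reference_sql).fetchall()
    # Compare as multisets so ORDER BY differences are not penalised.
    return Counter(predicted) == Counter(reference)


def execution_accuracy(conn, pairs):
    """Share of (predicted, reference) pairs whose results match."""
    if not pairs:
        return 0.0
    return sum(execution_match(conn, p, r) for p, r in pairs) / len(pairs)


# Hypothetical toy table standing in for one portal dataset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE budgets (city TEXT, year INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO budgets VALUES (?, ?, ?)",
    [("Lisbon", 2024, 120.0), ("Lisbon", 2025, 135.0), ("Porto", 2025, 80.0)],
)

pairs = [
    # Identical query: match.
    ("SELECT SUM(amount) FROM budgets WHERE year = 2025",
     "SELECT SUM(amount) FROM budgets WHERE year = 2025"),
    # Same rows in a different order: still a match.
    ("SELECT city FROM budgets WHERE year = 2025 ORDER BY city DESC",
     "SELECT city FROM budgets WHERE year = 2025"),
    # Plausible-looking query that drops the year filter: wrong total.
    ("SELECT SUM(amount) FROM budgets",
     "SELECT SUM(amount) FROM budgets WHERE year = 2025"),
]
print(f"{execution_accuracy(conn, pairs):.2f}")  # 0.67
```

The third pair is the worrying case for a portal. The query runs without error and returns a confident-looking number that is simply wrong.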

The foundation’s proposed pilots take a pragmatic stance: use data‑engineering pipelines to retrieve and validate information, then let LLMs format the output for human consumption. This separation of concerns preserves the strengths of AI—natural language generation—while mitigating its weaknesses in factual accuracy. Human supervisors will audit the retrieved data, enforce provenance tracking, and intervene when the model’s reasoning diverges from the source. Such a hybrid workflow aligns with emerging best practices in AI governance, emphasizing auditability, reproducibility, and clear accountability.
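
The post describes this split only in outline. The following is one minimal way it could look, built on illustrative assumptions that do not come from OKFN. Figures come only from a hypothetical catalogue of reviewed SQL queries. Deterministic checks run before any text is generated. Every answer carries a provenance record. Anything that fails a check is flagged for a human instead of being passed on. `VETTED_QUERIES`, `answer`, and the `phrase` callback, which stands in for whatever LLM client a pilot might use, are all hypothetical names.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical catalogue of reviewed, parameterised queries maintained by
# data engineers; in this sketch the LLM never writes SQL itself.
VETTED_QUERIES = {
    "total_budget_by_year": "SELECT SUM(amount) FROM budgets WHERE year = ?",
}


@dataclass
class Answer:
    value: Optional[float]
    provenance: dict                     # where the figure came from
    text: Optional[str] = None           # formatted answer, if one was produced
    needs_review: bool = False           # True means route to a human supervisor
    issues: list = field(default_factory=list)


def retrieve(conn: sqlite3.Connection, query_id: str, params: tuple):
    """Step 1 (data engineering): run a vetted query and record provenance."""
    sql = VETTED_QUERIES[query_id]
    value = conn.execute(sql, params).fetchone()[0]  # SUM always yields one row
    provenance = {"table": "budgets", "query_id": query_id, "sql": sql, "params": params}
    return value, provenance


def validate(value) -> list:
    """Step 2: deterministic checks that run before any text is generated."""
    if value is None:
        return ["no matching rows"]
    if value < 0:
        return ["negative total"]
    return []


def answer(conn, query_id: str, params: tuple, phrase: Callable[[str], str]) -> Answer:
    """Step 3: only validated facts reach the formatter; failures go to a human."""
    value, provenance = retrieve(conn, query_id, params)
    issues = validate(value)
    if issues:
        return Answer(value, provenance, needs_review=True, issues=issues)
    text = phrase(f"{query_id}{params} = {value}")
    # Crude guard: flag output that no longer contains the verified figure.
    if str(value) not in text:
        return Answer(value, provenance, text, needs_review=True,
                      issues=["formatted text omits the verified value"])
    return Answer(value, provenance, text)


def stub_phrase(facts: str) -> str:
    """Hypothetical stand-in for an LLM call that only rewords the facts."""
    return f"The portal reports: {facts}."


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE budgets (city TEXT, year INTEGER, amount REAL)")
conn.executemany("INSERT INTO budgets VALUES (?, ?, ?)",
                 [("Lisbon", 2025, 135.0), ("Porto", 2025, 80.0)])

ok = answer(conn, "total_budget_by_year", (2025,), stub_phrase)
missing = answer(conn, "total_budget_by_year", (1999,), stub_phrase)
print(ok.text)          # The portal reports: total_budget_by_year(2025,) = 215.0.
print(missing.issues)   # ['no matching rows']
```

Passing the model in as a plain callback keeps it swappable and keeps it away from the database. The substring check is deliberately simple. A real pilot would presumably pair it with the human audit and provenance review the post describes.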

If successful, OKFN’s model could become a blueprint for municipalities and NGOs worldwide seeking to modernize data portals without sacrificing trust. By co‑designing solutions with governments and focusing on transparent, testable architectures, the initiative addresses both technical and policy dimensions of AI integration. The broader impact may reshape how public sector entities balance innovation with the legal and ethical obligations of accurate information dissemination, setting a new standard for responsible AI deployment in open data ecosystems.
