AI in Education #1: Five Things I Learned From Daisy Christodoulou About AI and Assessment

Eedi Newsletter
Mar 17, 2026

Key Takeaways

  • AI models can exploit superficial cues like essay length
  • Comparative judgement outperforms absolute scoring, reducing human error
  • Human oversight remains essential to prevent gaming and ensure trust
  • Personalised AI tutors face explanation limits, hallucinations, and screen fatigue
  • Start with a clear educational problem, not just AI capabilities

Summary

The blog recaps a conversation with Daisy Christodoulou of No More Marking, tracing AI's shift from scepticism in 2022 to practical use in education by 2025. Key insights include the pitfalls of surface‑level performance metrics, the superiority of comparative judgement over absolute scoring, and the necessity of a human‑in‑the‑loop. Christodoulou also highlights three unresolved challenges for AI tutors—explanations, hallucinations, and screen time—and urges schools to define the problem before adopting technology.

Pulse Analysis

The rapid evolution of artificial intelligence in education has moved from a decade‑long wait to tangible classroom impact within three years. Early attempts at automated essay scoring relied on simple proxies such as word count, inflating performance statistics while masking fragile models. Today, educators and vendors are scrutinising the underlying algorithms, demanding qualitative analyses of where human and machine judgements disagree rather than headline agreement rates. This shift underscores the importance of transparent validation frameworks that can survive adversarial behaviour and maintain stakeholder confidence.

Comparative judgement, where AI evaluates pairs of student work rather than assigning absolute scores, has emerged as a surprisingly robust approach. By mirroring the way humans naturally rank quality, AI reduces bias linked to handwriting or superficial features, and large‑scale analyses reveal that many disagreements stem from human error, not machine fault. Schools can therefore leverage a hybrid model—often 10% human review and 90% AI—to scale assessment while preserving teacher engagement, professional growth, and a safety net against unforeseen model failures.
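Under the hood, comparative judgement typically converts many pairwise "which is better?" decisions into a single quality scale using a model such as Bradley–Terry. As a rough sketch only (this is not No More Marking's actual pipeline; the item names, the smoothing constant, and the iteration count are illustrative assumptions):

```python
from collections import Counter

def bradley_terry(items, comparisons, iters=200, smoothing=0.5):
    """Estimate a quality score per item from pairwise judgements.

    comparisons: list of (winner, loser) tuples from paired judgements.
    smoothing: adds a dummy opponent of strength 1 that each item beats
    and loses to `smoothing` times, so zero-win items keep a positive score.
    """
    wins = Counter(w for w, _ in comparisons)
    strength = {i: 1.0 for i in items}
    for _ in range(iters):
        updated = {}
        for i in items:
            # Denominator of the standard Bradley-Terry MM update.
            denom = 2 * smoothing / (strength[i] + 1.0)
            for w, l in comparisons:
                if i in (w, l):
                    other = l if i == w else w
                    denom += 1.0 / (strength[i] + strength[other])
            updated[i] = (wins[i] + smoothing) / denom
        # Normalise so the mean strength stays at 1 between iterations.
        mean = sum(updated.values()) / len(items)
        strength = {i: s / mean for i, s in updated.items()}
    return strength  # higher strength = judged better overall

# Hypothetical example: three essays, five paired judgements.
judgements = [("A", "B"), ("A", "B"), ("B", "C"), ("B", "C"), ("A", "C")]
scores = bradley_terry(["A", "B", "C"], judgements)
ranking = sorted(scores, key=scores.get, reverse=True)  # ["A", "B", "C"]
```

Because each judgement is only a relative comparison, superficial features like handwriting or length carry less weight than they do in absolute mark schemes, which is part of why the approach proves robust.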

Despite these advances, personalised AI tutoring still confronts three critical hurdles: insufficient explanatory depth, persistent hallucinations, and the risk of excessive screen exposure. Even a 0.14% error rate translates to thousands of mistakes across a national student body, highlighting the diminishing returns of incremental accuracy gains. Policymakers must balance the promise of data‑driven instruction with pedagogical realities, ensuring technology augments rather than dominates learning environments. By anchoring AI deployments to clearly defined educational problems, districts can harness the benefits of speed and precision while safeguarding the human elements that drive long‑term academic success.
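To see why even a small error rate matters at scale, a back‑of‑envelope calculation helps (the cohort size and interaction volume below are illustrative assumptions, not figures from the conversation):

```python
# Scaling a small per-response error rate to a national volume of
# AI tutor interactions. All inputs are hypothetical.
error_rate = 0.0014            # the 0.14% error rate cited above
students = 500_000             # assumed national cohort using AI tutors
responses_per_student = 100    # assumed tutor responses per student per year

total_errors = error_rate * students * responses_per_student
print(f"{total_errors:,.0f} erroneous responses per year")  # 70,000
```

Under these assumptions the system produces tens of thousands of mistaken responses a year, which is why each incremental accuracy gain buys less reassurance than the headline percentage suggests.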

