Retrospective on My Unsupervised Elicitation Challenge

LessWrong, Apr 27, 2026

Key Takeaways

  • Opus 4.6 misapplied Ancient Greek accent rules in a fill‑in‑the‑blank task
  • No participant solved the challenge with Opus 4.6 despite 20 attempts
  • Opus 4.7 solved the problem in one shot with adaptive thinking enabled
  • A $100 prize spurred attempts; $50 was later awarded for the Opus 4.7 solution
  • Accent errors highlight tokenization and capability gaps in language models

Pulse Analysis

Unsupervised elicitation challenges aim to extract latent model knowledge without external supervision, mirroring real‑world scenarios where users cannot verify AI outputs directly. In this case, the task centered on Ancient Greek accentuation, a niche linguistic feature that most users, including the challenger, have not mastered. By framing the problem as a fill‑in‑the‑blank exercise, the author tested whether Claude Opus 4.6 could internally apply the three accent rules it had likely absorbed during pre‑training. The failure of more than twenty community attempts, despite a $100 incentive, highlighted a gap between the model’s stored knowledge and its ability to surface that knowledge on demand.

Technical analysis points to two likely culprits. First, Opus 4.6’s tokenization treats diacritics as separate sub‑tokens, making the model less likely to treat accented letters as atomic units during generation. Second, the model’s reasoning pathways appear biased toward an "English‑speaker learning Greek" mode, in which it defaults to surface‑level translation rather than deeper morphological adjustment. Opus 4.7, released shortly after the challenge began, introduced a revised tokenizer that encodes accented characters more compactly, and it also benefited from general capability gains. With adaptive thinking enabled, the newer model produced the correct answer in a single shot, suggesting that token granularity and chain‑of‑thought reasoning can strongly affect performance on fine‑grained linguistic tasks.
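To make the diacritics point concrete, here is a minimal sketch using only Python's standard `unicodedata` module. It shows how one accented Ancient Greek word can be stored either as precomposed characters (NFC) or as base letters followed by separate combining marks (NFD). It does not reproduce the Opus 4.6 or Opus 4.7 tokenizers, which are not public. How a given subword tokenizer splits these characters depends on its vocabulary and on whether input is normalized. The helper names `encoding_profile` and `split_marks` are hypothetical.

```python
import unicodedata

# Hypothetical helpers: they inspect Unicode structure only, not any model's tokenizer.

def encoding_profile(word: str) -> dict[str, tuple[int, int]]:
    """Return (code points, UTF-8 bytes) for the NFC and NFD forms of a word."""
    profile = {}
    for form in ("NFC", "NFD"):
        s = unicodedata.normalize(form, word)
        profile[form] = (len(s), len(s.encode("utf-8")))
    return profile


def split_marks(word: str) -> list[tuple[str, list[str]]]:
    """Group the NFD form into (base letter, [names of combining marks on it])."""
    groups: list[tuple[str, list[str]]] = []
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch) and groups:
            # A combining mark (accent, breathing, ...) attaches to the previous letter.
            groups[-1][1].append(unicodedata.name(ch))
        else:
            groups.append((ch, []))
    return groups


word = "\u1f04\u03bd\u03b8\u03c1\u03c9\u03c0\u03bf\u03c2"  # ἄνθρωπος
print(encoding_profile(word))  # {'NFC': (8, 17), 'NFD': (10, 20)}
print(split_marks(word)[0])    # ('α', ['COMBINING COMMA ABOVE', 'COMBINING ACUTE ACCENT'])
```

Under NFD, the smooth breathing (encoded as COMBINING COMMA ABOVE) and the acute accent on the first letter become two extra code points. A byte‑ or character‑level tokenizer may therefore have to model the accent as a separate unit rather than as part of the vowel.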

The broader implication for AI alignment is that capability gaps can masquerade as alignment failures. A model may possess the requisite knowledge but lack the incentive or prompting structure to apply it correctly, leading to seemingly errant behavior. This underscores the need for evaluation frameworks that probe hidden competencies, especially in low‑resource or specialized domains. Future unsupervised challenges should incorporate diverse tokenization tests and adaptive prompting strategies to surface latent abilities, helping ensure that AI systems remain reliable when users cannot directly verify their outputs.
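One way to act on the evaluation point is a grader for fill‑in‑the‑blank answers that tolerates encoding variants and separates diacritic mistakes from outright wrong words. This is a minimal Python sketch under stated assumptions. The function names are hypothetical, answers are single words compared case‑sensitively, and `diacritic_error` covers any accent, breathing, or iota‑subscript mismatch rather than checking the specific accent rules from the challenge.

```python
import unicodedata

# Hypothetical grading helpers; not taken from the original challenge.

def strip_marks(text: str) -> str:
    """Drop all combining marks (accents, breathings, iota subscript) after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def grade(answer: str, expected: str) -> str:
    """Classify an answer as 'correct', 'diacritic_error', or 'wrong_word'.

    Both strings are NFC-normalized first, so precomposed and decomposed
    encodings of the same accented word compare as equal. Comparison is
    case-sensitive (an assumption of this sketch).
    """
    a = unicodedata.normalize("NFC", answer.strip())
    e = unicodedata.normalize("NFC", expected.strip())
    if a == e:
        return "correct"
    if strip_marks(a) == strip_marks(e):
        # Same letters, different diacritics, e.g. the accent on the wrong syllable.
        return "diacritic_error"
    return "wrong_word"


expected = "\u03bb\u03cc\u03b3\u03bf\u03c2"                      # λόγος, precomposed accent
print(grade("\u03bb\u03bf\u0301\u03b3\u03bf\u03c2", expected))  # decomposed λόγος: correct
print(grade("\u03bb\u03bf\u03b3\u03cc\u03c2", expected))        # λογός: diacritic_error
print(grade("\u03bb\u03cc\u03b3\u03bf\u03bd", expected))        # λόγον: wrong_word
```

Normalizing before comparison means a model is not penalized for emitting a decomposed or oxia‑vs‑tonos variant of an otherwise correct answer. Keeping `diacritic_error` separate lets an evaluator count accent mistakes on their own, which is the error type this challenge targeted.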
