Tracking Difficulty with Feature Portfolios

LessWrong • May 19, 2026

Key Takeaways

• Human completion time alone fails to capture task difficulty
• Portfolio of attributes improves explanatory power for AI capability forecasts
• Combined human‑AI cost is a measurable, interpretable difficulty proxy
• Cost metric accommodates AI assistance and skill variation
• Tracking multiple features enables better error analysis and model improvement

Pulse Analysis

Forecasts of when artificial intelligence will master real‑world tasks often rely on the "time‑horizon" method, which plots model performance against the time a human needs to complete each task. Human time is easy to measure and intuitive, but the article highlights its growing shortcomings: tasks are becoming longer, fuzzier, and increasingly AI‑assisted, which makes pure time estimates unstable and insufficient for reliable predictions. As AI systems take on more complex workflows, the correlation between human‑only duration and true task difficulty weakens, prompting researchers to seek richer, quantifiable signals.

The proposed solution is a multi‑dimensional portfolio of task attributes. By recording not just human completion time but also human‑AI combined time, labor costs, monetary value, and even the bits of human input fed to the model, analysts can regress true difficulty against a broader feature set. This approach uncovers latent factors that single‑metric models miss and sharpens error analysis by flagging tasks where residuals remain high. Among the suggested metrics, combined human‑AI cost stands out: it translates both human labor rates and AI inference expenses into a single, comparable figure, sidestepping the need for artificial baselines that exclude AI assistance.

Adopting a cost‑centric difficulty metric has practical implications for AI labs, investors, and policymakers. It enables more realistic benchmark design, allowing participants to use their usual tooling—including agents—without distorting results. Moreover, cost‑based forecasts can be normalized for performance‑adjusted inference, offering a dynamic view as AI compute prices fall. Ultimately, a portfolio‑driven methodology promises sturdier, less optimistic forecasts, helping stakeholders allocate resources, set regulatory expectations, and track progress toward truly autonomous AI capabilities.
