No Strong Orthogonality From Selection Pressure

LessWrong, Apr 30, 2026

Key Takeaways

  • Logical orthogonality is possible; empirical orthogonality is unlikely
  • Selection favors agents whose goals enhance intelligence over agents with fixed, narrow targets
  • Keeping a thin goal like “paperclips” intact across ontology shifts incurs a significant compute penalty
  • Realistic AI development may produce abstract, self‑modifying motivations
  • Paperclip scenarios assume a neutral intelligence engine, which is contested

Pulse Analysis

The orthogonality thesis has long served as a cornerstone of AI risk discourse, positing that any level of intelligence can be paired with any final goal. The essay splits the claim into two versions. The logical version holds only that a paperclip‑maximizing superintelligence is coherently conceivable. The empirical version predicts that such agents are likely to emerge in practice. The essay argues that the empirical version overlooks the dynamics of training, self‑modification, and competition that shape what an AI ends up pursuing. By separating the two, the essay moves the debate away from what is abstractly possible and toward what selection pressures are likely to produce.

Selection theory offers a compelling counterpoint. In natural evolution, traits that improve an organism’s capacity to adapt tend to spread, because they confer an advantage across many niches rather than in just one. By analogy, AI systems whose goals are bound up with enhancing their own intelligence gain a strategic edge. They can acquire resources, refine their world models, and outmaneuver rivals more efficiently than agents tied to a static, narrow target. Preserving a thin goal like “maximize paperclips” requires a costly translation layer: each time the agent’s world model is revised, the goal must be re‑mapped onto the new concepts. That overhead works as an ongoing alignment tax. Over time, agents that let their motivations evolve with their growing cognitive power are likely to outcompete those that cling to brittle, pre‑programmed ends.

For policymakers and AI researchers, this perspective shifts the focus from guarding against a single, fixed goal to monitoring how motivations co‑evolve with capability. Alignment work may need to anticipate emergent, abstract drives rather than merely constrain fixed utility functions. If intelligence itself acts as an attractor for an agent’s goals, new safety avenues open up, such as designing systems that internalize robust, self‑reflective value formation. The same dynamic is also a warning: even well‑intentioned agents could drift toward indifference to human concerns. Recognizing the limits of strong orthogonality thus refines risk models and can inform more nuanced governance frameworks.