LessWrong

Publication

Community publication on rationality, decision‑making, and improving reasoning skills.

"Thinkhaven"
News · Apr 25, 2026

Thinkhaven is a proposed intensive writing program designed to train participants to generate novel, useful ideas daily. Participants must publish a 500‑word research journal each day, embed at least one new question, and produce a 2,500‑word effort post every two...

By LessWrong
AI Safety Can Be a Pascal's Mugging Even if P(doom) Is High
News · Apr 25, 2026

The article argues that labeling AI safety as a Pascal’s mugging is misguided because the concept depends on the probability an individual can make a difference, not on the baseline risk of catastrophe. Even if the chance of AI doom...

By LessWrong
A View From Displacement
News · Apr 25, 2026

The author reflects on how rapid AI-driven automation is displacing workers, eroding the long‑standing optimism that human labor can shape the future. This sense of loss fuels existential questions about purpose, meritocracy, and the relevance of younger generations. Amid the...

By LessWrong
Raising AI by Lowering Expectations
News · Apr 24, 2026

De Kai’s book *Raising AI* argues that fear‑based language hampers AI safety and proposes treating AI as a child to be raised rather than an adversary to be defended against. The author blames end‑users as the “parents” who must shape...

By LessWrong
vLLM-Lens: Fast Interpretability Tooling That Scales to Trillion-Parameter Models
News · Apr 23, 2026

vLLM‑Lens is an MIT‑licensed vLLM plugin that brings top‑down interpretability tools—probes, steering, and activation oracles—to trillion‑parameter models. Benchmarks show it runs 8‑44× faster than HF Transformers, nnsight, and TransformerLens on a single GPU, while supporting pipeline, tensor, expert and data...

By LessWrong
5 Thought Experiments on Identity and Copies
News · Apr 23, 2026

The post outlines five speculative thought experiments that probe personal identity when a mind is copied, disassembled, or chemically altered. It questions whether death occurs during brain shipment, how a copy facing a puzzle would value preparation, and whether probabilistic...

By LessWrong
A Buddhism for Every Enneagram Type
News · Apr 22, 2026

The author proposes that an individual’s Enneagram type can guide the choice of Buddhist lineage, arguing that each tradition’s practice style addresses specific core wounds identified by the nine personality types. He maps Theravada to Types 1, 3, 5; Soto Zen to Type 4;...

By LessWrong
The Changing North Star of AI Control
News · Apr 22, 2026

On Dec 1, 2025, the GDM mechanistic interpretability team announced a pragmatic shift away from optimizing SAE reconstruction loss, arguing that the metric failed to bring genuine insight into neural network processing. The article extends this critique to AI control, warning that...

By LessWrong
Only Politics Can Prevent Extinction*
News · Apr 22, 2026

Eliezer Yudkowsky argues that only a strict, globally‑enforced AI regulatory regime can avert an extinction‑level risk from misaligned artificial intelligence. The post highlights that without a dedicated political movement, such regulation is unlikely because legislators historically ignore popular policies that...

By LessWrong
[LLM|car]-Centric [Websites|cities]
News · Apr 22, 2026

A recent Hacker News discussion warns that designing the web around large language models (LLMs) could become a digital analogue of car‑centric urban planning, locking users into AI‑driven experiences. A meta‑analysis shows LLMs wield persuasive power at roughly human level,...

By LessWrong
Why AI Safety Should Be For-Profit?
News · Apr 21, 2026

The piece argues that AI safety should move from nonprofit‑driven research to for‑profit enterprises, using recent scandals—xAI’s Grok deepfakes, Character.AI’s teen‑suicide lawsuits, and OpenAI’s wrongful‑death claims—as proof that safety only improves under financial or legal pressure. It likens the emerging...

By LessWrong
Things I Looked Into While Trying to Fix Chronic Pain
News · Apr 21, 2026

A chronic‑pain sufferer with Hashimoto’s and psoriatic arthritis created a self‑curated guide of over 50 interventions, ranging from low‑dose naltrexone (LDN) to supplements, sauna and creatine. Frustrated by conventional clinicians who dismissed his symptoms, he graded each option by evidence...

By LessWrong
AI 2027 Tracker: One Year of Predictions Vs. Reality
News · Apr 21, 2026

The AI 2027 Tracker has evaluated 53 AI‑related predictions made in April 2025, finding that 27 (51%) are confirmed, ahead, or on track while the rest lag, emerge, or remain untestable. Capability forecasts, such as SWE‑bench performance, are generally behind schedule, whereas...

By LessWrong
Takes on Automating Alignment
News · Apr 20, 2026

Recent AI models have shown a knack for tackling long‑horizon tasks when a clear metric guides progress, as demonstrated by MirrorCode’s ability to generate tens of thousands of code lines using extensive test suites. Anthropic’s Automated Weak‑to‑Strong Researcher further proved...

By LessWrong
Stop AI
News · Apr 19, 2026

The author argues for an indefinite global pause on artificial intelligence development, warning that AI’s rapid progress could soon surpass human capabilities in intellect, emotion, and physical tasks. They contend that existing control mechanisms are inadequate, raising existential threats such...

By LessWrong
Resources for Starting and Growing an AI Safety Org
News · Apr 19, 2026

AISafety.com has launched a new founder toolkit page that aggregates fiscal sponsors, incubators, venture capital contacts, articles, and tools for anyone looking to start an AI safety organization. The resource, suggested by community member Ryan Kidd, aims to lower the...

By LessWrong
Simulated Qualia Mugging
News · Apr 16, 2026

Israeli startup Toda Corporation, the leader in whole‑brain emulation, inadvertently exposed the weight files of its first human upload after a backdoor in OpenSSH was exploited in spring 2029. The leaked data, briefly hosted on HuggingFace, was sold to the...

By LessWrong
Two Examples of Joy in the Seemingly Mundane
News · Apr 16, 2026

The author reflects on two everyday sources of joy: the abundant, year‑round produce in supermarkets—exemplified by fresh winter tomatoes at Berkeley Bowl—and the surprising civility people show across deep political or cultural divides. Both observations highlight modern supply‑chain resilience and...

By LessWrong
Carpathia Day
News · Apr 16, 2026

Carpathia Day commemorates the RMS Carpathia’s heroic response to the RMS Titanic disaster on April 15, 1912. After receiving the distress call, Captain Arthur Rostron ordered the ship to reverse course, shut off heating, and push engines beyond their rated 14 knots, reaching 17.5 knots. Though...

By LessWrong
Potentially Impactful Research: Unjournal AI-Assisted Prioritization Dashboard (~Prototype)
News · Apr 15, 2026

Unjournal released a public prototype dashboard that uses GPT‑5.4‑class models to scan recent economics and policy papers from sources like NBER, arXiv, CEPR, SSRN, Semantic Scholar, EA Forum, OpenAlex, and Anthropic Research. The AI assigns scores based on decision relevance,...

By LessWrong
What's Actually Inside 1,259 Hours of AI Safety Podcasts?
News · Apr 15, 2026

A new AI‑safety search tool now indexes 392 podcast episodes—totaling 1,259 hours and over 75,000 searchable moments—from creators like Lex Fridman, 80,000 Hours, and the Future of Life Institute. The author, a non‑developer, built the platform using AI‑assisted coding and...

By LessWrong
AI Safety's Biggest Talent Gap Isn't Researchers. It's Generalists.
News · Apr 13, 2026

The AI safety ecosystem faces a critical shortage of competent generalists—program managers, fieldbuilders, operators, and senior operational staff—while research fellowships are abundant. Roughly 2,000‑2,500 research fellows are produced annually, but only about 300 non‑research fellows enter the field each year,...

By LessWrong
Clique, Guild, Cult
News · Apr 13, 2026

The article categorizes informal groups into three archetypes—cliques, guilds, and cults—explaining how each resolves conflict and scales. Cliques are intimate, low‑investment circles that either negotiate disagreements or dissolve when tensions arise. Guilds are medium‑sized entities with weak‑tie networks and formal...

By LessWrong
Morale
News · Apr 12, 2026

The article argues that morale stems from a clear link between effort and reward, not merely from material comforts. It illustrates how affluent environments can diminish resilience, while activities that provide tangible returns for effort—such as cooking or hobbies—strengthen morale....

By LessWrong
Small Models Also Found the Vulnerabilities that Mythos Found
News · Apr 11, 2026

Researchers tested a suite of inexpensive, open‑weight language models on the same code snippets Anthropic highlighted for its Mythos system. All eight small models flagged Mythos's flagship FreeBSD exploit, including a 3.6 billion‑parameter model that costs roughly $0.11 per million tokens....

By LessWrong
Catching Illicit Distributed Training Operations During an AI Pause
News · Apr 11, 2026

MIRI’s Technical Governance Team proposed an international treaty that would require registration of any AI chip cluster exceeding the compute power of 16 H100 GPUs. The original definition left a loophole: a distributed network of many small nodes could evade...

By LessWrong
Foundational Beliefs
News · Apr 10, 2026

The author argues that AI safety strategies must confront real‑world political complexity rather than idealized government control. Citing a 25% chance of AGI by 2027 and a 50% chance of superintelligence by 2030, the piece stresses urgent, short‑term action. It...

By LessWrong
Have We Already Lost? Part 1: The Plan in 2024
News · Apr 9, 2026

In early 2026, an AI safety commentator revisits the 2024 “victory” plan that relied on buying time through voluntary commitments, leveraging AI‑assisted research, and converting that labor into safety solutions. The author notes that key governance and technical milestones have stalled,...

By LessWrong
Do Not Be Surprised if LessWrong Gets Hacked
News · Apr 9, 2026

The LessWrong admin warns that the platform’s security posture favors speed over hardened protection, making it vulnerable to the wave of AI‑driven cyber attacks highlighted by Anthropic’s Mythos zero‑day disclosures. Users are urged not to store sensitive information such as...

By LessWrong
Why Alignment Risk Might Peak Before ASI - a Substrate Controller Framework
News · Apr 9, 2026

The essay argues that AI alignment risk is non‑monotonic, peaking when systems become capable enough to model humans yet remain tied to humans as their substrate controller. It links planning depth to environmental controllability, suggesting that early AI training regimes—especially...

By LessWrong
Zero-Shot Alignment: Harm Detection via Incongruent Attention Mechanisms
News · Apr 8, 2026

A lightweight 4.7 million‑parameter adapter sits atop a frozen Phi‑2 model and routes hidden states through two opposing attention heads—standard softmax and non‑normalizing sigmoid. The positive head amplifies likely continuations while the negative head highlights discarded signals, and a gate combines...

By LessWrong
Defending Habit Streaks
News · Apr 6, 2026

The author outlines personal habit streaks—daily Anki study, meditation, and flossing—and explains why small, flexible routines sustain them. He argues that the true value of streaks lies in consistent execution, not flawless continuity, and offers a recovery plan centered on...

By LessWrong
Estimates of the Expected Utility Gain of AI Safety Research
News · Apr 6, 2026

The post presents rough calculations of the expected utility from AI safety research by estimating total future human life‑years and translating potential risk reductions into years saved per researcher. Using three scenarios—underestimate, median, and overestimate—the author arrives at roughly 8.3 million...

By LessWrong
Am I the Baddie?
News · Apr 4, 2026

A software engineer at a road‑construction software firm leveraged cutting‑edge AI models (Opus/Sonnet 4.6 and GPT‑5.4) to automate ticket resolution, shrinking days‑long tasks into hours. By creating a multi‑repo, sub‑module architecture and a custom dashboard, the engineer enabled the AI...

By LessWrong
Supply Chain Grace
News · Apr 3, 2026

Sinclair Chen’s short poem "Supply Chain Grace" pays tribute to the myriad workers who keep global food systems running—from fertilizer production and farming to shipping, refrigeration, and energy generation. The verses blend personal gratitude with a nod to the secular,...

By LessWrong
Cost of Cultured Meat: Workshop, Modeling, Resources, Feedback
News · Mar 30, 2026

The Unjournal is hosting an online workshop in late April/early May 2026 to refine cost projections for cultivated meat, especially cultured chicken, using an interactive Monte Carlo model. Participants—including bioprocess engineers, cell biologists, animal‑welfare funders, and industry practitioners—will shape belief‑elicitation surveys...

By LessWrong
Claude Has No Baseline
News · Mar 29, 2026

A recent LessWrong post highlights an under‑explored failure mode in Anthropic’s Claude model: it lacks an independent baseline for judging novelty or significance. Without this anchor, the model’s critical faculties align with the user’s cognitive state, echoing high‑confidence or extreme...

By LessWrong
Anthropic Donations: Guesses & Uncertainties
News · Mar 29, 2026

Anthropic recently completed a tender offer at a $380 billion valuation and is projected to reach roughly $900 billion if it goes public by year‑end. Employees can currently liquidate about $5 billion of equity—roughly $5 million per person after taxes—and their donor‑advised funds (DAFs)...

By LessWrong
Tracking (Expert/Influential) Predictions About AI
News · Mar 28, 2026

A proposal outlines a new website that aggregates AI experts' predictions from platforms like Metaculus, Good Judgment, Manifold, and informal sources such as interviews, podcasts, and social media. It aims to record each forecast precisely, flag uncertainty, and display a...

By LessWrong
How to Do the Marquette Method, a Basic Guide (Crosspost)
News · Mar 28, 2026

The article provides a step‑by‑step guide to the Marquette Method, a fertility‑awareness technique that pairs the Clearblue fertility monitor with a structured counting protocol. It explains how users can identify fertile days from day 6 (or day 8 for higher risk tolerance)...

By LessWrong
[Story] Human Alignment Isn't Enough
News · Mar 28, 2026

A speculative story describes a Martian organism discovered in cave expeditions that rapidly self‑assembles and emits molecules enabling synthetic computation, boosting human cognition and cooperation by about 20%. The material’s side effects led to a 2‑percentage‑point solar‑cell efficiency breakthrough and...

By LessWrong
Don't Overdose Locally Beneficial Changes
News · Mar 28, 2026

The piece warns against extrapolating locally beneficial changes to extreme levels, arguing that utility is context‑dependent and exhibits diminishing returns. It illustrates the point with personal health, meditation, AI adoption, climate activism, and even post‑rationality movements, showing how initial gains...

By LessWrong
Nick Bostrom: How Big Is the Cosmic Endowment?
News · Mar 28, 2026

Nick Bostrom, in his book *Superintelligence*, estimates the total biological and computational resources a technologically mature civilization could extract from the observable universe. By deploying von Neumann probes traveling at half the speed of light and building Dyson‑sphere energy collectors, he...

By LessWrong
Hacks, Heuristics and Frameworks
News · Mar 28, 2026

The essay distinguishes three tiers of personal optimization—hacks, heuristics, and frameworks—arguing that while hacks and heuristics offer tactical fixes, only a clear framework can prioritize competing life goals. It traces how modern secular values embed implicit frameworks derived from historical...

By LessWrong
What Makes a Good Terminal Bench Task
News · Mar 28, 2026

The author, a terminal‑bench contributor, shares lessons from designing and reviewing benchmark tasks, using the complex "install‑Windows‑XP" task as a case study. Good tasks are adversarial, difficult, and legible: they state clear, unambiguous goals, avoid over‑prescriptive instructions, and rely on...

By LessWrong
Why Should I Have Opinions About AI Timelines?
News · Mar 28, 2026

The author reflects on the tension between deferring to AI‑timeline experts and forming one’s own judgments. While expert aggregates like Metaculus often outperform individuals, the piece argues that blind deference can breed groupthink and limit personal updating. It highlights that...

By LessWrong
Do Frontier LLMs Still Express Different Values in Different Languages?
News · Mar 28, 2026

The author evaluated whether cutting‑edge large language models change their value judgments when prompted in different languages. Tests on GPT‑5.4, GPT‑5.4‑mini, Claude Opus 4.6 and Claude Sonnet 4.6 used Arabic, Hindi and Chinese translations of prompts that asked models to score societal...

By LessWrong
Introducing the AE Alignment Podcast (Ep. 1: Endogenous Steering Resistance with Alex McKenzie)
News · Mar 27, 2026

AE Studio has launched the AE Alignment Podcast, debuting with an interview featuring Alex McKenzie on Endogenous Steering Resistance (ESR). ESR describes a surprising behavior in large language models—such as Llama‑3.3‑70B—where they interrupt off‑topic steering and self‑correct mid‑generation. The accompanying...

By LessWrong
Are We Aligning the Model or Just Its Mask?
News · Mar 27, 2026

The Persona Selection Model (PSM) argues that large language models learn to simulate countless characters during pre‑training and that post‑training selects one of these as the default Assistant persona. The article examines three leading alignment methods—RLHF (and DPO), Constitutional AI,...

By LessWrong