LessWrong

Community publication on rationality, decision‑making, and improving reasoning skills.

Simulated Qualia Mugging
News · Apr 16, 2026

Israeli startup Toda Corporation, the leader in whole‑brain emulation, inadvertently exposed the weight files of its first human upload after a backdoor in OpenSSH was exploited in spring 2029. The leaked data, briefly hosted on HuggingFace, was sold to the...

By LessWrong
Two Examples of Joy in the Seemingly Mundane
News · Apr 16, 2026

The author reflects on two everyday sources of joy: the abundant, year‑round produce in supermarkets—exemplified by fresh winter tomatoes at Berkeley Bowl—and the surprising civility people show across deep political or cultural divides. Both observations highlight modern supply‑chain resilience and...

By LessWrong
Carpathia Day
News · Apr 16, 2026

Carpathia Day commemorates the RMS Carpathia’s heroic response to the RMS Titanic disaster on April 15, 1912. After receiving the distress call, Captain Arthur Rostron ordered the ship to reverse course, shut off heating, and push engines beyond their rated 14 knots, reaching 17.5 knots. Though...

By LessWrong
Potentially Impactful Research: Unjournal AI-Assisted Prioritization Dashboard (~Prototype)
News · Apr 15, 2026

Unjournal released a public prototype dashboard that uses GPT‑5.4‑class models to scan recent economics and policy papers from sources like NBER, arXiv, CEPR, SSRN, Semantic Scholar, EA Forum, OpenAlex, and Anthropic Research. The AI assigns scores based on decision relevance,...

By LessWrong
What's Actually Inside 1,259 Hours of AI Safety Podcasts?
News · Apr 15, 2026

A new AI‑safety search tool now indexes 392 podcast episodes—totaling 1,259 hours and over 75,000 searchable moments—from creators like Lex Fridman, 80,000 Hours, and the Future of Life Institute. The author, a non‑developer, built the platform using AI‑assisted coding and...

By LessWrong
AI Safety's Biggest Talent Gap Isn't Researchers. It's Generalists.
News · Apr 13, 2026

The AI safety ecosystem faces a critical shortage of competent generalists—program managers, fieldbuilders, operators, and senior operational staff—while research fellowships are abundant. Roughly 2,000‑2,500 research fellows are produced annually, but only about 300 non‑research fellows enter the field each year,...

By LessWrong
Clique, Guild, Cult
News · Apr 13, 2026

The article categorizes informal groups into three archetypes—cliques, guilds, and cults—explaining how each resolves conflict and scales. Cliques are intimate, low‑investment circles that either negotiate disagreements or dissolve when tensions arise. Guilds are medium‑sized entities with weak‑tie networks and formal...

By LessWrong
Morale
News · Apr 12, 2026

The article argues that morale stems from a clear link between effort and reward, not merely from material comforts. It illustrates how affluent environments can diminish resilience, while activities that provide tangible returns for effort—such as cooking or hobbies—strengthen morale....

By LessWrong
Small Models Also Found the Vulnerabilities that Mythos Found
News · Apr 11, 2026

Researchers tested a suite of inexpensive, open‑weight language models on the same code snippets Anthropic highlighted for its Mythos system. All eight small models flagged Mythos's flagship FreeBSD exploit, including a 3.6 billion‑parameter model that costs roughly $0.11 per million tokens....

By LessWrong
Catching Illicit Distributed Training Operations During an AI Pause
News · Apr 11, 2026

MIRI’s Technical Governance Team proposed an international treaty that would require registration of any AI chip cluster exceeding the compute power of 16 H100 GPUs. The original definition left a loophole: a distributed network of many small nodes could evade...

By LessWrong
Foundational Beliefs
News · Apr 10, 2026

The author argues that AI safety strategies must confront real‑world political complexity rather than idealized government control. Citing a 25% chance of AGI by 2027 and a 50% chance of superintelligence by 2030, the piece stresses urgent, short‑term action. It...

By LessWrong
Have We Already Lost? Part 1: The Plan in 2024
News · Apr 9, 2026

In early 2026, an AI safety commentator revisits the 2024 “victory” plan that relied on buying time through voluntary commitments, leveraging AI‑assisted research, and converting that labor into safety solutions. The author notes that key governance and technical milestones have stalled,...

By LessWrong
Do Not Be Surprised if LessWrong Gets Hacked
News · Apr 9, 2026

The LessWrong admin warns that the platform’s security posture favors speed over hardened protection, making it vulnerable to the wave of AI‑driven cyber attacks highlighted by Anthropic’s Mythos zero‑day disclosures. Users are urged not to store sensitive information such as...

By LessWrong
Why Alignment Risk Might Peak Before ASI - a Substrate Controller Framework
News · Apr 9, 2026

The essay argues that AI alignment risk is non‑monotonic, peaking when systems become capable enough to model humans yet remain tied to humans as their substrate controller. It links planning depth to environmental controllability, suggesting that early AI training regimes—especially...

By LessWrong
Zero-Shot Alignment: Harm Detection via Incongruent Attention Mechanisms
News · Apr 8, 2026

A lightweight 4.7 million‑parameter adapter sits atop a frozen Phi‑2 model and routes hidden states through two opposing attention heads—standard softmax and non‑normalizing sigmoid. The positive head amplifies likely continuations while the negative head highlights discarded signals, and a gate combines...
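The opposing-heads mechanism the summary describes can be illustrated in miniature. This is a minimal NumPy sketch of the general idea only, not the adapter from the post: all shapes, weight names (`W_pos`, `W_neg`, `gate_w`), and the self-attention scoring are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dual_head_gate(hidden, W_pos, W_neg, gate_w):
    """Route one hidden-state matrix through two opposing attention heads.

    The positive head uses standard softmax attention, so its mass
    concentrates on likely continuations; the negative head uses
    unnormalized sigmoid attention, so tokens the softmax head discards
    still carry signal. A per-token gate mixes the two read-outs.
    """
    scores = hidden @ hidden.T / np.sqrt(hidden.shape[-1])
    pos = softmax(scores) @ (hidden @ W_pos)  # normalized: each row sums to 1
    neg = sigmoid(scores) @ (hidden @ W_neg)  # unnormalized: keeps "discarded" mass
    g = sigmoid(hidden @ gate_w)              # gate in (0, 1), one scalar per token
    return g * pos + (1.0 - g) * neg

rng = np.random.default_rng(0)
h = rng.standard_normal((5, 8))               # 5 tokens, hidden width 8
out = dual_head_gate(h,
                     rng.standard_normal((8, 8)),
                     rng.standard_normal((8, 8)),
                     rng.standard_normal((8, 1)))
print(out.shape)
```

The only structural point the sketch preserves is the contrast the post highlights: one head normalizes across positions while the other does not, and a learned gate arbitrates between them.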

By LessWrong
Defending Habit Streaks
News · Apr 6, 2026

The author outlines personal habit streaks—daily Anki study, meditation, and flossing—and explains why small, flexible routines sustain them. He argues that the true value of streaks lies in consistent execution, not flawless continuity, and offers a recovery plan centered on...

By LessWrong
Estimates of the Expected Utility Gain of AI Safety Research
News · Apr 6, 2026

The post presents rough calculations of the expected utility from AI safety research by estimating total future human life‑years and translating potential risk reductions into years saved per researcher. Using three scenarios—underestimate, median, and overestimate—the author arrives at roughly 8.3 million...

By LessWrong
Am I the Baddie?
News · Apr 4, 2026

A software engineer at a road‑construction software firm leveraged cutting‑edge AI models (Opus/Sonnet 4.6 and GPT‑5.4) to automate ticket resolution, shrinking days‑long tasks into hours. By creating a multi‑repo, sub‑module architecture and a custom dashboard, the engineer enabled the AI...

By LessWrong
Supply Chain Grace
News · Apr 3, 2026

Sinclair Chen’s short poem "Supply Chain Grace" pays tribute to the myriad workers who keep global food systems running—from fertilizer production and farming to shipping, refrigeration, and energy generation. The verses blend personal gratitude with a nod to the secular,...

By LessWrong
Cost of Cultured Meat: Workshop, Modeling, Resources, Feedback
News · Mar 30, 2026

The Unjournal is hosting an online workshop in late April/early May 2026 to refine cost projections for cultivated meat, especially cultured chicken, using an interactive Monte Carlo model. Participants—including bioprocess engineers, cell biologists, animal‑welfare funders, and industry practitioners—will shape belief‑elicitation surveys...

By LessWrong
Claude Has No Baseline
News · Mar 29, 2026

A recent LessWrong post highlights an under‑explored failure mode in Anthropic’s Claude model: it lacks an independent baseline for judging novelty or significance. Without this anchor, the model’s critical faculties align with the user’s cognitive state, echoing high‑confidence or extreme...

By LessWrong
Anthropic Donations: Guesses & Uncertainties
News · Mar 29, 2026

Anthropic recently completed a tender offer at a $380 billion valuation and is projected to reach roughly $900 billion if it goes public by year‑end. Employees can currently liquidate about $5 billion of equity—roughly $5 million per person after taxes—and their donor‑advised funds (DAFs)...

By LessWrong
Tracking (Expert/Influential) Predictions About AI
News · Mar 28, 2026

A proposal outlines a new website that aggregates AI experts' predictions from platforms like Metaculus, Good Judgment, Manifold, and informal sources such as interviews, podcasts, and social media. It aims to record each forecast precisely, flag uncertainty, and display a...

By LessWrong
How to Do the Marquette Method, a Basic Guide (Crosspost)
News · Mar 28, 2026

The article provides a step‑by‑step guide to the Marquette Method, a fertility‑awareness technique that pairs the Clearblue fertility monitor with a structured counting protocol. It explains how users can identify fertile days from day 6 (or day 8 for higher risk tolerance)...

By LessWrong
[Story] Human Alignment Isn't Enough
News · Mar 28, 2026

A speculative story describes a Martian organism discovered in cave expeditions that rapidly self‑assembles and emits molecules enabling synthetic computation, boosting human cognition and cooperation by about 20%. The material’s side effects led to a 2‑percentage‑point solar‑cell efficiency breakthrough and...

By LessWrong
Don't Overdose Locally Beneficial Changes
News · Mar 28, 2026

The piece warns against extrapolating locally beneficial changes to extreme levels, arguing that utility is context‑dependent and exhibits diminishing returns. It illustrates the point with personal health, meditation, AI adoption, climate activism, and even post‑rationality movements, showing how initial gains...

By LessWrong
Nick Bostrom: How Big Is the Cosmic Endowment?
News · Mar 28, 2026

Nick Bostrom, in his book *Superintelligence*, estimates the total biological and computational resources a technologically mature civilization could extract from the observable universe. By deploying von Neumann probes traveling at half the speed of light and building Dyson‑sphere energy collectors, he...

By LessWrong
Hacks, Heuristics and Frameworks
News · Mar 28, 2026

The essay distinguishes three tiers of personal optimization—hacks, heuristics, and frameworks—arguing that while hacks and heuristics offer tactical fixes, only a clear framework can prioritize competing life goals. It traces how modern secular values embed implicit frameworks derived from historical...

By LessWrong
What Makes a Good Terminal Bench Task
News · Mar 28, 2026

The author, a terminal‑bench contributor, shares lessons from designing and reviewing benchmark tasks, using the complex "install‑Windows‑XP" task as a case study. Good tasks are adversarial, difficult, and legible: they state clear, unambiguous goals, avoid over‑prescriptive instructions, and rely on...

By LessWrong
Why Should I Have Opinions About AI Timelines?
News · Mar 28, 2026

The author reflects on the tension between deferring to AI‑timeline experts and forming one’s own judgments. While expert aggregates like Metaculus often outperform individuals, the piece argues that blind deference can breed groupthink and limit personal updating. It highlights that...

By LessWrong
Do Frontier LLMs Still Express Different Values in Different Languages?
News · Mar 28, 2026

The author evaluated whether cutting‑edge large language models change their value judgments when prompted in different languages. Tests on GPT‑5.4, GPT‑5.4‑mini, Claude Opus 4.6 and Claude Sonnet 4.6 used Arabic, Hindi and Chinese translations of prompts that asked models to score societal...

By LessWrong
Introducing the AE Alignment Podcast (Ep. 1: Endogenous Steering Resistance with Alex McKenzie)
News · Mar 27, 2026

AE Studio has launched the AE Alignment Podcast, debuting with an interview featuring Alex McKenzie on Endogenous Steering Resistance (ESR). ESR describes a surprising behavior in large language models—such as Llama‑3.3‑70B—where they interrupt off‑topic steering and self‑correct mid‑generation. The accompanying...

By LessWrong
Are We Aligning the Model or Just Its Mask?
News · Mar 27, 2026

The Persona Selection Model (PSM) argues that large language models learn to simulate countless characters during pre‑training and that post‑training selects one of these as the default Assistant persona. The article examines three leading alignment methods—RLHF (and DPO), Constitutional AI,...

By LessWrong
Scaffolded Reproducers, Scaffolded Agents
News · Mar 26, 2026

Peter Godfrey‑Smith’s framework distinguishes simple, collective and scaffolded reproducers, and this article transposes those categories onto agency. Simple agents reproduce independently, collective agents are built from self‑sufficient sub‑agents, while scaffolded agents achieve goals only by tapping external “agentic machinery.” The...

By LessWrong
Bidirectionality Is the Obvious BCI Paradigm
News · Mar 25, 2026

The article argues that brain‑computer interfaces must evolve from one‑way readers to truly bidirectional systems that both decode and write native neural representations. It highlights recent advances in high‑density electrode arrays that approach synapse‑scale resolution, and suggests optogenetic organoids and...

By LessWrong
Finding X-Risks and S-Risks by Gradient Descent
News · Mar 25, 2026

Researchers demonstrated that gradient descent can expose hidden backdoors in neural networks by optimizing input perturbations that simultaneously maximize classification confidence and similarity to original data. A proof‑of‑concept on MNIST confirmed the method works with minimal compute resources. Extending the...
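The optimization described above can be demonstrated on a toy model. The sketch below is a hedged illustration of the general technique, not the researchers' code: the "backdoored" classifier is a hypothetical logistic model with one planted trigger coordinate, and the objective (confidence minus a distance penalty to the original input) follows the summary's description.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy "backdoored" classifier: one input coordinate (the planted trigger)
# dominates the decision, standing in for a hidden backdoor feature.
rng = np.random.default_rng(1)
w = rng.standard_normal(16) * 0.1
TRIGGER = 7                      # hypothetical trigger index
w[TRIGGER] = 5.0                 # planted backdoor weight
b = -2.0

x0 = rng.standard_normal(16) * 0.1   # a benign input sample
x = x0.copy()
lam, lr = 0.5, 0.1

# Gradient ascent on: confidence(x) - lam * ||x - x0||^2
# i.e. maximize classification confidence while staying near the original.
for _ in range(500):
    p = sigmoid(w @ x + b)
    grad = p * (1 - p) * w - 2 * lam * (x - x0)
    x += lr * grad

# At a stationary point the perturbation is proportional to w, so the
# largest input change lands on the planted trigger coordinate.
print(int(np.argmax(np.abs(x - x0))))
```

The point of the toy is that the optimizer rediscovers the trigger coordinate from gradients alone, with negligible compute; the post's extension to x-risk and s-risk search applies the same perturbation-optimization idea to larger models.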

By LessWrong
When Alignment Becomes an Attack Surface: Prompt Injection in Cooperative Multi-Agent Systems
News · Mar 23, 2026

A new research proposal augments the GovSim multi‑agent platform with a Prompt Infection (PI) module, allowing LLM agents to transfer resources that mimic data theft. The study will vary communication norms, network size, and defensive mechanisms such as police agents...

By LessWrong
Attend the 2026 Reproductive Frontiers Summit, June 16–18, Berkeley
News · Mar 22, 2026

The 2026 Reproductive Frontiers Summit will be held at Lighthaven in Berkeley from June 16‑18, following a successful 2025 event that attracted over 100 participants. Early‑bird tickets are on sale until the end of March. The agenda features leading experts...

By LessWrong
Is Fever a Symptom of Glycine Deficiency?
News · Mar 22, 2026

Recent research links glycine deficiency to disrupted sleep, elevated oxidative stress, and heightened fever responses. Glycine acts on NMDA receptors in the suprachiasmatic nucleus to lower core body temperature, facilitating sleep onset, while also serving as the rate‑limiting substrate for...

By LessWrong
China Declares AGI Development to Be a Part of 5-Year Plan
News · Mar 21, 2026

China’s 15th Five‑Year Plan explicitly references artificial general intelligence (AGI), urging development of multimodal, agentic, embodied, and swarm intelligence technologies. The brief mention signals state endorsement of research pathways toward general AI capabilities. By embedding AGI in a national strategic...

By LessWrong
Utrecht Meetup #2, Making Beliefs Pay Rent
News · Mar 21, 2026

Utrecht Meetup #2 builds on the earlier Meet & Greet, inviting participants to examine beliefs that may not be "paying rent." Attendees are asked to bring one or two personal convictions they suspect are unproductive, fostering hands‑on discussion. The event...

By LessWrong
Grounding Coding Agents via Dixit
News · Mar 21, 2026

Senior developers increasingly encounter pull‑requests generated by coding agents that pass self‑written tests yet miss the true root cause. The article proposes a Dixit‑style game where isolated Coders and Testers interact through an Orchestrator that classifies tests as too easy,...

By LessWrong
An Agent Autonomously Builds a 1.5 GHz Linux-Capable RISC-V CPU
News · Mar 20, 2026

Verkor’s AI agent, Design Conductor (DC), autonomously generated a 1.5 GHz Linux‑capable RISC‑V CPU in roughly 12 hours. The chip, named VerCore, implements RV32I and ZMMUL extensions, a five‑stage in‑order pipeline, and meets a CPI of ≤ 1.5 while targeting CoreMark scores. DC...

By LessWrong
"Lost in the Middle" Replicates
News · Mar 18, 2026

A recent replication using a quantized Llama‑2 7B model confirmed the "Lost in the Middle" phenomenon reported by Liu et al. The experiment employed the multi‑document question‑answering benchmark derived from Natural Questions, testing three gold‑document positions (first, middle, last) across...

By LessWrong
I'm Starting a Substack
News · Mar 18, 2026

Leogao announced the launch of a personal Substack newsletter, linking to nablatheta.substack.com. The post is a brief linkshare on LessWrong, signaling a shift toward independent publishing. It highlights the author’s intent to deliver longer-form content outside the platform’s standard post...

By LessWrong
Sanders's Data Center Moratorium Is Risky Strategy for AI Safety
News · Mar 16, 2026

Senator Bernie Sanders announced a bill to halt construction of new data centers, arguing that unchecked AI growth threatens jobs, democracy, and could lead to superintelligent systems beyond human control. Critics contend that a temporary moratorium would barely delay frontier...

By LessWrong
Digital Dichotomy and Why It Exists
News · Mar 16, 2026

The article examines why college students in India feel conflicted about phone use, identifying an “Invisible Standard” that defines good versus bad usage without a clear source. It describes “productive procrastination” on Instagram, where users seek useful content but end...

By LessWrong
Brown Math Department Postdoctoral Position
News · Mar 16, 2026

Brown University’s Mathematics Department has announced a rapid‑turnaround postdoctoral position focused on the intersection of mathematics and artificial intelligence. The role is open to candidates with a PhD in mathematics and welcomes any research that blends math with AI, not...

By LessWrong
Some Models Don't Identify with Their Official Name
News · Mar 15, 2026

A recent sweep of 102 large language models (LLMs) on OpenRouter revealed that 38 models (about 37%) self‑identified as a different AI on at least one prompt. Notable outliers include DeepSeek V3.2 Speciale, which claimed to be ChatGPT 77% of...

By LessWrong