Alignment

Recursive Self-Improvement: Can AI Bootstrap Its Own Intelligence?

TL;DR Recursive self-improvement (RSI) is the idea of an AI that improves its own ability to improve - each round producing a smarter system that does the next round better. It is the engine behind every “intelligence explosion” story since I.J. Good described it in 1965 The narrow version is already real. Systems like AlphaEvolve and the AI Scientist measurably improve algorithms, code, and even research output - including, in AlphaEvolve’s case, the infrastructure that trains the models themselves The leap people fear is different: improving an algorithm is not the same as improving general intelligence. Nothing in 2026 has crossed that line, and the gap is structural, not just a matter of scale Four bottlenecks decide whether RSI runs away or fizzles: compute, data, verification, and diminishing returns. Each is a hard physical or informational limit, not a temporary engineering nuisance The realistic picture is steady, human-paced acceleration - AI assisting AI research - not an overnight takeoff. METR’s time-horizon data shows fast but smooth exponential progress, which is exactly what a bottlenecked process looks like In May 2026 Anthropic put numbers on this from inside a frontier lab. Its essay When AI Builds Itself reports that over 80% of the code it merges is now written by Claude, that task horizons are doubling every roughly four months rather than seven, and lays out a candid three-way bet on where this ends. None of it overturns the bottlenecked-flywheel picture - but it sharpens it It still deserves serious safety attention, because a slow takeoff is the one we can actually govern There is a particular shape of argument that has haunted artificial intelligence since before the field had a settled name. It goes like this: build a machine slightly better than humans at designing machines, and it will design a machine better than itself. That machine designs a better one. The loop tightens, each turn faster than the last, and intelligence runs away from us in an afternoon. ...

Personal Universes: Yampolskiy's Strangest Answer to the AI Alignment Problem

First, the thing this is all in service of. The AI alignment problem is the challenge of making a powerful AI system reliably pursue what we actually want it to pursue - getting its goals, values, and behaviour to line up with human intentions, and to stay lined up even as the system becomes more capable than the people supervising it. It sounds simple and is not: we struggle to state our own values precisely, those values conflict between people, and an AI optimising hard for a slightly-wrong objective can produce outcomes nobody asked for. The multi-agent version - aligning one system with all of humanity at once, rather than a single person - is harder still, and it is the specific version Personal Universes is trying to dodge. ...

Roman Yampolskiy: The Researcher Who Thinks AI Cannot Be Controlled

Most people writing about AI risk in 2026 are recent arrivals. Roman Yampolskiy is not. He has been making the same argument - that advanced AI systems may be fundamentally uncontrollable - since before the field of AI safety had a settled name, which is partly because he is the one who gave it that name. Whether you find his conclusions alarmist, prescient, or somewhere in between depends mostly on how you read the gap between current systems and the ones he writes about. This post is an attempt to lay out the man, the argument, and the reasons it deserves more than a dismissal. ...

AI Safety From First Principles: What Actually Matters vs What's Hype

TL;DR “AI safety” covers four distinct layers - product safety, system safety, model alignment, and civilisational safety - and conflating them produces incoherent debates For engineers building production systems today, system safety dominates: most real incidents trace back to flawed system design around the model, not the model itself Practical mitigations are unglamorous: scope tool permissions, bound blast radius, require human approval for irreversible actions, validate outputs, and observe everything The hype conflates capability with intent, existential risk with ordinary risk, and refusal with safety - all three conflations make the conversation harder to act on The load-bearing principle across all four layers is the same: a system should fail in ways that are detectable, recoverable, and bounded The AI safety conversation has reached the point where the phrase has stopped meaning anything specific. In the same week, you will see “AI safety” used to describe content moderation on a chat product, the alignment of frontier models toward human values, the question of whether superintelligence ends civilisation, and a regulatory paper about copyright. These are not the same problem. Treating them as one conversation is the reason the conversation never resolves. ...