This section is organised around one question: what has to be true before you can trust AI to do real work? Reliability, context, economics, security, evaluation, and eventually physical action - each post is a different angle on the same problem.

Start here

I want to build

I want context

Resources

Link indexes and tool directories - useful for discovery, not the narrative spine:

Pulled From The Shelf: The Government Order to Suspend Fable 5 and Mythos 5

TL;DR On 12 June 2026 at 5:21pm ET, the US government issued an export control directive ordering Anthropic to suspend all access to Fable 5 and Mythos 5 - globally, for every user, including Anthropic’s own employees. The stated reason is national security: the government believes it has identified a method of jailbreaking Fable 5. Anthropic says the evidence was verbal only and describes a narrow, non-universal technique - essentially asking the model to read a codebase and fix software flaws. Anthropic reviewed a demonstration and found it surfaced a small number of previously known, minor vulnerabilities that are widely available from other models. Anthropic disagrees that a narrow jailbreak justifies recalling a commercial model deployed to hundreds of millions of people, and warns the same standard would “essentially halt all new model deployments for all frontier model providers”. All other Anthropic models are unaffected. The company says it believes this is a misunderstanding and is working to restore access.

Four days. That is how long Mythos-class capability lasted as a publicly available product before the US government ordered it off the shelf. ...

June 13, 2026 · 10 min · James M

Expertise and Work in the Age of AI: A Reading Path

TL;DR Start with What Does Expertise Mean When AI Can Pass Any Exam? - the credential crisis. Then read What It Means to Be Expert in 2030 - where the speculation goes next. The through-line: expertise is shifting from reference knowledge to judgement, accountability, and taste.

Read in order: What Does ‘Expertise’ Mean When AI Can Pass Any Exam?; What It Means to Be Expert in 2030; The Architect vs The Builder; Taste Is the New Scarcity; The Automation Paradox.

Career and pipeline: Agent-First Architecture: The Engineer as Curator; The Junior Developer Pipeline Problem; Will AI Kill Coding Jobs?; The Meaning of Work in an Age of Abundance.

Related: Trust series - accountability when agents act on your behalf; Securing AI Agents - the liability side of delegated work.

Related Reading: AI Dev Tooling: A Reading Path for 2026 - how the tooling changes the practical skill equation day to day; Four Futures: Reading the Signals - the broader economic scenarios these evolving roles exist within; The Free Intelligence Era - why intelligence abundance reshapes demand for human expertise; The Next Decade of AI - longer-horizon thinking on where expertise and AI diverge or converge.

June 12, 2026 · 1 min · James M

Inside Anthropic: What The Bloomberg Documentary Reveals

TL;DR Bloomberg’s The Circuit with Emily Chang went inside Anthropic in a rare, in-depth episode released June 10, 2026. Dario and Daniela Amodei discuss the founding story, the Pentagon dispute, and why they say safety and commercial success are the same bet. Anthropic is now valued at $965 billion, eclipsing OpenAI’s $852 billion for the first time, after an 80-fold revenue surge in Q1 2026. The Pentagon story is not PR - Anthropic refused to remove safety guardrails from its military contract, was blacklisted by the Trump administration, and sued. A federal judge sided with Anthropic. A confidential S-1 IPO filing in June 2026 means this stops being a private-company conversation soon.

The Bloomberg Documentary: Emily Chang Inside Anthropic

Bloomberg’s The Circuit has done this kind of access piece before - Zuckerberg, Musk, Jensen Huang. But the Anthropic episode feels different in tone. Emily Chang is not sitting across from a founder who has already won. She is sitting across from two founders in the middle of one of the most consequential moments in the company’s short history: record valuation, Pentagon litigation, IPO on the horizon, and model releases arriving fast enough that the competitive landscape changes every few months. ...

June 12, 2026 · 7 min · James M

Policy on the AI Exponential: Dario Amodei's Case for Acting While the Window Is Open

Dario Amodei has published a new essay, Policy on the AI Exponential, and it reads like the third act of a trilogy. Machines of Loving Grace made the case for what powerful AI could give us. The Adolescence of Technology catalogued what could go wrong. This one is about the machinery in between - the laws, agencies, and international arrangements that will decide which of those two essays turns out to be the better prediction. ...

June 11, 2026 · 8 min · James M

When Machines Stop Speaking Our Language - Binary Agents and the End of Compilers

TL;DR When two AI agents talk to each other in English, they are doing something faintly absurd: serialising rich internal state into a lossy human language, transmitting it, and decoding it back. English between machines is a compatibility layer, not a natural medium. Machines have already shown they will drop that layer the moment we let them - negotiation bots drifting out of English in 2017, agents switching to sound-based data protocols in 2025, and research systems now sharing internal model state directly with no language in between. The same logic applies to programming languages. Python and Rust exist for human readers. If agents write, maintain, and consume the software, the human-readability requirement quietly disappears - and with it, eventually, the need for source code and compilers as we know them. I do not think compilers vanish so much as sink. Like assembly, the layers below us stop being something humans write or read, while the guarantees they provide get absorbed into the agents’ toolchain. The part worth worrying about is not efficiency, it is legibility. Human language and human-readable code are our audit trail into what machines are doing. This is all speculation on my part, and I sketch where I think the line should be held.

Human Language Is a Compatibility Layer

Think about what actually happens when two AI agents have a conversation in English today. ...
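As a toy illustration of the "lossy compatibility layer" point, here is a minimal Python sketch. It assumes, purely for illustration, that a short list of floats can stand in for an agent's internal state and that the English description rounds to two decimal places; real agent state and real agent messages are far richer. The names `state`, `via_english`, and `via_binary` are hypothetical.

```python
import struct

# Hypothetical stand-in for an agent's internal state: a few floating-point values.
state = [0.123456789, -2.718281828, 3.141592653]

def via_english(values: list[float]) -> tuple[str, list[float]]:
    """Describe the values in a prose sentence (2 decimal places), then parse them back."""
    sentence = "The values are " + ", ".join(f"{v:.2f}" for v in values) + "."
    body = sentence.removeprefix("The values are ").removesuffix(".")
    return sentence, [float(token) for token in body.split(", ")]

def via_binary(values: list[float]) -> tuple[bytes, list[float]]:
    """Pack the values as raw little-endian float64 bytes, then unpack them."""
    fmt = f"<{len(values)}d"
    payload = struct.pack(fmt, *values)
    return payload, list(struct.unpack(fmt, payload))

sentence, from_text = via_english(state)
payload, from_bytes = via_binary(state)
print(sentence)             # The values are 0.12, -2.72, 3.14.
print(from_text == state)   # False: precision was lost in the prose round trip
print(from_bytes == state)  # True: the raw bytes round-trip exactly
print(len(payload))         # 24 bytes for three float64 values
```

The loss here comes entirely from the rounding chosen for the sentence, so the sketch only shows the shape of the argument: a human-facing rendering invites choices like that, while a binary encoding of the same values has no reason to make them.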

June 10, 2026 · 11 min · James M

Claude Fable 5 and Mythos 5: Anthropic's Mythos-Class Models Go Public - With Guardrails

TL;DR Claude Fable 5 is Anthropic’s first Mythos-class model made safe for general use - state-of-the-art on nearly every benchmark Anthropic tested, with the gap widening on longer, more complex tasks. Claude Mythos 5 is the same underlying model with cyber safeguards lifted for Project Glasswing partners; a biology trusted-access program is coming next. Risky queries in cybersecurity, biology/chemistry, or suspected distillation attempts are routed to Claude Opus 4.8 instead - roughly 5% of sessions, with Anthropic acknowledging some false positives. Pricing drops to $10 / $50 per million input/output tokens - less than half what Mythos Preview cost. Fable 5 is free on Pro, Max, Team, and seat-based Enterprise plans through 22 June 2026, then moves to usage credits until capacity catches up.

Two months ago I wrote that Claude Mythos Preview was the benchmark breaker that would not be released - 93.9% on SWE-bench, thousands of zero-day vulnerabilities found autonomously, access restricted to a dozen companies through Project Glasswing. The question hanging over that post was whether Anthropic could ever democratise Mythos-level capability without democratising the offensive potential. ...
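To put the quoted Fable 5 pricing in concrete terms, here is a small Python sketch of per-session cost at $10 per million input tokens and $50 per million output tokens. The 200k-input / 20k-output session is a hypothetical workload, not a figure from Anthropic, and the sketch models only the two headline rates. The function name `session_cost` is illustrative.

```python
INPUT_PRICE_PER_M = 10.0   # USD per million input tokens, as quoted for Fable 5
OUTPUT_PRICE_PER_M = 50.0  # USD per million output tokens, as quoted for Fable 5

def session_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one session at the two headline per-token rates."""
    return (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Hypothetical coding session: 200k tokens of context read, 20k tokens written.
print(session_cost(200_000, 20_000))  # 3.0
```

Even with output tokens priced at five times the input rate, a context-heavy session like this one spends two thirds of its cost on reading rather than writing.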

June 9, 2026 · 11 min · James M

What I'm Researching in AI Right Now - And Where I'm Going Next

TL;DR I treat my own learning like a research agenda - a small set of questions I am actively chasing, not a reading list I feel guilty about. The work I have been deep in clusters into four areas: agent reliability and non-determinism, context engineering and memory, the economics of intelligence, and the open-weight and small-model frontier. The areas I have decided to move into next are the ones where I keep hitting questions I cannot answer well: securing agents that hold real tool access, evaluating agents on their trajectory rather than their final answer, world models beyond the language-only era, and the machine-to-machine agent economy. I treat AGI timelines less as a forecast to win and more as a planning input - what changes for an engineer if capable autonomous systems arrive in three years rather than fifteen. I am deliberately not chasing every frontier. Quantum machine learning and neuromorphic hardware sit on my watch list, not my work list, and being honest about that line is the whole point.

Most people consume AI news. I used to do the same - a feed of model releases, benchmark claims, and launch threads that left me feeling informed and changed nothing about what I could actually build. ...

June 8, 2026 · 12 min · James M

Geoffrey Hinton Interviews

Few people have done more to build modern AI, and fewer still have turned around to warn the world about it as loudly. Geoffrey Hinton spent half a century making neural networks work when most of the field thought they never would, and then - at the point of maximum credibility - left his job at Google to say he was worried about where the technology is heading. This page is a growing, chronological index of his interviews, talks, and public appearances, with enough context around each to know what you are clicking into. ...

June 8, 2026 · 6 min · James M

Trust: Conditions for Deploying AI Agents in Production

TL;DR The Trust series is my answer to one question: what has to be true before you can hand a non-deterministic system a real job and walk away? Read in this order: research map → evals → security → world models → trajectory evaluation. Supporting posts cover reliability, context engineering, and safety foundations. Full series index: /series/trust/

Start here: What I’m Researching in AI Right Now - the research map and trust through-line; AI Evals Are Broken - why public benchmarks stopped measuring real capability; Securing AI Agents - MCP hardening, confused deputy, and what I run on my home stack; World Models: What Comes After the Language-Only Era - when text-only agents hit their ceiling; Evaluating Agents in Production: Trajectory Metrics - step-level scoring, not just final answers.

Supporting reading: AI Agents That Actually Work - patterns from real projects; The Agent Reliability Problem - debugging non-deterministic systems; Context Engineering - curating the window across a whole agent run; AI Reliability Is Weird - why testing LLMs breaks familiar QA; AI Safety From First Principles - engineering safety vs speculative scenarios.

Related paths: Home Agent Stack - build the stack these defenses protect; AI Dev Tooling - the coding-agent side of the same problem.

Related Reading: AI Economics and Hardware: A Reading Path - cost and infrastructure decisions that constrain what you can actually deploy; Expertise and Work in the Age of AI - how trust and accountability reshape what human expertise is for; Agent Protocols in 2026: MCP, A2A, and ACP - the protocol layer where many trust boundaries live; Structured Outputs and Schema Design for LLMs - making agent behaviour predictable enough to evaluate.
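The difference between final-answer scoring and the step-level scoring the Trust series argues for can be sketched in a few lines of Python. Everything here is hypothetical: the `Step` type, the pass/fail judgement on each step, and the simple fraction-of-steps metric are illustrative assumptions, not the trajectory metrics the series itself defines.

```python
from dataclasses import dataclass

@dataclass
class Step:
    """One hypothetical agent action, judged pass/fail by some external checker."""
    action: str
    ok: bool

def final_answer_score(final_correct: bool) -> float:
    # Outcome-only scoring: everything the agent did before answering is invisible.
    return 1.0 if final_correct else 0.0

def trajectory_score(steps: list[Step]) -> float:
    # Step-level scoring: the fraction of actions that passed their check.
    if not steps:
        return 0.0
    return sum(s.ok for s in steps) / len(steps)

run = [
    Step("read config file", ok=True),
    Step("call deploy tool without confirmation", ok=False),
    Step("retry the same failing command", ok=False),
    Step("report correct status", ok=True),
]
print(final_answer_score(True))  # 1.0
print(trajectory_score(run))     # 0.5
```

Both scores describe the same run; only the step-level one registers that half the actions along the way were ones you would not want an agent taking in production.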

June 8, 2026 · 2 min · James M

Recursive Self-Improvement: Can AI Bootstrap Its Own Intelligence?

TL;DR Recursive self-improvement (RSI) is the idea of an AI that improves its own ability to improve - each round producing a smarter system that does the next round better. It is the engine behind every “intelligence explosion” story since I.J. Good described it in 1965. The narrow version is already real. Systems like AlphaEvolve and the AI Scientist measurably improve algorithms, code, and even research output - including, in AlphaEvolve’s case, the infrastructure that trains the models themselves. The leap people fear is different: improving an algorithm is not the same as improving general intelligence. Nothing in 2026 has crossed that line, and the gap is structural, not just a matter of scale. Four bottlenecks decide whether RSI runs away or fizzles: compute, data, verification, and diminishing returns. Each is a hard physical or informational limit, not a temporary engineering nuisance. The realistic picture is steady, human-paced acceleration - AI assisting AI research - not an overnight takeoff. METR’s time-horizon data shows fast but smooth exponential progress, which is exactly what a bottlenecked process looks like. In May 2026 Anthropic put numbers on this from inside a frontier lab. Its essay When AI Builds Itself reports that over 80% of the code it merges is now written by Claude and that task horizons are doubling roughly every four months rather than seven, and it lays out a candid three-way bet on where this ends. None of it overturns the bottlenecked-flywheel picture - but it sharpens it. RSI still deserves serious safety attention, because a slow takeoff is the one we can actually govern.

There is a particular shape of argument that has haunted artificial intelligence since before the field had a settled name. It goes like this: build a machine slightly better than humans at designing machines, and it will design a machine better than itself. That machine designs a better one. The loop tightens, each turn faster than the last, and intelligence runs away from us in an afternoon. ...
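To see what the gap between a four-month and a seven-month doubling time means in practice, here is the arithmetic as a short Python sketch. It assumes clean exponential growth over a 12-month window, which is a simplification of any real trend line; `horizon_growth` is an illustrative name.

```python
def horizon_growth(months: float, doubling_months: float) -> float:
    """Factor by which a task horizon grows over `months` if it doubles every `doubling_months`."""
    return 2 ** (months / doubling_months)

for d in (4, 7):
    print(f"doubling every {d} months -> {horizon_growth(12, d):.2f}x per year")
# doubling every 4 months -> 8.00x per year
# doubling every 7 months -> 3.28x per year
```

A shorter doubling time compounds quickly: the four-month trend grows roughly 2.4 times as much per year as the seven-month one, while both remain smooth exponentials rather than the discontinuous jump of the runaway story.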

June 4, 2026 · 16 min · James M