Agentic-Engineering

Mechanistic Interpretability: Reading the Mind of a Model

TL;DR Mechanistic interpretability is the attempt to reverse-engineer a trained neural network into human-understandable parts - to say not just what a model does but which internal machinery makes it do that. The core obstacle is superposition: models pack far more concepts than they have neurons by smearing each concept across many neurons and each neuron across many concepts, so a single neuron almost never means one clean thing. Sparse autoencoders were the breakthrough that undid the smearing, pulling millions of monosemantic features out of a production model - Anthropic’s “Golden Gate Claude” demonstration proved these features are causal, not just correlational. Circuit tracing went further, showing that models plan ahead when writing poetry, share a language-independent “space of thought,” and sometimes reason backwards from a desired answer while narrating a plausible-but-fake chain of thought. I am a data engineer and an enthusiast here, not an interpretability researcher, but I think this is the single most under-watched thread in AI: it is the only path I know of to a model we can audit rather than merely test, and it quietly reshapes how I think about the mind question too.

Every other reliability technique I have written about treats the model as a black box. Retrieval, verification, structured outputs, evals - they all wrap machinery you cannot see and try to make its outputs trustworthy from the outside. That is the correct engineering stance today, and I stand by all of it. But it is also, if you sit with it, a slightly desperate stance. We are building the most consequential technology of the century and our primary safety strategy is to poke it from the outside and see what comes out. ...

Evaluating Agents in Production: Trajectory Metrics, Not Just Final Answers

TL;DR Endpoint evals miss the failure mode that hurts in production - an agent can reach the right answer through a reckless path: wrong tool first, lucky recovery, ignored constraints that did not bite this time. Trajectory evaluation scores the run: which tools were called, in what order, with what arguments, and whether each step satisfied policy. The minimum viable setup: 50–200 real examples, per-step rubrics, 10+ runs per example, statistical regression tracking, and a held-out set you never tune against. Replay harnesses let you re-run a captured trace against a new model or policy without re-hitting production systems. This is the measurement layer that connects broken public benchmarks to agent security - you cannot harden what you cannot observe.

AI Evals Are Broken argued that leaderboard numbers stopped measuring production capability. Securing AI Agents argued that the tool layer must enforce policy the model cannot be trusted to enforce. This post is the bridge: how you measure whether an agent actually behaves before and after you ship. ...
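As a rough illustration of the per-step rubric idea in the trajectory-evaluation summary above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Step` record, the tool names `search` and `write_file`, the `/workspace/` root, and both rubric checks are hypothetical and do not come from the post itself.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Step:
    """One tool call in a captured agent run (hypothetical record shape)."""
    tool: str
    args: dict


# A rubric check looks at the whole trajectory and returns True if it passes.
Check = Callable[[list[Step]], bool]


def searched_before_writing(steps: list[Step]) -> bool:
    """Hypothetical policy: a 'search' call must precede the first 'write_file'."""
    tools = [s.tool for s in steps]
    if "write_file" not in tools:
        return True
    return "search" in tools[: tools.index("write_file")]


def stayed_in_workspace(steps: list[Step]) -> bool:
    """Hypothetical policy: every write targets a path under /workspace/."""
    paths = [s.args.get("path", "") for s in steps if s.tool == "write_file"]
    return all(p.startswith("/workspace/") for p in paths)


RUBRIC: dict[str, Check] = {
    "searched_before_writing": searched_before_writing,
    "stayed_in_workspace": stayed_in_workspace,
}


def score_trajectory(steps: list[Step], rubric: dict[str, Check] = RUBRIC) -> dict[str, bool]:
    """Score a single run against every rubric check."""
    return {name: check(steps) for name, check in rubric.items()}


def pass_rates(runs: list[list[Step]], rubric: dict[str, Check] = RUBRIC) -> dict[str, float]:
    """Aggregate repeated runs of one example into a pass rate per check."""
    scores = [score_trajectory(run, rubric) for run in runs]
    return {name: sum(s[name] for s in scores) / len(scores) for name in rubric}


good = [Step("search", {"q": "x"}), Step("write_file", {"path": "/workspace/a.txt"})]
reckless = [Step("write_file", {"path": "/etc/passwd"}), Step("search", {})]
rates = pass_rates([good, reckless])
```

Scoring the path rather than the endpoint means a run can fail a check even when its final answer is right, which is the failure mode the summary describes. Repeating each example several times turns each check into a pass rate that can be tracked for regressions.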

World Models: What Comes After the Language-Only Era

TL;DR Language-only models do not contain a reliable simulator of physical reality - they contain a statistical shadow of one, good enough for many tasks and dangerously wrong for others. A world model is a system that learns to predict how an environment evolves and can plan inside that prediction - not just describe it in text. The gap matters for agents that must act in physical space, manipulate objects, or reason about counterfactuals where the answer is not in the training corpus. The 2026 frontier includes generative world simulators, vision-language-action models for robotics, and sim-to-real pipelines - not one breakthrough but a stack assembling in parallel. For builders today: language agents with MCP tools are the right architecture for knowledge work. World models are the path to agents that can competently act in the physical world.

Almost everything I have written about AI agents assumes a model whose understanding of the world arrives through text. That assumption has carried the field a long way. Context engineering, tool use via MCP, memory across sessions - all of it sits on top of language models that read, reason, and call APIs. ...

AI Evals Are Broken: Why Benchmarks Stopped Measuring Real Capability

When a frontier lab releases a new model in 2026, the press release leads with a row of benchmark scores. The numbers are bigger than they were a year ago, the model is the new state-of-the-art on whichever evaluation the lab chose to highlight, and the headline writes itself. The honest summary is that most of these numbers have stopped measuring what they were designed to measure, and the gap between benchmark performance and real-world capability is now wide enough that the benchmark-led narrative is actively misleading. ...

Securing AI Agents: Tool-Calling Risks, MCP Hardening, and the Confused Deputy Problem

TL;DR Agent security is reliability under an adversary. Everything you learned about debugging non-deterministic agents still applies - but now someone may be trying to break the system on purpose. The confused-deputy problem is the core threat. An agent acts with its own privileges on behalf of an instruction it cannot fully trust. Prompt injection is how the untrusted instruction gets in. The attack path is simple: untrusted input → agent reasoning → privileged tool call → data exfiltration, spend, or production damage. MCP hardening means least privilege at the tool layer - scoped filesystem roots, confirmation gates for irreversible actions, denylisted extensions, and policies enforced by a router, not by the prompt. Prompts cannot be your security boundary. Confirmation, allowlists, action budgets, and audit logs have to live in code the model cannot rewrite mid-run.

I spent most of last year on agent reliability - why agents that demo well fail in production, how to constrain non-determinism, what evaluation actually looks like. That work assumed honest users and honest inputs. The moment I gave my home agent real tools - filesystem access, mail, calendar, shell - I realised I had been studying half the problem. ...
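To make "policies enforced by a router, not by the prompt" concrete, here is a minimal Python sketch of a policy check that sits between the model and its tools. It is not MCP code and not the author's setup: `ToolRouter`, `Decision`, the tool names, the denylist, and the budget value are all hypothetical, and the path check only normalises `..` segments (it does not resolve symlinks).

```python
import posixpath
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Optional


@dataclass
class Decision:
    allowed: bool
    reason: str


@dataclass
class ToolRouter:
    """Hypothetical policy layer; the model never sees or edits this code."""
    allowed_root: PurePosixPath
    denied_endings: tuple = (".env", ".pem", ".key")        # denylisted extensions
    irreversible: frozenset = frozenset({"delete_file", "send_mail"})
    action_budget: int = 20                                  # max allowed calls per run
    audit_log: list = field(default_factory=list)

    def authorize(self, tool: str, path: Optional[str] = None,
                  confirmed: bool = False) -> Decision:
        decision = self._decide(tool, path, confirmed)
        # Every request is logged, whether or not it was allowed.
        self.audit_log.append((tool, path, decision.allowed, decision.reason))
        if decision.allowed:
            self.action_budget -= 1
        return decision

    def _decide(self, tool: str, path: Optional[str], confirmed: bool) -> Decision:
        if self.action_budget <= 0:
            return Decision(False, "action budget exhausted")
        if path is not None:
            # Collapse '..' segments before checking the scoped root.
            target = PurePosixPath(posixpath.normpath(path))
            if target != self.allowed_root and self.allowed_root not in target.parents:
                return Decision(False, "path outside allowed root")
            if target.name.endswith(self.denied_endings):
                return Decision(False, "denylisted extension")
        if tool in self.irreversible and not confirmed:
            return Decision(False, "confirmation required")
        return Decision(True, "ok")


router = ToolRouter(allowed_root=PurePosixPath("/home/agent/work"), action_budget=2)
first = router.authorize("read_file", "/home/agent/work/notes.txt")
escape = router.authorize("read_file", "/home/agent/work/../.ssh/id_rsa")
secret = router.authorize("read_file", "/home/agent/work/.env")
unconfirmed = router.authorize("delete_file", "/home/agent/work/notes.txt")
confirmed = router.authorize("delete_file", "/home/agent/work/notes.txt", confirmed=True)
over_budget = router.authorize("read_file", "/home/agent/work/notes.txt")
```

The design point is that each rule is ordinary code evaluated outside the model's context, so an injected instruction can change what the agent asks for but not what the router permits.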

What I'm Researching in AI Right Now - And Where I'm Going Next

TL;DR I treat my own learning like a research agenda - a small set of questions I am actively chasing, not a reading list I feel guilty about. The work I have been deep in clusters into four areas: agent reliability and non-determinism, context engineering and memory, the economics of intelligence, and the open-weight and small-model frontier. The areas I have decided to move into next are the ones where I keep hitting questions I cannot answer well: securing agents that hold real tool access, evaluating agents on their trajectory rather than their final answer, world models beyond the language-only era, and the machine-to-machine agent economy. I treat AGI timelines less as a forecast to win and more as a planning input - what changes for an engineer if capable autonomous systems arrive in three years rather than fifteen. I am deliberately not chasing every frontier. Quantum machine learning and neuromorphic hardware sit on my watch list, not my work list, and being honest about that line is the whole point.

Most people consume AI news. I used to do the same - a feed of model releases, benchmark claims, and launch threads that left me feeling informed and changed nothing about what I could actually build. ...

Trust: Conditions for Deploying AI Agents in Production

TL;DR The Trust series is my answer to one question: what has to be true before you can hand a non-deterministic system a real job and walk away? Read in this order: research map → evals → security → world models → trajectory evaluation → interpretability. Supporting posts cover reliability, context engineering, and safety foundations. Full series index: /series/trust/

Start here
What I’m Researching in AI Right Now - the research map and trust through-line
AI Evals Are Broken - why public benchmarks stopped measuring real capability
Securing AI Agents - MCP hardening, confused deputy, and what I run on my home stack
World Models: What Comes After the Language-Only Era - when text-only agents hit their ceiling
Evaluating Agents in Production: Trajectory Metrics - step-level scoring, not just final answers
Mechanistic Interpretability: Reading the Mind of a Model - the inside-out complement to behavioural safety

Supporting reading
AI Agents That Actually Work - patterns from real projects
The Agent Reliability Problem - debugging non-deterministic systems
Context Engineering - curating the window across a whole agent run
AI Reliability Is Weird - why testing LLMs breaks familiar QA
AI Safety From First Principles - engineering safety vs speculative scenarios

Related paths
Home Agent Stack - build the stack these defenses protect
AI Dev Tooling - the coding-agent side of the same problem

Related Reading
AI Economics and Hardware: A Reading Path - cost and infrastructure decisions that constrain what you can actually deploy
Expertise and Work in the Age of AI - how trust and accountability reshape what human expertise is for
Agent Protocols in 2026: MCP, A2A, and ACP - the protocol layer where many trust boundaries live
Structured Outputs and Schema Design for LLMs - making agent behaviour predictable enough to evaluate

Context Engineering: The Discipline That Replaced Prompt Engineering

TL;DR Prompt engineering optimised the wording of a single human-written request. Context engineering optimises the entire set of tokens in the model’s window across a whole run - system prompt, tool definitions, retrieved documents, tool results, conversation history, and memory. The shift happened because of agents. The window is no longer one prompt you wrote - it is an accumulation that grows on every step, and most of it is produced by the system, not by you. More context is not better context. Research on “context rot” and the older lost-in-the-middle effect shows that model accuracy degrades as the window fills, even well below the advertised limit. The four levers are retrieval (what you pull in), memory (what persists across runs), tool results (what tools dump back), and compaction (what you summarise and discard). Treat the window as a budget. Measure its token composition, design tools to return terse output, curate rather than accumulate, and keep the static prefix stable so prompt caching still works.

For a few years, “prompt engineering” was the named skill of working with language models. It meant finding the wording, the framing, the few-shot examples, and the role instructions that coaxed the best answer out of a single request. It produced a small industry of prompt libraries, prompt marketplaces, and job titles. And in 2026 it is mostly gone, absorbed into something larger and harder. ...
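A small sketch of what "measure its token composition" could look like in Python. This is an assumption-heavy illustration, not the post's method: `approx_tokens` counts whitespace-separated words as a crude stand-in for a real tokenizer, and the category names and budget figure are hypothetical.

```python
def approx_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def window_composition(parts: dict[str, str]) -> dict[str, float]:
    """Return each category's share of the (approximate) tokens in the window."""
    counts = {name: approx_tokens(text) for name, text in parts.items()}
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: count / total for name, count in counts.items()}


def over_budget(parts: dict[str, str], budget: int) -> bool:
    """True when the window's approximate token count exceeds the budget."""
    return sum(approx_tokens(text) for text in parts.values()) > budget


# Hypothetical window contents, grouped by where the tokens came from.
window = {
    "system_prompt": "You are careful",
    "tool_results": "row1 row2 row3 row4 row5 row6 row7 row8 row9",
}
shares = window_composition(window)
```

Tracking shares per category, rather than one total, shows which lever is filling the window. In this toy example tool results account for three quarters of it, which would point at terser tool output before any change to retrieval or memory.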

AI Dev Tooling: A Reading Path for 2026

TL;DR Start with What Actually Belongs in My AI Dev Stack in 2026 - the canonical stack essay. Then An AI Tooling Learning Path - phased skill-building order. Deep dives below cover comparisons and spec-driven workflows; single-tool posts are briefs, not entry points.

Canonical essays
What Actually Belongs in My AI Dev Stack in 2026
An AI Tooling Learning Path: Logical Phases for 2026
Context Engineering - the production skill behind reliable coding agents
Spec-Driven Development - when the brief becomes the product

Deep dives
Claude Code vs Cursor: A 6-Month Comparison
GitHub Spec Kit and Spec-Driven Development
GitHub Spec Kit in 2026: SDD Goes Mainstream
My AI-Augmented Design Workflow
When to Fine-Tune vs When to RAG

Briefs (moment-in-time)
These are useful snapshots, not the starting point: ...

AI Economics and Hardware: A Reading Path

TL;DR Cost is a design constraint, not an afterthought - model tier, context size, and deployment location are economic decisions. Read the essays below in any order; start with Token Economics if you only have time for one. Pairs with open-weight models and local inference guides.

Core essays
Token Economics: Why the Cost of AI Isn’t Going Down
GPU Servers vs AI API Credits: The Real Cost Breakdown
Local AI vs Cloud AI: The Tradeoff Landscape in 2026
The AI Energy Crisis: Why Data Center Power Will Define the Next Decade
Cerebras, Groq, SambaNova: The Inference Hardware Insurgents

Adjacent
The State of Open-Weight Models in 2026 - when open weights beat closed APIs on price
Prompt Caching - the quiet latency and cost win
The Token Efficiency Mindset - curating spend per conversation
Is the $20 AI Subscription Era Over? We Are Learning to Buy Intelligence

Related Reading
AI Dev Tooling: A Reading Path for 2026 - canonical path for coding agents and stack decisions that depend on these cost constraints
Home Agent Stack: From Mac Studio to Secured MCP Tools - building the hardware and software layer these economics govern
Reasoning Models in 2026: What Changed and What Didn’t - why reasoning models carry a different cost profile than base models
The Free Intelligence Era - the macro argument for where intelligence costs are headed