Trust

TL;DR

The Trust series is my answer to one question: what has to be true before you can hand a non-deterministic system a real job and walk away?
Read in this order: research map → evals → security → world models → trajectory evaluation → interpretability
Supporting posts cover reliability, context engineering, and safety foundations
Full series index: /series/trust/

Start here

What I’m Researching in AI Right Now - the research map and trust through-line
AI Evals Are Broken - why public benchmarks stopped measuring real capability
Securing AI Agents - MCP hardening, confused deputy, and what I run on my home stack
World Models: What Comes After the Language-Only Era - when text-only agents hit their ceiling
Evaluating Agents in Production: Trajectory Metrics - step-level scoring, not just final answers
Mechanistic Interpretability: Reading the Mind of a Model - the inside-out complement to behavioural safety

Supporting reading

AI Agents That Actually Work - patterns from real projects
The Agent Reliability Problem - debugging non-deterministic systems
Context Engineering - curating the window across a whole agent run
AI Reliability Is Weird - why testing LLMs breaks familiar QA
AI Safety From First Principles - engineering safety vs speculative scenarios

Related paths

Home Agent Stack - build the stack these defenses protect
AI Dev Tooling - the coding-agent side of the same problem

Related Reading

AI Economics and Hardware: A Reading Path - cost and infrastructure decisions that constrain what you can actually deploy
Expertise and Work in the Age of AI - how trust and accountability reshape what human expertise is for
Agent Protocols in 2026: MCP, A2A, and ACP - the protocol layer where many trust boundaries live
Structured Outputs and Schema Design for LLMs - making agent behaviour predictable enough to evaluate