Trust: Conditions for Deploying AI Agents in Production

TL;DR

The Trust series is my answer to one question: what has to be true before you can hand a non-deterministic system a real job and walk away?
Read in this order: research map → evals → security → world models → trajectory evaluation → interpretability
Supporting posts cover reliability, context engineering, and safety foundations
Full series index: /series/trust/

Core reading path

What I’m Researching in AI Right Now - the research map and trust through-line
AI Evals Are Broken - why public benchmarks stopped measuring real capability
Securing AI Agents - MCP hardening, the confused-deputy problem, and what I run on my home stack
World Models: What Comes After the Language-Only Era - when text-only agents hit their ceiling
Evaluating Agents in Production: Trajectory Metrics - step-level scoring, not just final answers
Mechanistic Interpretability: Reading the Mind of a Model - the inside-out complement to behavioural safety

Supporting posts

AI Economics and Hardware: A Reading Path - cost and infrastructure decisions that constrain what you can actually deploy
Expertise and Work in the Age of AI - how trust and accountability reshape what human expertise is for
Agent Protocols in 2026: MCP, A2A, and ACP - the protocol layer where many trust boundaries live
Structured Outputs and Schema Design for LLMs - making agent behaviour predictable enough to evaluate