Securing AI Agents: Tool-Calling Risks, MCP Hardening, and the Confused Deputy Problem
TL;DR Agent security is reliability under an adversary. Everything you learned about debugging non-deterministic agents still applies - but now someone may be trying to break the system on purpose. The confused-deputy problem is the core threat. An agent acts with its own privileges on behalf of an instruction it cannot fully trust. Prompt injection is how the untrusted instruction gets in. The attack path is simple: untrusted input → agent reasoning → privileged tool call → data exfiltration, spend, or production damage. MCP hardening means least privilege at the tool layer - scoped filesystem roots, confirmation gates for irreversible actions, denylisted extensions, and policies enforced by a router, not by the prompt. Prompts cannot be your security boundary. Confirmation, allowlists, action budgets, and audit logs have to live in code the model cannot rewrite mid-run. I spent most of last year on agent reliability - why agents that demo well fail in production, how to constrain non-determinism, what evaluation actually looks like. That work assumed honest users and honest inputs. The moment I gave my home agent real tools - filesystem access, mail, calendar, shell - I realised I had been studying half the problem. ...