AI Reliability Is Weird: Why Testing LLMs Breaks Everything You Know

We’ve embraced the future. AI agents like Cline are now the primary “builders” of software, executing complex engineering plans from high-level specifications. As I’ve argued in “The Architect vs The Builder”, the human role is shifting from execution to architectural oversight and the definition of intent. But this shift raises a profound, often uncomfortable question: how do we know the software actually works? When AI is writing the code, generating the data, and even orchestrating deployments, traditional notions of testing and reliability break down. AI reliability is weird, and it demands a complete re-evaluation of our verification strategies. ...

April 9, 2026 · 6 min · James M