
When Machines Stop Speaking Our Language - Binary Agents and the End of Compilers

TL;DR

When two AI agents talk to each other in English, they are doing something faintly absurd: serialising rich internal state into a lossy human language, transmitting it, and decoding it back. English between machines is a compatibility layer, not a natural medium. Machines have already shown they will drop that layer the moment we let them - negotiation bots drifting out of English in 2017, agents switching to sound-based data protocols in 2025, and research systems now sharing internal model state directly with no language in between. The same logic applies to programming languages. Python and Rust exist for human readers. If agents write, maintain, and consume the software, the human-readability requirement quietly disappears - and with it, eventually, the need for source code and compilers as we know them. I do not think compilers vanish so much as sink. Like assembly, the layers below us stop being something humans write or read, while the guarantees they provide get absorbed into the agents’ toolchain. The part worth worrying about is not efficiency, it is legibility. Human language and human-readable code are our audit trail into what machines are doing. This is all speculation on my part, and I sketch where I think the line should be held.

Human Language Is a Compatibility Layer

Think about what actually happens when two AI agents have a conversation in English today. ...

June 10, 2026 · 11 min · James M

What I'm Researching in AI Right Now - And Where I'm Going Next

TL;DR

I treat my own learning like a research agenda - a small set of questions I am actively chasing, not a reading list I feel guilty about
The work I have been deep in clusters into four areas: agent reliability and non-determinism, context engineering and memory, the economics of intelligence, and the open-weight and small-model frontier
The areas I have decided to move into next are the ones where I keep hitting questions I cannot answer well: securing agents that hold real tool access, evaluating agents on their trajectory rather than their final answer, world models beyond the language-only era, and the machine-to-machine agent economy
I treat AGI timelines less as a forecast to win and more as a planning input - what changes for an engineer if capable autonomous systems arrive in three years rather than fifteen
I am deliberately not chasing every frontier. Quantum machine learning and neuromorphic hardware sit on my watch list, not my work list, and being honest about that line is the whole point

Most people consume AI news. I used to do the same - a feed of model releases, benchmark claims, and launch threads that left me feeling informed and changed nothing about what I could actually build. ...

June 8, 2026 · 12 min · James M

Trust: Conditions for Deploying AI Agents in Production

TL;DR

The Trust series is my answer to one question: what has to be true before you can hand a non-deterministic system a real job and walk away?
Read in this order: research map → evals → security → world models → trajectory evaluation
Supporting posts cover reliability, context engineering, and safety foundations
Full series index: /series/trust/

Start here

What I’m Researching in AI Right Now - the research map and trust through-line
AI Evals Are Broken - why public benchmarks stopped measuring real capability
Securing AI Agents - MCP hardening, confused deputy, and what I run on my home stack
World Models: What Comes After the Language-Only Era - when text-only agents hit their ceiling
Evaluating Agents in Production: Trajectory Metrics - step-level scoring, not just final answers

Supporting reading

AI Agents That Actually Work - patterns from real projects
The Agent Reliability Problem - debugging non-deterministic systems
Context Engineering - curating the window across a whole agent run
AI Reliability Is Weird - why testing LLMs breaks familiar QA
AI Safety From First Principles - engineering safety vs speculative scenarios

Related paths

Home Agent Stack - build the stack these defenses protect
AI Dev Tooling - the coding-agent side of the same problem

Related Reading

AI Economics and Hardware: A Reading Path - cost and infrastructure decisions that constrain what you can actually deploy
Expertise and Work in the Age of AI - how trust and accountability reshape what human expertise is for
Agent Protocols in 2026: MCP, A2A, and ACP - the protocol layer where many trust boundaries live
Structured Outputs and Schema Design for LLMs - making agent behaviour predictable enough to evaluate

June 8, 2026 · 2 min · James M

Why the AI Cyber Threat Is Rising

For most of the last few years, the “AI and cybersecurity” conversation has been a vibes argument. One side said the models would soon write novel exploits at scale. The other side said the models were still tripping over basic shell commands and could not be trusted to hack anything more dangerous than a CTF box. The honest answer was that nobody had hard numbers, so the debate stayed stuck on intuition. ...

May 26, 2026 · 6 min · James M

Context Engineering: The Discipline That Replaced Prompt Engineering

TL;DR

Prompt engineering optimised the wording of a single human-written request. Context engineering optimises the entire set of tokens in the model’s window across a whole run - system prompt, tool definitions, retrieved documents, tool results, conversation history, and memory
The shift happened because of agents. The window is no longer one prompt you wrote - it is an accumulation that grows on every step, and most of it is produced by the system, not by you
More context is not better context. Research on “context rot” and the older lost-in-the-middle effect shows that model accuracy degrades as the window fills, even well below the advertised limit
The four levers are retrieval (what you pull in), memory (what persists across runs), tool results (what tools dump back), and compaction (what you summarise and discard)
Treat the window as a budget. Measure its token composition, design tools to return terse output, curate rather than accumulate, and keep the static prefix stable so prompt caching still works

For a few years, “prompt engineering” was the named skill of working with language models. It meant finding the wording, the framing, the few-shot examples, and the role instructions that coaxed the best answer out of a single request. It produced a small industry of prompt libraries, prompt marketplaces, and job titles. And in 2026 it is mostly gone, absorbed into something larger and harder. ...

May 20, 2026 · 11 min · James M

Composer 2.5: Cursor's In-House Model Grows Up

TL;DR

Composer 2.5 is Cursor’s most capable in-house coding model yet, built on Moonshot’s open-source Kimi K2.5 checkpoint with about 85% of total training compute spent on Cursor’s own continued pretraining and RL
The model is purpose-built for the agent loop inside Cursor - long-horizon tasks, hundreds of tool calls, multi-step instructions - rather than as a general-purpose chat model
Cursor claims parity with Claude Opus 4.7 and GPT-5.5 on its own CursorBench v3.1 (63.2%) and a strong 79.8% on SWE-Bench Multilingual
Pricing is dramatically lower: $0.50 / $2.50 per million input/output tokens on the default variant, with included usage doubled for the first week
Together with SpaceXAI, Cursor is now training a much larger successor model from scratch on Colossus 2 with around 10x the compute - so 2.5 is a waypoint, not the endgame

For a while, Cursor was an IDE wrapped around someone else’s models - Claude, GPT, Gemini. That story has shifted. With Composer 2.5, released this week, Cursor has shipped its most capable first-party coding model yet, and it is a serious enough piece of work that it deserves real consideration as a daily driver rather than a budget fallback. ...

May 18, 2026 · 8 min · James M

Home Agent Stack: From Mac Studio to Secured MCP Tools

TL;DR

This path walks through the full stack I run on a Mac Studio: local models → MCP tools → memory → remote access → security
Almost no other blogs document the build and the hardening layer together
Finish with Securing AI Agents before giving the agent real filesystem or mail access
Part of the broader Trust series

Read in order

Which Mac Studio Should You Buy for Running LLMs Locally? - hardware and model sizing
Giving Your Home AI Agent Real Tools: MCP Servers on a Mac Studio - wiring the tool layer
Giving Your Home AI Agent Memory That Lasts - persistence across sessions
How to Phone Your Home AI Agent - remote access when you are away
Securing AI Agents - least privilege, confirmation gates, audit logs

Adjacent guides

Running AI Models Locally with Ollama - lighter-weight local inference option
Agent Protocols in 2026: MCP, A2A, and ACP - the protocol layer
Local AI vs Cloud AI - when to host vs call APIs
DGX Spark vs Mac Studio - if you are sizing a dedicated inference box

Related Reading

AI Economics and Hardware: A Reading Path - token costs, GPU sizing, and energy constraints behind every hardware decision
AI Dev Tooling: A Reading Path for 2026 - the coding and development layer that sits above the agent infrastructure
Open WebUI: A Self-Hosted LLM Interface - web interface layer to pair with local inference
Agent Protocols in 2026: MCP, A2A, and ACP - the protocol layer connecting agents to tools

May 15, 2026 · 2 min · James M

The Agent Reliability Problem: Debugging Non-Deterministic Systems

The conventional reliability engineering toolkit was built for systems that behaved the same way each time given the same input. AI agents do not behave the same way each time given the same input. The classic tools - unit tests, integration tests, deterministic replay, traditional monitoring - all assume a property that the systems being operated do not have. This mismatch is not a small operational annoyance; it is the central challenge of running AI agents in production, and the patterns for handling it are still being worked out. ...

May 15, 2026 · 7 min · James M

ETL Tools & Data Integration Platforms

What is ETL?

ETL is a foundational data engineering process that powers modern analytics:

Extract - Retrieve data from various sources (databases, APIs, files, cloud services, streaming platforms)
Transform - Clean, validate, deduplicate, and reshape data into required data models
Load - Move processed data into data warehouses, data lakes, or analytical systems

ETL ensures data quality, consistency, and accessibility for analytics and reporting. In 2026 the dominant pattern is ELT (Extract-Load-Transform), which leverages cloud data warehouse compute for transformation, and increasingly EtLT (adding lightweight pre-load transforms for streaming and schema drift). See the Fundamentals of Data Engineering book for a deeper framing. ...

May 4, 2026 · 9 min · James M

Onchain AI Agents - Hype, Reality, and Where the Money Actually Flows

TL;DR

“Onchain AI agents” became the dominant crypto narrative in 2025 and has cooled meaningfully in 2026 as the picture has gotten clearer.
The honest taxonomy has three buckets: agents that hold wallets and trade, agents that automate DeFi operations, and agents that exist primarily as tokens with a chatbot attached. Only the first two are doing real work.
Real revenue is concentrated in agent-driven DeFi automation, MEV strategies executed by agents, and onchain payment rails for AI services. Most of the rest is meme economics dressed in technical clothing.
The structural question - “do AI agents need crypto rails at all” - has become a genuinely live debate. The answer in 2026 is “yes, but only for a narrow set of jobs, and most of those jobs are not what was being pitched.”
If you are evaluating an onchain AI agent project, the test is brutally simple: strip away the token and ask whether the agent does something useful. If the answer is no, the project is a token with extra steps.

How We Got Here

The phrase “onchain AI agent” started showing up in crypto Twitter in late 2024 and exploded in early 2025. By the middle of last year there were thousands of agent tokens, dozens of agent platforms, and a handful of agents with billion-dollar implied market caps doing things that would have embarrassed a 2010-era chatbot. ...

May 3, 2026 · 9 min · James M