This section is organised around one question: what has to be true before you can trust AI to do real work? Reliability, context, economics, security, evaluation, and eventually physical action - each post is a different angle on the same problem.

Start here

I want to build

I want context

Resources

Link indexes and tool directories - useful for discovery, not the narrative spine:

GitHub backing OpenClaw

GitHub Is Now Officially Backing OpenClaw

TL;DR: GitHub became an official sponsor of OpenClaw, the fastest-growing open source project in history, breaking React’s 10-year GitHub milestone in just 60 days. The sponsorship is concrete, not symbolic - it includes Copilot Pro+ access, dedicated security funding, and scalability support for the project team. GitHub sponsors projects that matter for the future of software development, and this backing signals OpenClaw has crossed from “interesting experiment” into infrastructure-level significance. The move is a bet that open source AI agents will be central to how software is built in 2026 and beyond, and that GitHub wants to be the home where that class of technology lives and scales. OpenClaw’s growth trajectory and now its platform backing make it a clear signal about the direction of agentic, AI-operated software development.

Two weeks ago, GitHub made a quiet but significant announcement: they are now an official sponsor of OpenClaw. ...

April 14, 2026 · 4 min · James M
Token economics - why AI costs are not falling

Token Economics: Why the Cost of AI Isn't Going Down

TL;DR: Inference cost is architectural - generating each token requires loading massive models into GPU memory, and that fundamental constraint doesn’t disappear with scale or competition. Despite Moore’s Law expectations, flagship model prices (Claude 3, GPT-4) have remained flat for 18+ months because demand growth absorbs any efficiency gains. The true cost of using AI is 1.5 - 2.5x the raw token price once you factor in monitoring, retries, fine-tuning, and compliance overhead. Providers convert efficiency gains into better features (longer context, faster inference, multimodal) rather than lower prices - you get more value per dollar, not fewer dollars. Stop waiting for cheaper AI; treat token costs as fixed infrastructure spend and optimise usage with tools like prompt caching instead.

There’s a persistent myth in tech: AI will get cheaper. The argument is straightforward - Moore’s Law, scale effects, competition, and raw compute efficiency improvements mean costs should plummet. Yet in April 2026, Claude costs roughly what it did in 2024. GPT-4 Turbo pricing hasn’t moved in eighteen months. Gemini’s cost structure remains sticky. Why? ...
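
The 1.5 - 2.5x overhead figure in this summary can be turned into a quick budgeting calculation. This is a minimal sketch: the function name `effective_cost_range` and the $10,000 raw spend are hypothetical, chosen only for illustration, and the multipliers are simply the range quoted in the summary.

```python
def effective_cost_range(
    raw_token_spend: float,
    low_multiplier: float = 1.5,   # lower bound quoted in the summary
    high_multiplier: float = 2.5,  # upper bound quoted in the summary
) -> tuple[float, float]:
    """Return the (low, high) all-in cost for a given raw token spend."""
    if raw_token_spend < 0:
        raise ValueError("raw_token_spend must be non-negative")
    return raw_token_spend * low_multiplier, raw_token_spend * high_multiplier


# Hypothetical monthly raw token bill of $10,000.
low, high = effective_cost_range(10_000.0)
print(f"Raw spend $10,000 -> all-in ${low:,.0f} to ${high:,.0f}")
# prints: Raw spend $10,000 -> all-in $15,000 to $25,000
```

Budgeting against the range rather than the raw token price is the practical reading of "treat token costs as fixed infrastructure spend".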

April 13, 2026 · 8 min · James M
Claude Mythos restricted release

The Forbidden Frontier: Claude Mythos and the Dawn of Restricted AI Power

TL;DR: Claude Mythos is Anthropic’s most powerful model to date, scoring 93.9% on SWE-bench and 97.6% on USAMO 2026 - a 55-point leap over rival models. It is not publicly available; Anthropic restricted access to 12 vetted companies through Project Glasswing, focused on defensive cybersecurity. Mythos autonomously identified thousands of zero-day vulnerabilities, including a 27-year-old unpatched OpenBSD bug - making its offensive potential too dangerous to democratize. This marks a shift away from open innovation toward controlled deployment, where the most capable AI may never be publicly released. The Mythos story forces a rethink of how we evaluate AI: benchmark performance and public availability are no longer the same thing.

Anthropic built its most capable model to date, demonstrated it autonomously discovering thousands of zero-day vulnerabilities, and then declined to release it. That is the Mythos story, and it is worth sitting with rather than rushing past. The benchmarks are striking, but the decision not to publish is the more consequential part - it signals a real shift in how frontier AI labs are thinking about deployment. ...

April 13, 2026 · 4 min · James M
Structured outputs and schema design for LLMs

Structured Outputs: When Your AI Needs to Follow a Schema

TL;DR: Structured outputs constrain an LLM’s response to match a JSON schema during generation, eliminating the entire class of post-processing parse failures (which occur 2-5% of the time with free-form output). They produce simpler code, more reliable pipelines, and modest inference cost savings (typically 5-15% fewer tokens) in high-volume systems. Use structured outputs for data extraction, classification, entity recognition, and API payload generation - not for creative writing or open-ended reasoning. Common mistakes include over-constraining schemas with too-strict enums, forgetting that the response format changes, and mistaking schema validity for semantic correctness. The trajectory is toward structured outputs becoming the default: schemas will be inferred from English descriptions, and TypeScript types will auto-generate schemas.

For years, extracting structured data from LLMs meant post-processing their text output: parse JSON, handle edge cases where the model forgot to close a bracket, write validation code to check if the output matched your schema, implement fallback logic when parsing failed. ...
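
The post-processing routine described in this excerpt (parse, validate, fall back) might look roughly like the sketch below. It uses only the Python standard library; the two-field schema in `REQUIRED_FIELDS`, the function names, and the fallback record are all hypothetical, chosen for illustration rather than taken from the post.

```python
import json
from typing import Any, Optional

# Hypothetical schema: field name -> expected Python type.
REQUIRED_FIELDS = {"name": str, "priority": int}


def parse_llm_reply(raw: str) -> Optional[dict[str, Any]]:
    """Parse free-form model output; return None if it is unusable."""
    try:
        data = json.loads(raw)  # fails on e.g. a missing closing bracket
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    # Hand-written validation: every required field present with the right type.
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            return None
    return data


def extract_with_fallback(raw: str) -> dict[str, Any]:
    """Fallback logic: substitute a flagged placeholder when parsing fails."""
    parsed = parse_llm_reply(raw)
    if parsed is None:
        return {"name": "unknown", "priority": 0, "needs_review": True}
    return parsed


good = '{"name": "fix login", "priority": 2}'
truncated = '{"name": "fix login", "priority": 2'  # model forgot the closing brace
print(extract_with_fallback(good))
print(extract_with_fallback(truncated))
```

Note that this only checks shape and types. As the summary points out, a reply can pass this kind of check and still be semantically wrong.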

April 12, 2026 · 7 min · James M
Small language models - why size is not everything

The Rise of Small Language Models: Why Size Isn't Everything

TL;DR: Small language models (typically under 15B parameters) trained on high-quality data can match or outperform much larger models on many real-world tasks, thanks to distillation, instruction tuning, and quantization. The key advantages are speed (milliseconds vs seconds), cost (no per-token API charges), privacy (data stays on your hardware), and offline capability. Standout models include Mistral 7B for speed, Phi-3 for edge devices, and OpenClaw for code and reasoning - all usable locally via Ollama. The industry is moving toward a multi-tier approach: small models (7-13B) for 80% of workloads, medium models as a step-up, and large models reserved only for complex reasoning tasks where they genuinely outperform. Large models still win on deep multi-step reasoning, breadth of knowledge, and few-shot generalization - the shift is about matching model size to task, not replacing large models entirely.

For years, the narrative was simple: bigger is better. GPT-4 was massive, Claude was massive, and the race seemed to be about who could train the largest model on the most data. But that story is changing. Small language models - typically under 15 billion parameters - are proving that you don’t need 175 billion parameters to solve real problems. ...

April 12, 2026 · 8 min · James M
LLM context window arms race

The LLM Context Window Arms Race: Does It Actually Matter?

TL;DR: Context window size is the wrong metric to optimise for - attention scales quadratically, so larger windows mean dramatically higher latency and cost with diminishing quality gains. Retrieval-augmented generation consistently outperforms stuffing entire documents into a prompt, because focused context beats diluted context. What actually matters in production: token efficiency, prompt caching, structured output formats, and intelligent retrieval - not raw window size. Large context windows are genuinely useful for whole-document analysis and complex cross-file code review, but wasteful for Q&A, structured extraction, and high-volume routine tasks. The teams that will ship faster and scale further are those building intelligent architecture around a 200K context window, not those waiting for 1M-token models.

Every week brings a new headline: “Model X reaches 1M token context!” “Model Y supports 2M tokens!” The LLM industry seems locked in an arms race where the stated goal is always “bigger context window,” as if this single metric determines whether a model is useful. ...

April 11, 2026 · 7 min · James M
Local vs cloud AI tradeoffs in 2026

Local AI vs Cloud AI: The Tradeoff Landscape in 2026

The local vs. cloud AI debate used to be simple: cloud was smarter, local was cheaper and private. In 2026 that framing has collapsed. The hardware caught up to the software. Unified memory on Apple Silicon and 24GB+ VRAM cards like the RTX 50-series mean local inference is no longer a compromise - it is a deliberate architectural choice. Professional engineers are not “trying to see if Llama runs on a Mac” anymore. They are building sophisticated Hybrid AI Stacks where local and cloud models each handle the workloads they are genuinely suited for. Here is the tradeoff landscape as it stands today. ...

April 11, 2026 · 5 min · James M
Cline AI coding agent

Cline: The Next Generation AI Coding Assistant

Reading path: For the canonical stack essay, start with AI Dev Tooling.

TL;DR: Cline (formerly Claude Dev) is an open-source VS Code extension that acts as an autonomous agent - it reasons, uses tools, runs terminal commands, and verifies its own work in a loop. Unlike “chat-and-copy” tools, Cline operates as an operator with tools: reading files, executing code, running tests, and iterating until a task is complete. Model Context Protocol (MCP) is Cline’s superpower - it lets Cline connect to external data sources like databases, documentation, and APIs without those features being hard-coded. Compared to Cursor (best for speed and UX) and Claude Code (best for terminal-native workflows), Cline excels at complex, multi-file tasks that span many steps. The developer’s role shifts from writing syntax to architectural oversight - you review intent and direction, not individual lines of code.

In the rapidly evolving landscape of AI Dev Stacks, a new heavyweight has emerged that fundamentally changes the “Assistant” dynamic. Formerly known as Claude Dev, Cline has matured into a sophisticated autonomous agent that doesn’t just suggest code - it executes engineering plans. ...

April 10, 2026 · 4 min · James M
Cline Kanban integration via MCP

Cline + Kanban: Autonomous Development Meets Project Management

TL;DR: Cline integrates with Kanban boards (Linear, GitHub Projects, Jira, Trello) via Model Context Protocol (MCP), closing the gap between project management and code execution. Instead of manually copy-pasting tasks, Cline reads directly from your board, works through the implementation, and updates the task status automatically when done. This makes the Kanban board the single source of truth - it stays in sync with reality rather than being an afterthought you update when you remember. Works best with clear, testable acceptance criteria; vague tasks like “improve performance” need refinement before Cline can act on them autonomously. Even with full autonomy, human code review remains essential - Cline completing a task means it is “Ready for Review”, not that it ships.

In the evolution of agentic software engineering, one critical gap remains: the disconnect between project management and code execution. Your Kanban board tracks what needs doing, but your AI assistant lives in your IDE. Cline + Kanban closes that gap. ...

April 10, 2026 · 5 min · James M