This section is organised around one question: what has to be true before you can trust AI to do real work? Reliability, context, economics, security, evaluation, and eventually physical action - each post is a different angle on the same problem.

Start here

I want to build

I want context

Resources

Link indexes and tool directories - useful for discovery, not the narrative spine:


GPT-5.5 Is Here: Real Step Forward or Quiet Iteration?

TL;DR
- GPT-5.5 (“Spud”) is the first fully retrained base model since GPT-4.5, with architecture and pretraining reworked from scratch with agentic objectives in mind
- It takes the top spot on Terminal-Bench 2.0 (82.7%) and GDPval (84.9%), narrowly beating Anthropic’s Claude Mythos Preview on agentic coding benchmarks
- A 1M-token context window is new for OpenAI, enabling whole-codebase reasoning and long multi-step agent runs without context collapse
- Pricing is competitive ($5/$30 per million input/output tokens) but the strategic story is about OpenAI building an integrated super app - chat, code, browser agent - all driven by one model
- The gains are incremental, not a leap - but the full retraining signals OpenAI is betting the next two years on autonomous agentic work, not chat

OpenAI released GPT-5.5 on April 23, 2026, weeks after GPT-5.4 and only months after GPT-5. The cadence is starting to feel relentless. Codenamed “Spud” internally, GPT-5.5 is the first fully retrained base model since GPT-4.5 - architecture, pretraining corpus, and agent-oriented objectives all reworked from scratch. ...

April 24, 2026 · 6 min · James M

Agent-First Architecture: The Engineer as System Curator

TL;DR
- Agent-first architecture imagines a future where the primary unit of work is an AI agent with intent, tools, memory, and a feedback loop - not a human-authored codebase
- The engineer’s role may shift from building and maintaining systems line by line to curating, governing, and evolving fleets of agents
- Glue code, routine maintenance, first-pass incident triage, and migration work are plausible candidates for automation; deciding what a system is for and holding architectural intent across time probably are not
- Managing an agent fleet might resemble logistics fleet management: define intent, set constraints, design feedback loops, curate the roster, and own the outcomes
- This is a speculative post, not a description of how anything works today - pinning down a hypothesis to revisit when it turns out to be wrong

This is a “thinking out loud” post, not a report from the front lines. I have no evidence any of this is happening at scale, and it is not how my current day job looks. These are just ideas I keep turning over, and I wanted to write them down to see if they hold together. ...

April 23, 2026 · 13 min · James M

ChatGPT Images 2.0: Why Everyone Is Impressed

TL;DR
- ChatGPT Images 2.0 introduces a thinking mode that reasons through complex prompts before generating, dramatically improving instruction-following for multi-part requests
- Text rendering is finally reliable - legible across English, Japanese, Korean, Chinese, Hindi, and Bengali - unlocking infographics, menus, and slides as genuine use cases
- Web search during generation means Images 2.0 can pull accurate, current data into visual outputs rather than fabricating plausible-looking information
- Batch generation produces up to eight images from one prompt with consistent characters and style across all of them, solving a long-standing problem for narrative and sequential content
- The overall shift is from toy to tool: outputs are more predictable, less stylistically over-processed, and viable for production work rather than just prototyping

A year ago, OpenAI’s image generation went viral for Studio Ghibli portraits. That was GPT Image 1 - impressive, playful, and fundamentally still a party trick. ChatGPT Images 2.0, released on April 22nd 2026, is a different thing entirely. It’s the version that starts to look genuinely useful. ...

April 23, 2026 · 6 min · James M

AI Law Is No Longer Theoretical: What's Here, What's Coming, and What It Means

TL;DR
- The EU AI Act is now in force with full enforcement of high-risk AI requirements from August 2026, carrying fines of up to 7% of global turnover - this is no longer a distant deadline
- Over fifty copyright lawsuits against AI developers are working through US courts, and the EU Copyright Directive puts the burden of verifying training data rights on the AI developer, not the rights holder
- Courts in multiple jurisdictions are consistently finding that deploying AI does not transfer liability to the vendor - “the AI did it” is not a defence that holds up
- The US has no comprehensive federal AI law; instead, businesses must navigate a patchwork of state statutes (California, Colorado, New York, Texas) alongside existing federal agency enforcement from the FTC, CFPB, and FDA
- The “move fast and figure out the legal stuff later” era is over - enough of the legal framework has arrived that the gaps are no longer a safe place to operate

For the past few years, AI law has been one of those topics that felt perpetually five minutes away. Governments would announce frameworks. Committees would publish white papers. Experts would debate what the rules should eventually look like. ...

April 22, 2026 · 9 min · James M

Giving Your Home AI Agent Memory That Lasts

TL;DR
- Problem: a home agent with tools but no memory is a very well-read goldfish. Every morning it re-meets you.
- Answer: split memory into three layers - working, episodic, and semantic - and give each layer its own store and its own rules for what gets written.
- Where it lives: SQLite for episodic and facts, a local vector store for semantic search, and a tiny policy file that decides what is worth remembering in the first place.
- How it plugs in: a memory MCP server that exposes recall, remember, and forget - nothing else.
- Result: the agent can say “last Tuesday we tried restarting the Postgres container and it worked” and mean it. It also knows what not to store.

The Goldfish Problem

The home agent I built over the last few weeks can do real things now. It can read my mail, move files around my workspace, turn lights off, and check my calendar. What it could not do, until this week, was remember any of it. ...

April 22, 2026 · 9 min · James M

An AI Tooling Learning Path: Logical Phases for 2026

TL;DR
- The order you learn AI tools matters as much as which tools you learn - most people start with terminal agents or editors before they understand how models actually fail
- The seven-phase path runs: fundamentals, chat interfaces, AI-native editors, terminal agents, local models, orchestration, and review and evaluation
- Terminal agents (Claude Code, Cline, Aider) represent the biggest mindset shift - you move from driving with suggestions to specifying and letting the model execute
- Local models via Ollama belong in phase five, once you have felt the pain of API costs and know which tasks actually need frontier capability
- Review, evaluation, and capture (phase seven) is the phase most developers skip - and the one that separates AI-curious from AI-competent

The hardest part of learning AI tooling in 2026 is not any single tool. It is the order you meet them in. ...

April 21, 2026 · 10 min · James M

Amazon Doubles Down: The $25 Billion Anthropic Bet

TL;DR
- Amazon announced up to $25 billion in additional investment in Anthropic on April 20, 2026, bringing total committed capital past $33 billion
- In return, Anthropic committed to spending over $100 billion on AWS over the next decade - effectively a closed loop where Amazon’s capital funds Anthropic’s compute bill
- The deal gives Amazon a flagship AI workload to prove out its Trainium custom silicon against Nvidia, while countering Microsoft’s OpenAI advantage on Azure
- For developers building with Claude, expect more capacity, more aggressive pricing on Bedrock, and deeper AWS service integration as the compute comes online
- The arrangement signals that frontier AI has fully consolidated into a small number of hyperscaler-aligned labs - the era of independent AI startups is effectively over

On April 20, 2026, Amazon announced it would invest up to an additional $25 billion in Anthropic, stacking on top of the $8 billion it has already poured into the AI startup over recent years. In return, Anthropic committed to spending more than $100 billion on Amazon Web Services over the next ten years. ...

April 21, 2026 · 6 min · James M

Hermes Agent: Persistent Autonomy That Learns and Grows

TL;DR
- Hermes Agent by Nous Research is an open-source persistent autonomous system that builds memory across conversations, auto-generates reusable skills from repeated tasks, and compounds in capability over time
- Unlike stateless agents, Hermes accumulates project context - learning codebase quirks, team conventions, and recurring workflows so it stops asking questions it has already answered
- It works across Telegram, Discord, Slack, WhatsApp, Signal, Email, and CLI - meeting teams on the platforms they already use rather than requiring a dedicated app
- Running cost is roughly $20 to $60 per month for a solo developer (a $5-$10 VPS plus LLM API calls); it is MIT licensed with no seat fees or vendor lock-in
- The honest trade-off: Hermes beats alternatives on persistence and learning depth, but raises open questions about memory scaling, skill auditing, and what happens when an agent learns something wrong

Most AI agents are forgettable. You ask them to do something, they do it, you close the window. The next time you need help, they start from zero - no context, no learning, no continuity. Hermes Agent works differently. Nous Research built it as a persistent system that remembers what it learns and gets measurably more capable the longer it runs. ...

April 20, 2026 · 9 min · James M

MacWhisper vs Wispr Flow vs Superwhisper: The 2026 Dictation Stack Compared

TL;DR
- MacWhisper is a file transcription tool (audio in, text out) that runs entirely on-device - the right pick for journalists, researchers, and anyone transcribing recordings
- Wispr Flow is the easiest system-wide dictation option, with AI-powered prose cleanup and cross-platform sync, but it sends audio to the cloud with no on-device option
- Superwhisper matches Wispr Flow’s system-wide dictation but processes audio locally, with bring-your-own-key LLM cleanup and deep customisation for power users
- The core decision is simple: if your audio can leave your machine, use Wispr Flow; if it must stay local, use Superwhisper; if you just need transcription, use MacWhisper
- The real product differentiation is no longer the underlying Whisper model - it is hotkey ergonomics, auto-edit prompts, and workflow integration

Voice input on the Mac used to mean fighting with the built-in Dictation feature or paying Nuance a small fortune. In 2026, the landscape looks completely different. A handful of indie and venture-backed apps have turned Whisper-class models into genuinely fast, accurate tools that sit quietly in your menu bar until you hold a hotkey. ...

April 20, 2026 · 7 min · James M

AI Cloud Subscriptions: Comparing Pricing and Features in 2026

AI cloud subscriptions have fragmented into a crowded market. Frontier-lab APIs compete with open-weights challengers, consumer chat plans compete with agent platforms, and every provider is reshuffling model tiers every few months. This guide organises the 2026 landscape so you can pick a plan without reading six pricing pages. For background on how these costs behave over time, see Token Economics: Why Costs Aren’t Going Down and Local vs Cloud AI in 2026. ...

April 19, 2026 · 8 min · James M