In-depth exploration of AI in practice: building and deploying AI agents that work, designing developer workflows around Claude and other LLMs, critical analysis of AI safety and reliability, and the real shifts happening in careers, skills, and how we work. This section mixes tactical guides (how to actually build with AI), strategic analysis (what’s hype vs. what matters), and deeper dives into the tools and systems reshaping software development and knowledge work.

When to Fine-Tune vs When to RAG Banner

When to Fine-Tune vs When to RAG: Choosing Your AI Architecture

TL;DR

- The default choice for most teams should be RAG - it is reversible in days, whereas a bad fine-tuning decision is an expensive sunk cost that requires retraining to fix
- RAG fails when the question requires reasoning across an entire knowledge domain rather than extracting a specific answer from a passage; fine-tuning handles that case better
- Fine-tuning fails silently when underlying facts change - it produces confidently wrong, stale answers with no warning; RAG automatically picks up changes at query time
- A practical decision framework: use RAG for volatile facts and cited answers, use fine-tuning for stable style, voice, and cross-domain reasoning
- The best production systems use both: a fine-tuned base model for stable domain knowledge, augmented with retrieval for current and specific information

The question I get asked most often by engineers starting to build with language models is some variation of: “should we fine-tune or should we do RAG?” It is almost always the wrong question, but it is the wrong question in an instructive way. The reason it gets asked so much is that the choice feels architectural, and architectural choices feel like the kind of thing you commit to once and live with. In practice, the choice is closer to “should I use a database or a cache” - the answer is usually some of both, applied to different problems, and the ratio shifts as the system matures. ...
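The decision framework in the TL;DR can be sketched as a small routing function. This is a hypothetical illustration, not code from the post: the `Requirement` fields and the routing rules are my own encoding of the "volatile facts and citations point to RAG, stable style and cross-domain reasoning point to fine-tuning" heuristic.

```python
# Hypothetical sketch of the RAG-vs-fine-tune decision framework.
# The fields and rules are illustrative assumptions, not the post's code.
from dataclasses import dataclass


@dataclass
class Requirement:
    facts_change_often: bool      # volatile knowledge (prices, policies, docs)
    needs_citations: bool         # answers must point back to sources
    needs_style_or_voice: bool    # stable tone, format, persona
    needs_domain_reasoning: bool  # reasoning across the whole domain


def choose_architecture(req: Requirement) -> str:
    wants_rag = req.facts_change_often or req.needs_citations
    wants_ft = req.needs_style_or_voice or req.needs_domain_reasoning
    if wants_rag and wants_ft:
        return "hybrid"      # fine-tuned base + retrieval at query time
    if wants_ft:
        return "fine-tune"
    return "rag"             # the reversible default when in doubt


# A support bot over a changing knowledge base that must cite sources,
# delivered in a fixed brand voice, lands on the hybrid the post recommends:
print(choose_architecture(Requirement(True, True, True, False)))  # hybrid
```

Note that the function falls through to RAG, mirroring the post's point that RAG is the cheap-to-reverse default.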

April 29, 2026 · 11 min · James M
The Free Intelligence Era Banner

The Free Intelligence Era: What Breaks When Thinking Costs Nothing

TL;DR

- The marginal cost of AI intelligence is halving roughly every two months and heading toward a level where rationing stops making sense - similar to how bandwidth and storage became effectively unconstrained
- This will break pricing models built on scarce cognition: anything billed per word, per hour, or per consult faces a hard ceiling set by what machines charge for the same work
- The Jevons paradox means total cognitive work in the economy likely goes up, not down - cheaper thinking means we apply thinking to far more problems, not the same problems more cheaply
- Three categories of human work survive: accountability (being the named responsible party), taste (choosing well from infinite AI-generated options), and real-world coupling (a body in a place, a relationship that took years to build)
- The political question of who captures the surplus and who absorbs the transition cost is still open - it will be decided by institutions and policy, not by the technology itself

This is a personal reflection, not a forecast dressed up as one. I am writing about a trend I think is real, but the second-order consequences are guesses, and I am sure some of them are wrong. ...

April 28, 2026 · 14 min · James M
Junior Developer Pipeline Problem Banner

The Junior Developer Pipeline Problem: Where Do Tomorrow's Seniors Come From?

TL;DR

- The work AI now automates - boring tickets, bug hunts, boilerplate - was the unspoken apprenticeship that turned juniors into seniors
- The skills that work built (pattern recognition, systems intuition, taste, calibration) are built by doing, not by reading - and that doing is now cheapest to delegate
- The new apprenticeship shifts toward reading over writing, debugging agent output, earlier architectural decisions, and deliberate practice of things agents do badly
- There is a coordination problem: individual organisations rationally skip junior investment in the short term, but the senior pipeline thins industry-wide a few years later
- If you are starting out today, optimise for proximity to a great senior engineer above salary, title, or any other variable

The views in this post are my own personal reflections on the industry as a whole, written in my own time. They are not about any specific employer, team, or colleague, past or present. ...

April 28, 2026 · 11 min · James M
AI Hallucinations Understanding and Mitigating False Outputs Banner

AI Hallucinations: Understanding and Mitigating False Outputs

TL;DR

- AI hallucinations are not perceptual errors - they are confident pattern completions that happen to be unanchored in the world, and no model will ever stop producing them entirely because truth is not what the training objective optimises for
- Hallucinations cluster into five distinct types: factual, citation, code and API, instruction (claiming to have done something it did not), and reasoning - each with a different root cause and a different mitigation
- The mitigations that genuinely move the dial are structural: retrieval-augmented generation, tool use over recall, constrained structured outputs, explicit verification layers, and lower temperature for factual tasks
- The model is not the product; the model surrounded by retrieval, verification, structured outputs, calibration, and human-in-the-loop review is the product
- Hallucination is not the bug - the absence of a system around the model is the bug, and treating it as an engineering problem rather than a model problem is what separates demos from production

The word “hallucination” is one of the most successful pieces of accidental marketing in our industry. It is a soft, almost endearing way to describe an LLM stating with full confidence that a function exists when it does not, that a court case was decided when it was not, that a paper was written by an author who has never published in that field. It makes the failure sound like a quirk rather than the central reliability problem of the entire technology. ...
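Two of the structural mitigations named in the TL;DR - constrained structured outputs and an explicit verification layer - can be sketched in a few lines. This is a minimal illustration under my own assumptions: `call_model` is a stand-in for any LLM client, and the schema and retry policy are invented for the example, not taken from the post.

```python
# Minimal sketch: constrain the model to structured JSON output and wrap it
# in a verification layer that rejects malformed or unsourced answers.
# The schema, retry policy, and `call_model` stand-in are assumptions.
import json

REQUIRED_KEYS = {"answer", "sources"}


def verify(raw: str):
    """Return parsed output only if it is well-formed, sourced JSON."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not REQUIRED_KEYS <= data.keys():
        return None
    if not data["sources"]:   # an unsourced answer is treated as a
        return None           # likely hallucination and rejected
    return data


def answer_with_verification(call_model, prompt: str, retries: int = 2):
    for _ in range(retries + 1):
        candidate = verify(call_model(prompt))
        if candidate is not None:
            return candidate
    return {"answer": None, "sources": [], "error": "verification failed"}


# Stubbed model that fails once, then returns valid structured output:
replies = iter(['not json', '{"answer": "42", "sources": ["doc-7"]}'])
result = answer_with_verification(lambda p: next(replies), "question")
print(result["answer"])  # 42
```

The point of the sketch is the TL;DR's framing: the reliability lives in the system around the model, not in the model call itself.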

April 28, 2026 · 13 min · James M
GPT-5.5 release illustration

GPT-5.5 Is Here: Real Step Forward or Quiet Iteration?

TL;DR

- GPT-5.5 (“Spud”) is the first fully retrained base model since GPT-4.5, with architecture and pretraining reworked from scratch with agentic objectives in mind
- It takes the top spot on Terminal-Bench 2.0 (82.7%) and GDPval (84.9%), narrowly beating Anthropic’s Claude Mythos Preview on agentic coding benchmarks
- A 1M-token context window is new for OpenAI, enabling whole-codebase reasoning and long multi-step agent runs without context collapse
- Pricing is competitive ($5/$30 per million input/output tokens) but the strategic story is about OpenAI building an integrated super app - chat, code, browser agent - all driven by one model
- The gains are incremental, not a leap - but the full retraining signals OpenAI is betting the next two years on autonomous agentic work, not chat

OpenAI released GPT-5.5 on April 23, 2026, weeks after GPT-5.4 and only months after GPT-5. The cadence is starting to feel relentless. Codenamed “Spud” internally, GPT-5.5 is the first fully retrained base model since GPT-4.5 - architecture, pretraining corpus, and agent-oriented objectives all reworked from scratch. ...

April 24, 2026 · 6 min · James M
Abstract illustration of a person sitting with a tool laid down beside them

The Meaning of Work in an Age of Abundance: Finding Purpose When Agents Do the Heavy Lifting

TL;DR

- Modern knowledge work has quietly built identity on producing things - and AI pressure makes that fragility visible without you having to lose your job to feel it
- History (Keynes’ 1930 prediction) suggests freed-up capacity defaults to “more work”, not leisure - the shift to meaningful work has to be chosen deliberately
- What stays valuable when execution gets cheap: deciding what is worth doing, taking responsibility, sitting with other humans, craft for its own sake, and growing other people
- The “everyone will do deeper work” narrative ignores the dignity problem - for many people, work is structure and belonging, not just a vehicle for meaning
- Put your meaning somewhere that does not depend on being the cheapest producer of an artefact - it was never a secure place to put it, and agents are just making that clearer

This is another “thinking out loud” post, in the same spirit as the agent-first architecture piece. I do not know how any of this is going to land. I am writing it partly because the question has been rattling around in my head for months, and partly because I suspect a lot of people in and around software are quietly wondering the same thing without quite wanting to say it out loud. ...

April 23, 2026 · 13 min · James M
Agent-First Architecture Banner

Agent-First Architecture: The Engineer as System Curator

TL;DR

- Agent-first architecture imagines a future where the primary unit of work is an AI agent with intent, tools, memory, and a feedback loop - not a human-authored codebase
- The engineer’s role may shift from building and maintaining systems line by line to curating, governing, and evolving fleets of agents
- Glue code, routine maintenance, first-pass incident triage, and migration work are plausible candidates for automation; deciding what a system is for and holding architectural intent across time probably are not
- Managing an agent fleet might resemble logistics fleet management: define intent, set constraints, design feedback loops, curate the roster, and own the outcomes
- This is a speculative post, not a description of how anything works today - the author is pinning down a hypothesis to revisit when it turns out to be wrong

This is a “thinking out loud” post, not a report from the front lines. I have no evidence any of this is happening at scale, and it is not how my current day job looks. These are just ideas I keep turning over, and I wanted to write them down to see if they hold together. ...

April 23, 2026 · 13 min · James M
AI subscription pricing illustration

Is the $20 AI Subscription Era Over?

TL;DR

- The $20/month subscription tier is not disappearing, but what you get for it is quietly shrinking - agent features are being capped or metered while the price holds
- The Claude Code episode (briefly paywalled for Pro users) was a deliberate A/B test, not a glitch - a signal that Anthropic is steering heavy users toward the Max tier at $100-$200/month
- Agent workflows like Claude Code consume 50-500x more tokens than a chat session, making flat all-you-can-eat pricing economically unsustainable for power users
- Most major providers (Anthropic, OpenAI, Google, Cursor) are projected to raise consumer tiers by $5-$10 by end of 2026, with sharper increases at the enterprise level
- If you are a chat-only user the $20 plan remains a good deal; if you are running agents daily, budget for a higher tier or pay-as-you-go API access instead

For the last three years, $20 a month has been the magic number. Claude Pro, ChatGPT Plus, Gemini Advanced, Copilot Pro, Cursor Pro - all twenty dollars, all clearly priced to anchor against Netflix rather than against enterprise software. That anchor is cracking. The labs are burning cash on inference for power users, the frontier models cost more per token than they did a year ago, and agent tools like Claude Code and Codex are consuming ten to a hundred times the compute a chat session does. Something has to give. ...
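The token-economics argument above reduces to simple arithmetic. This is a back-of-envelope sketch under assumptions of my own (the blended serving cost and the chat user's monthly token volume are illustrative numbers, not from the post; the multiplier is the mid-range of the 50-500x figure it cites):

```python
# Back-of-envelope sketch of why flat $20/month pricing breaks for agent
# users. COST_PER_M_TOKENS and CHAT_TOKENS_PER_MONTH are assumed values;
# only the 50-500x multiplier range comes from the post (100x is mid-range).
COST_PER_M_TOKENS = 10.0           # assumed blended $ cost to serve 1M tokens
CHAT_TOKENS_PER_MONTH = 1_000_000  # assumed typical chat-only user
AGENT_MULTIPLIER = 100             # mid-range of the post's 50-500x


def monthly_cost(tokens: int) -> float:
    return tokens / 1_000_000 * COST_PER_M_TOKENS


chat_cost = monthly_cost(CHAT_TOKENS_PER_MONTH)
agent_cost = monthly_cost(CHAT_TOKENS_PER_MONTH * AGENT_MULTIPLIER)

print(f"chat user:  ${chat_cost:.0f}/month")   # comfortably under $20
print(f"agent user: ${agent_cost:.0f}/month")  # far beyond any flat tier
```

Whatever the real per-token numbers are, the multiplier does the work: any flat price that is profitable on chat users is deeply unprofitable on daily agent users, which is exactly the pressure the post describes.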

April 23, 2026 · 10 min · James M
Meta employee tracking banner

Meta Is Tracking Its Own Employees to Train AI Agents

TL;DR

- Meta’s Model Capability Initiative installs software on US employee laptops that captures keystrokes, mouse movements, and screenshots to train AI agents - there is no opt-out
- The program is US-only because EU and UK employees are protected by GDPR; the scope of the tracking maps directly onto the absence of legal protection
- Meta CTO Andrew Bosworth openly framed the end state: agents do the work, humans direct and review - the surveillance and the automation plan are the same story
- The irony is deliberate: Meta’s defence of the program - narrow purpose, safeguards, not used against the person - echoes its long-standing defences of consumer data collection
- This is a signal about where the agent-training bottleneck actually sits: not reasoning or context windows, but the long tail of real software interactions that only real employees can provide

Meta has started installing tracking software on the work laptops of its US-based employees. It captures keystrokes, mouse movements, clicks, and occasional screenshots. The captured activity is fed back into training data for AI agents. There is no opt-out. The program was disclosed to staff in an internal memo in April 2026, and the response from inside the company has been about what you would expect. ...

April 23, 2026 · 8 min · James M
AI generated image

ChatGPT Images 2.0: Why Everyone Is Impressed

TL;DR

- ChatGPT Images 2.0 introduces a thinking mode that reasons through complex prompts before generating, dramatically improving instruction-following for multi-part requests
- Text rendering is finally reliable - legible across English, Japanese, Korean, Chinese, Hindi, and Bengali - unlocking infographics, menus, and slides as genuine use cases
- Web search during generation means Images 2.0 can pull accurate, current data into visual outputs rather than fabricating plausible-looking information
- Batch generation produces up to eight images from one prompt with consistent characters and style across all of them, solving a long-standing problem for narrative and sequential content
- The overall shift is from toy to tool: outputs are more predictable, less stylistically over-processed, and viable for production work rather than just prototyping

A year ago, OpenAI’s image generation went viral for Studio Ghibli portraits. That was GPT Image 1 - impressive, playful, and fundamentally still a party trick. ChatGPT Images 2.0, released on April 22, 2026, is a different thing entirely. It’s the version that starts to look genuinely useful. ...

April 23, 2026 · 6 min · James M