Ai | jamesm.blog

AI Skills: One Folder, Any Model

TL;DR
- A Claude Code skill is just a folder with a SKILL.md file - YAML frontmatter plus natural-language instructions - and the same folder works across Cursor, Gemini CLI, Codex, and a dozen other tools.
- The format is model-agnostic because it contains no provider-specific syntax; any instruction-following model can read it, and any harness that loads markdown can execute it.
- Progressive disclosure keeps large skill libraries cheap: only names and descriptions load at session start, with full instructions loading only when a skill is activated.
- The portability is practically valuable - version-controlled runbooks that survive tool switches, model upgrades, and team growth without being rewritten.
- Core skills are genuinely portable; advanced frontmatter extensions (like allowed-tools or context: fork) are tool-specific and may need tuning across harnesses.

Most of the tooling I have written about over the last year has been provider-specific. A particular model, a particular harness, a particular set of features. The thing I find interesting about agent skills is that they are not. ...
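To make the progressive-disclosure point in the AI Skills summary concrete, here is a minimal Python sketch. It assumes flat `key: value` frontmatter delimited by `---` lines; the `release-notes` skill, the in-memory `SKILLS` dictionary, and the function names are all hypothetical, and a real harness would read folders from disk and use a proper YAML parser.

```python
# Hypothetical in-memory stand-in for a skills folder: {folder name: SKILL.md text}.
SKILLS = {
    "release-notes": """---
name: release-notes
description: Draft release notes from merged pull requests.
---
1. List the pull requests merged since the last tag.
2. Group them by area and summarise each group.
""",
}


def split_skill(text):
    """Split SKILL.md text into (frontmatter dict, instruction body).

    Assumes the file starts with a '---' line and uses flat 'key: value' pairs.
    """
    _, front, body = text.split("---\n", 2)
    meta = {}
    for line in front.splitlines():
        key, _, value = line.partition(":")
        if key.strip():
            meta[key.strip()] = value.strip()
    return meta, body


def session_index(skills):
    """What loads at session start: names and descriptions only."""
    index = []
    for text in skills.values():
        meta, _ = split_skill(text)
        index.append({"name": meta["name"], "description": meta["description"]})
    return index


def activate(skills, name):
    """What loads on activation: the full instruction body of one skill."""
    _, body = split_skill(skills[name])
    return body
```

The cost of a large library then scales with the size of the index rather than with the total length of every skill's instructions.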

Suno in May 2026: where the platform actually is

TL;DR - Suno v5.5 (March 2026) is the most expressive model yet, and three personalisation features finally make the platform usable as a real workflow: Voices (clone your own verified singing voice), Custom Models (fine-tune v5.5 on your own catalogue), and My Taste (lightweight preference learning for everyone). The Warner Music deal is now visible in the product - older models are being deprecated, free accounts have lost commercial download rights, and the ownership language has softened from “you own this” to “you have commercial rights.” Best used for demos, stem libraries, and personal sound signatures; still risky for releases that need clean copyright provenance. ...

My AI-Augmented Design Workflow: A 10-Minute Loop From Discussion to Documented Decision

TL;DR
- A combination of Cursor in the IDE, Claude Code and Codex in the terminal, and GitHub Spec Kit as the living contract has collapsed the discuss-design-document loop from days to under ten minutes.
- Every meeting is transcribed and checked into GitHub alongside the design corpus, giving AI agents access to the full historical record - not just curated decisions but the debates that shaped them.
- Model selection matters: cheaper, faster models for throwaway sketches and small refactors; expensive models (Opus) for large cross-repo work where the cost of a wrong answer is high.
- The real transformation is cognitive flow - removing friction between thinking and recording means decisions get made and captured while the problem is still fresh, with almost no context switching.
- AI is now suggesting improvements faster than the author can implement them; the next bottleneck is compaction, not generation - asking the model to reduce documents to their load-bearing claims rather than produce more content.

Since making a combination of Cursor in the IDE and Claude Code and Codex in the terminal the centre of my working day - with ChatGPT for general questions and GitHub Spec Kit holding the design contract - the way I move from a question on Slack to a documented design decision has changed beyond recognition. ...

When to Fine-Tune vs When to RAG: Choosing Your AI Architecture

TL;DR
- The default choice for most teams should be RAG - it is reversible in days, whereas a bad fine-tuning decision is an expensive sunk cost that requires retraining to fix.
- RAG fails when the question requires reasoning across an entire knowledge domain rather than extracting a specific answer from a passage; fine-tuning handles that case better.
- Fine-tuning fails silently when underlying facts change - it produces confidently wrong, stale answers with no warning; RAG automatically picks up changes at query time.
- A practical decision framework: use RAG for volatile facts and cited answers, use fine-tuning for stable style, voice, and cross-domain reasoning.
- The best production systems use both: a fine-tuned base model for stable domain knowledge, augmented with retrieval for current and specific information.

The question I get asked most often by engineers starting to build with language models is some variation of: “should we fine-tune or should we do RAG?” It is almost always the wrong question, but it is the wrong question in an instructive way. The reason it gets asked so much is that the choice feels architectural, and architectural choices feel like the kind of thing you commit to once and live with. In practice, the choice is closer to “should I use a database or a cache” - the answer is usually some of both, applied to different problems, and the ratio shifts as the system matures. ...
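The decision framework in the fine-tune vs RAG summary can be written down as a small rule. This is only a sketch of the summary's own criteria; the `Workload` fields and the `choose_architecture` name are hypothetical, and a real decision would also weigh cost, latency, and data volume.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of what the system has to do."""
    volatile_facts: bool                # underlying facts change often
    needs_citations: bool               # answers must point at a source passage
    needs_style_or_voice: bool          # stable style or voice matters
    needs_cross_domain_reasoning: bool  # reasoning across a whole domain


def choose_architecture(w: Workload) -> str:
    """Return 'rag', 'fine-tune', or 'both', following the summary's rules."""
    wants_rag = w.volatile_facts or w.needs_citations
    wants_fine_tune = w.needs_style_or_voice or w.needs_cross_domain_reasoning
    if wants_rag and wants_fine_tune:
        return "both"
    if wants_fine_tune:
        return "fine-tune"
    # RAG is the default because it is the cheaper decision to reverse.
    return "rag"
```

For example, a support bot over a changing product catalogue that must also keep a house voice lands on "both", which matches the summary's point that the best production systems combine the two.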

The Free Intelligence Era: What Breaks When Thinking Costs Nothing

TL;DR
- The marginal cost of AI intelligence is halving roughly every two months and heading toward a level where rationing stops making sense - similar to how bandwidth and storage became effectively unconstrained.
- This will break pricing models built on scarce cognition: anything billed per word, per hour, or per consult faces a hard ceiling set by what machines charge for the same work.
- The Jevons paradox means total cognitive work in the economy likely goes up, not down - cheaper thinking means we apply thinking to far more problems, not the same problems more cheaply.
- Three categories of human work survive: accountability (being the named responsible party), taste (choosing well from infinite AI-generated options), and real-world coupling (a body in a place, a relationship that took years to build).
- The political question of who captures the surplus and who absorbs the transition cost is still open - it will be decided by institutions and policy, not by the technology itself.

This is a personal reflection, not a forecast dressed up as one. I am writing about a trend I think is real, but the second-order consequences are guesses, and I am sure some of them are wrong. ...
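As a rough illustration of what "halving roughly every two months" implies, here is a short Python sketch of the arithmetic. The two-month period is the summary's own approximate figure rather than measured data, and `relative_cost` is a hypothetical helper name.

```python
def relative_cost(months: float, halving_period_months: float = 2.0) -> float:
    """Fraction of today's cost remaining after `months`, assuming steady halving."""
    return 0.5 ** (months / halving_period_months)


# Six halvings in a year: 1/64 of the starting cost.
one_year = relative_cost(12)  # 0.015625
```

If the trend held for a full year, the same unit of work would cost about 1.6% of what it does today, which is the scale of change that makes per-word or per-hour pricing hard to sustain.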

AI Hallucinations: Understanding and Mitigating False Outputs

TL;DR
- AI hallucinations are not perceptual errors - they are confident pattern completions that happen to be unanchored in the world, and no model will ever stop producing them entirely because truth is not what the training objective optimises for.
- Hallucinations cluster into five distinct types: factual, citation, code and API, instruction (claiming to have done something it did not), and reasoning - each with a different root cause and a different mitigation.
- The mitigations that genuinely move the dial are structural: retrieval-augmented generation, tool use over recall, constrained structured outputs, explicit verification layers, and lower temperature for factual tasks.
- The model is not the product; the model surrounded by retrieval, verification, structured outputs, calibration, and human-in-the-loop review is the product.
- Hallucination is not the bug - the absence of a system around the model is the bug, and treating it as an engineering problem rather than a model problem is what separates demos from production.

The word “hallucination” is one of the most successful pieces of accidental marketing in our industry. It is a soft, almost endearing way to describe an LLM stating with full confidence that a function exists when it does not, that a court case was decided when it was not, that a paper was written by an author who has never published in that field. It makes the failure sound like a quirk rather than the central reliability problem of the entire technology. ...

How to Phone Your Home AI Agent Running on a Mac Studio

TL;DR
- Goal: Call a real phone number and have a proper back-and-forth with my Mac Studio agent while walking the dog.
- Hardware: Mac Studio (M2 Ultra, 128 GB) running a local model via Ollama or MLX.
- Voice pipeline: Twilio SIP in, LiveKit Agents orchestrating STT / LLM / TTS, Whisper for transcription, Piper or ElevenLabs for speech.
- Brain: A local 30B-class model for chat plus tool calls, with Claude API as a fallback for the harder reasoning.
- Reach: Tailscale between the Mac and a tiny VPS so I never punch a hole in my home router.
- Outcome: I can ring a UK landline number, ask “what’s failing on the CI pipeline?” and get a spoken answer in ~2 seconds.

Why bother phoning your own agent?

Typing is great at a desk. Outside the desk, it’s hopeless. I wanted the simplest possible interface to the box sat under my desk at home - dial a number, talk, hang up. No app, no login, no VPN dance on my phone. ...

Giving Your Home AI Agent Real Tools: MCP Servers on a Mac Studio

TL;DR
- Problem: a local agent that can only chat is a toy. The value is in what it can do.
- Answer: Model Context Protocol servers, running locally on the Mac Studio, expose filesystem, calendar, mail, notes, and a handful of custom tools.
- Runtime: one supervisord config, a small router, and per-server allowlists so nothing escapes its box.
- Security posture: no tool runs without a policy, secrets live in the macOS Keychain, and every call is logged to a local SQLite file I can grep at 11pm.
- Result: I can phone the agent (see How to Phone Your Home AI Agent), ask “move the CI failure email to triage and put a 15 minute hold on my calendar at 4”, and it actually does it.

Why MCP and Not “Just Functions”

Before MCP I had a directory of half-finished Python shims. Each one spoke a slightly different dialect: one took JSON arguments, one took positional args, one returned markdown and one returned a dict. Adding a new tool meant editing the agent prompt, the router, and the caller. ...

The Year 3026: Thinking Seriously About a Thousand Years From Now

TL;DR
- Over a thousand years, the substrate of civilisation changes beyond recognition, but the human core - love, grief, storytelling, the search for meaning - almost certainly does not.
- Computation and energy will have hit their physical cost floors by 3026; intelligence is ambient, woven into the environment so thoroughly that “using AI” becomes as meaningless a phrase as “using oxygen”.
- The built environment is almost certainly at solar-system scale - with the Earth a protected biosphere and heavy industry, compute, and energy capture distributed across the inner solar system.
- No company, currency, or nation founded in 2026 is likely to survive in any meaningful continuity; the middle layer of institutions gets hollowed out, leaving fewer but far longer-lived structures.
- The decisions being made right now - on AI safety, climate, and coordination - have genuinely astronomical consequences, because they determine whether there is a 3026 worth having at all.

Most writing about the future of AI stops at ten years. A few brave pieces stretch to fifty. I wrote one of the ten-year ones myself in The Next Decade of AI, and the honest reason the horizon stays short is that the uncertainty gets unmanageable much past that. Forecasting even the shape of the economy in 2040 is already mostly vibes. ...

The Year 2126: What the Next Hundred Years Actually Looks Like

TL;DR
- By 2126, clean energy, most infectious disease, and routine cognitive work are almost certainly solved - the AI transition will look as obvious in hindsight as the car replacing the horse.
- Climate is the hardest unsolved problem: the outcome depends on decisions made in the next thirty years, and 2126 inherits either a managed problem or a civilisation in partial retreat.
- The demographic inversion is one of the most structurally important facts - global population peaks around 2060-2080 then declines, leaving a world where a hundred-year-old is ordinary and a child is rare and socially valued.
- Human work shifts toward human-presence roles, stewardship of powerful systems, physical craft, meaning-making, and accountability - the categories that cannot be automated.
- The decade we are in now is one that 2126 will study closely; the decisions made about AI safety, climate, and institutional reform are visibly reflected in the outcome a century later.

A hundred years is a useful distance. Long enough that the current news cycle is ancient history, short enough that some people alive in 2126 will have living memory of people who were alive in 2026. The children being born this week have a non-trivial chance of being interviewed, in their late nineties, about what the early AI era was actually like. That matters. It makes the 100-year horizon a question about the world people we know will inherit, not an abstract one. ...