Artificial Intelligence

In-depth exploration of AI in practice: building and deploying AI agents that work, designing developer workflows around Claude and other LLMs, critical analysis of AI safety and reliability, and the real shifts happening in careers, skills, and how we work. This section mixes tactical guides (how to actually build with AI), strategic analysis (what’s hype vs. what matters), and deeper dives into the tools and systems reshaping software development and knowledge work.

How to Phone Your Home AI Agent Running on a Mac Studio

TL;DR Goal: Call a real phone number and have a proper back-and-forth with my Mac Studio agent while walking the dog. Hardware: Mac Studio (M2 Ultra, 128 GB) running a local model via Ollama or MLX. Voice pipeline: Twilio SIP in, LiveKit Agents orchestrating STT / LLM / TTS, Whisper for transcription, Piper or ElevenLabs for speech. Brain: A local 30B-class model for chat plus tool calls, with Claude API as a fallback for the harder reasoning. Reach: Tailscale between the Mac and a tiny VPS so I never punch a hole in my home router. Outcome: I can ring a UK landline number, ask “what’s failing on the CI pipeline?” and get a spoken answer in ~2 seconds. Why bother phoning your own agent? Typing is great at a desk. Outside the desk, it’s hopeless. I wanted the simplest possible interface to the box sat under my desk at home - dial a number, talk, hang up. No app, no login, no VPN dance on my phone. ...

Giving Your Home AI Agent Real Tools: MCP Servers on a Mac Studio

TL;DR Problem: a local agent that can only chat is a toy. The value is in what it can do. Answer: Model Context Protocol servers, running locally on the Mac Studio, expose filesystem, calendar, mail, notes, and a handful of custom tools. Runtime: one supervisord config, a small router, and per-server allowlists so nothing escapes its box. Security posture: no tool runs without a policy, secrets live in the macOS Keychain, and every call is logged to a local SQLite file I can grep at 11pm. Result: I can phone the agent (see How to Phone Your Home AI Agent), ask “move the CI failure email to triage and put a 15 minute hold on my calendar at 4”, and it actually does it. Why MCP and Not “Just Functions” Before MCP I had a directory of half-finished Python shims. Each one spoke a slightly different dialect: one took JSON arguments, one took positional args, one returned markdown and one returned a dict. Adding a new tool meant editing the agent prompt, the router, and the caller. ...

The Year 3026: Thinking Seriously About a Thousand Years From Now

TL;DR Over a thousand years, the substrate of civilisation changes beyond recognition, but the human core - love, grief, storytelling, the search for meaning - almost certainly does not Computation and energy will have hit their physical cost floors by 3026; intelligence is ambient, woven into the environment so thoroughly that “using AI” becomes as meaningless a phrase as “using oxygen” The built environment is almost certainly at solar-system scale - with the Earth a protected biosphere and heavy industry, compute, and energy capture distributed across the inner solar system No company, currency, or nation founded in 2026 is likely to survive in any meaningful continuity; the middle layer of institutions gets hollowed out, leaving fewer but far longer-lived structures The decisions being made right now - on AI safety, climate, and coordination - have genuinely astronomical consequences, because they determine whether there is a 3026 worth having at all Most writing about the future of AI stops at ten years. A few brave pieces stretch to fifty. I wrote one of the ten-year ones myself in The Next Decade of AI, and the honest reason the horizon stays short is that the uncertainty gets unmanageable much past that. Forecasting even the shape of the economy in 2040 is already mostly vibes. ...

The Year 2126: What the Next Hundred Years Actually Looks Like

TL;DR By 2126, clean energy, most infectious disease, and routine cognitive work are almost certainly solved - the AI transition will look as obvious in hindsight as the car replacing the horse Climate is the hardest unsolved problem: the outcome depends on decisions made in the next thirty years, and 2126 inherits either a managed problem or a civilisation in partial retreat The demographic inversion is one of the most structurally important facts - global population peaks around 2060-2080 then declines, leaving a world where a hundred-year-old is ordinary and a child is rare and socially valued Human work shifts toward human-presence roles, stewardship of powerful systems, physical craft, meaning-making, and accountability - the categories that cannot be automated The decade we are in now is one that 2126 will study closely; the decisions made about AI safety, climate, and institutional reform are visibly reflected in the outcome a century later A hundred years is a useful distance. Long enough that the current news cycle is ancient history, short enough that some people alive in 2126 will have living memory of people who were alive in 2026. The children being born this week have a non-trivial chance of being interviewed, in their late nineties, about what the early AI era was actually like. That matters. It makes the 100-year horizon a question about the world people we know will inherit, not an abstract one. ...

Reading the Signals: Which of the Four Futures Is Actually Emerging?

TL;DR Scoring four future scenarios against real-world signals: winner-take-most has the clearest corporate and capital logic behind it as of April 2026, driven by vertical integration across chips, data centres, models, and distribution Broad abundance gets partial credit - inference costs have fallen two orders of magnitude and open-weight models are competitive, but institutional-level gains in healthcare and education haven’t materialized Techno-feudalism is quietly accumulating through agentic platform lock-in (Claude Code, Cursor, Devin) and payment rail consolidation, with competition enforcement as the main counterweight Managed transition is the weakest scenario - UBI pilots haven’t scaled nationally, compute taxation remains a proposal, and institutional response cycles are mismatched with AI deployment speed The three signals that will determine where this goes: whether the open-weight frontier gap widens or closes, whether agentic memory becomes portable or platform-owned, and whether any serious economy moves past pilot-scale on redistribution I recently mapped four plausible futures for the machine-speed economy and listed the signals to watch for each. The obvious next question is the one I deliberately held back from answering: which signals are actually firing right now, and what does the mix say about where we’re heading? ...

The Next Decade of AI: What Actually Happens From Here

TL;DR AI will not arrive as a single dramatic event - it will be a slow, uneven embedding of intelligence into ordinary software until it becomes invisible infrastructure, like electricity The agent layer will eat the interface: for a growing share of tasks, humans will give high-level intent to an agent that drives other software on their behalf, making the SaaS dashboard model look dated The scarce resource shifts from generating answers to judging which answer is right - hiring, education, and professional identity will all restructure around this AI splits into two permanent species: powerful, expensive frontier models in the cloud, and fast, private, cheap local models - with hybrid architectures winning in practice Reliability, not capability, becomes the dominant engineering problem as agents move from co-pilots to operators; the field must invent new testing and monitoring disciplines for non-deterministic systems Most predictions about the future of AI fall into two flavours. One camp says we are months away from machines that can do everything a human can do, and we should brace for either paradise or extinction. The other camp says the whole thing is a bubble, the models have plateaued, and in five years we will be talking about something else. ...

Grok's New Voice APIs: Speech Recognition and Synthesis at Enterprise Scale

TL;DR xAI launched standalone Speech-to-Text (STT) and Text-to-Speech (TTS) APIs built on the same stack powering Grok Voice, Tesla in-vehicle assistants, and Starlink customer support Grok’s STT is among the cheapest at $0.10/hour (batch) and $0.20/hour (streaming), with features like speaker diarization, word-level timestamps, and Inverse Text Normalization The TTS offering ships with five expressive voices, inline expression control tags ([laugh], [sigh], whisper), and covers 20 languages - priced at $4.20 per million characters xAI’s pitch is vendor consolidation: replacing three separate contracts (transcription, LLM, synthesis) with one stack on one billing account The best fit is teams already building on Grok for reasoning - for lowest-latency TTS, ElevenLabs Flash v2.5 at ~75ms is still unmatched xAI has released two standalone voice APIs - Speech-to-Text (STT) and Text-to-Speech (TTS) - built on the same stack powering Grok Voice, Tesla in-vehicle assistants, and Starlink customer support. The move puts xAI in direct competition with ElevenLabs, Deepgram, and AssemblyAI, three companies that have owned the enterprise voice API market for years. ...

GPT-5.5 Is Here: Real Step Forward or Quiet Iteration?

TL;DR GPT-5.5 (“Spud”) is the first fully retrained base model since GPT-4.5, with architecture and pretraining reworked from scratch with agentic objectives in mind It takes the top spot on Terminal-Bench 2.0 (82.7%) and GDPval (84.9%), narrowly beating Anthropic’s Claude Mythos Preview on agentic coding benchmarks A 1M-token context window is new for OpenAI, enabling whole-codebase reasoning and long multi-step agent runs without context collapse Pricing is competitive ($5/$30 per million input/output tokens) but the strategic story is about OpenAI building an integrated super app - chat, code, browser agent - all driven by one model The gains are incremental, not a leap - but the full retraining signals OpenAI is betting the next two years on autonomous agentic work, not chat OpenAI released GPT-5.5 on April 23, 2026, weeks after GPT-5.4 and only months after GPT-5. The cadence is starting to feel relentless. Codenamed “Spud” internally, GPT-5.5 is the first fully retrained base model since GPT-4.5 - architecture, pretraining corpus, and agent-oriented objectives all reworked from scratch. ...

Agent-First Architecture: The Engineer as System Curator

TL;DR Agent-first architecture imagines a future where the primary unit of work is an AI agent with intent, tools, memory, and a feedback loop - not a human-authored codebase The engineer’s role may shift from building and maintaining systems line by line to curating, governing, and evolving fleets of agents Glue code, routine maintenance, first-pass incident triage, and migration work are plausible candidates for automation; deciding what a system is for and holding architectural intent across time probably are not Managing an agent fleet might resemble logistics fleet management: define intent, set constraints, design feedback loops, curate the roster, and own the outcomes This is a speculative post, not a description of how anything works today - the author is pinning down a hypothesis to revisit when it turns out to be wrong This is a “thinking out loud” post, not a report from the front lines. I have no evidence any of this is happening at scale, and it is not how my current day job looks. These are just ideas I keep turning over, and I wanted to write them down to see if they hold together. ...

ChatGPT Images 2.0: Why Everyone Is Impressed

TL;DR ChatGPT Images 2.0 introduces a thinking mode that reasons through complex prompts before generating, dramatically improving instruction-following for multi-part requests Text rendering is finally reliable - legible across English, Japanese, Korean, Chinese, Hindi, and Bengali - unlocking infographics, menus, and slides as genuine use cases Web search during generation means Images 2.0 can pull accurate, current data into visual outputs rather than fabricating plausible-looking information Batch generation produces up to eight images from one prompt with consistent characters and style across all of them, solving a long-standing problem for narrative and sequential content The overall shift is from toy to tool: outputs are more predictable, less stylistically over-processed, and viable for production work rather than just prototyping A year ago, OpenAI’s image generation went viral for Studio Ghibli portraits. That was GPT Image 1 - impressive, playful, and fundamentally still a party trick. ChatGPT Images 2.0, released on April 22nd 2026, is a different thing entirely. It’s the version that starts to look genuinely useful. ...