Artificial Intelligence

This section is organised around one question: what has to be true before you can trust AI to do real work? Reliability, context, economics, security, evaluation, and eventually physical action - each post is a different angle on the same problem.

Start here

Trust series - research map, broken evals, agent security, world models, trajectory evaluation
What I’m Researching in AI Right Now - my live research agenda

I want to build

Home Agent Stack - Mac Studio → MCP → memory → remote access → hardening
AI Dev Tooling - stack decisions, learning path, Cursor vs Claude Code, spec-driven development

I want context

AI Economics and Hardware - token costs, local vs cloud, energy, inference hardware
Expertise and Work - credentials, judgement, roles, and 2030 speculation
The State of Open-Weight Models in 2026 - Llama, Qwen, Mistral, DeepSeek, Gemma

Resources

Link indexes and tool directories - useful for discovery, not the narrative spine:

AI Tools & Frameworks · Courses · Conferences · GitHub Projects · Explainers · Chatbots & LLMs

What Actually Belongs in My AI Dev Stack in 2026

TL;DR
A single AI tool cannot handle everything - a proper AI dev stack in 2026 needs distinct layers for spec writing, fast editing, heavy agentic work, cheap model tasks, review, research, and capture.
Spec-driven development is the most underused part: writing requirements and acceptance criteria before generation dramatically improves AI output and reduces wasted iterations.
Tools like Cursor AI handle fast, in-flow editing, while Claude Code or Cline are better suited to multi-file refactors and autonomous implementation from specs.
Letting the same model that generated code also review it is a weak loop - a separate review pass with a different model or an explicitly critical prompt is essential.
The real shift is treating AI not as a bolt-on assistant but as part of the workflow architecture itself, with each tool assigned a clear, specific responsibility.

There is a big difference between using AI for development and having an actual AI development stack. ...
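To make the spec-first point concrete, here is a minimal Python sketch. It assumes a tiny feature whose acceptance criteria are written as input/output pairs before any code is generated. `SPEC`, `slugify` and `check_against_spec` are hypothetical names for illustration only. `slugify` stands in for whatever the AI tool produces, and the check is the part you write first.

```python
"""Spec-first sketch: acceptance criteria exist before any code is generated."""
import re
from typing import Callable

# Hypothetical spec for a small feature, written before asking an AI tool for code.
SPEC = {
    "requirement": "slugify(title) turns a post title into a URL slug",
    "acceptance": [
        # (input, expected output) pairs the generated code must satisfy
        ("Taste Is the New Scarcity", "taste-is-the-new-scarcity"),
        ("GitHub Spec Kit in 2026: SDD Goes Mainstream",
         "github-spec-kit-in-2026-sdd-goes-mainstream"),
        ("  Trailing   spaces  ", "trailing-spaces"),
    ],
}


def slugify(title: str) -> str:
    # Stand-in for the AI-generated implementation under review:
    # keep lowercase alphanumeric runs and join them with hyphens.
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


def check_against_spec(fn: Callable[[str], str], spec: dict) -> list[str]:
    """Return a list of failures; an empty list means every criterion passed."""
    failures = []
    for given, expected in spec["acceptance"]:
        actual = fn(given)
        if actual != expected:
            failures.append(f"{given!r}: expected {expected!r}, got {actual!r}")
    return failures


if __name__ == "__main__":
    problems = check_against_spec(slugify, SPEC)
    print("PASS" if not problems else "\n".join(problems))
```

The design choice is that the criteria live in data you wrote, not in the generator's own account of what it did. A separate review pass, or a different model, can rerun the same check without trusting the model that produced the code.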

GitHub Spec Kit in 2026: SDD Goes Mainstream 🚀

TL;DR
GitHub Spec Kit reached v0.5.0 in 2026, evolving from a documentation toolkit into a full extensibility platform for AI-assisted development.
Claude Code CLI is now a native skill within Spec Kit, making spec-to-code pipelines seamless and built-in.
The ecosystem has exploded with dedicated tools like AWS Kiro and Tessl, while multi-agent support covers Copilot, Cursor, Gemini CLI, and more.
Spec-Driven Development prevents architectural drift by making the spec the single source of truth - versioned, reviewable, and respected by AI agents.
Getting started is now low-effort: write a spec.md, pick any AI tool, and let the spec drive implementation.

Six months ago, we explored how GitHub Spec Kit was beginning to reshape software development. In early 2026, that promise isn’t just materialising - it’s accelerating. The project has hit version 0.5.0, the ecosystem has exploded, and Spec-Driven Development has moved from “interesting idea” to actual industry standard. ...

Taste Is the New Scarcity

TL;DR
When AI can generate thousands of solutions on demand, the bottleneck shifts from thinking capacity to judgement - knowing which answer is actually right.
Taste - the ability to recognise what is elegant, insightful, or truly worth building - becomes the primary skill rather than a secondary one layered on top of expertise.
Editing and curation become more valuable than creation; the ability to say “no” to a thousand options and hold out for the right one is rare and increasingly prized.
Experience still matters, but for a different reason: not to accumulate facts, but to develop the discernment that recognises quality when you see it.
In a world of abundant intelligence, wisdom - knowing not just what you can do but what you should do - becomes the most distinctly human and most valuable contribution.

If intelligence is becoming a commodity, then something else becomes precious. ...

Personal AI Development Stack

This guide documents a highly productive, AI-driven development stack using cloud-based LLMs, terminal tools, IDEs, and mobile access. It is designed for developers who want persistent workflows, AI-powered coding assistance, and flexible access from multiple devices.

TL;DR
Primary tools: Cursor AI as the daily IDE, with Claude Code CLI for multi-file refactors.
Local completions: Ollama with Qwen 2.5 Coder or Llama 3.3 to keep latency low and API costs at zero.
Routing: OpenRouter as a single API gateway; LiteLLM if you want fallback chains.
Persistence: tmux sessions survive disconnects; Tailscale makes your MacBook reachable from an iPhone without port forwarding.
Total baseline: around $20/month (Cursor only), scaling to $40-50/month plus API usage for the full stack.

Architecture Overview

Hardware & Connectivity

An iPhone connects over Tailscale VPN to a MacBook Air. The MacBook runs tmux or zellij for session persistence, alongside Lungo or Patterned as keep-awake utilities. ...
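The fallback-chain idea can be sketched in plain Python. This sketch does not use the OpenRouter or LiteLLM APIs, both of which have their own configuration. `complete_with_fallback`, `ProviderError`, `local_model` and `cloud_model` are hypothetical stand-ins. The local call fails on purpose so the chain falls through to the cloud provider.

```python
"""Sketch of a model fallback chain: try providers in order until one succeeds."""
from typing import Callable, Sequence


class ProviderError(Exception):
    """Raised by a provider call that failed (timeout, rate limit, outage)."""


def complete_with_fallback(
    prompt: str,
    chain: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Return (provider_name, completion) from the first provider that succeeds."""
    errors = []
    for name, call in chain:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            # Record the failure and fall through to the next provider in the chain.
            errors.append(f"{name}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(errors))


# Hypothetical stand-ins; a real setup would call a local model or an API gateway here.
def local_model(prompt: str) -> str:
    raise ProviderError("local model not running")


def cloud_model(prompt: str) -> str:
    return f"cloud answer to: {prompt}"


if __name__ == "__main__":
    provider, text = complete_with_fallback(
        "rename this function", [("local", local_model), ("cloud", cloud_model)]
    )
    # The local call fails, so the cloud provider answers.
    print(provider, "->", text)
```

Putting the free local model first and the metered gateway second matches the cost logic in the TL;DR: you only pay for API usage when the local option is unavailable.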

Is the $20 AI Subscription Era Over?

TL;DR
The $20/month subscription tier is not disappearing, but what you get for it is quietly shrinking - agent features are being capped or metered while the price holds.
The Claude Code episode (briefly paywalled for Pro users) was a deliberate A/B test, not a glitch - a signal that Anthropic is steering heavy users toward the Max tier at $100-$200/month.
Agent workflows like Claude Code consume 50-500x more tokens than a chat session, making flat all-you-can-eat pricing economically unsustainable for power users.
Most major providers (Anthropic, OpenAI, Google, Cursor) are projected to raise consumer tiers by $5-$10 by the end of 2026, with sharper increases at the enterprise level.
If you are a chat-only user, the $20 plan remains a good deal; if you are running agents daily, budget for a higher tier or pay-as-you-go API access instead.

For the last three years, $20 a month has been the magic number. Claude Pro, ChatGPT Plus, Gemini Advanced, Copilot Pro, Cursor Pro - all twenty dollars, all clearly priced to anchor against Netflix rather than against enterprise software. That anchor is cracking. The labs are burning cash on inference for power users, the frontier models cost more per token than they did a year ago, and agent tools like Claude Code and Codex are consuming ten to a hundred times the compute a chat session does. Something has to give. ...
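A rough back-of-envelope calculation shows why the multiplier matters. Every constant below is an illustrative assumption: session size, blended price per million tokens, and sessions per month. None of them is any provider's published pricing. Only the 50-500x range comes from the TL;DR.

```python
"""Back-of-envelope: why agent usage strains a flat $20/month plan.
All numbers below are illustrative assumptions, not any provider's real prices."""

CHAT_TOKENS_PER_SESSION = 20_000   # assumed size of a typical chat session
BLENDED_PRICE_PER_MILLION = 5.00   # assumed blended input/output price in USD
SESSIONS_PER_MONTH = 60            # assumed: roughly 3 sessions per working day
FLAT_PLAN_USD = 20.00


def monthly_cost(multiplier: float) -> float:
    """Cost in USD when each session uses `multiplier` times the chat baseline."""
    tokens = CHAT_TOKENS_PER_SESSION * multiplier * SESSIONS_PER_MONTH
    return tokens / 1_000_000 * BLENDED_PRICE_PER_MILLION


if __name__ == "__main__":
    for label, mult in [("chat", 1), ("agent, low", 50), ("agent, high", 500)]:
        cost = monthly_cost(mult)
        print(f"{label:12s} ${cost:>9,.2f}  ({cost / FLAT_PLAN_USD:.1f}x the flat plan)")
```

Under these assumptions, a chat-only user costs about $6 a month to serve. The same person running agents costs roughly $300 to $3,000. Swap in your own numbers. The ratio is what carries over: whatever a chat session costs, an agent session at these multipliers costs 50 to 500 times as much.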

The Meaning of Work in an Age of Abundance: Finding Purpose When Agents Do the Heavy Lifting

TL;DR
Modern knowledge work has quietly built identity on producing things - and AI pressure makes that fragility visible without you having to lose your job to feel it.
History (Keynes’ 1930 prediction of a fifteen-hour working week, which never arrived) suggests freed-up capacity defaults to “more work”, not leisure; the shift to meaningful work has to be chosen deliberately.
What stays valuable when execution gets cheap: deciding what is worth doing, taking responsibility, sitting with other humans, craft for its own sake, and growing other people.
The “everyone will do deeper work” narrative ignores the dignity problem - for many people, work is structure and belonging, not just a vehicle for meaning.
Put your meaning somewhere that does not depend on being the cheapest producer of an artefact - it was never a secure place to put it, and agents are just making that clearer.

This is another “thinking out loud” post, in the same spirit as the agent-first architecture piece. I do not know how any of this is going to land. I am writing it partly because the question has been rattling around in my head for months, and partly because I suspect a lot of people in and around software are quietly wondering the same thing without quite wanting to say it out loud. ...

SpaceX Buys the Right to Buy Cursor for $60 Billion

TL;DR
SpaceX has signed an option to acquire Cursor (made by Anysphere) for $60 billion, or pay $10 billion for the joint work if it walks away.
Cursor’s valuation has risen 24x in fifteen months, from $2.5 billion in January 2025 to a $60 billion option price in April 2026.
The deal sits under SpaceX rather than xAI directly, because SpaceX holds the balance sheet after the SpaceX-xAI merger valued at $1.25 trillion.
For xAI, buying Cursor is a faster route to developer relevance than out-marketing OpenAI’s Codex or Anthropic’s Claude Code.
If the acquisition closes, three of the main AI coding interfaces will sit inside three frontier labs, raising questions about model neutrality and pricing pressure on independent tools.

It’s rare to see an option contract make the front page, but that is what landed on 21 April 2026. SpaceX disclosed that it has signed a deal with Cursor - the AI coding tool made by Anysphere - giving it the right to buy the startup outright for $60 billion later this year, or to walk away and instead pay $10 billion for the joint work the two teams are doing in the meantime. ...
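As a quick sanity check on the figures in this excerpt, the sketch below recomputes the 24x multiple and derives the implied compound monthly growth rate. The inputs are the post's own numbers. The monthly rate is simple arithmetic on them, not a reported figure.

```python
"""Check the Cursor valuation figures and derive the implied monthly growth."""

START_VALUATION_BN = 2.5   # January 2025, per the post
OPTION_PRICE_BN = 60.0     # April 2026 option price, per the post
MONTHS = 15                # January 2025 to April 2026

# Overall growth multiple between the two figures.
multiple = OPTION_PRICE_BN / START_VALUATION_BN
# Constant monthly rate that compounds to the same multiple over MONTHS.
monthly_rate = multiple ** (1 / MONTHS) - 1

if __name__ == "__main__":
    print(f"{multiple:.0f}x overall, about {monthly_rate:.1%} compounded per month")
```

It prints a 24x multiple and roughly 23.6% growth compounded per month. That is a useful way to see how unusual the trajectory is.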

Claude Code Source Leak: Anthropic's 2,000-File Exposure and What It Means

TL;DR
An internal debugging file was accidentally included in a public package update, exposing a compressed archive of roughly 500,000 lines of code across around 2,000 files - not a breach, but a packaging mistake.
The leaked material revealed unreleased features including persistent memory, an always-on autonomous background assistant, and multi-device remote access.
Competitors gained rare visibility into Anthropic’s development pipeline and longer-term product direction, which is the primary competitive damage.
The incident undermines Anthropic’s safety-first positioning, particularly because it was the second such exposure in just over a year.
The broader lesson for the AI industry: internal operational security is becoming as critical as defending against external threats, especially as AI tools target enterprise customers.

Anthropic’s Claude Code has been making waves as one of the most capable AI coding assistants available, but a significant internal leak has exposed the underlying technology behind the platform for the second time in just over a year. The incident raised fresh concerns about how the company handles sensitive internal information and operational security. ...

Meta Is Tracking Its Own Employees to Train AI Agents

TL;DR
Meta’s Model Capability Initiative installs software on US employee laptops that captures keystrokes, mouse movements, and screenshots to train AI agents - there is no opt-out.
The program is US-only because EU and UK employees are protected by GDPR; the scope of the tracking maps directly onto the absence of legal protection.
Meta CTO Andrew Bosworth openly framed the end state: agents do the work, humans direct and review. The surveillance and the automation plan are the same story.
The irony is hard to miss: Meta’s defence of the program (narrow purpose, safeguards, not used against the person) echoes its long-standing defences of consumer data collection.
This is a signal about where the agent-training bottleneck actually sits: not reasoning or context windows, but the long tail of real software interactions that only real employees can provide.

Meta has started installing tracking software on the work laptops of its US-based employees. It captures keystrokes, mouse movements, clicks, and occasional screenshots. The captured activity is fed back into training data for AI agents. There is no opt-out. The program was disclosed to staff in an internal memo in April 2026, and the response from inside the company has been about what you would expect. ...

A Year of Agents, and What is Coming Next

TL;DR
The defining shift from April 2025 to April 2026 is the move from “ask” to “delegate” - agents now run for minutes, open files, execute shells, and return results rather than waiting for each prompt.
Key developments that drove this: coding agents becoming operators (Claude Code, Cursor, Codex), MCP standardising tool access, spec-driven development going mainstream, and context windows expanding to millions of tokens.
In the next two years, longer-horizon agents, multi-agent coordination, persistent personal AI memory, and computer-use automation will move from early features to default expectations.
The working day is reshaping around less typing and more reviewing - the skill that matters is judgement over diffs, not typing speed or boilerplate generation.
To adapt now: pick a stack and use it daily, write specs before code, build the habit of reviewing diffs fast, and move procedural knowledge into reusable agent skills.

A year ago, in April 2025, “AI in your workflow” mostly meant a chat window in a browser tab and an autocomplete plugin in your editor. You typed, it suggested, you accepted or rejected. The interaction model was small. The blast radius was small. The verb was “ask”. ...
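The “ask” to “delegate” shift is easiest to see as a loop. This is a minimal sketch that uses a scripted stand-in instead of a real model. `scripted_model`, `run_agent`, `TOOLS` and the in-memory `FILES` dict are all hypothetical. The point is the shape: the agent picks a tool and the result is fed back. It keeps going until it declares itself done or hits a step limit.

```python
"""Sketch of the 'delegate' loop: the agent keeps calling tools until it decides it is done.
The model here is a scripted stand-in; a real agent gets these decisions from an LLM."""
from typing import Callable

# Hypothetical in-memory "workspace" and tools an agent might be allowed to use.
FILES = {"README.md": "TODO: write docs"}


def read_file(path: str) -> str:
    return FILES.get(path, "")


def write_file(path: str, content: str) -> str:
    FILES[path] = content
    return f"wrote {len(content)} chars to {path}"


TOOLS: dict[str, Callable[..., str]] = {"read_file": read_file, "write_file": write_file}


def scripted_model(history: list[str]) -> dict:
    """Stand-in for an LLM: picks the next action based on how many steps have run."""
    step = len(history)
    if step == 0:
        return {"tool": "read_file", "args": {"path": "README.md"}}
    if step == 1:
        return {"tool": "write_file",
                "args": {"path": "README.md", "content": "# Docs\nUsage notes."}}
    return {"done": "README.md now has real content"}


def run_agent(model: Callable[[list[str]], dict], max_steps: int = 10) -> tuple[str, list[str]]:
    history: list[str] = []
    for _ in range(max_steps):  # hard cap so a confused agent cannot loop forever
        action = model(history)
        if "done" in action:
            return action["done"], history
        result = TOOLS[action["tool"]](**action["args"])
        history.append(f"{action['tool']} -> {result}")  # observation fed back next turn
    return "stopped: step limit reached", history


if __name__ == "__main__":
    summary, log = run_agent(scripted_model)
    for line in log:
        print(line)
    print(summary)
```

The `max_steps` cap is the one design choice worth copying into anything real. An agent that runs for minutes needs a hard stop that does not depend on the model deciding to stop.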
