Llm | jamesm.blog

Context Engineering: The Discipline That Replaced Prompt Engineering

TL;DR Prompt engineering optimised the wording of a single human-written request. Context engineering optimises the entire set of tokens in the model’s window across a whole run - system prompt, tool definitions, retrieved documents, tool results, conversation history, and memory The shift happened because of agents. The window is no longer one prompt you wrote - it is an accumulation that grows on every step, and most of it is produced by the system, not by you More context is not better context. Research on “context rot” and the older lost-in-the-middle effect show model accuracy degrades as the window fills, even well below the advertised limit The four levers are retrieval (what you pull in), memory (what persists across runs), tool results (what tools dump back), and compaction (what you summarise and discard) Treat the window as a budget. Measure its token composition, design tools to return terse output, curate rather than accumulate, and keep the static prefix stable so prompt caching still works For a few years, “prompt engineering” was the named skill of working with language models. It meant finding the wording, the framing, the few-shot examples, and the role instructions that coaxed the best answer out of a single request. It produced a small industry of prompt libraries, prompt marketplaces, and job titles. And in 2026 it is mostly gone, absorbed into something larger and harder. ...

Multimodal AI in 2026: Vision + Text + Audio - What's Actually Useful

TL;DR Document understanding is the unglamorous killer application - invoices, contracts, and scanned PDFs that were painful to extract data from are now tractable without dedicated pipelines Vision models still under-deliver on precise spatial reasoning, object counting, and subtle medical or scientific imagery - these remain jobs for specialist models Audio is the modality with the most upside: beyond transcription, it carries tone, pace, and hesitation that text loses, enabling fault detection, emotional analysis, and richer inputs The teams getting real value treat multimodal as an invisible enabling capability within a workflow, not a feature to demo - and they verify high-stakes outputs just as they would text The right question when evaluating multimodal is not “can we use this” but “what specific user problem becomes tractable that previously was not” When the first multimodal frontier models shipped, the demos were genuinely impressive. A photo of a fridge interior with the model suggesting a recipe. A handwritten napkin sketch becoming working code. A short audio clip of a meeting being transcribed, summarised, and structured. It looked, briefly, like the boundary between modalities had collapsed and we were entering a new regime in which models could reason fluidly across text, images, and sound. ...

Prompt Caching: The Quiet Performance Win for LLM Applications

TL;DR Prompt caching saves the computed representation of a prompt’s static prefix so subsequent requests reuse it rather than recompute it - cached tokens cost roughly 10% of normal input token prices The savings are highest when prompts have a long, identical prefix across requests - system prompts, tool definitions, and few-shot examples can make up 80-90% of total input cost The most common mistake is interpolating variables into the system prompt, which breaks caching silently; fix it by moving all static content to the top and dynamic content to the end Cache lifetimes are bounded (minutes to a few hours per provider) and any change to the prefix - including whitespace - creates a new cache miss Track your cache hit rate explicitly on every LLM dashboard; a dropping hit rate usually signals unintended prompt construction changes, and fixing it is the highest-leverage cost optimisation available If you build LLM applications for any length of time, you eventually notice that you are paying to have the model read the same instructions over and over again. The system prompt, the tool definitions, the few-shot examples, the structured output schema - all of it goes back into the model on every single request, and you pay for the input tokens every single time. For a chatbot doing one or two thousand requests a day this is annoying. For an agent doing tens of thousands of requests with long contexts, it is the dominant cost line. ...

AI-Native Pipelines - What Changes When Your Consumer Is an LLM, Not a Dashboard

TL;DR Data pipelines were optimised for human consumers - dashboards, BI tools, analysts. In 2026 a growing share of pipeline output flows directly to language models, agents, and retrieval systems. That changes the design constraints in ways that catch teams off guard. Aggregation matters less. Context fidelity matters more. Freshness behaves differently. Schema moves from rigid to negotiated. Cost shifts from compute to tokens. The biggest mistake is treating an LLM consumer as if it were just another dashboard. It is not. It does not skim, it does not interpret charts, it does not have working memory across rows. It needs to be fed. The new patterns - retrieval-aware partitioning, embedding pipelines, structured-document outputs, prompt-shaped views, evaluation harnesses for data quality - are the actual subject of “AI-native data engineering” in 2026. The Underlying Shift For thirty years the implicit consumer of every data pipeline was a human looking at a screen. Even when the pipeline ended in an API or a CSV, the conceptual end-user was someone who would interpret the output with judgement, context, and skim-reading. ...

AI Tools & Frameworks

TL;DR A broad, categorised index of AI tools - art and design, chatbots, coding agents, music, video, text-to-speech, writing, research and more Each entry is a one-line description so you can scan quickly and click into anything interesting Best used as a discovery surface - bookmark the ones that match your work and ignore the rest Heavily skewed toward tools that have actually been useful or notable rather than every product in the space Living list - the tools update as the ecosystem moves, so the categories matter more than any single entry AI Tools Art & Graphic Design AutoDraw - fast drawing for everyone Adobe Firefly - generative AI tool with Generative Fill, part of Adobe Photoshop Cleanup.pictures - remove unwanted objects, defects, people or text from your pictures DALL·E 3 - creates realistic images and art from a description in natural language, integrated into ChatGPT Deep Nostalgia - animate your family photos FLUX - state-of-the-art open-weights text-to-image models from Black Forest Labs Ideogram - text-to-image generator known for accurate in-image text rendering Krea - real-time AI image generation and enhancement Leonardo - create stunning game assets and concept art with AI Magnific - AI upscaler and image enhancer Microsoft Designer - stunning designs, made lightning fast with AI Midjourney - generates images from natural language descriptions, called “prompts” Playground - create any image from your imagination Recraft - generative design tool with brand-consistent vector and raster output Stockimg AI - generate with AI: book covers, wallpapers, posters, logos, stock images, illustrations and art ChatBot ChatGPT - the original mainstream AI chatbot, developed by OpenAI (Generative Pre-trained Transformer) Claude - Anthropic’s frontier assistant; strong on reasoning, coding, and long-context tasks Cohere - build incredible products with world-class language AI, focused on enterprise DeepSeek - open-weight Chinese frontier models with competitive reasoning Google Gemini - conversational generative AI chatbot from Google (formerly Bard), available in free and Pro tiers Grok - xAI’s chatbot integrated with X (Twitter), known for real-time data access Le Chat - Mistral AI’s chatbot with open-weight model heritage Perplexity - answer engine combining LLMs with web search and citations Poe - Quora’s unified front-end for many models (Claude, GPT, Gemini, image models, custom bots) Chrome Extensions alicent - browser extension for ChatGPT Compose - Chrome extension that cuts down your writing time with AI-powered autocompletion & text generation FinalScout - ChatGPT-powered email finding & outreach at scale Voilà - personal AI assistant for supercharged productivity Poised - AI-powered communication coach that helps you speak with confidence and clarity Wiseone - helps you master any topic you are reading online by bringing relevant and reliable information Customer Support Forethought - generative AI for customer support Design Flair - AI design tool for branded content Galileo AI - creates delightful, editable UI designs from a simple text description Coding Agents & AI IDEs Aider - AI pair programmer in your terminal, works with any LLM and your git repo Claude Code - Anthropic’s official terminal-based coding agent powered by Claude Cline - open-source autonomous coding agent for VS Code Continue - open-source AI code assistant for VS Code and JetBrains Cursor - AI-first code editor (VS Code fork) with chat, edit, and agent modes GitHub Copilot - AI pair programmer integrated into VS Code, JetBrains, Neovim and Visual Studio Replit - browser-based IDE with Replit Agent for building full apps from prompts Sourcegraph Cody - AI coding assistant with deep codebase context Tabnine - privacy-focused AI code completion Windsurf - agentic IDE (formerly Codeium) with Cascade autonomous workflows Zed - high-performance collaborative editor with native AI assistant integration AI App Builders Bolt - prompt-to-app builder by StackBlitz, generates full-stack web apps in the browser Lovable - build production-ready web apps from natural language v0 - Vercel’s generative UI tool for React and Tailwind components Gaming Nvidia Game AI Voyager - open-ended embodied agent with large language models (LLMs) Marketing Automizy - email marketing software designed to increase your email open rates Music Beatoven - create customisable royalty-free music that elevates your story Boomy - create original songs in seconds Brain.fm - functional music designed to improve focus, relaxation, and sleep LALAL.AI - extract vocal, accompaniment, and various instruments from any audio or video Soundraw - royalty-free AI-generated music Stable Audio - Stability AI’s text-to-audio model for music and sound design Suno - generate full songs with vocals and instruments from a text prompt Udio - high-fidelity AI music generation with strong lyric and style control Project Management LiquidPlanner - project management solution that dynamically adapts to change and manages uncertainty to help teams plan, predict, and perform with confidence Prompt Engineering Borriss The Advanced Prompt Writer Tool - write complex prompts in seconds Text to Speech & Voice Cleanvoice - removes filler sounds, stuttering, and mouth sounds from your podcast or audio recording ElevenLabs - market-leading AI voice generation, cloning, and dubbing LOVO - AI voice generator and text-to-speech Murf - AI voice generator and text-to-speech Play.ht - realistic AI voices and voice cloning for content creators Resemble AI - voice cloning, real-time TTS, and speech-to-speech Speechify - reads text aloud using computer-generated text-to-speech voices Video Descript - write, record, transcribe, edit, collaborate, and share your videos and podcasts HeyGen - AI avatars and video translation for marketing and training Kling - high-quality text-to-video model from Kuaishou Pika - text- and image-to-video generation with creative effects Runway - leading AI video platform with the Gen-3 / Gen-4 family of models Sora - OpenAI’s text-to-video model, available via ChatGPT Synthesia - AI video creation with realistic avatars for enterprise training Veo - Google DeepMind’s text-to-video model Website Builders 10Web - AI-powered WordPress platform Durable - AI website builder that generates an entire website with images and copy in seconds Writing AISEO - AI writing assistant that delivers undetectable, human-like content in just a few clicks Beautiful - jumpstart your presentations Bertha - create engaging content without the hassle of creating it Decktopus - AI-powered presentation generator Fireflies - automate your meeting notes: record, transcribe, search and analyze voice conversations Gamma - start writing beautiful & engaging content with none of the formatting and design work Jasper - AI writer and AI art generator Kickresume - create a beautiful resume in minutes using AI & customizable templates Notion - organizational tools including task management, project tracking, to-do lists, bookmarking, and more Ocoya - create and schedule social media, content marketing & copywriting quicker using AI Paperpal - real-time, subject-specific language suggestions that help you write better, faster Postwise - craft engaging posts with AI, schedule effortlessly and watch your followers grow Quillbot - AI-powered paraphrasing tool to enhance your writing Saga AI - write faster, and do better work directly in Saga with the help of a digital AI assistant Scribe - automatically create step-by-step guides in seconds simply by watching you work Simplified - supercharge content creation Sudowrite - AI novel writing assistant that makes the creative writing process more fun and interactive Text Blaze - eliminate repetitive typing and mistakes Trinka - online grammar checker and language correction AI tool for academic and technical writing Writesonic - create SEO-optimized and plagiarism-free content for your blogs, ads, emails, and website 10X faster YouTube Eightify - YouTube summaries powered by ChatGPT TubeBuddy - optimize your YouTube channel faster Other Adriel - handle complex marketing campaigns and reach your advertising goals AdScale - boost your ad performance by automating everything from ad creation, and optimization, to audience targeting and performance tracking Akkio - predictive AI for Analysts Audiense - audience Intelligence platform, helping marketers and consumer researchers to be innovative and develop relevant audience-centric strategies through proprietary social consumer segmentation Bardeen - mission is to help people leverage technology, do more of what they love, and stay in the flow Brancher - connect AI models to build AI apps in minutes, with no-code Decoherence - create what can’t be filmed DoNotPay - the world’s first robot lawyer Fyle - real-time expense management Google Flood AI Hints - AI assistant that integrates with any software to perform tasks on your behalf Krisp - improves the productivity of online meetings with its AI-powered Voice Clarity and Meeting Assistant Lavender - sales email assistant powered by AI Mixo - helps entrepreneurs quickly launch and validate their business ideas MosaicTrack - smart recruiting solution that leverages the cognitive power of artificial intelligence to read through resumes and social profiles to find the best talent based on culture fit and skill set Nosto - commerce experience platform - an integrated suite of data-fueled personalization and merchandising solutions Octane AI - Shopify app for AI-powered customer engagement Outfits AI - try on any outfits using AI Regie - AI sales assistant for email Snazzy - gives you great content ideas for social media ads, landing pages and more Sprout Social - extract real business value from social Taskade - AI-powered workspace for productivity TldV - AI meeting recorder and summarizer Twain - AI writing assistant Vondy - AI app builder Voyado Elevate - intelligent search and merchandising for online retailers Warmer - AI cold email outreach WNR - prompts made easy with AI templates Research Consensus - search engine that uses AI to extract and distill findings directly from scientific research Elicit - research assistant that finds, summarises, and extracts data from academic papers NotebookLM - Google’s research and note-taking tool with grounded sources and audio overviews ResearchRabbit - citation-graph discovery tool for academic papers Scholarcy - reads research articles, reports, and book chapters and breaks them down into bite-sized sections SciSpace - do hours worth of reading and understanding in minutes Local & Self-Hosted LLMs GPT4All - run open-weight LLMs locally on desktop, no GPU required Jan - open-source ChatGPT alternative that runs fully offline LM Studio - desktop app to discover, download, and chat with local LLMs Ollama - run open models like Llama, Mistral, and Qwen locally with a simple CLI Open WebUI - extensible self-hosted web UI for Ollama and OpenAI-compatible APIs vLLM - high-throughput, memory-efficient inference engine for LLMs AI Development Frameworks AutoGen - Microsoft’s framework for building multi-agent conversations CrewAI - framework for orchestrating role-playing autonomous AI agents Hugging Face - the open-source hub for models, datasets, and ML tooling LangChain - framework for building applications with LLMs through composable chains and agents LangGraph - LangChain’s library for building stateful, multi-actor agent workflows LlamaIndex - data framework for connecting custom data sources to LLMs Pydantic AI - type-safe agent framework from the makers of Pydantic Spec-driven Development (SDD) GitHub Spec Kit - toolkit to help you get started with Spec-Driven Development (SDD) - specifications become executable, directly generating working implementations rather than just guiding them Twitter BlackMagic - enhanced Twitter for pro tweeters Hypefury - personal assistant to grow & monetize your Twitter audience Tribescaler - get more impressions, grow a better network and earn more money Tweet Hunter - build & monetize your Twitter audience Tweetlify - create viral tweets, grow followers & make money AI Research Companies Anthropic - AI safety company, makers of Claude Black Forest Labs - generative image research lab behind the FLUX model family Boston Dynamics - create exceptional robots that enrich people’s lives DeepMind - Google’s AI research lab, makers of Gemini, AlphaFold and Veo Fast.ai - making deep learning easier to use Google AI - Google’s division dedicated to artificial intelligence Meta AI - Meta’s research arm, makers of the Llama open-weight model family Midjourney - independent research lab exploring new mediums of thought Mistral AI - European frontier lab with strong open-weight models OpenAI - AI research and deployment company, makers of GPT and Sora Stability AI - generative AI research lab behind Stable Diffusion and Stable Audio Tesla AI & Robotics - developing and deploying autonomy at scale in vehicles and robots xAI - Elon Musk’s AI lab, makers of Grok Related Reading List of AI GitHub Projects The Complete AI Developer’s Guide: Resources and Best Practices List of AI Courses & Learning Resources AI Conferences Worth Following AI Explainers

When to Fine-Tune vs When to RAG: Choosing Your AI Architecture

TL;DR The default choice for most teams should be RAG - it is reversible in days, whereas a bad fine-tuning decision is an expensive sunk cost that requires retraining to fix RAG fails when the question requires reasoning across an entire knowledge domain rather than extracting a specific answer from a passage; fine-tuning handles that case better Fine-tuning fails silently when underlying facts change - it produces confidently wrong, stale answers with no warning; RAG automatically picks up changes at query time A practical decision framework: use RAG for volatile facts and cited answers, use fine-tuning for stable style, voice, and cross-domain reasoning The best production systems use both: a fine-tuned base model for stable domain knowledge, augmented with retrieval for current and specific information The question I get asked most often by engineers starting to build with language models is some variation of: “should we fine-tune or should we do RAG?” It is almost always the wrong question, but it is the wrong question in an instructive way. The reason it gets asked so much is that the choice feels architectural, and architectural choices feel like the kind of thing you commit to once and live with. In practice, the choice is closer to “should I use a database or a cache” - the answer is usually some of both, applied to different problems, and the ratio shifts as the system matures. ...

AI Hallucinations: Understanding and Mitigating False Outputs

TL;DR AI hallucinations are not perceptual errors - they are confident pattern completions that happen to be unanchored in the world, and no model will ever stop producing them entirely because truth is not what the training objective optimises for Hallucinations cluster into five distinct types: factual, citation, code and API, instruction (claiming to have done something it did not), and reasoning - each with a different root cause and a different mitigation The mitigations that genuinely move the dial are structural: retrieval-augmented generation, tool use over recall, constrained structured outputs, explicit verification layers, and lower temperature for factual tasks The model is not the product; the model surrounded by retrieval, verification, structured outputs, calibration, and human-in-the-loop review is the product Hallucination is not the bug - the absence of a system around the model is the bug, and treating it as an engineering problem rather than a model problem is what separates demos from production The word “hallucination” is one of the most successful pieces of accidental marketing in our industry. It is a soft, almost endearing way to describe an LLM stating with full confidence that a function exists when it does not, that a court case was decided when it was not, that a paper was written by an author who has never published in that field. It makes the failure sound like a quirk rather than the central reliability problem of the entire technology. ...

An AI Tooling Learning Path: Logical Phases for 2026

TL;DR The order you learn AI tools matters as much as which tools you learn - most people start with terminal agents or editors before they understand how models actually fail The seven-phase path runs: fundamentals, chat interfaces, AI-native editors, terminal agents, local models, orchestration, and review and evaluation Terminal agents (Claude Code, Cline, Aider) represent the biggest mindset shift - you move from driving with suggestions to specifying and letting the model execute Local models via Ollama belong in phase five, once you have felt the pain of API costs and know which tasks actually need frontier capability Review, evaluation, and capture (phase seven) is the phase most developers skip - and the one that separates AI-curious from AI-competent The hardest part of learning AI tooling in 2026 is not any single tool. It is the order you meet them in. ...

DGX Spark vs Mac Studio: Which Personal AI Supercomputer Should You Buy?

TL;DR Best value: Mac Studio M4 Max at $1,999 for most local LLM work Best prefill speed: DGX Spark at $4,699 (3.8× faster prompt processing) Best token generation: Mac Studio M3 Ultra at $3,999 (819 GB/s bandwidth) Best for fine-tuning: DGX Spark (CUDA ecosystem wins) Best combined setup: DGX Spark + M3 Ultra = 2.8× faster than either alone Introduction The market for personal AI supercomputers has exploded in 2025-2026. Two standout options have emerged: NVIDIA’s DGX Spark and Apple’s Mac Studio lineup. Both promise desktop-scale AI compute, but they approach the problem very differently. This guide breaks down the specs, costs, and real-world performance to help you decide which is right for you. ...

Which Mac Studio Should You Buy for Running LLMs Locally?

TL;DR Best entry point: M2 Max 32-64 GB (~£1.4k-£2k) for 7B-13B models at 25-40 tok/s Best sweet spot: M2 Ultra 64-128 GB (~£3k-£4.5k) handles 30B+ models comfortably Best for 70B models: M3 Ultra 128 GB+ (~£5.5k+) with 800+ GB/s bandwidth Newer alternative: M4 Max (£2k-£4k) - lower bandwidth (410-546 GB/s) than Ultra chips, but still solid for 7B-13B models Key rule: Memory bandwidth matters more than raw compute for token generation Reality check: A RTX 5090 rig is 2-3× faster for similar money - buy Mac for simplicity and unified memory You want to run large language models locally on a Mac Studio. Good idea - unified memory is genuinely useful for LLMs. But the specs matter, and there are some hard truths about what “works” versus what feels responsive. More importantly: the right Mac depends entirely on which model you want to run. ...