Multimodal AI in 2026: Vision + Text + Audio - What's Actually Useful

TL;DR
- Document understanding is the unglamorous killer application - invoices, contracts, and scanned PDFs that were painful to extract data from are now tractable without dedicated pipelines
- Vision models still under-deliver on precise spatial reasoning, object counting, and subtle medical or scientific imagery - these remain jobs for specialist models
- Audio is the modality with the most upside: beyond transcription, it carries tone, pace, and hesitation that text loses, enabling fault detection, emotional analysis, and richer inputs
- The teams getting real value treat multimodal as an invisible enabling capability within a workflow, not a feature to demo - and they verify high-stakes outputs just as they would text
- The right question when evaluating multimodal is not “can we use this” but “what specific user problem becomes tractable that previously was not”

When the first multimodal frontier models shipped, the demos were genuinely impressive. A photo of a fridge interior with the model suggesting a recipe. A handwritten napkin sketch becoming working code. A short audio clip of a meeting being transcribed, summarised, and structured. It looked, briefly, like the boundary between modalities had collapsed and we were entering a new regime in which models could reason fluidly across text, images, and sound. ...

May 9, 2026 · 10 min · James M

Prompt Caching: The Quiet Performance Win for LLM Applications

TL;DR
- Prompt caching saves the computed representation of a prompt’s static prefix so subsequent requests reuse it rather than recompute it - cached tokens cost roughly 10% of normal input token prices
- The savings are highest when prompts have a long, identical prefix across requests - system prompts, tool definitions, and few-shot examples can make up 80-90% of total input cost
- The most common mistake is interpolating variables into the system prompt, which breaks caching silently; fix it by moving all static content to the top and dynamic content to the end
- Cache lifetimes are bounded (minutes to a few hours per provider) and any change to the prefix - including whitespace - creates a new cache miss
- Track your cache hit rate explicitly on every LLM dashboard; a dropping hit rate usually signals unintended prompt construction changes, and fixing it is the highest-leverage cost optimisation available

If you build LLM applications for any length of time, you eventually notice that you are paying to have the model read the same instructions over and over again. The system prompt, the tool definitions, the few-shot examples, the structured output schema - all of it goes back into the model on every single request, and you pay for the input tokens every single time. For a chatbot doing one or two thousand requests a day this is annoying. For an agent doing tens of thousands of requests with long contexts, it is the dominant cost line. ...
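The advice in this excerpt (static content first, dynamic content last, and an explicitly tracked hit rate) can be sketched roughly as follows. This is a minimal illustration, not any provider's API: the prompt strings, `build_prompt`, `CacheStats`, and `estimated_input_cost` are all hypothetical names, and the 10% cached-token price is simply the approximate figure quoted above.

```python
from dataclasses import dataclass

# Hypothetical static content: identical on every request, so it can be cached.
STATIC_SYSTEM_PROMPT = "You are a support assistant. Follow the policy below.\n"
TOOL_DEFINITIONS = "tools: search_orders, refund_order\n"
FEW_SHOT_EXAMPLES = "Q: Where is my order?\nA: Let me check that for you.\n"


def build_prompt(user_name: str, question: str) -> tuple[str, str]:
    """Return (static_prefix, dynamic_suffix).

    Nothing user-specific is interpolated into the prefix, so the prefix
    stays byte-for-byte identical across requests.
    """
    prefix = STATIC_SYSTEM_PROMPT + TOOL_DEFINITIONS + FEW_SHOT_EXAMPLES
    suffix = f"User: {user_name}\nQuestion: {question}\n"
    return prefix, suffix


@dataclass
class CacheStats:
    """Running totals for a cache hit rate metric (token-weighted)."""
    cached_tokens: int = 0
    total_input_tokens: int = 0

    def record(self, cached: int, total: int) -> None:
        self.cached_tokens += cached
        self.total_input_tokens += total

    @property
    def hit_rate(self) -> float:
        if self.total_input_tokens == 0:
            return 0.0
        return self.cached_tokens / self.total_input_tokens


def estimated_input_cost(cached: int, uncached: int,
                         price_per_token: float,
                         cached_discount: float = 0.10) -> float:
    """Input cost assuming cached tokens are billed at ~10% of the normal price."""
    return uncached * price_per_token + cached * price_per_token * cached_discount
```

The token counts passed to `CacheStats.record` would come from whatever usage fields your provider reports; the field names differ between providers, so they are left out of the sketch.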

May 9, 2026 · 10 min · James M

Reasoning Models in 2026: o3, R2, and the Compute-at-Inference Shift

Two years ago the way to make a model better was to train a bigger one. By the start of 2026 that recipe has stopped being the most interesting answer. The frontier has moved to a different lever - letting the model think for longer at inference time, generating intermediate reasoning, and only then producing the final answer. The category has a name now (reasoning models) and a family of products built around it. The interesting questions are no longer whether the trick works, because it clearly does, but when to reach for one, where it lands in production, and what the costs actually look like once the demo glow wears off. ...

May 8, 2026 · 15 min · James M

Scott Galloway on AI: The Marketing Professor's Case That the Rich Don't Need You Anymore

Scott Galloway is the kind of commentator the AI conversation rarely produces: not a researcher, not a founder, not a doomer, not a booster. He is a marketing professor and a serial entrepreneur with a record of correctly reading the corporate stories of the last two decades, and he has spent the last two years pointing at the AI story with increasing concern. The headline of his pitch - that AI was not built for ordinary people and that the rich no longer need them - is provocative on purpose. The argument underneath is more careful, and worth pulling apart on its own terms. ...

May 4, 2026 · 14 min · James M

ETL Tools & Data Integration Platforms

What is ETL?

ETL is a foundational data engineering process that powers modern analytics:
- Extract - Retrieve data from various sources (databases, APIs, files, cloud services, streaming platforms)
- Transform - Clean, validate, deduplicate, and reshape data into required data models
- Load - Move processed data into data warehouses, data lakes, or analytical systems

ETL ensures data quality, consistency, and accessibility for analytics and reporting. In 2026 the dominant pattern is ELT (Extract-Load-Transform), which leverages cloud data warehouse compute for transformation, and increasingly EtLT (adding lightweight pre-load transforms for streaming and schema drift). See the Fundamentals of Data Engineering book for a deeper framing. ...
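As a rough illustration of the three stages described above, here is a minimal sketch using only the Python standard library. The sample CSV, the `payments` table, and the function names are assumptions made for the example; a real pipeline would read from actual sources and load into a warehouse rather than an in-memory SQLite database.

```python
import csv
import io
import sqlite3

# Hypothetical raw input: one duplicate row and one row with an invalid amount.
RAW_CSV = """id,email,amount
1, Alice@Example.com ,10.50
2,bob@example.com,not-a-number
1, Alice@Example.com ,10.50
3,carol@example.com,7
"""


def extract(raw: str) -> list[dict]:
    """Extract: read rows from a source (here, an in-memory CSV string)."""
    return list(csv.DictReader(io.StringIO(raw)))


def transform(rows: list[dict]) -> list[tuple[int, str, float]]:
    """Transform: clean, validate, and deduplicate by id."""
    seen: set[int] = set()
    clean = []
    for row in rows:
        try:
            record = (int(row["id"]),
                      row["email"].strip().lower(),
                      float(row["amount"]))
        except ValueError:
            continue  # validation: skip rows that fail type conversion
        if record[0] in seen:
            continue  # deduplication: keep the first row per id
        seen.add(record[0])
        clean.append(record)
    return clean


def load(records: list[tuple[int, str, float]], conn: sqlite3.Connection) -> int:
    """Load: write processed rows to the target and return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments "
        "(id INTEGER PRIMARY KEY, email TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", records)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM payments").fetchone()[0]


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(load(transform(extract(RAW_CSV)), connection))
```

In an ELT arrangement the same `transform` logic would instead run inside the warehouse after the raw rows have been loaded.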

May 4, 2026 · 9 min · James M

AI-Native Pipelines - What Changes When Your Consumer Is an LLM, Not a Dashboard

TL;DR
- Data pipelines were optimised for human consumers - dashboards, BI tools, analysts. In 2026 a growing share of pipeline output flows directly to language models, agents, and retrieval systems. That changes the design constraints in ways that catch teams off guard.
- Aggregation matters less. Context fidelity matters more. Freshness behaves differently. Schema moves from rigid to negotiated. Cost shifts from compute to tokens.
- The biggest mistake is treating an LLM consumer as if it were just another dashboard. It is not. It does not skim, it does not interpret charts, it does not have working memory across rows. It needs to be fed.
- The new patterns - retrieval-aware partitioning, embedding pipelines, structured-document outputs, prompt-shaped views, evaluation harnesses for data quality - are the actual subject of “AI-native data engineering” in 2026.

The Underlying Shift

For thirty years the implicit consumer of every data pipeline was a human looking at a screen. Even when the pipeline ended in an API or a CSV, the conceptual end-user was someone who would interpret the output with judgement, context, and skim-reading. ...

May 3, 2026 · 9 min · James M

Onchain AI Agents - Hype, Reality, and Where the Money Actually Flows

TL;DR
- “Onchain AI agents” became the dominant crypto narrative in 2025 and has cooled meaningfully in 2026 as the picture has gotten clearer.
- The honest taxonomy has three buckets: agents that hold wallets and trade, agents that automate DeFi operations, and agents that exist primarily as tokens with a chatbot attached. Only the first two are doing real work.
- Real revenue is concentrated in agent-driven DeFi automation, MEV strategies executed by agents, and onchain payment rails for AI services. Most of the rest is meme economics dressed in technical clothing.
- The structural question - “do AI agents need crypto rails at all” - has become a genuinely live debate. The answer in 2026 is “yes, but only for a narrow set of jobs, and most of those jobs are not what was being pitched.”
- If you are evaluating an onchain AI agent project, the test is brutally simple: strip away the token and ask whether the agent does something useful. If the answer is no, the project is a token with extra steps.

How We Got Here

The phrase “onchain AI agent” started showing up in crypto Twitter in late 2024 and exploded in early 2025. By the middle of last year there were thousands of agent tokens, dozens of agent platforms, and a handful of agents with billion-dollar implied market caps doing things that would have embarrassed a 2010-era chatbot. ...

May 3, 2026 · 9 min · James M

The Quiet Standardisation of Agent Protocols - MCP, A2A, ACP Compared

TL;DR
- The 2026 agent ecosystem has, while nobody was paying close attention, converged on three protocols that solve different problems and partly overlap: MCP (Model Context Protocol), A2A (Agent-to-Agent), and ACP (Agent Communication Protocol).
- MCP is the model-to-tool protocol. It standardises how an agent talks to its tools, data sources, and local context. This is the one that has clearly won its layer.
- A2A is the agent-to-agent protocol. It standardises how separately deployed agents discover each other, exchange tasks, and pass results. Adoption is growing but the picture is less settled.
- ACP is the orchestration-and-runtime protocol. It standardises how an agent runtime exposes its lifecycle, state, and operations to the systems around it. Newer, more enterprise-focused, and not yet a clear winner.
- The mental model: MCP for tools, A2A for peers, ACP for the platform. Build with all three in mind even if you only need one today.

Why Protocols, Why Now

A year ago “agents” was still a debate about whether the things existed. By mid-2026 the debate has shifted. Agents exist. They do useful work. The interesting question is no longer “will this work” but “how do we connect them to everything else.” ...

May 3, 2026 · 8 min · James M

Five AI Tokens Worth Understanding in 2026 (And One You're Probably Missing)

A technical reader’s guide to where AI and crypto actually meet - without the hype.

TL;DR
- The AI-token sector has stratified. There is a clear top tier of projects with real engineering, real revenue and visible institutional interest, and a long tail of speculation. The total AI-crypto market just crossed $17B and the measurable-infrastructure share is growing faster than the speculative tail.
- The five tokens worth understanding in May 2026 are Bittensor (TAO) as the conviction long, Virtuals Protocol (VIRTUAL) as the speculative growth bet, Render (RENDER) as the infrastructure hold, Artificial Superintelligence Alliance (FET / ASI) as the deep value play, and NEAR Protocol (NEAR) as the AI commerce layer.
- Every name on the list has drawn down 60%+ from its all-time high in the last 18 months. The drawdowns are not theoretical and they will happen again. Position-sizing matters more than picks.
- Worth flagging without putting them in the main basket - Kite (KITE), Internet Computer (ICP) and The Graph (GRT). Worth avoiding - the long tail of “AI memecoin” launches.
- Nothing here is investment advice. Prices are snapshots from publicly available data (CoinGecko, CoinMarketCap) as of 4 May 2026 and will be stale within hours.

Why The Sector Looks Different In 2026

A year ago the AI-token sector was mostly a betting market on which token had “AI” most prominently in its tagline. In May 2026 the picture has changed character. There is a clear top tier of projects with measurable engineering output, real revenue, and visible institutional interest, and a long tail of names whose only product is a narrative. The total AI-crypto market cap just crossed $17B, and the share of that capital flowing into infrastructure with measurable usage has grown faster than the speculative tail. ...

May 3, 2026 · 13 min · James M

LLM-Powered Personal Productivity: Building a Private Automation Stack

TL;DR
- The interesting question in 2026 is not “can a local model do this”, it is “which jobs should you give it”.
- My stack: Ollama for inference, Letta for persistent agent memory, Obsidian as the second brain, Home Assistant for the physical world, and a small router that decides where each thought goes.
- Three jobs are the sweet spot for local: inbox triage, note enrichment, and routine automation. Each one is repetitive, private, and tolerant of a bit of latency.
- Two jobs are still worth handing to a frontier cloud model: anything novel-and-hard, and anything where you want the best draft on the first attempt.
- The bit nobody talks about is the router. The model is not the product. The thing that decides which model gets which job is the product.

Why Local Got Interesting

For years the answer to “should I run an LLM locally” was “no, just use the API”. The API was cheaper, faster, smarter, and you did not have to think about VRAM. The only reason to go local was privacy, and most people did not actually care about privacy enough to give up the quality gap. ...
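The routing rule in the TL;DR can be sketched in a few lines. This is a hypothetical illustration of the idea, not the router from the article: the job labels, the `route` function, and the choice to send unrecognised jobs to the cloud model are all assumptions.

```python
# Job types the excerpt names as the sweet spot for a local model.
LOCAL_JOBS = {"inbox_triage", "note_enrichment", "routine_automation"}


def route(job_type: str,
          novel_and_hard: bool = False,
          needs_best_first_draft: bool = False) -> str:
    """Return "local" or "cloud" for a job.

    The two cloud-worthy cases from the excerpt take priority over the
    local allow-list.
    """
    if novel_and_hard or needs_best_first_draft:
        return "cloud"
    if job_type in LOCAL_JOBS:
        return "local"
    # Assumption: anything unrecognised goes to the cloud model.
    return "cloud"


if __name__ == "__main__":
    print(route("inbox_triage"))
    print(route("note_enrichment", needs_best_first_draft=True))
```

Keeping the decision in one small function makes the routing policy easy to inspect and change without touching either model.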

May 3, 2026 · 9 min · James M