Multimodal AI in 2026: Vision + Text + Audio - What's Actually Useful

TL;DR
- Document understanding is the unglamorous killer application: invoices, contracts, and scanned PDFs that were once painful to extract data from are now tractable without dedicated pipelines.
- Vision models still under-deliver on precise spatial reasoning, object counting, and subtle medical or scientific imagery; these remain jobs for specialist models.
- Audio is the modality with the most upside. Beyond transcription, it carries tone, pace, and hesitation that text loses, enabling fault detection, emotional analysis, and richer inputs.
- The teams getting real value treat multimodal as an invisible enabling capability within a workflow, not a feature to demo, and they verify high-stakes outputs just as they would text.
- The right question when evaluating multimodal is not “can we use this” but “what specific user problem becomes tractable that previously was not”.

When the first multimodal frontier models shipped, the demos were genuinely impressive. A photo of a fridge interior with the model suggesting a recipe. A handwritten napkin sketch becoming working code. A short audio clip of a meeting being transcribed, summarised, and structured. It looked, briefly, like the boundary between modalities had collapsed and we were entering a new regime in which models could reason fluidly across text, images, and sound. ...

May 9, 2026 · 10 min · James M

Prompt Caching: The Quiet Performance Win for LLM Applications

TL;DR
- Prompt caching saves the computed representation of a prompt’s static prefix so that subsequent requests reuse it rather than recompute it. Depending on the provider, cached input tokens can cost as little as roughly 10% of the normal input token price.
- The savings are highest when prompts share a long, identical prefix across requests: system prompts, tool definitions, and few-shot examples can make up 80-90% of total input cost.
- The most common mistake is interpolating variables into the system prompt, which silently breaks caching. Fix it by moving all static content to the top and all dynamic content to the end.
- Cache lifetimes are bounded (from minutes to a few hours, depending on the provider), and any change to the prefix, including whitespace, causes a cache miss.
- Track your cache hit rate explicitly on every LLM dashboard. A dropping hit rate usually signals an unintended change in prompt construction, and fixing it is often the highest-leverage cost optimisation available.

If you build LLM applications for any length of time, you eventually notice that you are paying to have the model read the same instructions over and over again. The system prompt, the tool definitions, the few-shot examples, the structured output schema: all of it goes back into the model on every single request, and you pay for the input tokens every single time. For a chatbot doing one or two thousand requests a day, this is annoying. For an agent doing tens of thousands of requests with long contexts, it is the dominant cost line. ...
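The "static first, dynamic last" advice can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any provider's SDK: the prompt strings are hypothetical, `prefix_key` is a made-up stand-in for a provider's cache lookup (real providers match on tokens, not characters), and the 0.1 cached-price multiplier simply mirrors the "roughly 10%" figure, which varies by provider.

```python
import hashlib

# Hypothetical static content: identical on every request, so it goes first.
SYSTEM_PROMPT = "You are a support assistant. Answer using the provided tools."
TOOL_DEFINITIONS = '[{"name": "lookup_order", "parameters": {"order_id": "string"}}]'
FEW_SHOT = "Q: Where is order 123?\nA: Calling lookup_order(order_id='123')."
STATIC_PREFIX = "\n\n".join([SYSTEM_PROMPT, TOOL_DEFINITIONS, FEW_SHOT])


def build_prompt_bad(user_name: str, question: str) -> str:
    # Anti-pattern: a per-request variable is interpolated into the system
    # prompt, so the opening text differs for every user and the prefix
    # can no longer be reused across requests.
    system = f"{SYSTEM_PROMPT} The user's name is {user_name}."
    return "\n\n".join([system, TOOL_DEFINITIONS, FEW_SHOT, question])


def build_prompt_good(user_name: str, question: str) -> str:
    # Static content first and byte-for-byte identical; dynamic content last.
    return f"{STATIC_PREFIX}\n\nThe user's name is {user_name}.\n\n{question}"


def prefix_key(prompt: str, prefix_len: int = len(STATIC_PREFIX)) -> str:
    # Hypothetical cache key: a hash of the leading characters only.
    return hashlib.sha256(prompt[:prefix_len].encode("utf-8")).hexdigest()


def input_cost(total_tokens: int, cached_tokens: int,
               price_per_mtok: float, cached_multiplier: float = 0.1) -> float:
    # Uncached tokens pay full price; cached tokens pay a fraction of it.
    # The 0.1 default is an assumption and differs between providers.
    uncached = total_tokens - cached_tokens
    return (uncached + cached_tokens * cached_multiplier) * price_per_mtok / 1_000_000


if __name__ == "__main__":
    requests = [("Ada", "Where is order 42?"), ("Lin", "Can I change my address?")]
    good = [build_prompt_good(u, q) for u, q in requests]
    bad = [build_prompt_bad(u, q) for u, q in requests]
    print("good prefixes match:", prefix_key(good[0]) == prefix_key(good[1]))
    print("bad prefixes match:", prefix_key(bad[0]) == prefix_key(bad[1]))
    # Hypothetical request: 10,000 input tokens, 9,000 of them a cached prefix.
    print("without cache:", input_cost(10_000, 0, price_per_mtok=3.0))
    print("with cache:", input_cost(10_000, 9_000, price_per_mtok=3.0))
```

With a 90% cached prefix and a 0.1 multiplier, the input bill falls to about 19% of the uncached cost, which is why the bad variant, where the user's name leaks into the first few dozen characters, is so expensive in practice.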

May 9, 2026 · 10 min · James M

Reasoning Models in 2026: o3, R2, and the Compute-at-Inference Shift

Two years ago the way to make a model better was to train a bigger one. By 2026 that recipe has stopped being the most interesting answer. The frontier has moved to a different lever: letting the model think for longer at inference time, generating intermediate reasoning, and only then producing the final answer. The category has a name now (reasoning models) and a family of products built around it. The interesting questions are no longer whether the trick works (it clearly does) but when to reach for one, where it lands in production, and what the costs actually look like once the demo glow wears off. ...

May 8, 2026 · 15 min · James M

The Modern Lakehouse Stack: What Actually Belongs in Production

The word “lakehouse” has been doing a lot of work for the last five years. It has been used to describe everything from a thin SQL layer over object storage to a fully integrated platform with governance, lineage, ML training, and BI built on top. As with most umbrella terms, that elasticity has been useful for marketers and confusing for engineers. This post is the version of the conversation I would have with a senior engineer who has been asked to “build out our lakehouse” and wants to know which pieces are load-bearing and which are noise. It draws on what I have actually seen ship and survive in production data platforms in 2026, and it tries to be specific about why each layer is in the stack rather than presenting the picture as a fait accompli. ...

May 8, 2026 · 9 min · James M

Scott Galloway on AI: The Marketing Professor's Case That the Rich Don't Need You Anymore

Scott Galloway is the kind of commentator the AI conversation rarely produces: not a researcher, not a founder, not a doomer, not a booster. He is a marketing professor and a serial entrepreneur with a record of correctly reading the corporate stories of the last two decades, and he has spent the last two years pointing at the AI story with increasing concern. The headline of his pitch - that AI was not built for ordinary people and that the rich no longer need them - is provocative on purpose. The argument underneath is more careful, and worth pulling apart on its own terms. ...

May 4, 2026 · 14 min · James M

Hybrid Systems: Montage + MC-707 Architecture and Workflow

The Yamaha Montage M and the Roland MC-707 are both, on paper, complete instruments. The Montage is a flagship synth workstation with three distinct sound engines and the kind of polyphony and DSP headroom that makes most studio plugins look slow. The MC-707 is a compact groovebox with eight tracks, an internal sequencer, sample playback, and the kind of immediate hands-on workflow that makes laptop production feel laborious by comparison. ...

May 4, 2026 · 9 min · James M

The Yamaha Montage M: 6 Months of Real-World Usage

A six-month review is a different beast from a release-day one. The honeymoon is over. The early enthusiasm has cooled. The features that demoed well in the showroom have either earned their place in your daily workflow or quietly been abandoned, and the features you initially overlooked have either continued to be irrelevant or become indispensable. This is a six-month review of the Yamaha Montage M, the M8X variant specifically, from the perspective of someone using it as the centrepiece of a hybrid hardware rig rather than a stage instrument or a sound-design lab. The conclusions are mine, the use case is specific, and your mileage will genuinely vary, but the patterns I have noticed are likely to repeat across other working setups. ...

May 4, 2026 · 9 min · James M

ETL Tools & Data Integration Platforms

What is ETL?

ETL is a foundational data engineering process that powers modern analytics:

- Extract: retrieve data from various sources (databases, APIs, files, cloud services, streaming platforms)
- Transform: clean, validate, deduplicate, and reshape data into the required data models
- Load: move processed data into data warehouses, data lakes, or other analytical systems

ETL ensures data quality, consistency, and accessibility for analytics and reporting. In 2026 the dominant pattern is ELT (Extract-Load-Transform), which leverages cloud data warehouse compute for transformation, and increasingly EtLT, which adds lightweight pre-load transforms for streaming and schema drift. See the Fundamentals of Data Engineering book for a deeper framing. ...
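To make the ETL/ELT distinction concrete, here is a minimal Python sketch that uses the standard-library `sqlite3` module as a stand-in for a warehouse. The records, table names, and cleaning rules are all hypothetical; the only point is where the transform step runs: in application code before loading (ETL), or as SQL inside the warehouse after the raw data has landed (ELT).

```python
import sqlite3

# Hypothetical raw records, as they might arrive from an API or file export.
RAW_ORDERS = [
    {"id": "1", "email": " Ada@Example.com ", "amount": "19.99"},
    {"id": "2", "email": "lin@example.com", "amount": "5.00"},
    {"id": "2", "email": "lin@example.com", "amount": "5.00"},  # duplicate
    {"id": "3", "email": "", "amount": "12.50"},                # fails validation
]


def extract() -> list[dict]:
    # Extract: a real pipeline would read from a database, API, or file here.
    return list(RAW_ORDERS)


def transform(rows: list[dict]) -> list[tuple]:
    # Transform (ETL style): clean, validate, and deduplicate before loading.
    seen, clean = set(), []
    for r in rows:
        email = r["email"].strip().lower()
        if not email or r["id"] in seen:
            continue
        seen.add(r["id"])
        clean.append((int(r["id"]), email, float(r["amount"])))
    return clean


def load(conn: sqlite3.Connection, rows: list[tuple]) -> None:
    # Load: write the already-modelled rows into the analytical store.
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, email TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)


def run_elt(conn: sqlite3.Connection) -> None:
    # ELT: land the raw data first, then transform it with the store's own SQL engine.
    conn.execute("CREATE TABLE raw_orders (id TEXT, email TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (:id, :email, :amount)", extract())
    conn.execute("""
        CREATE TABLE orders AS
        SELECT DISTINCT CAST(id AS INTEGER) AS id,
               LOWER(TRIM(email)) AS email,
               CAST(amount AS REAL) AS amount
        FROM raw_orders
        WHERE TRIM(email) <> ''
    """)


if __name__ == "__main__":
    query = "SELECT id, email, amount FROM orders ORDER BY id"
    etl = sqlite3.connect(":memory:")
    load(etl, transform(extract()))
    elt = sqlite3.connect(":memory:")
    run_elt(elt)
    print("ETL:", etl.execute(query).fetchall())
    print("ELT:", elt.execute(query).fetchall())
```

Both paths end with the same cleaned `orders` table. The ELT version also keeps `raw_orders` in the store, which is one reason the pattern suits cloud warehouses: the raw data can be re-transformed later without extracting it again.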

May 4, 2026 · 9 min · James M

The State of Blockchain in 2026

TL;DR The blockchain industry in 2026 is no longer arguing about whether it has a future. The arguments are about which layers do which jobs. Bitcoin remains the reserve asset and the most credible neutral settlement layer. Ethereum is the dominant smart-contract base layer, with most activity now happening on its Layer 2s. Solana has taken the high-throughput application crown. Polkadot is mid-pivot from infrastructure to applications. The two structural shifts that define 2026 are modular blockchains (Celestia, EigenLayer) and the stablecoin economy, whose annual settlement volume now exceeds Visa's. Real-world asset tokenization has gone from a slide-deck thesis to a $30B+ live market, led by BlackRock’s BUIDL and tokenized US treasuries. The destination for the next two years is clear: payments, treasuries, and AI agents using crypto rails, and most users will not know they are using a blockchain.

What Actually Survived

It is worth saying out loud: most of the things that called themselves “the future of finance” in 2021 are gone. The 2022-2023 unwind cleared out the projects that had no users, no revenue, and no reason to exist. What remains in 2026 is a much smaller, much more boring, and much more useful set of networks. ...

May 4, 2026 · 15 min · James M

The Physics and Philosophy of Interstellar

There are not many films where the visual effects pipeline produces a peer-reviewed physics paper. Christopher Nolan’s Interstellar is one of them. The visualisation of the supermassive black hole Gargantua was rigorous enough to be published in Classical and Quantum Gravity, in a paper co-authored by the visual effects team and Nobel laureate Kip Thorne. That single fact captures what makes the film unusual. It is, on the surface, a story about love, time, and survival. Underneath, it is a serious attempt to take Einstein’s general relativity and put it on a 70mm IMAX screen with as little fudging as Hollywood would allow. ...

May 4, 2026 · 14 min · James M