Reasoning Models in 2026: o3, R2, and the Compute-at-Inference Shift

Two years ago the way to make a model better was to train a bigger one. By the start of 2026 that recipe has stopped being the most interesting answer. The frontier has moved to a different lever - letting the model think for longer at inference time, generating intermediate reasoning, and only then producing the final answer. The category has a name now (reasoning models) and a family of products built around it. The interesting questions are no longer whether the trick works, because it clearly does, but when to reach for one, where it lands in production, and what the costs actually look like once the demo glow wears off. ...

May 8, 2026 · 15 min · James M

Chatbots & Large Language Models (LLMs)

TL;DR
- An LLM is the underlying reasoning engine; a chatbot is the product experience wrapped around it - they are related but not the same thing
- LLMs excel at summarizing, rewriting, generating drafts, and coding, but should be treated as fast collaborators rather than infallible oracles
- The main model families are frontier models (GPT, Claude, Gemini), open-weight / self-hostable models (Llama), and product-specific assistants (ChatGPT, Cursor, Copilot)
- Choose the right tool for the job: chatbots for convenience and exploration, APIs for automation, coding-native tools for repo-aware work
- The market is now split between AI as a consumer product and AI as programmable infrastructure - understanding both layers makes the landscape far less confusing

Most people still talk about chatbots and large language models as if they are the same thing. ...

May 17, 2024 · 6 min · James M

Google Gemini Ultra

TL;DR
- Gemini Ultra was Google DeepMind’s flagship Gemini tier at launch in early 2024 - notable for hitting 90.0% on the MMLU benchmark
- Multimodal across text, images, video, and code, with strong coding and reasoning performance for its era
- Has since been superseded by Gemini 2.0, Gemini 2.5 Pro, and other specialised variants in the Gemini family
- Documented here for historical context - the original Ultra branding is no longer the main consumer-facing model
- Access today is through AI Studio, Vertex AI, and Google One AI Premium plans

About

Note: Gemini Ultra (released early 2024) has since been superseded by more advanced versions. As of 2026, Google’s flagship models include Gemini 2.0, Gemini 2.5 Pro, and specialized variants. This article documents the original Gemini Ultra for historical context. ...

March 29, 2024 · 2 min · James M

Google Gemini Advanced

TL;DR
- Gemini Advanced was Google’s tier-based offering for power users at launch in early 2024 - enhanced reasoning, longer context, and multimodal input
- Built around Gemini Ultra at first, then succeeded by Gemini 2.0 Flash, 2.5 Pro, and other newer models
- Use cases: coding, creative collaboration, long-form multi-turn conversations, document and media analysis
- Available through Google One AI Premium with web, mobile, and API access; pricing has shifted as the lineup evolved
- Documented here as the original 2024 positioning - check Google’s current site for what “Advanced” maps to today

About

Note: Gemini Advanced (released early 2024) was Google’s tier-based offering. By 2026, this has evolved into a more diverse model lineup including Gemini Pro, Gemini 2.0, and Gemini 2.5 Pro with different access tiers. This article reflects the original 2024 positioning. ...

March 29, 2024 · 2 min · James M