
Which Mac Studio Should You Buy for Running LLMs Locally?

TL;DR

- Best entry point: M2 Max, 32-64 GB (~£1.4k-£2k) for 7B-13B models at 25-40 tok/s
- Best sweet spot: M2 Ultra, 64-128 GB (~£3k-£4.5k) handles 30B+ models comfortably
- Best for 70B models: M3 Ultra, 128 GB+ (~£5.5k+) with 800+ GB/s bandwidth
- Newer alternative: M4 Max (£2k-£4k) - lower bandwidth (410-546 GB/s) than the Ultra chips, but still solid for 7B-13B models
- Key rule: memory bandwidth matters more than raw compute for token generation
- Reality check: an RTX 5090 rig is 2-3× faster for similar money - buy a Mac for simplicity and unified memory

You want to run large language models locally on a Mac Studio. Good idea - unified memory is genuinely useful for LLMs. But the specs matter, and there are some hard truths about what “works” versus what feels responsive. More importantly: the right Mac depends entirely on which model you want to run. ...
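The "memory bandwidth matters" rule has a simple back-of-envelope form: generating one token requires streaming roughly the entire set of weights through memory once, so peak tokens/s is about bandwidth divided by model size in bytes. A minimal sketch of that estimate - the bandwidth figures come from the list above, while the 4-bit quantization byte count is my assumption:

```python
def est_tokens_per_sec(bandwidth_gb_s: float,
                       params_billions: float,
                       bytes_per_param: float) -> float:
    """Rough upper bound on generation speed: one full pass
    over the weights per generated token."""
    model_gb = params_billions * bytes_per_param
    return bandwidth_gb_s / model_gb

# M2 Ultra (~800 GB/s) running a 70B model at 4-bit (~0.5 bytes/param)
print(round(est_tokens_per_sec(800, 70, 0.5), 1))
# M4 Max (~546 GB/s) running a 13B model at 4-bit
print(round(est_tokens_per_sec(546, 13, 0.5), 1))
```

This is a ceiling, not a prediction - real-world throughput lands below it because of compute overhead, KV-cache reads, and OS memory pressure - but it explains why the 800+ GB/s Ultra chips are the ones recommended for 70B models.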

April 18, 2026 · 10 min · James M

Chatbots & Large Language Models (LLMs)

TL;DR

- An LLM is the underlying reasoning engine; a chatbot is the product experience wrapped around it - they are related but not the same thing
- LLMs excel at summarizing, rewriting, generating drafts, and coding, but should be treated as fast collaborators rather than infallible oracles
- The main model families are frontier models (GPT, Claude, Gemini), open-weight / self-hostable models (Llama), and product-specific assistants (ChatGPT, Cursor, Copilot)
- Choose the right tool for the job: chatbots for convenience and exploration, APIs for automation, coding-native tools for repo-aware work
- The market is now split between AI as a consumer product and AI as programmable infrastructure - understanding both layers makes the landscape far less confusing

Most people still talk about chatbots and large language models as if they are the same thing. ...
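The "chatbots for convenience, APIs for automation" split above comes down to scriptability: a chat UI handles one conversation at a time, while an API call can be looped over many inputs. A minimal sketch, assuming a generic OpenAI-style request shape - the model name and the ticket texts are placeholders, not from the article:

```python
import json

def build_chat_request(prompt: str, model: str = "example-model") -> dict:
    """Assemble a chat-style request body (common chat-completions
    convention; the model name here is a placeholder)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

# Automation: the same request template applied to a whole batch of
# inputs - something a consumer chatbot UI can't easily do.
tickets = ["Reset password fails on iOS", "Invoice PDF renders blank"]
batch = [build_chat_request(f"Classify this support ticket: {t}") for t in tickets]
print(json.dumps(batch[0], indent=2))
```

The payload here would be POSTed to whichever provider you use; the point is that the programmable-infrastructure layer is just structured requests like this, wrapped by the chatbot products sitting on top.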

May 17, 2024 · 6 min · James M