
TL;DR

Best entry point: M2 Max 32-64 GB (~£1.4k-£2k) for 7B-13B models at 25-40 tok/s
Best sweet spot: M2 Ultra 64-128 GB (~£3k-£4.5k) handles 30B+ models comfortably
Best for 70B models: M3 Ultra 128 GB+ (~£5.5k+) with 800+ GB/s bandwidth
Newer alternative: M4 Max (£2k-£4k) - lower bandwidth (410-546 GB/s) than the Ultra chips, but still solid for 7B-13B models
Key rule: Memory bandwidth matters more than raw compute for token generation
Reality check: An RTX 5090 rig is 2-3× faster for similar money - buy a Mac for simplicity and unified memory
July 2026 update: Apple’s memory crunch has killed new 256 GB/512 GB Ultra configs for now - big-memory Macs are refurb-only until the M5 Ultra (tested up to 768 GB) lands in late 2026
On the horizon: the M7 Ultra, rumoured for around 2029, is reportedly designed to support up to 1.5 TB of unified memory - see the road ahead below

You want to run large language models locally on a Mac Studio. Good idea - unified memory is genuinely useful for LLMs. But the specs matter, and there are some hard truths about what “works” versus what feels responsive. More importantly: the right Mac depends entirely on which model you want to run. ...
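To make the bandwidth rule concrete, here is a rough back-of-envelope sketch in Python. It assumes a dense model whose weights are all read from memory once per generated token, so memory bandwidth divided by model size gives an upper bound on tokens per second. The function names are illustrative, and the estimate ignores KV cache traffic, quantisation overhead and compute limits, so real throughput will land below it.

```python
def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (assumes no quantisation metadata overhead)."""
    return params_billion * bits_per_weight / 8


def max_tokens_per_second(bandwidth_gb_s: float,
                          params_billion: float,
                          bits_per_weight: float) -> float:
    """Bandwidth-bound ceiling: every weight is read once per generated token."""
    return bandwidth_gb_s / model_size_gb(params_billion, bits_per_weight)


if __name__ == "__main__":
    # Bandwidth figures taken from the TL;DR above; 4-bit weights are an assumption.
    for label, bandwidth in [("M4 Max (546 GB/s)", 546.0), ("Ultra (800 GB/s)", 800.0)]:
        for params in (13, 70):
            ceiling = max_tokens_per_second(bandwidth, params, bits_per_weight=4)
            print(f"{label}, {params}B @ 4-bit: at most ~{ceiling:.0f} tok/s")
```

Under these assumptions a 70B model at 4 bits is about 35 GB of weights, so an 800 GB/s chip tops out at roughly 23 tok/s no matter how much compute it has. That is why bandwidth, not raw compute, sets the pace for token generation.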

Which Mac Studio Should You Buy for Running LLMs Locally?

The State of Open-Weight Models in 2026: Llama, Qwen, Mistral, DeepSeek