
The State of Open-Weight Models in 2026: Llama, Qwen, Mistral, DeepSeek

The open-weight model conversation in 2023 was about whether the open ecosystem could keep up with the frontier labs at all. The conversation in 2024 was about how big the gap was. The conversation in 2026 has changed shape: on most benchmarks that matter to most production workloads, the open-weight ecosystem has either closed or substantially narrowed the gap, and the strategic question is no longer “can we use open models” but “which open model fits this workload best.” ...

May 12, 2026 · 13 min · James M

Which Mac Studio Should You Buy for Running LLMs Locally?

TL;DR
Best entry point: M2 Max 32-64 GB (~£1.4k-£2k) for 7B-13B models at 25-40 tok/s
Best sweet spot: M2 Ultra 64-128 GB (~£3k-£4.5k) handles 30B+ models comfortably
Best for 70B models: M3 Ultra 128 GB+ (~£5.5k+) with 800+ GB/s bandwidth
Newer alternative: M4 Max (£2k-£4k): lower bandwidth (410-546 GB/s) than the Ultra chips, but still solid for 7B-13B models
Key rule: memory bandwidth matters more than raw compute for token generation
Reality check: an RTX 5090 rig is 2-3× faster for similar money; buy a Mac for simplicity and unified memory

You want to run large language models locally on a Mac Studio. Good idea: unified memory is genuinely useful for LLMs. But the specs matter, and there are some hard truths about what "works" versus what feels responsive. More importantly, the right Mac depends entirely on which model you want to run. ...

April 18, 2026 · 10 min · James M
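The "key rule" above can be made concrete with a common back-of-envelope estimate. This is a minimal sketch, not the post's own method. It assumes that generating each token streams every model weight from memory exactly once, so decode speed is capped at roughly bandwidth divided by weight size. It ignores KV-cache reads, compute overhead, and quantization-format details, so real throughput lands below this ceiling. The function names and the `headroom_gb` value are hypothetical, chosen for illustration.

```python
def weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes), e.g. 4 bits for a 4-bit quant."""
    return params_billions * bits_per_weight / 8


def decode_ceiling_tok_s(bandwidth_gb_s: float, params_billions: float,
                         bits_per_weight: float) -> float:
    """Rough upper bound on tokens/s, assuming each token reads all weights once."""
    return bandwidth_gb_s / weights_gb(params_billions, bits_per_weight)


def fits(memory_gb: float, params_billions: float, bits_per_weight: float,
         headroom_gb: float = 8.0) -> bool:
    """Hypothetical fit check: weights plus an assumed headroom for OS and KV cache."""
    return weights_gb(params_billions, bits_per_weight) + headroom_gb <= memory_gb


if __name__ == "__main__":
    # (label, bandwidth GB/s, unified memory GB, params in billions, bits per weight)
    scenarios = [
        ("M2 Max, 13B @ 8-bit", 400, 64, 13, 8),
        ("M4 Max, 7B @ 4-bit", 546, 36, 7, 4),
        ("M3 Ultra, 70B @ 4-bit", 800, 128, 70, 4),
    ]
    for label, bw, mem, params, bits in scenarios:
        ceiling = decode_ceiling_tok_s(bw, params, bits)
        print(f"{label}: fits={fits(mem, params, bits)}, ceiling ~{ceiling:.0f} tok/s")
```

Because the weight size sits in the denominator, doubling bandwidth doubles the ceiling while extra GPU cores leave it unchanged. That is why the Ultra chips pull ahead on 70B models even though a 70B 4-bit model already fits comfortably in 128 GB.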