Claude Mythos: The AI Benchmark Breaker That Won't Be Released

TL;DR Claude Mythos Preview set new records across coding, mathematics, and reasoning: 93.9% on SWE-bench Verified, 97.6% on USAMO 2026, and leads GPT-5.4 on every shared benchmark The USAMO result - a 55-point jump over Claude Opus 4.6 - suggests genuinely different reasoning capabilities, not just incremental improvement, and Anthropic screened against memorization concerns Despite dominating benchmarks, Mythos is not publicly available because it autonomously discovered thousands of zero-day vulnerabilities across every major OS and browser Access is restricted to 12 major tech and finance companies via Project Glasswing, a defensive cybersecurity research initiative backed by $100M in Anthropic usage credits The wider implication: we have entered an era where “the best model” and “the publicly available model” may be permanently different things, with security becoming a deployment constraint alongside capability Anthropic released Claude Mythos Preview on April 7, 2026 - and immediately announced it won’t be publicly available. ...

April 8, 2026 · 5 min · James M

DeepSeek 🤯

TL;DR DeepSeek’s January 2025 release of R1 shook markets - a frontier-grade reasoning model trained for a reported $6M, a fraction of US lab budgets The app shot to #1 on Apple’s App Store inside days, and the open weights forced an industry-wide rethink of what training really costs Subsequent releases (V3 and beyond) cemented DeepSeek as a serious competitor in the open-source and cost-efficient AI category The story is less “China caught up” and more “the cost floor moved” - implications for closed-model pricing, GPU demand, and open-weight strategy Worth understanding as the moment that made cheap, capable, open models a credible default rather than a curiosity Overview Wow, crazy times, the best technology in the world is now becoming incredibly cheap and accessible to everyone! 🤯 ...

January 27, 2025 · 2 min · James M