TL;DR

  • Claude Mythos is Anthropic’s most powerful model to date, scoring 93.9% on SWE-bench and 97.6% on USAMO 2026 - a 55-point leap over rival models
  • It is not publicly available; Anthropic restricted access to 12 vetted companies through Project Glasswing, focused on defensive cybersecurity
  • Mythos autonomously identified thousands of zero-day vulnerabilities, including a 27-year-old unpatched OpenBSD bug - making its offensive potential too dangerous to democratize
  • This marks a shift away from open innovation toward controlled deployment, where the most capable AI may never be publicly released
  • The Mythos story forces a rethink of how we evaluate AI: benchmark performance and public availability are no longer the same thing

Anthropic built its most capable model to date, demonstrated it autonomously discovering thousands of zero-day vulnerabilities, and then declined to release it. That is the Mythos story, and it is worth sitting with rather than rushing past. The benchmarks are striking, but the decision not to publish is the more consequential part - it signals a real shift in how frontier AI labs are thinking about deployment.

Benchmark Shattering, Publicly Unavailable

Claude Mythos didn’t just outperform existing models; it redefined the ceiling of AI performance. Across a spectrum of demanding tasks, its scores were unprecedented:

  • Coding Prowess: On SWE-bench Verified, Mythos reportedly achieved an astonishing 93.9%, a significant leap over its closest rivals. This level of coding proficiency suggests an AI capable of independently tackling complex software engineering challenges.
  • Mathematical Intuition: Perhaps most strikingly, Mythos scored 97.6% on the USAMO 2026, a mathematical olympiad. This wasn’t a marginal improvement, but a staggering 55-point jump over leading models. Such a leap indicates a fundamental difference in its reasoning and problem-solving capabilities, pushing the boundaries of what AI can achieve in abstract thought.

These aren’t just academic achievements; they point to a system that possesses an almost uncanny understanding across diverse domains, from logical code construction to deep mathematical principles.

The Price of Power: Project Glasswing

So, why is an AI of this caliber kept under wraps? The answer lies in Project Glasswing, Anthropic’s initiative to deploy Mythos exclusively to a consortium of 12 major technology and finance companies. The mission? Defensive cybersecurity.

Mythos, in its limited preview, demonstrated an alarming proficiency in discovering zero-day vulnerabilities. It autonomously identified thousands of previously unknown flaws across critical systems, from operating systems to web browsers. This extraordinary talent, while invaluable for defense, simultaneously highlights the immense offensive potential. An AI that can find a 27-year-old unpatched OpenBSD bug is an AI that could, in the wrong hands, unravel the digital world.

Anthropic’s reasoning is clear: the model’s ability to uncover systemic weaknesses is too potent to be democratized without stringent controls. The risks of widespread availability - from state-sponsored cyber warfare to sophisticated criminal exploitation - outweigh the benefits of public access.

Redefining AI Deployment and Ethics

The story of Claude Mythos forces a critical re-evaluation of how we approach frontier AI development and deployment.

  1. Capability vs. Control: It underscores a growing tension between building increasingly powerful AI and ensuring its safe, controlled deployment. The most advanced AI might not be the most accessible.
  2. Security as a Design Constraint: Mythos demonstrates that for certain classes of AI, security isn’t an afterthought but a fundamental constraint shaping its entire business model and distribution.
  3. Beyond Benchmarks: While benchmarks remain crucial, Mythos shows they don’t tell the whole story. The “best” model, by performance metrics, might paradoxically be the one we can’t fully unleash.

This situation challenges the traditional Silicon Valley ethos of rapid, open innovation. Instead, it suggests a future where the most potent AI systems operate within tightly guarded perimeters, accessible only to those vetted to handle their profound implications.

The Future is Restricted

The question Mythos poses is practical, not philosophical: if Project Glasswing works - if defensive use of a restricted frontier model genuinely outpaces offensive misuse - then this deployment model becomes the template. The most capable AI available to vetted research partners. A safer, capable version available to the public. A gap between those two tiers that never fully closes.

That gap is not a failure of openness. It may be the most honest reckoning with what these systems can actually do. Whether Anthropic’s controls hold under sustained adversarial pressure is the experiment now underway. The results will shape how every lab behind them handles the same decision.