Claude Mythos: The AI Benchmark Breaker That Won't Be Released

TL;DR
- Claude Mythos Preview set new records across coding, mathematics, and reasoning: 93.9% on SWE-bench Verified, 97.6% on USAMO 2026, and a lead over GPT-5.4 on every shared benchmark.
- The USAMO result, a 55-point jump over Claude Opus 4.6, suggests genuinely different reasoning capabilities rather than incremental improvement, and Anthropic screened the results for memorization.
- Despite dominating benchmarks, Mythos is not publicly available because it autonomously discovered thousands of zero-day vulnerabilities across every major OS and browser.
- Access is restricted to 12 major tech and finance companies via Project Glasswing, a defensive cybersecurity research initiative backed by $100M in Anthropic usage credits.
- The wider implication: we have entered an era where “the best model” and “the publicly available model” may be permanently different things, with security becoming a deployment constraint alongside capability.

Anthropic announced Claude Mythos Preview on April 7, 2026, and in the same breath said it won’t be publicly available. ...

April 8, 2026 · 5 min · James M

Claude Code vs Cursor: A 6-Month Comparison

After six months of daily use, here is how the two heavyweights of AI-assisted coding compare: the terminal-native Claude Code and the IDE-integrated Cursor.

April 8, 2026 · 3 min · James M

Is the $20 AI Subscription Era Over?

TL;DR
- The $20/month subscription tier is not disappearing, but what you get for it is quietly shrinking: agent features are being capped or metered while the price holds.
- The Claude Code episode (briefly paywalled for Pro users) was a deliberate A/B test, not a glitch, and a signal that Anthropic is steering heavy users toward the Max tier at $100-$200/month.
- Agent workflows like Claude Code consume 50-500x more tokens than a chat session, making flat, all-you-can-eat pricing economically unsustainable for power users.
- Most major providers (Anthropic, OpenAI, Google, Cursor) are projected to raise consumer tiers by $5-$10 by the end of 2026, with sharper increases at the enterprise level.
- If you are a chat-only user, the $20 plan remains a good deal; if you are running agents daily, budget for a higher tier or pay-as-you-go API access instead.

For the last three years, $20 a month has been the magic number. Claude Pro, ChatGPT Plus, Gemini Advanced, Copilot Pro, Cursor Pro: all twenty dollars, all clearly priced to anchor against Netflix rather than against enterprise software. That anchor is cracking. The labs are burning cash on inference for power users, frontier models cost more per token than they did a year ago, and agent tools like Claude Code and Codex consume ten to a hundred times the compute of a chat session. Something has to give. ...

April 3, 2026 · 10 min · James M

Claude Code Just Got a Serious Code Review Feature

TL;DR
- Claude Code’s new Code Review feature dispatches multiple AI agents in parallel to review a PR from different angles, rather than running a single shallow model pass over the diff.
- The motivation is real: Anthropic’s internal code output per engineer increased by around 200%, making human review the bottleneck, and humans consistently miss subtle bugs on large diffs.
- Multi-agent review cross-checks findings, filters false positives, and ranks issues by severity before posting a clean, high-signal review comment plus inline annotations.
- Review depth scales with PR size; typical runs take about 20 minutes and cost $15-$25, which is cheap compared to the cost of a production bug.
- Humans still approve PRs: the tool’s role is a thorough pre-review pass, not automated sign-off, making it a complement to human judgment rather than a replacement.

I genuinely think a lot of people still underestimate how fast the AI developer tooling ecosystem is evolving. ...

March 9, 2026 · 5 min · James M

Hitting Claude Code Limits? Here’s the Setup I’m Moving Toward

TL;DR
- Hitting Claude Code Pro usage limits does not have to mean upgrading to the $200/month plan; a hybrid AI stack is a smarter and cheaper alternative.
- The tiering strategy: local models (free) for quick edits, cheap cloud APIs for general coding, and frontier models only for architecture or complex multi-file reasoning.
- Tools like Ollama or LM Studio, paired with coding models such as DeepSeek Coder or Qwen2.5, handle the majority of everyday tasks locally at no cost.
- Cheap cloud inference providers (Groq, Together AI, DeepInfra) offer capable open models at fractions of a cent per session for heavier work.
- A realistic usage split of 80% local / 15% cheap APIs / 5% frontier models dramatically reduces limit burn while keeping Claude available when it genuinely matters.

I keep running into the same problem with Claude Code Pro ($20/month): I burn through the usage limits faster than I expect. The obvious solution is upgrading to the $200/month plan, but that feels excessive for how I actually use it. ...

March 9, 2026 · 4 min · James M

Chatbots & Large Language Models (LLMs)

TL;DR
- An LLM is the underlying reasoning engine; a chatbot is the product experience wrapped around it. They are related but not the same thing.
- LLMs excel at summarizing, rewriting, drafting, and coding, but should be treated as fast collaborators rather than infallible oracles.
- The main categories are frontier models (GPT, Claude, Gemini), open-weight, self-hostable models (Llama), and product-specific assistants (ChatGPT, Cursor, Copilot).
- Choose the right tool for the job: chatbots for convenience and exploration, APIs for automation, and coding-native tools for repo-aware work.
- The market is now split between AI as a consumer product and AI as programmable infrastructure; understanding both layers makes the landscape far less confusing.

Most people still talk about chatbots and large language models as if they are the same thing. ...

May 17, 2024 · 6 min · James M