AI cloud subscriptions have fragmented into a crowded market. Frontier-lab APIs compete with open-weights challengers, consumer chat plans compete with agent platforms, and every provider is reshuffling model tiers every few months. This guide organizes the 2026 landscape so you can pick a plan without reading six pricing pages.

For background on how these costs behave over time, see Token Economics: Why Costs Aren’t Going Down and Local vs Cloud AI in 2026.

TL;DR Comparison

Consumer and team plans

Plan Price Best for
Claude Pro $20/mo Individual power users
Claude Max $100 - $200/mo Heavy Claude Code users
ChatGPT Plus $20/mo General-purpose chat
ChatGPT Pro $200/mo Unlimited o-series reasoning
Gemini Advanced $20/mo Google Workspace users
Copilot Pro $20/mo Microsoft 365 integration
X Premium+ $16/mo Grok inside X
SuperGrok $30/mo Higher Grok limits, Grok 4 access
Claude Team $30/user/mo Small engineering teams
ChatGPT Team $25/user/mo Mixed-function teams

Developer APIs

All API prices normalised to USD per 1M tokens. Consumer plans listed at flat monthly price.

Provider Flagship model Input $/1M Output $/1M Context Notable feature
Anthropic Claude Opus 4.7 $15 $75 200K Prompt caching, computer use
Anthropic Claude Sonnet 4.6 $3 $15 200K Best price/performance
Anthropic Claude Haiku 4.5 $0.80 $4 200K Cheap, fast, capable
OpenAI GPT-5 $10 $30 400K Strong multimodal
OpenAI GPT-5 mini $0.40 $1.60 400K High volume default
Google Gemini 2.5 Pro $1.25 $10 2M Huge context window
Google Gemini 2.5 Flash $0.30 $2.50 1M Cheapest long-context
xAI Grok 4 $5 $15 256K Real-time X data
Mistral Mistral Large 2 $2 $6 128K EU-hosted, open weights
DeepSeek DeepSeek V3 $0.14 $0.28 128K Cheapest frontier-class

Prices shift constantly - always check the provider’s pricing page before committing to a plan.

Developer APIs in Depth

Anthropic (Claude API)

Anthropic leads on long-horizon reasoning, tool use, and code generation. The 4.x line (Opus 4.7, Sonnet 4.6, Haiku 4.5) covers the full price/performance curve without forcing you to choose a weaker model to save money.

  • Prompt caching: Up to 90% discount on repeated input tokens. Critical for agents that reuse system prompts or codebases.
  • Batch API: 50% discount for non-realtime workloads.
  • Computer use: Claude can drive a desktop via screenshots and mouse/keyboard.
  • Best for: Agentic workflows, code assistants, long-document analysis, safety-sensitive deployments.

See Anthropic’s pricing page and the Claude API docs. I’ve written more about model selection in Claude Opus 4.7 and Claude Token Efficiency Mindset.

OpenAI

OpenAI still has the broadest feature surface: vision, audio, image generation, assistants, the Responses API, and a deep ecosystem of third-party integrations. GPT-5 is the flagship; GPT-5 mini handles most production traffic at a fraction of the cost.

  • Structured outputs: Strict JSON schema enforcement.
  • Realtime API: Low-latency voice conversations.
  • Batch API: 50% discount on async jobs.
  • Best for: Multimodal apps, voice agents, teams already on the OpenAI ecosystem.

Current rates are on the OpenAI pricing page.

Google (Gemini API)

Gemini’s killer feature is context window. 1 - 2 million tokens is enough to put an entire codebase, a textbook, or hours of video into a single prompt. Pricing is competitive, especially for Flash.

  • Context caching: Discounts for repeated long contexts.
  • Grounding with Google Search: Native web retrieval.
  • Native multimodality: Video, audio, and PDF input.
  • Best for: Long-context research, video understanding, Google Cloud shops.

See Google AI pricing. Related: LLM Context Window Arms Race.

Microsoft Azure OpenAI

Same OpenAI models, delivered through Azure with enterprise controls: data residency, private networking, SLAs, and BAA/HIPAA coverage. You pay an infrastructure premium over direct OpenAI access, but regulated industries usually need it.

  • Pricing: Per-model rates matching OpenAI’s, plus Azure compute and networking.
  • Best for: Regulated enterprises, Azure-native stacks, teams needing Microsoft’s compliance posture.

Azure OpenAI pricing.

xAI (Grok API)

xAI’s differentiator is live X (Twitter) data access and a looser content policy than the major labs. Grok 4 is genuinely competitive on reasoning benchmarks.

  • Live search: Native access to the X firehose.
  • Best for: Real-time social signal analysis, apps where content-policy friction is a blocker.

Mistral AI

Europe’s frontier lab. Mistral Large 2 is competitive at the mid-tier, and the open weights (Mistral Small, Mixtral) mean you can move workloads on-prem without rewriting prompts.

  • La Plateforme: Managed API with EU data residency.
  • Open weights: Self-host the smaller models on your own GPUs.
  • Best for: EU compliance, teams planning a cloud-to-on-prem path.

Mistral pricing.

DeepSeek

DeepSeek V3 and R1 collapsed the price floor in 2025 and remain the cheapest frontier-class models in 2026. Hosted pricing is roughly 10x cheaper than comparable Western models.

  • Best for: Cost-sensitive batch workloads, experimentation, non-critical inference.
  • Caveat: Data residency in China - check your compliance requirements before routing production traffic.

Consumer and Team Plans

Claude.ai

  • Free: Limited daily messages on Claude Sonnet 4.6.
  • Pro ($20/mo): ~5x the free limits, Projects, extended thinking.
  • Max ($100 - $200/mo): 5x or 20x Pro’s usage, designed for Claude Code.
  • Team ($30/user/mo, 5 seat min): Central billing, shared Projects.
  • Enterprise: SSO, audit logs, data residency, custom usage.

The Max tiers are specifically positioned for developers running Claude Code sessions all day - at that usage, Max is cheaper than the API.

ChatGPT

  • Free: GPT-5 mini with limits; GPT-5 available sparingly.
  • Plus ($20/mo): GPT-5, DALL-E, Advanced Voice, Code Interpreter.
  • Pro ($200/mo): Unlimited o-series reasoning models, longer context, research tools.
  • Team ($25/user/mo): Shared workspace, admin console.
  • Enterprise: Custom pricing, SSO, unlimited high-speed access.

Gemini Advanced

Bundled with Google One AI Premium at $20/mo. Includes Gemini 2.5 Pro in Gmail, Docs, and Sheets. Best value if you already live in Google Workspace.

Microsoft Copilot Pro

$20/mo consumer plan; $30/user/mo for Microsoft 365 Copilot. Deep integration across Word, Excel, PowerPoint, and Outlook. Less interesting as a standalone chat product.

Grok (xAI)

Grok ships through X’s subscription tiers rather than a standalone consumer plan.

  • X Premium+ ($16/mo): Grok inside the X app, ad-free timeline, longer posts.
  • SuperGrok ($30/mo): Higher rate limits, priority access to Grok 4, voice mode.
  • SuperGrok Heavy ($300/mo): Power-user tier with the largest context and deepest reasoning mode.

Worth considering if you already live on X or need live social data inside your chat sessions. Otherwise Claude Pro or ChatGPT Plus cover the same ground more cleanly.

How to Choose

Match the plan to the shape of your usage, not the brand:

  • Daily coding with an agent: Claude Max. Cheaper than the API once you pass ~$100/mo of equivalent usage. See My AI Dev Stack.
  • Weekend experimentation: Pay-as-you-go API with a spending cap. Skip subscriptions entirely.
  • Production app, low volume: Haiku 4.5 or GPT-5 mini. Both handle the vast majority of real tasks.
  • Production app, high volume: DeepSeek V3 for batch, Gemini Flash for long context, Claude Haiku for quality-sensitive paths. Route per task.
  • Regulated enterprise: Azure OpenAI or Claude Enterprise via AWS Bedrock / GCP Vertex.
  • Long documents or codebases: Gemini 2.5 Pro (2M context) or Claude with prompt caching.
  • Voice or realtime: OpenAI Realtime API.
  • Maximum cost pressure: DeepSeek hosted, or self-host Mistral / Llama on your own GPUs - see GPU Servers vs API Credits.

Multi-Provider Strategies

Most serious deployments now route across providers. A typical stack:

  • A cheap default model (Haiku, GPT-5 mini, or Gemini Flash) for 90% of requests.
  • A frontier model (Opus, GPT-5, Gemini Pro) for hard cases or user-visible quality.
  • A local or DeepSeek fallback for batch and non-critical paths.

What this requires in practice:

  • An abstraction layer - OpenRouter, LiteLLM, or a thin internal wrapper.
  • Eval harnesses that run the same prompts across providers and score output quality.
  • Per-provider observability on latency, error rate, and cost.
  • Contingency for provider outages and model deprecations.

Is Cloud Still Worth It vs Building Local?

The maths has genuinely shifted. Open-weights models like Llama 3.3, Qwen 2.5, and DeepSeek V3 now run on prosumer hardware with quality that beats GPT-4 from two years ago. A Mac Studio M3 Ultra with 512GB unified memory can run 70B-parameter models at useful speeds, and a DGX Spark or dual Mac Studio setup pushes into frontier territory.

Rough break-even maths:

  • A $10,000 Mac Studio amortised over 3 years is ~$280/mo. Add electricity and it’s ~$300/mo.
  • That buys you unlimited tokens on models in the Llama 3.3 70B / Qwen 72B class.
  • Equivalent Claude Sonnet 4.6 usage at $3/$15 per 1M tokens costs $300/mo at around 15 - 20M tokens/mo - roughly one heavy Claude Code user.

When local wins:

  • You’re a single heavy user hitting API bills above $200 - $300/mo.
  • Privacy or data residency rules out cloud entirely.
  • You want to tinker, fine-tune, or run agents overnight without a meter running.
  • Latency matters and your workload can use smaller models well.

When cloud still wins:

  • You need frontier capability (Opus 4.7, GPT-5, Gemini 2.5 Pro). No local hardware matches these yet.
  • Your usage is bursty - a Mac Studio sitting idle most of the day is wasted capex.
  • You need multimodal (video, voice, image generation) at production quality.
  • You’re a team - sharing one Mac Studio across five developers reintroduces all the queueing problems a cloud API solves.

The pragmatic answer for most people: keep a $20/mo Claude Pro or ChatGPT Plus subscription for frontier work, and run a local model for the 80% of tasks (summarisation, classification, basic codegen, drafts) where a 70B open model is already good enough. This hybrid setup is cheaper than either extreme and removes the “am I wasting tokens” tax on casual experimentation.

See Local vs Cloud AI in 2026 and GPU Servers vs API Credits for deeper analysis.

What to Watch in 2026

Pricing keeps falling at the low end while the frontier stays expensive. Context windows are growing faster than most apps can use them. Consumer subscriptions are absorbing agent capabilities (Claude Code, ChatGPT agents, Gemini in Workspace), which is changing the API-vs-subscription calculation for individual developers. Re-evaluate your stack every quarter - last year’s best choice is rarely this year’s.

Resources