AI API Pricing Comparison: Cost Per Million Tokens

Updated 2026-03-10

Data Notice: Figures, rates, and statistics cited in this article are based on the most recent available data at time of writing and may reflect projections or prior-year figures. Always verify current numbers with official sources before making financial, medical, or educational decisions.

AI API pricing changes frequently as providers compete on cost and capability. This page compares current pricing across all major providers in a single, easy-to-reference table. Bookmark it and check back regularly for updates.

AI model comparisons are based on publicly available benchmarks and editorial testing. Results may vary by use case.

Complete Pricing Table (March 2026)

Premium Tier Models

| Model | Provider | Input (per 1M tokens) | Output (per 1M tokens) | Context Window | Notes |
|---|---|---|---|---|---|
| Claude Opus 4 | Anthropic | $15.00 | $75.00 | 200K | Strongest reasoning |
| o3 | OpenAI | $10.00 | $40.00 | 200K | + thinking token costs |
| Gemini Ultra | Google | $7.00 | $21.00 | 1M+ | Largest context window |

Mid-Tier Models (Best Value)

| Model | Provider | Input (per 1M tokens) | Output (per 1M tokens) | Context Window | Notes |
|---|---|---|---|---|---|
| Claude Sonnet 4 | Anthropic | $3.00 | $15.00 | 200K | Best quality/cost ratio |
| GPT-4o | OpenAI | $2.50 | $10.00 | 128K | Strong generalist |
| Mistral Large | Mistral | $2.00 | $6.00 | 128K | Good multilingual |
| Gemini Pro | Google | $1.25 | $5.00 | 1M+ | Great value with large context |

Budget Tier Models

| Model | Provider | Input (per 1M tokens) | Output (per 1M tokens) | Context Window | Notes |
|---|---|---|---|---|---|
| o3-mini | OpenAI | $1.10 | $4.40 | 200K | Budget reasoning |
| Claude Haiku 4 | Anthropic | $0.25 | $1.25 | 200K | Fast, very cheap |
| GPT-4o mini | OpenAI | $0.15 | $0.60 | 128K | Budget general purpose |
| Gemini Flash | Google | $0.075 | $0.30 | 1M+ | Cheapest capable model |

All prices as of March 2026. Check provider websites for the latest pricing.

Cost Per Common Task

| Task | Approximate Tokens | Opus 4 | Sonnet 4 | GPT-4o | Haiku 4 | Flash |
|---|---|---|---|---|---|---|
| Single question + answer | 500 in / 300 out | $0.03 | $0.006 | $0.004 | $0.0005 | $0.0001 |
| Blog post generation | 500 in / 1,500 out | $0.12 | $0.024 | $0.016 | $0.002 | $0.0005 |
| Document summary (20 pages) | 15K in / 500 out | $0.26 | $0.053 | $0.043 | $0.004 | $0.001 |
| Code review (large file) | 5K in / 2K out | $0.23 | $0.045 | $0.033 | $0.004 | $0.001 |
| Full book analysis | 150K in / 2K out | $2.40 | $0.48 | N/A* | $0.04 | $0.01 |

*GPT-4o’s 128K context window means it cannot process a full book in one pass.

Cost Reduction Features

Prompt Caching (Anthropic)

Anthropic offers prompt caching that reduces the cost of repeated context (system prompts, reference documents) by up to 90%. This is significant for applications that reuse the same context across many queries.

  • Cache write: 1.25x base input cost
  • Cache read: 0.1x base input cost (90% discount)
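
A rough sketch of the break-even arithmetic, assuming the 1.25x write / 0.1x read multipliers apply to the entire cached prefix (the token counts and query volume below are made-up example values):

```python
def cost_without_cache(context_tokens, question_tokens, rate_per_m, n_queries):
    """Every query re-sends the full context at the base input rate."""
    return (context_tokens + question_tokens) * rate_per_m * n_queries / 1_000_000

def cost_with_cache(context_tokens, question_tokens, rate_per_m, n_queries):
    """Write the context to the cache once (1.25x base rate), then read it
    at 0.1x on the remaining queries; questions are billed at the base rate."""
    write = context_tokens * rate_per_m * 1.25
    reads = context_tokens * rate_per_m * 0.10 * (n_queries - 1)
    questions = question_tokens * rate_per_m * n_queries
    return (write + reads + questions) / 1_000_000

# 50K-token reference document, 200-token questions, Claude Sonnet 4 input
# rate ($3.00/1M), 100 queries:
plain = cost_without_cache(50_000, 200, 3.00, 100)   # $15.06
cached = cost_with_cache(50_000, 200, 3.00, 100)     # $1.7325
```

In this example caching cuts the input bill by roughly 88%, close to the advertised ceiling because the reused context dwarfs the per-query question.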

Batch Processing

Several providers offer discounts for non-time-sensitive batch processing:

| Provider | Batch Discount | Turnaround |
|---|---|---|
| Anthropic | 50% off | Within 24 hours |
| OpenAI | 50% off | Within 24 hours |
| Google | Varies | Varies |
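
The batch discount is a flat multiplier on the standard bill, so the savings scale linearly with volume. A minimal sketch, using the ~$0.004 document-summary cost for Claude Haiku 4 from the task table above as the example unit price:

```python
def batched_cost(standard_cost: float, discount: float = 0.50) -> float:
    """Apply a flat batch-processing discount (50% for Anthropic and OpenAI)."""
    return standard_cost * (1.0 - discount)

# Summarizing 10,000 documents on Claude Haiku 4 at ~$0.004 each:
standard = 10_000 * 0.004        # $40.00 at on-demand rates
batch = batched_cost(standard)   # $20.00, delivered within the batch window
```

The trade is purely latency for price: if results can wait up to 24 hours, half the bill disappears.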

Volume Discounts

Enterprise agreements with committed usage can reduce pricing further. Contact providers directly for volume pricing.

AI API pricing has fallen dramatically:

| Year | Cost for GPT-4-class 1M Output Tokens | Reduction |
|---|---|---|
| 2023 | ~$60.00 | Baseline |
| 2024 | ~$15.00 | 75% reduction |
| 2025 | ~$10.00 | 83% reduction |
| 2026 | ~$5.00-$10.00 | 83-92% reduction |

Expect continued price reductions as inference efficiency improves and competition intensifies.

Choosing the Right Tier

Use premium models when:

  • Complex reasoning, analysis, or coding is required
  • Accuracy on difficult problems justifies the cost
  • The cost of errors exceeds the cost of using a better model

Use mid-tier models when:

  • You need good quality for general tasks
  • You want the best quality-to-cost ratio
  • Volume is moderate (hundreds to thousands of queries/day)

Use budget models when:

  • Tasks are simple (classification, extraction, routing)
  • Volume is very high (millions of queries/day)
  • Speed matters more than maximum quality
  • You are building a first pass before human review
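
The guidance above amounts to a simple decision rule. The sketch below encodes it with illustrative thresholds and labels that are assumptions of this article, not provider recommendations:

```python
def pick_tier(complexity: str, daily_volume: int) -> str:
    """Route a workload to a pricing tier per the criteria above.
    `complexity` is "complex" or "simple"; thresholds are illustrative."""
    if complexity == "complex":          # reasoning, analysis, hard coding
        return "premium"                 # e.g. Claude Opus 4, o3
    if daily_volume >= 1_000_000:        # simple tasks at very high volume
        return "budget"                  # e.g. Claude Haiku 4, Gemini Flash
    return "mid-tier"                    # e.g. Claude Sonnet 4, GPT-4o
```

In practice many teams route per-request rather than per-application, sending easy queries to a budget model and escalating only the hard ones.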


Key Takeaways

  • AI API pricing spans a 200x range from Gemini Flash ($0.075/1M input) to Claude Opus 4 ($15/1M input).
  • Mid-tier models (Claude Sonnet 4, GPT-4o) offer the best quality-to-cost ratio for most applications.
  • Prompt caching and batch processing can reduce costs by 50-90% for eligible workloads.
  • Prices have dropped 83-92% since 2023 and continue falling.
  • Output tokens are consistently more expensive than input tokens (2-5x across the models above), so controlling output length saves money.

This content is for informational purposes only and reflects independently researched comparisons. AI model capabilities change frequently — verify current specs with providers. Not professional advice.