AI Model Intelligence

Pricing · 2026-06-29

LLM Pricing

Per-token cost across major providers, normalised to USD per 1M tokens.

About this list

Almost every commercial LLM provider charges separately for input tokens (the prompt you send) and output tokens (the response you get back). Output tokens typically cost 3–5× more than input tokens because generation is autoregressive: each output token depends on the ones before it, so the response is produced one token at a time, whereas the whole prompt can be processed in parallel.

The table below normalises all prices to USD per 1 million tokens. This is the industry-standard unit; per-1K or per-token rates are easy to misread by 1000×. The "Total" column sums input + output for a quick apples-to-apples comparison, but your real cost depends heavily on your input/output ratio.
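To make the ratio point concrete, here is a minimal Python sketch (the function names are illustrative, not part of any provider's API). It converts per-1K or per-token quotes into the per-1M unit used on this page, then computes a blended rate for a given share of input tokens. It assumes cost is purely linear in tokens and ignores caching, long-context tiers and discounts.

```python
def to_per_million(rate: float, unit_tokens: int) -> float:
    """Convert a price quoted per `unit_tokens` tokens into USD per 1M tokens."""
    return rate * (1_000_000 / unit_tokens)


def blended_cost_per_million(input_price: float, output_price: float,
                             input_share: float) -> float:
    """Effective USD per 1M tokens when `input_share` (0..1) of all tokens
    are input and the remainder are output. Prices are USD per 1M tokens."""
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    return input_price * input_share + output_price * (1.0 - input_share)


if __name__ == "__main__":
    # A $0.00015-per-1K quote is the same as $0.15 per 1M.
    print(f"{to_per_million(0.00015, 1_000):.2f}")

    # Phi-4 ($0.060 / $0.140) and Ministral 8B ($0.100 / $0.100) share a
    # $0.200 "Total" in the table, but diverge once the mix is skewed.
    for share in (0.9, 0.1):
        phi4 = blended_cost_per_million(0.060, 0.140, share)
        ministral = blended_cost_per_million(0.100, 0.100, share)
        print(f"input share {share:.0%}: Phi-4 ${phi4:.3f}, Ministral 8B ${ministral:.3f}")
```

On a prompt-heavy workload (90% input) Phi-4 works out to $0.068 per 1M tokens against $0.100 for Ministral 8B (latest); on an output-heavy one (10% input) the order flips ($0.132 vs $0.100), even though both rows show the same Total.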

What the table does NOT show

  • Prompt caching discounts: Anthropic and OpenAI offer cache_read rates 50–90% below the standard input price. If you reuse a long system prompt, cache pricing dominates total cost.
  • Tiered pricing above 200K tokens: Google and Anthropic charge a premium for very long inputs. Each model detail page shows the >200K tier when applicable.
  • Reasoning tokens: thinking models (o-series, Claude Extended Thinking, DeepSeek R1) bill internal reasoning tokens at output rates, and these often run to 2–3× the length of the visible answer.
  • Volume discounts and prepaid credits: most providers offer 10–30% off at scale. These are not reflected here.
  • Audio / image surcharges: multimodal inputs have separate per-image or per-second rates.
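The caching and reasoning-token effects above can be folded into a per-request estimate. This is a minimal Python sketch under stated assumptions: cache reads are billed at one flat cache_read rate, cache-write surcharges and >200K tiers are ignored, and reasoning tokens are billed at the output rate. The $3.00 / $15.00 / $0.30 prices in the example are hypothetical, not taken from the table.

```python
from typing import Optional


def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price: float,                        # USD per 1M standard input tokens
    output_price: float,                       # USD per 1M output tokens
    cached_fraction: float = 0.0,              # share of input tokens read from cache
    cache_read_price: Optional[float] = None,  # USD per 1M cached input tokens
    reasoning_tokens: int = 0,                 # hidden thinking tokens
) -> float:
    """Estimated USD cost of one request (assumption: no cache-write surcharge)."""
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    if cache_read_price is None:
        cache_read_price = input_price  # no caching discount available
    cached = input_tokens * cached_fraction
    uncached = input_tokens - cached
    # Reasoning tokens never appear in the answer but are billed at the output rate.
    billed_output = output_tokens + reasoning_tokens
    total = (uncached * input_price
             + cached * cache_read_price
             + billed_output * output_price)
    return total / 1_000_000


if __name__ == "__main__":
    # 10K-token reused system prompt, 500-token visible answer.
    base = request_cost_usd(10_000, 500, 3.00, 15.00)
    cached = request_cost_usd(10_000, 500, 3.00, 15.00,
                              cached_fraction=0.9, cache_read_price=0.30)
    thinking = request_cost_usd(10_000, 500, 3.00, 15.00,
                                cached_fraction=0.9, cache_read_price=0.30,
                                reasoning_tokens=1_000)
    print(f"no cache: ${base:.4f}  90% cached: ${cached:.4f}  + reasoning: ${thinking:.4f}")
```

With these assumed prices, caching 90% of the prompt cuts the request from $0.0375 to $0.0132, while 1,000 hidden reasoning tokens more than double the cached figure to $0.0282.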

How to use this page

  1. Find the cheapest model that meets your context window and capability requirements (tool calling, structured output, vision).
  2. Open its detail page to check cache pricing, output limits and provider availability.
  3. Use the pricing calculator to estimate monthly cost for your actual token volume.
  4. Compare 2–3 finalists side-by-side on a comparison page.
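Steps 1 and 3 can be approximated in a few lines of Python. This sketch uses six rows copied from the table below and a hypothetical workload of 50M input and 10M output tokens per month. It reads "128K" as 128,000 tokens (an assumption) and checks only the context window, not tool calling, structured output or vision, which the table does not list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PricedModel:
    name: str
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens
    context: int         # context window in tokens

    def monthly_cost(self, input_tokens: int, output_tokens: int) -> float:
        """USD for one month of traffic at standard (uncached) rates."""
        return (input_tokens * self.input_per_m
                + output_tokens * self.output_per_m) / 1_000_000


# Rows taken from the table; "128K" is read as 128,000 tokens.
MODELS = [
    PricedModel("Mistral Nemo", 0.020, 0.040, 128_000),
    PricedModel("Gemma 3n 4B", 0.020, 0.040, 33_000),
    PricedModel("Meta-Llama-3-8B-Instruct", 0.030, 0.040, 8_000),
    PricedModel("gpt-oss-120b", 0.030, 0.150, 128_000),
    PricedModel("Qwen3 235B A22B 2507", 0.071, 0.100, 262_000),
    PricedModel("Phi-4", 0.060, 0.140, 128_000),
]


def shortlist(models, min_context, input_tokens, output_tokens, top_n=3):
    """Cheapest `top_n` models whose context window is at least `min_context`."""
    eligible = [m for m in models if m.context >= min_context]
    eligible.sort(key=lambda m: m.monthly_cost(input_tokens, output_tokens))
    return [(m.name, m.monthly_cost(input_tokens, output_tokens))
            for m in eligible[:top_n]]


if __name__ == "__main__":
    for name, cost in shortlist(MODELS, 100_000, 50_000_000, 10_000_000):
        print(f"{name}: ${cost:.2f}/month")
```

With a 100K-token requirement this ranks Mistral Nemo ($1.40), gpt-oss-120b ($3.00) and Phi-4 ($4.40). Gemma 3n 4B costs the same as Mistral Nemo but is filtered out by its 33K window, which is why step 1 checks context before price.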

Prices are refreshed daily. Models showing "Unknown" do not publish a public per-token rate — this usually means enterprise-only or invite-gated access. We deliberately do not show them as $0 to avoid misleading rankings.
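A minimal Python sketch of that ranking rule, assuming a missing price is stored as None rather than 0. The name "enterprise-model" is a hypothetical placeholder; the two priced rows use Total values from the table.

```python
from typing import Optional


def rank_by_price(rows: list[tuple[str, Optional[float]]]) -> list[str]:
    """Model names cheapest-first; an Unknown (None) price always sorts last."""
    def key(row: tuple[str, Optional[float]]) -> tuple[bool, float, str]:
        name, price = row
        # False < True, so every published price precedes every Unknown one.
        return (price is None, price if price is not None else 0.0, name)

    return [name for name, _ in sorted(rows, key=key)]


rows = [
    ("enterprise-model", None),  # hypothetical: no public per-token rate
    ("Phi-4", 0.200),
    ("Mistral Nemo", 0.060),
]

print(rank_by_price(rows))
# ['Mistral Nemo', 'Phi-4', 'enterprise-model']

# Coercing Unknown to $0 would instead put the unpriced model first:
print(sorted(rows, key=lambda r: r[1] or 0.0)[0][0])
# enterprise-model
```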

# | Model | Vendor | Input / 1M | Output / 1M | Total / 1M | Context (tokens)
--- | --- | --- | --- | --- | --- | ---
1 | BGE Reranker Base | cloudflare-ai-gateway | $0.003 | Unknown | $0.003 | 128K
2 | Voxtral Small 24B 2507 | Mistral | $0.002 | $0.002 | $0.005 | 32K
3 | All-MiniLM-L6-v2 | digitalocean | $0.009 | Unknown | $0.009 | 256
4 | Multi-QA-mpnet-base-dot-v1 | digitalocean | $0.009 | Unknown | $0.009 | 512
5 | Qwen3 Embedding 8B | Alibaba (Qwen) | $0.010 | Unknown | $0.010 | 33K
6 | Qwen3 Embedding 0.6B | Alibaba (Qwen) | $0.010 | Unknown | $0.010 | 33K
7 | Qwen3 Embedding 4B | Alibaba (Qwen) | $0.010 | Unknown | $0.010 | 33K
8 | BGE Reranker v2 M3 | digitalocean | $0.010 | Unknown | $0.010 | 8K
9 | BGE M3 | cloudflare-ai-gateway | $0.012 | Unknown | $0.012 | 128K
10 | PLaMo Embedding 1B | cloudflare-ai-gateway | $0.019 | Unknown | $0.019 | 128K
11 | Llama 3.2 1B Instruct | Meta | $0.010 | $0.010 | $0.020 | 60K
12 | text-embedding-3-small | OpenAI | $0.020 | Unknown | $0.020 | 8K
13 | llama-3.1-nemotron-safety-guard-8b-v3 | NVIDIA | $0.010 | $0.010 | $0.020 | 128K
14 | Prompt Guard 2 86M | Meta | $0.010 | $0.010 | $0.020 | 512
15 | Llama Prompt Guard 2 22M | Meta | $0.010 | $0.010 | $0.020 | 512
16 | E5 Large v2 | digitalocean | $0.020 | Unknown | $0.020 | 512
17 | BGE M3 | digitalocean | $0.020 | Unknown | $0.020 | 8K
18 | BGE Small EN v1.5 | cloudflare-ai-gateway | $0.020 | Unknown | $0.020 | 128K
19 | text-embedding-3-small | azure | $0.020 | Unknown | $0.020 | 8K
20 | text-embedding-3-small | azure-cognitive-services | $0.020 | Unknown | $0.020 | 8K
21 | DistilBERT SST-2 INT8 | cloudflare-ai-gateway | $0.026 | Unknown | $0.026 | 128K
22 | Llama 3.2 3B Instruct | Meta | $0.020 | $0.020 | $0.040 | 80K
23 | PaddleOCR-VL | novita-ai | $0.020 | $0.020 | $0.040 | 16K
24 | Ling-2.6-flash | openrouter | $0.010 | $0.030 | $0.040 | 262K
25 | Meta-Llama-3.1-8B-Instruct | Meta | $0.020 | $0.030 | $0.050 | 128K
26 | Nomic Embed Text v1.5 | tinfoil | $0.050 | Unknown | $0.050 | 8K
27 | Meta Llama 3.1 8B Instruct Turbo | Meta | $0.020 | $0.030 | $0.050 | 128K
28 | Mistral Nemo | Mistral | $0.020 | $0.040 | $0.060 | 128K
29 | Gemma 3n 4B | Google | $0.020 | $0.040 | $0.060 | 33K
30 | BGE Base EN v1.5 | cloudflare-ai-gateway | $0.067 | Unknown | $0.067 | 128K
31 | Meta-Llama-3-8B-Instruct | Meta | $0.030 | $0.040 | $0.070 | 8K
32 | Llama Guard 3 8B | Meta | $0.020 | $0.060 | $0.080 | 131K
33 | Ministral 3B (latest) | Mistral | $0.040 | $0.040 | $0.080 | 128K
34 | Ministral 3B | azure | $0.040 | $0.040 | $0.080 | 128K
35 | Ministral 3B | azure-cognitive-services | $0.040 | $0.040 | $0.080 | 128K
36 | Llama 3 8B Lunaris | Meta | $0.040 | $0.050 | $0.090 | 8K
37 | GTE Large (v1.5) | digitalocean | $0.090 | Unknown | $0.090 | 8K
38 | Llama-3.2-11B-Vision-Instruct | Meta | $0.049 | $0.049 | $0.098 | 128K
39 | Mistral Embed | Mistral | $0.100 | Unknown | $0.100 | 8K
40 | text-embedding-ada-002 | OpenAI | $0.100 | Unknown | $0.100 | 8K
41 | L3 8B Stheno V3.2 | novita-ai | $0.050 | $0.050 | $0.100 | 8K
42 | Sao10k L3 8B Lunaris | novita-ai | $0.050 | $0.050 | $0.100 | 8K
43 | text-embedding-ada-002 | azure | $0.100 | Unknown | $0.100 | 8K
44 | text-embedding-ada-002 | azure-cognitive-services | $0.100 | Unknown | $0.100 | 8K
45 | Gemma 3 4B IT | Google | $0.040 | $0.080 | $0.120 | 128K
46 | MythoMax 13B | kilo | $0.060 | $0.060 | $0.120 | 4K
47 | MythoMax 13B | openrouter | $0.060 | $0.060 | $0.120 | 4K
48 | Sarvam 30B | fastrouter | $0.020 | $0.100 | $0.120 | 128K
49 | IBM Granite 4.0 H Micro | cloudflare-ai-gateway | $0.017 | $0.110 | $0.127 | 128K
50 | IBM: Granite 4.0 Micro | kilo | $0.017 | $0.110 | $0.127 | 131K
51 | Granite 4.0 H Micro | cloudflare-workers-ai | $0.017 | $0.112 | $0.129 | 131K
52 | Granite 4.0 Micro | openrouter | $0.017 | $0.112 | $0.129 | 131K
53 | text-embedding-3-large | OpenAI | $0.130 | Unknown | $0.130 | 8K
54 | Llama 3.1 8B | Meta | $0.050 | $0.080 | $0.130 | 131K
55 | text-embedding-3-large | azure | $0.130 | Unknown | $0.130 | 8K
56 | text-embedding-3-large | azure-cognitive-services | $0.130 | Unknown | $0.130 | 8K
57 | Sarvam 30B | nano-gpt | $0.028 | $0.111 | $0.139 | 66K
58 | Google Gemma 3 27B Instruct | Google | $0.030 | $0.110 | $0.140 | 203K
59 | baichuan-m2-32b | novita-ai | $0.070 | $0.070 | $0.140 | 131K
60 | Model Router | azure | $0.140 | Unknown | $0.140 | 128K
61 | Model Router | azure-cognitive-services | $0.140 | Unknown | $0.140 | 128K
62 | Google Gemma 3 12B | Google | $0.050 | $0.100 | $0.150 | 131K
63 | Gemini Embedding 001 | Google | $0.150 | Unknown | $0.150 | 2K
64 | LiquidAI: LFM2-24B-A2B | kilo | $0.030 | $0.120 | $0.150 | 33K
65 | LFM2-24B-A2B | togetherai | $0.030 | $0.120 | $0.150 | 33K
66 | LFM2-24B-A2B | openrouter | $0.030 | $0.120 | $0.150 | 33K
67 | LFM2 24B A2B | nano-gpt | $0.030 | $0.120 | $0.150 | 33K
68 | IBM: Granite 4.1 8B | kilo | $0.050 | $0.100 | $0.150 | 131K
69 | Granite 4.1 8B | openrouter | $0.050 | $0.100 | $0.150 | 131K
70 | Granite 4.1 8B | nano-gpt | $0.050 | $0.100 | $0.150 | 131K
71 | DeepSeek R1 Distill Llama 70B | Meta | $0.030 | $0.130 | $0.160 | 33K
72 | gpt-oss-20b | OpenAI | $0.029 | $0.140 | $0.169 | 128K
73 | R1 Distill Llama 70B | DeepSeek | $0.030 | $0.140 | $0.170 | 8K
74 | Qwen3 235B A22B 2507 | Alibaba (Qwen) | $0.071 | $0.100 | $0.171 | 262K
75 | Nova Micro | vercel | $0.035 | $0.140 | $0.175 | 128K
76 | Amazon: Nova Micro 1.0 | kilo | $0.035 | $0.140 | $0.175 | 128K
77 | Nova Micro | amazon-bedrock | $0.035 | $0.140 | $0.175 | 128K
78 | Nova Micro 1.0 | openrouter | $0.035 | $0.140 | $0.175 | 128K
79 | Amazon Nova Micro 1.0 | nano-gpt | $0.036 | $0.139 | $0.175 | 128K
80 | gpt-oss-120b | OpenAI | $0.030 | $0.150 | $0.180 | 128K
81 | Mythomax L2 13B | novita-ai | $0.090 | $0.090 | $0.180 | 4K
82 | Phi 4 Multimodal | nano-gpt | $0.070 | $0.110 | $0.180 | 128K
83 | Manta Mini 1.0 | nano-gpt | $0.020 | $0.160 | $0.180 | 8K
84 | Manta Flash 1.0 | nano-gpt | $0.020 | $0.160 | $0.180 | 16K
85 | Command R7B | Cohere | $0.037 | $0.150 | $0.188 | 128K
86 | Command R7B Arabic | Cohere | $0.037 | $0.150 | $0.188 | 128K
87 | Qwen3.5 9B | Alibaba (Qwen) | $0.040 | $0.150 | $0.190 | 262K
88 | GPT OSS 20B | llmgateway | $0.040 | $0.150 | $0.190 | 131K
89 | Trinity Mini | vercel | $0.045 | $0.150 | $0.195 | 131K
90 | Arcee AI: Trinity Mini | kilo | $0.045 | $0.150 | $0.195 | 131K
91 | Trinity Mini | openrouter | $0.045 | $0.150 | $0.195 | 131K
92 | Trinity Mini | nano-gpt | $0.045 | $0.150 | $0.195 | 131K
93 | Trinity Mini | clarifai | $0.045 | $0.150 | $0.195 | 131K
94 | Qwen3 235B A22B Instruct 2507 | Alibaba (Qwen) | $0.100 | $0.100 | $0.200 | 262K
95 | Qwen3-235B-A22B-Thinking-2507 | Alibaba (Qwen) | $0.100 | $0.100 | $0.200 | 262K
96 | nvidia-nemotron-nano-9b-v2 | NVIDIA | $0.040 | $0.160 | $0.200 | 131K
97 | Phi-4 | Microsoft | $0.060 | $0.140 | $0.200 | 128K
98 | Ministral 3 3B 2512 | Mistral | $0.100 | $0.100 | $0.200 | 131K
99 | Llama 3.2 1B Instruct | Meta | $0.100 | $0.100 | $0.200 | 128K
100 | Ministral 8B (latest) | Mistral | $0.100 | $0.100 | $0.200 | 128K

Showing top 100 of 977. Use the full directory to see the rest.

Frequently asked questions

Why are output tokens more expensive than input tokens?

Generation is autoregressive — each output token requires a full forward pass conditioned on all previous tokens, which cannot be batched as efficiently as reading input tokens in parallel. The 3–5× premium most providers charge reflects this compute asymmetry.

What does 'per 1M tokens' mean in practice?

One million tokens is roughly 750,000 English words or 3,000 pages of standard text. A typical chatbot request uses 1,000–5,000 input tokens and 200–1,000 output tokens, so 1M combined tokens covers roughly 170 to 830 such requests, depending on your workload.
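The arithmetic behind that estimate, as a short Python sketch. It takes the request sizes quoted above and, purely for illustration, Mistral Nemo's listed $0.020 input / $0.040 output rates; any row from the table can be swapped in.

```python
def requests_per_million(input_tokens: int, output_tokens: int) -> int:
    """Whole requests of this size that fit into 1M combined tokens."""
    return 1_000_000 // (input_tokens + output_tokens)


def cost_per_thousand_requests(input_tokens: int, output_tokens: int,
                               input_price: float, output_price: float) -> float:
    """USD for 1,000 requests, with prices given in USD per 1M tokens."""
    per_request = (input_tokens * input_price
                   + output_tokens * output_price) / 1_000_000
    return per_request * 1_000


for inp, out in [(1_000, 200), (5_000, 1_000)]:
    n = requests_per_million(inp, out)
    usd = cost_per_thousand_requests(inp, out, 0.020, 0.040)
    print(f"{inp}+{out} tokens: ~{n} requests per 1M tokens, ${usd:.3f} per 1,000 requests")
```

The small request (1,000 + 200 tokens) fits about 833 times into 1M tokens and costs $0.028 per 1,000 requests at these rates; the large one (5,000 + 1,000) fits about 166 times and costs $0.140 per 1,000.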

Why are some models showing 'Unknown' instead of a price?

We deliberately do not coerce missing data to $0. 'Unknown' means the provider does not publish a public per-token rate — often models behind enterprise sales or invite-only access. Treating Unknown as free would push paid-but-unpriced models to the top of every cheap list.

How often do these prices change?

Vendor list-price moves are typically picked up within hours of an announcement, and our pipeline re-syncs daily. Each change is written to /changelog so you can audit historical pricing over time.

Does this include prompt caching or batch discounts?

No. The table shows standard headline rates only. Prompt caching (Anthropic, OpenAI) can reduce input cost by 50–90%. Batch API discounts (OpenAI) offer ~50% off for non-real-time workloads. Both are shown on each model's detail page.

Last updated:

Prices in USD per 1M tokens. Unknown means the provider does not publish per-token pricing.

Pricing and capabilities are refreshed daily and reconciled against each provider's official documentation. Always verify critical production decisions with the provider directly.