Pricing · 2026-06-29

LLM Pricing

Token cost across the major providers, normalised to USD per 1M tokens.

Cheapest LLM APIs

AI APIs ranked by input + output token cost.

OpenAI API Pricing

All OpenAI model prices in one table — GPT-5, GPT-5 Mini, embeddings and more.

Anthropic Claude Pricing

All Anthropic Claude prices — Opus, Sonnet, Haiku and prompt caching costs.

About this list

Almost every commercial LLM provider charges separately for input tokens (the prompt you send) and output tokens (the response you get back). Output tokens typically cost 3–5× as much as input tokens because generation is autoregressive: each new token depends on all the tokens before it, so output cannot be processed in parallel as efficiently as input.

The table below normalises all prices to USD per 1 million tokens. This is the industry-standard unit; per-1K or per-token rates are easy to misread by 1000×. The "Total" column sums input + output for a quick apples-to-apples comparison, but your real cost depends heavily on your input/output ratio.
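
To see why the input/output ratio matters, here is a minimal sketch that prices one request under two models that share the same "Total" in the table (Ling-2.6-flash and Llama 3.2 3B Instruct, both $0.040). The function name `request_cost_usd` and the two token mixes are illustrative assumptions, not part of this site's calculator.

```python
def request_cost_usd(input_tokens, output_tokens, input_per_m, output_per_m):
    """Cost in USD of one request, given prices in USD per 1M tokens."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


# Prices (USD per 1M tokens) copied from the table; both rows show Total = $0.040.
LING_FLASH = {"input_per_m": 0.010, "output_per_m": 0.030}
LLAMA_3B = {"input_per_m": 0.020, "output_per_m": 0.020}

# Two assumed workloads: (input_tokens, output_tokens).
WORKLOADS = {"input-heavy": (10_000, 500), "output-heavy": (500, 10_000)}

for label, (tokens_in, tokens_out) in WORKLOADS.items():
    ling = request_cost_usd(tokens_in, tokens_out, **LING_FLASH)
    llama = request_cost_usd(tokens_in, tokens_out, **LLAMA_3B)
    cheaper = "Ling-2.6-flash" if ling < llama else "Llama 3.2 3B Instruct"
    print(f"{label}: Ling ${ling:.6f} vs Llama ${llama:.6f} -> {cheaper}")
```

Under these assumed mixes the cheaper model flips between the two workloads, even though the Total column is identical for both.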

What the table does NOT show

Prompt caching discounts: Anthropic and OpenAI offer cache_read rates well below standard input, roughly 2–10× cheaper (50–90% off). If you reuse a long system prompt, the cached rate rather than the headline input rate drives most of your input cost.
Tiered pricing above 200K tokens: Google and Anthropic charge a premium for very long inputs. Each model detail page shows the >200K tier when applicable.
Reasoning tokens: thinking models (o-series, Claude Extended Thinking, DeepSeek R1) bill their internal reasoning at output rates, and the reasoning often runs 2–3× the length of the visible answer.
Volume discounts and prepaid credits: most providers offer 10–30% off at scale. These are not reflected here.
Audio / image surcharges: multimodal inputs have separate per-image or per-second rates.
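
As a rough illustration of how two of these omissions change a per-request estimate, the sketch below adds a cache-read rate and reasoning tokens to the basic formula. The rates are hypothetical and not taken from any provider's price list, the function name is invented for this example, and cache-write surcharges, long-context tiers and multimodal fees are ignored.

```python
def cost_with_cache_and_reasoning(
    fresh_input, cached_input, visible_output, reasoning_tokens,
    input_per_m, cache_read_per_m, output_per_m,
):
    """USD cost of one request; all rates are USD per 1M tokens.

    Assumes cached tokens bill at the cache-read rate and reasoning tokens
    bill at the output rate. Cache-write surcharges are ignored.
    """
    input_cost = fresh_input * input_per_m + cached_input * cache_read_per_m
    output_cost = (visible_output + reasoning_tokens) * output_per_m
    return (input_cost + output_cost) / 1_000_000


# Hypothetical rates, NOT taken from any provider's price list.
RATES = {"input_per_m": 3.00, "cache_read_per_m": 0.30, "output_per_m": 15.00}

# Headline estimate: 8,500 input tokens, 400 output tokens, nothing cached.
headline = cost_with_cache_and_reasoning(8_500, 0, 400, 0, **RATES)

# Same request with an 8,000-token system prompt served from cache and
# 1,000 reasoning tokens (2.5x the visible answer) billed as output.
adjusted = cost_with_cache_and_reasoning(500, 8_000, 400, 1_000, **RATES)

print(f"headline ${headline:.4f}, adjusted ${adjusted:.4f}")
```

With these made-up numbers, caching lowers the input share while reasoning tokens raise the output share, so the two effects can partly offset each other.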

How to use this page

Find the cheapest model that meets your context window and capability requirements (tool calling, structured output, vision).
Open its detail page to check cache pricing, output limits and provider availability.
Use the pricing calculator to estimate monthly cost for your actual token volume.
Compare 2–3 finalists side-by-side on a comparison page.
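
The steps above can be sketched in a few lines, assuming you have copied the relevant rows into your own data structure. The `Model` class, the `shortlist` helper and the workload figures are hypothetical. Capability filters (tool calling, structured output, vision) are left out because the table does not list them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Model:
    name: str
    input_per_m: Optional[float]   # USD per 1M input tokens, None if Unknown
    output_per_m: Optional[float]  # USD per 1M output tokens, None if Unknown
    context: int                   # context window in tokens


# A few rows from the table; "128K" is approximated as 128_000, and so on.
MODELS = [
    Model("Llama 3.2 1B Instruct", 0.010, 0.010, 60_000),
    Model("Ling-2.6-flash", 0.010, 0.030, 262_000),
    Model("Mistral Nemo", 0.020, 0.040, 128_000),
    Model("MythoMax 13B", 0.060, 0.060, 4_000),
    Model("gpt-oss-20b", 0.029, 0.140, 128_000),
]


def request_cost(model, tokens_in, tokens_out):
    """USD cost of one request at the model's headline rates."""
    return (tokens_in * model.input_per_m + tokens_out * model.output_per_m) / 1_000_000


def shortlist(models, min_context, tokens_in, tokens_out, top_n=3):
    """Cheapest fully priced models whose context window is large enough."""
    eligible = [
        m for m in models
        if m.context >= min_context
        and m.input_per_m is not None
        and m.output_per_m is not None
    ]
    return sorted(eligible, key=lambda m: request_cost(m, tokens_in, tokens_out))[:top_n]


# Assumed workload: 3,000 input + 500 output tokens, 1M requests per month.
finalists = shortlist(MODELS, min_context=100_000, tokens_in=3_000, tokens_out=500)
for m in finalists:
    monthly = request_cost(m, 3_000, 500) * 1_000_000
    print(f"{m.name}: ${monthly:,.2f} per month")
```

Ranking by cost at your own token mix, rather than by the Total column, keeps the shortlist tied to the workload you actually run.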

Prices are refreshed daily. "Unknown" means the provider does not publish a public per-token rate for that model, which usually indicates enterprise-only or invite-gated access. We deliberately do not show these prices as $0, to avoid misleading rankings.

#	Model	Vendor	Input / 1M	Output / 1M	Total	Context
1	BGE Reranker Base	cloudflare-ai-gateway	$0.003	Unknown	$0.003	128K
2	Voxtral Small 24B 2507	Mistral	$0.002	$0.002	$0.005	32K
3	All-MiniLM-L6-v2	digitalocean	$0.009	Unknown	$0.009	256
4	Multi-QA-mpnet-base-dot-v1	digitalocean	$0.009	Unknown	$0.009	512
5	Qwen3 Embedding 8B	Alibaba (Qwen)	$0.010	Unknown	$0.010	33K
6	Qwen3 Embedding 0.6B	Alibaba (Qwen)	$0.010	Unknown	$0.010	33K
7	Qwen3 Embedding 4B	Alibaba (Qwen)	$0.010	Unknown	$0.010	33K
8	BGE Reranker v2 M3	digitalocean	$0.010	Unknown	$0.010	8K
9	BGE M3	cloudflare-ai-gateway	$0.012	Unknown	$0.012	128K
10	PLaMo Embedding 1B	cloudflare-ai-gateway	$0.019	Unknown	$0.019	128K
11	Llama 3.2 1B Instruct	Meta	$0.010	$0.010	$0.020	60K
12	text-embedding-3-small	OpenAI	$0.020	Unknown	$0.020	8K
13	llama-3.1-nemotron-safety-guard-8b-v3	NVIDIA	$0.010	$0.010	$0.020	128K
14	Prompt Guard 2 86M	Meta	$0.010	$0.010	$0.020	512
15	Llama Prompt Guard 2 22M	Meta	$0.010	$0.010	$0.020	512
16	E5 Large v2	digitalocean	$0.020	Unknown	$0.020	512
17	BGE M3	digitalocean	$0.020	Unknown	$0.020	8K
18	BGE Small EN v1.5	cloudflare-ai-gateway	$0.020	Unknown	$0.020	128K
19	text-embedding-3-small	azure	$0.020	Unknown	$0.020	8K
20	text-embedding-3-small	azure-cognitive-services	$0.020	Unknown	$0.020	8K
21	DistilBERT SST-2 INT8	cloudflare-ai-gateway	$0.026	Unknown	$0.026	128K
22	Llama 3.2 3B Instruct	Meta	$0.020	$0.020	$0.040	80K
23	PaddleOCR-VL	novita-ai	$0.020	$0.020	$0.040	16K
24	Ling-2.6-flash	openrouter	$0.010	$0.030	$0.040	262K
25	Meta-Llama-3.1-8B-Instruct	Meta	$0.020	$0.030	$0.050	128K
26	Nomic Embed Text v1.5	tinfoil	$0.050	Unknown	$0.050	8K
27	Meta Llama 3.1 8B Instruct Turbo	Meta	$0.020	$0.030	$0.050	128K
28	Mistral Nemo	Mistral	$0.020	$0.040	$0.060	128K
29	Gemma 3n 4B	Google	$0.020	$0.040	$0.060	33K
30	BGE Base EN v1.5	cloudflare-ai-gateway	$0.067	Unknown	$0.067	128K
31	Meta-Llama-3-8B-Instruct	Meta	$0.030	$0.040	$0.070	8K
32	Llama Guard 3 8B	Meta	$0.020	$0.060	$0.080	131K
33	Ministral 3B (latest)	Mistral	$0.040	$0.040	$0.080	128K
34	Ministral 3B	azure	$0.040	$0.040	$0.080	128K
35	Ministral 3B	azure-cognitive-services	$0.040	$0.040	$0.080	128K
36	Llama 3 8B Lunaris	Meta	$0.040	$0.050	$0.090	8K
37	GTE Large (v1.5)	digitalocean	$0.090	Unknown	$0.090	8K
38	Llama-3.2-11B-Vision-Instruct	Meta	$0.049	$0.049	$0.098	128K
39	Mistral Embed	Mistral	$0.100	Unknown	$0.100	8K
40	text-embedding-ada-002	OpenAI	$0.100	Unknown	$0.100	8K
41	L3 8B Stheno V3.2	novita-ai	$0.050	$0.050	$0.100	8K
42	Sao10k L3 8B Lunaris	novita-ai	$0.050	$0.050	$0.100	8K
43	text-embedding-ada-002	azure	$0.100	Unknown	$0.100	8K
44	text-embedding-ada-002	azure-cognitive-services	$0.100	Unknown	$0.100	8K
45	Gemma 3 4B IT	Google	$0.040	$0.080	$0.120	128K
46	MythoMax 13B	kilo	$0.060	$0.060	$0.120	4K
47	MythoMax 13B	openrouter	$0.060	$0.060	$0.120	4K
48	Sarvam 30B	fastrouter	$0.020	$0.100	$0.120	128K
49	IBM Granite 4.0 H Micro	cloudflare-ai-gateway	$0.017	$0.110	$0.127	128K
50	IBM: Granite 4.0 Micro	kilo	$0.017	$0.110	$0.127	131K
51	Granite 4.0 H Micro	cloudflare-workers-ai	$0.017	$0.112	$0.129	131K
52	Granite 4.0 Micro	openrouter	$0.017	$0.112	$0.129	131K
53	text-embedding-3-large	OpenAI	$0.130	Unknown	$0.130	8K
54	Llama 3.1 8B	Meta	$0.050	$0.080	$0.130	131K
55	text-embedding-3-large	azure	$0.130	Unknown	$0.130	8K
56	text-embedding-3-large	azure-cognitive-services	$0.130	Unknown	$0.130	8K
57	Sarvam 30B	nano-gpt	$0.028	$0.111	$0.139	66K
58	Google Gemma 3 27B Instruct	Google	$0.030	$0.110	$0.140	203K
59	baichuan-m2-32b	novita-ai	$0.070	$0.070	$0.140	131K
60	Model Router	azure	$0.140	Unknown	$0.140	128K
61	Model Router	azure-cognitive-services	$0.140	Unknown	$0.140	128K
62	Google Gemma 3 12B	Google	$0.050	$0.100	$0.150	131K
63	Gemini Embedding 001	Google	$0.150	Unknown	$0.150	2K
64	LiquidAI: LFM2-24B-A2B	kilo	$0.030	$0.120	$0.150	33K
65	LFM2-24B-A2B	togetherai	$0.030	$0.120	$0.150	33K
66	LFM2-24B-A2B	openrouter	$0.030	$0.120	$0.150	33K
67	LFM2 24B A2B	nano-gpt	$0.030	$0.120	$0.150	33K
68	IBM: Granite 4.1 8B	kilo	$0.050	$0.100	$0.150	131K
69	Granite 4.1 8B	openrouter	$0.050	$0.100	$0.150	131K
70	Granite 4.1 8B	nano-gpt	$0.050	$0.100	$0.150	131K
71	DeepSeek R1 Distill Llama 70B	Meta	$0.030	$0.130	$0.160	33K
72	gpt-oss-20b	OpenAI	$0.029	$0.140	$0.169	128K
73	R1 Distill Llama 70B	DeepSeek	$0.030	$0.140	$0.170	8K
74	Qwen3 235B A22B 2507	Alibaba (Qwen)	$0.071	$0.100	$0.171	262K
75	Nova Micro	vercel	$0.035	$0.140	$0.175	128K
76	Amazon: Nova Micro 1.0	kilo	$0.035	$0.140	$0.175	128K
77	Nova Micro	amazon-bedrock	$0.035	$0.140	$0.175	128K
78	Nova Micro 1.0	openrouter	$0.035	$0.140	$0.175	128K
79	Amazon Nova Micro 1.0	nano-gpt	$0.036	$0.139	$0.175	128K
80	gpt-oss-120b	OpenAI	$0.030	$0.150	$0.180	128K
81	Mythomax L2 13B	novita-ai	$0.090	$0.090	$0.180	4K
82	Phi 4 Multimodal	nano-gpt	$0.070	$0.110	$0.180	128K
83	Manta Mini 1.0	nano-gpt	$0.020	$0.160	$0.180	8K
84	Manta Flash 1.0	nano-gpt	$0.020	$0.160	$0.180	16K
85	Command R7B	Cohere	$0.037	$0.150	$0.188	128K
86	Command R7B Arabic	Cohere	$0.037	$0.150	$0.188	128K
87	Qwen3.5 9B	Alibaba (Qwen)	$0.040	$0.150	$0.190	262K
88	GPT OSS 20B	llmgateway	$0.040	$0.150	$0.190	131K
89	Trinity Mini	vercel	$0.045	$0.150	$0.195	131K
90	Arcee AI: Trinity Mini	kilo	$0.045	$0.150	$0.195	131K
91	Trinity Mini	openrouter	$0.045	$0.150	$0.195	131K
92	Trinity Mini	nano-gpt	$0.045	$0.150	$0.195	131K
93	Trinity Mini	clarifai	$0.045	$0.150	$0.195	131K
94	Qwen3 235B A22B Instruct 2507	Alibaba (Qwen)	$0.100	$0.100	$0.200	262K
95	Qwen3-235B-A22B-Thinking-2507	Alibaba (Qwen)	$0.100	$0.100	$0.200	262K
96	nvidia-nemotron-nano-9b-v2	NVIDIA	$0.040	$0.160	$0.200	131K
97	Phi-4	Microsoft	$0.060	$0.140	$0.200	128K
98	Ministral 3 3B 2512	Mistral	$0.100	$0.100	$0.200	131K
99	Llama 3.2 1B Instruct	Meta	$0.100	$0.100	$0.200	128K
100	Ministral 8B (latest)	Mistral	$0.100	$0.100	$0.200	128K

Showing top 100 of 977. Use the full directory to see the rest.

Frequently asked questions

Why are output tokens more expensive than input tokens?

Generation is autoregressive — each output token requires a full forward pass conditioned on all previous tokens, which cannot be batched as efficiently as reading input tokens in parallel. The 3–5× premium most providers charge reflects this compute asymmetry.

What does 'per 1M tokens' mean in practice?

One million tokens is roughly 750,000 English words, or about 3,000 pages of standard text. A typical chatbot request uses 1,000–5,000 input tokens and 200–1,000 output tokens, so 1M tokens covers a few hundred requests (roughly 170 to 830), depending on your workload.
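
The arithmetic behind that estimate is a single division. This small sketch uses the request sizes quoted above and pools input and output tokens into one budget, which is a simplification because the two are billed at different rates. The function name is illustrative.

```python
def requests_per_million(input_tokens, output_tokens):
    """Whole requests that fit in 1M tokens (input and output combined)."""
    return 1_000_000 // (input_tokens + output_tokens)


light = requests_per_million(1_000, 200)    # small prompt, short answer
heavy = requests_per_million(5_000, 1_000)  # large prompt, long answer
print(light, heavy)  # prints "833 166"
```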

Why are some models showing 'Unknown' instead of a price?

We deliberately do not coerce missing data to $0. 'Unknown' means the provider does not publish a public per-token rate, often because the model is sold through enterprise contracts or is invite-only. Treating Unknown as free would push paid-but-unpriced models to the top of every cheap list.
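
One way to keep unpriced models from floating to the top is to store a missing price as `None` and sort it last. This is a minimal sketch of that idea, not a description of how this site's pipeline is implemented. The helper names and the gated model row are invented for illustration.

```python
from typing import Optional


def rank_by_price(rows):
    """Sort (name, price) pairs ascending, with unknown prices (None) last."""
    # False sorts before True, so priced rows come first; the 0.0 is only a
    # placeholder so None is never compared with a float.
    return sorted(rows, key=lambda row: (row[1] is None, row[1] or 0.0))


def format_price(price: Optional[float]) -> str:
    """Render a price for display without coercing missing data to $0."""
    return "Unknown" if price is None else f"${price:.3f}"


ROWS = [
    ("Mistral Nemo", 0.060),             # Total from the table
    ("Hypothetical Gated Model", None),  # invented row with no public rate
    ("Llama 3.2 3B Instruct", 0.040),    # Total from the table
]

for name, price in rank_by_price(ROWS):
    print(f"{name}: {format_price(price)}")
```

Had the missing price been stored as 0 instead, the gated model would have ranked first in this list.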

How often do these prices change?

Vendor list-price moves are typically picked up within hours of an announcement, and our pipeline re-syncs daily. Each change is written to /changelog so you can audit historical pricing over time.

Does this include prompt caching or batch discounts?

No. The table shows standard headline rates only. Prompt caching (Anthropic, OpenAI) can reduce input cost by 50–90%. Batch API discounts (OpenAI) offer ~50% off for non-real-time workloads. Both are shown on each model's detail page.

Last updated: 2026-06-29

Prices in USD per 1M tokens. Unknown means the provider does not publish per-token pricing.

Pricing and capabilities are refreshed daily and reconciled against each provider's official documentation. Always confirm prices with the provider directly before making critical production decisions.