Paste your prompt, estimate tokens, and compare API costs across 26 models from OpenAI, Anthropic, Google, DeepSeek, and Meta.
Pricing verified as of March 2026 from official provider pages.
By default, output tokens are set to match the input estimate.
Adjusts the thinking-token multiplier for reasoning models (o3, o4-mini, DeepSeek Reasoner).
| Model | Provider | Context | Total Cost |
|---|---|---|---|
| GPT-5.4 1M (experimental) | OpenAI | 272K | $0.00 |
| GPT-5.4 Mini | OpenAI | 128K | $0.00 |
| GPT-5 | OpenAI | 128K | $0.00 |
| GPT-5 Mini | OpenAI | 128K | $0.00 |
| GPT-4.1 | OpenAI | 1M | $0.00 |
| GPT-4.1 Mini | OpenAI | 1M | $0.00 |
| GPT-4.1 Nano | OpenAI | 1M | $0.00 |
| GPT-4o | OpenAI | 128K | $0.00 |
| GPT-4o Mini | OpenAI | 128K | $0.00 |
| o4-mini (reasoning) | OpenAI | 200K | $0.00 |
| o3 (reasoning) | OpenAI | 200K | $0.00 |
| o3-mini (reasoning) | OpenAI | 200K | $0.00 |
| Claude Opus 4.6 | Anthropic | 200K | $0.00 |
| Claude Sonnet 4.6 | Anthropic | 200K | $0.00 |
| Claude Haiku 4.5 | Anthropic | 200K | $0.00 |
| Gemini 3.1 Pro (preview) | Google | 1M | $0.00 |
| Gemini 3 Flash | Google | 1M | $0.00 |
| Gemini 3.1 Flash-Lite (preview) | Google | 1M | $0.00 |
| Gemini 2.5 Pro (thinking) | Google | 1M | $0.00 |
| Gemini 2.5 Flash (thinking) | Google | 1M | $0.00 |
| Gemini 2.5 Flash-Lite | Google | 1M | $0.00 |
| DeepSeek V4 (flagship) | DeepSeek | 164K | $0.00 |
| DeepSeek V3.2 | DeepSeek | 164K | $0.00 |
| DeepSeek Reasoner (reasoning) | DeepSeek | 164K | $0.00 |
| Llama 4 Maverick (open source) | Meta (Groq) | 1M | $0.00 |
| Llama 4 Scout (open source, 10M ctx) | Meta (Groq) | 10M | $0.00 |
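The total-cost column is computed from per-million-token prices. As a minimal sketch (the prices below are purely illustrative, not quoted from any provider; always check the official pricing pages):

```python
def api_cost(input_tokens: int, output_tokens: int,
             input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost in USD, given prices quoted per million tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# Hypothetical model priced at $2.50/M input and $10.00/M output:
cost = api_cost(10_000, 2_000, 2.50, 10.00)
print(f"${cost:.4f}")  # -> $0.0450
```

With an empty prompt, both token counts are zero, which is why every row above shows $0.00.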
Tokens are not words. LLMs process text as tokens, which are subword units that may be full words, word fragments, or individual characters.
On average, 1 token ≈ 4 characters or ≈ 0.75 words in English. Code uses ~3.5 chars/token. CJK/Arabic text uses ~1.5 tokens per character.
Tokenizers differ: OpenAI uses tiktoken (BPE), while Anthropic and Google use their own proprietary tokenizers, so token counts can vary by ~10-15% between models for the same text.
1. Be concise: remove filler words and redundancy from prompts.
2. Use system prompts wisely: they count as input tokens on every API call.
3. Set max_tokens: cap output length in API calls to control costs.
4. Use cheaper models for simple tasks (GPT-4o Mini, Haiku, Flash).
5. Batch API: most providers offer 50% savings for async processing.
6. Prompt caching: reuse cached prompts for 50-90% input savings.
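Tips 5 and 6 compound. A quick sketch of the arithmetic, using the discount figures above with hypothetical costs and cache rates (actual discounts and cache mechanics vary by provider):

```python
base_input_cost = 1.00   # USD spent on input tokens per day (hypothetical)
batch_discount = 0.50    # 50% off for async batch processing
cache_discount = 0.90    # best case: 90% off cached input tokens

# Suppose 80% of each prompt is a cached system prompt reused on every call:
cached_fraction = 0.8
effective = base_input_cost * ((1 - cached_fraction)
                               + cached_fraction * (1 - cache_discount))
effective *= (1 - batch_discount)  # then route through the batch API
print(effective)  # -> 0.14, i.e. 86% cheaper than the naive spend
```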
Pricing last updated: March 2026. Prices may change; always verify with official provider pricing pages before production use.
LLM providers use tokenizers (like tiktoken for OpenAI) that split text into subword units called tokens. On average, 1 token is about 4 characters or 0.75 words for English text. Code and non-Latin scripts (CJK, Arabic, Hebrew) use more tokens per character. This tool uses content-aware heuristics to estimate token counts.
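A content-aware heuristic along these lines can be sketched as follows (the exact ratios and script detection this tool uses are assumptions; this is not a real tokenizer, just the rough rules of thumb stated above):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~1.5 tokens per CJK character,
    ~4 characters per token for everything else."""
    cjk = sum(1 for ch in text if '\u4e00' <= ch <= '\u9fff')
    other = len(text) - cjk
    return round(cjk * 1.5 + other / 4)

print(estimate_tokens("Hello, world! This is a test prompt."))  # -> 9
```

For billing-accurate counts, use the provider's own tokenizer (e.g. the tiktoken library for OpenAI models).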
Reasoning models like o3, o4-mini, and DeepSeek Reasoner generate internal chain-of-thought tokens before producing the final answer. These thinking tokens are billed as output tokens, significantly increasing the effective cost. The Reasoning Effort toggle (Low 1.5x, Medium 3x, High 5x) lets you estimate this overhead.
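The multiplier math behind that toggle is straightforward. A sketch, using the multipliers above and an assumed (not official) output price:

```python
EFFORT_MULTIPLIER = {"low": 1.5, "medium": 3.0, "high": 5.0}

def reasoning_output_cost(visible_output_tokens: int,
                          output_price_per_m: float,
                          effort: str = "medium") -> float:
    """Thinking tokens are billed as output, so scale the visible
    output by the effort multiplier before pricing."""
    billed_tokens = visible_output_tokens * EFFORT_MULTIPLIER[effort]
    return billed_tokens * output_price_per_m / 1_000_000

# 1,000 visible tokens at a hypothetical $8/M output price:
print(reasoning_output_cost(1_000, 8.00, "high"))  # 5,000 billed tokens -> 0.04
```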
A context window is the maximum number of tokens an LLM can process in a single request, including both input and output tokens. For example, GPT-4o has a 128K token context window, while Claude Sonnet 4.6 supports 200K tokens. Exceeding the context window causes the request to fail.
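Since input and output share the window, a pre-flight check is just a sum, as this sketch shows:

```python
def fits_context(input_tokens: int, max_output_tokens: int,
                 context_window: int) -> bool:
    """Input and output tokens must fit in the same context window."""
    return input_tokens + max_output_tokens <= context_window

# GPT-4o's 128K window, per the table above:
print(fits_context(120_000, 4_000, 128_000))   # True
print(fits_context(120_000, 16_000, 128_000))  # False: request would fail
```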
To reduce costs: (1) Use smaller, cheaper models for simple tasks, (2) Minimize prompt length by removing unnecessary context, (3) Set lower max output tokens, (4) Cache frequent responses, (5) Use batch APIs for non-real-time workloads, (6) Consider open-source models like Llama for high-volume use cases.