AI Workflow Cost Calculator

Model the cost of chat, agent, copilot, support, or retrieval workflows. Retrieval, reranking, cache, and infra are optional cost layers.

How this tool works

This tool models the main cost blocks in AI workflows: generation, retrieval, reranking, embeddings ingestion, vector lookups, cache savings, and infra overhead with a daily pricing snapshot. If your workflow is chat-first rather than retrieval-heavy, set retrieval and reranking layers to zero.

Calculation Steps

  1. Choose provider/model and set request plus token behavior assumptions.
  2. Compute each cost component independently, then sum to total cost per user/month.
  3. Derive break-even price and gross margin at your current price point.

Formulas

embedding_ingestion_share = embedding_ingestion_monthly / monthly_active_users

cost_per_user_month = generation + retrieval + reranking + embedding_ingestion_share + vector_db + cache + infra

gross_margin_pct = (price_per_user_month - cost_per_user_month) / price_per_user_month * 100
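These formulas can be sketched directly in Python (function and variable names are illustrative, not the tool's internals):

```python
def cost_per_user_month(generation, retrieval, reranking,
                        embedding_ingestion_monthly, monthly_active_users,
                        vector_db, cache, infra):
    # Fixed monthly embedding refresh is amortized across active users.
    embedding_ingestion_share = embedding_ingestion_monthly / monthly_active_users
    # Cache is a negative term when hits reduce total cost.
    return (generation + retrieval + reranking + embedding_ingestion_share
            + vector_db + cache + infra)

def gross_margin_pct(price_per_user_month, cost_per_user_month):
    return (price_per_user_month - cost_per_user_month) / price_per_user_month * 100
```

For example, the sample component values later on this page (`0.052 + 0.0144 + 2.4 + 0.0006 + 0.02`) sum to about $2.487/user/month, which at a $29 seat price yields roughly a 91.4% gross margin.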

Assumptions and Units

  • Currency: USD
  • Token unit: token
  • Cache component is negative when savings reduce total cost
  • Embedding refresh is amortized across active users before it appears in per-user totals
  • Pricing source: daily pricing snapshot in repo, no runtime scraping

Example Scenario

For a chat agent, internal copilot, docs assistant, or support workflow, compare your current seat price against calculated break-even and sensitivity to the top cost drivers.

FAQs

Can I share scenarios? Yes. Use the share URL to preserve assumptions, preset choice, and snapshot context.

What should I optimize first? Start with the highest sensitivity drivers before cutting model quality or user experience, especially if retrieval and reranking are optional for your workflow.

Can I use this for LLM pricing decisions? Yes. It is designed for deterministic LLM workflow cost and margin decisions with explicit assumptions.

Related resources: AI Break-even Price Calculator, LLM Model Cost Comparison, RAG Retrieval Cost Calculator, Reranking Cost Calculator, Cache Savings Simulator, Context Window Cost Calculator, RAG vs Long-Context Calculator, Embedding Ingestion Cost Calculator, How Much Does an AI Agent Cost?, What Is an AI Agent?, What Is RAG?, RAG Cost Components Explained, How Many Tokens Per Request?, What Cache Hit Rate Means for RAG.

Pricing snapshot: 2026-03-15. Provider: OpenAI. Model: GPT-5 Mini.

Decision Signal

Baseline

Current gross margin under this sample scenario is 91.4%.

This sample assumes 500 active users, 80 requests/user/month, 600 prompt tokens, 250 output tokens, GPT-5 Mini, $29/user/month pricing, and optional retrieval layers still turned on.

High margin here is driven partly by the sample pricing assumption. Try your own price to validate realism.
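Under those sample assumptions, the headline numbers can be reproduced in a few lines (a sketch using the rounded $2.487 per-user cost, so totals may differ from the page's unrounded figures by a cent or two):

```python
users = 500                 # monthly active users in the sample
price = 29.00               # USD per user per month
cost_per_user = 2.487       # headline modeled cost per user per month

monthly_cost = users * cost_per_user
monthly_revenue = users * price
gross_profit = monthly_revenue - monthly_cost
margin_pct = (price - cost_per_user) / price * 100   # ~91.4%
```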

Step 1 Provider and Model

Switch model assumptions using prices from the selected snapshot.

Step 2 Quick Mode

Use plain-language assumptions first. Open Advanced assumptions only if needed.
Starting point: apply the generic baseline or a conservative stress test. If neither is selected, you are working from custom inputs.

Baseline starts from the generic sample scenario before workflow-specific presets.

Quick workflow: start with the closest workflow shape, then fine-tune the assumptions below.

Step 3 Advanced Assumptions

Tune retrieval, reranking, embeddings, vector, caching, and infra.

Only adjust these once your Quick Mode assumptions feel realistic.

Scenario Share URL

Share this link to load these exact assumptions. Pricing uses the published snapshot shown on the page.

Paste into ChatGPT or Claude to discuss this scenario.

Cost / user / month

$2.487

Gross margin

91.4%

Estimated monthly AI cost

$1,243.52

Estimated monthly gross profit

$13,256.48

Top Cost Drivers

Most sensitive variables when each is moved up by 10%.
  • Requests Per User Month: +10.0%
  • Rerank Docs: +9.7%
  • Output Tokens: +0.2%

Totals

Summary metrics for monthly unit economics and margin.

Cost per request

$0.03109

Cost per user/month

$2.487

Gross margin

91.4%

Break-even price

$2.487

Component Breakdown (USD/user/month)

Each cost component is computed independently and summed.

Largest cost block: reranking, not generation.

Generation

$0.052

Retrieval

$0.0144

Reranking

$2.4

Embeddings Ingestion

$0

Vector Db

$0.0006

Cache

$-0

Infra

$0.02
GenerationModel input/output token spend for requests.$0.052
RetrievalExtra model input spend from retrieved context chunks.$0.0144
RerankingReranker cost based on docs scored per request.$2.4
Embeddings IngestionAmortized per-user share of the fixed monthly corpus embedding refresh cost.$0
Vector DbVector database query cost across all requests.$0.0006
CacheSavings from cache hits. Negative means lower total cost.$-0
InfraNon-model infra overhead per request.$0.02
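As a sanity check, the rounded component values above sum back to the headline per-user cost (a sketch; the page's own totals use unrounded components):

```python
components = {
    "generation": 0.052,
    "retrieval": 0.0144,
    "reranking": 2.40,
    "embeddings_ingestion": 0.0,
    "vector_db": 0.0006,
    "cache": -0.0,          # negative when cache hits produce savings
    "infra": 0.02,
}

cost_per_user = sum(components.values())   # ~ $2.487 / user / month
cost_per_request = cost_per_user / 80      # sample: 80 requests / user / month
```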
Sensitivity Ranking

Change in total cost when each variable is increased by 10%.

  • Requests Per User Month (user activity level per month): +10.0%
  • Rerank Docs (docs reranked per request): +9.7%
  • Output Tokens (generated tokens per request): +0.2%
  • Retrieved Chunks (retrieved chunk count per request): +0.1%
  • Tokens Per Chunk (average chunk size in tokens): +0.1%
  • Input Tokens (prompt-side tokens per request): +0.0%
  • Vector Queries Per Request (vector query count per request): +0.0%
  • Monthly Active Users (active-user estimate used to amortize the fixed monthly embedding refresh): -0.0%
  • Cache Hit Rate (fraction of requests served by cache): 0.0%
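A minimal sketch of this sensitivity pass, assuming illustrative per-request inputs (30 rerank docs, 4 chunks of 180 tokens, and a flat per-request infra overhead) that approximately reproduce the ranking above:

```python
def total_cost(v):
    """Simplified per-user monthly cost from a dict of assumptions.
    Unit prices match the snapshot; infra is an assumed flat overhead."""
    in_price, out_price = 0.25 / 1e6, 2.0 / 1e6    # USD per token
    rerank_price = 1.0 / 1e3                       # USD per reranked doc
    req = v["requests_per_user_month"]
    generation = req * (v["input_tokens"] * in_price + v["output_tokens"] * out_price)
    retrieval = req * v["retrieved_chunks"] * v["tokens_per_chunk"] * in_price
    reranking = req * v["rerank_docs"] * rerank_price
    infra = req * 0.00025                          # assumed $0.00025 / request
    return generation + retrieval + reranking + infra

base = {"requests_per_user_month": 80, "input_tokens": 600, "output_tokens": 250,
        "retrieved_chunks": 4, "tokens_per_chunk": 180, "rerank_docs": 30}

baseline = total_cost(base)
for name in base:
    bumped = dict(base, **{name: base[name] * 1.10})  # move one variable +10%
    delta_pct = (total_cost(bumped) - baseline) / baseline * 100
    print(f"{name}: {delta_pct:+.1f}%")
```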

Assumptions and Units

Explicit assumptions to keep outputs reproducible and auditable.
  • Currency: USD
  • Token unit: token
  • Pricing snapshot: 2026-03-15
  • Selected model row: OpenAI/GPT-5 Mini
  • Monthly users: 500 active users for business totals and fixed-term amortization
  • Embedding refresh: amortized per user from the fixed monthly corpus refresh term
  • Cache component: negative value means cost savings

Recommended Next Step

Use these links to lower top cost drivers without guessing.

Optimize the biggest modeled cost driver first. Compare infra only after model, retrieval, reranking, or context changes stop being the better lever.

Sources and Snapshot

Pricing comes from a daily snapshot generated by batch workflows.

Active Pricing Row

Selected model

OpenAI / GPT-5 Mini

  • Input tokens: $0.25 / 1M
  • Output tokens: $2.00 / 1M

Shared retrieval defaults

  • Embedding input: $0.02 / 1M
  • Rerank docs: $1.00 / 1K
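Under the sample scenario's 80 requests/user/month, these unit prices turn into the component costs shown earlier; note that the ~30 docs/request figure below is an inferred assumption that reproduces the $2.40 reranking line, not a value the page states:

```python
# Token prices from the active pricing row (USD per 1M tokens)
input_price, output_price = 0.25, 2.00
requests = 80                       # sample: requests per user per month

# Generation: 600 prompt + 250 output tokens per request
generation = requests * (600 * input_price + 250 * output_price) / 1_000_000

# Reranking: shared default of $1 per 1K docs; ~30 docs/request is assumed
rerank = requests * 30 * 1.00 / 1_000
```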