AI Workflow Cost

What will this AI workflow cost per user each month?

How this tool works

This tool models the main cost blocks in AI workflows (generation, retrieval, reranking, embeddings ingestion, vector lookups, cache savings, and infra overhead) using a daily pricing snapshot. If your workflow is chat-first rather than retrieval-heavy, set the retrieval and reranking layers to zero.

Calculation Steps

  1. Choose provider/model and set request plus token behavior assumptions.
  2. Compute each cost component independently, then sum to total cost per user/month.
  3. Derive break-even price and gross margin at your current price point.

Formulas

embedding_ingestion_share = embedding_ingestion_monthly / monthly_active_users

cost_per_user_month = generation + retrieval + reranking + embedding_ingestion_share + vector_db + cache + infra

gross_margin_pct = (price_per_user_month - cost_per_user_month) / price_per_user_month * 100
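The three formulas above can be sketched directly in Python. The component values below come from the sample scenario on this page; the 500-user count and $29 seat price are assumptions chosen to be consistent with the sample totals, not outputs of the tool:

```python
def cost_per_user_month(generation, retrieval, reranking,
                        embedding_ingestion_monthly, monthly_active_users,
                        vector_db, cache, infra):
    # The fixed monthly embedding refresh is amortized across active users.
    embedding_share = embedding_ingestion_monthly / monthly_active_users
    # Cache is passed as a negative number when hits reduce total cost.
    return (generation + retrieval + reranking + embedding_share
            + vector_db + cache + infra)

def gross_margin_pct(price_per_user_month, cost):
    return (price_per_user_month - cost) / price_per_user_month * 100

# Sample-scenario components (USD/user/month); 500 MAU is an assumption.
cost = cost_per_user_month(
    generation=0.052, retrieval=0.0144, reranking=2.4,
    embedding_ingestion_monthly=0.0, monthly_active_users=500,
    vector_db=0.0006, cache=0.0, infra=0.02,
)
print(round(cost, 3))                          # 2.487
print(round(gross_margin_pct(29.0, cost), 1))  # 91.4, at an assumed $29 seat price
break_even_price = cost  # margin hits zero when price equals cost
```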

Assumptions and Units

  • Currency: USD
  • Token unit: token
  • Cache component is negative when savings reduce total cost
  • Embedding refresh is amortized across active users before it appears in per-user totals
  • Pricing source: daily pricing snapshot in repo, no runtime scraping

Example Scenario

For a chat agent, internal AI assistant, docs assistant, or support workflow, compare your current seat price against calculated break-even and sensitivity to the top cost drivers.

FAQs

What should I optimize first? Start with the highest sensitivity drivers before cutting model quality or user experience, especially if retrieval and reranking are optional for your workflow.

Can I use this for LLM pricing decisions? Yes. It is designed for deterministic LLM workflow cost and margin decisions with explicit assumptions.

Related resources: Break-even Price, Compare Model Costs, How Much Does an AI Agent Cost?, What Is an AI Agent?, What Is RAG?, RAG Cost Components Explained, How Many Tokens Per Request?, What Cache Hit Rate Means for RAG.

Pricing snapshot: 2026-04-17
Provider: OpenAI
Model: GPT-5 Mini

Decision Signal

Baseline

Baseline preview margin is 91.4%.

High margin here is partly driven by the sample price. Check your own price.

Step 1 Provider and Model

Switch model assumptions using prices from the selected snapshot.

Step 2 Quick Mode

Use plain-language assumptions first. Open Advanced assumptions only if needed.
Starting point: Apply the generic baseline or a more conservative downside scenario. If neither is selected, you are working from custom inputs.

Baseline starts from the generic sample scenario before workflow-specific presets.

Quick workflow: Start with the closest workflow shape, then fine-tune the assumptions below.

Step 3 Advanced Assumptions

Tune retrieval, reranking, embeddings, vector, caching, and infra.
Show advanced inputs

Only adjust these once your Quick Mode assumptions feel realistic.

Scenario actions

Copy scenario URL

Paste into ChatGPT or Claude, or share with a teammate.

Save and track this scenario

Track pricing drift on this scenario and get an email if the latest result changes.

How tracking works

After you click Save and track, we carry this exact calculator state into the tracked-scenarios page so you can sign in and confirm the save.

We save your assumptions and the pricing snapshot used for this result.

When a newer pricing snapshot lands, we recompute the same scenario, show what changed, and email you if the latest result moved.

1 tracked scenario free, then $12/mo or $120/yr for up to 25 tracked scenarios.

Cost per user/month

$2.487

Gross margin

91.4%

Estimated monthly AI cost

$1,243.52

Estimated monthly gross profit

$13,256.48

Top Cost Drivers

Most sensitive variables when each is moved up by 10%:

  • Requests Per User Month: 10.0%
  • Rerank Docs: 9.7%
  • Output Tokens: 0.2%

Totals

Summary metrics for monthly unit economics and margin.
  • Cost per request: $0.03109
  • Cost per user/month: $2.487
  • Gross margin: 91.4%
  • Break-even price: $2.487

Component Breakdown (USD/user/month)

Each cost component is computed independently and summed.

Largest cost block: reranking, not generation.

  • Generation (model input/output token spend for requests): $0.052
  • Retrieval (extra model input spend from retrieved context chunks): $0.0144
  • Reranking (reranker cost based on docs scored per request): $2.40
  • Embeddings Ingestion (amortized per-user share of the fixed monthly corpus embedding refresh): $0.00
  • Vector DB (vector database query cost across all requests): $0.0006
  • Cache (savings from cache hits; negative means lower total cost): $0.00
  • Infra (non-model infra overhead per request): $0.02
Sensitivity Ranking

Change in total cost when one variable is increased by 10%.

  • Requests Per User Month (user activity level per month): 10.0%
  • Rerank Docs (docs reranked per request): 9.7%
  • Output Tokens (generated tokens per request): 0.2%
  • Retrieved Chunks (retrieved chunk count per request): 0.1%
  • Tokens Per Chunk (average chunk size in tokens): 0.1%
  • Input Tokens (prompt-side tokens per request): 0.0%
  • Vector Queries Per Request (vector query count per request): 0.0%
  • Monthly Active Users (active-user estimate used to amortize the fixed monthly embedding refresh): -0.0%
  • Cache Hit Rate (fraction of requests served by cache): 0.0%
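A sensitivity ranking like this can be reproduced with a one-at-a-time +10% bump. The per-request inputs below (80 requests, 600/250 tokens, 30 reranked docs, 6 chunks of 120 tokens) are assumptions chosen to be consistent with the snapshot's unit prices, not values read out of the tool; the per-query vector and per-request infra rates are likewise assumed:

```python
# Assumed per-request workflow shape, consistent with the snapshot's unit
# prices ($0.25/1M input tokens, $2/1M output tokens, $1/1K reranked docs).
base = {
    "requests_per_user_month": 80,
    "input_tokens": 600,
    "output_tokens": 250,
    "rerank_docs": 30,
    "retrieved_chunks": 6,
    "tokens_per_chunk": 120,
}

def total_cost(v):
    """Per-user monthly cost in USD under the assumed workflow shape."""
    r = v["requests_per_user_month"]
    generation = r * (v["input_tokens"] * 0.25 + v["output_tokens"] * 2) / 1e6
    retrieval = r * v["retrieved_chunks"] * v["tokens_per_chunk"] * 0.25 / 1e6
    reranking = r * v["rerank_docs"] * 1 / 1_000
    vector_db = r * 7.5e-6   # assumed per-query price, one query per request
    infra = r * 0.00025      # assumed non-model overhead per request
    return generation + retrieval + reranking + vector_db + infra

base_cost = total_cost(base)  # ~2.487
for name in base:
    bumped = dict(base, **{name: base[name] * 1.1})
    delta_pct = (total_cost(bumped) - base_cost) / base_cost * 100
    print(f"{name}: {delta_pct:+.1f}%")
```

Bumping requests scales every per-request component, so it always ranks at exactly +10%; everything else moves in proportion to its share of total cost, which is why reranking dominates here.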

Assumptions and Units

Explicit assumptions to keep outputs reproducible and auditable.
  • Currency: USD
  • Token unit: token
  • Pricing snapshot: 2026-04-17
  • Selected model row: OpenAI/GPT-5 Mini
  • Volume basis: business totals and fixed monthly terms use monthly active users as the denominator
  • Embedding refresh: amortized per user from the fixed monthly corpus refresh term
  • Cache component: negative value means cost savings

Recommended Next Step

Use these links to lower top cost drivers without guessing.

Optimize the biggest modeled cost driver first. Compare infra only after model, retrieval, reranking, or context changes stop being the better lever.

Compare infra providers

View Infra Recommendations

Sources and Snapshot

Pricing comes from the current dated snapshot.

Active Pricing Row

Selected model

OpenAI / GPT-5 Mini

  • Input tokens: $0.25 / 1M
  • Output tokens: $2 / 1M

Shared retrieval defaults

  • Embedding input: $0.02 / 1M
  • Rerank docs: $1 / 1K
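From unit prices like these, per-request model spend is a straight multiply-and-sum. The token and doc counts below (600 input tokens, 250 output tokens, 30 reranked docs) are hypothetical request shapes, not tool outputs:

```python
INPUT_PRICE = 0.25 / 1_000_000   # USD per input token (GPT-5 Mini row)
OUTPUT_PRICE = 2.00 / 1_000_000  # USD per output token
RERANK_PRICE = 1.00 / 1_000      # USD per reranked doc (shared default)

# Hypothetical request shape:
input_tokens, output_tokens, rerank_docs = 600, 250, 30

generation = input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE
reranking = rerank_docs * RERANK_PRICE
print(f"${generation:.5f} generation + ${reranking:.2f} reranking per request")
```

At these assumed counts, reranking costs roughly 45x the model tokens per request, which matches the component breakdown's observation that reranking, not generation, is the largest cost block.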

Continue Analysis

Move to the next tool or guide without losing your current scenario.

Switch tools

Read guides