Cache Savings

How much can caching save per user and per month?

How this tool works

This simulator compares baseline and candidate cache-hit assumptions with identical workload inputs, then estimates per-user savings, monthly savings, and break-even impact for a retrieval-heavy workflow.

How It Works

  1. Set provider/model plus shared usage assumptions.
  2. Set baseline cache hit rate and candidate target hit rate.
  3. Review savings, breakdown changes, and sensitivity ranking.

Formula

savings_per_user_month = cost_baseline - cost_candidate

monthly_savings = savings_per_user_month * monthly_active_users
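
The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the simulator's actual code: the function name `savings` is hypothetical, the per-user costs are the rounded figures displayed on this page, and the 650 monthly active users is an assumed input that the page does not display.

```python
def savings(cost_baseline: float, cost_candidate: float,
            monthly_active_users: int) -> tuple[float, float]:
    """Return (savings_per_user_month, monthly_savings) in USD."""
    savings_per_user_month = cost_baseline - cost_candidate
    monthly_savings = savings_per_user_month * monthly_active_users
    return savings_per_user_month, monthly_savings


# Per-user costs are the rounded figures shown on this page.
# 650 monthly active users is an assumed value, not shown on the page.
per_user, monthly = savings(2.1348, 1.9526, 650)
print(f"${per_user:.4f} per user/month, ${monthly:.2f} per month")
```

Because the inputs here are already rounded to four decimals, the monthly figure can differ by a few cents from a result computed on unrounded costs.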

Assumptions and Units

  • Currency: USD
  • Token unit: token
  • Cache hit rates are bounded from 0 to 0.99
  • Pricing source: daily pricing snapshot in repo, no runtime scraping
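
One way to enforce the hit-rate bound is a small clamp. This sketch assumes out-of-range inputs are clamped rather than rejected; the page does not say which behavior the simulator uses, and `clamp_hit_rate` is a hypothetical helper name.

```python
MAX_HIT_RATE = 0.99  # upper bound stated in the assumptions list


def clamp_hit_rate(rate: float) -> float:
    """Clamp a cache hit rate into the supported [0, 0.99] range.

    Assumption: values outside the range are clamped, not rejected.
    """
    return min(max(rate, 0.0), MAX_HIT_RATE)
```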

Related resources: Retrieval Cost, Rerank Cost, Prompt Overhead, RAG or Long Prompt, Indexing Cost, What Cache Hit Rate Means for RAG, RAG Cost Components Explained, How To Estimate Requests Per User/Month.

Pricing snapshot: 2026-04-20. Provider: OpenAI. Model: GPT-5 Mini.

Step 1 Provider and Model

Select the pricing row used for baseline and candidate cache scenarios.

Step 2 Quick Mode

Set baseline usage and cache assumptions before advanced tuning.

Estimate whether cache quality and freshness work can pay back this quarter.

Step 3 Advanced Assumptions

Tune retrieval and infra assumptions after Quick Mode is calibrated.

Headline metric

Cache plan saves money

Hit-rate lift: 7.00 points (18.0% to 25.0%).

  • Savings per user / month: $0.1822
  • Monthly savings: $118.45
  • Cache savings delta: +$0.1822
  • Equivalent free requests: 10.2
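
The headline numbers can be reproduced from the displayed figures. The sketch below assumes that "equivalent free requests" means savings per user per month divided by the baseline cost per request. The page does not state that definition, but it is consistent with the displayed 10.2. Both function names are hypothetical.

```python
def hit_rate_lift_points(baseline_rate: float, candidate_rate: float) -> float:
    """Lift in percentage points between two hit rates given as fractions."""
    return (candidate_rate - baseline_rate) * 100


def equivalent_free_requests(savings_per_user_month: float,
                             baseline_cost_per_request: float) -> float:
    """Assumed definition: how many baseline-priced requests the savings would pay for."""
    return savings_per_user_month / baseline_cost_per_request


lift = hit_rate_lift_points(0.18, 0.25)
free = equivalent_free_requests(0.1822, 0.01779)
print(f"{lift:.2f} points, {free:.1f} free requests")
# prints: 7.00 points, 10.2 free requests
```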

Totals

Baseline vs candidate totals under the same usage assumptions; only the cache hit rate differs.
Metric                Baseline    Candidate   Delta
Cost per request      $0.01779    $0.01627    -$0.00152
Cost per user/month   $2.1348     $1.9526     -$0.1822
Gross margin %        95.6%       96.0%       +0.4 pts
Break-even price      $2.1348     $1.9526     -$0.1822
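
The gross margin and break-even rows can be related with a short sketch. The selling price is not shown on this page, so the $49 per user per month used below is a hypothetical value chosen only because it reproduces the displayed margins. The function name and the margin formula (one minus cost over price) are assumptions as well.

```python
def gross_margin_pct(price_per_user_month: float, cost_per_user_month: float) -> float:
    """Gross margin as a percentage of price (assumed formula: 1 - cost / price)."""
    return (1 - cost_per_user_month / price_per_user_month) * 100


PRICE = 49.00  # hypothetical selling price; not shown on the page

baseline_margin = gross_margin_pct(PRICE, 2.1348)
candidate_margin = gross_margin_pct(PRICE, 1.9526)
print(f"{baseline_margin:.1f}% -> {candidate_margin:.1f}%")

# At the break-even price, price equals cost, so the margin is zero.
break_even_margin = gross_margin_pct(2.1348, 2.1348)
```

Under this reading, the break-even price is simply the cost per user per month, which is why the two rows in the totals carry identical values.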

Component Breakdown

Baseline and candidate components are computed independently, then differenced.
Component              Baseline    Candidate   Delta
Generation             $0.114      $0.114      $0
Retrieval              $0.0396     $0.0396     $0
Reranking              $2.4        $2.4        $0
Embeddings Ingestion   $0          $0          $0
Vector DB              $0.0018     $0.0018     $0
Cache                  -$0.4686    -$0.6508    -$0.1822
Infra                  $0.048      $0.048      $0

  • Generation: Model input/output token spend for requests.
  • Retrieval: Extra model input spend from retrieved context chunks.
  • Reranking: Reranker cost based on docs scored per request.
  • Embeddings Ingestion: Amortized per-user share of the fixed monthly corpus embedding refresh cost.
  • Vector DB: Vector database query cost across all requests.
  • Cache: Savings from cache hits. Negative means lower total cost.
  • Infra: Non-model infra overhead per request.
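
The component figures sum to the per-user totals, and the cache line behaves like the hit rate times the sum of the other components. That rule is inferred from the displayed numbers, not stated on the page, so treat the sketch below as an assumption-laden illustration. The dictionary keys and function names are hypothetical.

```python
# Per-user monthly component costs in USD, as displayed in the breakdown.
BASE_COMPONENTS = {
    "generation": 0.114,
    "retrieval": 0.0396,
    "reranking": 2.4,
    "embeddings_ingestion": 0.0,
    "vector_db": 0.0018,
    "infra": 0.048,
}


def cache_component(components: dict[str, float], hit_rate: float) -> float:
    """Assumed rule: cache savings = -hit_rate * sum of the other components."""
    return -hit_rate * sum(components.values())


def total_cost(components: dict[str, float], hit_rate: float) -> float:
    """Per-user monthly total: non-cache components plus the (negative) cache line."""
    return sum(components.values()) + cache_component(components, hit_rate)


baseline_total = total_cost(BASE_COMPONENTS, 0.18)
candidate_total = total_cost(BASE_COMPONENTS, 0.25)
print(f"baseline={baseline_total:.4f} candidate={candidate_total:.4f}")
```

Under this assumed rule the results agree with the displayed totals and cache lines to within rounding at the fourth decimal.
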

Sensitivity Ranking

Delta in total cost if one variable increases by 10%.

Variable                     Cost delta %   Description
Requests Per User Month      10.00%         User activity level per month.
Rerank Docs                  9.22%          Docs reranked per request.
Cache Hit Rate               -3.33%         Fraction of requests served by cache.
Output Tokens                0.32%          Generated tokens per request.
Retrieved Chunks             0.15%          Retrieved chunk count per request.
Tokens Per Chunk             0.15%          Average chunk size in tokens.
Input Tokens                 0.12%          Prompt-side tokens per request.
Vector Queries Per Request   0.01%          Vector query count per request.
Monthly Active Users         -0.00%         Active-user estimate used to amortize fixed monthly embedding refresh.
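
A one-at-a-time sensitivity check can be sketched as follows. It assumes the total is the sum of the non-cache components scaled by (1 - hit rate), that the reranking component scales linearly with docs reranked, and that perturbations are applied to the candidate scenario (25% hit rate). None of these are stated on the page; they are inferences that happen to reproduce two of the displayed rows. All names are hypothetical.

```python
def total_cost(components: dict[str, float], hit_rate: float) -> float:
    """Assumed cost model: non-cache components scaled by the cache miss rate."""
    return sum(components.values()) * (1 - hit_rate)


def pct_change(before: float, after: float) -> float:
    return (after - before) / before * 100


# Per-user monthly component costs in USD, as displayed in the breakdown.
components = {
    "generation": 0.114, "retrieval": 0.0396, "reranking": 2.4,
    "embeddings_ingestion": 0.0, "vector_db": 0.0018, "infra": 0.048,
}
HIT_RATE = 0.25  # candidate hit rate
base = total_cost(components, HIT_RATE)

# Rerank docs +10%: scale the reranking component by 1.10.
bumped = dict(components, reranking=components["reranking"] * 1.10)
rerank_delta = pct_change(base, total_cost(bumped, HIT_RATE))

# Cache hit rate +10% (relative): 0.25 becomes 0.275.
cache_delta = pct_change(base, total_cost(components, HIT_RATE * 1.10))

print(f"rerank docs: {rerank_delta:.2f}%  cache hit rate: {cache_delta:.2f}%")
```

The negative sign on the cache row follows directly from the model: a higher hit rate shrinks the share of requests that incur full cost.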

Assumptions and Units

Explicit assumptions keep this comparison reproducible.
  • Currency: USD
  • Token unit: token
  • Pricing snapshot: 2026-04-20
  • Selected model row: OpenAI / GPT-5 Mini
  • Comparison rule: Only cache hit rate changes; other usage assumptions stay shared
  • Volume basis: Monthly savings and fixed monthly terms use monthly active users as the denominator
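
The volume-basis rule for fixed monthly terms can be illustrated with the embedding refresh cost. This is a sketch under stated assumptions: the corpus size and user count are hypothetical, the function name is invented for illustration, and only the $0.02 / 1M embedding price comes from the shared retrieval defaults on this page.

```python
def embedding_refresh_per_user(corpus_tokens: int,
                               price_per_million_tokens: float,
                               monthly_active_users: int) -> float:
    """Amortize a fixed monthly embedding refresh cost across active users."""
    fixed_monthly_cost = corpus_tokens / 1_000_000 * price_per_million_tokens
    return fixed_monthly_cost / monthly_active_users


# Hypothetical inputs: a 5M-token corpus refreshed monthly, 650 active users.
per_user_share = embedding_refresh_per_user(5_000_000, 0.02, 650)
print(f"${per_user_share:.6f} per user/month")
```

Because the fixed cost is divided by the user count, more active users lower the per-user share, which matches the negative sign on the Monthly Active Users sensitivity row.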

Recommended Next Step

Use this section to turn cache deltas into the next implementation checks.

Validate provider constraints and cache implementation path, then re-check assumptions.

Sources and Snapshot

Pricing comes from the current dated snapshot.

Active Pricing Row

Candidate

OpenAI / GPT-5 Mini

  • Input tokens: $0.25 / 1M
  • Output tokens: $2 / 1M

Shared retrieval defaults

  • Embedding input: $0.02 / 1M
  • Rerank docs: $1 / 1K
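
To show how these unit prices turn into per-user monthly costs, here is a minimal sketch. The prices come from the rows above. The usage inputs (120 requests per user per month, 1,000 input tokens, 350 output tokens, 20 reranked docs per request) are hypothetical values that the page does not display; they were picked because they reproduce the Generation and Reranking figures in the component breakdown. Function names are illustrative.

```python
INPUT_PRICE_PER_M = 0.25        # USD per 1M input tokens (from the pricing row)
OUTPUT_PRICE_PER_M = 2.00       # USD per 1M output tokens (from the pricing row)
RERANK_PRICE_PER_K_DOCS = 1.00  # USD per 1K reranked docs (shared default)


def generation_cost_per_request(input_tokens: int, output_tokens: int) -> float:
    """Model token spend for one request, in USD."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


def rerank_cost_per_request(docs: int) -> float:
    """Reranker spend for one request, in USD."""
    return docs * RERANK_PRICE_PER_K_DOCS / 1_000


# Hypothetical usage assumptions; not displayed on the page.
REQUESTS_PER_USER_MONTH = 120
generation_month = generation_cost_per_request(1_000, 350) * REQUESTS_PER_USER_MONTH
rerank_month = rerank_cost_per_request(20) * REQUESTS_PER_USER_MONTH
print(f"generation=${generation_month:.4f} rerank=${rerank_month:.2f}")
```

With these inputs, reranking dominates the bill, which is consistent with Rerank Docs ranking near the top of the sensitivity table.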