What does this cache tool report?

It reports savings per user, monthly aggregate savings, cache-savings delta, and break-even delta between baseline and target hit rates for a workflow.

Can cache improvements increase costs?

No, not when the hit rate rises. However, stale cache policies can reduce answer quality, so pair hit-rate improvements with freshness checks.

Cache Savings Simulator

Pricing snapshot: 2026-07-20
Provider: OpenAI
Model: GPT-5 Mini

Step 1 Provider and Model

Provider
Model

Step 2 Quick Mode

Use-case preset

Estimate whether cache quality and freshness work can pay back this quarter.

Requests per user / month: Expected user activity each month.
Base prompt tokens / request: Non-retrieval prompt tokens per request.
Output tokens / request: Average response tokens.
Candidate cache hit rate (0 to 0.99): Target hit rate after cache improvements.
Baseline cache hit rate (0 to 0.99)
Monthly active users
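A minimal sketch of how the Quick Mode fields could be held and validated before any cost arithmetic. The class name `QuickModeInputs` and its field names are hypothetical, not the tool's own code. The only rule taken from the page is that both hit rates lie between 0 and 0.99. In the usage line, only the two hit rates come from the page; the other values are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuickModeInputs:
    """Hypothetical container for the Quick Mode fields listed above."""

    requests_per_user_month: float
    base_prompt_tokens: float
    output_tokens: float
    baseline_hit_rate: float
    candidate_hit_rate: float
    monthly_active_users: int

    def __post_init__(self) -> None:
        # The form restricts both hit rates to the range 0 to 0.99.
        for name in ("baseline_hit_rate", "candidate_hit_rate"):
            value = getattr(self, name)
            if not 0.0 <= value <= 0.99:
                raise ValueError(f"{name} must be between 0 and 0.99, got {value}")
        # Assumption: negative volumes or token counts are rejected.
        for name in (
            "requests_per_user_month",
            "base_prompt_tokens",
            "output_tokens",
            "monthly_active_users",
        ):
            if getattr(self, name) < 0:
                raise ValueError(f"{name} must be non-negative")

    @property
    def hit_rate_lift_points(self) -> float:
        """Lift in percentage points, e.g. 18% to 25% is 7 points."""
        return (self.candidate_hit_rate - self.baseline_hit_rate) * 100


# Hit rates are from the page; every other value is a placeholder.
example = QuickModeInputs(120, 1000, 350, 0.18, 0.25, 650)
print(f"{example.hit_rate_lift_points:.2f} points")
```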

Optional Advanced assumptions

Retrieved chunks / request: Average chunk count in retrieved context.
Tokens per chunk: Average chunk size in tokens.
Rerank docs / request: Docs scored by reranker each request.
Embedding ingestion tokens / month: Monthly corpus updates re-embedded.
Vector queries / request: Vector DB lookups each request.
Vector cost / query (USD): Average per-query vector DB cost.
Infra cost / request (USD): Non-model compute/network overhead.
Price per user / month (USD): Current list price for margin context.

Scenario actions

Save and track this scenario

Track pricing drift on this scenario and get an email if the latest result changes.

How tracking works

After you click Save and track, we carry this exact calculator state into the tracked-scenarios page so you can sign in and confirm the save.

We save your assumptions and the pricing snapshot used for this result.

When a newer pricing snapshot lands, we recompute the same scenario, show what changed, and email you if the latest result moved.

1 tracked scenario free, then $12/mo or $120/yr for up to 25 tracked scenarios.

Headline metric

Cache plan saves money

Savings per user / month: $0.1822

Hit-rate lift: 7.00 points (18.0% to 25.0%).

Monthly savings

$118.45

Cache savings delta

+$0.1822

Equivalent free requests

10.2
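The "Equivalent free requests" figure is consistent with dividing the per-user monthly savings by the baseline cost per request shown under Totals. The page does not state this formula, so treat the sketch below as an inference from the displayed numbers rather than the tool's definition.

```python
# Values copied from the page (USD).
savings_per_user_month = 0.1822
baseline_cost_per_request = 0.01779

# Assumed relationship: how many baseline-priced requests the savings would pay for.
equivalent_free_requests = savings_per_user_month / baseline_cost_per_request
print(round(equivalent_free_requests, 1))  # 10.2
```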

Totals

Metric	Baseline	Candidate	Delta
Cost per request	$0.01779	$0.01627	-$0.00152
Cost per user/month	$2.1348	$1.9526	-$0.1822
Gross margin %	95.6%	96.0%	+0.4%
Break-even price	$2.1348	$1.9526	-$0.1822
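The Totals rows can be cross-checked with a few lines of arithmetic. The sketch below assumes a list price of $49 per user per month, which is not shown on the page; it is simply a value that reproduces the displayed margins. It also treats the break-even price as the price at which gross margin is zero, which matches the table, where break-even equals cost per user/month.

```python
ASSUMED_PRICE_PER_USER = 49.00  # hypothetical; the price input is not shown on the page


def gross_margin_pct(price: float, cost: float) -> float:
    """Gross margin as a percentage of price."""
    return (price - cost) / price * 100


# Cost per user/month from the table (USD).
baseline_cost = 2.1348
candidate_cost = 1.9526

baseline_margin = gross_margin_pct(ASSUMED_PRICE_PER_USER, baseline_cost)
candidate_margin = gross_margin_pct(ASSUMED_PRICE_PER_USER, candidate_cost)
print(f"{baseline_margin:.1f}% -> {candidate_margin:.1f}%")  # 95.6% -> 96.0%

# Request volume implied by the first two rows (cost per user / cost per request).
implied_requests_per_user = baseline_cost / 0.01779
print(round(implied_requests_per_user))  # 120
```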

Component Breakdown

Component (USD per user / month)	Baseline	Candidate	Delta
Generation	$0.114	$0.114	$0
Retrieval	$0.0396	$0.0396	$0
Reranking	$2.4	$2.4	$0
Embeddings Ingestion	$0	$0	$0
Vector DB	$0.0018	$0.0018	$0
Cache	-$0.4686	-$0.6508	-$0.1822
Infra	$0.048	$0.048	$0
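The component figures appear to follow a simple pattern: the cache line equals the hit rate times the sum of all other components, so net cost is the gross cost scaled by one minus the hit rate. This is inferred from the displayed numbers, not from a published formula, and the sketch below only checks that the inference reproduces the page's totals to within display rounding.

```python
import math

# Non-cache components from the table (USD per user / month).
components = {
    "generation": 0.114,
    "retrieval": 0.0396,
    "reranking": 2.4,
    "embeddings_ingestion": 0.0,
    "vector_db": 0.0018,
    "infra": 0.048,
}
gross = sum(components.values())


def cache_credit(gross_cost: float, hit_rate: float) -> float:
    """Assumed model: the cache removes hit_rate of the gross cost."""
    return -gross_cost * hit_rate


def net_cost(gross_cost: float, hit_rate: float) -> float:
    return gross_cost + cache_credit(gross_cost, hit_rate)


baseline = net_cost(gross, 0.18)
candidate = net_cost(gross, 0.25)
savings = baseline - candidate

# Compare against the page's figures, allowing for display rounding.
print(math.isclose(baseline, 2.1348, abs_tol=2e-4))
print(math.isclose(candidate, 1.9526, abs_tol=2e-4))
print(math.isclose(savings, 0.1822, abs_tol=2e-4))
```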

Sensitivity Ranking

Variable	Cost delta %
Requests per user / month	10.00%
Rerank docs	9.22%
Cache hit rate	-3.33%
Output tokens	0.32%
Retrieved chunks	0.15%
Tokens per chunk	0.15%
Input tokens	0.12%
Vector queries per request	0.01%
Monthly active users	0.00%
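The ranking is consistent with a one-at-a-time test in which each input is raised by 10% on the candidate scenario and the change in cost per user/month is recorded. That interpretation is an inference; the page does not describe the method. The sketch below reproduces three rows under assumed inputs (120 requests per user, 20 rerank docs per request, and a lumped `other_cost_per_request`), all of which are back-calculated from the displayed results rather than shown on the page.

```python
def cost_per_user_month(
    requests: float,
    rerank_docs: float,
    hit_rate: float,
    other_cost_per_request: float = 0.001695,  # assumed: all non-rerank cost, USD
    rerank_price_per_doc: float = 0.001,       # $1 / 1K docs from the pricing snapshot
) -> float:
    """Assumed model: gross per-request cost scaled by (1 - hit rate)."""
    per_request = other_cost_per_request + rerank_docs * rerank_price_per_doc
    return requests * per_request * (1 - hit_rate)


# Candidate scenario; requests and rerank_docs are inferred, not displayed.
base = {"requests": 120, "rerank_docs": 20, "hit_rate": 0.25}


def sensitivity_pct(name: str, bump: float = 0.10) -> float:
    """Percent change in cost when one input is raised by `bump`."""
    bumped = dict(base, **{name: base[name] * (1 + bump)})
    return (cost_per_user_month(**bumped) / cost_per_user_month(**base) - 1) * 100


for name in base:
    print(f"{name}: {sensitivity_pct(name):+.2f}%")
# requests: +10.00%
# rerank_docs: +9.22%
# hit_rate: -3.33%
```

A negative value for cache hit rate means cost falls as the input rises, which is why it ranks by magnitude rather than sign.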

Assumptions and Units

Currency: USD
Token unit: token
Pricing snapshot: 2026-07-20
Selected model row: OpenAI / GPT-5 Mini
Comparison rule: Only cache hit rate changes; other usage assumptions stay shared
Volume basis: Monthly savings and fixed monthly terms use monthly active users as the denominator
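As a rough illustration of the volume basis, monthly savings should equal per-user savings times monthly active users. The active-user count is not displayed on the page, so the sketch below backs it out from the two displayed figures; the result is an inference and is subject to rounding in the per-user value.

```python
# Values copied from the page (USD).
monthly_savings = 118.45
savings_per_user_month = 0.1822

# Assumed relationship: monthly_savings = savings_per_user_month * monthly_active_users.
implied_monthly_active_users = monthly_savings / savings_per_user_month
print(round(implied_monthly_active_users))  # 650
```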

Recommended Next Step

Validate provider constraints and cache implementation path, then re-check assumptions.

Cache tuning references

What Cache Hit Rate Means for RAG
RAG Cost Components Explained

Sources and Snapshot

Active Pricing Row

Candidate

OpenAI / GPT-5 Mini

Input tokens: $0.25 / 1M
Output tokens: $2 / 1M

Shared retrieval defaults

Embedding input: $0.02 / 1M
Rerank docs: $1 / 1K
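These unit prices convert to per-user monthly costs by scaling usage to the quoted unit (per million tokens, per thousand docs). The sketch below shows the conversion with hypothetical usage values (1,000 input tokens, 350 output tokens, 20 rerank docs per request, 120 requests per user per month). None of these inputs are displayed on the page; they were chosen because they reproduce the Generation and Reranking figures in the component breakdown.

```python
import math


def token_cost(tokens: float, usd_per_million: float) -> float:
    """Cost of `tokens` at a price quoted per 1M tokens."""
    return tokens / 1_000_000 * usd_per_million


def rerank_cost(docs: float, usd_per_thousand: float) -> float:
    """Cost of scoring `docs` at a price quoted per 1K docs."""
    return docs / 1_000 * usd_per_thousand


# Hypothetical usage; only the unit prices come from the pricing snapshot.
requests_per_user_month = 120
input_tokens, output_tokens, rerank_docs = 1_000, 350, 20

generation_per_request = token_cost(input_tokens, 0.25) + token_cost(output_tokens, 2.0)
generation_per_user = generation_per_request * requests_per_user_month
reranking_per_user = rerank_cost(rerank_docs, 1.0) * requests_per_user_month

print(math.isclose(generation_per_user, 0.114))  # True
print(math.isclose(reranking_per_user, 2.4))     # True
```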

Snapshot date: 2026-07-20
Source links and update notes: Pricing Snapshot Reference
