Retrieval Cost

How much does retrieval add when I change chunk size or depth?

How this tool works

This simulator runs the deterministic retrieval-workflow economics model twice with shared assumptions, isolating what changes to chunk count and chunk size do to cost per user and break-even price.

How It Works

  1. Set provider/model plus workload assumptions used in both runs.
  2. Set baseline and candidate chunk assumptions.
  3. Compare retrieval tokens, cost deltas, and break-even deltas.

Formula

retrieval_tokens_per_request = retrieved_chunks * tokens_per_chunk

total_cost_delta = cost_candidate - cost_baseline
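The two formulas above can be sketched in a few lines of Python. This is a minimal sketch, not the simulator's model: the 120 requests/user/month workload and the 8-chunk split (8 × 210 and 8 × 165 tokens) are hypothetical assumptions chosen to reproduce the token totals shown on this page; the input price is this page's snapshot rate.

```python
# Sketch of retrieval_tokens_per_request and total_cost_delta.
# Assumptions (not from the page's inputs): 120 requests/user/month,
# and an 8-chunk decomposition of the 1,680 / 1,320 token totals.

INPUT_PRICE_PER_TOKEN = 0.25 / 1_000_000  # GPT-5 Mini input: $0.25 / 1M tokens
REQUESTS_PER_USER_MONTH = 120             # assumed workload

def retrieval_tokens_per_request(retrieved_chunks: int, tokens_per_chunk: int) -> int:
    return retrieved_chunks * tokens_per_chunk

def retrieval_cost_per_user_month(tokens_per_request: int) -> float:
    return tokens_per_request * INPUT_PRICE_PER_TOKEN * REQUESTS_PER_USER_MONTH

baseline = retrieval_tokens_per_request(8, 210)   # 1,680 tokens/request
candidate = retrieval_tokens_per_request(8, 165)  # 1,320 tokens/request

cost_delta = (retrieval_cost_per_user_month(candidate)
              - retrieval_cost_per_user_month(baseline))
print(f"{baseline=} {candidate=} retrieval cost delta/user/month = {cost_delta:+.4f}")
```

Under these assumptions the retrieval component alone moves by about -$0.0108 per user per month; the headline total delta is smaller because other components (such as cache savings) shift in the opposite direction.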

Assumptions and Units

  • Currency: USD
  • Token unit: token
  • Baseline and candidate use the same non-chunk assumptions
  • Pricing source: daily pricing snapshot in repo, no runtime scraping

Related resources: AI Workflow Cost, Break-even Price, Compare Model Costs, Rerank Cost, Cache Savings, Prompt Overhead, RAG or Long Prompt, Indexing Cost, AI Docs Assistant Cost per User, How To Choose Chunk Size and Chunk Count, What Is RAG?, RAG Cost Components Explained.

Pricing snapshot: 2026-04-09
Provider: OpenAI
Model: GPT-5 Mini

Step 1 Provider and Model

Choose the model row used for both baseline and candidate retrieval assumptions.

Step 2 Quick Mode

Set the baseline scenario first, then tune chunk assumptions.

Evaluate chunk compression before changing reranking or model tier.

Step 3 Advanced Assumptions

Adjust baseline retrieval and advanced cost assumptions only after Quick Mode is close.

Scenario actions

Copy scenario URL

Paste into ChatGPT or Claude, or share with a teammate.

Save and track this scenario

Track pricing drift on this scenario and get an email if the latest result changes.

How tracking works

After you click Save and track, we carry this exact calculator state into the tracked-scenarios page so you can sign in and confirm the save.

We save your assumptions and the pricing snapshot used for this result.

When a newer pricing snapshot lands, we recompute the same scenario, show what changed, and email you if the latest result moved.

1 tracked scenario free, then $12/mo or $120/yr for up to 25 tracked scenarios.

Headline metric

Candidate chunk plan lowers cost

Total cost delta per user / month: -$0.0081

Candidate retrieval tokens / request: 1,320 vs baseline 1,680.

Cost delta / user / month: -$0.0081

Retrieval token delta / request: -360

Break-even delta: -$0.0081

Monthly cost delta: -$5.26

Totals

Baseline and candidate totals under shared non-chunk assumptions.
| Metric | Baseline | Candidate | Delta |
|---|---|---|---|
| Cost per request | $0.01634 | $0.01627 | -$0.00007 |
| Cost per user/month | $1.9607 | $1.9526 | -$0.0081 |
| Gross margin % | 96.0% | 96.0% | +0.0% |
| Break-even price | $1.9607 | $1.9526 | -$0.0081 |
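The margin and break-even rows follow directly from the cost totals. A minimal sketch, assuming a price of $49 per user per month (a hypothetical figure, chosen only because it reproduces the 96.0% margin shown; the page does not state the price):

```python
# Gross margin and break-even under an assumed $49/user/month price.
# Break-even price is the price at which margin hits 0%, i.e. price == cost.

PRICE_PER_USER_MONTH = 49.00  # hypothetical

def gross_margin_pct(cost_per_user_month: float,
                     price: float = PRICE_PER_USER_MONTH) -> float:
    return (1 - cost_per_user_month / price) * 100

baseline_cost, candidate_cost = 1.9607, 1.9526
print(f"baseline margin  = {gross_margin_pct(baseline_cost):.1f}%")
print(f"candidate margin = {gross_margin_pct(candidate_cost):.1f}%")
print(f"break-even delta = {candidate_cost - baseline_cost:+.4f}")
```

This is also why the break-even delta equals the cost-per-user/month delta: break-even price tracks cost one-for-one.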

Component Breakdown

Baseline and candidate components are computed independently, then differenced.
| Component | Description | Baseline | Candidate | Delta |
|---|---|---|---|---|
| Generation | Model input/output token spend for requests. | $0.114 | $0.114 | $0 |
| Retrieval | Extra model input spend from retrieved context chunks. | $0.0504 | $0.0396 | -$0.0108 |
| Reranking | Reranker cost based on docs scored per request. | $2.40 | $2.40 | $0 |
| Embeddings Ingestion | Amortized per-user share of the fixed monthly corpus embedding refresh cost. | $0 | $0 | $0 |
| Vector DB | Vector database query cost across all requests. | $0.0018 | $0.0018 | $0 |
| Cache | Savings from cache hits; negative means lower total cost. | -$0.6536 | -$0.6508 | +$0.0027 |
| Infra | Non-model infra overhead per request. | $0.048 | $0.048 | $0 |
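A quick way to sanity-check the breakdown is that the component rows should sum to the cost-per-user/month totals, with small differences attributable to rounding. A sketch using the figures reported above:

```python
# Consistency check: component rows should sum to the per-user/month totals
# ($1.9607 baseline, $1.9526 candidate); tiny gaps are display rounding.

baseline = {
    "generation": 0.114, "retrieval": 0.0504, "reranking": 2.4,
    "embeddings": 0.0, "vector_db": 0.0018, "cache": -0.6536, "infra": 0.048,
}
# Only retrieval and cache differ in the candidate plan.
candidate = {**baseline, "retrieval": 0.0396, "cache": -0.6508}

print(f"baseline sum  = {sum(baseline.values()):.4f}")
print(f"candidate sum = {sum(candidate.values()):.4f}")
```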
Sensitivity Ranking

Delta in total cost if one variable increases by 10%.

| Variable | Description | Cost delta % |
|---|---|---|
| Requests per user/month | User activity level per month. | 10.00% |
| Rerank docs | Docs reranked per request. | 9.22% |
| Cache hit rate | Fraction of requests served by cache. | -3.33% |
| Output tokens | Generated tokens per request. | 0.32% |
| Retrieved chunks | Retrieved chunk count per request. | 0.15% |
| Tokens per chunk | Average chunk size in tokens. | 0.15% |
| Input tokens | Prompt-side tokens per request. | 0.12% |
| Vector queries per request | Vector query count per request. | 0.01% |
| Monthly active users | Active-user estimate used to amortize fixed monthly embedding refresh. | -0.00% |
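The ranking above is a standard one-at-a-time sensitivity: bump each input by 10%, re-run the model, and report the percentage change in total cost. A sketch of the mechanism, using a toy cost function rather than the simulator's actual model:

```python
# One-at-a-time sensitivity: bump each input by 10%, report % change in cost.
# toy_cost is a hypothetical stand-in for the simulator's model.

from typing import Callable, Dict

def sensitivity(cost: Callable[[Dict[str, float]], float],
                params: Dict[str, float], bump: float = 0.10) -> Dict[str, float]:
    base = cost(params)
    out = {}
    for name in params:
        bumped = {**params, name: params[name] * (1 + bump)}
        out[name] = (cost(bumped) - base) / base * 100
    return out

def toy_cost(p: Dict[str, float]) -> float:
    # Per-request spend, scaled by volume and discounted by cache hits.
    per_request = p["retrieved_chunks"] * p["tokens_per_chunk"] * 0.25e-6 + 0.02
    return p["requests_per_month"] * per_request * (1 - 0.5 * p["cache_hit_rate"])

ranks = sensitivity(toy_cost, {
    "requests_per_month": 120, "retrieved_chunks": 8,
    "tokens_per_chunk": 210, "cache_hit_rate": 0.4,
})
print(sorted(ranks.items(), key=lambda kv: abs(kv[1]), reverse=True))
```

A variable the cost is purely linear in (here, requests per month) always shows exactly 10.00%, which is why it tops the ranking; cache hit rate shows a negative delta because more hits lower total cost.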

Assumptions and Units

Explicit assumptions keep this comparison reproducible.
  • Currency: USD
  • Token unit: token
  • Pricing snapshot: 2026-04-09
  • Selected model row: OpenAI / GPT-5 Mini
  • Comparison rule: only retrieval chunk assumptions change; non-retrieval inputs stay shared
  • Volume basis: fixed monthly terms and business totals use monthly active users as the denominator

Recommended Next Step

Use this section to translate chunk deltas into the next quality and infrastructure checks.

Validate infra and retrieval assumptions, then confirm quality on sampled traffic.

Sources and Snapshot

Pricing comes from the current dated snapshot.

Active Pricing Row

Candidate

OpenAI / GPT-5 Mini

  • Input tokens: $0.25 / 1M
  • Output tokens: $2 / 1M

Shared retrieval defaults

  • Embedding input: $0.02 / 1M
  • Rerank docs: $1 / 1K
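At the shared default of $1 per 1K reranked docs, the reranking component is easy to reproduce by hand. A back-of-envelope sketch, assuming 20 docs reranked per request and 120 requests per user per month (both hypothetical figures; they happen to reproduce the $2.40 component reported above):

```python
# Rerank spend at $1 / 1K docs, under assumed workload figures:
# 20 docs/request and 120 requests/user/month (hypothetical).

RERANK_PRICE_PER_DOC = 1 / 1_000  # $1 per 1K docs

rerank_cost = 20 * 120 * RERANK_PRICE_PER_DOC  # per user per month
print(rerank_cost)  # 2.4
```

Because reranking dwarfs the retrieval token spend here, cutting reranked docs would move total cost far more than any chunk-size change, which matches its high position in the sensitivity ranking.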