Rerank Cost

What does reranking more results cost me?

How this tool works

This simulator runs the deterministic workflow economics model twice with shared assumptions, isolating the effect of rerank depth on reranking cost, total cost per user, and break-even price.

  1. Set provider/model plus workload assumptions used in both runs.
  2. Set baseline and candidate rerank docs per request.
  3. Compare reranking cost, total cost delta, and break-even delta.

Formula

reranking_cost_per_request = rerank_docs / 1000 * rerank_per_1k_docs

total_cost_delta = cost_candidate - cost_baseline
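The formulas above can be sketched directly. The $1 / 1K docs price matches the snapshot's shared retrieval defaults, and the 28-vs-20 doc depths are the values used in this comparison.

```python
def reranking_cost_per_request(rerank_docs: float, rerank_per_1k_docs: float) -> float:
    """Cost of scoring `rerank_docs` documents at a per-1K-docs price."""
    return rerank_docs / 1000 * rerank_per_1k_docs

# Baseline reranks 28 docs/request, candidate 20, at $1 / 1K docs.
baseline = reranking_cost_per_request(28, 1.0)   # $0.028 / request
candidate = reranking_cost_per_request(20, 1.0)  # $0.020 / request
per_request_delta = candidate - baseline         # -$0.008 / request
```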

Assumptions and Units

  • Currency: USD
  • Rerank unit: documents scored per request
  • Baseline and candidate use the same non-rerank assumptions
  • Pricing source: daily pricing snapshot in repo, no runtime scraping

Related resources: Retrieval Cost, Cache Savings, Prompt Overhead, RAG or Long Prompt, Indexing Cost, AI Research Assistant Cost per Report, What Is Reranking in RAG?, RAG Cost Components Explained, Model Selection: Quality vs Unit Cost.

Pricing snapshot: 2026-04-20
Provider: OpenAI
Model: GPT-5 Mini

Step 1 Provider and Model

Choose the pricing row used for both baseline and candidate rerank assumptions.

Step 2 Quick Mode

Set the baseline workload first, then compare rerank depth assumptions.

Check whether a narrower rerank set can preserve quality before changing models.

Step 3 Advanced Assumptions

Adjust baseline rerank depth and other shared cost assumptions after Quick Mode is close.


Save and track this scenario

Track pricing drift on this scenario and get an email if the latest result changes.

How tracking works

After you click Save and track, we carry this exact calculator state into the tracked-scenarios page so you can sign in and confirm the save.

We save your assumptions and the pricing snapshot used for this result.

When a newer pricing snapshot lands, we recompute the same scenario, show what changed, and email you if the latest result moved.

1 tracked scenario free, then $12/mo or $120/yr for up to 25 tracked scenarios.

Headline metric

Candidate rerank plan lowers cost

Total cost delta per user / month: -$0.72

Candidate rerank docs / request: 20 vs baseline 28. Reranking cost / request: $0.02 vs baseline $0.028.

  • Cost delta / user / month: -$0.72
  • Reranking cost delta: -$0.96 / user / month
  • Break-even delta: -$0.72
  • Monthly cost delta: -$468

Totals

Baseline and candidate totals; only the rerank assumptions differ, all other inputs are shared.

  Metric                 Baseline    Candidate   Delta
  Cost per request       $0.02227    $0.01627    -$0.006
  Cost per user/month    $2.6726     $1.9526     -$0.72
  Gross margin           94.5%       96.0%       +1.5 pts
  Break-even price       $2.6726     $1.9526     -$0.72
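A short sketch reproducing the headline totals. The 120 requests per user/month and 650 monthly active users are not stated on this page; they are inferred from the displayed ratios ($2.6726 / $0.02227 per request, and -$468 / -$0.72 per user), so treat them as assumptions.

```python
REQUESTS_PER_USER_MONTH = 120  # inferred assumption
MONTHLY_ACTIVE_USERS = 650     # inferred assumption

cost_req_baseline = 0.0222717   # $/request, unrounded
cost_req_candidate = 0.0162717  # $/request

cost_user_baseline = cost_req_baseline * REQUESTS_PER_USER_MONTH    # ~= $2.6726
cost_user_candidate = cost_req_candidate * REQUESTS_PER_USER_MONTH  # ~= $1.9526
delta_per_user = cost_user_candidate - cost_user_baseline           # -$0.72
monthly_delta = delta_per_user * MONTHLY_ACTIVE_USERS               # -$468

# Break-even price is the zero-margin price, i.e. cost per user/month,
# so the break-even delta equals the per-user cost delta.
break_even_delta = delta_per_user
```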

Component Breakdown

Baseline and candidate components are computed independently, then differenced.
  • Generation: model input/output token spend for requests.
  • Retrieval: extra model input spend from retrieved context chunks.
  • Reranking: reranker cost based on docs scored per request.
  • Embeddings ingestion: amortized per-user share of the fixed monthly corpus embedding refresh cost.
  • Vector DB: vector database query cost across all requests.
  • Cache: savings from cache hits; negative means lower total cost.
  • Infra: non-model infra overhead per request.

  Component              Baseline    Candidate   Delta
  Generation             $0.114      $0.114      $0
  Retrieval              $0.0396     $0.0396     $0
  Reranking              $3.36       $2.40       -$0.96
  Embeddings ingestion   $0          $0          $0
  Vector DB              $0.0018     $0.0018     $0
  Cache                  -$0.8908    -$0.6508    +$0.24
  Infra                  $0.048      $0.048      $0
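The per-component figures sum exactly to the cost-per-user/month totals, which makes for a quick consistency check:

```python
baseline = {
    "generation": 0.114, "retrieval": 0.0396, "reranking": 3.36,
    "embeddings_ingestion": 0.0, "vector_db": 0.0018,
    "cache": -0.8908, "infra": 0.048,
}
# Only reranking, and the cache savings it drives, change in the candidate.
candidate = {**baseline, "reranking": 2.4, "cache": -0.6508}

total_baseline = sum(baseline.values())    # $2.6726 per user/month
total_candidate = sum(candidate.values())  # $1.9526 per user/month
```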
Sensitivity Ranking

Delta in total cost if one variable increases by 10%.

  • Requests per user/month (user activity level per month): 10.00%
  • Rerank docs (docs reranked per request): 9.22%
  • Cache hit rate (fraction of requests served by cache): -3.33%
  • Output tokens (generated tokens per request): 0.32%
  • Retrieved chunks (retrieved chunk count per request): 0.15%
  • Tokens per chunk (average chunk size in tokens): 0.15%
  • Input tokens (prompt-side tokens per request): 0.12%
  • Vector queries per request: 0.01%
  • Monthly active users (amortizes the fixed monthly embedding refresh): -0.00%
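The ranking is a one-at-a-time bump: raise a single input by 10%, recompute, and report the fractional change in total cost. The toy cost model below only illustrates the mechanics; it folds cache and all non-rerank spend into one inferred net figure, so only the perfectly linear requests-per-user/month variable reproduces its table value (10.00%) exactly.

```python
def total_cost(a: dict) -> float:
    """Toy per-user/month cost: requests x (rerank cost + net other cost)."""
    rerank = a["rerank_docs"] / 1000 * a["rerank_per_1k_docs"]
    return a["requests_per_user_month"] * (rerank + a["other_cost_per_request"])

def sensitivity(a: dict, var: str, bump: float = 0.10) -> float:
    """Fractional change in total cost when `var` increases by `bump`."""
    bumped = {**a, var: a[var] * (1 + bump)}
    return (total_cost(bumped) - total_cost(a)) / total_cost(a)

base = {
    "requests_per_user_month": 120,        # inferred assumption
    "rerank_docs": 28,
    "rerank_per_1k_docs": 1.0,
    "other_cost_per_request": -0.0057283,  # inferred: net of cache savings
}
```

Because the toy model omits the cache interaction, `sensitivity(base, "rerank_docs")` comes out higher than the simulator's 9.22%.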

Assumptions and Units

Explicit assumptions keep this comparison reproducible.
  • Currency: USD
  • Token unit: token
  • Rerank unit: documents scored per request
  • Pricing snapshot: 2026-04-20
  • Selected model row: OpenAI / GPT-5 Mini
  • Comparison rule: only rerank depth changes; non-rerank inputs stay shared

Recommended Next Step

Use this section to translate the rerank-depth change into concrete quality and infrastructure checks.

Validate retrieval quality and infra assumptions before increasing rerank depth on live traffic.

Sources and Snapshot

Pricing comes from the current dated snapshot.

Active Pricing Row

Candidate: OpenAI / GPT-5 Mini

  • Input tokens: $0.25 / 1M
  • Output tokens: $2 / 1M

Shared retrieval defaults

  • Embedding input: $0.02 / 1M
  • Rerank docs: $1 / 1K