Prompt Overhead

What does extra prompt and context text cost me?

What this tool does

This estimator compares a baseline and a target assumption for non-task prompt tokens under shared workload inputs, then reports the cost and break-even deltas from trimming prompt overhead in an agent or assistant flow. Bigger context windows do not make repeated prompt overhead free.

How It Works

  1. Set provider/model plus workload assumptions.
  2. Set baseline non-task prompt tokens and target trimmed value.
  3. Review cost delta, overhead-share shift, and monthly savings.

Formula

baseline_input_tokens = task_tokens + baseline_non_task_tokens

candidate_input_tokens = task_tokens + target_non_task_tokens

cost_delta = cost_candidate - cost_baseline
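These definitions can be checked with a short sketch. The token counts below (580 task tokens, 420 baseline and 180 target non-task tokens) are illustrative values chosen to be consistent with the headline figures further down the page (a -240 token delta and a 42.0% to 23.7% overhead-share shift); the $0.25 / 1M input price comes from the pricing snapshot.

```python
PRICE_INPUT_PER_M = 0.25  # USD per 1M input tokens (snapshot: OpenAI / GPT-5 Mini)

def input_cost(task_tokens: int, non_task_tokens: int) -> float:
    """Input-side model spend per request, in USD."""
    return (task_tokens + non_task_tokens) * PRICE_INPUT_PER_M / 1_000_000

def overhead_share(task_tokens: int, non_task_tokens: int) -> float:
    """Fraction of input tokens that are non-task prompt overhead."""
    return non_task_tokens / (task_tokens + non_task_tokens)

TASK, BASELINE_OVERHEAD, TARGET_OVERHEAD = 580, 420, 180  # illustrative values

token_delta = TARGET_OVERHEAD - BASELINE_OVERHEAD        # -240 tokens/request
cost_delta = input_cost(TASK, TARGET_OVERHEAD) - input_cost(TASK, BASELINE_OVERHEAD)
share_before = overhead_share(TASK, BASELINE_OVERHEAD)   # 0.420
share_after = overhead_share(TASK, TARGET_OVERHEAD)      # ~0.237
```

Note this is the input side only; the full estimator also nets out cache hits and other components, so its per-request delta is smaller than the raw token-price product.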

Assumptions and Units

  • Currency: USD
  • Token unit: token
  • Task tokens stay constant between baseline and candidate runs
  • Pricing source: daily pricing snapshot in repo, no runtime scraping

Related resources: Retrieval Cost, Rerank Cost, Cache Savings, RAG or Long Prompt, Indexing Cost, Context Bloat in RAG, How Many Tokens Per Request?, How To Choose Chunk Size and Chunk Count, RAG Cost Components Explained.

Pricing snapshot: 2026-04-20
Provider: OpenAI
Model: GPT-5 Mini

Step 1 Provider and Model

Select the pricing row used for baseline and candidate prompt-overhead scenarios.

Step 2 Quick Mode

Set baseline overhead assumptions before advanced tuning.

Measure overhead from repeated policies, role prompts, and long history blocks.

Step 3 Advanced Assumptions

Tune retrieval and infra assumptions after Quick Mode is calibrated.

Scenario actions

Copy scenario URL: paste the link into ChatGPT or Claude, or share it with a teammate.

Save and track this scenario

Track pricing drift on this scenario and get an email if the latest result changes.

How tracking works

After you click Save and track, we carry this exact calculator state into the tracked-scenarios page so you can sign in and confirm the save.

We save your assumptions and the pricing snapshot used for this result.

When a newer pricing snapshot lands, we recompute the same scenario, show what changed, and email you if the latest result moved.

1 tracked scenario free, then $12/mo or $120/yr for up to 25 tracked scenarios.
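The recompute-and-compare behavior described above can be sketched as follows. The scenario and snapshot field names here are hypothetical, not the tool's actual schema, and the cost function is simplified to the input side only.

```python
def monthly_input_cost(scenario: dict, snapshot: dict) -> float:
    """Monthly input-side spend for one user under a given pricing snapshot."""
    tokens = scenario["task_tokens"] + scenario["non_task_tokens"]
    per_request = tokens * snapshot["input_usd_per_m"] / 1_000_000
    return per_request * scenario["requests_per_user_month"]

def result_moved(scenario: dict, saved: dict, latest: dict, tol: float = 1e-9) -> bool:
    """Recompute the saved scenario under the latest snapshot; True if the result changed."""
    return abs(monthly_input_cost(scenario, latest) - monthly_input_cost(scenario, saved)) > tol

scenario = {"task_tokens": 580, "non_task_tokens": 420, "requests_per_user_month": 120}
saved = {"input_usd_per_m": 0.25}
latest = {"input_usd_per_m": 0.30}  # hypothetical newer snapshot
```

When `result_moved` returns True for a tracked scenario, that is the condition under which the email notification described above would fire.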

Headline metric

Target prompt overhead lowers cost: overhead share moves from 42.0% to 23.7%.

  • Cost delta per user/month: -$0.0054
  • Monthly savings: $3.51
  • Input token delta per request: -240
  • Break-even delta: -$0.0054

Totals

Baseline vs candidate totals under the same prompt-overhead assumptions.
Metric                 Baseline    Candidate   Delta
Cost per request       $0.01627    $0.01623    -$0.00004
Cost per user/month    $1.9526     $1.9472     -$0.0054
Gross margin %         96.0%       96.0%       +0.0%
Break-even price       $1.9526     $1.9472     -$0.0054
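The margin and break-even rows follow from the cost figures. Break-even price is simply the cost per user/month (the price at which gross margin hits zero); computing the 96.0% margin requires a selling price per user/month, which the page does not show, so the $48.80 below is a hypothetical value consistent with that margin.

```python
def gross_margin(price: float, cost: float) -> float:
    """Gross margin as a fraction of the selling price."""
    return (price - cost) / price

PRICE_PER_USER_MONTH = 48.80   # hypothetical; not an input shown on this page
baseline_cost = 1.9526
candidate_cost = 1.9472

margin_baseline = gross_margin(PRICE_PER_USER_MONTH, baseline_cost)   # ~0.960
break_even = candidate_cost   # pricing below this makes the margin negative
```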

Component Breakdown

Baseline and candidate components are computed independently, then differenced.
  • Generation: model input/output token spend for requests.
  • Retrieval: extra model input spend from retrieved context chunks.
  • Reranking: reranker cost based on docs scored per request.
  • Embeddings Ingestion: amortized per-user share of the fixed monthly corpus embedding refresh cost.
  • Vector DB: vector database query cost across all requests.
  • Cache: savings from cache hits; negative means lower total cost.
  • Infra: non-model infra overhead per request.

Component             Baseline   Candidate   Delta
Generation            $0.114     $0.1068     -$0.0072
Retrieval             $0.0396    $0.0396     $0
Reranking             $2.40      $2.40       $0
Embeddings Ingestion  $0         $0          $0
Vector DB             $0.0018    $0.0018     $0
Cache                 -$0.6508   -$0.649     +$0.0018
Infra                 $0.048     $0.048      $0
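The "computed independently, then differenced" rule can be sketched with the component figures above; only the components touched by the prompt trim (generation and cache) change, and the per-component deltas sum to the headline per-user/month delta.

```python
baseline = {
    "generation": 0.114, "retrieval": 0.0396, "reranking": 2.40,
    "embeddings_ingestion": 0.0, "vector_db": 0.0018,
    "cache": -0.6508, "infra": 0.048,
}
# Only the components affected by the prompt trim differ; the rest carry over.
candidate = {**baseline, "generation": 0.1068, "cache": -0.649}

delta = {k: round(candidate[k] - baseline[k], 4) for k in baseline}
total_delta = round(sum(candidate.values()) - sum(baseline.values()), 4)  # -0.0054
```

The `total_delta` of -0.0054 matches the headline cost delta per user/month.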
Sensitivity Ranking

Delta in total cost if one variable increases by 10%.

  • Requests Per User Month (user activity level per month): 10.00%
  • Rerank Docs (docs reranked per request): 9.24%
  • Cache Hit Rate (fraction of requests served by cache): -3.33%
  • Output Tokens (generated tokens per request): 0.32%
  • Retrieved Chunks (retrieved chunk count per request): 0.15%
  • Tokens Per Chunk (average chunk size in tokens): 0.15%
  • Input Tokens (prompt-side tokens per request): 0.09%
  • Vector Queries Per Request (vector query count per request): 0.01%
  • Monthly Active Users (active-user estimate used to amortize the fixed monthly embedding refresh): -0.00%
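The ranking above can be reproduced in miniature by bumping one variable at a time and recomputing. The toy cost function below is an assumption (generation spend only, linear in requests and tokens), not the estimator's full model, and the workload numbers are illustrative.

```python
PRICE_IN, PRICE_OUT = 0.25 / 1e6, 2.0 / 1e6  # USD per token, from the snapshot

def toy_cost(v: dict) -> float:
    """Simplified monthly cost: generation spend only."""
    per_request = v["input_tokens"] * PRICE_IN + v["output_tokens"] * PRICE_OUT
    return v["requests_per_user_month"] * per_request

def sensitivity(base: dict, cost_fn, bump: float = 0.10) -> dict:
    """Percent change in total cost when each variable is raised by `bump`, ranked by magnitude."""
    c0 = cost_fn(base)
    deltas = {k: (cost_fn({**base, k: v * (1 + bump)}) - c0) / c0 * 100
              for k, v in base.items()}
    return dict(sorted(deltas.items(), key=lambda kv: abs(kv[1]), reverse=True))

base = {"requests_per_user_month": 120, "input_tokens": 1000, "output_tokens": 350}
ranked = sensitivity(base, toy_cost)
```

Because the toy cost scales linearly with request count, bumping requests moves total cost by exactly 10%, which is why that variable tops the real ranking as well.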

Assumptions and Units

Explicit assumptions keep this comparison reproducible.
  • Currency: USD
  • Token unit: token
  • Pricing snapshot: 2026-04-20
  • Selected model row: OpenAI / GPT-5 Mini
  • Comparison rule: task tokens stay fixed; only non-task prompt overhead changes
  • Volume basis: monthly savings and fixed monthly terms use monthly active users as the denominator

Recommended Next Step

Use the deltas above to decide which prompt and infrastructure checks to run next.

Confirm infrastructure options and the trim plan before applying prompt-template changes globally.

Sources and Snapshot

Pricing comes from the current dated snapshot.

Active Pricing Row

Candidate: OpenAI / GPT-5 Mini

  • Input tokens: $0.25 / 1M
  • Output tokens: $2 / 1M

Shared retrieval defaults

  • Embedding input: $0.02 / 1M
  • Rerank docs: $1 / 1K
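As a sanity check on the retrieval defaults, the $1 / 1K rerank price reproduces the $2.40 baseline Reranking component for a plausible workload. The request and doc counts below are illustrative values chosen to match that figure, not inputs read from the page.

```python
RERANK_USD_PER_K_DOCS = 1.0   # from the shared retrieval defaults
EMBED_USD_PER_M = 0.02        # embedding input price, per 1M tokens

# Illustrative workload: 120 requests/user/month, 20 docs reranked per request.
requests, docs_per_request = 120, 20
rerank_cost = requests * docs_per_request * RERANK_USD_PER_K_DOCS / 1_000
# 2,400 docs scored at $1 / 1K -> $2.40/user/month
```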