Step 1 Baseline and Candidate Models
Set the current RAG model as the baseline and the long-context alternative as the candidate.

Step 2 Quick Mode
Set shared workload assumptions, then size the long-context prompt. Compare wide-docs retrieval against a larger single prompt before changing search architecture.
Step 3 Advanced Assumptions
Tune the baseline retrieval stack and candidate cache/infra only after Quick Mode results are close.
Scenario Share URL
Share this link to load these exact assumptions; pricing uses the published snapshot shown on the page. Paste it into ChatGPT or Claude to discuss this scenario.
Decision Signal
Long-context path lowers modeled cost: switching to GPT-5.2 and removing the retrieval stack changes cost per user by -$0.2397 and margin by +0.8%.
- RAG request input tokens: 2,220
- Long-context prompt tokens: 2,600
- Retrieval-stack delta: -$0.9807
- Monthly cost impact: -$191.73

Top Cost Drivers
Baseline sensitivity: cost change when one variable is increased by 10%.

Totals
Baseline vs candidate totals under the same task assumptions.

| Metric | RAG baseline | Long context | Delta |
|---|---|---|---|
| Cost per request | $0.0108 | $0.00681 | -$0.00399 |
| Cost per user / month | $0.648 | $0.4084 | -$0.2397 |
| Gross margin % | 97.8% | 98.6% | +0.8% |
| Break-even price | $0.648 | $0.4084 | -$0.2397 |
| Monthly gross profit impact (at 800 active users) | | | +$191.73 |
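These totals can be reproduced from the pricing snapshot plus a handful of workload inputs. The sketch below uses back-solved, illustrative assumptions that happen to match the table (60 requests per user per month, 250 output tokens, 900 prompt tokens plus 1,320 retrieved tokens, cache hit rates of 38% baseline and 18% candidate, and a $30 monthly price); none of these are published on the page.

```python
# Rebuild the Totals table from per-token prices and assumed workload inputs.
# All workload values marked "assumed" are back-solved, not published inputs.
REQS = 60                 # requests per user per month (assumed)
OUT_TOK = 250             # generated tokens per request (assumed)
PROMPT_TOK = 900          # prompt-side tokens before retrieval (assumed)
RETRIEVED_TOK = 1320      # retrieved-chunk tokens per request (assumed)
LC_PROMPT_TOK = 2600      # long-context prompt tokens (from the page)
RERANK_DOCS = 16          # docs reranked per request (assumed)
CACHE_RAG, CACHE_LC = 0.38, 0.18   # cache hit rates (assumed)
PRICE = 30.0              # monthly price per user (assumed)
USERS = 800

MINI_IN, MINI_OUT = 0.25e-6, 2.0e-6   # GPT-5 Mini, $ per token
LC_IN, LC_OUT = 1.75e-6, 14.0e-6      # GPT-5.2, $ per token
RERANK = 1e-3                         # $1 per 1K reranked docs
VEC, INFRA_RAG, INFRA_LC = 1.5e-5, 3.5e-4, 2.5e-4  # per request (assumed)

rag_req = ((PROMPT_TOK + RETRIEVED_TOK) * MINI_IN + OUT_TOK * MINI_OUT
           + RERANK_DOCS * RERANK + VEC + INFRA_RAG)
lc_req = LC_PROMPT_TOK * LC_IN + OUT_TOK * LC_OUT + INFRA_LC

# Cache hits are modeled as skipping the whole pipeline.
rag_user = rag_req * REQS * (1 - CACHE_RAG)   # ~0.6480 per user / month
lc_user = lc_req * REQS * (1 - CACHE_LC)      # ~0.4084 per user / month
print(f"cost/request   ${rag_user / REQS:.4f} vs ${lc_user / REQS:.5f}")
print(f"cost/user/mo   ${rag_user:.4f} vs ${lc_user:.4f}")
print(f"gross margin   {1 - rag_user / PRICE:.1%} vs {1 - lc_user / PRICE:.1%}")
print(f"monthly profit impact: ${(rag_user - lc_user) * USERS:+.2f}")
```

Under these assumptions the script reproduces every row of the table, including the +$191.73 monthly impact at 800 users.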
Component Breakdown
Each component is computed independently, then compared across both architectures; values are per user per month.

| Component | RAG baseline | Long context | Delta |
|---|---|---|---|
| Generation (model input/output token spend for requests) | $0.0435 | $0.483 | +$0.4395 |
| Retrieval (extra model input spend from retrieved context chunks) | $0.0198 | $0 | -$0.0198 |
| Reranking (reranker cost based on docs scored per request) | $0.96 | $0 | -$0.96 |
| Embeddings ingestion (amortized per-user share of the fixed monthly corpus embedding refresh) | $0 | $0 | $0 |
| Vector DB (vector database query cost across all requests) | $0.0009 | $0 | -$0.0009 |
| Cache (savings from cache hits; negative means lower total cost) | -$0.3972 | -$0.0896 | +$0.3075 |
| Infra (non-model infra overhead per request) | $0.021 | $0.015 | -$0.006 |
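Each column of the breakdown sums to the corresponding cost-per-user total from the Totals table, which makes a quick consistency check possible using only numbers published on the page:

```python
# Column sums of the Component Breakdown should match the per-user totals.
rag = {"generation": 0.0435, "retrieval": 0.0198, "reranking": 0.96,
       "embeddings": 0.0, "vector_db": 0.0009, "cache": -0.3972, "infra": 0.021}
lc = {"generation": 0.483, "retrieval": 0.0, "reranking": 0.0,
      "embeddings": 0.0, "vector_db": 0.0, "cache": -0.0896, "infra": 0.015}
print(round(sum(rag.values()), 4))  # 0.648  = RAG cost per user / month
print(round(sum(lc.values()), 4))   # 0.4084 = long-context cost per user / month
```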
Sensitivity Ranking
Baseline sensitivity: cost change when one variable is increased by 10%.
| Variable | Delta cost % |
|---|---|
| Requests per user per month (user activity level) | 10.0% |
| Rerank docs (docs reranked per request) | 9.2% |
| Cache hit rate (fraction of requests served by cache) | -6.1% |
| Output tokens (generated tokens per request) | 0.3% |
| Retrieved chunks (retrieved chunk count per request) | 0.2% |
| Tokens per chunk (average chunk size in tokens) | 0.2% |
| Input tokens (prompt-side tokens per request) | 0.1% |
| Vector queries per request (vector query count) | 0.0% |
| Monthly active users (active-user estimate used to amortize the fixed monthly embedding refresh) | -0.0% |
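A ranking like this comes from bumping one input at a time by 10% and recomputing the baseline cost per user. A minimal sketch, using back-solved baseline inputs (assumptions, not published values) and treating cache hits as skipping the whole pipeline:

```python
# One-at-a-time +10% sensitivity on the RAG baseline cost per user.
# All parameter values below are back-solved assumptions, not published inputs.
base = dict(reqs=60, rerank_docs=16, cache_hit=0.38, out_tokens=250,
            chunks=6, tokens_per_chunk=220, in_tokens=900, vec_queries=1)

def cost_per_user(p):
    per_req = ((p["in_tokens"] + p["chunks"] * p["tokens_per_chunk"]) * 0.25e-6
               + p["out_tokens"] * 2e-6        # GPT-5 Mini output price
               + p["rerank_docs"] * 1e-3       # $1 per 1K reranked docs
               + p["vec_queries"] * 1.5e-5     # vector query unit cost (assumed)
               + 3.5e-4)                       # infra overhead per request (assumed)
    # Cache hits skip the pipeline, so discount by the hit rate.
    return per_req * p["reqs"] * (1 - p["cache_hit"])

baseline = cost_per_user(base)
for var in base:
    delta = cost_per_user(dict(base, **{var: base[var] * 1.1})) / baseline - 1
    print(f"{var:16s} {delta:+.1%}")
```

With these inputs the loop reproduces the table: requests scale cost linearly (+10.0%), rerank docs dominate the rest (+9.2%), and a higher cache hit rate cuts cost (-6.1%).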
Assumptions and Units
Explicit assumptions to keep architecture comparisons reproducible and auditable.

- Currency: USD
- Token unit: token
- Pricing snapshot: 2026-03-15
- Baseline model: OpenAI / GPT-5 Mini
- Candidate model: OpenAI / GPT-5.2
- Monthly users: 800 active users, used for fixed-cost amortization and business totals
- Fixed monthly term: embedding refresh is treated as a fixed monthly cost in business totals
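The amortization rule is plain division of the fixed monthly refresh cost by active users. This scenario's refresh cost is $0, so the sketch below uses a hypothetical $40/month figure purely to show the mechanic (and why adding users shrinks the per-user share, giving the small negative sensitivity for monthly active users):

```python
# Fixed monthly embedding refresh, amortized per active user.
# The $40 refresh cost is hypothetical; this scenario's actual value is $0.
fixed_monthly_refresh = 40.0
monthly_active_users = 800
per_user_share = fixed_monthly_refresh / monthly_active_users
print(f"embeddings ingestion per user/month: ${per_user_share:.3f}")  # $0.050
```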
Recommended Next Step
Use this section to turn the architecture comparison into the next implementation checks. If the long-context path still looks viable after quality and latency checks, review infra constraints next.
Compare infra providers: View Infra Recommendations

Sources and Snapshot
Pricing comes from a daily snapshot generated by batch workflows.

Active Pricing Rows
RAG baseline
OpenAI / GPT-5 Mini
- Input tokens: $0.25 / 1M
- Output tokens: $2 / 1M
Long context
OpenAI / GPT-5.2
- Input tokens: $1.75 / 1M
- Output tokens: $14 / 1M
Shared retrieval defaults
- Embedding input: $0.02 / 1M
- Rerank docs: $1 / 1K
- Snapshot date: 2026-03-15
- Source links and update notes: Pricing Snapshot Reference
Continue Analysis
Move to the next tool or guide without losing your current scenario.

Switch tools
- AI Workflow Cost Calculator
- AI Break-even Price Calculator
- LLM Model Cost Comparison
- RAG Retrieval Cost Calculator
- Reranking Cost Calculator
- Cache Savings Simulator
- Context Window Cost Calculator
- Embedding Ingestion Cost Calculator
- Browse all tools
Read guides