RAG Cost Components Explained
Most teams underestimate cost because they only track model tokens. Retrieval-heavy agent workflows and RAG systems have multiple layers.
Question
How should I structure agent cost math before summing total unit cost?
Quick answer
Formula: total_cost_per_user_month = generation + retrieval + reranking + embedding_ingestion_share + vector_db + cache + infra
- Assumption: compute each component independently, then sum.
- Assumption: cache is a savings term and can be negative.
- Assumption: currency and units must stay explicit.
- Assumption: embedding refresh starts as a fixed monthly corpus term and becomes a per-user share only after amortization.
Example: if generation is $1.80, retrieval is $0.60, reranking is $0.30, amortized embedding share is $0.20, vector DB is $0.10, cache is -$0.40, and infra is $0.20, total is $2.80 per user/month.
Core Components
- Generation: model input + output token spend.
- Retrieval context: extra tokens added from chunks.
- Reranking: relevance scoring cost per document set.
- Embedding ingestion share: amortized per-user share of the monthly cost to process corpus changes.
- Vector DB: lookup cost per request.
- Cache: cost reduction from repeated answers.
- Infra: non-model compute and network overhead.
Try it in calculators: AI Workflow Cost, Rerank Cost
Run the Calculator
Open the related calculator with your own assumptions before you compare infra, packaging, or rollout choices.
Open Related Calculator