RAG Cost Components Explained
Most teams underestimate the cost of retrieval-augmented generation (RAG) because they track only model token spend. RAG costs come in several layers, and generation is only one of them.
Core Components
- Generation: model input and output token spend.
- Retrieval context: extra input tokens added to the prompt by retrieved chunks.
- Reranking: the cost of scoring each retrieved document set for relevance.
- Embedding ingestion: the monthly cost of embedding new or changed documents in the corpus.
- Vector DB: the lookup cost incurred on each request.
- Cache: savings (a negative cost) from serving repeated answers from cache instead of regenerating them.
- Infra: non-model compute and network overhead.
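To see how these layers combine, here is a minimal Python sketch of a monthly cost model. Every field name, the per-1K-token pricing scheme, and the example prices are hypothetical placeholders, not real provider rates. The sketch also assumes that retrieved chunks are billed at the input-token price and that a cache hit skips all per-request work (vector lookup, reranking, and generation).

```python
from dataclasses import dataclass


@dataclass
class RagCostInputs:
    """Hypothetical inputs for a monthly RAG cost estimate (all prices in USD)."""
    requests_per_month: int
    active_users: int                # assumed > 0
    prompt_tokens: int               # system prompt + question, excluding retrieved chunks
    output_tokens: int               # generated answer tokens
    chunks_per_request: int          # retrieved chunks inserted into the prompt
    tokens_per_chunk: int
    price_in_per_1k: float           # input-token price per 1K tokens
    price_out_per_1k: float          # output-token price per 1K tokens
    rerank_cost_per_set: float       # cost to rerank one request's candidate set
    vector_lookup_cost: float        # cost of one vector DB query
    cache_hit_rate: float            # fraction of requests answered from cache (0..1)
    ingestion_cost_per_month: float  # embedding new or changed documents
    infra_cost_per_month: float      # non-model compute and network


def monthly_cost(c: RagCostInputs) -> dict[str, float]:
    """Return a per-component monthly breakdown plus total and per-user cost."""
    n = c.requests_per_month
    # Generation: base prompt tokens in, answer tokens out (chunks counted separately).
    generation = n * (c.prompt_tokens / 1000 * c.price_in_per_1k
                      + c.output_tokens / 1000 * c.price_out_per_1k)
    # Retrieval context: chunk tokens are billed as extra input tokens.
    context_tokens = c.chunks_per_request * c.tokens_per_chunk
    retrieval_context = n * context_tokens / 1000 * c.price_in_per_1k
    reranking = n * c.rerank_cost_per_set
    vector_db = n * c.vector_lookup_cost
    # Assumption: a cache hit skips every request-driven step above,
    # so the cache line is a negative cost (a saving).
    request_driven = generation + retrieval_context + reranking + vector_db
    cache = -c.cache_hit_rate * request_driven
    breakdown = {
        "generation": generation,
        "retrieval_context": retrieval_context,
        "reranking": reranking,
        "vector_db": vector_db,
        "cache": cache,
        "embedding_ingestion": c.ingestion_cost_per_month,
        "infra": c.infra_cost_per_month,
    }
    total = sum(breakdown.values())
    breakdown["total"] = total
    breakdown["per_user"] = total / c.active_users
    return breakdown


if __name__ == "__main__":
    # Illustrative numbers only; not real provider prices.
    inputs = RagCostInputs(
        requests_per_month=100_000, active_users=1_000,
        prompt_tokens=500, output_tokens=300,
        chunks_per_request=5, tokens_per_chunk=400,
        price_in_per_1k=0.001, price_out_per_1k=0.002,
        rerank_cost_per_set=0.0005, vector_lookup_cost=0.0001,
        cache_hit_rate=0.2,
        ingestion_cost_per_month=50.0, infra_cost_per_month=300.0,
    )
    for name, value in monthly_cost(inputs).items():
        print(f"{name:>20}: ${value:,.2f}")
```

Keeping chunk tokens separate from the base prompt shows how much of the bill comes from context size alone. If your cache only skips generation, apply the hit rate to the generation and retrieval-context terms instead of all four request-driven terms.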