RAG Cost Components Explained

Most teams underestimate cost because they only track model tokens. Retrieval-heavy agent workflows and RAG systems have multiple layers.

Question

How should I structure agent cost math before summing total unit cost?

Quick answer

Formula: total_cost_per_user_month = generation + retrieval + reranking + embedding_ingestion_share + vector_db + cache + infra

  • Assumption: compute each component independently, then sum.
  • Assumption: cache is a savings term and can be negative.
  • Assumption: currency and units must stay explicit.
  • Assumption: embedding refresh starts as a fixed monthly corpus term and becomes a per-user share only after amortization.

Example: if generation is $1.80, retrieval is $0.60, reranking is $0.30, amortized embedding share is $0.20, vector DB is $0.10, cache is -$0.40, and infra is $0.20, total is $2.80 per user/month.

Core Components

  • Generation: model input + output token spend.
  • Retrieval context: extra tokens added from chunks.
  • Reranking: relevance scoring cost per document set.
  • Embedding ingestion share: amortized per-user share of the monthly cost to process corpus changes.
  • Vector DB: lookup cost per request.
  • Cache: cost reduction from repeated answers.
  • Infra: non-model compute and network overhead.

Try it in calculators: AI Workflow Cost, Rerank Cost

Run the Calculator

Open the related calculator with your own assumptions before you compare infra, packaging, or rollout choices.

Open Related Calculator