What Cache Hit Rate Means for RAG and Agent Workflows

Cache hit rate is the share of requests served from cache instead of full model plus retrieval execution in a live workflow.

Question

How does cache hit rate change agent unit cost?

Quick answer

Formula: effective_cost_per_request = base_cost_per_request * (1 - cache_hit_rate)

  • Assumption: cache-hit requests avoid most generation and retrieval costs.
  • Assumption: hit rate is measured on production-like traffic, not synthetic tests.
  • Assumption: freshness constraints can cap achievable hit rate.

Example: at $0.020/request base cost and 40% hit rate, effective cost becomes $0.012/request.

Practical Ranges

  • Low reuse products: 5-15%
  • Mixed workloads: 20-40%
  • FAQ-like workloads: 40-70%

How To Improve It

  • Normalize semantically equivalent queries.
  • Cache at retrieval + response layers.
  • Use freshness windows by content type.

Back to calculators: Cache Savings, AI Workflow Cost