What Cache Hit Rate Means for RAG and Agent Workflows
Cache hit rate is the share of requests served from cache instead of full model plus retrieval execution in a live workflow.
Question
How does cache hit rate change agent unit cost?
Quick answer
Formula: effective_cost_per_request = base_cost_per_request * (1 - cache_hit_rate)
- Assumption: cache-hit requests avoid most generation and retrieval costs.
- Assumption: hit rate is measured on production-like traffic, not synthetic tests.
- Assumption: freshness constraints can cap achievable hit rate.
Example: at $0.020/request base cost and 40% hit rate, effective cost becomes $0.012/request.
Practical Ranges
- Low reuse products: 5-15%
- Mixed workloads: 20-40%
- FAQ-like workloads: 40-70%
How To Improve It
- Normalize semantically equivalent queries.
- Cache at retrieval + response layers.
- Use freshness windows by content type.
Back to calculators: Cache Savings, AI Workflow Cost