What Cache Hit Rate Means for RAG
Cache hit rate is the share of requests served from cache instead of full model + retrieval execution.
Practical Ranges
- Low reuse products: 5-15%
- Mixed workloads: 20-40%
- FAQ-like workloads: 40-70%
How To Improve It
- Normalize semantically equivalent queries.
- Cache at retrieval + response layers.
- Use freshness windows by content type.
Back to calculator: RAG Cost per User