How Many Tokens Per Request?
Token counts are workload-specific, but you can start with practical defaults and refine from logs.
Quick Starting Heuristics
- Support bot: 400-900 input tokens, 120-300 output tokens.
- Internal copilot: 700-1400 input tokens, 200-450 output tokens.
- Research assistant: 1200-3000 input tokens, 300-900 output tokens.
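To turn these ranges into a first-pass number, one option is to take the midpoint of each range and multiply by a per-token price. The sketch below does that. The function names and the prices are hypothetical placeholders, not values from this page or from any provider, so substitute your own rates.

```python
# Starting ranges copied from the heuristics above: (low, high) tokens per request.
HEURISTICS = {
    "support_bot": {"input": (400, 900), "output": (120, 300)},
    "internal_copilot": {"input": (700, 1400), "output": (200, 450)},
    "research_assistant": {"input": (1200, 3000), "output": (300, 900)},
}


def midpoint(bounds):
    """Return the midpoint of a (low, high) range."""
    low, high = bounds
    return (low + high) / 2


def estimate_cost_per_request(workload, input_price_per_1k, output_price_per_1k):
    """Return (input_tokens, output_tokens, cost) using the range midpoints.

    Prices are per 1,000 tokens and are supplied by the caller; this
    function assumes nothing about any real provider's pricing.
    """
    ranges = HEURISTICS[workload]
    tokens_in = midpoint(ranges["input"])
    tokens_out = midpoint(ranges["output"])
    cost = (tokens_in / 1000) * input_price_per_1k + (tokens_out / 1000) * output_price_per_1k
    return tokens_in, tokens_out, cost


if __name__ == "__main__":
    # Hypothetical prices per 1,000 tokens, for illustration only.
    for name in HEURISTICS:
        tokens_in, tokens_out, cost = estimate_cost_per_request(name, 0.001, 0.002)
        print(f"{name}: {tokens_in:.0f} in, {tokens_out:.0f} out, ~${cost:.5f} per request")
```

A midpoint is only a placeholder for a typical request. If budgeting for the worst case matters more, swap `midpoint` for the upper bound of each range.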
How To Improve Accuracy
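- Sample production requests and compute p50/p90 (median and 90th percentile) token usage for input and output.
- Count retrieved-context tokens separately from prompt-template tokens: the template is roughly fixed per request, while retrieved context varies with how many chunks are returned and how long they are.
- Re-measure after any prompt, retrieval, or model update. Tokenizers differ between models, so the same text can produce a different count.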
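As a rough illustration of the steps above, the following sketch computes p50 and p90 for each token component from a handful of logged requests. The record fields (`template_tokens`, `retrieval_tokens`, `output_tokens`) and the sample values are made up for the example. Your logs will likely use different names, and the nearest-rank percentile used here is one of several common definitions.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples, fields=("template_tokens", "retrieval_tokens", "output_tokens")):
    """Compute p50/p90 for each token component across sampled requests."""
    summary = {}
    for field in fields:
        counts = [sample[field] for sample in samples]
        summary[field] = {
            "p50": percentile(counts, 50),
            "p90": percentile(counts, 90),
        }
    return summary


if __name__ == "__main__":
    # Made-up log records, for illustration only.
    samples = [
        {"template_tokens": 180, "retrieval_tokens": 300, "output_tokens": 120},
        {"template_tokens": 180, "retrieval_tokens": 520, "output_tokens": 150},
        {"template_tokens": 185, "retrieval_tokens": 700, "output_tokens": 210},
        {"template_tokens": 180, "retrieval_tokens": 1100, "output_tokens": 260},
        {"template_tokens": 190, "retrieval_tokens": 2400, "output_tokens": 400},
    ]
    for field, stats in summarize(samples).items():
        print(f"{field}: p50={stats['p50']}, p90={stats['p90']}")
```

Running the same summary before and after a prompt, retrieval, or model change gives a direct comparison of how each component shifted.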