How Many Tokens Per Request?

Token counts are workload-specific, but you can start with practical defaults for agent and retrieval workflows and refine them against production logs.

Question

How do I estimate token load per request for cost modeling?

Quick Answer

Formula: effective_input_tokens = base_prompt_tokens + retrieved_chunks * tokens_per_chunk

  • Assumption: input and output token estimates use p50 and p90 ranges, not single point values.
  • Assumption: retrieved-chunk tokens are modeled separately from base prompt tokens.
  • Assumption: the same token profile is applied when comparing models, so differences reflect pricing rather than prompt size.

Example: a base prompt of 500 tokens + (5 chunks * 180 tokens/chunk) = 1,400 effective input tokens per request.
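The formula can be sketched as a small helper function; the values below are the worked example from the text, not recommendations.

```python
def effective_input_tokens(base_prompt_tokens: int,
                           retrieved_chunks: int,
                           tokens_per_chunk: int) -> int:
    """effective_input_tokens = base_prompt_tokens + retrieved_chunks * tokens_per_chunk"""
    return base_prompt_tokens + retrieved_chunks * tokens_per_chunk

# Worked example: 500 + (5 * 180)
print(effective_input_tokens(500, 5, 180))  # 1400
```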

Quick Starting Heuristics

  • Support agent: 400-900 input tokens, 120-300 output tokens.
  • Internal copilot: 700-1400 input tokens, 200-450 output tokens.
  • Research assistant: 1200-3000 input tokens, 300-900 output tokens.
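One way to turn these heuristics into a per-request cost range is to multiply the low and high token bounds by per-token prices. The prices below are placeholder assumptions for illustration, not any model's actual rates.

```python
# Heuristic token ranges from the list above: (input_lo, input_hi, output_lo, output_hi)
PROFILES = {
    "support_agent":      (400, 900, 120, 300),
    "internal_copilot":   (700, 1400, 200, 450),
    "research_assistant": (1200, 3000, 300, 900),
}

def cost_range(profile: str, in_price: float, out_price: float) -> tuple[float, float]:
    """Return (low, high) cost per request, in the same currency as the prices."""
    in_lo, in_hi, out_lo, out_hi = PROFILES[profile]
    low = in_lo * in_price + out_lo * out_price
    high = in_hi * in_price + out_hi * out_price
    return low, high

# Placeholder prices: $3 per 1M input tokens, $15 per 1M output tokens (assumptions).
lo, hi = cost_range("support_agent", 3e-6, 15e-6)
print(f"${lo:.4f} - ${hi:.4f} per request")
```

Multiply the result by expected daily request volume to get a budget range rather than a single misleading point estimate.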

How To Improve Accuracy

  • Sample production requests and compute p50/p90 token usage.
  • Model retrieval context separately from prompt template tokens.
  • Track changes after prompt, retrieval, or model updates.
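The sampling step above can be sketched with a nearest-rank percentile over logged per-request token counts; `logged_input_tokens` here is hypothetical sample data, not real log output.

```python
import math

def percentile(values: list[int], q: float) -> int:
    """Nearest-rank percentile: the smallest value with at least q*n samples at or below it."""
    ordered = sorted(values)
    idx = max(math.ceil(q * len(ordered)) - 1, 0)
    return ordered[idx]

# Hypothetical sample of per-request input token counts pulled from logs.
logged_input_tokens = [320, 410, 450, 480, 500, 540, 610, 700, 880, 1250]
p50 = percentile(logged_input_tokens, 0.50)
p90 = percentile(logged_input_tokens, 0.90)
print(f"p50={p50}, p90={p90}")
```

Recomputing p50/p90 after each prompt, retrieval, or model change makes drift in token usage visible before it shows up in the bill.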
