Context Bloat in RAG

Context bloat is non-task prompt overhead that raises token spend without improving user outcomes in RAG systems and agent workflows.

Question

How do I trim context bloat in an agent or RAG workflow without hurting answer quality?

Formula: candidate_input_tokens = task_tokens + target_non_task_tokens

Formula: cost_delta = cost_candidate - cost_baseline (negative when the candidate run is cheaper)

Assumption: task tokens stay constant between baseline and candidate runs.
Assumption: non-task tokens are measured on production-like requests.
Assumption: quality is validated with sampled prompts before global rollout.
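
One way to act on the third assumption is a small gate that compares quality scores on a sample of prompts before the trimmed context is rolled out everywhere. The sketch below is illustrative only: `sample_prompts`, `passes_quality_gate`, the 0.02 tolerance, and the idea of a per-prompt score in [0, 1] are all hypothetical choices, not something the workflow above prescribes. How the scores are produced (human review, an eval suite, and so on) is left open.

```python
import random
from statistics import mean


def sample_prompts(prompts, k, seed=0):
    """Draw up to k prompts reproducibly (hypothetical helper)."""
    rng = random.Random(seed)
    return rng.sample(prompts, min(k, len(prompts)))


def passes_quality_gate(baseline_scores, candidate_scores, max_mean_drop=0.02):
    """Return True if the candidate's mean score dropped by at most max_mean_drop.

    Scores are assumed to be paired per prompt and on the same scale;
    the 0.02 default is an arbitrary placeholder tolerance.
    """
    if not baseline_scores or len(baseline_scores) != len(candidate_scores):
        raise ValueError("scores must be non-empty and paired per prompt")
    drop = mean(baseline_scores) - mean(candidate_scores)
    return drop <= max_mean_drop


# Usage with made-up scores for two sampled prompts.
ok = passes_quality_gate([0.9, 0.8], [0.9, 0.79])
regressed = passes_quality_gate([0.9, 0.8], [0.7, 0.6])
```

Comparing means is the simplest possible check; a per-prompt comparison or a minimum sample size may be more appropriate depending on how noisy the scores are.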

Example: a baseline input of 1800 tokens includes 850 non-task tokens, so task tokens are 1800 - 850 = 950. Trimming non-task tokens to 300 gives a candidate input of 950 + 300 = 1250 tokens, before pricing is applied.
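
The two formulas and the example above can be checked with a short Python sketch. The token counts come from the example; the price (`PRICE_PER_1K_INPUT_TOKENS`) and the `input_cost` helper are hypothetical, and the sketch assumes cost scales linearly with input tokens while output tokens stay unchanged.

```python
def candidate_input_tokens(task_tokens: int, target_non_task_tokens: int) -> int:
    """candidate_input_tokens = task_tokens + target_non_task_tokens"""
    return task_tokens + target_non_task_tokens


def input_cost(input_tokens: int, price_per_1k_tokens: float) -> float:
    """Hypothetical linear pricing: tokens / 1000 * price per 1k tokens."""
    return input_tokens / 1000 * price_per_1k_tokens


def cost_delta(cost_candidate: float, cost_baseline: float) -> float:
    """cost_delta = cost_candidate - cost_baseline (negative means savings)."""
    return cost_candidate - cost_baseline


# Numbers from the example.
baseline_input = 1800
baseline_non_task = 850
target_non_task = 300

# Task tokens are assumed constant between baseline and candidate runs.
task_tokens = baseline_input - baseline_non_task  # 950
candidate_input = candidate_input_tokens(task_tokens, target_non_task)  # 1250

# Placeholder price, not a real provider rate.
PRICE_PER_1K_INPUT_TOKENS = 0.50

delta = cost_delta(
    input_cost(candidate_input, PRICE_PER_1K_INPUT_TOKENS),
    input_cost(baseline_input, PRICE_PER_1K_INPUT_TOKENS),
)
print(f"candidate input: {candidate_input} tokens, cost delta per request: {delta:.4f}")
```

Multiplying the per-request delta by request volume gives the aggregate effect, provided the non-task token counts were measured on production-like requests.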