How Much Does an AI Agent Cost?

For most retrieval-heavy agent workflows, estimate the cost of a single workflow run first, then roll it up to a per-user monthly cost.

Quick answer

Formula: cost_per_user_month = generation + retrieval + reranking + embedding_ingestion_share + vector_db + cache + infra

  • Assumption: this structure fits retrieval-heavy AI features and assistants especially well.
  • Assumption: cache is modeled as a signed savings term and can be negative.
  • Assumption: monthly cost should be paired with explicit requests-per-user assumptions, not one vague average.
  • Assumption: embedding refresh is a fixed monthly term that should be amortized across active users before it is treated as per-user cost.

Example: if generation=$1.80, retrieval=$0.60, reranking=$0.30, embedding_ingestion_share=$0.20, vector_db=$0.10, cache=-$0.40, and infra=$0.20, cost_per_user_month=$2.80.
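The formula and example above can be checked with a few lines of Python. This is a minimal sketch, not a reference implementation: the class name `MonthlyCostTerms` and the function name are illustrative assumptions, and the only inputs are the dollar figures from the example.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class MonthlyCostTerms:
    """Per-user monthly cost terms in USD (hypothetical container for this sketch)."""
    generation: float
    retrieval: float
    reranking: float
    embedding_ingestion_share: float  # fixed refresh cost already amortized per user
    vector_db: float
    cache: float  # signed term: negative means net savings
    infra: float


def cost_per_user_month(terms: MonthlyCostTerms) -> float:
    """Sum every term, rounded to cents."""
    return round(sum(getattr(terms, f.name) for f in fields(terms)), 2)


example = MonthlyCostTerms(
    generation=1.80,
    retrieval=0.60,
    reranking=0.30,
    embedding_ingestion_share=0.20,
    vector_db=0.10,
    cache=-0.40,
    infra=0.20,
)
print(f"${cost_per_user_month(example):.2f}")  # $2.80
```

Keeping cache as an ordinary signed field means the sum needs no special case for savings.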

Fastest Working Method

  1. Estimate requests per active user and tokens per workflow.
  2. Break the workflow into generation, retrieval, reranking, embedding ingestion, vector DB, cache, and infra terms.
  3. Roll the result up to cost per user/month before discussing price or packaging.
  4. Check which component dominates before adding more complexity.
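The four steps above can be sketched as a small roll-up. The per-request split and the fixed-term split below are hypothetical; they were chosen only so that they add up to the $0.028 per request and $0.36 of monthly overhead used in the worked example further down. The function names are likewise illustrative.

```python
def roll_up_monthly(requests_per_user, per_request_costs, fixed_monthly_costs):
    """Scale per-request terms by request volume, then add fixed per-user terms."""
    monthly = {name: requests_per_user * cost
               for name, cost in per_request_costs.items()}
    monthly.update(fixed_monthly_costs)
    return monthly


def dominant_component(monthly):
    """Return the name of the largest monthly cost term."""
    return max(monthly, key=monthly.get)


# Hypothetical split of a $0.028 request into its variable components.
per_request = {"generation": 0.018, "retrieval": 0.006, "reranking": 0.004}
# Hypothetical split of $0.36 fixed overhead; cache is a signed savings term.
fixed = {"embedding_ingestion_share": 0.20, "vector_db": 0.10,
         "cache": -0.14, "infra": 0.20}

monthly = roll_up_monthly(80, per_request, fixed)
total = sum(monthly.values())
top = dominant_component(monthly)
print(f"total: ${total:.2f}, dominant: {top} ({monthly[top] / total:.0%})")
```

Under these assumed numbers generation is the largest term, so model choice and output length would be the first things to examine before adding detail elsewhere.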

What Usually Moves Cost Most

  • Request frequency per user.
  • Retrieved context size and rerank depth.
  • Model choice for generation and fallback traffic.
  • Cache hit rate and repeated-answer reuse.
  • How often the knowledge base is re-embedded.

Worked Monthly Cost Example

If a docs assistant averages 80 requests per active user each month at $0.028 per request, then variable workflow spend is 80 * 0.028 = $2.24. Add $0.36 of monthly embedding, vector, cache, and infra overhead and the full unit cost lands at $2.60 per active user each month.

The same workflow reaches $5.16 per active user each month if usage doubles to 160 requests and per-request cost rises to $0.030 because retrieval depth or output length increased: 160 * 0.030 = $4.80 of variable spend plus the same $0.36 of overhead. That is why request volume and workflow shape matter more than a single average token estimate.
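The two scenarios can be reproduced with a short calculation. This sketch assumes the $0.36 of monthly overhead is unchanged when usage doubles; the function name `monthly_unit_cost` is illustrative.

```python
def monthly_unit_cost(requests_per_user, cost_per_request, monthly_overhead):
    """Variable workflow spend plus fixed overhead, in USD per active user."""
    return round(requests_per_user * cost_per_request + monthly_overhead, 2)


# Baseline: 80 requests at $0.028 each, plus $0.36 overhead.
baseline = monthly_unit_cost(80, 0.028, 0.36)
# Heavier usage: 160 requests at $0.030 each, overhead assumed unchanged.
heavier = monthly_unit_cost(160, 0.030, 0.36)

print(f"baseline: ${baseline:.2f}")  # baseline: $2.60
print(f"heavier:  ${heavier:.2f}")   # heavier:  $5.16
print(f"ratio:    {heavier / baseline:.2f}x")
```

Because overhead is fixed per user, unit cost grows slightly less than 2x even though variable spend more than doubles.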

When a Simple Estimate Breaks

  • If the workflow loops multiple times before answering.
  • If tool calls or API actions are a material share of spend.
  • If p90 sessions are much larger than average sessions.
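To illustrate the last point, the sketch below compares a mean-based estimate with a p90-based one. The usage numbers are made up for illustration; the $0.028 per request and $0.36 overhead come from the worked example, and the helper name `monthly_cost` is an assumption.

```python
import statistics


def monthly_cost(requests, cost_per_request=0.028, overhead=0.36):
    """Per-user monthly cost in USD for a given monthly request count."""
    return requests * cost_per_request + overhead


# Hypothetical monthly request counts for ten active users, with one heavy user.
requests_by_user = [20, 30, 40, 50, 60, 70, 80, 90, 100, 260]

mean_requests = statistics.mean(requests_by_user)
# Last of the nine decile cut points is the 90th percentile.
p90_requests = statistics.quantiles(requests_by_user, n=10, method="inclusive")[-1]

print(f"mean: {mean_requests} requests -> ${monthly_cost(mean_requests):.2f}")
print(f"p90:  {p90_requests} requests -> ${monthly_cost(p90_requests):.2f}")
```

In this made-up sample the p90 user costs noticeably more than the average user, which is the kind of gap a single average hides.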

Open companion tool: AI Workflow Cost

Price the workflow next: Break-even Price

Related reads: What Is an AI Agent?, RAG Cost Components Explained
