Agent Run Cost

What does one completed AI agent run cost?

How this tool works

This calculator treats one completed agent run as the unit of analysis. It prices the model calls, prompt and tool context, optional retrieval and reranking, vector lookups, cache savings, infrastructure overhead, and fixed refresh work needed to complete the run.

Use this when

  • You need the cost of one agent run before setting usage credits, caps, or package allowances.
  • You want model-call count, context size, retrieval, cache, and infra visible as separate cost drivers.
  • You need a per-run price floor before modeling tool-use fees, approval costs, or multi-agent fan-out.

How It Works

  1. Choose the provider/model and set model requests per completed agent run.
  2. Model prompt size, generated output, optional retrieval, reranking, vector lookups, cache, infra, and monthly refresh work.
  3. Compare cost per run, monthly agent spend, break-even run price, margin, and top sensitivity drivers.

Formulas

refresh_share_usd = refresh_monthly_usd / monthly_agent_runs

cost_per_agent_run_usd = generation + retrieval + reranking + refresh_share_usd + vector_db + cache + infra

optional_margin_pct = (run_price_or_chargeback_usd - cost_per_agent_run_usd) / run_price_or_chargeback_usd * 100

Assumptions and Units

  • Currency: USD
  • Unit of analysis: one completed agent run
  • Model requests per run includes planning, tool-call reasoning, retries, and final-response generation
  • Fixed refresh work is amortized across monthly agent runs before it appears in per-run cost
  • Pricing source: daily pricing snapshot in repo, no runtime scraping

Worked Example

Start with a chat-style agent at four model requests per run, then add retrieval or tool context only when it is part of the run. If per-run margin is tight, use Coding Agent Cost per Task for accepted coding output or Support Bot Cost per Ticket for ticket-level support economics.

FAQs

Is one agent run the same as one model request? No. One run can include planning, tool calls, retries, and final generation, so the model-request count stays explicit.

What usually moves agent-run cost first? Model requests per run, prompt and tool context, retrieval depth, and cache hit rate usually move the result before small list-price differences.

Related resources: AI Workflow Cost, Break-even Price, Compare Model Costs, Agent Run Cost, How Much Does an AI Agent Cost?, How To Price AI Agent Usage With Credits, Caps, and Margin, What Is an AI Agent?, RAG Cost Components Explained, How Many Tokens Per Request?, Coding Agent Cost per Task, AI Support Agent Cost per Ticket.

Pricing snapshot: 2026-06-26Provider: OpenAIModel: GPT-5 Mini

Decision Signal

Healthy

Current agent-run margin is 90.8%. Use this to compare modeled run cost with a per-run price, credit value, or internal chargeback.

Step 1 Provider and Model

Switch model assumptions using prices from the selected snapshot.

Step 2 Quick Mode

Use plain-language assumptions first. Open Advanced assumptions only if needed.
Starting pointApply the generic baseline or a more conservative downside scenario. If neither is selected, you are working from custom inputs.
Quick workflowStart with the closest workflow shape, then fine-tune the assumptions below.

Chat-first workflow with no retrieval, reranking, vector queries, or embedding ingestion layers.

Optional Advanced assumptions

Tune retrieval, reranking, embeddings, vector, caching, and infra.
Show advanced inputs

Only adjust these once your Quick Mode assumptions feel realistic.

Scenario actions

Copy scenario URL

Paste into ChatGPT or Claude, or share with a teammate.

Save and track this scenario

Track pricing drift on this scenario and get an email if the latest result changes.

How tracking works

After you click Save and track, we carry this exact calculator state into the tracked-scenarios page so you can sign in and confirm the save.

We save your assumptions and the pricing snapshot used for this result.

When a newer pricing snapshot lands, we recompute the same scenario, show what changed, and email you if the latest result moved.

1 tracked scenario free, then $12/mo or $120/yr for up to 25 tracked scenarios.

Cost per agent run

$0.0046

Gross margin

90.8%

Estimated monthly run cost

$9.23

Estimated monthly gross profit

$90.77

Top Cost Drivers

Most sensitive variables when each is moved up by 10%.
Model requests / agent run10.0%
Output Tokens5.3%
Input Tokens1.9%

Totals

Summary metrics for monthly unit economics and margin.
Cost per model request
$0.00115
Cost per agent run
$0.0046
Gross margin %
90.8%
Break-even price
$0.0046
Cost per model request$0.00115
Cost per agent run$0.0046
Gross margin %90.8%
Break-even price$0.0046

Component Breakdown (USD/agent run)

Each cost component is computed independently and summed.

Largest cost block: Generation.

GenerationModel input/output token spend for requests.
$0.0035
RetrievalExtra model input spend from retrieved context chunks.
$0
RerankingReranker cost based on docs scored per request.
$0
Embeddings IngestionAmortized per-user share of the fixed monthly corpus embedding refresh cost.
$0
Vector DbVector database query cost across all requests.
$0
CacheSavings from cache hits. Negative means lower total cost.
$-0.0002
InfraNon-model infra overhead per request.
$0.0014
GenerationModel input/output token spend for requests.$0.0035
RetrievalExtra model input spend from retrieved context chunks.$0
RerankingReranker cost based on docs scored per request.$0
Embeddings IngestionAmortized per-user share of the fixed monthly corpus embedding refresh cost.$0
Vector DbVector database query cost across all requests.$0
CacheSavings from cache hits. Negative means lower total cost.$-0.0002
InfraNon-model infra overhead per request.$0.0014
Sensitivity RankingChange in total cost when one variable is increased by 10%.
VariableDelta cost %
Model requests / agent runUser activity level per month.10.0%
Output TokensGenerated tokens per request.5.3%
Input TokensPrompt-side tokens per request.1.9%
Cache Hit RateFraction of requests served by cache.-0.5%
Retrieved ChunksRetrieved chunk count per request.0.0%
Tokens Per ChunkAverage chunk size in tokens.0.0%
Rerank DocsDocs reranked per request.0.0%
Vector Queries Per RequestVector query count per request.0.0%
Monthly agent runsActive-user estimate used to amortize fixed monthly embedding refresh.0.0%

Assumptions and Units

Explicit assumptions to keep outputs reproducible and auditable.
  • CurrencyUSD
  • Token unittoken
  • Pricing snapshot2026-06-26
  • Selected model rowOpenAI/GPT-5 Mini
  • Volume basisBusiness totals and fixed monthly terms use monthly agent runs as the denominator
  • Embedding refreshAmortized per agent run from the fixed monthly corpus, tool, or index refresh term
  • Cache componentNegative value means cost savings

Recommended Next Step

Use these links to lower top cost drivers without guessing.

If agent-run cost is high, pressure-test model-call count, context size, retrieval depth, and cache reuse before changing packaging.

Sources and Snapshot

Pricing comes from the current dated snapshot.

Active Pricing Row

Selected model

OpenAI / GPT-5 Mini

  • Input tokens$0.25 / 1M
  • Output tokens$2 / 1M

Shared retrieval defaults

  • Embedding input$0.02 / 1M
  • Rerank docs$1 / 1K