Model Selection: Quality vs Unit Cost

Model selection is a margin decision as much as a quality decision. A single workflow often calls a model multiple times, so small per-call differences compound quickly.

Question

How do I compare model quality tradeoffs against margin?

Quick answer

Formula: margin_delta_pct = margin_candidate_pct - margin_baseline_pct

  • Assumption: compare models under identical usage and retrieval assumptions.
  • Assumption: evaluate quality on business-critical tasks, not benchmarks alone.
  • Assumption: latency and reliability are part of decision cost.

Example: if the candidate raises margin by 4 points but drops quality on key intents, use routing or keep the baseline.
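The quick-answer formula can be sketched as code. This is a minimal illustration; the revenue and per-request cost figures below are hypothetical placeholders, not real pricing.

```python
def margin_pct(revenue_per_request: float, cost_per_request: float) -> float:
    """Gross margin as a percentage of revenue."""
    return (revenue_per_request - cost_per_request) / revenue_per_request * 100

def margin_delta_pct(margin_candidate_pct: float, margin_baseline_pct: float) -> float:
    """Points of margin gained (or lost) by switching models."""
    return margin_candidate_pct - margin_baseline_pct

# Hypothetical example: $0.05 revenue per request,
# baseline model costs $0.012, candidate costs $0.010.
baseline = margin_pct(0.05, 0.012)   # 76.0
candidate = margin_pct(0.05, 0.010)  # 80.0
print(margin_delta_pct(candidate, baseline))  # ~4.0 points
```

A 4-point margin gain like this is exactly the scenario in the example above: it only wins if quality holds on the intents that matter.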

What To Compare

  • Cost per request at your real token profile.
  • Answer quality on business-critical tasks.
  • Latency and reliability under load.
  • Margin impact for your expected user behavior.
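The first comparison point, cost per request at your real token profile, is a simple blend of input and output token prices. A minimal sketch follows; the per-million-token prices and the token profile are illustrative assumptions, not any vendor's actual rates.

```python
def cost_per_request(input_tokens: int, output_tokens: int,
                     price_in_per_mtok: float, price_out_per_mtok: float) -> float:
    """Blend input/output token prices (quoted per million tokens) into a request cost."""
    return (input_tokens * price_in_per_mtok +
            output_tokens * price_out_per_mtok) / 1_000_000

# Example profile: 3,000 input tokens (prompt + retrieved context), 500 output tokens,
# at hypothetical prices of $0.50/Mtok in and $1.50/Mtok out.
print(cost_per_request(3000, 500, 0.50, 1.50))  # 0.00225
```

Run this with your own measured token counts; a model that looks cheap at a short benchmark prompt can look very different at a retrieval-heavy production profile.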

When a Cheaper Model Is Actually More Expensive

A lower-cost model can lose the comparison if it increases retries, second-pass routing, or human review. Saving $0.004 per request looks good until the weaker model raises the fallback share from 4% to 18% and forces more premium calls or support cleanup.

In practice, compare blended workflow cost, not only first-pass token cost. If the cheaper model is only safe for routine intents, route those requests explicitly and keep the stronger model on high-risk tasks.
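The fallback scenario above can be made concrete with a blended-cost sketch: a cheap first pass plus a share of requests escalated to a pricier second pass. All per-request rates and fallback shares here are hypothetical.

```python
def blended_cost(first_pass_cost: float, fallback_rate: float,
                 fallback_cost: float) -> float:
    """Expected cost per request when a fraction of requests need a second, pricier pass."""
    return first_pass_cost + fallback_rate * fallback_cost

baseline = blended_cost(0.010, 0.04, 0.020)   # strong model, 4% fallback
cheaper  = blended_cost(0.006, 0.18, 0.020)   # cheap model, 18% fallback
print(baseline, cheaper)  # 0.0108 vs 0.0096
```

In this particular set of assumptions the cheaper model still wins on blended cost, but a fallback rate above 24% would erase the $0.004 first-pass saving entirely, and this ignores human-review and support-cleanup costs, which usually dominate.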

Routing Pattern To Test First

  • Use the lower-cost model for routine summarization, classification, and low-risk drafting.
  • Keep the stronger model on high-value intents where a wrong answer triggers retries or manual review.
  • Measure second-pass share, fallback rate, and resolution quality before deciding the cheaper model really wins.
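The routing rule in the steps above can be sketched in a few lines. The intent labels and model names are hypothetical; a real system would classify intents upstream and route on that label.

```python
# Routine, low-risk intents that the cheaper model is allowed to handle.
ROUTINE_INTENTS = {"summarize", "classify", "draft_low_risk"}

def pick_model(intent: str) -> str:
    """Send routine intents to the cheaper model; everything else stays on the strong one."""
    return "cheap-model" if intent in ROUTINE_INTENTS else "strong-model"

print(pick_model("summarize"))       # cheap-model
print(pick_model("refund_dispute"))  # strong-model
```

Keeping the routing rule explicit like this makes it easy to log which model handled each request, which is exactly what you need to measure second-pass share and fallback rate before declaring a winner.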

Open companion tool: Compare Model Costs

Baseline cost inputs: AI Workflow Cost
