Model Selection: Quality vs Unit Cost
Model selection is a margin decision as much as a quality decision. A single workflow often calls a model several times, so small per-call cost differences compound quickly.
Question
How do I compare model quality tradeoffs against margin?
Quick answer
Formula: margin_delta_pct = margin_candidate_pct - margin_baseline_pct
- Assumption: models are compared under identical usage and retrieval conditions.
- Assumption: quality is evaluated on business-critical tasks, not benchmarks alone.
- Assumption: latency and reliability are part of the decision cost.
Example: if the candidate raises margin by 4 points but drops quality on key intents, route by intent or keep the baseline.
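The margin-delta formula above is a one-liner; a minimal sketch, assuming a hypothetical price per request and per-model costs (all dollar figures are illustrative, not measured values):

```python
# Margin delta between candidate and baseline, per the formula above:
# margin_delta_pct = margin_candidate_pct - margin_baseline_pct.
# All inputs are hypothetical example values.

def margin_pct(price_per_request: float, cost_per_request: float) -> float:
    """Gross margin as a percentage of the price charged per request."""
    return (price_per_request - cost_per_request) / price_per_request * 100

price = 0.10  # assumed revenue per request
margin_baseline_pct = margin_pct(price, cost_per_request=0.030)   # 70.0
margin_candidate_pct = margin_pct(price, cost_per_request=0.026)  # 74.0

margin_delta_pct = margin_candidate_pct - margin_baseline_pct
print(f"margin_delta_pct = {margin_delta_pct:+.1f} points")  # +4.0 points
```

A +4-point delta is the situation the example above describes: only worth taking if quality holds on key intents.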
What To Compare
- Cost per request at your real token profile.
- Answer quality on business-critical tasks.
- Latency and reliability under load.
- Margin impact for your expected user behavior.
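The first comparison, cost per request at your real token profile, is simple arithmetic. A sketch, assuming hypothetical per-million-token rates and an example token profile (substitute your provider's actual pricing):

```python
# Cost per request at a given token profile.
# Rates are hypothetical dollars per million tokens.

def cost_per_request(input_tokens: int, output_tokens: int,
                     input_rate_per_m: float, output_rate_per_m: float) -> float:
    """Blend input and output token costs into a per-request dollar figure."""
    return (input_tokens * input_rate_per_m
            + output_tokens * output_rate_per_m) / 1_000_000

# Example profile: 3,000 input tokens (prompt + retrieved context), 500 output.
baseline_cost = cost_per_request(3_000, 500,
                                 input_rate_per_m=3.00, output_rate_per_m=15.00)
candidate_cost = cost_per_request(3_000, 500,
                                  input_rate_per_m=0.50, output_rate_per_m=2.00)
print(f"baseline ${baseline_cost:.4f}/req, candidate ${candidate_cost:.4f}/req")
```

Measure the token profile from production traffic, not from a toy prompt: retrieval context usually dominates input tokens.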
When a Cheaper Model Is Actually More Expensive
A lower-cost model can lose the comparison if it increases retries, second-pass routing, or human review. Saving $0.004 per request looks good until the weaker model raises the fallback share from 4% to 18% and forces more premium calls or support cleanup.
In practice, compare blended workflow cost, not only first-pass token cost. If the cheaper model is only safe for routine intents, route those requests explicitly and keep the stronger model on high-risk tasks.
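The fallback arithmetic above can be made concrete. A sketch of blended workflow cost, assuming hypothetical per-request costs and a fallback that triggers both a premium retry and support cleanup (all figures are illustrative):

```python
# Blended workflow cost: first-pass cost plus the expected cost of
# fallbacks. Every dollar figure here is a hypothetical example.

def blended_cost(first_pass_cost: float, fallback_rate: float,
                 fallback_cost: float) -> float:
    """Expected per-request cost including the fallback share."""
    return first_pass_cost + fallback_rate * fallback_cost

fallback_cost = 0.010 + 0.05  # assumed: premium retry plus support cleanup

# Strong model first: 4% fallback share.
strong_first = blended_cost(0.010, fallback_rate=0.04, fallback_cost=fallback_cost)
# Cheaper model first: saves $0.004 per request, but fallback share jumps to 18%.
cheap_first = blended_cost(0.006, fallback_rate=0.18, fallback_cost=fallback_cost)

print(f"strong-first blended: ${strong_first:.4f}/req")
print(f"cheap-first blended:  ${cheap_first:.4f}/req")
```

Under these assumed numbers the cheaper model's blended cost is higher despite the $0.004 first-pass saving, which is exactly the trap the paragraph describes.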
Routing Pattern To Test First
- Use the lower-cost model for routine summarization, classification, and low-risk drafting.
- Keep the stronger model on high-value intents where a wrong answer triggers retries or manual review.
- Measure second-pass share, fallback rate, and resolution quality before deciding the cheaper model really wins.
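The routing pattern above can be sketched as a small intent-based dispatcher. The intent labels, model names, and the upstream classifier are placeholders for illustration, not a real API:

```python
# Minimal intent-based router: routine, low-risk intents go to the
# cheaper model; everything else stays on the stronger model.
# Intent labels are assumed to come from an upstream classifier.
from collections import Counter

ROUTINE_INTENTS = {"summarize", "classify", "draft_low_risk"}

def choose_model(intent: str) -> str:
    """Route routine work to the cheap model; default to the strong one."""
    return "cheap-model" if intent in ROUTINE_INTENTS else "strong-model"

# Track traffic share per model so second-pass and fallback rates
# can be measured before declaring the cheaper model the winner.
traffic = Counter()
for intent in ["summarize", "classify", "refund_dispute", "summarize"]:
    traffic[choose_model(intent)] += 1

print(dict(traffic))  # e.g. {'cheap-model': 3, 'strong-model': 1}
```

Defaulting unknown intents to the stronger model is the safer failure mode: misclassified high-risk requests cost more than over-serving routine ones.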
Open companion tool: Compare Model Costs
Baseline cost inputs: AI Workflow Cost
Run the Calculator
Open the related calculator with your own assumptions before you compare infra, packaging, or rollout choices.