Chapter 7: Choosing the Right Model: Capability Tiers, Not Hype

January 27, 2026 · 3 min read

Series: LLM Development Guide

Chapter 7 of 15

Previous: Chapter 6: Scaling the Workflow: Phases, Parallelism, Hygiene

Next: Chapter 8: Security & Sensitive Data: Sanitize, Don’t Paste Secrets

What you’ll be able to do

You’ll be able to pick a model deliberately:

  • Use capability tiers instead of memorizing brand names.
  • Upgrade quickly when quality is the bottleneck.
  • Avoid wasting flagship models on structured boilerplate.

TL;DR

  • Treat model choice as a cost-of-mistakes problem.
  • Use flagship models for planning, debugging, and high-stakes decisions.
  • Use mid-tier models for implementation with strong references.
  • Use fast/cheap models for boilerplate and simple transformations.
  • If you’ve spent ~10 minutes fighting output quality, upgrade or shrink scope.

As-of note

Model names, pricing, and product policies change frequently (this note was last checked 2026-02-14). Prefer tier-based guidance, and verify vendor policies directly before using any tool with sensitive data.

The capability tiers

Think in tiers:

  • Flagship: best reasoning and instruction-following for novel work.
  • Mid-tier: strong general performance for structured work with references.
  • Fast/cheap: good for simple tasks, higher error rate on complex reasoning.

This framing stays useful even when names change.
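
One way to make the tier framing concrete in your tooling is to pin tier aliases instead of hard-coding model names. The IDs below are placeholders, not recommendations; point each one at whatever your vendor currently ships in that tier:

export LLM_FLAGSHIP="vendor/flagship-model"   # placeholder ID
export LLM_MIDTIER="vendor/mid-model"         # placeholder ID
export LLM_FAST="vendor/fast-model"           # placeholder ID

When a name changes, you update one place; every script and prompt that refers to a tier keeps working.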

Task-to-tier mapping

Use flagship for:

  • Planning and architecture.
  • Debugging complex failures.
  • Security-sensitive review.
  • Anything where mistakes are expensive.

Use mid-tier for:

  • Implementation that follows existing patterns.
  • Refactors with clear examples.
  • Writing tests when the behavior is already defined.

Use fast/cheap for:

  • Syntax lookups.
  • Boilerplate you will review.
  • Mechanical transformations.
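
If you script any of this, the mapping above fits in a few lines of shell. A minimal sketch, assuming you label tasks yourself using the categories from these lists:

# Map a task label to a tier; the labels are illustrative.
tier_for_task() {
  case "$1" in
    planning|architecture|debugging|security-review) echo flagship ;;
    implementation|refactor|tests)                   echo mid-tier ;;
    syntax|boilerplate|transform)                    echo fast ;;
    *)                                               echo flagship ;; # unknown work: default to caution
  esac
}

tier_for_task refactor   # prints: mid-tier

Defaulting unknown work to flagship encodes the cost-of-mistakes rule: if you can’t classify the task, assume mistakes are expensive.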

Red flags: upgrade now

Upgrade when you see:

  • The model repeats the same misunderstanding.
  • Output ignores constraints.
  • “Looks right” code fails in tests.
  • You are on the third prompt iteration for the same unit of work.

The cheapest model is the one that gets you to a correct, verified change in the least total time. A fast model that needs three retries and a manual fix is more expensive than a flagship call that lands on the first pass.

A selection checklist

Before you start, answer:

  • Is this novel or pattern-following?
  • Do I have reference implementations?
  • What is the cost of mistakes?
  • Is this structured or ambiguous?
  • Am I debugging or implementing?

If uncertain:

  • Start with flagship for planning.
  • Drop to mid-tier once you have a stable pattern and good references.
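
In practice that looks like running the planning step against the flagship alias and dropping a tier for implementation. The ai command and its flags below are stand-ins for whatever CLI you actually use:

# Hypothetical CLI invocation -- substitute your own tool and flags.
ai --model "$LLM_FLAGSHIP" "Draft a step-by-step plan for the refactor" > plan.md
ai --model "$LLM_MIDTIER" --file plan.md "Implement step 1, following the plan"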

Verification

A practical way to keep this from being hand-wavy is to force a written decision per task.

Create a small note file per task (the template below uses a generic filename; give each task its own):

mkdir -p work-notes

cat > work-notes/model-selection.md <<'MD'
# Model Selection (Per Task)

## Task
<What are we doing?>

## Risk
- Cost of mistakes:
- Can I review the output competently?

## References
- <Paths to reference implementations>

## Model decision
- Tier: <flagship|mid-tier|fast>
- Why:
- When to upgrade:

## Outcome
- Did we upgrade?
- What broke / what worked:
MD

Expected result:

  • You can justify the model choice in one minute.
  • You have a trigger for upgrading when output quality is the bottleneck.
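
Over time these notes double as calibration data. A quick scan of past outcomes (assuming the layout above and one note file per task) tells you whether your default tier runs too low or too high:

grep -A2 '## Outcome' work-notes/*.md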

Continue -> Chapter 8: Security & Sensitive Data: Sanitize, Don’t Paste Secrets

Authors
DevOps Architect · Applied AI Engineer
I’ve spent 20 years building systems across embedded firmware, security platforms, fintech, and enterprise architecture. Today I focus on production AI systems in Go: multi-agent orchestration, MCP server ecosystems, and the DevOps platforms that keep them running. I care about systems that work under pressure: observable, recoverable, and built to last.