Chapter 7: Choosing the Right Model: Capability Tiers, Not Hype

January 27, 2026 · 3 min read

Series: LLM Development Guide

Chapter 7 of 15

Previous: Chapter 6: Scaling the Workflow: Phases, Parallelism, Hygiene

Next: Chapter 8: Security & Sensitive Data: Sanitize, Don’t Paste Secrets

What you’ll be able to do

You’ll be able to pick a model deliberately:

  • Use capability tiers instead of memorizing brand names.
  • Upgrade quickly when quality is the bottleneck.
  • Avoid wasting flagship models on structured boilerplate.

TL;DR

  • Treat model choice as a cost-of-mistakes problem.
  • Use flagship models for planning, debugging, and high-stakes decisions.
  • Use mid-tier models for implementation with strong references.
  • Use fast/cheap models for boilerplate and simple transformations.
  • If you’ve spent ~10 minutes fighting output quality, upgrade or shrink scope.

As-of note

Model names, pricing, and product policies change frequently (this note was last checked 2026-02-14). Prefer tier-based guidance, and verify vendor policies directly before using any tool with sensitive data.

The capability tiers

Think in tiers:

  • Flagship: best reasoning and instruction-following for novel work.
  • Mid-tier: strong general performance for structured work with references.
  • Fast/cheap: good for simple tasks, higher error rate on complex reasoning.

This framing stays useful even when names change.
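
One way to make the tier framing concrete in your tooling is to pin tier aliases instead of hard-coding model names. The IDs below are placeholders, not recommendations; point each one at whatever your vendor currently ships in that tier:

export LLM_FLAGSHIP="vendor/flagship-model"   # placeholder ID
export LLM_MIDTIER="vendor/mid-model"         # placeholder ID
export LLM_FAST="vendor/fast-model"           # placeholder ID

When a name changes, you update one place; every script and prompt that refers to a tier keeps working.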

Task-to-tier mapping

Use flagship for:

  • Planning and architecture.
  • Debugging complex failures.
  • Security-sensitive review.
  • Anything where mistakes are expensive.

Use mid-tier for:

  • Implementation that follows existing patterns.
  • Refactors with clear examples.
  • Writing tests when the behavior is already defined.

Use fast/cheap for:

  • Syntax lookups.
  • Boilerplate you will review.
  • Mechanical transformations.
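
If you script any of this, the mapping above fits in a few lines of shell. A minimal sketch, assuming you label tasks yourself using the categories from these lists:

# Map a task label to a tier; the labels are illustrative.
tier_for_task() {
  case "$1" in
    planning|architecture|debugging|security-review) echo flagship ;;
    implementation|refactor|tests)                   echo mid-tier ;;
    syntax|boilerplate|transform)                    echo fast ;;
    *)                                               echo flagship ;; # unknown work: default to caution
  esac
}

tier_for_task refactor   # prints: mid-tier

Defaulting unknown work to flagship encodes the cost-of-mistakes rule: if you can’t classify the task, assume mistakes are expensive.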

Red flags: upgrade now

Upgrade when you see:

  • The model repeats the same misunderstanding.
  • Output ignores constraints.
  • “Looks right” code fails in tests.
  • You are on the third prompt iteration for the same unit of work.

The cheapest model is the one that gets you to a correct, verified change in the least total time. A fast model that needs three retries and a manual fix is more expensive than a flagship call that lands on the first pass.

A selection checklist

Before you start, answer:

  • Is this novel or pattern-following?
  • Do I have reference implementations?
  • What is the cost of mistakes?
  • Is this structured or ambiguous?
  • Am I debugging or implementing?

If uncertain:

  • Start with flagship for planning.
  • Drop to mid-tier once you have a stable pattern and good references.
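
In practice that looks like running the planning step against the flagship alias and dropping a tier for implementation. The ai command and its flags below are stand-ins for whatever CLI you actually use:

# Hypothetical CLI invocation -- substitute your own tool and flags.
ai --model "$LLM_FLAGSHIP" "Draft a step-by-step plan for the refactor" > plan.md
ai --model "$LLM_MIDTIER" --file plan.md "Implement step 1, following the plan"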

Verification

A practical way to keep this from being hand-wavy is to force a written decision per task.

Create a small note file per task (the template below uses a generic filename; give each task its own):

mkdir -p work-notes

cat > work-notes/model-selection.md <<'MD'
# Model Selection (Per Task)

## Task
<What are we doing?>

## Risk
- Cost of mistakes:
- Can I review the output competently?

## References
- <Paths to reference implementations>

## Model decision
- Tier: <flagship|mid-tier|fast>
- Why:
- When to upgrade:

## Outcome
- Did we upgrade?
- What broke / what worked:
MD

Expected result:

  • You can justify the model choice in one minute.
  • You have a trigger for upgrading when output quality is the bottleneck.
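
Over time these notes double as calibration data. A quick scan of past outcomes (assuming the layout above and one note file per task) tells you whether your default tier runs too low or too high:

grep -A2 '## Outcome' work-notes/*.md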

Continue -> Chapter 8: Security & Sensitive Data: Sanitize, Don’t Paste Secrets

Authors
DevOps Architect · Applied AI Engineer
I’ve spent 20 years building systems across embedded firmware, security platforms, fintech, and enterprise architecture. Today I focus on production AI systems in Go: multi-agent orchestration, MCP server ecosystems, and the DevOps platforms that keep them running. I care about systems that work under pressure: observable, recoverable, and built to last.