Structured Course

Model Selection

From first install to production patterns. Every lesson is standalone — jump to what you need, or work through from beginner to advanced.

147 lessons · 3 levels · Beginner → Advanced
Beginner
49 lessons · 7 chapters
Why Model Selection Matters · 7 lessons
The Model Landscape 2026 · 7 lessons
Evaluation Dimensions · 7 lessons
+4 more chapters
Intermediate
49 lessons · 7 chapters
Reasoning Models Selection · 7 lessons
Open Source Model Selection · 7 lessons
Vision Model Selection · 7 lessons
+4 more chapters
Advanced
49 lessons · 7 chapters
Capability Assessment Framework · 7 lessons
Cost and Latency Matrix · 7 lessons
Open Source vs Proprietary Decision · 7 lessons
+4 more chapters

Full Course Contents

Beginner

49 lessons
Chapter 4: Task-Model Matching · 7 lessons

Intermediate

49 lessons
Chapter 1: Reasoning Models Selection · 7 lessons
1. When reasoning models are worth the cost
Reasoning models (o3, Claude Opus extended thinking) cost 10–40x more per token but solve specific high-stakes problems where traditional models fail; whether that ROI exists depends on the domain.
2. o1 vs o3: capability and cost comparison
o3 solves reasoning problems that o1 cannot, and the two are priced differently per token; check current rates and choose based on task complexity, not brand loyalty.
3. DeepSeek R1: open source reasoning
DeepSeek R1 shifts reasoning workloads from proprietary inference APIs to self-hosted open models, reducing vendor lock-in and enabling compliance-sensitive deployments, but reasoning tokens cost 4–6x standard inference.
4. QwQ-32B: local reasoning option
QwQ-32B enables on-premise reasoning workflows for regulated industries where API calls create compliance friction, but only if your infrastructure can handle 32GB of memory and latency isn't a constraint.
5. Gemini 2.5 Pro thinking mode
Gemini 2.5 Pro's thinking mode trades latency for reasoning depth; understand when that tradeoff wins in your domain and when it costs you.
6. Reasoning models for math and code
Reasoning models (o3, Claude Opus extended thinking) solve symbolic problems that standard models fail on, but cost 10–100x more and require architectural redesign around latency.
7. When standard models beat reasoning models
Reasoning models (o3, extended thinking) cost 5–50x more per token and add 10–60 seconds of latency; standard models win in most production systems when speed, cost, or user experience matter more than perfect reasoning.
Chapter 2: Open Source Model Selection · 7 lessons
1. Llama 4 Scout: MoE, 10M context
Llama 4 Scout's mixture-of-experts architecture activates only a fraction of its parameters per token, which lowers compute cost while the full weights still have to fit in memory; 10M-token context windows additionally demand careful orchestration of memory, latency, and regulatory compliance in production.
2. Llama 3.3 70B: production quality open source
Llama 3.3 70B is an open-weight model that competes with closed-source models at enterprise scale, which removes vendor lock-in as a business risk, but only if you deploy it on infrastructure you control.
3. Qwen3: multilingual and coding
Qwen3 is a strong open-weight choice for non-English code and documentation, making it a strategic option for global engineering teams, but only if you can run it yourself or negotiate vendor pricing.
4. Mistral open source family
Mistral's open models let you control inference costs and data residency, but you inherit deployment complexity that closed APIs hide from you.
5. DeepSeek V3: efficiency leader
DeepSeek V3 achieves GPT-4-class reasoning at 1/10th the inference cost, forcing architects to rethink the economics of model selection, but latency and vendor lock-in create new tradeoffs.
6. Gemma 3: Google open weight
Gemma 3 is production-grade for cost-sensitive inference but requires your own infrastructure: you avoid vendor lock-in and accept the operational burden.
7. Choosing open source by use case
Open source model selection depends on regulatory requirements, inference latency, cost constraints, and whether you can run inference on your own infrastructure, not just on model benchmarks.
Chapter 3: Vision Model Selection · 7 lessons
1. GPT-4.1 vision capabilities
GPT-4.1 vision solves document intelligence and visual inspection at enterprise scale, but fails on real-time video, medical imaging diagnostics, and tasks requiring spatial reasoning beyond 2D layout.
2. Gemini 2.5 Pro: When Multimodal Inference Changes Architecture
Gemini 2.5 Pro's native video understanding and document processing capabilities eliminate entire pipeline stages, but only if you architect for concurrent multimodal input, not sequential image extraction.
3. Claude Sonnet Vision: When to Use Multi-Modal Analysis in Production
Claude Sonnet's vision capability is production-ready for document analysis and compliance workflows, but requires careful integration planning around latency, cost per image, and human review checkpoints that teams consistently underestimate.
4. Llama 4 native multimodal
Llama 4's native multimodal capabilities eliminate the need for separate vision encoders, but production deployment requires careful consideration of cost, latency, and token efficiency trade-offs.
5. Document Understanding: Comparing OCR, LLM Extraction, and Specialized Models
Document understanding requires choosing between three fundamentally different approaches (OCR, LLM extraction, and specialized models), each with hard limits in accuracy, cost, and compliance that no amount of engineering can overcome.
6. Chart and graph analysis
Chart and graph analysis requires different model architectures than text or images alone, and the compliance burden varies dramatically by industry.
7. Vision model cost comparison
Vision model costs vary 1000x by vendor and deployment method; pick wrong and you burn your budget before you prove value.
Chapter 4: Embedding Model Selection · 7 lessons
1. text-embedding-3-large vs small: When to choose each model
text-embedding-3-large handles semantic complexity and rare use cases; text-embedding-3-small is production-grade for most retrieval systems and costs roughly 6x less at OpenAI's published list prices.
2. Cohere embed-v3: When to Use It Over OpenAI & Alternatives
Cohere embed-v3 is the right choice when you need multilingual semantic search at scale without vendor lock-in to OpenAI, but only if your infrastructure can handle non-US data residency requirements.
3. Voyage AI embeddings
Voyage AI embeddings are purpose-built for semantic search and RAG in enterprise contexts, but vendor lock-in and cost-per-token trade-offs require deliberate architecture decisions.
4. BGE-M3: When to Use Open Source Embeddings Instead of Proprietary APIs
BGE-M3 gives you production-grade multilingual embeddings you can self-host, eliminating API costs and data residency concerns but requiring infrastructure ownership.
5. Multilingual embedding requirements
Multilingual embeddings require architectural decisions about tokenization, alignment, and vector space quality that depend on your language pairs and compliance context; not all embedding models are equal across languages.
6. Embedding dimension vs quality
Higher embedding dimensions improve semantic fidelity but multiply inference cost, latency, and storage; this is the optimization you must get right before scaling to production.
7. Embedding model benchmarks: MTEB
MTEB scores are necessary but insufficient: your embedding model must match your retrieval architecture and query distribution, not just benchmark leaderboards.

Advanced

49 lessons
Chapter 3: Open Source vs Proprietary Decision · 7 lessons
1. When open source wins
Open source models beat proprietary APIs when you need latency guarantees, cost predictability at scale, or legal certainty over data residency, not because they're cheaper upfront but because they're the only option that fits the constraint.
2. When proprietary is required
Proprietary models aren't a luxury choice: they're a compliance and liability requirement in regulated industries, and choosing open source when you need proprietary creates legal and operational risk that no engineering excellence can fix.
3. Self-hosted cost analysis
Self-hosting only makes financial sense if your inference volume is predictable, your latency SLA is sub-100ms, and you've modeled the true cost of ops ownership.
4. Inference infrastructure for OSS
Open-source model inference requires fundamentally different infrastructure decisions than closed-API models; compliance, cost, and latency trade-offs are not optional.
5. Fine-tuning OSS vs proprietary
Fine-tuning proprietary models locks you into vendor ecosystems and compliance frameworks; OSS gives control but saddles you with infrastructure, security, and regulatory certification responsibility.
6. Data privacy advantage of OSS
Open-source models running on your own infrastructure eliminate data residency violations and give you legal control that closed APIs never can, but only if you architect the deployment correctly.
7. Support and Community: The Hidden Cost of Model Selection
Your model choice locks you into a vendor's support ecosystem, and that ecosystem becomes your technical ceiling when things break in production.
Chapter 6: Selection for Specific Industries · 7 lessons
1. Healthcare: HIPAA, accuracy requirements
In healthcare, model accuracy is not a performance metric: it's a liability surface that HIPAA, FDA oversight, and malpractice law make you personally responsible for.
2. Finance: compliance, explainability
In regulated finance, model selection is constrained by explainability requirements and regulatory approval timelines, not just accuracy.
3. Legal: citation accuracy, hallucination risk
LLMs confidently hallucinate case citations and statutory references, which creates malpractice liability that no architecture pattern fully eliminates.
4. Code generation: benchmark-driven selection
Code generation models must be evaluated on your codebase's actual patterns and compliance boundaries, not generic benchmarks, which requires building a custom evaluation framework before choosing vendors.
5. Customer support: cost and latency focus
In customer support, the model you choose is determined by your SLA (response time) and cost-per-interaction, not by accuracy alone, and those constraints eliminate most frontier models.
6. Creative: quality over cost
In creative domains (copywriting, design, strategy), model quality directly impacts brand value and client retention; cost optimization often destroys the output you're trying to monetize.
7. Research: frontier model access
Frontier model access requires vendor contracts, SLA negotiation, and inference cost modeling before you can even evaluate whether a cutting-edge model is actually the right choice for your problem.
Chapter 7: Future-Proofing Model Selection · 7 lessons
1. Tracking model release cadence
Model release cadence is a business, legal, and technical constraint that determines which foundation models you can deploy in production and when you can update them without breaking compliance.
2. Evaluating new models systematically
Model evaluation isn't about benchmark scores: it's about measuring performance on your actual data distribution under your actual constraints, with explicit governance for when to switch.
3. Migration playbook for model upgrades
Model migrations are operational events with regulatory, financial, and reputational consequences: they require staged rollouts, baseline metrics, and explicit sign-off from non-technical stakeholders before touching production.
4. Avoiding over-optimization for one model
Optimizing your system for one vendor's model, infrastructure, or API creates technical debt that resurfaces when models change, regulations tighten, or costs spike.
5. Building abstraction layers
Abstraction layers isolate business logic from model volatility, turning model selection from a technical dead-end into a runtime decision.
6. Community intelligence: following benchmarks
Public benchmarks are a starting point, not a destination; your domain constraints will disqualify most of the top-ranked models.
7. Long-term provider relationship strategy
Your model provider choice locks you into their roadmap, pricing, and compliance posture for 3–5 years; choose based on regulatory trajectory and contractual escape routes, not current API performance.