🎚️RAG vs Fine-tuning vs Prompt Engineering

Three ways to inject knowledge into an LLM. Picking the right one saves months of wasted infrastructure.

aitechnical

Why it matters

Most teams over-invest in fine-tuning because it sounds sophisticated. The reality: prompting + RAG handles 95% of production cases at lower cost and faster iteration. Knowing when each is right is foundational AI PM judgment.

The core idea

Default to prompt engineering — it's the fastest. Add RAG when you have a corpus the model needs to ground in. Use fine-tuning only when prompt + RAG can't achieve consistency or specialized behavior. The decision matters because the infrastructure investment for fine-tuning is 5-10x that of RAG.

The three approaches

Prompt engineering. Craft the system prompt + few-shot examples to elicit the behavior you want. No training. No infrastructure. Iterate in hours.

When it's enough:

Behavior can be specified in instructions
Knowledge fits in context window
Knowledge doesn't change often

When it's not:

Knowledge is too large for context (whole knowledge base)
Knowledge changes frequently
Model needs to learn a specialized style or capability

RAG (Retrieval-Augmented Generation). Store corpus in vector DB. At query time, retrieve relevant chunks. Inject into prompt. Let the LLM answer based on retrieved context.

When to use:

Large or constantly-updating corpus (docs, support tickets, internal wiki)
Need fresh data (model's training cutoff doesn't include recent info)
Need citations / source attribution
Cost-sensitive (smaller prompts than stuffing everything in)

This is the dominant production AI pattern in 2026.

Fine-tuning. Further-train a base model on your data. Either supervised fine-tuning (input-output pairs) or RLHF (reinforcement learning from human feedback).

When to use:

Need consistent output format that prompt can't enforce
Specialized capability the base model lacks
High volume where smaller fine-tuned model is cheaper than frontier model per call
Domain language so different that prompting can't bridge

When NOT to use:

Knowledge injection (RAG is better — easier to update, cheaper)
General improvement (frontier models keep improving; your fine-tuned model gets stale)
Quick iteration (fine-tuning takes days to weeks; prompting takes minutes)

The decision tree

Can you specify the behavior in <2K tokens of prompt + examples? → Prompting.
Do you have a corpus the model needs to reference? → Add RAG.
Even with RAG, is the model failing consistently? → Consider fine-tuning, but try prompt + RAG with frontier model first.

The cost economics

For a moderate AI feature (5K input + 2K output, 100K calls/month):

Prompting + RAG with frontier model: ~$5K-15K/month
Fine-tuned smaller model: ~$2K-5K/month + $5-20K one-time fine-tuning cost
Self-hosted open-source model: ~$2K-10K/month (infra) + engineering time

Fine-tuning's cost advantage only materializes at high call volumes (millions/month). Most products don't get there.

The hybrid that works in 2026

The pattern most production AI products converge on:

Frontier model + RAG for complex/high-value calls
Smaller model for high-volume routing and intent classification
Fine-tuned model for specialized capabilities (very rare)
Eval-driven optimization at every layer

Real-world examples

Perplexity

RAG as core capability

Perplexity is essentially a sophisticated RAG product — search the web, retrieve relevant content, ground the LLM in real sources. The fine-tuning matters less than the retrieval quality. The discipline of investing in retrieval over model-tuning is one reason they out-product competitors.

Go deeper — recommended reading

RAG vs. Fine-tuning vs. Prompt Engineering: The Complete Guide

Aakash Gupta · Product Growth

↗

Prompt Engineering in 2025

Aakash Gupta · Product Growth

↗

Anthropic Prompt Engineering Guide

Anthropic

↗

Interview questions (1)

I'm building a customer support AI. Should I use RAG or fine-tune?

ai-pmsenior

▼

Almost certainly RAG.

Customer support requires three things AI must do well: (1) answer based on accurate, current information (your docs, your product), (2) cite sources so the human escalator can verify, (3) handle the case where the answer isn't in the corpus.

RAG handles all three. Fine-tuning handles none well — fine-tuned models can't easily cite, don't update when your docs change, and tend to hallucinate when the question is outside the training data.

The right architecture: store your docs + support tickets in a vector DB, retrieve top relevant chunks per query, prompt a frontier model with the chunks + clear instructions to cite or escalate. Build an eval suite with 100+ real customer questions + expected answers. Measure.

You'd consider fine-tuning only if RAG + prompting + frontier model couldn't achieve the quality bar, AND you had millions of queries per month where the cost difference mattered. For most companies, that day never comes.

Related concepts

🧠Everything You Need to Know about AI (for PMs)

The foundational vocabulary and mental model. If you can speak fluently about LLMs, RAG, agents, evals, and the cost stack, you're already ahead of 80% of PMs.

📝Prompt Engineering in 2026

The patterns that work with current frontier models. Less about clever tricks, more about clear instructions and good examples.

📊Evals — The FAQ Every AI PM Needs

Evals are how you know if your AI product actually works. The single most-skipped discipline by junior AI teams.