Three main ways to adapt an LLM to your use case, each with different tradeoffs on cost, latency, quality, and complexity. Here is how to choose.

The Three Approaches

Prompting

Use a pretrained model with a carefully-crafted prompt. No training, no retrieval, just in-context instructions.

RAG (Retrieval-Augmented Generation)

Add a retrieval step: find relevant documents from your data, pass to the LLM as context, generate answer grounded in retrieval.

Fine-tuning

Update model weights on your own data. Creates a customized model.

Quick Decision Rubric

Need	Best Approach
Access to current / proprietary data	RAG
Consistent style / format	Fine-tune
Complex reasoning on general topics	Prompting (best modern models)
Domain-specific vocabulary / terminology	Fine-tune
Large corpus to query	RAG
Low latency, simple task	Fine-tune (smaller model)
Fast iteration, prototyping	Prompting
Compliance / auditability	RAG (citable sources)

## Prompting — When It Wins

Strengths

Zero training cost
Fastest to iterate
Works with any model
Access to latest model improvements automatically

Weaknesses

Can't access data the model wasn't trained on
Prompts can be brittle (fragile to model updates)
Long prompts are expensive at scale

When to use

Proof-of-concept
General reasoning / writing
Low-volume tasks
When you want to leverage cutting-edge models as they ship

RAG — When It Wins

Strengths

Access to custom / proprietary data
Data updates without retraining
Citations and auditability
Same cost-structure as prompting (model inference)

Weaknesses

Retrieval quality becomes a bottleneck
Chunking / embedding model decisions matter
Still bound by context window
Pipeline complexity (vector DB, embedding models, chunking)

When to use

Q&A over your documentation
Customer support with knowledge base
Research assistants
Any case with citable sources needed

Fine-Tuning — When It Wins

Strengths

Consistent output style / format
Can use smaller, cheaper models
Lower per-inference cost (amortized over volume)
Bakes in domain terminology

Weaknesses

Training cost ($100s-$10Ks)
Slower iteration (each change requires retraining)
Model staleness (doesn't benefit from base-model updates)
Risk of overfitting on small training sets

When to use

High-volume production tasks
Specific output format (JSON schemas, SQL, etc.)
Domain language (legal, medical, regulatory)
Latency-sensitive (small fine-tuned model beats big general model)

Combining Approaches

Most sophisticated systems combine:

RAG for data access + fine-tuning for style
Prompting for the flexible outer shell, fine-tuned for specific sub-tasks
Fine-tune the retriever (embedding model) + prompt the generator

Cost Comparison (Illustrative)

For a high-volume production task at 1M requests / month:

Prompting with GPT-5: $10K-$30K / month
RAG with GPT-5 (shorter prompts via retrieval): $3K-$10K / month
Fine-tuned GPT-5-mini: $500-$2K / month + $5K one-time training

Cost savings compound at scale. Fine-tuning wins at volume.

Common Mistakes

Fine-tuning too early

Before you have >1,000 high-quality examples, fine-tuning usually underperforms prompting on the latest model.

RAG for stable data

If your data doesn't change, bake it into a fine-tuned model instead of paying retrieval cost on every request.

Prompting for high-volume

At 10M+ requests / month, a well-fine-tuned small model beats prompting the biggest model on cost by 10x+.

Mixing without evaluation

Combining approaches increases complexity; measure whether complexity is earning its keep.

Key Takeaways

Prompting: fast, flexible, expensive at scale
RAG: custom data access, citations, medium complexity
Fine-tuning: consistent style, domain language, cheapest per inference but costly to train
Combine approaches for sophisticated systems
Start simple (prompting); add complexity only when evals show clear wins

See also [/guides/what-is-rag-retrieval-augmented-generation](/guides/what-is-rag-retrieval-augmented-generation).

Fine-Tuning vs RAG vs Prompting: When to Use Each LLM Approach

The Three Approaches

Prompting

RAG (Retrieval-Augmented Generation)

Fine-tuning

Quick Decision Rubric

Strengths

Weaknesses

When to use

RAG — When It Wins

Strengths

Weaknesses

When to use

Fine-Tuning — When It Wins

Strengths

Weaknesses

When to use

Combining Approaches

Cost Comparison (Illustrative)

Common Mistakes

Fine-tuning too early

RAG for stable data

Prompting for high-volume

Mixing without evaluation

Key Takeaways