The Three Approaches
Prompting
Use a pretrained model with a carefully-crafted prompt. No training, no retrieval, just in-context instructions.
RAG (Retrieval-Augmented Generation)
Add a retrieval step: find relevant documents from your data, pass to the LLM as context, generate answer grounded in retrieval.
Fine-tuning
Update model weights on your own data. Creates a customized model.
Quick Decision Rubric
| Need | Best Approach |
|---|---|
| Access to current / proprietary data | RAG |
| Consistent style / format | Fine-tune |
| Complex reasoning on general topics | Prompting (best modern models) |
| Domain-specific vocabulary / terminology | Fine-tune |
| Large corpus to query | RAG |
| Low latency, simple task | Fine-tune (smaller model) |
| Fast iteration, prototyping | Prompting |
| Compliance / auditability | RAG (citable sources) |
Strengths
- Zero training cost
- Fastest to iterate
- Works with any model
- Access to latest model improvements automatically
Weaknesses
- Can't access data the model wasn't trained on
- Prompts can be brittle (fragile to model updates)
- Long prompts are expensive at scale
When to use
- Proof-of-concept
- General reasoning / writing
- Low-volume tasks
- When you want to leverage cutting-edge models as they ship
RAG — When It Wins
Strengths
- Access to custom / proprietary data
- Data updates without retraining
- Citations and auditability
- Same cost-structure as prompting (model inference)
Weaknesses
- Retrieval quality becomes a bottleneck
- Chunking / embedding model decisions matter
- Still bound by context window
- Pipeline complexity (vector DB, embedding models, chunking)
When to use
- Q&A over your documentation
- Customer support with knowledge base
- Research assistants
- Any case with citable sources needed
Fine-Tuning — When It Wins
Strengths
- Consistent output style / format
- Can use smaller, cheaper models
- Lower per-inference cost (amortized over volume)
- Bakes in domain terminology
Weaknesses
- Training cost ($100s-$10Ks)
- Slower iteration (each change requires retraining)
- Model staleness (doesn't benefit from base-model updates)
- Risk of overfitting on small training sets
When to use
- High-volume production tasks
- Specific output format (JSON schemas, SQL, etc.)
- Domain language (legal, medical, regulatory)
- Latency-sensitive (small fine-tuned model beats big general model)
Combining Approaches
Most sophisticated systems combine:
- RAG for data access + fine-tuning for style
- Prompting for the flexible outer shell, fine-tuned for specific sub-tasks
- Fine-tune the retriever (embedding model) + prompt the generator
Cost Comparison (Illustrative)
For a high-volume production task at 1M requests / month:
- Prompting with GPT-5: $10K-$30K / month
- RAG with GPT-5 (shorter prompts via retrieval): $3K-$10K / month
- Fine-tuned GPT-5-mini: $500-$2K / month + $5K one-time training
Cost savings compound at scale. Fine-tuning wins at volume.
Common Mistakes
Fine-tuning too early
Before you have >1,000 high-quality examples, fine-tuning usually underperforms prompting on the latest model.
RAG for stable data
If your data doesn't change, bake it into a fine-tuned model instead of paying retrieval cost on every request.
Prompting for high-volume
At 10M+ requests / month, a well-fine-tuned small model beats prompting the biggest model on cost by 10x+.
Mixing without evaluation
Combining approaches increases complexity; measure whether complexity is earning its keep.
Key Takeaways
- Prompting: fast, flexible, expensive at scale
- RAG: custom data access, citations, medium complexity
- Fine-tuning: consistent style, domain language, cheapest per inference but costly to train
- Combine approaches for sophisticated systems
- Start simple (prompting); add complexity only when evals show clear wins
See also [/guides/what-is-rag-retrieval-augmented-generation](/guides/what-is-rag-retrieval-augmented-generation).