Overview
Re-ranking is the process of re-scoring and re-ordering retrieved chunks to improve relevance. After initial retrieval (which may return 10-20 chunks), re-ranking selects the most relevant 3-5 chunks for answer generation.

Why re-rank?
- Embedding-based retrieval is fast but may miss nuances
- Re-rankers use more sophisticated models to assess relevance
- Better chunks = better answers from the LLM
Re-ranking Strategies
Mini RAG supports multiple re-ranking methods:

LLM-based
Uses your LLM to score relevance (default)
Cohere API
Specialized re-ranking models via Cohere
Local Models
Open-source cross-encoders running locally
Quick Start
Strategy 1: LLM-Based Re-ranking
Uses your configured LLM to score chunk relevance.

Configuration
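A minimal sketch of the technique, not Mini RAG's actual implementation (its config keys and class names are not shown in this page): the LLM is prompted to rate each chunk's relevance on a 0-10 scale, and the highest-scoring chunks are kept.

```python
def llm_rerank(llm, query, chunks, top_k=5):
    """Score each chunk with the LLM and return the top_k highest-scoring.

    `llm` is any callable mapping a prompt string to a reply string.
    """
    scored = []
    for chunk in chunks:
        prompt = (
            "Rate the relevance of the passage to the query on a 0-10 scale. "
            f"Reply with a single number.\n\nQuery: {query}\nPassage: {chunk}"
        )
        reply = llm(prompt)
        try:
            score = float(reply.strip())
        except ValueError:
            score = 0.0  # an unparseable reply counts as irrelevant
        scored.append((score, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]
```

In practice you would batch several chunks into one prompt to cut the number of LLM calls, which is the main cost driver of this strategy.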
Pros & Cons
Pros
- No additional API needed
- Uses existing LLM
- Good quality
- Simple setup
Cons
- Slower than dedicated rerankers
- More expensive per query
- Limited by LLM context
Strategy 2: Cohere Re-rank API
Uses Cohere’s specialized re-ranking models.

Configuration
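A sketch of calling Cohere's rerank endpoint through its Python SDK (`pip install cohere`), wrapped in a function so the client can be stubbed out. The exact wiring into Mini RAG is not shown on this page, so treat this as illustrative.

```python
def cohere_rerank(client, query, chunks, top_n=5, model="rerank-english-v3.0"):
    """Return the top_n chunks ordered by Cohere relevance score."""
    response = client.rerank(model=model, query=query, documents=chunks, top_n=top_n)
    # Each result carries the index of the original document in `chunks`.
    return [chunks[r.index] for r in response.results]
```

With the real SDK you would pass `cohere.Client(api_key=...)` as `client`; the response's `results` also expose a `relevance_score` per document if you want the raw scores.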
Available Models
| Model | Languages | Best For |
|---|---|---|
| rerank-english-v3.0 | English | English content (best quality) |
| rerank-multilingual-v3.0 | 100+ languages | International content |
Setup
Get API Key
Sign up at cohere.com and get your API key
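A common pattern for wiring the key in (the environment variable name here is the conventional one, not necessarily the setting Mini RAG reads): keep it in `.env`, load it into the environment, and fail loudly if it is missing.

```python
import os


def get_cohere_key():
    """Read the Cohere API key from the environment.

    With python-dotenv installed, call load_dotenv() first so a local
    .env file is picked up.
    """
    key = os.environ.get("COHERE_API_KEY")
    if not key:
        raise RuntimeError("COHERE_API_KEY is not set; add it to your .env file")
    return key
```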
Pros & Cons
Pros
- Very fast
- High quality
- Specialized for reranking
- Cost-effective
Cons
- Requires API key
- External dependency
- API limits apply
Strategy 3: Local Cross-Encoders
Uses open-source sentence-transformer models locally.

Configuration
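A sketch of the cross-encoder scoring loop. `predict` is any callable mapping a list of (query, chunk) pairs to scores; with sentence-transformers installed it would be `CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2").predict`. The Mini RAG glue code itself is not shown here.

```python
def cross_encoder_rerank(predict, query, chunks, top_k=5):
    """Score (query, chunk) pairs with a cross-encoder and keep the top_k."""
    scores = predict([(query, chunk) for chunk in chunks])
    ranked = sorted(zip(scores, chunks), key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in ranked[:top_k]]
```

Unlike bi-encoder retrieval, the cross-encoder sees query and chunk together, which is why it catches nuances the initial embedding search misses, at the cost of one forward pass per pair.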
Available Models
| Model | Size | Quality | Speed |
|---|---|---|---|
| cross-encoder/ms-marco-TinyBERT-L-2-v2 | Tiny | Good | Fast |
| cross-encoder/ms-marco-MiniLM-L-6-v2 | Small | Better | Medium |
| cross-encoder/ms-marco-MiniLM-L-12-v2 | Medium | Best | Slower |
Pros & Cons
Pros
- No API costs
- Data privacy (runs locally)
- No rate limits
- Open source
Cons
- Requires local compute
- GPU recommended
- Model download needed
- Slower than Cohere
Strategy 4: Custom Re-ranker
Provide your own re-ranker instance:

Disabling Re-ranking
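One way to picture what disabling re-ranking means (the function and parameter names here are hypothetical, not Mini RAG's real API): with no re-ranker, the raw retrieval order passes straight through to the LLM.

```python
def retrieve(query, search, reranker=None, top_k=10, rerank_top_k=5):
    """Fetch candidates, then optionally re-rank them.

    `search` and `reranker` are stand-in callables; with reranker=None
    the first rerank_top_k chunks are used in retrieval order.
    """
    chunks = search(query, top_k)
    if reranker is None:  # re-ranking disabled
        return chunks[:rerank_top_k]
    return reranker(query, chunks, rerank_top_k)
```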
Comparison
Performance
Speed Comparison
| Method | Speed | Cost | Quality |
|---|---|---|---|
| No reranking | Fastest | Free | Baseline |
| Cohere | Fast | Low ($0.002/1K docs) | Excellent |
| Local Cross-Encoder | Medium | Free | Very Good |
| LLM-based | Slow | Medium | Good |
Quality Comparison
Best Practices
Choose the Right Strategy
Selection guide:
- Cohere: Best balance for production (fast + high quality)
- LLM: Simple setup, good for prototyping
- Local: Data privacy requirements, no API costs
- None: Speed is critical, budget is tight
Top-K Configuration
Balance retrieval and reranking:
- Higher `top_k`: more candidates, better recall
- Lower `rerank_top_k`: only the best chunks reach the LLM
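As a concrete starting point, a wide-then-narrow setting might look like this (key names are illustrative, not necessarily Mini RAG's):

```python
# Cast a wide net at retrieval, then keep only the best few for generation.
config = {
    "top_k": 20,        # candidates fetched by initial retrieval (recall)
    "rerank_top_k": 5,  # chunks kept after re-ranking (precision)
}
```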
Combine with Hybrid Search
Optimal pipeline:
- Hybrid search retrieves 20 diverse chunks
- Re-ranker selects top 5 most relevant
- LLM generates answer from top 5
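The three steps above can be sketched as a pipeline; all stages here are hypothetical stand-in callables showing only the data flow, not Mini RAG's real interfaces.

```python
def answer(query, hybrid_search, rerank, generate):
    """Hybrid retrieval -> re-ranking -> generation."""
    candidates = hybrid_search(query, top_k=20)  # diverse candidate set
    best = rerank(query, candidates, top_k=5)    # keep the 5 most relevant
    return generate(query, best)                 # LLM answers from the top 5
```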
Monitor Performance
Use observability to track:
- Reranking latency
- Score distributions
- Cost per query
- Quality improvements
Cost Analysis
Cohere Rerank Pricing
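A back-of-envelope calculation at the $0.002 per 1K documents rate quoted in the comparison table, assuming 20 candidate chunks per query:

```python
docs_per_query = 20
cost_per_1k_docs = 0.002  # USD, from the comparison table above

cost_per_query = docs_per_query / 1000 * cost_per_1k_docs
# 20/1000 * 0.002 = 0.00004 USD per query, i.e. roughly 25,000 queries per dollar
```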
LLM-based Cost
Local Cross-Encoder
Troubleshooting
Cohere API errors
Solution: Check your API key. Ensure it's set in your `.env` file and valid.
Local model not loading
Solution: Install `sentence-transformers`. The first run downloads the model (~100MB).
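A quick way to verify the dependency before constructing a local re-ranker (install with `pip install sentence-transformers`); this helper is illustrative, not part of Mini RAG:

```python
import importlib.util


def has_sentence_transformers():
    """Return True if the sentence_transformers package is importable."""
    return importlib.util.find_spec("sentence_transformers") is not None
```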
Slow reranking
Solutions:
- Switch to Cohere (fastest)
- Reduce `top_k` (fewer chunks to rerank)
- Use a smaller local model
- Disable reranking if speed is critical
Poor reranking quality
Solutions:
- Try Cohere (usually best quality)
- Increase `top_k` (more candidates)
- Ensure good initial retrieval
- Check if chunks are well-formed
Advanced Usage
Dynamic Reranker Selection
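A sketch of selecting a re-ranker at runtime via a registry. The registry names and the stand-in scorers are hypothetical; each entry is a callable `(query, chunks, top_k) -> list of chunks`, matching the custom re-ranker idea from Strategy 4.

```python
def by_length(query, chunks, top_k):
    # Stand-in scorer for illustration: prefer longer chunks.
    return sorted(chunks, key=len, reverse=True)[:top_k]


def passthrough(query, chunks, top_k):
    # "No reranking": keep retrieval order.
    return chunks[:top_k]


RERANKERS = {"length": by_length, "none": passthrough}


def select_reranker(name):
    """Look up a re-ranker by name, failing clearly on unknown names."""
    try:
        return RERANKERS[name]
    except KeyError:
        raise ValueError(f"unknown reranker: {name!r}") from None
```

The same pattern lets you switch between LLM-based, Cohere, and local strategies per request, for example based on query language or latency budget.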
Accessing Reranked Scores
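One plausible shape for inspecting scores alongside chunks; Mini RAG's real result objects may expose different attribute names, so this is a sketch of the idea only.

```python
from dataclasses import dataclass


@dataclass
class RankedChunk:
    text: str
    score: float


def with_scores(chunks, scores):
    """Pair chunks with their relevance scores, highest first."""
    ranked = [RankedChunk(t, s) for t, s in zip(chunks, scores)]
    ranked.sort(key=lambda r: r.score, reverse=True)
    return ranked
```

Logging these scores per query is the raw material for the score-distribution monitoring suggested under Best Practices.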
Next Steps
Hybrid Search
Combine reranking with hybrid search
Query Rewriting
Improve retrieval with query variations
AgenticRAG
Complete RAG pipeline documentation
Examples
See reranking in action
