Overview
Mini RAG uses a clean, configuration-based API that organizes settings into logical groups. This approach makes configuration easier to maintain and the code that uses it clearer.
Configuration Classes
Mini RAG provides four main configuration classes:
- LLMConfig: Configure your language model settings
- RetrievalConfig: Control retrieval behavior
- RerankerConfig: Choose and configure reranking
- ObservabilityConfig: Enable monitoring and tracing
LLMConfig
Configure your language model for answer generation.
Basic Configuration
Complete Options
Parameter Reference
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | "gpt-4" | Model identifier |
| api_key | Optional[str] | None | API key (uses env var if None) |
| base_url | Optional[str] | None | Custom API endpoint |
| temperature | float | 0.7 | Sampling temperature |
| timeout | float | 60.0 | Request timeout |
| max_retries | int | 3 | Retry attempts |
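The snippet below is a minimal sketch of an `LLMConfig` with every documented parameter set to its default. The parameter names and defaults come from the table above; the `mini_rag` import path is an assumption.

```python
# Minimal sketch: an LLMConfig with every documented parameter set explicitly.
# The `mini_rag` import path is an assumption; parameter names and defaults
# come from the parameter reference above.
from mini_rag import LLMConfig

llm_config = LLMConfig(
    model="gpt-4",     # model identifier
    api_key=None,      # falls back to the environment variable when None
    base_url=None,     # custom API endpoint, if any
    temperature=0.7,   # sampling temperature
    timeout=60.0,      # request timeout
    max_retries=3,     # retry attempts
)
```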
Common Configurations
Using OpenAI
Using Azure OpenAI
Using Compatible API
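The three setups above could look roughly like the following sketch. The endpoint URLs and the non-OpenAI model name are placeholders, not values confirmed by this page, and the `mini_rag` import path is an assumption.

```python
from mini_rag import LLMConfig  # import path is an assumption

# Using OpenAI: rely on the API key from the environment (commonly OPENAI_API_KEY).
openai_llm = LLMConfig(model="gpt-4")

# Using Azure OpenAI: point base_url at your Azure resource (placeholder URL).
azure_llm = LLMConfig(
    model="gpt-4",
    api_key="<azure-api-key>",
    base_url="https://<your-resource>.openai.azure.com/",
)

# Using an OpenAI-compatible API (e.g. a local inference server, placeholder endpoint).
compatible_llm = LLMConfig(
    model="llama-3-8b-instruct",
    api_key="not-needed",
    base_url="http://localhost:8000/v1",
)
```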
RetrievalConfig
Control how documents are retrieved and processed.
Basic Configuration
Complete Options
Parameter Reference
| Parameter | Type | Default | Description |
|---|---|---|---|
| top_k | int | 5 | Initial retrieval count |
| rerank_top_k | int | 3 | Final chunk count after reranking |
| use_query_rewriting | bool | True | Generate query variations |
| use_reranking | bool | True | Rerank retrieved chunks |
| use_hybrid_search | bool | False | Combine semantic + keyword search |
| rrf_k | int | 60 | RRF fusion constant |
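As a reference, here is a hedged sketch of a `RetrievalConfig` with all documented options at their defaults; only the `mini_rag` import path is an assumption.

```python
from mini_rag import RetrievalConfig  # import path is an assumption

retrieval_config = RetrievalConfig(
    top_k=5,                    # initial retrieval count
    rerank_top_k=3,             # final chunk count after reranking
    use_query_rewriting=True,   # generate query variations
    use_reranking=True,         # rerank retrieved chunks
    use_hybrid_search=False,    # combine semantic + keyword search
    rrf_k=60,                   # RRF fusion constant
)
```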
Tuning Guidelines
For Comprehensive Answers
For Fast, Focused Answers
For Technical/Keyword Queries
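The three profiles above might translate into settings like the following; the specific values are illustrative suggestions rather than recommendations from the library.

```python
from mini_rag import RetrievalConfig  # import path is an assumption

# For comprehensive answers: cast a wider net and keep more chunks.
comprehensive = RetrievalConfig(top_k=10, rerank_top_k=5)

# For fast, focused answers: skip query rewriting and retrieve fewer chunks.
fast = RetrievalConfig(top_k=3, rerank_top_k=2, use_query_rewriting=False)

# For technical/keyword queries: enable hybrid semantic + keyword search.
technical = RetrievalConfig(top_k=8, use_hybrid_search=True)
```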
RerankerConfig
Choose and configure your reranking strategy.
Available Rerankers
Mini RAG supports four reranking strategies (a combined configuration sketch follows this list):
- LLM-based (default): Uses your LLM to score chunks
- Cohere: Uses Cohere’s specialized reranking API
- Sentence Transformer: Uses local cross-encoder models
- None: Disables reranking
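Only the `RerankerConfig` class name and the strategy list are documented on this page, so in the sketch below the field names (`reranker_type`, `model`, `api_key`) and the specific model identifiers are assumptions.

```python
from mini_rag import RerankerConfig  # import path is an assumption

# LLM-based (default): score chunks with the configured LLM.
llm_reranker = RerankerConfig(reranker_type="llm")

# Cohere: use Cohere's reranking API (requires an API key; model name is a placeholder).
cohere_reranker = RerankerConfig(
    reranker_type="cohere",
    model="rerank-english-v3.0",
    api_key="<cohere-api-key>",
)

# Sentence Transformer: run a local cross-encoder model.
st_reranker = RerankerConfig(
    reranker_type="sentence_transformer",
    model="cross-encoder/ms-marco-MiniLM-L-6-v2",
)

# None: disable reranking entirely.
no_reranker = RerankerConfig(reranker_type="none")
```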
LLM-based Reranking
Cohere Reranking
Sentence Transformer Reranking
Custom Reranker
Disable Reranking
Comparison
| Reranker | Pros | Cons | Best For |
|---|---|---|---|
| LLM-based | Simple, no extra APIs | Uses LLM tokens | General use |
| Cohere | Highest quality | Requires API key, costs | Production quality |
| Sentence Transformer | Local, private, free | Requires GPU for speed | Privacy-sensitive |
| None | Fastest | Lower quality | Speed-critical |
ObservabilityConfig
Enable monitoring and tracing with Langfuse.
Basic Configuration
Complete Options
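Only the `ObservabilityConfig` class name and the Langfuse integration are documented here, so the field names in this sketch (`enabled` plus the Langfuse credential fields) are assumptions.

```python
from mini_rag import ObservabilityConfig  # import path is an assumption

# Field names below are assumptions made for illustration.
observability_config = ObservabilityConfig(
    enabled=True,
    langfuse_public_key="pk-lf-...",
    langfuse_secret_key="sk-lf-...",
    langfuse_host="https://cloud.langfuse.com",  # Langfuse Cloud default host
)
```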
What Gets Tracked
When enabled, Mini RAG tracks:
- 🔍 Query rewriting operations
- 📚 Document retrieval metrics
- 🎯 Reranking performance
- 💬 LLM generation calls
- 📄 Document indexing pipeline
- ⏱️ Latency for each step
- 🎭 Input/output data
Setup Langfuse
1. Sign Up: Create a free account at Langfuse Cloud
2. Get API Keys: Get your public and secret keys from project settings
3. Set Environment Variables
4. Enable in Code (see the sketch after these steps)
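Steps 3 and 4 could look like this. The `LANGFUSE_*` variable names are the ones Langfuse documents for its SDK; whether Mini RAG picks them up automatically (rather than requiring keys to be passed explicitly) is an assumption.

```python
import os

# Step 3: set the standard Langfuse environment variables
# (normally done in your shell or a .env file rather than in code).
os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-..."
os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-..."
os.environ["LANGFUSE_HOST"] = "https://cloud.langfuse.com"

# Step 4: enable tracing in code. Assumes the config reads the variables
# above when credentials are not passed explicitly.
from mini_rag import ObservabilityConfig  # import path is an assumption

observability_config = ObservabilityConfig(enabled=True)
```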
Embedding Configuration
Configure the embedding model separately if needed.
Basic Configuration
Complete Options
Provider Examples
OpenAI
Azure OpenAI
Local Model
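Neither the embedding configuration class nor its fields are shown on this page, so everything in the sketch below is an assumption: a hypothetical `EmbeddingConfig` with `model`, `api_key`, and `base_url` fields, plus placeholder model names and endpoints.

```python
from mini_rag import EmbeddingConfig  # hypothetical class name and import path

# OpenAI embeddings (model name is a common choice, not confirmed by this page).
openai_embeddings = EmbeddingConfig(model="text-embedding-3-small")

# Azure OpenAI: point at your Azure resource (placeholder URL).
azure_embeddings = EmbeddingConfig(
    model="text-embedding-3-small",
    api_key="<azure-api-key>",
    base_url="https://<your-resource>.openai.azure.com/",
)

# Local OpenAI-compatible embedding server (placeholder model and endpoint).
local_embeddings = EmbeddingConfig(
    model="nomic-embed-text",
    base_url="http://localhost:11434/v1",
)
```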
Vector Store Configuration
Configure Milvus vector storage.
Basic Configuration
Complete Options
Parameter Reference
| Parameter | Type | Default | Description |
|---|---|---|---|
| uri | str | - | Milvus server URI |
| token | str | - | Authentication token |
| collection_name | str | - | Collection identifier |
| dimension | int | - | Embedding vector dimension |
| metric_type | str | "IP" | Distance metric (IP, L2, COSINE) |
| index_type | str | "IVF_FLAT" | Index algorithm |
| nlist | int | 128 | Number of cluster units |
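The parameter names and defaults in this sketch come from the table above; the class name (`VectorStoreConfig`) and import path are assumptions, and the URI, token, and dimension are placeholders.

```python
from mini_rag import VectorStoreConfig  # class name and import path are assumptions

vector_store_config = VectorStoreConfig(
    uri="http://localhost:19530",  # Milvus server URI (placeholder)
    token="<milvus-token>",        # authentication token
    collection_name="documents",   # collection identifier
    dimension=1536,                # must match your embedding model's output size
    metric_type="IP",              # IP, L2, or COSINE
    index_type="IVF_FLAT",         # IVF_FLAT, IVF_SQ8, or HNSW
    nlist=128,                     # number of cluster units
)
```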
Metric Types
- IP (Inner Product): Fast; equivalent to cosine similarity for normalized vectors (recommended)
- L2: Euclidean distance
- COSINE: Direct cosine similarity
Index Types
- IVF_FLAT: Good balance of speed and accuracy
- IVF_SQ8: Faster, uses less memory
- HNSW: Highest accuracy, more memory
Full Configuration Example
Putting it all together:
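Below is a hedged end-to-end sketch that combines the documented config classes. How the configs are handed to the pipeline is not shown in this section, so the commented-out `MiniRAG(...)` call is purely hypothetical.

```python
from mini_rag import (  # import path is an assumption
    LLMConfig,
    ObservabilityConfig,
    RerankerConfig,
    RetrievalConfig,
)

llm_config = LLMConfig(model="gpt-4", temperature=0.7)
retrieval_config = RetrievalConfig(top_k=5, rerank_top_k=3)
reranker_config = RerankerConfig(reranker_type="llm")      # field name is an assumption
observability_config = ObservabilityConfig(enabled=True)   # field name is an assumption

# Hypothetical entry point -- the actual constructor name and signature are
# not documented in this section.
# rag = MiniRAG(
#     llm=llm_config,
#     retrieval=retrieval_config,
#     reranker=reranker_config,
#     observability=observability_config,
# )
```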
Environment Variables
Recommended .env file structure:
