Overview
This guide will walk you through building a complete RAG (Retrieval-Augmented Generation) application using Mini RAG. You’ll learn how to:
Set up your environment
Index documents
Query your knowledge base
Access and use retrieved sources
Prerequisites: Python >= 3.11, an OpenAI API key, and a Milvus instance (local or cloud)
Step 1: Installation
Install Mini RAG using your preferred package manager:
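For example, with pip or uv. The package name below is an assumption; use the name published in the project's README if it differs.
# Assumed PyPI package name
pip install mini-rag

# Or with uv
uv add mini-rag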
Step 2: Environment Setup
Create a .env file in your project directory with the required credentials:
# OpenAI Configuration
OPENAI_API_KEY=sk-your-api-key-here
OPENAI_BASE_URL=https://api.openai.com/v1  # Optional
EMBEDDING_MODEL=text-embedding-3-small

# Milvus Configuration
MILVUS_URI=https://your-milvus-instance.com
MILVUS_TOKEN=your-milvus-token

# Optional: Cohere for re-ranking
COHERE_API_KEY=your-cohere-api-key

# Optional: Langfuse for observability
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_HOST=https://cloud.langfuse.com
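If you are developing locally without a hosted Milvus deployment, a file-backed instance may also work. This is a sketch assuming the library passes MILVUS_URI straight through to pymilvus, which accepts a local Milvus Lite database file; check the project docs before relying on it:
# Local development (assumption: a Milvus Lite file URI is supported)
MILVUS_URI=./milvus_local.db
MILVUS_TOKEN=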
Step 3: Basic Usage
Create your first RAG application:
import os

from mini import (
    AgenticRAG,
    EmbeddingModel,
    VectorStore
)
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Initialize components
embedding_model = EmbeddingModel()

vector_store = VectorStore(
    uri=os.getenv("MILVUS_URI"),
    token=os.getenv("MILVUS_TOKEN"),
    collection_name="my_documents",
    dimension=1536  # For text-embedding-3-small
)

# Initialize RAG system
rag = AgenticRAG(
    vector_store=vector_store,
    embedding_model=embedding_model
)

# Index a document
print("📄 Indexing document...")
num_chunks = rag.index_document("path/to/your/document.pdf")
print(f"✅ Indexed {num_chunks} chunks")

# Query the system
print("\n🔍 Querying...")
response = rag.query("What is the main topic of the document?")

# Display results
print(f"\n💬 Answer:\n{response.answer}")
print(f"\n📚 Sources: {len(response.retrieved_chunks)} chunks used")
Run your application:
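For example, if you saved the script above as app.py (the filename is arbitrary):
python app.py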
Step 4: Advanced Configuration
Enhance your RAG system with advanced features:
import os

from mini import (
    AgenticRAG,
    LLMConfig,
    RetrievalConfig,
    RerankerConfig,
    EmbeddingModel,
    VectorStore
)
from dotenv import load_dotenv

load_dotenv()

# Initialize with custom configuration
embedding_model = EmbeddingModel()

vector_store = VectorStore(
    uri=os.getenv("MILVUS_URI"),
    token=os.getenv("MILVUS_TOKEN"),
    collection_name="my_documents",
    dimension=1536
)

rag = AgenticRAG(
    vector_store=vector_store,
    embedding_model=embedding_model,
    # Configure LLM
    llm_config=LLMConfig(
        model="gpt-4o-mini",
        temperature=0.7
    ),
    # Configure retrieval
    retrieval_config=RetrievalConfig(
        top_k=10,                  # Retrieve 10 chunks initially
        rerank_top_k=3,            # Keep top 3 after re-ranking
        use_query_rewriting=True,  # Generate query variations
        use_reranking=True,        # Re-rank results
        use_hybrid_search=True     # Use semantic + BM25 search
    ),
    # Configure re-ranker
    reranker_config=RerankerConfig(
        type="llm"  # Options: "llm", "cohere", "sentence-transformer"
    )
)

# Index multiple documents
documents = [
    "document1.pdf",
    "document2.docx",
    "document3.txt"
]

print("📄 Indexing documents...")
total_chunks = rag.index_documents(documents)
print(f"✅ Indexed {total_chunks} total chunks")

# Query with detailed response
response = rag.query(
    query="What are the key findings?",
    return_sources=True
)

# Access response details
print(f"\n💬 Answer:\n{response.answer}")
print(f"\n🔄 Query variations: {response.rewritten_queries}")
print(f"\n📊 Metadata: {response.metadata}")

# Show source chunks
print(f"\n📚 Retrieved {len(response.retrieved_chunks)} chunks:")
for i, chunk in enumerate(response.retrieved_chunks, 1):
    print(f"\n{i}. [Score: {chunk.reranked_score:.4f}]")
    print(f"   Text: {chunk.text[:150]}...")
    print(f"   Source: {chunk.metadata}")
Step 5: Enable Observability
Track and monitor your RAG pipeline with Langfuse:
from mini import (
    AgenticRAG,
    ObservabilityConfig,
    EmbeddingModel,
    VectorStore
)

# Reuse the vector_store and embedding_model initialized in the previous steps
rag = AgenticRAG(
    vector_store=vector_store,
    embedding_model=embedding_model,
    # Enable observability
    observability_config=ObservabilityConfig(enabled=True)
)

# All operations are now traced in Langfuse
rag.index_document("document.pdf")
response = rag.query("What is this about?")
View traces in Langfuse to monitor query rewriting, retrieval, re-ranking, and LLM generation in real time.
Common Use Cases
What You’ve Learned
How to install and configure Mini RAG
Setting up environment variables
Initializing core components
How to index single and multiple documents
Understanding document chunking
Managing your knowledge base
How to query your RAG system
Accessing retrieved sources
Understanding response structure
Configuring query rewriting
Enabling hybrid search
Using different re-ranking strategies
Monitoring with observability
Next Steps
Need Help?