Overview

This guide will walk you through building a complete RAG (Retrieval-Augmented Generation) application using Mini RAG. You’ll learn how to:
  • Set up your environment
  • Index documents
  • Query your knowledge base
  • Access and use retrieved sources
Prerequisites: Python >= 3.11, OpenAI API key, and a Milvus instance (local or cloud)

Step 1: Installation

Install Mini RAG using your preferred package manager:
uv add mini-rag
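Or with pip (assuming the package is published on PyPI under the same name):
pip install mini-rag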

Step 2: Environment Setup

Create a .env file in your project directory with the required credentials:
.env
# OpenAI Configuration
OPENAI_API_KEY=sk-your-api-key-here
OPENAI_BASE_URL=https://api.openai.com/v1  # Optional
EMBEDDING_MODEL=text-embedding-3-small

# Milvus Configuration
MILVUS_URI=https://your-milvus-instance.com
MILVUS_TOKEN=your-milvus-token

# Optional: Cohere for re-ranking
COHERE_API_KEY=your-cohere-api-key

# Optional: Langfuse for observability
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_HOST=https://cloud.langfuse.com
Don’t have a Milvus instance? Get started quickly with Zilliz Cloud (free tier available) or run Milvus locally with Docker.
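Before moving on, it can help to confirm that your credentials actually load. The script below is a minimal sketch (not part of Mini RAG itself, and the filename is just for illustration); it uses only python-dotenv and the standard library, and checks the variables the examples in this guide pass to the vector store:
check_env.py
import os
from dotenv import load_dotenv

# Read the .env file created above
load_dotenv()

# Credentials the examples in this guide rely on
required = ["OPENAI_API_KEY", "MILVUS_URI", "MILVUS_TOKEN"]
missing = [name for name in required if not os.getenv(name)]

if missing:
    raise SystemExit(f"Missing environment variables: {', '.join(missing)}")

print("All required environment variables are set")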

Step 3: Basic Usage

Create your first RAG application:
app.py
import os
from mini import (
    AgenticRAG,
    EmbeddingModel,
    VectorStore
)
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Initialize components
embedding_model = EmbeddingModel()

vector_store = VectorStore(
    uri=os.getenv("MILVUS_URI"),
    token=os.getenv("MILVUS_TOKEN"),
    collection_name="my_documents",
    dimension=1536  # For text-embedding-3-small
)

# Initialize RAG system
rag = AgenticRAG(
    vector_store=vector_store,
    embedding_model=embedding_model
)

# Index a document
print("📄 Indexing document...")
num_chunks = rag.index_document("path/to/your/document.pdf")
print(f"✅ Indexed {num_chunks} chunks")

# Query the system
print("\n🔍 Querying...")
response = rag.query("What is the main topic of the document?")

# Display results
print(f"\n💬 Answer:\n{response.answer}")
print(f"\n📚 Sources: {len(response.retrieved_chunks)} chunks used")
Run your application:
python app.py
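Once indexing succeeds, you can keep asking questions against the same collection. For example, you could append a small interactive loop to the end of app.py; it reuses the rag object defined above and only the rag.query API already shown:
# Appended to app.py: simple interactive question loop
while True:
    question = input("\nAsk a question (or press Enter to quit): ").strip()
    if not question:
        break
    response = rag.query(question)
    print(f"\n💬 Answer:\n{response.answer}")
    print(f"📚 Sources: {len(response.retrieved_chunks)} chunks used")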

Step 4: Advanced Configuration

Enhance your RAG system with advanced features:
advanced_app.py
import os
from mini import (
    AgenticRAG,
    LLMConfig,
    RetrievalConfig,
    RerankerConfig,
    EmbeddingModel,
    VectorStore
)
from dotenv import load_dotenv

load_dotenv()

# Initialize with custom configuration
embedding_model = EmbeddingModel()

vector_store = VectorStore(
    uri=os.getenv("MILVUS_URI"),
    token=os.getenv("MILVUS_TOKEN"),
    collection_name="my_documents",
    dimension=1536
)

rag = AgenticRAG(
    vector_store=vector_store,
    embedding_model=embedding_model,
    # Configure LLM
    llm_config=LLMConfig(
        model="gpt-4o-mini",
        temperature=0.7
    ),
    # Configure retrieval
    retrieval_config=RetrievalConfig(
        top_k=10,              # Retrieve 10 chunks initially
        rerank_top_k=3,        # Keep top 3 after re-ranking
        use_query_rewriting=True,  # Generate query variations
        use_reranking=True,    # Re-rank results
        use_hybrid_search=True # Use semantic + BM25 search
    ),
    # Configure re-ranker
    reranker_config=RerankerConfig(
        type="llm"  # Options: "llm", "cohere", "sentence-transformer"
    )
)

# Index multiple documents
documents = [
    "document1.pdf",
    "document2.docx",
    "document3.txt"
]

print("📄 Indexing documents...")
total_chunks = rag.index_documents(documents)
print(f"✅ Indexed {total_chunks} total chunks")

# Query with detailed response
response = rag.query(
    query="What are the key findings?",
    return_sources=True
)

# Access response details
print(f"\n💬 Answer:\n{response.answer}")
print(f"\n🔄 Query variations: {response.rewritten_queries}")
print(f"\n📊 Metadata: {response.metadata}")

# Show source chunks
print(f"\n📚 Retrieved {len(response.retrieved_chunks)} chunks:")
for i, chunk in enumerate(response.retrieved_chunks, 1):
    print(f"\n{i}. [Score: {chunk.reranked_score:.4f}]")
    print(f"   Text: {chunk.text[:150]}...")
    print(f"   Source: {chunk.metadata}")

Step 5: Enable Observability

Track and monitor your RAG pipeline with Langfuse:
with_observability.py
import os
from mini import (
    AgenticRAG,
    ObservabilityConfig,
    EmbeddingModel,
    VectorStore
)
from dotenv import load_dotenv

# Load credentials, including the LANGFUSE_* keys from Step 2
load_dotenv()

# Same component setup as in Step 3
embedding_model = EmbeddingModel()

vector_store = VectorStore(
    uri=os.getenv("MILVUS_URI"),
    token=os.getenv("MILVUS_TOKEN"),
    collection_name="my_documents",
    dimension=1536
)

rag = AgenticRAG(
    vector_store=vector_store,
    embedding_model=embedding_model,
    # Enable observability
    observability_config=ObservabilityConfig(enabled=True)
)

# All operations are now traced in Langfuse
rag.index_document("document.pdf")
response = rag.query("What is this about?")

View traces in Langfuse to monitor query rewriting, retrieval, re-ranking, and LLM generation in real time.

What You’ve Learned

  • Installing and configuring Mini RAG
  • Setting up environment variables
  • Initializing core components
  • Indexing single and multiple documents
  • Understanding document chunking
  • Managing your knowledge base
  • Querying your RAG system
  • Accessing retrieved sources
  • Understanding response structure
  • Configuring query rewriting
  • Enabling hybrid search
  • Using different re-ranking strategies
  • Monitoring with observability

Next Steps

1. Explore Core Concepts
2. Learn About Features: discover hybrid search, re-ranking, and more
3. Check API Reference: browse the complete API documentation
4. Build Something: try one of our example projects
