Overview

This guide will walk you through building a complete RAG (Retrieval-Augmented Generation) application using Mini RAG. You’ll learn how to:
  • Set up your environment
  • Index documents
  • Query your knowledge base
  • Access and use retrieved sources
Prerequisites: Python >= 3.11, OpenAI API key, and a Milvus instance (local or cloud)

Step 1: Installation

Install Mini RAG using your preferred package manager:
uv add mini-rag
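Or with pip (assuming the package is published on PyPI under the same name):
pip install mini-rag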

Step 2: Environment Setup

Create a .env file in your project directory with the required credentials:
.env
# OpenAI Configuration
OPENAI_API_KEY=sk-your-api-key-here
OPENAI_BASE_URL=https://api.openai.com/v1  # Optional
EMBEDDING_MODEL=text-embedding-3-small

# Milvus Configuration
MILVUS_URI=https://your-milvus-instance.com
MILVUS_TOKEN=your-milvus-token

# Optional: Cohere for re-ranking
COHERE_API_KEY=your-cohere-api-key

# Optional: Langfuse for observability
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_HOST=https://cloud.langfuse.com
Don’t have a Milvus instance? Get started quickly with Zilliz Cloud (free tier available) or run Milvus locally with Docker.
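Before moving on, it can help to confirm that your credentials actually load. The script below is a minimal sketch (not part of Mini RAG itself, and the filename is just for illustration); it uses only python-dotenv and the standard library, and checks the variables the examples in this guide pass to the vector store:
check_env.py
import os
from dotenv import load_dotenv

# Read the .env file created above
load_dotenv()

# Credentials the examples in this guide rely on
required = ["OPENAI_API_KEY", "MILVUS_URI", "MILVUS_TOKEN"]
missing = [name for name in required if not os.getenv(name)]

if missing:
    raise SystemExit(f"Missing environment variables: {', '.join(missing)}")

print("All required environment variables are set")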

Step 3: Basic Usage

Create your first RAG application:
app.py
import os
from mini import (
    AgenticRAG,
    EmbeddingModel,
    VectorStore
)
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Initialize components
embedding_model = EmbeddingModel()

vector_store = VectorStore(
    uri=os.getenv("MILVUS_URI"),
    token=os.getenv("MILVUS_TOKEN"),
    collection_name="my_documents",
    dimension=1536  # For text-embedding-3-small
)

# Initialize RAG system
rag = AgenticRAG(
    vector_store=vector_store,
    embedding_model=embedding_model
)

# Index a document
print("📄 Indexing document...")
num_chunks = rag.index_document("path/to/your/document.pdf")
print(f"✅ Indexed {num_chunks} chunks")

# Query the system
print("\n🔍 Querying...")
response = rag.query("What is the main topic of the document?")

# Display results
print(f"\n💬 Answer:\n{response.answer}")
print(f"\n📚 Sources: {len(response.retrieved_chunks)} chunks used")
Run your application:
python app.py
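Once indexing succeeds, you can keep asking questions against the same collection. For example, you could append a small interactive loop to the end of app.py; it reuses the rag object defined above and only the rag.query API already shown:
# Appended to app.py: simple interactive question loop
while True:
    question = input("\nAsk a question (or press Enter to quit): ").strip()
    if not question:
        break
    response = rag.query(question)
    print(f"\n💬 Answer:\n{response.answer}")
    print(f"📚 Sources: {len(response.retrieved_chunks)} chunks used")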

Step 4: Advanced Configuration

Enhance your RAG system with advanced features:
advanced_app.py
import os
from mini import (
    AgenticRAG,
    LLMConfig,
    RetrievalConfig,
    RerankerConfig,
    EmbeddingModel,
    VectorStore
)
from dotenv import load_dotenv

load_dotenv()

# Initialize with custom configuration
embedding_model = EmbeddingModel()

vector_store = VectorStore(
    uri=os.getenv("MILVUS_URI"),
    token=os.getenv("MILVUS_TOKEN"),
    collection_name="my_documents",
    dimension=1536
)

rag = AgenticRAG(
    vector_store=vector_store,
    embedding_model=embedding_model,
    # Configure LLM
    llm_config=LLMConfig(
        model="gpt-4o-mini",
        temperature=0.7
    ),
    # Configure retrieval
    retrieval_config=RetrievalConfig(
        top_k=10,              # Retrieve 10 chunks initially
        rerank_top_k=3,        # Keep top 3 after re-ranking
        use_query_rewriting=True,  # Generate query variations
        use_reranking=True,    # Re-rank results
        use_hybrid_search=True # Use semantic + BM25 search
    ),
    # Configure re-ranker
    reranker_config=RerankerConfig(
        type="llm"  # Options: "llm", "cohere", "sentence-transformer"
    )
)

# Index multiple documents
documents = [
    "document1.pdf",
    "document2.docx",
    "document3.txt"
]

print("📄 Indexing documents...")
total_chunks = rag.index_documents(documents)
print(f"✅ Indexed {total_chunks} total chunks")

# Query with detailed response
response = rag.query(
    query="What are the key findings?",
    return_sources=True
)

# Access response details
print(f"\n💬 Answer:\n{response.answer}")
print(f"\n🔄 Query variations: {response.rewritten_queries}")
print(f"\n📊 Metadata: {response.metadata}")

# Show source chunks
print(f"\n📚 Retrieved {len(response.retrieved_chunks)} chunks:")
for i, chunk in enumerate(response.retrieved_chunks, 1):
    print(f"\n{i}. [Score: {chunk.reranked_score:.4f}]")
    print(f"   Text: {chunk.text[:150]}...")
    print(f"   Source: {chunk.metadata}")

Step 5: Enable Observability

Track and monitor your RAG pipeline with Langfuse:
with_observability.py
import os
from mini import (
    AgenticRAG,
    ObservabilityConfig,
    EmbeddingModel,
    VectorStore
)
from dotenv import load_dotenv

# Load credentials, including the LANGFUSE_* keys from Step 2
load_dotenv()

# Same component setup as in Step 3
embedding_model = EmbeddingModel()

vector_store = VectorStore(
    uri=os.getenv("MILVUS_URI"),
    token=os.getenv("MILVUS_TOKEN"),
    collection_name="my_documents",
    dimension=1536
)

rag = AgenticRAG(
    vector_store=vector_store,
    embedding_model=embedding_model,
    # Enable observability
    observability_config=ObservabilityConfig(enabled=True)
)

# All operations are now traced in Langfuse
rag.index_document("document.pdf")
response = rag.query("What is this about?")

View traces in Langfuse to monitor query rewriting, retrieval, re-ranking, and LLM generation in real time.

What You’ve Learned

  • Installing and configuring Mini RAG
  • Setting up environment variables
  • Initializing core components
  • Indexing single and multiple documents
  • Understanding document chunking
  • Managing your knowledge base
  • Querying your RAG system
  • Accessing retrieved sources
  • Understanding response structure
  • Configuring query rewriting
  • Enabling hybrid search
  • Using different re-ranking strategies
  • Monitoring with observability

Next Steps

1. Explore Core Concepts
2. Learn About Features: discover hybrid search, re-ranking, and more
3. Check API Reference: browse the complete API documentation
4. Build Something: try one of our example projects
