What is RAG?

Retrieval-Augmented Generation (RAG) is a technique that enhances Large Language Models (LLMs) by providing them with relevant information from external knowledge sources. Instead of relying solely on the model’s training data, RAG systems:
  1. Retrieve relevant information from a knowledge base
  2. Augment the user’s query with this context
  3. Generate an informed response using an LLM
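
In pseudocode, the whole loop fits in a few lines. The sketch below is conceptual only: the toy word-overlap retriever stands in for a real vector search, and llm stands in for any text-generation callable. None of these names are Mini RAG APIs.

# Conceptual sketch of the RAG loop (toy stand-ins, not Mini RAG APIs)
def retrieve(knowledge_base, question, top_k=2):
    # Toy retriever: rank documents by word overlap with the question
    words = set(question.lower().split())
    ranked = sorted(knowledge_base, key=lambda doc: -len(words & set(doc.lower().split())))
    return ranked[:top_k]

def answer_with_rag(question, knowledge_base, llm):
    context = "\n".join(retrieve(knowledge_base, question))   # 1. Retrieve
    prompt = f"Context:\n{context}\n\nQuestion: {question}"   # 2. Augment
    return llm(prompt)                                        # 3. Generate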

Mini RAG Architecture

Mini RAG follows a modular, pipeline-based architecture that makes it easy to understand, customize, and extend:

The RAG Pipeline

1. Indexing Phase

When you index a document, Mini RAG performs the following steps:
  1. Load Document: The DocumentLoader reads the file and converts it to text using MarkItDown.
  2. Chunk Text: The Chunker splits the text into optimal chunks using Chonkie.
  3. Generate Embeddings: The EmbeddingModel converts each chunk into a vector embedding.
  4. Store Vectors: The VectorStore saves the embeddings and their metadata to Milvus.

# Example: Indexing a document
rag = AgenticRAG(vector_store=vector_store, embedding_model=embedding_model)
num_chunks = rag.index_document("research_paper.pdf")
print(f"Indexed {num_chunks} chunks")

2. Query Phase

When you query the system, Mini RAG:
  1. Rewrite Query (optional): Generates multiple query variations to improve retrieval coverage.
  2. Embed Query: Converts the query (and any variations) into vector embeddings.
  3. Search: Finds the most similar chunks using vector search (or hybrid search).
  4. Rerank (optional): Re-ranks the retrieved chunks for better relevance.
  5. Generate Answer: Uses the LLM to generate an answer grounded in the retrieved context.

# Example: Querying
response = rag.query("What are the key findings?")
print(response.answer)
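
The optional steps are controlled by configuration (see the Configuration-Based API section below). For example, combining only settings that appear later on this page, enabling query rewriting and a reranker looks like this:

from mini import AgenticRAG, RetrievalConfig, RerankerConfig

# Optional pipeline steps are switched on via configuration
rag = AgenticRAG(
    vector_store=vector_store,
    embedding_model=embedding_model,
    retrieval_config=RetrievalConfig(use_query_rewriting=True),  # enables step 1
    reranker_config=RerankerConfig(type="cohere"),               # enables step 4
)
response = rag.query("What are the key findings?")
print(response.rewritten_queries)  # query variations generated in step 1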

Modular Design

One of Mini RAG’s strengths is its modularity. You can:

Use Individual Components

from mini.loader import DocumentLoader
from mini.chunker import Chunker
from mini.embedding import EmbeddingModel

# Use components independently
loader = DocumentLoader()
text = loader.load("document.pdf")

chunker = Chunker()
chunks = chunker.chunk(text)

embedding_model = EmbeddingModel()
embeddings = embedding_model.embed_chunks(chunks)

Mix and Match

# Use your own chunking strategy
chunks = my_custom_chunker(text)

# But use Mini RAG for embeddings and storage
embeddings = embedding_model.embed_chunks(chunks)
vector_store.insert(embeddings, chunks)
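
Here, my_custom_chunker is whatever you supply. As a hypothetical illustration (not part of Mini RAG), it could be a simple paragraph-based splitter:

# Hypothetical custom chunker: split on blank lines and pack
# paragraphs into chunks of at most max_chars characters
# (a paragraph longer than max_chars becomes its own chunk)
def my_custom_chunker(text, max_chars=1000):
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks

Note that embed_chunks and vector_store.insert may expect the Chunker's own chunk objects rather than plain strings, so check the expected input type before swapping in raw strings.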

Build Custom Pipelines

# Create your own RAG pipeline
class CustomRAG:
    def __init__(self):
        self.loader = DocumentLoader()
        self.chunker = Chunker()
        self.embedding_model = EmbeddingModel()
        self.vector_store = VectorStore(...)
    
    def index(self, path):
        text = self.loader.load(path)
        chunks = self.chunker.chunk(text)
        embeddings = self.embedding_model.embed_chunks(chunks)
        return self.vector_store.insert(embeddings, chunks)
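
A matching query method might look like the sketch below. Note the assumptions: it reuses embed_chunks to embed the query, and it calls a vector_store.search(embedding, top_k) method that is not shown elsewhere on this page, so verify the actual VectorStore API first.

    # Hypothetical addition to CustomRAG. Assumes VectorStore exposes
    # search(embedding, top_k); only insert() is documented above.
    def query(self, question, top_k=5):
        query_embedding = self.embedding_model.embed_chunks([question])[0]
        return self.vector_store.search(query_embedding, top_k=top_k)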

Configuration-Based API

Mini RAG uses a clean, configuration-based API that organizes settings into logical groups:
from mini import AgenticRAG, LLMConfig, RetrievalConfig, RerankerConfig

rag = AgenticRAG(
    vector_store=vector_store,
    embedding_model=embedding_model,
    # LLM settings
    llm_config=LLMConfig(
        model="gpt-4o-mini",
        temperature=0.7
    ),
    # Retrieval settings
    retrieval_config=RetrievalConfig(
        top_k=10,
        use_query_rewriting=True,
        use_hybrid_search=True
    ),
    # Reranker settings
    reranker_config=RerankerConfig(
        type="cohere"
    )
)

Benefits

Better Organization: Related settings are grouped together logically.
Type Safety: Configs are validated with Pydantic dataclasses.
Easy Maintenance: Change one config without affecting the others.
Clear Code: Configuration objects are self-documenting.
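
The type-safety point is easy to see in practice: because the config objects are validated by Pydantic, a bad value fails at construction time rather than mid-pipeline. The exact field constraints are not documented here, so the example below assumes top_k is declared as an integer:

from pydantic import ValidationError
from mini import RetrievalConfig

# Assuming top_k is an int field, a non-numeric value is
# rejected when the config is constructed, not mid-pipeline
try:
    RetrievalConfig(top_k="ten")
except ValidationError as error:
    print(error)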

Key Design Principles

  1. Simple by default: Get started with just a few lines of code, then customize as needed.
  2. Production-minded: Built with error handling, retries, timeouts, observability, and comprehensive configuration.
  3. Modular: Use the full pipeline or individual components; easy to extend with custom implementations.
  4. Pythonic: A clean, intuitive API that follows Python best practices and conventions.
  5. Type-safe: Leverages Pydantic for data validation and type safety throughout the library.

Understanding the Response

When you query Mini RAG, you get a comprehensive response object:
response = rag.query("What is the budget?")

# Access different parts of the response
print(response.answer)              # Generated answer
print(response.original_query)      # Your original query
print(response.rewritten_queries)   # Query variations (if enabled)
print(response.retrieved_chunks)    # Retrieved context chunks
print(response.metadata)            # Additional metadata

# Inspect retrieved chunks
for chunk in response.retrieved_chunks:
    print(chunk.text)               # Chunk text
    print(chunk.score)              # Similarity score
    print(chunk.reranked_score)     # Reranked score (if enabled)
    print(chunk.metadata)           # Chunk metadata
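
As a small usage sketch, these fields make it easy to print the answer alongside its supporting evidence. This assumes chunk.metadata behaves like a dictionary, and the "source" key is hypothetical; the actual keys depend on how documents were indexed.

# Show the answer with its top three supporting chunks
print(response.answer)
for i, chunk in enumerate(response.retrieved_chunks[:3], start=1):
    source = chunk.metadata.get("source", "unknown")  # hypothetical key
    print(f"[{i}] score={chunk.score:.3f} source={source}")
    print(f"    {chunk.text[:120]}...")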

Next Steps