What is RAG?
Retrieval-Augmented Generation (RAG) is a technique that enhances Large Language Models (LLMs) by providing them with relevant information from external knowledge sources. Instead of relying solely on the model’s training data, RAG systems:
- Retrieve relevant information from a knowledge base
- Augment the user’s query with this context
- Generate an informed response using an LLM
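The three steps above can be sketched in a few lines. This is a toy illustration, not Mini RAG's API: the knowledge base, the word-overlap retriever, and `fake_llm` are all hypothetical stand-ins chosen so the example runs without any external service.

```python
from typing import Callable

# Toy knowledge base; a real system would query a document store instead.
KNOWLEDGE_BASE = [
    "Milvus is a vector database built for similarity search.",
    "Chunking splits long documents into smaller passages.",
    "Bananas are rich in potassium.",
]


def _words(text: str) -> set[str]:
    # Lowercase and strip basic punctuation so "Milvus?" matches "milvus".
    return {w.strip(".,?!").lower() for w in text.split()}


def retrieve(query: str, top_k: int = 2) -> list[str]:
    """Rank passages by word overlap with the query (stand-in for vector search)."""
    q = _words(query)
    ranked = sorted(KNOWLEDGE_BASE, key=lambda p: len(q & _words(p)), reverse=True)
    return ranked[:top_k]


def augment(query: str, passages: list[str]) -> str:
    """Prepend the retrieved passages to the user's question."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


def generate(prompt: str, llm: Callable[[str], str]) -> str:
    """Delegate to any callable that maps a prompt to a completion."""
    return llm(prompt)


def fake_llm(prompt: str) -> str:
    # Placeholder model: reports how many context passages it received.
    n = sum(1 for line in prompt.splitlines() if line.startswith("- "))
    return f"(answer grounded in {n} passages)"


question = "What is Milvus?"
passages = retrieve(question)
answer = generate(augment(question, passages), fake_llm)
```

Swapping `fake_llm` for a real model client would leave the retrieve and augment steps unchanged, since the model only ever sees the augmented prompt.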
Mini RAG Architecture
Mini RAG follows a modular, pipeline-based architecture that makes it easy to understand, customize, and extend.
Core Components
Document Loader
Load and parse documents from multiple formats (PDF, DOCX, images, etc.)
Chunker
Split documents into optimal chunks for embedding and retrieval
Embedding Model
Convert text into vector embeddings for semantic search
Vector Store
Store and search embeddings using Milvus
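To make the four roles concrete, here is a minimal sketch in which each component is replaced by a toy stand-in. None of these names come from Mini RAG: `load_text`, `chunk_text`, `embed`, and `InMemoryStore` are hypothetical, the "embedding" is a bag-of-words count rather than a learned model, and the store is a plain list rather than Milvus.

```python
import math
from collections import Counter


def load_text(raw: bytes) -> str:
    """Document Loader stand-in: decode raw bytes to text."""
    return raw.decode("utf-8")


def chunk_text(text: str, max_words: int = 8) -> list[str]:
    """Chunker stand-in: fixed-size windows of words, no overlap."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Embedding Model stand-in: a sparse bag-of-words vector."""
    return Counter(w.strip(".,?!").lower() for w in text.split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemoryStore:
    """Vector Store stand-in: rows are scanned linearly at search time."""

    def __init__(self) -> None:
        self.rows: list[tuple[Counter, str]] = []

    def add(self, vector: Counter, text: str) -> None:
        self.rows.append((vector, text))

    def search(self, vector: Counter, top_k: int = 1) -> list[str]:
        ranked = sorted(self.rows, key=lambda r: cosine(vector, r[0]), reverse=True)
        return [text for _, text in ranked[:top_k]]


raw = (b"Mini RAG loads documents. It splits them into chunks. "
       b"Each chunk becomes a vector. Vectors are stored for search.")
store = InMemoryStore()
chunks = chunk_text(load_text(raw))
for chunk in chunks:
    store.add(embed(chunk), chunk)
```

Calling `store.search(embed("stored for search"))` then returns the chunk whose words best match the query, which is the same load, chunk, embed, store sequence at toy scale.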
The RAG Pipeline
1. Indexing Phase
When you index a document, Mini RAG performs the following steps:
1. Load Document: The DocumentLoader reads and converts the document to text using MarkItDown
2. Chunk Text: The Chunker splits the text into optimal chunks using Chonkie
3. Generate Embeddings: The EmbeddingModel converts each chunk into a vector embedding
4. Store Vectors: The VectorStore saves embeddings and metadata to Milvus

2. Query Phase
When you query the system, Mini RAG performs the following steps:
1. Rewrite Query (Optional): Generate multiple query variations to improve retrieval coverage
2. Embed Query: Convert the query (and variations) into vector embeddings
3. Search: Find the most similar chunks using vector search (or hybrid search)
4. Rerank (Optional): Re-rank retrieved chunks for better relevance
5. Generate Answer: Use an LLM to generate an answer based on the retrieved context
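The five query steps can be traced end to end with a small, dependency-free sketch. Everything here is an assumption made for illustration: the pre-indexed `CHUNKS`, the `SYNONYMS` table that fakes query rewriting, the bag-of-words `embed`, the word-overlap reranker, and `echo_llm` are hypothetical and do not reflect how Mini RAG implements these stages.

```python
import math
from collections import Counter
from typing import Callable

# Chunks assumed to have been produced by an earlier indexing run.
CHUNKS = [
    "Milvus stores embeddings for fast similarity search.",
    "Chunk size affects retrieval quality.",
    "A reranker reorders retrieved chunks by relevance.",
]

# Hypothetical synonym table, used only to fake query rewriting.
SYNONYMS = {"database": "stores", "reorder": "reranker"}


def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words counts."""
    return Counter(w.strip(".,?!").lower() for w in text.split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


INDEX = [(embed(c), c) for c in CHUNKS]


def rewrite_query(query: str) -> list[str]:
    """Step 1 (optional): the original query plus one variation per matching synonym."""
    variations = [query]
    for word, alt in SYNONYMS.items():
        if word in query.lower():
            variations.append(query.lower().replace(word, alt))
    return variations


def search(queries: list[str], top_k: int = 2) -> list[str]:
    """Steps 2-3: embed each variation and keep every chunk's best score."""
    best: dict[str, float] = {}
    for q in queries:
        qv = embed(q)
        for vec, text in INDEX:
            best[text] = max(best.get(text, 0.0), cosine(qv, vec))
    return sorted(best, key=best.get, reverse=True)[:top_k]


def rerank(query: str, chunks: list[str]) -> list[str]:
    """Step 4 (optional): reorder by words shared with the original query."""
    q = set(embed(query))
    return sorted(chunks, key=lambda c: len(q & set(embed(c))), reverse=True)


def generate_answer(query: str, chunks: list[str], llm: Callable[[str], str]) -> str:
    """Step 5: build a prompt from the context and hand it to the model."""
    prompt = "Context:\n" + "\n".join(chunks) + f"\n\nQuestion: {query}"
    return llm(prompt)


def echo_llm(prompt: str) -> str:
    # Placeholder model: returns the first context line instead of generating text.
    return prompt.splitlines()[1]


query = "Which database holds embeddings?"
candidates = search(rewrite_query(query))
answer = generate_answer(query, rerank(query, candidates), echo_llm)
```

In this toy run the rewritten variation shares more words with the Milvus chunk than the original query does, so it scores higher, which is the coverage benefit that query rewriting is meant to provide.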
Modular Design
One of Mini RAG’s strengths is its modularity. You can:
- Use Individual Components
- Mix and Match
- Build Custom Pipelines
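One common way to get this kind of modularity in Python is to have the pipeline depend on a small interface rather than a concrete class. The sketch below illustrates the idea only; `SupportsChunk`, `SentenceChunker`, `FixedSizeChunker`, and `Pipeline` are hypothetical names, not Mini RAG classes.

```python
from typing import Protocol


class SupportsChunk(Protocol):
    """Any object with a matching chunk() method can act as a chunker."""

    def chunk(self, text: str) -> list[str]: ...


class SentenceChunker:
    """Splits text on periods."""

    def chunk(self, text: str) -> list[str]:
        return [s.strip() + "." for s in text.split(".") if s.strip()]


class FixedSizeChunker:
    """Splits text into windows of a fixed number of characters."""

    def __init__(self, size: int) -> None:
        self.size = size

    def chunk(self, text: str) -> list[str]:
        return [text[i:i + self.size] for i in range(0, len(text), self.size)]


class Pipeline:
    """Custom pipeline that depends only on the SupportsChunk interface."""

    def __init__(self, chunker: SupportsChunk) -> None:
        self.chunker = chunker
        self.indexed: list[str] = []

    def index(self, text: str) -> int:
        """Chunk the text, record the chunks, and return how many were added."""
        new_chunks = self.chunker.chunk(text)
        self.indexed.extend(new_chunks)
        return len(new_chunks)


text = "First sentence. Second sentence."

# Use an individual component on its own.
sentences = SentenceChunker().chunk(text)

# Mix and match: the same pipeline accepts either chunker.
by_sentence = Pipeline(SentenceChunker())
by_size = Pipeline(FixedSizeChunker(size=16))
```

Because `Pipeline` only calls `chunk()`, a custom chunker needs no base class or registration step; it just has to provide that one method.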
Configuration-Based API
Mini RAG uses a clean, configuration-based API that organizes settings into logical groups.
Benefits
Better Organization
Related settings grouped together logically
Type Safety
Validated with Pydantic dataclasses
Easy Maintenance
Change one config without affecting others
Clear Code
Self-documenting configuration objects
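The grouping idea behind these benefits can be sketched with standard-library dataclasses. Mini RAG itself validates with Pydantic; this example swaps in `dataclasses` with manual checks so it runs without dependencies. The class names, field names, and default values (`ChunkingConfig`, `RetrievalConfig`, `PipelineConfig`, `chunk_size`, `top_k`, and so on) are assumptions for illustration, not the library's actual configuration objects.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class ChunkingConfig:
    """Settings that only affect how text is split."""
    chunk_size: int = 512
    chunk_overlap: int = 50

    def __post_init__(self) -> None:
        if self.chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        if not 0 <= self.chunk_overlap < self.chunk_size:
            raise ValueError("chunk_overlap must be in [0, chunk_size)")


@dataclass(frozen=True)
class RetrievalConfig:
    """Settings that only affect search and reranking."""
    top_k: int = 5
    use_reranker: bool = False

    def __post_init__(self) -> None:
        if self.top_k < 1:
            raise ValueError("top_k must be at least 1")


@dataclass(frozen=True)
class PipelineConfig:
    """Top-level object that groups the per-stage configs."""
    chunking: ChunkingConfig = field(default_factory=ChunkingConfig)
    retrieval: RetrievalConfig = field(default_factory=RetrievalConfig)


default = PipelineConfig()

# Change one group without touching the others.
tuned = replace(default, retrieval=RetrievalConfig(top_k=10, use_reranker=True))
```

Invalid values fail at construction time (for example, `ChunkingConfig(chunk_size=0)` raises `ValueError`), so a misconfiguration surfaces before any document is processed.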
Key Design Principles
Simplicity First
Mini RAG prioritizes ease of use. Get started with just a few lines of code, then customize as needed.
Production Ready
Built with production use cases in mind: error handling, retries, timeouts, observability, and comprehensive configuration.
Modular & Extensible
Use the full pipeline or individual components. Easy to extend with custom implementations.
Pythonic API
Clean, intuitive API that follows Python best practices and conventions.
Type Safe
Leverages Pydantic for data validation and type safety throughout the library.
