Overview

The EmbeddingModel class converts text into vector embeddings—numerical representations that capture semantic meaning. These embeddings enable:
  • Semantic search: Find conceptually similar content
  • Similarity comparison: Measure text relatedness
  • Vector storage: Store and retrieve content efficiently
Mini RAG supports OpenAI, Azure OpenAI, and any OpenAI-compatible embedding API.

Basic Usage

from mini.embedding import EmbeddingModel

# Initialize with defaults (uses environment variables)
embedding_model = EmbeddingModel()

# Embed multiple chunks
texts = ["First chunk", "Second chunk", "Third chunk"]
embeddings = embedding_model.embed_chunks(texts)

print(f"Generated {len(embeddings)} embeddings")
print(f"Dimension: {len(embeddings[0])}")

Configuration

Using Environment Variables

The simplest approach uses environment variables:
.env
OPENAI_API_KEY=sk-your-api-key
EMBEDDING_MODEL=text-embedding-3-small
OPENAI_BASE_URL=https://api.openai.com/v1  # Optional

from mini.embedding import EmbeddingModel

# Reads from environment variables
embedding_model = EmbeddingModel()

Using EmbeddingConfig

For more control, use EmbeddingConfig:
from mini.embedding import EmbeddingConfig, EmbeddingModel

config = EmbeddingConfig(
    api_key="sk-your-api-key",
    base_url="https://api.openai.com/v1",
    model="text-embedding-3-small",
    dimensions=1536,      # Optional: specify dimension
    timeout=60.0,         # Request timeout
    max_retries=3         # Retry failed requests
)

embedding_model = EmbeddingModel(config=config)

Direct Initialization

embedding_model = EmbeddingModel(
    api_key="sk-your-api-key",
    model="text-embedding-3-small",
    base_url="https://api.openai.com/v1",
    dimensions=1536
)

Supported Models

OpenAI Models

# text-embedding-3-small (default, 1536 dimensions)
embedding_model = EmbeddingModel(model="text-embedding-3-small")

# text-embedding-3-large (3072 dimensions, higher quality)
embedding_model = EmbeddingModel(model="text-embedding-3-large")

# text-embedding-ada-002 (legacy, 1536 dimensions)
embedding_model = EmbeddingModel(model="text-embedding-ada-002")
Model                  | Dimensions | Cost   | Best For
text-embedding-3-small | 1536       | Lowest | General use, fast retrieval
text-embedding-3-large | 3072       | Higher | High accuracy requirements
text-embedding-ada-002 | 1536       | Medium | Legacy applications

Azure OpenAI

embedding_model = EmbeddingModel(
    api_key="your-azure-key",
    base_url="https://your-resource.openai.azure.com/openai/deployments/your-deployment",
    model="text-embedding-ada-002"
)

Local or Self-Hosted Models

Any OpenAI-compatible API works:
# Using llama.cpp
embedding_model = EmbeddingModel(
    api_key="not-needed",
    base_url="http://localhost:8080/v1",
    model="your-model-name"
)

# Using Ollama
embedding_model = EmbeddingModel(
    api_key="not-needed",
    base_url="http://localhost:11434/v1",
    model="nomic-embed-text"
)

Embedding Operations

Embed Text Chunks

chunks = ["First document chunk", "Second document chunk"]
embeddings = embedding_model.embed_chunks(chunks)

# Each embedding is a list of floats
print(f"Embedding dimension: {len(embeddings[0])}")  # e.g., 1536

Embed a Query

query = "What is the budget for education?"
query_embedding = embedding_model.embed_query(query)

# Returns a single embedding vector
print(f"Query embedding dimension: {len(query_embedding)}")

Integration with AgenticRAG

When using AgenticRAG, embedding is handled automatically:
import os

from mini import AgenticRAG, EmbeddingModel, VectorStore

# Initialize
embedding_model = EmbeddingModel()
vector_store = VectorStore(
    uri=os.getenv("MILVUS_URI"),
    token=os.getenv("MILVUS_TOKEN"),
    collection_name="docs",
    dimension=1536  # Must match embedding dimension
)

rag = AgenticRAG(
    vector_store=vector_store,
    embedding_model=embedding_model
)

# Embeddings are generated automatically
rag.index_document("document.pdf")
response = rag.query("What is this about?")

Best Practices

Choose the right model for your use case:
  • text-embedding-3-small: Best balance of cost and performance
  • text-embedding-3-large: When accuracy is critical
  • Local models: When data privacy is required
# For most use cases
embedding_model = EmbeddingModel(model="text-embedding-3-small")
Ensure dimensions match between embeddings and vector store:
import os

from mini import EmbeddingModel, VectorStore

# Embedding model dimension
embedding_model = EmbeddingModel(model="text-embedding-3-small")  # 1536

# Vector store must match
vector_store = VectorStore(
    uri=os.getenv("MILVUS_URI"),
    token=os.getenv("MILVUS_TOKEN"),
    collection_name="docs",
    dimension=1536  # Must match!
)
Process multiple texts in batches for efficiency:
# Good: Batch processing
embeddings = embedding_model.embed_chunks(chunks)

# Bad: Individual processing
# embeddings = [embedding_model.embed_query(c) for c in chunks]
Handle API failures gracefully:
try:
    embeddings = embedding_model.embed_chunks(chunks)
except Exception as e:
    print(f"Error generating embeddings: {e}")
    # Implement retry logic or fallback
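
If you need more control than the built-in max_retries, one option is a small backoff wrapper around embed_chunks. A sketch, reusing embedding_model and chunks from above; the attempt count and delays are arbitrary choices:
import time

def embed_with_backoff(embedding_model, chunks, attempts=3, base_delay=2.0):
    """Retry embed_chunks with exponential backoff between attempts."""
    for attempt in range(attempts):
        try:
            return embedding_model.embed_chunks(chunks)
        except Exception as exc:
            if attempt == attempts - 1:
                raise  # Out of attempts; surface the original error
            delay = base_delay * (2 ** attempt)
            print(f"Embedding failed ({exc}); retrying in {delay:.0f}s")
            time.sleep(delay)

embeddings = embed_with_backoff(embedding_model, chunks)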

Advanced Usage

Custom Dimensions

Some models support custom dimensions:
# Reduce dimensions for faster search (supported by the text-embedding-3 models)
embedding_model = EmbeddingModel(
    model="text-embedding-3-large",
    dimensions=1536  # Reduce from 3072
)

Timeout and Retry Configuration

import os

from mini.embedding import EmbeddingConfig, EmbeddingModel

config = EmbeddingConfig(
    api_key=os.getenv("OPENAI_API_KEY"),
    model="text-embedding-3-small",
    timeout=120.0,      # 2 minute timeout
    max_retries=5       # Retry up to 5 times
)

embedding_model = EmbeddingModel(config=config)

Using with Custom Pipelines

from mini.loader import DocumentLoader
from mini.chunker import Chunker
from mini.embedding import EmbeddingModel

# Build custom pipeline
loader = DocumentLoader()
chunker = Chunker()
embedding_model = EmbeddingModel()

# Process document
text = loader.load("document.pdf")
chunks = chunker.chunk(text)
chunk_texts = [c.text for c in chunks]

# Generate embeddings
embeddings = embedding_model.embed_chunks(chunk_texts)

# Now store in vector database
# ...

Performance Considerations

Batch Size

Process 100-500 chunks per batch for optimal performance
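
For example, a simple batching loop might look like this (a sketch; the batch size of 200 is an arbitrary value within the suggested range, and chunk_texts / embedding_model are assumed from the pipeline example above):
batch_size = 200
all_embeddings = []
for i in range(0, len(chunk_texts), batch_size):
    batch = chunk_texts[i:i + batch_size]
    all_embeddings.extend(embedding_model.embed_chunks(batch))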

Rate Limits

Respect API rate limits (handled automatically with retries)

Caching

Cache embeddings for frequently accessed content
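
A simple in-memory cache keyed by the text itself is often enough. A sketch (embed_cached is a hypothetical helper, not part of the library; a production setup might hash texts and persist the cache):
embedding_cache = {}

def embed_cached(embedding_model, texts):
    """Embed texts, reusing cached vectors for texts seen before."""
    missing = [t for t in texts if t not in embedding_cache]
    if missing:
        for text, emb in zip(missing, embedding_model.embed_chunks(missing)):
            embedding_cache[text] = emb
    return [embedding_cache[t] for t in texts]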

Model Choice

Smaller models are faster; larger models are more accurate

Cost Optimization

Model                  | Price per 1M tokens | 10K chunks (~1M tokens)
text-embedding-3-small | $0.02               | ~$0.02
text-embedding-3-large | $0.13               | ~$0.13
text-embedding-ada-002 | $0.10               | ~$0.10
For most applications, text-embedding-3-small offers the best balance of cost and quality.
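
As a rough estimate (assuming ~100 tokens per chunk, which is how the 10K-chunks column above works out to roughly 1M tokens):
chunk_count = 10_000
tokens_per_chunk = 100       # rough assumption; depends on your chunking settings
price_per_million = 0.02     # text-embedding-3-small, per the table above
total_tokens = chunk_count * tokens_per_chunk
print(f"Estimated cost: ${total_tokens / 1_000_000 * price_per_million:.2f}")  # ~$0.02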

Troubleshooting

If requests fail with authentication errors, verify that your API key is set:
echo $OPENAI_API_KEY
Or check your configuration:
print(embedding_model.client.api_key)
If inserts or searches fail with a dimension mismatch, ensure the vector store dimension matches the embedding dimension:
# Check embedding dimension
test_embedding = embedding_model.embed_query("test")
print(f"Embedding dimension: {len(test_embedding)}")

# Create vector store with matching dimension
vector_store = VectorStore(
    uri=os.getenv("MILVUS_URI"),
    token=os.getenv("MILVUS_TOKEN"),
    collection_name="docs",
    dimension=len(test_embedding)
)
If you hit rate limits or timeouts, the library retries automatically, but you can adjust the behavior:
config = EmbeddingConfig(
    max_retries=5,
    timeout=120.0
)
If embedding large document sets runs into memory or request-size limits, use a smaller model or process in batches:
# Use smaller model
embedding_model = EmbeddingModel(model="text-embedding-3-small")

# Process in batches
batch_size = 100
for i in range(0, len(chunks), batch_size):
    batch = chunks[i:i+batch_size]
    embeddings = embedding_model.embed_chunks(batch)

Next Steps