Overview
EmbeddingModel generates vector embeddings from text using OpenAI or any OpenAI-compatible API. It supports custom endpoints for Azure OpenAI, local models, and more.
Constructor
from mini.embedding import EmbeddingModel

embedding_model = EmbeddingModel(
    api_key: Optional[str] = None,
    base_url: Optional[str] = None,
    model: Optional[str] = None,
    dimensions: Optional[int] = None,
    timeout: float = 60.0,
    max_retries: int = 3
)
Parameters
api_key: API key for the embedding service (defaults to the OPENAI_API_KEY env var)
base_url: Base URL for the API endpoint (defaults to the OPENAI_BASE_URL env var or OpenAI's API)
model: Model identifier (defaults to the EMBEDDING_MODEL env var or "text-embedding-3-small")
dimensions: Output dimension for embeddings (if supported by the model)
timeout: Request timeout in seconds
max_retries: Maximum number of retry attempts on failure
Examples
Using OpenAI
from mini.embedding import EmbeddingModel

# Uses OPENAI_API_KEY from environment
embedding_model = EmbeddingModel()

# Or specify explicitly
embedding_model = EmbeddingModel(
    api_key="sk-...",
    model="text-embedding-3-small"
)
Using Azure OpenAI
embedding_model = EmbeddingModel(
    api_key="your-azure-key",
    base_url="https://your-resource.openai.azure.com/openai/deployments/your-deployment",
    model="text-embedding-ada-002"
)
Using Local Model
# e.g., llama.cpp, vLLM, or another OpenAI-compatible server
embedding_model = EmbeddingModel(
    api_key="not-needed",
    base_url="http://localhost:8080/v1",
    model="local-embedding-model"
)
Methods
embed_chunks
Generate embeddings for multiple text chunks.
def embed_chunks(
    self,
    chunks: List[str]
) -> List[List[float]]
Parameters
chunks: List of text strings to embed
Returns
List of embedding vectors, one per input chunk
Example
from mini.embedding import EmbeddingModel

embedding_model = EmbeddingModel()

# Embed multiple chunks
chunks = [
    "This is the first chunk of text.",
    "This is the second chunk of text.",
    "This is the third chunk of text."
]
embeddings = embedding_model.embed_chunks(chunks)

print(f"Generated {len(embeddings)} embeddings")
print(f"Embedding dimension: {len(embeddings[0])}")
embed_query
Generate embedding for a single query text.
def embed_query(
    self,
    query: str
) -> List[float]
Parameters
query: Query text to embed
Returns
Embedding vector for the query
Example
from mini.embedding import EmbeddingModel

embedding_model = EmbeddingModel()

# Embed a query
query = "What is machine learning?"
embedding = embedding_model.embed_query(query)

print(f"Query embedding dimension: {len(embedding)}")
Supported Models
OpenAI Models
text-embedding-3-small
Dimensions: 1536 (default) or configurable
Max tokens: 8191
Cost: Low
Speed: Fast
Best for: Most applications, good balance

text-embedding-3-large
Dimensions: 3072 (default) or configurable
Max tokens: 8191
Cost: Higher
Speed: Moderate
Best for: Higher quality requirements

text-embedding-ada-002
Dimensions: 1536 (fixed)
Max tokens: 8191
Cost: Moderate
Speed: Fast
Best for: Legacy applications
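To confirm which model an EmbeddingModel instance is using and what dimension it returns, a quick check like the following works (the model names are the ones listed above; availability depends on your provider):

from mini.embedding import EmbeddingModel

# Compare output dimensions across the models listed above
for model_name in [
    "text-embedding-3-small",
    "text-embedding-3-large",
    "text-embedding-ada-002",
]:
    embedding_model = EmbeddingModel(model=model_name)
    vector = embedding_model.embed_query("dimension check")
    print(f"{model_name}: {len(vector)} dimensions")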
Custom Dimensions
# Use custom dimensions (if the model supports it)
embedding_model = EmbeddingModel(
    model="text-embedding-3-small",
    dimensions=512  # Smaller vectors for faster search
)
Complete Example
from mini.loader import DocumentLoader
from mini.chunker import Chunker
from mini.embedding import EmbeddingModel

# Initialize components
loader = DocumentLoader()
chunker = Chunker()
embedding_model = EmbeddingModel(
    model="text-embedding-3-small"
)

# Load and chunk document
text = loader.load("document.pdf")
chunks = chunker.chunk(text)
print(f"Processing {len(chunks)} chunks...")

# Generate embeddings
chunk_texts = [chunk.text for chunk in chunks]
embeddings = embedding_model.embed_chunks(chunk_texts)

print(f"Generated {len(embeddings)} embeddings")
print(f"Embedding dimension: {len(embeddings[0])}")

# Embed a query
query = "What is this document about?"
query_embedding = embedding_model.embed_query(query)

# Calculate similarity (cosine)
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Find most similar chunk
similarities = [
    cosine_similarity(query_embedding, emb)
    for emb in embeddings
]
best_idx = np.argmax(similarities)

print(f"\nMost similar chunk ({similarities[best_idx]:.3f}):")
print(chunks[best_idx].text[:200])
Error Handling
from mini.embedding import EmbeddingModel

embedding_model = EmbeddingModel()

try:
    embeddings = embedding_model.embed_chunks(chunks)
except Exception as e:
    print(f"Embedding failed: {e}")
    # Handle the error (retry, log, etc.)
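The client already retries failed requests up to max_retries times. If you want an extra application-level retry with backoff on top of that, a minimal sketch (the helper name and backoff values are illustrative, not part of the library) could look like this:

import time

def embed_with_backoff(embedding_model, chunks, attempts=3):
    # Illustrative helper: retry embed_chunks with exponential backoff
    for attempt in range(attempts):
        try:
            return embedding_model.embed_chunks(chunks)
        except Exception as e:
            if attempt == attempts - 1:
                raise
            wait = 2 ** attempt  # 1s, 2s, 4s, ...
            print(f"Embedding failed ({e}), retrying in {wait}s...")
            time.sleep(wait)

embeddings = embed_with_backoff(embedding_model, chunks)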
Best Practices
Process chunks in batches for efficiency:

batch_size = 100
all_embeddings = []
for i in range(0, len(chunks), batch_size):
    batch = chunks[i:i + batch_size]
    embeddings = embedding_model.embed_chunks(batch)
    all_embeddings.extend(embeddings)
Respect API rate limits:

import time

embeddings = []
for chunk in chunks:
    emb = embedding_model.embed_query(chunk)
    embeddings.append(emb)
    time.sleep(0.1)  # Avoid rate limits
Or use embed_chunks, which handles batching automatically; if you still hit limits on very large inputs, see the sketch below.
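One option for very large inputs (a sketch; the batch size and pause length are arbitrary) is to split the input and pause between embed_chunks calls:

import time

batch_size = 100
all_embeddings = []
for i in range(0, len(chunks), batch_size):
    batch = chunks[i:i + batch_size]
    all_embeddings.extend(embedding_model.embed_chunks(batch))
    time.sleep(1.0)  # Pause between batches to stay under rate limits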
Use environment variables for credentials:

# .env file
OPENAI_API_KEY=sk-...
EMBEDDING_MODEL=text-embedding-3-small

# Code
from dotenv import load_dotenv

load_dotenv()

# Automatically uses env vars
embedding_model = EmbeddingModel()
Ensure dimensions match your vector store:

# Embedding model
embedding_model = EmbeddingModel(
    model="text-embedding-3-small",
    dimensions=1536
)

# Vector store (must match!)
vector_store = VectorStore(
    ...,
    dimension=1536
)
Choose the model that fits your needs:

text-embedding-3-small: Best balance for most use cases
text-embedding-3-large: Higher quality, slower, more expensive
Local models: Best for privacy, requires your own infrastructure
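One way to keep the choice configurable is to rely on the EMBEDDING_MODEL environment variable described in the constructor parameters (a sketch; the value shown is just an example):

import os
from mini.embedding import EmbeddingModel

# Switch models without code changes by setting EMBEDDING_MODEL in the environment
os.environ.setdefault("EMBEDDING_MODEL", "text-embedding-3-small")

embedding_model = EmbeddingModel()  # Picks up EMBEDDING_MODEL automatically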
Larger batches are more efficient:

# Better
embeddings = embedding_model.embed_chunks(chunks)  # All at once

# Slower
embeddings = [
    embedding_model.embed_query(chunk)
    for chunk in chunks
]
Store embeddings to avoid regenerating:

import pickle

# Save
with open('embeddings.pkl', 'wb') as f:
    pickle.dump(embeddings, f)

# Load
with open('embeddings.pkl', 'rb') as f:
    embeddings = pickle.load(f)
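If the source documents can change, keying the cache on the chunk contents avoids reusing stale vectors. A minimal sketch (the file naming and hashing scheme are arbitrary, not part of the library):

import hashlib
import os
import pickle

def cache_path(chunks):
    # Derive a cache file name from the chunk contents (illustrative scheme)
    digest = hashlib.sha256("\n".join(chunks).encode("utf-8")).hexdigest()[:16]
    return f"embeddings-{digest}.pkl"

path = cache_path(chunks)
if os.path.exists(path):
    with open(path, "rb") as f:
        embeddings = pickle.load(f)
else:
    embeddings = embedding_model.embed_chunks(chunks)
    with open(path, "wb") as f:
        pickle.dump(embeddings, f)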
Troubleshooting
Ensure your API key is set:

import os

print(os.getenv("OPENAI_API_KEY"))  # Should not be None
Check model dimensions:

embedding = embedding_model.embed_query("test")
print(f"Dimension: {len(embedding)}")
Handle rate limits gracefully:

from time import sleep

try:
    embeddings = embedding_model.embed_chunks(chunks)
except Exception as e:
    if "rate_limit" in str(e).lower():
        print("Rate limited, waiting...")
        sleep(60)
        embeddings = embedding_model.embed_chunks(chunks)
    else:
        raise
See Also