Overview

EmbeddingModel generates vector embeddings from text using OpenAI or any OpenAI-compatible API. It supports custom endpoints for Azure OpenAI, local models, and more.
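
A minimal usage sketch (assumes OPENAI_API_KEY is set in the environment):
from mini.embedding import EmbeddingModel

embedding_model = EmbeddingModel()
vector = embedding_model.embed_query("Hello, world!")
print(len(vector))  # 1536 for the default text-embedding-3-small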

Constructor

from mini.embedding import EmbeddingModel

embedding_model = EmbeddingModel(
    api_key: Optional[str] = None,
    base_url: Optional[str] = None,
    model: Optional[str] = None,
    dimensions: Optional[int] = None,
    timeout: float = 60.0,
    max_retries: int = 3
)

Parameters

  • api_key (str, optional): API key for the embedding service (defaults to the OPENAI_API_KEY env var)
  • base_url (str, optional): Base URL for the API endpoint (defaults to the OPENAI_BASE_URL env var or OpenAI's API)
  • model (str, optional): Model identifier (defaults to the EMBEDDING_MODEL env var or "text-embedding-3-small")
  • dimensions (int, optional): Output dimension for embeddings (if supported by the model)
  • timeout (float, default 60.0): Request timeout in seconds
  • max_retries (int, default 3): Maximum number of retry attempts on failure

Examples

Using OpenAI

from mini.embedding import EmbeddingModel

# Uses OPENAI_API_KEY from environment
embedding_model = EmbeddingModel()

# Or specify explicitly
embedding_model = EmbeddingModel(
    api_key="sk-...",
    model="text-embedding-3-small"
)

Using Azure OpenAI

embedding_model = EmbeddingModel(
    api_key="your-azure-key",
    base_url="https://your-resource.openai.azure.com/openai/deployments/your-deployment",
    model="text-embedding-ada-002"
)

Using Local Model

# e.g., llama.cpp, vLLM, or other OpenAI-compatible server
embedding_model = EmbeddingModel(
    api_key="not-needed",
    base_url="http://localhost:8080/v1",
    model="local-embedding-model"
)

Methods

embed_chunks

Generate embeddings for multiple text chunks.
def embed_chunks(
    self,
    chunks: List[str]
) -> List[List[float]]

Parameters

  • chunks (List[str], required): List of text strings to embed

Returns

  • embeddings (List[List[float]]): List of embedding vectors, one per input chunk

Example

from mini.embedding import EmbeddingModel

embedding_model = EmbeddingModel()

# Embed multiple chunks
chunks = [
    "This is the first chunk of text.",
    "This is the second chunk of text.",
    "This is the third chunk of text."
]

embeddings = embedding_model.embed_chunks(chunks)

print(f"Generated {len(embeddings)} embeddings")
print(f"Embedding dimension: {len(embeddings[0])}")

embed_query

Generate embedding for a single query text.
def embed_query(
    self,
    query: str
) -> List[float]

Parameters

  • query (str, required): Query text to embed

Returns

  • embedding (List[float]): Embedding vector for the query

Example

from mini.embedding import EmbeddingModel

embedding_model = EmbeddingModel()

# Embed a query
query = "What is machine learning?"
embedding = embedding_model.embed_query(query)

print(f"Query embedding dimension: {len(embedding)}")

Supported Models

OpenAI Models

text-embedding-3-small
  • Dimensions: 1536 (default) or configurable
  • Max tokens: 8191
  • Cost: Low
  • Speed: Fast
  • Best for: Most applications, good balance

text-embedding-3-large
  • Dimensions: 3072 (default) or configurable
  • Max tokens: 8191
  • Cost: Higher
  • Speed: Moderate
  • Best for: Higher quality requirements

text-embedding-ada-002
  • Dimensions: 1536 (fixed)
  • Max tokens: 8191
  • Cost: Moderate
  • Speed: Fast
  • Best for: Legacy applications
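
The dimension difference also matters downstream, since vector storage and search cost scale with dimensions. A rough back-of-the-envelope estimate for float32 vectors:
# Rough storage estimate for float32 vectors (4 bytes per value)
num_chunks = 10_000
small_mb = num_chunks * 1536 * 4 / 1e6   # text-embedding-3-small
large_mb = num_chunks * 3072 * 4 / 1e6   # text-embedding-3-large
print(f"3-small: {small_mb:.1f} MB, 3-large: {large_mb:.1f} MB")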

Custom Dimensions

# Use custom dimensions (if model supports it)
embedding_model = EmbeddingModel(
    model="text-embedding-3-small",
    dimensions=512  # Smaller for faster search
)

Complete Example

from mini.loader import DocumentLoader
from mini.chunker import Chunker
from mini.embedding import EmbeddingModel

# Initialize components
loader = DocumentLoader()
chunker = Chunker()
embedding_model = EmbeddingModel(
    model="text-embedding-3-small"
)

# Load and chunk document
text = loader.load("document.pdf")
chunks = chunker.chunk(text)

print(f"Processing {len(chunks)} chunks...")

# Generate embeddings
chunk_texts = [chunk.text for chunk in chunks]
embeddings = embedding_model.embed_chunks(chunk_texts)

print(f"Generated {len(embeddings)} embeddings")
print(f"Embedding dimension: {len(embeddings[0])}")

# Embed a query
query = "What is this document about?"
query_embedding = embedding_model.embed_query(query)

# Calculate similarity (cosine)
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Find most similar chunk
similarities = [
    cosine_similarity(query_embedding, emb)
    for emb in embeddings
]

best_idx = np.argmax(similarities)
print(f"\nMost similar chunk ({similarities[best_idx]:.3f}):")
print(chunks[best_idx].text[:200])
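
For larger collections, the per-chunk similarity loop above can be replaced with a single vectorized computation. A sketch using NumPy and the same variables:
import numpy as np

emb_matrix = np.array(embeddings)        # shape: (num_chunks, dim)
query_vec = np.array(query_embedding)

# Normalize once, then one matrix-vector product yields every cosine similarity
emb_matrix /= np.linalg.norm(emb_matrix, axis=1, keepdims=True)
query_vec /= np.linalg.norm(query_vec)
similarities = emb_matrix @ query_vec

best_idx = int(np.argmax(similarities))
print(f"Most similar chunk ({similarities[best_idx]:.3f}):")
print(chunks[best_idx].text[:200])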

Error Handling

from mini.embedding import EmbeddingModel

embedding_model = EmbeddingModel()

try:
    embeddings = embedding_model.embed_chunks(chunks)
except Exception as e:
    print(f"Embedding failed: {e}")
    # Handle error (retry, log, etc.)
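
The constructor's max_retries already covers transient failures; if you need more control, a simple exponential-backoff wrapper is one option (a sketch, not part of the library):
import time

def embed_with_backoff(model, texts, attempts=5):
    """Retry embed_chunks with exponentially increasing delays."""
    for attempt in range(attempts):
        try:
            return model.embed_chunks(texts)
        except Exception as e:
            if attempt == attempts - 1:
                raise
            wait = 2 ** attempt
            print(f"Embedding failed ({e}); retrying in {wait}s...")
            time.sleep(wait)

embeddings = embed_with_backoff(embedding_model, chunks)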

Best Practices

Process chunks in batches for efficiency:
batch_size = 100
all_embeddings = []

for i in range(0, len(chunks), batch_size):
    batch = chunks[i:i + batch_size]
    embeddings = embedding_model.embed_chunks(batch)
    all_embeddings.extend(embeddings)
Respect API rate limits:
import time

embeddings = []
for chunk in chunks:
    emb = embedding_model.embed_query(chunk)
    embeddings.append(emb)
    time.sleep(0.1)  # Avoid rate limits
Or use embed_chunks, which handles batching automatically.
Use environment variables for credentials:
# .env file
OPENAI_API_KEY=sk-...
EMBEDDING_MODEL=text-embedding-3-small

# Python code
from dotenv import load_dotenv
load_dotenv()

# Automatically uses env vars
embedding_model = EmbeddingModel()
Ensure dimensions match your vector store:
# Embedding model
embedding_model = EmbeddingModel(
    model="text-embedding-3-small",
    dimensions=1536
)

# Vector store (must match!)
vector_store = VectorStore(
    ...,
    dimension=1536
)

Performance Tips

Choose a model that matches your needs:
  • text-embedding-3-small: Best balance for most use cases
  • text-embedding-3-large: Higher quality, slower, more expensive
  • Local models: Best for privacy, require infrastructure
Larger batches are more efficient:
# Better
embeddings = embedding_model.embed_chunks(chunks)  # All at once

# Slower
embeddings = [
    embedding_model.embed_query(chunk)
    for chunk in chunks
]
Store embeddings to avoid regenerating:
import pickle

# Save
with open('embeddings.pkl', 'wb') as f:
    pickle.dump(embeddings, f)

# Load
with open('embeddings.pkl', 'rb') as f:
    embeddings = pickle.load(f)
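
As an alternative to pickle, embeddings can be stored as a NumPy array, which is compact and fast to load (a sketch; the file name is illustrative):
import numpy as np

# Save all vectors as a single float32 array
np.save("embeddings.npy", np.array(embeddings, dtype=np.float32))

# Load
embeddings = np.load("embeddings.npy")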

Troubleshooting

Ensure your API key is set:
import os
print(os.getenv("OPENAI_API_KEY"))  # Should not be None
Check model dimensions:
embedding = embedding_model.embed_query("test")
print(f"Dimension: {len(embedding)}")
Handle rate limits gracefully:
from time import sleep

try:
    embeddings = embedding_model.embed_chunks(chunks)
except Exception as e:
    if "rate_limit" in str(e).lower():
        print("Rate limited, waiting...")
        sleep(60)
        embeddings = embedding_model.embed_chunks(chunks)

See Also