
Overview

Mini RAG includes built-in observability through Langfuse, allowing you to:
  • Track operations: Monitor indexing, queries, and all pipeline steps
  • Measure performance: Analyze latency, token usage, and costs
  • Debug issues: View detailed traces with inputs/outputs
  • Optimize quality: Understand what’s working and what’s not

Quick Start

Step 1: Get Langfuse Account

Sign up for free at cloud.langfuse.com
Step 2: Get API Keys

Create a new project and copy your API keys
Step 3: Configure Environment

Add keys to your .env file:
.env
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_HOST=https://cloud.langfuse.com
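
If your application loads the .env file with python-dotenv (an assumption; exporting the variables in your shell works just as well), call load_dotenv() before constructing the pipeline:

from dotenv import load_dotenv

# Load the LANGFUSE_* variables from .env into the process environment
load_dotenv()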
Step 4: Enable in Code

from mini import AgenticRAG, ObservabilityConfig

rag = AgenticRAG(
    vector_store=vector_store,
    embedding_model=embedding_model,
    observability_config=ObservabilityConfig(enabled=True)
)
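
With observability enabled, any subsequent call is traced automatically; for example:

response = rag.query("What is the budget?")
# The trace for this query appears in your Langfuse project shortly afterwards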

Configuration

Using Environment Variables

from mini import ObservabilityConfig

# Reads LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, and LANGFUSE_HOST from the environment
observability_config = ObservabilityConfig(
    enabled=True
)

rag = AgenticRAG(
    vector_store=vector_store,
    embedding_model=embedding_model,
    observability_config=observability_config
)

Explicit Configuration

import os
from mini import ObservabilityConfig

observability_config = ObservabilityConfig(
    enabled=True,
    public_key=os.getenv("LANGFUSE_PUBLIC_KEY"),
    secret_key=os.getenv("LANGFUSE_SECRET_KEY"),
    host="https://cloud.langfuse.com"
)

rag = AgenticRAG(
    vector_store=vector_store,
    embedding_model=embedding_model,
    observability_config=observability_config
)

Disabling Observability

# Default: disabled
rag = AgenticRAG(
    vector_store=vector_store,
    embedding_model=embedding_model
    # observability_config not specified = disabled
)

# Explicit disable
observability_config = ObservabilityConfig(enabled=False)

What Gets Tracked

When observability is enabled, Mini RAG automatically traces:

Query Operations

  • Query input
  • Query rewriting
  • Generated variations
  • Retrieval results
  • Re-ranking scores
  • Final answer
  • Response metadata

Indexing Operations

  • Document loading
  • Chunking process
  • Embedding generation
  • Vector storage
  • Chunk counts
  • Processing time

Performance Metrics

  • Latency per step
  • Total query time
  • Token usage
  • API calls
  • Costs

LLM Interactions

  • Model used
  • Prompts sent
  • Responses received
  • Token counts
  • Temperature settings

Langfuse Dashboard

Once enabled, view traces in the Langfuse dashboard:

Traces View

See all operations in a timeline:
Query Trace
├── Query Rewriting (50ms, $0.0001)
│   ├── Input: "What is the budget?"
│   └── Output: ["How much funding...", "Budget allocation..."]
├── Embedding (30ms, $0.0001)
├── Vector Search (20ms, free)
├── Re-ranking (100ms, $0.0003)
└── Answer Generation (500ms, $0.002)
    ├── Context: [3 chunks]
    └── Answer: "The budget is..."

Total: 700ms, $0.0025

Metrics View

Track aggregate metrics:
  • Query count: Number of queries per day/week/month
  • Average latency: Mean response time
  • Cost tracking: Total API costs
  • Token usage: Tokens consumed
  • Error rates: Failed operations
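
If you need these numbers outside the dashboard, the underlying traces can also be pulled programmatically. A minimal sketch, assuming the standard Langfuse public REST endpoint /api/public/traces with Basic auth (public key as username, secret key as password); field names on the returned objects may vary by Langfuse version:

import os
import requests

# Fetch the 20 most recent traces from the Langfuse public API
host = os.getenv("LANGFUSE_HOST", "https://cloud.langfuse.com")
resp = requests.get(
    f"{host}/api/public/traces",
    auth=(os.getenv("LANGFUSE_PUBLIC_KEY"), os.getenv("LANGFUSE_SECRET_KEY")),
    params={"limit": 20},
)
resp.raise_for_status()
for trace in resp.json()["data"]:
    print(trace.get("name"), trace.get("latency"), trace.get("totalCost"))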

Sessions View

Group related queries:
# Queries are automatically grouped by session
response1 = rag.query("What is the budget?")
response2 = rag.query("How does it compare to last year?")
response3 = rag.query("What are the major expenses?")

# View as a session in Langfuse dashboard

Use Cases

Debugging Poor Answers

observability_config = ObservabilityConfig(enabled=True)
rag = AgenticRAG(
    vector_store=vector_store,
    embedding_model=embedding_model,
    observability_config=observability_config
)

response = rag.query("What is the budget?")

# Check Langfuse dashboard:
# - Were good chunks retrieved?
# - Did reranking help or hurt?
# - Was the LLM prompt appropriate?
# - Did query rewriting generate good variations?

Performance Optimization

# Enable observability to measure
observability_config = ObservabilityConfig(enabled=True)

# Test different configurations
configs = [
    RetrievalConfig(top_k=5),
    RetrievalConfig(top_k=10),
    RetrievalConfig(top_k=15)
]

for config in configs:
    rag = AgenticRAG(
        vector_store=vector_store,
        embedding_model=embedding_model,
        retrieval_config=config,
        observability_config=observability_config
    )
    
    response = rag.query("Test query")
    
# Compare latency and quality in Langfuse

Cost Tracking

# Track costs across different setups
observability_config = ObservabilityConfig(enabled=True)

# Monitor costs in Langfuse:
# - LLM API calls
# - Embedding API calls
# - Re-ranking API calls (if using Cohere)
# - Total cost per query
# - Cost trends over time

Quality Monitoring

# Track retrieval quality
observability_config = ObservabilityConfig(enabled=True)

rag = AgenticRAG(
    vector_store=vector_store,
    embedding_model=embedding_model,
    observability_config=observability_config
)

# In Langfuse, monitor:
# - Retrieval scores
# - Reranking improvements
# - Context relevance
# - Answer quality (with user feedback)

Best Practices

Always use observability during development:
import os

# Enable based on environment
is_dev = os.getenv("ENVIRONMENT") == "development"

observability_config = ObservabilityConfig(
    enabled=is_dev
)
Sample traces in production to reduce overhead:
import random

# Sample 10% of queries
sample_rate = 0.1

observability_config = ObservabilityConfig(
    enabled=random.random() < sample_rate
)
Note: Langfuse also supports sampling at the platform level.
Collect user feedback in Langfuse: After getting responses, you can add feedback through the Langfuse SDK:
from langfuse import Langfuse

langfuse = Langfuse()

# Add user feedback
langfuse.score(
    name="user_feedback",
    value=1,  # 1 for positive, 0 for negative
    trace_id=trace_id  # ID of the trace to score (e.g. copied from the Langfuse UI)
)
Organize traces with tags: Langfuse automatically captures metadata from Mini RAG operations. You can add custom tags through the Langfuse SDK.

Performance Impact

Observability has minimal performance impact:
Operation          Overhead    Impact
Query trace        ~5-10ms     Negligible
Indexing trace     ~10-20ms    Negligible
Network calls      Async       Non-blocking
Data collection    Minimal     < 1% CPU
Recommendation: Enable in all environments, use sampling in high-traffic production.

Privacy & Security

What’s sent:
  • Query text
  • Retrieved chunks
  • LLM responses
  • Metadata and scores
What’s NOT sent:
  • API keys (stored encrypted)
  • Vector embeddings
  • Raw documents
Langfuse can be self-hosted:
observability_config = ObservabilityConfig(
    enabled=True,
    host="https://your-langfuse-instance.com"
)
See Langfuse self-hosting docs for setup.
Disable for sensitive content:
# Conditional observability
has_sensitive_data = check_for_pii(query)

observability_config = ObservabilityConfig(
    enabled=not has_sensitive_data
)
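
check_for_pii is not part of Mini RAG; it stands in for whatever detection logic fits your data. A naive, purely illustrative sketch using regular expressions (a real deployment should use a dedicated PII detection library):

import re

# Hypothetical helper: flags obvious emails, US SSN formats, and phone-like numbers
PII_PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),   # email addresses
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),      # US SSN format
    re.compile(r"\+?\d[\d\s().-]{7,}\d"),      # phone-number-like strings
]

def check_for_pii(text: str) -> bool:
    return any(pattern.search(text) for pattern in PII_PATTERNS)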

Troubleshooting

Traces not appearing in Langfuse?
Solution: Check your API keys:
echo $LANGFUSE_PUBLIC_KEY
echo $LANGFUSE_SECRET_KEY
Ensure they’re valid and have correct permissions.

Connection errors when sending traces?
Solution: Verify the Langfuse host:
observability_config = ObservabilityConfig(
    enabled=True,
    host="https://cloud.langfuse.com"  # Correct URL
)
Check network connectivity and firewall rules.

Missing traces for some operations?
Solution: Operations may not be traced if they fail early. Check:
  1. Operation completed successfully
  2. Langfuse SDK is up to date
  3. No network interruptions

Queries feel slower with observability enabled?
Solution: Langfuse calls are asynchronous, but if you experience issues:
  1. Check network latency to Langfuse
  2. Consider self-hosting closer to your infrastructure
  3. Use sampling to reduce trace volume

Advanced Features

Custom Spans

You can add custom spans using the Langfuse SDK:
from langfuse import Langfuse

langfuse = Langfuse()

# Create a custom span (Langfuse Python SDK v3 style; with the v2 SDK,
# create a trace first and call trace.span(), or use the @observe decorator)
with langfuse.start_as_current_span(name="custom_processing") as span:
    # Your custom processing
    result = process_data(data)
    span.update(output=result)

Experiments

Compare different configurations:
# Tag traces with experiment name
# (Use Langfuse SDK for custom tagging)

# Configuration A
rag_a = AgenticRAG(
    vector_store=vector_store,
    embedding_model=embedding_model,
    retrieval_config=RetrievalConfig(top_k=5),
    observability_config=ObservabilityConfig(enabled=True)
)

# Configuration B
rag_b = AgenticRAG(
    vector_store=vector_store,
    embedding_model=embedding_model,
    retrieval_config=RetrievalConfig(top_k=10),
    observability_config=ObservabilityConfig(enabled=True)
)

# Compare in Langfuse dashboard

Datasets

Use Langfuse datasets to track evaluation metrics:
# Create test dataset in Langfuse
# Run queries and compare results
# Track metrics over time
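
A minimal sketch of wiring Mini RAG into a Langfuse dataset, assuming the v2-style dataset helpers (create_dataset, create_dataset_item, get_dataset) and an already-configured rag instance with observability enabled:

from langfuse import Langfuse

langfuse = Langfuse()

# One-time setup: create a small evaluation dataset with expected answers
langfuse.create_dataset(name="budget-questions")
langfuse.create_dataset_item(
    dataset_name="budget-questions",
    input="What is the budget?",
    expected_output="The budget is ...",
)

# Run every dataset item through the pipeline; each query is traced as usual,
# so results can be compared against expected_output in the Langfuse UI
dataset = langfuse.get_dataset("budget-questions")
for item in dataset.items:
    response = rag.query(item.input)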

Next Steps