Overview
Mini RAG includes built-in observability through Langfuse, allowing you to:
- Track operations: Monitor indexing, queries, and all pipeline steps
- Measure performance: Analyze latency, token usage, and costs
- Debug issues: View detailed traces with inputs/outputs
- Optimize quality: Understand what’s working and what’s not
Quick Start
Get Langfuse Account
Sign up for free at cloud.langfuse.com
Configuration
Using Environment Variables
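The Langfuse Python SDK picks up its credentials from environment variables. A typical shell setup looks like this (the key values are placeholders):

```shell
# Langfuse credentials -- copy these from your project settings page
export LANGFUSE_PUBLIC_KEY="pk-lf-..."
export LANGFUSE_SECRET_KEY="sk-lf-..."
# Optional: only needed for self-hosted or non-default instances
export LANGFUSE_HOST="https://cloud.langfuse.com"
```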
Explicit Configuration
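Mini RAG's exact constructor signature isn't shown here, so the settings object below is a hypothetical sketch of what explicit configuration might look like, not the library's documented API:

```python
from dataclasses import dataclass

# Hypothetical settings object -- Mini RAG's real configuration API may differ.
@dataclass
class ObservabilitySettings:
    enabled: bool = True
    public_key: str = ""
    secret_key: str = ""
    host: str = "https://cloud.langfuse.com"

settings = ObservabilitySettings(
    enabled=True,
    public_key="pk-lf-...",   # placeholder value
    secret_key="sk-lf-...",   # placeholder value
)
print(settings.enabled)  # → True
```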
Disabling Observability
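The exact switch depends on Mini RAG's configuration surface. As an assumed convention (the toggle variable name below is hypothetical), an environment flag or simply unsetting the Langfuse credentials disables tracing:

```shell
# Hypothetical toggle -- verify the exact variable name Mini RAG reads
export MINI_RAG_OBSERVABILITY=false
# Removing the credentials also disables Langfuse tracing
unset LANGFUSE_PUBLIC_KEY LANGFUSE_SECRET_KEY
```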
What Gets Tracked
When observability is enabled, Mini RAG automatically traces:
Query Operations
- Query input
- Query rewriting
- Generated variations
- Retrieval results
- Re-ranking scores
- Final answer
- Response metadata
Indexing Operations
- Document loading
- Chunking process
- Embedding generation
- Vector storage
- Chunk counts
- Processing time
Performance Metrics
- Latency per step
- Total query time
- Token usage
- API calls
- Costs
LLM Interactions
- Model used
- Prompts sent
- Responses received
- Token counts
- Temperature settings
Langfuse Dashboard
Once enabled, view traces in the Langfuse dashboard.
Traces View
See all operations in a timeline.
Metrics View
Track aggregate metrics:
- Query count: Number of queries per day/week/month
- Average latency: Mean response time
- Cost tracking: Total API costs
- Token usage: Tokens consumed
- Error rates: Failed operations
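As a back-of-the-envelope illustration of cost tracking, a per-call cost can be derived from token counts and per-1K-token prices (the prices below are placeholders, not real rates):

```python
# Estimate cost from token usage -- prices here are made-up placeholders.
PRICE_PER_1K = {"input": 0.0005, "output": 0.0015}  # USD per 1K tokens, hypothetical

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return an estimated USD cost for one LLM call."""
    return (
        input_tokens / 1000 * PRICE_PER_1K["input"]
        + output_tokens / 1000 * PRICE_PER_1K["output"]
    )

print(round(estimate_cost(2000, 500), 6))  # → 0.00175
```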
Sessions View
Group related queries into sessions.
Use Cases
Debugging Poor Answers
Performance Optimization
Cost Tracking
Quality Monitoring
Best Practices
Enable in Development
Always enable observability during development.
Sample in Production
Sample traces in production to reduce overhead.
Note: Langfuse also supports sampling at the platform level.
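Client-side sampling can be a simple random draw per request; this sketch is generic Python, and the `sample_rate` knob is illustrative rather than a documented Mini RAG option:

```python
import random

def should_trace(sample_rate: float) -> bool:
    """Trace roughly `sample_rate` fraction of requests (0.0 to 1.0)."""
    return random.random() < sample_rate

# Trace ~10% of production queries
if should_trace(0.10):
    pass  # attach the Langfuse trace for this request here
```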
Add User Feedback
Collect user feedback in Langfuse. After getting a response, you can attach feedback through the Langfuse SDK.
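In Langfuse's data model, user feedback is recorded as a score attached to a trace. The helper below only assembles such a payload; the assumption is that its fields (`trace_id`, `name`, `value`, `comment`) are then passed to the Langfuse SDK's score call:

```python
def build_feedback_score(trace_id: str, value: int, comment: str = "") -> dict:
    """Assemble a user-feedback score payload for a given trace.

    `value` follows a simple thumbs convention: 1 = helpful, 0 = not helpful.
    """
    payload = {"trace_id": trace_id, "name": "user-feedback", "value": value}
    if comment:
        payload["comment"] = comment
    return payload

score = build_feedback_score("trace-123", 1, "Answer was accurate")
print(score["value"])  # → 1
```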
Use Tags and Metadata
Performance Impact
Observability has minimal performance impact:
| Operation | Overhead | Impact |
|---|---|---|
| Query trace | ~5-10ms | Negligible |
| Indexing trace | ~10-20ms | Negligible |
| Network calls | Async | Non-blocking |
| Data collection | Minimal | < 1% CPU |
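You can verify the overhead figures in your own environment with a minimal timing wrapper; this is generic measurement code, not part of Mini RAG:

```python
import time

def timed(fn):
    """Wrap a callable and record its wall-clock duration, like a trace span would."""
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        wrapper.last_ms = (time.perf_counter() - start) * 1000
        return result
    wrapper.last_ms = 0.0
    return wrapper

@timed
def fake_query() -> str:
    return "answer"  # stand-in for a real pipeline call

fake_query()
print(f"overhead sample: {fake_query.last_ms:.3f} ms")
```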
Privacy & Security
Data Sent to Langfuse
What’s sent:
- Query text
- Retrieved chunks
- LLM responses
- Metadata and scores
What’s not sent:
- API keys (stored encrypted)
- Vector embeddings
- Raw documents
Self-Hosting
Disabling for Sensitive Data
Disable for sensitive content:
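One pattern is a per-request guard that skips tracing when a query looks sensitive; the regex-based `contains_pii` check below is a deliberately crude placeholder for a real PII detector:

```python
import re

# Illustrative PII check -- a real deployment would use a proper detector.
PII_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # e.g. US SSN-like strings

def contains_pii(text: str) -> bool:
    return bool(PII_PATTERN.search(text))

def trace_enabled_for(query: str) -> bool:
    """Skip tracing when the query looks sensitive."""
    return not contains_pii(query)

print(trace_enabled_for("What is RAG?"))           # → True
print(trace_enabled_for("My SSN is 123-45-6789"))  # → False
```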
Troubleshooting
Traces not appearing
Solution: Check your API keys and ensure they’re valid with the correct permissions.
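A quick sanity check (plain Python, no Langfuse required) reports whether the credential variables are set without printing their values:

```python
import os

def key_status(var: str) -> str:
    """Report whether a credential variable is set, without leaking its value."""
    return "set" if os.environ.get(var) else "MISSING"

for var in ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY"):
    print(f"{var}: {key_status(var)}")
```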
Connection errors
Solution: Verify the Langfuse host setting, then check network connectivity and firewall rules.
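To see which endpoint the SDK will use, and optionally probe it (the health path is an assumption based on Langfuse's public API and may differ for your version):

```shell
# Show the host the SDK will use (falls back to Langfuse Cloud when unset)
echo "${LANGFUSE_HOST:-https://cloud.langfuse.com}"
# Optional reachability probe -- health endpoint assumed, not guaranteed
curl -s "${LANGFUSE_HOST:-https://cloud.langfuse.com}/api/public/health"
```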
Missing trace data
Solution: Some operations may not be traced if they fail early. Check:
- Operation completed successfully
- Langfuse SDK is up to date
- No network interruptions
High latency
Solution: Langfuse calls are async, but if you experience issues:
- Check network latency to Langfuse
- Consider self-hosting closer to your infrastructure
- Use sampling to reduce trace volume
Advanced Features
Custom Spans
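A span is a named, timed step inside a trace. The context manager below is a home-grown stand-in that records the same shape of data; the real Langfuse SDK provides its own trace/span objects with richer fields:

```python
import time
from contextlib import contextmanager

# Home-grown span recorder to illustrate what a custom span captures;
# the actual Langfuse SDK manages spans for you.
spans: list = []

@contextmanager
def span(name: str):
    start = time.perf_counter()
    try:
        yield
    finally:
        spans.append({"name": name, "ms": (time.perf_counter() - start) * 1000})

with span("rerank"):
    time.sleep(0.01)  # stand-in for a pipeline step

print(spans[0]["name"])  # → rerank
```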
You can add custom spans using the Langfuse SDK; see the Langfuse span documentation for the API details.
Experiments
Compare different configurations side by side.
Datasets
Use Langfuse datasets to track evaluation metrics.
Next Steps
Langfuse Documentation
Learn more about Langfuse features
AgenticRAG
Complete RAG pipeline documentation
Production Guide
Deploy Mini RAG to production
Examples
See observability in action
