## Overview
Mini RAG includes built-in observability through Langfuse, allowing you to:

- Track operations: Monitor indexing, queries, and all pipeline steps
- Measure performance: Analyze latency, token usage, and costs
- Debug issues: View detailed traces with inputs/outputs
- Optimize quality: Understand what's working and what's not
## Quick Start

1. Get Langfuse Account: Sign up for free at [cloud.langfuse.com](https://cloud.langfuse.com).
2. Get API Keys: Create a new project and copy your API keys.
3. Configure Environment: Add the keys to your `.env` file.
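Langfuse's standard variable names are shown below; the key values are placeholders, so copy the real ones from your project settings:

```shell
# Langfuse credentials (placeholders; copy yours from the project settings)
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
# Optional: defaults to Langfuse Cloud; change when self-hosting
LANGFUSE_HOST=https://cloud.langfuse.com
```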
4. Enable in Code: Turn observability on when initializing Mini RAG.
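Mini RAG's exact initialization API isn't reproduced on this page, so the snippet below is a hypothetical sketch: `MiniRAG` and the `enable_observability` flag are assumed names, with a stand-in class so the example runs on its own.

```python
import os

class MiniRAG:
    """Stand-in for the real class; check Mini RAG's API reference for the
    actual constructor and flag names."""
    def __init__(self, enable_observability: bool = False):
        # Observability only activates when Langfuse credentials are present.
        self.observability_enabled = (
            enable_observability and "LANGFUSE_PUBLIC_KEY" in os.environ
        )

rag = MiniRAG(enable_observability=True)
```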
## Configuration
### Using Environment Variables
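A sketch of the environment-driven approach; the function name is ours, but the environment variable names are Langfuse's standard ones:

```python
import os

def langfuse_config_from_env() -> dict:
    """Read Langfuse settings from the environment (standard variable names)."""
    return {
        "public_key": os.getenv("LANGFUSE_PUBLIC_KEY"),
        "secret_key": os.getenv("LANGFUSE_SECRET_KEY"),
        "host": os.getenv("LANGFUSE_HOST", "https://cloud.langfuse.com"),
        # Only enable tracing when both credentials are set.
        "enabled": bool(
            os.getenv("LANGFUSE_PUBLIC_KEY") and os.getenv("LANGFUSE_SECRET_KEY")
        ),
    }
```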
### Explicit Configuration
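A sketch of passing credentials explicitly instead of via the environment; `ObservabilityConfig` is an illustrative stand-in for however Mini RAG actually accepts these values, and the key strings are placeholders:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ObservabilityConfig:
    """Illustrative container for explicit Langfuse settings."""
    public_key: str
    secret_key: str
    host: str = "https://cloud.langfuse.com"

config = ObservabilityConfig(
    public_key="pk-lf-...",   # placeholder
    secret_key="sk-lf-...",   # placeholder
)
```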
### Disabling Observability
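One common pattern is an explicit opt-out flag; `MINI_RAG_OBSERVABILITY` below is an assumed name, not a documented setting:

```python
import os

def observability_enabled() -> bool:
    """Treat MINI_RAG_OBSERVABILITY=false (assumed flag name) as an opt-out;
    any other value, or no value, leaves tracing on."""
    flag = os.getenv("MINI_RAG_OBSERVABILITY", "true").lower()
    return flag not in {"0", "false", "no", "off"}
```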
## What Gets Tracked

When observability is enabled, Mini RAG automatically traces:

### Query Operations
- Query input
- Query rewriting
- Generated variations
- Retrieval results
- Re-ranking scores
- Final answer
- Response metadata
### Indexing Operations
- Document loading
- Chunking process
- Embedding generation
- Vector storage
- Chunk counts
- Processing time
### Performance Metrics
- Latency per step
- Total query time
- Token usage
- API calls
- Costs
### LLM Interactions
- Model used
- Prompts sent
- Responses received
- Token counts
- Temperature settings
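Taken together, one traced LLM call amounts to a structured record like the following; the field names here are illustrative, not Langfuse's exact schema:

```python
# Illustrative shape of one traced LLM call (field names are ours).
llm_generation = {
    "model": "gpt-4o-mini",                      # model used
    "input": "Summarize the retrieved chunks ...",  # prompt sent
    "output": "The documents describe ...",         # response received
    "usage": {"prompt_tokens": 812, "completion_tokens": 94},  # token counts
    "model_parameters": {"temperature": 0.2},       # temperature settings
}
total_tokens = sum(llm_generation["usage"].values())
```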
## Langfuse Dashboard

Once enabled, view traces in the Langfuse dashboard:

### Traces View

See all operations in a timeline.

### Metrics View

Track aggregate metrics:

- Query count: Number of queries per day/week/month
- Average latency: Mean response time
- Cost tracking: Total API costs
- Token usage: Tokens consumed
- Error rates: Failed operations
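Langfuse computes these aggregates in its dashboard, but for intuition, here is how they fall out of raw trace records (the data below is made up):

```python
# Made-up trace records to show how the dashboard aggregates are derived.
traces = [
    {"latency_ms": 420, "cost_usd": 0.0021, "error": False},
    {"latency_ms": 980, "cost_usd": 0.0054, "error": False},
    {"latency_ms": 130, "cost_usd": 0.0008, "error": True},
]
avg_latency = sum(t["latency_ms"] for t in traces) / len(traces)  # mean response time
total_cost = sum(t["cost_usd"] for t in traces)                   # total API costs
error_rate = sum(t["error"] for t in traces) / len(traces)        # failed operations
```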
### Sessions View

Group related queries into sessions to follow a full user conversation.

## Use Cases

Common uses for the trace data include:

- Debugging poor answers: inspect the retrieved chunks and prompts behind a bad response
- Performance optimization: find the slowest step in the pipeline
- Cost tracking: attribute token spend to specific operations
- Quality monitoring: watch error rates and latency over time
## Best Practices
### Enable in Development

Always use observability during development.
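A minimal sketch of gating tracing on the environment, assuming an `APP_ENV` variable (adapt the name to your setup):

```python
import os
from typing import Optional

def should_trace(env: Optional[str] = None) -> bool:
    """Trace everything outside production. `APP_ENV` is an assumed
    variable name, not a documented Mini RAG setting."""
    env = env or os.getenv("APP_ENV", "development")
    return env != "production"
```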
### Sample in Production

Sample traces in production to reduce overhead. Note: Langfuse also supports sampling at the platform level.
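Client-side sampling can be as simple as a random draw per request; a sketch:

```python
import random

def sample_trace(sample_rate: float = 0.1) -> bool:
    """Keep roughly `sample_rate` of requests for tracing (10% by default)."""
    return random.random() < sample_rate

random.seed(7)  # seeded only to make the example reproducible
kept = sum(sample_trace(0.1) for _ in range(10_000))
```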
### Add User Feedback

Collect user feedback in Langfuse. After getting responses, you can add feedback through the Langfuse SDK.
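A sketch using the `score()` call shape from the Langfuse v2 Python SDK, with a stand-in client so the snippet is self-contained:

```python
class _FakeLangfuse:
    """Stand-in for the real Langfuse client, which exposes a
    score(trace_id=..., name=..., value=..., comment=...) method (v2 SDK)."""
    def __init__(self):
        self.scores = []

    def score(self, **kwargs):
        self.scores.append(kwargs)

def record_feedback(client, trace_id: str, value: int, comment: str = ""):
    """Attach a user-feedback score (e.g. thumbs up = 1, down = 0) to a trace."""
    client.score(trace_id=trace_id, name="user-feedback", value=value, comment=comment)

client = _FakeLangfuse()
record_feedback(client, "trace-123", 1, "Helpful answer")
```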
### Use Tags and Metadata
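Tags and metadata let you filter and segment traces in the Langfuse UI. The keyword names below match the Langfuse v2 `trace()` call; the client is a stand-in so the snippet runs on its own:

```python
class _FakeLangfuse:
    """Stand-in for the real client; langfuse.trace(name=..., tags=...,
    metadata=...) is the v2 SDK call shape."""
    def trace(self, **kwargs):
        return kwargs  # the real SDK returns a trace object

trace = _FakeLangfuse().trace(
    name="rag-query",
    tags=["production", "pipeline-v2"],
    metadata={"user_tier": "pro", "index_version": "2024-06"},
)
```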
## Performance Impact

Observability has minimal performance impact:

| Operation | Overhead | Impact |
|---|---|---|
| Query trace | ~5-10ms | Negligible |
| Indexing trace | ~10-20ms | Negligible |
| Network calls | Async | Non-blocking |
| Data collection | Minimal | < 1% CPU |
## Privacy & Security
### Data Sent to Langfuse

What's sent:

- Query text
- Retrieved chunks
- LLM responses
- Metadata and scores

What's not sent:

- API keys (stored encrypted)
- Vector embeddings
- Raw documents
### Self-Hosting
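If you run your own Langfuse instance, only the host setting changes; the URL below is a placeholder:

```shell
# Point the SDK at a self-hosted Langfuse instead of Langfuse Cloud
LANGFUSE_HOST=https://langfuse.internal.example.com
```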
### Disabling for Sensitive Data

Disable observability for sensitive content.
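A hypothetical sketch of a per-query opt-out; `contains_pii` is a toy placeholder, not real PII detection:

```python
def contains_pii(text: str) -> bool:
    """Toy heuristic placeholder; use a real PII detector in practice."""
    return "ssn" in text.lower()

def trace_allowed(query: str, observability_enabled: bool = True) -> bool:
    """Skip tracing entirely when the query looks sensitive."""
    return observability_enabled and not contains_pii(query)
```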
## Troubleshooting
### Traces not appearing

Solution: Check your API keys and ensure they're valid and have the correct permissions.
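A quick way to confirm the credentials are actually loaded; the Langfuse Python SDK also exposes an `auth_check()` method to validate them against the server:

```python
import os

def missing_langfuse_keys(env=os.environ):
    """Return the names of required Langfuse credentials that are unset."""
    required = ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY")
    return [name for name in required if not env.get(name)]
```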
### Connection errors

Solution: Verify the Langfuse host setting, then check network connectivity and firewall rules.
### Missing trace data

Solution: Some operations may not be traced if they fail early. Check that:

- The operation completed successfully
- The Langfuse SDK is up to date
- There were no network interruptions
### High latency

Solution: Langfuse calls are async, but if you experience issues:
- Check network latency to Langfuse
- Consider self-hosting closer to your infrastructure
- Use sampling to reduce trace volume
