Tracing Core Concepts
If you want to jump straight to logging traces, you can head to the Log traces or Log agents guides.
Tracing is the foundation of observability in Opik. It allows you to monitor, debug, and optimize your LLM applications by capturing detailed information about their execution. Understanding these core concepts is essential for effectively using Opik’s tracing capabilities.
Overview
When working with LLM applications, understanding what’s happening under the hood is crucial for debugging issues, optimizing performance, and ensuring reliability. Opik’s tracing system provides comprehensive observability by capturing detailed execution information at multiple levels.
In order to effectively use Opik’s tracing capabilities, it’s important to understand these key concepts:
- Trace: A complete execution path representing a single interaction with an LLM or agent
- Span: Individual operations or steps within a trace that represent specific actions or computations
- Thread: A collection of related traces that form a coherent conversation or workflow
- Metric: Quantitative measurements that provide objective assessments of your AI models’ performance
- Optimization: The systematic process of refining and evaluating LLM prompts and configurations
- Evaluation: A framework for systematically testing your prompts and models against datasets
Traces
A trace represents a complete execution path for a single interaction with an LLM or agent. Think of it as a detailed record of everything that happened during one request-response cycle. Each trace captures the full context of the interaction, including inputs, outputs, timing, and any intermediate steps.
Key Characteristics of Traces:
- Unique Identity: Each trace has a unique identifier that allows you to track and reference it
- Complete Context: Contains all the information needed to understand what happened during the interaction
- Timing Information: Records when the interaction started, ended, and how long each part took
- Input/Output Data: Captures the exact prompts sent to the LLM and the responses received
- Metadata: Includes additional context like model used, temperature settings, and custom tags
Example Use Cases:
- Debugging: When an LLM produces unexpected output, you can examine the trace to understand what went wrong
- Performance Analysis: Identify bottlenecks and slow operations by analyzing trace timing
- Cost Tracking: Monitor token usage and associated costs for each interaction
- Quality Assurance: Review traces to ensure your application is behaving as expected
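For instance, with the Python SDK a traced function logs one trace per call, including its input, output, and timing. The sketch below is illustrative only: the function name, question, and stubbed answer are placeholders, and it assumes the `@track` decorator from the opik package with an already-configured SDK.

```python
from opik import track


@track
def answer_question(question: str) -> str:
    # Each call to this function is logged as a trace capturing the input,
    # the returned output, and how long the call took.
    # Replace the hard-coded return value with a real LLM call in practice.
    return "Paris is the capital of France."


answer_question("What is the capital of France?")
```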
Spans
A span represents an individual operation or step within a trace. While a trace shows the complete picture, spans break down the execution into granular, measurable components. This hierarchical structure allows you to understand both the high-level flow and the detailed operations within your LLM application.
Key Characteristics of Spans:
- Hierarchical Structure: Spans can contain other spans, creating a tree-like structure within a trace
- Specific Operations: Each span represents a distinct action, such as a function call, API request, or data processing step
- Detailed Timing: Precise start and end times for each operation
- Context Preservation: Maintains the relationship between parent and child operations
- Custom Attributes: Can include additional metadata specific to the operation
Common Span Types:
- LLM Calls: Individual requests to language models
- Function Calls: Tool or function invocations within an agent
- Data Processing: Transformations or manipulations of data
- External API Calls: Requests to third-party services
- Custom Operations: Any user-defined operation you want to track
Example Span Hierarchy:
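The sketch below illustrates one possible hierarchy for a simple retrieval-and-answer pipeline. The function names and structure are hypothetical; the pattern assumes that nesting `@track`-decorated calls produces child spans under the trace created by the outermost call.

```python
from opik import track


@track
def retrieve_documents(query: str) -> list[str]:
    # Child span: data retrieval step (stubbed here).
    return ["Paris is the capital of France."]


@track
def generate_answer(query: str, documents: list[str]) -> str:
    # Child span: LLM call step (stubbed here).
    return "Paris"


@track
def rag_pipeline(query: str) -> str:
    # The outermost call creates the trace; the nested calls appear as child spans:
    #
    #   rag_pipeline (trace)
    #   ├── retrieve_documents (span)
    #   └── generate_answer (span)
    documents = retrieve_documents(query)
    return generate_answer(query, documents)


rag_pipeline("What is the capital of France?")
```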
Threads
A thread is a collection of related traces that form a coherent conversation or workflow. Threads are essential for understanding multi-turn interactions and maintaining context across multiple LLM calls. They provide a way to group related traces together, making it easier to analyze conversational patterns and user journeys.
Key Characteristics of Threads:
- Conversation Context: Maintains the flow of multi-turn interactions
- Trace Grouping: Organizes related traces under a single thread identifier
- Temporal Ordering: Traces within a thread are ordered chronologically
- Shared Context: Allows you to see how context evolves throughout a conversation
- Cross-Trace Analysis: Enables analysis of patterns across multiple related interactions
When to Use Threads:
- Chat Applications: Group all messages in a conversation
- Multi-Step Workflows: Track complex processes that span multiple LLM calls
- User Sessions: Organize all interactions from a single user session
- Agent Conversations: Follow the complete interaction between an agent and a user
Thread Management:
Threads are created by defining a thread_id and referencing it in your traces (see the sketch after this list). This allows you to:
- Maintain Context: Keep track of conversation history and user state
- Debug Conversations: Understand how a conversation evolved over time
- Analyze Patterns: Identify common conversation flows and user behaviors
- Optimize Performance: Find bottlenecks in multi-turn interactions
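A minimal sketch of grouping traces into a thread is shown below. The function name, messages, and conversation identifier are placeholders; it assumes the opik_context.update_current_trace helper accepts a thread_id argument, as in recent versions of the Python SDK.

```python
from opik import track, opik_context


@track
def chat_turn(user_message: str, conversation_id: str) -> str:
    # Attach this trace to a thread so every turn of the conversation is grouped together.
    opik_context.update_current_trace(thread_id=conversation_id)
    # Replace with a real LLM call in practice.
    return f"Echo: {user_message}"


# Two turns of the same conversation share the same thread_id.
chat_turn("Hello!", conversation_id="conversation-42")
chat_turn("What can you do?", conversation_id="conversation-42")
```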
Metrics
Metrics provide quantitative assessments of your AI models’ outputs, enabling objective comparisons and performance tracking over time. They are essential for understanding how well your LLM applications are performing and identifying areas for improvement.
Key Characteristics of Metrics:
- Quantitative Measurement: Provide numerical scores that can be compared and tracked
- Objective Assessment: Remove subjective bias from performance evaluation
- Trend Analysis: Enable tracking of performance changes over time
- Comparative Analysis: Allow comparison between different models, prompts, or configurations
- Automated Evaluation: Can be computed automatically without human intervention
Common Metric Types:
- Accuracy Metrics: Measure how often the model produces correct outputs
- Quality Metrics: Assess the quality of generated text (e.g., coherence, relevance)
- Efficiency Metrics: Track performance characteristics like latency and throughput
- Cost Metrics: Monitor token usage and associated costs
- Custom Metrics: Domain-specific measurements tailored to your use case
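As a small illustration, a heuristic metric can be scored directly in code. The sketch below assumes the Contains metric from opik.evaluation.metrics, scored with output and reference arguments and returning a result object with a numeric value; the example strings are placeholders.

```python
from opik.evaluation.metrics import Contains

# Heuristic metric: checks whether the reference string appears in the model output.
metric = Contains()
result = metric.score(
    output="The capital of France is Paris.",
    reference="Paris",
)
print(result.value)  # numeric score, e.g. 1.0 when the reference is found
```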
Optimization
Optimization is the systematic process of refining and evaluating LLM prompts and configurations to improve performance. It involves iteratively testing different approaches and using data-driven insights to make improvements.
Key Aspects of Optimization:
- Prompt Engineering: Refining the instructions given to LLMs
- Parameter Tuning: Adjusting model settings like temperature, top-p, and max tokens
- Few-shot Learning: Optimizing example selection for in-context learning
- Tool Integration: Improving how LLMs interact with external tools and functions
- Performance Monitoring: Tracking improvements and regressions over time
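To make the iterative loop concrete, the sketch below hand-rolls a comparison of two prompt variants using a heuristic metric. This is an illustration of the idea, not Opik's Agent Optimizer; the prompt templates, stubbed model call, and reference answer are hypothetical, and it assumes a LevenshteinRatio metric scored with output and reference arguments.

```python
from opik.evaluation.metrics import LevenshteinRatio

# Hypothetical prompt variants to compare.
prompt_variants = {
    "terse": "Answer in one word: {question}",
    "verbose": "Think step by step, then answer: {question}",
}
reference = "Paris"
metric = LevenshteinRatio()


def call_llm(prompt: str) -> str:
    # Stand-in for a real model call.
    return "Paris"


scores = {}
for name, template in prompt_variants.items():
    output = call_llm(template.format(question="What is the capital of France?"))
    scores[name] = metric.score(output=output, reference=reference).value

best = max(scores, key=scores.get)
print(f"Best variant: {best} (score {scores[best]:.2f})")
```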
Evaluation
Evaluation provides a framework for systematically testing your prompts and models against datasets using various metrics to measure performance. It’s the foundation for making data-driven decisions about your LLM applications.
Key Components of Evaluation:
- Datasets: Collections of test cases with inputs and expected outputs
- Experiments: Individual evaluation runs that test specific configurations
- Metrics: Quantitative measures of performance
- Comparative Analysis: Side-by-side comparison of different approaches
- Statistical Significance: Ensuring results are reliable and reproducible
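A minimal end-to-end sketch of these components is shown below. The dataset name, items, and task are placeholders; it assumes the evaluate helper and Equals metric from the Python SDK, with the task returning keys that match the metric's output and reference arguments.

```python
from opik import Opik
from opik.evaluation import evaluate
from opik.evaluation.metrics import Equals

client = Opik()

# Dataset: a collection of test cases with inputs and expected outputs.
dataset = client.get_or_create_dataset(name="geography-qa")
dataset.insert([
    {"question": "What is the capital of France?", "expected_output": "Paris"},
])


def task(item: dict) -> dict:
    # Replace the hard-coded answer with a real LLM call in practice.
    return {"output": "Paris", "reference": item["expected_output"]}


# Experiment: one evaluation run of this configuration, scored with the Equals metric.
evaluate(
    dataset=dataset,
    task=task,
    scoring_metrics=[Equals()],
    experiment_name="baseline-prompt",
)
```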
Learn More
Now that you understand the core concepts, explore these resources to dive deeper:
Tracing and Observability:
- Log traces - Learn how to capture traces in your applications
- Log agents - Understand how to trace agent-based applications
- Annotate traces - Add custom metadata to your traces
- Cost tracking - Monitor and analyze costs
Evaluation and Testing:
- Evaluation concepts - Deep dive into evaluation concepts
- Evaluate prompts - Test and compare different prompts
- Evaluate agents - Evaluate complex agent systems
- Metrics overview - Available evaluation metrics
Optimization:
- Agent Optimization concepts - Core optimization concepts
- Optimization algorithms - Available optimization strategies
- Best practices - Optimization best practices
Integration Guides:
- SDK Configuration - Configure Opik in your applications
- Supported Models - Models compatible with Opik
- Integrations - Framework-specific integration guides
Best Practices for Tracing
1. Start with Clear Trace Boundaries
Define clear boundaries for what constitutes a single trace. Typically, this should align with a complete user interaction or business operation.
2. Use Meaningful Span Names
Choose descriptive names for your spans that clearly indicate what operation is being performed. This makes debugging much easier.
3. Leverage Thread IDs for Conversations
Use consistent thread IDs for related interactions. This is especially important for chat applications and multi-step workflows.
4. Add Relevant Metadata
Include custom attributes and metadata that will be useful for analysis. Consider adding user IDs, session information, and business context.
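A small sketch of attaching such context is shown below. The tag names, user ID, and version string are placeholders; it assumes opik_context.update_current_trace accepts tags and metadata arguments, as in recent versions of the Python SDK.

```python
from opik import track, opik_context


@track
def handle_request(user_id: str, question: str) -> str:
    # Attach business context to the current trace for later filtering and analysis.
    opik_context.update_current_trace(
        tags=["production", "billing-assistant"],
        metadata={"user_id": user_id, "app_version": "1.4.2"},
    )
    # Replace with a real LLM call in practice.
    return "stubbed response"
```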
Pro Tip: Start with basic tracing and gradually add more detailed spans as you identify areas that need deeper observability. Don’t try to trace everything at once - focus on the most critical paths first.
Important: Be mindful of sensitive data when tracing. Avoid logging personally identifiable information (PII) or sensitive business data in your traces. Use Opik’s data filtering capabilities to protect sensitive information.