- What are Task Span Metrics?
- When to Use Task Span Metrics
- Creating Task Span Metrics
- Accessing Span Properties
- Basic Properties
- Performance Metrics
- Error Analysis
- Using Task Span Metrics in Evaluation
- Quickly testing task span metrics locally
- Best Practices
- 1. Handle Missing Data Gracefully
- 2. Focus on Execution Patterns
- 3. Combine with Regular Metrics
- 4. Security Considerations
- Complete Example: Agent Trajectory Analysis metric
- Integration with LLM Evaluation
- Related Documentation
Task Span Metrics
Task span metrics are a powerful type of evaluation metric in Opik that can analyze the detailed execution information of your LLM tasks. Unlike traditional metrics that only evaluate input-output pairs, task span metrics have access to the complete execution context, including intermediate steps, metadata, timing information, and hierarchical structure.
Important: Only spans created with the @track decorator or through native Opik integrations are available to task span metrics.
What are Task Span Metrics?
Task span metrics are evaluation metrics whose score method includes a task_span parameter. The Opik evaluation engine detects this parameter automatically.
When a metric has a task_span parameter, it receives a SpanModel object containing the complete execution context of your task.
The task_span parameter provides:
- Execution Details: Input, output, start/end times, and execution metadata
- Nested Operations: Hierarchical structure of sub-operations and function calls
- Performance Data: Timing, cost, usage statistics, and resource consumption
- Error Information: Detailed error context and diagnostic information
- Provider Metadata: Model information, API provider details, and configuration
When to Use Task Span Metrics
Task span metrics are particularly valuable for:
- Performance Analysis: Evaluating execution speed, resource usage, and efficiency
- Quality Assessment: Analyzing the quality of intermediate steps and decision-making
- Cost Optimization: Tracking and optimizing API costs and resource consumption
- Agent Evaluation: Assessing agent trajectories and decision-making patterns
- Debugging: Understanding execution flows and identifying performance bottlenecks
- Compliance: Ensuring tasks execute within expected parameters and constraints
Creating Task Span Metrics
To create a task span metric, define a class that inherits from BaseMetric and implements a score method that accepts a task_span parameter (you can still declare other parameters, as in regular metrics; Opik checks separately for the presence of the task_span argument):
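Below is a minimal sketch of such a metric. The base_metric.BaseMetric and score_result.ScoreResult classes come from the Opik SDK; the start_time and end_time attributes used on the span are assumptions based on the properties described on this page, so verify them against the SpanModel API reference.

```python
from typing import Any

from opik.evaluation.metrics import base_metric, score_result


class ExecutionTimeMetric(base_metric.BaseMetric):
    """Scores a task based on how quickly it executed."""

    def __init__(self, name: str = "execution_time", max_seconds: float = 5.0):
        self.name = name
        self.max_seconds = max_seconds

    def score(self, task_span: Any, **ignored_kwargs) -> score_result.ScoreResult:
        # task_span is a SpanModel; start_time/end_time are assumed attribute names.
        if task_span.start_time is None or task_span.end_time is None:
            return score_result.ScoreResult(
                value=0.0, name=self.name, reason="Timing information is missing"
            )

        duration = (task_span.end_time - task_span.start_time).total_seconds()
        # Score 1.0 when well under budget, decreasing linearly to 0.0 at the limit.
        value = max(0.0, 1.0 - duration / self.max_seconds)
        return score_result.ScoreResult(
            value=value,
            name=self.name,
            reason=f"Task completed in {duration:.2f}s",
        )
```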
Accessing Span Properties
The SpanModel object provides rich information about task execution:
Basic Properties
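As a rough illustration, a score method can read fields like the ones below; the attribute names are inferred from the descriptions on this page and should be checked against the SpanModel API reference.

```python
# Inside your BaseMetric subclass:
def score(self, task_span, **ignored_kwargs):
    # Attribute names below are assumptions based on the properties described above.
    span_name = task_span.name        # name of the tracked function or operation
    span_type = task_span.type        # span type, e.g. "general", "tool", or "llm"
    task_input = task_span.input      # input payload passed to the task
    task_output = task_span.output    # output produced by the task
    metadata = task_span.metadata     # arbitrary metadata attached to the span
    tags = task_span.tags             # tags attached to the span
    ...
```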
Performance Metrics
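For example, a metric might derive latency, token usage, and cost from the span. Again, usage, total_cost, provider, and model are assumed attribute names; consult the SpanModel reference for the exact schema.

```python
# Inside your BaseMetric subclass:
def score(self, task_span, **ignored_kwargs):
    # usage, total_cost, provider, and model are assumed attribute names.
    duration_s = None
    if task_span.start_time and task_span.end_time:
        duration_s = (task_span.end_time - task_span.start_time).total_seconds()

    usage = task_span.usage or {}           # token usage reported by the provider
    total_tokens = usage.get("total_tokens", 0)
    estimated_cost = task_span.total_cost   # estimated cost, if available
    provider, model = task_span.provider, task_span.model
    ...
```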
Error Analysis
Task span metrics can analyze execution failures and errors:
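A hedged sketch of an error-aware metric follows; error_info is an assumed attribute name for the failure details on a span.

```python
from opik.evaluation.metrics import score_result

# Inside your BaseMetric subclass:
def score(self, task_span, **ignored_kwargs):
    # error_info is an assumed attribute name; verify against the SpanModel reference.
    if task_span.error_info is not None:
        return score_result.ScoreResult(
            value=0.0,
            name=self.name,
            reason=f"Task failed: {task_span.error_info}",
        )
    return score_result.ScoreResult(value=1.0, name=self.name, reason="No errors recorded")
```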
Using Task Span Metrics in Evaluation
Task span metrics work seamlessly with regular evaluation metrics. The Opik evaluation engine automatically detects task span metrics by checking if the score method includes a task_span parameter, and handles them appropriately:
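The sketch below mixes both kinds of metrics in one evaluation. It assumes a dataset whose items contain input and reference fields, and my_llm_app is a placeholder for your own application; Equals is a built-in output-based metric, while ExecutionTimeMetric is the task span metric sketched earlier.

```python
from opik import Opik
from opik.evaluation import evaluate
from opik.evaluation.metrics import Equals

client = Opik()
dataset = client.get_dataset(name="my-dataset")  # assumes this dataset exists

def evaluation_task(item: dict) -> dict:
    # my_llm_app is a placeholder for your own application; spans created
    # inside it via @track or native integrations become part of the task span.
    return {"output": my_llm_app(item["input"])}

evaluate(
    dataset=dataset,
    task=evaluation_task,
    scoring_metrics=[
        Equals(),               # regular metric: compares output to the reference field
        ExecutionTimeMetric(),  # task span metric: receives the task_span automatically
    ],
)
```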
Quickly testing task span metrics locally
You can validate a task span metric without running a full evaluation by recording spans locally. The SDK provides a context manager that captures all spans/traces created inside its block and exposes them in memory.
Note:
- Local recording cannot be nested. If a recording block is already active, entering another will raise an error.
- See the Python SDK reference for more details: Local Recording Context Manager
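A rough sketch of that workflow is shown below. record_spans_locally and the recorded.spans attribute are placeholder names used purely for illustration; use the actual context manager and result object documented in the Python SDK reference linked above.

```python
import opik

@opik.track
def my_task(question: str) -> str:
    # Your real application logic goes here.
    return f"Answer to: {question}"

# Placeholder name -- replace with the actual local recording context manager
# from the Python SDK reference ("Local Recording Context Manager").
with record_spans_locally() as recorded:
    my_task("What is Opik?")

# Feed a captured span straight into the metric, no full evaluation needed.
metric = ExecutionTimeMetric()
result = metric.score(task_span=recorded.spans[0])  # .spans is a placeholder attribute
print(result.value, result.reason)
```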
Best Practices
1. Handle Missing Data Gracefully
Always check for None values in optional span attributes:
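For example (usage is an assumed attribute name):

```python
from opik.evaluation.metrics import score_result

# Inside your BaseMetric subclass:
def score(self, task_span, **ignored_kwargs):
    # Optional span attributes may be None; guard before using them.
    usage = task_span.usage or {}  # usage is an assumed attribute name
    total_tokens = usage.get("total_tokens")
    if total_tokens is None:
        return score_result.ScoreResult(
            value=0.0, name=self.name, reason="No usage information recorded"
        )
    ...
```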
2. Focus on Execution Patterns
Use task span metrics to evaluate how your application executes, not just the final output:
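For instance, a metric can score the shape of the trajectory rather than the answer itself. Here spans (the list of nested child spans) and type are assumed attribute names, and the tool-call budget of three is arbitrary.

```python
from opik.evaluation.metrics import score_result

# Inside your BaseMetric subclass:
def score(self, task_span, **ignored_kwargs):
    # "spans" (nested child spans) and "type" are assumed attribute names.
    nested = task_span.spans or []
    tool_calls = [s for s in nested if s.type == "tool"]

    # Reward runs that stay within a small tool-call budget, regardless of
    # what the final answer looks like.
    value = 1.0 if len(tool_calls) <= 3 else max(0.0, 1.0 - 0.2 * (len(tool_calls) - 3))
    return score_result.ScoreResult(
        value=value,
        name=self.name,
        reason=f"{len(tool_calls)} tool call(s) in the trajectory",
    )
```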
3. Combine with Regular Metrics
Task span metrics provide the most value when combined with traditional output-based metrics:
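For example, a single scoring_metrics list passed to evaluate() can mix both kinds; Contains is a built-in output-based metric, while ExecutionTimeMetric is the span-based sketch from earlier on this page.

```python
from opik.evaluation.metrics import Contains

scoring_metrics = [
    Contains(),             # output-based: does the answer contain the expected text?
    ExecutionTimeMetric(),  # span-based: did the task stay within its latency budget?
]
```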
4. Security Considerations
Be mindful of sensitive data in span information:
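One hedged approach is to redact obviously sensitive values before they end up in score reasons or logs. The input attribute is an assumed name, and the regular expressions below are only illustrative.

```python
import re

from opik.evaluation.metrics import score_result


def _redact(text: str) -> str:
    # Illustrative patterns only: strip API-key-like tokens and email addresses
    # before they end up in a score reason or a log line.
    text = re.sub(r"sk-[A-Za-z0-9]+", "[REDACTED_KEY]", text)
    text = re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "[REDACTED_EMAIL]", text)
    return text

# Inside your BaseMetric subclass:
def score(self, task_span, **ignored_kwargs):
    raw_input = str(task_span.input)  # span input may contain user-provided PII
    return score_result.ScoreResult(
        value=1.0,
        name=self.name,
        reason=f"Evaluated input: {_redact(raw_input)[:100]}",
    )
```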
Complete Example: Agent Trajectory Analysis metric
Here’s a comprehensive example that analyzes agent decision-making:
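A hedged sketch of such a metric is below. It inspects the nested child spans of the agent run; the span attributes it reads (spans, type, error_info, start_time, end_time) are assumptions based on the properties described on this page and should be checked against the SpanModel API reference.

```python
from typing import Any

from opik.evaluation.metrics import base_metric, score_result


class AgentTrajectoryMetric(base_metric.BaseMetric):
    """Scores an agent run on the shape of its trajectory, not its final answer."""

    def __init__(
        self,
        name: str = "agent_trajectory",
        max_tool_calls: int = 5,
        max_duration_seconds: float = 30.0,
    ):
        self.name = name
        self.max_tool_calls = max_tool_calls
        self.max_duration_seconds = max_duration_seconds

    def score(self, task_span: Any, **ignored_kwargs) -> score_result.ScoreResult:
        nested = task_span.spans or []  # "spans" is an assumed attribute name
        reasons = []
        value = 1.0

        # 1. Any failed step in the trajectory is heavily penalised.
        failed = [s for s in nested if getattr(s, "error_info", None)]
        if failed:
            value -= 0.5
            reasons.append(f"{len(failed)} step(s) failed")

        # 2. Penalise overly long tool-call chains (a sign of a flailing agent).
        tool_calls = [s for s in nested if s.type == "tool"]
        if len(tool_calls) > self.max_tool_calls:
            value -= 0.3
            reasons.append(
                f"{len(tool_calls)} tool calls (budget: {self.max_tool_calls})"
            )
        else:
            reasons.append(f"{len(tool_calls)} tool call(s)")

        # 3. Penalise slow trajectories.
        if task_span.start_time and task_span.end_time:
            duration = (task_span.end_time - task_span.start_time).total_seconds()
            if duration > self.max_duration_seconds:
                value -= 0.2
                reasons.append(
                    f"took {duration:.1f}s (budget: {self.max_duration_seconds}s)"
                )

        return score_result.ScoreResult(
            value=max(0.0, value),
            name=self.name,
            reason="; ".join(reasons) or "Empty trajectory",
        )
```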
Integration with LLM Evaluation
For a complete guide on using task span metrics in LLM evaluation workflows, see the Using task span evaluation metrics section in the LLM evaluation guide.
Related Documentation
- Custom Metrics - Creating traditional input/output evaluation metrics
- SpanModel API Reference - Complete SpanModel documentation
- Evaluation Overview - Understanding Opik’s evaluation system