Custom Conversation (Multi-turn) Metrics
Conversation metrics evaluate multi-turn conversations rather than single input-output pairs. These metrics are particularly useful for evaluating chatbots, conversational agents, and any multi-turn dialogue systems.
Understanding the Conversation Format
Conversation thread metrics work with a standardized conversation format:
Creating a Custom Conversation Metric
To create a custom conversation metric, subclass ConversationThreadMetric
and implement the score
method:
Using Custom Conversation Metrics
You can use this metric with evaluate_threads
:
For more details on evaluating conversation threads, see the Evaluate Threads guide.
Next Steps
- Learn about built-in conversation metrics
- Read the Evaluate Threads guide