Simulation
The Opik simulation module provides tools for creating multi-turn conversation simulations between simulated users and your applications. This is particularly useful for evaluating agent behavior over multiple conversation turns.
Overview
Multi-turn simulation allows you to:
- Simulate realistic user interactions with your agent over multiple conversation turns
- Generate context-aware user responses based on conversation history
- Evaluate agent behavior across extended conversations
- Test different user personas and scenarios systematically
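To make the last point concrete, one way to cover personas and scenarios systematically is to build a persona prompt for every combination. This is only a sketch: the persona and scenario strings and the `build_persona_prompts` helper are hypothetical and not part of Opik.

```python
from itertools import product

# Hypothetical persona traits and scenarios; replace with your own.
PERSONAS = ["frustrated customer", "confused first-time user"]
SCENARIOS = ["wants a refund", "cannot log in"]


def build_persona_prompts(personas, scenarios):
    """Return one persona prompt per (persona, scenario) combination."""
    return [
        f"You are a {persona} who {scenario}"
        for persona, scenario in product(personas, scenarios)
    ]


prompts = build_persona_prompts(PERSONAS, SCENARIOS)
for prompt in prompts:
    print(prompt)
```

Each resulting string could then be passed as the `persona` argument of a `SimulatedUser`, giving one simulation per combination.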
Key Components
SimulatedUser: A class that generates realistic user responses using LLMs or predefined responses.
run_simulation: A function that orchestrates multi-turn conversations between a simulated user and your application.
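The division of labor between the two components can be illustrated with a small, self-contained sketch of a turn-taking loop. It does not use Opik and is not Opik's implementation: `ScriptedUser`, `simulate`, and `echo_agent` are hypothetical stand-ins, and the assumption that each message is a dict with `role` and `content` keys is taken from the agent return value shown in this page's examples.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


class ScriptedUser:
    """Hypothetical stand-in for a simulated user: replays fixed replies in order."""

    def __init__(self, replies: List[str]) -> None:
        self._replies = replies
        self._next = 0

    def generate_response(self, history: List[Message]) -> str:
        # Cycle through the script so the user never runs out of replies.
        reply = self._replies[self._next % len(self._replies)]
        self._next += 1
        return reply


def simulate(app: Callable[..., Message], user: ScriptedUser,
             max_turns: int, thread_id: str = "demo-thread") -> Dict[str, object]:
    """Alternate user and app messages for max_turns turns."""
    history: List[Message] = []
    for _ in range(max_turns):
        user_message = user.generate_response(history)
        history.append({"role": "user", "content": user_message})
        # The app receives the latest user message plus a thread identifier.
        history.append(app(user_message, thread_id=thread_id))
    return {"thread_id": thread_id, "conversation_history": history}


def echo_agent(user_message: str, *, thread_id: str, **kwargs) -> Message:
    """Toy agent that echoes the user's message back."""
    return {"role": "assistant", "content": f"You said: {user_message}"}


result = simulate(echo_agent, ScriptedUser(["Hi", "I want a refund"]), max_turns=3)
```

In this sketch, each turn adds one user message and one assistant message, so `max_turns=3` yields six messages in the history.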
Basic Usage
Here’s a simple example of how to use the simulation module:
from opik.simulation import SimulatedUser, run_simulation
from opik import track

# Create a simulated user
user_simulator = SimulatedUser(
    persona="You are a frustrated customer who wants a refund",
    model="openai/gpt-4o-mini"
)

# Define your agent
@track
def my_agent(user_message: str, *, thread_id: str, **kwargs):
    # Your agent logic here
    return {"role": "assistant", "content": "I can help you with that..."}

# Run the simulation
simulation = run_simulation(
    app=my_agent,
    user_simulator=user_simulator,
    max_turns=5
)

print(f"Thread ID: {simulation['thread_id']}")
print(f"Conversation: {simulation['conversation_history']}")
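Printing the raw history can be hard to read for longer conversations. The sketch below formats a history as a numbered transcript. It assumes that `conversation_history` is a list of dicts with `role` and `content` keys, matching the shape the agent returns above. The `format_transcript` helper and the sample data are hypothetical, not part of Opik.

```python
def format_transcript(conversation_history):
    """Render a list of {"role", "content"} messages as numbered lines."""
    lines = []
    for turn, message in enumerate(conversation_history, start=1):
        lines.append(f"{turn:>2}. {message['role']}: {message['content']}")
    return "\n".join(lines)


# Sample data standing in for simulation["conversation_history"].
sample_history = [
    {"role": "user", "content": "I want a refund"},
    {"role": "assistant", "content": "I can help you with that..."},
]

transcript = format_transcript(sample_history)
print(transcript)
```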
Integration with Evaluation
Conversation threads produced by simulations can be scored with Opik’s evaluation framework:
from opik.evaluation import evaluate_threads
from opik.evaluation.metrics import ConversationThreadMetric

# Run multiple simulations, one per persona
# (my_agent is the agent defined in the Basic Usage example)
simulations = []
for persona in ["frustrated_user", "happy_customer", "confused_user"]:
    simulator = SimulatedUser(persona=f"You are a {persona}")
    simulation = run_simulation(
        app=my_agent,
        user_simulator=simulator,
        max_turns=5
    )
    simulations.append(simulation)

# Evaluate the threads in "my_project" whose tags contain "simulation"
results = evaluate_threads(
    project_name="my_project",
    filter_string='tags contains "simulation"',
    metrics=[ConversationThreadMetric()]
)
For more detailed examples and advanced usage patterns, see the individual component documentation.