run_simulation
- opik.simulation.run_simulation(app: Callable, user_simulator: SimulatedUser, initial_message: str | None = None, max_turns: int = 5, thread_id: str | None = None, project_name: str | None = None, **app_kwargs: Any) -> Dict[str, Any]
Run a multi-turn conversation simulation between a simulated user and an app.
This function follows LangSmith's pattern, where:
1. The simulator passes single message strings to the app.
2. The app manages the full conversation history internally, keyed by thread_id.
3. The app logs traces with thread_id for evaluation.
- Parameters:
app – Callable that processes messages and manages conversation history internally. Signature: app(message: str, *, thread_id: str, **kwargs) -> Dict[str, str]. The app is automatically decorated with @track, and thread_id is injected via opik_args.
user_simulator – SimulatedUser instance that generates user responses
initial_message – Optional initial message from the user. If None, generated by simulator
max_turns – Maximum number of conversation turns (default: 5)
thread_id – Optional thread ID for grouping traces. Generated if not provided
project_name – Optional project name for trace logging
**app_kwargs – Additional keyword arguments passed to the app
- Returns:
Dict containing:
thread_id: The thread ID used for this simulation
conversation_history: List of message dicts from the simulation
project_name: Project name if provided
- Return type:
Dict[str, Any]
Description
The run_simulation function orchestrates multi-turn conversation simulations between a simulated user and your application. It manages the conversation flow, groups the resulting traces under one thread, and returns the thread ID, the conversation history, and the project name for use in evaluation.
Key Features
Multi-turn conversations: Runs conversations for a specified number of turns
Automatic tracing: Automatically decorates your app with @track if not already decorated
Thread management: Groups all traces from a simulation under a single thread ID
Error handling: Gracefully handles errors and continues simulation
Flexible configuration: Supports custom parameters and metadata
Function Signature
run_simulation(
    app: Callable,
    user_simulator: SimulatedUser,
    initial_message: Optional[str] = None,
    max_turns: int = 5,
    thread_id: Optional[str] = None,
    project_name: Optional[str] = None,
    **app_kwargs: Any
) -> Dict[str, Any]
Parameters
- app (Callable)
Your application function that processes messages. Must have the signature:
app(message: str, *, thread_id: str, **kwargs) -> Dict[str, str]
The function will be automatically decorated with @track if not already decorated.
- user_simulator (SimulatedUser)
Instance of SimulatedUser that generates user responses.
- initial_message (str, optional)
Optional initial message from the user. If None, the simulator will generate one.
- max_turns (int, optional)
Maximum number of conversation turns. Defaults to 5.
- thread_id (str, optional)
Thread ID for grouping traces. If None, a new ID will be generated.
- project_name (str, optional)
Project name for trace logging. Included in trace metadata.
- app_kwargs (Any)
Additional keyword arguments passed to the app function.
Returns
- Dict[str, Any]
Dictionary containing:
thread_id (str): The thread ID used for this simulation
conversation_history (List[Dict[str, str]]): Complete conversation as message dictionaries
project_name (str, optional): Project name if provided
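Since the result is a plain dictionary, it can be post-processed with ordinary Python. The sketch below uses a hand-written `simulation` dictionary shaped like the return value described above (the values are made up for illustration), and two hypothetical helpers, `format_transcript` and `count_by_role`, that are not part of Opik.

```python
from typing import Any, Dict

# Hypothetical result, shaped like the dictionary described above.
simulation: Dict[str, Any] = {
    "thread_id": "simulation_test_001",
    "conversation_history": [
        {"role": "user", "content": "I'm having trouble with my order"},
        {"role": "assistant", "content": "What is the order number?"},
        {"role": "user", "content": "It is 12345"},
        {"role": "assistant", "content": "Thanks, looking into it."},
    ],
    "project_name": "customer_service_evaluation",
}


def format_transcript(result: Dict[str, Any]) -> str:
    """Render the conversation as one '[role] content' line per message."""
    return "\n".join(
        f"[{message['role']}] {message['content']}"
        for message in result["conversation_history"]
    )


def count_by_role(result: Dict[str, Any]) -> Dict[str, int]:
    """Count how many messages each role contributed."""
    counts: Dict[str, int] = {}
    for message in result["conversation_history"]:
        counts[message["role"]] = counts.get(message["role"], 0) + 1
    return counts


transcript = format_transcript(simulation)
role_counts = count_by_role(simulation)
# project_name is only meaningful when one was passed in, so read it defensively.
project = simulation.get("project_name")
```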
App Function Requirements
Your app function must follow this signature:
def my_app(user_message: str, *, thread_id: str, **kwargs) -> Dict[str, str]:
    # Process the user message
    # Manage conversation history internally using thread_id
    # Return assistant response as message dict
    return {"role": "assistant", "content": "Your response"}
Key Requirements:
First parameter: Must accept the user message as a string
thread_id parameter: Must accept thread_id as a keyword-only argument
Return format: Must return a dictionary with ‘role’ and ‘content’ keys
History management: Your app is responsible for managing conversation history internally
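These requirements can be checked before a simulation is started. The following is a minimal sketch using the standard library's `inspect` module; `check_app_contract` and `check_reply` are hypothetical helpers written for this illustration, not functions provided by Opik, and they only cover the signature and return-format rules listed above.

```python
import inspect
from typing import Any, Callable, Dict, List


def check_app_contract(app: Callable[..., Any]) -> List[str]:
    """Return a list of problems with the app's signature (empty if none)."""
    problems: List[str] = []
    parameters = inspect.signature(app).parameters
    ordered = list(parameters.values())
    positional_kinds = (
        inspect.Parameter.POSITIONAL_ONLY,
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
    )
    if not ordered or ordered[0].kind not in positional_kinds:
        problems.append("first parameter must accept the user message")
    thread_param = parameters.get("thread_id")
    if thread_param is None or thread_param.kind is not inspect.Parameter.KEYWORD_ONLY:
        problems.append("thread_id must be a keyword-only parameter")
    return problems


def check_reply(reply: Any) -> List[str]:
    """Return a list of problems with a single app reply (empty if none)."""
    if not isinstance(reply, dict):
        return ["reply must be a dict"]
    return [f"missing key: {key}" for key in ("role", "content") if key not in reply]


def good_app(user_message: str, *, thread_id: str, **kwargs: Any) -> Dict[str, str]:
    return {"role": "assistant", "content": "Your response"}


def bad_app(user_message, thread_id):  # thread_id is not keyword-only
    return "plain string instead of a message dict"


good_problems = check_app_contract(good_app) + check_reply(good_app("hi", thread_id="t"))
bad_problems = check_app_contract(bad_app) + check_reply(bad_app("hi", "t"))
```

Here `good_problems` is empty, while `bad_problems` reports both the positional `thread_id` and the non-dictionary reply.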
Examples
Basic Usage
from opik.simulation import SimulatedUser, run_simulation
from opik import track

# Create a simulated user
user_simulator = SimulatedUser(
    persona="You are a customer who wants help with a product",
    model="openai/gpt-4o-mini"
)

# Define your agent with conversation history management
agent_history = {}

@track
def customer_service_agent(user_message: str, *, thread_id: str, **kwargs):
    if thread_id not in agent_history:
        agent_history[thread_id] = []
    # Add user message to history
    agent_history[thread_id].append({"role": "user", "content": user_message})
    # Process with full conversation context
    messages = agent_history[thread_id]
    # Your agent logic here (e.g., call LLM)
    response = "I can help you with that. What specific issue are you experiencing?"
    # Add assistant response to history
    agent_history[thread_id].append({"role": "assistant", "content": response})
    return {"role": "assistant", "content": response}

# Run the simulation
simulation = run_simulation(
    app=customer_service_agent,
    user_simulator=user_simulator,
    max_turns=5,
    project_name="customer_service_evaluation"
)
print(f"Thread ID: {simulation['thread_id']}")
print(f"Conversation length: {len(simulation['conversation_history'])}")
Custom Initial Message
# Start with a specific initial message
simulation = run_simulation(
    app=customer_service_agent,
    user_simulator=user_simulator,
    initial_message="I'm having trouble with my order",
    max_turns=3
)
Custom Thread ID
# Use a custom thread ID for easier tracking
custom_thread_id = "simulation_test_001"
simulation = run_simulation(
    app=customer_service_agent,
    user_simulator=user_simulator,
    thread_id=custom_thread_id,
    max_turns=5
)
Multiple Simulations
# Run multiple simulations with different personas
personas = [
    "You are a frustrated customer who wants a refund",
    "You are a happy customer who wants to buy more",
    "You are a confused user who needs help with setup"
]
simulations = []
for i, persona in enumerate(personas):
    simulator = SimulatedUser(persona=persona)
    simulation = run_simulation(
        app=customer_service_agent,
        user_simulator=simulator,
        max_turns=5,
        project_name="multi_persona_evaluation"
    )
    simulations.append(simulation)
    print(f"Simulation {i+1} completed: {simulation['thread_id']}")
Integration with Evaluation
from opik.evaluation import evaluate_threads
from opik.evaluation.metrics import ConversationThreadMetric

# Run simulations
simulation = run_simulation(
    app=customer_service_agent,
    user_simulator=user_simulator,
    max_turns=5,
    project_name="evaluation_test"
)

# Evaluate the simulation thread
results = evaluate_threads(
    project_name="evaluation_test",
    filter_string=f'thread_id = "{simulation["thread_id"]}"',
    metrics=[ConversationThreadMetric()]
)
Error Handling
@track
def error_prone_agent(user_message: str, *, thread_id: str, **kwargs):
    # This might raise an exception
    if "error" in user_message.lower():
        raise ValueError("Simulated error")
    return {"role": "assistant", "content": "Normal response"}

# run_simulation handles errors gracefully
simulation = run_simulation(
    app=error_prone_agent,
    user_simulator=user_simulator,
    max_turns=3
)

# Errors are captured in the conversation history
for message in simulation['conversation_history']:
    if "Error processing message" in message.get('content', ''):
        print(f"Error occurred: {message['content']}")
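An app can also catch its own exceptions so that the simulated user always receives a well-formed assistant message. The sketch below is one possible approach and not something Opik provides: `with_fallback` is a hypothetical decorator that logs the failure and returns a fixed fallback reply in the required message format. The `@track` decorator is omitted so that the snippet runs without Opik installed.

```python
import functools
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger(__name__)


def with_fallback(fallback_text: str) -> Callable[[Callable[..., Dict[str, str]]], Callable[..., Dict[str, str]]]:
    """Hypothetical decorator: turn app exceptions into a fallback reply."""

    def decorator(app: Callable[..., Dict[str, str]]) -> Callable[..., Dict[str, str]]:
        @functools.wraps(app)
        def wrapper(user_message: str, *, thread_id: str, **kwargs: Any) -> Dict[str, str]:
            try:
                return app(user_message, thread_id=thread_id, **kwargs)
            except Exception:
                # Keep the traceback in the logs, but keep the conversation going.
                logger.exception("App failed for thread %s", thread_id)
                return {"role": "assistant", "content": fallback_text}

        return wrapper

    return decorator


@with_fallback("Sorry, something went wrong. Could you rephrase that?")
def fragile_agent(user_message: str, *, thread_id: str, **kwargs: Any) -> Dict[str, str]:
    if "error" in user_message.lower():
        raise ValueError("Simulated error")
    return {"role": "assistant", "content": "Normal response"}


ok_reply = fragile_agent("hello", thread_id="t1")
failed_reply = fragile_agent("this triggers an error", thread_id="t1")
```

The wrapper keeps the keyword-only `thread_id` parameter, so the wrapped function still satisfies the app signature requirements.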
Best Practices
Thread Management: Always use the provided thread_id to manage conversation history
Error Handling: Implement proper error handling in your app function
Return Format: Always return message dictionaries with ‘role’ and ‘content’ keys
History Management: Keep conversation history in a thread-safe way if running concurrent simulations
Resource Management: Be mindful of token usage with long conversations
Testing: Use fixed responses in SimulatedUser for deterministic testing
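The history-management and resource-management points can be combined in one small helper. The following is a sketch under stated assumptions: `ThreadHistoryStore` is a hypothetical class, not part of Opik, that guards a per-thread history with a `threading.Lock` and keeps only the most recent `max_messages` entries as a crude cap on context size (message count is used here as a stand-in for token usage).

```python
import threading
from collections import defaultdict
from typing import DefaultDict, Dict, List


class ThreadHistoryStore:
    """Hypothetical per-thread message store, safe for concurrent use."""

    def __init__(self, max_messages: int = 20) -> None:
        if max_messages < 1:
            raise ValueError("max_messages must be at least 1")
        self._lock = threading.Lock()
        self._histories: DefaultDict[str, List[Dict[str, str]]] = defaultdict(list)
        self._max_messages = max_messages

    def append(self, thread_id: str, role: str, content: str) -> List[Dict[str, str]]:
        """Add a message and return a copy of the trimmed history."""
        with self._lock:
            history = self._histories[thread_id]
            history.append({"role": role, "content": content})
            # Drop the oldest messages once the cap is exceeded.
            del history[: -self._max_messages]
            return list(history)

    def get(self, thread_id: str) -> List[Dict[str, str]]:
        """Return a copy so callers cannot mutate shared state."""
        with self._lock:
            return list(self._histories[thread_id])


store = ThreadHistoryStore(max_messages=4)


def worker(thread_id: str) -> None:
    for i in range(10):
        store.append(thread_id, "user", f"message {i}")


# Three concurrent "simulations", each writing to its own thread_id.
workers = [threading.Thread(target=worker, args=(f"thread-{n}",)) for n in range(3)]
for w in workers:
    w.start()
for w in workers:
    w.join()

snapshot = store.get("thread-0")
```

An app function would call `store.append(thread_id, "user", user_message)` on entry and pass the returned list to its model, then append the assistant reply the same way.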
Notes
The function automatically decorates your app with @track if not already decorated
All traces from a simulation are grouped under the same thread ID
The function handles errors gracefully and continues the simulation
Conversation history is returned as a list of message dictionaries
Custom parameters passed via **app_kwargs are forwarded to your app function