Optimize Agents with Opik

The Opik Agent Optimizer can optimize both simple prompts and complex agent workflows. For most use cases, you can optimize prompts directly using ChatPrompt. When you need multi-prompt workflows, agent orchestration, or custom execution logic, you’ll use OptimizableAgent to create a custom agent class.

When to use OptimizableAgent vs ChatPrompt

Use ChatPrompt directly (default approach):

Single-prompt optimization - optimizing one prompt template
Most common use case
No custom execution logic needed

Use OptimizableAgent when you need:

Multi-prompt workflows - orchestrating multiple prompts in sequence
Agent framework integration - connecting to ADK, LangGraph, CrewAI, etc.
Custom execution logic - special tool handling, async workflows, etc.

Optimizers work with both approaches. During optimization, the optimizer repeatedly calls your agent’s invoke_agent() method, passing a different prompt candidate each time so that it can be evaluated.
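The loop below is a conceptual sketch of that contract, not the optimizer's actual implementation. A stand-in agent exposes invoke_agent(), and a helper scores each candidate over a tiny dataset. EchoAgent, exact_match, evaluate_candidate, and the plain-string prompts are all assumptions made for illustration; real candidates are ChatPrompt objects and real agents call a model.

```python
from typing import Any, Callable


class EchoAgent:
    """Hypothetical stand-in agent: 'answers' by filling in the prompt template."""

    def invoke_agent(self, prompts: dict[str, str], dataset_item: dict[str, Any]) -> str:
        template = next(iter(prompts.values()))
        return template.format(**dataset_item)


def exact_match(dataset_item: dict[str, Any], llm_output: str) -> float:
    """Score 1.0 when the output equals the reference answer, else 0.0."""
    return 1.0 if llm_output == dataset_item["answer"] else 0.0


def evaluate_candidate(
    agent: EchoAgent,
    prompts: dict[str, str],
    dataset: list[dict[str, Any]],
    metric: Callable[[dict[str, Any], str], float],
) -> float:
    """Average the metric over every dataset item for one prompt candidate."""
    scores = [metric(item, agent.invoke_agent(prompts, item)) for item in dataset]
    return sum(scores) / len(scores)


dataset = [
    {"question": "2+2", "answer": "Q: 2+2"},
    {"question": "3+3", "answer": "Q: 3+3"},
]
candidates = [
    {"main": "{question}"},
    {"main": "Q: {question}"},
]

agent = EchoAgent()
scored = [(evaluate_candidate(agent, c, dataset, exact_match), c) for c in candidates]
# Keep the candidate with the highest average score.
best_score, best_prompts = max(scored, key=lambda pair: pair[0])
```

The point of the sketch is the shape of the interaction: the agent stays fixed while the prompts passed to invoke_agent() change from one trial to the next.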

Single-prompt optimization

For most optimization tasks, you can use ChatPrompt directly without creating a custom agent. The optimizer uses a default LiteLLM-based agent under the hood.

from opik_optimizer import ChatPrompt, MetaPromptOptimizer
from opik.evaluation.metrics import LevenshteinRatio
from opik_optimizer.datasets import hotpot

dataset = hotpot(count=300)

def levenshtein_ratio(dataset_item, llm_output):
    return LevenshteinRatio().score(
        reference=dataset_item["answer"],
        output=llm_output
    )

prompt = ChatPrompt(
    system="You are a helpful assistant.",
    user="{question}",
    model="openai/gpt-4o-mini"
)

optimizer = MetaPromptOptimizer(model="openai/gpt-4o")
result = optimizer.optimize_prompt(
    prompt=prompt,
    dataset=dataset,
    metric=levenshtein_ratio,
    max_trials=5,
    n_samples=50
)

result.display()

Custom agent for framework integration

When integrating with specific agent frameworks (Google ADK, LangGraph, CrewAI, etc.), you’ll create a custom OptimizableAgent subclass. This allows the optimizer to work with your framework’s execution model.

Here’s an example for Google ADK:

# Postpone annotation evaluation so that chat_prompt, which is imported
# only for type checking, does not raise NameError at runtime.
from __future__ import annotations

from typing import Any, TYPE_CHECKING
from opik_optimizer import OptimizableAgent

if TYPE_CHECKING:
    from opik_optimizer.api_objects import chat_prompt

class ADKAgent(OptimizableAgent):
    project_name = "adk-agent"

    def invoke_agent(
        self,
        prompts: dict[str, chat_prompt.ChatPrompt],
        dataset_item: dict[str, Any],
        allow_tool_use: bool = False,
        seed: int | None = None,
    ) -> str:
        # Single-prompt agents extract the prompt from the dict
        if len(prompts) > 1:
            raise ValueError("ADKAgent only supports single-prompt optimization.")

        prompt = list(prompts.values())[0]
        messages = prompt.get_messages(dataset_item)

        # Your framework-specific execution logic here
        # ... create ADK agent, run it on `messages`, and assign its
        # text output to `response` (placeholder, not defined here) ...

        return response

The key points:

Extract the single prompt from prompts dict: prompt = list(prompts.values())[0]
Get formatted messages: messages = prompt.get_messages(dataset_item)
Execute using your framework and return the response string
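The extraction step can be factored into a small helper. The sketch below is a hypothetical single_prompt function, not part of opik_optimizer; it also rejects an empty dict, which list(prompts.values())[0] would otherwise surface as a less helpful IndexError.

```python
from typing import TypeVar

T = TypeVar("T")


def single_prompt(prompts: dict[str, T]) -> T:
    """Return the only prompt in the dict, or raise if there is not exactly one."""
    if len(prompts) != 1:
        raise ValueError(
            f"Expected exactly one prompt, got {len(prompts)}: {sorted(prompts)}"
        )
    return next(iter(prompts.values()))
```

Inside invoke_agent(), this would replace the length check and the indexing with a single call such as prompt = single_prompt(prompts).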

See sdks/opik_optimizer/scripts/llm_frameworks/ for working examples of framework integrations (ADK, LangGraph, CrewAI, etc.). Each script doubles as documentation and as a regression test.

Multi-prompt optimization

For multi-step agent workflows, you must use OptimizableAgent because ChatPrompt only handles a single prompt. Multi-prompt optimization allows you to optimize multiple prompts that work together in a pipeline.

When to use multi-prompt optimization

Sequential reasoning workflows (analyze → respond)
Multi-hop retrieval pipelines
Agent orchestration with multiple steps
Any workflow where one prompt’s output feeds into another

Implementing a multi-prompt agent

Here’s a simple example of a two-step workflow that analyzes input and then generates a response:

from typing import Any
from opik_optimizer import ChatPrompt, OptimizableAgent
from openai import OpenAI

class AnalyzeRespondAgent(OptimizableAgent):
    """Two-step agent: analyze input, then respond based on analysis."""

    def __init__(self, model: str = "gpt-4o-mini"):
        super().__init__()
        self.model = model
        self.client = OpenAI()

    def invoke_agent(
        self,
        prompts: dict[str, ChatPrompt],
        dataset_item: dict[str, Any],
        allow_tool_use: bool = False,
        seed: int | None = None,
    ) -> str:
        # Step 1: Analyze the input
        analyze_prompt = prompts["analyze"]
        analyze_messages = analyze_prompt.get_messages(dataset_item)

        analyze_response = self.client.chat.completions.create(
            model=self.model,
            messages=analyze_messages,
            seed=seed,
        )
        analysis = analyze_response.choices[0].message.content

        # Step 2: Generate response based on analysis
        respond_prompt = prompts["respond"]
        # Pass analysis result to the respond prompt
        respond_context = {**dataset_item, "analysis": analysis}
        respond_messages = respond_prompt.get_messages(respond_context)

        respond_response = self.client.chat.completions.create(
            model=self.model,
            messages=respond_messages,
            seed=seed,
        )

        return respond_response.choices[0].message.content

Using the multi-prompt agent

When optimizing, pass a dictionary of prompts instead of a single prompt:

from opik_optimizer import ChatPrompt, MetaPromptOptimizer
from opik.evaluation.metrics import LevenshteinRatio

# This snippet reuses `dataset` and `levenshtein_ratio` from the
# single-prompt example and `AnalyzeRespondAgent` from the previous section.
# The dataset rows must provide the {text} field used by these prompts.

# Define both prompts in the workflow
prompts = {
    "analyze": ChatPrompt(
        system="You are an analysis assistant. Extract key information from the input.",
        user="{text}",
        model="gpt-4o-mini"
    ),
    "respond": ChatPrompt(
        system="You are a response assistant. Generate a helpful response based on the analysis.",
        user="Analysis: {analysis}\n\nOriginal question: {text}",
        model="gpt-4o-mini"
    ),
}

optimizer = MetaPromptOptimizer(model="openai/gpt-4o")
result = optimizer.optimize_prompt(
    prompt=prompts,  # Pass dict of prompts
    agent_class=AnalyzeRespondAgent,  # Use your custom agent
    dataset=dataset,
    metric=levenshtein_ratio,
    max_trials=5,
    n_samples=50
)

result.display()

The optimizer will optimize both prompts in the dictionary, trying different combinations to improve performance.

The keys of the prompts dict (such as “analyze” and “respond”) identify each prompt, both to your agent, which looks prompts up by key, and to the optimizer. The optimizer can optimize all prompts or specific ones based on the optimize_prompt parameter.
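Because later prompts depend on values that the agent injects at run time, it can help to check up front that every placeholder will be filled. The sketch below assumes the templates use Python format-style {name} placeholders, as the examples in this guide suggest, and works on plain template strings rather than ChatPrompt objects. The helpers placeholders and missing_fields are hypothetical and not part of opik_optimizer.

```python
from string import Formatter


def placeholders(template: str) -> set[str]:
    """Return the named {fields} that appear in a format-style template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def missing_fields(templates: dict[str, str], available: set[str]) -> dict[str, set[str]]:
    """Map each prompt key to the placeholders that `available` does not cover."""
    missing: dict[str, set[str]] = {}
    for key, template in templates.items():
        gap = placeholders(template) - available
        if gap:
            missing[key] = gap
    return missing


# User templates copied from the two-step example above.
user_templates = {
    "analyze": "{text}",
    "respond": "Analysis: {analysis}\n\nOriginal question: {text}",
}
dataset_fields = {"text", "answer"}

# The dataset alone cannot fill {analysis}; the agent must supply it.
before = missing_fields(user_templates, dataset_fields)
# Once the agent adds "analysis" to the context, nothing is missing.
after = missing_fields(user_templates, dataset_fields | {"analysis"})
```

A check like this catches a mismatch between prompt templates and the context keys the agent passes to get_messages() before any model calls are made.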

Key implementation details

invoke_agent() method signature

All OptimizableAgent subclasses must implement invoke_agent():

def invoke_agent(
    self,
    prompts: dict[str, ChatPrompt],
    dataset_item: dict[str, Any],
    allow_tool_use: bool = False,
    seed: int | None = None,
) -> str:
    # Your implementation here
    return response_string

Parameters:

prompts: Dictionary mapping prompt names to ChatPrompt objects
dataset_item: Dataset row used to format prompt messages
allow_tool_use: Whether tools may be executed (for tool-calling prompts)
seed: Optional random seed for reproducibility

Returns: A single string output that will be scored by your metric function
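To make the scoring side of that contract concrete, here is a minimal metric with the same (dataset_item, llm_output) signature used throughout this guide. It is a sketch under two assumptions: that the optimizer accepts a metric returning a plain float (the examples in this guide return the result of LevenshteinRatio().score(...) instead), and that the reference lives under an "answer" key. difflib's SequenceMatcher is a standard-library stand-in for a similarity score and is not the Levenshtein ratio.

```python
from difflib import SequenceMatcher
from typing import Any


def similarity_metric(dataset_item: dict[str, Any], llm_output: str) -> float:
    """Score the agent's string output against the reference, from 0.0 to 1.0."""
    reference = str(dataset_item["answer"]).strip().lower()
    # Treat a missing output as an empty string so the metric never crashes.
    candidate = (llm_output or "").strip().lower()
    return SequenceMatcher(None, reference, candidate).ratio()
```

Normalizing case and whitespace before comparing keeps trivial formatting differences from dominating the score; whether that is appropriate depends on the task.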

Extracting messages from prompts

Use ChatPrompt.get_messages() to format the prompt with dataset values:

messages = prompt.get_messages(dataset_item)
# Returns list of message dicts: [{"role": "system", "content": "..."}, ...]

For multi-prompt workflows, pass additional context when calling get_messages():

# Pass intermediate results to subsequent prompts
context = {**dataset_item, "intermediate_result": some_value}
messages = prompt.get_messages(context)
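The context-merging pattern is ordinary dict unpacking, so its behavior can be checked without the library. In the sketch below, str.format stands in for get_messages(), which is an assumption about how placeholders are substituted; the dataset row and the step1_result text are made up for illustration.

```python
# Hypothetical dataset row and intermediate result.
dataset_item = {"question": "Who wrote Dune?", "answer": "Frank Herbert"}
step1_result = "The question asks for the author of a novel."

# Merging builds a new dict; the original dataset row is not mutated.
context = {**dataset_item, "step1_result": step1_result}

# str.format ignores unused keys such as "answer".
template = "Question: {question}\n\nAnalysis: {step1_result}"
rendered = template.format(**context)

# Keys listed later win, so an intermediate value that reuses a dataset
# field name silently replaces that field in the merged context.
shadowed = {**dataset_item, "answer": "overridden"}
```

Choosing intermediate key names that cannot collide with dataset columns (for example, a step1_ prefix) avoids the shadowing shown in the last line.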

Best practices

Error handling: Return meaningful error messages if execution fails
Model parameters: Respect prompt.model and prompt.model_kwargs for consistency
Reproducibility: Use the seed parameter when making LLM calls
Opik tracing: The base class handles tracing automatically, but you can add custom metadata via self.trace_metadata
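The first three practices can be sketched with two small helpers. Both build_call_kwargs and run_step are hypothetical names, not part of opik_optimizer, and the error-string format is an arbitrary choice; the sketch only assumes that a prompt exposes a model name and an optional dict of model kwargs, as the bullet on model parameters describes.

```python
from typing import Any, Callable, Optional


def build_call_kwargs(
    model: str,
    model_kwargs: Optional[dict[str, Any]],
    seed: Optional[int],
) -> dict[str, Any]:
    """Combine the prompt's model settings with the optimizer-provided seed."""
    kwargs: dict[str, Any] = {"model": model, **(model_kwargs or {})}
    if seed is not None:
        kwargs["seed"] = seed
    return kwargs


def run_step(call: Callable[..., str], **kwargs: Any) -> str:
    """Run one LLM step and return a readable error string if it fails."""
    try:
        return call(**kwargs)
    except Exception as exc:  # broad on purpose: the agent must return a string
        return f"[agent error] {type(exc).__name__}: {exc}"


def flaky_llm(**kwargs: Any) -> str:
    """Stand-in for a model call that fails."""
    raise TimeoutError("upstream timed out")


call_kwargs = build_call_kwargs("gpt-4o-mini", {"temperature": 0.2}, seed=7)
output = run_step(flaky_llm, **call_kwargs)
```

Returning the error as the agent's output means the failed item simply scores poorly instead of aborting the whole optimization run; if you would rather fail fast, re-raise instead.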

Complete examples

Single-prompt with ChatPrompt (default)

from opik_optimizer import ChatPrompt, EvolutionaryOptimizer
from opik_optimizer.datasets import hotpot
from opik.evaluation.metrics import LevenshteinRatio

dataset = hotpot(count=300)

def metric(dataset_item, llm_output):
    return LevenshteinRatio().score(
        reference=dataset_item["answer"],
        output=llm_output
    )

prompt = ChatPrompt(
    system="You are a helpful assistant.",
    user="{question}",
    model="openai/gpt-4o-mini"
)

optimizer = EvolutionaryOptimizer(
    model="openai/gpt-4o-mini",
    population_size=5,
    num_generations=3
)

result = optimizer.optimize_prompt(
    prompt=prompt,
    dataset=dataset,
    metric=metric,
    n_samples=50
)

result.display()

Multi-prompt workflow

from typing import Any
from opik_optimizer import ChatPrompt, OptimizableAgent, HRPO
from opik.evaluation.metrics import LevenshteinRatio
from opik_optimizer.datasets import hotpot
from openai import OpenAI

class TwoStepAgent(OptimizableAgent):
    def __init__(self, model: str = "gpt-4o-mini"):
        super().__init__()
        self.model = model
        self.client = OpenAI()

    def invoke_agent(
        self,
        prompts: dict[str, ChatPrompt],
        dataset_item: dict[str, Any],
        allow_tool_use: bool = False,
        seed: int | None = None,
    ) -> str:
        # First step
        step1_prompt = prompts["step1"]
        step1_messages = step1_prompt.get_messages(dataset_item)
        step1_response = self.client.chat.completions.create(
            model=self.model,
            messages=step1_messages,
            seed=seed,
        )
        step1_result = step1_response.choices[0].message.content

        # Second step uses result from first step
        step2_prompt = prompts["step2"]
        step2_context = {**dataset_item, "step1_result": step1_result}
        step2_messages = step2_prompt.get_messages(step2_context)
        step2_response = self.client.chat.completions.create(
            model=self.model,
            messages=step2_messages,
            seed=seed,
        )

        return step2_response.choices[0].message.content

# Define multi-prompt workflow
prompts = {
    "step1": ChatPrompt(
        system="Analyze the question and identify key information.",
        user="{question}",
        model="gpt-4o-mini"
    ),
    "step2": ChatPrompt(
        system="Answer the question based on the analysis.",
        user="Question: {question}\n\nAnalysis: {step1_result}",
        model="gpt-4o-mini"
    ),
}

dataset = hotpot(count=300)

def metric(dataset_item, llm_output):
    return LevenshteinRatio().score(
        reference=dataset_item["answer"],
        output=llm_output
    )

optimizer = HRPO(
    model="openai/gpt-4o-mini",
    n_threads=2,
    max_parallel_batches=3
)

result = optimizer.optimize_prompt(
    prompt=prompts,
    agent_class=TwoStepAgent,
    dataset=dataset,
    metric=metric,
    max_trials=5,
    n_samples=50
)

result.display()

For a more advanced multi-prompt example, see sdks/opik_optimizer/benchmarks/agents/hotpot_multihop_agent.py, which implements a multi-hop retrieval pipeline with Wikipedia search.

Next steps

Explore optimization algorithms to choose the right optimizer
Learn about defining datasets and metrics
Check framework-specific examples in sdks/opik_optimizer/scripts/llm_frameworks/