Optimize agents

The Opik Agent Optimizer can optimize both simple prompts and complex agent workflows. For most use cases, you can optimize prompts directly using ChatPrompt. When you need multi-prompt workflows, agent orchestration, or custom execution logic, you’ll use OptimizableAgent to create a custom agent class.

When to use OptimizableAgent vs ChatPrompt

Use ChatPrompt directly (default approach):

  • Single-prompt optimization - optimizing one prompt template
  • Most common use case
  • No custom execution logic needed

Use OptimizableAgent when you need:

  • Multi-prompt workflows - orchestrating multiple prompts in sequence
  • Agent framework integration - connecting to ADK, LangGraph, CrewAI, etc.
  • Custom execution logic - special tool handling, async workflows, etc.

Optimizers work seamlessly with both approaches. The optimizer calls your agent’s invoke_agent() method repeatedly during optimization, passing different prompt candidates to evaluate.
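
For intuition, the core evaluation loop looks roughly like the sketch below. This is a simplification, and evaluate_candidate is a hypothetical helper rather than part of the SDK; the real optimizers add candidate generation, sampling, and parallelism on top of this.

# Hypothetical sketch of the evaluation loop, for intuition only.
def evaluate_candidate(agent, candidate_prompts, dataset_items, metric):
    scores = []
    for item in dataset_items:
        # The optimizer calls invoke_agent() with each candidate prompt set
        output = agent.invoke_agent(prompts=candidate_prompts, dataset_item=item)
        scores.append(metric(dataset_item=item, llm_output=output))
    return sum(scores) / len(scores)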

Single-prompt optimization

For most optimization tasks, you can use ChatPrompt directly without creating a custom agent. The optimizer uses a default LiteLLM-based agent under the hood.

from opik_optimizer import ChatPrompt, MetaPromptOptimizer
from opik.evaluation.metrics import LevenshteinRatio
from opik_optimizer.datasets import hotpot

dataset = hotpot(count=300)

def levenshtein_ratio(dataset_item, llm_output):
    return LevenshteinRatio().score(
        reference=dataset_item["answer"],
        output=llm_output
    )

prompt = ChatPrompt(
    system="You are a helpful assistant.",
    user="{question}",
    model="openai/gpt-4o-mini"
)

optimizer = MetaPromptOptimizer(model="openai/gpt-4o")
result = optimizer.optimize_prompt(
    prompt=prompt,
    dataset=dataset,
    metric=levenshtein_ratio,
    max_trials=5,
    n_samples=50
)

result.display()

Custom agent for framework integration

When integrating with specific agent frameworks (Google ADK, LangGraph, CrewAI, etc.), you’ll create a custom OptimizableAgent subclass. This allows the optimizer to work with your framework’s execution model.

Here’s an example for Google ADK:

from __future__ import annotations  # lets the annotation below resolve at runtime

from typing import Any, TYPE_CHECKING
from opik_optimizer import OptimizableAgent

if TYPE_CHECKING:
    from opik_optimizer.api_objects import chat_prompt

class ADKAgent(OptimizableAgent):
    project_name = "adk-agent"

    def invoke_agent(
        self,
        prompts: dict[str, chat_prompt.ChatPrompt],
        dataset_item: dict[str, Any],
        allow_tool_use: bool = False,
        seed: int | None = None,
    ) -> str:
        # Single-prompt agents extract the prompt from the dict
        if len(prompts) > 1:
            raise ValueError("ADKAgent only supports single-prompt optimization.")

        prompt = list(prompts.values())[0]
        messages = prompt.get_messages(dataset_item)

        # Your framework-specific execution logic here:
        # ... create the ADK agent, run it on `messages`, and capture its
        # final output as `response` ...

        return response

The key points:

  • Extract the single prompt from prompts dict: prompt = list(prompts.values())[0]
  • Get formatted messages: messages = prompt.get_messages(dataset_item)
  • Execute using your framework and return the response string

See sdks/opik_optimizer/scripts/llm_frameworks/ for working examples of framework integrations (ADK, LangGraph, CrewAI, etc.). Each script doubles as documentation and a regression test.

Multi-prompt optimization

For multi-step agent workflows, you must use OptimizableAgent because ChatPrompt only handles a single prompt. Multi-prompt optimization allows you to optimize multiple prompts that work together in a pipeline.

When to use multi-prompt optimization

  • Sequential reasoning workflows (analyze → respond)
  • Multi-hop retrieval pipelines
  • Agent orchestration with multiple steps
  • Any workflow where one prompt’s output feeds into another

Implementing a multi-prompt agent

Here’s a simple example of a two-step workflow that analyzes input and then generates a response:

from typing import Any
from opik_optimizer import ChatPrompt, OptimizableAgent
from openai import OpenAI

class AnalyzeRespondAgent(OptimizableAgent):
    """Two-step agent: analyze input, then respond based on analysis."""

    def __init__(self, model: str = "gpt-4o-mini"):
        super().__init__()
        self.model = model
        self.client = OpenAI()

    def invoke_agent(
        self,
        prompts: dict[str, ChatPrompt],
        dataset_item: dict[str, Any],
        allow_tool_use: bool = False,
        seed: int | None = None,
    ) -> str:
        # Step 1: Analyze the input
        analyze_prompt = prompts["analyze"]
        analyze_messages = analyze_prompt.get_messages(dataset_item)

        analyze_response = self.client.chat.completions.create(
            model=self.model,
            messages=analyze_messages,
            seed=seed,
        )
        analysis = analyze_response.choices[0].message.content

        # Step 2: Generate response based on analysis
        respond_prompt = prompts["respond"]
        # Pass analysis result to the respond prompt
        respond_context = {**dataset_item, "analysis": analysis}
        respond_messages = respond_prompt.get_messages(respond_context)

        respond_response = self.client.chat.completions.create(
            model=self.model,
            messages=respond_messages,
            seed=seed,
        )

        return respond_response.choices[0].message.content

Using the multi-prompt agent

When optimizing, pass a dictionary of prompts instead of a single prompt:

from opik_optimizer import ChatPrompt, MetaPromptOptimizer
from opik.evaluation.metrics import LevenshteinRatio

# Define both prompts in the workflow
prompts = {
    "analyze": ChatPrompt(
        system="You are an analysis assistant. Extract key information from the input.",
        user="{text}",
        model="gpt-4o-mini"
    ),
    "respond": ChatPrompt(
        system="You are a response assistant. Generate a helpful response based on the analysis.",
        user="Analysis: {analysis}\n\nOriginal question: {text}",
        model="gpt-4o-mini"
    ),
}

optimizer = MetaPromptOptimizer(model="openai/gpt-4o")

# dataset and levenshtein_ratio are defined as in the single-prompt example above
result = optimizer.optimize_prompt(
    prompt=prompts,                   # Pass dict of prompts
    agent_class=AnalyzeRespondAgent,  # Use your custom agent
    dataset=dataset,
    metric=levenshtein_ratio,
    max_trials=5,
    n_samples=50
)

result.display()

The optimizer will optimize both prompts in the dictionary, trying different combinations to improve performance.

The prompts dict keys (like “analyze” and “respond”) identify which prompt is which during optimization. The optimizer can optimize all prompts in the dict or only specific ones, depending on how you configure the optimize_prompt call.

Key implementation details

invoke_agent() method signature

All OptimizableAgent subclasses must implement invoke_agent():

def invoke_agent(
    self,
    prompts: dict[str, ChatPrompt],
    dataset_item: dict[str, Any],
    allow_tool_use: bool = False,
    seed: int | None = None,
) -> str:
    # Your implementation here
    return response_string

Parameters:

  • prompts: Dictionary mapping prompt names to ChatPrompt objects
  • dataset_item: Dataset row used to format prompt messages
  • allow_tool_use: Whether tools may be executed (for tool-calling prompts)
  • seed: Optional random seed for reproducibility

Returns: A single string output that will be scored by your metric function
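
Putting the pieces together, a minimal single-prompt implementation might look like the sketch below. It assumes litellm is installed and mirrors the default LiteLLM-based agent described earlier, but it is not the SDK's actual implementation; MinimalAgent and the fallback model name are illustrative.

from typing import Any

import litellm
from opik_optimizer import ChatPrompt, OptimizableAgent

class MinimalAgent(OptimizableAgent):
    project_name = "minimal-agent"

    def invoke_agent(
        self,
        prompts: dict[str, ChatPrompt],
        dataset_item: dict[str, Any],
        allow_tool_use: bool = False,
        seed: int | None = None,
    ) -> str:
        # Single-prompt case: take the only prompt in the dict
        prompt = list(prompts.values())[0]
        messages = prompt.get_messages(dataset_item)
        response = litellm.completion(
            model=prompt.model or "openai/gpt-4o-mini",  # respect the prompt's model
            messages=messages,
            seed=seed,
        )
        return response.choices[0].message.content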

Extracting messages from prompts

Use ChatPrompt.get_messages() to format the prompt with dataset values:

messages = prompt.get_messages(dataset_item)
# Returns list of message dicts: [{"role": "system", "content": "..."}, ...]

For multi-prompt workflows, pass additional context when calling get_messages():

# Pass intermediate results to subsequent prompts
context = {**dataset_item, "intermediate_result": some_value}
messages = prompt.get_messages(context)

Best practices

  • Error handling: Return a meaningful error message if execution fails (see the sketch after this list)
  • Model parameters: Respect prompt.model and prompt.model_kwargs for consistency
  • Reproducibility: Pass the seed parameter through to your LLM calls
  • Opik tracing: The base class handles tracing automatically, but you can add custom metadata via self.trace_metadata
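
A sketch applying these practices inside invoke_agent(), reusing the OpenAI client pattern from the earlier examples. The error-message format is illustrative rather than prescribed by the SDK, and it assumes prompt.model_kwargs is a dict of extra completion arguments (or None).

def invoke_agent(
    self,
    prompts: dict[str, ChatPrompt],
    dataset_item: dict[str, Any],
    allow_tool_use: bool = False,
    seed: int | None = None,
) -> str:
    prompt = list(prompts.values())[0]
    messages = prompt.get_messages(dataset_item)
    try:
        response = self.client.chat.completions.create(
            model=prompt.model or self.model,  # respect the prompt's model
            messages=messages,
            seed=seed,                         # reproducibility
            **(prompt.model_kwargs or {}),     # respect model kwargs
        )
        return response.choices[0].message.content
    except Exception as exc:
        # Return a meaningful error string rather than raising, so the
        # metric can penalize the failure
        return f"ERROR: agent execution failed: {exc}"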

Complete examples

Single-prompt with ChatPrompt (default)

from opik_optimizer import ChatPrompt, EvolutionaryOptimizer
from opik_optimizer.datasets import hotpot
from opik.evaluation.metrics import LevenshteinRatio

dataset = hotpot(count=300)

def metric(dataset_item, llm_output):
    return LevenshteinRatio().score(
        reference=dataset_item["answer"],
        output=llm_output
    )

prompt = ChatPrompt(
    system="You are a helpful assistant.",
    user="{question}",
    model="openai/gpt-4o-mini"
)

optimizer = EvolutionaryOptimizer(
    model="openai/gpt-4o-mini",
    population_size=5,
    num_generations=3
)

result = optimizer.optimize_prompt(
    prompt=prompt,
    dataset=dataset,
    metric=metric,
    n_samples=50
)

result.display()

Multi-prompt workflow

from typing import Any
from opik_optimizer import ChatPrompt, OptimizableAgent, HRPO
from opik.evaluation.metrics import LevenshteinRatio
from opik_optimizer.datasets import hotpot
from openai import OpenAI

class TwoStepAgent(OptimizableAgent):
    def __init__(self, model: str = "gpt-4o-mini"):
        super().__init__()
        self.model = model
        self.client = OpenAI()

    def invoke_agent(
        self,
        prompts: dict[str, ChatPrompt],
        dataset_item: dict[str, Any],
        allow_tool_use: bool = False,
        seed: int | None = None,
    ) -> str:
        # First step
        step1_prompt = prompts["step1"]
        step1_messages = step1_prompt.get_messages(dataset_item)
        step1_response = self.client.chat.completions.create(
            model=self.model,
            messages=step1_messages,
            seed=seed,
        )
        step1_result = step1_response.choices[0].message.content

        # Second step uses result from first step
        step2_prompt = prompts["step2"]
        step2_context = {**dataset_item, "step1_result": step1_result}
        step2_messages = step2_prompt.get_messages(step2_context)
        step2_response = self.client.chat.completions.create(
            model=self.model,
            messages=step2_messages,
            seed=seed,
        )

        return step2_response.choices[0].message.content

# Define multi-prompt workflow
prompts = {
    "step1": ChatPrompt(
        system="Analyze the question and identify key information.",
        user="{question}",
        model="gpt-4o-mini"
    ),
    "step2": ChatPrompt(
        system="Answer the question based on the analysis.",
        user="Question: {question}\n\nAnalysis: {step1_result}",
        model="gpt-4o-mini"
    ),
}

dataset = hotpot(count=300)

def metric(dataset_item, llm_output):
    return LevenshteinRatio().score(
        reference=dataset_item["answer"],
        output=llm_output
    )

optimizer = HRPO(
    model="openai/gpt-4o-mini",
    n_threads=2,
    max_parallel_batches=3
)

result = optimizer.optimize_prompt(
    prompt=prompts,
    agent_class=TwoStepAgent,
    dataset=dataset,
    metric=metric,
    max_trials=5,
    n_samples=50
)

result.display()

For advanced multi-prompt examples, see sdks/opik_optimizer/benchmarks/agents/hotpot_multihop_agent.py which implements a complex multi-hop retrieval pipeline with Wikipedia search.

Next steps