April 24, 2025
At Comet, we’re driven by a commitment to advancing innovation in AI, particularly in the realm of LLM observability. Our journey with Opik began with providing tools to make LLM applications more transparent, measurable, and accountable. As one of the fastest-growing open-source solutions in this space, our team has been deeply researching “what’s next.” We’ve seen compelling early work in prompt and agent optimization, and we believe Opik is uniquely positioned to pioneer this frontier. Our thesis is simple: optimizing your prompts and agents is the logical next step after understanding their behavior through traces and evaluations.
Today, we’re excited to translate this vision into reality with the public beta of Opik Agent Optimizer, a new suite of tools designed to automate and elevate your prompt and agent optimization workflows.
The power of Large Language Models is undeniable, but unlocking their full potential often hinges on the art and science of prompt engineering. As many development teams know, this can be a painstaking, manual process: a constant cycle of tweaking prompts, running experiments, reviewing results, and repeating. This “manual grind” consumes valuable development time, something we’ve heard repeatedly from our customers and experienced ourselves when working with LLMs.
Furthermore, the LLM landscape is evolving at breakneck speed. New models are released daily, and what worked yesterday might not work tomorrow. Re-engineering prompts for each new model or even minor version updates is a significant hurdle, making scalability and adaptability a constant concern. This is why we asked ourselves: “Does prompt engineering have to be this manual?” Our answer is Opik Agent Optimizer.
We believe the current paradigm for LLM optimization needs a shift. Existing libraries often tie optimization to specific orchestration frameworks, forcing developers to rebuild their applications from scratch. This is a fundamental flaw. The vast majority of developers already have a preferred framework or choose to work without one; they shouldn’t be penalized for it.
Opik Agent Optimizer is built on a different philosophy: prompt in -> prompt out. It works with whatever you’ve already built, from a simple prompt-in/prompt-out function to a complex agent. No framework lock-in.
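To make that contract concrete, here is a purely conceptual sketch; it is not the SDK’s API (a real example with FewShotBayesianOptimizer follows below), just an illustration that the optimizer takes your current prompt plus a scoring function and hands back an improved prompt, leaving the rest of your application untouched.

# Conceptual illustration only -- not the Opik Agent Optimizer API.
# The contract: a prompt string and a metric go in, a better prompt string comes out.
def optimize(prompt: str, score) -> str:
    # Toy candidate generation; a real optimizer searches far more broadly
    # (few-shot examples, meta-prompting, Bayesian search, etc.).
    candidates = [
        prompt,
        prompt + "\nAnswer concisely.",
        prompt + "\nThink step by step.",
    ]
    return max(candidates, key=score)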
We’re launching this beta with a suite of optimizers, including MIPRO (a Meta-Prompt-based agent Optimizer that works with and without tools), which we chose as a starting point due to its readily available library implementation, allowing for quick adaptation. Alongside MIPRO, our research team has been developing three novel optimization algorithms (papers pending) that explore new approaches to enhancing prompt and agent performance.
Although this is an executable Python SDK package, all traces and experiments logged as part of your optimization are sent to Opik: the optimization process, each iteration (Trial), the specific improvements and prompt changes, and various run details can all be seen in the Opik UI today.
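As a rough setup sketch (assuming the standard package names and the usual Opik configuration flow; adapt to your environment or a self-hosted deployment), connecting the SDK to your workspace looks something like this:

# Assumption: the optimizer ships as the opik-optimizer package alongside the
# core opik SDK, e.g. `pip install opik opik-optimizer`.
import opik

# One-time setup: point the SDK at your Opik workspace (or local deployment)
# so optimization runs, Trials, and prompt changes appear in the UI.
# You can equivalently run `opik configure` once from the command line.
opik.configure()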
Here’s a brief look at how you might use the SDK’s FewShotBayesianOptimizer, one of the many optimizer algorithms we have available:
from opik.evaluation.metrics import LevenshteinRatio
from opik_optimizer import FewShotBayesianOptimizer
from opik_optimizer.demo import get_or_create_dataset
from opik_optimizer import (
    MetricConfig,
    TaskConfig,
    from_dataset_field,
    from_llm_response_text,
)

# Get or create a test dataset
hot_pot_dataset = get_or_create_dataset("hotpot-300")
project_name = "optimize-few-shot-bayesian-hotpot"

# Define the initial prompt to optimize, intentionally vague
prompt_instruction = """
Answer the question.
"""

# Initialize the optimizer
optimizer = FewShotBayesianOptimizer(
    model="gpt-4o-mini",
    project_name=project_name,
    min_examples=3,
    max_examples=8,
    n_threads=16,
    seed=42,  # reproducibility
)

# Configure the metric used for evaluation
metric_config = MetricConfig(
    metric=LevenshteinRatio(project_name=project_name),
    inputs={
        "output": from_llm_response_text(),
        "reference": from_dataset_field(name="answer"),
    },
)

# Configure the task details, i.e. the dataset mapping
task_config = TaskConfig(
    instruction_prompt=prompt_instruction,
    input_dataset_fields=["question"],
    output_dataset_field="answer",
    use_chat_prompt=True,
)

# Run the optimization
result = optimizer.optimize_prompt(
    dataset=hot_pot_dataset,
    metric_config=metric_config,
    task_config=task_config,
    n_trials=10,
    n_samples=150,
)

# Display the results, including the best prompt found
result.display()
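In this run, the optimizer searches over few-shot example selections (between 3 and 8 examples per candidate prompt) across 10 Bayesian trials, scoring each candidate with LevenshteinRatio against the dataset’s answer field on up to 150 samples. Each trial is logged to the optimize-few-shot-bayesian-hotpot project, so you can inspect the prompt changes in the Opik UI, and result.display() prints the results, including the best prompt found.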
Opik Agent Optimizer is designed to seamlessly integrate into and elevate your LLM development lifecycle, transforming it into a more agile, data-driven, and efficient process. We see it as a pivotal component in how you evolve from classic prompt engineering to evaluation-in-the-loop thinking, guiding you from initial concept to continuously optimized production systems.
This public beta is just the beginning, and we’re already charting an ambitious course for Opik Agent Optimizer. Our roadmap is focused on expanding its capabilities, intelligence, and integration within the broader LLM development ecosystem, including refining the SDK’s core configuration (the TaskConfig and MetricConfig objects) to be even more intuitive and powerful.

We welcome suggestions, pull requests, and feedback on any of the preview items and the future roadmap for Opik Agent Optimizer.
Opik Agent Optimizer is here to empower you to build better, more efficient, and more adaptable LLM applications. We’re building this in the open and value your feedback immensely.