Overview
Prompt engineering is the practice of designing and refining prompts to help LLMs generate the desired output. Typically, prompt engineering is a manual process that involves editing the prompt, evaluating it, reviewing the results, and trying again.
Prompt optimization automates this process.
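To make the idea concrete, here is a minimal sketch of the edit, evaluate, and compare loop in automated form. It is not the Opik Optimizer API: the names `score_prompt`, `optimize_prompt`, and `toy_model` are hypothetical, the dataset is made up for illustration, and `toy_model` is a stand-in for a real LLM call. Real optimization algorithms usually also generate new candidate prompts, whereas this sketch only selects the best of a fixed list.

```python
from typing import Callable, Sequence

# Hypothetical type for illustration: (question, expected answer).
Example = tuple[str, str]


def score_prompt(
    prompt: str,
    dataset: Sequence[Example],
    model: Callable[[str, str], str],
) -> float:
    """Return the fraction of examples answered exactly right with this prompt."""
    correct = sum(
        model(prompt, question).strip() == expected
        for question, expected in dataset
    )
    return correct / len(dataset)


def optimize_prompt(
    candidates: Sequence[str],
    dataset: Sequence[Example],
    model: Callable[[str, str], str],
) -> tuple[str, float]:
    """Score every candidate prompt and return the best one with its score."""
    scored = [(score_prompt(p, dataset, model), p) for p in candidates]
    best_score, best_prompt = max(scored, key=lambda pair: pair[0])
    return best_prompt, best_score


def toy_model(prompt: str, question: str) -> str:
    """Stand-in for an LLM call (assumption): sums 'a+b' questions and
    replies with a bare number only when the prompt mentions 'number'."""
    a, b = question.split("+")
    total = int(a) + int(b)
    if "number" in prompt:
        return str(total)
    return f"The answer is {total}."


# Made-up evaluation data and candidate prompts.
dataset = [("1+2", "3"), ("10+5", "15")]
candidates = ["Answer the question.", "Reply with only the number."]

best_prompt, best_score = optimize_prompt(candidates, dataset, toy_model)
print(best_prompt, best_score)
```

With the toy model above, only the second candidate produces answers that match the expected strings exactly, so it is the one selected. Swapping in a real model call and a metric suited to the task keeps the same overall structure.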

Why optimize prompts?
Prompt engineering is a skill that can be difficult to master, as highlighted by the Anthropic, OpenAI, and Google prompt engineering guides. Many techniques can be used to help LLMs generate the desired output.
Prompt optimization solves many of the issues that come with prompt engineering:
- Prompt engineering alone is not easily repeatable or scalable.
- Variations across models can lead to performance degradation, so prompts need to be tuned for each model.
- Optimization may unlock performance, cost, and reliability improvements.
- As systems become more interdependent, manually tuning multiple prompts becomes increasingly difficult.
So when should you use prompt optimization?
Optimization algorithms
Supported algorithms
The Opik Optimizer is an experimental Python library that aims to implement prompt and agent optimization algorithms in a consistent format.
The following algorithms have been implemented:
If you would like us to implement another optimization algorithm, reach out to us on GitHub.
Benchmark results
We are still working on the benchmark results. The results shown here are early, preliminary, and subject to change.
Each optimization algorithm is evaluated against different use-cases and datasets:
- Arc: The ai2_arc dataset contains a set of multiple choice science questions
- GSM8K: The gsm8k dataset contains a set of math problems
- Medhallucination: The medhallucination dataset contains a set of medical questions
- RagBench: The ragbench dataset contains a set of questions and answers
The results above are for gpt-4o-mini; the results might change if you use a different model.
Note: These results are preliminary and subject to change. You can learn more about our benchmarks here.