Optimize prompts

Use this playbook whenever you need to improve a prompt (single-turn or agentic) and want a repeatable process rather than manual tweaks.

1. Establish baselines

  • Record the current prompt and score using your production metric.
  • Log at least 10 representative dataset rows so the optimizer can generalize.
  • Capture latency and token costs; an optimized prompt should not regress them unexpectedly.
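
One way to capture the baseline described above is sketched below. This is an illustration rather than part of the Opik SDK: record_baseline, fake_run_prompt, and exact_match are hypothetical names, the (dataset_item, llm_output) metric signature is an assumption, and the model call is replaced by a stand-in so the sketch runs offline.

```python
import statistics
import time
from typing import Any, Callable


def record_baseline(
    rows: list[dict[str, Any]],
    run_prompt: Callable[[dict[str, Any]], tuple[str, int]],
    metric: Callable[[dict[str, Any], str], float],
) -> dict[str, float]:
    """Score every row once and summarize quality, latency, and token usage."""
    scores, latencies, tokens = [], [], []
    for row in rows:
        start = time.perf_counter()
        # Hypothetical contract: run_prompt returns (output text, tokens used).
        output, tokens_used = run_prompt(row)
        latencies.append(time.perf_counter() - start)
        scores.append(metric(row, output))
        tokens.append(tokens_used)
    return {
        "rows": len(rows),
        "mean_score": statistics.mean(scores),
        "mean_latency_s": statistics.mean(latencies),
        "total_tokens": sum(tokens),
    }


# Stand-ins so the sketch runs without calling a model.
def fake_run_prompt(row):
    return row["expected"], len(row["question"].split())


def exact_match(dataset_item, llm_output):
    return 1.0 if llm_output.strip() == dataset_item["expected"] else 0.0


# Ten rows, matching the minimum suggested above.
rows = [{"question": f"What is {i} + {i}?", "expected": str(2 * i)} for i in range(10)]
baseline = record_baseline(rows, fake_run_prompt, exact_match)
print(baseline)
```

Storing this summary next to the current prompt gives you a fixed reference point for comparing optimized candidates on score, latency, and cost.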

2. Choose an optimizer

| Scenario | Recommended optimizer |
| --- | --- |
| General prompt copy edits | MetaPrompt |
| Complex failure analysis | HRPO |
| Need diverse candidates | Evolutionary |
| Few-shot heavy prompts | Few-Shot Bayesian |
| Tune sampling params | Parameter optimizer |

3. Configure the run

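```python
from opik_optimizer import HRPO

# my_prompt, my_dataset, and answer_quality are defined elsewhere.
optimizer = HRPO(
    model="openai/gpt-4o",
    max_parallel_batches=4,
    seed=42,  # fixed seed so runs are repeatable
)
result = optimizer.optimize_prompt(
    prompt=my_prompt,
    dataset=my_dataset,
    metric=answer_quality,
    max_trials=5,  # small trial budget for a first run
    n_samples=50,  # evaluate on a subset to limit cost
)
```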
  • Set project_name on the optimizer to group runs by team or initiative.
  • Start with max_trials = 3–5. Increase once you confirm the metric is reliable.
  • Use n_samples to limit cost during early exploration; rerun on the full dataset before promoting a prompt.
  • For optimizers with inner-loop evaluations (HRPO, GEPA), set n_samples_minibatch to keep those steps lightweight.
  • Use n_samples_strategy to keep subsampling deterministic (default: "random_sorted").
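
To see why deterministic subsampling matters, consider the minimal sketch below. It is not the optimizer's implementation of n_samples_strategy, and the subsample helper is hypothetical. It only illustrates the general idea: sort the item IDs first, then sample with a fixed seed, so that two runs evaluate the same rows and their scores stay comparable.

```python
import random


def subsample(item_ids: list[str], n_samples: int, seed: int = 42) -> list[str]:
    """Return the same n_samples IDs for the same set of inputs and seed."""
    ordered = sorted(item_ids)  # remove any dependence on input order
    if n_samples >= len(ordered):
        return ordered
    rng = random.Random(seed)  # local RNG; the global random state is untouched
    return sorted(rng.sample(ordered, n_samples))


ids = [f"row-{i:03d}" for i in range(200)]
first = subsample(ids, n_samples=50)
second = subsample(list(reversed(ids)), n_samples=50)

# Same rows are selected even though the input order differs.
print(first == second)
```

If the subset changed between runs, a score difference could come from the rows rather than from the prompt.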

Optimize tools (MCP)

Tool optimization is now documented separately. Use it when you want to improve MCP tool descriptions without changing prompt text.

  • Optimize tools (MCP)

Target specific sections inside a prompt (advanced)

If you need finer control than roles (for example, only optimize a specific assistant message), use prompt_segments to extract and update parts by segment ID.

Intent/Trigger: use segment-level updates when you need to constrain changes to exact message segments.

  • Required parameters: prompt, dataset, metric
  • Optional parameters: the segment updates (the updates mapping passed to prompt_segments.apply_segment_updates)
  • Minimal valid payload: optimizer.optimize_prompt(prompt=updated_prompt, dataset=my_dataset, metric=answer_quality)
```python
from opik_optimizer.utils import prompt_segments

segments = prompt_segments.extract_prompt_segments(my_prompt)
for segment in segments:
    print(segment.segment_id, segment.role)

# Update only message:1 (second message)
updates = {"message:1": "User question: {user_query}"}
updated_prompt = prompt_segments.apply_segment_updates(my_prompt, updates)

# Use the updated prompt in optimization (the original prompt is unchanged)
result = optimizer.optimize_prompt(
    prompt=updated_prompt,
    dataset=my_dataset,
    metric=answer_quality,
)
```
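
The sketch below is a conceptual model of segment-level updates on a plain list of message dicts. It does not reproduce the prompt_segments module: list_segments and apply_updates are hypothetical helpers, and the zero-based "message:<index>" ID pattern is inferred from the example above. It shows the two properties that matter here: only the targeted segment changes, and the original prompt is left intact.

```python
import copy


def list_segments(messages):
    """Return (segment_id, role, content) tuples using zero-based message IDs."""
    return [(f"message:{i}", m["role"], m["content"]) for i, m in enumerate(messages)]


def apply_updates(messages, updates):
    """Return a new message list with only the targeted segments replaced."""
    updated = copy.deepcopy(messages)  # never mutate the caller's prompt
    for segment_id, content in updates.items():
        kind, _, index = segment_id.partition(":")
        if kind != "message" or not index.isdigit() or int(index) >= len(updated):
            raise KeyError(f"unknown segment: {segment_id}")
        updated[int(index)]["content"] = content
    return updated


messages = [
    {"role": "system", "content": "Answer concisely."},
    {"role": "user", "content": "{question}"},
]
new_messages = apply_updates(messages, {"message:1": "User question: {user_query}"})

for segment_id, role, content in list_segments(new_messages):
    print(segment_id, role, content)
```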

Optimize multiple prompts together

You can pass a dict of ChatPrompt objects to optimize a coordinated prompt bundle (for example, a multi-agent setup or system/user prompt pair that must stay in sync). Each key names a prompt and is preserved through optimization.

```python
from opik_optimizer import MetaPromptOptimizer, ChatPrompt

prompts = {
    "researcher": ChatPrompt(
        name="researcher",
        messages=[
            {"role": "system", "content": "Gather facts and cite sources."},
            {"role": "user", "content": "{question}"},
        ],
    ),
    "synthesizer": ChatPrompt(
        name="synthesizer",
        messages=[
            {"role": "system", "content": "Summarize findings clearly."},
            {"role": "user", "content": "{question}"},
        ],
    ),
}

optimizer = MetaPromptOptimizer(model="openai/gpt-4o-mini", prompts_per_round=2)
result = optimizer.optimize_prompt(
    prompt=prompts,
    dataset=my_dataset,
    metric=answer_quality,
    max_trials=3,
)
```

result.prompt returns a dict keyed by the same names so you can update each agent prompt together.

4. Evaluate outcomes

  • Compare result.score vs. result.initial_score to ensure material improvement.
  • Review the history attribute on the result to understand why individual trials regressed.
  • Use Dashboard results to visualize per-trial performance.
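
What counts as "material" improvement depends on your metric and its noise. The sketch below shows one possible check, assuming you have already read the two scores from the result. The improvement_summary helper and the min_gain threshold of 0.02 are hypothetical and should be tuned to the variance you observe between repeated runs.

```python
def improvement_summary(initial_score, final_score, min_gain=0.02):
    """Compare two scores; min_gain is a hypothetical threshold, not a library default."""
    absolute = final_score - initial_score
    # Relative gain is undefined when the starting score is zero.
    relative = absolute / initial_score if initial_score > 0 else None
    return {
        "absolute_gain": round(absolute, 4),
        "relative_gain": None if relative is None else round(relative, 4),
        "material": absolute >= min_gain,
    }


summary = improvement_summary(initial_score=0.71, final_score=0.78)
print(summary)
```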

5. Ship safely

  1. Export the prompt. result.prompt returns the best-performing ChatPrompt. Serialize it as JSON and check it into your repo.
  2. Automate regression tests. Wire the optimizer run into CI with a smaller dataset so future prompt edits have guardrails.
  3. Monitor in production. Trace the new prompt with Opik tracing to confirm real-world performance matches experiment results.
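
The export and CI steps could look roughly like the sketch below. It assumes you already have the optimized prompt's messages as a list of role/content dicts; how you obtain them from a ChatPrompt is not shown on this page. export_messages, load_messages, and regression_gate are hypothetical helpers, and the tolerance is an illustrative value.

```python
import json
import tempfile
from pathlib import Path


def export_messages(messages, path):
    """Write messages as stable, diff-friendly JSON for version control."""
    text = json.dumps(messages, indent=2, ensure_ascii=False) + "\n"
    Path(path).write_text(text, encoding="utf-8")


def load_messages(path):
    """Read messages back from the checked-in JSON file."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def regression_gate(current_score, baseline_score, tolerance=0.01):
    """Return True when the current score is no worse than baseline minus tolerance."""
    return current_score >= baseline_score - tolerance


messages = [
    {"role": "system", "content": "Summarize findings clearly."},
    {"role": "user", "content": "{question}"},
]

with tempfile.TemporaryDirectory() as tmp:
    prompt_path = Path(tmp) / "synthesizer.prompt.json"
    export_messages(messages, prompt_path)
    restored = load_messages(prompt_path)

print(restored == messages)
print(regression_gate(current_score=0.70, baseline_score=0.80))
```

In CI, a failing gate would typically exit with a non-zero status so the prompt change is blocked until someone reviews it.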

Related guides

  • Optimization Studio
  • Define datasets
  • Define metrics
  • Chaining optimizers
  • Avoiding overfitting – Prevent your prompt from memorizing the training data by using separate validation datasets