Prompt Playground

The Playground lets you test prompt changes and compare models without writing code. Create multiple prompt variants, run them side by side, and validate the results against a test suite — all from the Opik UI.

[Figure: Playground with two prompt variants running against a test suite, showing pass/fail results]

Compare prompt variants side by side

Each variant in the Playground is independent — it has its own model, messages, and configuration. This means you can test a prompt change against the current version, try different models on the same prompt, or experiment with temperature and sampling parameters, all in a single view.

Supported providers include OpenAI, Anthropic, Gemini, OpenRouter, Vertex AI, and custom endpoints. Reasoning models like Claude and o1/o3 expose additional controls such as thinking effort.

Click Run (or press Shift+Enter) to execute all variants at once. Results stream in real time with the model’s response, token usage, latency, and a link to the full trace.
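
As a rough mental model of variant independence, each variant can be pictured as its own immutable record of model, messages, and configuration. The sketch below is illustrative only: `PromptVariant`, its fields, and the model names are hypothetical and are not part of the Opik SDK or UI.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PromptVariant:
    """Hypothetical stand-in for one Playground variant (not an Opik class)."""
    model: str
    messages: tuple  # ((role, content), ...)
    temperature: float = 1.0


baseline = PromptVariant(
    model="model-a",  # placeholder model name
    messages=(
        ("system", "You are a concise assistant."),
        ("user", "Summarize: {{text}}"),
    ),
)

# Same prompt, different model.
other_model = replace(baseline, model="model-b")

# Same model and prompt, lower temperature.
low_temp = replace(baseline, temperature=0.2)

# Each variant is a separate value; deriving one never changes another.
variants = [baseline, other_model, low_temp]
```

Deriving each variant from a baseline mirrors the usual workflow: change one thing at a time so that any difference in output can be attributed to that change.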

Validate against test suites

The Playground is most useful when you run your prompt variants against a dataset or test suite. Instead of manually checking a handful of inputs, you can validate every variant across your full set of test cases and see which one performs best.

1. Bind a dataset or test suite

Click Test on Dataset in the header and select a dataset or test suite. If you’re using template variables ({{variable_name}}), they are automatically mapped to dataset columns.

2. Run the experiment

Click Run experiment to execute all prompt variants against every item in the dataset. Results appear in a table below the prompts, with each variant’s output shown side by side.

3. Review results

When using a test suite, each output is scored against the suite’s evaluation rules and displayed as pass/fail. You can click into any result to inspect the full trace.

Experiments are saved automatically — compare them over time in the Experiments tab.
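
Conceptually, the three steps above amount to a grid: every variant is rendered against every dataset item, and each output is checked against a rule. The sketch below illustrates that shape in plain Python. It is not how Opik implements experiments; `render`, `run_experiment`, `fake_model`, and `exact_match` are hypothetical names, the model call is a stub, and the exact-match rule is only one possible evaluation rule.

```python
import re

# Assumed placeholder syntax: {{name}} with word characters only.
VARIABLE = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template, row):
    """Fill each {{name}} from the dataset column with the same name."""
    return VARIABLE.sub(lambda m: str(row[m.group(1)]), template)


def run_experiment(variants, dataset, call_model, passes):
    """Run every variant on every item; return one result row per item."""
    table = []
    for row in dataset:
        result = {"item": row}
        for name, template in variants.items():
            output = call_model(render(template, row))
            result[name] = {"output": output, "passed": passes(row, output)}
        table.append(result)
    return table


def fake_model(prompt):
    """Stub standing in for a real provider call."""
    answers = {"2 + 2": "4", "3 * 3": "9"}
    for question, answer in answers.items():
        if question in prompt:
            return answer if "number only" in prompt else f"The result is {answer}."
    return ""


def exact_match(row, output):
    """Hypothetical evaluation rule: output must equal the expected column."""
    return output.strip() == row["expected"]


dataset = [
    {"question": "2 + 2", "expected": "4"},
    {"question": "3 * 3", "expected": "9"},
]
variants = {
    "A": "Answer with a number only: {{question}}",
    "B": "Compute {{question}} and explain.",
}

table = run_experiment(variants, dataset, fake_model, exact_match)
pass_counts = {name: sum(r[name]["passed"] for r in table) for name in variants}
```

With this stub, variant A passes both items and variant B fails both, because B's longer answers do not satisfy the exact-match rule. That is the kind of difference the side-by-side results table is meant to surface.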

Template variables

Use {{variable_name}} syntax in your prompt messages to create dynamic templates. When running in standard mode, the Playground asks you to fill in the values. In dataset mode, variables are mapped to dataset columns automatically.
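
To make the two modes concrete, here is a minimal sketch of how `{{variable_name}}` placeholders could be discovered and filled. It assumes variable names consist of word characters and tolerates spaces inside the braces; the Playground's actual parsing rules may differ, and `find_variables` and `fill` are hypothetical helpers, not Opik functions.

```python
import re

# Assumed placeholder syntax: {{name}} with word characters only.
VARIABLE = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def find_variables(template):
    """Return placeholder names in order of first appearance, deduplicated."""
    return list(dict.fromkeys(VARIABLE.findall(template)))


def fill(template, values):
    """Replace each {{name}} with values[name]; KeyError if one is missing."""
    return VARIABLE.sub(lambda m: str(values[m.group(1)]), template)


template = "Translate {{text}} into {{language}}. Reply with {{text}} if unsure."

# Standard mode: these are the names a user would be asked to fill in.
names = find_variables(template)
manual = fill(template, {"text": "bonjour", "language": "English"})

# Dataset mode: the same names are looked up as columns of each row.
row = {"text": "hola", "language": "German", "id": 7}
unmapped = [name for name in names if name not in row]
from_row = fill(template, row)
```

Checking `unmapped` before a run is a cheap way to catch a placeholder whose name does not match any dataset column.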

Next steps

  • Prompt Library — Manage prompts and the rest of your agent configuration in one place
  • Test suites — Build the test cases your playground experiments run against
  • Experiments — Review and compare experiment results over time