Define datasets

The optimizer evaluates candidate prompts against datasets stored in Opik. If you are new to datasets in Opik, start with Manage datasets; this page covers optimizer-specific tips.

Datasets are central to the optimizer SDK: each optimization run scores candidate prompts against every dataset item, and those scores steer the search toward better prompts. Without a dataset, the optimizer has no signal for what counts as a good or bad output.

Dataset schema

Every item is a JSON object. Required keys depend on your prompt template; optional keys help with analysis. Schemas are optional—define only the fields your prompt or metrics actually consume.

Field                                     Purpose
Input fields (e.g., question, context)    Values substituted into your ChatPrompt placeholders.
answer / label                            Ground truth used by metrics.
metadata                                  Arbitrary dict for tagging scenario, split, or difficulty.
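
For example, a single question-answering item with the optional metadata field might look like this (the values shown are illustrative):

# "question" fills a {question} placeholder in the prompt, "answer" is the
# ground truth for metrics, and "metadata" is a free-form dict.
item = {
    "question": "What does Opik store in a dataset item?",
    "answer": "A JSON object whose keys match the prompt placeholders.",
    "metadata": {"split": "eval", "difficulty": "easy"},
}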

Create or load datasets

1. Create via SDK

import opik

client = opik.Opik()
dataset = client.get_or_create_dataset(name="agent-opt-support")
dataset.insert([
    {"question": "Summarize Opik.", "answer": "Opik is an LLM observability platform."},
    {"question": "List two optimizer types.", "answer": "MetaPrompt and Hierarchical Reflective."},
])

2. Upload from file

  • Prepare a CSV or Parquet file with column headers that match your prompt variables.
  • Load the file via Python (e.g., pandas) and call dataset.insert(...) or related helpers from the Dataset SDK, as shown in the sketch after this list.
  • Verify in the UI that rows include metadata if you plan to filter by scenario.
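
A minimal sketch of the CSV path, assuming a file named support_qa.csv with question and answer columns (both the filename and the columns are illustrative):

import opik
import pandas as pd

client = opik.Opik()
dataset = client.get_or_create_dataset(name="agent-opt-support")

# Column headers must match your prompt placeholders; use pd.read_parquet
# instead for Parquet files.
df = pd.read_csv("support_qa.csv")
dataset.insert(df.to_dict(orient="records"))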

3. Use built-in samples

The optimizer SDK provides ready-made datasets for quick experiments:

from opik_optimizer import datasets

hotpot = datasets.hotpot_300()
tiny = datasets.tiny_test()

These datasets live in sdks/opik_optimizer/src/opik_optimizer/datasets and mirror the notebook examples.

Best practices

  • Keep datasets immutable during an optimization run; create a new dataset version if you need to add rows.
  • Log context fields if you run RAG-style prompts so failure analyses can surface missing passages.
  • Track splits via metadata (e.g., metadata["split"] = "eval") because dataset tags are not supported yet.
  • Document ownership using dataset descriptions so teams know who curates each collection.
  • Keep schema and prompt in sync: if your prompt expects {context}, ensure every dataset row defines that key or provide defaults in the optimizer (see the sketch after this list).
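
A quick pre-run check is to assert that every row defines the keys your prompt consumes. This is a minimal sketch that assumes you list the placeholder names by hand:

# Placeholder names your prompt template consumes (illustrative).
required_keys = {"question", "context"}

# dataset.get_items() returns the dataset rows as dicts.
missing = [
    (i, sorted(required_keys - item.keys()))
    for i, item in enumerate(dataset.get_items())
    if not required_keys <= item.keys()
]
assert not missing, f"Rows missing prompt fields: {missing}"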

Validation checklist

  • Confirm row counts in the Opik Datasets tab (or by running len(dataset.get_items()) in Python) before and after uploads.
  • Spot-check rows in the dashboard’s Dataset viewer.
  • If rows include multimodal assets or tool payloads, confirm they appear in the trace tree once you run an optimization.
  • Run an initial small-batch optimization on a few rows to validate everything end to end (see the sketch after this list).
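
As an end-to-end smoke test, a sketch along these lines runs the MetaPrompt optimizer over a handful of samples; the metric follows the optimizer's (dataset_item, llm_output) convention, and the model name is illustrative:

import opik
from opik.evaluation.metrics import LevenshteinRatio
from opik_optimizer import ChatPrompt, MetaPromptOptimizer

# Reuse the dataset created earlier on this page.
dataset = opik.Opik().get_or_create_dataset(name="agent-opt-support")

def levenshtein_ratio(dataset_item, llm_output):
    # Score the model output against the row's ground-truth answer.
    return LevenshteinRatio().score(reference=dataset_item["answer"], output=llm_output)

prompt = ChatPrompt(
    messages=[
        {"role": "system", "content": "Answer concisely."},
        {"role": "user", "content": "{question}"},
    ]
)

optimizer = MetaPromptOptimizer(model="openai/gpt-4o-mini")  # model name is illustrative
result = optimizer.optimize_prompt(
    prompt=prompt,
    dataset=dataset,
    metric=levenshtein_ratio,
    n_samples=5,  # a few rows only: validate the wiring before a full run
)
result.display()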

Next steps

Define how you will score results with Define metrics, then follow Optimize prompts to launch experiments. For domain-specific scoring, extend the dataset with extra fields and reference them inside Custom metrics.
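
As a sketch of that pattern, a custom metric could read a hypothetical expected_keywords field from each row and score keyword coverage, using the same (dataset_item, llm_output) signature as above (the field name and scoring rule are illustrative):

def keyword_coverage(dataset_item, llm_output):
    # "expected_keywords" is a hypothetical extra field added to each row.
    keywords = dataset_item.get("expected_keywords", [])
    if not keywords:
        return 0.0
    hits = sum(kw.lower() in llm_output.lower() for kw in keywords)
    return hits / len(keywords)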