Datasets
The Opik TypeScript SDK lets you create and manage datasets. A dataset in Opik is a collection of data items that can be used for several purposes, including evaluation.
Dataset Fundamentals
A dataset in Opik is a named collection of data items. Each dataset:
- Has a unique identifier and name
- Contains items that share a common structure
- Deduplicates identical items automatically on insert
- Can be used for evaluation
TypeScript Type Safety
One of the key features of the Opik SDK is strong TypeScript typing support for datasets. You can define custom types for your dataset items to ensure type safety throughout your application:
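A minimal sketch of typed dataset items follows. The QAItem shape and the seedDataset helper are hypothetical. InsertableDataset is a local structural stand-in for the insert method listed in the API reference, so the snippet compiles without the SDK installed. With the real SDK you would pass a Dataset<QAItem> obtained from the client instead.

```typescript
// Hypothetical item shape for a question-answering dataset.
type QAItem = {
  question: string;
  expectedAnswer: string;
  tags?: string[];
};

// Local stand-in for the documented `insert(items: T[]): Promise<void>` method.
// Assumption: a real Dataset<QAItem> satisfies this interface structurally.
interface InsertableDataset<T> {
  insert(items: T[]): Promise<void>;
}

// Because the dataset is parameterized by QAItem, the compiler checks every
// item literal: a missing `expectedAnswer` or a numeric `question` would be
// rejected at compile time rather than discovered at run time.
async function seedDataset(dataset: InsertableDataset<QAItem>): Promise<void> {
  await dataset.insert([
    { question: "What is the capital of France?", expectedAnswer: "Paris" },
    { question: "What is 2 + 2?", expectedAnswer: "4", tags: ["math"] },
  ]);
}
```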
Working with Datasets
Creating Datasets
Managing Dataset Items
The Opik SDK automatically handles deduplication when inserting items into a dataset. This feature ensures that identical items are not added multiple times.
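To make "identical items" concrete, here is one way content-based deduplication can be sketched. This illustrates the concept only and is not a description of how the SDK implements it. The canonicalize and newItemsOnly helpers are hypothetical, and the sketch assumes items contain only JSON-serializable values.

```typescript
// Serialize a value with object keys sorted, so that two items with the same
// content but different key order produce the same string.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalize).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const record = value as Record<string, unknown>;
    const fields = Object.keys(record)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonicalize(record[key])}`);
    return `{${fields.join(",")}}`;
  }
  return JSON.stringify(value);
}

// Return only the incoming items whose content is not already present,
// also dropping duplicates within the incoming batch itself.
function newItemsOnly<T>(existing: T[], incoming: T[]): T[] {
  const seen = new Set(existing.map(canonicalize));
  const fresh: T[] = [];
  for (const item of incoming) {
    const key = canonicalize(item);
    if (!seen.has(key)) {
      seen.add(key);
      fresh.push(item);
    }
  }
  return fresh;
}
```

Under this definition, inserting the same batch twice leaves the dataset unchanged the second time.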
Retrieving Dataset Items
Working with JSON
Working with Dataset Versions
Dataset versions are immutable snapshots. Use DatasetVersion for reproducible evaluations: the same data is used regardless of later changes to the dataset.
Get a Specific Version
Check Current Version
Use in Evaluations
Pass a DatasetVersion to evaluate() for reproducible experiments:
When comparing experiments (A/B tests), use the same dataset version to isolate the effect of your changes from data variations.
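The idea of comparing variants on one pinned version can be sketched as below. The interfaces are local stand-ins for the getVersionView and getItems methods listed in the API reference. The compareOnVersion helper and its pass/fail scoring are hypothetical. The sketch does not call evaluate(), whose signature is not shown in this section.

```typescript
// Local stand-ins for the documented methods. Assumption: the SDK's Dataset
// and DatasetVersion classes satisfy these interfaces structurally.
interface VersionView<T> {
  getItems(nbSamples?: number): Promise<T[]>;
}
interface VersionedDataset<T> {
  getVersionView(versionName: string): Promise<VersionView<T>>;
}

// Hypothetical helper: computes a pass rate for each variant on the items of
// one pinned version, so every variant sees exactly the same data.
async function compareOnVersion<T>(
  dataset: VersionedDataset<T>,
  versionName: string,
  variants: Record<string, (item: T) => boolean>,
): Promise<Record<string, number>> {
  const view = await dataset.getVersionView(versionName);
  const items = await view.getItems(); // fetched once, shared by all variants
  const passRates: Record<string, number> = {};
  for (const [name, passes] of Object.entries(variants)) {
    const passed = items.filter(passes).length;
    passRates[name] = items.length === 0 ? 0 : passed / items.length;
  }
  return passRates;
}
```

Because both variants read from the same snapshot, any difference in pass rate comes from the variants and not from items added or edited between runs.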
API Reference
The generic type parameter T represents the DatasetItem type that defines the structure of items stored in this dataset.
OpikClient Dataset Methods
createDataset<T>
Creates a new dataset.
Arguments:
- name: string - The name of the dataset
- description?: string - Optional description of the dataset
Returns: Promise<Dataset<T>> - A promise that resolves to the created Dataset object
getDataset<T>
Retrieves an existing dataset by name.
Arguments:
- name: string - The name of the dataset to retrieve
Returns: Promise<Dataset<T>> - A promise that resolves to the Dataset object
getOrCreateDataset<T>
Retrieves an existing dataset by name or creates it if it doesn’t exist.
Arguments:
- name: string - The name of the dataset
- description?: string - Optional description (used only if creating a new dataset)
Returns: Promise<Dataset<T>> - A promise that resolves to the existing or newly created Dataset object
getDatasets<T>
Retrieves a list of datasets.
Arguments:
- maxResults?: number - Optional maximum number of datasets to retrieve (default: 100)
Returns: Promise<Dataset<T>[]> - A promise that resolves to an array of Dataset objects
deleteDataset
Deletes a dataset by name.
Arguments:
- name: string - The name of the dataset to delete
Returns: Promise<void>
Dataset Class Methods
insert
Inserts new items into the dataset with automatic deduplication.
Arguments:
- items: T[] - List of objects to add to the dataset
Returns: Promise<void>
update
Updates existing items in the dataset.
Arguments:
- items: T[] - List of objects to update in the dataset (must include IDs)
Returns: Promise<void>
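Since update requires items that carry their IDs, a read-modify-write pass is a natural pattern. The sketch below is hypothetical: UpdatableDataset is a local stand-in for the documented getItems and update methods, rewriteItems is an illustrative helper, and the ID is assumed to live in a field named id.

```typescript
// Local stand-in for the two documented Dataset methods used here.
interface UpdatableDataset<T> {
  getItems(nbSamples?: number, lastRetrievedId?: string): Promise<T[]>;
  update(items: T[]): Promise<void>;
}

// Fetch all items, apply a transform, and write them back.
// Assumption: each item exposes its ID as an `id` field.
async function rewriteItems<T extends { id: string }>(
  dataset: UpdatableDataset<T>,
  transform: (item: T) => T,
): Promise<number> {
  const items = await dataset.getItems();
  // Re-attach the original ID after transforming, so a transform that
  // rebuilds the object cannot drop or change it.
  const updated = items.map((item) => ({ ...transform(item), id: item.id }));
  await dataset.update(updated);
  return updated.length;
}
```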
delete
Deletes items from the dataset.
Arguments:
- itemIds: string[] - List of item IDs to delete
Returns: Promise<void>
clear
Deletes all items from the dataset.
Returns: Promise<void>
getItems
Retrieves items from the dataset.
Arguments:
- nbSamples?: number - Optional number of items to retrieve (if not set, all items are returned)
- lastRetrievedId?: string - Optional ID of the last retrieved item for pagination
Returns: Promise<T[]> - A promise that resolves to an array of dataset items
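The two getItems arguments can be combined into a paging loop, sketched below. PagedDataset is a local stand-in for the documented method and forEachPage is a hypothetical helper. The sketch rests on three assumptions: items expose their ID as an id field, lastRetrievedId makes the next call return the items that follow that ID, and a page shorter than the requested size means the end of the dataset.

```typescript
// Local stand-in for the documented getItems signature.
interface PagedDataset<T> {
  getItems(nbSamples?: number, lastRetrievedId?: string): Promise<T[]>;
}

// Walk the dataset one page at a time instead of loading every item at once.
async function forEachPage<T extends { id: string }>(
  dataset: PagedDataset<T>,
  pageSize: number,
  handlePage: (page: T[]) => void | Promise<void>,
): Promise<void> {
  let lastId: string | undefined;
  while (true) {
    const page: T[] = await dataset.getItems(pageSize, lastId);
    if (page.length > 0) {
      await handlePage(page);
    }
    const last = page[page.length - 1];
    // Stop on an empty page or a short page (assumed to be the final one).
    if (last === undefined || page.length < pageSize) {
      break;
    }
    lastId = last.id; // the next request resumes after this item
  }
}
```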
insertFromJson
Inserts items from a JSON string into the dataset.
Arguments:
- jsonArray: string - JSON string in array format
- keysMapping?: Record<string, string> - Optional dictionary that maps JSON keys to dataset item field names
- ignoreKeys?: string[] - Optional array of keys to ignore when constructing dataset items
Returns: Promise<void>
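The effect of keysMapping and ignoreKeys can be illustrated with a standalone function. This sketch shows the kind of transformation described above and is not the SDK's implementation. The parseItems helper is hypothetical, and it assumes ignoreKeys refers to the original JSON keys (before renaming).

```typescript
type JsonObject = Record<string, unknown>;

// Parse a JSON array of objects, rename keys according to keysMapping, and
// drop any key listed in ignoreKeys.
function parseItems(
  jsonArray: string,
  keysMapping: Record<string, string> = {},
  ignoreKeys: string[] = [],
): JsonObject[] {
  const parsed: unknown = JSON.parse(jsonArray);
  if (!Array.isArray(parsed)) {
    throw new Error("Expected a JSON array");
  }
  const ignored = new Set(ignoreKeys);
  return parsed.map((entry: unknown) => {
    if (entry === null || typeof entry !== "object" || Array.isArray(entry)) {
      throw new Error("Expected each array element to be a JSON object");
    }
    const item: JsonObject = {};
    for (const [key, value] of Object.entries(entry as JsonObject)) {
      if (ignored.has(key)) continue; // assumption: matched against original keys
      item[keysMapping[key] ?? key] = value; // unmapped keys keep their name
    }
    return item;
  });
}

// Example: rename "Question"/"Answer" and drop a bookkeeping field.
const exampleItems = parseItems(
  '[{"Question": "Q1", "Answer": "A1", "debug": true}]',
  { Question: "input", Answer: "expected_output" },
  ["debug"],
);
```

Here exampleItems holds a single object with the fields input and expected_output. The field names are illustrative only.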
toJson
Exports the dataset to a JSON string.
Arguments:
- keysMapping?: Record<string, string> - Optional dictionary that maps dataset item field names to output JSON keys
Returns: Promise<string> - A JSON string representation of all items in the dataset
Dataset Version Methods
getVersionView
Get a read-only view of a specific dataset version.
Arguments:
- versionName: string - The version name (e.g., “v1”, “v2”)
Returns: Promise<DatasetVersion<T>>
Throws: DatasetVersionNotFoundError if the version doesn’t exist
getCurrentVersionName
Get the name of the latest version.
Returns: Promise<string | undefined> - The version name, or undefined if the dataset has no versions
getVersionInfo
Get metadata about the latest version.
Returns: Promise<DatasetVersionPublic | undefined> - The version info, or undefined if none is available
DatasetVersion Class
A read-only view of dataset items at a specific version. The data cannot be modified through this view.
Properties
getItems
Retrieve items from this version.
Arguments:
- nbSamples?: number - Number of items to retrieve (default: all)
Returns: Promise<T[]> - Array of dataset items
toJson
Exports the items of this version to a JSON string.
Arguments:
- keysMapping?: Record<string, string> - Map field names to output keys
Returns: Promise<string> - JSON string
getVersionInfo
Get the full version metadata object.
Returns: DatasetVersionPublic - Version info