Datasets
The Opik TypeScript SDK lets you create and manage datasets. A dataset in Opik is a collection of data items that can be used for several purposes, including evaluation.
Dataset Fundamentals
A dataset in Opik is a named collection of data items. Each dataset:
- Has a unique identifier and name
- Contains items that share a common structure
- Supports automatic deduplication when items are inserted
- Can be used for evaluation
TypeScript Type Safety
One of the key features of the Opik SDK is strong TypeScript typing support for datasets. You can define custom types for your dataset items to ensure type safety throughout your application.
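As a minimal sketch of that idea, the block below defines a hypothetical `QAItem` type and a small `DatasetLike<T>` interface that mirrors only the `insert(items: T[])` signature listed in the API reference. Both names are illustrative and are not part of the SDK; the point is only that the compiler rejects items whose fields do not match the declared type.

```typescript
// Hypothetical item shape for a question-answering dataset.
type QAItem = {
  question: string;
  expectedAnswer: string;
  tags?: string[];
};

// Illustrative stand-in that mirrors the documented insert(items: T[]) signature.
// It is not the SDK's Dataset class.
interface DatasetLike<T> {
  insert(items: T[]): Promise<void>;
}

// Every element is checked against QAItem at compile time.
const qaItems: QAItem[] = [
  { question: "What is the capital of France?", expectedAnswer: "Paris" },
  { question: "What is 2 + 2?", expectedAnswer: "4", tags: ["math"] },
];

// Accepts any dataset whose item type is QAItem.
async function seedDataset(dataset: DatasetLike<QAItem>): Promise<void> {
  await dataset.insert(qaItems);
  // dataset.insert([{ prompt: "hi" }]); // would not compile: 'prompt' is not a QAItem field
}
```

Passing the item type as the generic parameter once means every later call on the same dataset is checked against it.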
Working with Datasets
Creating Datasets
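The client methods for this step (`createDataset`, `getDataset`, `getOrCreateDataset`) are listed in the API reference. The sketch below only illustrates the get-or-create behaviour described there, using a hypothetical in-memory registry instead of the SDK or a live Opik backend. `DatasetRecord`, `datasetRegistry`, and `getOrCreate` are made-up names for this illustration.

```typescript
// Hypothetical record standing in for a dataset's metadata.
type DatasetRecord = { name: string; description?: string };

// In-memory registry keyed by dataset name (assumption: names are unique).
const datasetRegistry = new Map<string, DatasetRecord>();

// Returns the existing record, or creates one if the name is unknown.
// The description is applied only when a new record is created.
function getOrCreate(name: string, description?: string): DatasetRecord {
  const existing = datasetRegistry.get(name);
  if (existing !== undefined) {
    return existing;
  }
  const created: DatasetRecord = { name, description };
  datasetRegistry.set(name, created);
  return created;
}

const firstCall = getOrCreate("qa-dataset", "Question-answer pairs");
const secondCall = getOrCreate("qa-dataset", "A different description");
console.log(firstCall === secondCall); // true
console.log(secondCall.description);   // Question-answer pairs
```

Because the second call finds the existing entry, its description argument is ignored, which matches the "used only if creating a new dataset" note in the reference.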
Managing Dataset Items
The Opik SDK automatically handles deduplication when inserting items into a dataset. This feature ensures that identical items are not added multiple times.
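The text does not say how identical items are detected. One plausible approach, shown here purely as an assumption, is to compare items by their content after serializing them with sorted keys. `stableStringify` and `dedupe` are hypothetical helpers, not SDK functions.

```typescript
// Serializes a value with object keys sorted, so key order does not affect equality.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    return "[" + value.map((entry) => stableStringify(entry)).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const record = value as Record<string, unknown>;
    const parts = Object.keys(record)
      .sort()
      .map((key) => JSON.stringify(key) + ":" + stableStringify(record[key]));
    return "{" + parts.join(",") + "}";
  }
  // JSON.stringify returns undefined for values such as undefined itself.
  const encoded: string | undefined = JSON.stringify(value);
  return encoded === undefined ? "null" : encoded;
}

// Returns only the incoming items whose content is not already present,
// either in the existing items or earlier in the same batch.
function dedupe<T>(existing: T[], incoming: T[]): T[] {
  const seen = new Set<string>(existing.map((item) => stableStringify(item)));
  const fresh: T[] = [];
  for (const item of incoming) {
    const key = stableStringify(item);
    if (!seen.has(key)) {
      seen.add(key);
      fresh.push(item);
    }
  }
  return fresh;
}

const alreadyStored = [{ question: "What is 2 + 2?", expectedAnswer: "4" }];
const incomingBatch = [
  { expectedAnswer: "4", question: "What is 2 + 2?" }, // same content, different key order
  { question: "What is 3 + 3?", expectedAnswer: "6" },
  { question: "What is 3 + 3?", expectedAnswer: "6" }, // repeated within the batch
];
const toInsert = dedupe(alreadyStored, incomingBatch);
console.log(toInsert.length); // 1
```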
Retrieving Dataset Items
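The `getItems` method in the API reference takes an optional `nbSamples` and an optional `lastRetrievedId` for pagination. A rough sketch of how those two arguments could be used to page through items is shown below. It runs against a plain array, assumes items are returned in a stable order and carry a string `id`, and uses the hypothetical helpers `pageItems` and `collectAll` rather than the SDK.

```typescript
// Hypothetical stored item; the documented item IDs are strings.
type StoredItem = { id: string; question: string };

// Returns up to nbSamples items that follow lastRetrievedId in storage order.
// With no nbSamples, every remaining item is returned.
function pageItems(
  all: StoredItem[],
  nbSamples?: number,
  lastRetrievedId?: string,
): StoredItem[] {
  let start = 0;
  if (lastRetrievedId !== undefined) {
    const index = all.findIndex((item) => item.id === lastRetrievedId);
    // Assumption for this sketch: an unknown ID restarts from the beginning.
    start = index === -1 ? 0 : index + 1;
  }
  const end = nbSamples === undefined ? all.length : start + nbSamples;
  return all.slice(start, end);
}

// Walks the whole collection in pages of pageSize items.
function collectAll(all: StoredItem[], pageSize: number): StoredItem[] {
  const collected: StoredItem[] = [];
  let lastId: string | undefined;
  while (true) {
    const page = pageItems(all, pageSize, lastId);
    if (page.length === 0) break;
    for (const item of page) collected.push(item);
    lastId = page[page.length - 1].id;
  }
  return collected;
}

const storedItems: StoredItem[] = [
  { id: "a", question: "Q1" },
  { id: "b", question: "Q2" },
  { id: "c", question: "Q3" },
];
console.log(pageItems(storedItems, 2).map((item) => item.id).join(","));      // a,b
console.log(pageItems(storedItems, 2, "b").map((item) => item.id).join(",")); // c
```

Passing the ID of the last item from the previous page as `lastRetrievedId` is what lets each call resume where the previous one stopped.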
Working with JSON
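The `insertFromJson` entry in the API reference describes two optional arguments: `keysMapping`, which maps JSON keys to dataset item field names, and `ignoreKeys`, which lists keys to drop. The sketch below shows one way that reshaping step could look. `itemsFromJson` is a hypothetical helper that only transforms data in memory; it does not call the SDK, and the SDK may handle edge cases differently.

```typescript
type JsonObject = Record<string, unknown>;

// Parses a JSON array string and reshapes each object:
// keys listed in ignoreKeys are dropped, keys found in keysMapping are renamed.
function itemsFromJson(
  jsonArray: string,
  keysMapping: Record<string, string> = {},
  ignoreKeys: string[] = [],
): JsonObject[] {
  const parsed: unknown = JSON.parse(jsonArray);
  if (!Array.isArray(parsed)) {
    throw new Error("Expected a JSON array");
  }
  const ignored = new Set<string>(ignoreKeys);
  return parsed.map((entry: unknown) => {
    if (entry === null || typeof entry !== "object" || Array.isArray(entry)) {
      throw new Error("Expected every array element to be a JSON object");
    }
    const source = entry as JsonObject;
    const item: JsonObject = {};
    for (const key of Object.keys(source)) {
      if (ignored.has(key)) continue;
      // Only own properties of keysMapping count as rename rules.
      const target = Object.prototype.hasOwnProperty.call(keysMapping, key)
        ? keysMapping[key]
        : key;
      item[target] = source[key];
    }
    return item;
  });
}

const rawJson =
  '[{"Question":"What is 2 + 2?","Expected Answer":"4","internalId":17}]';
const mappedItems = itemsFromJson(
  rawJson,
  { Question: "question", "Expected Answer": "expectedAnswer" },
  ["internalId"],
);
console.log(JSON.stringify(mappedItems));
// [{"question":"What is 2 + 2?","expectedAnswer":"4"}]
```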
API Reference
The generic type parameter T represents the DatasetItem type that defines the structure of items stored in this dataset.
OpikClient Dataset Methods
createDataset<T>
Creates a new dataset.
Arguments:
name: string
- The name of the dataset
description?: string
- Optional description of the dataset
Returns: Promise<Dataset<T>>
- A promise that resolves to the created Dataset object
getDataset<T>
Retrieves an existing dataset by name.
Arguments:
name: string
- The name of the dataset to retrieve
Returns: Promise<Dataset<T>>
- A promise that resolves to the Dataset object
getOrCreateDataset<T>
Retrieves an existing dataset by name or creates it if it doesn’t exist.
Arguments:
name: string
- The name of the dataset
description?: string
- Optional description (used only if creating a new dataset)
Returns: Promise<Dataset<T>>
- A promise that resolves to the existing or newly created Dataset object
getDatasets<T>
Retrieves a list of datasets.
Arguments:
maxResults?: number
- Optional maximum number of datasets to retrieve (default: 100)
Returns: Promise<Dataset<T>[]>
- A promise that resolves to an array of Dataset objects
deleteDataset
Deletes a dataset by name.
Arguments:
name: string
- The name of the dataset to delete
Returns: Promise<void>
Dataset Class Methods
insert
Inserts new items into the dataset with automatic deduplication.
Arguments:
items: T[]
- List of objects to add to the dataset
Returns: Promise<void>
update
Updates existing items in the dataset.
Arguments:
items: T[]
- List of objects to update in the dataset (must include IDs)
Returns: Promise<void>
delete
Deletes items from the dataset.
Arguments:
itemIds: string[]
- List of item IDs to delete
Returns: Promise<void>
clear
Deletes all items from the dataset.
Returns: Promise<void>
getItems
Retrieves items from the dataset.
Arguments:
nbSamples?: number
- Optional number of items to retrieve (if not set, all items are returned)
lastRetrievedId?: string
- Optional ID of the last retrieved item for pagination
Returns: Promise<T[]>
- A promise that resolves to an array of dataset items
insertFromJson
Inserts items from a JSON string into the dataset.
Arguments:
jsonArray: string
- JSON string in array format
keysMapping?: Record<string, string>
- Optional dictionary that maps JSON keys to dataset item field names
ignoreKeys?: string[]
- Optional array of keys to ignore when constructing dataset items
Returns: Promise<void>
toJson
Exports the dataset to a JSON string.
Arguments:
keysMapping?: Record<string, string>
- Optional dictionary that maps dataset item field names to output JSON keys
Returns: Promise<string>
- A JSON string representation of all items in the dataset
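To make the direction of `keysMapping` on export concrete, here is a small sketch that renames item fields to output JSON keys before serializing. `itemsToJson` is a hypothetical in-memory helper written for this illustration, not the SDK's `toJson` implementation.

```typescript
// Renames item fields according to keysMapping and serializes the result.
// Fields without a mapping entry keep their original name.
function itemsToJson(
  items: Record<string, unknown>[],
  keysMapping: Record<string, string> = {},
): string {
  const renamed = items.map((item) => {
    const output: Record<string, unknown> = {};
    for (const field of Object.keys(item)) {
      // Only own properties of keysMapping count as rename rules.
      const outKey = Object.prototype.hasOwnProperty.call(keysMapping, field)
        ? keysMapping[field]
        : field;
      output[outKey] = item[field];
    }
    return output;
  });
  return JSON.stringify(renamed);
}

const exportedJson = itemsToJson(
  [{ question: "What is 2 + 2?", expectedAnswer: "4" }],
  { expectedAnswer: "Expected Answer" },
);
console.log(exportedJson);
// [{"question":"What is 2 + 2?","Expected Answer":"4"}]
```

Note that the mapping here runs from field name to JSON key, the reverse of the direction used by `insertFromJson`.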