Datasets Client
The Datasets client provides methods for managing datasets in the Opik platform.
- class opik.rest_api.datasets.client.DatasetsClient(*, client_wrapper: SyncClientWrapper)
Bases: object
- find_datasets(*, page: int | None = None, size: int | None = None, with_experiments_only: bool | None = None, with_optimizations_only: bool | None = None, prompt_id: str | None = None, name: str | None = None, sorting: str | None = None, filters: str | None = None, request_options: RequestOptions | None = None) → DatasetPagePublic
Find datasets
- Parameters:
page (Optional[int])
size (Optional[int])
with_experiments_only (Optional[bool])
with_optimizations_only (Optional[bool])
prompt_id (Optional[str])
name (Optional[str])
sorting (Optional[str])
filters (Optional[str])
request_options (Optional[RequestOptions]) – Request-specific configuration.
- Returns:
A page of dataset resources
- Return type:
DatasetPagePublic
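Because find_datasets returns one page at a time, collecting every match means walking the pages. The helper below, iter_datasets, is a hypothetical sketch and not part of the Opik SDK. It assumes pages are numbered from 1 and that the returned page object exposes content and total attributes; extra keyword arguments such as name or with_experiments_only are forwarded unchanged.

```python
from typing import Any, Iterator


def iter_datasets(datasets_client: Any, page_size: int = 100, **query: Any) -> Iterator[Any]:
    """Yield every dataset matching `query` by calling find_datasets page by page.

    `datasets_client` is expected to be `opik.Opik().rest_client.datasets`.
    Assumption: pages are 1-based, and the page object has `content`
    (a list, possibly None) and `total` (number of matches, possibly None).
    """
    page = 1
    seen = 0
    while True:
        result = datasets_client.find_datasets(page=page, size=page_size, **query)
        content = result.content or []
        yield from content
        seen += len(content)
        # Stop on a short or empty page, or once every reported match was returned.
        if len(content) < page_size or (result.total is not None and seen >= result.total):
            break
        page += 1
```

For example, `for ds in iter_datasets(opik.Opik().rest_client.datasets, with_experiments_only=True): print(ds.name)` would print the name of each matching dataset, assuming DatasetPublic exposes a name attribute.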
- create_dataset(*, name: str, id: str | None = OMIT, visibility: Literal['private', 'public'] | Any | None = OMIT, tags: Sequence[str] | None = OMIT, description: str | None = OMIT, request_options: RequestOptions | None = None) → None
Create dataset
- Parameters:
name (str)
id (Optional[str])
visibility (Optional[DatasetWriteVisibility])
tags (Optional[Sequence[str]])
description (Optional[str])
request_options (Optional[RequestOptions]) – Request-specific configuration.
- Return type:
None
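create_dataset returns None, so a caller that needs the resulting resource (for example its id) has to read it back. The hypothetical get_or_create_dataset helper below looks the dataset up by name with get_dataset_by_identifier and creates it only when that lookup fails with a 404. It assumes a missing dataset raises an exception carrying a status_code attribute equal to 404, as Fern-generated error classes such as ApiError do; check this against your client version.

```python
from typing import Any, Dict, Optional


def get_or_create_dataset(datasets_client: Any, name: str, description: Optional[str] = None) -> Any:
    """Return the dataset called `name`, creating it first if it does not exist.

    Assumption: a missing dataset raises an error whose `status_code` is 404.
    """
    try:
        return datasets_client.get_dataset_by_identifier(dataset_name=name)
    except Exception as error:
        if getattr(error, "status_code", None) != 404:
            raise  # not a "dataset missing" error, so let it propagate
    kwargs: Dict[str, Any] = {"name": name}
    if description is not None:
        kwargs["description"] = description  # only send fields that were set
    datasets_client.create_dataset(**kwargs)
    # create_dataset returns None, so read the new dataset back by name.
    return datasets_client.get_dataset_by_identifier(dataset_name=name)
```

The lookup and the creation are two separate requests, so two concurrent callers can both reach create_dataset; this sketch does not guard against that race.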
- create_or_update_dataset_items(*, items: Sequence[DatasetItemWrite], dataset_name: str | None = OMIT, dataset_id: str | None = OMIT, request_options: RequestOptions | None = None) → None
Create or update dataset items, matching existing items by dataset item id
- Parameters:
items (Sequence[DatasetItemWrite])
dataset_name (Optional[str]) – Required if dataset_id is not provided
dataset_id (Optional[str]) – Required if dataset_name is not provided
request_options (Optional[RequestOptions]) – Request-specific configuration.
- Return type:
None
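When there are many items to write, it can help to send them in several smaller requests. The hypothetical upsert_items_in_batches helper below is a sketch of that idea: it enforces the rule that either dataset_name or dataset_id must be given and slices the items into fixed-size batches. The default batch_size of 1000 is an arbitrary illustrative value, not a documented server limit, and the items are passed through untouched, so they must already be DatasetItemWrite objects.

```python
from typing import Any, Dict, Optional, Sequence


def upsert_items_in_batches(
    datasets_client: Any,
    items: Sequence[Any],
    dataset_name: Optional[str] = None,
    dataset_id: Optional[str] = None,
    batch_size: int = 1000,
) -> int:
    """Write `items` with create_or_update_dataset_items in batches; return the batch count."""
    if dataset_name is None and dataset_id is None:
        raise ValueError("provide dataset_name or dataset_id")
    # Send exactly one identifier; the name is preferred if both were given.
    target: Dict[str, str] = (
        {"dataset_name": dataset_name} if dataset_name is not None else {"dataset_id": dataset_id}
    )
    batches = 0
    for start in range(0, len(items), batch_size):
        datasets_client.create_or_update_dataset_items(items=items[start:start + batch_size], **target)
        batches += 1
    return batches
```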
- get_dataset_by_id(id: str, *, request_options: RequestOptions | None = None) → DatasetPublic
Get dataset by id
- Parameters:
id (str)
request_options (Optional[RequestOptions]) – Request-specific configuration.
- Returns:
Dataset resource
- Return type:
DatasetPublic
- update_dataset(id: str, *, name: str, description: str | None = OMIT, visibility: Literal['private', 'public'] | Any | None = OMIT, tags: Sequence[str] | None = OMIT, request_options: RequestOptions | None = None) → None
Update dataset by id
- Parameters:
id (str)
name (str)
description (Optional[str])
visibility (Optional[DatasetUpdateVisibility])
tags (Optional[Sequence[str]])
request_options (Optional[RequestOptions]) – Request-specific configuration.
- Return type:
None
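update_dataset requires name on every call, even when only another field should change. The hypothetical set_dataset_description helper below therefore reads the dataset first and sends its current name back unchanged. This reference does not say whether omitted optional fields are left alone or cleared, so the sketch also resends the existing tags; it assumes DatasetPublic exposes name and tags attributes.

```python
from typing import Any, Dict


def set_dataset_description(datasets_client: Any, dataset_id: str, description: str) -> None:
    """Change a dataset's description while keeping its current name (and tags)."""
    current = datasets_client.get_dataset_by_id(dataset_id)
    kwargs: Dict[str, Any] = {"name": current.name, "description": description}
    tags = getattr(current, "tags", None)
    if tags is not None:
        kwargs["tags"] = tags  # resend in case omitted fields would be cleared
    datasets_client.update_dataset(dataset_id, **kwargs)
```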
- delete_dataset(id: str, *, request_options: RequestOptions | None = None) → None
Delete dataset by id
- Parameters:
id (str)
request_options (Optional[RequestOptions]) – Request-specific configuration.
- Return type:
None
- delete_dataset_by_name(*, dataset_name: str, request_options: RequestOptions | None = None) → None
Delete dataset by name
- Parameters:
dataset_name (str)
request_options (Optional[RequestOptions]) – Request-specific configuration.
- Return type:
None
- delete_dataset_items(*, item_ids: Sequence[str], request_options: RequestOptions | None = None) → None
Delete dataset items
- Parameters:
item_ids (Sequence[str])
request_options (Optional[RequestOptions]) – Request-specific configuration.
- Return type:
None
- delete_datasets_batch(*, ids: Sequence[str], request_options: RequestOptions | None = None) → None
Delete multiple datasets by id in a single request
- Parameters:
ids (Sequence[str])
request_options (Optional[RequestOptions]) – Request-specific configuration.
- Return type:
None
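delete_datasets_batch takes ids, while callers often know datasets by name. The hypothetical delete_datasets_by_names helper below resolves every name with get_dataset_by_identifier before deleting anything, so an unknown name raises before any dataset is removed. It assumes DatasetPublic exposes an id attribute.

```python
from typing import Any, List, Sequence


def delete_datasets_by_names(datasets_client: Any, names: Sequence[str]) -> List[str]:
    """Resolve dataset names to ids, then delete them all in one batch call.

    Returns the ids that were sent for deletion.
    """
    # Resolve everything first: a lookup error aborts before any deletion.
    ids = [datasets_client.get_dataset_by_identifier(dataset_name=name).id for name in names]
    if ids:
        datasets_client.delete_datasets_batch(ids=ids)
    return ids
```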
- expand_dataset(id: str, *, model: str, sample_count: int | None = OMIT, preserve_fields: Sequence[str] | None = OMIT, variation_instructions: str | None = OMIT, custom_prompt: str | None = OMIT, request_options: RequestOptions | None = None) → DatasetExpansionResponse
Generate synthetic dataset samples with an LLM, based on patterns in the existing data
- Parameters:
id (str)
model (str) – The model to use for synthetic data generation
sample_count (Optional[int]) – Number of synthetic samples to generate
preserve_fields (Optional[Sequence[str]]) – Fields whose patterns in the original data should be preserved
variation_instructions (Optional[str]) – Additional instructions for data variation
custom_prompt (Optional[str]) – Custom prompt to use for generation instead of the auto-generated one
request_options (Optional[RequestOptions]) – Request-specific configuration.
- Returns:
Generated synthetic samples
- Return type:
DatasetExpansionResponse
- find_dataset_items_with_experiment_items(id: str, *, experiment_ids: str, page: int | None = None, size: int | None = None, filters: str | None = None, sorting: str | None = None, search: str | None = None, truncate: bool | None = None, request_options: RequestOptions | None = None) → DatasetItemPageCompare
Find dataset items with experiment items
- Parameters:
id (str)
experiment_ids (str)
page (Optional[int])
size (Optional[int])
filters (Optional[str])
sorting (Optional[str])
search (Optional[str])
truncate (Optional[bool])
request_options (Optional[RequestOptions]) – Request-specific configuration.
- Returns:
A page of dataset items, each with its experiment items
- Return type:
DatasetItemPageCompare
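experiment_ids is typed as a single string even though it names several experiments. The sketch below assumes the server expects a JSON-encoded array of experiment ids (for example '["id-1", "id-2"]'); that encoding is an assumption to confirm against the Opik REST API reference, and the same convention presumably applies to the experiment_ids parameter of get_dataset_experiment_items_stats. compare_experiments is a hypothetical helper name.

```python
import json
from typing import Any, Sequence


def compare_experiments(
    datasets_client: Any,
    dataset_id: str,
    experiment_ids: Sequence[str],
    page: int = 1,
    size: int = 50,
) -> Any:
    """Fetch one page of dataset items joined with their experiment items.

    Assumption: experiment_ids must be sent as a JSON array encoded in a string.
    """
    return datasets_client.find_dataset_items_with_experiment_items(
        dataset_id,  # the dataset id is the first positional parameter, `id`
        experiment_ids=json.dumps(list(experiment_ids)),
        page=page,
        size=size,
    )
```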
- get_dataset_by_identifier(*, dataset_name: str, request_options: RequestOptions | None = None) → DatasetPublic
Get dataset by name
- Parameters:
dataset_name (str)
request_options (Optional[RequestOptions]) – Request-specific configuration.
- Returns:
Dataset resource
- Return type:
DatasetPublic
- get_dataset_experiment_items_stats(id: str, *, experiment_ids: str, filters: str | None = None, request_options: RequestOptions | None = None) → ProjectStatsPublic
Get statistics for the experiment items of a dataset
- Parameters:
id (str)
experiment_ids (str)
filters (Optional[str])
request_options (Optional[RequestOptions]) – Request-specific configuration.
- Returns:
Experiment items stats resource
- Return type:
ProjectStatsPublic
- get_dataset_item_by_id(item_id: str, *, request_options: RequestOptions | None = None) → DatasetItemPublic
Get dataset item by id
- Parameters:
item_id (str)
request_options (Optional[RequestOptions]) – Request-specific configuration.
- Returns:
Dataset item resource
- Return type:
DatasetItemPublic
- get_dataset_items(id: str, *, page: int | None = None, size: int | None = None, filters: str | None = None, truncate: bool | None = None, request_options: RequestOptions | None = None) → DatasetItemPagePublic
Get dataset items
- Parameters:
id (str)
page (Optional[int])
size (Optional[int])
filters (Optional[str])
truncate (Optional[bool])
request_options (Optional[RequestOptions]) – Request-specific configuration.
- Returns:
A page of dataset items
- Return type:
DatasetItemPagePublic
- get_dataset_items_output_columns(id: str, *, experiment_ids: str | None = None, request_options: RequestOptions | None = None) → PageColumns
Get dataset items output columns
- Parameters:
id (str)
experiment_ids (Optional[str])
request_options (Optional[RequestOptions]) – Request-specific configuration.
- Returns:
Dataset item output columns
- Return type:
PageColumns
- stream_dataset_items(*, dataset_name: str, last_retrieved_id: str | None = OMIT, steam_limit: int | None = OMIT, request_options: RequestOptions | None = None) → Iterator[bytes]
Stream dataset items
- Parameters:
dataset_name (str)
last_retrieved_id (Optional[str])
steam_limit (Optional[int])
request_options (Optional[RequestOptions]) – Request-specific configuration. You can pass options such as chunk_size to customize the request and response.
- Returns:
A stream of dataset items, or an error if one occurs during processing
- Return type:
Iterator[bytes]
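stream_dataset_items yields raw bytes rather than parsed objects. The sketch below assumes the body is newline-delimited JSON (one dataset item per line) and that chunk boundaries can fall in the middle of a line, so partial lines are buffered. parse_ndjson and stream_all_items are hypothetical helpers, not SDK functions. stream_all_items further assumes each decoded item is a dict with an "id" key and that steam_limit (spelled that way in the signature) caps the number of items per call, so it resumes with last_retrieved_id until a short batch arrives. Error payloads mentioned under Returns are not handled specially here.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Optional


def parse_ndjson(chunks: Iterable[bytes]) -> Iterator[Dict[str, Any]]:
    """Decode newline-delimited JSON from byte chunks that may split lines."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Everything before the last newline is complete; keep the remainder.
        *lines, buffer = buffer.split(b"\n")
        for line in lines:
            if line.strip():
                yield json.loads(line)
    if buffer.strip():
        yield json.loads(buffer)  # final line without a trailing newline


def stream_all_items(datasets_client: Any, dataset_name: str, batch_limit: int = 500) -> Iterator[Dict[str, Any]]:
    """Stream every item, resuming each request after the last id received."""
    last_id: Optional[str] = None
    while True:
        kwargs: Dict[str, Any] = {"dataset_name": dataset_name, "steam_limit": batch_limit}
        if last_id is not None:
            kwargs["last_retrieved_id"] = last_id  # omit on the first request
        batch = list(parse_ndjson(datasets_client.stream_dataset_items(**kwargs)))
        yield from batch
        if len(batch) < batch_limit:
            break  # a short batch means the stream is exhausted
        last_id = batch[-1]["id"]
```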
Usage Example
```python
import opik

client = opik.Opik()

# Find datasets (first page, up to 10 results)
datasets = client.rest_client.datasets.find_datasets(
    page=1,
    size=10,
)

# Get a dataset by ID
dataset = client.rest_client.datasets.get_dataset_by_id("dataset-id")

# Create a new dataset (the call returns None)
client.rest_client.datasets.create_dataset(
    name="my-dataset",
    description="A test dataset",
)

# Get dataset items; the dataset id is the first argument (`id`)
items = client.rest_client.datasets.get_dataset_items(
    "dataset-id",
    page=1,
    size=100,
)
```