
Expand dataset with synthetic samples

POST /api/v1/private/datasets/:id/expansions
curl -X POST http://localhost:5173/api/v1/private/datasets/id/expansions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4"
  }'
200 Successful
{
  "generated_samples": [
    {
      "source": "manual",
      "data": {},
      "id": "string",
      "dataset_item_id": "string",
      "trace_id": "string",
      "span_id": "string",
      "description": "string",
      "tags": [
        "string"
      ],
      "evaluators": [
        {
          "name": "string",
          "type": "llm_judge",
          "config": {}
        }
      ],
      "execution_policy": {
        "runs_per_item": 1,
        "pass_threshold": 1
      },
      "experiment_items": [
        {
          "experiment_id": "string",
          "dataset_item_id": "string",
          "trace_id": "string",
          "id": "string",
          "project_id": "string",
          "project_name": "string",
          "input": {},
          "output": {},
          "feedback_scores": [
            {
              "name": "string",
              "value": 1.1,
              "source": "ui",
              "category_name": "string",
              "reason": "string",
              "created_at": "2024-01-15T09:30:00Z",
              "last_updated_at": "2024-01-15T09:30:00Z",
              "created_by": "string",
              "last_updated_by": "string",
              "value_by_author": {}
            }
          ],
          "comments": [
            {
              "text": "string",
              "id": "string",
              "created_at": "2024-01-15T09:30:00Z",
              "last_updated_at": "2024-01-15T09:30:00Z",
              "created_by": "string",
              "last_updated_by": "string"
            }
          ],
          "total_estimated_cost": 1.1,
          "duration": 1.1,
          "usage": {},
          "created_at": "2024-01-15T09:30:00Z",
          "last_updated_at": "2024-01-15T09:30:00Z",
          "created_by": "string",
          "last_updated_by": "string",
          "trace_visibility_mode": "default",
          "description": "string",
          "execution_policy": {
            "runs_per_item": 1,
            "pass_threshold": 1
          },
          "assertion_results": [
            {
              "value": "string",
              "passed": true,
              "reason": "string"
            }
          ],
          "status": "passed"
        }
      ],
      "run_summaries_by_experiment": {},
      "dataset_id": "string",
      "created_at": "2024-01-15T09:30:00Z",
      "last_updated_at": "2024-01-15T09:30:00Z",
      "created_by": "string",
      "last_updated_by": "string"
    }
  ],
  "model": "gpt-4",
  "total_generated": 10,
  "generation_time": "2024-01-15T09:30:00Z"
}
Generates synthetic dataset samples using an LLM, based on patterns in the dataset's existing data.
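The curl request above could be prepared from Python roughly as follows. This is a minimal sketch using only the standard library: the helper name `make_expansion_request` is hypothetical, the base URL is copied from the curl example, and no authentication headers are set (a hosted deployment may require them, which this page does not describe).

```python
import json
import urllib.request

# Base URL taken from the curl example on this page; adjust for your deployment.
BASE_URL = "http://localhost:5173/api"


def make_expansion_request(dataset_id: str, body: dict) -> urllib.request.Request:
    """Prepare (but do not send) the POST request for the expansions endpoint.

    Hypothetical helper for illustration; not part of the Opik SDK.
    """
    url = f"{BASE_URL}/v1/private/datasets/{dataset_id}/expansions"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),  # JSON body, as in the curl -d flag
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The prepared request can then be sent with `urllib.request.urlopen(req)`, and the response body decoded with `json.load`.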
Path parameters

id (string, Required, format: "uuid")

Request

This endpoint expects an object.
model (string, Required, >=1 character)
The model to use for synthetic data generation

sample_count (integer, Optional, 1-200)
Number of synthetic samples to generate

preserve_fields (list of strings, Optional)
Fields to preserve patterns from original data

variation_instructions (string, Optional)
Additional instructions for data variation

custom_prompt (string, Optional)
Custom prompt to use for generation instead of auto-generated one

max_completion_tokens (integer, Optional, >=100)
Maximum number of tokens for the LLM response. Required by Anthropic, used as maxOutputTokens for Gemini. If not provided, defaults to 4000 for Anthropic models only.
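The constraints listed for the request body can be checked on the client before sending. The sketch below is one possible way to do that; `build_expansion_request` is a hypothetical helper, and the choice to leave unset optional fields out of the body is an assumption about convenient usage, not documented server behavior.

```python
from typing import Optional, Sequence


def build_expansion_request(
    model: str,
    sample_count: Optional[int] = None,
    preserve_fields: Optional[Sequence[str]] = None,
    variation_instructions: Optional[str] = None,
    custom_prompt: Optional[str] = None,
    max_completion_tokens: Optional[int] = None,
) -> dict:
    """Return a request body after checking the documented constraints.

    Hypothetical helper for illustration; not part of the Opik SDK.
    """
    if len(model) < 1:
        raise ValueError("model must be at least 1 character")
    if sample_count is not None and not 1 <= sample_count <= 200:
        raise ValueError("sample_count must be between 1 and 200")
    if max_completion_tokens is not None and max_completion_tokens < 100:
        raise ValueError("max_completion_tokens must be >= 100")

    body: dict = {"model": model}
    optional = {
        "sample_count": sample_count,
        "preserve_fields": list(preserve_fields) if preserve_fields is not None else None,
        "variation_instructions": variation_instructions,
        "custom_prompt": custom_prompt,
        "max_completion_tokens": max_completion_tokens,
    }
    # Only include optional fields the caller actually set.
    body.update({key: value for key, value in optional.items() if value is not None})
    return body
```

For example, `build_expansion_request("gpt-4", sample_count=5)` yields a body with just `model` and `sample_count`, while `sample_count=201` raises `ValueError` before any request is made.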

Response

Generated synthetic samples
generated_samples (list of objects)
List of generated synthetic dataset items

model (string)
Model used for generation

total_generated (integer)
Total number of samples generated

generation_time (datetime, Read-only)
Generation timestamp
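A response with the fields above could be read along these lines. This is a sketch only: `summarize_expansion` is a hypothetical helper, and it assumes `generation_time` is an ISO 8601 string ending in `Z`, as in the sample response on this page.

```python
import json
from datetime import datetime


def summarize_expansion(response_text: str) -> dict:
    """Extract the top-level response fields from a JSON response body.

    Hypothetical helper for illustration; not part of the Opik SDK.
    """
    payload = json.loads(response_text)
    samples = payload.get("generated_samples", [])
    # Replace the trailing "Z" so fromisoformat also works on Python < 3.11.
    generated_at = datetime.fromisoformat(
        payload["generation_time"].replace("Z", "+00:00")
    )
    return {
        "model": payload["model"],
        "total_generated": payload["total_generated"],
        "returned_samples": len(samples),
        "generated_at": generated_at,
    }
```

Comparing `total_generated` with the length of `generated_samples` is a cheap sanity check that the response was not truncated.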