Expand dataset with synthetic samples

POST /api/v1/private/datasets/:id/expansions

$ curl -X POST http://localhost:5173/api/v1/private/datasets/id/expansions \
>      -H "Content-Type: application/json" \
>      -d '{
>   "model": "gpt-4"
> }'

200 Successful

{
  "generated_samples": [
    {
      "source": "manual",
      "data": {},
      "id": "string",
      "dataset_item_id": "string",
      "trace_id": "string",
      "span_id": "string",
      "description": "string",
      "tags": [
        "string"
      ],
      "evaluators": [
        {
          "name": "string",
          "type": "llm_judge",
          "config": {}
        }
      ],
      "execution_policy": {
        "runs_per_item": 1,
        "pass_threshold": 1
      },
      "experiment_items": [
        {
          "experiment_id": "string",
          "dataset_item_id": "string",
          "trace_id": "string",
          "id": "string",
          "project_id": "string",
          "project_name": "string",
          "input": {},
          "output": {},
          "trace_metadata": {},
          "feedback_scores": [
            {
              "name": "string",
              "value": 1.1,
              "source": "ui",
              "category_name": "string",
              "reason": "string",
              "source_queue_id": "string",
              "created_at": "2024-01-15T09:30:00Z",
              "last_updated_at": "2024-01-15T09:30:00Z",
              "created_by": "string",
              "last_updated_by": "string",
              "value_by_author": {}
            }
          ],
          "comments": [
            {
              "text": "string",
              "id": "string",
              "source_queue_id": "string",
              "created_at": "2024-01-15T09:30:00Z",
              "last_updated_at": "2024-01-15T09:30:00Z",
              "created_by": "string",
              "last_updated_by": "string"
            }
          ],
          "total_estimated_cost": 1.1,
          "duration": 1.1,
          "usage": {},
          "created_at": "2024-01-15T09:30:00Z",
          "last_updated_at": "2024-01-15T09:30:00Z",
          "created_by": "string",
          "last_updated_by": "string",
          "trace_visibility_mode": "default",
          "description": "string",
          "execution_policy": {
            "runs_per_item": 1,
            "pass_threshold": 1
          },
          "assertion_results": [
            {
              "value": "string",
              "passed": true,
              "reason": "string"
            }
          ],
          "status": "passed"
        }
      ],
      "run_summaries_by_experiment": {},
      "dataset_id": "string",
      "created_at": "2024-01-15T09:30:00Z",
      "last_updated_at": "2024-01-15T09:30:00Z",
      "created_by": "string",
      "last_updated_by": "string"
    }
  ],
  "model": "gpt-4",
  "total_generated": 10,
  "generation_time": "2024-01-15T09:30:00Z"
}

Generates synthetic dataset samples with an LLM, based on patterns in the dataset's existing data.

Path parameters

id string Required format: "uuid"

Request

This endpoint expects an object.

model string Required >=1 character

The model to use for synthetic data generation

sample_count integer Optional 1-200

Number of synthetic samples to generate

preserve_fields list of strings Optional

Fields whose patterns from the original data should be preserved

variation_instructions string Optional

Additional instructions for data variation

custom_prompt string Optional

Custom prompt to use for generation instead of the auto-generated one

max_completion_tokens integer Optional >=100

Maximum number of tokens in the LLM response. Anthropic requires this value; for Gemini it is passed as maxOutputTokens. If omitted, it defaults to 4000, and that default applies to Anthropic models only.
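
As a rough illustration of the request fields above, the following Python sketch checks arguments against the documented constraints and builds the POST request without sending it. The helper name `build_expansion_request` and the `BASE_URL` constant are hypothetical; the base URL is only assumed from the curl example, and the server may enforce further rules that this page does not list.

```python
import json
import urllib.request
import uuid

# Assumption: base URL copied from the curl example on this page.
BASE_URL = "http://localhost:5173/api"


def build_expansion_request(dataset_id, model, sample_count=None,
                            preserve_fields=None, variation_instructions=None,
                            custom_prompt=None, max_completion_tokens=None):
    """Hypothetical helper: validate documented constraints, build the request."""
    uuid.UUID(dataset_id)  # path parameter must be a UUID; raises ValueError otherwise
    if len(model) < 1:
        raise ValueError("model must contain at least 1 character")
    if sample_count is not None and not 1 <= sample_count <= 200:
        raise ValueError("sample_count must be between 1 and 200")
    if max_completion_tokens is not None and max_completion_tokens < 100:
        raise ValueError("max_completion_tokens must be at least 100")

    # Only the required field plus any optional fields that were actually set.
    payload = {"model": model}
    optional = {
        "sample_count": sample_count,
        "preserve_fields": preserve_fields,
        "variation_instructions": variation_instructions,
        "custom_prompt": custom_prompt,
        "max_completion_tokens": max_completion_tokens,
    }
    payload.update({k: v for k, v in optional.items() if v is not None})

    return urllib.request.Request(
        f"{BASE_URL}/v1/private/datasets/{dataset_id}/expansions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request. Any authentication headers a deployment needs are not covered on this page and are left out here.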

Response

Generated synthetic samples

generated_samples list of objects

List of generated synthetic dataset items

model string

Model used for generation

total_generated integer

Total number of samples generated

generation_time datetime Read-only

Generation timestamp
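
The response fields above could be read along these lines. This is a minimal sketch that assumes the body has already been decoded from JSON into a dict; `summarize_expansion` is a hypothetical helper, not part of any client library.

```python
from datetime import datetime


def summarize_expansion(response):
    """Hypothetical helper: pull the top-level response fields into a small summary."""
    samples = response.get("generated_samples", [])
    raw_time = response.get("generation_time")
    # The sample timestamp ends in "Z"; swap it for an explicit UTC offset so
    # datetime.fromisoformat also accepts it on older Python versions.
    generated_at = (
        datetime.fromisoformat(raw_time.replace("Z", "+00:00")) if raw_time else None
    )
    return {
        "model": response.get("model"),
        "total_generated": response.get("total_generated"),
        "sample_ids": [sample.get("id") for sample in samples],
        "generated_at": generated_at,
    }
```

The summary keeps `total_generated` as reported by the server and does not recompute it from the list length, since this page does not state whether the two always match.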