Structured Output Compliance

The StructuredOutputCompliance metric allows you to verify whether a given LLM output is valid JSON and adheres to an expected schema. You can optionally provide a Pydantic schema to validate the structure and types of the fields.

How to use the StructuredOutputCompliance metric

You can use the StructuredOutputCompliance metric as follows:

from opik.evaluation.metrics import StructuredOutputCompliance
from pydantic import BaseModel, Field

class User(BaseModel):
    name: str = Field(description="The name of the user")
    age: int = Field(description="The age of the user")

metric = StructuredOutputCompliance()

# Example 1: Valid JSON, but not schema-compliant
metric.score(
    output='{"name": "John Doe"}',
    schema=User,
)

# Example 2: Valid JSON and schema-compliant
metric.score(
    output='{"name": "John Doe", "age": 30}',
    schema=User,
)

# Example 3: Invalid JSON
metric.score(
    output='{"name": "John Doe", "age": }',
)

Asynchronous scoring is also supported with the ascore method.

The StructuredOutputCompliance score is 1 if the output is compliant, and 0 if it is not.
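To make the 1/0 outcome concrete, the three examples above can be approximated with a deterministic, standard-library check. This is only a sketch and not how Opik computes the score (Opik asks an LLM judge); the names `local_compliance` and `USER_FIELDS` are hypothetical, and the plain field-to-type mapping is a simplified stand-in for the Pydantic `User` model.

```python
import json
from typing import Any, Dict, Optional


def local_compliance(output: str, required: Optional[Dict[str, type]] = None) -> int:
    """Return 1 if `output` is a JSON object matching `required`, else 0.

    Hypothetical helper for illustration; it is not part of Opik.
    """
    try:
        data: Any = json.loads(output)
    except json.JSONDecodeError:
        return 0  # not parsable JSON
    if not isinstance(data, dict):
        return 0  # a bare string or number is not a JSON object
    if required is None:
        return 1  # no schema given: well-formed JSON is enough
    for field, expected_type in required.items():
        if field not in data:
            return 0  # missing field: partial compliance counts as failure
        value = data[field]
        # bool is a subclass of int in Python, so reject it for non-bool fields
        if isinstance(value, bool) and expected_type is not bool:
            return 0
        if not isinstance(value, expected_type):
            return 0
    return 1


# Simplified stand-in for the User model (assumption, not a Pydantic schema)
USER_FIELDS = {"name": str, "age": int}

print(local_compliance('{"name": "John Doe"}', USER_FIELDS))             # 0
print(local_compliance('{"name": "John Doe", "age": 30}', USER_FIELDS))  # 1
print(local_compliance('{"name": "John Doe", "age": }'))                 # 0
```

A check like this can serve as a cheap sanity baseline next to the LLM-based metric, but it does not replace it: the judge can also reason about schemas described in free text.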

Prompt Template

Opik uses an LLM as a judge to evaluate structural compliance. The default model is gpt-4o, but you can switch to any model supported by LiteLLM. To learn more about customizing models, see the Customize models for LLM as a Judge metrics section.

The prompt used by the LLM looks like this:

You are an expert in structured data validation. Your task is to determine whether the given OUTPUT complies with the expected STRUCTURE. The structure may be described as a JSON schema, a Pydantic model, or simply implied to be valid JSON.

Guidelines:

1. OUTPUT must be a valid JSON object (not just a string).
2. If a schema is provided, the OUTPUT must match the schema exactly in field names, types, and structure.
3. If no schema is provided, ensure the OUTPUT is a well-formed and parsable JSON.
4. Common formatting issues (missing quotes, incorrect brackets, etc.) should be flagged.
5. Partial compliance is considered non-compliant.
6. Respond only in the specified JSON format.

{examples_str}

EXPECTED STRUCTURE (optional):
{schema}

OUTPUT:
{output}

Respond in the following JSON format:
{{
    "score": true or false,  // true if output fully complies, false otherwise
    "reason": ["list of reasons for failure or confirmation"]
}}
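The template above mixes single-brace placeholders (`{examples_str}`, `{schema}`, `{output}`) with doubled braces that become literal `{` and `}` once the template is formatted. The following sketch shows how such a template could be filled in with `str.format` and how a reply in the requested JSON format could be mapped to a 1/0 score. It is an illustration under assumptions, not Opik's internal code: `TEMPLATE` is a shortened stand-in for the full prompt, and `render_prompt`, `parse_judge_reply`, and the `"(no schema provided)"` default are hypothetical.

```python
import json
from typing import List, Tuple

# Shortened stand-in for the prompt above: only the placeholders and the
# escaped braces of the response format are kept.
TEMPLATE = """{examples_str}

EXPECTED STRUCTURE (optional):
{schema}

OUTPUT:
{output}

Respond in the following JSON format:
{{
    "score": true or false,
    "reason": ["list of reasons for failure or confirmation"]
}}"""


def render_prompt(output: str, schema: str = "(no schema provided)",
                  examples_str: str = "") -> str:
    """Fill the placeholders; doubled braces turn into literal braces."""
    return TEMPLATE.format(examples_str=examples_str, schema=schema, output=output)


def parse_judge_reply(reply: str) -> Tuple[float, List[str]]:
    """Map a reply in the requested JSON format to (score, reasons)."""
    data = json.loads(reply)
    score = 1.0 if data["score"] is True else 0.0
    reasons = data.get("reason", [])
    if isinstance(reasons, str):
        reasons = [reasons]  # tolerate a single string instead of a list
    return score, reasons


prompt = render_prompt('{"name": "John Doe"}', schema="name: str, age: int")
print(prompt)

# A hand-written reply, used here in place of a real model response
print(parse_judge_reply('{"score": false, "reason": ["Missing field: age"]}'))
```

Braces inside the `output` value are safe because `str.format` only interprets braces in the template string itself, not in the substituted values.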