Structured Output Compliance

The StructuredOutputCompliance metric checks whether an LLM output is valid JSON and, if you provide a schema, whether the output adheres to it. The schema is optional; pass a Pydantic model to validate the structure and types of the fields.

How to use the StructuredOutputCompliance metric

You can use the StructuredOutputCompliance metric as follows:

from opik.evaluation.metrics import StructuredOutputCompliance
from pydantic import BaseModel, Field

class User(BaseModel):
    name: str = Field(description="The name of the user")
    age: int = Field(description="The age of the user")

metric = StructuredOutputCompliance()

# Example 1: Valid JSON, but not schema-compliant
metric.score(
    output='{"name": "John Doe"}',
    schema=User,
)

# Example 2: Valid JSON and schema-compliant
metric.score(
    output='{"name": "John Doe", "age": 30}',
    schema=User,
)

# Example 3: Invalid JSON
metric.score(
    output='{"name": "John Doe", "age": }',
)

Asynchronous scoring is also supported with the ascore method.

The StructuredOutputCompliance score is 1 if the output is compliant, and 0 if it is not.
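To make the 1/0 scoring concrete, here is a minimal standard-library sketch of a deterministic check that follows the same pass/fail rule. It is not how Opik computes the score (the metric uses an LLM judge, described below); the `USER_FIELDS` mapping and the `local_compliance_score` function are hypothetical names, with a plain dict of field types standing in for the Pydantic model.

```python
import json
from typing import Optional

# Hypothetical stand-in for a schema: field name -> expected Python type.
USER_FIELDS = {"name": str, "age": int}

def local_compliance_score(output: str, fields: Optional[dict] = None) -> float:
    """Return 1.0 if `output` is a JSON object matching `fields`, else 0.0."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return 0.0  # not parsable JSON
    if not isinstance(data, dict):
        return 0.0  # valid JSON, but not an object
    if fields is None:
        return 1.0  # no schema given: a well-formed JSON object is enough
    if set(data) != set(fields):
        return 0.0  # missing or unexpected field names
    for key, expected in fields.items():
        value = data[key]
        # bool is a subclass of int in Python, so reject it for non-bool fields
        if isinstance(value, bool) and expected is not bool:
            return 0.0
        if not isinstance(value, expected):
            return 0.0
    return 1.0  # partial compliance never reaches this line

print(local_compliance_score('{"name": "John Doe"}', USER_FIELDS))
print(local_compliance_score('{"name": "John Doe", "age": 30}', USER_FIELDS))
print(local_compliance_score('{"name": "John Doe", "age": }'))
```

A check like this can serve as a cheap pre-filter before calling the LLM-based metric, but it only covers flat objects with simple types.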

Prompt Template

Opik uses an LLM as a judge to evaluate structural compliance. The default model is gpt-4o, but you can change it to any model supported by LiteLLM. To learn more, see the Customize models for LLM as a Judge metrics section.

The prompt used by the LLM looks like this:

You are an expert in structured data validation. Your task is to determine whether the given OUTPUT complies with the expected STRUCTURE. The structure may be described as a JSON schema, a Pydantic model, or simply implied to be valid JSON.

Guidelines:

1. OUTPUT must be a valid JSON object (not just a string).
2. If a schema is provided, the OUTPUT must match the schema exactly in field names, types, and structure.
3. If no schema is provided, ensure the OUTPUT is a well-formed and parsable JSON.
4. Common formatting issues (missing quotes, incorrect brackets, etc.) should be flagged.
5. Partial compliance is considered non-compliant.
6. Respond only in the specified JSON format.

{examples_str}

EXPECTED STRUCTURE (optional):
{schema}

OUTPUT:
{output}

Respond in the following JSON format:
{{
    "score": true or false, // true if output fully complies, false otherwise
    "reason": ["list of reasons for failure or confirmation"]
}}
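
The placeholders and doubled braces in the template above suggest it is filled with Python-style string formatting, where `{{` and `}}` render as literal braces. The sketch below illustrates that mechanism on an abbreviated copy of the template and shows one way a judge reply could be mapped to the 1/0 score. The `build_prompt` and `parse_verdict` helpers and the `"(No schema provided)"` default are assumptions for illustration, not Opik's internal API.

```python
import json
from typing import List, Tuple

# Abbreviated copy of the template; only the placeholders and the
# escaped braces matter for this sketch.
TEMPLATE = """EXPECTED STRUCTURE (optional):
{schema}

OUTPUT:
{output}

Respond in the following JSON format:
{{
    "score": true or false,
    "reason": ["list of reasons for failure or confirmation"]
}}"""

def build_prompt(output: str, schema: str = "(No schema provided)") -> str:
    # str.format replaces {schema} and {output} and turns {{ }} into { }.
    # Braces inside the substituted values are inserted as-is.
    return TEMPLATE.format(schema=schema, output=output)

def parse_verdict(raw: str) -> Tuple[float, List[str]]:
    # Hypothetical parser: map the judge's boolean verdict to 1.0 or 0.0.
    verdict = json.loads(raw)
    score = 1.0 if verdict["score"] is True else 0.0
    return score, list(verdict["reason"])

prompt = build_prompt('{"name": "John Doe"}', schema='{"name": "str", "age": "int"}')
print(prompt)
print(parse_verdict('{"score": false, "reason": ["missing field age"]}'))
```

Because the substituted values are not re-scanned for placeholders, a JSON output containing braces can be passed in without escaping.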