Structured Output Compliance

The StructuredOutputCompliance metric allows you to verify whether a given LLM output is valid JSON and adheres to an expected schema. You can optionally provide a Pydantic schema to validate the structure and types of the fields.

How to use the StructuredOutputCompliance metric

You can use the StructuredOutputCompliance metric as follows:

from opik.evaluation.metrics import StructuredOutputCompliance
from pydantic import BaseModel, Field

class User(BaseModel):
    name: str = Field(description="The name of the user")
    age: int = Field(description="The age of the user")

metric = StructuredOutputCompliance()

# Example 1: Valid JSON, but not schema-compliant (the "age" field is missing)
metric.score(
    output='{"name": "John Doe"}',
    schema=User,
)

# Example 2: Valid JSON and schema-compliant
metric.score(
    output='{"name": "John Doe", "age": 30}',
    schema=User,
)

# Example 3: Invalid JSON (no value after "age":)
metric.score(
    output='{"name": "John Doe", "age": }',
)

Asynchronous scoring is also supported with the ascore method.

The StructuredOutputCompliance score is 1 if the output is compliant, and 0 if it is not.
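
The 1/0 outcome for the three examples above can be illustrated without calling an LLM. The following is a minimal local sketch that uses only the Python standard library. It is not Opik's implementation, which relies on an LLM judge. The names `USER_FIELDS` and `local_compliance` are hypothetical and exist only for this illustration, and the field-to-type mapping is a simplified stand-in for the `User` Pydantic model.

```python
import json
from typing import Optional

# Hypothetical stand-in for the User schema: field name -> expected Python type.
USER_FIELDS = {"name": str, "age": int}

def local_compliance(output: str, fields: Optional[dict] = None) -> int:
    """Return 1 if `output` is a JSON object that matches `fields`, else 0."""
    try:
        parsed = json.loads(output)
    except json.JSONDecodeError:
        return 0  # not parsable JSON at all
    if not isinstance(parsed, dict):
        return 0  # valid JSON, but not a JSON object
    if fields is None:
        return 1  # no schema given: a well-formed JSON object is enough
    if set(parsed) != set(fields):
        return 0  # missing or unexpected field names
    for key, expected in fields.items():
        value = parsed[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) and expected is not bool:
            return 0
        if not isinstance(value, expected):
            return 0
    return 1

print(local_compliance('{"name": "John Doe"}', USER_FIELDS))             # Example 1
print(local_compliance('{"name": "John Doe", "age": 30}', USER_FIELDS))  # Example 2
print(local_compliance('{"name": "John Doe", "age": }'))                 # Example 3
```

A deterministic check like this is stricter and cheaper than an LLM judge, but it cannot interpret loosely described structures, which is one reason a judge-based metric may be preferred.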

Prompt Template

Opik uses an LLM as a judge to evaluate structural compliance. The default model is gpt-4o, but you can change it to any model supported by LiteLLM.

The prompt sent to the LLM looks like this:

You are an expert in structured data validation. Your task is to determine whether the given OUTPUT complies with the expected STRUCTURE. The structure may be described as a JSON schema, a Pydantic model, or simply implied to be valid JSON.

Guidelines:

1. OUTPUT must be a valid JSON object (not just a string).
2. If a schema is provided, the OUTPUT must match the schema exactly in field names, types, and structure.
3. If no schema is provided, ensure the OUTPUT is a well-formed and parsable JSON.
4. Common formatting issues (missing quotes, incorrect brackets, etc.) should be flagged.
5. Partial compliance is considered non-compliant.
6. Respond only in the specified JSON format.

{examples_str}

EXPECTED STRUCTURE (optional):
{schema}

OUTPUT:
{output}

Respond in the following JSON format:
{{
    "score": true or false, // true if output fully complies, false otherwise
    "reason": ["list of reasons for failure or confirmation"]
}}
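
The doubled braces in the template suggest that it is filled in with Python's `str.format`, where `{{` and `}}` render as literal braces and `{schema}` and `{output}` are placeholders. The sketch below illustrates that mechanism on a shortened copy of the template and shows one plausible way a judge reply in the requested format could map to the 1/0 score. `TEMPLATE`, `render_prompt`, `score_from_reply`, and the sample reply are all hypothetical and are not taken from Opik's source.

```python
import json

# Shortened, hypothetical copy of the template, for illustration only.
TEMPLATE = """EXPECTED STRUCTURE (optional):
{schema}

OUTPUT:
{output}

Respond in the following JSON format:
{{
    "score": true or false,
    "reason": ["list of reasons for failure or confirmation"]
}}"""

def render_prompt(output: str, schema: str = "(no schema provided)") -> str:
    """Fill the placeholders; '{{' and '}}' become literal braces."""
    return TEMPLATE.format(schema=schema, output=output)

def score_from_reply(reply: str) -> float:
    """Map a judge reply in the requested JSON format to 1.0 or 0.0."""
    data = json.loads(reply)
    return 1.0 if data["score"] is True else 0.0

prompt = render_prompt(
    output='{"name": "John Doe"}',
    schema='{"name": "str", "age": "int"}',
)
print(prompt)

# Hypothetical judge reply for the output above.
sample_reply = '{"score": false, "reason": ["Missing required field: age"]}'
print(score_from_reply(sample_reply))
```

Braces inside the substituted `schema` and `output` values are left untouched, because `str.format` only interprets braces in the template itself.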