Observability for Gretel with Opik
Gretel is a synthetic data platform that enables you to generate high-quality, privacy-safe datasets for AI model training and evaluation.
This guide explains how to integrate Opik with Gretel to create synthetic Q&A datasets and import them into Opik for model evaluation and optimization.
Account Setup
Comet provides a hosted version of the Opik platform. Simply create an account and grab your API key.
You can also run the Opik platform locally; see the installation guide for more information.
Getting Started
Installation
To use Gretel with Opik, you’ll need to have both the gretel-client and opik packages installed:
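Both packages are normally installed with pip (pip install gretel-client opik). As a minimal sketch, the check below reports which of the two are still missing from the current environment. The import names gretel_client and opik are assumptions based on the package names above, and missing_packages and install_hint are hypothetical helper names.

```python
import importlib.util

# pip package name -> import name (the import names are assumptions).
REQUIRED = {"gretel-client": "gretel_client", "opik": "opik"}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages that cannot be imported."""
    return [
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


def install_hint(required=REQUIRED):
    """Return the pip command for whatever is missing, or a success message."""
    missing = missing_packages(required)
    if not missing:
        return "All required packages are importable."
    return "pip install " + " ".join(missing)


print(install_hint())
```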
Configuring Opik
Configure the Opik Python SDK for your deployment type. See the Python SDK Configuration guide for detailed instructions on:
- CLI configuration: opik configure
- Code configuration: opik.configure()
- Self-hosted vs Cloud vs Enterprise setup
- Configuration files and environment variables
Configuring Gretel
To configure Gretel, you need a Gretel API key. You can create and manage your Gretel API keys on this page.
You can set it as an environment variable:
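In a shell this is usually a single export statement. The Python sketch below does the equivalent from inside a script or notebook: it reuses the variable if it is already set and prompts for it otherwise. The variable name GRETEL_API_KEY is an assumption about what gretel-client reads, and ensure_gretel_key is a hypothetical helper.

```python
import getpass
import os


def ensure_gretel_key(prompt=getpass.getpass):
    """Return GRETEL_API_KEY, prompting once and exporting it if it is unset."""
    key = os.environ.get("GRETEL_API_KEY")  # assumed variable name
    if not key:
        key = prompt("Gretel API key: ")
        # Only affects the current process and its children.
        os.environ["GRETEL_API_KEY"] = key
    return key
```

Call ensure_gretel_key() once before creating any Gretel client. The prompt argument exists only so the input source can be swapped out in tests.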
Or set it programmatically:
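A minimal sketch of the programmatic route, assuming the client class is Gretel in gretel_client.navigator_client and that it accepts an api_key argument. Verify both against your installed gretel-client version. resolve_api_key and make_gretel_client are hypothetical helpers.

```python
import os


def resolve_api_key(explicit=None):
    """Prefer an explicitly passed key, then fall back to the environment."""
    key = explicit or os.environ.get("GRETEL_API_KEY")
    if not key:
        raise ValueError("No Gretel API key given and GRETEL_API_KEY is unset")
    return key


def make_gretel_client(api_key=None):
    """Create a Gretel client (import path and argument name are assumptions)."""
    # Imported lazily so this module loads even without gretel-client installed.
    from gretel_client.navigator_client import Gretel

    return Gretel(api_key=resolve_api_key(api_key))
```

Avoid hard-coding the key in source files; pass it in from a secrets manager or the environment.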
Two Approaches Available
This integration demonstrates two methods for generating synthetic data with Gretel:
- Data Designer (recommended for custom datasets): Create datasets from scratch with precise control
- Safe Synthetics (recommended for existing data): Generate synthetic versions of existing datasets
Method 1: Using Gretel Data Designer
Generate Q&A Dataset
Use Gretel Data Designer to generate synthetic Q&A data with precise control over the structure:
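The sketch below shows one way such a design could look: two sampled columns (topic, difficulty) and two LLM-generated columns (question, answer). Every Gretel call here (data_designer.new, SamplerColumn, LLMTextColumn, create, wait_until_done, dataset.df) and the {{ column }} prompt templating are assumptions drawn from Gretel's published examples, not confirmed by this guide, so check them against the Data Designer documentation. The column names and prompts are illustrative.

```python
# Prompts reference earlier columns by name (templating syntax is assumed).
QUESTION_PROMPT = (
    "Write one {{ difficulty }} question about {{ topic }}. "
    "Return only the question."
)
ANSWER_PROMPT = "Answer this question concisely and accurately: {{ question }}"


def generate_qa_dataframe(num_records=20, topics=("machine learning", "statistics")):
    """Generate a Q&A DataFrame with Data Designer (all Gretel calls assumed)."""
    from gretel_client.data_designer import columns as C
    from gretel_client.data_designer import params as P
    from gretel_client.navigator_client import Gretel

    gretel = Gretel()  # assumed to pick up GRETEL_API_KEY from the environment
    designer = gretel.data_designer.new(model_suite="apache-2.0")

    # Sampled columns control the distribution of topics and difficulty.
    designer.add_column(C.SamplerColumn(
        name="topic",
        type=P.SamplerType.CATEGORY,
        params=P.CategorySamplerParams(values=list(topics)),
    ))
    designer.add_column(C.SamplerColumn(
        name="difficulty",
        type=P.SamplerType.CATEGORY,
        params=P.CategorySamplerParams(values=["beginner", "intermediate", "advanced"]),
    ))

    # LLM columns are generated from the prompts above.
    designer.add_column(C.LLMTextColumn(name="question", prompt=QUESTION_PROMPT))
    designer.add_column(C.LLMTextColumn(name="answer", prompt=ANSWER_PROMPT))

    run = designer.create(num_records=num_records)
    run.wait_until_done()
    return run.dataset.df
```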
Convert to Opik Format
Convert the Gretel-generated data to Opik’s expected format:
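A minimal sketch of the conversion, assuming the generated rows have question, answer, topic, and difficulty columns (the column names are assumptions). Opik dataset items are plain dictionaries; the layout below, with input, expected_output, and metadata keys, mirrors the fields listed under the results section of this guide. to_opik_items is a hypothetical helper.

```python
def to_opik_items(rows, source="gretel_data_designer"):
    """Convert row dictionaries into Opik dataset items."""
    items = []
    for row in rows:
        items.append({
            "input": {"question": row["question"]},
            "expected_output": row["answer"],
            "metadata": {
                # Optional columns fall back to None when absent.
                "topic": row.get("topic"),
                "difficulty": row.get("difficulty"),
                "source": source,
            },
        })
    return items
```

With a pandas DataFrame, pass df.to_dict("records") as rows.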
Upload to Opik
Upload your dataset to Opik for model evaluation:
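A minimal upload sketch using the Opik client's get_or_create_dataset and the dataset's insert method. The helper name upload_items, the dataset name, and the description are illustrative. The client is passed in as an argument so the helper can be exercised without a live Opik backend.

```python
def upload_items(client, dataset_name, items, description=None):
    """Insert items into an Opik dataset, creating the dataset if needed.

    `client` is expected to be an opik.Opik() instance.
    """
    if not items:
        raise ValueError("Nothing to upload: items is empty")
    dataset = client.get_or_create_dataset(name=dataset_name, description=description)
    dataset.insert(items)
    return dataset
```

Typical usage, once Opik is configured: client = opik.Opik(), then upload_items(client, "gretel-qa-dataset", items, "Synthetic Q&A from Gretel").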
Method 2: Using Gretel Safe Synthetics
Prepare Sample Data
If you have an existing Q&A dataset, you can use Safe Synthetics to create a synthetic version:
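The original sample data is not reproduced here. As a stand-in, the sketch below builds placeholder rows with the same assumed columns (question, answer, topic, difficulty). It produces 200 rows by default because the troubleshooting notes in this guide state that Safe Synthetics needs at least 200 records for holdout validation. The content is filler, not a realistic dataset.

```python
import random


def build_sample_rows(n=200, seed=0):
    """Return n placeholder Q&A rows; replace with your real dataset."""
    rng = random.Random(seed)  # seeded so the sample is reproducible
    topics = ["machine learning", "statistics", "data privacy"]
    levels = ["beginner", "intermediate", "advanced"]
    rows = []
    for i in range(n):
        topic = rng.choice(topics)
        rows.append({
            "question": f"Sample question {i} about {topic}?",
            "answer": f"Sample answer {i} about {topic}.",
            "topic": topic,
            "difficulty": rng.choice(levels),
        })
    return rows
```

Wrap the result in pandas.DataFrame(rows) if the downstream step expects a DataFrame.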
Generate Synthetic Version
Use Safe Synthetics to create a privacy-safe version of your dataset:
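A sketch of the synthesis step with an up-front size check. The Gretel call chain (safe_synthetics.from_data_source, synthesize, create, wait_until_done, dataset.df) is an assumption based on Gretel's published examples and should be verified against the Safe Synthetics documentation. The 200-record minimum comes from the troubleshooting notes in this guide. check_min_records and synthesize_dataframe are hypothetical helpers.

```python
MIN_RECORDS = 200  # minimum stated in this guide for holdout validation


def check_min_records(n_rows, minimum=MIN_RECORDS):
    """Fail early, before submitting a job that would be rejected."""
    if n_rows < minimum:
        raise ValueError(f"Need at least {minimum} records, got {n_rows}")
    return n_rows


def synthesize_dataframe(df):
    """Return a synthetic version of df (all Gretel calls are assumed)."""
    from gretel_client.navigator_client import Gretel

    check_min_records(len(df))
    gretel = Gretel()  # assumed to pick up GRETEL_API_KEY from the environment
    job = gretel.safe_synthetics.from_data_source(df).synthesize().create()
    job.wait_until_done()
    return job.dataset.df
```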
Convert and Upload to Opik
Convert the Safe Synthetics data to Opik format and upload:
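A combined sketch for this step. It skips rows whose question or answer is not a non-empty string, on the assumption that synthetic output can contain blanks or NaN values, then tags each item with its source and inserts the batch. The column names, the source label, and the default dataset name are illustrative; client is expected to be an opik.Opik() instance.

```python
def _is_text(value):
    """True only for non-empty strings (filters out None, NaN, and blanks)."""
    return isinstance(value, str) and bool(value.strip())


def upload_synthetic_rows(client, rows, dataset_name="gretel-safe-synthetics-qa"):
    """Convert usable rows to Opik items, insert them, and return the count."""
    items = [
        {
            "input": {"question": row["question"]},
            "expected_output": row["answer"],
            "metadata": {"source": "gretel_safe_synthetics"},
        }
        for row in rows
        if _is_text(row.get("question")) and _is_text(row.get("answer"))
    ]
    if items:
        dataset = client.get_or_create_dataset(name=dataset_name)
        dataset.insert(items)
    return len(items)
```

The returned count makes it easy to notice when many synthetic rows were dropped.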
Using with the @track decorator
Use the @track decorator to create comprehensive traces when working with your Gretel-generated datasets:
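A minimal sketch of nested tracing. track is imported from opik; the no-op fallback is an addition that only keeps the example runnable where opik is not installed. answer_question is a hypothetical placeholder for a real model call. Calling one tracked function from another is what produces nested spans within a single trace.

```python
try:
    from opik import track
except ImportError:
    # Fallback: a no-op decorator so the sketch runs without opik installed.
    def track(fn=None, **_kwargs):
        if fn is None:
            return lambda f: f
        return fn


@track
def answer_question(question):
    """Placeholder for a real LLM call (hypothetical)."""
    return f"Answer to: {question}"


@track
def run_on_items(items):
    """Answer every dataset item; each inner call becomes a child span."""
    results = []
    for item in items:
        question = item["input"]["question"]
        results.append({"question": question, "output": answer_question(question)})
    return results
```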
Viewing Results
Once your Gretel-generated datasets are uploaded to Opik, you can view them in the Opik UI. Each dataset will contain:
- Input questions and expected answers
- Metadata including topic and difficulty levels
- Source information (Data Designer or Safe Synthetics)
- Quality metrics and evaluation results
Feedback Scores and Evaluation
Once your Gretel-generated datasets are in Opik, you can evaluate your LLM applications using Opik’s evaluation framework:
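A minimal evaluation sketch using evaluate from opik.evaluation and the Equals metric. It assumes dataset items shaped as input.question plus expected_output. answer_question is a hypothetical stand-in for your application, and the experiment name is illustrative. Exact-match scoring is rarely a good fit for free-text answers; it is used here only because it needs no LLM judge.

```python
def answer_question(question):
    """Placeholder for the application under test (hypothetical)."""
    return f"Placeholder answer to: {question}"


def evaluation_task(item):
    """Map one dataset item to the output that the metrics will score."""
    return {"output": answer_question(item["input"]["question"])}


def run_evaluation(dataset, experiment_name="gretel-qa-eval"):
    """Run an Opik experiment over `dataset` (an Opik Dataset object)."""
    from opik.evaluation import evaluate
    from opik.evaluation.metrics import Equals

    return evaluate(
        dataset=dataset,
        task=evaluation_task,
        scoring_metrics=[Equals()],
        experiment_name=experiment_name,
        # Equals compares `output` with `reference`; map it to our item key.
        scoring_key_mapping={"reference": "expected_output"},
    )
```

The dataset argument can be fetched with opik.Opik().get_dataset(name=...) using the name you uploaded under.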
Dataset Size Requirements
When to Use Which Approach?
Environment Variables
Make sure to set the following environment variables:
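The original list of variables is not reproduced here. The sketch below checks for the names each SDK conventionally reads, which is an assumption: GRETEL_API_KEY for Gretel, and OPIK_API_KEY and OPIK_WORKSPACE for the hosted Opik platform. A local Opik deployment may not need the two Opik variables. missing_env is a hypothetical helper.

```python
import os

# Assumed variable names; trim the tuple for self-hosted Opik.
REQUIRED_ENV = ("GRETEL_API_KEY", "OPIK_API_KEY", "OPIK_WORKSPACE")


def missing_env(names=REQUIRED_ENV, environ=None):
    """Return the names that are unset or empty in the environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


missing = missing_env()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
```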
Troubleshooting
Common Issues
- Authentication Errors: Ensure your Gretel API key is correct and has the necessary permissions
- Dataset Size: Safe Synthetics requires at least 200 records for holdout validation
- Model Suite: Ensure you’re using a compatible model suite (e.g., “apache-2.0”)
- Rate Limiting: Gretel may have rate limits; implement appropriate retry logic
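For the rate-limiting case, one common form of retry logic is exponential backoff with jitter. The sketch below is generic and not tied to any Gretel API: with_retries is a hypothetical helper, and the exception types to retry on should be narrowed to whatever your client raises for rate-limit responses.

```python
import random
import time


def with_retries(fn, max_attempts=5, base_delay=1.0,
                 retry_on=(Exception,), sleep=time.sleep):
    """Call fn(), retrying with exponential backoff plus jitter on failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt: base, 2x base, 4x base, ...
            delay = base_delay * 2 ** (attempt - 1)
            # Jitter spreads out retries from concurrent callers.
            sleep(delay + random.uniform(0, delay / 2))
```

Wrap only the network call, for example with_retries(lambda: submit_job(df)), where submit_job is your own function.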
Getting Help
- Check the Gretel Documentation for detailed API information
- Review the Gretel Data Designer Guide for advanced usage
- Contact Gretel support for API-specific problems
- Check Opik documentation for tracing and evaluation features
Next Steps
Once you have Gretel integrated with Opik, you can:
- Evaluate your LLM applications using Opik’s evaluation framework
- Create datasets to test and improve your models
- Set up feedback collection to gather human evaluations
- Monitor performance across different models and configurations