# Datasets and testing

## Overview

Datasets and testing are central to getting good results from the Opik Optimizer. This section outlines the requirements for datasets and the testing methodologies used to ensure optimal performance.
## Dataset Requirements

### Format

Datasets should be structured as a list of dictionaries with the following format:
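A minimal sketch of the expected shape (the `input` and `expected_output` key names are illustrative, not a fixed schema; use whatever fields your prompt and metric reference):

```python
# Each record is a plain dictionary; key names here are illustrative.
dataset = [
    {"input": "What is the capital of France?", "expected_output": "Paris"},
    {"input": "Translate 'hello' to Spanish.", "expected_output": "hola"},
]

assert isinstance(dataset, list)
assert all(isinstance(record, dict) for record in dataset)
```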
### Size Recommendations

- **Minimum**: 50 examples
  - Provides basic coverage for optimization.
  - Suitable for simple use cases.
- **Optimal**: 100-500 examples
  - Better representation of real-world scenarios.
  - More reliable optimization results.
- **Maximum**: context window dependent
  - Limited by the model's maximum context length.
  - Consider token limits when preparing data.
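When checking context-window fit, a rough heuristic of about 4 characters per token (an assumption; exact counts depend on the model's tokenizer) can flag oversized datasets before a run:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: about 4 characters per token (heuristic only)."""
    return max(1, len(text) // 4)


def dataset_token_estimate(dataset: list[dict]) -> int:
    """Sum the rough estimate over every field of every record."""
    return sum(
        estimate_tokens(str(value))
        for record in dataset
        for value in record.values()
    )


sample = [{"input": "What is the capital of France?", "output": "Paris"}]
total = dataset_token_estimate(sample)  # 7 + 1 = 8 with this heuristic
```

For precise counts, use the tokenizer that matches your model instead of this heuristic.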
### Quality Guidelines

1. **Diversity**
   - Cover various scenarios and edge cases.
   - Include different types of inputs.
   - Represent real-world usage patterns.
2. **Accuracy**
   - Ensure ground-truth outputs are correct.
   - Validate all examples.
   - Maintain consistency in formatting.
3. **Representation**
   - Balance different types of examples.
   - Include both simple and complex cases.
   - Cover common and edge cases.
## Testing Methodology

### Evaluation Process

1. **Dataset Splitting**
   - Training set (80%)
   - Validation set (20%)
   - Optional test set for final evaluation
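The 80/20 split can be sketched as a deterministic shuffle-and-cut (a plain-Python sketch; the seed and ratio shown are illustrative defaults):

```python
import random


def split_dataset(dataset: list[dict], train_ratio: float = 0.8, seed: int = 42):
    """Shuffle deterministically, then split into train / validation lists."""
    records = list(dataset)
    random.Random(seed).shuffle(records)  # seeded for reproducible splits
    cut = int(len(records) * train_ratio)
    return records[:cut], records[cut:]


data = [{"input": f"example {i}", "output": str(i)} for i in range(100)]
train, validation = split_dataset(data)  # 80 and 20 records respectively
```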
2. **Metrics**
   - Accuracy
   - Precision
   - Recall
   - F1 score
   - Custom metrics (if applicable)
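For binary pass/fail judgments, the listed metrics reduce to counts over prediction/reference pairs (a minimal sketch; real LLM evaluations often need fuzzier, task-specific metrics):

```python
def binary_metrics(predictions: list[bool], references: list[bool]) -> dict:
    """Accuracy, precision, recall, and F1 for binary outcomes."""
    tp = sum(p and r for p, r in zip(predictions, references))
    fp = sum(p and not r for p, r in zip(predictions, references))
    fn = sum(not p and r for p, r in zip(predictions, references))
    correct = sum(p == r for p, r in zip(predictions, references))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": correct / len(predictions),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


scores = binary_metrics([True, True, False, False], [True, False, False, True])
# One true positive, one false positive, one false negative, one true negative.
```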
3. **Baseline Comparison**
   - Compare against unoptimized prompts.
   - Measure the improvement percentage.
   - Document performance changes.
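The improvement percentage over an unoptimized baseline is worth pinning down precisely (a sketch; scores are assumed to be higher-is-better):

```python
def improvement_pct(baseline_score: float, optimized_score: float) -> float:
    """Relative improvement of the optimized prompt over the baseline, in %."""
    if baseline_score == 0:
        raise ValueError("baseline score must be non-zero")
    return (optimized_score - baseline_score) / baseline_score * 100


# e.g. baseline accuracy 0.60 -> optimized 0.75 is a 25% relative improvement
delta = improvement_pct(0.60, 0.75)
```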
## Running Tests
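As a stand-in for a full run, here is a plain-Python harness that wires the validation split, a metric, and the baseline comparison together. Everything here is a hypothetical sketch, not the Opik Optimizer API: `toy_score` is an echo stub that you would replace with real model calls.

```python
def run_test(score_fn, baseline_prompt, candidate_prompt, validation_set):
    """Score both prompts on the validation set and report the delta."""
    baseline = score_fn(baseline_prompt, validation_set)
    candidate = score_fn(candidate_prompt, validation_set)
    change = ((candidate - baseline) / baseline * 100) if baseline else 0.0
    return {"baseline": baseline, "candidate": candidate,
            "improvement_pct": change}


# Echo stub: scores a prompt by how many expected answers it contains.
# A real score_fn would call the model once per record instead.
def toy_score(prompt, validation_set):
    hits = sum(
        1 for record in validation_set
        if record["expected_output"] in prompt
    )
    return hits / len(validation_set)


validation = [
    {"input": "capital of France?", "expected_output": "Paris"},
    {"input": "capital of Japan?", "expected_output": "Tokyo"},
]
report = run_test(toy_score, "Answer briefly. Example: Paris.",
                  "Answer briefly. Examples: Paris, Tokyo.", validation)
```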
## Optimization Process

### How Optimization Works

1. **Data Utilization**
   - Input data is used to understand patterns.
   - Output data guides prompt refinement.
   - Examples inform few-shot learning.
2. **Iterative Improvement**
   - Multiple trials per iteration.
   - Performance-based refinement.
   - Continuous evaluation.
3. **Validation**
   - Regular testing against the validation set.
   - Performance monitoring.
   - Early stopping if needed.
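The early stopping mentioned above can be sketched as a patience counter over validation scores (a generic sketch, not Opik's internal logic; `patience=3` is an illustrative default):

```python
def should_stop(history: list[float], patience: int = 3) -> bool:
    """Stop when the best score has not improved for `patience` rounds."""
    if len(history) <= patience:
        return False
    best_so_far = max(history[:-patience])
    recent_best = max(history[-patience:])
    return recent_best <= best_so_far


history = [0.50, 0.62, 0.64, 0.63, 0.64, 0.61]
stop = should_stop(history)  # no improvement in the last 3 rounds -> True
```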
## Best Practices

1. **Dataset Preparation**
   - Clean and preprocess the data.
   - Ensure consistent formatting.
   - Remove duplicates and noise.
2. **Testing Strategy**
   - Use an appropriate validation split.
   - Monitor for overfitting.
   - Document all experiments.
3. **Results Analysis**
   - Track performance metrics.
   - Compare against baselines.
   - Identify improvement patterns.
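Removing duplicates and noise during dataset preparation can be as simple as normalizing and de-duplicating on one field (a sketch; the `input` key and the whitespace/case normalization rule are assumptions about your schema):

```python
def deduplicate(dataset: list[dict], key: str = "input") -> list[dict]:
    """Drop records whose normalized `key` field was already seen."""
    seen: set[str] = set()
    cleaned = []
    for record in dataset:
        normalized = " ".join(str(record[key]).split()).lower()
        if normalized and normalized not in seen:  # also drops empty inputs
            seen.add(normalized)
            cleaned.append(record)
    return cleaned


raw = [
    {"input": "What is 2+2?", "output": "4"},
    {"input": "  what is 2+2? ", "output": "4"},  # duplicate after cleanup
    {"input": "", "output": "n/a"},               # noise: empty input
]
clean = deduplicate(raw)  # only the first record survives
```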
## Common Questions

### How many records should I use?

- Start with at least 50 examples.
- Aim for 100-500 for optimal results.
- Consider your specific use-case requirements.

### Does the algorithm use input and output data?

Yes, the optimization process:

- Uses input data to understand patterns.
- Leverages output data for refinement.
- Combines both for effective optimization.
### How is the data used in optimization?

1. **Pattern Recognition**
   - Analyze input-output relationships.
   - Identify successful patterns.
   - Learn from examples.
2. **Prompt Refinement**
   - Generate improved prompts.
   - Test against validation data.
   - Iteratively optimize.
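As a concrete illustration of how examples inform few-shot learning, dataset records can be folded directly into a prompt (the field names and template below are illustrative, not Opik's internal format):

```python
def build_few_shot_prompt(instruction: str, examples: list[dict],
                          query: str) -> str:
    """Prepend worked input/output examples to the instruction."""
    shots = "\n\n".join(
        f"Input: {ex['input']}\nOutput: {ex['expected_output']}"
        for ex in examples
    )
    return f"{instruction}\n\n{shots}\n\nInput: {query}\nOutput:"


prompt = build_few_shot_prompt(
    "Answer with the capital city only.",
    [{"input": "France", "expected_output": "Paris"}],
    "Japan",
)
```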
## Next Steps

- Check the FAQ for more questions and troubleshooting tips.
- Explore the API Reference for technical details.