Datasets and testing

Overview

High-quality datasets and rigorous testing are critical to the Opik Optimizer’s effectiveness. This section outlines dataset requirements and testing methodologies to help you achieve optimal performance.

Dataset Requirements

Format

Datasets should be structured as a list of dictionaries with the following format:

dataset = [
    {
        "input": "Your input text or query",
        "output": "Expected output or response"
    },
    # ... more examples
]

Size Recommendations

  • Minimum: 50 examples
    • Provides basic coverage for optimization.
    • Suitable for simple use cases.
  • Optimal: 100-500 examples
    • Better representation of real-world scenarios.
    • More reliable optimization results.
  • Maximum: Context window dependent
    • Limited by the model’s maximum context length.
    • Consider token limits when preparing data (a rough estimation sketch follows this list).
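
Exact token counts depend on your model’s tokenizer, but a rough character-based heuristic is enough to flag datasets that risk exceeding the context window. In the sketch below, the four-characters-per-token ratio and the estimate_tokens helper are illustrative assumptions, not Opik utilities; for exact counts, use the tokenizer that matches your model.

def estimate_tokens(text: str) -> int:
    # ~4 characters per token is a rough heuristic for English text
    return max(1, len(text) // 4)

def dataset_token_estimate(dataset: list) -> int:
    # Sum estimated input and output tokens across all examples
    return sum(
        estimate_tokens(ex["input"]) + estimate_tokens(ex["output"])
        for ex in dataset
    )

dataset = [
    {"input": "Your input text or query", "output": "Expected output or response"},
]
print(f"Estimated dataset tokens: {dataset_token_estimate(dataset)}")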

Quality Guidelines

  1. Diversity

    • Cover various scenarios and edge cases.
    • Include different types of inputs.
    • Represent real-world usage patterns.
  2. Accuracy

    • Ensure ground truth outputs are correct.
    • Validate every example before optimization (see the validation sketch after this list).
    • Maintain consistency in formatting.
  3. Representation

    • Balance different types of examples.
    • Include both simple and complex cases.
    • Cover common and edge cases.
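
A minimal validation pass for the list-of-dictionaries format shown earlier might look like the sketch below; the validate_dataset helper and its rules are illustrative, not part of the Opik API.

def validate_dataset(dataset: list) -> list:
    # Collect human-readable problems instead of failing on the first one
    errors = []
    for i, example in enumerate(dataset):
        for key in ("input", "output"):
            value = example.get(key)
            if not isinstance(value, str) or not value.strip():
                errors.append(f"Example {i}: missing or empty '{key}'")
    return errors

dataset = [
    {"input": "What is 2 + 2?", "output": "4"},
    {"input": "", "output": "Expected output"},  # flagged: empty input
]
problems = validate_dataset(dataset)
if problems:
    print("\n".join(problems))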

Testing Methodology

Evaluation Process

  1. Dataset Splitting

    • Training set (80%)
    • Validation set (20%)
    • Optional held-out test set for final evaluation, set aside before splitting (see the splitting sketch after this list)
  2. Metrics

    • Accuracy
    • Precision
    • Recall
    • F1 Score
    • Custom metrics (if applicable)
  3. Baseline Comparison

    • Compare against unoptimized prompts.
    • Measure the improvement percentage (computed in the sketch after this list).
    • Document performance changes.
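
The sketch below shows one way to perform the 80/20 split and compute the improvement percentage over an unoptimized baseline; the train_val_split helper, the seed, and the placeholder scores are illustrative assumptions, not Opik APIs.

import random

def train_val_split(dataset: list, val_fraction: float = 0.2, seed: int = 42):
    # Shuffle a copy so the original dataset order is untouched
    examples = dataset[:]
    random.Random(seed).shuffle(examples)
    cut = int(len(examples) * (1 - val_fraction))
    return examples[:cut], examples[cut:]

dataset = [{"input": f"question {i}", "output": f"answer {i}"} for i in range(100)]
training_data, validation_data = train_val_split(dataset)
print(len(training_data), len(validation_data))  # 80 20

# Improvement percentage relative to the unoptimized baseline
baseline_score, optimized_score = 0.62, 0.78  # placeholder metric values
improvement = (optimized_score - baseline_score) / baseline_score * 100
print(f"Improvement: {improvement:.1f}%")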

Running Tests

from opik.optimizer import FewShotOptimizer

# Initialize the optimizer
optimizer = FewShotOptimizer(
    model="openai/gpt-4",
    project_name="test-project"
)

# Run optimization with testing
results = optimizer.optimize(
    dataset=training_data,
    validation_data=validation_data,
    num_trials=10,
    metrics=["accuracy", "f1"]
)

# Access test results
print(f"Training Accuracy: {results.training_accuracy}")
print(f"Validation Accuracy: {results.validation_accuracy}")
print(f"Improvement: {results.improvement_percentage}%")

Optimization Process

How Optimization Works

  1. Data Utilization

    • Input data is used to understand patterns.
    • Output data guides prompt refinement.
    • Examples inform few-shot learning.
  2. Iterative Improvement

    • Multiple trials per iteration.
    • Performance-based refinement.
    • Continuous evaluation.
  3. Validation

    • Regular testing against validation set.
    • Performance monitoring.
    • Early stopping when validation performance plateaus (sketched after this list).
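
To illustrate the loop described above, here is a minimal sketch of trial-based refinement with patience-based early stopping; propose_prompt and evaluate_on_validation are hypothetical stand-ins for the optimizer’s internals, not Opik functions.

import random

def propose_prompt(trial: int) -> str:
    # Hypothetical stand-in for the optimizer's candidate generation
    return f"candidate prompt #{trial}"

def evaluate_on_validation(prompt: str) -> float:
    # Hypothetical stand-in for scoring a prompt on the validation set
    return random.random()

best_score, stale, patience = float("-inf"), 0, 3
for trial in range(10):
    score = evaluate_on_validation(propose_prompt(trial))
    if score > best_score:
        best_score, stale = score, 0
    else:
        stale += 1
    if stale >= patience:
        # Stop early: no improvement in the last `patience` trials
        print(f"Early stop at trial {trial}; best score {best_score:.3f}")
        break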

Best Practices

  1. Dataset Preparation

    • Clean and preprocess data.
    • Ensure consistent formatting.
    • Remove duplicates and noise (see the cleanup sketch after this list).
  2. Testing Strategy

    • Use appropriate validation split.
    • Monitor overfitting.
    • Document all experiments.
  3. Results Analysis

    • Track performance metrics.
    • Compare against baselines.
    • Identify improvement patterns.
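
A cleanup pass for the duplicate-and-noise step might look like the sketch below; the normalization rules (whitespace stripping, case-insensitive deduplication on input) are illustrative choices, not Opik defaults.

def clean_dataset(dataset: list) -> list:
    # Strip whitespace, drop empty examples, and dedupe on normalized input
    seen, cleaned = set(), []
    for example in dataset:
        item = {k: v.strip() for k, v in example.items()}
        key = item["input"].lower()
        if item["input"] and item["output"] and key not in seen:
            seen.add(key)
            cleaned.append(item)
    return cleaned

raw = [
    {"input": " What is 2 + 2? ", "output": "4"},
    {"input": "what is 2 + 2?", "output": "4"},  # duplicate after normalization
]
print(clean_dataset(raw))  # one example remains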

Common Questions

How many records should I use?

  • Start with at least 50 examples.
  • Aim for 100-500 for optimal results.
  • Consider your specific use case requirements.

Does the algorithm use input and output data?

Yes, the optimization process:

  • Uses input data to understand patterns.
  • Leverages output data for refinement.
  • Combines both for effective optimization.

How is the data used in optimization?

  1. Pattern Recognition

    • Analyze input-output relationships.
    • Identify successful patterns.
    • Learn from examples.
  2. Prompt Refinement

    • Generate improved prompts (a minimal few-shot assembly sketch follows this list).
    • Test against validation data.
    • Iteratively optimize.
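
To make the few-shot mechanics concrete, the sketch below assembles input/output pairs into a simple few-shot prompt; the template and the build_few_shot_prompt helper are illustrative assumptions, not the format Opik actually generates.

def build_few_shot_prompt(examples: list, query: str) -> str:
    # Join each example as an Input/Output pair, then append the new query
    shots = "\n\n".join(
        f"Input: {ex['input']}\nOutput: {ex['output']}" for ex in examples
    )
    return f"{shots}\n\nInput: {query}\nOutput:"

examples = [
    {"input": "What is 2 + 2?", "output": "4"},
    {"input": "What is 3 + 5?", "output": "8"},
]
print(build_few_shot_prompt(examples, "What is 7 + 6?"))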

Next Steps

  • Check the FAQ for more questions and troubleshooting tips.
  • Explore the API Reference for technical details.