Few-Shot Bayesian Optimizer

Optimize few-shot examples for chat prompts with Bayesian techniques.

The FewShotBayesianOptimizer is a sophisticated prompt optimization tool that combines few-shot learning with Bayesian optimization techniques. It’s designed to iteratively improve prompts by learning from examples and systematically exploring the optimization space, specifically for chat-based prompts.

When to Use This Optimizer: Use FewShotBayesianOptimizer when your primary goal is to find the optimal number and combination of few-shot examples (demonstrations) to accompany your main instruction prompt, particularly for chat models. If your task performance heavily relies on the quality and relevance of in-context examples, this optimizer is ideal.

Key Trade-offs:

  • Focuses exclusively on selecting examples from the provided dataset; it does not optimize the prompt text itself.
  • The search space can be very large if the dataset for example selection is massive; n_iterations needs to be sufficient for adequate exploration.
  • Effectiveness depends on the quality and diversity of the examples within your dataset.

Have questions about FewShotBayesianOptimizer? Our Optimizer & SDK FAQ answers common questions, including when to use this optimizer, how parameters like min_examples and n_iterations work, and whether it modifies the main instruction prompt.

How It Works

This optimizer focuses on finding the optimal set and number of few-shot examples to include with your base instruction prompt for chat models. It uses Optuna, a hyperparameter optimization framework, to guide this search:

  1. Problem Definition: The core task is to select the best combination of few-shot examples from your provided dataset to append to the prompt. The number of examples to select can range from min_examples to max_examples.

  2. Few-Shot Prompt Template (Optional): As a first step, the optimizer updates the prompt to include a few-shot examples placeholder and creates a template for rendering each example.

  3. Search Space: For each trial, Optuna suggests:

    • n_examples: The number of few-shot examples to use for this particular trial (within the min_examples and max_examples range).
    • example_indices: A list of specific indices pointing to items in your dataset that will serve as these n_examples.
  4. Prompt Construction: For each trial, the few-shot example placeholder is replaced with the dataset items Optuna selected for that trial.

  5. Evaluation: This constructed few-shot prompt is then evaluated against a portion (or all) of your dataset using the specified metric. The evaluation measures how well this particular combination of few-shot examples helps the model perform the task.

  6. Bayesian Optimization (via Optuna):

    • Optuna uses a Bayesian optimization algorithm (typically TPESampler - Tree-structured Parzen Estimator Sampler) to decide which combinations of n_examples and example_indices to try next.
    • It builds a probabilistic model of how different example choices affect the evaluation score and uses this model to balance exploration (trying new, uncertain combinations) and exploitation (focusing on combinations likely to yield high scores).
    • This process is repeated for n_iterations (number of trials).
  7. Result: After all trials, the optimizer returns the prompt containing the combination of few-shot examples that achieved the highest score.

Essentially, the FewShotBayesianOptimizer intelligently searches through the vast space of possible few-shot example combinations to find the set that best primes your chat model for the task.
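To make the search loop concrete, here is a minimal, self-contained sketch of the kind of Optuna study described above. The scoring function is a toy placeholder (the real optimizer builds a few-shot prompt from the selected dataset items and evaluates it with your metric), and the constants simply mirror min_examples, max_examples, and n_iterations; this illustrates the pattern, not the optimizer’s actual implementation.

import optuna

MIN_EXAMPLES, MAX_EXAMPLES = 2, 8   # mirrors min_examples / max_examples
DATASET_SIZE = 300                  # hypothetical number of candidate examples

def objective(trial: optuna.Trial) -> float:
    # Optuna suggests how many few-shot examples to use for this trial...
    n_examples = trial.suggest_int("n_examples", MIN_EXAMPLES, MAX_EXAMPLES)
    # ...and which dataset items should serve as those examples.
    example_indices = [
        trial.suggest_int(f"example_{i}", 0, DATASET_SIZE - 1)
        for i in range(n_examples)
    ]
    # Toy stand-in for "build the few-shot prompt and score it with your metric":
    # here we simply reward trials that pick items from the first 50 indices.
    return sum(1.0 for i in example_indices if i < 50) / n_examples

study = optuna.create_study(
    direction="maximize",
    sampler=optuna.samplers.TPESampler(seed=42),  # Bayesian (TPE) sampling
)
study.optimize(objective, n_trials=20)  # plays the role of n_iterations
print(study.best_params)                # best n_examples and chosen indices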

The core of this optimizer relies on robust evaluation (Step 5), where each combination of few-shot examples is assessed using your metric against the dataset. Understanding Opik’s evaluation platform is key to using it effectively.
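In practice, a metric here is simply a Python function that scores a single model output against a dataset item, as in the Example Usage section below. For illustration only (this exact-match logic is not an Opik built-in metric):

def exact_match(dataset_item, llm_output):
    # Score 1.0 when the model output matches the reference answer exactly, else 0.0.
    return 1.0 if llm_output.strip() == dataset_item["answer"].strip() else 0.0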

Configuration Options

Basic Configuration

from opik_optimizer import FewShotBayesianOptimizer

optimizer = FewShotBayesianOptimizer(
    model="openai/gpt-4",  # or "azure/gpt-4"
    temperature=0.1,
    max_tokens=5000,
    n_threads=8,
    seed=42
)

Advanced Configuration

The FewShotBayesianOptimizer relies on the Optuna library, specifically using its TPESampler for Bayesian optimization. The primary configurable parameters are those exposed directly in its constructor:

  • model: The LLM used for evaluating prompts with different few-shot example combinations.
  • min_examples / max_examples: Define the range for the number of few-shot examples to be selected by Optuna during optimization.
  • n_iterations: The total number of optimization iterations (trials) Optuna will run.
  • n_initial_prompts: The number of initial random configurations Optuna will evaluate before starting its TPE sampling.

LLM call parameters (e.g., temperature, max_tokens, seed) are passed via keyword arguments (**model_kwargs) to the constructor.
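Putting these together, a more fully specified constructor call might look like the sketch below. The parameter names come from the list above and the Basic Configuration example; the specific values are placeholders, and defaults may differ between versions.

from opik_optimizer import FewShotBayesianOptimizer

optimizer = FewShotBayesianOptimizer(
    model="openai/gpt-4",
    min_examples=2,      # lower bound on the number of few-shot examples
    max_examples=8,      # upper bound on the number of few-shot examples
    n_iterations=10,     # number of Optuna trials
    n_threads=8,
    # LLM call parameters forwarded via **model_kwargs:
    temperature=0.1,
    max_tokens=5000,
    seed=42
)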

The optimizer works by letting Optuna suggest the number of examples (n_examples) and which specific examples from the dataset to use for constructing few-shot prompts. These prompts are then evaluated, and Optuna uses the scores to guide its search for the best combination of examples and their count.

Example Usage

from opik_optimizer import FewShotBayesianOptimizer
from opik.evaluation.metrics import LevenshteinRatio
from opik_optimizer import datasets, ChatPrompt

# Initialize optimizer
optimizer = FewShotBayesianOptimizer(
    model="openai/gpt-4",
    temperature=0.1,
    max_tokens=5000
)

# Prepare dataset
dataset = datasets.hotpot_300()

# Define metric and prompt (see docs for more options)
def levenshtein_ratio(dataset_item, llm_output):
    return LevenshteinRatio().score(reference=dataset_item["answer"], output=llm_output)

prompt = ChatPrompt(
    project_name="my-project",
    messages=[
        {"role": "system", "content": "Provide an answer to the question."},
        {"role": "user", "content": "{question}"}
    ]
)

# Run optimization
# Note: older examples pass a `num_trials` argument; for the current
# FewShotBayesianOptimizer the trial count (`n_iterations`) is set at
# initialization, and `optimize_prompt` does not take it.
results = optimizer.optimize_prompt(
    prompt=prompt,
    dataset=dataset,
    metric=levenshtein_ratio
)

# Access results
results.display()

Model Support

The FewShotBayesianOptimizer supports all models available through LiteLLM. This provides broad compatibility with providers like OpenAI, Azure OpenAI, Anthropic, Google, and many others, including locally hosted models.

For detailed instructions on how to specify different models and configure providers, please refer to the main LiteLLM Support for Optimizers documentation page.

Configuration Example using LiteLLM model string

optimizer = FewShotBayesianOptimizer(
    model="openai/gpt-4o-mini",  # Example: Using OpenAI via LiteLLM
    temperature=0.1,
    max_tokens=5000
)
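Other providers follow the same pattern. The model strings below follow standard LiteLLM naming conventions, but verify the exact identifiers available to you against the LiteLLM documentation.

# Anthropic via LiteLLM
optimizer = FewShotBayesianOptimizer(
    model="anthropic/claude-3-opus-20240229",
    temperature=0.1,
    max_tokens=5000
)

# Locally hosted model served by Ollama
optimizer = FewShotBayesianOptimizer(
    model="ollama/llama3",
    temperature=0.1,
    max_tokens=5000
)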

Best Practices

  1. Dataset Preparation

    • Minimum 50 examples recommended
    • Diverse and representative samples
    • Clear input-output pairs
  2. Parameter Tuning

    • Start with default parameters
    • Adjust based on problem complexity
    • Monitor convergence metrics
  3. Evaluation Strategy

    • Use a separate validation set (see the sketch after this list)
    • Track multiple metrics
    • Document optimization history
  4. Performance Optimization

    • Adjust n_threads based on resources
    • Balance min_examples and max_examples
    • Monitor memory usage
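For the validation-set recommendation in item 3, one lightweight approach is to split your raw items before building Opik datasets and optimize only on the first portion. The split below is plain Python, not an Opik API, and the item fields are hypothetical.

import random

# Hypothetical raw items; in practice these come from your own data.
items = [{"question": f"q{i}", "answer": f"a{i}"} for i in range(300)]

random.Random(42).shuffle(items)
split = int(0.8 * len(items))
optimization_items, validation_items = items[:split], items[split:]

# Build the dataset passed to optimize_prompt from `optimization_items`,
# then score the returned best prompt on `validation_items` afterwards.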

Research and References

Next Steps