Parameter Optimizer: Bayesian Parameter Tuning

Optimize LLM parameters like temperature and top_p with Bayesian techniques.

The ParameterOptimizer uses Bayesian optimization to tune LLM call parameters such as temperature, top_p, frequency_penalty, and other sampling parameters. Unlike other optimizers that modify the prompt itself, this optimizer keeps your prompt unchanged and focuses solely on finding the best parameter configuration for your specific task.

When to Use: Optimize LLM parameters (temperature, top_p) without changing your prompt. Best when you have a good prompt but need to tune model behavior.

Key Trade-offs: Requires defining a parameter search space; doesn’t modify prompt text; uses two-phase Bayesian search.

Have questions about ParameterOptimizer? Our Optimizer & SDK FAQ answers common questions, including when to use this optimizer, how parameters like default_n_trials and local_search_ratio work, and how to define custom parameter search spaces.

How It Works

This optimizer uses Optuna, a hyperparameter optimization framework, to search for the best LLM parameters:

  1. Baseline Evaluation: First evaluates your prompt with its current parameters (or default parameters) to establish a baseline score.

  2. Parameter Space Definition: You define which parameters to optimize and their valid ranges using a ParameterSearchSpace. For example:

    • temperature: float between 0.0 and 2.0
    • top_p: float between 0.0 and 1.0
    • frequency_penalty: float between -2.0 and 2.0
  3. Global Search Phase:

    • Optuna explores the full parameter space using Bayesian optimization (TPESampler by default).
    • Tries various parameter combinations to find promising regions.
    • Evaluates each combination against your dataset using the specified metric.
  4. Local Search Phase (optional):

    • After global search, focuses on the best parameter region found.
    • Performs fine-grained optimization around the best parameters.
    • Controlled by local_search_ratio and local_search_scale (a sketch of both phases follows this list).
  5. Parameter Importance Analysis:

    • Calculates which parameters had the most impact on performance.
    • Uses FANOVA importance (requires scikit-learn) or falls back to correlation-based sensitivity analysis; a rough sketch of the fallback appears below.
  6. Result: Returns the best parameter configuration found, along with detailed optimization history and parameter importance rankings.
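
To make phases 3 and 4 concrete, here is a minimal, self-contained sketch of the same two-phase pattern written directly against Optuna. The toy evaluate function and the exact range-narrowing rule are assumptions for illustration, not the optimizer's internals:

import optuna

SEARCH_SPACE = {"temperature": (0.0, 2.0), "top_p": (0.1, 1.0)}

def evaluate(params: dict) -> float:
    # Toy stand-in. In the real optimizer, this step runs your prompt
    # against the dataset with `params` and averages your metric.
    return 1.0 - abs(params["temperature"] - 0.7) - abs(params["top_p"] - 0.9)

def objective(trial: optuna.Trial, space: dict) -> float:
    params = {name: trial.suggest_float(name, low, high)
              for name, (low, high) in space.items()}
    return evaluate(params)

n_trials, local_search_ratio, local_search_scale = 30, 0.3, 0.2
n_local = int(n_trials * local_search_ratio)

# Phase 1: global search over the full space with TPE.
global_study = optuna.create_study(
    direction="maximize", sampler=optuna.samplers.TPESampler(seed=42))
global_study.optimize(lambda t: objective(t, SEARCH_SPACE),
                      n_trials=n_trials - n_local)

# Phase 2: local search in a window around the best global parameters.
# The window width here is an assumed rule: scale times the full range,
# clipped to the original bounds.
best = global_study.best_params
local_space = {
    name: (max(low, best[name] - local_search_scale * (high - low)),
           min(high, best[name] + local_search_scale * (high - low)))
    for name, (low, high) in SEARCH_SPACE.items()
}
local_study = optuna.create_study(
    direction="maximize", sampler=optuna.samplers.TPESampler(seed=42))
local_study.optimize(lambda t: objective(t, local_space), n_trials=n_local)

print(global_study.best_value, local_study.best_value)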

The optimizer intelligently balances exploration (trying diverse parameters) with exploitation (refining promising configurations) to efficiently find optimal settings.

The core of this optimizer is robust evaluation: each parameter configuration is scored with your metric against the dataset. Understanding Opik’s evaluation platform is key to using it effectively.
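
For intuition about the correlation-based fallback mentioned in step 5, here is a rough sketch of the general technique applied to numeric parameters; it is illustrative only, not the library's actual implementation:

import numpy as np

def correlation_importance(trials: list[dict], scores: list[float]) -> dict:
    """Approximate importance as the normalized absolute Pearson correlation
    between each numeric parameter's sampled values and the trial scores."""
    scores_arr = np.asarray(scores, dtype=float)
    raw = {}
    for name in trials[0]:
        values = np.asarray([t[name] for t in trials], dtype=float)
        if np.std(values) == 0 or np.std(scores_arr) == 0:
            raw[name] = 0.0  # constant values carry no signal
        else:
            raw[name] = abs(np.corrcoef(values, scores_arr)[0, 1])
    total = sum(raw.values()) or 1.0
    return {name: value / total for name, value in raw.items()}

trials = [{"temperature": 0.2, "top_p": 0.90},
          {"temperature": 0.7, "top_p": 0.95},
          {"temperature": 1.4, "top_p": 0.80}]
print(correlation_importance(trials, [0.55, 0.71, 0.42]))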

Configuration Options

Basic Configuration

from opik_optimizer import ParameterOptimizer
from opik_optimizer.parameter_optimizer import ParameterSearchSpace

optimizer = ParameterOptimizer(
    model="openai/gpt-4",
    default_n_trials=20,  # Number of optimization trials
    n_threads=4,          # Parallel evaluation threads
    seed=42
)

Advanced Configuration

optimizer = ParameterOptimizer(
    model="openai/gpt-4",
    default_n_trials=50,     # More trials for thorough optimization
    n_threads=8,             # More parallel threads
    local_search_ratio=0.3,  # 30% of trials for local refinement
    local_search_scale=0.2,  # Scale of local search range
    seed=42,
    verbose=1                # Verbosity level (0=off, 1=info, 2=debug)
)

The key parameters are:

  • model: The LLM used for evaluation with different parameter configurations.
  • default_n_trials: Default number of optimization trials (can be overridden in optimize_parameter).
  • n_threads: Number of parallel threads for evaluation (balance with API rate limits).
  • local_search_ratio: Ratio of trials dedicated to local search around best parameters (0.0-1.0).
  • local_search_scale: Scale factor for the local search range (0.0 = no local search, higher = wider range); see the worked example after this list.
  • seed: Random seed for reproducibility.
  • verbose: Logging level (0=warnings only, 1=info, 2=debug).
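
As a concrete reading of local_search_scale (the exact rule is internal to the optimizer, so treat this as an assumption): with temperature bounded by [0.0, 2.0], a best global value of 0.7, and local_search_scale=0.2, the local window spans 0.7 ± 0.2 × (2.0 − 0.0), i.e. roughly [0.3, 1.1], clipped to the original bounds. This matches the narrowing step in the sketch under How It Works.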

Example Usage

Basic Example

from opik_optimizer import ParameterOptimizer, ChatPrompt
from opik_optimizer.parameter_optimizer import ParameterSearchSpace
from opik.evaluation.metrics import LevenshteinRatio
from opik_optimizer import datasets

# Initialize optimizer
optimizer = ParameterOptimizer(
    model="openai/gpt-4o-mini",
    default_n_trials=30,
    n_threads=8,
    seed=42
)

# Prepare dataset
dataset = datasets.hotpot_300()

# Define metric
def levenshtein_ratio(dataset_item, llm_output):
    return LevenshteinRatio().score(reference=dataset_item["answer"], output=llm_output)

# Define prompt (this stays unchanged)
prompt = ChatPrompt(
    project_name="parameter-optimization",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "{question}"}
    ]
)

# Define parameter search space
parameter_space = ParameterSearchSpace(
    parameters=[
        {
            "name": "temperature",
            "distribution": "float",
            "low": 0.0,
            "high": 2.0
        },
        {
            "name": "top_p",
            "distribution": "float",
            "low": 0.1,
            "high": 1.0
        }
    ]
)

# Run optimization
results = optimizer.optimize_parameter(
    prompt=prompt,
    dataset=dataset,
    metric=levenshtein_ratio,
    parameter_space=parameter_space,
    n_samples=100
)

# Access results
results.display()
print(f"Best temperature: {results.details['optimized_parameters']['temperature']}")
print(f"Best top_p: {results.details['optimized_parameters']['top_p']}")
print(f"Parameter importance: {results.details['parameter_importance']}")

Advanced Example with Custom Parameters

# Optimize more parameters, including model selection
parameter_space = ParameterSearchSpace(
    parameters=[
        {
            "name": "temperature",
            "distribution": "float",
            "low": 0.0,
            "high": 1.5,
            "step": 0.05  # Optional: quantize values
        },
        {
            "name": "top_p",
            "distribution": "float",
            "low": 0.5,
            "high": 1.0
        },
        {
            "name": "frequency_penalty",
            "distribution": "float",
            "low": -1.0,
            "high": 1.0
        },
        {
            "name": "presence_penalty",
            "distribution": "float",
            "low": -1.0,
            "high": 1.0
        },
        {
            "name": "model",
            "distribution": "categorical",
            "choices": ["openai/gpt-4o-mini", "openai/gpt-4o", "openai/gpt-4-turbo"]
        }
    ]
)

# Run with more trials and a custom Optuna sampler
import optuna

results = optimizer.optimize_parameter(
    prompt=prompt,
    dataset=dataset,
    metric=levenshtein_ratio,
    parameter_space=parameter_space,
    n_trials=100,  # Override default_n_trials
    n_samples=150,
    sampler=optuna.samplers.TPESampler(seed=42, n_startup_trials=20)
)
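
In this example, n_startup_trials=20 tells Optuna's TPESampler to draw the first 20 trials at random before switching to model-guided suggestions, which helps seed the surrogate model when the search space is large.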

Parameter Search Space

The ParameterSearchSpace defines which parameters to optimize and their valid ranges. It supports:

Float Parameters

{
    "name": "temperature",
    "distribution": "float",
    "low": 0.0,
    "high": 2.0,
    "step": 0.1,  # Optional: quantize to 0.1 increments
    "log": False  # Optional: use log scale for sampling
}

Integer Parameters

{
    "name": "max_tokens",
    "distribution": "int",
    "low": 100,
    "high": 4000,
    "step": 100,  # Optional: sample in steps of 100
    "log": False  # Optional: use log scale
}

Categorical Parameters

{
    "name": "model",
    "distribution": "categorical",
    "choices": ["gpt-4o-mini", "gpt-4o", "claude-3-haiku"]
}

Boolean Parameters

{
    "name": "stream",
    "distribution": "bool"
}
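
For orientation, these four specification types line up naturally with Optuna's sampling API. The mapping below is an assumed illustration of that correspondence, not the library's exact internals:

import optuna

def suggest_from_spec(trial: optuna.Trial, spec: dict):
    """Map a parameter spec dict onto the corresponding Optuna suggest call."""
    dist = spec["distribution"]
    if dist == "float":
        # Note: Optuna forbids combining a step with log=True.
        return trial.suggest_float(spec["name"], spec["low"], spec["high"],
                                   step=spec.get("step"), log=spec.get("log", False))
    if dist == "int":
        return trial.suggest_int(spec["name"], spec["low"], spec["high"],
                                 step=spec.get("step", 1), log=spec.get("log", False))
    if dist == "categorical":
        return trial.suggest_categorical(spec["name"], spec["choices"])
    if dist == "bool":
        return trial.suggest_categorical(spec["name"], [True, False])
    raise ValueError(f"Unknown distribution: {dist}")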

Targeting Nested Parameters

You can optimize nested parameters in model_kwargs:

{
    "name": "model_kwargs.response_format.type",
    "distribution": "categorical",
    "choices": ["text", "json_object"]
}
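
One plausible way to picture how a dotted name resolves into the nested call arguments: split on the dots and walk into the dictionary. The set_nested helper below is hypothetical, for illustration only:

def set_nested(target: dict, dotted_name: str, value) -> None:
    """Assign `value` at a dotted path, creating intermediate dicts as needed."""
    *path, leaf = dotted_name.split(".")
    for key in path:
        target = target.setdefault(key, {})
    target[leaf] = value

call_kwargs = {}
set_nested(call_kwargs, "model_kwargs.response_format.type", "json_object")
# call_kwargs == {"model_kwargs": {"response_format": {"type": "json_object"}}}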

Model Support

The ParameterOptimizer supports all models available through LiteLLM. This provides broad compatibility with providers like OpenAI, Azure OpenAI, Anthropic, Google, and many others, including locally hosted models.

Configuration Example using LiteLLM model string

optimizer = ParameterOptimizer(
    model="openai/gpt-4o-mini",  # Using OpenAI via LiteLLM
    default_n_trials=30,
    n_threads=8
)

Best Practices

  1. Start Simple

    • Begin with 1-2 key parameters (e.g., temperature, top_p)
    • Add more parameters once you understand their impact
    • Optimizing too many parameters at once enlarges the search space and increases the number of trials required
  2. Define Reasonable Ranges

    • Use tighter ranges based on domain knowledge
    • For temperature: 0.0-1.0 for factual tasks, 0.5-1.5 for creative tasks
    • For top_p: 0.8-1.0 for most tasks
  3. Trial Budget

    • Start with 20-30 trials for 2-3 parameters
    • Increase to 50-100 trials for 4+ parameters
    • Monitor convergence and stop early if improvements plateau
  4. Local Search

    • Use local_search_ratio=0.3 (default) for refinement
    • Increase to 0.4-0.5 if the global search finds a good region quickly
    • Decrease to 0.1-0.2 for more exploration
  5. Parallel Evaluation

    • Set n_threads based on API rate limits
    • More threads = faster optimization but may hit limits
    • Balance speed with cost and rate limit constraints
  6. Parameter Importance

    • Check parameter_importance in results
    • Focus future optimization on high-impact parameters
    • Consider fixing low-impact parameters to reduce search space
  7. Validation

    • Test final parameters on held-out validation set
    • Verify improvements generalize beyond training data
    • Consider parameter stability across different datasets
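
One lightweight way to apply point 7: re-score the tuned configuration on items the optimizer never saw. The sketch below reuses names from the Basic Example and assumes a hypothetical call_model helper plus a validation_items split:

# `results`, `prompt`, and `levenshtein_ratio` come from the Basic Example;
# `validation_items` stands for data that was excluded from optimization.
best_params = results.details["optimized_parameters"]

def call_model(prompt, item, **params) -> str:
    # Hypothetical stand-in for running the unchanged prompt on one item
    # with the tuned parameters; replace with your actual LLM call.
    return "model output for " + item["question"]

def mean_score(prompt, items, params, metric) -> float:
    """Average the metric over held-out items using the tuned parameters."""
    return sum(
        metric(item, call_model(prompt, item, **params)) for item in items
    ) / len(items)

print(f"Held-out score: {mean_score(prompt, validation_items, best_params, levenshtein_ratio):.3f}")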

Next Steps

  1. Explore specific Optimizers for algorithm details.
  2. Check the FAQ for common questions and troubleshooting.
  3. See the API Reference for detailed configuration options.