Evolutionary Optimizer: Genetic Algorithms

Discover optimal prompts with genetic algorithms and multi-objective optimization.

The EvolutionaryOptimizer uses genetic algorithms to refine and discover effective prompts. It iteratively evolves a population of prompts, applying selection, crossover, and mutation operations to find prompts that maximize a given evaluation metric. This optimizer can also perform multi-objective optimization (e.g., maximizing score while minimizing prompt length) and leverage LLMs for more sophisticated genetic operations.

When to Use This Optimizer: EvolutionaryOptimizer is a great choice when you want to explore a very diverse range of prompt structures or when you have multiple objectives to optimize for (e.g., performance score and prompt length). Its strength lies in its ability to escape local optima and discover novel prompt solutions through its evolutionary mechanisms, especially when enhanced with LLM-driven genetic operators.

Key Trade-offs:

  • Can be computationally intensive due to the need to evaluate a population of prompts over multiple generations.
  • The effectiveness of LLM-driven operators (enable_llm_crossover, LLM-based mutations) comes with additional LLM call costs.
  • Tuning genetic algorithm parameters (population size, mutation/crossover rates) might require some experimentation for optimal results on a specific task.
  • While it can find very novel prompts, it might take longer to converge on a solution compared to more directed optimizers for simpler tasks.

Curious about EvolutionaryOptimizer? The Optimizer & SDK FAQ provides answers to common questions, such as its strengths, how multi-objective optimization (enable_moo) works, the impact of parameters like population_size and num_generations, and the role of output_style_guidance.

How It Works

The EvolutionaryOptimizer is built upon the DEAP library for evolutionary computation. Here’s a breakdown of its core process (a simplified sketch of the loop follows the list):

  1. Initialization:

    • A population of candidate prompts is created. This can be based on an initial user-provided prompt, with variations generated by an LLM, or “fresh start” prompts based on the task description.
    • The optimizer can infer an output_style_guidance from the dataset or use a user-provided one. This guidance helps LLM-driven mutations and crossovers generate prompts that elicit responses in the desired style.
  2. Evaluation:

    • Each prompt (individual) in the population is evaluated against the provided dataset using the specified MetricConfig.
    • If multi-objective optimization (enable_moo=True) is active, multiple fitness values are calculated (e.g., primary metric score and prompt length). The default aims to maximize the primary score and minimize length.
  3. Selection:

    • Individuals are selected to become parents for the next generation.
    • For multi-objective optimization, NSGA-II selection (tools.selNSGA2) is used to maintain diversity and select individuals along the Pareto front.
    • For single-objective optimization, tournament selection (tools.selTournament) is typically used. Elitism can also preserve the best individuals.
  4. Crossover:

    • Selected parent prompts are combined to produce offspring.
    • Standard Crossover: Combines parts of parent prompts (e.g., sentence chunks or words).
    • LLM-driven Crossover (enable_llm_crossover=True): An LLM is used to intelligently blend two parent prompts, aiming to create superior child prompts that adhere to the output_style_guidance.
  5. Mutation:

    • Offspring prompts undergo random modifications to introduce new variations.
    • Word-level Mutation: Randomly replaces words with synonyms (via LLM), reorders words, or modifies phrases (via LLM).
    • Structural Mutation: Reorders sentences, combines adjacent sentences, or splits sentences.
    • Semantic Mutation (LLM-driven): An LLM rephrases, simplifies, elaborates, restructures, or focuses the prompt based on various strategies, guided by the output_style_guidance and task context.
    • Radical Innovation Mutation (LLM-driven): An LLM attempts to generate a significantly different and potentially much improved prompt.
    • Adaptive Mutation: The mutation rate can be dynamically adjusted based on population diversity and progress to escape local optima or fine-tune solutions.
  6. Replacement & Iteration:

    • The new generation of prompts (offspring, potentially with elites from the previous generation) replaces the old one.
    • The process (Evaluation, Selection, Crossover, Mutation) repeats for a specified number of num_generations.
  7. Result:

    • The best prompt found during the optimization (or the set of non-dominated solutions from the Pareto front in MOO) is returned as part of the OptimizationResult.
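To make the loop concrete, here is a simplified, self-contained sketch in plain Python. It uses a toy word-overlap fitness and toy string operators purely for illustration; the real optimizer scores prompts with your MetricConfig, uses DEAP’s selection operators (tools.selNSGA2 or tools.selTournament), and can delegate crossover and mutation to an LLM.

import random

random.seed(42)

# Toy stand-ins, for illustration only: the real optimizer scores prompts
# against your dataset and can use LLM-driven crossover and mutation instead.
TARGET = set("answer the following question truthfully and concisely".split())
VOCAB = sorted(TARGET | {"please", "reply", "now", "text"})

def evaluate(prompt):  # Step 2: fitness (here, word overlap with a target phrase)
    return len(set(prompt.split()) & TARGET) / len(TARGET)

def crossover(p1, p2):  # Step 4: one-point crossover on words
    w1, w2 = p1.split(), p2.split()
    cut = random.randint(1, min(len(w1), len(w2)) - 1)
    return " ".join(w1[:cut] + w2[cut:])

def mutate(prompt):  # Step 5: replace one word at random
    words = prompt.split()
    words[random.randrange(len(words))] = random.choice(VOCAB)
    return " ".join(words)

def tournament(population, k=4):  # Step 3: tournament selection
    return max(random.sample(population, k), key=evaluate)

# Step 1: initialize a population of random candidate prompts
population = [" ".join(random.choices(VOCAB, k=6)) for _ in range(30)]

for generation in range(15):  # Step 6: replacement and iteration
    elites = sorted(population, key=evaluate, reverse=True)[:3]  # elitism
    offspring = []
    while len(elites) + len(offspring) < len(population):
        # Crossover (the real optimizer applies it with probability crossover_rate)
        child = crossover(tournament(population), tournament(population))
        if random.random() < 0.2:  # mutation_rate
            child = mutate(child)
        offspring.append(child)
    population = elites + offspring

print(max(population, key=evaluate))  # Step 7: best prompt found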

Each prompt in the population is evaluated (Step 2) using your specified MetricConfig against the dataset. This fitness scoring is what drives the evolutionary selection process. To better understand this crucial step, refer to Opik’s evaluation documentation.

Configuration Options

Basic Configuration

from opik_optimizer import EvolutionaryOptimizer

optimizer = EvolutionaryOptimizer(
    model="openai/gpt-4",  # LLM for evaluating prompts and for LLM-driven genetic operations
    project_name="evolutionary_opt_project",
    # Core Genetic Algorithm Parameters
    population_size=30,  # Number of prompts in each generation
    num_generations=15,  # Number of iterations the algorithm will run
    mutation_rate=0.2,  # Probability of mutating an individual
    crossover_rate=0.8,  # Probability of crossing over two individuals
    tournament_size=4,  # Size of the tournament for selection (if not MOO)
    elitism_size=3,  # Number of best individuals to carry over (if not MOO)
    # Advanced Features
    adaptive_mutation=True,  # Dynamically adjust the mutation rate
    enable_moo=True,  # Enable Multi-Objective Optimization (e.g., score vs. length)
    # Default MOO weights are (1.0, -1.0) for (score, length)
    enable_llm_crossover=True,  # Use an LLM for crossover operations
    output_style_guidance=None,  # Optional: specific guidance for LLM-generated prompts' output style,
    # e.g., "Produce concise, factual, single-sentence answers."
    infer_output_style=True,  # If True and output_style_guidance is not set, infer it from the dataset
    # Technical Parameters
    num_threads=12,  # Threads for parallel evaluation
    seed=42,  # Random seed for reproducibility
    # LLM parameters (passed via **model_kwargs)
    temperature=0.5,  # Temperature for LLM calls (evaluation and reasoning)
    max_tokens=1024,
)

Advanced Configuration

Key parameters include:

  • model: The primary LLM used for evaluating prompts and, by default, for LLM-driven genetic operations (mutation, crossover, population initialization).
  • population_size, num_generations, mutation_rate, crossover_rate: Standard GA parameters controlling the evolutionary process.
  • enable_moo: Set to True to optimize for multiple objectives. The default objectives are score (maximize) and prompt length (minimize). Fitness weights can be customized by re-registering creator.FitnessMulti before instantiating the optimizer (see the sketch after this list).
  • enable_llm_crossover: If True, uses an LLM to perform crossover, which can lead to more semantically meaningful child prompts.
  • adaptive_mutation: If True, the optimizer will adjust the mutation rate based on population diversity and improvement progress.
  • output_style_guidance: A string describing the desired style of the target LLM’s output when using the prompts being optimized. This helps LLM-driven mutations/crossovers generate prompts that are more likely to elicit correctly styled responses.
  • infer_output_style: If True and output_style_guidance is not explicitly provided, the optimizer will attempt to infer this style by analyzing examples from the dataset.
  • **model_kwargs: Additional keyword arguments (e.g., temperature, max_tokens) passed to the underlying LLM calls.
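For instance, to penalize prompt length less heavily than the default (1.0, -1.0) weighting, you could re-register creator.FitnessMulti before constructing the optimizer. A minimal sketch, assuming the optimizer picks up the already-registered creator class at instantiation time; the (1.0, -0.1) weights are illustrative values, not a recommendation:

from deap import base, creator
from opik_optimizer import EvolutionaryOptimizer

# DEAP convention: positive weight = maximize, negative = minimize.
# (1.0, -0.1) counts prompt length ten times less than the default (1.0, -1.0).
if hasattr(creator, "FitnessMulti"):
    del creator.FitnessMulti  # avoid DEAP's warning about redefining a creator class
creator.create("FitnessMulti", base.Fitness, weights=(1.0, -0.1))

# Instantiate the optimizer afterwards so it uses the re-registered fitness.
optimizer = EvolutionaryOptimizer(
    model="openai/gpt-4",
    project_name="evolutionary_opt_project",
    enable_moo=True,
)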

Example Usage

from opik_optimizer import EvolutionaryOptimizer, TaskConfig, MetricConfig, from_llm_response_text, from_dataset_field
from opik.evaluation.metrics import LevenshteinRatio  # or any other suitable metric
from opik_optimizer.demo import get_or_create_dataset

# 1. Define your evaluation dataset
dataset = get_or_create_dataset("tiny-test")  # Replace with your actual dataset

# 2. Configure the evaluation metric
metric_config = MetricConfig(
    metric=LevenshteinRatio(),
    inputs={
        "output": from_llm_response_text(),
        "reference": from_dataset_field(name="label"),  # Assuming 'label' is the ground-truth field
    },
)

# 3. Define your base prompt and task configuration
initial_prompt = "Answer the following question truthfully and to the best of your ability."
task_config = TaskConfig(
    instruction_prompt=initial_prompt,
    input_dataset_fields=["text"],  # Field(s) from the dataset used as input
    output_dataset_field="label",  # Field from the dataset holding the expected output
    use_chat_prompt=False,  # Or True, depending on your prompt and model
)

# 4. Initialize the EvolutionaryOptimizer
optimizer = EvolutionaryOptimizer(
    model="openai/gpt-4o-mini",  # Choose your LLM for evaluation and reasoning
    project_name="MyEvolutionaryOptimization",
    population_size=20,
    num_generations=10,
    mutation_rate=0.25,
    crossover_rate=0.75,
    enable_moo=True,  # Optimize for score and prompt length
    enable_llm_crossover=True,
    infer_output_style=True,
    num_threads=8,
    seed=123,
    temperature=0.4,
)

# 5. Run the optimization
# n_samples controls how many dataset items are used to evaluate each prompt in each generation
optimization_result = optimizer.optimize_prompt(
    dataset=dataset,
    metric_config=metric_config,
    task_config=task_config,
    n_samples=50,  # Use 50 samples from the dataset for evaluation
)

# 6. View the results
optimization_result.display()

# For MOO, result.prompt is typically the solution with the best primary score on the Pareto front.
# The full Pareto front is available in result.details:
if optimizer.enable_moo and "pareto_front_solutions" in optimization_result.details:
    print("\n--- Pareto Front Solutions ---")
    for sol in optimization_result.details["pareto_front_solutions"]:
        print(f"Score: {sol['score']:.4f}, Length: {sol['length']:.0f}, Prompt: '{sol['prompt'][:100]}...'")

Model Support

The EvolutionaryOptimizer uses LiteLLM for model interactions. Therefore, it supports all models available through LiteLLM. This includes models from OpenAI, Azure OpenAI, Anthropic, Google (Vertex AI / AI Studio), Mistral AI, Cohere, locally hosted models (e.g., via Ollama), and many others.

Refer to the LiteLLM documentation for a complete list of supported models and how to configure them. The model parameter in the constructor should be a LiteLLM model string (e.g., "openai/gpt-4o-mini", "azure/your-deployment", "gemini/gemini-1.5-pro-latest", "ollama_chat/llama3").

For detailed instructions on how to specify different models and configure providers, please refer to the main LiteLLM Support for Optimizers documentation page.
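As an illustration, the constructor stays the same across providers; only the model string changes. The model strings below come from the examples above, and the project_name values are placeholders:

from opik_optimizer import EvolutionaryOptimizer

# The same constructor works for any LiteLLM-supported provider.
openai_optimizer = EvolutionaryOptimizer(model="openai/gpt-4o-mini", project_name="evo-openai")
azure_optimizer = EvolutionaryOptimizer(model="azure/your-deployment", project_name="evo-azure")
gemini_optimizer = EvolutionaryOptimizer(model="gemini/gemini-1.5-pro-latest", project_name="evo-gemini")
ollama_optimizer = EvolutionaryOptimizer(model="ollama_chat/llama3", project_name="evo-ollama")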

Best Practices

  1. Dataset Quality: A diverse and representative dataset is crucial for meaningful evaluation and evolution of prompts.
  2. Metric Selection: Choose a MetricConfig that accurately reflects the quality of the desired output for your specific task.
  3. Population Size & Generations: Larger populations and more generations can lead to better results but increase computation time and cost. Start with moderate values (e.g., population 20-50, generations 10-25) and adjust based on results and budget.
  4. LLM-driven Operations: enable_llm_crossover=True and the LLM-driven mutations can produce more creative and semantically relevant prompt variations but will increase LLM calls. Balance this with cost.
  5. Multi-Objective Optimization (MOO): If enable_moo=True, consider the trade-off between the primary metric (e.g., accuracy) and secondary objectives (e.g., prompt length). The Pareto front will give you a set of optimal trade-off solutions.
  6. Output Style Guidance: Leveraging output_style_guidance or infer_output_style can significantly help the LLM-driven genetic operators to create prompts that not only perform well but also elicit responses in the correct format or style.
  7. Seeding: Use the seed parameter for reproducible runs, especially during experimentation.
  8. n_samples for optimize_prompt: Carefully choose n_samples. Evaluating every prompt in the population against the full dataset for many generations can be slow. Using a representative subset (n_samples) speeds up evaluation per generation (see the sketch below).
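One practical pattern, sketched below under the assumption that dataset, metric_config, and task_config are defined as in the example above (the sample sizes are illustrative, not recommendations): iterate cheaply with a small n_samples, then re-run the best configuration with a larger subset.

# Cheap, fast iterations while tuning GA parameters: small n_samples.
draft_result = optimizer.optimize_prompt(
    dataset=dataset,
    metric_config=metric_config,
    task_config=task_config,
    n_samples=20,  # noisier fitness estimates, but each generation is fast
)

# Once the configuration looks good, re-run with a larger, more representative subset.
final_result = optimizer.optimize_prompt(
    dataset=dataset,
    metric_config=metric_config,
    task_config=task_config,
    n_samples=150,  # more stable fitness estimates, higher cost per generation
)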

Research and References

Genetic algorithms and evolutionary computation are well-established fields. This optimizer applies these classical techniques to the domain of prompt engineering, using LLMs for more intelligent genetic operations. Additional resources:

  • DEAP Library: The underlying evolutionary computation framework used.
