Evolutionary Optimizer: Genetic Algorithms
Discover optimal prompts with genetic algorithms and multi-objective optimization.
The `EvolutionaryOptimizer` uses genetic algorithms to refine and discover effective prompts. It iteratively evolves a population of prompts, applying selection, crossover, and mutation operations to find prompts that maximize a given evaluation metric. This optimizer can also perform multi-objective optimization (e.g., maximizing score while minimizing prompt length) and leverage LLMs for more sophisticated genetic operations.
When to Use This Optimizer:
`EvolutionaryOptimizer` is a great choice when you want to explore a very diverse range of prompt structures or when you have multiple objectives to optimize for (e.g., performance score and prompt length). Its strength lies in its ability to escape local optima and discover novel prompt solutions through its evolutionary mechanisms, especially when enhanced with LLM-driven genetic operators.
Key Trade-offs:
- Can be computationally intensive due to the need to evaluate a population of prompts over multiple generations.
- The effectiveness of LLM-driven operators (`enable_llm_crossover`, LLM-based mutations) comes with additional LLM call costs.
- Tuning genetic algorithm parameters (population size, mutation/crossover rates) might require some experimentation for optimal results on a specific task.
- While it can find very novel prompts, it might take longer to converge on a solution compared to more directed optimizers for simpler tasks.
Curious about `EvolutionaryOptimizer`? The Optimizer & SDK FAQ provides answers to common questions, such as its strengths, how multi-objective optimization (`enable_moo`) works, the impact of parameters like `population_size` and `num_generations`, and the role of `output_style_guidance`.
How It Works
The `EvolutionaryOptimizer` is built upon the DEAP library for evolutionary computation. Here’s a breakdown of its core process:
1. Initialization:
   - A population of candidate prompts is created. This can be based on an initial user-provided prompt, with variations generated by an LLM, or “fresh start” prompts based on the task description.
   - The optimizer can infer an `output_style_guidance` from the dataset or use a user-provided one. This guidance helps LLM-driven mutations and crossovers generate prompts that elicit responses in the desired style.
2. Evaluation:
   - Each prompt (individual) in the population is evaluated against the provided dataset using the specified `MetricConfig`.
   - If multi-objective optimization (`enable_moo=True`) is active, multiple fitness values are calculated (e.g., primary metric score and prompt length). The default aims to maximize the primary score and minimize length.
3. Selection:
   - Individuals are selected to become parents for the next generation.
   - For multi-objective optimization, NSGA-II selection (`tools.selNSGA2`) is used to maintain diversity and select individuals along the Pareto front.
   - For single-objective optimization, tournament selection (`tools.selTournament`) is typically used. Elitism can also preserve the best individuals.
4. Crossover:
   - Selected parent prompts are combined to produce offspring.
   - Standard Crossover: Combines parts of parent prompts (e.g., sentence chunks or words).
   - LLM-driven Crossover (`enable_llm_crossover=True`): An LLM is used to intelligently blend two parent prompts, aiming to create superior child prompts that adhere to the `output_style_guidance`.
5. Mutation:
   - Offspring prompts undergo random modifications to introduce new variations.
   - Word-level Mutation: Randomly replaces words with synonyms (via LLM), reorders words, or modifies phrases (via LLM).
   - Structural Mutation: Reorders sentences, combines adjacent sentences, or splits sentences.
   - Semantic Mutation (LLM-driven): An LLM rephrases, simplifies, elaborates, restructures, or focuses the prompt based on various strategies, guided by the `output_style_guidance` and task context.
   - Radical Innovation Mutation (LLM-driven): An LLM attempts to generate a significantly different and potentially much improved prompt.
   - Adaptive Mutation: The mutation rate can be dynamically adjusted based on population diversity and progress to escape local optima or fine-tune solutions.
6. Replacement & Iteration:
   - The new generation of prompts (offspring, potentially with elites from the previous generation) replaces the old one.
   - The process (Evaluation, Selection, Crossover, Mutation) repeats for the specified `num_generations`.
7. Result:
   - The best prompt found during the optimization (or the set of non-dominated solutions from the Pareto front in MOO) is returned as part of the `OptimizationResult`.
Each prompt in the population is evaluated (Step 2) using your specified `MetricConfig` against the dataset. This fitness scoring is what drives the evolutionary selection process. To better understand this crucial step, refer to Opik’s evaluation documentation.
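To make the loop concrete, here is a minimal, illustrative sketch of the generational cycle built directly on DEAP. It is not the optimizer’s actual implementation: `score_prompt` is a hypothetical stand-in for scoring a prompt against your dataset with a `MetricConfig`, and prompts are represented simply as lists of sentences.

```python
import random
from deap import base, creator, tools

# Maximize the metric score, minimize prompt length (the default MOO objectives).
creator.create("FitnessMulti", base.Fitness, weights=(1.0, -1.0))
# Represent a prompt as a list of sentences so crossover/mutation can splice it.
creator.create("Individual", list, fitness=creator.FitnessMulti)

def score_prompt(text: str) -> float:
    """Hypothetical stand-in: evaluate `text` against your dataset/metric."""
    return random.random()

def evaluate(ind) -> tuple[float, float]:
    text = " ".join(ind)
    return score_prompt(text), float(len(text))

def crossover(p1, p2):
    """Standard crossover: splice the two parents at a sentence boundary."""
    cut = random.randint(1, min(len(p1), len(p2)))
    return creator.Individual(p1[:cut] + p2[cut:])

def mutate(ind):
    """Structural mutation: reorder sentences."""
    child = creator.Individual(ind)
    random.shuffle(child)
    return child

seeds = [
    ["You are a helpful assistant.", "Answer the question concisely."],
    ["Read the question carefully.", "Reply with a short, factual answer."],
]
population = [creator.Individual(p) for p in seeds]
NUM_GENERATIONS, CX_RATE, MUT_RATE = 10, 0.8, 0.2

for gen in range(NUM_GENERATIONS):
    for ind in population:                                    # Step 2: evaluation
        ind.fitness.values = evaluate(ind)
    parents = tools.selNSGA2(population, k=len(population))   # Step 3: NSGA-II selection
    offspring = []
    for p1, p2 in zip(parents[::2], parents[1::2]):
        for a, b in ((p1, p2), (p2, p1)):
            child = crossover(a, b) if random.random() < CX_RATE else creator.Individual(a)  # Step 4
            if random.random() < MUT_RATE:                    # Step 5: mutation
                child = mutate(child)
            offspring.append(child)
    population = offspring                                    # Step 6: replacement

for ind in population:                                        # Step 7: final Pareto front
    ind.fitness.values = evaluate(ind)
front = tools.sortNondominated(population, len(population), first_front_only=True)[0]
print([" ".join(ind) for ind in front])
```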
Configuration Options
Basic Configuration
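A minimal sketch of a basic setup, assuming the constructor parameters documented below (exact defaults and signatures may vary between SDK versions):

```python
from opik_optimizer import EvolutionaryOptimizer

optimizer = EvolutionaryOptimizer(
    model="openai/gpt-4o-mini",  # any LiteLLM model string
    population_size=30,          # candidate prompts per generation
    num_generations=15,          # evolutionary iterations
    mutation_rate=0.2,
    crossover_rate=0.8,
    seed=42,                     # for reproducible runs
)
```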
Advanced Configuration
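A sketch of a more advanced setup; every parameter shown is documented below, but treat the exact values and the keyword-argument pass-through as illustrative:

```python
from opik_optimizer import EvolutionaryOptimizer

optimizer = EvolutionaryOptimizer(
    model="openai/gpt-4o",
    population_size=40,
    num_generations=20,
    enable_moo=True,            # also minimize prompt length alongside the score
    enable_llm_crossover=True,  # let an LLM blend parent prompts
    adaptive_mutation=True,     # adjust mutation rate from diversity/progress
    infer_output_style=True,    # derive output_style_guidance from dataset examples
    temperature=0.4,            # example **model_kwargs forwarded to LLM calls
    max_tokens=2048,
)
```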
Key parameters include:
- `model`: The primary LLM used for evaluating prompts and, by default, for LLM-driven genetic operations (mutation, crossover, population initialization).
- `population_size`, `num_generations`, `mutation_rate`, `crossover_rate`: Standard GA parameters controlling the evolutionary process.
- `enable_moo`: Set to `True` to optimize for multiple objectives. The default is score (maximize) and prompt length (minimize). Fitness weights can be customized by re-registering `creator.FitnessMulti` before optimizer instantiation if needed (see the sketch after this list).
- `enable_llm_crossover`: If `True`, uses an LLM to perform crossover, which can lead to more semantically meaningful child prompts.
- `adaptive_mutation`: If `True`, the optimizer will adjust the mutation rate based on population diversity and improvement progress.
- `output_style_guidance`: A string describing the desired style of the target LLM’s output when using the prompts being optimized. This helps LLM-driven mutations/crossovers generate prompts that are more likely to elicit correctly styled responses.
- `infer_output_style`: If `True` and `output_style_guidance` is not explicitly provided, the optimizer will attempt to infer this style by analyzing examples from the dataset.
- `**model_kwargs`: Additional keyword arguments (e.g., `temperature`, `max_tokens`) passed to the underlying LLM calls.
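As an illustration of the weight customization mentioned above, this sketch re-registers DEAP’s global `FitnessMulti` before creating the optimizer; the weight values are illustrative, not defaults:

```python
from deap import base, creator

# creator registrations are process-global, so clear any existing one first.
if hasattr(creator, "FitnessMulti"):
    del creator.FitnessMulti
# weights=(1.0, -0.5): maximize the metric score while penalizing prompt
# length at half weight (illustrative values).
creator.create("FitnessMulti", base.Fitness, weights=(1.0, -0.5))
```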
Example Usage
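Below is a hedged end-to-end sketch. It assumes a dataset whose items contain `question` and `answer` fields and wires the metric through the `MetricConfig` pattern referenced on this page; `TaskConfig`, the `from_*` helper functions, and the exact `optimize_prompt` signature are assumptions that may differ between SDK versions.

```python
import opik
from opik.evaluation.metrics import LevenshteinRatio
from opik_optimizer import (
    EvolutionaryOptimizer,
    MetricConfig,
    TaskConfig,               # assumption: task wiring object used by optimize_prompt
    from_dataset_field,
    from_llm_response_text,
)

# Placeholder dataset name; items are assumed to have "question"/"answer" fields.
dataset = opik.Opik().get_dataset(name="my-qa-dataset")

metric_config = MetricConfig(
    metric=LevenshteinRatio(),  # compare LLM output to the reference answer
    inputs={
        "output": from_llm_response_text(),
        "reference": from_dataset_field(name="answer"),
    },
)
task_config = TaskConfig(
    instruction_prompt="Answer the question.",  # starting prompt to evolve
    input_dataset_fields=["question"],
    output_dataset_field="answer",
)

optimizer = EvolutionaryOptimizer(
    model="openai/gpt-4o-mini",
    population_size=20,
    num_generations=10,
    enable_moo=True,
)
result = optimizer.optimize_prompt(
    dataset=dataset,
    metric_config=metric_config,
    task_config=task_config,
    n_samples=50,  # evaluate each candidate on a 50-item subset per generation
)
print(result)  # OptimizationResult: best prompt (or Pareto front in MOO)
```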
Model Support
The `EvolutionaryOptimizer` uses LiteLLM for model interactions. Therefore, it supports all models available through LiteLLM. This includes models from OpenAI, Azure OpenAI, Anthropic, Google (Vertex AI / AI Studio), Mistral AI, Cohere, locally hosted models (e.g., via Ollama), and many others.
Refer to the LiteLLM documentation for a complete list and how to configure them. The `model` parameter in the constructor should be the LiteLLM model string (e.g., `"openai/gpt-4o-mini"`, `"azure/your-deployment"`, `"gemini/gemini-1.5-pro-latest"`, `"ollama_chat/llama3"`).
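For example, switching providers is just a matter of changing the model string (a sketch; the Azure deployment name is a placeholder):

```python
from opik_optimizer import EvolutionaryOptimizer

# Credentials are configured per provider (e.g., OPENAI_API_KEY for OpenAI).
optimizer = EvolutionaryOptimizer(model="openai/gpt-4o-mini")
optimizer = EvolutionaryOptimizer(model="azure/your-deployment")  # placeholder name
optimizer = EvolutionaryOptimizer(model="gemini/gemini-1.5-pro-latest")
optimizer = EvolutionaryOptimizer(model="ollama_chat/llama3")     # local via Ollama
```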
For detailed instructions on how to specify different models and configure providers, please refer to the main LiteLLM Support for Optimizers documentation page.
Best Practices
- Dataset Quality: A diverse and representative dataset is crucial for meaningful evaluation and evolution of prompts.
- Metric Selection: Choose a `MetricConfig` that accurately reflects the quality of the desired output for your specific task.
- Population Size & Generations: Larger populations and more generations can lead to better results but increase computation time and cost. Start with moderate values (e.g., population 20-50, generations 10-25) and adjust based on results and budget.
- LLM-driven Operations: `enable_llm_crossover=True` and the LLM-driven mutations can produce more creative and semantically relevant prompt variations but will increase LLM calls. Balance this with cost.
- Multi-Objective Optimization (MOO): If `enable_moo=True`, consider the trade-off between the primary metric (e.g., accuracy) and secondary objectives (e.g., prompt length). The Pareto front will give you a set of optimal trade-off solutions.
- Output Style Guidance: Leveraging `output_style_guidance` or `infer_output_style` can significantly help the LLM-driven genetic operators to create prompts that not only perform well but also elicit responses in the correct format or style.
- Seeding: Use the `seed` parameter for reproducible runs, especially during experimentation.
- `n_samples` for `optimize_prompt`: Carefully choose `n_samples`. Evaluating every prompt in the population against the full dataset for many generations can be slow. Using a representative subset (`n_samples`) speeds up evaluation per generation, as sketched below.
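A sketch combining the last two tips, reusing the `dataset`, `metric_config`, and `task_config` objects from the usage example above (the numbers are illustrative):

```python
# Reproducible run: fix the seed, and evaluate each prompt on a sampled subset.
optimizer = EvolutionaryOptimizer(model="openai/gpt-4o-mini", seed=42)
result = optimizer.optimize_prompt(
    dataset=dataset,
    metric_config=metric_config,
    task_config=task_config,
    n_samples=50,  # 50 items per generation instead of the full dataset
)
```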
Research and References
Genetic algorithms and evolutionary computation are well-established fields. This optimizer draws inspiration from applying these classical techniques to the domain of prompt engineering, with enhancements using LLMs for more intelligent genetic operations. Some additional resources:
- DEAP Library: The underlying evolutionary computation framework used.
Next Steps
- Explore other Optimization Algorithms
- Explore Dataset Requirements
- Try the Example Projects & Cookbooks for runnable Colab notebooks using this optimizer