Multiple Completions (n parameter)
Introduce variety at each trial with pass@k evaluation
When optimizing prompts, single-sample evaluation can be noisy: a good prompt might fail on a particular trial due to LLM stochasticity. The `n` parameter lets you generate multiple candidate outputs per evaluation and select the best one, introducing variety and reducing evaluation variance.
Available in Opik Optimizer v3.0.0+.
How It Works
When you set n > 1 in your prompt’s model_parameters, the optimizer requests N completions per evaluation, scores each candidate, selects the best one, and logs all scores to the trace. The full explanation of how the n parameter works is maintained in Sampling controls.
Configuration
Set the `n` parameter in your `ChatPrompt.model_parameters`.
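For illustration, a minimal sketch of the `model_parameters` payload described above (key names are those used in this guide; the surrounding `ChatPrompt` construction is omitted):

```python
# Hedged sketch: a model_parameters payload requesting three candidate
# completions per evaluation, per the n parameter described in this guide.
model_parameters = {
    "n": 3,              # number of completions generated per evaluation call
    "temperature": 0.9,  # higher temperature increases diversity between candidates
}
```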
Higher temperature values increase diversity between the N candidates. Consider using temperature: 0.7-1.0 with n > 1 to maximize variety.
The low-level `call_model` and `call_model_async` helpers return a single response unless you pass `return_all=True`. Optimizers handle `n` internally, so you only need `return_all` when calling those helpers directly.
Use Cases
Reducing Evaluation Variance
Single-sample evaluation is noisy. With n=3, the optimizer scores each candidate and uses the best result, which makes optimization more robust to stochastic failures.
Pass@k Style Optimization
Inspired by code generation benchmarks (pass@k), this approach measures whether a prompt can produce correct output, not just whether it usually does.
This is useful when:
- Correctness matters more than consistency
- You’ll use majority voting or best-of-k at inference time
- Tasks have high variance (creative writing, complex reasoning)
Handling Stochastic Tasks
Some tasks naturally have multiple valid answers. Using n > 1 helps the optimizer find prompts that can generate any valid answer.
Selection Policy
Currently, the optimizer supports these selection policies:
- `best_by_metric` (default): score each candidate with the metric and pick the best.
- `first`: pick the first candidate (fast, deterministic, but ignores scoring).
- `concat`: join all candidates into one output string.
- `random`: pick a random candidate (seeded if provided).
- `max_logprob`: pick the candidate with the highest average token logprob (provider support required; logprobs must be enabled in model kwargs).
Use the `selection_policy` key in `model_parameters` to override the default. The optimizer routes these policies through a shared candidate-selection utility, so behavior is consistent across optimizers.
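A hedged sketch of overriding the policy via `model_parameters` (the `selection_policy` key is named in this guide; the values match the list above):

```python
# Hedged sketch: override the default best_by_metric selection policy.
model_parameters = {
    "n": 5,
    "selection_policy": "first",  # skip scoring; take the first candidate
}
```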
For `max_logprob`, enable logprobs in your model kwargs (provider support varies).
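A sketch of what that configuration might look like, assuming the provider accepts a `logprobs` flag in the model kwargs (flag name varies by provider):

```python
# Hedged sketch: max_logprob selection requires the provider to return
# token logprobs, so they must be enabled in the model kwargs.
model_parameters = {
    "n": 5,
    "selection_policy": "max_logprob",
    "logprobs": True,  # assumed provider flag; support varies
}
```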
When `selection_policy=best_by_metric`, the optimizer:
- scores each candidate independently using your metric function
- selects the candidate with the highest score as the final output
- logs all scores and the chosen index to the trace metadata
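The `best_by_metric` behavior amounts to an argmax over metric scores. An illustrative sketch (not the library's code):

```python
# Illustrative sketch of best_by_metric selection: score every candidate,
# then pick the index with the highest score.
def select_best(candidates, metric):
    """Return (best_output, all_scores, chosen_index)."""
    scores = [metric(c) for c in candidates]
    chosen_index = max(range(len(scores)), key=scores.__getitem__)
    return candidates[chosen_index], scores, chosen_index

# Toy metric for demonstration: longer answers score higher.
output, scores, idx = select_best(["a", "abc", "ab"], metric=len)
# output == "abc", idx == 1
```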
The trace metadata includes:
- `n_requested`: number of completions requested
- `candidates_scored`: number of candidates evaluated
- `candidate_scores`: list of all scores (`best_by_metric` only)
- `candidate_logprobs`: list of logprob scores (`max_logprob` only)
- `chosen_index`: index of the selected candidate
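For illustration, a trace for an `n=3`, `best_by_metric` evaluation might carry metadata shaped like this (the values here are made up):

```python
# Illustrative trace metadata for n=3 with best_by_metric selection.
trace_metadata = {
    "n_requested": 3,                   # completions requested
    "candidates_scored": 3,             # candidates actually evaluated
    "candidate_scores": [0.4, 0.9, 0.7],  # one metric score per candidate
    "chosen_index": 1,                  # index of the highest-scoring candidate
}
```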
Cost Considerations
Using n > 1 increases API costs proportionally. With n=3, you pay roughly 3x the completion tokens per evaluation call.
Recommendations:
- Start with `n=3` for most use cases
- Use `n=5-10` only for high-variance tasks
- Consider the total optimization budget when choosing `n`
Limitations
Tool-calling forces n=1
When allow_tool_use=True and tools are defined, the optimizer forces n=1. This is because tool-calling requires maintaining a coherent message thread, which isn’t compatible with multiple independent completions.
Some optimizers ignore n
Prompt synthesis steps that expect a single structured response (such as few-shot and parameter optimizers) ignore n to avoid returning multiple conflicting templates.
Not all providers support n
Some LLM providers don’t support the n parameter. Check your provider’s documentation. LiteLLM will drop unsupported parameters automatically.
Full Example
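A self-contained sketch of the best-of-n evaluation loop using a mocked model call. In a real workflow this would go through `ChatPrompt` and an Opik optimizer with `model_parameters={"n": 3}`; the function names below are illustrative stand-ins:

```python
import random

def mock_model(prompt, n, temperature, seed=0):
    """Stand-in for an LLM call that returns n candidate completions."""
    rng = random.Random(seed)
    return [f"{prompt} -> candidate {i} ({rng.random():.2f})" for i in range(n)]

def substring_metric(output, expected):
    """Toy metric: 1.0 if the expected string appears in the output."""
    return 1.0 if expected in output else 0.0

def evaluate_best_of_n(prompt, expected, n=3):
    """Request n candidates, score each, and keep the best (best_by_metric)."""
    candidates = mock_model(prompt, n=n, temperature=0.9)
    scores = [substring_metric(c, expected) for c in candidates]
    chosen = max(range(n), key=scores.__getitem__)
    return {
        "n_requested": n,
        "candidates_scored": len(scores),
        "candidate_scores": scores,
        "chosen_index": chosen,
        "output": candidates[chosen],
    }

result = evaluate_best_of_n("What is 2+2?", expected="candidate 1")
print(result["chosen_index"])  # 1 — the candidate containing the expected string
```

The mocked call stands in for the N-completions request; everything after it mirrors the selection and logging behavior described above.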
Related
- Optimize prompts - Core optimization guide
- Define metrics - Create custom metrics
- Custom metrics - Advanced metric patterns
- API Reference - Full parameter documentation