ContextRecall
- class opik.evaluation.metrics.ContextRecall(model: str | OpikBaseModel | None = None, name: str = 'context_recall_metric', few_shot_examples: List[FewShotExampleContextRecall] | None = None, track: bool = True, project_name: str | None = None, seed: int | None = None, temperature: float | None = None)
- Bases: BaseMetric
- A metric that evaluates the context recall of an input-output pair using an LLM.
- This metric uses a language model to assess how well the given output incorporates the provided context for the given input. It returns a score between 0.0 and 1.0, where higher values indicate better context recall.
- Parameters:
- model – The language model to use for evaluation. Can be a string (model name) or an opik.evaluation.models.OpikBaseModel subclass instance. opik.evaluation.models.LiteLLMChatModel is used by default. 
- name – The name of the metric. Defaults to "context_recall_metric". 
- few_shot_examples – A list of few-shot examples to provide to the model. If None, uses the default few-shot examples. 
- track – Whether to track the metric. Defaults to True. 
- project_name – Optional project name under which to track the metric when there is no parent span or trace to inherit a project name from. 
- seed – Optional seed value for reproducible model generation. If provided, this seed will be passed to the model for deterministic outputs. 
- temperature – Optional temperature value for model generation. If provided, this temperature will be passed to the model. If not provided, the model’s default temperature will be used. 
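The constructor parameters above can be collected in one place when the same settings are reused across evaluations. The following is a minimal sketch, not part of the library: `context_recall_kwargs` and `build_context_recall` are hypothetical helper names, and the defaults `seed=42` and `temperature=0.0` are illustrative choices. It relies only on the signature shown above, where `model`, `seed`, `temperature`, and `project_name` all default to None.

```python
from typing import Any, Dict, Optional


def context_recall_kwargs(
    model: Optional[str] = None,
    seed: Optional[int] = 42,
    temperature: Optional[float] = 0.0,
    project_name: Optional[str] = None,
) -> Dict[str, Any]:
    """Collect constructor arguments, dropping those left as None.

    These parameters all default to None in the ContextRecall signature,
    so omitting one is equivalent to passing None for it.
    """
    candidates = {
        "model": model,
        "seed": seed,
        "temperature": temperature,
        "project_name": project_name,
    }
    return {key: value for key, value in candidates.items() if value is not None}


def build_context_recall(**overrides: Any):
    """Construct the metric (hypothetical helper).

    opik is imported lazily so that this module can be imported
    even where the package is not installed.
    """
    from opik.evaluation.metrics import ContextRecall

    return ContextRecall(**context_recall_kwargs(**overrides))
```

A call such as `build_context_recall(model="gpt-4o", project_name="rag-eval")` would then create the metric with a fixed seed and temperature. Both the model name and the project name here are placeholders, and the call needs opik installed plus credentials for the chosen model provider.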
 
- Example
  >>> from opik.evaluation.metrics import ContextRecall
  >>> context_recall_metric = ContextRecall()
  >>> result = context_recall_metric.score(
  ...     input="What's the capital of France?",
  ...     output="The capital of France is Paris.",
  ...     expected_output="Paris",
  ...     context=["France is a country in Europe."],
  ... )
  >>> print(result.value)
  0.9
  >>> print(result.reason)
  The LLM's response is highly accurate, correctly identifying 'Paris' as the capital of France and aligning with the expected answer ...
- score(input: str, output: str, expected_output: str, context: List[str], **ignored_kwargs: Any) -> ScoreResult
- Calculate the context recall score for the given input-output pair. - Parameters:
- input – The input text to be evaluated. 
- output – The output text to be evaluated. 
- expected_output – The expected output for the given input. 
- context – A list of context strings relevant to the input. 
- **ignored_kwargs – Additional keyword arguments that are ignored. 
 
- Returns:
- A ScoreResult object containing the context recall score (between 0.0 and 1.0) and a reason for the score. 
- Return type:
- ScoreResult
 
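When several input-output pairs need scoring, `score()` can be called in a loop with keyword arguments that match its signature. The following is a minimal sketch: `score_cases` is a hypothetical helper, not part of opik, and it assumes only that the metric object exposes `score()` as documented and that the result has `value` and `reason` attributes.

```python
from typing import Any, Dict, List, Tuple


def score_cases(metric: Any, cases: List[Dict[str, Any]]) -> List[Tuple[float, str]]:
    """Score each case and return (value, reason) pairs in input order."""
    results = []
    for case in cases:
        result = metric.score(
            input=case["input"],
            output=case["output"],
            expected_output=case["expected_output"],
            context=case["context"],
        )
        # The documented range is 0.0 to 1.0; fail loudly on anything else.
        if not 0.0 <= result.value <= 1.0:
            raise ValueError(f"score out of range: {result.value}")
        results.append((result.value, result.reason))
    return results
```

With the real metric this would be called as `score_cases(ContextRecall(), cases)`, where each case is a dict with the keys `input`, `output`, `expected_output`, and `context`. Each call invokes the underlying language model, so a configured model provider is required.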
 - async ascore(input: str, output: str, expected_output: str, context: List[str], **ignored_kwargs: Any) -> ScoreResult
- Asynchronously calculate the context recall score for the given input-output pair.
- This method is the asynchronous version of score(). For detailed documentation, refer to the score() method.
- Parameters:
- input – The input text to be evaluated. 
- output – The output text to be evaluated. 
- expected_output – The expected output for the given input. 
- context – A list of context strings relevant to the input. 
- **ignored_kwargs – Additional keyword arguments that are ignored. 
 
- Returns:
- A ScoreResult object with the context recall score and reason. 
- Return type:
- ScoreResult
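Because `ascore()` is a coroutine, several pairs can be scored concurrently. The following is a minimal sketch using only the standard library's asyncio: `score_concurrently` and its `limit` parameter are hypothetical, not part of opik, and the metric object is assumed only to expose `ascore()` as documented above.

```python
import asyncio
from typing import Any, Dict, List


async def score_concurrently(
    metric: Any, cases: List[Dict[str, Any]], limit: int = 4
) -> List[Any]:
    """Run metric.ascore over all cases, at most `limit` at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def score_one(case: Dict[str, Any]) -> Any:
        # The semaphore caps the number of in-flight model calls.
        async with semaphore:
            return await metric.ascore(
                input=case["input"],
                output=case["output"],
                expected_output=case["expected_output"],
                context=case["context"],
            )

    # gather returns results in the same order as the input cases.
    return await asyncio.gather(*(score_one(case) for case in cases))
```

With the real metric this would be run as `asyncio.run(score_concurrently(ContextRecall(), cases))`. Capping concurrency is a design choice meant to stay within provider rate limits; the value 4 is arbitrary.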