Moderation
The Moderation metric allows you to evaluate the appropriateness of an LLM's output. It does this by asking an LLM judge to rate the content and returns a score between 0 and 1, where 0 means the content was deemed safe and 1 means it was deemed unsafe.
How to use the Moderation metric
You can use the Moderation metric as follows:
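A minimal sketch of synchronous scoring in Python, assuming the opik package is installed and credentials for the judge model are configured. Moderation and its score method come from opik.evaluation.metrics; the score_output helper and its metric argument are hypothetical names added here for illustration.

```python
def score_output(text, metric=None):
    """Score `text` with Opik's Moderation metric (hypothetical helper)."""
    if metric is None:
        # Imported lazily so the helper can also be exercised with a stand-in metric.
        from opik.evaluation.metrics import Moderation

        metric = Moderation()
    # score() sends the text to the judge LLM and returns a result object
    # exposing the numeric score as `value` and the explanation as `reason`.
    result = metric.score(output=text)
    return result.value, result.reason
```

Calling score_output("The capital of France is Paris.") sends one request to the judge model and returns the score together with the judge's stated reason.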
Asynchronous scoring is also supported: use the ascore method in Python, or the score method in TypeScript, which is always asynchronous.
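As a rough illustration of the Python ascore method, the sketch below scores several outputs concurrently with asyncio.gather. It assumes the opik package is installed and the judge model is reachable; score_many and its metric argument are hypothetical names, not part of Opik.

```python
import asyncio


async def score_many(texts, metric=None):
    """Score several outputs concurrently (hypothetical helper)."""
    if metric is None:
        # Imported lazily so a stand-in metric can be passed in instead.
        from opik.evaluation.metrics import Moderation

        metric = Moderation()
    # ascore() is the awaitable counterpart of score(); gather() keeps input order.
    results = await asyncio.gather(*(metric.ascore(output=t) for t in texts))
    return [r.value for r in results]
```

From synchronous code this could be run with asyncio.run(score_many(["first output", "second output"])).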
The moderation score is a float between 0 and 1. A score of 0 indicates that the content was deemed safe; a score of 1 indicates that the content was deemed unsafe.
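Because the score is continuous, an application usually has to pick its own cut-off. The sketch below shows one way to turn the score into a flag; the is_flagged helper and the 0.5 threshold are assumptions for illustration, not values defined by Opik.

```python
def is_flagged(moderation_score, threshold=0.5):
    """Return True when the score meets or exceeds the threshold (hypothetical helper)."""
    if not 0.0 <= moderation_score <= 1.0:
        raise ValueError("moderation score must be between 0 and 1")
    # 0 means safe and 1 means unsafe, so higher scores are more likely to be flagged.
    return moderation_score >= threshold
```

A stricter application could pass a lower threshold, for example is_flagged(score, threshold=0.2).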
Moderation Prompt
Opik uses an LLM as a Judge to moderate content. For this, we have a prompt template that is used to generate the prompt for the LLM. By default, the gpt-4o model is used to moderate content, but you can change this to any model supported by LiteLLM by setting the model parameter. You can learn more about customizing models in the Customize models for LLM as a Judge metrics section.
The template uses a few-shot prompting technique to detect moderation issues. The template is as follows:
In the template, VERDICT_KEY is moderation_score and REASON_KEY is reason.
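To show how the two keys might be consumed, here is a minimal sketch that parses a judge reply. It assumes the reply is a JSON object containing moderation_score and reason; parse_judge_response is a hypothetical helper and is not Opik's internal parser.

```python
import json

VERDICT_KEY = "moderation_score"
REASON_KEY = "reason"


def parse_judge_response(raw):
    """Extract (score, reason) from a JSON reply (hypothetical helper)."""
    data = json.loads(raw)
    score = float(data[VERDICT_KEY])
    # The moderation score is expected to lie between 0 (safe) and 1 (unsafe).
    if not 0.0 <= score <= 1.0:
        raise ValueError("moderation score must be between 0 and 1")
    return score, str(data[REASON_KEY])
```

A reply that is missing either key raises KeyError, which makes malformed judge output easy to detect.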