Ragas

The Opik SDK provides a simple way to integrate with Ragas, a framework for evaluating RAG systems.

There are two main ways to use Ragas with Opik:

  1. Using Ragas to score traces or spans.
  2. Using Ragas to evaluate a RAG pipeline.

You can check out the Colab Notebook if you'd like to jump straight to the code.

Getting started

You will first need to install the opik and ragas packages:

$ pip install opik ragas

In addition, you can configure Opik using the opik configure command, which will prompt you for your local server address or, if you are using the Cloud platform, your API key:

$ opik configure
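Equivalently, you can configure the SDK from Python. A minimal sketch for the Cloud platform, where the api_key and workspace values are placeholders (pass use_local=True instead if you are running a local server):

import opik

# Programmatic alternative to the `opik configure` CLI command.
# The values below are placeholders; swap in your own credentials,
# or call opik.configure(use_local=True) for a local deployment.
opik.configure(api_key="YOUR_API_KEY", workspace="YOUR_WORKSPACE")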

Using Ragas to score traces or spans

Ragas provides a set of metrics that can be used to evaluate the quality of a RAG pipeline; a full list of the supported metrics can be found in the Ragas documentation.
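For example, a few commonly used metric classes can be imported directly (exact class names may vary between Ragas versions):

# A sample of Ragas metric classes that can be wrapped; names reflect
# recent Ragas releases and may differ in older versions
from ragas.metrics import AnswerRelevancy, Faithfulness, ContextPrecision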

You can use the RagasMetricWrapper to easily integrate Ragas metrics with Opik tracking:

# Import the required dependencies
from ragas.metrics import AnswerRelevancy
from langchain_openai.chat_models import ChatOpenAI
from langchain_openai.embeddings import OpenAIEmbeddings
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from opik.evaluation.metrics import RagasMetricWrapper

# Initialize the Ragas metric
llm = LangchainLLMWrapper(ChatOpenAI())
emb = LangchainEmbeddingsWrapper(OpenAIEmbeddings())
ragas_answer_relevancy = AnswerRelevancy(llm=llm, embeddings=emb)

# Wrap the Ragas metric with RagasMetricWrapper for Opik integration
answer_relevancy_metric = RagasMetricWrapper(
    ragas_answer_relevancy,
    track=True,  # This enables automatic tracing in Opik
)
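
Before wiring the metric into a traced pipeline, you can sanity-check it with a direct call. A minimal sketch, assuming an OpenAI API key is configured for the underlying models:

# Score a single hand-written example directly
result = answer_relevancy_metric.score(
    user_input="What is the capital of France?",
    response="Paris",
    retrieved_contexts=["Paris is the capital of France."],
)
print(result.name, result.value)  # metric name and a score between 0 and 1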

Once the metric wrapper is set up, you can use it to score traces or spans:

from opik import track
from opik.opik_context import update_current_trace

@track
def retrieve_contexts(question):
    # Define the retrieval function; in this case we hard-code the contexts
    return ["Paris is the capital of France.", "Paris is in France."]

@track
def answer_question(question, contexts):
    # Define the answer function; in this case we hard-code the answer
    return "Paris"

@track
def rag_pipeline(question):
    # Define the pipeline
    contexts = retrieve_contexts(question)
    answer = answer_question(question, contexts)

    # Score the pipeline using the RagasMetricWrapper
    score_result = answer_relevancy_metric.score(
        user_input=question,
        response=answer,
        retrieved_contexts=contexts,
    )

    # Add the score to the current trace
    update_current_trace(
        feedback_scores=[{"name": score_result.name, "value": score_result.value}]
    )

    return answer

print(rag_pipeline("What is the capital of France?"))

In the Opik UI, you will be able to see the full trace, including the score calculation.

Using Ragas metrics to evaluate a RAG pipeline

The RagasMetricWrapper can also be used directly within the Opik evaluation platform. This approach is much simpler than creating custom wrappers:

1. Define the Ragas metric

We will start by defining the Ragas metric; in this example we will use AnswerRelevancy:

from ragas.metrics import AnswerRelevancy
from langchain_openai.chat_models import ChatOpenAI
from langchain_openai.embeddings import OpenAIEmbeddings
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from opik.evaluation.metrics import RagasMetricWrapper

# Initialize the Ragas metric
llm = LangchainLLMWrapper(ChatOpenAI())
emb = LangchainEmbeddingsWrapper(OpenAIEmbeddings())

ragas_answer_relevancy = AnswerRelevancy(llm=llm, embeddings=emb)

2. Create the metric wrapper

Simply wrap the Ragas metric with RagasMetricWrapper:

# Create the answer relevancy scoring metric
answer_relevancy = RagasMetricWrapper(
    ragas_answer_relevancy,
    track=True,  # Enable tracing for the metric computation
)

If you are running within a Jupyter notebook, you will need to add the following lines to the top of your notebook:

import nest_asyncio
nest_asyncio.apply()

3. Use the metric wrapper within the Opik evaluation platform

You can now use the metric wrapper directly within the Opik evaluation platform:

from opik.evaluation import evaluate

evaluation_results = evaluate(
    dataset=dataset,
    task=evaluation_task,
    scoring_metrics=[answer_relevancy],
    nb_samples=10,
)
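
The snippet above assumes that dataset and evaluation_task are already defined. A rough sketch of what they might look like, where the dataset name and fields are illustrative placeholders and the pipeline functions from the earlier section are reused:

from opik import Opik

# Hypothetical dataset for illustration only
client = Opik()
dataset = client.get_or_create_dataset(name="ragas-integration-demo")
dataset.insert([
    {"input": "What is the capital of France?"},
])

# The task receives a dataset item and returns the fields the metric needs;
# the wrapper maps input -> user_input and output -> response automatically
def evaluation_task(dataset_item):
    contexts = retrieve_contexts(dataset_item["input"])
    answer = answer_question(dataset_item["input"], contexts)
    return {
        "input": dataset_item["input"],
        "output": answer,
        "retrieved_contexts": contexts,
    }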

The RagasMetricWrapper automatically handles:

  • Field mapping between Opik and Ragas (e.g., input → user_input, output → response)
  • Async execution of Ragas metrics
  • Integration with Opik’s tracing system when track=True
  • Proper error handling for missing required fields
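
Because of this field mapping, the wrapped metric also accepts Opik-style field names directly. A small sketch relying on the mapping rules listed above, with placeholder values:

# Opik-style names are mapped to their Ragas counterparts automatically:
# input -> user_input, output -> response
score_result = answer_relevancy.score(
    input="What is the capital of France?",
    output="Paris",
    retrieved_contexts=["Paris is the capital of France."],
)
print(score_result.value)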