{"id":8219,"date":"2023-11-30T06:18:54","date_gmt":"2023-11-30T14:18:54","guid":{"rendered":"https:\/\/live-cometml.pantheonsite.io\/?p=8219"},"modified":"2025-04-24T17:04:06","modified_gmt":"2025-04-24T17:04:06","slug":"evaluating-rag-pipelines-with-ragas","status":"publish","type":"post","link":"https:\/\/www.comet.com\/site\/blog\/evaluating-rag-pipelines-with-ragas\/","title":{"rendered":"Evaluating RAG Pipelines With\u00a0ragas"},"content":{"rendered":"\n<section class=\"section section--body\">\n<h2 class=\"section-divider\"><span style=\"color: var(--wpex-heading-color); font-size: var(--wpex-text-2xl); font-weight: var(--wpex-heading-font-weight); font-family: var(--wpex-body-font-family, var(--wpex-font-sans));\">A Guide to Metrics and Stuffing Strategy Assessment<\/span><\/h2>\n<div class=\"section-content\">\n<div class=\"section-inner sectionLayout--insetColumn\">\n<figure class=\"graf graf--figure\">\n<\/figure><\/div><\/div><\/section>\n\n\n\n<figure class=\"wp-block-image alignnone graf-image\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1600\/0*EqFwjpaB-G0n9Rzd\" alt=\"Evaluating RAG Pipelines With\u00a0ragas, Comet ML, CometLLM\"\/><figcaption class=\"wp-element-caption\">Photo by <a href=\"https:\/\/unsplash.com\/@austindistel?utm_source=medium&amp;utm_medium=referral\">Austin Distel<\/a> on\u00a0<a href=\"http:\/\/Unsplash.com\">Unsplash<\/a><\/figcaption><\/figure>\n\n\n\n<p class=\"graf graf--p\">In this post, you will learn how to set up and evaluate Retrieval-Augmented Generation (<a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/deci.ai\/blog\/retrieval-augmented-generation-using-langchain\/\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/deci.ai\/blog\/retrieval-augmented-generation-using-langchain\/\">RAG<\/a>) pipelines using LangChain.<\/p>\n\n\n\n<p class=\"graf graf--p\">You will explore the impact of different chain types\u200a\u2014\u200aMap Reduce, Stuff, Refine, and 
Re-rank\u200a\u2014\u200aon the performance of your RAG pipeline. This guide is a practical introduction to using the <a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/github.com\/explodinggradients\/ragas\/tree\/main\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/github.com\/explodinggradients\/ragas\/tree\/main\"><strong class=\"markup--strong markup--p-strong\">ragas<\/strong><\/a> library for RAG pipeline evaluation. Starting with fundamental concepts, you\u2019ll learn how different configurations affect your results.<\/p>\n\n\n\n<p class=\"graf graf--p\">This post is designed for those with a technical background in natural language processing and AI, offering detailed guidance on optimizing and evaluating RAG pipelines for improved performance.<\/p>\n\n\n\n<p class=\"graf graf--p\">It\u2019s a bare-bones blog, and my thoughts are not yet fully fleshed out. But this is an excellent introduction to using the <code class=\"markup--code markup--p-code\">ragas<\/code> library for evaluating your RAG pipelines. In the future, I would like to look at assessing the <code class=\"markup--code markup--p-code\">chunk_size<\/code> for the RAG metrics. 
<strong class=\"markup--strong markup--p-strong\">If you\u2019re interested in working with me on a more extensive study of the concepts introduced in this blog, please reach out!<\/strong><\/p>\n\n\n\n<p class=\"graf graf--p\">Start by getting some preliminaries out of the way:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"pre--content\">%%capture\n!pip install langchain openai tiktoken faiss-cpu ragas\n\n<span class=\"hljs-keyword\">import<\/span> os\n<span class=\"hljs-keyword\">import<\/span> getpass\n\nos.environ[<span class=\"hljs-string\">\"OPENAI_API_KEY\"<\/span>] = getpass.getpass(<span class=\"hljs-string\">\"Enter Your OpenAI API Key:\"<\/span>)\n\n<span class=\"hljs-keyword\">from<\/span> langchain.embeddings.openai <span class=\"hljs-keyword\">import<\/span> OpenAIEmbeddings\n<span class=\"hljs-keyword\">from<\/span> langchain.vectorstores <span class=\"hljs-keyword\">import<\/span> FAISS\n<span class=\"hljs-keyword\">from<\/span> langchain.text_splitter <span class=\"hljs-keyword\">import<\/span> RecursiveCharacterTextSplitter\n<span class=\"hljs-keyword\">from<\/span> langchain.chat_models <span class=\"hljs-keyword\">import<\/span> ChatOpenAI\n<span class=\"hljs-keyword\">from<\/span> langchain.chains <span class=\"hljs-keyword\">import<\/span> RetrievalQA\n<span class=\"hljs-keyword\">from<\/span> langchain.document_loaders <span class=\"hljs-keyword\">import<\/span> WebBaseLoader<\/span><\/pre>\n\n\n\n<p class=\"graf graf--p\">Now, download some text files that will be put into a vector database. I\u2019ve written about this in other blogs. 
Feel free to check them out if you do not understand what\u2019s happening below.<\/p>\n\n\n\n<p class=\"graf graf--p\">In a nutshell, the code loads some websites into document objects, splits them into chunks of 1000 characters each, and puts those chunks into a vector database.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"pre--content\"><span class=\"hljs-comment\"># The websites for the text you want to load<\/span>\nwebsites = [\n    <span class=\"hljs-string\">"https:\/\/www.gutenberg.org\/files\/56075\/56075-h\/56075-h.htm#SENECA_OF_A_HAPPY_LIFE"<\/span>,\n    <span class=\"hljs-string\">"https:\/\/www.gutenberg.org\/files\/56075\/56075-h\/56075-h.htm#SENECA_OF_ANGER"<\/span>,\n    <span class=\"hljs-string\">"https:\/\/www.gutenberg.org\/files\/10661\/10661-h\/10661-h.htm"<\/span>,\n    <span class=\"hljs-string\">"https:\/\/www.gutenberg.org\/cache\/epub\/17183\/pg17183-images.html"<\/span>\n    ]\n\n<span class=\"hljs-comment\"># Use the WebBaseLoader to create Document objects for each website<\/span>\nweb_loader = WebBaseLoader(websites)\nweb_docs = web_loader.load()\n\n<span class=\"hljs-comment\"># Instantiate a text splitter<\/span>\ntext_splitter = RecursiveCharacterTextSplitter(\n    chunk_size=<span class=\"hljs-number\">1000<\/span>,\n    chunk_overlap=<span class=\"hljs-number\">50<\/span>\n    )\n\n<span class=\"hljs-comment\"># Split the documents and index the chunks in a FAISS vector store<\/span>\nweb_texts = text_splitter.split_documents(web_docs)\n\nweb_db = FAISS.from_documents(\n    web_texts,\n    embeddings,\n)<\/span><\/pre>\n\n\n\n<p class=\"graf graf--p\">Here\u2019s an example of what one chunk of the split text looks like:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"pre--content\">web_texts[<span class=\"hljs-number\">42<\/span>]<\/span><\/pre>\n\n\n\n<pre class=\"wp-block-preformatted\">Document(page_content=\u201929\\nCHAPTER II.\\nSEVERAL SORTS OF BENEFITS.\\nWe shall divide benefits into absolute and vulgar;\\r\\nthe one appertaining to good life, the other is only\\r\\nmatter of 
commerce. The former are the more excellent,\\r\\nbecause they can never be made void; whereas\\r\\nall material benefits are tossed back and forward,\\r\\nand change their master. There are some offices\\r\\nthat look like benefits, but are only desirable conveniences,\\r\\nas wealth, etc., and these a wicked man\\r\\nmay receive from a good, or a good man from an\\r\\nevil. Others, again, that bear the face of injuries,\\r\\nwhich are only benefits ill taken; as cutting, lancing,\\r\\nburning, under the hand of a surgeon. The greatest\\r\\nbenefits of all are those of good education, which\\r\\nwe receive from our parents, either in the state of\\r\\nignorance or perverseness; as, their care and tenderness\\r\\nin our infancy; their discipline in our childhood,\\r\\nto keep us to our duties by fear; and, if fair means\\r\\nwill not do, their proceeding afterwards to severity\u2019, metadata={\u2018source\u2019: \u2018<a class=\"markup--anchor markup--blockquote-anchor\" href=\"https:\/\/www.gutenberg.org\/files\/56075\/56075-h\/56075-h.htm#SENECA_OF_A_HAPPY_LIFE\" target=\"_blank\" rel=\"nofollow noopener\" data-href=\"https:\/\/www.gutenberg.org\/files\/56075\/56075-h\/56075-h.htm#SENECA_OF_A_HAPPY_LIFE\">https:\/\/www.gutenberg.org\/files\/56075\/56075-h\/56075-h.htm#SENECA_OF_A_HAPPY_LIFE<\/a>', \u2018title\u2019: \u201cThe Project Gutenberg eBook of Seneca\u2019s Morals of a Happy Life, Benefits, Anger and Clemency, by Lucius Annaeus Seneca\u201d, \u2018language\u2019: \u2018No language found.\u2019})<\/pre>\n\n\n\n<h4 class=\"wp-block-heading graf graf--h4\">The code below sets up a RetrievalQA chain, which will be used to retrieve documents using one of four strategies:<\/h4>\n\n\n\n<ol class=\"wp-block-list postList\">\n<li>Map reduce<\/li>\n\n\n\n<li>Stuff<\/li>\n\n\n\n<li>Refine<\/li>\n\n\n\n<li>Re-rank<\/li>\n<\/ol>\n\n\n\n<p class=\"graf graf--p\">For this blog post, you\u2019ll use the following question to retrieve relevant documents from the vector 
database.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"pre--content\">question = <span class=\"hljs-string\">\"What does Seneca say are the qualities of a happy life and how can happiness be achieved?\"<\/span><\/span><\/pre>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"pre--content\"><span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title function_\">get_chain_result<\/span>(<span class=\"hljs-params\">chain_type, llm, retriever, question<\/span>):\n    <span class=\"hljs-string\">\"\"\"\n    Initialize a chain of the specified type and invoke it with the given question.\n\n    Parameters:\n    - chain_type (str): The type of the chain (e.g., \"map_reduce\", \"stuff\", \"refine\", \"map_rerank\").\n    - llm: The language model.\n    - retriever: The retriever object.\n    - question (str): The question to be asked.\n\n    Returns:\n    - dict: The result of invoking the chain with the question.\n    \"\"\"<\/span>\n    chain = RetrievalQA.from_chain_type(\n        llm=llm,\n        chain_type=chain_type,\n        retriever=retriever,\n        verbose=<span class=\"hljs-literal\">True<\/span>,\n        return_source_documents=<span class=\"hljs-literal\">True<\/span>\n    )\n    result = chain.invoke(question)\n    <span class=\"hljs-keyword\">return<\/span> result\n\nretriever = web_db.as_retriever()<\/span><\/pre>\n\n\n\n<section class=\"section section--body\">\n<div class=\"section-divider\">\n<hr class=\"section-divider\">\n<\/div>\n<div class=\"section-content\">\n<div class=\"section-inner sectionLayout--insetColumn\">\n<blockquote class=\"graf graf--pullquote\"><p>Want to learn how to build modern software with LLMs using the newest tools and techniques in the field? 
<a class=\"markup--anchor markup--pullquote-anchor\" href=\"https:\/\/www.comet.com\/production\/site\/llm-course\/?utm_source=Heartbeat&amp;utm_medium=referral&amp;utm_content=Medium&amp;utm_campaign=Heartbeat_LangChain_Series_HS\" target=\"_blank\" rel=\"noopener ugc nofollow\" data-href=\"https:\/\/www.comet.com\/production\/site\/llm-course\/?utm_source=Heartbeat&amp;utm_medium=referral&amp;utm_content=Medium&amp;utm_campaign=Heartbeat_LangChain_Series_HS\">Check out this free LLMOps course<\/a> from industry expert Elvis Saravia of&nbsp;DAIR.AI!<\/p><\/blockquote>\n<\/div>\n<\/div>\n<\/section>\n\n\n\n<section class=\"section section--body\">\n<div class=\"section-divider\">\n<hr class=\"section-divider\">\n<\/div>\n<div class=\"section-content\">\n<div class=\"section-inner sectionLayout--insetColumn\">\n<h3 class=\"graf graf--h3\">Map Reduce<\/h3>\n<p>Consists of a map step, where each document is individually summarized, and a reduce step where these mini-summaries are combined. An optional compression step can be added.<\/p>\n<figure class=\"graf graf--figure\">\n<\/figure><\/div><\/div><\/section>\n\n\n\n<figure class=\"wp-block-image alignnone graf-image\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1600\/0*i7cwwv_7B_mruY8B\" alt=\"Evaluating RAG Pipelines With\u00a0ragas, Comet ML, CometLLM, old map of Europe\"\/><figcaption class=\"wp-element-caption\">Photo by <a href=\"https:\/\/unsplash.com\/@britishlibrary?utm_source=medium&amp;utm_medium=referral\">British Library<\/a> on\u00a0<a href=\"http:\/\/Unsplash.com\">Unsplash<\/a><\/figcaption><\/figure>\n\n\n\n<p class=\"graf graf--p\">This method runs an initial prompt on each chunk of data and then uses a different prompt to combine all the initial outputs.<\/p>\n\n\n\n<p class=\"graf graf--p\"><code class=\"markup--code markup--p-code\">map_reduce<\/code> separates texts into batches, where you can define the batch size (i.e. 
<code class=\"markup--code markup--p-code\">llm=OpenAI(batch_size=5)<\/code>).<\/p>\n\n\n\n<p class=\"graf graf--p\">It feeds each batch with the question to LLM separately and comes up with the final answer based on the answers from each batch.<\/p>\n\n\n\n<p class=\"graf graf--p\"><strong class=\"markup--strong markup--p-strong\">Pros:<\/strong> It can scale to more documents and documents of larger length. Since the calls to the LLM are on independent, individual documents they can be parallelized.<\/p>\n\n\n\n<p class=\"graf graf--p\"><strong class=\"markup--strong markup--p-strong\">Cons:<\/strong> This requires more calls to the LLM. You can also get some information during the final combined call.<\/p>\n\n\n\n<p class=\"graf graf--p\">The code below will run a<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"pre--content\">map_reduce_result = get_chain_result(<span class=\"hljs-string\">\"map_reduce\"<\/span>, llm, retriever, question)\nmap_reduce_result[<span class=\"hljs-string\">'result'<\/span>]<\/span><\/pre>\n\n\n\n<p>According to the provided text, Seneca views wisdom and virtue as the qualities of a happy life. Happiness can be achieved by first understanding what one ought to do (wisdom) and then living in accordance with that knowledge (virtue).<\/p>\n\n\n\n<h3 class=\"wp-block-heading graf graf--h3\">Stuff<\/h3>\n\n\n\n<p>Takes multiple small documents and combines them into a single prompt for the LLM. 
It is cost-efficient because it involves only one request to the LLM.<\/p>\n\n\n\n<figure class=\"graf graf--figure\">\n<\/figure>\n\n\n\n<figure class=\"wp-block-image alignnone graf-image\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1600\/0*iYLl4JdPVwRyMePM\" alt=\"Evaluating RAG Pipelines With\u00a0ragas, Comet ML, CometLLM, a collection of random objects including a dirty shoe, a coconut, a video game controller, an iPhone, a half lemon, eyeglasses, gloves, a white bowl, Apple headphones, a pen container, a paint marker, a wallet with Euros, a small tripod, a lighter, and a tablet\"\/><figcaption class=\"wp-element-caption\">Photo by <a href=\"https:\/\/unsplash.com\/@luca_tism?utm_source=medium&amp;utm_medium=referral\">Luca Laurence<\/a> on\u00a0<a href=\"http:\/\/Unsplash.com\">Unsplash<\/a><\/figcaption><\/figure>\n\n\n\n<p class=\"graf graf--p\">The <code class=\"markup--code markup--p-code\">chain_type<\/code> &#8220;stuff&#8221; is the simplest method. It uses all related text from the documents as the context in the prompt to the LLM.<\/p>\n\n\n\n<p class=\"graf graf--p\"><strong class=\"markup--strong markup--p-strong\">Pros:<\/strong> You only make a single call to the LLM, and when it generates the answer, it has access to all of the retrieved context at once.<\/p>\n\n\n\n<p class=\"graf graf--p\"><strong class=\"markup--strong markup--p-strong\">Cons:<\/strong> The combined context can sometimes exceed the token limit of the LLM.<\/p>\n\n\n\n<p class=\"graf graf--p\">Use this when you have a handful of small documents.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"pre--content\">stuff_result = get_chain_result(<span class=\"hljs-string\">"stuff"<\/span>, llm, retriever, question)\nstuff_result[<span class=\"hljs-string\">'result'<\/span>]<\/span><\/pre>\n\n\n\n<p>Seneca states that happiness is founded upon wisdom and virtue. To achieve happiness, one must first know what they ought to do and then live according to that knowledge. 
He emphasizes the importance of philosophy and precept as aids toward a happy life, as well as the blessing of a good conscience. According to Seneca, a good man can never be miserable, nor a wicked man happy, and no man is unfortunate if he cheerfully submits to Providence.<\/p>\n\n\n\n<h3 class=\"wp-block-heading graf graf--h3\">Refine<\/h3>\n\n\n\n<p><em class=\"markup--em markup--blockquote-em\">Looks at each document individually and updates its answer with each new document.<\/em><\/p>\n\n\n\n<figure class=\"graf graf--figure\">\n<\/figure>\n\n\n\n<figure class=\"wp-block-image alignnone graf-image\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1600\/0*pGoBPTSQV7XfX3tf\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Photo by <a href=\"https:\/\/unsplash.com\/@john_jennings?utm_source=medium&amp;utm_medium=referral\">John Jennings<\/a> on\u00a0<a href=\"http:\/\/Unsplash.com\">Unsplash<\/a><\/figcaption><\/figure>\n\n\n\n<p class=\"graf graf--p\">It is useful when there are too many documents to fit into a single prompt, but it can be slow, and the results can be confusing if the documents reference each other.<\/p>\n\n\n\n<p class=\"graf graf--p graf--startsWithDoubleQuote\">\u201crefine\u201d also separates texts into batches. It feeds the first batch to the LLM, then feeds the resulting answer, together with the next batch, back to the LLM.<\/p>\n\n\n\n<p class=\"graf graf--p\">It refines the answer by working through all the batches in turn.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"pre--content\">refine_results = get_chain_result(<span class=\"hljs-string\">"refine"<\/span>, llm, retriever, question)\nrefine_results[<span class=\"hljs-string\">'result'<\/span>]<\/span><\/pre>\n\n\n\n<pre class=\"wp-block-preformatted\">With the additional context from Seneca\u2019s Chapter IX, \u201cOf Levity of Mind, and Other Impediments of a Happy Life,\u201d we gain a clearer understanding of Seneca\u2019s perspective on the qualities of a happy life and how happiness can be achieved. 
The refined answer, incorporating the new information, would be as follows:<\/pre>\n\n\n\n<pre class=\"wp-block-preformatted\">Seneca, the Stoic philosopher, posits that the foundation of a happy life is built upon wisdom and virtue. To achieve happiness, one must first comprehend what is right and proper (wisdom), and then live in accordance with that understanding (virtue). Happiness, in Seneca\u2019s view, is not a matter of external circumstances but is intrinsically linked to one\u2019s inner moral state and the quality of one\u2019s character.<\/pre>\n\n\n\n<pre class=\"wp-block-preformatted\">The guidance of philosophy and the instruction of precepts are instrumental in leading one towards a life of happiness. These help individuals to cultivate the wisdom and virtue necessary to live in harmony with nature and their own rational nature.<\/pre>\n\n\n\n<pre class=\"wp-block-preformatted\">A critical quality of a happy life, according to Seneca, is the possession of a good conscience. A person who acts rightly will maintain inner peace and contentment, knowing that their actions align with their values and understanding of what is good.<\/pre>\n\n\n\n<pre class=\"wp-block-preformatted\">Another quality is the cheerful submission to Providence. Seneca emphasizes that a person who accepts their fate and understands the role of Providence in their life will not be disturbed by misfortunes. In Stoic thought, Providence refers to the rational order of the universe, and aligning oneself with this order is key to tranquility.<\/pre>\n\n\n\n<pre class=\"wp-block-preformatted\">Seneca also warns against a life lived without purpose or design\u200a\u2014\u200alikening it to straws carried aimlessly by the current of a river. 
Instead, he advocates for a life lived with intention and reflection, considering not just the parts of life but its entirety.<\/pre>\n\n\n\n<pre class=\"wp-block-preformatted\">In summary, Seneca\u2019s conception of a happy life arises from a deep engagement with philosophy, the cultivation of virtue, living with intention, and maintaining a good conscience. Moreover, he stresses the importance of accepting one\u2019s role in the greater scheme of things (Providence) and understanding that a virtuous person cannot truly be unhappy, just as a wicked person cannot be truly happy. Living with wisdom and virtue ensures that one is not carried haphazardly through life but instead navigates it with purpose and moral clarity.<\/pre>\n\n\n\n<h3 class=\"wp-block-heading graf graf--h3\">Map Re-rank<\/h3>\n\n\n\n<p>Tries to get an answer for each document and assigns it a confidence score, picking the highest-confidence answer in the end.<\/p>\n\n\n\n<figure class=\"graf graf--figure\">\n<\/figure>\n\n\n\n<figure class=\"wp-block-image alignnone graf-image\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1600\/0*NmNwxKCUh_AomZdp\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Photo by <a href=\"https:\/\/unsplash.com\/@martinsanchez?utm_source=medium&amp;utm_medium=referral\">Martin Sanchez<\/a> on\u00a0<a href=\"http:\/\/Unsplash.com\">Unsplash<\/a><\/figcaption><\/figure>\n\n\n\n<p class=\"graf graf--p graf--startsWithDoubleQuote\">\u201cmap_rerank\u201d separates texts into batches, feeds each batch with the question to the LLM, has the LLM score how fully each answer addresses the question, and returns the highest-scoring answer as the final result.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"pre--content\">rerank_results = get_chain_result(<span class=\"hljs-string\">"map_rerank"<\/span>, llm, retriever, question)\nrerank_results[<span class=\"hljs-string\">'result'<\/span>]<\/span><\/pre>\n\n\n\n<pre 
class=\"wp-block-preformatted\">Happiness is founded upon wisdom and virtue. One must first know what to do and then live according to that knowledge. Philosophy and precept are helpful toward a happy life, as is the blessing of a good conscience. A good man can never be miserable, nor a wicked man happy. Happiness is also achieved by cheerfully submitting to Providence.<\/pre>\n\n\n\n<h3 class=\"wp-block-heading graf graf--h3\">Introduction to&nbsp;ragas<\/h3>\n\n\n\n<p class=\"graf graf--p\">The ragas library evaluates Retrieval-Augmented Generation (RAG) pipelines, particularly within the LangChain framework.<\/p>\n\n\n\n<p class=\"graf graf--p\">It utilizes Large Language Models (LLMs) to conduct evaluations across various metrics, each addressing a specific aspect of the RAG pipeline\u2019s performance:<\/p>\n\n\n\n<p class=\"graf graf--p\">1. Faithfulness: Assesses the factual accuracy of the generated answer about the provided context.<br>\n2. Answer Relevancy: Measures the relevance of the generated answer to the posed question.<br>\n3. Context Relevancy: Evaluate the signal-to-noise ratio in retrieved contexts.<\/p>\n\n\n\n<p class=\"graf graf--p\">Ragas uses different methods to leverage LLMs, effectively measuring these aspects while overcoming inherent biases. The library stands out for its ability to provide nuanced insights into the effectiveness of RAG pipelines. It is a valuable tool for developers and researchers working on enhancing the accuracy and relevance of AI-generated content.<\/p>\n\n\n\n<p class=\"graf graf--p\">In the code snippet below, the Ragas library evaluates different strategies in a RAG pipeline.<\/p>\n\n\n\n<p class=\"graf graf--p\">The code imports metrics such as faithfulness, answer relevancy, and context relevancy and then sets up evaluator chains for each metric using <code class=\"markup--code markup--p-code\">RagasEvaluatorChain<\/code>. 
These chains are stored in a dictionary named <code class=\"markup--code markup--p-code\">eval_chains<\/code>. The function <code class=\"markup--code markup--p-code\">evaluate_strategy<\/code> takes a strategy, and the evaluator chains as inputs and computes scores for each metric. It iterates over each evaluation chain, applies it to the given strategy, and stores the resulting scores in a dictionary, which it returns.<\/p>\n\n\n\n<p class=\"graf graf--p\">This allows for a comprehensive evaluation of a strategy across multiple relevant metrics.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"pre--content\"><span class=\"hljs-keyword\">from<\/span> ragas.metrics <span class=\"hljs-keyword\">import<\/span> faithfulness, answer_relevancy, context_relevancy\n<span class=\"hljs-keyword\">from<\/span> ragas.langchain <span class=\"hljs-keyword\">import<\/span> RagasEvaluatorChain\n\neval_chains = {\n    m.name: RagasEvaluatorChain(metric=m)\n    <span class=\"hljs-keyword\">for<\/span> m <span class=\"hljs-keyword\">in<\/span> [faithfulness, answer_relevancy, context_relevancy]\n}\n\n<span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title function_\">evaluate_strategy<\/span>(<span class=\"hljs-params\">strategy, eval_chains<\/span>):\n    <span class=\"hljs-string\">\"\"\"\n    Evaluate a given strategy using the provided evaluation chains.\n\n    Parameters:\n    - strategy (dict): The strategy to be evaluated.\n    - eval_chains (dict): A dictionary of evaluation chains.\n\n    Returns:\n    - dict: A dictionary containing scores for the given strategy.\n    \"\"\"<\/span>\n    scores = {}\n    <span class=\"hljs-keyword\">for<\/span> name, eval_chain <span class=\"hljs-keyword\">in<\/span> eval_chains.items():\n        score_name = <span class=\"hljs-string\">f\"<span class=\"hljs-subst\">{name}<\/span>_score\"<\/span>\n        <span class=\"hljs-comment\">#Evaluate the strategy using the eval_chain<\/span>\n        evaluation_result = 
eval_chain(strategy)\n        <span class=\"hljs-comment\">#Retrieve the specific score from the evaluation result<\/span>\n        specific_score = evaluation_result[score_name]\n        <span class=\"hljs-comment\">#Store this score in the scores dictionary<\/span>\n        scores[score_name] = specific_score\n    <span class=\"hljs-keyword\">return<\/span> scores\n\n<span class=\"hljs-comment\"># List of strategies<\/span>\nstrategies = [stuff_result, refine_results, rerank_results, map_reduce_result]\nstrategy_names = [<span class=\"hljs-string\">'stuff_result'<\/span>, <span class=\"hljs-string\">'refine_results'<\/span>, <span class=\"hljs-string\">'rerank_results'<\/span>, <span class=\"hljs-string\">'map_reduce_result'<\/span>]\n\n<span class=\"hljs-comment\"># Collect scores for each strategy<\/span>\nresults = []\n<span class=\"hljs-keyword\">for<\/span> strategy, strategy_name <span class=\"hljs-keyword\">in<\/span> <span class=\"hljs-built_in\">zip<\/span>(strategies, strategy_names):\n    scores = evaluate_strategy(strategy, eval_chains)\n    scores[<span class=\"hljs-string\">'result_type'<\/span>] = strategy_name  <span class=\"hljs-comment\"># Add the strategy name as 'result_type'<\/span>\n    results.append(scores)<\/span><\/pre>\n\n\n\n<p class=\"graf graf--p\">The following code is a helper function to plot the resulting metrics:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"pre--content\"><span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title function_\">plot_evaluation_score<\/span>(<span class=\"hljs-params\">results, metric<\/span>):\n    <span class=\"hljs-string\">\"\"\"\n    Plot evaluation scores for a specific metric from a list of dictionaries.\n\n    Parameters:\n    - results (list): A list of dictionaries containing evaluation scores.\n    - metric (str): The specific metric to plot.\n    \"\"\"<\/span>\n    <span class=\"hljs-comment\"># Extract scores for the given metric<\/span>\n    scores = 
[result[metric] <span class=\"hljs-keyword\">for<\/span> result <span class=\"hljs-keyword\">in<\/span> results]\n\n    <span class=\"hljs-comment\"># Plotting<\/span>\n    plt.figure(figsize=(<span class=\"hljs-number\">10<\/span>, <span class=\"hljs-number\">6<\/span>))\n    plt.bar(<span class=\"hljs-built_in\">range<\/span>(<span class=\"hljs-built_in\">len<\/span>(results)), scores, tick_label=[r[<span class=\"hljs-string\">'result_type'<\/span>] <span class=\"hljs-keyword\">for<\/span> r <span class=\"hljs-keyword\">in<\/span> results])\n\n    <span class=\"hljs-comment\"># Customizing the chart<\/span>\n    plt.title(<span class=\"hljs-string\">f'Evaluation Scores for <span class=\"hljs-subst\">{metric}<\/span>'<\/span>)\n    plt.ylabel(<span class=\"hljs-string\">'Score'<\/span>)\n    plt.xlabel(<span class=\"hljs-string\">'Strategy'<\/span>)\n    plt.xticks(rotation=<span class=\"hljs-number\">45<\/span>)\n    plt.tight_layout()\n\n    <span class=\"hljs-comment\"># Display the chart<\/span>\n    plt.show()<\/span><\/pre>\n\n\n\n<p class=\"graf graf--p\">Now, it\u2019s time to assess the performance of each strategy.<\/p>\n\n\n\n<h3 class=\"wp-block-heading graf graf--h3\">Faithfulness Metric<\/h3>\n\n\n\n<p class=\"graf graf--p\">The faithfulness metric is integral to evaluating the reliability and accuracy of language model outputs in question-answering scenarios.<\/p>\n\n\n\n<p class=\"graf graf--p\">It ensures that the information provided by AI systems is contextually relevant and factually accurate, which is paramount in applications where decision-making relies on the model\u2019s outputs.<\/p>\n\n\n\n<h4 class=\"wp-block-heading graf graf--h4\">Definition and&nbsp;Purpose<\/h4>\n\n\n\n<p class=\"graf graf--p\"><strong class=\"markup--strong markup--p-strong\">Faithfulness<\/strong> in the context of language models, especially in question-answering systems, measures how accurately and reliably the model\u2019s generated answer adheres to the given 
context or source material. Ensuring the model\u2019s outputs are relevant, factually correct, and consistent with the provided information is crucial.<\/p>\n\n\n\n<h4 class=\"wp-block-heading graf graf--h4\">Calculation Process<\/h4>\n\n\n\n<ol class=\"wp-block-list postList\">\n<li><strong class=\"markup--strong markup--li-strong\">Statement Generation and Identification<\/strong>: The process begins by using a Large Language Model (LLM) to analyze the generated answer. This step involves identifying or extracting key statements or assertions in the response. These statements are the crux of the answer and are what the model claims to be true in response to the given question.<\/li>\n\n\n\n<li><strong class=\"markup--strong markup--li-strong\">Statement Verification<\/strong>: Another critical step involves verifying these extracted statements against the provided context or source material. This verification is usually done using another LLM call or an advanced NLP technique. The purpose is to check each statement for accuracy and alignment with the context. This step can involve cross-referencing the statements with facts or data in the context, ensuring that the model\u2019s responses are plausible and factually grounded.<\/li>\n<\/ol>\n\n\n\n<h4 class=\"wp-block-heading graf graf--h4\">Scoring Mechanism<\/h4>\n\n\n\n<p class=\"graf graf--p\">The faithfulness score is quantified on a scale from 0 to 1, where 0 indicates complete unfaithfulness (none of the statements align with the context), and 1 indicates total faithfulness (the context supports all statements).<\/p>\n\n\n\n<p class=\"graf graf--p\">The calculation involves counting the number of statements from the generated answer verified as accurate and consistent with the context and dividing this count by the total number of statements made. 
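<\/p>\n\n\n\n<p class=\"graf graf--p\">As a minimal numeric sketch of that ratio (in <code class=\"markup--code markup--p-code\">ragas<\/code>, statement extraction and verification are themselves LLM calls; the helper below is hypothetical and only illustrates the final division):<\/p>

```python
def faithfulness_score(verdicts):
    # verdicts: one boolean per statement extracted from the answer,
    # True if the retrieved context supports that statement.
    if not verdicts:
        return 0.0
    return sum(verdicts) / len(verdicts)

# Example: 3 of the 4 extracted statements are supported by the context
score = faithfulness_score([True, True, True, False])  # 0.75
```

<p class=\"graf graf--p\">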
This ratio gives a precise, quantifiable measure of how faithful the answer is.<\/p>\n\n\n\n<h4 class=\"wp-block-heading graf graf--h4\">Application in&nbsp;<code class=\"markup--code markup--h4-code\">ragas<\/code><\/h4>\n\n\n\n<p class=\"graf graf--p\">In the <code class=\"markup--code markup--p-code\">ragas<\/code> framework, as seen from the <code class=\"markup--code markup--p-code\"><a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/github.com\/explodinggradients\/ragas\/blob\/main\/src\/ragas\/metrics\/_faithfulness.py\" target=\"_blank\" rel=\"nofollow noopener\" data-href=\"https:\/\/github.com\/explodinggradients\/ragas\/blob\/main\/src\/ragas\/metrics\/_faithfulness.py\">_faithfulness.py<\/a><\/code> source code, the <code class=\"markup--code markup--p-code\">Faithfulness<\/code> class incorporates these steps as part of its evaluation chain.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"pre--content\"><span class=\"hljs-comment\"># Plots the 'faithfulness_score' metric<\/span>\nplot_evaluation_score(results, <span class=\"hljs-string\">'faithfulness_score'<\/span>) <\/span><\/pre>\n\n\n\n<figure class=\"graf graf--figure\">\n<\/figure>\n\n\n\n<figure class=\"wp-block-image alignnone graf-image\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1600\/1*I2aaqdPn4ZN2aaZjs2Uc2A.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Image by author<\/figcaption><\/figure>\n\n\n\n<p class=\"graf graf--p\">The \u2018stuff_result\u2019 and \u2018rerank_results\u2019 strategies achieved perfect scores (1.00), indicating a high level of factual accuracy in the context provided.<\/p>\n\n\n\n<p class=\"graf graf--p\">However, the \u2018refine_results\u2019 strategy scored slightly lower, suggesting room for improvement in maintaining factual consistency.<\/p>\n\n\n\n<h3 class=\"wp-block-heading graf graf--h3\">Answer Relevancy<\/h3>\n\n\n\n<p class=\"graf graf--p\">The Answer Relevancy metric is critical in 
evaluating language models, particularly in question-answering systems.<\/p>\n\n\n\n<p class=\"graf graf--p\">By focusing on the semantic alignment between the question and the answer, it ensures that the generated responses are contextually relevant and specifically address the query. This metric is crucial for applications where the accuracy and relevance of LLM-generated responses are paramount.<\/p>\n\n\n\n<h4 class=\"wp-block-heading graf graf--h4\">Definition and&nbsp;Purpose<\/h4>\n\n\n\n<p class=\"graf graf--p\"><strong class=\"markup--strong markup--p-strong\">Answer Relevancy<\/strong> is a metric that evaluates how well a generated answer corresponds to a given question or prompt. It assesses the relevance of the content of the answer in the context of the question, ensuring that the response is accurate and directly addresses the query.<\/p>\n\n\n\n<h4 class=\"wp-block-heading graf graf--h4\">Key Attributes<\/h4>\n\n\n\n<ul class=\"wp-block-list postList\">\n<li><strong class=\"markup--strong markup--li-strong\">Scoring Range<\/strong>: The metric provides a score ranging from 0 to 1, with 1 indicating the highest relevance. A higher score signifies that the answer is more directly relevant to the question posed.<\/li>\n\n\n\n<li><strong class=\"markup--strong markup--li-strong\">Completeness and Redundancy<\/strong>: The metric evaluates whether the answer is complete and lacks redundant or unnecessary information. 
This assessment ensures that the answer is concise and focused on the question.<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading graf graf--h4\">Calculation Process<\/h4>\n\n\n\n<ul class=\"wp-block-list postList\">\n<li><strong class=\"markup--strong markup--li-strong\">Semantic Analysis via Embeddings<\/strong>: The metric uses embeddings for semantic analysis, which compares the semantic content of the answer to the question.<\/li>\n\n\n\n<li><strong class=\"markup--strong markup--li-strong\">Direct Comparison of Question and Answer<\/strong>: The calculation directly compares the actual question and the generated answer. This comparison assesses how well the answer\u2019s content aligns with the question\u2019s subject matter.<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading graf graf--h4\">Application in&nbsp;<code class=\"markup--code markup--h4-code\">ragas<\/code><\/h4>\n\n\n\n<p class=\"graf graf--p\">In the <code class=\"markup--code markup--p-code\">ragas<\/code> framework, the <code class=\"markup--code markup--p-code\"><a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/github.com\/explodinggradients\/ragas\/blob\/main\/src\/ragas\/metrics\/_answer_relevance.py\" target=\"_blank\" rel=\"nofollow noopener\" data-href=\"https:\/\/github.com\/explodinggradients\/ragas\/blob\/main\/src\/ragas\/metrics\/_answer_relevance.py\">AnswerRelevancy<\/a><\/code> metric forms an integral part of evaluating the performance of AI-generated answers.<\/p>\n\n\n\n<p class=\"graf graf--p\">The metric ensures that the answers the system provides are contextually appropriate and specifically tailored to the queries they are responding to. 
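To make the embedding comparison concrete, here is a toy sketch of scoring relevance as the cosine similarity between a question vector and an answer vector. The bag-of-words `embed` function below is an assumption made purely for illustration; a real pipeline (including ragas) would use a trained embedding model in its place.

```python
# Toy illustration of embedding-based relevancy scoring.
# ASSUMPTION: `embed` is a bag-of-words stand-in for a real embedding model.
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Hypothetical stand-in: token counts instead of learned dense vectors.
    return Counter(text.lower().split())

def cosine_similarity(a: Counter, b: Counter) -> float:
    # Higher cosine similarity -> closer semantic alignment (under this toy model).
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

question = "what is the capital of france"
answer = "the capital of france is paris"
print(round(cosine_similarity(embed(question), embed(answer)), 3))  # ≈ 0.833
```

Swapping the toy `embed` for real sentence embeddings is what turns this from word overlap into genuine semantic comparison.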
Embeddings are a sophisticated approach to understanding the nuances of language and ensuring that the relevancy of answers is accurately captured.<\/p>\n\n\n\n<p class=\"graf graf--p\">As suggested by the <a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/github.com\/explodinggradients\/ragas\/blob\/main\/src\/ragas\/metrics\/_answer_relevance.py\" target=\"_blank\" rel=\"nofollow noopener\" data-href=\"https:\/\/github.com\/explodinggradients\/ragas\/blob\/main\/src\/ragas\/metrics\/_answer_relevance.py\">source code<\/a>, the <code class=\"markup--code markup--p-code\">ragas<\/code> implementation indicates an advanced use of semantic analysis and NLP techniques to automate and scale this evaluation process.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"pre--content\">plot_evaluation_score(results, <span class=\"hljs-string\">'answer_relevancy_score'<\/span>) <\/span><\/pre>\n\n\n\n<figure class=\"graf graf--figure\">\n<\/figure>\n\n\n\n<figure class=\"wp-block-image alignnone graf-image\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1600\/1*mnFgFF1izC-_r54WqDZq0Q.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Image by author<\/figcaption><\/figure>\n\n\n\n<p class=\"graf graf--p\">The \u2018refine_results\u2019 strategy outperformed others with a score of approximately 0.949, indicating its effectiveness in providing relevant answers to the questions.<\/p>\n\n\n\n<p class=\"graf graf--p\">The \u2018rerank_results\u2019 strategy scored the lowest, suggesting a potential mismatch between the questions and the provided answers.<\/p>\n\n\n\n<h3 class=\"wp-block-heading graf graf--h3\">Context Relevancy<\/h3>\n\n\n\n<p class=\"graf graf--p\">The Context Relevancy metric is critical in evaluating language models, especially in question-answering systems, ensuring that the answers are grounded and supported by relevant context.<\/p>\n\n\n\n<p class=\"graf graf--p\">By focusing on the relevancy of 
individual sentences within the context, the metric provides a detailed and nuanced assessment of the context\u2019s usefulness to the question.<\/p>\n\n\n\n<h4 class=\"wp-block-heading graf graf--h4\">Definition and&nbsp;Purpose<\/h4>\n\n\n\n<p class=\"graf graf--p\"><strong class=\"markup--strong markup--p-strong\">Context Relevancy<\/strong> is a metric designed to assess the relevance of the context provided for a given question. It evaluates how well the context supports, relates to, or provides necessary information for answering the question.<\/p>\n\n\n\n<h4 class=\"wp-block-heading graf graf--h4\">Key Attributes and&nbsp;Process<\/h4>\n\n\n\n<ul class=\"wp-block-list postList\">\n<li><strong class=\"markup--strong markup--li-strong\">Scoring Range<\/strong>: The metric scores on a scale from 0 to 1, with higher scores indicating greater context relevance to the question.<\/li>\n\n\n\n<li><strong class=\"markup--strong markup--li-strong\">Relevance Evaluation<\/strong>: The metric involves analyzing the sentences in the context to determine their relevance to the question. This process includes (1) Extracting critical sentences from the context and (2) Evaluating these sentences for their direct relevance and support to the question.<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading graf graf--h4\">Calculation Method<\/h4>\n\n\n\n<ul class=\"wp-block-list postList\">\n<li><strong class=\"markup--strong markup--li-strong\">Sentence Relevance<\/strong>: The <a class=\"markup--anchor markup--li-anchor\" data-href=\"https:\/\/github.com\/explodinggradients\/ragas\/blob\/main\/src\/ragas\/metrics\/_context_relevancy.py\" href=\"https:\/\/github.com\/explodinggradients\/ragas\/blob\/main\/src\/ragas\/metrics\/_context_relevancy.py\" target=\"_blank\" rel=\"nofollow noopener\">source code<\/a> indicates that the number of relevant sentences is used for scoring. 
This suggests a quantitative approach where the context is evaluated sentence by sentence.<\/li>\n\n\n\n<li><strong class=\"markup--strong markup--li-strong\">Contextual Alignment<\/strong>: The assessment involves checking whether each sentence contributes meaningfully to answering the question or is aligned with the topic of the question.<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading graf graf--h4\">Application in&nbsp;<code class=\"markup--code markup--h4-code\">ragas<\/code><\/h4>\n\n\n\n<p class=\"graf graf--p\">Within the <code class=\"markup--code markup--p-code\">ragas<\/code> framework, the <a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/github.com\/explodinggradients\/ragas\/blob\/main\/src\/ragas\/metrics\/_context_relevancy.py\" target=\"_blank\" rel=\"nofollow noopener\" data-href=\"https:\/\/github.com\/explodinggradients\/ragas\/blob\/main\/src\/ragas\/metrics\/_context_relevancy.py\">Context Relevancy<\/a> metric is crucial to ensuring that the AI-generated answers are based on contextually appropriate and relevant information.<\/p>\n\n\n\n<p class=\"graf graf--p\">As suggested by the <a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/github.com\/explodinggradients\/ragas\/blob\/main\/src\/ragas\/metrics\/_context_relevancy.py\" target=\"_blank\" rel=\"nofollow noopener\" data-href=\"https:\/\/github.com\/explodinggradients\/ragas\/blob\/main\/src\/ragas\/metrics\/_context_relevancy.py\">source code<\/a>, the <code class=\"markup--code markup--p-code\">ragas<\/code> implementation indicates an advanced use of language model capabilities to automate and refine this evaluation process, enhancing the accuracy and reliability of the system&#8217;s outputs.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"pre--content\">plot_evaluation_score(results, <span class=\"hljs-string\">'context_relevancy_score'<\/span>) <\/span><\/pre>\n\n\n\n<figure class=\"graf graf--figure\">\n<\/figure>\n\n\n\n<figure 
class=\"wp-block-image alignnone graf-image\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1600\/1*EbW9BtRq0X3f6rIyePuGSg.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Image by author<\/figcaption><\/figure>\n\n\n\n<p class=\"graf graf--p\">All strategies scored identically (around 0.037), indicating a uniformly low ability to filter out irrelevant context.<\/p>\n\n\n\n<p class=\"graf graf--p\">These results provide valuable insights into the strengths and weaknesses of each RAG pipeline strategy, with implications for their optimization in real-world applications.<\/p>\n\n\n\n<h3 class=\"wp-block-heading graf graf--h3\">Conclusion<\/h3>\n\n\n\n<p class=\"graf graf--p\">To wrap up, this exploration of the ragas library has equipped you with a clear understanding of how different RAG pipeline strategies perform against critical metrics.<\/p>\n\n\n\n<p class=\"graf graf--p\">With these insights, you can refine your language models to enhance factual accuracy and relevancy, ensuring that your AI systems deliver precise and reliable outputs. Remember, practical evaluation is critical to advancing the capabilities of LLMs for Retrieval-Augmented Generation.<\/p>\n\n\n\n<p class=\"graf graf--p\">The findings here serve as a testament to ragas\u2019 utility in achieving that goal.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A Guide to Metrics and Stuffing Strategy Assessment In this post, you will learn how to set up and evaluate Retrieval-Augmented Generation (RAG) pipelines using LangChain. You will explore the impact of different chain types\u200a\u2014\u200aMap Reduce, Stuff, Refine, and Re-rank\u200a\u2014\u200aon the performance of your RAG pipeline. 
This guide is a practical introduction to using the [&hellip;]<\/p>\n","protected":false},"author":68,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"customer_name":"","customer_description":"","customer_industry":"","customer_technologies":"","customer_logo":"","footnotes":""},"categories":[65,7],"tags":[70,71,52,31,34],"coauthors":[166],"class_list":["post-8219","post","type-post","status-publish","format-standard","hentry","category-llmops","category-tutorials","tag-langchain","tag-language-models","tag-llm","tag-llmops","tag-prompt-engineering"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.9 (Yoast SEO v25.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Evaluating RAG Pipelines With\u00a0ragas - Comet<\/title>\n<meta name=\"description\" content=\"Learn how to set up and evaluate Retrieval-Augmented Generation (RAG) pipelines using LangChain and ragas.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.comet.com\/site\/blog\/evaluating-rag-pipelines-with-ragas\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Evaluating RAG Pipelines With\u00a0ragas\" \/>\n<meta property=\"og:description\" content=\"Learn how to set up and evaluate Retrieval-Augmented Generation (RAG) pipelines using LangChain and ragas.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.comet.com\/site\/blog\/evaluating-rag-pipelines-with-ragas\/\" \/>\n<meta property=\"og:site_name\" content=\"Comet\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/cometdotml\" \/>\n<meta property=\"article:published_time\" content=\"2023-11-30T14:18:54+00:00\" \/>\n<meta property=\"article:modified_time\" 
content=\"2025-04-24T17:04:06+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/cdn-images-1.medium.com\/max\/1600\/0*EqFwjpaB-G0n9Rzd\" \/>\n<meta name=\"author\" content=\"Harpreet Sahota\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@Cometml\" \/>\n<meta name=\"twitter:site\" content=\"@Cometml\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Harpreet Sahota\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"17 minutes\" \/>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Evaluating RAG Pipelines With\u00a0ragas - Comet","description":"Learn how to set up and evaluate Retrieval-Augmented Generation (RAG) pipelines using LangChain and ragas.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.comet.com\/site\/blog\/evaluating-rag-pipelines-with-ragas\/","og_locale":"en_US","og_type":"article","og_title":"Evaluating RAG Pipelines With\u00a0ragas","og_description":"Learn how to set up and evaluate Retrieval-Augmented Generation (RAG) pipelines using LangChain and ragas.","og_url":"https:\/\/www.comet.com\/site\/blog\/evaluating-rag-pipelines-with-ragas\/","og_site_name":"Comet","article_publisher":"https:\/\/www.facebook.com\/cometdotml","article_published_time":"2023-11-30T14:18:54+00:00","article_modified_time":"2025-04-24T17:04:06+00:00","og_image":[{"url":"https:\/\/cdn-images-1.medium.com\/max\/1600\/0*EqFwjpaB-G0n9Rzd","type":"","width":"","height":""}],"author":"Harpreet Sahota","twitter_card":"summary_large_image","twitter_creator":"@Cometml","twitter_site":"@Cometml","twitter_misc":{"Written by":"Harpreet Sahota","Est. 
reading time":"17 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.comet.com\/site\/blog\/evaluating-rag-pipelines-with-ragas\/#article","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/blog\/evaluating-rag-pipelines-with-ragas\/"},"author":{"name":"Harpreet Sahota","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/46036ab474aa916e2873daece26a28d6"},"headline":"Evaluating RAG Pipelines With\u00a0ragas","datePublished":"2023-11-30T14:18:54+00:00","dateModified":"2025-04-24T17:04:06+00:00","mainEntityOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/evaluating-rag-pipelines-with-ragas\/"},"wordCount":2224,"publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/evaluating-rag-pipelines-with-ragas\/#primaryimage"},"thumbnailUrl":"https:\/\/cdn-images-1.medium.com\/max\/1600\/0*EqFwjpaB-G0n9Rzd","keywords":["LangChain","Language Models","LLM","LLMOps","Prompt Engineering"],"articleSection":["LLMOps","Tutorials"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.comet.com\/site\/blog\/evaluating-rag-pipelines-with-ragas\/","url":"https:\/\/www.comet.com\/site\/blog\/evaluating-rag-pipelines-with-ragas\/","name":"Evaluating RAG Pipelines With\u00a0ragas - Comet","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/evaluating-rag-pipelines-with-ragas\/#primaryimage"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/evaluating-rag-pipelines-with-ragas\/#primaryimage"},"thumbnailUrl":"https:\/\/cdn-images-1.medium.com\/max\/1600\/0*EqFwjpaB-G0n9Rzd","datePublished":"2023-11-30T14:18:54+00:00","dateModified":"2025-04-24T17:04:06+00:00","description":"Learn how to set up and evaluate Retrieval-Augmented Generation (RAG) pipelines using LangChain and 
ragas.","breadcrumb":{"@id":"https:\/\/www.comet.com\/site\/blog\/evaluating-rag-pipelines-with-ragas\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.comet.com\/site\/blog\/evaluating-rag-pipelines-with-ragas\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/blog\/evaluating-rag-pipelines-with-ragas\/#primaryimage","url":"https:\/\/cdn-images-1.medium.com\/max\/1600\/0*EqFwjpaB-G0n9Rzd","contentUrl":"https:\/\/cdn-images-1.medium.com\/max\/1600\/0*EqFwjpaB-G0n9Rzd"},{"@type":"BreadcrumbList","@id":"https:\/\/www.comet.com\/site\/blog\/evaluating-rag-pipelines-with-ragas\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.comet.com\/site\/"},{"@type":"ListItem","position":2,"name":"Evaluating RAG Pipelines With\u00a0ragas"}]},{"@type":"WebSite","@id":"https:\/\/www.comet.com\/site\/#website","url":"https:\/\/www.comet.com\/site\/","name":"Comet","description":"Build Better Models Faster","publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.comet.com\/site\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.comet.com\/site\/#organization","name":"Comet ML, Inc.","alternateName":"Comet","url":"https:\/\/www.comet.com\/site\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","width":310,"height":310,"caption":"Comet ML, 
Inc."},"image":{"@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/cometdotml","https:\/\/x.com\/Cometml","https:\/\/www.youtube.com\/channel\/UCmN63HKvfXSCS-UwVwmK8Hw"]},{"@type":"Person","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/46036ab474aa916e2873daece26a28d6","name":"Harpreet Sahota","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/image\/2d21512be19ba7e19a71a803309e2a88","url":"https:\/\/secure.gravatar.com\/avatar\/a6ca5a533fc9f143a0a7428037ff652aa0633d66bf27e76ae89b955ae72a0f2d?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/a6ca5a533fc9f143a0a7428037ff652aa0633d66bf27e76ae89b955ae72a0f2d?s=96&d=mm&r=g","caption":"Harpreet Sahota"},"url":"https:\/\/www.comet.com\/site\/blog\/author\/theartistsofdatasciencegmail-com\/"}]}},"_links":{"self":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/8219","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/users\/68"}],"replies":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/comments?post=8219"}],"version-history":[{"count":1,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/8219\/revisions"}],"predecessor-version":[{"id":15433,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/8219\/revisions\/15433"}],"wp:attachment":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media?parent=8219"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/categories?post=8219"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/tags?post=8219"},{"taxonomy":"author","embeddable":true,"href":"
https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/coauthors?post=8219"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}