{"id":12497,"date":"2025-01-13T12:18:28","date_gmt":"2025-01-13T20:18:28","guid":{"rendered":"https:\/\/live-cometml.pantheonsite.io\/?p=12497"},"modified":"2025-04-29T12:16:08","modified_gmt":"2025-04-29T12:16:08","slug":"refactoring-rag-retrieval","status":"publish","type":"post","link":"https:\/\/www.comet.com\/site\/blog\/refactoring-rag-retrieval\/","title":{"rendered":"Build a scalable RAG ingestion pipeline using 74.3% less code"},"content":{"rendered":"\n<p><em>Welcome to Lesson 11 of 12 in our free course series, LLM Twin: Building Your Production-Ready AI Replica. You\u2019ll learn how to use LLMs, vector DVs, and LLMOps best practices to design, train, and deploy a production ready \u201cLLM twin\u201d of yourself. This AI character will write like you, incorporating your style, personality, and voice into an LLM. For a full overview of course objectives and prerequisites, start with <a href=\"https:\/\/www.comet.com\/site\/blog\/an-end-to-end-framework-for-production-ready-llm-systems-by-building-your-llm-twin\/\">Lesson 1.<\/a><\/em><\/p>\n\n\n\n<p><strong>Lessons<\/strong><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><span class=\"s1\"><a href=\"https:\/\/www.comet.com\/site\/blog\/an-end-to-end-framework-for-production-ready-llm-systems-by-building-your-llm-twin\/\">An End-to-End Framework for Production-Ready LLM Systems by Building Your LLM Twin<\/a><\/span><\/li>\n\n\n\n<li><span class=\"s1\"><a href=\"https:\/\/www.comet.com\/site\/blog\/the-importance-of-data-pipelines-in-the-era-of-generative-ai\/\">Your Content is Gold: I Turned 3 Years of Blog Posts into an LLM Training<\/a><\/span><\/li>\n\n\n\n<li><span class=\"s1\"><a href=\"https:\/\/www.comet.com\/site\/blog\/llm-twin-3-change-data-capture\/\">I Replaced 1000 Lines of Polling Code with 50 Lines of CDC Magic<\/a><\/span><\/li>\n\n\n\n<li><span class=\"s1\"><a href=\"https:\/\/www.comet.com\/site\/blog\/streaming-pipelines-for-fine-tuning-llms\/\">SOTA Python Streaming Pipelines for 
Fine-tuning LLMs and RAG \u2014 in Real-Time!<\/a><\/span><\/li>\n\n\n\n<li><span class=\"s1\"><a href=\"https:\/\/www.comet.com\/site\/blog\/advanced-rag-algorithms-optimize-retrieval\/\">The 4 Advanced RAG Algorithms You Must Know to Implement<\/a><\/span><\/li>\n\n\n\n<li><span class=\"s1\"><a href=\"https:\/\/www.comet.com\/site\/blog\/llm-fine-tuning-dataset\/\">Turning Raw Data Into Fine-Tuning Datasets<\/a><\/span><\/li>\n\n\n\n<li><span class=\"s1\"><a href=\"https:\/\/www.comet.com\/site\/blog\/mistral-llm-fine-tuning\/\">8B Parameters, 1 GPU, No Problems: The Ultimate LLM Fine-tuning Pipeline<\/a><\/span><\/li>\n\n\n\n<li><span class=\"s1\"><a href=\"https:\/\/www.comet.com\/site\/blog\/llm-evaluation-best-practices\/\">The Engineer\u2019s Framework for LLM &amp; RAG Evaluation<\/a><\/span><\/li>\n\n\n\n<li><span class=\"s1\"><a href=\"https:\/\/www.comet.com\/site\/blog\/llm-rag-inference-pipelines\/\">Beyond Proof of Concept: Building RAG Systems That Scale<\/a><\/span><\/li>\n\n\n\n<li><span class=\"s1\"><a href=\"https:\/\/www.comet.com\/site\/blog\/rag-evaluation-framework-ragas\/\">The Ultimate Prompt Monitoring Pipeline<\/a><\/span><\/li>\n\n\n\n<li><a href=\"https:\/\/www.comet.com\/site\/blog\/refactoring-rag-retrieval\/\">[Bonus] Build a scalable RAG ingestion pipeline using 74.3% less code<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/www.comet.com\/site\/blog\/multi-index-rag-apps\/\">[Bonus] Build Multi-Index Advanced RAG Apps<\/a><\/li>\n<\/ol>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p><strong>Lessons 11<\/strong> and <strong>12<\/strong> are part of a <strong>bonus series<\/strong> in which we will take the advanced RAG system from the <strong>LLM Twin course<\/strong> (written in LangChain) and refactor it using <a href=\"https:\/\/superlinked.com\/?utm_source=community&amp;utm_medium=blog&amp;utm_campaign=oscourse\">Superlinked<\/a>, a framework specialized in vector computing for information 
retrieval.<\/p>\n\n\n\n<p>In Lesson 11 (this article), we will learn to build a highly scalable, real-time RAG feature pipeline that ingests multiple data categories into a Redis vector database.<\/p>\n\n\n\n<p>More concretely, we will take the ingestion pipeline implemented in Lesson 4 and swap the chunking, embedding, and vector DB logic with Superlinked.<\/p>\n\n\n\n<p><em>You don\u2019t have to read Lesson 4 to read this article. We will give enough context to make sense of it.<\/em><\/p>\n\n\n\n<p>In the <strong>12th lesson<\/strong>, we will use Superlinked to implement a multi-index query strategy and further optimize the advanced RAG retrieval module (initially built in <strong>Lesson 5<\/strong>).<\/p>\n\n\n\n<p><em>The value of this article lies in understanding how easy it is to build complex advanced RAG systems using Superlinked.<\/em><\/p>\n\n\n\n<p><em><strong>Using Superlinked<\/strong>, we reduced the number of RAG-related lines of code by 74.3%. Powerful, right?<\/em><\/p>\n\n\n\n<p>By the <strong>end of this article, you will know how<\/strong> to build a production-ready feature pipeline, powered by Superlinked, that:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>uses <a href=\"https:\/\/bytewax.io\/?utm_source=medium&amp;utm_medium=decodingml&amp;utm_campaign=2024_q1\">Bytewax<\/a> as a stream engine to process data in real-time;<\/li>\n\n\n\n<li>ingests multiple data categories from a <a href=\"https:\/\/www.rabbitmq.com\/\">RabbitMQ queue<\/a>;<\/li>\n\n\n\n<li>validates the data with <a href=\"https:\/\/docs.pydantic.dev\/latest\/\">Pydantic<\/a>;<\/li>\n\n\n\n<li>chunks and embeds data using <a href=\"https:\/\/superlinked.com\/?utm_source=community&amp;utm_medium=blog&amp;utm_campaign=oscourse\">Superlinked<\/a> for RAG;<\/li>\n\n\n\n<li>loads the embedded vectors along with their metadata to a <a href=\"https:\/\/redis.io\/docs\/latest\/develop\/get-started\/vector-database\/\">Redis vector DB<\/a>;<\/li>\n<\/ul>\n\n\n\n<p>Ultimately, on the 
infrastructure side, we will show you how to:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>deploy a <a href=\"https:\/\/superlinked.com\/?utm_source=community&amp;utm_medium=blog&amp;utm_campaign=oscourse\">Superlinked<\/a> vector compute server;<\/li>\n\n\n\n<li>Dockerize the RAG ecosystem.<\/li>\n<\/ul>\n\n\n\n<p><strong>Note<\/strong>: In our use case, the <strong>feature pipeline<\/strong> is also a <strong>streaming pipeline<\/strong>, as we use a <a href=\"https:\/\/bytewax.io\/?utm_source=medium&amp;utm_medium=decodingml&amp;utm_campaign=2024_q1\">Bytewax<\/a> streaming engine. Thus, we will use these words <strong>interchangeably<\/strong>.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*r8OjO5BXmeaO1KLMvL9LDQ.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">The RAG feature pipeline architecture after refactoring.<\/figcaption><\/figure>\n\n\n\n<p><strong>Quick intro to feature pipelines<\/strong><br>\nThe <strong>feature pipeline<\/strong> is the <strong>first pipeline<\/strong> presented in the <strong>FTI pipeline architecture<\/strong>: feature, training, and inference pipelines.<\/p>\n\n\n\n<p>A <strong>feature pipeline<\/strong> takes raw data as input, processes it into features, and stores it in a feature store, from which the training &amp; inference pipelines will consume it.<\/p>\n\n\n\n<p>The component is completely isolated from the training and inference code. 
All the communication is done through the feature store.<\/p>\n\n\n\n<p><em>To avoid repeating myself, if you are <strong>unfamiliar<\/strong> with the <strong>FTI pipeline architecture<\/strong>, check out Lesson 1 for a refresher.<\/em><\/p>\n\n\n\n<p><strong>Table of Contents<\/strong><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><a href=\"#12we\">What is Superlinked?<\/a><\/li>\n\n\n\n<li><a href=\"#pl90\">The old architecture of the RAG feature pipeline<\/a><\/li>\n\n\n\n<li><a href=\"#98iu\">The new Superlinked architecture of the RAG feature pipeline<\/a><\/li>\n\n\n\n<li><a href=\"#weee\">Understanding the streaming flow for real-time processing<\/a><\/li>\n\n\n\n<li><a href=\"#7hj7\">Loading data to Superlinked<\/a><\/li>\n\n\n\n<li><a href=\"#lkjh\">Exploring the RAG Superlinked server<\/a><\/li>\n\n\n\n<li><a href=\"#d556\">Using Redis as a vector DB<\/a><\/li>\n\n\n\n<li><a href=\"#d678\">Dockerize the application<\/a><\/li>\n<\/ol>\n\n\n\n<p>\ud83d\udd17 Check out the code on GitHub [1] and support us with a \u2b50\ufe0f<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"12we\">1. 
What is Superlinked?<\/h2>\n\n\n\n<p><em><a href=\"https:\/\/superlinked.com\/?utm_source=community&amp;utm_medium=blog&amp;utm_campaign=oscourse\">Superlinked<\/a><\/em> is a computing framework for turning complex data into vectors.<\/p>\n\n\n\n<p>It lets you quickly build multimodal vectors and define weights at query time, so you don\u2019t need a custom reranking algorithm to optimize results.<\/p>\n\n\n\n<p>Superlinked focuses on solving complex problems based on vector embeddings, such as RAG, semantic search, and recommendation systems.<\/p>\n\n\n\n<p>I love how Daniel Svonava, the CEO of <a href=\"https:\/\/superlinked.com\/?utm_source=community&amp;utm_medium=blog&amp;utm_campaign=oscourse\">Superlinked<\/a>, described the value of vector compute and implicitly Superlinked:<\/p>\n\n\n\n<p><em>Daniel Svonava, CEO at Superlinked:<\/em><\/p>\n\n\n\n<p><em>\u201cVectors power most of what you already do online \u2014 hailing a cab, finding a funny video, getting a date, scrolling through a feed or paying with a tap. And yet, building production systems powered by vectors is still too hard! 
Our goal is to help enterprises put vectors at the center of their data &amp; compute infrastructure, to build smarter and more reliable software.\u201d<\/em><\/p>\n\n\n\n<p>To conclude, Superlinked is a framework that puts vectors at the center of its universe and allows you to:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>chunk and embed your data;<\/li>\n\n\n\n<li>store multi-index vectors in a vector DB;<\/li>\n\n\n\n<li>do complex vector search queries on top of your data.<br><\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*MmrM2X7FQ4Jzrs-A4066zg.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Screenshot from Superlinked\u2019s landing page<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Superlinked vs LangChain (or LlamaIndex)<\/strong><\/h3>\n\n\n\n<p>Superlinked solely specializes in vector computing (chunking, embedding, vector DBs and vector searches). It is a highly specialized knife for \u201ccutting\u201d vectors.<\/p>\n\n\n\n<p>On the other hand, frameworks such as LangChain or LlamaIndex are like Swiss Army Knives, able to do almost everything related to LLM applications.<\/p>\n\n\n\n<p>Because of their vast number of features, they couldn\u2019t specialize in a specific niche, such as vector computing.<\/p>\n\n\n\n<p>Any framework would do the trick for a quick PoC, but Superlinked will make a difference when working with complex data structures that require multi-indexing and complicated queries.<\/p>\n\n\n\n<p>Also, as a personal note, I love how simple and intuitive Superlinked\u2019s Python SDK is compared to other frameworks.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"pl90\">2. 
The old architecture of the RAG feature pipeline<\/h2>\n\n\n\n<p>Here is a quick recap of the critical aspects of the architecture of the RAG feature pipeline presented in the 4th lesson of the LLM Twin course.<\/p>\n\n\n\n<p><em>We are working with <strong>3 different data categories:<\/strong><\/em><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>posts (e.g., LinkedIn, Twitter)<\/li>\n\n\n\n<li>articles (e.g., Medium, Substack, or any other blog)<\/li>\n\n\n\n<li>repositories (e.g., GitHub, GitLab)<\/li>\n<\/ul>\n\n\n\n<p>Every data category has to be preprocessed differently. For example, you want to chunk the posts into smaller documents while keeping the articles in bigger ones.<\/p>\n\n\n\n<p><em>The <strong>solution<\/strong> is based on <strong>CDC<\/strong>, a <strong>queue<\/strong>, a <strong>streaming engine<\/strong>, and a <strong>vector DB<\/strong>:<\/em><\/p>\n\n\n\n<p>\u2192 The raw data is collected from multiple social platforms and is stored in <a href=\"https:\/\/www.mongodb.com\/\">MongoDB<\/a>. 
(Lesson 2)<\/p>\n\n\n\n<p>\u2192 CDC adds any change made to the MongoDB to a <a href=\"https:\/\/www.rabbitmq.com\/\">RabbitMQ<\/a> queue (Lesson 3).<\/p>\n\n\n\n<p>\u2192 The RabbitMQ queue stores all the events until they are processed.<\/p>\n\n\n\n<p>\u2192 The <a href=\"https:\/\/bytewax.io\/?utm_source=medium&amp;utm_medium=decodingml&amp;utm_campaign=2024_q1\">Bytewax<\/a> streaming engine reads the messages from the RabbitMQ queue and cleans, chunks, and embeds them.<\/p>\n\n\n\n<p>\u2192 The processed data is uploaded to a <a href=\"https:\/\/qdrant.tech\/?utm_source=decodingml&amp;utm_medium=referral&amp;utm_campaign=llm-course\">Qdrant vector DB<\/a>.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:1400\/0*trp1lxqWF1v20W7U.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">The old feature\/streaming pipeline architecture that was presented in Lesson 4.<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Why is this design robust?<\/h3>\n\n\n\n<p>Here are 4 core reasons:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>The <strong>data<\/strong> is <strong>processed <\/strong>in<strong> real-time<\/strong>.<\/li>\n\n\n\n<li><strong>Out-of-the-box recovery system:<\/strong> If the streaming pipeline fails to process a message, it will be added back to the queue.<\/li>\n\n\n\n<li><strong>Lightweight:<\/strong> No need for any diffs between databases or batching too many records.<\/li>\n\n\n\n<li><strong>No I\/O bottlenecks<\/strong> on the source database.<\/li>\n<\/ol>\n\n\n\n<p>We recommend reading (or at least skimming) Lesson 4 to understand the details of the old streaming architecture.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">What is the issue with this design?<\/h3>\n\n\n\n<p>In this architecture, we had to write custom logic to chunk, embed, and load the data to Qdrant.<\/p>\n\n\n\n<p>The issue with this approach is that we had to leverage various 
libraries, such as LangChain and unstructured, to get the job done.<\/p>\n\n\n\n<p>Also, because we have 3 data categories, we had to write a dispatcher layer that calls the right function depending on its category, which resulted in tons of boilerplate code.<\/p>\n\n\n\n<p>Ultimately, as the chunking and embedding logic is implemented directly in the streaming pipeline, it is harder to scale horizontally. The embedding algorithm needs powerful GPU machines, while the rest of the operations require a strong CPU.<\/p>\n\n\n\n<p>This results in:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>more time spent on development;<\/li>\n\n\n\n<li>more code to maintain;<\/li>\n\n\n\n<li>code that can quickly become less readable;<\/li>\n\n\n\n<li>less freedom to scale.<\/li>\n<\/ul>\n\n\n\n<p>Superlinked addresses these issues by providing an intuitive and powerful Python API that speeds up the development of our ingestion and retrieval logic.<\/p>\n\n\n\n<p>Thus, let\u2019s see how to redesign the architecture using Superlinked \u2193<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"98iu\">3. The new Superlinked architecture of the RAG feature pipeline<\/h2>\n\n\n\n<p>The core idea of the architecture will be the same. 
We still want to:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>use a <a href=\"https:\/\/bytewax.io\/?utm_source=medium&amp;utm_medium=decodingml&amp;utm_campaign=2024_q1\">Bytewax streaming engine<\/a> for real-time processing;<\/li>\n\n\n\n<li>read new events from <a href=\"https:\/\/www.rabbitmq.com\/\">RabbitMQ<\/a>;<\/li>\n\n\n\n<li>clean, chunk, and embed the new incoming raw data;<\/li>\n\n\n\n<li>load the processed data to a vector DB.<\/li>\n<\/ul>\n\n\n\n<p><strong>The question is<\/strong>, how will we do this with Superlinked?<\/p>\n\n\n\n<p>As you can see in the image below, Superlinked will replace the logic for the following operations:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>chunking;<\/li>\n\n\n\n<li>embedding;<\/li>\n\n\n\n<li>vector storage;<\/li>\n\n\n\n<li>queries.<\/li>\n<\/ul>\n\n\n\n<p>Also, we have to swap Qdrant with a Redis vector DB because Superlinked didn\u2019t support Qdrant when I wrote this article. But they plan to add it in the coming months (along with many other vector DBs).<\/p>\n\n\n\n<p>What will remain unchanged are the following:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>the Bytewax streaming layer;<\/li>\n\n\n\n<li>the RabbitMQ queue ingestion component;<\/li>\n\n\n\n<li>the cleaning logic.<\/li>\n<\/ul>\n\n\n\n<p><em>By examining <strong>what we must change<\/strong> in the architecture to integrate Superlinked, we can <strong>see<\/strong> the <strong>framework\u2019s core features.<\/strong><\/em><\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*GeCq4JYPeQyMtyNGUeRfyQ.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">The components that can be refactored into the Superlinked framework.<\/figcaption><\/figure>\n\n\n\n<p>Now, let\u2019s take a deeper look at the new architecture.<\/p>\n\n\n\n<p>All the Superlinked logic will sit on its own server, completely decoupling the vector compute component from the rest of the 
feature pipeline.<\/p>\n\n\n\n<p>We can quickly scale the streaming pipeline or the Superlinked server horizontally based on our needs. Also, this makes it easier to run the embedding models (from Superlinked) on a machine with a powerful GPU while keeping the streaming pipeline on a machine optimized for network I\/O operations.<\/p>\n\n\n\n<p>All the communication to Superlinked (ingesting or querying data) will be done through a REST API, automatically generated based on the schemas and queries you define in your Superlinked application.<\/p>\n\n\n\n<p>The <strong>Bytewax streaming pipeline<\/strong> will perform the following operations:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>concurrently read messages from RabbitMQ;<\/li>\n\n\n\n<li>clean each message based on its data category;<\/li>\n\n\n\n<li>send the cleaned document to the Superlinked server through an HTTP request.<\/li>\n<\/ul>\n\n\n\n<p><strong>On the Superlinked server side,<\/strong> we have defined an ingestion endpoint for each data category (article, post or repository). Each endpoint will know how to chunk, embed, and store every data point based on its category.<\/p>\n\n\n\n<p>Also, we have a query endpoint (automatically generated) for each data category that will take care of embedding the query and performing a vector semantic search operation to retrieve similar results.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*r8OjO5BXmeaO1KLMvL9LDQ.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">The RAG feature pipeline architecture after refactoring.<\/figcaption><\/figure>\n\n\n\n<p>Now, let\u2019s finally jump into the code \u2193<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"weee\">4. 
Understanding the streaming flow for real-time processing<\/h2>\n\n\n\n<p>Let\u2019s start with a quick recap of the Bytewax streaming flow we presented in Lesson 4.<\/p>\n\n\n\n<p>The<strong> <a href=\"https:\/\/bytewax.io\/?utm_source=medium&amp;utm_medium=decodingml&amp;utm_campaign=2024_q1\">Bytewax<\/a> flow<\/strong> is the <strong>central point<\/strong> of the <strong>streaming pipeline<\/strong>. It defines all the required steps, following the next simplified pattern:<em> \u201cinput -&gt; processing -&gt; output\u201d.<\/em><\/p>\n\n\n\n<p>To structure and validate the data, we use <a href=\"https:\/\/docs.pydantic.dev\/latest\/\">Pydantic<\/a>. Between each Bytewax step, we map and pass a different Pydantic model based on its current state: raw, cleaned, chunked, or embedded.<\/p>\n\n\n\n<p>If we get an invalid data point due to contract changes between the feature pipeline and the events coming from RabbitMQ, Pydantic will throw an error. Thus, we can react quickly instead of dealing with silent failures or other side effects in the system.<\/p>\n\n\n\n<p>Here is the Bytewax flow and its core steps \u2193<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:2000\/1*UrW-gY1j8T4jJWKAUdsgIA.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Bytewax flow<\/figcaption><\/figure>\n\n\n\n<p>Check out Lesson 4 for more details on the Bytewax flow, how the <em>map()<\/em> functions work and how the data is cleaned. 
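<\/p>\n\n\n\n<p><em>To make this concrete, here is a minimal sketch of what such state models can look like (the class names and fields below are illustrative, not the course\u2019s exact models):<\/em><\/p>

```python
from pydantic import BaseModel, ValidationError

# Hypothetical, simplified state models -- the real ones live in the course's
# GitHub repository; the names and fields here are illustrative only.
class RawPost(BaseModel):
    entry_id: str
    platform: str
    content: dict

class CleanedPost(BaseModel):
    entry_id: str
    platform: str
    cleaned_content: str

def clean(raw: RawPost) -> CleanedPost:
    # Toy "cleaning" step: flatten the raw content dict into a single string.
    text = " ".join(str(value) for value in raw.content.values())
    return CleanedPost(
        entry_id=raw.entry_id, platform=raw.platform, cleaned_content=text
    )

# A valid event flows through the map() steps...
cleaned = clean(
    RawPost(entry_id="1", platform="linkedin", content={"text": "Hello, RAG!"})
)
print(cleaned.cleaned_content)  # prints "Hello, RAG!"

# ...while a malformed event (e.g., after an upstream contract change)
# fails loudly at the model boundary instead of propagating silently.
try:
    RawPost(entry_id="2", platform="linkedin")  # the "content" field is missing
except ValidationError as exc:
    print(f"Invalid message rejected with {len(exc.errors())} validation error(s)")
```

<p>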
This lesson will primarily focus on Superlinked and how to write a RAG feature pipeline with it.<\/p>\n\n\n\n<p>What is important to remember is that once a message is available in the RabbitMQ queue, it will immediately be:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>consumed;<\/li>\n\n\n\n<li>transformed to a raw Pydantic model (Pydantic automatically validates the structure and data types);<\/li>\n\n\n\n<li>cleaned;<\/li>\n\n\n\n<li>sent to the Superlinked server to be chunked, embedded and saved to a vector DB.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"7hj7\">5. Loading data to Superlinked<\/h2>\n\n\n\n<p>Before we explore the Superlinked application, let\u2019s review our <em>Bytewax SuperlinkedOutputSink()<\/em> and <em>SuperlinkedClient()<\/em> classes.<\/p>\n\n\n\n<p>The <em>SuperlinkedOutputSink()<\/em> class inherits the DynamicSink base class from Bytewax, which implements output nodes in a flow.<\/p>\n\n\n\n<p>Its purpose is to instantiate a new <em>SuperlinkedSinkPartition()<\/em> for each worker within the Bytewax cluster. Thus, we can optimize the system for I\/O operations by scaling our output workers horizontally. <script src=\"https:\/\/cdn.jsdelivr.net\/npm\/prismjs@1.29.0\/prism.min.js\"><\/script><br>\n<script src=\"https:\/\/cdn.jsdelivr.net\/npm\/prismjs@1.29.0\/components\/prism-python.min.js\"><\/script><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code> class SuperlinkedOutputSink(DynamicSink):\n    def __init__(self, client: SuperlinkedClient) -&gt; None:\n        self._client = client\n\n    def build(self, worker_index: int, worker_count: int) -&gt; StatelessSinkPartition:\n        return SuperlinkedSinkPartition(client=self._client) <\/code><\/pre>\n\n\n\n<p>The <em>SuperlinkedSinkPartition()<\/em> class inherits the <em>StatelessSinkPartition Bytewax base class<\/em> used to create custom stateless partitions. Each partition will run on a different worker. 
As they are stateless, you can directly spin up new workers when required.<\/p>\n\n\n\n<p>This class takes as input batches of items and sends them to Superlinked through the <em>SuperlinkedClient().<\/em><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code> class SuperlinkedSinkPartition(StatelessSinkPartition):\n    def __init__(self, client: SuperlinkedClient):\n        self._client = client\n\n    def write_batch(self, items: list&#91;Document]) -&gt; None:\n        for item in tqdm(items, desc=\"Sending items to Superlinked...\"):\n            match item.type:\n                case \"repositories\":\n                    self._client.ingest_repository(item)\n                case \"posts\":\n                    self._client.ingest_post(item)\n                case \"articles\":\n                    self._client.ingest_article(item)\n                case _:\n                    logger.error(f\"Unknown item type: {item.type}\") <\/code><\/pre>\n\n\n\n<p>The <em>SuperlinkedClient()<\/em> is a basic wrapper that makes HTTP requests to the Superlinked server that contains all the RAG logic. 
We use <em>httpx<\/em> to make POST requests for ingesting or searching data.<\/p>\n\n\n\n<p>We will use this class to communicate between:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>the RAG feature pipeline -&gt; Superlinked server (when ingesting data)<\/li>\n\n\n\n<li>the RAG retriever &lt;-&gt; Superlinked server (when retrieving data to pass to an LLM)<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code> class SuperlinkedClient:\n    def __init__(self, base_url=settings.SUPERLINKED_SERVER_URL) -&gt; None:\n        self.base_url = base_url\n        self.timeout = 600\n        self.headers = {\"Accept\": \"*\/*\", \"Content-Type\": \"application\/json\"}\n\n        self._content_weight = 0.9\n        self._platform_weight = 0.1\n\n    def ingest_repository(self, data: RepositoryDocument) -&gt; None:\n        self.__ingest(f\"{self.base_url}\/api\/v1\/ingest\/repository_schema\", data)\n\n    def ingest_post(self, data: PostDocument) -&gt; None:\n        self.__ingest(f\"{self.base_url}\/api\/v1\/ingest\/post_schema\", data)\n\n    def ingest_article(self, data: ArticleDocument) -&gt; None:\n        self.__ingest(f\"{self.base_url}\/api\/v1\/ingest\/article_schema\", data)\n\n    def __ingest(self, url: str, data: T) -&gt; None:\n        logger.info(f\"Sending document {data.id} to Superlinked at {url}\")\n\n        response = httpx.post(\n            url, headers=self.headers, json=data.model_dump(), timeout=self.timeout\n        )\n\n        if response.status_code != 202:\n            raise httpx.HTTPStatusError(\n                \"Ingestion failed\", request=response.request, response=response\n            )\n\n    def search_repository(\n        self, search_query: str, platform: str, author_id: str, *, limit: int = 3\n    ) -&gt; list&#91;RepositoryDocument]:\n        return self.__search(\n            f\"{self.base_url}\/api\/v1\/search\/repository_query\",\n            RepositoryDocument,\n            search_query,\n            platform,\n    
        author_id,\n            limit=limit,\n        )\n\n    def search_post(\n        self, search_query: str, platform: str, author_id: str, *, limit: int = 3\n    ) -&gt; list&#91;PostDocument]:\n        ... # URL: f\"{self.base_url}\/api\/v1\/search\/post_query\"\n\n    def search_article(\n        self, search_query: str, platform: str, author_id: str, *, limit: int = 3\n    ) -&gt; list&#91;ArticleDocument]:\n        ... # URL: f\"{self.base_url}\/api\/v1\/search\/article_query\"\n\n    def __search(\n        self,\n        url: str,\n        document_class: type&#91;T],\n        search_query: str,\n        platform: str,\n        author_id: str,\n        *,\n        limit: int = 3,\n    ) -&gt; list&#91;T]:\n        data = {\n            \"search_query\": search_query,\n            \"platform\": platform,\n            \"author_id\": author_id,\n            \"limit\": limit,\n            \"content_weight\": self._content_weight,\n            \"platform_weight\": self._platform_weight,\n        }\n        response = httpx.post(\n            url, headers=self.headers, json=data, timeout=self.timeout\n        )\n\n        if response.status_code != 200:\n            raise httpx.HTTPStatusError(\n                \"Search failed\", request=response.request, response=response\n            )\n\n        parsed_results = &#91;]\n        for result in response.json()&#91;\"results\"]:\n            parsed_results.append(document_class(**result&#91;\"obj\"]))\n\n        return parsed_results <\/code><\/pre>\n\n\n\n<p>The Superlinked server URLs are automatically generated as follows:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The ingestion URLs are generated based on the data schemas you defined (e.g., repository schema, post schema, etc.)<\/li>\n\n\n\n<li>The search URLs are created based on the Superlinked queries defined within the application<\/li>\n<\/ul>\n\n\n\n<p>If that doesn\u2019t 
make sense, it will in just a second after we go through the Superlinked application \u2193<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"lkjh\">6. Exploring the RAG Superlinked server<\/h2>\n\n\n\n<p>As the RAG <a href=\"https:\/\/superlinked.com\/?utm_source=community&amp;utm_medium=blog&amp;utm_campaign=oscourse\">Superlinked<\/a> server is a different component from the Bytewax one, the implementation sits under the <em>server<\/em> folder at <em>6-bonus-superlinked-rag\/server\/src\/app.py.<\/em><\/p>\n\n\n\n<p>Under the hood, Superlinked uses <a href=\"https:\/\/fastapi.tiangolo.com\/\">FastAPI<\/a> to bootstrap a web server over its core engine. You won\u2019t have to interact with FastAPI directly, but it\u2019s good to know that you can leverage its features, such as the <a href=\"https:\/\/fastapi.tiangolo.com\/how-to\/configure-swagger-ui\/\">Swagger UI<\/a> [2] for documentation, which you can access at \/docs:<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:2000\/1*hQgmAQUUKo9em5oC21g0QA.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Screenshot of the Swagger UI [2]<\/figcaption><\/figure>\n\n\n\n<p>Here is a step-by-step implementation of the Superlinked application \u2193<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Settings class<\/h3>\n\n\n\n<p>Use <a href=\"https:\/\/docs.pydantic.dev\/latest\/concepts\/pydantic_settings\/\">Pydantic settings<\/a> to define a global configuration class.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code> class Settings(BaseSettings):\n    EMBEDDING_MODEL_ID: str = \"sentence-transformers\/all-mpnet-base-v2\"\n\n    REDIS_HOSTNAME: str = \"redis\"\n    REDIS_PORT: int = 6379\n\n\nsettings = Settings() <\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Schemas<\/h3>\n\n\n\n<p>Superlinked requires you to define your data structure through a set of schemas, which are very similar to data classes or Pydantic 
models.<\/p>\n\n\n\n<p>Superlinked will use these schemas as ORMs to save your data to a specified vector DB.<\/p>\n\n\n\n<p>It will also use them to define ingestion URLs automatically as POST HTTP methods that expect the request body to have the same signature as the schema.<\/p>\n\n\n\n<p>Simple and effective. Cool, right?<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code> @schema\nclass PostSchema:\n    id: IdField\n    platform: String\n    content: String\n    author_id: String\n    type: String\n\n\n@schema\nclass ArticleSchema:\n    id: IdField\n    platform: String\n    link: String\n    content: String\n    author_id: String\n    type: String\n\n\n@schema\nclass RepositorySchema:\n    id: IdField\n    platform: String\n    name: String\n    link: String\n    content: String\n    author_id: String\n    type: String\n\n\npost = PostSchema()\narticle = ArticleSchema()\nrepository = RepositorySchema() <\/code><\/pre>\n\n\n\n<p>There is nothing fancy here. Let\u2019s move to Superlinked\u2019s coolest feature, <strong>spaces<\/strong>.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Spaces<\/h4>\n\n\n\n<p>The spaces are where you define your chunking and embedding logic.<\/p>\n\n\n\n<p>A space is scoped at the field of a schema. 
Thus, if you want to embed multiple attributes of a single schema, you must define multiple spaces and combine them later into a multi-index.<\/p>\n\n\n\n<p>Let\u2019s take the spaces for the article category as an example:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code> articles_space_content = TextSimilaritySpace(\n    text=chunk(article.content, chunk_size=500, chunk_overlap=50),\n    model=settings.EMBEDDING_MODEL_ID,\n)\narticles_space_plaform = CategoricalSimilaritySpace(\n    category_input=article.platform,\n    categories=&#91;\"medium\", \"superlinked\"],\n    negative_filter=-5.0,\n) <\/code><\/pre>\n\n\n\n<p>Chunking is done simply by calling the <em>chunk()<\/em> function on a given schema field and specifying standard parameters such as <em>\u201cchunk_size\u201d<\/em> and <em>\u201cchunk_overlap\u201d.<\/em><\/p>\n\n\n\n<p>The embedding is done through the <em>TextSimilaritySpace()<\/em> and CategoricalSimilaritySpace() classes.<\/p>\n\n\n\n<p>As the name suggests, the <strong><em>TextSimilaritySpace()<\/em> <\/strong>embeds text data using the model specified within the \u201c<em>model<\/em>\u201d parameter. It supports any HuggingFace model. 
We are using \u201c<em>sentence-transformers\/all-mpnet-base-v2\u201d.<\/em><\/p>\n\n\n\n<p>The <em><strong>CategoricalSimilaritySpace()<\/strong><\/em> class uses an <em>n-hot encoded vector<\/em> with the option to apply a negative filter for unmatched categories, enhancing the distinction between matching and non-matching category items.<\/p>\n\n\n\n<p>The <em>\u201cnegative_filter\u201d<\/em> parameter allows for the filtering out of unmatched categories by setting them to a large negative value, effectively resulting in a large negative similarity between non-matching category items.<\/p>\n\n\n\n<p>You must also specify all the available categories through the \u201c<em>categories<\/em>\u201d parameter to encode them in n-hot.<\/p>\n\n\n\n<p>As you can see in the GitHub repository, the spaces for the repository and posts look exactly the same.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Indexes<\/h4>\n\n\n\n<p>The indexes define how a collection can be queried. They take one or multiple spaces from the same schema.<\/p>\n\n\n\n<p>Here is what the article index looks like:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code> article_index = Index(\n    &#91;articles_space_content, articles_space_plaform],\n    fields=&#91;article.author_id],\n) <\/code><\/pre>\n\n\n\n<p>As you can see, the vector index combines the article\u2019s content and the posted platform. When the article collection is queried, both embeddings will be considered.<\/p>\n\n\n\n<p>Also, we index the \u201cauthor_id\u201d field to filter articles written by a specific author. It is nothing fancy\u2014it is just a classic filter. However, indexing the fields used in filters is often good practice.<\/p>\n\n\n\n<p>The repository and post indexes look the same, as you can see in the GitHub repository.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Queries<\/h4>\n\n\n\n<p>We will quickly introduce what a query looks like. 
In Lesson 12, however, we will dive deeper into the advanced retrieval part, and hence into queries.<\/p>\n\n\n\n<p>Here is what the article query looks like:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code> article_query = (\n    Query(\n        article_index,\n        weights={\n            articles_space_content: Param(\"content_weight\"),\n            articles_space_plaform: Param(\"platform_weight\"),\n        },\n    )\n    .find(article)\n    .similar(articles_space_content.text, Param(\"search_query\"))\n    .similar(articles_space_plaform.category, Param(\"platform\"))\n    .filter(article.author_id == Param(\"author_id\"))\n    .limit(Param(\"limit\"))\n) <\/code><\/pre>\n\n\n\n<p>\u2026and here is what it does:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>it queries the article_index using a weighted multi-index between the content and platform vectors (e.g., <code>0.9 * content_embedding + 0.1 * platform_embedding<\/code>);<\/li>\n\n\n\n<li>the search text used to compute the query content embedding is passed through the \u201csearch_query\u201d parameter, and the platform category similarly through the \u201cplatform\u201d parameter;<\/li>\n\n\n\n<li>it filters the results based on the \u201cauthor_id\u201d;<\/li>\n\n\n\n<li>it takes only the top results using the \u201climit\u201d parameter.<\/li>\n<\/ul>\n\n\n\n<p>These parameters are automatically exposed on the REST API endpoint, as seen in the SuperlinkedClient() class.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Sources<\/h4>\n\n\n\n<p>The sources wrap the schemas, allowing you to save data points of that schema to the database.<\/p>\n\n\n\n<p>In reality, the source maps the schema to an ORM and automatically generates REST API endpoints to ingest data points.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code> article_source = RestSource(article) <\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\">Executor<\/h4>\n\n\n\n<p>The last step is to define the executor that wraps all the sources, indices, queries and vector DB into a 
single entity:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code> executor = RestExecutor(\n    sources=&#91;article_source, repository_source, post_source],\n    indices=&#91;article_index, repository_index, post_index],\n    queries=&#91;\n        RestQuery(RestDescriptor(\"article_query\"), article_query),\n        RestQuery(RestDescriptor(\"repository_query\"), repository_query),\n        RestQuery(RestDescriptor(\"post_query\"), post_query),\n    ],\n    vector_database=InMemoryVectorDatabase(),\n) <\/code><\/pre>\n\n\n\n<p>Now, the last step is to register the executor to the Superlinked engine:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code> SuperlinkedRegistry.register(executor) <\/code><\/pre>\n\n\n\n<p>\u2026and that\u2019s it!<\/p>\n\n\n\n<p>Joking\u2026 there is something more. We have to use a Redis database instead of the in-memory one.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"d556\">7. Using Redis as a vector DB<\/h2>\n\n\n\n<p>First, we have to spin up a <a href=\"https:\/\/redis.io\/docs\/latest\/develop\/get-started\/vector-database\/\">Redis vector database<\/a> that we can work with.<\/p>\n\n\n\n<p>We used Docker and attached a Redis image as a service in a <em>docker-compose<\/em> file along with the <strong>Superlinked<\/strong> poller and executor (which comprise the Superlinked server):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code> version: \"3\"\n\nservices:\n  poller:\n    ...\n\n  executor:\n    ...\n\n  redis:\n    image: redis\/redis-stack:latest\n    ports:\n      - \"6379:6379\"\n      - \"8001:8001\"\n    volumes:\n      - redis-data:\/data\n\nvolumes:\n  redis-data: <\/code><\/pre>\n\n\n\n<p>Now, Superlinked makes everything easy. 
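Before swapping in Redis, it is worth recapping numerically what the weighted multi-index query from the previous section computes: roughly a weighted sum of the per-space similarities. Here is a toy pure-Python sketch, for intuition only; Superlinked's internal scoring may differ:

```python
def weighted_score(sim_content: float, sim_platform: float,
                   content_weight: float = 0.9, platform_weight: float = 0.1) -> float:
    # Weighted multi-index combination, mirroring the example
    # `0.9 * content_embedding + 0.1 * platform_embedding`.
    return content_weight * sim_content + platform_weight * sim_platform


# Two candidates with the same content similarity but a different platform match.
# An unmatched category hit by `negative_filter=-5.0` is pushed far down the ranking.
matched = weighted_score(sim_content=0.8, sim_platform=1.0)     # ~0.82
unmatched = weighted_score(sim_content=0.8, sim_platform=-5.0)  # ~0.22
```

The "content_weight" and "platform_weight" values are exactly the query parameters exposed on the REST endpoint, so the caller controls this trade-off at search time.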
The last step is to define a RedisVectorDatabase connector provided by Superlinked:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code> vector_database = RedisVectorDatabase(\n    settings.REDIS_HOSTNAME,  # (Mandatory) The Redis hostname, without any port or extra fields\n    settings.REDIS_PORT,  # (Mandatory) The Redis port, as an integer\n) <\/code><\/pre>\n\n\n\n<p>\u2026and use it in the executor in place of the <em>InMemoryVectorDatabase<\/em>() one:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code> executor = RestExecutor(\n    ...\n    vector_database=vector_database,\n) <\/code><\/pre>\n\n\n\n<p>As we are using the \u201credis-stack\u201d Docker image, you can visualize everything inside Redis at http:\/\/localhost:8001\/redis-stack\/browser.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:2000\/1*H5GeS8x8_JrQMq26ktk7yw.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Screenshot from the Redis Stack<\/figcaption><\/figure>\n\n\n\n<p>Now we are done!<\/p>\n\n\n\n<p>We have created a <a href=\"https:\/\/superlinked.com\/?utm_source=community&amp;utm_medium=blog&amp;utm_campaign=oscourse\">Superlinked<\/a> server that:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>chunks and embeds every data category differently;<\/li>\n\n\n\n<li>writes all the ingested data to a Redis vector DB;<\/li>\n\n\n\n<li>can ingest and query articles, posts and repositories;<\/li>\n\n\n\n<li>supports multi-index vector search between the content and the platform of the data point;<\/li>\n\n\n\n<li>has ingestion and search REST API endpoints;<\/li>\n<\/ul>\n\n\n\n<p>\u2026and all of that in only 486 lines of code. Pretty cool, right?<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"d678\">8. 
Dockerize the application<\/h2>\n\n\n\n<p>The article is already too long.<\/p>\n\n\n\n<p>Thus, we won\u2019t get into the details of Dockerization, but we want to let you know that the repository supports Docker.<\/p>\n\n\n\n<p>Here is where you can find all the Docker and Docker compose files required to run the RAG feature pipeline and Superlinked server:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/github.com\/decodingml\/llm-twin-course\/blob\/main\/.docker\/Dockerfile.bytewax.superlinked\">Bytewax Docker<\/a> image;<\/li>\n\n\n\n<li>LLM Twin system <a href=\"https:\/\/github.com\/decodingml\/llm-twin-course\/blob\/main\/docker-compose-superlinked.yml\">docker-compose<\/a> file;<\/li>\n\n\n\n<li>Superlinked server <a href=\"https:\/\/github.com\/decodingml\/llm-twin-course\/blob\/main\/6-bonus-superlinked-rag\/server\/compose.yaml\">docker-compose file<\/a> (where the Redis database is defined).<\/li>\n<\/ul>\n\n\n\n<p><em>\u2192 The <a href=\"https:\/\/github.com\/decodingml\/llm-twin-course\/tree\/main\">GitHub repository<\/a> provides <strong>step-by-step details<\/strong> on building and starting the <strong>Docker images<\/strong> to <strong>run<\/strong> the whole project. \u2190<\/em><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Conclusion<\/h3>\n\n\n\n<p><em>Congratulations! 
You learned to write advanced RAG systems using <a href=\"https:\/\/superlinked.com\/?utm_source=community&amp;utm_medium=blog&amp;utm_campaign=oscourse\">Superlinked<\/a>.<\/em><\/p>\n\n\n\n<p>More concretely, in Lesson 11, you learned:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>what is <a href=\"https:\/\/superlinked.com\/?utm_source=community&amp;utm_medium=blog&amp;utm_campaign=oscourse\">Superlinked<\/a>;<\/li>\n\n\n\n<li>how to design a streaming pipeline using <a href=\"https:\/\/bytewax.io\/?utm_source=medium&amp;utm_medium=decodingml&amp;utm_campaign=2024_q1\">Bytewax<\/a>;<\/li>\n\n\n\n<li>how to design a RAG server using Superlinked;<\/li>\n\n\n\n<li>how to take a standard RAG feature pipeline and refactor it using Superlinked;<\/li>\n\n\n\n<li>how to split the feature pipeline into 2 services, one that reads in real-time messages from RabbitMQ and one that chunks, embeds, and stores the data to a vector DB;<\/li>\n\n\n\n<li>how to use a Redis vector DB.<\/li>\n<\/ul>\n\n\n\n<p>Lesson 12 will teach you how to implement multi-index queries to optimize the RAG retrieval layer further.<\/p>\n\n\n\n<p><em>\ud83d\udd17 Check out <a href=\"https:\/\/github.com\/decodingml\/llm-twin-course\">the code on GitHub<\/a> [1] and support us with a \u2b50\ufe0f<\/em><\/p>\n\n\n\n<p><em>\u2192 Also, if curious, <strong>check out <a href=\"https:\/\/superlinked.com\/?utm_source=community&amp;utm_medium=blog&amp;utm_campaign=oscourse\">Superlinked<\/a> <\/strong>to learn more about them.<\/em><\/p>\n\n\n\n<h3 class=\"wp-block-heading pe pf gu bf pg ph qr hz pj pk qs ic pm pn qt pp pq pr qu pt pu pv qv px py pz bk\" id=\"7424\">References<\/h3>\n\n\n\n<h4 class=\"wp-block-heading rg pf gu bf pg rh ri dy pj rj rk ea pm nx rl rm rn ob ro rp rq of rr rs rt ha bk\" id=\"b25a\">Literature<\/h4>\n\n\n\n<p class=\"pw-post-body-paragraph no np gu nq b hx qa ns nt ia qb nv nw nx qc nz oa ob qd od oe of qe oh oi oj gn bk\" id=\"61cf\">[1]&nbsp;<a class=\"af ol\" 
href=\"https:\/\/github.com\/decodingml\/llm-twin-course\" target=\"_blank\" rel=\"noopener ugc nofollow\">Your LLM Twin Course \u2014 GitHub Repository<\/a>&nbsp;(2024), Decoding ML GitHub Organization<\/p>\n\n\n\n<p class=\"pw-post-body-paragraph no np gu nq b hx nr ns nt ia nu nv nw nx ny nz oa ob oc od oe of og oh oi oj gn bk\" id=\"e80b\">[2]&nbsp;<a class=\"af ol\" href=\"https:\/\/fastapi.tiangolo.com\/how-to\/configure-swagger-ui\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">Swagger UI<\/a>, FastAPI documentation<\/p>\n\n\n\n<p class=\"pw-post-body-paragraph no np gu nq b hx nr ns nt ia nu nv nw nx ny nz oa ob oc od oe of og oh oi oj gn bk\" id=\"57e8\">[3]&nbsp;<a class=\"af ol\" href=\"https:\/\/colab.research.google.com\/drive\/1qh6kXvIscntsj50gQ-nugxUymgmieJBm#scrollTo=btx8l6wDlYDU\" target=\"_blank\" rel=\"noopener ugc nofollow\">Superlinked Demo Notebook<\/a>, Google Colab<\/p>\n\n\n\n<p class=\"pw-post-body-paragraph no np gu nq b hx nr ns nt ia nu nv nw nx ny nz oa ob oc od oe of og oh oi oj gn bk\" id=\"4a94\">[4]&nbsp;<a class=\"af ol\" href=\"https:\/\/github.com\/superlinked\/superlinked\/blob\/main\/server\/README.md?utm_source=community&amp;utm_medium=blog&amp;utm_campaign=oscourse\" target=\"_blank\" rel=\"noopener ugc nofollow\">Superlinked Server<\/a>, Superlinked GitHub repository<\/p>\n\n\n\n<p class=\"pw-post-body-paragraph no np gu nq b hx nr ns nt ia nu nv nw nx ny nz oa ob oc od oe of og oh oi oj gn bk\" id=\"2988\">[5]&nbsp;<a class=\"af ol\" href=\"https:\/\/github.com\/superlinked\/superlinked\/blob\/main\/server\/docs\/redis\/redis.md?utm_source=community&amp;utm_medium=blog&amp;utm_campaign=oscourse\" target=\"_blank\" rel=\"noopener ugc nofollow\">Superlinked Redis Example<\/a>, Superlinked GitHub repository<\/p>\n\n\n\n<p class=\"pw-post-body-paragraph no np gu nq b hx nr ns nt ia nu nv nw nx ny nz oa ob oc od oe of og oh oi oj gn bk\" id=\"32ca\">[6]&nbsp;<a class=\"af ol\" 
href=\"https:\/\/github.com\/superlinked\/superlinked\/blob\/main\/notebook\/rag_hr_knowledgebase.ipynb?utm_source=community&amp;utm_medium=blog&amp;utm_campaign=oscourse\" target=\"_blank\" rel=\"noopener ugc nofollow\">Superlinked RAG Example<\/a>, Superlinked GitHub repository<\/p>\n\n\n\n<h4 class=\"wp-block-heading rg pf gu bf pg rh ri dy pj rj rk ea pm nx rl rm rn ob ro rp rq of rr rs rt ha bk\" id=\"8c48\">Images<\/h4>\n\n\n\n<p class=\"pw-post-body-paragraph no np gu nq b hx qa ns nt ia qb nv nw nx qc nz oa ob qd od oe of qe oh oi oj gn bk\" id=\"353e\">If not otherwise stated, all images are created by the author.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Welcome to Lesson 11 of 12 in our free course series, LLM Twin: Building Your Production-Ready AI Replica. You\u2019ll learn how to use LLMs, vector DVs, and LLMOps best practices to design, train, and deploy a production ready \u201cLLM twin\u201d of yourself. This AI character will write like you, incorporating your style, personality, and voice [&hellip;]<\/p>\n","protected":false},"author":128,"featured_media":10100,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"customer_name":"","customer_description":"","customer_industry":"","customer_technologies":"","customer_logo":"","footnotes":""},"categories":[65,7],"tags":[],"coauthors":[222,223],"class_list":["post-12497","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-llmops","category-tutorials"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.9 (Yoast SEO v25.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>How to Optimize RAG Systems Using Vector Computing<\/title>\n<meta name=\"description\" content=\"Learn techniques to refactor and optimize complex RAG systems using a specialized vector computing tool.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, 
max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.comet.com\/site\/blog\/refactoring-rag-retrieval\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Build a scalable RAG ingestion pipeline using 74.3% less code\" \/>\n<meta property=\"og:description\" content=\"Learn techniques to refactor and optimize complex RAG systems using a specialized vector computing tool.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.comet.com\/site\/blog\/refactoring-rag-retrieval\/\" \/>\n<meta property=\"og:site_name\" content=\"Comet\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/cometdotml\" \/>\n<meta property=\"article:published_time\" content=\"2025-01-13T20:18:28+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-04-29T12:16:08+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2024\/07\/rag-evaluation-ragas.png\" \/>\n\t<meta property=\"og:image:width\" content=\"700\" \/>\n\t<meta property=\"og:image:height\" content=\"400\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Paul Iusztin, Decoding ML\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@Cometml\" \/>\n<meta name=\"twitter:site\" content=\"@Cometml\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Paul Iusztin, Decoding ML\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"17 minutes\" \/>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"How to Optimize RAG Systems Using Vector Computing","description":"Learn techniques to refactor and optimize complex RAG systems using a specialized vector computing tool.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.comet.com\/site\/blog\/refactoring-rag-retrieval\/","og_locale":"en_US","og_type":"article","og_title":"Build a scalable RAG ingestion pipeline using 74.3% less code","og_description":"Learn techniques to refactor and optimize complex RAG systems using a specialized vector computing tool.","og_url":"https:\/\/www.comet.com\/site\/blog\/refactoring-rag-retrieval\/","og_site_name":"Comet","article_publisher":"https:\/\/www.facebook.com\/cometdotml","article_published_time":"2025-01-13T20:18:28+00:00","article_modified_time":"2025-04-29T12:16:08+00:00","og_image":[{"width":700,"height":400,"url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2024\/07\/rag-evaluation-ragas.png","type":"image\/png"}],"author":"Paul Iusztin, Decoding ML","twitter_card":"summary_large_image","twitter_creator":"@Cometml","twitter_site":"@Cometml","twitter_misc":{"Written by":"Paul Iusztin, Decoding ML","Est. 
reading time":"17 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.comet.com\/site\/blog\/refactoring-rag-retrieval\/#article","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/blog\/refactoring-rag-retrieval\/"},"author":{"name":"Paul Iusztin","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/87bf0cb600025605b68dcd2f0d597560"},"headline":"Build a scalable RAG ingestion pipeline using 74.3% less code","datePublished":"2025-01-13T20:18:28+00:00","dateModified":"2025-04-29T12:16:08+00:00","mainEntityOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/refactoring-rag-retrieval\/"},"wordCount":3409,"publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/refactoring-rag-retrieval\/#primaryimage"},"thumbnailUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2024\/07\/rag-evaluation-ragas.png","articleSection":["LLMOps","Tutorials"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.comet.com\/site\/blog\/refactoring-rag-retrieval\/","url":"https:\/\/www.comet.com\/site\/blog\/refactoring-rag-retrieval\/","name":"How to Optimize RAG Systems Using Vector Computing","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/refactoring-rag-retrieval\/#primaryimage"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/refactoring-rag-retrieval\/#primaryimage"},"thumbnailUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2024\/07\/rag-evaluation-ragas.png","datePublished":"2025-01-13T20:18:28+00:00","dateModified":"2025-04-29T12:16:08+00:00","description":"Learn techniques to refactor and optimize complex RAG systems using a specialized vector computing 
tool.","breadcrumb":{"@id":"https:\/\/www.comet.com\/site\/blog\/refactoring-rag-retrieval\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.comet.com\/site\/blog\/refactoring-rag-retrieval\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/blog\/refactoring-rag-retrieval\/#primaryimage","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2024\/07\/rag-evaluation-ragas.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2024\/07\/rag-evaluation-ragas.png","width":700,"height":400,"caption":"illustration of a human face with colored lines and symbols radiating outward to visualize the concept of neural networks"},{"@type":"BreadcrumbList","@id":"https:\/\/www.comet.com\/site\/blog\/refactoring-rag-retrieval\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.comet.com\/site\/"},{"@type":"ListItem","position":2,"name":"Build a scalable RAG ingestion pipeline using 74.3% less code"}]},{"@type":"WebSite","@id":"https:\/\/www.comet.com\/site\/#website","url":"https:\/\/www.comet.com\/site\/","name":"Comet","description":"Build Better Models Faster","publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.comet.com\/site\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.comet.com\/site\/#organization","name":"Comet ML, 
Inc.","alternateName":"Comet","url":"https:\/\/www.comet.com\/site\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","width":310,"height":310,"caption":"Comet ML, Inc."},"image":{"@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/cometdotml","https:\/\/x.com\/Cometml","https:\/\/www.youtube.com\/channel\/UCmN63HKvfXSCS-UwVwmK8Hw"]},{"@type":"Person","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/87bf0cb600025605b68dcd2f0d597560","name":"Paul Iusztin","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/image\/0bb2983de08cbe4fe43fad876af41aee","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2024\/03\/cropped-1664517339716-96x96.jpg","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2024\/03\/cropped-1664517339716-96x96.jpg","caption":"Paul 
Iusztin"},"sameAs":["https:\/\/decodingml.substack.com\/"],"url":"https:\/\/www.comet.com\/site\/blog\/author\/paul-iusztin\/"}]}},"_links":{"self":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/12497","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/users\/128"}],"replies":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/comments?post=12497"}],"version-history":[{"count":2,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/12497\/revisions"}],"predecessor-version":[{"id":15776,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/12497\/revisions\/15776"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media\/10100"}],"wp:attachment":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media?parent=12497"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/categories?post=12497"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/tags?post=12497"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/coauthors?post=12497"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}