{"id":12512,"date":"2025-01-13T12:20:42","date_gmt":"2025-01-13T20:20:42","guid":{"rendered":"https:\/\/live-cometml.pantheonsite.io\/?p=12512"},"modified":"2025-04-29T12:43:04","modified_gmt":"2025-04-29T12:43:04","slug":"multi-index-rag-apps","status":"publish","type":"post","link":"https:\/\/www.comet.com\/site\/blog\/multi-index-rag-apps\/","title":{"rendered":"Build Multi-Index Advanced RAG Apps"},"content":{"rendered":"\n<p><em>Welcome to Lesson 12 of 12 in our free course series, LLM Twin: Building Your Production-Ready AI Replica. You\u2019ll learn how to use LLMs, vector DBs, and LLMOps best practices to design, train, and deploy a production-ready \u201cLLM twin\u201d of yourself. This AI character will write like you, incorporating your style, personality, and voice into an LLM. For a full overview of course objectives and prerequisites, start with Lesson 1.<\/em><\/p>\n\n\n\n<p><strong>Lessons<\/strong><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><span class=\"s1\"><a href=\"https:\/\/www.comet.com\/site\/blog\/an-end-to-end-framework-for-production-ready-llm-systems-by-building-your-llm-twin\/\">An End-to-End Framework for Production-Ready LLM Systems by Building Your LLM Twin<\/a><\/span><\/li>\n\n\n\n<li><span class=\"s1\"><a href=\"https:\/\/www.comet.com\/site\/blog\/the-importance-of-data-pipelines-in-the-era-of-generative-ai\/\">Your Content is Gold: I Turned 3 Years of Blog Posts into an LLM Training<\/a><\/span><\/li>\n\n\n\n<li><span class=\"s1\"><a href=\"https:\/\/www.comet.com\/site\/blog\/llm-twin-3-change-data-capture\/\">I Replaced 1000 Lines of Polling Code with 50 Lines of CDC Magic<\/a><\/span><\/li>\n\n\n\n<li><span class=\"s1\"><a href=\"https:\/\/www.comet.com\/site\/blog\/streaming-pipelines-for-fine-tuning-llms\/\">SOTA Python Streaming Pipelines for Fine-tuning LLMs and RAG \u2014 in Real-Time!<\/a><\/span><\/li>\n\n\n\n<li><span class=\"s1\"><a 
href=\"https:\/\/www.comet.com\/site\/blog\/advanced-rag-algorithms-optimize-retrieval\/\">The 4 Advanced RAG Algorithms You Must Know to Implement<\/a><\/span><\/li>\n\n\n\n<li><span class=\"s1\"><a href=\"https:\/\/www.comet.com\/site\/blog\/llm-fine-tuning-dataset\/\">Turning Raw Data Into Fine-Tuning Datasets<\/a><\/span><\/li>\n\n\n\n<li><span class=\"s1\"><a href=\"https:\/\/www.comet.com\/site\/blog\/mistral-llm-fine-tuning\/\">8B Parameters, 1 GPU, No Problems: The Ultimate LLM Fine-tuning Pipeline<\/a><\/span><\/li>\n\n\n\n<li><span class=\"s1\"><a href=\"https:\/\/www.comet.com\/site\/blog\/llm-evaluation-best-practices\/\">The Engineer\u2019s Framework for LLM &amp; RAG Evaluation<\/a><\/span><\/li>\n\n\n\n<li><span class=\"s1\"><a href=\"https:\/\/www.comet.com\/site\/blog\/llm-rag-inference-pipelines\/\">Beyond Proof of Concept: Building RAG Systems That Scale<\/a><\/span><\/li>\n\n\n\n<li><span class=\"s1\"><a href=\"https:\/\/www.comet.com\/site\/blog\/rag-evaluation-framework-ragas\/\">The Ultimate Prompt Monitoring Pipeline<\/a><\/span><\/li>\n\n\n\n<li><a href=\"https:\/\/www.comet.com\/site\/blog\/refactoring-rag-retrieval\/\">[Bonus] Build a scalable RAG ingestion pipeline using 74.3% less code<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/www.comet.com\/site\/blog\/multi-index-rag-apps\/\">[Bonus] Build Multi-Index Advanced RAG Apps<\/a><\/li>\n<\/ol>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p>This article will teach you how to <strong>implement multi-index structures<\/strong> for <strong>building advanced RAG systems.<\/strong><\/p>\n\n\n\n<p>To <strong>implement<\/strong> our <strong>multi-index collections and queries<\/strong>, we will leverage <a href=\"https:\/\/superlinked.com\/?utm_source=community&amp;utm_medium=blog&amp;utm_campaign=oscourse\">Superlinked<\/a>, a vector compute engine highly optimized for working with vector data, offering solutions for ingestion, embedding, storing and 
retrieval.<\/p>\n\n\n\n<p>To better understand how <a href=\"https:\/\/superlinked.com\/?utm_source=community&amp;utm_medium=blog&amp;utm_campaign=oscourse\">Superlinked<\/a> queries work, we will gradually present how to build a complex query that uses two vector indexes, adds filters based on the metadata extracted using an LLM, and returns only the top K most similar documents to reduce network I\/O overhead.<\/p>\n\n\n\n<p>Ultimately, we will dig into how Superlinked can help us implement and optimize various advanced RAG methods, such as query expansion, self-query, filtered vector search and rerank.<\/p>\n\n\n\n<p>As this article is part of the <em>LLM Twin course<\/em>, before we start, here is some essential context you have to know to move along with this lesson (which you can <strong><em>read independently if you want to<\/em><\/strong>):<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>In Lesson 11, we implemented the real-time RAG ingestion pipeline (using Bytewax) and server (using Superlinked).<\/li>\n\n\n\n<li>In <a href=\"https:\/\/www.comet.com\/site\/blog\/advanced-rag-algorithms-optimize-retrieval\/\">Lesson 5,<\/a> we presented 4 advanced RAG algorithms in depth and how to implement them.<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*ij2b1BY-EPj1o0fX7lHyKA.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Figure 1: RAG ingestion pipeline and server<\/figcaption><\/figure>\n\n\n\n<p>Now, let\u2019s move on to Lesson 12, our current lesson.<\/p>\n\n\n\n<p>Table of Contents<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><a href=\"#bobo\">Exploring the multi-index RAG server<\/a><\/li>\n\n\n\n<li><a href=\"#qwer\">Understanding the data ingestion pipeline<\/a><\/li>\n\n\n\n<li><a href=\"#123q\">Writing complex multi-index RAG queries using Superlinked<\/a><\/li>\n\n\n\n<li><a href=\"#43re\">Exploring the 4 advanced RAG optimization 
techniques<\/a><\/li>\n\n\n\n<li><a href=\"#89ui\">Is Superlinked OP for building RAG and other vector-based apps?<\/a><\/li>\n<\/ol>\n\n\n\n<p>\ud83d\udd17 Check out the code on GitHub [1] and support us with a \u2b50\ufe0f<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"bobo\">1. Exploring the multi-index RAG server<\/h2>\n\n\n\n<p>We are using <a href=\"https:\/\/superlinked.com\/?utm_source=community&amp;utm_medium=blog&amp;utm_campaign=oscourse\">Superlinked<\/a> to implement a powerful vector compute server. With just a few lines of code, we can implement a fully-fledged RAG application exposed as a REST API web server.<\/p>\n\n\n\n<p>When using <a href=\"https:\/\/superlinked.com\/?utm_source=community&amp;utm_medium=blog&amp;utm_campaign=oscourse\">Superlinked<\/a>, you declare your chunking, embedding and query strategy in a <strong>declarative way (similar to building a graph)<\/strong>, making it extremely easy to implement an end-to-end workflow.<\/p>\n\n\n\n<p>Let\u2019s explore the core steps in how to define an RAG server using Superlinked \u2193<\/p>\n\n\n\n<p>First, you have to define the schema of your data, which in our case are the post, article, and repositories schemas:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code> from superlinked import schema\n\n@schema\nclass PostSchema:\n  content: String\n  platform: String\n  ... # Other fields\n\n@schema\nclass RepositorySchema:\n  content: String\n  platform: String\n  ...\n\n@schema\nclass ArticleSchema:\n  content: String\n  platform: String\n  ...\n\npost = PostSchema()\narticle = ArticleSchema()\nrepository = RepositorySchema() <\/code><\/pre>\n\n\n\n<p>You can quickly define an embedding space based on one or more schema attributes. 
An embedding space is defined by two properties:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>the field to be embedded;<\/li>\n\n\n\n<li>a model used to embed the field.<\/li>\n<\/ul>\n\n\n\n<p>For example, this is how you can define an embedding space for a piece of text, more precisely the article\u2019s content:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code> from superlinked import TextSimilaritySpace, chunk\n\narticles_space_content = TextSimilaritySpace(\n    text=chunk(article.content, chunk_size=500, chunk_overlap=50),\n    model=settings.EMBEDDING_MODEL_ID,\n)<\/code><\/pre>\n\n\n\n<p>Notice that we also wrapped the article&#8217;s content field with the <em><strong>chunk()<\/strong><\/em> function that automatically chunks the text before embedding it.<\/p>\n\n\n\n<p>The model can be any embedding model available on <em>HuggingFace<\/em> or <em>SentenceTransformers<\/em>. For example, we used the following <em><strong>EMBEDDING_MODEL_ID<\/strong><\/em>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code> from pydantic_settings import BaseSettings\n\nclass Settings(BaseSettings):\n  EMBEDDING_MODEL_ID: str = \"sentence-transformers\/all-mpnet-base-v2\"\n\n  REDIS_HOSTNAME: str = \"redis\"\n  REDIS_PORT: int = 6379\n\nsettings = Settings() <\/code><\/pre>\n\n\n\n<p>It also supports defining an embedding space for categorical variables, such as the article\u2019s platform:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code> from superlinked import CategoricalSimilaritySpace\n\narticles_space_plaform = CategoricalSimilaritySpace(\n    category_input=article.platform,\n    categories=&#91;\"medium\", \"superlinked\"],\n    negative_filter=-5.0,\n) <\/code><\/pre>\n\n\n\n<p>Along with text and categorical embedding spaces, Superlinked supports numerical and temporal variables:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>TextSimilaritySpace [2]<\/li>\n\n\n\n<li>CategoricalSimilaritySpace [5]<\/li>\n\n\n\n<li>RecencySpace [6]<\/li>\n\n\n\n<li>NumberSpace 
[7]<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Multi-index structures<\/h3>\n\n\n\n<p>Now, we can combine the two embedding spaces defined above into a multi-index structure:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code> from superlinked import Index\n\narticle_index = Index(\n    &#91;articles_space_content, articles_space_plaform],\n    fields=&#91;article.author_id],\n) <\/code><\/pre>\n\n\n\n<p>The first argument is a list of references to the text and categorical embedding spaces, while the fields parameter lists all the <em>fields<\/em> we want to filter on when querying the data. Declaring them upfront lets Superlinked optimize retrieval and filter operations to run at low latencies.<\/p>\n\n\n\n<p>Note that when defining an <em>Index<\/em> in Superlinked, we can add as many embedding spaces as we like, as long as they originate from the same schema (in our case, the <em>ArticleSchema<\/em>): the minimum is one, and the maximum is all the schema fields.<\/p>\n\n\n\n<p>\u2026and, voil\u00e0!<\/p>\n\n\n\n<p><em>We defined a multi-index structure that supports weighted queries in just a few lines of code.<\/em><\/p>\n\n\n\n<p>Using Superlinked and its embedding space and index architecture, we can easily index different data types (text, categorical, number, temporal) into a multi-index structure that offers tremendous flexibility in how we interact with the data.<\/p>\n\n\n\n<p>The following section will show you how to query the multi-index collection defined above. 
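To build intuition for what a weighted multi-index query will compute, here is a framework-agnostic sketch of combining two similarity spaces into a single score. All names and the exact scoring formula are illustrative assumptions, not Superlinked's actual internals:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    # plain cosine similarity between two embedding vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def weighted_score(query_vecs: dict, doc_vecs: dict, weights: dict) -> float:
    # combine per-space similarities using the weights dictionary,
    # mirroring the idea behind Query(weights={...}); one entry per space
    return sum(
        weights[name] * cosine(query_vecs[name], doc_vecs[name])
        for name in weights
    )

# toy vectors: the document matches the content space perfectly
# but belongs to a different platform category
query = {"content": np.array([1.0, 0.0]), "platform": np.array([0.0, 1.0])}
doc = {"content": np.array([1.0, 0.0]), "platform": np.array([1.0, 0.0])}

score = weighted_score(query, doc, {"content": 0.9, "platform": 0.1})
print(score)  # 0.9: the content space dominates
```

With a 0.9/0.1 split, a perfect content match outweighs a platform mismatch, which is exactly the behavior the content_weight and platform_weight parameters control later in the article.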
But first, let\u2019s wrap up with the Superlinked RAG server.<\/p>\n\n\n\n<p>To do so, let\u2019s define a connector to a Redis Vector DB:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code> from superlinked import RedisVectorDatabase\n\nvector_database = RedisVectorDatabase(\n    settings.REDIS_HOSTNAME,\n    settings.REDIS_PORT,\n)<\/code><\/pre>\n\n\n\n<p>\u2026and ultimately define a RestExecutor that wraps everything above into a REST API server:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code> from superlinked import RestSource, RestExecutor, RestQuery, RestDescriptor, SuperlinkedRegistry\n\narticle_source = RestSource(article)\nrepository_source = RestSource(repository)\npost_source = RestSource(post)\n\nexecutor = RestExecutor(\n    sources=&#91;article_source, repository_source, post_source],\n    indices=&#91;article_index, repository_index, post_index],\n    queries=&#91;\n        RestQuery(RestDescriptor(\"article_query\"), article_query),\n        RestQuery(RestDescriptor(\"repository_query\"), repository_query),\n        RestQuery(RestDescriptor(\"post_query\"), post_query),\n    ],\n    vector_database=vector_database,\n)\nSuperlinkedRegistry.register(executor) <\/code><\/pre>\n\n\n\n<p>Based on all the queries defined in the <em><strong>RestExecutor<\/strong><\/em> class, Superlinked will automatically generate endpoints that can be called through HTTP requests.<\/p>\n\n\n\n<p>In Lesson 11, we showed in more detail how the RAG Superlinked server works, how to set it up and how to interact with its query endpoints.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"qwer\">2. Understanding the data ingestion pipeline<\/h2>\n\n\n\n<p>Before we understand how to build queries for our multi-index collections, let\u2019s have a quick refresher on how the vector DB is populated with article, post, and repository documents.<\/p>\n\n\n\n<p>The data ingestion workflow is illustrated in Figure 2. 
During the LLM Twin course, we implemented a real-time data collection system in the following way:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>We crawl the data from the internet and store it in a MongoDB data warehouse.<\/li>\n\n\n\n<li>We use CDC to capture CRUD events on the database and send them as messages to a RabbitMQ queue.<\/li>\n\n\n\n<li>We use a Bytewax streaming engine to consume and clean the events from RabbitMQ in real time.<\/li>\n\n\n\n<li>Ultimately, the data is ingested into the Superlinked server through HTTP requests.<\/li>\n\n\n\n<li>As seen before, the Superlinked server does the heavy lifting, such as chunking, embedding, and loading all the ingested data into a Redis vector DB.<\/li>\n\n\n\n<li>We implemented a vector DB retrieval client that queries the data from Superlinked through HTTP requests.<\/li>\n\n\n\n<li>The vector DB retrieval client will be used within the final RAG component, which generates the final response using the retrieved context and an LLM.<\/li>\n<\/ol>\n\n\n\n<p>Note that whenever we crawl a new document from the Internet, we repeat steps 1\u20135, resulting in a vector DB synced with the external world in real time.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*ij2b1BY-EPj1o0fX7lHyKA.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Figure 2: The RAG data ingestion pipeline and Superlinked server<\/figcaption><\/figure>\n\n\n\n<p><em>If you want to see the full implementation of the steps above, you can always check out the rest of the course\u2019s lessons for free, starting with Lesson 1.<\/em><\/p>\n\n\n\n<p>But now that we have an intuition on how the Redis vector DB is populated with data used for RAG, let\u2019s see the true power of Superlinked and build some queries to retrieve data.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"123q\">3. 
Writing complex multi-index RAG queries using Superlinked<\/h2>\n\n\n\n<p>Let\u2019s take a look at the complete article query we want to define:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code> article_query = (\n    Query(\n        article_index,\n        weights={\n            articles_space_content: Param(\"content_weight\"),\n            articles_space_plaform: Param(\"platform_weight\"),\n        },\n    )\n    .find(article)\n    .similar(articles_space_content.text, Param(\"search_query\"))\n    .similar(articles_space_plaform.category, Param(\"platform\"))\n    .filter(article.author_id == Param(\"author_id\"))\n    .limit(Param(\"limit\"))\n) <\/code><\/pre>\n\n\n\n<p><em>If it seems like a lot, let\u2019s break it into smaller pieces, starting from the beginning.<\/em><\/p>\n\n\n\n<p>What if we want to make a basic query that finds the most relevant articles solely based on the similarity between the query and the content of an article?<\/p>\n\n\n\n<p>In the code snippet below, we define a query based on the article\u2019s index to find articles that have the embedding of the content field most similar to the search query:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code> article_query = (\n    Query(article_index)\n    .find(article)\n    .similar(articles_space_content.text, Param(\"search_query\"))\n) <\/code><\/pre>\n\n\n\n<p>As seen in the <strong>Exploring the multi-index RAG server<\/strong> section, plugging this query into the <strong><em>RestExecutor<\/em><\/strong> class automatically creates an API endpoint accessible through POST HTTP requests.<\/p>\n\n\n\n<p>In Figure 3, we can observe all the available endpoints automatically generated by Superlinked.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:1400\/0*xfqLcdzdNaNFR6nb.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Figure 3: Screenshot of the Swagger UI [4] generated automatically based on the Superlinked queries.<\/figcaption><\/figure>\n\n\n\n<p>Thus, after 
starting the Superlinked server, which we showed how to do in Lesson 11, you can access the query as follows:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code> import httpx\n\nurl = f\"{base_url}\/api\/v1\/search\/article_query\"\nheaders = {\"Accept\": \"*\/*\", \"Content-Type\": \"application\/json\"}\n\ndata = {\n    \"search_query\": \"Write me a post about Vector DBs and RAG.\",\n}\nresponse = httpx.post(\n    url, headers=headers, json=data, timeout=600\n)\nresult = response.json()\nprint(result&#91;\"obj\"]) <\/code><\/pre>\n\n\n\n<p>As you can observe, all the attributes wrapped by the <strong><em>Param()<\/em><\/strong> class within the query are expected as parameters within the POST request, such as the <em><strong>Param(\u201csearch_query\u201d)<\/strong><\/em>, which represents the user\u2019s query.<\/p>\n\n\n\n<p>Quite intuitive, right?<\/p>\n\n\n\n<p>Now\u2026 What happens behind the scenes?<\/p>\n\n\n\n<p>After the endpoint is called, the Superlinked server processes the search query based on the <em><strong>articles_space_content<\/strong><\/em> embedding text space, which defines how to chunk and embed a text.<\/p>\n\n\n\n<p>That is exactly what happens to the search query: the server chunks and embeds it.<\/p>\n\n\n\n<p>Using the computed query embedding, it will search the vector space based on the article\u2019s content and retrieve the most similar documents:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code> articles_space_content = TextSimilaritySpace(\n    text=chunk(article.content, chunk_size=500, chunk_overlap=50),\n    model=\"sentence-transformers\/all-mpnet-base-v2\",\n) <\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Multi-index query<\/h3>\n\n\n\n<p>Now that we understand the basics of how a Superlinked query works, let\u2019s add another layer of complexity and <strong>create a multi-index query<\/strong> based on the article\u2019s content and platform:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code> article_query = (\n    Query(\n        
article_index,\n        weights={\n            articles_space_content: Param(\"content_weight\"),\n            articles_space_plaform: Param(\"platform_weight\"),\n        },\n    )\n    .find(article)\n    .similar(articles_space_content.text, Param(\"search_query\"))\n    .similar(articles_space_plaform.category, Param(\"platform\"))\n) <\/code><\/pre>\n\n\n\n<p>We added two things.<\/p>\n\n\n\n<p>The <strong>first<\/strong> is another <strong>similar<\/strong>() call, which tells the query to also use the other embedding space, <strong>articles_space_plaform<\/strong>.<\/p>\n\n\n\n<p>Now, when making a query, Superlinked will use the embeddings of both fields to search for relevant information:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>the search query<\/li>\n\n\n\n<li>the article\u2019s platform<\/li>\n<\/ul>\n\n\n\n<p>But how do we configure which one is more important?<\/p>\n\n\n\n<p>This is where the <strong>second<\/strong> addition kicks in: the <strong>weights<\/strong> parameter within the <strong><em>Query(weights={\u2026})<\/em><\/strong> call.<\/p>\n\n\n\n<p>Using the weights dictionary, we can assign a different weight to each embedding space, configuring its importance within a particular query.<\/p>\n\n\n\n<p>Let\u2019s better understand this with an example:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code> data = {\n    \"search_query\": \"Write me a post about Vector DBs and RAG.\",\n    \"platform\": \"medium\",\n    \"content_weight\": 0.9, # 90%\n    \"platform_weight\": 0.1, # 10%\n}\nresponse = httpx.post(\n    url, headers=headers, json=data, timeout=600\n) <\/code><\/pre>\n\n\n\n<p>In the previous example, we set the content weight to 90% and the platform\u2019s to 10%, which means the article\u2019s content will have the most impact on our query, while still favoring articles from the requested platform.<\/p>\n\n\n\n<p>By playing with these weights, we tweak the impact of each embedding space in our 
query.<\/p>\n\n\n\n<p>Now, let\u2019s add the final pieces of the query, which are the filter() and the limit() functions:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code> article_query = (\n    Query(\n        article_index,\n        weights={\n            articles_space_content: Param(\"content_weight\"),\n            articles_space_plaform: Param(\"platform_weight\"),\n        },\n    )\n    .find(article)\n    .similar(articles_space_content.text, Param(\"search_query\"))\n    .similar(articles_space_plaform.category, Param(\"platform\"))\n    .filter(article.author_id == Param(\"author_id\"))\n    .limit(Param(\"limit\"))\n) <\/code><\/pre>\n\n\n\n<p>The <strong><em>author_id<\/em><\/strong> filter helps us retrieve documents only from a specific author, while the limit function controls how many items we want to retrieve.<\/p>\n\n\n\n<p>For example, if we find 10 similar articles but the limit is set to 3, the Superlinked server will always return a maximum of 3 documents, thus reducing network I\/O between the server and client:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code> data = {\n    \"search_query\": \"Write me a post about Vector DBs and RAG.\",\n    \"platform\": \"medium\",\n    \"content_weight\": 0.9, # 90%\n    \"platform_weight\": 0.1, # 10%\n    \"author_id\": 145,\n    \"limit\": 3,\n}\nresponse = httpx.post(\n    url, headers=headers, json=data, timeout=600\n) <\/code><\/pre>\n\n\n\n<p>That\u2019s it! We can further optimize our retrieval step by experimenting with other multi-index configurations and weights.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"43re\">4. 
Exploring the 4 advanced RAG optimization techniques<\/h2>\n\n\n\n<p>In <a href=\"https:\/\/www.comet.com\/site\/blog\/advanced-rag-algorithms-optimize-retrieval\/\">Lesson 5<\/a>, we explored 4 popular advanced RAG techniques to improve the accuracy of our generative AI system.<\/p>\n\n\n\n<p><em>As a quick reminder, there are 3 main types of advanced RAG techniques:<\/em><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Pre-retrieval optimization [ingestion]: tweak how you create the chunks<\/li>\n\n\n\n<li>Retrieval optimization [retrieval]: improve the queries to your vector DB<\/li>\n\n\n\n<li>Post-retrieval optimization [retrieval]: process the retrieved chunks to filter out the noise<br><\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:1400\/1*XXo1pdzQsqbUdaQQOBNYvw.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Figure 4: Advanced RAG optimization options<\/figcaption><\/figure>\n\n\n\n<p>Now, let\u2019s explore the 4 methods initially implemented in Lesson 5 and understand how they can be integrated into our new architecture:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Query expansion (retrieval)<\/li>\n\n\n\n<li>Self query (retrieval)<\/li>\n\n\n\n<li>Filtered vector search (retrieval)<\/li>\n\n\n\n<li>Rerank (post-retrieval)<\/li>\n<\/ol>\n\n\n\n<p>By incorporating these 4 advanced RAG optimization techniques, we will better understand where Superlinked shines most.<\/p>\n\n\n\n<p><em><strong>Important<\/strong> &gt; On optimizing the ingestion side, Superlinked handled everything from chunking, embedding, and loading into a vector DB, detailed in Lesson 11.<\/em><\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:1400\/0*95dG1SL3vqhvDEyp.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Figure 5: Advanced RAG architecture<\/figcaption><\/figure>\n\n\n\n<h3 
class=\"wp-block-heading\">Query expansion (retrieval)<\/h3>\n\n\n\n<p>To implement query expansion, you use an LLM to generate multiple queries based on your initial user\u2019s query.<\/p>\n\n\n\n<p>These queries will contain multiple perspectives of the initial query.<\/p>\n\n\n\n<p>Thus, when embedded, they hit different areas of your embedding space that are still relevant to our initial question.<\/p>\n\n\n\n<p><strong>Does Superlinked help here?<\/strong> Not really, as you have to expand your query before calling Superlinked.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Self query (retrieval)<\/h3>\n\n\n\n<p>What if you could extract the tags within the query and use them along your vector search?<\/p>\n\n\n\n<p>That is what self-query is all about!<\/p>\n\n\n\n<p>You use an LLM to extract critical metadata fields for your business use case (e.g., tags, author ID, number of comments, likes, shares, etc.)<\/p>\n\n\n\n<p>In our custom solution, we are extracting just the author ID. Thus, a zero-shot prompt engineering technique will do the job.<\/p>\n\n\n\n<p><strong>Does Superlinked help here?<\/strong> Unfortunately, no, as you have to apply a self-query before calling the Superlinked server.<\/p>\n\n\n\n<p>But\u2026 self-queries work hand-in-hand with vector filter searches, which we will explain in the next section.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Filtered vector search (retrieval)<\/h3>\n\n\n\n<p>This is a fancy name for applying a standard filter on your metadata before (or after) doing your vector search, hence \u201cFiltered vector search.\u201d<\/p>\n\n\n\n<p><strong>Does Superlinked help here?<\/strong> Yes! 
This is where Superlinked shines, allowing you to quickly index data structured on fields other than your vector index (or multi-index).<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code> article_index = Index(\n    &#91;articles_space_content, articles_space_plaform],\n    fields=&#91;article.author_id],\n)\n\narticle_query = (\n    Query(article_index)\n    ...\n    .filter(article.author_id == Param(\"author_id\"))\n) <\/code><\/pre>\n\n\n\n<p>Thus, you can implement optimal queries tailored to your data with a few lines of code.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Rerank (post-retrieval)<\/h3>\n\n\n\n<p>Rerank is used to filter out the noise from your retrieved documents.<\/p>\n\n\n\n<p>For example, you retrieved N documents from your vector DB using Superlinked. However, you want to be prudent about your context size, so you use a rerank model to score the relevancy of all the retrieved documents relative to your query.<\/p>\n\n\n\n<p>Then, based on the rerank score, you pick only the top K (where K &lt; N) documents as your final items to build up the context.<\/p>\n\n\n\n<p><strong>Does Superlinked help here?<\/strong> Unfortunately, it doesn\u2019t support cross-encoder models [3] for reranking.<\/p>\n\n\n\n<p>But they are just at the beginning of their journey. Supporting reranking makes a lot of sense. Thus, we speculate that they will add it along with other functionality that optimizes the retrieval component of an RAG system (or other AI application that works with embeddings).<\/p>\n\n\n\n<p><em>In this article, we briefly discussed the 4 advanced RAG methods implemented in our course. Check out Lesson 5 for a detailed explanation of each method.<\/em><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"89ui\">5. 
Is Superlinked OP for building RAG and other vector-based apps?<\/h2>\n\n\n\n<p><a href=\"https:\/\/superlinked.com\/?utm_source=community&amp;utm_medium=blog&amp;utm_campaign=oscourse\">Superlinked<\/a> has incredible potential to build scalable vector servers to ingest and retrieve your data based on operations between embeddings.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:1400\/0*GIaZsWyZmj2zHx8J.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Figure 6: Screenshot from Superlinked\u2019s landing page<\/figcaption><\/figure>\n\n\n\n<p><em>As you\u2019ve seen in Lesson 11 and Lesson 12, in just a few lines of code, we\u2019ve:<\/em><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>implemented clean and modular schemas for your data;<\/li>\n\n\n\n<li>chunked and embedded the data;<\/li>\n\n\n\n<li>added embedding support for multiple data types (text, categorical, numerical, temporal);<\/li>\n\n\n\n<li>implemented multi-index collections and queries, allowing us to optimize our retrieval step;<\/li>\n\n\n\n<li>connectors for multiple vector DBs (Redis, MongoDB, etc.)<\/li>\n\n\n\n<li>optimized filtered vector search.<\/li>\n<\/ul>\n\n\n\n<p>The truth is that <a href=\"https:\/\/superlinked.com\/?utm_source=community&amp;utm_medium=blog&amp;utm_campaign=oscourse\">Superlinked<\/a> is still a young Python framework.<\/p>\n\n\n\n<p>But as it grows, it will become more stable and introduce even more features, such as rerank, making it an excellent choice for implementing your vector search layer.<\/p>\n\n\n\n<p>If you are curious, <strong>check out <a href=\"https:\/\/superlinked.com\/?utm_source=community&amp;utm_medium=blog&amp;utm_campaign=oscourse\">Superlinked<\/a><\/strong> to learn more about them.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p>Within this article, you\u2019ve learned how to implement multi-index collections and queries for 
advanced RAG using Superlinked.<\/p>\n\n\n\n<p>To help you better understand how Superlinked queries work, we gradually presented how to build a complex query that:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>uses two vector indexes;<\/li>\n\n\n\n<li>adds filters based on the metadata extracted with an LLM;<\/li>\n\n\n\n<li>returns only the top K elements to reduce network I\/O overhead.<\/li>\n<\/ul>\n\n\n\n<p>Ultimately, we looked into how Superlinked can help us implement and optimize various advanced RAG methods, such as query expansion, self-query, filtered vector search and rerank.<\/p>\n\n\n\n<p><em>With this, we\u2019ve <strong>wrapped up<\/strong> the <strong>LLM Twin open-source course<\/strong>. We hope you enjoyed it and it brought value to your LLM &amp; RAG skills.<\/em><\/p>\n\n\n\n<p>The next step is to <strong>clone<\/strong> our <a href=\"https:\/\/github.com\/decodingml\/llm-twin-course\">LLM Twin GitHub repository<\/a> [1] and <strong>run everything yourself<\/strong> to get the most out of this series.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">References<\/h3>\n\n\n\n<p><strong>Literature<\/strong><br>\n[1] Your LLM Twin Course \u2014 GitHub Repository (2024), Decoding ML\u2019s GitHub Organization<\/p>\n\n\n\n<p>[2] Understand Text Similarity Spaces (2024), Superlinked\u2019s Documentation<\/p>\n\n\n\n<p>[3] Retrieve &amp; Re-Rank, Sentence Transformers Documentation<\/p>\n\n\n\n<p>[4] Swagger UI, FastAPI documentation<\/p>\n\n\n\n<p>[5] Understanding Categorical Similarity Space (2024), Superlinked\u2019s Documentation<\/p>\n\n\n\n<p>[6] Understanding Recency Spaces (2024), Superlinked\u2019s Documentation<\/p>\n\n\n\n<p>[7] Understand Number Spaces \u2014 MinMax Mode (2024), Superlinked\u2019s Documentation<\/p>\n\n\n\n<p><strong>Images<\/strong><br>\nIf not otherwise stated, all images are created by the author.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Welcome to Lesson 12 of 12 in our free course series, LLM Twin: Building 
Your Production-Ready AI Replica. You\u2019ll learn how to use LLMs, vector DVs, and LLMOps best practices to design, train, and deploy a production ready \u201cLLM twin\u201d of yourself. This AI character will write like you, incorporating your style, personality, and voice [&hellip;]<\/p>\n","protected":false},"author":128,"featured_media":10100,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"customer_name":"","customer_description":"","customer_industry":"","customer_technologies":"","customer_logo":"","footnotes":""},"categories":[65,7],"tags":[],"coauthors":[222,223],"class_list":["post-12512","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-llmops","category-tutorials"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.9 (Yoast SEO v25.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Optimizing Advanced RAG Methods<\/title>\n<meta name=\"description\" content=\"How to implement multi-index structures for building complex RAG systems.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.comet.com\/site\/blog\/multi-index-rag-apps\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Build Multi-Index Advanced RAG Apps\" \/>\n<meta property=\"og:description\" content=\"How to implement multi-index structures for building complex RAG systems.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.comet.com\/site\/blog\/multi-index-rag-apps\/\" \/>\n<meta property=\"og:site_name\" content=\"Comet\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/cometdotml\" \/>\n<meta property=\"article:published_time\" content=\"2025-01-13T20:20:42+00:00\" \/>\n<meta 
property=\"article:modified_time\" content=\"2025-04-29T12:43:04+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2024\/07\/rag-evaluation-ragas.png\" \/>\n\t<meta property=\"og:image:width\" content=\"700\" \/>\n\t<meta property=\"og:image:height\" content=\"400\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Paul Iusztin, Decoding ML\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@Cometml\" \/>\n<meta name=\"twitter:site\" content=\"@Cometml\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Paul Iusztin, Decoding ML\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"14 minutes\" \/>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Optimizing Advanced RAG Methods","description":"How to implement multi-index structures for building complex RAG systems.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.comet.com\/site\/blog\/multi-index-rag-apps\/","og_locale":"en_US","og_type":"article","og_title":"Build Multi-Index Advanced RAG Apps","og_description":"How to implement multi-index structures for building complex RAG systems.","og_url":"https:\/\/www.comet.com\/site\/blog\/multi-index-rag-apps\/","og_site_name":"Comet","article_publisher":"https:\/\/www.facebook.com\/cometdotml","article_published_time":"2025-01-13T20:20:42+00:00","article_modified_time":"2025-04-29T12:43:04+00:00","og_image":[{"width":700,"height":400,"url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2024\/07\/rag-evaluation-ragas.png","type":"image\/png"}],"author":"Paul Iusztin, Decoding 
ML","twitter_card":"summary_large_image","twitter_creator":"@Cometml","twitter_site":"@Cometml","twitter_misc":{"Written by":"Paul Iusztin, Decoding ML","Est. reading time":"14 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.comet.com\/site\/blog\/multi-index-rag-apps\/#article","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/blog\/multi-index-rag-apps\/"},"author":{"name":"Paul Iusztin","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/87bf0cb600025605b68dcd2f0d597560"},"headline":"Build Multi-Index Advanced RAG Apps","datePublished":"2025-01-13T20:20:42+00:00","dateModified":"2025-04-29T12:43:04+00:00","mainEntityOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/multi-index-rag-apps\/"},"wordCount":2708,"publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/multi-index-rag-apps\/#primaryimage"},"thumbnailUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2024\/07\/rag-evaluation-ragas.png","articleSection":["LLMOps","Tutorials"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.comet.com\/site\/blog\/multi-index-rag-apps\/","url":"https:\/\/www.comet.com\/site\/blog\/multi-index-rag-apps\/","name":"Optimizing Advanced RAG Methods","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/multi-index-rag-apps\/#primaryimage"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/multi-index-rag-apps\/#primaryimage"},"thumbnailUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2024\/07\/rag-evaluation-ragas.png","datePublished":"2025-01-13T20:20:42+00:00","dateModified":"2025-04-29T12:43:04+00:00","description":"How to implement multi-index structures for building complex RAG 
systems.","breadcrumb":{"@id":"https:\/\/www.comet.com\/site\/blog\/multi-index-rag-apps\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.comet.com\/site\/blog\/multi-index-rag-apps\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/blog\/multi-index-rag-apps\/#primaryimage","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2024\/07\/rag-evaluation-ragas.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2024\/07\/rag-evaluation-ragas.png","width":700,"height":400,"caption":"illustration of a human face with colored lines and symbols radiating outward to visualize the concept of neural networks"},{"@type":"BreadcrumbList","@id":"https:\/\/www.comet.com\/site\/blog\/multi-index-rag-apps\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.comet.com\/site\/"},{"@type":"ListItem","position":2,"name":"Build Multi-Index Advanced RAG Apps"}]},{"@type":"WebSite","@id":"https:\/\/www.comet.com\/site\/#website","url":"https:\/\/www.comet.com\/site\/","name":"Comet","description":"Build Better Models Faster","publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.comet.com\/site\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.comet.com\/site\/#organization","name":"Comet ML, 
Inc.","alternateName":"Comet","url":"https:\/\/www.comet.com\/site\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","width":310,"height":310,"caption":"Comet ML, Inc."},"image":{"@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/cometdotml","https:\/\/x.com\/Cometml","https:\/\/www.youtube.com\/channel\/UCmN63HKvfXSCS-UwVwmK8Hw"]},{"@type":"Person","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/87bf0cb600025605b68dcd2f0d597560","name":"Paul Iusztin","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/image\/0bb2983de08cbe4fe43fad876af41aee","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2024\/03\/cropped-1664517339716-96x96.jpg","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2024\/03\/cropped-1664517339716-96x96.jpg","caption":"Paul 
Iusztin"},"sameAs":["https:\/\/decodingml.substack.com\/"],"url":"https:\/\/www.comet.com\/site\/blog\/author\/paul-iusztin\/"}]}},"_links":{"self":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/12512","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/users\/128"}],"replies":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/comments?post=12512"}],"version-history":[{"count":3,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/12512\/revisions"}],"predecessor-version":[{"id":15796,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/12512\/revisions\/15796"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media\/10100"}],"wp:attachment":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media?parent=12512"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/categories?post=12512"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/tags?post=12512"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/coauthors?post=12512"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}