{"id":8208,"date":"2023-11-30T06:17:22","date_gmt":"2023-11-30T14:17:22","guid":{"rendered":"https:\/\/live-cometml.pantheonsite.io\/?p=8208"},"modified":"2025-04-24T17:04:10","modified_gmt":"2025-04-24T17:04:10","slug":"retrieval-part-2-text-embeddings","status":"publish","type":"post","link":"https:\/\/www.comet.com\/site\/blog\/retrieval-part-2-text-embeddings\/","title":{"rendered":"Retrieval Part 2: Text Embeddings"},"content":{"rendered":"\n<section class=\"section section--body\">\n<div class=\"section-divider\"><span style=\"color: var(--wpex-heading-color); font-size: var(--wpex-text-2xl); font-weight: var(--wpex-heading-font-weight); font-family: var(--wpex-body-font-family, var(--wpex-font-sans));\">Explore How LangChain\u2019s Semantic Search Allows You To Transform Data Retrieval and Information Discovery<\/span><\/div>\n<div class=\"section-content\">\n<div class=\"section-inner sectionLayout--insetColumn\">\n<figure class=\"graf graf--figure\">\n<\/figure><\/div><\/div><\/section>\n\n\n\n<figure class=\"wp-block-image aligncenter graf-image\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1600\/0*phgSIxMelU53JkCB\" alt=\"text embeddings, semantic search, LangChain, Comet ML, CometLLM\"\/><figcaption class=\"wp-element-caption\">Photo by <a href=\"https:\/\/unsplash.com\/@trisolarian?utm_source=medium&amp;utm_medium=referral\">Axel R.<\/a> on\u00a0<a href=\"http:\/\/Unsplash.com\">Unsplash<\/a><\/figcaption><\/figure>\n\n\n\n<p class=\"graf graf--p\">In this blog post, I\u2019ll show you how to work with text embedding models using LangChain.<\/p>\n\n\n\n<p class=\"graf graf--p\">Text embedding models represent documents as high-dimensional vectors. They\u2019re the key to unlocking semantic search capabilities that go beyond simple keyword matching. Imagine being able to sift through massive volumes of text and instantly find documents that match the intent and meaning of your search query, not just the exact words. 
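<\/p>\n\n\n\n<p class=\"graf graf--p\">To make that idea concrete, the similarity between two embedding vectors is commonly measured with cosine similarity. The sketch below uses made-up 3-dimensional vectors purely for illustration (real embeddings have hundreds or thousands of dimensions, and <code class=\"markup--code markup--p-code\">cosine_similarity<\/code> here is a hand-rolled helper, not a library function):<\/p>

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1.0 means identical direction
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-d vectors standing in for document and query embeddings
doc = [0.20, 0.80, 0.10]         # e.g. a greeting
query = [0.25, 0.75, 0.05]       # semantically close to doc
unrelated = [0.90, -0.10, 0.40]  # points in a different direction

print(cosine_similarity(query, doc) > cosine_similarity(query, unrelated))  # True
```

<p class=\"graf graf--p\">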
That\u2019s the transformative potential of text embeddings in tasks such as document similarity search and recommendation systems. We\u2019ll explore how LangChain\u2019s OpenAIEmbeddings leverage this technology to revolutionize the way we interact with information, ensuring that the most relevant documents are at your fingertips, irrespective of the language used.<\/p>\n\n\n\n<p class=\"graf graf--p\">Let\u2019s dive into it!<\/p>\n\n\n\n<h3 class=\"wp-block-heading graf graf--h3\">Text Embeddings<\/h3>\n\n\n\n<p class=\"graf graf--p\">Text embedding models represent text documents in a high-dimensional vector space, where the similarity between vectors corresponds to the semantic similarity between the corresponding documents.<\/p>\n\n\n\n<p class=\"graf graf--p\">These models capture the semantic meaning of text and allow for efficient retrieval of similar documents based on their embeddings.<\/p>\n\n\n\n<p class=\"graf graf--p\">Text embedding models are handy when performing tasks like document similarity search, information retrieval, or recommendation systems.<\/p>\n\n\n\n<p class=\"graf graf--p\">They enable you to find documents that are semantically similar to a given query document, even if the wording or phrasing is different.<\/p>\n\n\n\n<p class=\"graf graf--p\">You would use text embedding models when you need to find similar documents based on their semantic meaning rather than just keyword matching.<\/p>\n\n\n\n<p class=\"graf graf--p\">For example, in a search engine, you might want to retrieve documents that are relevant to a user\u2019s query, even if the query terms are not an exact match to the document\u2019s content.<\/p>\n\n\n\n<p class=\"graf graf--p\">Text embedding models can help you achieve this by capturing the semantic relationships between words and documents.<\/p>\n\n\n\n<section class=\"section section--body\">\n<div class=\"section-divider\">\n<hr class=\"section-divider\">\n<\/div>\n<div class=\"section-content\">\n<div 
class=\"section-inner sectionLayout--insetColumn\">\n<blockquote class=\"graf graf--pullquote\"><p>Want to learn how to build modern software with LLMs using the newest tools and techniques in the field? <a class=\"markup--anchor markup--pullquote-anchor\" href=\"https:\/\/www.comet.com\/production\/site\/llm-course\/?utm_source=Heartbeat&amp;utm_medium=referral&amp;utm_content=Medium&amp;utm_campaign=Heartbeat_LangChain_Series_HS\" target=\"_blank\" rel=\"noopener ugc nofollow\" data-href=\"https:\/\/www.comet.com\/production\/site\/llm-course\/?utm_source=Heartbeat&amp;utm_medium=referral&amp;utm_content=Medium&amp;utm_campaign=Heartbeat_LangChain_Series_HS\">Check out this free LLMOps course<\/a> from industry expert Elvis Saravia of&nbsp;DAIR.AI!<\/p><\/blockquote>\n<\/div>\n<\/div>\n<\/section>\n\n\n\n<section class=\"section section--body\">\n<div class=\"section-divider\">\n<hr class=\"section-divider\">\n<\/div>\n<div class=\"section-content\">\n<div class=\"section-inner sectionLayout--insetColumn\">\n<h3 class=\"graf graf--h3\">There are several reasons why you would use text embedding models for retrieval in LangChain:<\/h3>\n<p class=\"graf graf--p\">\u2022 <strong class=\"markup--strong markup--p-strong\">Improved search accuracy:<\/strong> Text embedding models can capture the semantic meaning of text, allowing for more accurate retrieval of relevant documents compared to traditional keyword-based approaches.<\/p>\n<p class=\"graf graf--p\">\u2022 <strong class=\"markup--strong markup--p-strong\">Flexibility in query formulation:<\/strong> With text embedding models, you can search for similar documents based on the semantic meaning of a query rather than relying solely on exact keyword matches. 
This provides more flexibility in query formulation and improves the user experience.<\/p>\n<p class=\"graf graf--p\">\u2022 <strong class=\"markup--strong markup--p-strong\">Handling of out-of-vocabulary words:<\/strong> Text embedding models can handle out-of-vocabulary words by mapping them to nearby points in the embedding space. This allows for better retrieval performance even when encountering unseen or rare words.<\/p>\n<p class=\"graf graf--p\">Text embedding models for retrieval in LangChain provide a powerful tool for capturing the semantic meaning of text and enabling efficient retrieval of similar documents.<\/p>\n<p class=\"graf graf--p\">They are handy for finding documents based on semantic similarity rather than exact keyword matches.<\/p>\n<pre class=\"graf graf--pre graf--preV2\" spellcheck=\"false\" data-code-block-mode=\"1\" data-code-block-lang=\"python\"><span class=\"pre--content\">from langchain.embeddings import OpenAIEmbeddings\n\nembeddings_model = OpenAIEmbeddings()\n\nembeddings = embeddings_model.embed_documents(\n    [\n        <span class=\"hljs-string\">\"Hi there!\"<\/span>,\n        <span class=\"hljs-string\">\"Oh, hello!\"<\/span>,\n        <span class=\"hljs-string\">\"What's your name?\"<\/span>,\n        <span class=\"hljs-string\">\"My friends call me World\"<\/span>,\n        <span class=\"hljs-string\">\"Hello World!\"<\/span>\n    ]\n)\nlen(embeddings), len(embeddings[0])\n\n<span class=\"hljs-comment\">#(5, 1536)<\/span>\n\nembedded_query = embeddings_model.embed_query(<span class=\"hljs-string\">\"What was the name mentioned in the conversation?\"<\/span>)\n<span class=\"hljs-section\">embedded_query[:5]<\/span>\n\n\n<span class=\"hljs-comment\">#[0.005387211957276042,<\/span>\n<span class=\"hljs-comment\">#-0.0005941777859814659,<\/span>\n<span class=\"hljs-comment\"># 0.03892524773846194,<\/span>\n<span class=\"hljs-comment\"># -0.00297914132073842,<\/span>\n<span class=\"hljs-comment\"># 
-0.008912666382268376]<\/span><\/span><\/pre>\n<h3 class=\"graf graf--h3\">Caching<\/h3>\n<p class=\"graf graf--p\">The great thing about embeddings is that you can store them, or temporarily cache them, so you don\u2019t have to recompute them.<\/p>\n<p class=\"graf graf--p\"><code class=\"markup--code markup--p-code\">CacheBackedEmbeddings<\/code> in LangChain wrap a text embedding model, combining the benefits of precomputed embeddings with the flexibility of on-the-fly computation.<\/p>\n<p class=\"graf graf--p\">They are used to improve the efficiency and speed of text embedding retrieval by caching precomputed embeddings and retrieving them when needed.<\/p>\n<p class=\"graf graf--p\"><code class=\"markup--code markup--p-code\">CacheBackedEmbeddings<\/code> are particularly useful when you have a large corpus of text documents and want to efficiently retrieve embeddings for various tasks such as document similarity search, information retrieval, or recommendation systems.<\/p>\n<p class=\"graf graf--p\">They allow you to store precomputed embeddings in a cache, reducing the need for repeated computation and improving the overall retrieval performance.<\/p>\n<p class=\"graf graf--p\">You would use <code class=\"markup--code markup--p-code\">CacheBackedEmbeddings<\/code> when you need to speed up the retrieval of text embeddings and reduce the computational overhead.<\/p>\n<p class=\"graf graf--p\">By caching precomputed embeddings, you can avoid the time-consuming (and expensive) process of computing embeddings for each query or document, resulting in faster retrieval times.<\/p>\n<p class=\"graf graf--p\"><code class=\"markup--code markup--p-code\">CacheBackedEmbeddings<\/code> are especially beneficial in scenarios where the text corpus is static or changes infrequently.<\/p>\n<p class=\"graf graf--p\">The main supported way to initialize a <code class=\"markup--code markup--p-code\">CacheBackedEmbeddings<\/code> is <code class=\"markup--code 
markup--p-code\">from_bytes_store<\/code>.<\/p>\n<p class=\"graf graf--p\">This takes in the following parameters:<\/p>\n<p class=\"graf graf--p\">\u2022 <code class=\"markup--code markup--p-code\">underlying_embedder<\/code>: The embedder to use for embedding.<\/p>\n<p class=\"graf graf--p\">\u2022 <code class=\"markup--code markup--p-code\">document_embedding_cache<\/code>: The cache to use for storing document embeddings.<\/p>\n<p class=\"graf graf--p\">\u2022 <code class=\"markup--code markup--p-code\">namespace<\/code>: (optional, defaults to &#8220;&#8221;) The namespace for the document cache. This namespace is used to avoid collisions with other caches. For example, set it to the name of the embedding model used.<\/p>\n<p class=\"graf graf--p\">There are several caches you can use in LangChain, and the basic pattern is the same for each of them. That\u2019s the beauty of LangChain: a unified interface for them all.<\/p>\n<p class=\"graf graf--p\">Go <a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/python.langchain.com\/docs\/modules\/data_connection\/text_embedding\/caching_embeddings\" target=\"_blank\" rel=\"nofollow noopener\" data-href=\"https:\/\/python.langchain.com\/docs\/modules\/data_connection\/text_embedding\/caching_embeddings\">here<\/a> to learn more about them.<\/p>\n<pre class=\"graf graf--pre graf--preV2\" spellcheck=\"false\" data-code-block-mode=\"1\" data-code-block-lang=\"python\"><span class=\"pre--content\">!pip install faiss-cpu\n\n<span class=\"hljs-keyword\">from<\/span> langchain.storage <span class=\"hljs-keyword\">import<\/span> InMemoryStore, LocalFileStore, RedisStore\n\n<span class=\"hljs-keyword\">from<\/span> langchain.embeddings <span class=\"hljs-keyword\">import<\/span> OpenAIEmbeddings, CacheBackedEmbeddings\n\n<span class=\"hljs-keyword\">from<\/span> langchain.document_loaders <span 
class=\"hljs-keyword\">import<\/span> TextLoader\n<span class=\"hljs-keyword\">from<\/span> langchain.text_splitter <span class=\"hljs-keyword\">import<\/span> CharacterTextSplitter\n<span class=\"hljs-keyword\">from<\/span> langchain.vectorstores <span class=\"hljs-keyword\">import<\/span> FAISS\n\nunderlying_embeddings = OpenAIEmbeddings()\n\nfs = LocalFileStore(<span class=\"hljs-string\">\".\/cache\/\"<\/span>)\n\ncached_embedder = CacheBackedEmbeddings.from_bytes_store(\n    underlying_embeddings, fs, namespace=underlying_embeddings.model\n)\n\n<span class=\"hljs-built_in\">list<\/span>(fs.yield_keys())\n\n<span class=\"hljs-comment\"># []<\/span><\/span><\/pre>\n<pre class=\"graf graf--pre graf--preV2\" spellcheck=\"false\" data-code-block-mode=\"2\" data-code-block-lang=\"python\"><span class=\"pre--content\">raw_documents = TextLoader(<span class=\"hljs-string\">\"\/content\/golden-sayings-of-epictetus.txt\"<\/span>).load()\ntext_splitter = CharacterTextSplitter(chunk_size=<span class=\"hljs-number\">1000<\/span>, chunk_overlap=<span class=\"hljs-number\">0<\/span>)\ndocuments = text_splitter.split_documents(raw_documents)<\/span><\/pre>\n<h3 class=\"graf graf--h3\">Create a Vectorstore<\/h3>\n<p class=\"graf graf--p\">When using LangChain, you have two options for caching embeddings: vector stores and CacheBackedEmbeddings.<\/p>\n<p class=\"graf graf--p\">Vector stores, such as FAISS, are useful when you want to store and retrieve embeddings efficiently.<\/p>\n<p class=\"graf graf--p\">They are typically used when you have many embeddings and need fast retrieval.<\/p>\n<p class=\"graf graf--p\">You can create a vector store by using the <code class=\"markup--code markup--p-code\">FAISS.from_documents<\/code> (it could be <a class=\"markup--anchor markup--p-anchor\" 
href=\"https:\/\/python.langchain.com\/docs\/integrations\/vectorstores\/\" target=\"_blank\" rel=\"nofollow noopener\" data-href=\"https:\/\/python.langchain.com\/docs\/integrations\/vectorstores\/\">any one of the supported vector stores<\/a>) method and passing in the documents and the embedder.<\/p>\n<p class=\"graf graf--p\">You should use vector stores when you need fast retrieval over a large number of embeddings.<\/p>\n<p class=\"graf graf--p\">On the other hand, you should use <code class=\"markup--code markup--p-code\">CacheBackedEmbeddings<\/code> when you want to temporarily cache embeddings to avoid recomputing them, such as in unit tests or prototyping.<\/p>\n<pre class=\"graf graf--pre graf--preV2\" spellcheck=\"false\" data-code-block-mode=\"2\" data-code-block-lang=\"python\"><span class=\"pre--content\">db = FAISS.from_documents(documents, cached_embedder)<\/span><\/pre>\n<p class=\"graf graf--p\">You can time the creation of the first and second vector stores to see how much faster the second one is to build. 
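<\/p>\n<p class=\"graf graf--p\">The speedup comes entirely from skipping recomputation. Here is a purely local toy illustration of the same pattern; <code class=\"markup--code markup--p-code\">SlowEmbedder<\/code> and <code class=\"markup--code markup--p-code\">CachedEmbedder<\/code> are hypothetical stand-ins for the real embedding model and for CacheBackedEmbeddings, written only to show why the second pass is cheap:<\/p>

```python
import time

class SlowEmbedder:
    # Stand-in for a real embedding model: each text costs ~5 ms to embed
    def embed_documents(self, texts):
        time.sleep(0.005 * len(texts))
        return [[float(len(t))] for t in texts]

class CachedEmbedder:
    # Minimal version of the cache-backed pattern: compute once, reuse after
    def __init__(self, inner):
        self.inner = inner
        self.cache = {}

    def embed_documents(self, texts):
        missing = [t for t in texts if t not in self.cache]
        if missing:
            for t, vec in zip(missing, self.inner.embed_documents(missing)):
                self.cache[t] = vec
        return [self.cache[t] for t in texts]

texts = ['chunk %d' % i for i in range(40)]
embedder = CachedEmbedder(SlowEmbedder())

t0 = time.perf_counter(); embedder.embed_documents(texts); first = time.perf_counter() - t0
t0 = time.perf_counter(); embedder.embed_documents(texts); second = time.perf_counter() - t0
print(second < first)  # the second pass never touches the slow model
```

<p class=\"graf graf--p\">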
That\u2019s the power of cached embeddings!<\/p>\n<pre class=\"graf graf--pre graf--preV2\" spellcheck=\"false\" data-code-block-mode=\"2\" data-code-block-lang=\"python\"><span class=\"pre--content\">db2 = FAISS.from_documents(documents, cached_embedder)\n\n<span class=\"hljs-built_in\">list<\/span>(fs.yield_keys())[:<span class=\"hljs-number\">5<\/span>]<\/span><\/pre>\n<pre class=\"graf graf--pre graf--preV2\" spellcheck=\"false\" data-code-block-mode=\"2\" data-code-block-lang=\"python\"><span class=\"pre--content\">[<span class=\"hljs-string\">'text-embedding-ada-00258ed3f2f-e965-57c4-9f1d-d737f70d99d4'<\/span>,\n <span class=\"hljs-string\">'text-embedding-ada-002d85fa430-6eee-546c-bd31-fe8c2b7f5d28'<\/span>,\n <span class=\"hljs-string\">'text-embedding-ada-00275fa6ffa-5e70-52ab-a788-811652175577'<\/span>,\n <span class=\"hljs-string\">'text-embedding-ada-00208b3877e-ac85-56a0-9156-048fba0dda88'<\/span>,\n <span class=\"hljs-string\">'text-embedding-ada-0029b5b82a9-2927-51c7-981a-855474cd6be1'<\/span>]<\/span><\/pre>\n<h3 class=\"graf graf--h3\">Conclusion<\/h3>\n<p class=\"graf graf--p\">In conclusion, text embedding models, particularly those implemented in LangChain, represent a quantum leap in how we handle and retrieve textual information.<\/p>\n<p class=\"graf graf--p\">They empower us to understand and process language on a semantic level, transcending the limitations of traditional keyword searches. Through practical examples and powerful tools like CacheBackedEmbeddings and vector stores like FAISS, we\u2019ve seen how LangChain simplifies and speeds up the retrieval process, ensuring efficiency and accuracy. 
Whether you\u2019re building a search engine, a recommendation system, or any application that relies on deep text understanding, using text embeddings is not just an option; it\u2019s an imperative.<\/p>\n<p class=\"graf graf--p\">With LangChain plus embeddings, you\u2019re not just searching; you\u2019re discovering, ensuring every query returns the most semantically relevant information possible.<\/p>\n<\/div>\n<\/div>\n<\/section>\n","protected":false},"excerpt":{"rendered":"<p>Explore How LangChain\u2019s Semantic Search Allows You To Transform Data Retrieval and Information Discovery In this blog post, I\u2019ll show you how to work with text embedding models using LangChain. Text embedding models represent documents as high-dimensional vectors. They\u2019re the key to unlocking semantic search capabilities that go beyond simple keyword matching. Imagine being able [&hellip;]<\/p>\n","protected":false},"author":68,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"customer_name":"","customer_description":"","customer_industry":"","customer_technologies":"","customer_logo":"","footnotes":""},"categories":[65,7],"tags":[70,71,52,31,34],"coauthors":[166],"class_list":["post-8208","post","type-post","status-publish","format-standard","hentry","category-llmops","category-tutorials","tag-langchain","tag-language-models","tag-llm","tag-llmops","tag-prompt-engineering"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.9 (Yoast SEO v25.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Retrieval Part 2: Text Embeddings - Comet<\/title>\n<meta name=\"description\" content=\"Text embedding models capture the semantic meaning of text and allow for efficient retrieval of similar documents based on their embeddings.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link 
rel=\"canonical\" href=\"https:\/\/www.comet.com\/site\/blog\/retrieval-part-2-text-embeddings\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Retrieval Part 2: Text Embeddings\" \/>\n<meta property=\"og:description\" content=\"Text embedding models capture the semantic meaning of text and allow for efficient retrieval of similar documents based on their embeddings.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.comet.com\/site\/blog\/retrieval-part-2-text-embeddings\/\" \/>\n<meta property=\"og:site_name\" content=\"Comet\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/cometdotml\" \/>\n<meta property=\"article:published_time\" content=\"2023-11-30T14:17:22+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-04-24T17:04:10+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/cdn-images-1.medium.com\/max\/1600\/0*phgSIxMelU53JkCB\" \/>\n<meta name=\"author\" content=\"Harpreet Sahota\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@Cometml\" \/>\n<meta name=\"twitter:site\" content=\"@Cometml\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Harpreet Sahota\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"6 minutes\" \/>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"Retrieval Part 2: Text Embeddings - Comet","description":"Text embedding models capture the semantic meaning of text and allow for efficient retrieval of similar documents based on their embeddings.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.comet.com\/site\/blog\/retrieval-part-2-text-embeddings\/","og_locale":"en_US","og_type":"article","og_title":"Retrieval Part 2: Text Embeddings","og_description":"Text embedding models capture the semantic meaning of text and allow for efficient retrieval of similar documents based on their embeddings.","og_url":"https:\/\/www.comet.com\/site\/blog\/retrieval-part-2-text-embeddings\/","og_site_name":"Comet","article_publisher":"https:\/\/www.facebook.com\/cometdotml","article_published_time":"2023-11-30T14:17:22+00:00","article_modified_time":"2025-04-24T17:04:10+00:00","og_image":[{"url":"https:\/\/cdn-images-1.medium.com\/max\/1600\/0*phgSIxMelU53JkCB","type":"","width":"","height":""}],"author":"Harpreet Sahota","twitter_card":"summary_large_image","twitter_creator":"@Cometml","twitter_site":"@Cometml","twitter_misc":{"Written by":"Harpreet Sahota","Est. 
reading time":"6 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.comet.com\/site\/blog\/retrieval-part-2-text-embeddings\/#article","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/blog\/retrieval-part-2-text-embeddings\/"},"author":{"name":"Harpreet Sahota","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/46036ab474aa916e2873daece26a28d6"},"headline":"Retrieval Part 2: Text Embeddings","datePublished":"2023-11-30T14:17:22+00:00","dateModified":"2025-04-24T17:04:10+00:00","mainEntityOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/retrieval-part-2-text-embeddings\/"},"wordCount":1040,"publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/retrieval-part-2-text-embeddings\/#primaryimage"},"thumbnailUrl":"https:\/\/cdn-images-1.medium.com\/max\/1600\/0*phgSIxMelU53JkCB","keywords":["LangChain","Language Models","LLM","LLMOps","Prompt Engineering"],"articleSection":["LLMOps","Tutorials"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.comet.com\/site\/blog\/retrieval-part-2-text-embeddings\/","url":"https:\/\/www.comet.com\/site\/blog\/retrieval-part-2-text-embeddings\/","name":"Retrieval Part 2: Text Embeddings - Comet","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/retrieval-part-2-text-embeddings\/#primaryimage"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/retrieval-part-2-text-embeddings\/#primaryimage"},"thumbnailUrl":"https:\/\/cdn-images-1.medium.com\/max\/1600\/0*phgSIxMelU53JkCB","datePublished":"2023-11-30T14:17:22+00:00","dateModified":"2025-04-24T17:04:10+00:00","description":"Text embedding models capture the semantic meaning of text and allow for efficient retrieval of similar documents based on their 
embeddings.","breadcrumb":{"@id":"https:\/\/www.comet.com\/site\/blog\/retrieval-part-2-text-embeddings\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.comet.com\/site\/blog\/retrieval-part-2-text-embeddings\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/blog\/retrieval-part-2-text-embeddings\/#primaryimage","url":"https:\/\/cdn-images-1.medium.com\/max\/1600\/0*phgSIxMelU53JkCB","contentUrl":"https:\/\/cdn-images-1.medium.com\/max\/1600\/0*phgSIxMelU53JkCB"},{"@type":"BreadcrumbList","@id":"https:\/\/www.comet.com\/site\/blog\/retrieval-part-2-text-embeddings\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.comet.com\/site\/"},{"@type":"ListItem","position":2,"name":"Retrieval Part 2: Text Embeddings"}]},{"@type":"WebSite","@id":"https:\/\/www.comet.com\/site\/#website","url":"https:\/\/www.comet.com\/site\/","name":"Comet","description":"Build Better Models Faster","publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.comet.com\/site\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.comet.com\/site\/#organization","name":"Comet ML, Inc.","alternateName":"Comet","url":"https:\/\/www.comet.com\/site\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","width":310,"height":310,"caption":"Comet ML, 
Inc."},"image":{"@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/cometdotml","https:\/\/x.com\/Cometml","https:\/\/www.youtube.com\/channel\/UCmN63HKvfXSCS-UwVwmK8Hw"]},{"@type":"Person","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/46036ab474aa916e2873daece26a28d6","name":"Harpreet Sahota","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/image\/2d21512be19ba7e19a71a803309e2a88","url":"https:\/\/secure.gravatar.com\/avatar\/a6ca5a533fc9f143a0a7428037ff652aa0633d66bf27e76ae89b955ae72a0f2d?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/a6ca5a533fc9f143a0a7428037ff652aa0633d66bf27e76ae89b955ae72a0f2d?s=96&d=mm&r=g","caption":"Harpreet Sahota"},"url":"https:\/\/www.comet.com\/site\/blog\/author\/theartistsofdatasciencegmail-com\/"}]}},"_links":{"self":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/8208","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/users\/68"}],"replies":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/comments?post=8208"}],"version-history":[{"count":1,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/8208\/revisions"}],"predecessor-version":[{"id":15436,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/8208\/revisions\/15436"}],"wp:attachment":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media?parent=8208"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/categories?post=8208"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/tags?post=8208"},{"taxonomy":"author","embeddable":true,"href":"
https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/coauthors?post=8208"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}