{"id":8202,"date":"2023-11-24T07:02:44","date_gmt":"2023-11-24T15:02:44","guid":{"rendered":"https:\/\/live-cometml.pantheonsite.io\/?p=8202"},"modified":"2025-04-24T17:04:20","modified_gmt":"2025-04-24T17:04:20","slug":"retrieval-document-loaders-document-transformers","status":"publish","type":"post","link":"https:\/\/www.comet.com\/site\/blog\/retrieval-document-loaders-document-transformers\/","title":{"rendered":"Retrieval in LangChain: Part 1"},"content":{"rendered":"\n<div class=\"ew tb tc td te\">\n<div class=\"ab cm\">\n<div class=\"hy bg hz ia ib ic\">\n<figure class=\"xn xo xp xq xr xs lp lq paragraph-image\">\n<div class=\"xt xu dl xv bg xw\" tabindex=\"0\" role=\"button\">\n<div class=\"lp lq xm\">\n<h2>Document Loaders, Document Transformers<\/h2>\n<\/div><\/div><\/figure><\/div><\/div><\/div>\n\n\n\n<figure class=\"wp-block-image aligncenter bg wx xx c\"><img decoding=\"async\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:1400\/0*rtwprRszj_gzDhW_\" alt=\"Retrieval part 1: Document Loaders, Document Transformers, Comet ML\"\/><figcaption class=\"wp-element-caption\">Photo by <a href=\"https:\/\/unsplash.com\/@beadisruptur?utm_source=medium&amp;utm_medium=referral\">Derek Laliberte<\/a>\u00a0on\u00a0<a href=\"http:\/\/Unsplash.com\">Unsplash<\/a><\/figcaption><\/figure>\n\n\n\n<p class=\"pw-post-body-paragraph yc yd tg be b ye yf yg yh yi yj yk yl mq ym yn yo mv yp yq yr na ys yt yu yv ew bj wp-block-paragraph\" id=\"8ea1\">Retrieval in LangChain refers to fetching and retrieving relevant data or documents from external sources.<\/p>\n\n\n\n<p class=\"pw-post-body-paragraph yc yd tg be b ye yf yg yh yi yj yk yl mq ym yn yo mv yp yq yr na ys yt yu yv ew bj wp-block-paragraph\" id=\"cfdd\">It is a crucial step in many language model applications, especially in Retrieval Augmented Generation (RAG) tasks.<\/p>\n\n\n\n<p class=\"pw-post-body-paragraph yc yd tg be b ye yf yg yh yi yj yk yl mq ym yn yo mv yp yq yr na ys yt yu yv ew bj wp-block-paragraph\" 
id=\"c7cd\">Retrieval is useful because it allows you to incorporate external data into your language model, providing additional context and information that may not be present in the model\u2019s training data.<\/p>\n\n\n\n<p class=\"pw-post-body-paragraph yc yd tg be b ye yf yg yh yi yj yk yl mq ym yn yo mv yp yq yr na ys yt yu yv ew bj wp-block-paragraph\" id=\"590a\">By retrieving relevant documents, you can enhance the generation process and improve the quality and relevance of the generated responses.<\/p>\n\n\n\n<h2 class=\"wp-block-heading yw yx tg be yy yz za zb mk zc zd ze mp zf zg zh zi zj zk zl zm zn zo zp zq zr bj\" id=\"2ae0\">You may need retrieval in LangChain when you want to:<\/h2>\n\n\n\n<p class=\"pw-post-body-paragraph yc yd tg be b ye zs yg yh yi zt yk yl mq zu yn yo mv zv yq yr na zw yt yu yv ew bj wp-block-paragraph\" id=\"6d81\"><strong class=\"be fx\">Incorporate user-specific data:<\/strong>&nbsp;Retrieval allows you to fetch data that is specific to individual users or applications, enabling personalized and context-aware responses.<\/p>\n\n\n\n<p class=\"pw-post-body-paragraph yc yd tg be b ye yf yg yh yi yj yk yl mq ym yn yo mv yp yq yr na ys yt yu yv ew bj wp-block-paragraph\" id=\"33bd\"><strong class=\"be fx\">Provide additional information:<\/strong>&nbsp;By retrieving relevant documents, you can supplement the model\u2019s knowledge with up-to-date information, facts, or explanations.<\/p>\n\n\n\n<p class=\"pw-post-body-paragraph yc yd tg be b ye yf yg yh yi yj yk yl mq ym yn yo mv yp yq yr na ys yt yu yv ew bj wp-block-paragraph\" id=\"3ee1\"><strong class=\"be fx\">Answer questions over documents:<\/strong>&nbsp;Retrieval is particularly useful for tasks like question answering, where you need to find relevant information from a large corpus of documents.<\/p>\n\n\n\n<p class=\"pw-post-body-paragraph yc yd tg be b ye yf yg yh yi yj yk yl mq ym yn yo mv yp yq yr na ys yt yu yv ew bj wp-block-paragraph\" id=\"a6e5\">You can 
determine if you need retrieval by considering if your application requires accessing external data or retrieving relevant documents based on user queries.<\/p>\n\n\n\n<p class=\"pw-post-body-paragraph yc yd tg be b ye yf yg yh yi yj yk yl mq ym yn yo mv yp yq yr na ys yt yu yv ew bj wp-block-paragraph\" id=\"df68\">If you need to enhance your language model\u2019s responses with additional information or provide accurate answers to user queries, retrieval can be beneficial.<\/p>\n\n\n\n<div class=\"ab cm zx zy pk hb\" role=\"separator\"><\/div>\n\n\n\n<div class=\"ew tb tc td te\">\n<div class=\"ab cm\">\n<div class=\"hy bg hz ia ib ic\">\n<blockquote class=\"abc\"><p id=\"9b27\" class=\"abd abe tg be abf abg abh abi abj abk abl yv dq\" data-selectable-paragraph=\"\">Want to learn how to build modern software with LLMs using the newest tools and techniques in the field?&nbsp;<a class=\"af hd\" href=\"https:\/\/www.comet.com\/production\/site\/llm-course\/?utm_source=Heartbeat&amp;utm_medium=referral&amp;utm_content=Medium&amp;utm_campaign=Heartbeat_LangChain_Series_HS\" target=\"_blank\" rel=\"noopener ugc nofollow\">Check out this free LLMOps course<\/a>&nbsp;from industry expert Elvis Saravia of DAIR.AI.<\/p><\/blockquote>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"ab cm zx zy pk hb\" role=\"separator\"><\/div>\n\n\n\n<div class=\"ew tb tc td te\">\n<div class=\"ab cm\">\n<div class=\"hy bg hz ia ib ic\">\n<h2 id=\"d741\" class=\"yw yx tg be yy yz abm zb mk zc abn ze mp zf abo zh zi zj abp zl zm zn abq zp zq zr bj\">To use retrieval in LangChain, you can follow these steps:<\/h2>\n<p id=\"a515\" class=\"pw-post-body-paragraph yc yd tg be b ye zs yg yh yi zt yk yl mq zu yn yo mv zv yq yr na zw yt yu yv ew bj\" data-selectable-paragraph=\"\"><strong class=\"be fx\">Load documents:<\/strong>&nbsp;Use document loaders to load documents from various sources, such as files, websites, or databases.<\/p>\n<p id=\"cc75\" class=\"pw-post-body-paragraph yc yd tg be b 
ye yf yg yh yi yj yk yl mq ym yn yo mv yp yq yr na ys yt yu yv ew bj\" data-selectable-paragraph=\"\"><strong class=\"be fx\">Transform documents:<\/strong>&nbsp;Apply document transformers to preprocess and transform the loaded documents, such as splitting large documents into smaller chunks or applying specific logic optimized for different document types.<\/p>\n<p id=\"7386\" class=\"pw-post-body-paragraph yc yd tg be b ye yf yg yh yi yj yk yl mq ym yn yo mv yp yq yr na ys yt yu yv ew bj\" data-selectable-paragraph=\"\"><strong class=\"be fx\">Create embeddings:<\/strong>&nbsp;Generate embeddings for the documents using text embedding models. Embeddings capture the semantic meaning of text and enable efficient searching and similarity calculations.<\/p>\n<p id=\"f3b1\" class=\"pw-post-body-paragraph yc yd tg be b ye yf yg yh yi yj yk yl mq ym yn yo mv yp yq yr na ys yt yu yv ew bj\" data-selectable-paragraph=\"\"><strong class=\"be fx\">Store documents and embeddings:<\/strong>&nbsp;Use vector stores to store the documents and their corresponding embeddings. Vector stores provide efficient storage and retrieval capabilities for large collections of embeddings.<\/p>\n<p id=\"b7db\" class=\"pw-post-body-paragraph yc yd tg be b ye yf yg yh yi yj yk yl mq ym yn yo mv yp yq yr na ys yt yu yv ew bj\" data-selectable-paragraph=\"\"><strong class=\"be fx\">Retrieve relevant documents:<\/strong>&nbsp;Use retrievers to query the vector store and retrieve relevant documents based on user queries or search criteria. 
Retriever algorithms, such as similarity search or Maximum Marginal Relevance (MMR) search, can be used to find the most relevant documents.<\/p>\n<p id=\"efd1\" class=\"pw-post-body-paragraph yc yd tg be b ye yf yg yh yi yj yk yl mq ym yn yo mv yp yq yr na ys yt yu yv ew bj\" data-selectable-paragraph=\"\">Following these steps, you can effectively incorporate retrieval capabilities into your LangChain application and enhance the language model\u2019s performance and contextual understanding.<\/p>\n<h2 id=\"c9fc\" class=\"yw yx tg be yy yz za zb mk zc zd ze mp zf zg zh zi zj zk zl zm zn zo zp zq zr bj\">Document Loaders<\/h2>\n<p id=\"063e\" class=\"pw-post-body-paragraph yc yd tg be b ye zs yg yh yi zt yk yl mq zu yn yo mv zv yq yr na zw yt yu yv ew bj\" data-selectable-paragraph=\"\">Document loaders in LangChain are used to load data from various sources as Document objects.<\/p>\n<p id=\"25cf\" class=\"pw-post-body-paragraph yc yd tg be b ye yf yg yh yi yj yk yl mq ym yn yo mv yp yq yr na ys yt yu yv ew bj\" data-selectable-paragraph=\"\">A Document is a piece of text with associated metadata. Document loaders provide a convenient way to fetch data from different sources, such as text files, web pages, or even transcripts of videos. The main purpose of document loaders is to retrieve data and prepare it for further processing in LangChain.<\/p>\n<p id=\"922b\" class=\"pw-post-body-paragraph yc yd tg be b ye yf yg yh yi yj yk yl mq ym yn yo mv yp yq yr na ys yt yu yv ew bj\" data-selectable-paragraph=\"\">They expose a&nbsp;<code class=\"eg abr abs abt abu b\">load<\/code>&nbsp;method that fetches data from the configured source and returns it as a list of Document objects. 
Some document loaders also support lazy loading, which allows data to be loaded into memory only when needed.<\/p>\n<h2 id=\"8b5f\" class=\"yw yx tg be yy yz za zb mk zc zd ze mp zf zg zh zi zj zk zl zm zn zo zp zq zr bj\">Text loader<\/h2>\n<p id=\"2efc\" class=\"pw-post-body-paragraph yc yd tg be b ye zs yg yh yi zt yk yl mq zu yn yo mv zv yq yr na zw yt yu yv ew bj\" data-selectable-paragraph=\"\">This is the simplest loader. It reads in a file as text and places it all into one Document.<\/p>\n<pre class=\"abv abw abx aby abz aca abu acb bo acc ba bj\"><span id=\"a965\" class=\"acd yx tg abu b bf ace acf l acg ach\" data-selectable-paragraph=\"\">%%capture\n!pip install langchain openai tiktoken\n!wget -O <span class=\"hljs-string\">\"golden-sayings-of-epictetus.txt\"<\/span> https:\/\/www.gutenberg.org\/cache\/epub\/<span class=\"hljs-number\">871<\/span>\/pg871.txt<\/span><\/pre>\n<pre class=\"aci aca abu acb bo acc ba bj\"><span id=\"dbe4\" class=\"acd yx tg abu b bf ace acf l acg ach\" data-selectable-paragraph=\"\"><span class=\"hljs-keyword\">import<\/span> os\n<span class=\"hljs-keyword\">import<\/span> getpass\nos.environ[<span class=\"hljs-string\">\"OPENAI_API_KEY\"<\/span>] = getpass.getpass(<span class=\"hljs-string\">\"Enter Your OpenAI API Key:\"<\/span>)<\/span><\/pre>\n<pre class=\"aci aca abu acb bo acc ba bj\"><span id=\"ac7e\" class=\"acd yx tg abu b bf ace acf l acg ach\" data-selectable-paragraph=\"\">from langchain.document_loaders <span class=\"hljs-keyword\">import<\/span> <span class=\"hljs-type\">TextLoader<\/span>\n<span class=\"hljs-variable\">loader<\/span> <span class=\"hljs-operator\">=<\/span> TextLoader(<span class=\"hljs-string\">\"golden-sayings-of-epictetus.txt\"<\/span>)\ngolden_sayings = loader.load()<\/span><\/pre>\n<pre class=\"aci aca abu acb bo acc ba bj\"><span id=\"9ef9\" class=\"acd yx tg abu b bf ace acf l acg ach\" data-selectable-paragraph=\"\"><span class=\"hljs-built_in\">type<\/span>(golden_sayings)\n<span 
class=\"hljs-comment\"># list<\/span>\n\n<span class=\"hljs-built_in\">type<\/span>(golden_sayings[0])\n<span class=\"hljs-comment\"># langchain.schema.document.Document<\/span><\/span><\/pre>\n<h2 id=\"1215\" class=\"yw yx tg be yy yz za zb mk zc zd ze mp zf zg zh zi zj zk zl zm zn zo zp zq zr bj\">CSV Loaders<\/h2>\n<p id=\"3064\" class=\"pw-post-body-paragraph yc yd tg be b ye zs yg yh yi zt yk yl mq zu yn yo mv zv yq yr na zw yt yu yv ew bj\" data-selectable-paragraph=\"\">CSV loaders in LangChain are used to load CSV files into the system for further processing and analysis.<\/p>\n<p id=\"12ed\" class=\"pw-post-body-paragraph yc yd tg be b ye yf yg yh yi yj yk yl mq ym yn yo mv yp yq yr na ys yt yu yv ew bj\" data-selectable-paragraph=\"\">They allow you to easily import data from CSV files and convert them into LangChain\u2019s Document format. CSV loaders are useful when you have structured data in CSV format that you want to work with in LangChain.<\/p>\n<h2 id=\"4f64\" class=\"yw yx tg be yy yz za zb mk zc zd ze mp zf zg zh zi zj zk zl zm zn zo zp zq zr bj\">To use a CSV loader in LangChain, you can follow these steps:<\/h2>\n<p id=\"1d3f\" class=\"pw-post-body-paragraph yc yd tg be b ye zs yg yh yi zt yk yl mq zu yn yo mv zv yq yr na zw yt yu yv ew bj\" data-selectable-paragraph=\"\">1) Import the CSVLoader class from the&nbsp;<code class=\"eg abr abs abt abu b\">langchain.document_loaders<\/code> module.<\/p>\n<p id=\"4463\" class=\"pw-post-body-paragraph yc yd tg be b ye yf yg yh yi yj yk yl mq ym yn yo mv yp yq yr na ys yt yu yv ew bj\" data-selectable-paragraph=\"\">2) Create an instance of the CSVLoader class, providing the path to the CSV file as the argument.<\/p>\n<p id=\"5c9d\" class=\"pw-post-body-paragraph yc yd tg be b ye yf yg yh yi yj yk yl mq ym yn yo mv yp yq yr na ys yt yu yv ew bj\" data-selectable-paragraph=\"\">3) Use the&nbsp;<code class=\"eg abr abs abt abu b\">load()<\/code>&nbsp;method of the CSVLoader instance to load the CSV file 
and convert it into LangChain&#8217;s Document format.<\/p>\n<p id=\"e9ec\" class=\"pw-post-body-paragraph yc yd tg be b ye yf yg yh yi yj yk yl mq ym yn yo mv yp yq yr na ys yt yu yv ew bj\" data-selectable-paragraph=\"\">CSV loaders are particularly useful when you have tabular data in CSV format that you want to analyze or process using LangChain\u2019s text analysis capabilities.<\/p>\n<p id=\"8a6c\" class=\"pw-post-body-paragraph yc yd tg be b ye yf yg yh yi yj yk yl mq ym yn yo mv yp yq yr na ys yt yu yv ew bj\" data-selectable-paragraph=\"\">They allow you to easily import and work with structured data from CSV files within the LangChain ecosystem.<\/p>\n<h3 id=\"6521\" class=\"acj yx tg be yy mg ack mh mk ml acl mm mp mq acm mr mu mv acn mw mz na aco nb ne acp bj\">Here\u2019s an example code snippet that demonstrates how to use a CSV loader in LangChain:<\/h3>\n<pre class=\"abv abw abx aby abz aca abu acb bo acc ba bj\"><span id=\"5201\" class=\"acd yx tg abu b bf ace acf l acg ach\" data-selectable-paragraph=\"\"><span class=\"hljs-keyword\">from<\/span> langchain.document_loaders.csv_loader <span class=\"hljs-keyword\">import<\/span> CSVLoader\n\n\nloader = CSVLoader(file_path=<span class=\"hljs-string\">'\/content\/sample_data\/california_housing_test.csv'<\/span>)\ndata = loader.load()\n\n<span class=\"hljs-built_in\">print<\/span>(data)<\/span><\/pre>\n<\/div>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Document Loaders, Document Transformers Retrieval in LangChain refers to fetching and retrieving relevant data or documents from external sources. It is a crucial step in many language model applications, especially in Retrieval Augmented Generation (RAG) tasks. 
Retrieval is useful because it allows you to incorporate external data into your language model, providing additional context and [&hellip;]<\/p>\n","protected":false},"author":68,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"customer_name":"","customer_description":"","customer_industry":"","customer_technologies":"","customer_logo":"","_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[65,7],"tags":[70,71,52,31,34],"coauthors":[166],"class_list":["post-8202","post","type-post","status-publish","format-standard","hentry","category-llmops","category-tutorials","tag-langchain","tag-language-models","tag-llm","tag-llmops","tag-prompt-engineering"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.9 (Yoast SEO v25.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Retrieval in LangChain: Part 1 - Comet<\/title>\n<meta name=\"description\" content=\"LangChain Retrieval refers to fetching + retrieving relevant data from external sources + is a crucial step in many LLM applications like RAG\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.comet.com\/site\/blog\/retrieval-document-loaders-document-transformers\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Retrieval in LangChain: Part 1\" \/>\n<meta property=\"og:description\" content=\"LangChain Retrieval refers to fetching + retrieving relevant data from external sources + is a crucial step in many LLM applications like RAG\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.comet.com\/site\/blog\/retrieval-document-loaders-document-transformers\/\" \/>\n<meta property=\"og:site_name\" content=\"Comet\" \/>\n<meta property=\"article:publisher\" 
content=\"https:\/\/www.facebook.com\/cometdotml\" \/>\n<meta property=\"article:published_time\" content=\"2023-11-24T15:02:44+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-04-24T17:04:20+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/miro.medium.com\/v2\/resize:fit:1400\/0*rtwprRszj_gzDhW_\" \/>\n<meta name=\"author\" content=\"Harpreet Sahota\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@Cometml\" \/>\n<meta name=\"twitter:site\" content=\"@Cometml\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Harpreet Sahota\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Retrieval in LangChain: Part 1 - Comet","description":"LangChain Retrieval refers to fetching + retrieving relevant data from external sources + is a crucial step in many LLM applications like RAG","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.comet.com\/site\/blog\/retrieval-document-loaders-document-transformers\/","og_locale":"en_US","og_type":"article","og_title":"Retrieval in LangChain: Part 1","og_description":"LangChain Retrieval refers to fetching + retrieving relevant data from external sources + is a crucial step in many LLM applications like 
RAG","og_url":"https:\/\/www.comet.com\/site\/blog\/retrieval-document-loaders-document-transformers\/","og_site_name":"Comet","article_publisher":"https:\/\/www.facebook.com\/cometdotml","article_published_time":"2023-11-24T15:02:44+00:00","article_modified_time":"2025-04-24T17:04:20+00:00","og_image":[{"url":"https:\/\/miro.medium.com\/v2\/resize:fit:1400\/0*rtwprRszj_gzDhW_","type":"","width":"","height":""}],"author":"Harpreet Sahota","twitter_card":"summary_large_image","twitter_creator":"@Cometml","twitter_site":"@Cometml","twitter_misc":{"Written by":"Harpreet Sahota","Est. reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.comet.com\/site\/blog\/retrieval-document-loaders-document-transformers\/#article","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/blog\/retrieval-document-loaders-document-transformers\/"},"author":{"name":"Harpreet Sahota","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/46036ab474aa916e2873daece26a28d6"},"headline":"Retrieval in LangChain: Part 1","datePublished":"2023-11-24T15:02:44+00:00","dateModified":"2025-04-24T17:04:20+00:00","mainEntityOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/retrieval-document-loaders-document-transformers\/"},"wordCount":743,"publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/retrieval-document-loaders-document-transformers\/#primaryimage"},"thumbnailUrl":"https:\/\/miro.medium.com\/v2\/resize:fit:1400\/0*rtwprRszj_gzDhW_","keywords":["LangChain","Language Models","LLM","LLMOps","Prompt Engineering"],"articleSection":["LLMOps","Tutorials"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.comet.com\/site\/blog\/retrieval-document-loaders-document-transformers\/","url":"https:\/\/www.comet.com\/site\/blog\/retrieval-document-loaders-document-transformers\/","name":"Retrieval in LangChain: Part 1 - 
Comet","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/retrieval-document-loaders-document-transformers\/#primaryimage"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/retrieval-document-loaders-document-transformers\/#primaryimage"},"thumbnailUrl":"https:\/\/miro.medium.com\/v2\/resize:fit:1400\/0*rtwprRszj_gzDhW_","datePublished":"2023-11-24T15:02:44+00:00","dateModified":"2025-04-24T17:04:20+00:00","description":"LangChain Retrieval refers to fetching + retrieving relevant data from external sources + is a crucial step in many LLM applications like RAG","breadcrumb":{"@id":"https:\/\/www.comet.com\/site\/blog\/retrieval-document-loaders-document-transformers\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.comet.com\/site\/blog\/retrieval-document-loaders-document-transformers\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/blog\/retrieval-document-loaders-document-transformers\/#primaryimage","url":"https:\/\/miro.medium.com\/v2\/resize:fit:1400\/0*rtwprRszj_gzDhW_","contentUrl":"https:\/\/miro.medium.com\/v2\/resize:fit:1400\/0*rtwprRszj_gzDhW_"},{"@type":"BreadcrumbList","@id":"https:\/\/www.comet.com\/site\/blog\/retrieval-document-loaders-document-transformers\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.comet.com\/site\/"},{"@type":"ListItem","position":2,"name":"Retrieval in LangChain: Part 1"}]},{"@type":"WebSite","@id":"https:\/\/www.comet.com\/site\/#website","url":"https:\/\/www.comet.com\/site\/","name":"Comet","description":"Build Better Models 
Faster","publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.comet.com\/site\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.comet.com\/site\/#organization","name":"Comet ML, Inc.","alternateName":"Comet","url":"https:\/\/www.comet.com\/site\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","width":310,"height":310,"caption":"Comet ML, Inc."},"image":{"@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/cometdotml","https:\/\/x.com\/Cometml","https:\/\/www.youtube.com\/channel\/UCmN63HKvfXSCS-UwVwmK8Hw"]},{"@type":"Person","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/46036ab474aa916e2873daece26a28d6","name":"Harpreet Sahota","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/image\/2d21512be19ba7e19a71a803309e2a88","url":"https:\/\/secure.gravatar.com\/avatar\/a6ca5a533fc9f143a0a7428037ff652aa0633d66bf27e76ae89b955ae72a0f2d?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/a6ca5a533fc9f143a0a7428037ff652aa0633d66bf27e76ae89b955ae72a0f2d?s=96&d=mm&r=g","caption":"Harpreet 
Sahota"},"url":"https:\/\/www.comet.com\/site\/blog\/author\/theartistsofdatasciencegmail-com\/"}]}},"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/8202","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/users\/68"}],"replies":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/comments?post=8202"}],"version-history":[{"count":1,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/8202\/revisions"}],"predecessor-version":[{"id":15444,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/8202\/revisions\/15444"}],"wp:attachment":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media?parent=8202"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/categories?post=8202"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/tags?post=8202"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/coauthors?post=8202"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}