{"id":8094,"date":"2023-11-02T10:24:13","date_gmt":"2023-11-02T18:24:13","guid":{"rendered":"https:\/\/live-cometml.pantheonsite.io\/?p=8094"},"modified":"2025-04-24T17:04:42","modified_gmt":"2025-04-24T17:04:42","slug":"natural-language-processing-with-spacy-a-python-library","status":"publish","type":"post","link":"https:\/\/www.comet.com\/site\/blog\/natural-language-processing-with-spacy-a-python-library\/","title":{"rendered":"Natural Language Processing With SpaCy (A Python Library)"},"content":{"rendered":"\n<link rel=\"canonical\" href=\"https:\/\/www.comet.com\/site\/blog\/natural-language-processing-with-spacy-a-python-library\">\n\n\n\n<div class=\"fk fl fm fn fo\">\n<div class=\"ab ca\">\n<div class=\"ch bg ew ex ey ez\">\n<figure class=\"ly lz ma mb mc md lv lw paragraph-image\">\n<div class=\"me mf ee mg bg mh\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg mi mj c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*Dom9_CkFV57KMZ-k_lasRQ.jpeg\" alt=\"\" width=\"700\" height=\"394\"><\/figure><div class=\"lv lw lx\"><picture><\/picture><\/div>\n<\/div><figcaption class=\"mk ml mm lv lw mn mo be b bf z dw\" data-selectable-paragraph=\"\">Photo by <a class=\"af mp\" href=\"https:\/\/unsplash.com\/@baleibee?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText\" target=\"_blank\" rel=\"noopener ugc nofollow\">Brooks Leibee<\/a> on <a class=\"af mp\" href=\"https:\/\/unsplash.com\/s\/photos\/text-sentance?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText\" target=\"_blank\" rel=\"noopener ugc nofollow\">Unsplash<\/a><\/figcaption><\/figure>\n<h1 id=\"00c2\" class=\"mq mr fr be ms mt mu mv mw mx my mz na nb nc nd ne nf ng nh ni nj nk nl nm nn bj\" data-selectable-paragraph=\"\">Introduction<\/h1>\n<p id=\"2a42\" class=\"pw-post-body-paragraph no np fr be b nq nr ns nt nu nv nw nx ny nz oa ob oc od oe of og oh oi oj ok fk bj\" 
data-selectable-paragraph=\"\">Natural language processing (NLP) is the field that gives computers the ability to process and understand human language, connecting humans with computers. There are many ways to build NLP projects, and one of them is to use the Python library <a class=\"af mp\" href=\"https:\/\/spacy.io\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">SpaCy<\/a>.<\/p>\n<p id=\"bec6\" class=\"pw-post-body-paragraph no np fr be b nq ol ns nt nu om nw nx ny on oa ob oc oo oe of og op oi oj ok fk bj\" data-selectable-paragraph=\"\">This post goes over how <a class=\"af mp\" href=\"https:\/\/spacy.io\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">SpaCy<\/a>, a widely used NLP library, operates. You will also learn about SpaCy\u2019s main features and how it differs from NLTK.<\/p>\n<h1 id=\"54cf\" class=\"mq mr fr be ms mt mu mv mw mx my mz na nb nc nd ne nf ng nh ni nj nk nl nm nn bj\" data-selectable-paragraph=\"\">What is SpaCy?<\/h1>\n<p id=\"0e7e\" class=\"pw-post-body-paragraph no np fr be b nq nr ns nt nu nv nw nx ny nz oa ob oc od oe of og oh oi oj ok fk bj\" data-selectable-paragraph=\"\">SpaCy is a free, open-source library written in Python and Cython for advanced natural language processing. If you are a developer and your work involves a lot of text, it is well worth getting to know.<\/p>\n<p id=\"5a57\" class=\"pw-post-body-paragraph no np fr be b nq ol ns nt nu om nw nx ny on oa ob oc oo oe of og op oi oj ok fk bj\" data-selectable-paragraph=\"\">SpaCy is designed specifically to build applications that process and understand large volumes of text data. 
One of SpaCy\u2019s key strengths is that it lets you build and use custom models for specific NLP tasks, such as <a class=\"af mp\" href=\"https:\/\/heartbeat.comet.ml\/named-entity-recognition-with-python-5a116490915\" target=\"_blank\" rel=\"noopener ugc nofollow\">named entity recognition<\/a> or part-of-speech tagging. Developers can train these models on their own data to meet the needs of their particular use cases.<\/p>\n<p id=\"dfdc\" class=\"pw-post-body-paragraph no np fr be b nq ol ns nt nu om nw nx ny on oa ob oc oo oe of og op oi oj ok fk bj\" data-selectable-paragraph=\"\">SpaCy has sophisticated entity recognition, <a class=\"af mp\" href=\"https:\/\/medium.com\/cometheartbeat\/tokenization-techniques-in-nlp-561e277b6090\" rel=\"noopener\">tokenization<\/a>, and parsing functions. It also supports many widely used languages. SpaCy is a suitable option for developing production-level NLP applications because it is fast and efficient at runtime.<\/p>\n<figure class=\"or os ot ou ov md lv lw paragraph-image\">\n<div class=\"me mf ee mg bg mh\" tabindex=\"0\" role=\"button\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg mi mj c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/0*x3RrC5u7wxz50CiA.png\" alt=\"\" width=\"700\" height=\"121\"><\/figure><div class=\"lv lw oq\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/0*x3RrC5u7wxz50CiA.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/0*x3RrC5u7wxz50CiA.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/0*x3RrC5u7wxz50CiA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/0*x3RrC5u7wxz50CiA.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/0*x3RrC5u7wxz50CiA.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/0*x3RrC5u7wxz50CiA.png 1100w, 
https:\/\/miro.medium.com\/v2\/resize:fit:1400\/format:webp\/0*x3RrC5u7wxz50CiA.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/0*x3RrC5u7wxz50CiA.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/0*x3RrC5u7wxz50CiA.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/0*x3RrC5u7wxz50CiA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/0*x3RrC5u7wxz50CiA.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/0*x3RrC5u7wxz50CiA.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/0*x3RrC5u7wxz50CiA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1400\/0*x3RrC5u7wxz50CiA.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/div>\n<figcaption class=\"mk ml mm lv lw mn mo be b bf z dw\" data-selectable-paragraph=\"\">Image from: <a class=\"af mp\" href=\"https:\/\/blog.neurotech.africa\/content\/images\/2022\/03\/spaCy-nlp.png\" target=\"_blank\" rel=\"noopener ugc 
nofollow\">https:\/\/blog.neurotech.africa\/<\/a><\/figcaption>\n<\/figure>\n<p id=\"d49a\" class=\"pw-post-body-paragraph no np fr be b nq ol ns nt nu om nw nx ny on oa ob oc oo oe of og op oi oj ok fk bj\" data-selectable-paragraph=\"\">With SpaCy, you use a pipeline to transform raw text into the final Doc, and you can add your own components to that pipeline to suit your application.<\/p>\n<p id=\"688d\" class=\"pw-post-body-paragraph no np fr be b nq ol ns nt nu om nw nx ny on oa ob oc oo oe of og op oi oj ok fk bj\" data-selectable-paragraph=\"\">It offers many pre-trained models in a variety of languages, and it also lets you train your own models on your own data to optimize for a particular use case.<\/p>\n<h1 id=\"9ecd\" class=\"mq mr fr be ms mt mu mv mw mx my mz na nb nc nd ne nf ng nh ni nj nk nl nm nn bj\" data-selectable-paragraph=\"\">Installation of SpaCy<\/h1>\n<p id=\"371c\" class=\"pw-post-body-paragraph no np fr be b nq nr ns nt nu nv nw nx ny nz oa ob oc od oe of og oh oi oj ok fk bj\" data-selectable-paragraph=\"\">Python must be installed on the system before you can set up SpaCy. It is available for download from the <a class=\"af mp\" href=\"https:\/\/www.python.org\/downloads\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">Python website<\/a>.<\/p>\n<p id=\"3d3d\" class=\"pw-post-body-paragraph no np fr be b nq ol ns nt nu om nw nx ny on oa ob oc oo oe of og op oi oj ok fk bj\" data-selectable-paragraph=\"\">Once <em class=\"ow\">Python<\/em> and <em class=\"ow\">pip<\/em> are installed, the next step is to install SpaCy and a language model. 
The following commands can be used to accomplish this.<\/p>\n<pre class=\"or os ot ou ov ox oy oz bo pa ba bj\"><span id=\"b620\" class=\"pb mr fr oy b bf pc pd l pe pf\" data-selectable-paragraph=\"\">pip install spacy\n\npython -m spacy download en_core_web_sm<\/span><\/pre>\n<p id=\"73f9\" class=\"pw-post-body-paragraph no np fr be b nq ol ns nt nu om nw nx ny on oa ob oc oo oe of og op oi oj ok fk bj\" data-selectable-paragraph=\"\">Running these commands installs SpaCy and downloads the small English model, which covers fundamental NLP features like tokenization, POS tagging, dependency parsing, and named entity recognition. Developers who need word vectors or higher accuracy can download one of the larger models, such as en_core_web_md or en_core_web_lg.<\/p>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"fk fl fm fn fo\">\n<div class=\"ab ca\">\n<div class=\"ch bg ew ex ey ez\">\n<blockquote class=\"po\"><p id=\"ec5d\" class=\"pp pq fr be pr ps pt pu pv pw px ok dw\" data-selectable-paragraph=\"\">Comet is now integrated with SpaCy! <a class=\"af mp\" href=\"https:\/\/www.comet.com\/docs\/v2\/integrations\/third-party-tools\/spaCy\/\" target=\"_blank\" rel=\"noopener ugc nofollow\">Learn more<\/a> and get started for free today.<\/p><\/blockquote>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"fk fl fm fn fo\">\n<div class=\"ab ca\">\n<div class=\"ch bg ew ex ey ez\">\n<h1 id=\"bfa1\" class=\"mq mr fr be ms mt py mv mw mx pz mz na nb qa nd ne nf qb nh ni nj qc nl nm nn bj\" data-selectable-paragraph=\"\">Objects and Features of SpaCy<\/h1>\n<p id=\"84aa\" class=\"pw-post-body-paragraph no np fr be b nq nr ns nt nu nv nw nx ny nz oa ob oc od oe of og oh oi oj ok fk bj\" data-selectable-paragraph=\"\">Using container objects, you can access the linguistic features that SpaCy assigns to the text. 
A container object can logically represent text units such as a document, a token (a text is made up of tokens), or a section of a document.<\/p>\n<ol class=\"\">\n<li id=\"6312\" class=\"no np fr be b nq ol ns nt nu om nw nx ny qd oa ob oc qe oe of og qf oi oj ok qg qh qi bj\" data-selectable-paragraph=\"\"><strong class=\"be qj\">Doc:<\/strong> This is the most frequently used container object. By convention, the object is always named \u201cdoc\u201d (in lowercase). To construct a Doc container, call the \u201cnlp\u201d object and pass the text as a single argument.<\/li>\n<\/ol>\n<pre class=\"or os ot ou ov ox oy oz bo pa ba bj\"><span id=\"50f6\" class=\"pb mr fr oy b bf pc pd l pe pf\" data-selectable-paragraph=\"\"><span class=\"hljs-keyword\">import<\/span> spacy\n\n<span class=\"hljs-comment\"># Load the small English model to get the nlp object<\/span>\nnlp = spacy.load(<span class=\"hljs-string\">\"en_core_web_sm\"<\/span>)\n\n<span class=\"hljs-keyword\">with<\/span> <span class=\"hljs-built_in\">open<\/span>(<span class=\"hljs-string\">\"dataFolder\/wiki_data.txt\"<\/span>, <span class=\"hljs-string\">\"r\"<\/span>) <span class=\"hljs-keyword\">as<\/span> f:\n    text = f.read()\n\n<span class=\"hljs-comment\"># Create a Doc container from the contents of wiki_data.txt<\/span>\ndoc = nlp(text)\n\n<span class=\"hljs-comment\"># Print all the text held in the Doc container<\/span>\n<span class=\"hljs-built_in\">print<\/span>(doc)<\/span><\/pre>\n<p id=\"6328\" class=\"pw-post-body-paragraph no np fr be b nq ol ns nt nu om nw nx ny on oa ob oc oo oe of og op oi oj ok fk bj\" data-selectable-paragraph=\"\"><strong class=\"be qj\">2. Tokenization: <\/strong>Tokens are the fundamental text units used in every NLP task. Tokenization is the process of segmenting text into \u201ctokens\u201d such as words, symbols, punctuation marks, and other elements. 
Splitting text into tokens is the initial stage of text processing.<\/p>\n<pre class=\"or os ot ou ov ox oy oz bo pa ba bj\"><span id=\"581b\" class=\"pb mr fr oy b bf pc pd l pe pf\" data-selectable-paragraph=\"\"><span class=\"hljs-keyword\">from<\/span> spacy.lang.en <span class=\"hljs-keyword\">import<\/span> English\n\n<span class=\"hljs-comment\"># Create a blank English pipeline (tokenizer only)<\/span>\nnlp = English()\n\n<span class=\"hljs-comment\"># Process the given text line<\/span>\ndoc = nlp(<span class=\"hljs-string\">\"Singapore is nice place to visit\"<\/span>)\n\n<span class=\"hljs-comment\"># Select the first token from the line<\/span>\ntoken1 = doc[<span class=\"hljs-number\">0<\/span>]\n\n<span class=\"hljs-comment\"># Print the first token from the text<\/span>\n<span class=\"hljs-built_in\">print<\/span>(token1.text)\n\n<span class=\"hljs-comment\"># Output: Singapore<\/span>\n<\/span><\/pre>\n<p id=\"4703\" class=\"pw-post-body-paragraph no np fr be b nq ol ns nt nu om nw nx ny on oa ob oc oo oe of og op oi oj ok fk bj\" data-selectable-paragraph=\"\">The image below shows each word and its token position in the input text.<\/p>\n<figure class=\"or os ot ou ov md lv lw paragraph-image\">\n<figure><img decoding=\"async\" class=\"mi bg mj c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/1*64AFAXwaTSHTlbZjmbUuFA.png\" alt=\"\" width=\"700\"><\/figure><div class=\"ab cm ca qk\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/format:webp\/1*64AFAXwaTSHTlbZjmbUuFA.png 640w, https:\/\/miro.medium.com\/v2\/format:webp\/1*64AFAXwaTSHTlbZjmbUuFA.png 720w, https:\/\/miro.medium.com\/v2\/format:webp\/1*64AFAXwaTSHTlbZjmbUuFA.png 750w, https:\/\/miro.medium.com\/v2\/format:webp\/1*64AFAXwaTSHTlbZjmbUuFA.png 786w, https:\/\/miro.medium.com\/v2\/format:webp\/1*64AFAXwaTSHTlbZjmbUuFA.png 828w, https:\/\/miro.medium.com\/v2\/format:webp\/1*64AFAXwaTSHTlbZjmbUuFA.png 1100w, https:\/\/miro.medium.com\/v2\/format:webp\/1*64AFAXwaTSHTlbZjmbUuFA.png 1400w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 
700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/1*64AFAXwaTSHTlbZjmbUuFA.png 640w, https:\/\/miro.medium.com\/v2\/1*64AFAXwaTSHTlbZjmbUuFA.png 720w, https:\/\/miro.medium.com\/v2\/1*64AFAXwaTSHTlbZjmbUuFA.png 750w, https:\/\/miro.medium.com\/v2\/1*64AFAXwaTSHTlbZjmbUuFA.png 786w, https:\/\/miro.medium.com\/v2\/1*64AFAXwaTSHTlbZjmbUuFA.png 828w, https:\/\/miro.medium.com\/v2\/1*64AFAXwaTSHTlbZjmbUuFA.png 1100w, https:\/\/miro.medium.com\/v2\/1*64AFAXwaTSHTlbZjmbUuFA.png 1400w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 700px\" data-testid=\"og\"><\/picture><\/div>\n<\/figure>\n<p id=\"89d1\" class=\"pw-post-body-paragraph no np fr be b nq ol ns nt nu om nw nx ny on oa ob oc oo oe of og op oi oj ok fk bj\" data-selectable-paragraph=\"\"><strong class=\"be qj\">3. Parts of speech (POS) tagging: <\/strong>SpaCy has methods for organizing sentences into lists of words and classifying each word according to its part of speech in the context. 
Classifying words into grammatical categories such as nouns, verbs, adjectives, and adverbs is part of the POS tagging process.<\/p>\n<p id=\"4346\" class=\"pw-post-body-paragraph no np fr be b nq ol ns nt nu om nw nx ny on oa ob oc oo oe of og op oi oj ok fk bj\" data-selectable-paragraph=\"\">The code below demonstrates POS tagging:<\/p>\n<pre class=\"or os ot ou ov ox oy oz bo pa ba bj\"><span id=\"fca6\" class=\"pb mr fr oy b bf pc pd l pe pf\" data-selectable-paragraph=\"\"><span class=\"hljs-keyword\">import<\/span> spacy\nnlp = spacy.load(<span class=\"hljs-string\">\"en_core_web_sm\"<\/span>)\ntext1 = <span class=\"hljs-string\">\"Susan is my neighbor; She is charming and having 1 brother\"<\/span>\n<span class=\"hljs-comment\"># Process the input text<\/span>\ndoc = nlp(text1)\n<span class=\"hljs-keyword\">for<\/span> token <span class=\"hljs-keyword\">in<\/span> doc:\n    <span class=\"hljs-comment\"># Get the token text and its part-of-speech (POS) tag<\/span>\n    token_text1 = token.text\n    token_pos1 = token.pos_\n\n    <span class=\"hljs-comment\"># Left-align both columns; this is for formatting only<\/span>\n    <span class=\"hljs-built_in\">print<\/span>(<span class=\"hljs-string\">f\"<span class=\"hljs-subst\">{token_text1:&lt;<span class=\"hljs-number\">12<\/span>}<\/span><span class=\"hljs-subst\">{token_pos1:&lt;<span class=\"hljs-number\">8<\/span>}<\/span>\"<\/span>)<\/span><\/pre>\n<p id=\"07e0\" class=\"pw-post-body-paragraph no np fr be b nq ol ns nt nu om nw nx ny on oa ob oc oo oe of og op oi oj ok fk bj\" data-selectable-paragraph=\"\"><strong class=\"be qj\">Output (abridged; the exact tags can vary with the model version):<\/strong><\/p>\n<pre class=\"or os ot ou ov ox oy oz bo pa ba bj\"><span id=\"05c8\" class=\"pb mr fr oy b bf pc pd l pe pf\" data-selectable-paragraph=\"\">\nSusan       NOUN\n<span class=\"hljs-keyword\">is<\/span>          AUX\nmy          ADJ\nneighbor    NOUN\nShe         PROPN\n................\n................\n<span class=\"hljs-number\">1<\/span>           NUM 
<\/span><\/pre>\n<p id=\"9768\" class=\"pw-post-body-paragraph no np fr be b nq ol ns nt nu om nw nx ny on oa ob oc oo oe of og op oi oj ok fk bj\" data-selectable-paragraph=\"\"><strong class=\"be qj\">4. Entity recognition: <\/strong>Finding entities in text is one of the most common labeling tasks. Named entity recognition identifies significant items in the input text, such as places, people, organizations, and languages. Because it lets you quickly pick out relevant subjects or pinpoint important chunks of text, it is very useful for extracting information from text.<\/p>\n<p id=\"74b7\" class=\"pw-post-body-paragraph no np fr be b nq ol ns nt nu om nw nx ny on oa ob oc oo oe of og op oi oj ok fk bj\" data-selectable-paragraph=\"\">The code in this section demonstrates entity recognition in simple terms. First, import SpaCy and load the model, then process the text. Finally, iterate over the entities and print each one with its label.<\/p>\n<pre class=\"or os ot ou ov ox oy oz bo pa ba bj\"><span id=\"d992\" class=\"pb mr fr oy b bf pc pd l pe pf\" data-selectable-paragraph=\"\"><span class=\"hljs-keyword\">import<\/span> spacy\nnlp = spacy.load(<span class=\"hljs-string\">\"en_core_web_sm\"<\/span>)\ntext1 = <span class=\"hljs-string\">\"Upcoming iPhone XII release date leaked as Apple discloses its pre-orders\"<\/span>\n<span class=\"hljs-comment\"># Process the input text<\/span>\ndoc = nlp(text1)\n\n<span class=\"hljs-comment\"># Iterate over the entities<\/span>\n<span class=\"hljs-keyword\">for<\/span> ent <span class=\"hljs-keyword\">in<\/span> doc.ents:\n    <span class=\"hljs-comment\"># Print the entity text and its label<\/span>\n    <span class=\"hljs-built_in\">print<\/span>(ent.text, ent.label_)<\/span><\/pre>\n<p id=\"9e2c\" class=\"pw-post-body-paragraph no np fr be b nq ol ns nt nu om nw nx ny on oa ob oc oo oe of og op oi oj ok fk bj\" data-selectable-paragraph=\"\"><strong class=\"be 
qj\">Output: <\/strong>Apple ORG<\/p>\n<h1 id=\"be4d\" class=\"mq mr fr be ms mt mu mv mw mx my mz na nb nc nd ne nf ng nh ni nj nk nl nm nn bj\" data-selectable-paragraph=\"\">Difference between SpaCy and NLTK<\/h1>\n<p id=\"c321\" class=\"pw-post-body-paragraph no np fr be b nq nr ns nt nu nv nw nx ny nz oa ob oc od oe of og oh oi oj ok fk bj\" data-selectable-paragraph=\"\"><a class=\"af mp\" href=\"https:\/\/medium.com\/cometheartbeat\/text-summarization-using-python-and-nltk-d1022ac347eb\" rel=\"noopener\">NLTK<\/a> and SpaCy differ significantly in several ways.<\/p>\n<p id=\"9d1a\" class=\"pw-post-body-paragraph no np fr be b nq ol ns nt nu om nw nx ny on oa ob oc oo oe of og op oi oj ok fk bj\" data-selectable-paragraph=\"\"><strong class=\"be qj\">1.<\/strong> NLTK supports various languages. Early SpaCy releases shipped models for only seven languages (English, French, German, Spanish, Portuguese, Italian, and Dutch), but current releases provide trained pipelines for many more.<\/p>\n<p id=\"dbc1\" class=\"pw-post-body-paragraph no np fr be b nq ol ns nt nu om nw nx ny on oa ob oc oo oe of og op oi oj ok fk bj\" data-selectable-paragraph=\"\"><strong class=\"be qj\">2.<\/strong> SpaCy uses an object-oriented approach: parsing a text returns a document object whose words and sentences are objects themselves. NLTK, by contrast, is a string processing library that takes strings as input and returns strings or lists of strings as output.<\/p>\n<p id=\"5578\" class=\"pw-post-body-paragraph no np fr be b nq ol ns nt nu om nw nx ny on oa ob oc oo oe of og op oi oj ok fk bj\" data-selectable-paragraph=\"\"><strong class=\"be qj\">3.<\/strong> The performance of SpaCy is typically better than that of NLTK, since it focuses on efficient, up-to-date algorithms. 
SpaCy performs better in word tokenization and POS tagging, whereas NLTK surpasses SpaCy in sentence tokenization.<\/p>\n<p id=\"3586\" class=\"pw-post-body-paragraph no np fr be b nq ol ns nt nu om nw nx ny on oa ob oc oo oe of og op oi oj ok fk bj\" data-selectable-paragraph=\"\">4. NLTK does not have support for word vectors, but SpaCy has that feature.<\/p>\n<p id=\"ee81\" class=\"pw-post-body-paragraph no np fr be b nq ol ns nt nu om nw nx ny on oa ob oc oo oe of og op oi oj ok fk bj\" data-selectable-paragraph=\"\">The below image shows the quick difference between NLTK and SpaCy.<\/p>\n<figure class=\"or os ot ou ov md lv lw paragraph-image\">\n<figure><img loading=\"lazy\" decoding=\"async\" class=\"bg mi mj c\" role=\"presentation\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:550\/1*WUN6x0ZFWLpFV6cHmSFtKA.png\" alt=\"\" width=\"550\" height=\"546\"><\/figure><div class=\"lv lw ql\"><picture><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/format:webp\/1*WUN6x0ZFWLpFV6cHmSFtKA.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/format:webp\/1*WUN6x0ZFWLpFV6cHmSFtKA.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/format:webp\/1*WUN6x0ZFWLpFV6cHmSFtKA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/format:webp\/1*WUN6x0ZFWLpFV6cHmSFtKA.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/format:webp\/1*WUN6x0ZFWLpFV6cHmSFtKA.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*WUN6x0ZFWLpFV6cHmSFtKA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/format:webp\/1*WUN6x0ZFWLpFV6cHmSFtKA.png 1100w\" type=\"image\/webp\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, 
(min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 550px\"><source srcset=\"https:\/\/miro.medium.com\/v2\/resize:fit:640\/1*WUN6x0ZFWLpFV6cHmSFtKA.png 640w, https:\/\/miro.medium.com\/v2\/resize:fit:720\/1*WUN6x0ZFWLpFV6cHmSFtKA.png 720w, https:\/\/miro.medium.com\/v2\/resize:fit:750\/1*WUN6x0ZFWLpFV6cHmSFtKA.png 750w, https:\/\/miro.medium.com\/v2\/resize:fit:786\/1*WUN6x0ZFWLpFV6cHmSFtKA.png 786w, https:\/\/miro.medium.com\/v2\/resize:fit:828\/1*WUN6x0ZFWLpFV6cHmSFtKA.png 828w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*WUN6x0ZFWLpFV6cHmSFtKA.png 1100w, https:\/\/miro.medium.com\/v2\/resize:fit:1100\/1*WUN6x0ZFWLpFV6cHmSFtKA.png 1100w\" sizes=\"(min-resolution: 4dppx) and (max-width: 700px) 50vw, (-webkit-min-device-pixel-ratio: 4) and (max-width: 700px) 50vw, (min-resolution: 3dppx) and (max-width: 700px) 67vw, (-webkit-min-device-pixel-ratio: 3) and (max-width: 700px) 65vw, (min-resolution: 2.5dppx) and (max-width: 700px) 80vw, (-webkit-min-device-pixel-ratio: 2.5) and (max-width: 700px) 80vw, (min-resolution: 2dppx) and (max-width: 700px) 100vw, (-webkit-min-device-pixel-ratio: 2) and (max-width: 700px) 100vw, 550px\" data-testid=\"og\"><\/picture><\/div>\n<figcaption class=\"mk ml mm lv lw mn mo be b bf z dw\" data-selectable-paragraph=\"\">Image from: <a class=\"af mp\" href=\"https:\/\/heartbeat.comet.ml\/v2\/resize:fit:640\/format:webp\/1*_hyXT-_ggsgzv2nDRMz39w.png\" target=\"_blank\" rel=\"noopener ugc nofollow\">https:\/\/miro.medium.com<\/a><\/figcaption>\n<\/figure>\n<h1 id=\"a8df\" class=\"mq mr fr be ms mt mu mv mw mx my mz na nb nc nd ne nf ng nh ni nj nk nl nm nn bj\" data-selectable-paragraph=\"\">Conclusion<\/h1>\n<p id=\"6653\" class=\"pw-post-body-paragraph no np fr be b nq nr ns nt nu nv nw nx ny nz oa ob oc od oe of og oh oi oj ok fk bj\" data-selectable-paragraph=\"\">When developing an NLP system, NLTK and SpaCy are both excellent choices. 
But as we\u2019ve seen, SpaCy is the best tool to employ in a real-world setting due to its high user-friendliness and performance, which are driven by its core principle, providing a service rather than just a tool. Hopefully, this article has inspired you to experiment with SpaCy.<\/p>\n<\/div>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Photo by Brooks Leibee on Unsplash Introduction Natural language processing (NLP) is the field that gives computers the ability to recognize human languages, and it connects humans with computers. One can build NLP projects in different ways, and one of those is by using the Python library SpaCy. This post will go over how the [&hellip;]<\/p>\n","protected":false},"author":84,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"customer_name":"","customer_description":"","customer_industry":"","customer_technologies":"","customer_logo":"","footnotes":""},"categories":[6],"tags":[],"coauthors":[181],"class_list":["post-8094","post","type-post","status-publish","format-standard","hentry","category-machine-learning"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.9 (Yoast SEO v25.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Natural Language Processing With SpaCy (A Python Library) - Comet<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.comet.com\/site\/blog\/natural-language-processing-with-spacy-a-python-library\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Natural Language Processing With SpaCy (A Python Library)\" \/>\n<meta property=\"og:description\" content=\"Photo by Brooks Leibee on Unsplash Introduction Natural language processing (NLP) is the field that gives 
computers the ability to recognize human languages, and it connects humans with computers. One can build NLP projects in different ways, and one of those is by using the Python library SpaCy. This post will go over how the [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.comet.com\/site\/blog\/natural-language-processing-with-spacy-a-python-library\" \/>\n<meta property=\"og:site_name\" content=\"Comet\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/cometdotml\" \/>\n<meta property=\"article:published_time\" content=\"2023-11-02T18:24:13+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-04-24T17:04:42+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*Dom9_CkFV57KMZ-k_lasRQ.jpeg\" \/>\n<meta name=\"author\" content=\"Khushboo Kumari\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@Cometml\" \/>\n<meta name=\"twitter:site\" content=\"@Cometml\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Khushboo Kumari\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"6 minutes\" \/>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"Natural Language Processing With SpaCy (A Python Library) - Comet","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.comet.com\/site\/blog\/natural-language-processing-with-spacy-a-python-library","og_locale":"en_US","og_type":"article","og_title":"Natural Language Processing With SpaCy (A Python Library)","og_description":"Photo by Brooks Leibee on Unsplash Introduction Natural language processing (NLP) is the field that gives computers the ability to recognize human languages, and it connects humans with computers. One can build NLP projects in different ways, and one of those is by using the Python library SpaCy. This post will go over how the [&hellip;]","og_url":"https:\/\/www.comet.com\/site\/blog\/natural-language-processing-with-spacy-a-python-library","og_site_name":"Comet","article_publisher":"https:\/\/www.facebook.com\/cometdotml","article_published_time":"2023-11-02T18:24:13+00:00","article_modified_time":"2025-04-24T17:04:42+00:00","og_image":[{"url":"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*Dom9_CkFV57KMZ-k_lasRQ.jpeg","type":"","width":"","height":""}],"author":"Khushboo Kumari","twitter_card":"summary_large_image","twitter_creator":"@Cometml","twitter_site":"@Cometml","twitter_misc":{"Written by":"Khushboo Kumari","Est. 
reading time":"6 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.comet.com\/site\/blog\/natural-language-processing-with-spacy-a-python-library#article","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/blog\/natural-language-processing-with-spacy-a-python-library\/"},"author":{"name":"Khushboo Kumari","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/9e9bc90fd931c322a00805c37b5dc8e8"},"headline":"Natural Language Processing With SpaCy (A Python Library)","datePublished":"2023-11-02T18:24:13+00:00","dateModified":"2025-04-24T17:04:42+00:00","mainEntityOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/natural-language-processing-with-spacy-a-python-library\/"},"wordCount":916,"publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/natural-language-processing-with-spacy-a-python-library#primaryimage"},"thumbnailUrl":"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*Dom9_CkFV57KMZ-k_lasRQ.jpeg","articleSection":["Machine Learning"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.comet.com\/site\/blog\/natural-language-processing-with-spacy-a-python-library\/","url":"https:\/\/www.comet.com\/site\/blog\/natural-language-processing-with-spacy-a-python-library","name":"Natural Language Processing With SpaCy (A Python Library) - 
Comet","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/natural-language-processing-with-spacy-a-python-library#primaryimage"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/natural-language-processing-with-spacy-a-python-library#primaryimage"},"thumbnailUrl":"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*Dom9_CkFV57KMZ-k_lasRQ.jpeg","datePublished":"2023-11-02T18:24:13+00:00","dateModified":"2025-04-24T17:04:42+00:00","breadcrumb":{"@id":"https:\/\/www.comet.com\/site\/blog\/natural-language-processing-with-spacy-a-python-library#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.comet.com\/site\/blog\/natural-language-processing-with-spacy-a-python-library"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/blog\/natural-language-processing-with-spacy-a-python-library#primaryimage","url":"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*Dom9_CkFV57KMZ-k_lasRQ.jpeg","contentUrl":"https:\/\/miro.medium.com\/v2\/resize:fit:700\/1*Dom9_CkFV57KMZ-k_lasRQ.jpeg"},{"@type":"BreadcrumbList","@id":"https:\/\/www.comet.com\/site\/blog\/natural-language-processing-with-spacy-a-python-library#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.comet.com\/site\/"},{"@type":"ListItem","position":2,"name":"Natural Language Processing With SpaCy (A Python Library)"}]},{"@type":"WebSite","@id":"https:\/\/www.comet.com\/site\/#website","url":"https:\/\/www.comet.com\/site\/","name":"Comet","description":"Build Better Models 
Faster","publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.comet.com\/site\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.comet.com\/site\/#organization","name":"Comet ML, Inc.","alternateName":"Comet","url":"https:\/\/www.comet.com\/site\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","width":310,"height":310,"caption":"Comet ML, Inc."},"image":{"@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/cometdotml","https:\/\/x.com\/Cometml","https:\/\/www.youtube.com\/channel\/UCmN63HKvfXSCS-UwVwmK8Hw"]},{"@type":"Person","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/9e9bc90fd931c322a00805c37b5dc8e8","name":"Khushboo Kumari","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/image\/d5766b081477ed4dc292729a8cfdf38b","url":"https:\/\/secure.gravatar.com\/avatar\/0a4a12b6e00a526ba8df6fba3b372ca0c498565db302b52ccceb6df4329d16a5?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/0a4a12b6e00a526ba8df6fba3b372ca0c498565db302b52ccceb6df4329d16a5?s=96&d=mm&r=g","caption":"Khushboo 
Kumari"},"url":"https:\/\/www.comet.com\/site\/blog\/author\/khushboo-writer2244gmail-com\/"}]}},"_links":{"self":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/8094","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/users\/84"}],"replies":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/comments?post=8094"}],"version-history":[{"count":1,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/8094\/revisions"}],"predecessor-version":[{"id":15462,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/8094\/revisions\/15462"}],"wp:attachment":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media?parent=8094"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/categories?post=8094"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/tags?post=8094"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/coauthors?post=8094"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}