Learn Python Stemming and Lemmatization – NLTK


1. Objective

In this Python stemming tutorial, we will discuss stemming and lemmatization in the Python programming language, two basic text-normalization techniques for data science work in Python. We will look at Python NLTK, work through stemming examples, and compare stemming with lemmatization.

So, let’s begin Python Stemming and Lemmatization.

2. Prerequisites for Python Stemming and Lemmatization

For our purpose, we will use the following library-

a. NLTK

NLTK stands for Natural Language Toolkit. It is a suite of Python libraries and programs for Natural Language Processing (NLP), mainly on English text, using both symbolic and statistical approaches. It also ships with sample data and supports graphical demonstrations.

You can install it using pip-

C:\Users\lifei>pip install nltk

Collecting nltk

Downloading https://files.pythonhosted.org/packages/50/09/3b1755d528ad9156ee7243d52aa5cd2b809ef053a0f31b53d92853dd653a/nltk-3.3.0.zip (1.4MB)

100% |████████████████████████████████| 1.4MB 669kB/s

Requirement already satisfied: six in c:\users\lifei\appdata\local\programs\python\python36\lib\site-packages (from nltk) (1.11.0)

Installing collected packages: nltk

Running setup.py install for nltk … done

Successfully installed nltk-3.3

3. What is Python Stemming?

Stemming is the act of reducing a word to its stem. A stem is like a root for a word; for writing, the stem is write. A stem does not always have to be a word, however: study, studies, and studying all stem to studi, which isn’t actually a word.

Words that share a stem are treated almost like synonyms; this lets us normalize sentences and makes searching for words easier and faster. Stemming algorithms are usually based on suffix-stripping rules. The most common is the Porter stemmer, which was written in 1979 and published in 1980.
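To make the idea of suffix-stripping rules concrete, here is a minimal sketch of a rule-based stemmer in plain Python. It is not the Porter algorithm; the rule list, the minimum stem length, and the name toy_stem are all assumptions made up for this illustration.

```python
# Toy suffix-stripping stemmer. This is NOT the Porter algorithm, only an
# illustration of how rule-based suffix removal works in principle.

# Hypothetical rule list: (suffix, replacement), checked in order.
SUFFIX_RULES = [
    ("ies", "i"),  # studies -> studi
    ("ing", ""),   # playing -> play
    ("ed", ""),    # played  -> play
    ("s", ""),     # plays   -> play
    ("y", "i"),    # study   -> studi
]

MIN_STEM_LENGTH = 3  # assumed guard so that very short words are left alone


def toy_stem(word):
    """Apply the first rule that matches and leaves a long enough stem."""
    word = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix):
            candidate = word[: len(word) - len(suffix)] + replacement
            if len(candidate) >= MIN_STEM_LENGTH:
                return candidate
    return word  # no rule applied


for w in ["plays", "played", "playing", "study", "studies"]:
    print(f"{w}: {toy_stem(w)}")
```

A real stemmer such as Porter's applies several rule passes and checks the shape of the remaining stem before stripping, which is why its results are more consistent than this single-pass sketch.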

a. Python – Stemming Individual Words

>>> import nltk
>>> from nltk.stem import PorterStemmer
>>> words = ['write', 'writer', 'writing', 'writers']
>>> ps = PorterStemmer()  # one stemmer instance, reused in the examples below
>>> for word in words:
...     print(f"{word}: {ps.stem(word)}")
...

Output- 

write: write

writer: writer

writing: write

writers: writer

Now let’s try some more words.

>>> ps.stem('written')

'written'

>>> ps.stem('wrote')

'wrote'

>>> ps.stem('writable')

'writabl'

>>> ps.stem('writes')

'write'

b. Another Example of Python Stemming

Let’s try more words.

>>> ps.stem('game')

'game'

>>> ps.stem('gaming')

'game'

>>> ps.stem('gamed')

'game'

>>> ps.stem('games')

'game'

c. Stemming an Entire Sentence

>>> from nltk.tokenize import word_tokenize
>>> nltk.download('punkt')  # tokenizer data; newer NLTK releases may ask for 'punkt_tab' instead
>>> sentence = 'I am enjoying writing this tutorial; I love to write and I have written 266 words so far. I wrote more than you did; I am a writer.'
>>> words = word_tokenize(sentence)  # split the sentence into word and punctuation tokens
>>> for word in words:
...     print(f"{word}: {ps.stem(word)}")
...

I: I

am: am

enjoying: enjoy

writing: write

this: thi

tutorial: tutori

;: ;

I: I

love: love

to: to

write: write

and: and

I: I

have: have

written: written

266: 266

words: word

so: so

far: far

.: .

I: I

wrote: wrote

more: more

than: than

you: you

did: did

;: ;

I: I

am: am

a: a

writer: writer

.: .
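Stemming whole sentences is what makes stem-based search possible: a query for one form of a word can match documents that contain another form. The following is a minimal sketch of that idea. The helper names (naive_stem, build_stem_index, search) and the sample documents are hypothetical, and naive_stem is only a stand-in so the example runs without NLTK.

```python
from collections import defaultdict


def naive_stem(word):
    """Stand-in stemmer (assumption): lowercase and drop a trailing 's'."""
    word = word.lower()
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def build_stem_index(documents, stem=naive_stem):
    """Map each stem to the set of document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(documents):
        for token in text.split():
            token = token.strip(".,;:!?")  # crude punctuation removal
            if token:
                index[stem(token)].add(doc_id)
    return index


def search(index, query, stem=naive_stem):
    """Return sorted ids of documents that share a stem with the query."""
    return sorted(index.get(stem(query), set()))


docs = ["She writes tutorials.", "He read one tutorial;", "Games are fun."]
index = build_stem_index(docs)
print(search(index, "tutorial"))
print(search(index, "game"))
```

With NLTK installed, passing stem=PorterStemmer().stem to both functions (and tokenizing with word_tokenize instead of split) should give the same behaviour with the Porter stems shown above.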

4. What is Python Lemmatization?

Lemmatization lets us group together the inflected forms of a word. It maps each form onto a single base form, the lemma, which is the form you would look up in a dictionary.

a. Python Stemming vs Lemmatization

But how is this different from stemming? While stemming can produce strings that are not actual words, lemmatization only ever results in words that exist. Lemmas are actual words.

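>>> from nltk.stem import WordNetLemmatizer
>>> nltk.download('wordnet')  # WordNet data used by the lemmatizer
>>> lemmatizer = WordNetLemmatizer()
>>> ps.stem('identify')

'identifi'

>>> lemmatizer.lemmatize('identify')

'identify'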
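One way to see why a lemmatizer only returns real words is that it checks its candidates against a dictionary. The sketch below is a toy version of that idea, loosely inspired by the exception-list-plus-detachment-rules approach that WordNet documents for its morphological processing. The vocabulary, the rule list, and the name toy_lemmatize are made up for illustration and are not NLTK's implementation.

```python
# Hypothetical mini-dictionary standing in for WordNet's noun entries.
VOCABULARY = {"dog", "goose", "eraser", "child", "foot", "study"}

# Irregular forms are resolved by direct lookup (an "exception list").
EXCEPTIONS = {"geese": "goose", "children": "child", "feet": "foot"}

# Regular forms use (suffix, replacement) detachment rules, tried in order.
DETACHMENT_RULES = [("ies", "y"), ("es", ""), ("s", "")]


def toy_lemmatize(word):
    """Return a dictionary form of word, or word unchanged if none is found."""
    word = word.lower()
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]
    if word in VOCABULARY:
        return word
    for suffix, replacement in DETACHMENT_RULES:
        if word.endswith(suffix):
            candidate = word[: len(word) - len(suffix)] + replacement
            # Accept a candidate only if it is a known dictionary word.
            if candidate in VOCABULARY:
                return candidate
    return word


for w in ["dogs", "geese", "studies", "erasers", "xyzzies"]:
    print(f"{w}: {toy_lemmatize(w)}")
```

Because every candidate must be in the vocabulary, this function can never return a non-word such as studi; at worst it returns its input unchanged.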

b. Python Lemmatization Examples

>>> from nltk.stem import WordNetLemmatizer
>>> lemmatizer = WordNetLemmatizer()
>>> nltk.download('wordnet')
>>> lemmatizer.lemmatize('dogs')

'dog'

>>> lemmatizer.lemmatize('geese')

'goose'

>>> lemmatizer.lemmatize('cacti')

'cactus'

>>> lemmatizer.lemmatize('erasers')

'eraser'

>>> lemmatizer.lemmatize('children')

'child'

>>> lemmatizer.lemmatize('feet')

'foot'

c. Using the pos Parameter

>>> lemmatizer.lemmatize('better', pos='a')

'good'

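Here, pos is the part-of-speech parameter, which defaults to 'n' (noun). Unless we pass a different value, such as 'a' for adjective or 'v' for verb, the lemmatizer treats the word as a noun, so lemmatizing 'better' without pos='a' returns it unchanged.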

>>> lemmatizer.lemmatize('redder', 'a')

'red'

Since lemmatization depends on whether a word is a noun, a verb, an adjective, or an adverb, we need to tell the lemmatizer the word’s part of speech, which in turn depends on the context the word appears in.
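In practice, the part of speech often comes from a tagger that emits Penn Treebank tags such as 'JJR' or 'VBD', while the lemmatizer expects a one-letter value such as 'a' or 'v'. Here is a small sketch of a conversion helper. The function name penn_to_wordnet_pos is hypothetical, and the tagged pairs are hand-written examples of what a tagger might produce, not real tagger output.

```python
def penn_to_wordnet_pos(tag):
    """Map a Penn Treebank tag (e.g. 'VBD') to a one-letter pos value."""
    if tag.startswith("J"):
        return "a"  # adjective: JJ, JJR, JJS
    if tag.startswith("V"):
        return "v"  # verb: VB, VBD, VBG, ...
    if tag.startswith("RB"):
        return "r"  # adverb: RB, RBR, RBS
    return "n"      # fall back to noun, the lemmatizer's default


# Hand-written (word, tag) pairs, assumed for illustration only.
tagged = [("better", "JJR"), ("wrote", "VBD"), ("geese", "NNS"), ("quickly", "RB")]

for word, tag in tagged:
    print(f"{word}: {tag} -> {penn_to_wordnet_pos(tag)}")
```

With NLTK, pairs like these can be obtained from nltk.pos_tag(tokens) once the tagger data has been downloaded, and the returned letter can then be passed as the pos argument of lemmatizer.lemmatize.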

So, this was all about Python Stemming and Lemmatization – NLTK. Hope you like our explanation.

5. Conclusion

Hence, in this Python tutorial, we studied stemming and lemmatization. We also looked at NLTK, worked through examples of stemming and lemmatization in Python, and compared the two techniques. Tell us what you think about this Python Stemming and Lemmatization tutorial in the comments box.
