lou
Posted on August 17, 2022
NLTK (Natural Language Toolkit) is an open-source Python library for natural language processing (NLP).
We want to count the frequency of words for the following text using NLTK.
text = "Morocco, officially the Kingdom of Morocco, is the westernmost country in the Maghreb region of North Africa. It overlooks the Mediterranean Sea to the north and the Atlantic Ocean to the west, and has land borders with Algeria to the east, and the disputed territory of Western Sahara to the south."
To install NLTK, run:
pip install nltk
If you don't have Jupyter installed, type the following commands in your terminal.
pip install jupyterlab
pip install notebook
pip install voila
Run Jupyter with:
jupyter notebook
Import the following libraries.
Assign the text to a variable.
The following function splits the text into tokens, that is, words and punctuation marks.
You can see the result in the output.
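The import, assignment, and tokenization steps above can be sketched as follows. The original code is not shown here, so the choice of tokenizer is an assumption: this sketch uses NLTK's wordpunct_tokenize, which is purely regex-based and needs no extra data, whereas nltk.word_tokenize would first require downloading the Punkt models with nltk.download(). The ImportError fallback is not part of the post; it is only there so the snippet still runs where NLTK is not installed.

```python
import re

try:
    from nltk.tokenize import wordpunct_tokenize
except ImportError:
    # Fallback (an assumption, not from the post): the same regex pattern
    # that NLTK's WordPunctTokenizer is built on.
    def wordpunct_tokenize(s):
        return re.findall(r"\w+|[^\w\s]+", s)

text = (
    "Morocco, officially the Kingdom of Morocco, is the westernmost country "
    "in the Maghreb region of North Africa. It overlooks the Mediterranean "
    "Sea to the north and the Atlantic Ocean to the west, and has land "
    "borders with Algeria to the east, and the disputed territory of "
    "Western Sahara to the south."
)

# Split the text into word tokens and punctuation tokens.
tokens = wordpunct_tokenize(text)

print(tokens[:8])
# ['Morocco', ',', 'officially', 'the', 'Kingdom', 'of', 'Morocco', ',']
```

Note that commas and periods come back as separate tokens, so they will be counted like words unless you filter them out later.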
The following code loops over the text tokens and counts the number of times each token occurs.
We convert every token to lowercase with lower() so that the same word written in a different case, such as "North" and "north", is not counted as two different words.
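A minimal sketch of such a counting loop is shown below. The original code is not available, so the names tokens and counts are illustrative, and the token list is a hand-written stand-in for a short slice of the tokenizer output ("North Africa. It overlooks the Mediterranean Sea to the north").

```python
# Hand-written stand-in for a slice of the tokenizer output (assumption).
tokens = ["North", "Africa", ".", "It", "overlooks", "the",
          "Mediterranean", "Sea", "to", "the", "north"]

counts = {}
for token in tokens:
    word = token.lower()                    # "North" and "north" become one key
    counts[word] = counts.get(word, 0) + 1  # start at 0 for unseen tokens

print(counts["north"])  # 2
print(counts["the"])    # 2
```

Without the call to lower(), "North" and "north" would end up as two separate keys with a count of 1 each.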
Top 10 most frequent words:
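One way to get the top 10 is sketched below, under a few assumptions that are not from the post: it uses NLTK's FreqDist (a subclass of Python's collections.Counter) and its most_common method, tokenizes with wordpunct_tokenize, and keeps only alphabetic tokens so that commas and periods do not appear in the ranking. The ImportError fallback only exists so the snippet runs without NLTK installed.

```python
import re
from collections import Counter

try:
    from nltk.probability import FreqDist
    from nltk.tokenize import wordpunct_tokenize
except ImportError:
    # Fallbacks (assumptions): FreqDist subclasses Counter, and the regex
    # is the pattern NLTK's WordPunctTokenizer is built on.
    FreqDist = Counter

    def wordpunct_tokenize(s):
        return re.findall(r"\w+|[^\w\s]+", s)

text = (
    "Morocco, officially the Kingdom of Morocco, is the westernmost country "
    "in the Maghreb region of North Africa. It overlooks the Mediterranean "
    "Sea to the north and the Atlantic Ocean to the west, and has land "
    "borders with Algeria to the east, and the disputed territory of "
    "Western Sahara to the south."
)

# Lowercase every token and drop punctuation before counting.
words = [t.lower() for t in wordpunct_tokenize(text) if t.isalpha()]
freq = FreqDist(words)

# most_common(10) returns (word, count) pairs, highest count first.
top10 = freq.most_common(10)
for word, count in top10:
    print(word, count)
```

For this text the list is led by "the" with 10 occurrences and "to" with 4. Words that occur only once fill the remaining places, so for a text this short the bottom of the list is not very informative.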