Understand the Various Steps in Natural Language Processing (NLP) in Under 5 Minutes
Natural Language Processing, or NLP for short, is broadly defined as the automatic manipulation of natural language, such as speech and text, by software.
The study of Natural Language Processing has been around for more than 4 decades.
In this blog, you will discover the common steps in an NLP pipeline.
Tokenization:
The process of breaking a string into smaller units, called tokens, is known as tokenization.
Example:
My Name is Aman
After breaking this into tokens, we get:
'My', 'Name', 'is', 'Aman'
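As a rough illustration, the example above can be reproduced with a few lines of Python. This is a minimal sketch that assumes a simple regular-expression rule (words and single punctuation marks); the function name tokenize is hypothetical, and real tokenizers handle many more cases.

```python
import re


def tokenize(text: str) -> list[str]:
    """Split text into word tokens and single punctuation tokens."""
    # \w+ matches a run of letters, digits or underscores;
    # [^\w\s] matches one character that is neither a word character nor whitespace.
    return re.findall(r"\w+|[^\w\s]", text)


tokens = tokenize("My Name is Aman")
print(tokens)  # ['My', 'Name', 'is', 'Aman']
```

Splitting on whitespace alone would also work for this sentence, but the regular expression keeps punctuation as separate tokens, which is usually what later steps expect.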
Stemming:
Normalizing words into their base or root form is called stemming.
Example: all the words below are treated as one:
Affections, Affects, Affected, Affecting
All of the above tokens are converted to their root form, which is
Affect
A stemmer simply strips common prefixes and suffixes from a word, so the result is not always a valid dictionary word.
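To make the idea concrete, here is a toy suffix-stripping stemmer. It is only a sketch: the suffix list and the minimum stem length are assumptions chosen for this example, and it is not the Porter algorithm or any library's stemmer.

```python
# Hypothetical suffix list for illustration, checked from longest to shortest.
SUFFIXES = ("ions", "ion", "ing", "ed", "s")


def naive_stem(word: str) -> str:
    """Lowercase the word and strip the first matching suffix."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least 3 characters so very short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


words = ["Affections", "Affects", "Affected", "Affecting"]
print([naive_stem(w) for w in words])  # ['affect', 'affect', 'affect', 'affect']
```

Because the rule is purely mechanical, it can also produce non-words: naive_stem("running") returns "runn".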
Lemmatization:
Lemmatization takes the morphological analysis of a word into account.
- Groups together the different inflected forms of a word under one base form, called the lemma
- Somewhat similar to stemming, as it maps several words to one common root
- Unlike stemming, the output of lemmatization is a proper word
Example:
A lemmatiser should map gone, going, and went to go.
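The mapping in this example can be sketched with a small lookup table. This assumes a hand-written dictionary (LEMMA_TABLE is hypothetical); real lemmatisers rely on a full vocabulary and morphological rules, often together with the part of speech.

```python
# Hypothetical hand-written table: inflected form -> lemma.
LEMMA_TABLE = {
    "gone": "go",
    "going": "go",
    "went": "go",
    "goes": "go",
}


def lemmatize(word: str) -> str:
    """Return the lemma if the word is in the table, else the lowercased word."""
    key = word.lower()
    return LEMMA_TABLE.get(key, key)


print([lemmatize(w) for w in ["gone", "going", "went"]])  # ['go', 'go', 'go']
```

Note that "went" shares no suffix with "go", which is why a suffix-stripping stemmer cannot handle it while a dictionary-backed lemmatiser can.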
POS Tagging (Part of Speech):
Here, each word is mapped to its part of speech.
Example:
The | Dog | Killed | the | Bat
DT  | NN  | VBD    | DT  | NN
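The tags used above (DT, NN, VBD) come from the Penn Treebank tag set; the Universal POS tag set is a coarser alternative (DET, NOUN, VERB).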
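The tagging in this example can be imitated with a lookup-based tagger. This is a toy sketch under a strong assumption: every word has exactly one tag stored in a hand-written lexicon (TOY_LEXICON is hypothetical). Real taggers use context to resolve words that can take several tags.

```python
# Hypothetical lexicon: lowercase word -> Penn Treebank style tag.
TOY_LEXICON = {
    "the": "DT",
    "dog": "NN",
    "bat": "NN",
    "killed": "VBD",
}


def pos_tag(tokens, default="NN"):
    """Pair each token with its tag, falling back to a default for unknown words."""
    return [(tok, TOY_LEXICON.get(tok.lower(), default)) for tok in tokens]


tagged = pos_tag("The Dog Killed the Bat".split())
print(tagged)
# [('The', 'DT'), ('Dog', 'NN'), ('Killed', 'VBD'), ('the', 'DT'), ('Bat', 'NN')]
```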
Named Entity Recognition:
It is used to identify named entities in text, such as movies, organisations, locations, people, and so on.
Example:
Google's CEO Sundar Pichai introduced the new Pixel Phone at New York
After Named Entity Recognition, the entities are labelled as follows:
Google's     | CEO | Sundar Pichai | introduced | the | new | Pixel Phone | at | New York
Organisation |     | Person        |            |     |     | Object      |    | Location
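One simple way to approximate this step is a gazetteer, that is, a list of known names with their labels. The sketch below assumes such a list (GAZETTEER is hypothetical and covers only this sentence); practical NER systems are usually statistical models that also recognise names they have never seen.

```python
# Hypothetical gazetteer: known entity name -> label.
GAZETTEER = {
    "Google": "Organisation",
    "Sundar Pichai": "Person",
    "Pixel Phone": "Object",
    "New York": "Location",
}


def find_entities(text):
    """Return (name, label) pairs for gazetteer names found in text, in text order."""
    found = []
    for name, label in GAZETTEER.items():
        start = text.find(name)  # only the first occurrence is considered
        if start != -1:
            found.append((start, name, label))
    found.sort()  # order by position in the sentence
    return [(name, label) for _, name, label in found]


sentence = "Google's CEO Sundar Pichai introduced the new Pixel Phone at New York"
entities = find_entities(sentence)
print(entities)
# [('Google', 'Organisation'), ('Sundar Pichai', 'Person'),
#  ('Pixel Phone', 'Object'), ('New York', 'Location')]
```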
Chunking:
Chunking means picking up individual pieces of information, such as tagged words, and grouping them into bigger pieces, such as noun phrases.
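As an illustration, the sketch below groups POS-tagged tokens into noun-phrase chunks. It assumes one hand-picked rule (a determiner DT directly followed by a noun NN forms a chunk labelled NP); the function name and the rule are hypothetical, and real chunkers use richer grammars or trained models.

```python
def chunk_noun_phrases(tagged):
    """Group each DT + NN pair into an 'NP' chunk; other tokens stay on their own."""
    chunks = []
    i = 0
    while i < len(tagged):
        word, tag = tagged[i]
        next_is_noun = i + 1 < len(tagged) and tagged[i + 1][1] == "NN"
        if tag == "DT" and next_is_noun:
            # Merge the determiner and the following noun into one chunk.
            chunks.append(("NP", [word, tagged[i + 1][0]]))
            i += 2
        else:
            chunks.append((tag, [word]))
            i += 1
    return chunks


tagged_sentence = [("The", "DT"), ("Dog", "NN"), ("Killed", "VBD"), ("the", "DT"), ("Bat", "NN")]
chunks = chunk_noun_phrases(tagged_sentence)
print(chunks)
# [('NP', ['The', 'Dog']), ('VBD', ['Killed']), ('NP', ['the', 'Bat'])]
```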