NLP initialization & pre-processing.
Raidah Fairuz Nashra
Posted on January 27, 2024
NLP stands for Natural Language Processing. It is a branch of AI that enables machines to understand and process human language. In 1950, Alan Mathison Turing published the article "Computing Machinery and Intelligence" about AI, which also discusses the interpretation and generation of natural language.
Heuristics-based NLP: The earliest approach to NLP. It relies on hand-crafted rules derived from domain knowledge.
Statistical machine learning-based NLP: Based on statistical methods and ML algorithms, which learn patterns from data and are then applied to various tasks.
Neural network-based NLP: Based on neural network architectures. This approach is data-hungry and time-consuming, and training the model requires high computational power. Ex: Transformers, recurrent neural networks (RNNs).
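To make the contrast between the first two approaches concrete, here is a minimal Python sketch for a toy sentiment task. The keyword sets, the four training sentences, and the names `heuristic_sentiment` and `NaiveBayes` are hypothetical, invented only for illustration. The heuristic function applies hand-written rules, while the statistical one is a multinomial Naive Bayes classifier (one common choice among many) that estimates word probabilities from labelled examples. A neural approach would replace these counts with learned network weights and needs far more data, so it is not shown.

```python
import math
import re
from collections import Counter

# Heuristics-based approach: hand-written rules from "domain knowledge".
# These keyword lists are hypothetical and only for illustration.
POSITIVE_WORDS = {"good", "great", "love", "excellent"}
NEGATIVE_WORDS = {"bad", "terrible", "hate", "awful"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def heuristic_sentiment(text):
    """Classify by counting rule-matched positive and negative keywords."""
    tokens = tokenize(text)
    score = sum(t in POSITIVE_WORDS for t in tokens) - sum(t in NEGATIVE_WORDS for t in tokens)
    if score > 0:
        return "pos"
    if score < 0:
        return "neg"
    return "neutral"


# Statistical ML approach: multinomial Naive Bayes, with word
# probabilities estimated from labelled training examples.
class NaiveBayes:
    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.class_counts}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Log prior plus log likelihood with add-one (Laplace) smoothing.
            score = math.log(doc_count / total_docs)
            for token in tokenize(text):
                score += math.log((counts[token] + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data.
train_texts = ["what a great movie", "I love this film", "a terrible plot", "I hate the ending"]
train_labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayes().fit(train_texts, train_labels)

print(heuristic_sentiment("The acting was great"))  # pos
print(model.predict("great film"))                  # pos
```

The rule-based version needs no training data but only recognizes the words someone wrote down; the Naive Bayes version learns from examples, so its quality depends on how much labelled data it sees.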
Advantages
- Analyzes data from both structured and unstructured sources.
- Fast and time-efficient.
- Returns exact, end-to-end answers without consuming unnecessary information.
- Can respond within milliseconds.
Disadvantages
- A large amount of data and computation is needed to train the model.
- Limited functionality; a trained model cannot easily adapt to a new domain.
Components
- Natural Language Understanding (NLU)
- Natural Language Generation (NLG)
Applications
Voice assistants: Alexa, Siri, Google Assistant.
Text classification: MS Word, Google Docs, Grammarly.
Information extraction: Google.
Machine translation: Google Translate.
Approaches to NLP
Pre-processing
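1. Removing handles and URLs: Removes user handles (e.g., @username) and URLs from the text.
2. Tokenization: Breaks a sentence down into smaller units (tokens).
3. Normalization: Converts the text into a standard form, e.g., through case conversion (lowercasing).
4. Stemming: Reduces words to a root form (stem) by removing suffixes. Ex: "dance", "dancing", and "danced" are all stemmed to "danc".
5. Lemmatization: Reduces each word to its dictionary base form (lemma), using vocabulary and part-of-speech information. Ex: the "be" verbs "are", "is", and "was" all become "be".
6. Punctuation removal: Removes punctuation marks such as , ; ( ) ! so that the focus stays on the important words.
7. Stop-word removal: Removes very common words. Ex: "the", "a", "and".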
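The cleaning steps above can be chained into a small pipeline. This is a plain-Python sketch using only the standard library and assuming English text; the regular expressions, the short `STOP_WORDS` set, and the function names are hypothetical choices for illustration, not a standard toolkit API. Stemming and lemmatization are left out here because they need word-level rules or dictionaries.

```python
import re
import string

# A small, hypothetical stop-word list; real toolkits ship much longer ones.
STOP_WORDS = {"the", "a", "an", "and", "are", "is", "to", "of", "in"}

# Matches @handles and http(s):// or www. URLs (illustrative, not exhaustive).
HANDLE_OR_URL = re.compile(r"@\w+|https?://\S+|www\.\S+")


def remove_handles_and_urls(text):
    """Step 1: replace @handles and URLs with a space."""
    return HANDLE_OR_URL.sub(" ", text)


def tokenize(text):
    """Step 2: split into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def normalize(tokens):
    """Step 3: case conversion to a standard (lowercase) form."""
    return [t.lower() for t in tokens]


def remove_punctuation(tokens):
    """Step 6: drop tokens made up only of ASCII punctuation."""
    return [t for t in tokens if not all(ch in string.punctuation for ch in t)]


def remove_stop_words(tokens):
    """Step 7: drop very common words."""
    return [t for t in tokens if t not in STOP_WORDS]


def preprocess(text):
    tokens = tokenize(remove_handles_and_urls(text))
    tokens = normalize(tokens)
    tokens = remove_punctuation(tokens)
    return remove_stop_words(tokens)


print(preprocess("@user Check out https://example.com, the dancers are dancing!"))
# ['check', 'out', 'dancers', 'dancing']
```

Libraries such as NLTK and spaCy provide ready-made tokenizers and stop-word lists; a hand-rolled version like this is mainly useful for seeing what each step does to the text.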
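The difference between stemming and lemmatization can be seen with two toy functions. `simple_stem` uses hypothetical suffix-stripping rules and is not the Porter stemmer, although for "dance", "dancing", and "danced" it produces the same stem "danc". `simple_lemmatize` looks words up in a tiny hypothetical lemma table keyed by part of speech; real lemmatizers rely on a full dictionary such as WordNet.

```python
# Toy suffix-stripping stemmer (hypothetical rules, NOT the Porter algorithm).
SUFFIXES = ("ing", "ed", "es", "s", "e")


def simple_stem(word):
    """Chop the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical lemma table keyed by (word, part of speech).
LEMMAS = {
    ("are", "VERB"): "be",
    ("is", "VERB"): "be",
    ("was", "VERB"): "be",
    ("danced", "VERB"): "dance",
    ("dancing", "VERB"): "dance",
    ("better", "ADJ"): "good",
}


def simple_lemmatize(word, pos):
    """Return the dictionary base form, or the word itself if unknown."""
    return LEMMAS.get((word, pos), word)


words = ["dance", "dancing", "danced"]
print([simple_stem(w) for w in words])               # ['danc', 'danc', 'danc']
print([simple_lemmatize(w, "VERB") for w in words])  # ['dance', 'dance', 'dance']
print(simple_lemmatize("are", "VERB"))               # be
```

The stem "danc" is not a real word, while the lemma "dance" is. Lemmatization needs a dictionary and part-of-speech tags (note that "dancing" as a NOUN is left unchanged here), so it is slower than stemming, but its output stays readable.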