NLP initialization & pre-processing.

raidah_fairuz

Raidah Fairuz Nashra

Posted on January 27, 2024

NLP stands for Natural Language Processing. It is a branch of AI that enables machines to understand and process human language. In 1950, Alan Mathison Turing published the article "Computing Machinery and Intelligence", which asks whether machines can think and proposes a test that depends on interpreting and generating natural language.

NLP has developed through three main approaches:

  • Heuristics-based NLP: The earliest approach to NLP. It relies on hand-written rules derived from domain knowledge.

  • Statistical machine learning-based NLP: Based on statistical methods and ML algorithms. The algorithms are trained on data to perform various tasks.

  • Neural network-based NLP: Based on neural network architectures. It is a data-hungry and time-consuming approach, and training the model requires high computational power. Ex: Transformers, recurrent neural networks (RNNs), etc.
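To make the heuristics-based approach concrete, here is a minimal sketch of a rule-based intent classifier. The rules, labels, and the `classify_intent` name are hypothetical and chosen only for illustration; a real system would draw its rules from domain knowledge.

```python
import re

# Hypothetical hand-written rules: each pattern maps to an intent label.
RULES = [
    (re.compile(r"\b(hi|hello|hey)\b", re.IGNORECASE), "greeting"),
    (re.compile(r"\b(bye|goodbye)\b", re.IGNORECASE), "farewell"),
    (re.compile(r"\b(refund|money back)\b", re.IGNORECASE), "refund_request"),
]


def classify_intent(text: str) -> str:
    """Return the label of the first rule that matches, else 'unknown'."""
    for pattern, label in RULES:
        if pattern.search(text):
            return label
    return "unknown"


print(classify_intent("Hello there"))
print(classify_intent("I want my money back"))
```

No training data is involved here. The behaviour comes entirely from the rules, which is also why this kind of system struggles with inputs its author did not anticipate.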

Advantages

  1. Analyzes data from both structured and unstructured sources.
  2. Fast and time-efficient.
  3. Gives exact, end-to-end answers without unnecessary information.
  4. Takes milliseconds to respond.

Disadvantages

  1. Large amounts of data and computation are needed to train the model.
  2. Limited functionality; models cannot easily adapt to a new domain.

Components

  1. Natural language understanding (NLU)
  2. Natural language generation (NLG)

Applications

  • Voice assistants: Alexa, Siri, Google Assistant, etc.

  • Text classification: MS Word, Google Docs, Grammarly.

  • Information extraction: Google

  • Machine translation: Google Translate.

Approaches of NLP

Pre-processing

  1. Handle and URL removal: Removes user handles (e.g. @username) and URLs from the text.

  2. Tokenization: Breaks the sentence down into smaller units (tokens).

  3. Normalization: Case conversion. Converts the text into a standard form.

  4. Stemming: Reduces words to a stem by removing suffixes. Ex: the Porter stemmer reduces dance, dancing, and danced to "danc".

  5. Lemmatization: Reduces each word to its dictionary form (lemma), using its part of speech. Ex: the "be" verbs am, is, and are all become "be".

  6. Punctuation removal: Removes punctuation marks (, ; ( ) !) so the focus stays on the important words.

  7. Stop word removal: Removes very common words. Ex: "the", "a", "and".
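The steps above can be sketched with nothing but the Python standard library. This is a minimal illustration, not a production pipeline. The stop word list, the `LEMMAS` lookup table, and the `naive_stem` suffix stripper are toy stand-ins (assumptions made for this example) for what libraries such as NLTK or spaCy provide. The order of the steps is one reasonable choice rather than a fixed rule.

```python
import re
import string

# Small illustrative stop word list; real libraries ship much longer ones.
STOP_WORDS = {"the", "a", "an", "and", "are", "is", "to", "of"}

# Toy lemma lookup standing in for a dictionary-backed lemmatizer.
LEMMAS = {"am": "be", "is": "be", "are": "be", "danced": "dance", "dancing": "dance"}


def remove_handles_and_urls(text):
    """Step 1: strip URLs first, then @handles (this also hits e-mail domains)."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)
    return re.sub(r"@\w+", " ", text)


def normalize(text):
    """Step 3: case conversion to a single standard form."""
    return text.lower()


def remove_punctuation(text):
    """Step 6: delete every ASCII punctuation character."""
    return text.translate(str.maketrans("", "", string.punctuation))


def tokenize(text):
    """Step 2: split on whitespace into word tokens."""
    return text.split()


def remove_stop_words(tokens):
    """Step 7: drop very common words."""
    return [t for t in tokens if t not in STOP_WORDS]


def naive_stem(token):
    """Step 4 (toy version): strip one common suffix, keeping at least 3 letters."""
    for suffix in ("ing", "ed", "es", "s", "e"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def lemmatize(token):
    """Step 5 (toy version): look the token up in a tiny lemma table."""
    return LEMMAS.get(token, token)


def preprocess(text):
    """Clean the text, tokenize it, drop stop words, then stem each token."""
    text = remove_punctuation(normalize(remove_handles_and_urls(text)))
    return [naive_stem(t) for t in remove_stop_words(tokenize(text))]


print(preprocess("@user Dance, danced, and dancing! https://example.com"))
# -> ['danc', 'danc', 'danc']
print([lemmatize(t) for t in ["are", "dancing", "danced"]])
# -> ['be', 'dance', 'dance']
```

Stemming and lemmatization are normally alternatives: the stemmer chops suffixes and may produce non-words such as "danc", while the lemmatizer returns a real dictionary form such as "dance".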
