Machine Learning with Simple Text Messages

petercour

petercour

Posted on July 15, 2019

Machine Learning with Simple Text Messages

You can classify text data automatically. Say you collect messages about cooking and messages about programming.

A machine learning algorithm can then decide which group (cooking, programming) a new message belongs to.

If you have a list of messages like this:

#!/usr/bin/python3
data = [ "Help me impress the girl of my dreams!", 
         "How do you measure ingredients like butter in cups?",
         "Tips on making fried rice", 
         "immutability in javascript. It has a declarative approach of programming, which means that you focus on describing what your program must accomplish", 
         "Facing a Programming Problem. Everybody has encountered it, the programming problem that makes NO sense. This problem has no fix, it just cannot be done",
         " 5 Uses for the Spread Operator. The spread operator is a favorite of JavaScript developers. It's a powerful piece of syntax that has numerous applications."]
Enter fullscreen mode Exit fullscreen mode

Where each message belongs to a class

target = [ 0,0,0,1,1,1 ]
Enter fullscreen mode Exit fullscreen mode

You can predict the class for a new message:

#!/usr/bin/python3
sentence = input("Enter some text: ")
sentence_x = transfer.transform([sentence])
y_predict = estimator.predict(sentence_x)
print("y_predict:\n", y_predict)
Enter fullscreen mode Exit fullscreen mode

Give it a spin:

Enter some text: im a cook
y_predict: [0]
Enter fullscreen mode Exit fullscreen mode

Another run:

Enter some text: programming javascript is great
y_predict: [1]
Enter fullscreen mode Exit fullscreen mode

The program

The data set we have defined is extremely small (6 samples). The more samples you have, the better it becomes.

#!/usr/bin/python3
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split

def text_classify():
    data = [ "Help me impress the girl of my dreams!", 
             "How do you measure ingredients like butter in cups?",
             "Tips on making fried rice", 
             "immutability in javascript. It has a declarative approach of programming, which means that you focus on describing what your program must accomplish", 
             "Facing a Programming Problem. Everybody has encountered it, the programming problem that makes NO sense. This problem has no fix, it just cannot be done",
             " 5 Uses for the Spread Operator. The spread operator is a favorite of JavaScript developers. It's a powerful piece of syntax that has numerous applications."]
    target = [ 0,0,0,1,1,1 ]
    x_train, x_test, y_train, y_test = train_test_split(data,target)

    transfer = TfidfVectorizer()
    x_train = transfer.fit_transform(x_train)
    x_test = transfer.transform(x_test)

    estimator = MultinomialNB()
    estimator.fit(x_train,y_train)

    score = estimator.score(x_test, y_test)
    print("score:\n", score)

    sentence = input("Enter some text: ")
    sentence_x = transfer.transform([sentence])
    y_predict = estimator.predict(sentence_x)
    print("y_predict: ", y_predict)

    return None

text_classify()            
Enter fullscreen mode Exit fullscreen mode

Related links:

💖 💪 🙅 🚩
petercour
petercour

Posted on July 15, 2019

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related