This is a follow-up to the tutorial on Hugging Face's transformers library that I wrote earlier. In this post I'll cover the zero-shot classification pipeline: what it is, and how a web or iOS developer can leverage this technology.

Spoiler

A little peek at what this library can do:

>>> from transformers import pipeline
>>> classifier = pipeline('zero-shot-classification')
>>> classifier('your delivery boy was really rude, the service sucks. #review #badreview', ['negative', 'positive'])
{'sequence': 'your delivery boy was really rude, the service sucks. #review #badreview', 'labels': ['negative', 'positive'], 'scores': [0.9980460405349731, 0.0019539250060915947]}

That means about 99.8% negative and 0.2% positive. In 3 lines you know whether a review is positive or negative.
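The result is a plain Python dict with parallel 'labels' and 'scores' lists, so pulling out the winning label takes only a few lines. Here is a minimal sketch using the output printed above; top_label is a hypothetical helper name, not something transformers provides.

```python
def top_label(result):
    """Return (label, score) for the best-scoring label.

    Assumes the dict shape shown above: 'labels' and 'scores' are
    parallel lists. The max is taken explicitly rather than relying
    on the order of the lists.
    """
    pairs = zip(result['labels'], result['scores'])
    return max(pairs, key=lambda pair: pair[1])


# Output copied from the example above.
result = {
    'sequence': 'your delivery boy was really rude, the service sucks. #review #badreview',
    'labels': ['negative', 'positive'],
    'scores': [0.9980460405349731, 0.0019539250060915947],
}

label, score = top_label(result)
print(f'{label}: {score:.1%}')  # prints "negative: 99.8%"
```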

Another one

>>> classifier('Get a free iphone today, just let us your banking details', ['spam', 'not spam'])
{'sequence': 'Get a free iphone today, just let us your banking details', 'labels': ['not spam', 'spam'], 'scores': [0.7550246119499207, 0.24497543275356293]}

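That message is obviously a scam. Note, though, that the labels come back sorted by score, so as printed the model puts 'not spam' first at about 75% and gives 'spam' only about 24%; it gets this one wrong. Still, it took just 3 lines of code to try, and as you'll see further down, the wording of the labels has a big effect on the results.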

What is Zero Shot classification?

In machine learning, you feed a lot of labeled data into a model; this is called training. Then you pass in new data and the model predicts labels for it. If you need a different set of labels, you have to retrain the model.

This is very good at replacing humans, but overall it's dumb. Humans don't have to relearn every time there is a new question (or set of labels); we leverage our general understanding.

Zero-shot learning to the rescue: just pass in the data (text) and the labels, and the model tells you which label is most suitable.

Sounds like magic but it isn't.
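Roughly speaking, the pipeline's default model is a natural language inference (NLI) model. Each candidate label is turned into a hypothesis such as "This example is negative.", the model scores how strongly the text entails each hypothesis, and a softmax over those scores gives the numbers you see. The sketch below only mimics that bookkeeping: the logits are made-up numbers, and build_hypotheses and softmax are illustrative helpers, not library functions.

```python
import math


def build_hypotheses(labels, template='This example is {}.'):
    """Turn each candidate label into an NLI-style hypothesis sentence."""
    return [template.format(label) for label in labels]


def softmax(logits):
    """Normalize raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


labels = ['negative', 'positive']
hypotheses = build_hypotheses(labels)

# Made-up entailment logits, one per hypothesis.
# A real NLI model would compute these from the text and each hypothesis.
entailment_logits = [4.0, -2.0]
scores = softmax(entailment_logits)

for hypothesis, score in zip(hypotheses, scores):
    print(f'{hypothesis!r}: {score:.3f}')
```

This also hints at why label wording matters so much: the label is literally dropped into a sentence that the model then has to judge.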

My tests with the model

I tested the model on some real-world data to see how well it performs from a naive developer's perspective.

1) Predicting labels in Reddit

Reddit posts can carry flairs (labels), so you can filter posts by a specific flair.

I tried to predict flairs from a post's title in the following subreddits:

Science Subreddit (/r/science/) -
Very good performance for non-overlapping labels like Medicine and Technology, but related labels like Health, Medicine and Animal were very frequently confused with each other.
Jobbit (/r/jobbit/) -
In this subreddit people offer jobs/projects [Hiring] and also showcase their resumes [For hire].
Using these flairs as prediction targets gave very confusing results, so I changed the labels to hiring and resume and it worked like a charm, mostly with a confidence score of 90%+.
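The relabeling trick above can be sketched as a small mapping: the model sees descriptive labels, and the result is translated back to the flair. predict_flair and JOBBIT_LABELS are hypothetical names, and classifier is assumed to be the pipeline object created in the spoiler (any callable with the same input and output shape works).

```python
def predict_flair(classifier, title, label_to_flair):
    """Classify `title` with descriptive labels, then map back to the flair."""
    result = classifier(title, list(label_to_flair))
    # The pipeline returns labels sorted best-first, so index 0 is the winner.
    best_label = result['labels'][0]
    return label_to_flair[best_label], result['scores'][0]


# Descriptive label (what the model sees) -> subreddit flair (what we want).
JOBBIT_LABELS = {'hiring': '[Hiring]', 'resume': '[For hire]'}

# With the real model this would look like:
#   from transformers import pipeline
#   classifier = pipeline('zero-shot-classification')
#   flair, score = predict_flair(classifier, some_post_title, JOBBIT_LABELS)
```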

2) Predicting type of SMS from my phone

I picked out some SMS messages from my inbox and used the labels 'OTP', 'bank statement' and 'Offers'. Some stats:

Label	Number of messages	Correctly classified	Average score of correct predictions
OTP	5	5	0.92
bank statement	4	4	0.65
Offers	4	3	0.72
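If you want to build the same kind of table for your own messages, the bookkeeping can be sketched like this. The records below are placeholders, not my SMS data, and summarize is a hypothetical helper.

```python
from collections import defaultdict


def summarize(records):
    """records: iterable of (true_label, predicted_label, score).

    Returns {label: (total, correct, average score of correct predictions)}.
    The average is None when nothing was classified correctly.
    """
    totals = defaultdict(int)
    correct_scores = defaultdict(list)
    for true_label, predicted, score in records:
        totals[true_label] += 1
        if predicted == true_label:
            correct_scores[true_label].append(score)

    summary = {}
    for label, total in totals.items():
        scores = correct_scores[label]
        average = sum(scores) / len(scores) if scores else None
        summary[label] = (total, len(scores), average)
    return summary


# Placeholder records: (true label, predicted label, score of the prediction).
records = [
    ('OTP', 'OTP', 0.9),
    ('OTP', 'OTP', 0.8),
    ('Offers', 'OTP', 0.6),
    ('Offers', 'Offers', 0.7),
]

for label, (total, correct, average) in summarize(records).items():
    print(label, total, correct, average)
```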

3) Predicting the programming language

I haven't checked the dataset this model was trained on, so I wasn't sure the model could handle it, but it worked pretty decently. I didn't run any benchmarks, but overall it seems to work, with a lot of fluctuation in the confidence score. It gets confused between C and C++, but it distinguished Python from C++-like languages very well.

It also confuses the label assembly language with C, maybe because it wasn't trained on code repositories XD, or because a lot of C code has inline assembly.

The choice of label wording affects the performance, for example go vs golang.
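One way to check which wording works better is to classify the same text once per wording and compare the score the target label gets. This is only a sketch: compare_wordings is a hypothetical helper, and classifier is assumed to be the pipeline object from the spoiler.

```python
def compare_wordings(classifier, text, other_labels, variants):
    """Score `text` once per wording of the target label.

    Returns {variant: score that variant received}.
    """
    scores = {}
    for variant in variants:
        result = classifier(text, other_labels + [variant])
        by_label = dict(zip(result['labels'], result['scores']))
        scores[variant] = by_label[variant]
    return scores


# With the real model this would look like:
#   compare_wordings(classifier, go_source_code, ['python', 'c++'], ['go', 'golang'])
```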

Note: This was just a random thought, don't use this in production.

What does it mean for developers?

You can now include some intelligent features in your applications without any deep learning expertise. This is very good for prototyping and hackathons, where you create a proof of concept and, if it takes off, hire an expert for a more accurate solution.

A few examples I can think of:

Support tickets - forwarding a customer's support request to the correct department is crucial for resolving it quickly. Use the support message as the text and the departments as the labels.
Ban negative/NSFW content - filter the text in your application's message boards, comments, etc. for hateful or NSFW content.
Let us know in the comments - Suggest some cool applications you can think of in the comments down below.
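For the content-filtering idea, a message can be hateful and NSFW at the same time, so the labels should be scored independently rather than competing for one total. A minimal sketch, assuming a recent transformers release where the pipeline accepts multi_label=True (older releases called this option multi_class); flag_message and the 0.8 threshold are my own illustrative choices.

```python
def flag_message(classifier, text, labels, threshold=0.8):
    """Return every label whose independent score is at least `threshold`."""
    result = classifier(text, labels, multi_label=True)
    return [
        label
        for label, score in zip(result['labels'], result['scores'])
        if score >= threshold
    ]


# With the real model this would look like:
#   from transformers import pipeline
#   classifier = pipeline('zero-shot-classification')
#   flags = flag_message(classifier, comment_text, ['hateful', 'NSFW'])
#   if flags: hide the comment or send it to a moderator
```

The support-ticket case is the simpler single-label version: pass the departments as labels and take the first entry of 'labels'.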

Cons

Model is huge

If you are thinking about running it on an Android or iOS device, don't. Memory usage on my Ubuntu desktop went from 6 GB to 13 GB when using this model; most people won't even be able to run it on a 4 GB laptop.

I haven't tried running it in a cloud environment, but obviously a 512 MB free-tier VPS won't do. You need 7 GB+ of RAM with a decent CPU; a dedicated CPU would be best.

NOTE: These numbers are from plain Python when I tested the library; I haven't looked at optimized model-serving performance.
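To put a number on memory use on your own machine, one rough option on Linux or macOS is to compare the process's peak resident memory before and after loading the model. This is only a sketch: peak_rss_mb and measure_load are hypothetical helper names, and the standard-library resource module is not available on Windows.

```python
import resource
import sys


def peak_rss_mb():
    """Peak resident memory of this process so far, in megabytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == 'darwin' else 1024
    return peak / divisor


def measure_load(load):
    """Call `load()` and report how much the peak memory grew, in MB."""
    before = peak_rss_mb()
    loaded = load()
    after = peak_rss_mb()
    return loaded, after - before


# With the real model this would look like:
#   from transformers import pipeline
#   classifier, grew_mb = measure_load(lambda: pipeline('zero-shot-classification'))
#   print(f'peak memory grew by about {grew_mb:.0f} MB')
```

Because this tracks the peak, it only tells you how high memory went, not how much is still held afterwards.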

Not reliable

You can use this for prototyping, when a human verifies the predictions, or when mislabeling doesn't cost your business much.

In general, take it with a grain of salt; it's a very new technology and will take some time to mature.
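One simple way to keep a human in the loop is to accept a prediction automatically only when its score clears a threshold, and queue everything else for review. A minimal sketch; decide and the 0.9 cut-off are illustrative choices, and the second result dict is made up for the example.

```python
def decide(result, threshold=0.9):
    """Return ('auto', label) for confident predictions, else ('review', label)."""
    # Labels come back sorted best-first, so index 0 is the top prediction.
    label, score = result['labels'][0], result['scores'][0]
    if score >= threshold:
        return ('auto', label)
    return ('review', label)


# Scores from the review example at the top of the post.
confident = {'labels': ['negative', 'positive'],
             'scores': [0.9980460405349731, 0.0019539250060915947]}
# Made-up low-confidence result for illustration.
uncertain = {'labels': ['Offers', 'OTP', 'bank statement'],
             'scores': [0.55, 0.30, 0.15]}

print(decide(confident))  # prints ('auto', 'negative')
print(decide(uncertain))  # prints ('review', 'Offers')
```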

Some Tips

Longer text is classified with better accuracy.
Choose labels wisely; as found above, 'For hire' and 'Hiring' were a really bad set of labels.

Note: This is my naive observation, take it with a grain of salt.

Thank you for reading

Let me know what you think about this article series, down in the comments.

Give this article a like if this was helpful.

Follow me to get notified about similar articles, more are on the way ;)

Disclaimer: I'm not affiliated with Hugging Face in any manner XP, this project just has a lot of untapped potential, with very few tutorials outside the deep learning community.

Automatic text classification in 3 lines of code 🤗 [Tutorial]

AviKKi