Notes on “Introduction to machine learning.”

I have recently started dipping my toes in the deep waters of Machine Learning. Through a few posts I would like to share my learnings and notes on Machine Learning.

This post covers the topics:

What is Machine Learning?
Types of Machine Learning
- Supervised learning
- Unsupervised learning
- Reinforcement learning [not covered in this blog]
Extras
- Dividing Data
- Overfitting
- Underfitting

[Note that I am a beginner at blogging, so if I have gone wrong anywhere or can improve, please let me know.]

The references I am using for this blog and my learning:
FreeCodeCamp, Kaggle, Reddit, Twitter threads, and blogs.

I prefer to learn by watching videos about a concept and then reading about it. Not everyone learns the same way, but if this approach suits you, feel free to use the sources listed above.

So, let’s get started!

What is Machine Learning?

‘Machine learning’ is a fancy way of saying:
Feed data to an algorithm. The algorithm analyzes the data and learns patterns from it. The algorithm then makes a prediction (gives an output).

A simple explanation I liked from Reddit was:

“Machine Learning is a form of Artificial Intelligence in which the program is designed to learn on its own.

A simple example might be the following:

You want to create a program to differentiate between apples and oranges. You have data that says that oranges weigh between 150-200g, and apples between 100-130g. Also, oranges are rough, and apples smooth (which you might represent as a 0 or 1). If you have a fruit that weighs 115g, and is smooth, your program can determine that it is probably an apple. Vice-versa, if the fruit is 175g and rough, it is most likely an orange. Anything outside of these boundaries won't be either. What, now, if your fruit is smooth, but only 99g? It probably is an apple, but not to your program. Therefore, the more data you have, the more accurate your data becomes. It might even use past guesses to further its own data. It is learning on its own what an orange or an apple is. This is Machine Learning.”
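To make the quoted example concrete, here is a minimal sketch in Python. The fruit measurements are made up for illustration, and the "learning" step is just averaging the examples for each fruit (a nearest-centroid approach), which is one simple technique among many and not necessarily what the Reddit author had in mind. The function names `rule_based`, `train`, and `predict` are hypothetical.

```python
# Hypothetical training data: (weight in grams, texture) -> label.
# texture: 0 = smooth, 1 = rough, as in the quoted example.
TRAINING_DATA = [
    ((105, 0), "apple"),
    ((115, 0), "apple"),
    ((128, 0), "apple"),
    ((155, 1), "orange"),
    ((175, 1), "orange"),
    ((195, 1), "orange"),
]


def rule_based(weight, texture):
    """Hand-written boundaries; anything outside them is rejected."""
    if 100 <= weight <= 130 and texture == 0:
        return "apple"
    if 150 <= weight <= 200 and texture == 1:
        return "orange"
    return "unknown"


def train(data):
    """Learn one average point (centroid) per label from the examples."""
    sums = {}
    for (weight, texture), label in data:
        total = sums.setdefault(label, [0.0, 0.0, 0])
        total[0] += weight
        total[1] += texture
        total[2] += 1
    return {label: (w / n, t / n) for label, (w, t, n) in sums.items()}


def predict(centroids, weight, texture):
    """Pick the label whose centroid is closest to the new fruit."""
    def distance(centroid):
        centre_weight, centre_texture = centroid
        # Assumption: divide by 100 so grams do not swamp the 0/1 texture flag.
        return ((weight - centre_weight) / 100) ** 2 + (texture - centre_texture) ** 2

    return min(centroids, key=lambda label: distance(centroids[label]))


centroids = train(TRAINING_DATA)
print(rule_based(99, 0))          # unknown
print(predict(centroids, 99, 0))  # apple
```

The hand-written rules reject the smooth 99g fruit, while the version that learned from examples still calls it an apple because it sits closest to the other apples.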

Types of Machine Learning

There are three main types of machine learning:

Supervised Learning
Unsupervised Learning
Reinforcement Learning [not covered in this blog]

Supervised Learning

In supervised learning, the model learns from labeled data: every training example comes with the correct answer. Those labels act as the ‘supervisor’ that tells the model whether its prediction was right.

Here's an analogy: when a child is learning colors, an elder tells the child whether they have identified each color correctly. When the child is shown the color red, there are two possibilities: the child recognizes it or they don't. If they do, they have learnt well; if not, the elder corrects them and teaches them again.

Here the dataset is the colors, the ‘supervisor’ is the elder, and the child learning colors is the model. The model improves with repeated training.

There are two types of supervised learning:

Classification
Regression

In classification, the output values are discrete. In regression, the output values are continuous.

Discrete values come from a fixed set of distinct categories. A bottle of water can be either ‘empty’ or ‘not empty.’ A number can be either ‘even’ or ‘odd.’ The output is one specific category.

Continuous values are numbers that can fall anywhere within a range. A person’s age is an example: it falls within a range such as 0 to 100.
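The difference between the two can be sketched with a small Python example. The data (hours studied, exam score, pass/fail) is invented for illustration, and the helper names `fit_line` and `fit_threshold` are hypothetical. The regression part uses an ordinary least-squares line; the classification part uses a very simple cut-off halfway between the two class averages, which is only one of many possible classifiers.

```python
# Hypothetical data: hours studied, the exam score, and a pass/fail label.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 57.0, 61.0, 68.0, 72.0]            # continuous target
labels = ["fail", "fail", "pass", "pass", "pass"]  # discrete target


def fit_line(xs, ys):
    """Regression: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


def fit_threshold(xs, ys):
    """Classification: put a cut-off halfway between the two class averages."""
    fail_hours = [x for x, y in zip(xs, ys) if y == "fail"]
    pass_hours = [x for x, y in zip(xs, ys) if y == "pass"]
    fail_mean = sum(fail_hours) / len(fail_hours)
    pass_mean = sum(pass_hours) / len(pass_hours)
    return (fail_mean + pass_mean) / 2


slope, intercept = fit_line(hours, scores)
threshold = fit_threshold(hours, labels)


def predict_score(h):
    """Returns a number that can land anywhere in a range."""
    return slope * h + intercept


def predict_label(h):
    """Returns one of exactly two categories."""
    return "pass" if h >= threshold else "fail"


print(predict_score(3.5))  # a continuous value
print(predict_label(3.5))  # a discrete value: "pass" or "fail"
```

Both models take the same input, but the regression model answers with a number and the classification model answers with a category.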

Unsupervised learning

In unsupervised learning, the data has no labels. The model has to extract a pattern or structure from the dataset on its own. This means that we draw inferences from observations in the input data alone.

Here’s an example from a blog I came across:

“Imagine you are in a foreign country and you are visiting a food market, for example. You see a stall selling a fruit that you cannot identify. You don’t know the name of this fruit. However, you have your observations to rely on, and you can use these as a reference. In this case, you can easily tell the fruit apart from nearby vegetables or other food by identifying its various features like its shape, color, or size.

This is roughly how unsupervised learning happens. We use the data points as references to find meaningful structure and patterns in the observations. Unsupervised learning is commonly used for finding meaningful patterns and groupings inherent in data, extracting generative features, and exploratory purposes.”
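As a rough illustration of finding groups without labels, here is a small Python sketch of k-means clustering with two clusters on one feature. The weights are made up, the function name `two_means` is hypothetical, and starting the centres at the smallest and largest values is a simplifying assumption to keep the result deterministic.

```python
# Hypothetical unlabeled measurements: fruit weights in grams, no names attached.
weights = [102, 110, 118, 125, 158, 170, 182, 195]


def two_means(values, iterations=10):
    """Split numbers into two groups by repeatedly averaging the nearest points."""
    # Assumption: start the two centres at the smallest and largest value.
    centres = [float(min(values)), float(max(values))]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for value in values:
            # Assign each value to whichever centre is closer.
            nearest = 0 if abs(value - centres[0]) <= abs(value - centres[1]) else 1
            groups[nearest].append(value)
        # Move each centre to the average of its group (keep it if the group is empty).
        centres = [
            sum(group) / len(group) if group else centre
            for group, centre in zip(groups, centres)
        ]
    return centres, groups


centres, groups = two_means(weights)
print(groups)   # [[102, 110, 118, 125], [158, 170, 182, 195]]
print(centres)  # [113.75, 176.25]
```

The algorithm is never told which fruit is which. It only discovers that the weights fall into a lighter group and a heavier group.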

It’s okay if a lot of this doesn't make sense for now and seems intimidating. It will eventually.

Extras

In this section I am putting down information that I found useful but did not know where to place.

Dividing the dataset

The existing dataset is divided into two parts: training and testing.
The training set is the part used to train the model/algorithm.
The test set is the part used to evaluate the model/algorithm on data it has not seen during training.
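A split like this can be sketched in a few lines of Python. The helper `split_dataset` is a hypothetical hand-written function, and the 80/20 ratio and the fixed seed are assumptions chosen for the example, not a rule.

```python
import random


def split_dataset(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows, then cut it into training and test parts."""
    shuffled = list(rows)
    # A fixed seed makes the shuffle repeatable between runs.
    random.Random(seed).shuffle(shuffled)
    test_size = int(len(shuffled) * test_fraction)
    return shuffled[test_size:], shuffled[:test_size]


rows = list(range(10))  # stand-in for ten real data rows
train_rows, test_rows = split_dataset(rows)
print(len(train_rows), len(test_rows))  # 8 2
```

Shuffling before cutting matters when the original rows are sorted, because otherwise the test set could contain only one kind of example.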

Overfitting

“Overfitting would be like training your dog raise his paw when you hold out your hand, and he learns the trick perfectly, but he only does it when it's you holding out your hand, and only your right hand because that's all he's been trained on. He won't do it for anyone else, and he won't do it when you raise your left hand. So his "model" of the trick works perfectly, but because there hasn't been enough variation in the training activity, or because of the way that the training was done (or perhaps just because of the way the dog's brain works), his trick doesn't generalize correctly to other stimuli that was intended to yield the same result.” - Reddit
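An extreme version of the dog's problem can be sketched in Python: a "model" that simply memorizes its training examples. The data and the names `fit_memoriser` and `accuracy` are hypothetical and only meant to illustrate the idea.

```python
# Hypothetical data that follows the rule y = 2 * x.
train_pairs = [(1, 2), (2, 4), (3, 6), (4, 8)]
test_pairs = [(5, 10), (6, 12)]


def fit_memoriser(pairs):
    """An extreme overfit: store every training answer and nothing else."""
    table = dict(pairs)
    # Returns None for any input that was not in the training data.
    return lambda x: table.get(x)


def accuracy(model, pairs):
    """Fraction of pairs for which the model gives exactly the right answer."""
    return sum(model(x) == y for x, y in pairs) / len(pairs)


memoriser = fit_memoriser(train_pairs)
print(accuracy(memoriser, train_pairs))  # 1.0
print(accuracy(memoriser, test_pairs))   # 0.0
```

The score on the training data is perfect, but the model has learnt nothing that carries over to new inputs, which is why the test set is needed to catch overfitting.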

Underfitting

In underfitting, the model has not learnt enough from the training data and cannot map the input to the output properly. It performs poorly even on the data it was trained on, not just on new data.
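Here is a minimal Python sketch of an underfit model: one that ignores its input and always gives the same answer. The data and the names `fit_constant` and `mean_abs_error` are hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical data that follows the rule y = 2 * x.
train_pairs = [(1, 2), (2, 4), (3, 6), (4, 8)]
test_pairs = [(5, 10), (6, 12)]


def fit_constant(pairs):
    """An underfit: ignore the input and always answer with the training average."""
    average = sum(y for _, y in pairs) / len(pairs)
    return lambda x: average


def mean_abs_error(model, pairs):
    """Average distance between the model's answer and the true answer."""
    return sum(abs(model(x) - y) for x, y in pairs) / len(pairs)


constant = fit_constant(train_pairs)
print(mean_abs_error(constant, train_pairs))  # 2.0
print(mean_abs_error(constant, test_pairs))   # 6.0
```

The error is large on the training data as well as on the test data, which is the usual sign that a model is too simple for the pattern it is supposed to learn.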

That's it for this blog. Thank you for reading. I hope this was helpful. If there are ways in which I can improve please do let me know.
