Beginning with Machine Learning - Part 1
Apoorva Dave
Posted on February 13, 2019
The question of where to begin pops into the head of almost everyone who wants to play with this new technology. I too wondered where I should begin, what I should cover, and how I could learn quickly!
I am not here to give you a list of articles to read or explore. Instead, I will walk you through a basic understanding of almost every important concept so that you can dig deeper into each one on your own. Let’s get started!
- What is Machine Learning?
- Important types of Machine Learning
- Classification Algorithms
- Regression Algorithms
- Clustering
- Cost Functions
- Collinearity
- PCA
- Gradient Descent
- Some projects on ML to help you get started
The topics above will be covered over roughly 5 articles to help you get started with ML.
What is Machine Learning?
In ML, the machine learns from data; it is as simple as that. We don’t have to write custom rules specific to the problem. Instead, we feed data to an algorithm, and it builds its own logic based on that data.
Suppose you want to identify which fruit is an apple and which is not. You cannot hard-code the exact dimensions, color, or size of an apple, because although apples look similar, no two have exactly the same measurements. This is one of the most basic use cases of ML. Here we provide the algorithm with examples of all types of apples, that is, a set of features describing different apples. The algorithm learns from these features and classifies a fruit as an apple or not an apple!
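To make the apple example concrete, here is a minimal sketch in plain Python. It assumes two made-up features (diameter and a redness score) and uses a simple nearest-neighbor rule, which is only one of many possible algorithms and not necessarily the one a real project would pick. The data, the feature choice, and the `classify` name are all hypothetical.

```python
import math

# Hypothetical labeled examples: (diameter in cm, redness on a 0-10 scale).
# All numbers are invented for illustration.
TRAINING_DATA = [
    ((7.0, 8.0), "apple"),
    ((7.5, 9.0), "apple"),
    ((6.8, 7.5), "apple"),
    ((2.5, 8.5), "not apple"),   # small and red
    ((20.0, 2.0), "not apple"),  # large and green
    ((6.5, 1.0), "not apple"),   # apple-sized but not red
]


def classify(features, examples=TRAINING_DATA):
    """Return the label of the labeled example closest to `features`."""
    # No hand-written rule about size or color: the answer comes from the data.
    _, label = min(examples, key=lambda ex: math.dist(ex[0], features))
    return label


print(classify((7.2, 8.5)))  # a fruit close to the apple examples
print(classify((2.8, 8.8)))  # a fruit close to the small red example
```

Notice that nothing in `classify` mentions apples. Change the examples and the same code would separate any other pair of categories.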
Types of Machine Learning
Supervised: In this approach, we have a labeled dataset. Our model learns from this labeled data and can then perform classification, prediction, etc. In our apple example above, when we provide our model with a set of features, each row of the dataset is labeled as to whether those features constitute an apple or not. Classification and Regression problems are supervised.
Unsupervised: Here we have an unlabeled dataset. That is, we are not told which sets of features constitute an apple. An example is clustering, where we create groups of similar objects.
Reinforcement Learning: Here, an agent learns by interacting with an environment. It performs an action to move to a new state and receives a reward, positive or negative, for each action. These rewards are what it learns from.
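The supervised case needs labels, but the unsupervised case described above does not. As a rough illustration of clustering, the sketch below groups made-up fruit weights into two clusters with a stripped-down, one-dimensional version of k-means. The `two_means` function, the fixed choice of two clusters, and the data are assumptions for this example only.

```python
def two_means(values, iterations=10):
    """Split numbers into two groups around two learned centers (1-D k-means, k=2)."""
    if min(values) == max(values):
        raise ValueError("need at least two distinct values")
    # Start with the smallest and largest value as the two centers.
    centers = [min(values), max(values)]
    for _ in range(iterations):
        clusters = [[], []]
        for v in values:
            # Assign each value to whichever center is closer.
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            clusters[nearest].append(v)
        # Move each center to the mean of the values assigned to it.
        new_centers = [sum(c) / len(c) for c in clusters]
        if new_centers == centers:
            break  # nothing moved, so the grouping is stable
        centers = new_centers
    return centers, clusters


# Hypothetical, unlabeled fruit weights in grams.
WEIGHTS = [100, 105, 110, 300, 310, 320]
centers, clusters = two_means(WEIGHTS)
print(centers)
print(clusters)
```

No row was ever labeled "light" or "heavy". The two groups emerge purely from how similar the values are to each other.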
Before jumping into classification and regression algorithms, here is a set of terms that will help us understand them better.
Model: People often get confused by the term model. It is simply the artifact created by the training process. You provide training data to a machine learning algorithm, the algorithm learns from that data, and the result is a trained model.
Training and Testing Data: The data provided to the algorithm for learning is called the training set. Predictions are then made on a separate dataset called the testing set. It is on this data that we check the accuracy of our trained model.
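As a sketch of how a training set and a testing set are used together, the plain-Python example below shuffles a small made-up dataset, holds out a quarter of it for testing, and measures accuracy only on the held-out rows. The helper names (`split_rows`, `predict`, `accuracy`), the 25% test fraction, the seed, and the data are all assumptions for illustration.

```python
import random

# Hypothetical rows: (fruit diameter in cm, label). Numbers are invented.
ROWS = [
    (7.0, "apple"), (7.4, "apple"), (6.8, "apple"), (7.9, "apple"),
    (2.5, "not apple"), (3.0, "not apple"), (18.0, "not apple"), (20.0, "not apple"),
]


def split_rows(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and return (training set, testing set)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def predict(diameter, train_rows):
    """Predict the label of the training row with the closest diameter."""
    nearest = min(train_rows, key=lambda row: abs(row[0] - diameter))
    return nearest[1]


def accuracy(train_rows, test_rows):
    """Fraction of testing rows whose predicted label matches the true label."""
    correct = sum(predict(x, train_rows) == label for x, label in test_rows)
    return correct / len(test_rows)


train_rows, test_rows = split_rows(ROWS)
print("training rows:", len(train_rows), "testing rows:", len(test_rows))
print("accuracy on the testing set:", accuracy(train_rows, test_rows))
```

The important design point is that `accuracy` never scores the model on rows it learned from. Scikit-Learn provides a `train_test_split` helper in `sklearn.model_selection` for the splitting step.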
Overfitting and Underfitting: A model is said to be overfitted if it learns the training data very well but is unable to generalize. That is, even though it gives good results on the training data, it does not provide good predictions on the test data. A model is said to be underfitted if it is unable to learn even the training data itself 😄. An underfit model won’t perform well on the data it has seen, let alone on unseen data 😝.
Bias and Variance: Many people (including me) have wondered what these errors actually mean. In simple terms, Bias is the error that comes from making wrong assumptions, and it results in an underfit model. For example, we might assume that the data is linear when in fact it is quadratic. Variance, on the other hand, is the error that comes from the model’s excessive sensitivity to small variations in the training data, and it causes overfitting. There is a trade-off between Bias and Variance: reducing one tends to increase the other.
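The quadratic example above can be sketched in a few lines of plain Python. The code fits a straight line to noisy quadratic data (a wrong assumption, so high bias) and compares it with a model that simply memorizes every training point (very sensitive to the noise, so high variance). The data, the alternating noise pattern, and the helper names are assumptions chosen to keep the sketch deterministic.

```python
def mse(model, xs, ys):
    """Mean squared error of `model` on the points (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_line(xs, ys):
    """Least-squares straight line: assumes the data is linear."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_memorizer(xs, ys):
    """Memorize every training point and answer with the nearest one."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


# Training data: y = x^2 plus a small, made-up alternating noise term.
train_x = [-3, -2, -1, 0, 1, 2, 3]
noise = [1, -1, 1, -1, 1, -1, 1]
train_y = [x ** 2 + n for x, n in zip(train_x, noise)]

# Unseen test data drawn from the same underlying curve, without noise.
test_x = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
test_y = [x ** 2 for x in test_x]

line = fit_line(train_x, train_y)
memorizer = fit_memorizer(train_x, train_y)

for name, model in [("line", line), ("memorizer", memorizer)]:
    print(name,
          "train MSE:", round(mse(model, train_x, train_y), 2),
          "test MSE:", round(mse(model, test_x, test_y), 2))
```

The line has a large error even on the training data, which is underfitting. The memorizer has zero error on the training data but a clearly nonzero error on the unseen points, which is the gap that signals overfitting.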
There are numerous articles on ML that are better than this one. This series is my effort to consolidate all the important material as I learn and develop my own ML skills. Personally, I prefer using Python and Scikit-Learn. There are other languages and libraries, such as R, Keras, and TensorFlow, which we might explore as we go further.
Stay tuned for the next article in the series, where we will learn about Regression Algorithms. Till then, happy learning! 😃