Machine Learning: Not just a buzzword anymore
Nidhi Agrawal
Posted on May 1, 2020
If you have been following the latest trends in technology, you have probably noticed that Machine Learning (ML) is no longer just a buzzword: it is behind many of the most important recent breakthroughs in Artificial Intelligence (AI). ML will change the way you work, earn a livelihood, and purchase and consume goods and services. Understanding it today opens up great opportunities for those who move quickly and decisively to anticipate and benefit from this AI-led revolution. There are plenty of examples to back these claims (from image classification to text generation to language translation), but this article is a quick overview of ML for readers who are either starting from zero or after a concise summary.
ML -- What is it?
ML enables computers to find patterns in data and then use those patterns to make decisions, rather than being explicitly programmed to carry out a certain task. In simple words, you are trying to make a computer smart enough to learn from the data it is fed, so that after some time it can make predictions about new data.
The workflow is pretty simple:
• You have data which contains patterns.
• You supply it to an ML algorithm, which finds the patterns and generates a model.
• The model recognizes these patterns when presented with new data.
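The three steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the data (hours studied versus exam result) is made up, and `train_threshold_model` is a hypothetical, deliberately tiny "algorithm" that learns a single cut-off value rather than anything a real project would use.

```python
def train_threshold_model(examples):
    """Learn the cut-off that misclassifies the fewest training examples.

    `examples` is a list of (value, label) pairs, where label is True/False.
    Returns a model: a function that predicts a label for a new value.
    """
    best_threshold, best_errors = None, len(examples) + 1
    for candidate in sorted(value for value, _ in examples):
        # Count how many examples the rule "value >= candidate" gets wrong.
        errors = sum((value >= candidate) != label for value, label in examples)
        if errors < best_errors:
            best_threshold, best_errors = candidate, errors
    return lambda value: value >= best_threshold


# 1. Data that contains a pattern (hypothetical): (hours studied, passed?)
data = [(1, False), (2, False), (3, False), (5, True), (6, True), (8, True)]

# 2. The algorithm finds the pattern and generates a model.
model = train_threshold_model(data)

# 3. The model recognizes the pattern in new data.
print(model(7))  # True
print(model(2))  # False
```

Note that nobody wrote the rule "5 hours or more means a pass" by hand; the cut-off was derived from the examples.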
Everyday examples include:
Medical diagnosis, predicting a customer's ability to pay back a loan, market analysis and stock trading, customer segmentation, and spam email detection.
Why do we need ML?
Using Machine Learning, it is possible to handle previously unseen scenarios. Once a Machine Learning model with good generalization capabilities has been trained, it can handle such scenarios and make decisions accordingly. In a traditional program, by contrast, you have to spell out which decision to take whenever a particular scenario occurs. Now imagine there are a billion possible scenarios: you clearly cannot write code that handles every one of them by hand. Hence the need for Machine Learning.
Who is a Data Scientist?
Data Scientist has been called ‘the amazing job of the 21st century’. As of 2019, a Data Scientist is someone with multidisciplinary skills spanning mathematics, statistics, machine learning, computer science, programming and business domain expertise.
||ML Pipeline||
Data scientists define a pipeline for data as it flows through their ML solution. Each step of the pipeline is fed data processed by the preceding step. The term ‘pipeline’ is slightly misleading, as it implies a one-way flow of data; in practice ML pipelines are cyclical and iterative, with steps being repeated until the result is a successful model.
The key stages are described below:
1. Problem Definition: Define the business problem you require an answer for.
2. Data Ingestion: Identify and gather the data you want to work with.
3. Data Preparation: Raw data is rarely in the correct form to be processed. Preparation usually involves filling in missing values, removing duplicate records, normalizing, and correcting other flaws in the data, such as different representations of the same value in a column. This is also where feature extraction, construction and selection take place.
4. Data Segregation: Split the data into subsets used to train the model, to test it, and to further validate how it performs against new data.
5. Model Training: Use the training subset of data to let the ML algorithm recognize the patterns in it.
6. Candidate Model Evaluation: Assess the performance of the model using the test and validation subsets of data to understand how accurate its predictions are. This is an iterative process, and various algorithms might be tested until you have a model that sufficiently answers your question.
7. Model Deployment: Once the chosen model is produced, it is typically exposed via some kind of API and embedded in decision-making frameworks as part of an analytics solution.
8. Performance Monitoring: The model is continuously monitored to observe how it behaves in the real world and is recalibrated accordingly. New data is collected to incrementally improve it.
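To make stages 3 to 6 more concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the loan data is invented, the function names (`prepare`, `split`, `train`, `accuracy`) are hypothetical, and the "algorithm" is a very simple nearest-centroid classifier. A real pipeline would normally rely on a dedicated ML library.

```python
import random


def prepare(rows):
    """Stage 3: remove duplicates, fill missing values, normalize to [0, 1]."""
    unique = list(dict.fromkeys(rows))  # drops exact duplicates, keeps order
    known = [x for x, _ in unique if x is not None]
    mean = sum(known) / len(known)
    filled = [(mean if x is None else x, y) for x, y in unique]
    lo = min(x for x, _ in filled)
    hi = max(x for x, _ in filled)
    return [((x - lo) / (hi - lo), y) for x, y in filled]


def split(rows, test_fraction=0.25, seed=0):
    """Stage 4: shuffle, then hold out a test subset."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def train(rows):
    """Stage 5: nearest-centroid model, one average feature value per label."""
    sums = {}
    for x, y in rows:
        total, count = sums.get(y, (0.0, 0))
        sums[y] = (total + x, count + 1)
    centroids = {y: total / count for y, (total, count) in sums.items()}
    return lambda x: min(centroids, key=lambda y: abs(x - centroids[y]))


def accuracy(model, rows):
    """Stage 6: fraction of held-out rows the model predicts correctly."""
    return sum(model(x) == y for x, y in rows) / len(rows)


# Hypothetical raw data: (income in thousands, loan outcome).
# It contains one missing value and one duplicate record.
raw = [(20, "default"), (25, "default"), (None, "default"), (30, "default"),
       (60, "repaid"), (65, "repaid"), (70, "repaid"), (80, "repaid"),
       (80, "repaid")]

prepared = prepare(raw)
train_rows, test_rows = split(prepared)
model = train(train_rows)
print("held-out accuracy:", accuracy(model, test_rows))
```

With so few rows the accuracy figure means very little; the point is only the order of the stages and the fact that the model is evaluated on data it did not see during training.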
||Training Algorithms||
ML algorithms are divided into two broad categories:
Supervised Learning (SL): The value you want to predict is present in the training data, so the algorithm can learn to predict it for future inputs. Here the data is labelled.
Unsupervised Learning (UL): The value you want to predict is not in the training data, so the algorithm finds hidden patterns (according to similarities or differences) or intrinsic structure. Here the data is unlabelled.
The main subcategories are:
Classification (Supervised Learning — Classification)
A subcategory of Supervised Learning, Classification is the process of predicting categorical/discrete responses, i.e. the input data is classified into categories. Another application is anomaly detection, i.e. the identification of outliers: unusual objects that do not fit the normal pattern of the data.
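As a small illustration of classification, the sketch below uses a k-nearest-neighbours vote, which is just one of many possible classification algorithms. The email features (number of links, number of exclamation marks) and their values are invented for this example, and `knn_classify` is a hypothetical helper name.

```python
import math
from collections import Counter


def knn_classify(training, point, k=3):
    """Predict the label of `point` by majority vote of its k nearest examples."""
    nearest = sorted(training, key=lambda example: math.dist(example[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labelled data: ((links, exclamation marks), category)
emails = [((0, 0), "ham"), ((1, 0), "ham"), ((0, 1), "ham"),
          ((7, 5), "spam"), ((8, 6), "spam"), ((6, 7), "spam")]

print(knn_classify(emails, (7, 6)))  # spam
print(knn_classify(emails, (1, 1)))  # ham
```

The response is a category ("spam" or "ham"), not a number, which is what makes this a classification problem.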
Regression(Supervised Learning — Regression)
Another subcategory of SL, Regression is the process of predicting continuous responses (i.e. numeric values) which normally answer questions like ‘How many’/ ‘How much’.
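A minimal regression sketch, assuming a single input feature and a straight-line relationship, is ordinary least squares. The house sizes and prices below are made up (and deliberately perfectly linear so the result is easy to check by hand); `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: floor area (square metres) and price (thousands)
sizes = [50, 70, 90, 110]
prices = [150, 210, 270, 330]

slope, intercept = fit_line(sizes, prices)
# "How much" would a 100 square metre home cost, according to the fitted line?
print(slope * 100 + intercept)  # 300.0
```

Here the response is a continuous number rather than a category.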
Clustering(Unsupervised Learning — Clustering)
A subcategory of UL, Clustering is the process used for exploratory data analysis to find hidden patterns or groupings/partitions of data.
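To illustrate clustering, here is a minimal one-dimensional k-means sketch. It is an assumption-laden toy: the customer spend figures are invented, `kmeans_1d` is a hypothetical helper, and the starting centroids are chosen deterministically from the sorted data instead of randomly, as many real implementations do.

```python
def kmeans_1d(values, k=2, iterations=10):
    """Group numbers into k clusters (k >= 2) by repeatedly moving centroids."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids across the sorted data.
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each value joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for value in values:
            nearest = min(range(k), key=lambda i: abs(value - centroids[i]))
            clusters[nearest].append(value)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


# Hypothetical unlabelled data: monthly spend per customer
spend = [12, 15, 14, 90, 95, 88]

centroids, clusters = kmeans_1d(spend, k=2)
print(clusters)  # [[12, 15, 14], [90, 95, 88]]
```

No labels were supplied; the two customer segments emerge purely from how the values group together.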
Machine Learning is an exciting subject: it is both art and science! In this article I have only explored the basics. My aim was to make Machine Learning ‘as simple as possible, but not one bit simpler’, to borrow a saying often attributed to Einstein.
Thanks for reading!
Nidhi Ghanshyam Agrawal.