Machine Learning For Dummies: An Absolute Beginner’s Guide

Are you looking to get started with machine learning and want to know about the machine learning process in detail?

Or, do you want to learn how machine learning works, and aim to make your first machine learning project all by yourself?

Well… Whatever your reason may be, this article is an absolute treat for all the dummies to get started with machine learning.

All the major algorithms/techniques used in machine learning are explained herein with the most basic examples, which will help you understand the foundations of machine learning in the simplest of ways.

Introduction to Machine Learning

Machine Learning is made of two words: ‘Machine’ and ‘Learning’. In simple terms, machine learning deals with the learning process of machines.

Wait, what?

Yes… That’s what it is all about!

But I have a simple question for you:

Why would anyone want the machines to learn in the first place?

After all, if you teach machines everything you know and then let them do what you do for a living, they will most probably replace you, right?

No… That’s not how it works.

Machines may be able to perform a complex repetitive task in seconds, which might have taken you a thousand years to complete. Just take, for example, the task of writing a trillion multiples of ‘1.232’.

The main constraint for the machines is the level of intelligence they can achieve. As per some reports, machines are still not able to mimic the intelligence of a one-year-old.

So, machine learning is a quest to inculcate human-like intelligence into machines.

A simple example of an intelligent machine is a search engine e.g., Google or Bing.

If you were asked to find the articles related to machine learning among a billion articles stored on Google’s servers, it could take you years, if not decades.

However, lucky you… Google has saved you all those years of searching and headaches. They have built an intelligent search system that lets you find thousands of articles related to your interest in a matter of seconds.

How exciting, right?

There is a myriad of such applications in which you can ask the machines to do the boring tasks for you e.g., a pick-and-place machine that sorts items by color, medical diagnosis of diseases from the lab reports of millions of patients, etc.

But the real question still prevails, “How to make the Machines Learn?”

Basics of Machine Learning

Before you try to understand how machines learn, you need to (somewhat) imagine the learning process of a Baby.

If I had to name one simple tool that a baby uses to learn about the environment, it would be ‘Data’.

Whether it’s mobility, culture, language, or simply crying, babies try to examine what is going on around them, and how their responses can alter the environment.

For example, if a baby boy is hungry, he may try different responses to convey that information to the environment. Eventually, he learns that crying is the best response to get what he wants.

In simple terms, babies are taking data from the environment, processing it, and then feeding it to their minds all the time. This helps them optimize their responses to different desires.

And, it is not just the babies… Teenagers are learning from adults, and adults are learning from senior citizens.

A simple example is that if you want to get rich, you need to learn the investing habits of a millionaire and then try to follow them.

Conversely, if you want to avoid getting poor, you will need to learn which spending habits lead to poverty and then try to eliminate them from your own behavior.

So, it is just the data that drives most of your daily decisions in your life.

In simple terms, if you want to make machines as intelligent as humans, the best strategy will consist of:

Feeding the data to machines
Helping the machines learn something meaningful from the data
Evaluating the output response from the machines

This simple strategy is termed Machine Learning. A basic example is to enable a robot to differentiate between two colors i.e., ‘Red’ and ‘Yellow’.

For such a task, you will need to provide some images of both the colors to the robot and inform the robot which image refers to which color. A specialized algorithm will help the robot learn the differences between the two colors. After the learning phase, the robot will be able to differentiate between the ‘Red’ and ‘Yellow’ colors without any aid.

*Machine Learning Robot differentiating between Red and Yellow Color*

In a real-world setting, the whole process of machine learning is not as simple as it may seem. Since robots are machines, they only tend to understand numbers. So, there is a lot of mathematics involved.

Now let’s explore how you can inculcate intelligence within a real-world machine…

How does Machine Learning work?

Machine learning is all about imitating and adapting human-like behavior within dumb machines. In this framework, machines need to learn from data and experience to make intelligent decisions.

Machine learning algorithms consist of computational techniques which help machines learn ‘directly’ from the data rather than being explicitly programmed to carry out a specific task.

On the whole, machines are empowered to find natural patterns within the data, and then predict the unknown.

In simple terms, machine learning is all about getting computers to program themselves using the data provided.

In traditional programming, a ‘Program’ and ‘Inputs’ are provided to the computer to produce the ‘Output’.

However, a machine learning algorithm produces a ‘Program’ using the ‘Inputs’ and ‘Outputs’ of the dataset. This ‘Program’ is then used to predict ‘Outputs’ by feeding in the ‘Unknown Inputs’.

A simple example to understand the whole machine learning paradigm is to consider the mathematical model of the straight line with a specified slope:

Y = 2(X) + 3

The above equation represents the traditional programming model. The value of the Output ‘Y’ is not known in advance, but you have written a function that takes ‘X’ as an Input and produces ‘Y’ such that the points form a straight line.

In the world of machine learning, Input ‘X’ and Output ‘Y’ coordinates of a straight line will be available and your job will be to train the machine to come up with a mathematical model of the straight line.

*Difference between Traditional Programming and Machine Learning*

On a side note, machine learning algorithms adaptively improve their performance as the number of Input/Output samples for learning increases over time.

The approach of machine learning also has an added advantage of abstraction over the traditional mode of programming.

Just imagine how hard it would be if you were assigned the task of writing a program to recognize every object in this world.

Even in an ideal scenario, hand-coding all the rules could take you many years, only to find out later that some of those objects are no longer in use.

However, you don’t need to worry much. Machine learning has got your back!

You just need to take some images of each object, assign them output labels, and then train a machine-learning algorithm to recognize as many objects as you desire.

Easy peasy, right?

At this point, you may be wondering about Machine Learning Algorithms and how they work.

So, let’s get to that…

Machine Learning Types

There are two broad categories of machine learning problems, based on whether the Desired Outcome is a discrete class or a continuous number.

1. Classification

Classification-based algorithms predict discrete responses for the output e.g., Win/Lose, Spam/NotSpam, Wrong/Genuine, Black/White/Blue/Yellow, etc.

There is no restriction on the number of classes you need to deal with, but they should be in discrete form.

For instance, there could be 5 or 10 or 50 or 100 classes, but the set of classes must be finite and known in advance.

2. Regression

Regression-based algorithms predict continuous responses for the output e.g., stock market prices, the temperature in a weather forecast, the winning probability of a team, etc.

In this technique, you need a machine to learn about a function that can output any numerical value based on the input data.

A simple example of such a case is the Straight-Line mathematical model we have already discussed.

For the equation Y = 2(X) + 3, the output can be any real number: positive or negative, whole or fractional. You will feed the ‘Inputs’ and ‘Outputs’ to a regression algorithm, and let it figure out the equation on its own.

Machine Learning Algorithms

A machine learning algorithm is fed with a dataset, and its job is to learn a function that can successfully predict a particular behavior or feature even by looking at the unseen data.

Depending on the nature of the learning dataset provided, machine learning algorithms fall into two categories:

Supervised Learning
Unsupervised Learning

Now let’s explore what each of these learning classes depicts, and also the basics of some algorithms contained within each class…

1. Supervised Learning

In supervised learning, the input data is provided alongside ‘Labeled’ outputs. In simple terms, if an algorithm is fed with an image of a ‘tree’, it is explicitly told that the image shows a ‘tree’.

Similarly, if the images of ‘Apple’, ‘Mango’, and ‘Guava’ are provided for the training of a 3-Class fruit recognition model, then the numerical label of each class is also provided with the input images e.g., ‘1’ for Apple, ‘2’ for Mango, ‘3’ for Guava, and ‘0’ for Non-Identifiable input.

Rather than wasting any useful resources on grouping together similar items, the algorithm tries to extract and differentiate between the features of each class since it is already aware of the corresponding class.

The most common algorithms for supervised learning are explained herein briefly:

Linear Regression
Logistic Regression
Decision Trees
Support Vector Machines
Naïve Bayes
K-Nearest Neighbors

Linear Regression

It is all about finding a Linear relation between a Target Variable and one or more Predictors.

For example, you can find out the relation between height and weight. Or, you can predict the salary of a person given his age, education, and daily working hours.

Another interesting example is to find a relation between the number of hours studied and the marks obtained for a specific subject.

Since this technique is based on regression, the predicted value of the target variable is continuous.

The simplest regression problem (Single Input ‘X’ and Single Output ‘Y’) can be represented by a straight-line equation:

Y = B1*X + B0

Let’s consider a scenario of forming a relation between ‘Weight’ and ‘Height’ using the linear regression model:

Weight = B1*Height + B0

In the dataset, you are provided with the values of ‘Height’ and the corresponding ‘Weight’. The job of linear regression is to optimize the values of ‘B1’ and ‘B0’ such that the Best-Fit line is obtained for all the points in the dataset.

Once the linear regression model is trained well enough, any value of ‘Height’ will produce an output ‘Weight’ that follows the pattern of the training data.

*Linear Regression for Height vs Weight Dataset*
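As a rough illustration of how ‘B1’ and ‘B0’ can be found, the sketch below uses the standard least-squares formulas for a single input. The height and weight values are invented for the example and do not come from any real dataset, and `fit_line` is a hypothetical helper name.

```python
# Hypothetical sample data: heights in cm and weights in kg.
heights = [150.0, 160.0, 170.0, 180.0, 190.0]
weights = [52.0, 58.0, 66.0, 74.0, 80.0]

def fit_line(xs, ys):
    """Return (b1, b0) that minimize the squared error of ys ~ b1 * xs + b0."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how x and y vary together, divided by how much x varies.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    b1 = covariance / variance
    # Intercept: forces the line through the point of means.
    b0 = mean_y - b1 * mean_x
    return b1, b0

b1, b0 = fit_line(heights, weights)

# Weight = B1*Height + B0 for a height that was not in the data.
predicted_weight = b1 * 175.0 + b0
print(b1, b0, predicted_weight)
```

With more than one predictor the same idea applies, but the formulas become matrix operations, which is where a numerical library usually takes over.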

Logistic Regression

Just as linear regression is used to generate a straight-line function to map outputs against inputs, logistic regression maps a dichotomous (two states) dependent variable against a set of independent variables.

For example, how does the probability of passing the exam (yes or no) change for every additional hour of studying the course books, or submitting an additional assignment?

How do body weight, calorie intake, and age influence the probability of having a heart attack?

Unlike that of linear regression, the output of logistic regression is always bounded between ‘0’ and ‘1’.

For extreme values of the inputs, the output approaches either ‘0’ or ‘1’. For intermediate input values, the output lies somewhere between ‘0’ and ‘1’ and can be read as a probability.

The following image depicts how likely a customer is to convert based on age:

*Logistic Regression Model for Customer Conversion*
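The bounded output comes from passing a straight-line expression through the logistic (sigmoid) function. The sketch below shows that behavior. The coefficients `B1` and `B0` are assumed values picked for illustration, not fitted to any customer data.

```python
import math

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients; a real model would learn these from data.
B1, B0 = 0.2, -8.0

def conversion_probability(age):
    # Same linear form as before (B1*X + B0), wrapped in the sigmoid.
    return sigmoid(B1 * age + B0)

for age in (10, 40, 70):
    print(age, round(conversion_probability(age), 4))
```

Low and high ages push the output towards 0 and 1 respectively, while ages in between give intermediate probabilities.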

Decision Trees

This is a classification technique in which the dataset is repeatedly split into two subsets based on the value of an independent variable or feature, forming a top-down tree-like shape.

This tree is then used to predict the class of the dependent variable, based on the discrete or continuous values of the independent variables.

In some cases, a single feature is enough to decide whether an output event will occur or not. In other cases, multiple features are combined to predict the outcome.

The exciting thing is that all of the predictions can be made using an easy-to-visualize tree, hence the name decision trees.

The following image depicts whether a person will go out or not depending upon different weather conditions:
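In code, a trained decision tree boils down to nested if/else checks. The sketch below is a hand-written tree for the go-out example. The features (`outlook`, `humidity`, `windy`) and the threshold of 70 are assumptions made for illustration. A real algorithm would learn the splits from data.

```python
def will_go_out(outlook, humidity, windy):
    """Hand-written decision tree; features and thresholds are illustrative only."""
    if outlook == "sunny":
        # On sunny days the decision depends on a single extra feature.
        return humidity <= 70
    if outlook == "rainy":
        # On rainy days a different feature decides the outcome.
        return not windy
    # Any other outlook (e.g., overcast): this branch needs no further checks.
    return True

print(will_go_out("sunny", humidity=85, windy=False))
print(will_go_out("rainy", humidity=60, windy=False))
```

Each `if` corresponds to one node of the tree, and each `return` corresponds to a leaf holding the predicted class.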

Support Vector Machines (SVM)

It is a classification algorithm whose objective is to find a hyperplane in n-dimensions (n-features) so that it distinctly classifies the data points.

The dimensions of the hyperplane depend upon the number of features defining a data point: for n features, the hyperplane is an (n-1)-dimensional surface inside the n-dimensional feature space.

For instance, if you have a dataset depicting whether a person will get a heart attack or not depending on the values of Cholesterol, BMI, Weight, and Height, then the data points live in a 4-dimensional space and the hyperplane separating them has 3 dimensions.

A hyperplane for 2-features will be a LINE, whereas a hyperplane for 3-features will be a PLANE.

In SVM, the main objective is to find a hyperplane that maximizes the margin between the data points of distinct classes.

‘Support Vectors’ are the data points of different classes which lie closest to the hyperplane and define its position and orientation.

*Support Vector Machines illustrating an Optimal Hyperplane*
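Once a hyperplane is known, classifying a point only requires checking which side of it the point falls on. The sketch below assumes a hyperplane of the form w·x + b = 0 for two features. The values of `w` and `b` are hypothetical and chosen by hand, whereas a real SVM would find them by maximizing the margin.

```python
import math

# Hypothetical hyperplane w·x + b = 0 for two features (so it is a LINE).
w = [1.0, 1.0]
b = -5.0

def decision_value(x):
    """Signed score: positive on one side of the hyperplane, negative on the other."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(x):
    return 1 if decision_value(x) >= 0 else -1

def distance_to_hyperplane(x):
    """Perpendicular distance from x to the hyperplane."""
    return abs(decision_value(x)) / math.sqrt(sum(wi * wi for wi in w))

for point in ([1.0, 1.0], [4.0, 4.0]):
    print(point, classify(point), round(distance_to_hyperplane(point), 3))
```

The training points with the smallest such distance are the support vectors, and the margin is the gap that the SVM tries to make as wide as possible.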

Naïve Bayes

It classifies the data based on the probability of the features, and it (naively) assumes that every feature contributes independently to the final output class.

For example, if an image consists of 2 Faces, 4 Arms, and 4 Legs, there is a high probability that two persons are present in the image. Naïve Bayes treats each of these features as separate evidence and ignores any dependence between them.

To estimate the probability of an output class given a specific feature, you can use Bayes’ Theorem:

P(class | feature) = [P(feature | class) * P(class)] / P(feature)

*Naive Bayes Model of Weather Forecasting (Image Credits: Analytics Vidhya)*
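Here is a small worked example of the formula for a single feature. The day counts are invented for illustration (they are assumptions, not data from the figure), and `posterior` is a hypothetical helper name.

```python
def posterior(p_feature_given_class, p_class, p_feature):
    """Bayes' theorem: P(class | feature)."""
    return p_feature_given_class * p_class / p_feature

# Made-up counts over 14 observed days.
total_days = 14
play_days = 9            # days on which the class was "play"
sunny_days = 5           # days on which the feature was "sunny"
sunny_and_play_days = 3  # days that were both sunny and "play"

p_play_given_sunny = posterior(
    p_feature_given_class=sunny_and_play_days / play_days,  # P(sunny | play)
    p_class=play_days / total_days,                         # P(play)
    p_feature=sunny_days / total_days,                      # P(sunny)
)
print(round(p_play_given_sunny, 3))
```

With several features, Naïve Bayes multiplies the per-feature likelihoods together, which is exactly where the independence assumption is used.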

K-Nearest Neighbors

As the name implies, this technique looks for the K-Nearest Neighbors (already labeled with an output class) for any new data points.

The class that is most common among those K nearest labeled neighbors gets assigned to the new data point.

The most crucial step in this algorithm is to select a suitable value of K i.e., how many closest neighbors should be explored for any data point before predicting its class.

You can use the following steps to predict any data point using K-Nearest Neighbors:

Calculate the distance (e.g., Euclidean distance) between the new data point and each row (sample) of the training data
Sort all the rows (training samples) in order of increasing distance
Find the most frequent output class among the top K rows (samples)
Assign that most frequent class to the new data point
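The steps above can be sketched in a few lines of Python. The training points, the labels, and the default `k=3` are made up for illustration, and `knn_predict` is a hypothetical helper name.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, new_point, k=3):
    # Step 1: distance from the new point to every training sample.
    distances = [
        (math.dist(point, new_point), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Step 2: sort the samples by increasing distance.
    distances.sort(key=lambda pair: pair[0])
    # Step 3: collect the classes of the K closest samples.
    top_k_labels = [label for _, label in distances[:k]]
    # Step 4: the most frequent class wins.
    return Counter(top_k_labels).most_common(1)[0][0]

# Illustrative training data: two features per sample.
train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_labels = ["Red", "Red", "Red", "Yellow", "Yellow", "Yellow"]

print(knn_predict(train_points, train_labels, (2, 2)))
print(knn_predict(train_points, train_labels, (6, 5)))
```

An odd value of K is a common choice for two-class problems, since it avoids ties in the vote.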

2. Unsupervised Learning

In this sort of learning, input data is not assigned any explicit Output Label or Class. Instead, it is the job of the algorithm to form groups on its own based on shared or matching features. The model keeps on improving as more input data becomes available.

Unsupervised learning is used when you are not sure what to look for, and mostly for exploratory analysis of the raw data to recognize the hidden patterns.

Algorithms in unsupervised learning are mainly responsible for two applications, namely Clustering and Association.

In Clustering, data items are grouped based on similarity in some characteristic values. Data items within the same group will be similar to each other, but quite different from the items clustered together in another group.

In Association, you discover rules which describe large portions of a dataset e.g., people who visit X place also visit Y place, people who buy item X are most likely to buy item Y, etc.

Now let’s take a look into some algorithms available for unsupervised learning:

K-Means

This algorithm clusters similar items together to discover underlying patterns in the dataset. However, you first need to define the fixed number of clusters (K) to look for.

In simple words, K-Means takes the number of clusters you specify and then assigns each data point to one of those clusters. The central point of each cluster is called the Centroid and plays a vital role in the progression of the algorithm.

You can follow a simple repetitive approach to implement K-Means as described herein:

Specify a fixed number of centroids (K), initialized with random values having the same dimensions as the features of the dataset
Associate each input in the dataset with the closest centroid, forming K clusters
For each newly formed cluster, compute the new centroid by averaging all the input values contained within that cluster
Repeat Steps 2 and 3 with the updated centroids i.e., re-associate the data points and update the centroids, until convergence occurs and the centroid values do not change anymore
Each new input is then assigned to the cluster whose centroid is nearest
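The steps above can be sketched as follows. This version makes two assumptions that are not in the steps themselves: the initial centroids are picked from the data points (a common variant of random initialization), and an empty cluster simply keeps its old centroid. The sample points and the helper names `k_means` and `predict` are made up for illustration.

```python
import math
import random

def k_means(points, k, iterations=100, seed=0):
    rng = random.Random(seed)
    # Step 1: pick K starting centroids (here: K distinct data points).
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(iterations):
        # Step 2: associate each point with its closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Step 3: move each centroid to the average of its cluster.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                new_centroids.append(
                    tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
                )
            else:
                new_centroids.append(centroids[i])  # keep centroid of an empty cluster
        # Step 4: stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

def predict(centroids, point):
    # Step 5: a new input belongs to the cluster with the nearest centroid.
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.0, 9.5)]
centroids, clusters = k_means(points, k=2)
print(sorted(centroids))
```

On data with less obvious groups, the result can depend on the starting centroids, so it is common to run the algorithm several times with different seeds.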

For a visual example of the complete process, you can refer to this great resource.

K-Medoids

This approach is similar to K-Means. However, the cluster centers (medoids) are always actual samples from the dataset rather than averaged or random values.

A medoid can be defined as the point in the cluster whose total dissimilarity to all the other data points in the cluster is minimum.

The dissimilarity can be measured, for example, as the sum of absolute differences between a medoid and all the other points in the cluster.

Here is the process flow which you can use to implement K-Medoids on an unlabeled data set:

Initialize: Select ‘K’ random points from the dataset as medoids
Assignment: Associate each data point with the closest medoid using a distance metric e.g., Euclidean distance
Update: For each medoid ‘m’ and each data point ‘o’ associated with that medoid:
Swap ‘m’ and ‘o’, then re-associate each data point with its closest medoid
Compute the total cost of the new configuration (the total dissimilarity between each data point and its medoid)
Keep ‘o’ as the medoid if it gives the lowest configuration cost; otherwise, undo the swap
Repeat the Assignment and Update steps until the medoids no longer change
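A compact sketch of this swap-based procedure is given below. It is a simplified variant under stated assumptions: Euclidean distance is used as the dissimilarity, every non-medoid point is tried as a swap candidate (not only the points of one cluster), and the sample points and the names `total_cost` and `k_medoids` are hypothetical.

```python
import math
import random

def total_cost(points, medoids):
    """Sum of each point's distance to its closest medoid (assignment is implicit)."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)

def k_medoids(points, k, seed=0):
    rng = random.Random(seed)
    # Initialize: K random data points act as the medoids.
    medoids = rng.sample(points, k)
    best_cost = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for candidate in points:
                if candidate in medoids:
                    continue
                # Update: swap medoid i with the candidate and re-evaluate the cost.
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                cost = total_cost(points, trial)
                # Keep the swap only if it lowers the total cost.
                if cost < best_cost:
                    medoids, best_cost, improved = trial, cost, True
    return medoids, best_cost

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.0, 9.5)]
medoids, cost = k_medoids(points, k=2)
print(sorted(medoids), round(cost, 3))
```

Because the medoids are always real samples, the result is less affected by outliers than an averaged centroid would be.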

You can refer to this resource for a visual example of the K-Medoid.

Machine Learning Datasets

For any machine learning project, your first step is to collect the relevant dataset from online repositories, web servers, the company’s CRM, etc.

Once you have done so, you need to divide your dataset into three categories:

Training Dataset
Validating Dataset
Testing Dataset

A commonly used split reserves 70% of the data for the Training Dataset, 20% for the Validating Dataset, and 10% for the Testing Dataset, though the exact proportions vary from project to project.

Training Dataset refers to the data points which the algorithm uses to learn the prediction model. The learned model needs to perform well on unseen data; otherwise, it cannot be considered a well-trained model.

Validating Dataset is used to check the trained model on data it has not seen during training. In this way, the learning parameters of the model can be fine-tuned by evaluating it on previously unseen data.

Testing Dataset refers to the input samples used for testing the fully-trained and validated model before it is deployed into production. The accuracy of the algorithm is usually reported as its performance on the testing dataset.
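A 70/20/10 split can be sketched as below. The function name `split_dataset`, its parameters, and the stand-in data are assumptions made for illustration. Shuffling first is a common precaution so that each part gets a similar mix of samples.

```python
import random

def split_dataset(samples, train=0.7, validation=0.2, seed=0):
    """Shuffle the samples and cut them into training, validating, and testing parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_validation = round(len(shuffled) * validation)
    training = shuffled[:n_train]
    validating = shuffled[n_train:n_train + n_validation]
    testing = shuffled[n_train + n_validation:]  # whatever remains (about 10%)
    return training, validating, testing

# Stand-in dataset of 100 samples.
training, validating, testing = split_dataset(range(100))
print(len(training), len(validating), len(testing))
```

The fixed `seed` keeps the split reproducible, so the testing samples stay unseen across repeated runs.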

Machine Learning Applications

With the influx of data in all the fields/businesses, there is no area in which you won’t be able to find the applications of machine learning.

Numerous machine learning applications have integrated so seamlessly into our daily life that we don’t even realize that many of our decisions are guided by different algorithms of machine learning.

Some of the commonly used applications of machine learning include:

Personal Assistants: Siri, Alexa
Weather Prediction for Next Day, Month, or Year
Win Predictor in a sports tournament
Medical Diagnosis
Shopping Recommendations on E-Commerce Websites
Customized Ads on different niche websites
Web Search to rank different pages as per your search intent
Face Recognition
Automatic Steering in Self-Driving Cars
Insurance Risk Assessment
…

If you can think of any other Machine Learning Applications, don’t forget to mention them in the comments section.

Also, share the article with your colleagues to help them get started with machine learning.

Awais Naeem

He is the owner and founder of Embedded Robotics and a health based start-up called Nema Loss. He is very enthusiastic and passionate about Business Development, Fitness, and Technology. Read more about his struggles, and how he went from being called a Weak Electrical Engineer to founder of Embedded Robotics.

Follow him on Facebook, Twitter, LinkedIn
