Neural Networks for Beginners
Aravind S
Posted on February 5, 2022
A neural network, also known as an artificial neural network (ANN), is a type of machine learning algorithm. Its name and structure are inspired by the human brain, mimicking the way biological neurons pass signals to one another. Neural networks allow computer programs to recognize patterns and solve common problems in the fields of AI, machine learning, and deep learning.
Types of Neural Networks
Three commonly used types of neural networks are:
- Artificial Neural Networks (ANN)
- Convolutional Neural Networks (CNN)
- Recurrent Neural Networks (RNN)
In this post, we'll look at the basics for building an Artificial Neural Network.
The motivation behind neural networks
The basic structure of a neural network is inspired by the architecture of the neurons in the human brain. In a biological neuron, the dendrites accept the input signals from other neurons, and the axon transmits the impulse to the next neuron. The synapse is the structure where signals are transferred from one neuron to another.
Basic structure of ANN
Artificial neural networks consist of an input layer, one or more hidden layers, and an output layer. Each layer is made up of nodes (neurons) that are connected to nodes in other layers. Each connection has a weight associated with it. We will learn more about these weights later in this post.
Now, let's build a simple neural network for predicting the price of a house based on the size of the house.
This is a neural network with a single hidden layer. In this neural network, the input values from the input layer are mapped to the output layer by a function applied in the hidden layer, called the activation function (σ(x)).
Suppose the neural network gives the predictions as in the given graph:
So, using the neural network, we can predict an output whenever an input is fed to it. This is the basic working of an artificial neural network.
Neuron Model
Let us see what calculations are performed in a neuron.
The figure shows a neuron at layer k. The input values of the neuron are x1, x2, x3, ..., xn, which can be represented as a vector X. The weights assigned to the input values are w1, w2, w3, ..., wn respectively, which can be represented by the vector W, and b is the bias. The value Z in layer k is the weighted sum of the inputs plus the bias:
Z = W.T . X + b = w1*x1 + w2*x2 + ... + wn*xn + b
After the value of Z is computed, an activation function is applied to obtain the output of layer k: A = σ(Z).
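As a minimal sketch of this calculation in Python with NumPy, assuming the sigmoid function as the activation and using made-up input values (the names `sigmoid` and `neuron_output` are illustrative, not from any library):

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(x, w, b):
    """Output of one neuron: weighted sum of the inputs plus bias, then activation."""
    z = np.dot(w, x) + b  # Z = w1*x1 + w2*x2 + ... + wn*xn + b
    return sigmoid(z)

# Illustrative values (assumptions made up for this example)
x = np.array([1.0, 2.0, 3.0])    # inputs x1, x2, x3
w = np.array([0.5, -0.5, 0.0])   # weights w1, w2, w3
b = 0.5                          # bias

a = neuron_output(x, w, b)
print(a)  # Z = 0.5 - 1.0 + 0.0 + 0.5 = 0, so the output is sigmoid(0) = 0.5
```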
Activation functions
The activation function decides whether a neuron should be activated, based on the weighted sum of its inputs plus the bias. Its purpose is to introduce non-linearity into the output of a neuron.
A neural network without an activation function is essentially just a linear regression model, because a stack of linear layers is itself a linear function. The activation function applies a non-linear transformation to the input, making the network capable of learning and performing more complex tasks.
Some of the activation functions are:
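As an illustration, three widely used activation functions are sigmoid (used later in this post), tanh, and ReLU. The latter two are examples picked for this sketch and may not match the original list. In Python with NumPy they might look like this:

```python
import numpy as np

def sigmoid(z):
    """Maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    """Maps any real number into (-1, 1)."""
    return np.tanh(z)

def relu(z):
    """Keeps positive values unchanged and replaces negative values with 0."""
    return np.maximum(0.0, z)

z = np.array([-2.0, 0.0, 2.0])  # sample inputs chosen for illustration
print("sigmoid:", sigmoid(z))
print("tanh:   ", tanh(z))
print("relu:   ", relu(z))
```

All three are non-linear, which is the property the network needs from an activation function.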
Perceptron model
The perceptron learning algorithm is a binary classification algorithm. It is the simplest type of neural network model.
It consists of a single neuron that takes the input and predicts a class label (0 or 1).
The perceptron algorithm tries to predict outputs that are close to the ground truth, as measured by an appropriate metric or loss function. It does this by adjusting the weights (w1, w2, w3, ..., wn) and the bias (b), which are referred to as parameters. We will discuss loss functions in more detail when we discuss logistic regression.
Perceptron learning rules
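A common textbook form of the perceptron learning rule updates the parameters only when a prediction is wrong: w := w + η * (y - y') * x and b := b + η * (y - y'). The sketch below assumes that form, a step activation, and a made-up toy dataset (the logical AND of two inputs); the function names are illustrative.

```python
import numpy as np

def predict(x, w, b):
    """Step activation: output 1 if the weighted sum is positive, else 0."""
    return 1 if np.dot(w, x) + b > 0 else 0

def train_perceptron(X, y, learning_rate=1.0, epochs=20):
    """Apply the perceptron update rule to each example, for several passes."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            error = y_i - predict(x_i, w, b)  # 0 if correct, +1 or -1 if wrong
            w += learning_rate * error * x_i
            b += learning_rate * error
    return w, b

# Toy dataset (an assumption for this example): logical AND
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])

w, b = train_perceptron(X, y)
predictions = [predict(x_i, w, b) for x_i in X]
print(predictions)  # [0, 0, 0, 1]
```

The rule is guaranteed to settle only when the two classes can be separated by a straight line (or hyperplane), as they can here.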
Logistic regression
Logistic regression is a learning algorithm used in supervised learning problems where every output label y is either zero or one. The goal of logistic regression is to minimize the error between its predictions and the labels in the training data.
The quantities used in logistic regression are:-
- The input feature vector x
- The training label y
- The weights w
- The bias b
- The sigmoid function s = σ(z) = 1 / (1 + e^(-z))
- The output y' = σ(w.T . x + b), where w.T is the transpose of w
(w.T . x + b) is a linear function that can take any real value, but since we want a probability constrained to the interval [0,1], the sigmoid function is applied to it.
- If z is a large positive number, σ(z) ≈ 1
- If z is a large negative number, σ(z) ≈ 0
- If z = 0, σ(z) = 0.5
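The behavior of the sigmoid function and the logistic regression output can be checked with a short sketch. The weights, bias, and input below are made-up values, and `predict_proba` is a hypothetical helper name:

```python
import numpy as np

def sigmoid(z):
    """s = 1 / (1 + e^(-z))"""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(x, w, b):
    """Logistic regression output y' = sigmoid(w.T . x + b)."""
    return sigmoid(np.dot(w, x) + b)

# The three cases listed above
print(sigmoid(10.0))   # large positive z: very close to 1
print(sigmoid(-10.0))  # large negative z: very close to 0
print(sigmoid(0.0))    # z = 0: exactly 0.5

# Illustrative parameters and input (assumptions, not learned values)
w = np.array([0.8, -0.4])
b = 0.1
x = np.array([2.0, 1.0])
y_hat = predict_proba(x, w, b)
print(y_hat)  # a probability strictly between 0 and 1
```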
Logistic regression: Cost function
To train the parameters w and b, we need to define a cost function. We update the parameters until the cost function reaches a minimum.
Before getting to the cost function, let's understand the loss function.
Loss function:-
The loss function measures the discrepancy between the prediction and the actual value in the dataset. In other words, the loss function computes the error for a single training example.
The likelihood function for logistic regression, for a single training example, is given by:
P(y | x) = (y')^y * (1 - y')^(1 - y)
Taking the log on both sides:
log P(y | x) = y * log(y') + (1 - y) * log(1 - y')
This is called the log-likelihood function. Maximizing the log-likelihood is equivalent to minimizing the negative log-likelihood, which is used as the loss function for logistic regression:
loss(y', y) = -[ y * log(y') + (1 - y) * log(1 - y') ]
Cost function:-
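The cost function is the average of the loss function over the entire training set of m examples:
L(w, b) = (1/m) * Σ loss(y'(i), y(i)), summed over i = 1, ..., m
where y'(i) and y(i) are the prediction and the label for the i-th training example. With the negative log-likelihood as the loss, this cost function for classification problems is known as cross-entropy.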
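A small sketch of the cross-entropy loss and cost in Python, using made-up labels and predictions to show that better predictions give a lower cost (the function names are illustrative):

```python
import numpy as np

def loss(y_pred, y_true):
    """Negative log-likelihood for each training example."""
    return -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

def cost(y_pred, y_true):
    """Cross-entropy cost: the average loss over all training examples."""
    return float(np.mean(loss(y_pred, y_true)))

# Made-up labels and two sets of predicted probabilities
y_true = np.array([1, 0, 1, 0])
good_predictions = np.array([0.9, 0.1, 0.8, 0.2])  # close to the labels
bad_predictions = np.array([0.1, 0.9, 0.2, 0.8])   # far from the labels

print(cost(good_predictions, y_true))  # small cost
print(cost(bad_predictions, y_true))   # much larger cost
```

Note that the predictions must lie strictly between 0 and 1, otherwise the logarithm is undefined.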
Similarly, there are other types of cost functions, such as Mean Error (ME), Mean Squared Error (MSE), and Mean Absolute Error (MAE).
Gradient Descent
Gradient descent is a commonly used optimization algorithm for updating the parameters of a neural network. With gradient descent, the weights are updated repeatedly until the cost function reaches a minimum or stops decreasing.
Let L(w) be the cost function for a neural network.
First, we pick an arbitrary starting point on the cost curve. At that point, we find the slope (derivative) and use its value to update the parameters. We keep finding the slope and updating the parameters until the lowest point of the curve is reached, which gives the minimum cost. This point is known as the point of convergence.
Learning rate:-
The learning rate is the size of the steps taken when updating the parameters to reach the minimum cost. It is usually a small value (say 0.01) that is tuned based on the behavior of the cost function: too large a value can overshoot the minimum, while too small a value makes training slow.
So in gradient descent we compute the derivative of the cost function L with respect to the parameter w.
The parameters are updated using the formula:
w := w - η * (dL/dw)
Here η is the learning rate.
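To see the update rule in action, here is a minimal sketch that applies it to a simple made-up cost function, L(w) = (w - 3)^2, whose minimum is at w = 3. The cost function, starting point, and learning rate are all assumptions chosen for illustration:

```python
def cost(w):
    """A simple bowl-shaped cost function with its minimum at w = 3."""
    return (w - 3.0) ** 2

def gradient(w):
    """Derivative of the cost with respect to w: dL/dw = 2 * (w - 3)."""
    return 2.0 * (w - 3.0)

w = 0.0    # arbitrary starting point
eta = 0.1  # learning rate

for step in range(100):
    w = w - eta * gradient(w)  # the gradient descent update rule

print(w)        # very close to 3, the point of convergence
print(cost(w))  # very close to 0, the minimum cost
```

Each update moves w against the slope, so the distance to the minimum shrinks by a constant factor (0.8 with these values) at every step.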
Intuition about derivatives
The derivative of a function f(x) with respect to x is the rate of change of f(x) with respect to a change in x.
Derivative of a function = rate of change = slope
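One way to build intuition is to estimate the slope numerically: nudge x by a tiny amount and see how much f(x) changes. The sketch below does this for the example function f(x) = x^2 (chosen here for illustration), whose exact derivative is 2x:

```python
def f(x):
    """Example function: f(x) = x^2."""
    return x ** 2

def numerical_derivative(func, x, h=1e-5):
    """Estimate the slope at x from the change in func over a tiny interval."""
    return (func(x + h) - func(x - h)) / (2 * h)

slope = numerical_derivative(f, 3.0)
print(slope)  # approximately 6, matching the exact derivative 2 * 3
```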
Computation graph
The computation graph is a directed graph used for expressing and evaluating mathematical expressions.
For example, let us consider the equation
L = (a + 6) * (b+1)
This equation can be represented by the given computation graph
Evaluating the expression means performing its operations by traversing the computation graph in the forward direction, from the inputs to the output. This is called forward propagation.
In neural networks, forward propagation calculates and stores the intermediate variables (such as the value Z and the activation output of each layer) in order from the input layer to the output layer.
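For the example equation L = (a + 6) * (b + 1), forward propagation can be sketched as follows. The intermediate node names `u` and `v` and the input values are assumptions introduced for this example:

```python
def forward(a, b):
    """Evaluate L = (a + 6) * (b + 1) node by node, keeping intermediate values."""
    u = a + 6  # first intermediate node
    v = b + 1  # second intermediate node
    L = u * v  # output node
    return u, v, L

u, v, L = forward(a=2, b=3)
print(u, v, L)  # 8 4 32
```

The intermediate values are returned along with the result because they are needed again when computing derivatives.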
Backpropagation:-
The backpropagation algorithm is used to find the derivative of an expression with respect to its variables using the computation graph. This technique is used with gradient descent to update the parameters and thereby reduce the cost function. Backpropagation is performed by traversing the computation graph in reverse order, finding the derivative at each node with respect to its adjacent nodes and combining them with the chain rule.
Let's apply backpropagation to the computation graph from the previous section to find the derivatives of L with respect to a and b.
So, by backpropagation we can find the derivatives of L from the computation graph: dL/da = b + 1 and dL/db = a + 6.
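The same derivatives can be worked out in code by walking the graph of L = (a + 6) * (b + 1) in reverse and applying the chain rule. As before, the intermediate names `u = a + 6` and `v = b + 1` and the input values are assumptions made for this sketch:

```python
def forward(a, b):
    """Forward pass: compute and keep the intermediate nodes."""
    u = a + 6
    v = b + 1
    return u, v, u * v

def backward(a, b):
    """Backward pass: derivatives of L with respect to a and b via the chain rule."""
    u, v, _ = forward(a, b)
    dL_du = v  # L = u * v, so dL/du = v
    dL_dv = u  # L = u * v, so dL/dv = u
    du_da = 1  # u = a + 6, so du/da = 1
    dv_db = 1  # v = b + 1, so dv/db = 1
    dL_da = dL_du * du_da  # chain rule: dL/da = dL/du * du/da
    dL_db = dL_dv * dv_db  # chain rule: dL/db = dL/dv * dv/db
    return dL_da, dL_db

dL_da, dL_db = backward(a=2, b=3)
print(dL_da, dL_db)  # 4 8
```

With a = 2 and b = 3 this gives dL/da = b + 1 = 4 and dL/db = a + 6 = 8.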
Build a Neural Network with One hidden Layer
Let's build a neural network with a hidden layer and compute the output.
Here {x1, x2, x3, ..., xn} are the input features, and w1 and b1 are the weights and biases of the hidden layer. The output of the hidden layer after applying the activation function is represented by the vector a1.
For the output layer, w2 and b2 are the weights and biases respectively, and a2 is the output of the output layer after applying the activation function.
First, the computation graph of the neural network is constructed. Then forward propagation is performed to find the cost (L). After that, backward propagation is performed to find the gradients of the cost with respect to the parameters.
A single forward and backward pass over the entire training set is known as an epoch.
After computing the gradients, gradient descent is used to update the parameters.
We keep updating the parameters until the cost function is minimized. Finally, we get an optimized neural network model.
Steps performed to build the model
- Step 1: Initialize the parameters with random values
- Step 2: Repeat steps 3 to 6 until the cost is minimized
- Step 3: Perform forward propagation to find the output
- Step 4: Compute the cost
- Step 5: Perform backpropagation to find the gradients
- Step 6: Update the parameters
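The steps above can be sketched in Python with NumPy. This is a minimal illustration under several assumptions, not a definitive implementation: a made-up toy dataset (logical OR of two inputs), three hidden neurons, sigmoid activations in both layers, the cross-entropy cost, a fixed number of epochs instead of a convergence check, and illustrative function names.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, params):
    """Forward propagation: hidden layer output A1, then network output A2."""
    W1, b1, W2, b2 = params
    A1 = sigmoid(W1 @ X + b1)
    A2 = sigmoid(W2 @ A1 + b2)
    return A1, A2

def compute_cost(A2, Y):
    """Cross-entropy cost averaged over all training examples."""
    return float(-np.mean(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)))

def backward(X, Y, A1, A2, params):
    """Backpropagation: gradients of the cost with respect to each parameter."""
    W1, b1, W2, b2 = params
    m = X.shape[1]
    dZ2 = A2 - Y  # combined derivative of cross-entropy and sigmoid output
    dW2 = dZ2 @ A1.T / m
    db2 = dZ2.sum(axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * A1 * (1 - A1)  # chain rule through the hidden sigmoid
    dW1 = dZ1 @ X.T / m
    db1 = dZ1.sum(axis=1, keepdims=True) / m
    return dW1, db1, dW2, db2

# Toy data (made up): each column is one example, label is logical OR
X = np.array([[0.0, 0.0, 1.0, 1.0],
              [0.0, 1.0, 0.0, 1.0]])
Y = np.array([[0.0, 1.0, 1.0, 1.0]])

# Step 1: random initial weights (biases start at zero, a common choice)
rng = np.random.default_rng(0)
params = [rng.normal(scale=0.5, size=(3, 2)), np.zeros((3, 1)),
          rng.normal(scale=0.5, size=(1, 3)), np.zeros((1, 1))]

eta = 0.5  # learning rate
costs = []
for epoch in range(2000):                            # Step 2: repeat
    A1, A2 = forward(X, params)                      # Step 3: forward propagation
    costs.append(compute_cost(A2, Y))                # Step 4: compute the cost
    grads = backward(X, Y, A1, A2, params)           # Step 5: backpropagation
    params = [p - eta * g for p, g in zip(params, grads)]  # Step 6: update

print("initial cost:", costs[0])
print("final cost:  ", costs[-1])
```

Because the whole training set is used in every pass, each loop iteration here corresponds to one epoch.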
This gives us a trained neural network model.
These are the basic steps for building a neural network model. Much more complex neural networks can be built by adding more layers, changing the activation function, increasing the number of epochs, and so on. Experiment with the hyperparameters to build a neural network suited to your problem.
If you like this article, please leave a like or a comment.