How does a neural network work? Let's figure it out


Petro Liashchynskyi

Posted on January 19, 2019


Hey, what's up 😁 In my previous article I described how to build a neural network from scratch with only JavaScript. Today, at the request of several people, I'll try to explain the mathematical principles behind neural networks. Bro, you'll finally understand what's under the hood of that monster!

And first, I'm gonna tell you a secret: there's no magic, only math 😵

This article is based on my previous one. If you haven't read it yet, it's time to do that! I will use the same formulas and try to explain them. Let's go!

Preparation

I'm gonna solve XOR again 😅 It's not a joke, bro! Many data science books start with it 😎 One more time, here's the XOR truth table.

Input 1  Input 2  Output
0        0        0
0        1        1
1        0        1
1        1        0

To demonstrate it, let's use the following neural network structure.

[Figure: network structure with 2 input neurons, 4 hidden neurons, and 1 output neuron]

Here we have 2 neurons in the input layer, 4 in the hidden layer, and 1 in the output layer.

Weight initialization

The main goal of neural network training is to adjust the weights so that the output error is minimized. In most cases, the weights are initialized randomly and then adjusted during training by the backpropagation algorithm.

So, let's initialize the weights randomly from the [0, 1] range.

[Figure: the randomly initialized weights]

Graphically, it looks like this.

[Figure: the network with the initialized weights]
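
Just to make this concrete, here is what such random initialization could look like in JavaScript. It's only a sketch for our 2-4-1 network; the variable names are mine, not taken from the previous article.

// weightsInputHidden[i][j] is the weight from input neuron i to hidden neuron j
const weightsInputHidden = Array.from({ length: 2 }, () =>
  Array.from({ length: 4 }, () => Math.random()) // random value from [0, 1)
);

// weightsHiddenOutput[j] is the weight from hidden neuron j to the output neuron
const weightsHiddenOutput = Array.from({ length: 4 }, () => Math.random());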

Forward propagation

Ok, let's compute the neuron inputs. To save time I'll use only one input case: 0 and 1, so the expected output is 1.

The formula:
net_j = Σ (x_i * w_ij), i = 1..n

So, for the first neuron in the hidden layer:

net1_h = 0 * 0.2 + 1 * 0.6 = 0.6

/**
 * i = 1..n, n = 2 (2 neurons in the input layer)
 *
 * 0   is the value of the first input
 * 1   is the value of the second input
 *
 * 0.2 is the weight from the first input neuron to the first hidden neuron
 * 0.6 is the weight from the second input neuron to the first hidden neuron
 *
 * Understand, bro? 😏
 */


For the second one and the rest:

net2_h = 0 * 0.5 + 1 * 0.7 = 0.7
net3_h = 0 * 0.4 + 1 * 0.9 = 0.9
net4_h = 0 * 0.8 + 1 * 0.3 = 0.3

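The same computation as a JavaScript sketch, with the weights from our example hard-coded for clarity:

const input = [0, 1];

// weightsInputHidden[i][j] is the weight from input neuron i to hidden neuron j
const weightsInputHidden = [
  [0.2, 0.5, 0.4, 0.8], // from the first input
  [0.6, 0.7, 0.9, 0.3], // from the second input
];

// net_j = Σ input_i * weight_ij
const hiddenNets = [0, 1, 2, 3].map(j =>
  input.reduce((sum, x, i) => sum + x * weightsInputHidden[i][j], 0)
);

console.log(hiddenNets); // [0.6, 0.7, 0.9, 0.3]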

Now we need one more thing: to choose an activation function. I'll use the sigmoid.
[Figure: the sigmoid curve]

The formula and derivative:

f(x) = 1 / (1 + exp(-x))
deriv(x) = f(x) * (1 - f(x)) 

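In JavaScript both of them are one-liners:

const sigmoid = x => 1 / (1 + Math.exp(-x));

// the derivative expressed through the function itself: f'(x) = f(x) * (1 - f(x))
const sigmoidDeriv = x => sigmoid(x) * (1 - sigmoid(x));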

So now we apply the activation function to each computed net:

output1_h = f(net1_h) = f(0.6) = 0.64
output2_h = f(net2_h) = f(0.7) = 0.66
output3_h = f(net3_h) = f(0.9) = 0.71
output4_h = f(net4_h) = f(0.3) = 0.57


We've got the output values for each neuron in the hidden layer. Graphically, it looks like this:

[Figure: the hidden layer outputs]

Now that we've got the output values for the hidden layer neurons, we can calculate the output value of the output layer.

net_o = 0.64 * 0.6 + 0.66 * 0.7 + 0.71 * 0.3 + 0.57 * 0.4 = 1.28
output_o = f(net_o) = f(1.28) = 0.78

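Here is the whole output-layer step as a JavaScript sketch, with the hidden outputs and the hidden-to-output weights hard-coded from the values above:

const sigmoid = x => 1 / (1 + Math.exp(-x));

const hiddenOutputs = [0.64, 0.66, 0.71, 0.57];
const weightsHiddenOutput = [0.6, 0.7, 0.3, 0.4];

// net_o is the weighted sum of the hidden outputs
const netOut = hiddenOutputs.reduce(
  (sum, h, j) => sum + h * weightsHiddenOutput[j],
  0
); // ≈ 1.28

const outputO = sigmoid(netOut); // ≈ 0.78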

And here we go.

[Figure: the network output, 0.78]

Backpropagation

Bro, look at the output value. What do you see? 0.78, right? If you remember the XOR table, you know we should have got 1 for the case 0 1, but we've got 0.78. That difference is called the error. Let's calculate it.

Output error and delta

The formula:

error = target - output

target = 1
error = target - output_o = 1 - 0.78 = 0.22


Now we need to calculate the delta error. Roughly speaking, that's the value we'll use to adjust the weights.

The formula:

delta_error = deriv(output) * error

You can use this site for sigmoid derivative calculation.

delta_error = deriv(output_o) * error = deriv(0.78) * 0.22 = 0.21 * 0.22 = 0.04

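The same thing as a few lines of JavaScript. As above, the sketch applies the sigmoid derivative directly to the output value:

const sigmoid = x => 1 / (1 + Math.exp(-x));
const sigmoidDeriv = x => sigmoid(x) * (1 - sigmoid(x));

const target = 1;
const outputO = 0.78; // the value from the forward pass

const error = target - outputO;                   // 0.22
const deltaError = sigmoidDeriv(outputO) * error; // ≈ 0.047, truncated to 0.04 in the text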

Hidden error and delta

Let's do the same for each neuron in the hidden layer. The formula is a little bit different.

error_j = delta_error * w_j, where w_j is the weight from hidden neuron j to the output neuron

We need to calculate the error for each neuron. Remember it, bro. Let's get started!

error1_h = delta_error * 0.6 = 0.04 * 0.6 = 0.024
error2_h = delta_error * 0.7 = 0.04 * 0.7 = 0.028
error3_h = delta_error * 0.3 = 0.04 * 0.3 = 0.012
error4_h = delta_error * 0.4 = 0.04 * 0.4 = 0.016


And again the delta!

delta_error_j = deriv(output_j) * error_j

delta_error1_h = deriv(output1_h) * error1_h = deriv(0.64) * 0.024 = 0.22 * 0.024 = 0.005
delta_error2_h = deriv(output2_h) * error2_h = deriv(0.66) * 0.028 = 0.224 * 0.028 = 0.006
delta_error3_h = deriv(output3_h) * error3_h = deriv(0.71) * 0.012 = 0.220 * 0.012 = 0.002
delta_error4_h = deriv(output4_h) * error4_h = deriv(0.57) * 0.016 = 0.23 * 0.016 = 0.003

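And the same calculations as a JavaScript sketch, with the values from the previous steps hard-coded:

const sigmoid = x => 1 / (1 + Math.exp(-x));
const sigmoidDeriv = x => sigmoid(x) * (1 - sigmoid(x));

const deltaError = 0.04;                          // output delta from above
const weightsHiddenOutput = [0.6, 0.7, 0.3, 0.4]; // hidden-to-output weights
const hiddenOutputs = [0.64, 0.66, 0.71, 0.57];

// error of a hidden neuron: the output delta weighted by its synapse to the output
const hiddenErrors = weightsHiddenOutput.map(w => deltaError * w);
// [0.024, 0.028, 0.012, 0.016]

// delta of a hidden neuron
const hiddenDeltas = hiddenOutputs.map(
  (out, j) => sigmoidDeriv(out) * hiddenErrors[j]
);
// ≈ [0.005, 0.006, 0.002, 0.003] after truncating to three decimals, as in the text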

The time has come! 😎

Now we have all the variables we need to update the weights. The formula looks like this.

new_weight = old_weight + input_value * delta * learning_rate

Here input_value is the output of the neuron the synapse starts from, and delta is the delta of the neuron it points to.

Let's start with the hidden-to-output weights.

learning_rate = 0.001

hidden_to_output_1 = old_weight + output1_h * delta_error * learning_rate = 0.6 + 0.64 * 0.04 * 0.001 = 0.6000256
hidden_to_output_2 = old_weight + output2_h * delta_error * learning_rate = 0.7 + 0.66 * 0.04 * 0.001 = 0.7000264
hidden_to_output_3 = old_weight + output3_h * delta_error * learning_rate = 0.3 + 0.71 * 0.04 * 0.001 = 0.3000284
hidden_to_output_4 = old_weight + output4_h * delta_error * learning_rate = 0.4 + 0.57 * 0.04 * 0.001 = 0.4000228

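In code, this update is a simple map over the hidden-to-output weights (again a sketch with hard-coded values):

const learningRate = 0.001;

const weightsHiddenOutput = [0.6, 0.7, 0.3, 0.4];
const hiddenOutputs = [0.64, 0.66, 0.71, 0.57];
const deltaError = 0.04;

// new_weight = old_weight + hidden_output * output_delta * learning_rate
const updatedHiddenOutput = weightsHiddenOutput.map(
  (w, j) => w + hiddenOutputs[j] * deltaError * learningRate
);
// ≈ [0.6000256, 0.7000264, 0.3000284, 0.4000228]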

The new values are very close to the old weights because we chose a very small learning rate. It's a very important hyperparameter. If you choose it too small, your network will be training for years 😄 If it's too large, your network will train faster, but its accuracy on new data may suffer. So you have to choose it carefully. A commonly used range is between 1e-3 and 2e-5.

Ok, let's do the same for the input-to-hidden synapses.

//for the first hidden neuron
input_to_hidden_1 = old_weight + input_0 * delta_error1_h * learning_rate = 0.2 + 0 * 0.005 * 0.001 = 0.2
input_to_hidden_2 = old_weight + input_1 * delta_error1_h * learning_rate = 0.6 + 1 * 0.005 * 0.001 = 0.600005

//for the second one
input_to_hidden_3 = old_weight + input_0 * delta_error2_h * learning_rate = 0.5 + 0 * 0.006 * 0.001 = 0.5
input_to_hidden_4 = old_weight + input_1 * delta_error2_h * learning_rate = 0.7 + 1 * 0.006 * 0.001 = 0.700006

//for the third one
input_to_hidden_5 = old_weight + input_0 * delta_error3_h * learning_rate = 0.4 + 0 * 0.002 * 0.001 = 0.4
input_to_hidden_6 = old_weight + input_1 * delta_error3_h * learning_rate = 0.9 + 1 * 0.002 * 0.001 = 0.900002

//for the fourth one
input_to_hidden_7 = old_weight + input_0 * delta_error4_h * learning_rate = 0.8 + 0 * 0.003 * 0.001 = 0.8
input_to_hidden_8 = old_weight + input_1 * delta_error4_h * learning_rate = 0.3 + 1 * 0.003 * 0.001 = 0.300003

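And the input-to-hidden update as a nested map over the weight matrix, same idea:

const learningRate = 0.001;

const input = [0, 1];
const weightsInputHidden = [
  [0.2, 0.5, 0.4, 0.8], // from the first input
  [0.6, 0.7, 0.9, 0.3], // from the second input
];
const hiddenDeltas = [0.005, 0.006, 0.002, 0.003];

// new_weight = old_weight + input_value * hidden_delta * learning_rate
const updatedInputHidden = weightsInputHidden.map((row, i) =>
  row.map((w, j) => w + input[i] * hiddenDeltas[j] * learningRate)
);
// the first row is unchanged (the first input is 0),
// the second row becomes ≈ [0.600005, 0.700006, 0.900002, 0.300003]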

That's it! Finally 😉

Conclusions

Oh, finally we're done with all the math stuff! But we only did it for one training sample: 0 and 1. For the problem we're solving (XOR) there are 4 training samples (see the table above). That means you have to repeat the same calculations for each of them! Brrr, that's terrible 😑 Too much math 😆

So, in machine learning, one forward propagation step (from the input layer to the output) plus one backward step (from the output layer to the input) for a single training sample is called an iteration. Another important term is epoch. An epoch is completed when all the training samples have passed through the network. In our case we have 4 training samples, so 4 iterations equal 1 epoch. Understand, bro? 🤗 In general, more epochs mean higher accuracy, fewer epochs mean lower accuracy.
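
To make the terminology concrete, here is a sketch of what the training loop could look like. trainOnSample is a hypothetical helper that runs one forward and one backward pass for a single sample, i.e. one iteration:

// hypothetical helper: one forward + one backward pass, as described above
const trainOnSample = (input, target) => { /* ... */ };

const trainingData = [
  { input: [0, 0], target: 0 },
  { input: [0, 1], target: 1 },
  { input: [1, 0], target: 1 },
  { input: [1, 1], target: 0 },
];

const epochs = 10000; // a hypothetical number of epochs

for (let epoch = 0; epoch < epochs; epoch++) {
  for (const sample of trainingData) {
    trainOnSample(sample.input, sample.target); // one iteration
  }
  // all 4 samples have passed through the network: one more epoch is done
}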

That's it. No magic, only math. Hope you've understood it, bro 😊 See ya! Happy coding 😇
