Build your first Neural Network with the Keras API
Muiz Alvi
Posted on December 29, 2020
Objective
This tutorial will help programmers build, train and test their first neural network using the powerful Keras Python library.
Table of Contents
Introduction
About Neural Networks and Keras
Github code
Problem Statement
Generating Dataset
Building a Sequential Model
Training the Model
Testing the Model using Predictions
Plotting Predictions using Confusion Matrix
Final Code
Conclusion
Introduction
In pursuit of learning about the field of artificial intelligence, many come across the term 'Neural Networks'. They realize the importance of these algorithms and their application in the field of deep learning, but often face difficulty building their own.
This tutorial will not only show you how to build a neural network from scratch, but will also walk you through the code for training and testing your model, all while solving an actual deep learning problem in the process!
Note: The Keras API, along with thousands of other Python libraries, can be accessed from the Anaconda Navigator. You can learn all about acquiring Anaconda and installing the Keras API in the Setting up Python environments using Anaconda tutorial.
About Neural Networks and Keras
Artificial Neural Networks (ANNs), or simply Neural Networks, are a series of algorithms modeled after the biological activity of the human brain. Neural networks are composed of layers, and each layer consists of nodes (also called neurons).
You can learn more about Neural Networks from the following video created by DeepLizard:
Keras is a Python library that uses TensorFlow as its backend. This library allows us to build, train and test models effectively. It also allows us to make use of its pre-existing models. You can learn more about the deep learning API from the Keras website.
Github Code
The following problem statement, along with the code for this blog, is available on my GitHub profile, compiled in a Jupyter Notebook titled First Neural Network with Keras API. You can view and make use of the code to your liking. I encourage you to read through this blog as well for a better explanation of the written code.
Problem Statement
An experimental drug was tested on 2,100 individuals in a clinical trial. The ages of participants ranged from thirteen to a hundred. Half of the participants were under 65 years of age; the other half were 65 years or older.
Ninety-five percent of patients that were 65 years or older experienced side effects, while ninety-five percent of patients under 65 years of age experienced no side effects.
You have to build a program that takes the age of a participant as input and predicts whether that patient suffered from a side effect or not.
Steps:
Generate a random dataset that adheres to these statements
Divide the dataset into Training (90%) and Validation (10%) sets
Build a Simple Sequential Model
Train and Validate the Model using the dataset
Randomly generate a Test set (roughly 20% the size of the dataset)
Plot predictions made by the Model on the Test set
Generating Dataset
First we import some of the libraries needed for generating the dataset.
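These are the same imports that appear in the final code at the end of this post:

import numpy as np                              # arrays and numerical operations
from random import randint                      # random integer ages
from sklearn.utils import shuffle               # shuffling samples and labels together
from sklearn.preprocessing import MinMaxScaler  # scaling ages into a fixed range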
We import numpy because all of our variables are n-dimensional arrays; the remaining libraries will be used for randomizing, shuffling and scaling the data respectively.
Next, we initialize the empty lists for the training samples along with their labels.
train_labels = []
train_samples = []
The train_samples list holds each participant's age and train_labels holds whether they suffered from side effects (denoted by '1') or not (denoted by '0').
We now move towards randomly generating values for both lists.
for i in range(50):
    # The 5% of younger individuals who did experience side effects
    random_younger = randint(13, 64)
    train_samples.append(random_younger)
    train_labels.append(1)

    # The 5% of older individuals who did not experience side effects
    random_older = randint(65, 100)
    train_samples.append(random_older)
    train_labels.append(0)

for i in range(1000):
    # The 95% of younger individuals who did not experience side effects
    random_younger = randint(13, 64)
    train_samples.append(random_younger)
    train_labels.append(0)

    # The 95% of older individuals who did experience side effects
    random_older = randint(65, 100)
    train_samples.append(random_older)
    train_labels.append(1)
The first 50 iterations generate random ages for participants younger than 65 that did suffer from side effects (labeled '1') and participants 65 and older that did not suffer from side effects (labeled '0'). Each random age is generated, placed in the random_younger or random_older variable and then appended to the train_samples list; a corresponding value is appended to the train_labels list. For the next 1000 iterations, a similar approach is taken, but for participants younger than 65 that did not suffer from side effects (labeled '0') and participants 65 and older that did suffer from side effects (labeled '1').
Once the iterations are complete, the lists are converted to arrays, and the data in these arrays is then shuffled.
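As in the final code at the end of this post, the conversion and shuffling look like this:

train_labels = np.array(train_labels)
train_samples = np.array(train_samples)
# shuffle both arrays in unison so each age keeps its matching label
train_labels, train_samples = shuffle(train_labels, train_samples)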
The conversion is required because the calculations are performed on n-dimensional arrays, not lists. The shuffling removes any order imposed on the dataset during the creation process.
The next step requires us to transform and scale data in order to pass it through our model.
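These are the corresponding two lines, taken from the final code at the end of this post:

scaler = MinMaxScaler(feature_range=(0, 1))  # specify the target range: 0 to 1
# fit_transform does not accept 1-D data, so we reshape the samples into a single column first
scaled_train_samples = scaler.fit_transform(train_samples.reshape(-1, 1))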
In the first line we specify the range for the imported MinMaxScaler, and in the next line we scale the values of train_samples to that range while also reshaping the n-dimensional array into a shape appropriate for our model.
Building a Sequential Model
Now we move towards building our neural network. The first step is to import TensorFlow and Keras, along with certain components required for our model. These are all imported from within Keras, which uses a TensorFlow backend.
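The imports, matching the final code at the end of this post, are:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation, Dense
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.metrics import categorical_crossentropy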
Our model is a simple sequential model, meaning that the layers are linearly stacked, hence we import Sequential. From layers we import the fully-connected or Dense layer along with the activation functions applied to the data sent to the nodes. An optimizer is also required to help minimize the loss, which is a crucial step in a neural network's training. Finally, categorical crossentropy (in its sparse form, since our labels are plain integers rather than one-hot vectors) is the loss function that we will be using for our model.
If you are having difficulty understanding any of these terms, it is a good idea to click on them, as this will take you directly to the documentation. There is no need to worry, however, as these things take time and improve with practice.
Let us now build our model. I am creating a model with one input layer (16 units), one hidden layer (32 units) and one output layer (2 units). Deciding on the number of layers and units differs from problem to problem and improves over time with practice.
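Here is the model definition as it appears in the final code at the end of this post:

model = Sequential([
    Dense(units=16, input_shape=(1,), activation='relu'),  # input layer
    Dense(units=32, activation='relu'),                     # hidden layer
    Dense(units=2, activation='softmax')                    # output layer, one unit per label
])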
This code is pretty simple: we are creating a sequential model with three dense layers. The input_shape argument is a tuple matching the shape of a single input sample, hence (1,), since each sample is just one scaled age. 'relu' and 'softmax' are types of activation functions.
Here is what the summary of the model should look like:
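You can print it with model.summary(). For the layer sizes above, the parameter counts work out to 32, 544 and 66 (642 in total); the exact layer names and layout depend on your TensorFlow version and session, so your output may look roughly like this:

model.summary()
# Layer (type)                 Output Shape              Param #
# =================================================================
# dense (Dense)                (None, 16)                32
# dense_1 (Dense)              (None, 32)                544
# dense_2 (Dense)              (None, 2)                 66
# =================================================================
# Total params: 642
# Trainable params: 642
# Non-trainable params: 0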
Training the Model
To train our model we simply use the model.compile() function followed by the model.fit() function.
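These two calls, as they appear in the final code at the end of this post, are:

model.compile(optimizer=Adam(learning_rate=0.0001),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x=scaled_train_samples, y=train_labels, validation_split=0.1,
          batch_size=10, epochs=30, shuffle=True, verbose=2)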
The compile function pieces our model together, while the fit function begins the training process. We specify the 'accuracy' metric because we want to see the accuracy of our model during training. The validation_split parameter automatically splits the dataset into training and validation sets; '0.1' means that 10% goes to the validation set and the remaining 90% goes to the training set. The dataset is also split into batches of 10 (batch_size) and passed through the model 30 times; each complete pass is referred to as an 'epoch'. Finally, verbose controls how detailed the training output is; setting it to '2' prints one summary line per epoch.
Preprocessing Test Data
This is similar to the data scaling and transforming done for the training and validation data sets.
First we initialize the test lists.
test_labels = []
test_samples = []
One list holds the ages and the other holds the labels ('1' if the participant suffered from side effects, '0' if not).
Now we randomly generate values for both lists.
for i in range(10):
    # The 5% of younger individuals who did experience side effects
    random_younger = randint(13, 64)
    test_samples.append(random_younger)
    test_labels.append(1)

    # The 5% of older individuals who did not experience side effects
    random_older = randint(65, 100)
    test_samples.append(random_older)
    test_labels.append(0)

for i in range(200):
    # The 95% of younger individuals who did not experience side effects
    random_younger = randint(13, 64)
    test_samples.append(random_younger)
    test_labels.append(0)

    # The 95% of older individuals who did experience side effects
    random_older = randint(65, 100)
    test_samples.append(random_older)
    test_labels.append(1)
The first 10 iterations generate random ages for participants younger than 65 that did suffer from side effects (labeled '1') and participants 65 and older that did not suffer from side effects (labeled '0'). Each random age is generated, placed in the random_younger or random_older variable and then appended to the test_samples list; a corresponding value is appended to the test_labels list. For the next 200 iterations, a similar approach is taken, but for participants younger than 65 that did not suffer from side effects (labeled '0') and participants 65 and older that did suffer from side effects (labeled '1').
We will now convert the lists to numpy arrays and shuffle, similar to what we did with the training/validation set.
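As with the training data, and matching the final code at the end of this post:

test_labels = np.array(test_labels)
test_samples = np.array(test_samples)
test_labels, test_samples = shuffle(test_labels, test_samples)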
We will also scale, transform and reshape our data to make it appropriate for our model. Again this is similar to the process done for the training/validation set.
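Again matching the final code at the end of this post:

scaled_test_samples = scaler.fit_transform(test_samples.reshape(-1, 1))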
In order to test our model we will make use of the predict() function, which takes each individual test age and outputs the probability of it belonging to either label. The np.argmax function is then used to keep only the label with the higher probability and discard the other.
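A minimal sketch of this step (the batch_size and verbose arguments here are my own choices, not from the original post):

predictions = model.predict(x=scaled_test_samples, batch_size=10, verbose=0)  # probability of each label for every test sample
rounded_predictions = np.argmax(predictions, axis=-1)                         # keep the index of the more probable label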
In order to plot the results, I have used a confusion matrix. The code can be found on the scikit-learn website here. Simply copy the code from the website and run it.
Now make use of appropriate labels and plot the matrix.
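As in the final code at the end of this post:

from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_true=test_labels, y_pred=rounded_predictions)
cm_plot_labels = ['no_side_effects', 'had_side_effects']
plot_confusion_matrix(cm=cm, classes=cm_plot_labels, title='Confusion Matrix')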
And that's it! You've successfully created your first Neural Network that actually solves a real problem!
Final Code
Now that we're done with all the steps, your code should look something like this.
import numpy as np
from random import randint
from sklearn.utils import shuffle
from sklearn.preprocessing import MinMaxScaler

train_labels = []   # one means side effect experienced, zero means no side effect experienced
train_samples = []

for i in range(50):
    # The 5% of younger individuals who did experience side effects
    random_younger = randint(13, 64)
    train_samples.append(random_younger)
    train_labels.append(1)

    # The 5% of older individuals who did not experience side effects
    random_older = randint(65, 100)
    train_samples.append(random_older)
    train_labels.append(0)

for i in range(1000):
    # The 95% of younger individuals who did not experience side effects
    random_younger = randint(13, 64)
    train_samples.append(random_younger)
    train_labels.append(0)

    # The 95% of older individuals who did experience side effects
    random_older = randint(65, 100)
    train_samples.append(random_older)
    train_labels.append(1)

train_labels = np.array(train_labels)
train_samples = np.array(train_samples)
# randomly shuffles both arrays in unison, removing any order imposed on the data set during the creation process
train_labels, train_samples = shuffle(train_labels, train_samples)

scaler = MinMaxScaler(feature_range=(0, 1))  # specifying scale (range: 0 to 1)
# transforms our data scale (range: 13 to 100) into the one specified above (range: 0 to 1);
# we use the reshape function as fit_transform does not accept 1-D data by default, hence we need to reshape accordingly here
scaled_train_samples = scaler.fit_transform(train_samples.reshape(-1, 1))

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation, Dense
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.metrics import categorical_crossentropy

model = Sequential([
    Dense(units=16, input_shape=(1,), activation='relu'),
    Dense(units=32, activation='relu'),
    Dense(units=2, activation='softmax')
])

model.compile(optimizer=Adam(learning_rate=0.0001),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x=scaled_train_samples, y=train_labels, validation_split=0.1,
          batch_size=10, epochs=30, shuffle=True, verbose=2)

test_labels = []
test_samples = []

for i in range(10):
    # The 5% of younger individuals who did experience side effects
    random_younger = randint(13, 64)
    test_samples.append(random_younger)
    test_labels.append(1)

    # The 5% of older individuals who did not experience side effects
    random_older = randint(65, 100)
    test_samples.append(random_older)
    test_labels.append(0)

for i in range(200):
    # The 95% of younger individuals who did not experience side effects
    random_younger = randint(13, 64)
    test_samples.append(random_younger)
    test_labels.append(0)

    # The 95% of older individuals who did experience side effects
    random_older = randint(65, 100)
    test_samples.append(random_older)
    test_labels.append(1)

test_labels = np.array(test_labels)
test_samples = np.array(test_samples)
test_labels, test_samples = shuffle(test_labels, test_samples)

scaled_test_samples = scaler.fit_transform(test_samples.reshape(-1, 1))

predictions = model.predict(x=scaled_test_samples, batch_size=10, verbose=0)  # probability of each label for every test sample
rounded_predictions = np.argmax(predictions, axis=-1)                         # keep the more probable label

from sklearn.metrics import confusion_matrix
import itertools
import matplotlib.pyplot as plt

cm = confusion_matrix(y_true=test_labels, y_pred=rounded_predictions)

# This function has been taken from the website of scikit-learn.
# link: https://scikit-learn.org/0.18/auto_examples/model_selection/plot_confusion_matrix.html
def plot_confusion_matrix(cm, classes, normalize=False, title='Confusion matrix', cmap=plt.cm.Blues):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)

    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized confusion matrix")
    else:
        print('Confusion matrix, without normalization')

    print(cm)

    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, cm[i, j],
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')

cm_plot_labels = ['no_side_effects', 'had_side_effects']
plot_confusion_matrix(cm=cm, classes=cm_plot_labels, title='Confusion Matrix')
This code is also available on my GitHub profile, compiled in a Jupyter Notebook titled First Neural Network with Keras API. Feel free to make use of the code to your liking and to suggest improvements there as well.
Conclusion
You should now have a good idea about how Neural Networks are built, trained, validated and tested. You can also check out other cool deep learning models in the following GitHub repository:
Machine_Learning_and_Deep_Learning_models
Repository containing models based on ideas of Machine learning and Deep learning. List of files:
Simple Sequential Model
Uses a randomly generated training set (10% of which is used as the validation set) and test data
Shows final predictions in a confusion matrix
Cat and Dog Classifier - Convolutional Neural Network
Uses a data set of 1,300 images (1,000 for the training set, 200 for the validation set, 100 for the test set) randomly picked out of a larger data set of 25,000 images
I hope the tutorial was clear and covered everything. Please use the discussion/comment section to let me know if you faced any difficulty or if any step is unclear. Thank you for taking the time to read this article!