Using Transfer Learning and TensorFlow to Identify Dog Breeds from Images
Nicolas Vallée
Posted on March 30, 2022
What are we building here?
In this project, we're using Machine Learning to identify different breeds of dogs from images.
To do this, we'll use data from the Kaggle dog breed identification competition. The dataset consists of 10,000+ labeled images of 120 different dog breeds.
This type of problem is called multi-class image classification. Multi-class because we're trying to classify multiple dog breeds. If, on the other hand, we wanted to classify dogs versus cats, it would be called binary classification.
I completed this project in March 2022 as part of the Complete Machine Learning & Data Science Bootcamp, taught by Andrei Neagoie and Daniel Bourke. If you're looking for a beginner-friendly course teaching Data Science and Machine Learning from scratch, I highly recommend you check out this one.
Follow this link: Dog Vision Project to open my completed notebook, which you can also open in Colab.
Why is this an interesting topic?
Multi-class image classification is both a common and an important problem in Machine Learning. This is the same kind of technology that Tesla uses for their self-driving cars, or that Airbnb uses to automatically add information to their listings.
How are we going about this?
The first step in a deep learning problem is to get the data ready by turning it into numbers.
We will go through the following workflow:
- Get the data ready (download from Kaggle, store, import).
- Prepare the data (preprocessing, the 3 sets, X & y).
- Choose and fit a model (TensorFlow Hub, tf.keras.applications, TensorBoard, EarlyStopping).
- Evaluate the model (making predictions and comparing them with the ground truth labels).
- Improve the model through experimentation (starting with 1000 images, making sure it works, then increasing the number of images).
- Save, share, and reload the model (once we're happy with the results).
For the preprocessing of our data, we're going to use TensorFlow 2.x. We will turn our data into Tensors (arrays of numbers which can be run on GPUs) and then allow a machine learning model to find patterns between them.
Our machine learning model will be a pretrained deep learning model from TensorFlow Hub.
The process of using a pretrained model and adapting it to a specific problem is called transfer learning. Rather than training our own model from scratch, which could be time consuming and expensive, we will leverage the patterns of another model which has already been trained to classify images.
Getting our workspace ready
Before we get started, we need to:
- Import TensorFlow 2.x
- Import TensorFlow Hub
- Make sure we're using a GPU
import tensorflow as tf
import tensorflow_hub as hub
print("TF version:", tf.__version__)
print("Hub version:", hub.__version__)
# Check for GPU
print("GPU", "is available, we're good to go!"
if tf.config.list_physical_devices("GPU")
else "is not available. Change runtime type to GPU
before proceeding.")
TF version: 2.8.0
Hub version: 0.12.0
GPU is available, we're good to go!
What is a GPU and why do we need one?
A GPU (graphics processing unit) is a computer chip that is faster than a CPU at doing numerical computations.
By default, Colab runs on a computer located on Google's servers which doesn't have a GPU attached to it.
We can fix this by changing the runtime type:
- Go to Runtime.
- Click "Change runtime type".
- Where it says "Hardware accelerator", choose "GPU".
- Click save.
- The runtime will be restarted to activate the new hardware, so we'll have to rerun the above cells.
- If the steps have worked, we should see a print out saying "GPU is available".
To see how much a GPU speeds up computing, Google Colab has a demonstration notebook available.
Getting our data ready
Getting our data ready to be used with a machine learning model is an important step.
There are a few ways to do this. Many of them are detailed in the Google Colab notebook on I/O (input and output).
Since the data we're using is hosted on Kaggle, we could use the Kaggle API.
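For reference, here's a minimal sketch of the Kaggle API route (we'll use the Google Drive method instead). It assumes a kaggle.json API token has been created from your Kaggle account page and uploaded to the Colab runtime; the token location below is illustrative.
# Sketch: downloading the competition data with the Kaggle API
!pip install -q kaggle
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/  # assumes the token was uploaded to the working directory
!chmod 600 ~/.kaggle/kaggle.json
!kaggle competitions download -c dog-breed-identification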
Another method is to upload the data to our Google Drive, mount the drive in this notebook, and import the files.
Mounting Google Drive
# Running this cell will provide a token to link our drive to
# this notebook
from google.colab import drive
drive.mount('/content/drive')
We now see a "drive" folder available under the Files tab.
This means we'll be able to access files in our Google Drive in this notebook.
For this project, I've downloaded the data from Kaggle and uploaded it to my Google Drive as a .zip file under the folder "ML/Dog Vision".
To access it, we need to unzip it.
Note: Running the cell below for the first time could take a while (a couple of minutes is normal). After we've run it once and got the data in our Google Drive, we don't need to run it again.
# Use the '-d' parameter as the destination for where the
# files should go
!unzip "drive/MyDrive/ML/Dog Vision/dog-breed-identification.zip" -d "drive/MyDrive/ML/Dog Vision/"
Accessing the data
Now that the data files are available on our Google Drive, we can start to check them out.
Let's start with labels.csv, which contains all of the image IDs and their associated dog breed (our data and labels).
# Checkout the labels of our data
import pandas as pd
labels_csv = pd.read_csv("drive/MyDrive/ML/Dog Vision/labels.csv")
print(labels_csv.describe())
print(labels_csv.head())
id breed
count 10222 10222
unique 10222 120
top 000bec180eb18c7604dcecc8fe0dba07 scottish_deerhound
freq 1 126
id breed
0 000bec180eb18c7604dcecc8fe0dba07 boston_bull
1 001513dfcb2ffafc82cccf4d8bbaba97 dingo
2 001cdf01b096e06d78e9e5112d419397 pekinese
3 00214f311d5d2247d5dfe4fe24b2303d bluetick
4 0021f9ceb3235effd7fcde7f7538ed62 golden_retriever
Looking at this, we can see there are 10,222 different IDs (meaning 10,222 different images) and 120 different breeds.
Let's figure out how many images there are for each breed.
# How many images are there for each breed?
labels_csv["breed"].value_counts().plot.bar(figsize=(20, 10));
If we draw a line across the middle of the graph, we see there are about 60+ images for each dog breed.
This is a good amount. For some of their vision products, Google recommends a minimum of 10 images per class to get started. And the more images per class available, the more chance a model has to figure out patterns between them.
Let's check out one of the images.
Note: Loading an image file for the first time may take a while as it gets loaded into the runtime memory.
from IPython.display import Image
Image("drive/MyDrive/ML/Dog Vision/train/001513dfcb2ffafc82cccf4d8bbaba97.jpg")
Getting images and their labels
Since we've got the image IDs and their labels in a DataFrame (labels_csv), we'll use it to create:
- A list of filepaths to training images
- An array of all labels
- An array of all unique labels
We'll only create a list of filepaths to images rather than importing them all to begin with. This is because working with filepaths (strings) is more memory-efficient than loading all of the images at once.
# Create pathnames from image ID's
filenames = ["drive/MyDrive/ML/Dog Vision/train/" + fname + ".jpg" for fname in labels_csv["id"]]
# Check the first 10 filenames
filenames[:10]
['drive/MyDrive/ML/Dog Vision/train/000bec180eb18c7604dcecc8fe0dba07.jpg',
'drive/MyDrive/ML/Dog Vision/train/001513dfcb2ffafc82cccf4d8bbaba97.jpg',
'drive/MyDrive/ML/Dog Vision/train/001cdf01b096e06d78e9e5112d419397.jpg',
'drive/MyDrive/ML/Dog Vision/train/00214f311d5d2247d5dfe4fe24b2303d.jpg',
'drive/MyDrive/ML/Dog Vision/train/0021f9ceb3235effd7fcde7f7538ed62.jpg',
'drive/MyDrive/ML/Dog Vision/train/002211c81b498ef88e1b40b9abf84e1d.jpg',
'drive/MyDrive/ML/Dog Vision/train/00290d3e1fdd27226ba27a8ce248ce85.jpg',
'drive/MyDrive/ML/Dog Vision/train/002a283a315af96eaea0e28e7163b21b.jpg',
'drive/MyDrive/ML/Dog Vision/train/003df8b8a8b05244b1d920bb6cf451f9.jpg',
'drive/MyDrive/ML/Dog Vision/train/0042188c895a2f14ef64a918ed9c7b64.jpg']
Now that we've got a list of all the filenames from the ID column of labels_csv, we can compare it to the number of files in our training data directory to see if they line up.
If they do, great. If not, there may have been an issue when unzipping the data. To fix this, we might have to unzip the data again.
# Check if number of filenames matches number of actual image files
import os
if len(os.listdir("drive/MyDrive/ML/Dog Vision/train/")) == len(filenames):
    print("Filenames match actual number of files.")
else:
    print("Filenames do not match actual number of files, check the target directory.")
Filenames match actual number of files.
Let's visualize an image directly from a filepath.
# Check an image directly from a filepath
Image(filenames[9000])
Now that we've got our image filepaths together, let's get the labels.
We'll take them from labels_csv and turn them into a NumPy array.
import numpy as np
labels = labels_csv["breed"].to_numpy() # convert labels column to NumPy array
labels
array(['boston_bull', 'dingo', 'pekinese', ..., 'airedale',
'miniature_pinscher', 'chesapeake_bay_retriever'], dtype=object)
Now, let's compare the number of labels to the number of filenames.
# Check for missing data
if len(labels) == len(filenames):
print("Number of labels matches number of filenames!")
else:
print("Number of labels does not match number of filenames, check data directories")
Number of labels matches number of filenames!
We should have the same number of images and labels.
Finally, since a machine learning model can't take strings as input, we'll have to convert our labels to numbers.
To begin with, we'll find all of the unique dog breed names.
Then, we'll go through the list of labels, compare them to unique breeds, and create a list of booleans indicating which one is the real label (True) and which ones aren't (False).
# Find the unique label values
unique_breeds = np.unique(labels)
len(unique_breeds)
120
The length of unique_breeds should be 120, meaning we're working with images of 120 different breeds of dogs.
Now we'll use unique_breeds to turn our labels array into an array of booleans.
# Turn every label into a boolean array
boolean_labels = [label == unique_breeds for label in labels]
boolean_labels[:2]
[array([False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, True, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False]),
array([False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, True, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False])]
Why do it like this?
An important concept in machine learning is converting our data to numbers before passing it to a machine learning model.
In this case, we've transformed a single dog breed name such as boston_bull into a one-hot array.
Let's see an example.
# Example: Turning a boolean array into integers
print(labels[0]) # original label
print(np.where(unique_breeds == labels[0])[0][0]) # index where label occurs
print(boolean_labels[0].argmax()) # index where label occurs in boolean array
print(boolean_labels[0].astype(int)) # there will be a 1 where the sample label occurs
boston_bull
19
19
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0]
Now that we've got our labels in a numeric format and our image filepaths easily accessible (although they aren't numeric yet), let's split our data.
Creating our own validation set
Since the dataset from Kaggle doesn't come with a validation set (a split of the data we can test our model on before making final predictions on the test set), let's make one.
We could use Scikit-Learn's train_test_split function or we could simply make manual splits of the data.
For accessibility later, let's save our filenames variable to X (data) and our labels to y.
# Setup X and y variables
X = filenames
y = boolean_labels
Since we're working with 10,000+ images, it's a good idea to start with a portion of them to make sure things are working before training on them all.
This is because computing with 10,000+ images could take a fairly long time. And our goal when working through machine learning projects is to reduce the time between experiments.
Let's start experimenting with 1,000 images and increase it as we need.
# Set number of images to use for experimenting
NUM_IMAGES = 1000
Now, let's split our data into training and validation sets. We'll use an 80/20 split (80% training data, 20% validation data).
# Import train_test_split from Scikit-Learn
from sklearn.model_selection import train_test_split
# Split them into training and validation sets of total size NUM_IMAGES
X_train, X_val, y_train, y_val = train_test_split(X[:NUM_IMAGES],
y[:NUM_IMAGES],
test_size=0.2,
random_state=42)
len(X_train), len(y_train), len(X_val), len(y_val)
(800, 800, 200, 200)
# Let's look at the training data
X_train[:2], y_train[:2]
(['drive/MyDrive/ML/Dog Vision/train/00bee065dcec471f26394855c5c2f3de.jpg',
'drive/MyDrive/ML/Dog Vision/train/0d2f9e12a2611d911d91a339074c8154.jpg'],
[array([False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, True,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False]),
array([False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, True, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False])])
Preprocessing images (turning images into Tensors)
Our labels are in numeric format but our images are still just filepaths.
Since we're using TensorFlow, our data has to be in the form of Tensors.
A Tensor is a way to represent information in numbers. A Tensor can be thought of as being similar to a NumPy array, except with the special ability to be used on a GPU.
Because of how TensorFlow stores information (in Tensors), it allows machine learning and deep learning models to be run on GPUs (generally faster at numerical computing).
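As a tiny illustration (the values here are made up), we can create a Tensor and check which device it's placed on:
# A small Tensor and the device it lives on
t = tf.constant([[1.0, 2.0], [3.0, 4.0]])
print(t.shape, t.dtype)  # (2, 2) <dtype: 'float32'>
print(t.device)          # ends in GPU:0 when a GPU is available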
To preprocess our images into Tensors, we're going to write a function which does a few things:
- Take an image filepath as input.
- Use TensorFlow to read the file and save it to a variable, image.
- Turn our image (a jpeg file) into Tensors.
- Normalize our image (convert color channel values from 0-255 to 0-1).
- Resize the image to be of shape (224, 224).
- Return the modified image.
A good place to read about this type of function is the TensorFlow documentation on loading images.
Why is the shape (224, 224), which is (height, width)? This is to match the size of input our model takes. As we'll see later, our model will take as input an image which is (224, 224, 3).
Here, 3 is the number of color channels per pixel: red, green, and blue.
Let's make this a little more concrete.
# Convert image to NumPy array
from matplotlib.pyplot import imread
image = imread(filenames[42]) # read in an image
image.shape
(257, 350, 3)
The shape of the image is (257, 350, 3). This is height, width, color channel value.
And we can easily convert it to a Tensor using tf.constant()
.
# Turn image into a Tensor
tf.constant(image)[:2]
<tf.Tensor: shape=(2, 350, 3), dtype=uint8, numpy=
array([[[ 89, 137, 87],
[ 76, 124, 74],
[ 63, 111, 59],
...,
[ 76, 134, 86],
[ 76, 134, 86],
[ 76, 134, 86]],
[[ 72, 119, 73],
[ 67, 114, 68],
[ 63, 111, 63],
...,
[ 75, 131, 84],
[ 74, 132, 84],
[ 74, 131, 86]]], dtype=uint8)>
Let's build that function to preprocess an image.
# Define image size
IMG_SIZE = 224
def process_image(image_path, img_size=IMG_SIZE):
"""
Takes an image file path and an image size, and turns the image into a Tensor.
"""
# Read in an image file
image = tf.io.read_file(image_path)
# Turn the jpeg image into numerical Tensor with 3 color channels (RGB)
image = tf.image.decode_jpeg(image, channels=3)
# Convert the color channel values from 0-255 to 0-1 values
image = tf.image.convert_image_dtype(image, tf.float32)
# Resize the image to our desired value (224, 224)
image = tf.image.resize(image, size=[img_size, img_size])
return image
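As a quick sanity check (using the same image from before), we can run the function on a single filepath and confirm the output shape and dtype:
# Try the function on one image
processed_image = process_image(filenames[42])
processed_image.shape, processed_image.dtype  # expect (224, 224, 3) and float32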
Creating data batches
We'll now build a function to turn our data into batches (more specifically, a TensorFlow BatchDataset
).
What's a batch?
A batch (also called mini-batch) is a small portion of our data containing, for instance, 32 images and their labels. 32 is generally the default batch size. In deep learning, instead of finding patterns in an entire dataset at the same time, we often find them in one batch at a time.
Let's say we're dealing with 10,000+ images (which we are). Together, these files may take up more memory than our GPU has. Trying to compute on them all would result in an error.
Instead, it's more efficient to create smaller batches of our data and compute on one batch at a time.
TensorFlow is very efficient when our data is in batches of (image, label) Tensors. So, we'll build a function to create these batches. We'll take advantage of the process_image function at the same time.
# Create a simple function to return a tuple (image, label)
def get_image_label(image_path, label):
"""
Takes an image file path name and the associated label,
processes the image, and returns a tuple of (image, label).
"""
image = process_image(image_path)
return image, label
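To check it behaves as expected (an illustrative one-off, not part of the original workflow), we can pass it a single filepath and label:
# Demo of get_image_label on one (filepath, label) pair
demo_image, demo_label = get_image_label(X[42], tf.constant(y[42]))
demo_image.shape, demo_label.shape  # expect (224, 224, 3) and (120,)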
Now that we've got a simple function to turn our image filepath names and their associated labels into tuples, we'll create a function to make data batches.
Because we'll be dealing with 3 different sets of data (training, validation, and test), we'll make sure the function can accommodate each set.
We'll set a default batch size of 32 because, according to Yann LeCun, friends don't let friends train with batch sizes over 32.
# Define the batch size
BATCH_SIZE = 32
# Create a function to turn data into batches
def create_data_batches(X, y=None, batch_size=BATCH_SIZE, valid_data=False, test_data=False):
"""
Creates batches of data out of image (X) and label (y) pairs.
Shuffles the data if it's training data but doesn't shuffle if it's validation data.
Also accepts test data as input (no labels).
"""
# If the data is a test dataset, we probably don't have labels
if test_data:
print("Creating test data batches...")
data = tf.data.Dataset.from_tensor_slices((tf.constant(X))) # only filepaths (no labels)
        data_batch = data.map(process_image).batch(batch_size)
return data_batch
# If the data is a validation dataset, we don't need to shuffle it
elif valid_data:
print("Creating validation data batches...")
data = tf.data.Dataset.from_tensor_slices((tf.constant(X), # filepaths
tf.constant(y))) # labels
        data_batch = data.map(get_image_label).batch(batch_size)
return data_batch
else:
# If the data is a training dataset, we shuffle it
print("Creating training data batches...")
# Turn filepaths and labels into Tensors
data = tf.data.Dataset.from_tensor_slices((tf.constant(X),
tf.constant(y)))
# Shuffling pathnames and labels before mapping image processor function is faster than shuffling images
data = data.shuffle(buffer_size=len(X))
# Create (image, label) tuples (this also turns the image path into a preprocessed image)
data = data.map(get_image_label)
# Turn the data into batches
        data_batch = data.batch(batch_size)
return data_batch
# Create training and validation data batches
train_data = create_data_batches(X_train, y_train)
val_data = create_data_batches(X_val, y_val, valid_data=True)
# Check the different attributes of our data batches
train_data.element_spec, val_data.element_spec
((TensorSpec(shape=(None, 224, 224, 3), dtype=tf.float32, name=None),
TensorSpec(shape=(None, 120), dtype=tf.bool, name=None)),
(TensorSpec(shape=(None, 224, 224, 3), dtype=tf.float32, name=None),
TensorSpec(shape=(None, 120), dtype=tf.bool, name=None)))
We've now got our data in batches, more specifically, they're in Tensor pairs of (images, labels) ready for use on a GPU.
But having our data in batches can be a bit of a hard concept to understand. Let's build a function which helps us visualize what's going on under the hood.
Visualizing data batches
import matplotlib.pyplot as plt
# Create a function for viewing images in a data batch
def show_25_images(images, labels):
"""
Displays a plot of 25 images and their labels from a data batch.
"""
# Setup the figure
plt.figure(figsize=(10, 10))
# Loop through 25
for i in range(25):
# Create subplots (5 rows, 5 columns)
ax = plt.subplot(5, 5, i+1)
# Display an image
plt.imshow(images[i])
# Add the image label as the title
plt.title(unique_breeds[labels[i].argmax()])
# Turn grid lines off
plt.axis("off")
To make computation efficient, a batch is a tightly wound collection of Tensors.
So, to view data in a batch, we've got to unwind it.
We can do so by calling the as_numpy_iterator()
method on a data batch.
This will turn a data batch into something which can be iterated over.
Passing the iterator to next() will return the next item.
In our case, next() will return a batch of 32 image and label pairs.
Note: Running the cell below and loading images may take a little while.
# Visualize training images from the training data batch
train_images, train_labels = next(train_data.as_numpy_iterator())
show_25_images(train_images, train_labels)
# Now let's visualize our validation set
val_images, val_labels = next(val_data.as_numpy_iterator())
show_25_images(val_images, val_labels)
Creating and training a model
We'll use an existing model from TensorFlow Hub.
TensorFlow Hub is a resource where we can find pretrained machine learning models for the problem we're working on.
Using a pretrained machine learning model is often referred to as transfer learning.
Why use a pretrained model?
Building a machine learning model and training it from scratch can be expensive and time consuming.
Transfer learning helps solve these issues by taking what another model has already learned and using that information with our own problem.
How do we choose a model?
Since we know our problem is image classification (classifying different dog breeds), we can navigate the TensorFlow Hub page by our problem domain (image).
We start by choosing the image problem domain, and then can filter it down by subdomains, in our case, image classification.
Doing this gives a list of different pretrained models we can apply to our task.
For example, the mobilenet_v2_130_224 model takes as input images of shape (224, 224). It also says the model has been trained in the domain of image classification.
Let's try it out.
Building a model
Before we build a model, there are a few things we need to define:
- The input shape (images, in the form of Tensors) to our model.
- The output shape (image labels, in the form of Tensors) of our model.
- The URL of the model we want to use.
These things will be standard practice with whatever machine learning model we use. And because we're using TensorFlow, everything will be in the form of Tensors.
# Setup input shape to the model
INPUT_SHAPE = [None, IMG_SIZE, IMG_SIZE, 3] # batch, height, width, color channels
# Setup output shape of the model
OUTPUT_SHAPE = len(unique_breeds)
# Setup model URL from TensorFlow Hub
MODEL_URL = "https://tfhub.dev/google/imagenet/mobilenet_v2_130_224/classification/5"
Now we've got the inputs, outputs, and model we're using ready to go. We can start to put them together.
There are many ways of building a model in TensorFlow but one of the best ways to get started is to use the Keras API.
Defining a deep learning model in Keras can be as straightforward as saying, "here are the layers of the model, the input shape, and the output shape, let's go!"
Knowing this, let's create a function which:
- Takes the input shape, output shape, and the model we've chosen as parameters.
- Defines the layers in a Keras model in sequential fashion.
- Compiles the model (says how it should be evaluated and improved.)
- Builds the model (tells it what kind of input shape it'll be getting.)
- Returns the model.
All of these steps can be found here: https://www.tensorflow.org/guide/keras/sequential_model
# Create a function which builds a keras model
def create_model(input_shape=INPUT_SHAPE, output_shape=OUTPUT_SHAPE, model_url=MODEL_URL):
print("Building model with:", MODEL_URL)
# Setup the model layers
model = tf.keras.Sequential([
hub.KerasLayer(model_url), # Layer 1 (input layer)
tf.keras.layers.Dense(units=output_shape,
activation="softmax") # Layer 2 (output layer)
])
# Compile the model
model.compile(
loss=tf.keras.losses.CategoricalCrossentropy(),
optimizer=tf.keras.optimizers.Adam(),
metrics=["accuracy"]
)
# Build the model
model.build(input_shape)
return model
What's happening here?
Setting up the model layers
There are two ways to do this in Keras, the functional and sequential API. We've used the sequential.
Which one should we choose?
The Keras documentation states that the functional API is the way to go for defining complex models, but the sequential API (a linear stack of layers) is perfectly fine for getting started, which is what we're doing.
The first layer we use is the model from TensorFlow Hub (hub.KerasLayer(MODEL_URL)). So our first layer is actually an entire model (many more layers). This input layer takes in our images and finds patterns in them based on the patterns mobilenet_v2_130_224 has found.
The next layer (tf.keras.layers.Dense()) is the output layer of our model. It brings all of the information discovered in the input layer together and outputs it in the shape we're after, 120 (the number of unique labels we have).
The activation="softmax" parameter tells the output layer that we'd like to assign a probability value to each of the 120 labels somewhere between 0 and 1. The higher the value, the more confident the model is that the input image should have that label. If we were working on a binary classification problem, we'd use activation="sigmoid".
For more on which activation function to use, see the article Which Loss and Activation Functions Should I Use?
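As a quick, made-up example of what softmax does (the raw scores below are arbitrary):
# Softmax turns raw scores into probabilities that sum to 1
logits = tf.constant([2.0, 1.0, 0.1])
print(tf.nn.softmax(logits).numpy())  # approximately [0.659 0.242 0.099]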
Compiling the model
This one is best explained with a story.
Let's say you're at the international hill descending championships, where you start standing on top of a hill and your goal is to get to the bottom. The catch is that you're blindfolded.
Luckily, your friend Adam is standing at the bottom of the hill shouting instructions on how to get down.
At the bottom of the hill, there's a judge evaluating how you're doing. They know where you need to end up so they compare how you're doing to where you're supposed to be. Their comparison is how you get scored.
Transferring this to model.compile()
terminology:
- loss - The height of the hill is the loss function. The model's goal is to minimize this; getting to 0 (the bottom of the hill) means the model is learning perfectly.
- optimizer - Your friend Adam is the optimizer, he's the one telling you how to navigate the hill (lower the loss function) based on what you've done so far. His name is Adam because the Adam optimizer performs well on most models. Other optimizers include RMSprop and Stochastic Gradient Descent.
- metrics - This is the onlooker at the bottom of the hill rating your performance. Or in our case, giving the accuracy of how well our model is predicting the correct image label.
Building the model
We use model.build() whenever we're using a layer from TensorFlow Hub to tell our model what input shape it can expect.
In this case, the input shape is [None, IMG_SIZE, IMG_SIZE, 3] or [None, 224, 224, 3] or [batch_size, img_height, img_width, color_channels].
Batch size is left as None as this is inferred from the data we pass the model. In our case, it'll be 32 since that's what we've set up.
Now that we've gone through each section of the function, let's use it to create a model.
We can call summary() on our model to get an idea of what our model looks like.
model = create_model()
model.summary()
Building model with: https://tfhub.dev/google/imagenet/mobilenet_v2_130_224/classification/5
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
keras_layer (KerasLayer) (None, 1001) 5432713
dense (Dense) (None, 120) 120240
=================================================================
Total params: 5,552,953
Trainable params: 120,240
Non-trainable params: 5,432,713
The non-trainable parameters are the patterns learned by mobilenet_v2_130_224 and the trainable parameters are the ones in the dense layer we added.
This means the main bulk of the information in our model has already been learned and we're going to take that and adapt it to our own problem.
Creating callbacks
We've got a model ready to go, but before we train it, we'll make some callbacks.
Callbacks are helper functions a model can use during training to do things such as save a model's progress or stop training early if a model stops improving.
The two callbacks we're going to add are a TensorBoard callback and an Early Stopping callback.
TensorBoard callback
TensorBoard helps provide a visual way to monitor the progress of our model during and after training.
It can be used directly in a notebook to track the performance measures of a model such as loss and accuracy.
To setup a TensorBoard callback and view TensorBoard in a notebook, we need to do 3 things:
- Load the TensorBoard notebook extension.
- Create a TensorBoard callback which is able to save logs to a directory and pass it to our model's fit() function.
- Visualize our model's training logs with the %tensorboard magic function (we'll do this after model training).
# Load TensorBoard notebook extension
%load_ext tensorboard
import datetime
# Create a function to build a TensorBoard callback
def create_tensorboard_callback():
# Create a log directory for storing TensorBoard logs
logdir = os.path.join("drive/MyDrive/ML/Dog Vision/logs",
# Make it so the logs get tracked whenever we run an experiment
datetime.datetime.now().strftime('%Y%m%d-%H%M%S'))
return tf.keras.callbacks.TensorBoard(logdir)
Early stopping callback
Early stopping helps prevent overfitting by stopping a model when a certain evaluation metric stops improving. If a model trains for too long, it can do so well at finding patterns in a certain dataset that it's not able to use those patterns on another dataset it hasn't seen before (the model doesn't generalize).
It's basically like saying to our model, "keep finding patterns until the quality of those patterns starts to go down."
# Create early stopping callback
early_stopping = tf.keras.callbacks.EarlyStopping(monitor="val_accuracy",
patience=3) # stops after 3 rounds of no improvements
Training a model (on subset of data)
Our first model is only going to be trained on 1,000 images. Or rather, trained on 800 images and then validated on 200 images, meaning 1,000 images in total or about 10% of the total data.
We do this to make sure everything is working. And if it is, we can step it up later and train on the entire training dataset.
The final parameter we'll define before training is NUM_EPOCHS (also known as number of epochs).
NUM_EPOCHS defines how many passes over the data we'd like our model to do. A pass is equivalent to our model trying to find patterns in each dog image and seeing which patterns relate to each label.
If NUM_EPOCHS=1, the model will only look at the data once and will probably score badly because it hasn't had a chance to correct itself. It would be like you competing in the international hill descent championships and your friend Adam only being able to give you 1 single instruction to get down the hill.
What's a good value for NUM_EPOCHS?
This one is hard to say. 10 could be a good start, but so could 100. This is one of the reasons we created an early stopping callback. Having early stopping set up means if we set NUM_EPOCHS to 100 but our model stops improving after 22 epochs, it'll stop training.
NUM_EPOCHS = 100
Let's create a function that trains a model. The function will:
- Create a model using create_model().
- Setup a TensorBoard callback using create_tensorboard_callback().
- Call the fit() function on our model, passing it the training data, validation data, number of epochs to train for (NUM_EPOCHS), and the callbacks we'd like to use.
- Return the fitted model.
# Build a function to train and return the trained model
def train_model():
"""
    Creates a model, trains it, and returns the trained version.
"""
# create a model
model = create_model()
# Create new TensorBoard session every time we train a model
tensorboard = create_tensorboard_callback()
# Fit the model to the data passing it the callbacks we created
model.fit(x=train_data,
epochs=NUM_EPOCHS,
validation_data=val_data,
validation_freq=1, # check validation metrics every epoch
callbacks=[tensorboard, early_stopping])
return model
Note: When training a model for the first time, the first epoch will take a while to load compared to the rest. This is because the model is getting ready and the data is being initialised. Using more data will generally take longer, which is why we've started with ~1,000 images. After the first epoch, subsequent epochs should take only a few seconds.
# Fit the model to the data
model = train_model()
Building model with: https://tfhub.dev/google/imagenet/mobilenet_v2_130_224/classification/5
Epoch 1/100
25/25 [==============================] - 212s 8s/step - loss: 4.4983 - accuracy: 0.0913 - val_loss: 3.3317 - val_accuracy: 0.2500
Epoch 2/100
25/25 [==============================] - 5s 183ms/step - loss: 1.5892 - accuracy: 0.7038 - val_loss: 2.0969 - val_accuracy: 0.4950
Epoch 3/100
25/25 [==============================] - 5s 182ms/step - loss: 0.5362 - accuracy: 0.9525 - val_loss: 1.6433 - val_accuracy: 0.6000
Epoch 4/100
25/25 [==============================] - 5s 181ms/step - loss: 0.2430 - accuracy: 0.9875 - val_loss: 1.4789 - val_accuracy: 0.6250
Epoch 5/100
25/25 [==============================] - 5s 184ms/step - loss: 0.1421 - accuracy: 0.9975 - val_loss: 1.4015 - val_accuracy: 0.6450
Epoch 6/100
25/25 [==============================] - 4s 174ms/step - loss: 0.0988 - accuracy: 1.0000 - val_loss: 1.3711 - val_accuracy: 0.6450
Epoch 7/100
25/25 [==============================] - 5s 186ms/step - loss: 0.0745 - accuracy: 1.0000 - val_loss: 1.3289 - val_accuracy: 0.6500
Epoch 8/100
25/25 [==============================] - 4s 175ms/step - loss: 0.0589 - accuracy: 1.0000 - val_loss: 1.3003 - val_accuracy: 0.6600
Epoch 9/100
25/25 [==============================] - 7s 268ms/step - loss: 0.0485 - accuracy: 1.0000 - val_loss: 1.2854 - val_accuracy: 0.6700
Epoch 10/100
25/25 [==============================] - 6s 226ms/step - loss: 0.0409 - accuracy: 1.0000 - val_loss: 1.2699 - val_accuracy: 0.6700
Epoch 11/100
25/25 [==============================] - 4s 178ms/step - loss: 0.0353 - accuracy: 1.0000 - val_loss: 1.2561 - val_accuracy: 0.6650
Epoch 12/100
25/25 [==============================] - 5s 192ms/step - loss: 0.0307 - accuracy: 1.0000 - val_loss: 1.2476 - val_accuracy: 0.6700
It looks like our model is overfitting (getting far better results on the training set than the validation set). And we should look into some ways to prevent model overfitting.
Note: Overfitting to begin with is a good thing. It means our model is learning something.
Checking the TensorBoard logs
The TensorBoard magic function (%tensorboard) will access the logs directory we created earlier and visualize its contents.
%tensorboard --logdir drive/MyDrive/ML/Dog\ Vision/logs
Thanks to our early_stopping callback, the model stopped training after 12 epochs (in my case, yours might be slightly different). This is because the validation accuracy failed to improve for 3 epochs.
But the good news is, we can definitely see that our model is learning something. The validation accuracy got to 67% in only a few minutes.
This means, if we were to scale up the number of images, hopefully we'd see the accuracy increase.
Making and evaluating predictions using a trained model
Before we scale up and train on more data, let's see some other ways we can evaluate our model. Because although accuracy is a pretty good indicator of how our model is doing, it would be even better if we could see it in action.
Making predictions with a trained model is as simple as calling predict() on it and passing it data in the same format the model was trained on.
# Make predictions on the validation data (not used to train on)
predictions = model.predict(val_data, verbose=1) # verbose shows us how long there is to go
predictions
7/7 [==============================] - 2s 149ms/step
array([[6.22897118e-04, 1.67990889e-04, 8.79216474e-04, ...,
7.09853193e-04, 5.02764196e-05, 1.65839761e-03],
[2.67735624e-04, 2.39910398e-04, 1.09814322e-02, ...,
9.64251813e-05, 7.48365244e-04, 7.03895203e-05],
[1.96330846e-04, 1.39784577e-04, 4.39772011e-05, ...,
2.66594667e-04, 2.05397533e-04, 2.26700475e-04],
...,
[2.78241873e-06, 8.37749758e-05, 4.92490479e-04, ...,
3.37603960e-05, 6.75940537e-04, 1.90498744e-04],
[2.50874972e-03, 1.12199872e-04, 5.24179341e-05, ...,
7.15124188e-05, 2.62187295e-05, 1.73446781e-03],
[1.17617834e-04, 7.77364112e-05, 1.37082185e-04, ...,
1.58314139e-03, 8.59645079e-04, 2.46757936e-05]], dtype=float32)
# Check the shape of predictions
predictions.shape
(200, 120)
Making predictions with our model returns an array with a different value for each label.
In this case, making predictions on the validation data (200 images) returns an array (predictions) of arrays, each containing 120 different values (one for each unique dog breed).
These different values are the probabilities, or the likelihood, the model has predicted a certain image being a certain breed of dog. The higher the value, the more likely the model thinks a given image is a specific breed of dog.
Let's see how we'd convert an array of probabilities into an actual label.
# First prediction
index = 0
print(predictions[index])
print(f"Max value (probability of prediction): {np.max(predictions[index])}")
print(f"Sum: {np.sum(predictions[index])}")
print(f"Max index: {np.argmax(predictions[index])}" )
print(f"Predicted label: {unique_breeds[np.argmax(predictions[index])]}")
[6.2289712e-04 1.6799089e-04 8.7921647e-04 1.8631003e-04 5.3724495e-04
3.2465730e-04 2.2423903e-02 4.9657328e-04 3.0789597e-05 4.2850631e-03
1.8981443e-04 1.0261026e-04 6.6824240e-04 2.1106268e-04 3.2271945e-04
2.0372072e-04 2.2583949e-05 1.3980259e-01 3.2590338e-05 4.0915584e-05
1.1816905e-03 2.7455544e-04 4.0372779e-05 6.4781279e-04 8.5824677e-06
5.7637418e-04 4.2180789e-01 3.6451096e-05 2.1398282e-03 1.8662729e-04
5.3795295e-05 1.1816223e-03 1.5667198e-03 1.0925717e-05 2.2503540e-05
1.3354327e-02 3.7860959e-06 3.5932841e-04 1.2963214e-04 2.9137873e-04
2.3260089e-03 3.3095494e-06 1.5097676e-05 4.4024091e-05 4.1719111e-05
1.4688818e-04 5.0045412e-05 1.1059161e-04 7.7821646e-04 3.0642765e-04
3.9216696e-04 1.4359584e-05 5.3910341e-04 6.1444713e-05 1.7364083e-04
3.5276284e-05 1.3722615e-04 9.0744550e-04 3.9121576e-04 1.9728519e-02
2.5874743e-04 9.5249598e-05 3.4022541e-04 1.5865565e-04 2.8779902e-04
6.3628376e-02 7.0957394e-05 8.5773220e-04 1.4202471e-02 1.0215643e-04
5.1950999e-02 1.8512135e-04 7.5176729e-05 2.5200758e-02 3.5105829e-04
1.6164074e-04 8.3191908e-04 1.5876073e-02 3.7631966e-04 1.8149629e-02
1.6471182e-04 1.5049442e-03 3.0149630e-04 5.1044985e-03 3.1844739e-04
8.8904443e-04 3.7882995e-04 2.3105989e-04 1.8062179e-04 1.4020882e-03
1.0508097e-03 2.2645481e-04 6.8475410e-06 2.5964212e-03 8.1505415e-05
1.9787325e-04 9.7168679e-04 9.0247270e-04 4.6203140e-04 1.0710631e-04
2.3471296e-02 9.6860625e-05 1.8919840e-02 5.8173094e-02 3.1562344e-05
7.0314546e-04 2.0326689e-02 8.7760345e-05 2.7958115e-04 1.8794164e-02
3.9155426e-04 6.9520087e-05 5.1070543e-05 2.0120994e-04 4.8968988e-04
3.7422600e-05 3.5885598e-03 7.0985319e-04 5.0276420e-05 1.6583976e-03]
Max value (probability of prediction): 0.4218078851699829
Sum: 1.0
Max index: 26
Predicted label: cairn
Having this information is great but it would be even better if we could compare a prediction to its true label and original image.
To help us, let's first build a little function to convert prediction probabilities into predicted labels.
Note: Prediction probabilities are also known as confidence levels.
# Turn prediction probabilities into their respective label
def get_pred_label(prediction_probabilities):
"""
Turns an array of prediction probabilities into a label.
"""
return unique_breeds[np.argmax(prediction_probabilities)]
# Get a predicted label based on an array of prediction probabilities
pred_label = get_pred_label(predictions[0])
pred_label
'cairn'
Now that we've got a way to get prediction labels, we'll do the same for the validation images and validation labels.
The model hasn't trained on the validation data; during the fit() function, it only used the validation data to evaluate itself. So we can use the validation images to visually compare our model's predictions with the validation labels.
Since our validation data (val_data) is in batch form, to get a list of validation images and labels, we'll have to unbatch it (using unbatch()) and then turn it into an iterator using as_numpy_iterator().
Let's make a small function to do so.
# Create a function to unbatch a batched dataset
def unbatchify(data):
"""
Takes a batched dataset of (image, label) Tensors and returns separate arrays
of images and labels.
"""
images = []
labels = []
# Loop through unbatched data
for image, label in data.unbatch().as_numpy_iterator():
images.append(image)
labels.append(unique_breeds[np.argmax(label)])
return images, labels
# Unbatchify the validation data
val_images, val_labels = unbatchify(val_data)
val_images[0], val_labels[0]
(array([[[0.29599646, 0.43284872, 0.3056691 ],
[0.26635826, 0.32996926, 0.22846507],
[0.31428418, 0.2770141 , 0.22934894],
...,
[0.77614343, 0.82320225, 0.8101595 ],
[0.81291157, 0.8285351 , 0.8406944 ],
[0.8209297 , 0.8263737 , 0.8423668 ]],
[[0.2344871 , 0.31603682, 0.19543913],
[0.3414841 , 0.36560842, 0.27241898],
[0.45016077, 0.40117094, 0.33964607],
...,
[0.7663987 , 0.8134138 , 0.81350833],
[0.7304248 , 0.75012016, 0.76590735],
[0.74518913, 0.76002574, 0.7830809 ]],
[[0.30157745, 0.3082587 , 0.21018331],
[0.2905954 , 0.27066195, 0.18401104],
[0.4138316 , 0.36170745, 0.2964005 ],
...,
[0.79871625, 0.8418535 , 0.8606443 ],
[0.7957738 , 0.82859945, 0.8605655 ],
[0.75181633, 0.77904975, 0.8155256 ]],
...,
[[0.9746779 , 0.9878955 , 0.9342279 ],
[0.99153054, 0.99772066, 0.9427856 ],
[0.98925114, 0.9792082 , 0.9137934 ],
...,
[0.0987601 , 0.0987601 , 0.0987601 ],
[0.05703771, 0.05703771, 0.05703771],
[0.03600177, 0.03600177, 0.03600177]],
[[0.98197854, 0.9820659 , 0.9379411 ],
[0.9811992 , 0.97015417, 0.9125648 ],
[0.9722316 , 0.93666023, 0.8697186 ],
...,
[0.09682598, 0.09682598, 0.09682598],
[0.07196062, 0.07196062, 0.07196062],
[0.0361607 , 0.0361607 , 0.0361607 ]],
[[0.97279435, 0.9545954 , 0.92389745],
[0.963602 , 0.93199134, 0.88407487],
[0.9627158 , 0.9125331 , 0.8460338 ],
...,
[0.08394483, 0.08394483, 0.08394483],
[0.0886985 , 0.0886985 , 0.0886985 ],
[0.04514172, 0.04514172, 0.04514172]]], dtype=float32), 'cairn')
Now we've got ways to get:
- Prediction labels
- Validation labels (truth labels)
- Validation images
Let's make some functions to make these all a bit more visual.
More specifically, we want to be able to view an image, its predicted label and its actual label (true label).
The first function we'll create will:
- Take an array of prediction probabilities, an array of truth labels, an array of images and an integer.
- Convert the prediction probabilities to a predicted label.
- Plot the predicted label, its predicted probability, the truth label and target image on a single plot.
def plot_pred(prediction_probabilities, labels, images, n=1):
"""
View the prediction, ground truth, and image for sample n.
"""
pred_prob, true_label, image = prediction_probabilities[n], labels[n], images[n]
# get the pred label
pred_label = get_pred_label(pred_prob)
# Plot image and remove ticks
plt.imshow(image)
plt.xticks([])
plt.yticks([])
    # Change the color of the title depending on whether the prediction is right or wrong
if pred_label == true_label:
color = "green"
else:
color = "red"
# Change plot title
plt.title("Predicted breed: {}\n Probability: {:2.0f}%\n Actual breed: {}".format(pred_label,
np.max(pred_prob)*100,
true_label),
color=color)
# View an example prediction, original image and truth label
plot_pred(prediction_probabilities=predictions,
labels=val_labels,
images=val_images,
n=1)
Making functions to help visualize our model's results is really helpful in understanding how our model is doing.
Since we're working with a multi-class problem, it would also be good to see what other guesses our model is making. More specifically, if our model predicts a certain label with 24% probability, what else did it predict?
Let's build a function to demonstrate this. The function will:
- Take an input of a prediction probabilities array, a ground truth labels array and an integer.
- Find the predicted label using get_pred_label().
- Find the top 10:
  - Prediction probabilities indexes
  - Prediction probabilities values
  - Prediction labels
- Plot the top 10 prediction probability values and labels, coloring the true label green.
def plot_pred_conf(prediction_probabilities, labels, n=1):
"""
Plots the top 10 highest prediction confidences along with the truth label for sample n.
"""
pred_prob, true_label = prediction_probabilities[n], labels[n]
# Get the predicted label
pred_label = get_pred_label(pred_prob)
# Find the top 10 prediction confidence indexes
top_10_pred_indexes = pred_prob.argsort()[-10:][::-1]
# Find the top 10 pred confidence values
top_10_pred_values = pred_prob[top_10_pred_indexes]
# Find the top 10 prediction labels
top_10_pred_labels = unique_breeds[top_10_pred_indexes]
# Setup plot
top_plot = plt.bar(np.arange(len(top_10_pred_labels)),
top_10_pred_values,
color="grey")
plt.xticks(np.arange(len(top_10_pred_labels)),
labels=top_10_pred_labels,
rotation="vertical")
    # Change the color of the true label (if it appears in the top 10)
    if np.isin(true_label, top_10_pred_labels):
        top_plot[np.argmax(top_10_pred_labels == true_label)].set_color("green")
plot_pred_conf(prediction_probabilities=predictions,
labels=val_labels,
n=1)
# Let's check a few predictions and their different values
i_multiplier = 0
num_rows = 3
num_cols = 2
num_images = num_rows*num_cols
plt.figure(figsize=(5*2*num_cols, 5*num_rows))
for i in range(num_images):
plt.subplot(num_rows, 2*num_cols, 2*i+1)
plot_pred(prediction_probabilities=predictions,
labels=val_labels,
images=val_images,
n=i+i_multiplier)
plt.subplot(num_rows, 2*num_cols, 2*i+2)
plot_pred_conf(prediction_probabilities=predictions,
labels=val_labels,
n=i+i_multiplier)
plt.tight_layout(h_pad=1.0)
plt.show()
Saving and reloading a model
After training a model, it's a good idea to save it. Saving it means we can share it with colleagues, put it in an application and more importantly, won't have to go through the potentially expensive step of retraining it.
We'll save the entire model in the HDF5 format (a .h5 file). So we'll make a function which can take a model as input and use the save() method to save it as an .h5 file to a specified directory.
def save_model(model, suffix=None):
"""
Saves a given model in a models directory and appends a suffix (str)
for clarity and reuse.
"""
# Create model directory with current time
modeldir = os.path.join("drive/MyDrive/ML/Dog Vision/models",
datetime.datetime.now().strftime("%Y%m%d-%H%M%s"))
model_path = modeldir + "-" + suffix + ".h5" # save format of model
print(f"Saving model to: {model_path}...")
model.save(model_path)
return model_path
If we've got a saved model, we'd like to be able to load it. Let's create a function which can take a model path and use the tf.keras.models.load_model() function to load it into the notebook.
Because we're using a component from TensorFlow Hub (hub.KerasLayer), we'll have to pass it to the custom_objects parameter.
def load_model(model_path):
"""
Loads a saved model from a specified path.
"""
print(f"Loading saved model from: {model_path}")
model = tf.keras.models.load_model(model_path,
custom_objects={"KerasLayer":hub.KerasLayer})
return model
# Save our model trained on 1000 images
save_model(model, suffix="1000-images-mobilenetv2-Adam")
# Load our model trained on 1000 images
model_1000_images = load_model('drive/MyDrive/ML/Dog Vision/models/20220325-04411648183299-1000-images-mobilenetv2-Adam.h5')
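As a quick sanity check (not in the original workflow), the original and the reloaded model should produce the same loss and accuracy on the validation batches:
# Evaluate both models on the same data; the results should match
model.evaluate(val_data)
model_1000_images.evaluate(val_data)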
Training a model (on the full dataset)
Now that we know our model works on a subset of the data, we can move forward with training one on the full data.
Above, we saved all of the training filepaths to X and all of the training labels to y.
We've got over 10,000 images and labels in our training set.
Before we can train a model on these, we'll have to turn them into a data batch.
We can use our create_data_batches() function from above, which also preprocesses our images for us.
# Turn the full training data into a data batch
full_data = create_data_batches(X, y)
Our data is in a data batch; all we need now is a model.
Let's use create_model() to instantiate another model.
# Instantiate a new model for training on the full dataset
full_model = create_model()
Since we've made a new model instance, full_model, we'll need some callbacks too.
# Create full model callbacks
# TensorBoard callback
full_model_tensorboard = create_tensorboard_callback()
# Early stopping callback
# Note: No validation set when training on all the data, therefore can't monitor validation accuracy
full_model_early_stopping = tf.keras.callbacks.EarlyStopping(monitor="accuracy",
patience=3)
To monitor the model whilst it trains, we'll load TensorBoard (it should update every 30 seconds or so whilst the model trains).
%tensorboard --logdir drive/MyDrive/ML/Dog\ Vision/logs
Note: Since running the cell below will cause the model to train on all of the data (10,000+ images), it may take a fairly long time to get started and finish. However, thanks to our full_model_early_stopping callback, it'll stop before it goes on too long.
The first epoch is always the longest as data gets loaded into memory. After it's there, it'll speed up.
# Fit the full model to the full training data
full_model.fit(x=full_data,
epochs=NUM_EPOCHS,
callbacks=[full_model_tensorboard,
full_model_early_stopping])
Epoch 1/100
320/320 [==============================] - 57s 163ms/step - loss: 1.3450 - accuracy: 0.6682
Epoch 2/100
320/320 [==============================] - 52s 162ms/step - loss: 0.3995 - accuracy: 0.8813
Epoch 3/100
320/320 [==============================] - 52s 163ms/step - loss: 0.2371 - accuracy: 0.9335
Epoch 4/100
320/320 [==============================] - 49s 152ms/step - loss: 0.1529 - accuracy: 0.9647
Epoch 5/100
320/320 [==============================] - 51s 159ms/step - loss: 0.1060 - accuracy: 0.9785
Epoch 6/100
320/320 [==============================] - 52s 163ms/step - loss: 0.0775 - accuracy: 0.9873
Epoch 7/100
320/320 [==============================] - 56s 175ms/step - loss: 0.0602 - accuracy: 0.9913
Epoch 8/100
320/320 [==============================] - 58s 181ms/step - loss: 0.0476 - accuracy: 0.9943
Epoch 9/100
320/320 [==============================] - 57s 178ms/step - loss: 0.0369 - accuracy: 0.9966
Epoch 10/100
320/320 [==============================] - 58s 180ms/step - loss: 0.0311 - accuracy: 0.9971
Epoch 11/100
320/320 [==============================] - 58s 181ms/step - loss: 0.0264 - accuracy: 0.9977
Epoch 12/100
320/320 [==============================] - 58s 182ms/step - loss: 0.0222 - accuracy: 0.9977
Epoch 13/100
320/320 [==============================] - 55s 171ms/step - loss: 0.0199 - accuracy: 0.9984
Epoch 14/100
320/320 [==============================] - 58s 181ms/step - loss: 0.0172 - accuracy: 0.9987
Epoch 15/100
320/320 [==============================] - 58s 181ms/step - loss: 0.0165 - accuracy: 0.9983
Epoch 16/100
320/320 [==============================] - 57s 179ms/step - loss: 0.0136 - accuracy: 0.9990
Epoch 17/100
320/320 [==============================] - 58s 181ms/step - loss: 0.0153 - accuracy: 0.9983
Epoch 18/100
320/320 [==============================] - 57s 178ms/step - loss: 0.0148 - accuracy: 0.9979
Epoch 19/100
320/320 [==============================] - 58s 181ms/step - loss: 0.0123 - accuracy: 0.9985
<keras.callbacks.History at 0x7fcb67ee7950>
Saving and reloading the full model
Even on a GPU, our full model took a while to train. So it's a good idea to save it.
We can do so using our save_model() function.
Note: It may be a good idea to incorporate the save_model() function into a train_model() function. Or look into setting up a checkpoint callback, as sketched below.
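Here's a rough sketch of what a checkpoint callback could look like; the filepath is hypothetical. tf.keras.callbacks.ModelCheckpoint saves the model during training whenever the monitored metric improves, so we'd keep the best version even if a later epoch gets worse:
# Sketch: a checkpoint callback (hypothetical filepath)
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    filepath="drive/MyDrive/ML/Dog Vision/models/full_model_checkpoint.h5",
    monitor="accuracy",  # no validation set on the full data, as noted above
    save_best_only=True)
# It would then be passed to fit() alongside the other callbacks.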
# Save model to file
save_model(full_model, suffix="all-images-Adam")
# Load in the full model
loaded_full_model = load_model('drive/MyDrive/ML/Dog Vision/models/20220325-05281648186092-all-images-Adam.h5')
Making predictions on the test dataset
Since our model has been trained on images in the form of Tensor batches, to make predictions on the test data, we'll have to get it into the same format.
We created create_data_batches() earlier, which can take a list of filenames as input and convert them into Tensor batches.
To make predictions on the test data, we'll:
- Get the test image filenames.
- Convert the filenames into test data batches using create_data_batches(), setting the test_data parameter to True (since there are no labels with the test images).
- Make a predictions array by passing the test data batches to the predict() function.
# Load test image filenames (since we're using os.listdir(), these already have .jpg)
test_path = "drive/MyDrive/ML/Dog Vision/test/"
test_filenames = [test_path + fname for fname in os.listdir(test_path)]
test_filenames[:10]
['drive/MyDrive/ML/Dog Vision/test/e5f2204119380ce1a17fd09435c5012a.jpg',
'drive/MyDrive/ML/Dog Vision/test/e7ce78e874945f182a4f5149aa505b09.jpg',
'drive/MyDrive/ML/Dog Vision/test/de6cc38e54a460dd34c53b74f022a8da.jpg',
'drive/MyDrive/ML/Dog Vision/test/e7b608110b0e29120d8740f37e85f3d0.jpg',
'drive/MyDrive/ML/Dog Vision/test/e66a91249a4979a86db48e5c64b81a88.jpg',
'drive/MyDrive/ML/Dog Vision/test/e17defebd1b8fc39e9c3c10df3c2e3de.jpg',
'drive/MyDrive/ML/Dog Vision/test/e3baf6b2914677edd2729db0f32e2620.jpg',
'drive/MyDrive/ML/Dog Vision/test/e08d42b2e6f2dbcf24c6bfee8b7d03bd.jpg',
'drive/MyDrive/ML/Dog Vision/test/e2b24cea9d0796ffad73cb24eab1a3f6.jpg',
'drive/MyDrive/ML/Dog Vision/test/e137b0cd96051765c349377725c4696d.jpg']
# How many test images are there?
len(test_filenames)
10357
# Create test data batch
test_data = create_data_batches(test_filenames, test_data=True)
Note: Since there are 10,000+ test images, making predictions could take a while, even on a GPU. So beware: running the cell below may take up to an hour.
# Make predictions on test data batch using the loaded full model
test_predictions = loaded_full_model.predict(test_data,
verbose=1)
324/324 [==============================] - 1132s 3s/step
# Save predictions (NumPy array) to csv file
np.savetxt("drive/MyDrive/ML/Dog Vision/preds_array.csv", test_predictions, delimiter=",")
# Load predictions (NumPy array) from csv file
test_predictions = np.loadtxt("drive/MyDrive/ML/Dog Vision/preds_array.csv", delimiter=",")
# Check out the test predictions
test_predictions[:10]
array([[2.77832507e-10, 3.47354963e-08, 1.59594504e-10, ...,
3.36205460e-07, 1.01586806e-09, 1.35463404e-10],
[1.06715709e-06, 9.80697884e-11, 1.83814009e-05, ...,
5.81553738e-09, 1.35275069e-09, 9.25533868e-07],
[8.79199422e-11, 7.13828712e-08, 3.49000935e-08, ...,
5.00790861e-07, 3.85421224e-08, 9.47958489e-10],
...,
[2.41431576e-13, 9.99975681e-01, 7.91099131e-11, ...,
1.21032284e-09, 1.01821096e-09, 8.52753868e-09],
[1.11224371e-14, 5.69826790e-11, 4.86594055e-12, ...,
9.99895334e-01, 3.12582422e-08, 3.12060724e-11],
[3.74029030e-09, 6.98669282e-08, 6.67546445e-08, ...,
1.31928124e-09, 3.12934681e-05, 1.51918755e-07]])
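As an aside, if we wanted to submit these predictions to the Kaggle competition, the expected format (per the competition's sample submission) is a CSV with an id column plus one probability column per breed. A rough sketch, assuming the row order of test_predictions matches test_filenames (which it should, since create_data_batches() doesn't shuffle test data):
# Sketch: building a Kaggle-style submission file
preds_df = pd.DataFrame(test_predictions, columns=unique_breeds)
# The id is the filename without the directory or the .jpg extension
preds_df.insert(0, "id", [os.path.splitext(os.path.basename(path))[0]
                          for path in test_filenames])
preds_df.to_csv("drive/MyDrive/ML/Dog Vision/full_model_submission.csv", index=False)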
Making predictions on custom images
It's great being able to make predictions on a test dataset already provided for us.
But how could we use our model on our own images?
The premise remains the same: if we want to make predictions on our own custom images, we have to pass them to the model in the same format the model was trained on.
To do so, we'll:
- Get the filepaths of our own images.
- Turn the filepaths into data batches using create_data_batches(). And since our custom images won't have labels, we set the test_data parameter to True.
- Pass the custom image data batch to our model's predict() method.
- Convert the prediction output probabilities to prediction labels.
- Compare the predicted labels to the custom images.
Note: To make predictions on custom images, I've uploaded pictures to a directory located at drive/MyDrive/ML/Dog Vision/my-dogs/ (as seen in the cell below).
# Get custom image filepaths
custom_path = "drive/MyDrive/ML/Dog Vision/my-dogs/"
custom_image_paths = [custom_path + fname for fname in os.listdir(custom_path)]
# Turn custom image into batch (set to test data because there are no labels)
custom_data = create_data_batches(custom_image_paths, test_data=True)
# Make predictions on the custom data
custom_preds = loaded_full_model.predict(custom_data)
Now that we've got some prediction arrays, let's convert them to labels and compare them with each image.
# Get custom image prediction labels
custom_pred_labels = [get_pred_label(custom_preds[i]) for i in range(len(custom_preds))]
custom_pred_labels
['boxer',
'bull_mastiff',
'american_staffordshire_terrier',
'staffordshire_bullterrier',
'maltese_dog',
'labrador_retriever']
# Get custom images (our unbatchify() function won't work since there aren't labels)
custom_images = []
# Loop through unbatched data
for image in custom_data.unbatch().as_numpy_iterator():
custom_images.append(image)
# Check custom image predictions
plt.figure(figsize=(10, 10))
for i, image in enumerate(custom_images):
plt.subplot(3, 2, i+1)
plt.xticks([])
plt.yticks([])
plt.title(custom_pred_labels[i])
plt.imshow(image)
What's next?
We've just gone end-to-end on a multi-class image classification problem!
This is the same style of problem self-driving cars have, except with different data.
We've got plenty of options on where to go next.
We could try to improve the full model we trained in this notebook in a few ways. Since our early experiment (using only 1,000 images) hinted at our model overfitting, one goal going forward would be to try and prevent it.
- Trying another model from TensorFlow Hub - Perhaps a different model would perform better on our dataset. One option would be to experiment with a different pretrained model from TensorFlow Hub or look into the tf.keras.applications module.
- Data augmentation - Take the training images and manipulate (crop, resize) or distort them (flip, rotate) to create even more training data for the model to learn from. Check out the TensorFlow images documentation for a whole bunch of functions we can use on images. A great idea would be to try and replicate the techniques in this example cat vs. dog image classification notebook for our dog breeds problem.
- Fine-tuning - The model we used in this notebook was directly from TensorFlow Hub; we took what it had already learned from another dataset (ImageNet) and applied it to our own. Another option is to use what the model already knows and fine-tune this knowledge to our own dataset (pictures of dogs). This would mean all of the patterns within the model would be updated to be more specific to pictures of dogs rather than general images. A rough sketch of this is shown after this list.
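Here's a rough sketch of what that fine-tuning option might look like, reusing the MODEL_URL, INPUT_SHAPE, and OUTPUT_SHAPE from earlier. Setting trainable=True on the hub layer unfreezes the pretrained weights; the much lower learning rate is a common choice to avoid destroying them, not something tested in this notebook:
# Sketch: fine-tuning the pretrained model (assumptions noted above)
fine_tune_model = tf.keras.Sequential([
    hub.KerasLayer(MODEL_URL, trainable=True),  # unfreeze the pretrained patterns
    tf.keras.layers.Dense(units=OUTPUT_SHAPE, activation="softmax")
])
fine_tune_model.compile(loss=tf.keras.losses.CategoricalCrossentropy(),
                        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),  # much lower learning rate
                        metrics=["accuracy"])
fine_tune_model.build(INPUT_SHAPE)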
One of the best ways to find out more is to search for something like:
- "How to improve a TensorFlow 2.x image classification model?"
- "TensorFlow 2.x image classification best practices"
- "Transfer learning for image classification with TensorFlow 2.x"