Building and Training a Neural Network with PyTorch: A Step-by-Step Guide
Paul Chibueze
Posted on July 26, 2024
Imagine a world where machines can not only see but also understand and classify images as effortlessly as humans. This capability has been at the heart of many breakthroughs in artificial intelligence, revolutionizing fields from healthcare to retail.
In recent years, advancements in deep learning have enabled computers to recognize objects, identify faces, and even understand emotions depicted in images. One of the pivotal tasks in this domain is image classification — teaching computers to categorize images into predefined classes based on their visual features.
In this guide, we’ll embark on a journey to build and train a neural network using PyTorch. We’ll start by preparing our data — transforming raw images into a format suitable for training our model. Then, we’ll delve into defining our neural network architecture, which will learn to recognize various clothing items based on their pixel patterns. For this project, we will use the FashionMNIST dataset.
FashionMNIST is a dataset of grayscale images of clothing items and serves as an excellent playground for learning and mastering image classification techniques. Similar to its predecessor MNIST (which consists of handwritten digits), FashionMNIST challenges us to distinguish between different types of apparel with the aid of deep learning models. PyTorch provides tools to download and load the dataset conveniently.
As we progress, we’ll explore how to train our model using backpropagation and gradient descent, evaluate its performance on unseen data, and ensure it generalizes well to new examples.
Finally, we’ll learn how to save our trained model’s parameters, enabling us to deploy it in real-world applications or continue refining its capabilities.
I guess you are already excited; I am too.
What is a Neural Network?
A neural network is a series of interconnected nodes, inspired by the structure of the human brain. It learns by processing data and adjusting its internal connections based on the results. In this case, the neural network will learn to recognize patterns in images of clothing and predict the corresponding category (t-shirt, dress, etc.).
Throughout this tutorial, we will cover the essential steps for building classification neural network models in deep learning. The steps we will work through include:
Data Preparation: We will download and prepare our dataset, transforming it into a format suitable for training with PyTorch.
Model Definition: We will define a neural network architecture using PyTorch's nn.Module that will learn to classify images into different clothing categories.
Training and Evaluation: We will implement the training loop to optimize our model's parameters using gradient descent, evaluate its performance on test data, and monitor its progress.
Model Persistence: We will see how to save and load trained models, allowing you to reuse them for predictions or further training.
By the end of this journey, you will not only have a grasp of the fundamental concepts of deep learning with PyTorch but also a practical understanding of how to apply them to real-world datasets.
Let’s embark on this learning adventure together!
Dataset Preparation
The first step is to prepare our dataset. As mentioned earlier, we will use the FashionMNIST dataset, which is readily available in PyTorch's torchvision library. It contains 70,000 grayscale images of clothing items spread across 10 classes.
We start by importing the necessary libraries:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor
torch: The core PyTorch library for building and training neural networks.
nn: A submodule of torch containing building blocks for neural networks, such as layers and activation functions.
DataLoader: A class from torch.utils.data that helps us load and iterate over datasets in batches.
datasets: A submodule of torchvision providing access to downloadable datasets like FashionMNIST.
ToTensor: A data transform that converts images to PyTorch tensors.
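If you want a quick sanity check that your environment is ready, you can print the installed versions (this is optional and not part of the original code):

# optional: confirm PyTorch and torchvision are installed and check their versions
import torchvision  # imported here only to read the version string
print(torch.__version__, torchvision.__version__)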
With the libraries imported, it's time to download the FashionMNIST training and test datasets and load them into our environment.
# download training data from the FashionMNIST dataset.
training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor(),
)

# download test data from the FashionMNIST dataset.
test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor(),
)
The above code downloads the FashionMNIST dataset into the data directory. We request the training split with train=True and the test split with train=False. We also apply the ToTensor transform, which converts the raw image data (pixel intensities between 0 and 255) into PyTorch tensors with values scaled to the range [0, 1].
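If you want to verify the transform did what we expect, one optional check (not part of the original code) is to inspect a single sample directly:

# each sample is an (image, label) pair; the image is a [1, 28, 28] tensor
# with pixel values scaled to the [0, 1] range
image, label = training_data[0]
print(image.shape, image.min().item(), image.max().item(), label)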
Data Loaders
The next step is to define our data loaders. Data loaders let us load the dataset in batches, making it easier to manage memory and speeding up training. To define the data loaders for our model, we first declare the batch size.
batch_size = 64

# create data loaders
training_loader = DataLoader(training_data, batch_size=batch_size)
test_loader = DataLoader(test_data, batch_size=batch_size)

for X, y in test_loader:
    print(f"Shape of X [N C H W]: {X.shape}")
    print(f"Shape of y: {y.shape} {y.dtype}")
    break
We first define the batch size, which controls how many images are processed at once during training. We then create data loaders for both the training and test data; during training and evaluation, these loaders feed the data into the neural network in batches.

We also use a for loop to iterate over one batch of test data and print the shapes of the input images (X) and their corresponding labels (y). X has a shape of [batch_size, channel, height, width], where batch_size is 64 in this case, channel is 1 (grayscale images), and height and width are both 28 (the images are 28x28 pixels). The labels y are a one-dimensional tensor of integers representing the clothing categories.
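Running the loop above should print shapes along these lines:

# OUTPUT
Shape of X [N C H W]: torch.Size([64, 1, 28, 28])
Shape of y: torch.Size([64]) torch.int64

One design note: in practice you would usually also pass shuffle=True to the training DataLoader so that batches are drawn in a different order each epoch; we keep the defaults here to stay close to the minimal example.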
Now that we have defined and configured our data loaders for both the training and testing datasets, let's decide which device to place our model on. In this walkthrough we will end up on the CPU.
# get cpu, gpu or mps device for training.
device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available()
    else "cpu"
)
print(f"Using {device} device")
# OUTPUT
Using cpu device
Our code checks whether a CUDA GPU or an MPS device is available and uses it for training if possible; otherwise it defaults to the CPU. Using a GPU or MPS can significantly speed up training, since training larger neural networks is computationally demanding.
With the device selected, we can move on to the next step: defining our network.
Defining the Neural Network Model
We define a simple fully connected neural network. Our model will have three layers with ReLU activations in between.
To define a neural network in PyTorch, we create a class that inherits from nn.Module. We define the layers of the network in the __init__ method and specify how data passes through the network in the forward method. To accelerate operations in the neural network, we move it to the GPU or MPS if available.
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.Flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.Flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

# move the model to the device we selected earlier (cpu, cuda, or mps)
model = NeuralNetwork().to(device)
print(model)
# OUTPUT
NeuralNetwork(
  (Flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)
A few things you should know about our neural network:
nn.Module: Base class for all neural network modules in PyTorch.
Flatten: Flattens the input tensor.
nn.Sequential: A sequential container to define the layers of the model.
nn.Linear: Fully connected layer.
nn.ReLU: ReLU activation function.
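As an optional sanity check (not in the original code), you can push a small dummy batch through the model and confirm that the output has one logit per class:

# optional sanity check: a dummy batch of 4 fake 28x28 grayscale images
dummy = torch.rand(4, 1, 28, 28, device=device)
logits = model(dummy)
print(logits.shape)  # expected: torch.Size([4, 10]), one logit per class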
Now that the model is in place, let's move on to defining our loss function and optimizer.
Defining the Loss Function and Optimizer
The loss function measures how well the model's predictions match the actual labels, while the optimizer updates the model parameters to minimize the loss. To set these up, we define the following:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
Let's look at each of these components:

nn.CrossEntropyLoss: a loss function used primarily for classification tasks where the model predicts scores for each class. It combines nn.LogSoftmax() and nn.NLLLoss() in a single class. CrossEntropyLoss expects raw logits (the output of the model before applying softmax) as input; it computes the softmax internally to normalize the logits and then computes the negative log-likelihood loss between the predicted class probabilities and the actual target labels.

torch.optim.SGD: the optimizer, which implements Stochastic Gradient Descent (SGD), a fundamental optimization algorithm for training neural networks. SGD updates the model parameters in the direction of the negative gradient of the loss with respect to those parameters. The model.parameters() argument specifies which parameters of the model should be optimized.

lr (learning rate): a scalar factor that controls the step size taken during optimization. It determines how much to change the model parameters with respect to the gradient of the loss function. A higher learning rate can speed up convergence, but if it is too high, the model may overshoot optimal values. Conversely, a lower learning rate can improve stability and precision but may require more iterations to converge.

momentum: a parameter that accelerates SGD in the relevant direction and dampens oscillations. It improves the convergence rate and helps SGD escape shallow local minima more effectively. A common value is 0.9, but it can be tuned for the specific problem and dataset.

In summary, these components form the backbone of the optimization process during training: nn.CrossEntropyLoss computes the loss from model predictions and target labels, torch.optim.SGD updates the model parameters based on the computed gradients, and lr and momentum are hyperparameters that affect how quickly and effectively the model learns from the data. Adjusting them can significantly impact the training process and model performance.
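To make the "raw logits in, loss out" behaviour concrete, here is a tiny illustrative example (the numbers are made up):

# illustrative only: CrossEntropyLoss takes raw logits and integer class targets
dummy_logits = torch.tensor([[2.0, 0.5, -1.0]])  # one sample, three classes
target = torch.tensor([0])                       # the true class index
print(nn.CrossEntropyLoss()(dummy_logits, target))  # softmax + negative log-likelihood in one step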
Defining our Training Function
The training function iterates over the data loader, computes predictions, calculates the loss, and updates the model parameters.
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    print(f"size: {size}")
    model.train()  # make sure the model is in training mode
    for batch, (X, y) in enumerate(dataloader):
        X = X.to(device)  # move input data to the device (GPU or CPU)
        y = y.to(device)  # move target labels to the device (GPU or CPU)

        # compute predicted y by passing X to the model
        prediction = model(X)

        # compute the loss
        loss = loss_fn(prediction, y)

        # zero the gradients, perform a backward pass, and update the weights
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # print training progress every 100 batches
        if batch % 100 == 0:
            loss_value = loss.item()
            current = batch * len(X)
            print(f"loss: {loss_value:>7f} [{current:>5d}/{size:>5d}]")
Now, to check the model's performance on the test dataset and make sure it is learning, let's define a test function:
def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()  # switch to evaluation mode
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X = X.to(device)
            y = y.to(device)
            prediction = model(X)
            test_loss += loss_fn(prediction, y).item()
            correct += (prediction.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
It's time to train our model. Let's do that in the next step.
Defining the Training Loop
The training process is conducted over several iterations (epochs). During each epoch, the model learns parameters to make better predictions. We print the model’s accuracy and loss at each epoch; we’d like to see the accuracy increase and the loss decrease with every epoch.
epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(training_loader, model, loss_fn, optimizer)
    test(test_loader, model, loss_fn)
print("Done!")
# OUTPUT
Epoch 1
-------------------------------
size: 60000
loss: 2.301722 [ 0/60000]
loss: 2.196219 [ 6400/60000]
loss: 1.919408 [12800/60000]
loss: 1.602865 [19200/60000]
loss: 1.206242 [25600/60000]
loss: 1.089895 [32000/60000]
loss: 1.010409 [38400/60000]
loss: 0.888665 [44800/60000]
loss: 0.871484 [51200/60000]
loss: 0.801176 [57600/60000]
Test Error:
Accuracy: 70.4%, Avg loss: 0.797208
Epoch 2
-------------------------------
size: 60000
loss: 0.793278 [ 0/60000]
loss: 0.839569 [ 6400/60000]
loss: 0.590993 [12800/60000]
loss: 0.796638 [19200/60000]
loss: 0.679180 [25600/60000]
loss: 0.645485 [32000/60000]
loss: 0.705061 [38400/60000]
loss: 0.694501 [44800/60000]
loss: 0.680406 [51200/60000]
loss: 0.634787 [57600/60000]
Test Error:
Accuracy: 78.1%, Avg loss: 0.632338
Epoch 3
-------------------------------
size: 60000
loss: 0.558544 [ 0/60000]
loss: 0.660779 [ 6400/60000]
loss: 0.436486 [12800/60000]
loss: 0.679563 [19200/60000]
loss: 0.600478 [25600/60000]
loss: 0.567539 [32000/60000]
loss: 0.587003 [38400/60000]
loss: 0.657008 [44800/60000]
loss: 0.643853 [51200/60000]
loss: 0.547364 [57600/60000]
Test Error:
Accuracy: 80.3%, Avg loss: 0.560929
Epoch 4
-------------------------------
size: 60000
loss: 0.462072 [ 0/60000]
loss: 0.580780 [ 6400/60000]
loss: 0.374757 [12800/60000]
loss: 0.618166 [19200/60000]
loss: 0.552829 [25600/60000]
loss: 0.526478 [32000/60000]
loss: 0.529090 [38400/60000]
loss: 0.666382 [44800/60000]
loss: 0.634566 [51200/60000]
loss: 0.482042 [57600/60000]
Test Error:
Accuracy: 81.2%, Avg loss: 0.523512
Epoch 5
-------------------------------
size: 60000
loss: 0.403316 [ 0/60000]
loss: 0.539046 [ 6400/60000]
loss: 0.340361 [12800/60000]
loss: 0.577453 [19200/60000]
loss: 0.509404 [25600/60000]
loss: 0.496750 [32000/60000]
loss: 0.495348 [38400/60000]
loss: 0.670772 [44800/60000]
loss: 0.620382 [51200/60000]
loss: 0.439184 [57600/60000]
Test Error:
Accuracy: 82.2%, Avg loss: 0.500474
Done!
epochs: The number of times to iterate over the entire training dataset, in our case 5.
train(): Calls the training function.
test(): Calls the evaluation (test) function.
At this point, we have a trained model that can classify clothing images reasonably well, reaching roughly 82% accuracy on the test set.
Next, let's look at how to save our trained model so that we can reload it later for deployment or further use. To save the model's learned parameters:
torch.save(model.state_dict(), "model.pth")
print("Saved PyTorch Model State to model.pth")
# OUTPUT
Saved PyTorch Model State to model.pth
This approach serializes the model's internal state dictionary (containing the model parameters) to disk.
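If you also want to be able to resume training later, a common pattern (sketched below, not part of the original code; the key names are arbitrary) is to save the optimizer state alongside the model:

# optional: save a fuller checkpoint so training can be resumed later
checkpoint = {
    "model_state": model.state_dict(),
    "optimizer_state": optimizer.state_dict(),
    "epochs_trained": epochs,
}
torch.save(checkpoint, "checkpoint.pth")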
When we later want to use the saved model for predictions, we first need to load it back. To do that:
model = NeuralNetwork().to(device)
model.load_state_dict(torch.load("model.pth"))
# OUTPUT
<All keys matched successfully>
Loading the model involves re-creating the model structure and then loading the saved state dictionary into it.
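One small caveat worth knowing: if the weights were saved on a different device than the one you are loading on (for example saved on a GPU, loaded on a CPU), you can pass map_location so the tensors land on the right device:

# load weights saved on one device onto the current one
model.load_state_dict(torch.load("model.pth", map_location=device))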
Finally, let's use the loaded model to make a prediction.
Model Usage for Prediction
classes = [
"T-shirt/top",
"Trouser",
"Pullover",
"Dress",
"Coat",
"Sandal",
"Shirt",
"Sneaker",
"Bag",
"Ankle boot",
]
# set model to evaluation mode
model.eval()
sample_index = 1 # sample Index (Change this index to select a different sample)
x, y = test_data[sample_index][0], test_data[sample_index][1]
# make prediction without gradient calculation
with torch.no_grad():
    x = x.to(device)
    prediction = model(x.unsqueeze(0))  # add a batch dimension: [1, 28, 28] -> [1, 1, 28, 28]
# get predicted and actual classes
predicted, actual = classes[prediction.argmax(dim=1).item()], classes[y]
print(f'Predicted: "{predicted}", Actual: "{actual}"')
# OUTPUT: Predicted: "Pullover", Actual: "Pullover"
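If you would rather see how confident the model is, you can optionally turn the raw logits into probabilities with a softmax; this is an extra step on top of the code above:

# optional: convert the raw logits into class probabilities
probabilities = torch.softmax(prediction, dim=1)
print(f'Confidence in "{predicted}": {probabilities.max().item():.1%}')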
Let's break that prediction code down step by step.
Define the Class Labels
classes = [
"T-shirt/top",
"Trouser",
"Pullover",
"Dress",
"Coat",
"Sandal",
"Shirt",
"Sneaker",
"Bag",
"Ankle boot",
]
classes: A list of class labels that correspond to the categories the model is trained to recognize. Each index in this list represents a specific class.
Set Model to Evaluation Mode
model.eval()
model.eval(): Sets the model to evaluation mode. This is important because some layers (e.g., dropout, batch normalization) behave differently during training and evaluation. In evaluation mode, these layers operate in inference mode, ensuring consistent results during testing.
Select a Single Test Sample
x, y = test_data[sample_index][0], test_data[sample_index][1]

x, y = test_data[sample_index][0], test_data[sample_index][1]: Selects one sample from the test_data dataset (index 1 in our example). x is the input image tensor, and y is the corresponding label (the class index).
Make Prediction Without Gradient Calculation
with torch.no_grad():
    x = x.to(device)
    prediction = model(x.unsqueeze(0))

with torch.no_grad(): Disables gradient calculation, which is not needed for evaluation and reduces memory usage and computation time.
x = x.to(device): Moves the input data to the device (CPU or GPU) where the model is located.
prediction = model(x.unsqueeze(0)): Adds a batch dimension to the single image and passes it through the model to obtain the predictions. prediction is a tensor containing the output logits for each class.
To Determine Predicted and Actual Class Labels
predicted, actual = classes[prediction.argmax(dim=1).item()], classes[y]

prediction.argmax(dim=1).item(): Finds the index of the class with the highest score in the model's output for our one-sample batch. This index corresponds to the predicted class.
classes[prediction.argmax(dim=1).item()]: Uses that index to look up the predicted class label from the classes list.
classes[y]: Uses the true label index y to look up the actual class label from the classes list.
Print the Predicted and Actual Class Labels
print(f'Predicted: "{predicted}", Actual: "{actual}"')
Prints the predicted and actual class labels in a formatted string.
Conclusion
In this guide, we walked through the entire process of building, training, and evaluating a neural network using PyTorch with the FashionMNIST dataset. We covered essential concepts such as dataset preparation, defining a neural network model, setting up training and evaluation loops, saving and loading models, and making predictions.
Lastly, constant practice leads to mastery, so experiment with different models, hyperparameters, and datasets to deepen your understanding and improve your skills in deep learning and image classification.
Till next time, but for now all I can say is, Happy coding! 🚀
Reference
Fashion MNIST: An MNIST-like dataset of 70,000 28x28 labeled fashion images (www.kaggle.com)