All in 1 Image Classification Using CNNs
Miguel Perez
Posted on November 30, 2023
In the realm of image classification, Convolutional Neural Networks (CNNs) have established themselves as a pinnacle of success. Unlike traditional neural networks such as Multi-Layer Perceptrons (MLPs), CNNs are uniquely tailored for image data, using a specialized architecture that finds local patterns within images. In this blog we are going to understand and use CNNs, and see how they stack up against MLPs trained on features extracted from preprocessed data. We will only go in depth on CNNs; for specifics on the MLPs used in this comparison, see Classical ML vs Neural Networks.

[image 1]
Differentiation from Classical Methods and MLPs
CNNs surpass classical image classification methods and MLPs primarily due to their inherent capacity to learn hierarchical spatial representations directly from raw pixel data. While classical methods heavily rely on manually engineered feature extraction, CNNs autonomously learn relevant features through their convolutional layers, greatly reducing the need for manual preprocessing.
The local connectivity and weight sharing in CNNs allow them to capture intricate spatial patterns across the entire image; in contrast, MLPs treat input features as independent and lack the ability to capture these patterns.
Architecture and Functionality
CNNs operate by employing specialized layers designed to extract intricate features from input images: convolutional, pooling, and fully connected layers.
Convolutional Layers: These layers consist of learnable filters or kernels applied across the input image. Each filter identifies specific patterns, generating feature maps that highlight relevant features such as edges, textures, and shapes. The network learns these filters iteratively during training, enhancing its ability to detect hierarchical features.
Pooling Layers: Following convolutional layers, pooling layers reduce spatial dimensions while retaining essential information. Max pooling, for instance, selects the maximum value from each pool window, downsampling the feature maps and enhancing computational efficiency. This process helps in capturing the most relevant information while reducing computational load.
Fully Connected Layers: These layers, typically at the end of the network, interpret the high-level features extracted by the previous layers for classification. Each neuron in the fully connected layers is connected to all neurons in the preceding layer, amalgamating the learned features to make predictions.
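To make these three layer types concrete, here is a minimal numpy sketch of the convolution → pooling → flatten pipeline. The edge-detector kernel is hand-set purely for illustration; in a real CNN, filter weights are learned during training.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution (no padding, stride 1): slide the kernel over
    the image, taking the elementwise-product sum at each position."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Max pooling: keep the largest activation in each size x size window."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

# Toy 6x6 "image" with a vertical edge down the middle.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# A vertical-edge detector kernel (hand-set here; learned in practice).
kernel = np.array([[-1., 0., 1.],
                   [-1., 0., 1.],
                   [-1., 0., 1.]])

fmap = conv2d(image, kernel)   # 4x4 feature map; strong response at the edge
pooled = max_pool(fmap)        # 2x2 downsampled map
flat = pooled.flatten()        # vector fed to the fully connected layers
```

The feature map responds strongly (value 3) exactly where the edge sits, pooling keeps that strongest response while shrinking the map, and flattening produces the vector that dense layers consume.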
Metrics
Given that we are dealing with classification, and taking into account that all 3 of our datasets are balanced (every class is evenly represented in terms of observations), **accuracy** will be our main metric for comparing model performance.
Accuracy measures the overall correctness of the model's predictions across all classes. In the binary case it is calculated as (True Positives + True Negatives) / Total Observations; in the multiclass case it is simply the number of correct predictions divided by the total number of observations.
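As a quick sanity check, accuracy can be computed in a few lines of plain Python (the labels below are toy values, not from our datasets):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Toy labels for a 3-class problem (illustrative values only).
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 1, 2, 1, 1, 0]

print(accuracy(y_true, y_pred))  # 5 of 6 correct -> 0.8333...
```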
Datasets
First off we have the Fashion-MNIST dataset, which comprises 70,000 grayscale images across 10 classes, already scaled and normalized.
The second dataset has approximately 2,000 high-definition images of different landscapes across Mexico obtained from satellite captures and categorized into six classes. Given that these are HD colored images, I performed feature extraction for our MLP and for the CNN I only resized the images to a lower resolution.
The third and final dataset is comprised of blood images to classify white blood cells. Given that these images have HD resolution and color, I also performed feature extraction for the MLP in the same way as the satellite dataset and for the CNN I only resized the images to a lower resolution.
More detailed information on the datasets and feature extraction here:
Classical ML vs Neural Networks
Methodology and Architecture
The architectures used in all 3 datasets followed the same basic structure: a sequence of convolutional and pooling layers followed by fully connected layers. After exploring different configurations within the hardware limitations of my personal laptop (sorry about that), this was the structure of the largest network (used with the satellite dataset):
The network started with a convolutional layer with 32 filters of size (3, 3), employing the Leaky ReLU activation function and a max pooling layer to reduce spatial dimensions. Then there are 2 convolutional layers with 64 filters each, also followed by pooling layers.
After the convolutional and pooling layers, the network utilized a flattening layer to transform the multidimensional feature maps into a single vector.
Following this there were 4 dense layers (fully connected) containing 256 neurons each, with the Leaky ReLU activation function and incorporating dropout layers after each dense layer (with a dropout rate of 0.5) to prevent overfitting. The final output layer comprised 6 neurons (representing the 6 classes) activated by the softmax function for classification.
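To see how the feature maps shrink through this stack, here is a small sketch that traces the spatial side length layer by layer. It assumes a 64x64 input, "valid" convolutions with stride 1, and 2x2 pooling; the post does not state the input resolution or padding, so these are assumptions for illustration.

```python
def trace_shapes(size, layers):
    """Trace the spatial side length through conv ('valid', stride 1)
    and max-pool layers, as in the architecture described above."""
    shapes = [size]
    for kind, k in layers:
        if kind == "conv":      # valid convolution: size - kernel + 1
            size = size - k + 1
        elif kind == "pool":    # k x k pooling divides the side (floor)
            size = size // k
        shapes.append(size)
    return shapes

# Conv(32, 3x3) -> pool -> Conv(64, 3x3) -> pool -> Conv(64, 3x3) -> pool
stack = [("conv", 3), ("pool", 2),
         ("conv", 3), ("pool", 2),
         ("conv", 3), ("pool", 2)]

print(trace_shapes(64, stack))  # [64, 62, 31, 29, 14, 12, 6]
```

Under these assumptions the final feature maps are 6x6 with 64 channels, so flattening yields a 6 * 6 * 64 = 2,304-dimensional vector feeding the first 256-neuron dense layer.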
Results
Across the Blood, Satellite, and Fashion datasets, the CNN achieved accuracies of 96.5%, 90%, and 86%, respectively, versus very similar MLP scores of 96%, 89.3%, and 86.1%. These results show that CNNs regularly outperform or match traditional NN models that require extensive manual feature extraction to perform. This demonstrates the advantage of CNNs over traditional methods: they learn these features on their own, reducing manual workload.
Conclusion
In conclusion, the evolution of Convolutional Neural Networks has significantly transformed the landscape of image classification. Their architecture, tailored for image data, empowers them to discern complex patterns autonomously, surpassing traditional methods reliant on manual feature engineering and establishing CNNs as the leading model in image analysis.
References
Images:
https://saturncloud.io/blog/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way/