Introduction to Image Classification via Neural Networks

Apr 8, 2021 (7 min read)

A Beginner’s Guide

Introduction

In today’s world, there is much talk about artificial intelligence. It can seem very complicated or maybe even scary, but it does not have to be. Implementing certain models can actually be fairly easy and quick thanks to modern libraries and programming languages. This article works through an image classification example using a neural network from start to finish. It is aimed at readers who have some familiarity with Python programming and want to start learning about neural networks, specifically for the task of image classification.

If you do not have experience with Python, you can refer to this article, go through its tutorials, and then come back here.

Background: What is a neural network?

Neural networks are increasing in popularity across many industries due to their ability to solve problems such as image recognition, forecasting, and natural language processing. Neural networks are inspired by the neurons in our brain. The network is composed of nodes that receive input from previous nodes or, in the first layer, from an external input. Each input is associated with a weight, and the weights are updated during training to reflect the importance of that input relative to the others. The weighted inputs are summed and then passed through an activation function, which produces the node’s output. There are many types of activation functions. If you want to learn more, you can refer to the following article.

https://www.datacamp.com/community/tutorials/deep-learning-python
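The weighted sum and activation described above can be sketched in a few lines of NumPy. This is only an illustration: the function names, the input values, and the choice of ReLU as the activation function are assumptions made for the example.

```python
import numpy as np

def relu(z):
    """ReLU activation: negative values become 0, positive values pass through."""
    return np.maximum(0.0, z)

def node_output(inputs, weights):
    """Sum the weighted inputs, then apply the activation function."""
    weighted_sum = np.dot(inputs, weights)
    return relu(weighted_sum)

inputs = np.array([1.0, 2.0, 3.0])     # values arriving from previous nodes
weights = np.array([0.5, -1.0, 2.0])   # one weight per input
output = node_output(inputs, weights)  # 0.5 - 2.0 + 6.0 = 4.5
print(output)
```

During training, it is the values in weights that get adjusted so that the outputs move closer to the desired result.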

One variation of a neural network, which we will develop in this tutorial, is the convolutional neural network (CNN). CNNs are typically used for computer vision problems involving images and videos. They are particularly good at these problems because they can capture spatial dependencies and do not require extensive pre-processing of the data.

Convolutional Neural Networks consist of three types of layers:

Convolutional Layer — layers where filters are applied to the original image, or to other feature maps in a deep CNN.
Pooling Layer — layers that perform a specific function such as max pooling, which takes the maximum value in a certain filter region, or average pooling, which takes the average value in a filter region. These are typically used to reduce the spatial dimensions of the feature maps.
Fully Connected Layer — layers placed before the classification output of a CNN. The feature maps are flattened into a vector and passed through these layers before classification.
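To make the pooling step concrete, here is a minimal NumPy sketch of max and average pooling over non-overlapping 2x2 regions. The helper name pool2d and the example values are hypothetical; in the tutorial itself the pooling is handled by Keras layers.

```python
import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    """Pool a 2D feature map over non-overlapping size x size regions."""
    h, w = feature_map.shape
    # Trim so both dimensions divide evenly, then group pixels into blocks.
    trimmed = feature_map[: h - h % size, : w - w % size]
    blocks = trimmed.reshape(h // size, size, w // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    return blocks.mean(axis=(1, 3))

feature_map = np.array([[1, 3, 2, 0],
                        [4, 6, 1, 1],
                        [0, 2, 9, 5],
                        [1, 1, 3, 7]], dtype=float)

max_pooled = pool2d(feature_map, mode="max")      # [[6, 2], [2, 9]]
avg_pooled = pool2d(feature_map, mode="average")  # [[3.5, 1.0], [1.0, 6.0]]
print(max_pooled)
print(avg_pooled)
```

The 4x4 input becomes a 2x2 output in both cases, which is the reduction in size that pooling layers are used for.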

An example of a typical CNN architecture can be seen below.

For the purposes of this article, I will not dive deep into the specifics behind convolutional neural networks. An in-depth explanation can be found in the following article.

Prerequisites

To follow along with this tutorial, you need some experience working with Python and Python notebooks. For convenience, you can access a notebook via Google Colab. You can download the data from this link. Once you have downloaded the data and opened a new Colab notebook, you can follow the tutorial in this article.

There are various libraries that will be utilized in this tutorial.

NumPy — a fundamental package for scientific computing in Python.
Matplotlib — a library for creating static, animated, and interactive visualizations in Python.
Keras — a library that provides a Python interface for artificial neural networks.
Sklearn — a library that provides simple and efficient tools for predictive data analysis.

Below is how these libraries can be imported for use.
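A plausible set of imports is sketched here. It assumes Keras is used through the copy bundled with TensorFlow, which is how it is usually available in Colab; the exact imports in the original notebook may differ.

```python
import numpy as np                      # arrays and numerical operations
import matplotlib.pyplot as plt         # plotting and image display
from tensorflow import keras            # model building and training
from tensorflow.keras import layers     # Conv2D, MaxPooling2D, Dense, ...
from sklearn.model_selection import train_test_split   # train/test split
from sklearn.metrics import classification_report      # precision, recall, f1-score
```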

Dataset

The dataset consists of images that are about 320x240 pixels, with a corresponding RGB value for each pixel. RGB stands for red, green, and blue. Not all photos have the same proportions. The dataset contains 4242 images of flowers from five classes: daisy, dandelion, rose, sunflower, and tulip. For each class, there are about 800 photos.

Import Dataset

First, we have to import the dataset into the Colab notebook. To do this, we must connect Colab to our Google Drive. The files need to exist in your Drive; if they are on your local device, upload them to Drive first.

Connect Google Drive to Colab to access files.

When reading in the files we can use Keras image processing to convert the image files to numerical representations that can be fed into our model.

Convert image files to numerical representations. Implementation courtesy of https://www.kaggle.com/mohitkarelia/flower-classification/notebook
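The sketch below shows one way this step could look; it is not a reproduction of the referenced notebook. It assumes the dataset folder contains one subfolder per class, and the function name load_images is hypothetical. So that the example runs without the real dataset, it writes two synthetic images to a temporary folder first.

```python
import os
import tempfile

import numpy as np
from PIL import Image
from tensorflow import keras

def load_images(data_dir, class_names, size=(64, 64)):
    """Read every image under data_dir/<class_name>/ as a pixel array with its label."""
    images, labels = [], []
    for class_name in class_names:
        class_dir = os.path.join(data_dir, class_name)
        for file_name in sorted(os.listdir(class_dir)):
            path = os.path.join(class_dir, file_name)
            img = keras.utils.load_img(path, target_size=size)  # resize while loading
            images.append(keras.utils.img_to_array(img))        # (height, width, 3) array
            labels.append(class_name)
    return images, labels

# Demo on a throwaway folder with one synthetic 320x240 image per class.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ["daisy", "rose"]:
        os.makedirs(os.path.join(demo_dir, name))
        fake = np.random.randint(0, 256, size=(240, 320, 3), dtype=np.uint8)
        Image.fromarray(fake).save(os.path.join(demo_dir, name, "example.png"))
    images, labels = load_images(demo_dir, ["daisy", "rose"])

print(len(images), images[0].shape, labels)
```

With the real dataset, data_dir would point at the flowers folder in your Drive and class_names would list all five classes.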

Next, we must convert these numerical representations to NumPy arrays so that they are compatible with our model’s input type.

Convert lists of numerical images and labels to NumPy arrays.
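A minimal sketch of the conversion, using placeholder lists in place of the ones built while reading the files. The variable names are assumptions.

```python
import numpy as np

# Stand-ins for the lists produced when reading the image files
image_list = [np.zeros((64, 64, 3), dtype="float32") for _ in range(4)]
label_list = ["daisy", "rose", "tulip", "daisy"]

X = np.array(image_list)   # stacks the images into shape (4, 64, 64, 3)
y = np.array(label_list)   # shape (4,)
print(X.shape, y.shape)
```

This only works if every image has the same size, which is why the images are resized when they are read in.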

We can use the matplotlib imshow() function to view the compressed image.

Visualization of a compressed image in the dataset.
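As an illustration, the snippet below displays a 64x64 RGB array with imshow(). Random pixels stand in for a real photo, and the title is a hypothetical label.

```python
import numpy as np
import matplotlib.pyplot as plt

# Random pixels stand in for one resized photo from the dataset
image = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)

fig, ax = plt.subplots()
shown = ax.imshow(image)   # expects integers in 0-255 or floats in 0-1
ax.set_title("daisy")      # hypothetical label for this image
ax.axis("off")
```

In a notebook the figure appears when the cell finishes; in a plain script you would also call plt.show().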

Prepare Data for Test and Training

Now that the dataset is loaded, we want to split it into training and test sets. For this, we will use a 75–25 training-test split.
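A sketch of the split using Sklearn’s train_test_split, with random arrays in place of the flower data. The random_state value is an assumption, added only so the split is reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Random stand-ins for the image array and the labels
X = np.random.rand(100, 64, 64, 3)
y = np.random.randint(0, 5, size=100)

# test_size=0.25 keeps 75% of the samples for training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
print(X_train.shape, X_test.shape)
```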

Next, we want to reshape the image datasets and normalize the RGB values. We also want to convert the labels to a categorical (one-hot) representation. To normalize the RGB values, we divide them by 255, which scales them to the range 0 to 1.
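Both steps can be sketched with NumPy alone. Here the one-hot encoding is done with np.unique and np.eye as a stand-in; Keras offers keras.utils.to_categorical for the same purpose once the labels are integers. The variable names and sample labels are assumptions.

```python
import numpy as np

# Stand-ins for the training images and their string labels
X_train = np.random.randint(0, 256, size=(6, 64, 64, 3)).astype("float32")
y_train = np.array(["rose", "daisy", "tulip", "rose", "sunflower", "dandelion"])

# Scale pixel values from 0-255 down to 0-1
X_train = X_train / 255.0

# Map each label to an integer id (alphabetical order), then to a one-hot row
class_names, class_ids = np.unique(y_train, return_inverse=True)
y_train_onehot = np.eye(len(class_names), dtype="float32")[class_ids]

print(class_names)
print(y_train_onehot[0])   # "rose" is the third class alphabetically
```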

Create Neural Network

For the model, we need to declare our input shape. The input shape is the dimensions of our input images by three (64x64x3), where the three channels hold the red, green, and blue values of each pixel. We also define our output space, which is a vector whose length equals the number of classes (1x5). For the implementation, we will be using the Keras library.

To initialize the model, we need to create a model object by assigning the result of the Sequential() constructor to a variable, as seen below. This model consists of 5 convolutional layers, 4 max-pooling layers, and 1 fully connected layer, created using Conv2D, MaxPooling2D, and Dense layers respectively. Documentation on the inputs for each of these layers can be found at the embedded links. The output of the final fully connected layer is run through a softmax activation function, producing an output of size (1x#classes) that contains the probability of the image belonging to each class. These probabilities sum to 1. To add these layers, call the .add() method on the model you create.

CNN model definition.
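A sketch of a model with the layer counts described above: 5 convolutional layers, 4 max-pooling layers, and 1 fully connected layer. The filter counts, kernel sizes, padding, and ReLU activations are assumptions chosen for illustration and may not match the original notebook. A Flatten layer is included because a Dense layer needs a vector as input.

```python
from tensorflow import keras
from tensorflow.keras import layers

num_classes = 5  # daisy, dandelion, rose, sunflower, tulip

model = keras.Sequential()
model.add(keras.Input(shape=(64, 64, 3)))

# Four convolution + max-pooling pairs; each pooling step halves height and width
model.add(layers.Conv2D(32, (3, 3), padding="same", activation="relu"))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), padding="same", activation="relu"))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), padding="same", activation="relu"))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), padding="same", activation="relu"))
model.add(layers.MaxPooling2D((2, 2)))

# Fifth convolutional layer, then flatten and classify
model.add(layers.Conv2D(128, (3, 3), padding="same", activation="relu"))
model.add(layers.Flatten())
model.add(layers.Dense(num_classes, activation="softmax"))

model.summary()
```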

By calling model.summary() we can view the number of parameters in the model and the output shape after each layer.

Next, we need to identify the loss, optimizer, and metrics we want the model to use while training. We will use categorical cross-entropy loss, which is the usual choice for multi-class problems where the model ends in a softmax activation function. It measures how far the predicted probability distribution is from the true distribution over the classes and averages that score over the samples. The formula can be seen below. The optimizer is an algorithm that determines how the model’s weights are updated. We will use the Adam optimizer. For metrics, we will be using accuracy.

https://gombru.github.io/2018/05/23/cross_entropy_loss/
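For a single sample with one-hot target t and predicted probabilities s, categorical cross-entropy is CE = -sum over classes i of t_i * log(s_i). The NumPy sketch below computes it by hand and averages over the samples. The function name and the eps clipping value, used to avoid log(0), are assumptions; Keras computes this loss internally during training.

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean categorical cross-entropy over a batch of one-hot targets."""
    y_pred = np.clip(y_pred, eps, 1.0)                     # avoid log(0)
    per_sample = -np.sum(y_true * np.log(y_pred), axis=1)  # CE for each sample
    return per_sample.mean()

y_true = np.array([[1, 0, 0],
                   [0, 1, 0]])
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1]])

loss = categorical_cross_entropy(y_true, y_pred)
print(loss)  # (-log(0.7) - log(0.8)) / 2, roughly 0.29
```

Because the targets are one-hot, only the probability assigned to the correct class contributes, so the loss shrinks as that probability approaches 1.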

To set these parameters we will use the model.compile() function.

Set model parameters.

Train Model

To train the model we will call the fit() method and feed the training data to the model. The fit() method takes in two other arguments:

batch_size — number of samples to work through before updating model parameters
epochs — number of times to fully pass through a dataset in training

Fit model. Set batch size and the number of epochs.
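The compile and fit calls can be sketched as follows. To keep the example quick, it uses a much smaller stand-in model and random data rather than the flower images, and the batch size and number of epochs are placeholder values, not the ones used in the original notebook.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Small stand-in model and random data so the example runs in seconds
model = keras.Sequential([
    keras.Input(shape=(64, 64, 3)),
    layers.Conv2D(8, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(5, activation="softmax"),
])
X_train = np.random.rand(32, 64, 64, 3).astype("float32")
y_train = np.eye(5, dtype="float32")[np.random.randint(0, 5, size=32)]

# Loss, optimizer, and metric as described in the text
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

# batch_size: samples per weight update; epochs: full passes over the data
history = model.fit(X_train, y_train, batch_size=16, epochs=2, verbose=2)
print(history.history["loss"])
```

The returned history object stores the loss and accuracy for each epoch, which is handy for plotting them after training.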

Once the model begins training, we can monitor its loss. Ideally, we want to minimize this loss. A decreasing loss indicates that the model is learning.

Output while the model is training. The loss can be seen decreasing.

Results

After training is complete, we can test our model’s performance using the test set. To do so, we call the model.predict() method on the test set of unseen images. Each row of the result holds the predicted probabilities for one image, so taking the index of the maximum value in each row selects the class with the highest probability. These predictions can then be compared with the images’ true labels using classification_report() from Sklearn. This will give us the precision, recall, and f1-score for each class. You can learn more about these metrics in this article.

Predict labels for the test set. Calculate evaluation metrics.
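The evaluation step can be sketched like this. A hand-written probability array stands in for the output of model.predict(), and the true labels are made up for the example, so the numbers in the report say nothing about the real model.

```python
import numpy as np
from sklearn.metrics import classification_report

class_names = ["daisy", "dandelion", "rose", "sunflower", "tulip"]

# Stand-in for model.predict(X_test): one row of class probabilities per image
probabilities = np.array([
    [0.70, 0.10, 0.10, 0.05, 0.05],
    [0.10, 0.60, 0.10, 0.10, 0.10],
    [0.20, 0.10, 0.50, 0.10, 0.10],
    [0.10, 0.10, 0.10, 0.60, 0.10],
    [0.10, 0.10, 0.20, 0.10, 0.50],
    [0.10, 0.10, 0.60, 0.10, 0.10],
])
# Stand-in for the one-hot test labels (the last image is really a tulip)
y_test_onehot = np.eye(5)[[0, 1, 2, 3, 4, 4]]

# Index of the largest value in each row gives the class id
y_pred = np.argmax(probabilities, axis=1)
y_true = np.argmax(y_test_onehot, axis=1)

report = classification_report(y_true, y_pred, target_names=class_names)
print(report)
```

In this made-up example the last image is a tulip predicted as a rose, which lowers the precision for rose and the recall for tulip.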

Conclusion

In this article, we learned how to implement a simple CNN architecture to classify flower images. This method can be applied to many different use cases. To learn more in-depth about neural network architectures and applications please refer to the sources below.

About Me

I am an undergraduate student in Data Science at Northeastern University. I have academic and professional experience in programming, machine learning, data analytics, and data visualization. I have completed many projects involving machine learning algorithms and have taken courses in advanced machine learning methods.