Image Classifier - CIFAR10

Goal:

To improve the average accuracy across the ten classes of the CIFAR10 classifier.

The following link has a detailed description of the initial code:
https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html#sphx-glr-beginner-blitz-cifar10-tutorial-py

Background:

We will use the CIFAR-10 dataset. It has ten classes: ‘airplane’, ‘automobile’, ‘bird’, ‘cat’, ‘deer’, ‘dog’, ‘frog’, ‘horse’, ‘ship’, ‘truck’. The images in CIFAR-10 are of size 3x32x32, i.e. 3-channel color images of 32x32 pixels.



[Figure: sample CIFAR-10 images]


As part of the initial code, the following steps are performed in order (a minimal sketch of the setup follows the list):

  1. Loading and normalizing the CIFAR10 training and test datasets using torchvision
  2. Defining a Convolutional Neural Network
  3. Defining a loss function
  4. Training the network on the training data
  5. Testing the network on the test data
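
A minimal sketch of steps 1 and 3, closely following the linked tutorial (the batch size and normalization values are the tutorial defaults):

    import torch
    import torchvision
    import torchvision.transforms as transforms
    import torch.nn as nn

    # Step 1: load CIFAR-10 and normalize each channel from [0, 1] to [-1, 1]
    transform = transforms.Compose(
        [transforms.ToTensor(),
         transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

    trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                            download=True, transform=transform)
    trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                              shuffle=True, num_workers=2)
    testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                           download=True, transform=transform)
    testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                             shuffle=False, num_workers=2)

    # Step 3: classification loss used throughout the experiments
    criterion = nn.CrossEntropyLoss()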

Basic Definitions:

CNNs are made up of neurons that have weights and biases, and they are mainly applied to image data. They consist of convolutional layers followed by activation functions and pooling layers; these layers learn features from the images. After that, a fully connected layer performs the classification task.

 
[Figure: CNN (Convolutional Neural Network) architecture]


In each convolutional layer, the size of the output image is

(W - F + 2P)/S + 1

where W => input size, F => filter (kernel) size, P => padding, S => stride.

In our case, the input images are 32x32 pixels, so W is 32. For example, a 5x5 filter (F = 5) with no padding (P = 0) and stride 1 (S = 1) gives (32 - 5 + 0)/1 + 1 = 28, i.e. a 28x28 output.
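
A small helper to sanity-check the formula (the 5x5 filter below matches the tutorial's first conv layer):

    def conv_output_size(W, F, P=0, S=1):
        """Output spatial size of a conv layer: (W - F + 2P) / S + 1."""
        return (W - F + 2 * P) // S + 1

    # 32x32 input, 5x5 filter, no padding, stride 1 -> a 28x28 feature map
    print(conv_output_size(32, 5))  # 28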



[Figure: convolutional layer]



Max pooling is used to downsize an image by applying a maximum filter over (usually non-overlapping) subregions. It reduces the computational cost by reducing the number of parameters the model has to learn, and it helps avoid overfitting by providing an abstracted form of the input.
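
For example, a 2x2 max pool with stride 2 (as used in the tutorial) halves each spatial dimension:

    import torch
    import torch.nn as nn

    pool = nn.MaxPool2d(kernel_size=2, stride=2)  # max over non-overlapping 2x2 windows
    x = torch.randn(1, 6, 28, 28)  # e.g. feature maps from a first conv layer
    print(pool(x).shape)  # torch.Size([1, 6, 14, 14])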



[Figure: max pooling]

Experimental Observations: 

Code changes were made for each case, and each case was run several times (typically four) to analyze the run-to-run variation; an approximate value was then used for the analysis.

1. Using three convolution layers with a max pool at the end (one plausible version is sketched below)
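
The post does not show the exact code for this case; a plausible sketch, with assumed channel counts (6, 12, 16) and 5x5 filters, is:

    import torch.nn as nn
    import torch.nn.functional as F

    class ThreeConvNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv1 = nn.Conv2d(3, 6, 5)    # 32x32 -> 28x28
            self.conv2 = nn.Conv2d(6, 12, 5)   # 28x28 -> 24x24
            self.conv3 = nn.Conv2d(12, 16, 5)  # 24x24 -> 20x20
            self.pool = nn.MaxPool2d(2, 2)     # single pool at the end: 20x20 -> 10x10
            self.fc1 = nn.Linear(16 * 10 * 10, 120)
            self.fc2 = nn.Linear(120, 84)
            self.fc3 = nn.Linear(84, 10)

        def forward(self, x):
            x = F.relu(self.conv1(x))
            x = F.relu(self.conv2(x))
            x = self.pool(F.relu(self.conv3(x)))
            x = x.view(-1, 16 * 10 * 10)
            x = F.relu(self.fc1(x))
            x = F.relu(self.fc2(x))
            return self.fc3(x)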


Output of average accuracy in different runs: 42%, 42%, 42%, 42%, 41%


2. Using softmax


Code change: return F.log_softmax(x) instead of return x
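
In the tutorial's network this amounts to changing the last line of forward(); dim=1 is assumed here so the softmax is taken over the 10 class scores for each image:

    import torch.nn.functional as F

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return F.log_softmax(x, dim=1)  # changed from: return x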


Output of average accuracy in different runs: 54%, 53%, 54%, 54%


3. Varying the output channel values of the convolutional layers

I) input -> conv2d -> relu -> maxpool2d





II) input -> conv2d -> relu -> conv2d -> relu ->  conv2d -> relu -> maxpool2d

with a small variation in output channel values





Output of average accuracy in different runs: 30%, 32%, 28%, 19%


III) input -> conv2d -> relu -> conv2d -> relu ->  conv2d -> relu -> maxpool2d

with a large variation in output channel values





Output of average accuracy in different runs: 44%, 45%, 44%, 44%
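
Both variants above can be written as one parameterized sketch; the exact channel values used in the post are not shown, so the tuples below are illustrative assumptions:

    import torch.nn as nn
    import torch.nn.functional as F

    class StackedConvNet(nn.Module):
        def __init__(self, channels=(3, 16, 32, 64)):
            super().__init__()
            c0, c1, c2, c3 = channels
            self.conv1 = nn.Conv2d(c0, c1, 3, padding=1)  # 3x3 filters keep 32x32
            self.conv2 = nn.Conv2d(c1, c2, 3, padding=1)
            self.conv3 = nn.Conv2d(c2, c3, 3, padding=1)
            self.pool = nn.MaxPool2d(2, 2)                # 32x32 -> 16x16
            self.fc = nn.Linear(c3 * 16 * 16, 10)

        def forward(self, x):
            x = F.relu(self.conv1(x))
            x = F.relu(self.conv2(x))
            x = self.pool(F.relu(self.conv3(x)))
            x = x.view(-1, self.fc.in_features)
            return self.fc(x)

    # Case II ("small variation") might be StackedConvNet((3, 6, 8, 10));
    # case III ("large variation") might be StackedConvNet((3, 16, 32, 64)).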


4. Without pooling
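
One plausible reading (the post's code is not shown) is the tutorial's two-conv network with both max-pool steps removed, which enlarges fc1's input from 16 * 5 * 5 to 16 * 24 * 24:

    import torch.nn as nn
    import torch.nn.functional as F

    class NoPoolNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv1 = nn.Conv2d(3, 6, 5)   # 32x32 -> 28x28
            self.conv2 = nn.Conv2d(6, 16, 5)  # 28x28 -> 24x24
            self.fc1 = nn.Linear(16 * 24 * 24, 120)
            self.fc2 = nn.Linear(120, 84)
            self.fc3 = nn.Linear(84, 10)

        def forward(self, x):
            x = F.relu(self.conv1(x))
            x = F.relu(self.conv2(x))
            x = x.view(-1, 16 * 24 * 24)
            x = F.relu(self.fc1(x))
            x = F.relu(self.fc2(x))
            return self.fc3(x)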





Output of average accuracy in different runs: 56%, 52%, 54%, 54%



5. Using a single convolution layer in the given code with a higher output channel value (a sketch follows)
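
A hedged sketch of this network, with the output channel count as the experimental knob; whether pooling was kept is not stated, so it is kept here as in the tutorial:

    import torch.nn as nn
    import torch.nn.functional as F

    class SingleConvNet(nn.Module):
        def __init__(self, out_channels=40):  # varied below: 9, 12, 24, 30, 40
            super().__init__()
            self.conv1 = nn.Conv2d(3, out_channels, 5)  # 32x32 -> 28x28
            self.pool = nn.MaxPool2d(2, 2)              # 28x28 -> 14x14
            self.fc1 = nn.Linear(out_channels * 14 * 14, 120)
            self.fc2 = nn.Linear(120, 84)
            self.fc3 = nn.Linear(84, 10)

        def forward(self, x):
            x = self.pool(F.relu(self.conv1(x)))
            x = x.view(-1, self.fc1.in_features)
            x = F.relu(self.fc1(x))
            x = F.relu(self.fc2(x))
            return self.fc3(x)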





Output of average accuracy in different runs: 58% 57% 57% 56%


I) Using an output channel value of 9


Output of average accuracy: 58%


II) Using an output channel value of 12


Output of average accuracy: 59%


III) Using an output channel value of 24


Output of average accuracy: 60%


IV) Using an output channel value of 30


Output of average accuracy: 61%


V) Using an output channel value of 40


Output of average accuracy: 63%


6. Using a single convolution layer with a higher output channel value and increasing the number of epochs (a sketch of the training loop follows the sub-cases below)


I) Using 2 epochs


Output of average accuracy: 60%


II) Using 4 epochs


Output of average accuracy: 66%


III) Using 6 epochs


Output of average accuracy: 66%


IV) Using 8 epochs


Output of average accuracy: 66%


V) Using 10 epochs


Output of average accuracy: 66%
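
A sketch of the training loop with the epoch count as the knob, assuming the net and trainloader objects from the earlier sketches and the tutorial's SGD settings:

    import torch.nn as nn
    import torch.optim as optim

    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

    num_epochs = 4  # the knob varied above: 2, 4, 6, 8, 10
    for epoch in range(num_epochs):
        running_loss = 0.0
        for inputs, labels in trainloader:
            optimizer.zero_grad()
            loss = criterion(net(inputs), labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
        print(f'epoch {epoch + 1}: average loss {running_loss / len(trainloader):.3f}')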



Challenges faced:

In most cases, the average accuracy varied from run to run, which makes it difficult to precisely differentiate one case from another. However, as the variation was not very large, approximate values could be used for the comparisons.

Analysis:

Based on the observations made:

1. As part of point 5 of the experiments, the average accuracy increased steadily with the output channel value: 9 -> 58%, 12 -> 59%, 24 -> 60%, 30 -> 61%, 40 -> 63%.

2. As part of point 6 of the experiments, the average accuracy increased with the number of epochs at first (60% at 2 epochs, 66% at 4 epochs) and then plateaued, with no further change from 4 to 10 epochs.

Code:

The following is the link to the final code as a Jupyter notebook:

https://colab.research.google.com/drive/1_ssdELGkXLCHXvKoevxMR3ETmBh2q1yX?usp=sharing

The following is the link to the final code on GitHub:

https://github.com/swatidamele/CNN/blob/main/cifar10_tutorial_final.ipynb

Outcome:

On applying the changes to the output channel value and increasing the number of epochs, the average accuracy across the ten classes of the CIFAR10 classifier improved from around 56% to 66%.

References:

  • https://pub.towardsai.net/building-neural-networks-with-python-code-and-math-in-detail-ii-bbe8accbf3d1
  • https://www.youtube.com/watch?v=FTr3n7uBIuE&t=2100s
  • https://www.youtube.com/watch?v=pDdP0TFzsoQ

Contribution:

  • Increasing the output channel value after reducing the number of layers improves the performance of the model.
  • As seen in the analysis, increasing the number of epochs can also improve the performance of the model. However, beyond a point it brings little change and yields the same accuracy on further increments, while making the model take somewhat longer to train.
