MLP implementation in Python with PyTorch for the MNIST-fashion dataset (90%+ accuracy on the test set).
- Roei Gida
- Tomer Shay
Implementation of a neural network on the MNIST-fashion dataset. The network takes as input a 28×28 grayscale image (784 floating-point pixel values between 0 and 255).
The program contains seven different network models, implemented with PyTorch. In all of them, the last layer has 10 neurons with the Softmax activation function.
During training, the network checks its accuracy on an independent set of data that is not used for learning, called the validation set. After all the epochs, the network keeps its best state, i.e. the weights that achieved the highest accuracy on the validation set, to prevent overfitting.
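The "keep the best validation state" idea can be sketched in a few lines of PyTorch. This is a minimal illustration, not the repo's code: random tensors stand in for MNIST-fashion, the network uses Model A's layer sizes, and the LogSoftmax output paired with `nn.NLLLoss` is an assumption (the README only says Softmax). The `accuracy` helper is hypothetical.

```python
import copy

import torch
from torch import nn

# Hypothetical stand-in data so the sketch runs offline (not the real dataset).
torch.manual_seed(0)
x_train, y_train = torch.rand(256, 784), torch.randint(0, 10, (256,))
x_val, y_val = torch.rand(64, 784), torch.randint(0, 10, (64,))

model = nn.Sequential(
    nn.Linear(784, 100), nn.ReLU(),
    nn.Linear(100, 50), nn.ReLU(),
    nn.Linear(50, 10), nn.LogSoftmax(dim=1),  # assumption: log-Softmax + NLLLoss
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.12)
loss_fn = nn.NLLLoss()


def accuracy(net: nn.Module, x: torch.Tensor, y: torch.Tensor) -> float:
    """Fraction of samples whose highest-scoring class equals the label."""
    net.eval()
    with torch.no_grad():
        return (net(x).argmax(dim=1) == y).float().mean().item()


best_acc, best_state = -1.0, None
for epoch in range(5):
    model.train()
    for i in range(0, len(x_train), 64):  # mini-batches of 64 samples
        optimizer.zero_grad()
        loss = loss_fn(model(x_train[i:i + 64]), y_train[i:i + 64])
        loss.backward()
        optimizer.step()
    val_acc = accuracy(model, x_val, y_val)
    if val_acc > best_acc:  # snapshot the weights with the best validation accuracy
        best_acc, best_state = val_acc, copy.deepcopy(model.state_dict())

model.load_state_dict(best_state)  # restore the best weights before testing
```

`copy.deepcopy` matters here: `state_dict()` returns references to the live tensors, so without a copy the "best" snapshot would keep changing as training continues.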
Finally, the program exports a graph of the accuracy on the training, validation, and testing sets per epoch, and prints the final accuracy on the testing set.
The architectures of the seven models:
Model A:
- number of hidden layers: 2
- sizes of the layers: [784, 100, 50, 10]
- activation function: [ReLU, ReLU, Softmax]
- optimizer: SGD
- learning rate: 0.12
- No batch normalization, no dropout
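Model A (and Model B, which only swaps the optimizer) could be written as an `nn.Sequential`, roughly as below. This is a sketch under assumptions: `build_model_a` is a hypothetical name, and the final layer uses `nn.LogSoftmax` (to pair with `nn.NLLLoss`) rather than a plain Softmax, which may differ from the repo.

```python
import torch
from torch import nn


def build_model_a() -> nn.Sequential:
    """784 -> 100 -> 50 -> 10 with ReLU on both hidden layers."""
    return nn.Sequential(
        nn.Flatten(),          # 1x28x28 image (or 784 vector) -> 784 values
        nn.Linear(784, 100),
        nn.ReLU(),
        nn.Linear(100, 50),
        nn.ReLU(),
        nn.Linear(50, 10),
        nn.LogSoftmax(dim=1),  # log of Softmax; pair with nn.NLLLoss
    )


model_a = build_model_a()
optimizer_a = torch.optim.SGD(model_a.parameters(), lr=0.12)

# Model B: same layers, Adam with a much smaller learning rate.
model_b = build_model_a()
optimizer_b = torch.optim.Adam(model_b.parameters(), lr=0.0001)

out = model_a(torch.rand(8, 1, 28, 28))  # a batch of 8 images -> 8 x 10 log-probabilities
```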
Model B:
- number of hidden layers: 2
- sizes of the layers: [784, 100, 50, 10]
- activation function: [ReLU, ReLU, Softmax]
- optimizer: Adam
- learning rate: 0.0001
- No batch normalization, no dropout
Model C:
- number of hidden layers: 2
- sizes of the layers: [784, 100, 50, 10]
- activation function: [ReLU, ReLU, Softmax]
- optimizer: Adam
- learning rate: 0.0001
- dropout: 20% on the 3rd layer (size of 50)
- No batch normalization
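One way to read "20% dropout on the 3rd layer (size of 50)" is a `nn.Dropout(0.2)` applied to the 50 activations before the output layer. This placement (after the ReLU) is an interpretation, not confirmed by the README, and the LogSoftmax output is again an assumption.

```python
import torch
from torch import nn

model_c = nn.Sequential(
    nn.Linear(784, 100),
    nn.ReLU(),
    nn.Linear(100, 50),
    nn.ReLU(),
    nn.Dropout(p=0.2),  # randomly zeroes 20% of the 50 activations, in training mode only
    nn.Linear(50, 10),
    nn.LogSoftmax(dim=1),
)
optimizer_c = torch.optim.Adam(model_c.parameters(), lr=0.0001)

model_c.eval()  # dropout is disabled in eval mode, so outputs become deterministic
x = torch.rand(4, 784)
with torch.no_grad():
    same = torch.equal(model_c(x), model_c(x))
```

Calling `model.train()` / `model.eval()` at the right time is what turns dropout on for training and off for validation and testing.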
Model D:
- number of hidden layers: 2
- sizes of the layers: [784, 100, 50, 10]
- activation function: [ReLU, ReLU, Softmax]
- optimizer: Adam
- learning rate: 0.01
- batch normalization: before the activation function (ReLU) on each of the hidden layers.
- No dropout
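"Batch normalization before the activation" means each hidden layer is Linear, then BatchNorm, then ReLU. A minimal sketch of Model D under that reading (the LogSoftmax output is an assumption):

```python
import torch
from torch import nn

model_d = nn.Sequential(
    nn.Linear(784, 100),
    nn.BatchNorm1d(100),  # normalize the pre-activations...
    nn.ReLU(),            # ...then apply the non-linearity
    nn.Linear(100, 50),
    nn.BatchNorm1d(50),
    nn.ReLU(),
    nn.Linear(50, 10),
    nn.LogSoftmax(dim=1),
)
optimizer_d = torch.optim.Adam(model_d.parameters(), lr=0.01)

out = model_d(torch.rand(16, 784))  # BatchNorm1d in training mode needs a batch larger than 1
```

Batch normalization is one plausible reason Model D tolerates a learning rate 100 times larger than Models B and C, though the README does not state this.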
Model E:
- number of hidden layers: 4
- sizes of the layers: [784, 128, 64, 10, 10, 10]
- activation function: [ReLU, ReLU, ReLU, ReLU, Softmax]
- optimizer: SGD
- learning rate: 0.1
- No batch normalization, no dropout
Model F:
- number of hidden layers: 4
- sizes of the layers: [784, 128, 64, 10, 10, 10]
- activation function: [Sigmoid, Sigmoid, Sigmoid, Sigmoid, Softmax]
- optimizer: Adam
- learning rate: 0.001
- No batch normalization, no dropout
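Models E and F share the same layer sizes and differ only in the hidden activation and optimizer, so a single builder can produce both. `build_deep_model` is a hypothetical helper, and the LogSoftmax output is an assumption.

```python
import torch
from torch import nn


def build_deep_model(activation: type[nn.Module]) -> nn.Sequential:
    """784 -> 128 -> 64 -> 10 -> 10 -> 10, with `activation` on the four hidden layers."""
    sizes = [784, 128, 64, 10, 10, 10]
    layers: list[nn.Module] = []
    # Pairs (784,128), (128,64), (64,10), (10,10): the four hidden layers.
    for n_in, n_out in zip(sizes[:-2], sizes[1:-1]):
        layers += [nn.Linear(n_in, n_out), activation()]
    # Output layer: 10 -> 10, followed by log-Softmax.
    layers += [nn.Linear(sizes[-2], sizes[-1]), nn.LogSoftmax(dim=1)]
    return nn.Sequential(*layers)


model_e = build_deep_model(nn.ReLU)
optimizer_e = torch.optim.SGD(model_e.parameters(), lr=0.1)

model_f = build_deep_model(nn.Sigmoid)
optimizer_f = torch.optim.Adam(model_f.parameters(), lr=0.001)
```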
Best Model:
- number of hidden layers: 4
- sizes of the layers: [784, 512, 256, 128, 64, 10]
- activation function: [Leaky ReLU, Leaky ReLU, Leaky ReLU, Leaky ReLU, Softmax]
- optimizer: starts with Adam, then switches to SGD
- learning rate: 0.001
- batch normalization: before the activation function (Leaky ReLU) on each of the hidden layers.
- dropout: 10% on the input layer (size of 784), the 3rd layer (size of 256), and the 5th layer (size of 64).
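The Best Model combines batch normalization, Leaky ReLU, dropout, and an optimizer switch. Below is a sketch under several assumptions: dropout is applied to the outputs of the 784-, 256-, and 64-unit layers; `SWITCH_EPOCH = 15` is hypothetical (the README does not say when Adam is replaced by SGD); and the output uses LogSoftmax.

```python
import torch
from torch import nn


def hidden_block(n_in: int, n_out: int) -> list[nn.Module]:
    """Linear -> BatchNorm -> LeakyReLU (batch norm before the activation)."""
    return [nn.Linear(n_in, n_out), nn.BatchNorm1d(n_out), nn.LeakyReLU()]


best_model = nn.Sequential(
    nn.Dropout(0.1),            # 10% dropout on the 784 input pixels
    *hidden_block(784, 512),
    *hidden_block(512, 256),
    nn.Dropout(0.1),            # 10% dropout on the 256-unit layer
    *hidden_block(256, 128),
    *hidden_block(128, 64),
    nn.Dropout(0.1),            # 10% dropout on the 64-unit layer
    nn.Linear(64, 10),
    nn.LogSoftmax(dim=1),
)

SWITCH_EPOCH = 15  # hypothetical switch point, not taken from the README

# Both optimizers are created once, so Adam keeps its moment estimates between epochs.
adam = torch.optim.Adam(best_model.parameters(), lr=0.001)
sgd = torch.optim.SGD(best_model.parameters(), lr=0.001)


def optimizer_for(epoch: int) -> torch.optim.Optimizer:
    """Adam for the first SWITCH_EPOCH epochs, SGD afterwards."""
    return adam if epoch < SWITCH_EPOCH else sgd
```

Creating a fresh `Adam` every epoch would silently reset its running statistics, which is why both optimizers are built up front here.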
To reach the best accuracy on the testing set (90%+), our experiments showed that the Best Model should be trained for about 30 epochs, with a batch size of 64 and a validation percentage of 10%.
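A 10% validation split with batches of 64 might look like this with `torch.utils.data`. The 1,000 random samples are a stand-in for the real training set, and the fixed seed is only there to make the split reproducible.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

VALIDATE_PERCENT = 10
BATCH_SIZE = 64

# Hypothetical stand-in data: 1,000 random "images" of 784 pixels with labels 0-9.
dataset = TensorDataset(torch.rand(1000, 784), torch.randint(0, 10, (1000,)))

n_val = len(dataset) * VALIDATE_PERCENT // 100  # 10% -> 100 samples
n_train = len(dataset) - n_val                   # the remaining 900 samples
train_set, val_set = random_split(
    dataset, [n_train, n_val], generator=torch.Generator().manual_seed(0)
)

train_loader = DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True)
val_loader = DataLoader(val_set, batch_size=BATCH_SIZE)
```

With 900 training samples and a batch size of 64, the training loader yields 15 batches per epoch (14 full batches and one of 4 samples).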
The program accepts several command-line arguments, which can be listed with the -h or -help flag when running. There are ten arguments in total:
- -train_x STRING: the path to the training images file (a file that contains 784 values in each row). NOTE: this flag is used only if -local True is entered.
- -train_y STRING: the path to the training labels file (a file that contains one value between 0-9 in each row and has the same number of rows as the train_x file). NOTE: this flag is used only if -local True is entered.
- -test_x STRING: the path to the testing images file (a file that contains 784 values in each row). NOTE: this flag is used only if -local True is entered.
- -test_y STRING: the path to the testing labels file (a file that contains one value between 0-9 in each row and has the same number of rows as the test_x file). NOTE: this flag is used only if -local True is entered.
- -e INT: the number of epochs (default value = 10).
- -batch_size INT: the batch size (default value = 64).
- -validate INT: the percentage of the training set that is allocated to the validation set (default value = 10).
- -model STRING: which model to use in this run. You can pass 'A'-'F' or 'BestModel' (default value = BestModel).
- -local BOOLEAN: True to load the dataset locally (from the paths entered), or False to load the original MNIST-fashion dataset (default value = False).
- -plot BOOLEAN: True to export a graph of the accuracy and loss value in each epoch (default value = True).
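The flags above map naturally onto Python's standard `argparse`. This is a sketch that mirrors the documented names and defaults; `build_parser`, `str_to_bool`, and the `choices` list are assumptions, not necessarily how main.py is written.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Parse 'True'/'False'; argparse's type=bool would treat any non-empty string as True."""
    if value.lower() in ("true", "1", "yes"):
        return True
    if value.lower() in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected True or False, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="MLP on MNIST-fashion")
    parser.add_argument("-train_x", type=str, help="training images file (784 values per row)")
    parser.add_argument("-train_y", type=str, help="training labels file (one label 0-9 per row)")
    parser.add_argument("-test_x", type=str, help="testing images file (784 values per row)")
    parser.add_argument("-test_y", type=str, help="testing labels file (one label 0-9 per row)")
    parser.add_argument("-e", type=int, default=10, help="number of epochs")
    parser.add_argument("-batch_size", type=int, default=64, help="batch size")
    parser.add_argument("-validate", type=int, default=10, help="validation percentage")
    parser.add_argument("-model", type=str, default="BestModel",
                        choices=["A", "B", "C", "D", "E", "F", "BestModel"])
    parser.add_argument("-local", type=str_to_bool, default=False, help="load dataset from files")
    parser.add_argument("-plot", type=str_to_bool, default=True, help="export accuracy/loss graph")
    return parser


# Parse an explicit argument list instead of sys.argv, for illustration.
args = build_parser().parse_args(["-e", "30", "-model", "BestModel", "-local", "True"])
```

Here `args.e` is 30, `args.local` is True, and the unspecified flags keep their defaults (batch size 64, validation 10%).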
Running example:
$ python3 main.py -train_x train_x -train_y train_y -test_x test_x -test_y test_y -local True
Note that to use the dataset included in this repo, you need to unzip the dataset.zip file (for example, with 7-Zip).
- Open the terminal.
- Clone the project:
$ git clone https://github.com/tomershay100/MLP-MNIST-fashion-with-PyTorch.git
- Enter the project folder:
$ cd MLP-MNIST-fashion-with-PyTorch
- Optionally, unzip dataset.zip for local running:
$ unzip dataset.zip
- Run the main.py file with your chosen parameters:
$ python3 main.py -e 30 -validate 10 -model BestModel -batch_size 64
The repository also contains several additional files with graphs from experiments in which some of the hyperparameters were changed (e.g. learning rate, batch size).