Assignment 5: Recognition Using Deep Networks
Due 25 April 2019 (2 weeks +)
This project is about learning how to build, train, analyze, and modify a deep network for a recognition task. We will be using the MNIST digit recognition data set, primarily because it is simple enough to build and train a network without a GPU, but also because it is challenging enough to provide a good example of what deep networks can do.
As this is the last defined project, you may want to think about this project as a lead-in to your final project.
The first task is to build and train a network to do digit recognition using the MNIST database, then save the network to a file so it can be re-used for the later tasks. My strong recommendation is to use the Python Keras package, which is built on top of the TensorFlow package.
The Python Keras package is generally available through standard package managers (Anaconda, MacPorts, apt, dpkg). You will also want the Python bindings for OpenCV, numpy, and the matplotlib package.
There are a number of different tutorials on building and training a convolutional network to solve the MNIST digit recognition task. Some example Keras code is at the Keras examples GitHub folder.
- The MNIST digit data consists of a training set of 60k 28x28 labeled digits and a test set of 10k 28x28 labeled digits. The data set can be imported directly from the Keras package, though you will need to reshape it (adding a channel dimension to each 28x28 image) before applying it to the convolutional network. Use matplotlib or OpenCV to look at the first two example digits. Include these in your report.
- In order to make your code repeatable, be sure to use np.random.seed(42) at the top of your code file. Remove this line if you want to create different networks.
- Similar to the example above, create a network with two convolution layers with 32 3x3 filters, a max pooling layer with a 2x2 window, a dropout layer with a 0.25 dropout rate, a flatten layer, a dense layer with 128 nodes and relu activation, a second dropout layer with a 0.5 dropout rate, and a final dense layer for the output with 10 nodes and the softmax activation function. When compiling the model, use categorical cross-entropy as the loss function and adam as the optimizer. The metric should be accuracy.
- Train the model for 10-12 epochs, one epoch at a time, evaluating the model on both the training and test sets after each epoch. Collect the accuracy scores and plot the training and testing accuracy in a graph. Include this in your report.
- When the network is trained, save it to a file. Then run the model on the first 10 examples in the test set and have your program print out the 10 output values (use only 2 decimal places), the index of the max output value, and the correct label of the digit. The network should correctly classify all 10 of the examples.
- Write out the ten digits [0-9] in your own handwriting on a piece of paper (not too close together). Take a picture of the digits and crop out each digit to its own square image. As the last step in your code, read in the images, convert them to greyscale, resize them to 28x28 images and run them through the network. How did the network do on this new input?
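As a sanity check on the architecture described in the list above, the layer output sizes and parameter counts can be worked out by hand. The sketch below does that arithmetic in plain Python. It assumes 'valid' padding and stride 1 for the convolutions and non-overlapping pooling, which the task text does not state explicitly (the 26x26, 24x24, and 12x12 sizes quoted in the second task are consistent with these assumptions). The function names are illustrative only.

```python
# Sketch: output shapes and parameter counts for the layer stack in task one.
# ASSUMPTIONS (not stated in the task): 'valid' padding and stride 1 for the
# convolutions, and a pooling stride equal to the pooling window.

def conv_out(size, kernel):
    """Spatial size after a 'valid' convolution with stride 1."""
    return size - kernel + 1

def pool_out(size, window):
    """Spatial size after non-overlapping max pooling."""
    return size // window

def summarize():
    """Return a list of (layer name, output shape, parameter count)."""
    rows = []
    size, channels = 28, 1                      # one greyscale MNIST digit
    for name in ("conv1", "conv2"):
        params = 3 * 3 * channels * 32 + 32     # weights plus one bias per filter
        size, channels = conv_out(size, 3), 32
        rows.append((name, (size, size, channels), params))
    size = pool_out(size, 2)
    rows.append(("maxpool", (size, size, channels), 0))
    flat = size * size * channels
    rows.append(("flatten", (flat,), 0))        # dropout layers add no parameters
    rows.append(("dense1", (128,), flat * 128 + 128))
    rows.append(("dense2", (10,), 128 * 10 + 10))
    return rows

if __name__ == "__main__":
    for name, shape, params in summarize():
        print(f"{name:8s} {str(shape):14s} {params:7d}")
```

If the summary printed by your own Keras model shows different shapes, the most likely cause is a different padding setting or input shape than the one assumed here.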
The second task is to examine your network and analyze how it processes the data. Make this a separate code file from task one. Read in the trained network as the first step.
- Get the weights of the first layer. Note that different versions of Keras use a different index for the first layer. Keras on the dwarves uses index 1 as the first layer. It may be index 0 on more recent versions.
Construct the 32 3x3 filters from the first layer, print them to the terminal, and visualize them in matplotlib. You can use the pyplot functions figure, subplot, and imshow to make an 8x4 grid of figures such as the image below.
Use OpenCV's filter2D function to apply the 32 filters to the first training example. Generate a plot of the 32 filtered images, as below. In your report, note whether the results make sense given the filters in the prior step.
Build a new model using just the first layer. For example, use:
from keras.models import Model
first_layer_model = Model(inputs=model.input, outputs=model.get_layer(index=first_index).output)
Apply the first layer model to the first training example, then generate a plot of the 32 26x26 output images. Compare these to the results of running the filters on the original images in OpenCV.
- Repeat the prior task for a model with the first two convolutional layers, and repeat it again after adding in the 2x2 pooling layer. Again, make two 32-element plots. The first should be 24x24 images, and the second should be 12x12 images.
- Show the output of the pooling layer for 2-3 other digits and visually compare them. Can you detect any critical features?
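When comparing the OpenCV results with the first-layer model output, it may help to see what the layer computes written out in numpy. The sketch below is an illustration under assumptions, not the Keras implementation: it assumes the kernel tensor has the layout (3, 3, 1, 32), that the layer uses 'valid' padding, and that it applies a bias followed by relu (as in the Keras MNIST example; the task text does not say). Random arrays stand in for the digit and the trained weights. To my understanding, filter2D also computes a correlation but returns an image the same size as its input and applies neither bias nor activation, so expect 28x28 results there versus 26x26 from the layer.

```python
import numpy as np

def split_filters(kernel):
    """Split a (3, 3, 1, n) kernel tensor into a list of n 3x3 filters."""
    return [kernel[:, :, 0, i] for i in range(kernel.shape[-1])]

def correlate_valid(image, filt):
    """Cross-correlate a 2D image with a 2D filter, keeping only positions
    where the filter fits entirely inside the image ('valid' mode)."""
    fh, fw = filt.shape
    oh, ow = image.shape[0] - fh + 1, image.shape[1] - fw + 1
    out = np.zeros((oh, ow), dtype=np.float64)
    for r in range(fh):
        for c in range(fw):
            # Each filter tap scales a shifted window of the image.
            out += filt[r, c] * image[r:r + oh, c:c + ow]
    return out

def first_layer_response(image, kernel, bias):
    """Mimic convolution + bias + relu for one greyscale image (assumed layer)."""
    maps = [correlate_valid(image, f) + b
            for f, b in zip(split_filters(kernel), bias)]
    return np.maximum(np.stack(maps, axis=-1), 0.0)

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    image = rng.random((28, 28))               # stand-in for a digit
    kernel = rng.normal(size=(3, 3, 1, 32))    # stand-in for trained weights
    bias = np.zeros(32)                        # stand-in for trained biases
    print(first_layer_response(image, kernel, bias).shape)  # (26, 26, 32)
```

Cropping a one-pixel border from a same-size filtered image gives the region that corresponds to the 26x26 'valid' output.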
The third task is to use the trained network as an embedding space for images of written symbols. In this case, you'll use it to differentiate images of the Greek letters alpha, beta, and gamma. Put this task in new code files, separate from the prior tasks. Read in your trained network as the first step.
- Download this collection of 27 280x280 images of Greek symbols. There are nine images each of alpha, beta, and gamma. Write a program to read in the images, scale them down to 28x28, convert them to greyscale, and write out the data set as two CSV files. The first file should contain a header row and 27 data rows with 784 intensity values in each row. The second file should contain a header row and 27 data rows with just the category (alpha = 0, beta = 1, gamma = 2). If you're smart about your design, this program should be able to take in any number of input images with any number of different categories identified by the filename, just in case you want to try adding more Greek symbols.
- Read in the trained model and build a new model from the old network that terminates at the Dense layer with 128 outputs. Show that applying the truncated network to the first training input gives 128 numbers as its output.
- Apply the truncated network to the Greek symbols (read from the CSV file) to get a set of 27 128-element vectors.
- Select one example from each set of nine (i.e. one alpha, one beta, and one gamma). Compute the sum-squared distance (SSD) in the 128-dimensional embedding space between each selected example and all 27 examples. Note that the SSD of an example with itself should be 0. For each selected example, show all 27 SSD values. What pattern do you find? How well would a KNN classifier in the embedding space do for this task?
- Take a picture of a couple of examples of your own alpha, beta, and gamma symbols. Crop and rescale them as appropriate and see if they match their corresponding symbol examples in the embedding space.
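The SSD comparison in the steps above is a short numpy computation. The sketch below shows one way to write it, along with a 1-nearest-neighbor lookup as the simplest case of KNN. The embeddings here are synthetic stand-ins (three random clusters of nine points), not the output of a real network, and the function names are illustrative only.

```python
import numpy as np

def ssd_to_all(query, embeddings):
    """Sum-squared distance from one vector to every row of `embeddings`."""
    diff = embeddings - query            # broadcasts the query over all rows
    return np.sum(diff * diff, axis=1)

def nearest_label(query_index, embeddings, labels):
    """Label of the closest *other* example (1-nearest-neighbor)."""
    dist = ssd_to_all(embeddings[query_index], embeddings)
    dist[query_index] = np.inf           # skip the zero distance to itself
    return labels[int(np.argmin(dist))]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # SYNTHETIC stand-in for the 27 embedding vectors: three clusters of nine.
    centers = rng.normal(size=(3, 128)) * 5.0
    labels = np.repeat(np.arange(3), 9)  # 0 = alpha, 1 = beta, 2 = gamma
    embeddings = centers[labels] + rng.normal(size=(27, 128))
    for i in (0, 9, 18):                 # one example from each set of nine
        print(labels[i], np.round(ssd_to_all(embeddings[i], embeddings), 1))
```

With real embeddings, replace the synthetic array with the 27x128 output of the truncated network and the label array with the categories read from the CSV file.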
The final task is to undertake some experimentation with the deep network for the MNIST task. Your goal is to evaluate the effect of changing different aspects of the network. Pick at least three dimensions along which you can change the network architecture and see if you can optimize the network performance and/or training time along those three dimensions. Do your best to automate the process.
Potential dimensions to evaluate include:
- The number of convolution layers
- The size of the convolution filters
- The number of convolution filters in a layer
- The number of hidden nodes in the Dense layer
- The dropout rates of the Dropout layers
- The size of the pooling layer filters
- The number or location of pooling layers
- The activation function for each layer
- The number of epochs of training
- The batch size while training
- Come up with a plan for what you want to explore and the metrics you will use. Determine the range of options in each dimension to explore (e.g. L options in dimension 1, M options in dimension 2, and N options in dimension 3). You don't have to evaluate all L * M * N options unless you want to. Instead, think about using a linear search strategy where you hold two parameters constant and optimize the third, then switch things up, optimizing one parameter at a time in a round-robin or randomized fashion. Overall, plan to evaluate 50-100 network variations (again, automate this process).
- Before starting your evaluation, come up with a hypothesis for how you expect the network to behave along each dimension. Include these hypotheses in your report and then discuss whether the evaluation supported the hypothesis.
- Run the evaluation and report on the results.
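The one-dimension-at-a-time search described in the plan step can be automated with a small driver like the sketch below. It is one possible design, not a required one. The `fake_score` function is a hypothetical stand-in so the example runs without training anything; in practice you would replace it with a function that builds, trains, and tests a network for the given configuration and returns your chosen metric. The dimension names and option values are also illustrative.

```python
def coordinate_search(options, evaluate, start, rounds=2):
    """Optimize one dimension at a time, holding the others fixed.

    options  : dict mapping dimension name -> list of candidate values
    evaluate : function(config dict) -> score, where higher is better
    start    : dict giving the initial value for every dimension
    Returns (best config, best score, list of every (config, score) evaluated).
    """
    cache = {}

    def score(config):
        key = tuple(sorted(config.items()))
        if key not in cache:             # never evaluate the same variant twice
            cache[key] = evaluate(config)
        return cache[key]

    best = dict(start)
    for _ in range(rounds):
        for dim, values in options.items():   # round-robin over the dimensions
            candidates = (dict(best, **{dim: v}) for v in values)
            best = max(candidates, key=score)
    history = [(dict(key), s) for key, s in cache.items()]
    return best, score(best), history

def fake_score(config):
    """HYPOTHETICAL stand-in for 'train the network and return test accuracy'."""
    return -((config["filters"] - 32) ** 2 / 1000.0
             + (config["dense_nodes"] - 128) ** 2 / 10000.0
             + (config["dropout"] - 0.25) ** 2)

if __name__ == "__main__":
    options = {"filters": [8, 16, 32, 64],
               "dense_nodes": [64, 128, 256],
               "dropout": [0.1, 0.25, 0.5]}
    start = {"filters": 8, "dense_nodes": 64, "dropout": 0.1}
    best, best_score, history = coordinate_search(options, fake_score, start)
    print("best:", best)
    print("variants evaluated:", len(history), "of", 4 * 3 * 3)
```

The returned history is a convenient place to collect the numbers for your report, since it records every variation that was actually evaluated.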
- Evaluate more dimensions on the final task.
- Try more Greek letters and build an actual KNN classifier that can take in any square image and classify it.
- Explore a different computer vision task with available data.
- There are many pre-trained networks available in the Keras package. Try loading one and evaluating its first couple of convolutional layers as in task 2.
- Replace the first layer of the MNIST network with a filter bank of your choosing (e.g. Gabor filters) and retrain the rest of the network, holding the first layer constant. How does it do?
- Build a live video digit recognition application using the trained network.
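For the filter bank extension, one possible starting point is a set of 32 Gabor filters arranged in the same (3, 3, 1, 32) layout assumed for the first-layer kernel. The sketch below generates such a bank with numpy using the usual Gabor form (a Gaussian envelope times a cosine carrier). The parameter choices (8 orientations, 2 wavelengths, 2 phases, sigma of 1) are arbitrary assumptions for illustration, and at 3x3 the filters are only a coarse approximation of Gabor functions.

```python
import numpy as np

def gabor_kernel(size, theta, wavelength, sigma, gamma=1.0, psi=0.0):
    """Real-valued Gabor filter: Gaussian envelope times a cosine carrier."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(np.float64)
    # Rotate the coordinate frame by theta.
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * xr / wavelength + psi)
    return envelope * carrier

def gabor_bank(n_orient=8, wavelengths=(2.0, 3.0), phases=(0.0, np.pi / 2),
               size=3, sigma=1.0):
    """Stack Gabor filters into a (size, size, 1, n) kernel tensor.
    The default parameter values are arbitrary illustrative choices."""
    kernels = [gabor_kernel(size, np.pi * k / n_orient, w, sigma, psi=p)
               for w in wavelengths for p in phases for k in range(n_orient)]
    bank = np.stack(kernels, axis=-1)    # shape (size, size, n)
    return bank[:, :, np.newaxis, :]     # shape (size, size, 1, n)

if __name__ == "__main__":
    print(gabor_bank().shape)  # (3, 3, 1, 32)
```

In Keras, a bank like this could then be loaded into the first layer with set_weights (together with a bias array) and held constant by setting the layer's trainable attribute to False before compiling; check the layout of the array returned by get_weights on your version first.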
Describe the results of each task. For the final task, present your hypotheses about how you think the network behavior will change with varying parameters and compare them to what you found in your evaluation.
Clearly describe any extensions you undertook.
Give your wiki page the label: cs365s19project05
Put your code in your Private Courses handin directory.