Assignment 4: Real-time 2-D Object Recognition
Part 1: due 6 November
Part 2: due 15 November
This project will introduce you to both 2-D object recognition and real-time image processing. The first week will focus on building the recognition system. The second week will focus on the real-time processing aspects. Because of the real-time goal, it is important to think about efficiency and memory management in week 1.
Work on this project with a partner.
Download the data set. The data set is a collection of pictures of the ten objects we will be trying to differentiate. The data consists of ten 320x240 pictures of each of the ten objects plus a set of images that contain multiple objects. For the procedure below, use the first six images of each object for training and the last four images of each object for testing.
The following procedure walks you through a gallery method of pattern recognition. You will use the training set to build a mean feature vector for each object and then compare new inputs to the set of mean feature vectors to find the closest match. The new input then gets the label of the closest match or is labeled as unclassified if the match is not good enough.
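The gallery method described above can be sketched roughly as follows. This is a minimal illustration, not the required implementation: the feature count `NUM_FEATURES`, the function names, the flat row-major array layout, and the use of unscaled squared Euclidean distance are all assumptions made for the example. Features with very different ranges would probably need to be normalized before a plain Euclidean distance is meaningful.

```c
#include <float.h>
#include <stddef.h>

#define NUM_FEATURES 3      /* hypothetical: number of features per object */
#define UNCLASSIFIED (-1)   /* label returned when no match is good enough */

/* Average n training vectors (stored row-major in a flat array) into one
 * mean feature vector for a single object. */
void compute_mean(const double *samples, size_t n, double *mean)
{
    for (size_t f = 0; f < NUM_FEATURES; f++) {
        double sum = 0.0;
        for (size_t i = 0; i < n; i++)
            sum += samples[i * NUM_FEATURES + f];
        mean[f] = (n > 0) ? sum / (double)n : 0.0;
    }
}

/* Squared Euclidean distance between two feature vectors. */
double sq_dist(const double *a, const double *b)
{
    double d = 0.0;
    for (size_t f = 0; f < NUM_FEATURES; f++) {
        double diff = a[f] - b[f];
        d += diff * diff;
    }
    return d;
}

/* Return the index of the closest mean vector, or UNCLASSIFIED if even the
 * closest one is farther away than max_sq_dist. */
int classify(const double *input, const double *means, size_t num_objects,
             double max_sq_dist)
{
    int best = UNCLASSIFIED;
    double best_d = DBL_MAX;
    for (size_t k = 0; k < num_objects; k++) {
        double d = sq_dist(input, means + k * NUM_FEATURES);
        if (d < best_d) {
            best_d = d;
            best = (int)k;
        }
    }
    return (best_d <= max_sq_dist) ? best : UNCLASSIFIED;
}
```

The rejection threshold `max_sq_dist` is something you would tune on the test images. Keeping the means in one preallocated flat array also avoids per-frame allocation, which matters once the classifier runs inside the real-time loop.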
Once you have a working baseline system, design a second system that uses a different method of classification. The simplest option is to switch to a k-nearest neighbor classifier and represent each object with multiple gallery examples. A more interesting second method would be to build something like a decision tree. Feel free to capture more training images if your method needs them.
Baseline procedure:
- Download the video skeleton code.
- Type make in the src directory to build the program disp. Please change the makefile rule to give it a different name.
- Inside the code, at the top of the event loop in the main function, there are two locations where you need to process images. In the first case, the image data is in xim1->data. In the second case, the data is in xim2->data. The number of rows is Ysize, and the number of columns is Xsize. The data is stored in a 4-bytes-per-pixel format. You will probably need to run some experiments to figure out how the color information is laid out in those four bytes.
- Your function should do something to indicate which digit or digits are visible. The simple method is to just use a printf statement to specify the values of all visible digits and their centroids. A more interesting method is to color the digit regions, put bounding boxes around the digits, or even draw text in the output image (man XDrawText).
Extensions:
- Instead of K-nearest neighbor for a second method, try implementing something like a decision tree.
- Collect more training data and see whether the results improve. The difference will likely be more apparent with a method like K-NN or a decision tree than with a single mean per object.
- Time your program and see how fast it is really running.
- Design your program so you can easily add a new object to the database (we have letters as well as numbers).
- Design your program so you can add an object to it using the real-time interface.
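For the step that processes xim1->data and xim2->data, one way to run the suggested experiments on the 4-bytes-per-pixel layout is to print the four raw bytes of a pixel whose color you know (for example, while pointing the camera at something strongly red). The sketch below is a hedged illustration: `pixel_at` and `dump_pixel` are hypothetical helper names, and the `stride` parameter (bytes per image row) is an assumption. It is often `Xsize * 4`, but an XImage reports the actual value in its `bytes_per_line` field, so that is the safer thing to pass.

```c
#include <stddef.h>
#include <stdio.h>

#define BYTES_PER_PIXEL 4

/* Return a pointer to the first of the four bytes of pixel (row, col).
 * stride is the number of bytes in one image row. */
const unsigned char *pixel_at(const char *data, int stride, int row, int col)
{
    return (const unsigned char *)data
         + (size_t)row * (size_t)stride
         + (size_t)col * BYTES_PER_PIXEL;
}

/* Print the four raw bytes of one pixel so their meaning can be worked out
 * by comparing against a scene with a known color. */
void dump_pixel(const char *data, int stride, int row, int col)
{
    const unsigned char *p = pixel_at(data, stride, row, col);
    printf("(%d,%d): %3u %3u %3u %3u\n", row, col,
           (unsigned)p[0], (unsigned)p[1], (unsigned)p[2], (unsigned)p[3]);
}
```

Inside the event loop, a call such as `dump_pixel(xim1->data, xim1->bytes_per_line, Ysize / 2, Xsize / 2)` would print the center pixel of each frame. The bytes are read as `unsigned char` because the data pointer is a plain `char *`, which may be signed and would otherwise print values above 127 as negative numbers.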
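For the timing extension, a simple approach is to count frames and divide by the elapsed wall-clock time. The sketch below assumes a POSIX system and uses `gettimeofday`; the `FrameTimer` type and the function names are hypothetical, not part of the skeleton code. Because `gettimeofday` reports wall-clock time, it can jump if the system clock is adjusted, so treat the result as an estimate.

```c
#include <stddef.h>
#include <sys/time.h>

/* Current wall-clock time in seconds, with microsecond resolution. */
double now_seconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Hypothetical helper: counts processed frames since a start time. */
typedef struct {
    double start;   /* time at which counting began */
    long   frames;  /* frames processed since then  */
} FrameTimer;

void timer_start(FrameTimer *t)
{
    t->start = now_seconds();
    t->frames = 0;
}

/* Call once per processed frame. */
void timer_tick(FrameTimer *t)
{
    t->frames++;
}

/* Average frames per second since timer_start (0 if no time has elapsed). */
double timer_fps(const FrameTimer *t)
{
    double elapsed = now_seconds() - t->start;
    return (elapsed > 0.0) ? (double)t->frames / elapsed : 0.0;
}
```

Calling `timer_start` before the event loop, `timer_tick` at the end of each iteration, and printing `timer_fps` every hundred frames or so gives a running average. Timing the whole loop this way includes capture and display time, so timing only your processing function separately would show how much of each frame your own code accounts for.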
Follow the writeup instructions to create a web page for your assignment. Send the instructor an email with the code in a zip or tar file along with instructions on how to compile and run it as well as a pointer to the URL for the writeup.