Spring 2019

Assignment 3: 2D Object Recognition

Due 14 March 2019 (2 weeks)


This project is about 2D object recognition. The goal is to have the computer identify a specified set of objects placed on a white surface, in a translation-, scale-, and rotation-invariant manner, from a camera looking straight down. The computer should be able to recognize single objects placed in the image and identify the object in an output image. If provided a video sequence, it should be able to do this in real time.


For development, you can use this training set. Once you have a working system, test it out on one of the stages in the robot lab. You are also free to make a stage of your own if you want to use your own computer/camera pair. The stage should provide a clean, white or grey surface on which you can place the objects, ideally without strong shadows.

Your system needs to be able to differentiate at least 10 objects. Eight objects are provided in the training set; you get to choose the other two. You can use this version of vidDisplay.cpp in order to capture data from video to a file.

Your system should be able to take in two types of input: a live feed from a camera or a set of images. You can choose whether the set of images should be provided as a directory or as a list of image files.


  1. Using the video framework from the first project (if you wish), start building your object recognition (OR) system by implementing a thresholding algorithm of some type that separates an object from the background. Give your system the ability to display the thresholded video (remember, you can create multiple output windows). Test it on the complete set of objects to be recognized to make sure this step is working well. The objects to be recognized are either dark or fairly saturated in color.
  2. The next step is to run connected components analysis on the thresholded image to get regions. You may need to do some morphological processing on the thresholded image to get rid of spurious regions and fill in holes in your objects. Give your system the ability to display the regions it finds. A good extension is to enable recognition of multiple objects simultaneously.

    OpenCV 3.1 has a connected components function. Alternatively, you can write your own or use this function with this include file.

  3. Write a function that computes a set of features for a specified region given a region map and a region ID. You probably want to choose features that are translation, scale, and rotation invariant. Give your system the ability to display at least one feature in real time on the video output. Then you can easily test whether the feature is translation, scale, and rotation invariant by moving around the object and watching the feature value. Start with just 2-3 features and add more later.
  4. Enable your system to collect feature vectors from objects, attach labels, and store them in an object DB (e.g. a file). In other words, your system needs to have a training mode that enables you to collect the feature vectors of known objects and store them for later use in classifying unknown objects. You may want to implement this as a response to a key press: when the user types an N, for example, have the system prompt the user for a name/label and then store the feature vector for the current object along with its label into a file. This could also be done from labeled still images of the object, such as those from the training set.
  5. Enable your system to classify a new feature vector using the known objects database and a scaled Euclidean distance metric, in which each feature difference is divided by that feature's standard deviation: (x_1 - x_2) / stdev_x. Feel free to experiment with other distance metrics. Label the unknown object according to the closest matching feature vector in the object DB. Have your system indicate the label of the object on the output video stream. An extension is to detect when an unknown object (something not in the object DB) is in the video stream or provided as a single image.
  6. Implement a different classifier system of your choice. For example, implement K-Nearest Neighbor matching with K > 1. Note that KNN matching requires multiple training examples for each object.
  7. Evaluate the performance of your system on this test set. Build a confusion matrix of the results showing true labels versus classified labels and include that in your report.
  8. Take a video of your system running in real time and classifying objects as you put them on and off the stage.
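
As a starting point for steps 1 and 2, here is a minimal sketch of thresholding followed by connected components labeling. It assumes an 8-bit grayscale image stored row-major in a std::vector and objects that are darker than the background; it does not handle the saturated-color objects or any morphological cleanup. The names thresholdDark and labelRegions are hypothetical, and the sketch deliberately avoids OpenCV so that it stands alone. In your own system you would more likely operate on the frames from your video framework.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

// Marks pixels darker than `threshold` as foreground (1), all others as background (0).
std::vector<int> thresholdDark(const std::vector<unsigned char>& gray,
                               unsigned char threshold) {
    std::vector<int> binary(gray.size(), 0);
    for (std::size_t i = 0; i < gray.size(); ++i) {
        binary[i] = (gray[i] < threshold) ? 1 : 0;
    }
    return binary;
}

// Labels 4-connected foreground regions with IDs 1..N (background stays 0)
// using a breadth-first flood fill. Returns N, the number of regions found.
int labelRegions(const std::vector<int>& binary, int rows, int cols,
                 std::vector<int>& regionMap) {
    assert(rows > 0 && cols > 0);
    assert(binary.size() == static_cast<std::size_t>(rows) * cols);

    regionMap.assign(binary.size(), 0);
    const int dr[4] = {-1, 1, 0, 0};
    const int dc[4] = {0, 0, -1, 1};
    int nextId = 0;

    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            const int start = r * cols + c;
            if (binary[start] == 0 || regionMap[start] != 0) continue;

            // Unlabeled foreground pixel: start a new region here.
            ++nextId;
            regionMap[start] = nextId;
            std::queue<std::pair<int, int> > frontier;
            frontier.push(std::make_pair(r, c));

            while (!frontier.empty()) {
                const std::pair<int, int> p = frontier.front();
                frontier.pop();
                for (int k = 0; k < 4; ++k) {
                    const int nr = p.first + dr[k];
                    const int nc = p.second + dc[k];
                    if (nr < 0 || nr >= rows || nc < 0 || nc >= cols) continue;
                    const int idx = nr * cols + nc;
                    if (binary[idx] == 1 && regionMap[idx] == 0) {
                        regionMap[idx] = nextId;
                        frontier.push(std::make_pair(nr, nc));
                    }
                }
            }
        }
    }
    return nextId;
}
```

The resulting region map (one integer ID per pixel) is the kind of input that the feature function in step 3 expects. Discarding regions below some minimum size is one simple way to ignore spurious regions.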

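For step 3, one possible choice of features (an illustration, not a required set) is a pair of moment-based quantities. Central moments remove the dependence on position, dividing second-order moments by the squared area removes the dependence on scale, and the two combinations below are unchanged by rotation. On a pixel grid the scale and rotation invariance is only approximate. The struct RegionFeatures and the function computeFeatures are hypothetical names, and the region map is assumed to be a row-major vector of integer region IDs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for the features of one region.
struct RegionFeatures {
    double area;  // pixel count (not scale invariant; kept for filtering)
    double phi1;  // eta20 + eta02
    double phi2;  // (eta20 - eta02)^2 + 4 * eta11^2
};

// Computes features for the pixels of regionMap whose value equals regionId.
RegionFeatures computeFeatures(const std::vector<int>& regionMap,
                               int rows, int cols, int regionId) {
    assert(rows > 0 && cols > 0);
    assert(regionMap.size() == static_cast<std::size_t>(rows) * cols);

    // Zeroth- and first-order raw moments give the area and the centroid.
    double m00 = 0.0, m10 = 0.0, m01 = 0.0;
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            if (regionMap[r * cols + c] == regionId) {
                m00 += 1.0;
                m10 += c;
                m01 += r;
            }
        }
    }
    if (m00 == 0.0) return RegionFeatures{0.0, 0.0, 0.0};  // region not present

    const double cx = m10 / m00;
    const double cy = m01 / m00;

    // Second-order central moments (translation invariant).
    double mu20 = 0.0, mu02 = 0.0, mu11 = 0.0;
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            if (regionMap[r * cols + c] == regionId) {
                const double dx = c - cx;
                const double dy = r - cy;
                mu20 += dx * dx;
                mu02 += dy * dy;
                mu11 += dx * dy;
            }
        }
    }

    // Normalizing second-order moments by area^2 gives scale invariance.
    const double norm = m00 * m00;
    const double eta20 = mu20 / norm;
    const double eta02 = mu02 / norm;
    const double eta11 = mu11 / norm;

    const double diff = eta20 - eta02;
    return RegionFeatures{m00, eta20 + eta02, diff * diff + 4.0 * eta11 * eta11};
}
```

Displaying phi1 on the video output while you slide and turn an object is a quick way to see how stable the value really is. Other common choices, such as the fill ratio or the aspect ratio of the oriented bounding box, can be added later.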

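The nearest-neighbor classification in step 5 can be sketched as follows. This is a minimal illustration under stated assumptions: the object DB is already loaded into memory, the standard deviation of each feature is computed over all entries in the DB, and a feature with zero deviation is left unscaled. The names LabeledFeature, featureStdevs, scaledDistance, and classify are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// One training example: a label and its feature vector.
struct LabeledFeature {
    std::string label;
    std::vector<double> features;
};

// Standard deviation of each feature across the whole DB (population form).
std::vector<double> featureStdevs(const std::vector<LabeledFeature>& db) {
    assert(!db.empty());
    const std::size_t dims = db[0].features.size();
    std::vector<double> mean(dims, 0.0), stdev(dims, 0.0);
    for (std::size_t e = 0; e < db.size(); ++e) {
        assert(db[e].features.size() == dims);
        for (std::size_t i = 0; i < dims; ++i) mean[i] += db[e].features[i];
    }
    for (std::size_t i = 0; i < dims; ++i) mean[i] /= db.size();
    for (std::size_t e = 0; e < db.size(); ++e) {
        for (std::size_t i = 0; i < dims; ++i) {
            const double d = db[e].features[i] - mean[i];
            stdev[i] += d * d;
        }
    }
    for (std::size_t i = 0; i < dims; ++i) stdev[i] = std::sqrt(stdev[i] / db.size());
    return stdev;
}

// Euclidean distance after dividing each difference by that feature's stdev.
double scaledDistance(const std::vector<double>& a, const std::vector<double>& b,
                      const std::vector<double>& stdev) {
    assert(a.size() == b.size() && a.size() == stdev.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double s = (stdev[i] > 0.0) ? stdev[i] : 1.0;  // avoid divide by zero
        const double d = (a[i] - b[i]) / s;
        sum += d * d;
    }
    return std::sqrt(sum);
}

// Returns the label of the DB entry closest to x.
std::string classify(const std::vector<double>& x,
                     const std::vector<LabeledFeature>& db) {
    const std::vector<double> stdev = featureStdevs(db);
    double best = std::numeric_limits<double>::infinity();
    std::string bestLabel;
    for (std::size_t e = 0; e < db.size(); ++e) {
        const double d = scaledDistance(x, db[e].features, stdev);
        if (d < best) {
            best = d;
            bestLabel = db[e].label;
        }
    }
    return bestLabel;
}
```

The scaling matters when features have very different ranges: without it, a feature measured in the thousands would swamp one that lies between 0 and 1. Comparing the best distance against a cutoff is one way to approach the unknown-object extension.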

For this project, make a wiki page that begins by explaining your overall pipeline for OR. Your audience is your fellow CS majors not in the course.

Explain the features you used for the OR process, how you computed them, and the classification methods you tested. Your audience for this section is other students in the course.

Your project report should include the images of 3 objects of your choice from your training set and then images of the same 3 objects being recognized by the system.

The final piece of your report should provide a quantitative metric that describes the performance of the system on the OR task.
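
One way to build the confusion matrix from step 7, and to reduce it to a single number for the report, is sketched below. It assumes you have already collected two parallel lists, the true label and the classified label for each test image. The names ConfusionMatrix, buildConfusionMatrix, and accuracy are hypothetical, and overall accuracy is only one of several metrics you could report.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// matrix[trueLabel][classifiedLabel] = number of test examples.
typedef std::map<std::string, std::map<std::string, int> > ConfusionMatrix;

// Counts how often each true label was assigned each classified label.
ConfusionMatrix buildConfusionMatrix(const std::vector<std::string>& truth,
                                     const std::vector<std::string>& classified) {
    assert(truth.size() == classified.size());
    ConfusionMatrix matrix;
    for (std::size_t i = 0; i < truth.size(); ++i) {
        ++matrix[truth[i]][classified[i]];
    }
    return matrix;
}

// Fraction of examples on the diagonal (true label == classified label).
double accuracy(const ConfusionMatrix& matrix) {
    int correct = 0;
    int total = 0;
    for (ConfusionMatrix::const_iterator row = matrix.begin(); row != matrix.end(); ++row) {
        for (std::map<std::string, int>::const_iterator cell = row->second.begin();
             cell != row->second.end(); ++cell) {
            total += cell->second;
            if (row->first == cell->first) correct += cell->second;
        }
    }
    return (total > 0) ? static_cast<double>(correct) / total : 0.0;
}
```

Off-diagonal cells show which pairs of objects your features fail to separate, which is often more informative than the accuracy alone.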

If you did any extensions, describe the algorithms and show at least one example for each extension.

Give your wiki page the label: cs365s19project03


Put a zip file or tar file of your code in your Private Courses handin directory. Make sure to differentiate it from the first two projects.