CS 251: Assignment #6

Principal Components Analysis

Due Monday 6 April 2015

The goal of this week's lab is to add the capability to execute PCA on a data set and then create plots based on the analysis.


Tasks

  1. In your GUI, enable the user to pick a data file, execute a PCA, store the result, and create an entry in a listbox that links to the result. At a minimum, the user should be able to choose which columns of the original data to use in the analysis. Each new analysis should show up as a new entry in the analysis listbox. Allow the user to delete an existing analysis.
  2. In your GUI, enable the user to select an analysis from the listbox and view the data projected onto the first three eigenvectors. An extension is to allow the user to pick the columns to plot.
  3. In your GUI, enable the user to see the eigenvectors and eigenvalues of a selected PCA. For example, show them as a table in a dialog window.

    Eigenvector Table

    Note the second and third columns, which show the eigenvalues and the cumulative percentage of the eigenvalues from largest to smallest. In this case, the first five eigenvectors explain 92% of the variation in the data set.

  4. Using the Australia Coast data set, compute the PCA analysis on the columns: premin, premax, salmin, salmax, minairtemp, maxairtemp, minsst, maxsst, minsoilmoist, maxsoilmoist, and runoffnew. Then show a spatial plot of the data projected onto the first three eigenvectors. The plot should look something like the following.

    PCA plot

  5. Come up with an acronym or name for your program. Be creative. The success of your program may, in the end, be completely determined by how cool your acronym is. Then again, its success may have something to do with the quality of your work. But it never hurts to have a cool name.
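As a starting point for tasks 1 through 3, here is a minimal sketch of one way to compute a PCA and build the rows of an eigenvector table. It assumes Python with NumPy, which this page does not specify. The function names pca and eigen_table, the normalize flag, and the choice of SVD (rather than an eigendecomposition of the covariance matrix) are illustrative assumptions, not a required design.

```python
import numpy as np


def pca(data, normalize=True):
    """Sketch of PCA on a 2-D array (rows = data points, columns = features).

    Returns (eigenvalues, eigenvectors, projected, means). Eigenvectors are
    the rows of the returned matrix, ordered by decreasing eigenvalue.
    """
    A = np.asarray(data, dtype=float)
    means = A.mean(axis=0)
    D = A - means  # center each column on zero

    if normalize:
        # Assumption: scale each column to unit standard deviation so that
        # columns measured in large units do not dominate the analysis.
        std = D.std(axis=0, ddof=1)
        std[std == 0] = 1.0  # leave constant columns unscaled
        D = D / std

    # SVD of the centered data: D = U * diag(S) * Vt.
    # The rows of Vt are the eigenvectors of the covariance matrix of D.
    U, S, Vt = np.linalg.svd(D, full_matrices=False)

    # Singular values relate to covariance eigenvalues by S^2 / (N - 1).
    eigenvalues = (S ** 2) / (A.shape[0] - 1)

    # Project the centered data onto the eigenvectors.
    projected = D @ Vt.T
    return eigenvalues, Vt, projected, means


def eigen_table(eigenvalues, eigenvectors):
    """Build table rows: (label, eigenvalue, cumulative fraction, vector)."""
    cumulative = np.cumsum(eigenvalues) / np.sum(eigenvalues)
    rows = []
    for i in range(len(eigenvalues)):
        rows.append((
            "E%d" % i,
            float(eigenvalues[i]),
            float(cumulative[i]),
            eigenvectors[i].tolist(),
        ))
    return rows


if __name__ == "__main__":
    # Tiny made-up data set: four points lying exactly on the line y = 2x.
    sample = [[0, 0], [1, 2], [2, 4], [3, 6]]
    values, vectors, proj, mu = pca(sample, normalize=False)
    for label, value, cum, vec in eigen_table(values, vectors):
        print(label, round(value, 4), round(cum, 4), vec)
```

The first three columns of proj are what task 2 asks you to plot. The sign of each eigenvector is arbitrary, so plots from two implementations may be mirror images of each other.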

Extensions


Writeup

Write a brief description of how you implemented the PCA algorithm and modified your Data and Display/Application classes. Incorporate screen shots showing a visualization of the Australia Coast data set. Be sure to document and describe any extensions.

Handin

Once you have written up your assignment, give the page the label:

cs251s15project6

Put your code in your private handin directory on Courses. Please make sure you are organizing your code by project. If you have any problems uploading the code, send the prof a zip file.