Project 5: Principal Component Analysis
Project due Friday night Mar 20, 2014
The purpose of this assignment is to integrate your PCA analysis into your display application.
- In your GUI, enable the user to pick a data file, execute a PCA analysis, store the result, and then create an entry in a listbox that links to the result. The basic capability should use all of the numeric columns of the selected data set. An extension is to allow the user to pick and choose columns.
- In your GUI, enable the user to select an analysis from the listbox and view the data projected onto the first three eigenvectors. An extension is to allow the user to pick the columns to plot.
- In your GUI, somehow enable the user to see the eigenvectors and eigenvalues of a selected PCA analysis. You can use plots, project the eigenvectors onto the original data, or just throw up a window with the numeric values in a table.
- Come up with an acronym or name for your program. Be creative. The success of your program may, in the end, be completely determined by how cool your acronym is. Then again, its success may have something to do with the quality of your work. But it never hurts to have a cool name.
- Enable reading and writing PCA data as a CSV file. You will need to somehow store the eigenvectors, eigenvalues, and column averages along with the projected data. Note that there should be as many eigenvectors, eigenvalues, and means as there are columns of data.
- Add other features, like the ability to name an analysis.
- Enable the user to select which columns to use for the PCA analysis.
- Enable the user to select up to five columns from the PCA analysis to plot (x, y, z, color, size).
- Enable the user to select up to five columns, intermixed from the original data and the PCA analysis to plot.
- Demonstrate your system on your data from project 2.
- A super extension would be to plot the first two or three eigenvectors on top of the data (like this image on the Wikipedia page for PCA). This will demonstrate visually how much of each of the displayed dimensions is "involved" in the eigenvectors. You will need to center the vectors at the mean of the data and then compute the endpoints using information about the direction of the vector, its length, and the extents of the data in columns a, b, and c.
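As a rough illustration of the CSV item above, here is one way the PCA results might be computed and stored in a single file. This is only a sketch, not a required format: the function names (pca, write_pca_csv, read_pca_csv) and the "kind" label in the first column are assumptions made for this example, and it assumes the data has at least as many rows as columns so that there is one eigenvector per column.

```python
import csv
import numpy as np


def pca(data):
    """PCA of an (N rows x C columns) array, assuming N >= C."""
    means = data.mean(axis=0)
    centered = data - means
    # SVD of the centered data: the rows of vt are the eigenvectors of the
    # covariance matrix, ordered from largest to smallest eigenvalue.
    _, singular, vt = np.linalg.svd(centered, full_matrices=False)
    eigenvalues = singular ** 2 / (data.shape[0] - 1)
    projected = np.dot(centered, vt.T)
    return means, eigenvalues, vt, projected


def write_pca_csv(fp, headers, means, eigenvalues, eigenvectors, projected):
    """Write one labeled row per item; the first column says what the row is.

    headers are the names of the original columns (hypothetical layout).
    """
    writer = csv.writer(fp)
    writer.writerow(["kind"] + list(headers))
    writer.writerow(["mean"] + np.asarray(means).tolist())
    writer.writerow(["eigenvalue"] + np.asarray(eigenvalues).tolist())
    for vec in np.asarray(eigenvectors):
        writer.writerow(["eigenvector"] + vec.tolist())
    for row in np.asarray(projected):
        writer.writerow(["data"] + row.tolist())


def read_pca_csv(fp):
    """Read back a file written by write_pca_csv."""
    rows = list(csv.reader(fp))
    headers = rows[0][1:]
    groups = {"mean": [], "eigenvalue": [], "eigenvector": [], "data": []}
    for row in rows[1:]:
        groups[row[0]].append([float(x) for x in row[1:]])
    means = np.array(groups["mean"][0])
    eigenvalues = np.array(groups["eigenvalue"][0])
    eigenvectors = np.array(groups["eigenvector"])
    projected = np.array(groups["data"])
    return headers, means, eigenvalues, eigenvectors, projected
```

With a layout like this, the original data can be recovered from the file alone as np.dot(projected, eigenvectors) + means, which is a handy check that nothing was lost when writing.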
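For the eigenvector overlay extension, the endpoint computation could look something like the sketch below. It rests on two assumptions that may not match your viewer: that each plotted column is normalized to the range [0, 1] using its min and max before display, and that a reasonable vector length is some number of standard deviations (n_std, an arbitrary choice) along the component. The function name eigenvector_segment is made up for this example.

```python
import numpy as np


def eigenvector_segment(data, means, eigenvalue, eigenvector, cols, n_std=2.0):
    """Return (start, end) of one eigenvector in normalized plot coordinates.

    data        -- original (N x C) data array
    means       -- column means, length C
    eigenvalue  -- variance along this eigenvector
    eigenvector -- unit-length direction, length C
    cols        -- indices of the plotted columns (e.g. columns a, b, c)
    """
    cols = list(cols)
    # The vector starts at the mean of the data in the plotted columns.
    center = means[cols]
    # Direction times length: n_std standard deviations along the component.
    tip = center + n_std * np.sqrt(eigenvalue) * eigenvector[cols]
    # Extents of the data in the plotted columns, used to map both
    # endpoints into the same [0, 1] space as the normalized data
    # (assumed normalization; adjust to whatever your viewer does).
    lo = data[:, cols].min(axis=0)
    hi = data[:, cols].max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid dividing by zero
    return (center - lo) / span, (tip - lo) / span
```

Because the tip is scaled by the data extents rather than clipped, a long vector can land outside [0, 1]; whether to clip it or shrink n_std is a design choice left to you.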
Writeup and Hand-in
Write a brief description of how you implemented the PCA algorithm and modified your Data and Application classes. Incorporate screen shots showing a visualization of the provided data set and another data set of your choice.
Once you have written up your assignment, give the page the label:
Put your code in your private directory on Courses/CS251. Please make sure you are organizing your code by project. If you have any problems uploading the code, send the prof a zip file.