Project 6: Clustering Analysis
Project due Monday night Apr 7, 2014
The purpose of this assignment is to make it possible to visualize your clustering results. We will do this by augmenting data files with their clustering results. This way, you can visualize the result without making any changes to your display program. If you would like to explicitly add clustering to your display, that will be an extension.
Tasks
- Add the necessary code to your Data class to make it possible to write a data file. Your goal is to write out the data used for the clustering plus the cluster indices returned by the k-means function. You can either write a specialized write routine or use two steps: (1) add the cluster indices as a new column, then (2) write the Data object out to a file. The two-step approach is preferred. Consider including all columns of the data along with the cluster indices, rather than just the columns used for the clustering. That way, you can cluster the Australia data set without latitude and longitude, but display the clusters with latitude and longitude.
- Test your clustering on the file AustraliaCoast.csv. Use 10 clusters and visualize the result using three variables, such as premax, maxairtemp, and maxsoilmoist.
- Execute a clustering on a data file of your choice, using the analysis to demonstrate something about the data set. If necessary, you may construct a data set that works well with clustering. If you do, describe how you created the data set.
Extensions
- Make your k-means functions more flexible. For example, you could make it possible to use different distance metrics (and the appropriate methods for computing the "means"). Another option is to develop new methods of initialization (such as randomly assigning points to an initial cluster).
- Implement other clustering methods.
- Integrate clustering capabilities into your display application.
- Enable the user to view the cluster means. Give the cluster means names.
- Run PCA on the AustraliaCoast data set and then cluster using the first three PCA dimensions. Visualize the result in PCA space.
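The two-step approach described in the first item (add the cluster indices as a new column, then write everything out) might be sketched as follows. This is only an illustration, not the required design: the function names `add_cluster_column` and `write_csv`, the column name `cluster`, and the simple headers-then-rows CSV layout are all assumptions, and your own Data class may store headers and the numeric matrix differently or expect extra header lines.

```python
import csv
import numpy as np

def add_cluster_column(headers, data, cluster_ids, name="cluster"):
    """Step 1 (hypothetical helper): append the cluster ids as a new last column.

    headers     -- list of column names for `data`
    data        -- N x D numeric matrix (all columns, not only those clustered on)
    cluster_ids -- N cluster indices, one per row, as returned by k-means
    """
    data = np.asarray(data, dtype=float)
    ids = np.asarray(cluster_ids, dtype=float).reshape(-1, 1)
    if ids.shape[0] != data.shape[0]:
        raise ValueError("need exactly one cluster id per data row")
    return list(headers) + [name], np.hstack([data, ids])

def write_csv(filename, headers, data):
    """Step 2 (hypothetical helper): write one header row, then one row per point.

    Assumes a plain CSV layout; add any extra header lines your reader expects.
    """
    with open(filename, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(headers)
        writer.writerows(np.asarray(data).tolist())
```

Because the first step keeps every original column, a file written this way can be clustered on a subset of the variables and still be plotted by latitude and longitude, with the added column mapped to color in an unmodified display program.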
For this week's writeup, create a wiki page that shows your clustering visualizations, analyzes the results of your clustering from the third task (the data file of your choice), and explains any extensions.
Once you have written up your assignment, give the page the label:
Put your code in a folder named Proj6 on the Courses/CS251 server in your private subdirectory.