Spring 2018

Final Project

The final project is to apply concepts you have learned in the course to a new data set, or to implement new or more complex versions of algorithms or concepts from the course. The first option is to develop novel visualizations and/or undertake an analysis of a selected data set, possibly customizing your GUI to the data set of interest. The second option is to implement analysis algorithms or machine learning techniques you have not implemented as part of the projects.

You may work with a partner for this project. Use the lab time to identify a partner, if any, and to explore potential projects. By the end of the lab period, you should have a plan for your final project.

If you are pursuing a project based on data, you should have selected a data set, defined a set of questions about the data set, and defined a set of analyses and/or visualizations designed to help you answer those questions.

If you are pursuing a project based on implementation of a new algorithm, then you should have an initial design of the algorithm and supporting classes/code/libraries and have selected a data set on which to evaluate the algorithm.

Data sets you can consider for this project include the following.

If you are using any of the specific data sets listed above, you can get them from the Course_Materials folder on the Courses server.

Once you have selected a data set, pick 1-3 questions that you want to answer using the data set. These questions can be observational, such as identifying relationships between variables, or they can be predictive, such as predicting a dependent variable from a set of independent variables. Consider which visualizations will best explain the answers to the questions.
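As a rough illustration of the two kinds of questions, here is a minimal sketch in plain Python. It assumes a single independent variable, and the numbers are invented for the example, not taken from any course data set. The function names pearson_r and fit_line are hypothetical helpers, not part of the course code. The first function addresses an observational question (how strongly are two variables related?) and the second a predictive one (what value of the dependent variable should we expect?).

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented illustration data (an assumption, not a course data set).
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
score = [52.0, 55.0, 61.0, 64.0, 68.0]

# Observational question: how strongly are the two variables related?
r = pearson_r(hours, score)

# Predictive question: what score would we expect at 6 hours?
slope, intercept = fit_line(hours, score)
predicted = slope * 6.0 + intercept

print(f"r = {r:.3f}, slope = {slope:.2f}, intercept = {intercept:.2f}")
print(f"predicted score at 6 hours = {predicted:.1f}")
```

With more than one independent variable, the same idea would extend to multiple linear regression, which your own analysis code from earlier projects may already support.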

Select your questions in consultation with the professor or the developer of the data set. For some data sets, the questions will have specific answers you can write programs to generate. In other cases, the questions will be answered by a program that lets the user interact with the data.

After selecting your questions, develop a plan of analysis that outlines the process and methods you will use to generate an answer. In some cases, this will involve simply using your analysis and visualization code. In other cases, it may involve designing code to create custom visualizations or analyses.

If you choose, instead, to implement a new algorithm, such as neural networks or an alternative machine learning algorithm, you still need to select a target data set on which to evaluate your implementation. It's probably a good idea to pick both a very simple data set for testing and a more interesting data set to demonstrate that the algorithm generalizes. If you pick a project like this, keep it simple.
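One way to use a very simple data set for testing is to build a tiny, hand-made case where the right answer is obvious and confirm that your implementation gets it right before moving on to real data. The sketch below assumes a classification task and uses a one-nearest-neighbor classifier purely as a stand-in for whatever algorithm you implement. The function names and the data points are hypothetical.

```python
def nearest_neighbor_predict(train_points, train_labels, query):
    """Return the label of the training point closest to query (1-NN)."""
    best_label = None
    best_dist = float("inf")
    for point, label in zip(train_points, train_labels):
        # Squared Euclidean distance is enough for comparing neighbors.
        dist = sum((a - b) ** 2 for a, b in zip(point, query))
        if dist < best_dist:
            best_dist = dist
            best_label = label
    return best_label


def accuracy(predict, train_points, train_labels, test_points, test_labels):
    """Fraction of test points whose predicted label matches the true label."""
    correct = sum(
        predict(train_points, train_labels, point) == label
        for point, label in zip(test_points, test_labels)
    )
    return correct / len(test_points)


# Hand-made data: two well-separated clusters, so a working
# classifier should label every test point correctly.
train_points = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1)]
train_labels = ["a", "a", "b", "b"]
test_points = [(0.2, 0.1), (1.1, 0.9)]
test_labels = ["a", "b"]

simple_acc = accuracy(
    nearest_neighbor_predict, train_points, train_labels, test_points, test_labels
)
print(f"accuracy on the simple data set: {simple_acc:.2f}")
```

Because accuracy takes the prediction function as an argument, the same harness can be reused unchanged on the more interesting data set once the simple case passes.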

Write up your project design as a wiki page with the label cs251s18project9. Do this during the lab time. When you are done with your design, start executing your plan to complete the final project.