Data Analysis and Visualization
Due May 16, 2011
Pick a data set that can be used to do one of the following.
- answer a scientific question,
- answer a question with practical significance, or
- generate a software tool with a practical application.
As an example of a scientific question, consider a data set on purple finch banding and recovery (when a banded bird is re-captured one or more times). There are observational questions about the mean, median, and range of distances the birds travel. There are also relational questions, such as whether the distance a bird travels is related to the season. This data set is one option to consider.
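The observational and relational questions above can be sketched in a few lines of Python. This is only an illustration: the record layout (season, distance in km), the values, and the function names `summarize` and `summarize_by_season` are assumptions, not the format of the actual banding data.

```python
import statistics
from collections import defaultdict

# Hypothetical recovery records: (season of recovery, distance traveled in km).
# These values are made up for illustration; they are not from the PUFI data.
records = [
    ("winter", 120.0),
    ("winter", 340.0),
    ("winter", 95.0),
    ("summer", 15.0),
    ("summer", 40.0),
    ("summer", 22.0),
]


def summarize(distances):
    """Return the mean, median, and range (max - min) of a list of distances."""
    return {
        "mean": statistics.mean(distances),
        "median": statistics.median(distances),
        "range": max(distances) - min(distances),
    }


def summarize_by_season(rows):
    """Group distances by season, then summarize each group separately."""
    groups = defaultdict(list)
    for season, distance in rows:
        groups[season].append(distance)
    return {season: summarize(values) for season, values in groups.items()}


overall = summarize([distance for _, distance in records])
by_season = summarize_by_season(records)
print(overall)
print(by_season)
```

Comparing the per-season summaries is a first look at the relational question; a plot or a statistical test would be needed before drawing any conclusion.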
As an example of a practical question, consider loads on the Colby network. There are observational questions about the levels of traffic and relational questions about how traffic changes over time at periods of a day, a week, and a semester.
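One possible way to look at how traffic changes over a day or a week is to group samples by a time bucket and average each bucket. The sketch below assumes a hypothetical log of (timestamp, bytes) pairs; the real network data may be organized quite differently, and `mean_by` is an illustrative helper name.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical samples: (ISO timestamp, bytes transferred in that interval).
samples = [
    ("2011-05-02T09:00:00", 500),
    ("2011-05-02T09:30:00", 700),
    ("2011-05-02T14:00:00", 1200),
    ("2011-05-03T09:15:00", 600),
    ("2011-05-07T14:45:00", 300),
]


def mean_by(rows, key):
    """Average the traffic values after grouping timestamps with key()."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for stamp, value in rows:
        bucket = key(datetime.fromisoformat(stamp))
        totals[bucket] += value
        counts[bucket] += 1
    return {bucket: totals[bucket] / counts[bucket] for bucket in totals}


# Daily cycle: bucket by hour of day (0-23).
by_hour = mean_by(samples, lambda t: t.hour)
# Weekly cycle: bucket by day of week (Monday is 0).
by_weekday = mean_by(samples, lambda t: t.weekday())
print(by_hour)
print(by_weekday)
```

The same helper could bucket by week of the semester if the key function is given the semester start date.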
As an example of a tool with practical application, consider the handwritten digits data set from the last assignment. Using that data set as a training set, you could train a software tool for reading hand-written digits. There are many other examples of this, including data from a flow-cytometry machine in Biology that needs to be classified into categories.
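As a rough illustration of training a tool from labeled examples, here is a minimal k-nearest-neighbor classifier. It is just one simple technique among many, not a required approach, and the four-value feature vectors below are made-up stand-ins for real digit images.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(training, sample, k=1):
    """Return the majority label among the k training points nearest to sample."""
    neighbors = sorted(training, key=lambda item: euclidean(item[0], sample))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical training set: (feature vector, digit label).
training = [
    ([1.0, 1.0, 1.0, 1.0], 0),
    ([0.9, 1.0, 0.8, 1.0], 0),
    ([0.0, 1.0, 0.0, 1.0], 1),
    ([0.1, 0.9, 0.0, 1.0], 1),
]

guess_a = classify(training, [0.0, 0.8, 0.1, 0.9], k=3)
guess_b = classify(training, [1.0, 0.9, 0.9, 1.0], k=1)
print(guess_a, guess_b)
```

With real data, part of the labeled set should be held out to measure how often the classifier is correct.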
- After choosing your data set and discussing it with the professor, pick 1-2 questions you want to answer. Outline a set of visualizations and analyses that will let you answer those questions. Order them by priority and tailor your analysis and visualization GUI to the tasks. You should feel free to simplify the system you have.
- Bird Arrivals Data: one goal for this data is to observe trends or changes for a single species, a group of species, or all birds. Clustering may yield some interesting relationships or sub-groups within which to look for trends. It may also be interesting to look at weather data from locations other than Maine and see whether it is related to bird arrival times.
- Purple Finch [PUFI] Banding Data: this data provides information about banded purple finches and when they have been recaptured. Each recovery event shows the distance, direction, and time between banding and recovery. A pdf in the Academics folder contains some scientific questions the data may help to answer.
- Flow Cytometry Data: this data is the result of measurements in a flow cytometry device. The primary use of the data is classification into sub-classes and then analysis of the statistics of the sub-classes. The data is in FCS format, so you'll need to find a tool to convert it to a csv file.
- Eye-Tracking Data: the basic analysis for this data set is determining if there is a relationship between the different types of stimuli and the subject's eye behavior. What aspects of the subject's measured behavior are related to the type of stimuli? There are many possible relationships, and visualization and analysis tools need to enable this kind of exploration.
- Another data set of interest that you can access quickly and that does not have any privacy issues.
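Several of the data sets above mention clustering or grouping into sub-classes. As a hedged sketch of what that can look like, here is a small k-means implementation. The two features per species (mean arrival day of year, trend in days per decade) and their values are hypothetical, and seeding the centers with the first k points is a simplification chosen to keep the example deterministic.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(point, centers):
    """Index of the center closest to point."""
    return min(range(len(centers)), key=lambda i: squared_distance(point, centers[i]))


def kmeans(points, k, iterations=20):
    """Basic k-means; centers start at the first k points (a simplification)."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest_center(p, centers)].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old center if a cluster ends up empty
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    labels = [nearest_center(p, centers) for p in points]
    return centers, labels


# Hypothetical per-species features: (mean arrival day, trend in days/decade).
points = [
    (100.0, -1.0),
    (140.0, 0.5),
    (102.0, -1.2),
    (138.0, 0.4),
    (98.0, -0.8),
    (142.0, 0.6),
]

centers, labels = kmeans(points, k=2)
print(centers)
print(labels)
```

Because the two features are on very different scales, normalizing each column before clustering would usually be a sensible first step.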
Your writeup should be in the form of a paper, no more than 6 pages in length, that follows the format of a traditional conference paper. You can do your writeup on the wiki or using LaTeX. If the latter, ask and I'll give you a LaTeX style file you can use.
- Abstract: 350 words or less.
- Introduction: a high-level description of the question or application and your approach to answering the question or building the application.
- Theory/Methods: a description of theoretical concepts and methods you used.
- Experiments/Design: a description of the process you used to answer the questions or the design process you used to build your tool.
- Results/Demonstration: a description of your results, graphs, visualizations, or capabilities.
- Discussion: a discussion of your results or the utility of your tool.
- Conclusions/Summary: any final comments on your results, identifying the most important ones.
Once you have written up your assignment, give the page the label:
Put your code in the COMP/CS251/yourname/private/ folder on fileserver1/Academics in a project9 folder. Make sure the program runs properly and has all of the necessary files, data and otherwise.