Finding a Data Set
Due midnight, Thursday 11 February 2010
The purpose of this assignment is to get you thinking about data: its forms, formats, and meta-data.
Find a small (about 10 points) to medium (about 1000 points) data set that has at least two (three is better) variables per data point. For example, the data might be high and low temperature and average wind speed for a weather station over the course of a year.
Check out several different options before choosing one. Bug your friends doing projects, find interesting projects on the web, or grab data from the World Bank. One interesting site for climate data is the Biogeoinformatics of Hexacorals.
Alternatively, you are welcome to collect your own data set. It should have at least 10 data points.
- Once you have picked a data set, make sure you also get the meta-data. You will need it to answer the questions below.
- Using whatever tool you wish, make some plots of your data set. Include them in your writeup.
- Write a Python script that can read in the data and calculate means or other similar statistics.
- Collecting your own data set, if well done, is a nice extension.
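As a starting point for the script item above, here is a minimal sketch. It assumes the data is a comma-separated file with a header row and numeric columns. The column names (`high`, `low`, `wind`), the sample values, and the helper `column_means` are all hypothetical, so adapt them to your own data set.

```python
import csv
import io
from statistics import mean


def column_means(f):
    """Return {column name: mean} for a CSV file object with a header row.

    Assumes every non-blank cell is numeric; blank cells are skipped.
    """
    reader = csv.DictReader(f)
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name in columns:
            cell = (row.get(name) or "").strip()
            if cell != "":
                columns[name].append(float(cell))
    # Only report columns that had at least one value.
    return {name: mean(values) for name, values in columns.items() if values}


# Hypothetical in-memory sample standing in for a real data file.
sample = io.StringIO("high,low,wind\n10.0,2.0,5.0\n14.0,4.0,7.0\n")
means = column_means(sample)
print(means)
```

To run it on a real file, one option is to open the file with `open(path, newline="")` and pass the resulting file object to `column_means`. If your data uses a different delimiter or has no header row, the `csv` reader calls would need to change accordingly.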
For this assignment, make a wiki page, link your data to the page, and answer the questions below.
- Give a one-sentence description of your data.
- What is the source of the data?
- What is the dimensionality of the data?
- What is the size of the data set?
- What is the format of the data?
- What is the precision of the data? Is this the native precision or the precision of the format? Can you figure this out?
- Does your data set contain missing values? If so, how are missing values represented?
- What is the range of each variable in the data set?
- What information could you find on error in your data set? Could you define error bars for each data value?
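Several of the questions above (size, dimensionality, range, missing values) can be answered with a short script. The following is a rough sketch only. It assumes a comma-separated file with a header row, and it assumes missing values appear as an empty cell, `NA`, or `-9999`. Those markers, the column names, and the helper `summarize` are hypothetical; your meta-data should tell you how missing values are actually represented.

```python
import csv
import io

# Assumed markers for missing values; check the meta-data for the real ones.
MISSING = {"", "NA", "-9999"}


def summarize(f):
    """Report size, dimensionality, per-variable range, and missing counts."""
    reader = csv.DictReader(f)
    names = list(reader.fieldnames)
    values = {name: [] for name in names}
    missing = {name: 0 for name in names}
    n_rows = 0
    for row in reader:
        n_rows += 1
        for name in names:
            cell = (row.get(name) or "").strip()
            if cell in MISSING:
                missing[name] += 1
            else:
                values[name].append(float(cell))
    # Range is (min, max) over the non-missing values of each variable.
    ranges = {name: (min(v), max(v)) for name, v in values.items() if v}
    return {
        "size": n_rows,
        "dimensions": len(names),
        "ranges": ranges,
        "missing": missing,
    }


# Hypothetical sample with one blank cell and one -9999 marker.
sample = io.StringIO("high,low,wind\n10.0,2.0,5.0\n14.0,,7.0\n12.0,3.0,-9999\n")
summary = summarize(sample)
print(summary)
```

Precision and error are harder to automate, since they usually depend on how the data was measured and stored, so expect to answer those from the meta-data rather than from the numbers alone.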
Make your writeup for the project a wiki page in your personal space.
Once you have written up your assignment, give the page the label:
You can give any page a label while editing it, using the label field at the bottom of the page. Once it is properly labeled, it should show up on the course wiki page.