CS 251: Assignment #1

Finding a Data Set

Due midnight, Thursday 10 February 2011

The purpose of this assignment is to get you thinking about data: its forms, formats, and meta-data.


Tasks

  1. Find a small (10 data points) to medium (1000 data points) data set that has at least two variables per data point (three is better). For example, the data might be the high temperature, low temperature, and average wind speed for a weather station over the course of a year.

    Check out several different options before choosing one. Bug your friends doing projects, find interesting projects on the web, or grab data from the World Bank. One interesting site for climate data is the Biogeoinformatics of Hexacorals.

    Alternatively, you are welcome to collect your own data set. It should have at least 10 data points.

  2. Once you have picked a data set, make sure you also get the meta-data. You will need it to answer the questions below.
  3. Write a short Python program to read the data into a list--usually a list of tuples--and then calculate and print the average for each variable.
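
As a rough illustration of task 3, here is a minimal sketch in Python 3. It assumes the data is comma-separated, has one header row, and contains only numeric values. The sample values and the names read_data and column_averages are made up for this example; your own data will likely need different parsing.

```python
import csv
import io

# Hypothetical sample data (not from a real source):
# high temperature, low temperature, average wind speed.
SAMPLE = """high,low,wind
50.0,30.0,10.0
60.0,40.0,20.0
70.0,50.0,30.0
"""

def read_data(fp):
    """Read a CSV file object with one header row.

    Returns the header (list of strings) and the data (list of float tuples).
    """
    reader = csv.reader(fp)
    header = next(reader)
    rows = [tuple(float(v) for v in row) for row in reader if row]
    return header, rows

def column_averages(rows):
    """Return the mean of each variable (column) as a list of floats."""
    n = len(rows)
    # zip(*rows) regroups the row tuples into one tuple per column.
    return [sum(col) / n for col in zip(*rows)]

# io.StringIO stands in for a file here; with a real file you would
# pass the result of open("yourfile.csv", newline="") instead.
header, rows = read_data(io.StringIO(SAMPLE))
averages = column_averages(rows)
for name, avg in zip(header, averages):
    print(f"{name}: {avg:.2f}")
```

Storing each data point as a tuple keeps the variables for one observation together, and zip(*rows) gives a per-variable view when you need one.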

Extensions


Writeup

For this assignment, make a wiki page, link your data to the page, and answer the questions below.

  1. Give a one sentence description of your data.
  2. What is the source of the data?
  3. What is the dimensionality of the data?
  4. What is the size of the data set?
  5. What is the format of the data?
  6. What is the precision of the data? Is this the native precision or the precision of the format? Can you figure this out?
  7. Does your data set contain missing values? If so, how are missing values represented?
  8. What is the range of each variable in the data set?
  9. What information could you find on error in your data set? Could you define error bars for each data value?
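
Several of these questions (size, dimensionality, missing values, range) can be checked with a few lines of code. The following is one possible sketch in Python 3, not a required method. It assumes the data is already a list of tuples and that missing values have been converted to None; your data set may instead use a sentinel such as -9999 or an empty field. The rows and the name summarize are hypothetical.

```python
# Hypothetical rows; None marks a missing value (an assumption of this sketch).
rows = [
    (50.0, 30.0, 10.0),
    (60.0, None, 20.0),
    (70.0, 50.0, 30.0),
]

def summarize(rows):
    """Return size, dimensionality, per-variable range, and missing counts."""
    size = len(rows)
    dims = len(rows[0]) if rows else 0
    ranges = []
    missing = []
    for col in zip(*rows):
        # Ignore missing entries when computing the range of a variable.
        present = [v for v in col if v is not None]
        missing.append(len(col) - len(present))
        ranges.append((min(present), max(present)) if present else None)
    return {
        "size": size,
        "dimensionality": dims,
        "ranges": ranges,
        "missing": missing,
    }

summary = summarize(rows)
print(summary)
```

Precision and error cannot be recovered this way; those answers have to come from the meta-data.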

Handin

Make your writeup for the project a wiki page in your personal space.

Once you have written up your assignment, give the page the label:

cs251s11project1

You can give any page a label when you're editing it using the label field at the bottom of the page. Once it is properly labeled, it should show up on the course wiki page.