CS 251: Lab #2

Lab 2: Managing Data

Project due Monday night Feb 24, 2014

The purpose of this week's lab is to develop a Data class that can read in a properly formatted csv data file.

Data Format

In order to make reading the data straightforward, we're going to use a general format for the data that simplifies the task. In general, the data should have the following properties.

Data Class

The Data class should have methods that tell it to read in a csv data file as described above and to access data (e.g. retrieve all the data for a particular column). The data will be stored in two forms.

  1. Raw form: In the raw form, each line is a list of strings (i.e. the result of reading in the line and calling split(',')). We keep the raw data in this format so that we have an accurate view of the file and so that we can retrieve the values of non-numeric columns.
  2. Matrix form: Most of the analysis and display will use only the numeric columns of the data. We store those in a numpy matrix.

In lab, we will write the methods associated with the raw data; their names will all contain the word raw. In the project, we will write the methods associated with the numeric data.
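To make the two forms concrete, the sketch below turns a few hypothetical csv lines into raw rows and then pulls out the numeric columns. It assumes the first line holds the headers and the second the types, and that numeric columns are tagged `numeric`. These are assumptions, so check them against the format the course defines. To avoid a numpy dependency, the sketch builds the numeric part as a list of float lists. In the project, you would store that data in a numpy matrix instead, for example with `numpy.matrix`.

```python
# Hypothetical file contents: headers on the first line, types on the second
# (an assumed layout), then one line per data point.
lines = [
    "name, age, height",
    "string, numeric, numeric",
    "Ann, 34, 1.62",
    "Bo, 27, 1.80",
]

# Raw form: split each line on commas, keeping every value as a string.
# The strip() removes spaces after commas and trailing newlines.
rows = [[field.strip() for field in line.split(',')] for line in lines]
headers, types, raw_data = rows[0], rows[1], rows[2:]

# Numeric form: keep only the columns whose type is "numeric" (assumed tag)
# and convert their values to floats.
numeric_cols = [i for i, t in enumerate(types) if t == 'numeric']
numeric_data = [[float(row[i]) for i in numeric_cols] for row in raw_data]

print(raw_data)      # [['Ann', '34', '1.62'], ['Bo', '27', '1.80']]
print(numeric_data)  # [[34.0, 1.62], [27.0, 1.8]]
```

The raw rows keep "1.80" exactly as it appears in the file, while the numeric form only keeps the value 1.8. This is one reason to hold on to both forms.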


  1. Create a Python file data.py and write the code for the Data class. The constructor should optionally take a filename and, if one is given, read the data from that file. The data file should be in the format described above. You may also want your constructor to accept a list of lists that represents a data set, but this is optional.
  2. Create a method for reading the data from a file. The method should put the original data in string format into a list of lists, with one sublist for each data point. In addition, the method should store the headers and types read from the data file. (Note: You might want to write just part of the read method and then test it by writing the accessor methods that get at what you have so far.)

    You will likely want to use this set of fields to manage the raw data:

    • raw_headers (list of all headers)
    • raw_types (list of all types)
    • raw_data (list of lists of all data. Each row is a list of strings)
    • header2raw (dictionary mapping header string to index of column in raw data)

    Note: Once you get to the project, you will need to add fields and code to the read method in order to handle the numeric data.

    Note: You may test your Data class using testdata1.csv and testdata2.csv.

  3. Write at least these helpful accessor methods. Note that to extract specific columns from a Data object, you will use the column's header (as opposed to an index).
    • get_raw_headers: returns a list of all of the headers.
    • get_raw_types: returns a list of all of the types.
    • get_raw_num_columns: returns the number of columns in the raw data set.
    • get_raw_num_rows: returns the number of rows in the data set.
    • get_raw_row: returns a row of data (the type is list). (Note: since the raw and numeric data will have the same number of rows, Stephanie is writing just one method for this and isn't adding raw to its name. You can do something different if you want.)
    • get_raw_value: takes a row index (an int) and column header (a string) and returns the raw data at that location. (The return type will be a string)
  4. You may test your methods with lab2_test1.py if you would like to.
  5. Create a method that nicely prints out the data to the command line.
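Here is a minimal sketch of the lab portion of the Data class that combines steps 1 through 3 and step 5. It uses the four fields suggested in step 2 and the accessor names from step 3. It assumes the file's first line holds the headers and its second line the types, so verify this against the Data Format description. The name `print_data` is hypothetical, because the handout does not name the printing method. The sketch leaves out the optional list-of-lists constructor and all numeric-data handling, since those belong to the project.

```python
import os
import tempfile


class Data:
    """Sketch of the raw-data half of the Data class.

    Assumes a csv file whose first line is the headers, whose second line
    is the types, and whose remaining non-blank lines are data points.
    """

    def __init__(self, filename=None):
        self.raw_headers = []   # list of all headers
        self.raw_types = []     # list of all types
        self.raw_data = []      # list of rows, each a list of strings
        self.header2raw = {}    # header string -> column index in raw_data
        if filename is not None:
            self.read(filename)

    def read(self, filename):
        with open(filename) as fp:
            # Split each non-blank line on commas; strip() drops spaces
            # and the trailing newline from every field.
            rows = [[f.strip() for f in line.split(',')]
                    for line in fp if line.strip()]
        self.raw_headers = rows[0]
        self.raw_types = rows[1]
        self.raw_data = rows[2:]
        self.header2raw = {h: i for i, h in enumerate(self.raw_headers)}

    def get_raw_headers(self):
        return list(self.raw_headers)

    def get_raw_types(self):
        return list(self.raw_types)

    def get_raw_num_columns(self):
        return len(self.raw_headers)

    def get_raw_num_rows(self):
        return len(self.raw_data)

    def get_raw_row(self, index):
        return self.raw_data[index]

    def get_raw_value(self, index, header):
        # Columns are addressed by header, translated to an index here.
        return self.raw_data[index][self.header2raw[header]]

    def print_data(self):  # hypothetical name
        # Pad each column to the width of its longest entry.
        table = [self.raw_headers, self.raw_types] + self.raw_data
        widths = [max(len(row[c]) for row in table)
                  for c in range(len(self.raw_headers))]
        for row in table:
            print('  '.join(v.ljust(w) for v, w in zip(row, widths)).rstrip())


# Usage with a small throwaway file (hypothetical contents).
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as tmp:
    tmp.write("name,age\nstring,numeric\nAnn,34\nBo,27\n")
d = Data(tmp.name)
os.remove(tmp.name)
print(d.get_raw_value(1, 'age'))  # 27
d.print_data()
```

Building `header2raw` once in `read` means `get_raw_value` does a dictionary lookup followed by an index. The alternative would be searching `raw_headers` on every call.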

When you are done with the lab exercises, you may start on the rest of the project.