### Project 2: Managing Data

Project due Monday night Feb 24, 2014

The purpose of this assignment is to complete your Data class, giving it support for numeric data. You will also begin your data analysis module.

### Tasks

- Update your Data class to support the numeric form of the data. This will involve adding new code to your read method, new fields to your class, and new accessors.
- The new code will need to convert the string data in the numeric columns to numbers. Store them in a numpy matrix.
- Here are the suggested fields:
  - `self.matrix_data = np.matrix([])` # matrix of numeric data
  - `self.header2matrix = {}` # dictionary mapping header string to index of column in matrix data
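
As a minimal sketch of the conversion step, the snippet below builds the two suggested fields from already-parsed raw data. The names `raw_headers`, `raw_types`, and `raw_data`, and the `"numeric"` type label, are assumptions about what your Project 1 read method produces; adapt them to your own class.

```python
import numpy as np

# Hypothetical raw fields; in a real Data class the read method would fill these.
raw_headers = ["name", "height", "weight"]
raw_types = ["string", "numeric", "numeric"]
raw_data = [
    ["ann", "1.5", "50"],
    ["bob", "1.8", "72"],
]

# Map each numeric header to its column index in the numeric matrix.
header2matrix = {}
numeric_cols = []  # indices of the numeric columns within raw_data
for col, (header, col_type) in enumerate(zip(raw_headers, raw_types)):
    if col_type == "numeric":
        header2matrix[header] = len(numeric_cols)
        numeric_cols.append(col)

# Convert the strings in the numeric columns to floats, one row at a time.
matrix_data = np.matrix(
    [[float(row[col]) for col in numeric_cols] for row in raw_data]
)

print(header2matrix)
print(matrix_data.shape)
```

Note that the matrix column index generally differs from the raw column index, which is why a separate dictionary is useful.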

- Here are the accessors:
  - get_headers: returns a list of the headers of the columns with numeric data.
  - get_num_columns: returns the number of columns of numeric data.
  - get_row: takes a row index and returns a row of numeric data.
  - get_value: takes a row index (int) and column header (string) and returns the corresponding value in the numeric matrix.
  - get_data: At a minimum, this should take a list of column headers and return a matrix with the data for all rows but just the specified columns. It is optional to also allow the caller to specify a specific set of rows.
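
One possible shape for `get_value` and `get_data` is sketched below. It is written as plain functions over module-level stand-ins for `self.matrix_data` and `self.header2matrix` to keep it short; the optional `rows` parameter is an assumed name for the optional row selection.

```python
import numpy as np

# Stand-ins for the fields of a Data object.
matrix_data = np.matrix([[1.0, 2.0, 3.0],
                         [4.0, 5.0, 6.0]])
header2matrix = {"a": 0, "b": 1, "c": 2}

def get_value(row, header):
    """Return the single value at the given row index and column header."""
    return matrix_data[row, header2matrix[header]]

def get_data(headers, rows=None):
    """Return a matrix with the requested columns and, optionally, rows."""
    cols = [header2matrix[h] for h in headers]
    selected = matrix_data[:, cols]      # all rows, chosen columns, in the order given
    if rows is not None:
        selected = selected[rows, :]     # keep only the requested rows
    return selected

print(get_value(1, "b"))
print(get_data(["c", "a"]))
```

Indexing with a list of column indices returns the columns in the order the caller listed them, which may differ from their order in the file.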

- Test your new methods (you get to write this code).

- Since our application will allow multiple open data files at a time, we need a way to uniquely identify a column of data. We can do so by specifying the Data object that contains the column and the column's header.
Within your data.py file, create a second class called DataColID. A DataColID object should contain a reference to a Data object and the header of one of its columns. Create appropriate accessors/mutators for the class.
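
A minimal sketch of such a class might look like the following. The accessor and mutator names (`get_data`, `set_data`, `get_header`, `set_header`) are illustrative choices, not names required by the assignment.

```python
class DataColID:
    """Identifies one column by the Data object that holds it and its header."""

    def __init__(self, data=None, header=None):
        self.data = data      # reference to the Data object containing the column
        self.header = header  # header string of the column

    def get_data(self):
        return self.data

    def set_data(self, data):
        self.data = data

    def get_header(self):
        return self.header

    def set_header(self, header):
        self.header = header


# Usage with a placeholder object standing in for a real Data instance.
placeholder = object()
col_id = DataColID(placeholder, "height")
print(col_id.get_header())
```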

- Create an analysis.py file. All analysis functions will take lists of DataColID objects to specify what (numeric) data to analyze. Inside your analysis file, create the following five functions.
  - data_range - Takes in a list of DataColID objects and returns a list of 2-element lists with the minimum and maximum values for each column. The function is required to work only on numeric data types.
  - mean - Takes in a list of DataColID objects and returns a list of the mean values for each column. Use the built-in numpy functions to execute this calculation.
  - stdev - Takes in a list of DataColID objects and returns a list of the standard deviations for each specified column. Use the built-in numpy functions to execute this calculation.
  - normalize_columns_separately - Takes in a list of DataColID objects and returns a matrix with each column normalized so its minimum value is mapped to zero and its maximum value is mapped to 1.
  - normalize_columns_together - Takes in a list of DataColID objects and returns a matrix with each entry normalized so that the minimum value (of all the data in this set of columns) is mapped to zero and the maximum value is mapped to 1.
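
The difference between the two normalizations is easiest to see on a small matrix. The sketch below is simplified: each function takes an already-assembled numpy matrix rather than a list of DataColID objects, so gathering the columns from their Data objects is left out. It also assumes no column is constant, since a constant column would cause a division by zero.

```python
import numpy as np

# Stand-in for the numeric columns selected by a list of DataColID objects.
data = np.matrix([[1.0, 10.0],
                  [3.0, 20.0],
                  [5.0, 40.0]])

def data_range(m):
    """Return [[min, max], ...] with one pair per column."""
    mins = m.min(axis=0).tolist()[0]
    maxs = m.max(axis=0).tolist()[0]
    return [[lo, hi] for lo, hi in zip(mins, maxs)]

def normalize_columns_separately(m):
    """Scale each column to [0, 1] using that column's own min and max."""
    mins = m.min(axis=0)   # 1 x n matrix of column minimums
    maxs = m.max(axis=0)   # 1 x n matrix of column maximums
    return (m - mins) / (maxs - mins)

def normalize_columns_together(m):
    """Scale every entry to [0, 1] using the min and max of the whole matrix."""
    lo = m.min()
    hi = m.max()
    return (m - lo) / (hi - lo)

print(data_range(data))
print(normalize_columns_separately(data))
print(normalize_columns_together(data))
```

With separate normalization every column spans the full range from 0 to 1. With joint normalization only the overall minimum and maximum reach 0 and 1, so the relative scale between columns is preserved.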

Test your new methods. In addition to your own careful debugging code, you may use lab2_test2.py which produces this for testdata1.csv.

- Find your own data set. Put it into a .csv file and convince me that your Data class can read it in properly. One thing you could do is open the .csv file with Excel, compute the mean and standard deviation using Excel, and then verify that the mean and standard deviations that you calculate with your analysis functions are the same.
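
One thing to watch for when comparing against a spreadsheet: `numpy.std` divides by n by default (population standard deviation), whereas Excel's `STDEV` and `STDEV.S` divide by n - 1 (sample standard deviation); Excel's `STDEV.P` matches the numpy default. The sketch below, using made-up values, shows both variants.

```python
import numpy as np

# Made-up sample values for illustration only.
values = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

mean = np.mean(values)
population_std = np.std(values)        # divides by n (ddof=0, the numpy default)
sample_std = np.std(values, ddof=1)    # divides by n - 1

print(mean, population_std, sample_std)
```

If your numbers differ from the spreadsheet only slightly, check which of the two definitions each side is using before looking for a bug in your reader.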

### Extensions

- Expand the notion of numeric data to include dates. Dates can be converted into numeric data by having each date represent the number of days it is past some date indicating the "beginning of time". To do that, you will need to convert the data from a string to Python's date or datetime type.
- Do the above extension, and make it work for multiple formats for dates.
- Expand the notion of numeric to include enumerated data. Enumerated types can be converted into numeric data. Using a dictionary, you can parse through the raw data, using the raw strings as dictionary keys. Give the first key the value 0, give the second unique key the value 1, and so on, incrementing the counter with each novel key. Create the numeric version of the enumerated type by going through the column and replacing the enumerated value key with its index. Keep the conversion dictionary in your Data class, because you may want to let the user choose from the set of keys.
- Add a method `addColumn` to add a column of data to the Data object. It will require a header, a type, and the right number of points (one for each row). You will need to add it to the raw_data and, possibly, to the numeric data. You will also need to update the list of headers, etc.
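
The date and enumerated-type conversions described above can be sketched as follows. The list of accepted date formats and the choice of 1970-01-01 as the "beginning of time" are assumptions made for illustration; the function names are hypothetical as well.

```python
from datetime import datetime

# Assumed formats and reference date; pick whatever suits your data set.
DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y"]
EPOCH = datetime(1970, 1, 1).date()

def date_to_days(text):
    """Return the number of days between EPOCH and the date in text."""
    for fmt in DATE_FORMATS:
        try:
            parsed = datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue  # this format did not match, try the next one
        return (parsed - EPOCH).days
    raise ValueError("unrecognized date format: %r" % text)

def enumerate_column(values):
    """Replace each string by the index at which it was first seen."""
    mapping = {}  # raw string -> integer code
    codes = []
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping)
        codes.append(mapping[value])
    return codes, mapping

print(date_to_days("1970-01-02"))
print(enumerate_column(["red", "blue", "red", "green"]))
```

Returning the mapping alongside the codes makes it straightforward to keep the conversion dictionary in the Data class, as suggested above.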

### Writeup and Hand-in

- Make a wiki page for the project writeup. On it, describe your Data class API, with brief descriptions of all the functions, their inputs, outputs, and purpose.
- Describe in your writeup how you store the data internally in your Data class, noting how you deal with each different type of data.
- Include descriptions of any extensions you created (and please make it explicit that they are extensions).
- Once you have written up your assignment, give the page the label: `cs251s14project2`

- Put your code in your private subdirectory in the CS251 folder on the Courses server.