## Objectives

For this project you will write a library of useful functions as well as a set of more general programs for computing simple statistics of a data stream. The final task involves computing the depth of the thermocline on Great Pond. The thermocline is the depth at which the water density changes most quickly, with a layer of denser water below and a layer of less dense water above.

### Setting Up

If you haven't already set yourself up for working on the project, then do so now:

1. Mount your directory on the Personal server.
2. Open the Terminal and navigate to your Project3 directory on the Personal server.
3. Open TextWrangler. If you want to look at any of the files you have already created, then open those files.

### Temperature Conversion

If you did not do so in lab, complete your stats functions in stats.py so that you have functions for computing each of max, min, sum, mean, variance, and standard deviation for a list of single values. Make sure each function, except max() and min(), computes and returns a floating point value. You can assume the input is a list of integers or floats.

Add to your stats functions a function that converts Celsius to Fahrenheit. The function should take in a single parameter -- temperature in Celsius -- and return a single value -- temperature in Fahrenheit. Call the function celsius2fahrenheit().
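
As a starting point, the conversion can be sketched as below. This is a minimal sketch that uses the standard formula F = C * 9/5 + 32; only the function name comes from the assignment, and how you document or test the function in stats.py is up to you.

```python
def celsius2fahrenheit(celsius):
    """Convert a temperature in Celsius to Fahrenheit.

    Uses the standard relation F = C * 9/5 + 32 and always
    returns a float, even when given an integer.
    """
    return celsius * 9.0 / 5.0 + 32.0


# quick sanity checks at well-known reference points
print(celsius2fahrenheit(0))     # freezing point of water
print(celsius2fahrenheit(100))   # boiling point of water
```

Writing the constants as floats guarantees a floating point result regardless of the input type.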

### Temperature Statistics

Write a Python program, computeStats.py, that reads a single stream of numbers, stores them in a list, then uses the library of functions you wrote in the lab to calculate the min, max, mean, and standard deviation and prints the values to the terminal.
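
One possible way to read the stream is sketched below. The helper name `read_numbers` is hypothetical, and the sketch assumes one number per line with blank lines skipped. It is demonstrated on an in-memory stream so it can be run without any input data.

```python
import io


def read_numbers(stream):
    """Read one number per line from a file-like object.

    Blank lines are skipped; every other line is cast to a float.
    Returns the values as a list, in the order they were read.
    """
    values = []
    for line in stream:
        text = line.strip()
        if text == "":
            continue
        values.append(float(text))
    return values


# demonstration on an in-memory stream standing in for stdin
sample = io.StringIO("24.47\n23.95\n\n19.92\n")
print(read_numbers(sample))
```

In computeStats.py you would pass `sys.stdin` instead of the sample stream, then hand the resulting list to your stats functions.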

Use your program, plus appropriate grep and cut commands, to compute these values from the LEA buoy data for temperature at 1m, temperature at 5m, and one other variable of your choice for the first week of June (i.e. June 1-7, 2016). Repeat the exercise for the first week of July and then for the first week of August 2016. Include these values in your report. Be sure to put the whole command for running your program in comments at the top of your Python program.

When computing the max, min, and mean of temperature variables, use the Celsius to Fahrenheit function to display the temperatures in both systems. You can choose how to implement this.

### Thermocline Depth Calculation

The next task is to write a program that computes the depth of the thermocline on Great Pond. The thermocline is the depth at which the water density changes most quickly, separating a layer of colder, denser water below from a layer of warmer, less dense water above; the two layers tend not to mix. Overall, you will write a function that computes the density of water for each value in a list of temperatures, a function that computes the depth of the maximum change in density, and a top-level function that reads in a stream of data and sets up the computations.

1. Create a new file, thermocline.py. Put your name, date, and class and project information at the top, along with a comment indicating what the program will do (compute the thermocline).
2. Write a function density() that takes in one parameter, temps, that is a list of temperatures. The function should first create a new empty list to hold density values, rhos. Then, it should loop over the temps list and for each temperature value t compute the density using the following equation.

```
rho = 1000 * (1 - (t + 288.9414) * (t - 3.9863)**2 / (508929.2*(t + 68.12963)))
```

It should then append the computed density to the rhos list. Finally, it should return the list of densities (rhos).

Test your function using this test file. It should print out the following if your density function is working correctly.

```
24.47 -> 997.21
23.95 -> 997.34
24.41 -> 997.22
23.81 -> 997.37
19.92 -> 998.25
16.88 -> 998.82
14.06 -> 999.26
11.56 -> 999.57
9.82 -> 999.74
9.13 -> 999.80
8.82 -> 999.82
```
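
The steps for density() described above can be sketched as follows. This is one possible reading of the description, not the only valid implementation; the loop at the end is an illustrative check against the first few temperatures from the expected output.

```python
def density(temps):
    """Return a list of water densities, one per temperature in temps.

    temps is a list of temperatures in degrees Celsius. The formula
    is the one given in the assignment text.
    """
    rhos = []
    for t in temps:
        rho = 1000 * (1 - (t + 288.9414) * (t - 3.9863)**2
                      / (508929.2 * (t + 68.12963)))
        rhos.append(rho)
    return rhos


# compare a few values with the expected output listed above
test_temps = [24.47, 23.95, 24.41]
for t, rho in zip(test_temps, density(test_temps)):
    print("%s -> %.2f" % (t, rho))
```

The formula peaks at exactly 1000 when t is 3.9863, the temperature of maximum density, so that value makes a convenient extra test case.
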
3. The next step is to add a function to thermocline.py that computes the derivative of density with respect to depth, or how fast the density is changing as you get deeper. The function will take in two lists: one is the set of temperatures, the other is the set of corresponding depths. The function will return one value: the depth of the maximum change in density. The algorithm below gives the function.
```
def thermocline_depth( temps, depths ):

    # assign to rhos the result of calling the density function with temps as the argument
    # assign to drho_dz the empty list

    # calculate the first derivative of density
    # loop for one less than the length of rhos
        # append to drho_dz the quantity (rhos[i+1] minus rhos[i]) divided by the quantity (depths[i+1] minus depths[i])
        # optional step: print out temps[i], rhos[i], and drho_dz[i]

    # assign to max_drho_dz the value -1.0
    # assign to maxindex the value -1
    # loop for the length of drho_dz (loop variable i)
        # if drho_dz[i] is greater than max_drho_dz
            # assign to max_drho_dz the value drho_dz[i]
            # assign to maxindex the value i

    # assign to thermoDepth the average of depths[maxindex] and depths[maxindex+1]

    return thermoDepth
```

Test your thermocline_depth function using this test file. It should return a depth of 6.0m. The maximum change at that depth is 0.44; you do not need to report this value, but knowing it may help you debug if you run into problems.
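
For reference, the outline above might translate into Python roughly as shown below. This is a hedged sketch: it repeats a density() helper so that the block runs on its own, and the temperatures and depths at the bottom are made-up illustration values, not the contents of the test file.

```python
def density(temps):
    """Density of water for each temperature (deg C) in temps."""
    rhos = []
    for t in temps:
        rhos.append(1000 * (1 - (t + 288.9414) * (t - 3.9863)**2
                            / (508929.2 * (t + 68.12963))))
    return rhos


def thermocline_depth(temps, depths):
    """Return the depth at which density changes fastest with depth."""
    rhos = density(temps)

    # first derivative of density with respect to depth (finite differences)
    drho_dz = []
    for i in range(len(rhos) - 1):
        drho_dz.append((rhos[i + 1] - rhos[i]) / (depths[i + 1] - depths[i]))

    # find the index of the largest change, using the outline's initial values
    max_drho_dz = -1.0
    maxindex = -1
    for i in range(len(drho_dz)):
        if drho_dz[i] > max_drho_dz:
            max_drho_dz = drho_dz[i]
            maxindex = i

    # the thermocline sits midway between the two bracketing depths
    return (depths[maxindex] + depths[maxindex + 1]) / 2.0


# made-up profile: the sharpest temperature drop is between 3m and 5m
print(thermocline_depth([24.0, 23.0, 15.0, 10.0], [1, 3, 5, 7]))
```

In the made-up profile the largest density change falls between the readings at 3m and 5m, so the function reports the midpoint of those two depths.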

4. The final step is to write the main function that reads in data from the buoy file through stdin, extracts all of the temperature fields in order, computes the thermocline depth and prints it out. The program should also keep track of the minimum and maximum depth of the thermocline over the range of measurements provided and print out those values at the end along with when those minimum and maximum events occurred.

The algorithm given below is not line by line. Each comment will correspond to one or more lines of Python.

```
import sys

def main(stdin):

    # these are the fields corresponding to the temperatures in order by depth
    # note the 0-indexing
    fields = [8, 11, 14, 17, 20, 23, 26]

    # these are the depth values for each temperature measurement
    depths = [ 1, 3, 5, 7, 9, 11, 13 ]

    # create variables to hold the max depth, min depth, the datetime
    # of the max depth and the datetime of the min depth.  Give them
    # reasonable initial values (a small value for max depth, a large
    # value for min depth, and empty strings for the datetime variables).

    # assign buf the first line of stdin and then start the standard while loop until buf is empty
        # split buf on commas and assign it to words

        # assign to datetime the value in words[0]
        # assign to temps the empty list

        # loop over the number of items in depths (loop variable i)
            # append to temps the result of casting words[ fields[i] ] to a float

        # assign to depth the result of calling thermocline_depth with temps and depths as arguments

        # test if depth is greater than maxdepth and update maxdepth and maxtime if it is

        # test if depth is less than mindepth and update mindepth and mintime if it is

        # print out the datetime value and the depth value, separated by commas

        # update buf with the next readline from stdin

    # print out the minimum and maximum thermocline depth and the corresponding date/time

if __name__ == "__main__":
    main(sys.stdin)
```
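
The field-extraction step inside the loop can be sketched in isolation as shown below. The helper name `extract_temps` is hypothetical, and the sample line is fabricated so that each value equals its field index; it is not a real line from the buoy file.

```python
# field indexes of the temperature readings, from the outline above
fields = [8, 11, 14, 17, 20, 23, 26]


def extract_temps(line, fields):
    """Split one comma-separated line and pull out the temperature fields.

    Returns the date/time string (field 0) and a list of floats,
    one for each index in fields.
    """
    words = line.strip().split(",")
    datetime = words[0]
    temps = []
    for i in range(len(fields)):
        temps.append(float(words[fields[i]]))
    return datetime, temps


# fabricated line: a date/time followed by 26 numbers, where the
# number in field i is simply i (NOT real buoy data)
fake_line = "6/2/2016 1:00," + ",".join(str(float(i)) for i in range(1, 27))

when, temps = extract_temps(fake_line, fields)
print(when, temps)
```

Because every fabricated value equals its own index, the printed temperatures make it easy to confirm that the 0-indexed field numbers select the columns you expect.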

Test your program using the following command.

`curl http://cs.colby.edu/courses/F18/cs152-labs/3100_iSIC.csv | grep '6/2/2016' | grep -e ' 1:00' -e ' 2:00' | python3 thermocline.py`

### Data Processing

For the final task, compute the thermocline depth on hourly intervals (i.e. you should grep for the data points that are "on the hour") for the month of June and save the results to a file. For this exercise, we want just the dates and thermocline depths, so be sure to comment out any additional print statements (such as those printing the minimum and maximum). Use grep to select the lines you need and pipe them to the thermocline.py program to compute the thermocline values. Direct the output to a new file, 2016-06-thermo.csv. This should be a file with two columns: datetime and thermocline depth.

Use the cat command to direct the contents of 2016-06-thermo.csv to stdout, pipe that to cut to get the second field, and then pipe that to your computeStats.py file to get the min, max, mean, and standard deviation of the thermocline (which is in units of meters) for the month of June.

Include this final output in your writeup.

### Extensions

Each assignment will have a set of suggested extensions. The required tasks and writeup constitute about 83% of the assignment. If you do only the required tasks and writeup -- and do them well -- you will earn a B. To earn a higher grade, you need to undertake one or more extensions. The difficulty and quality of the extension or extensions will determine your final grade for the assignment. One complex extension, done well, or 2-3 simple extensions are typical.

These are only examples to help you start thinking of the unlimited possible ways you could extend the project. You are strongly encouraged to design your own extensions to suit your interests and show off your computational thinking skills.

Whichever extensions you choose, be sure to discuss your motivation, design process, implementation, and results in the writeup. A screenshot of your results is usually a great idea.

• Write a more general program for taking data at one sampling rate, for example 5min intervals, and converting it to an hourly sampling rate. Here is how to approach it:

The input is a stream of data from stdin in which the first field of each line is a date/time field and the remaining fields are comma-separated numbers. The output of the program should be a stream of data to stdout with the date/time field first, followed by the same number of fields as the input as comma-separated numbers. The numbers should be the hourly averages, reported at the top of the hour.

To accomplish this task, you will need to figure out how many numbers are on each line and then build a list of lists, with one sublist to hold each column of numbers between hour intervals. As the program loops over the input, it stores the values from each column of data in the corresponding sublist. When it reaches the top of the hour, it calculates the mean of each sublist and prints the means along with the date/time field.
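
A minimal sketch of this approach follows. It rests on several assumptions not stated in the assignment: the function name `hourly_means` is hypothetical, the time part of the first field is written as H:MM so that a top-of-the-hour line ends in ":00", the top-of-the-hour reading is included in the average it closes, and means are printed to two decimal places.

```python
def hourly_means(lines):
    """Average each numeric column between top-of-the-hour lines.

    lines is an iterable of strings shaped like 'date time,n1,n2,...'.
    Returns a list of output strings, one per completed hour.
    """
    columns = None   # one sublist per numeric column
    output = []
    for line in lines:
        words = line.strip().split(",")
        if len(words) < 2:
            continue  # skip blank or malformed lines
        stamp = words[0]
        values = [float(w) for w in words[1:]]

        # size the list of lists from the first data line
        if columns is None:
            columns = [[] for _ in values]

        for col, v in zip(columns, values):
            col.append(v)

        # assumption: top-of-the-hour timestamps end in ':00'
        if stamp.strip().endswith(":00"):
            means = [sum(col) / len(col) for col in columns]
            output.append(stamp + "," + ",".join("%.2f" % m for m in means))
            columns = [[] for _ in values]  # start the next hour
    return output


sample = [
    "6/2/2016 0:30,10,20",
    "6/2/2016 1:00,20,40",
    "6/2/2016 1:30,1,2",   # incomplete hour, never reported
]
for row in hourly_means(sample):
    print(row)
```

Readings after the last top-of-the-hour line are dropped in this sketch; whether to report a partial final hour is a design choice to discuss in your writeup.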

• Generalize your convert-to-hour program to let you specify the interval you want to average. A simpler alternative is to write a different function such as convert2day that computes the average for each day in the input stream.

• Formulate a hypothesis about the average or median values of Avg AirTemp from month to month, from April through November, then analyze the data to see if they support your hypothesis. Do they?

One part of this data analysis would be to exclude data from faulty sensor readings in your calculations. For example, at 11/1/2016, 7:30, the Avg AirTemp reading is not something that is physically plausible. In your code, identify when faulty sensor data occur and make sure not to use them in your data analysis; in your writeup, however, be sure to state what, if any, data were excluded from your calculations.

• Write a general function that takes in date/time and another variable. Have your function compute the max and min for each day in the input stream.

• Using a tool of your choice, create graphs/plots of data that is an output of running your own code.

### Write Up & Hand In

Turn in your code by putting it into your private hand-in directory on the Courses server. All files should be organized in a folder titled "Project3" and you should include only those files necessary to run the program. We will grade all files turned in, so please do not turn in old, non-working versions of files.

Make a new wiki page for your assignment. Put the label `cs152f18project3` in the label field on the bottom of the page, and give the page a meaningful title (e.g. Eric's Project 3).

In general, your intended audience for your write-up is your peers not in the class. Your goal should be to be able to use it to explain to friends what you accomplished in this project and to give them a sense of how you did it. Follow the outline below.

• Title includes your name and clearly describes the project.
• Section headings are used to delineate distinct sections of the report.
• Abstract identifies key lecture concepts (e.g. code structures, data types, and libraries) relevant to the project.
• Abstract explains why key lecture concepts are important to achieving project goals.
• Abstract identifies program output(s), giving context to the project tasks.
• Solutions to tasks are described, focusing on how you used key lecture concepts to solve each task.
• Required images/outputs are present and clearly labeled.
• Reflection at the end of the report addresses how the lecture concepts mentioned in the abstract made this project possible. If you can think of a more elegant way to achieve the same results, please share!
• Sources, imported libraries, and collaborators are cited, or a note is included indicating that none were referenced.
• Double-check the label. When you created the page, you should have added the label `cs152f18project3`. Make sure it is there.

© 2018 Eric Aaron (with contributions from Colby CS colleagues).