Project 3: Calculating Thermoclines
For this project you will be building a library of useful functions as well as a more general program that will be useful for computing statistics of data in a CSV file.
The final task will involve computing the depth of the thermocline on Great Pond and plotting it for the month of July. The thermocline is the depth at which there is the largest difference in water density, with a layer of denser water below and a layer of less dense water above.
- Set up your workspace
If you have not already done so, make a new project 3 directory. Open a terminal and cd to the directory.
- Write your library of useful statistical functions
In the stats.py file you started in lab, write the following four functions. Each function should have a single parameter, which should be a list of numbers. The function should loop over the numbers in the list, compute the given statistic, and return it.
- mean(data) - computes the mean of the list of data.
- min(data) - computes the min of the list of data.
- max(data) - computes the max of the list of data.
- variance(data) - computes the variance of the list of data.
To compute the mean, sum the values in the list and then divide by the length of the list. The length of a list is returned by the len() function, so if you have a list named data, the number of elements is len(data).
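To compute the variance, use the formula for the sample variance:
$$ s^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2 $$
Here $\bar{x}$ is the mean and $N$ is the number of data points. First compute the mean (you can use your mean function), then compute the sum of the squared differences between each data point and the mean using a for loop. Finally, divide by N-1.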
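The mean and variance steps described above can be sketched as follows. This is only one possible arrangement, not the required solution, and the sample list is made up for illustration.

```python
def mean(data):
    """Return the arithmetic mean of a list of numbers."""
    total = 0.0
    for x in data:
        total += x
    return total / len(data)


def variance(data):
    """Return the sample variance (divides by N-1) of a list of numbers."""
    m = mean(data)
    sum_sq = 0.0
    for x in data:
        sum_sq += (x - m) ** 2
    return sum_sq / (len(data) - 1)


# Made-up sample data: the sum is 40, so the mean is 5.0,
# and the squared differences add up to 32, so the variance is 32/7.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean(sample))
print(round(variance(sample), 4))
```

Checking a small list like this by hand is one way to decide whether the functions are working.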
Use your test function in stats.py to check that each function is working correctly. In your report, indicate whether these four functions worked properly and, briefly, how you made that determination.
- Write a program to compute statistics of a column of data
Using your analyze.py file from the lab, update it so that it computes the sum, mean, variance, max, and min statistics for the selected column of data from the specified file and prints the statistics to the terminal.
Calling analyze.py with the hurricanes.csv file and column 1 should produce the following statistics.
sum : 103.00
mean: 7.36
var : 12.55
min : 2.00
max : 15.00
Demonstrate that your program works with a column from your extracted Goldie-MLRC July data file from project 2.
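As a rough sketch of the column-statistics step, the code below reads one column from CSV text and formats the five statistics in the layout shown above. The names read_column and report are hypothetical, the data is made up, and the sketch assumes a single header line and 0-indexed columns. In your analyze.py the filename and column would come from the command line, and the statistics would come from your stats.py functions rather than Python's built-ins.

```python
import io


def read_column(stream, col):
    """Return column `col` (assumed 0-indexed) as floats, skipping one header line."""
    stream.readline()  # assumption: exactly one header line
    values = []
    for line in stream:
        words = line.strip().split(",")
        if len(words) > col:
            values.append(float(words[col]))
    return values


def report(values):
    """Return the five statistics as formatted strings."""
    n = len(values)
    total = sum(values)
    mean = total / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    stats = [("sum", total), ("mean", mean), ("var", var),
             ("min", min(values)), ("max", max(values))]
    return ["%-4s: %.2f" % (name, value) for name, value in stats]


# Made-up CSV text standing in for a real file opened with open(filename).
sample = io.StringIO("year,storms\n2001,2\n2002,4\n2003,6\n")
for text in report(read_column(sample, 1)):
    print(text)
```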
- Calculate the thermocline depth in Great Pond for July 2019
The next task will be to write a program that computes the depth of the thermocline on Great Pond for each day in July. The thermocline is the depth at which the water density changes most quickly, creating a layer of colder, denser water below a layer of warmer water; the two layers tend not to mix. Overall, you will write a function that computes the density of water given a list of temperatures, a function that computes the depth of the maximum change in density, and then a top-level function that reads in the data file and guides the computation.
Create a new file, thermocline.py. Put your name, date, and class at the top, along with a comment indicating what the program will do (compute the thermocline).
- Convert temperatures to densities
Write a function to convert a list of temperatures to a list of densities
Write a function density that takes one parameter, temps, which is a list of temperatures. The function should first create a new empty list, rhos, to hold the density values. Then, it should loop over the temps list and, for each temperature value t, compute the density using the following equation.
rho = 1000 * (1 - (t + 288.9414) * (t - 3.9863)**2 / (508929.2*(t + 68.12963)))
Inside the loop, it should then append the computed density to rhos. Finally, after the loop, it should return the list of densities.
Test your function using this test file. It should print out the following if your density function is working correctly.
24.47 -> 997.21
23.95 -> 997.34
24.41 -> 997.22
23.81 -> 997.37
19.92 -> 998.25
16.88 -> 998.82
14.06 -> 999.26
11.56 -> 999.57
9.82 -> 999.74
9.13 -> 999.80
8.82 -> 999.82
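One way the temperature-to-density conversion could look is sketched below. It applies the equation given above to each temperature; treat it as an illustration rather than the required implementation. The two temperatures are the first and last entries of the expected test output.

```python
def density(temps):
    """Convert a list of water temperatures to a list of densities."""
    rhos = []
    for t in temps:
        rho = 1000 * (1 - (t + 288.9414) * (t - 3.9863)**2 / (508929.2 * (t + 68.12963)))
        rhos.append(rho)
    return rhos


# Spot check against two rows of the expected test output.
check = [24.47, 8.82]
for t, rho in zip(check, density(check)):
    print("%.2f -> %.2f" % (t, rho))
# 24.47 -> 997.21
# 8.82 -> 999.82
```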
- Compute the derivative of the densities
The next step is to add a function to thermocline.py that computes the derivative of density with respect to depth, or how fast the density is changing as you get deeper. The function will take in two lists: one is the set of temperatures, the other is the set of corresponding depths. The function will return one value: the depth of the maximum change in density. The algorithm below gives the function.
def thermocline_depth( temps, depths ):
    # assign to rhos the result of calling the density function with temps as the argument

    # assign to drho_dz the empty list

    # loop for one less than the length of rhos (loop variable i)
        # append to drho_dz the quantity rhos[i+1] minus rhos[i] divided by the quantity depths[i+1] minus depths[i]
        # optional step: print out temps[i], rhos[i], and drho_dz[i]

    # assign to max_drho_dz the value -1.0
    # assign to maxindex the value -1
    # loop for the length of drho_dz (loop variable i)
        # if drho_dz[i] is greater than max_drho_dz
            # assign to max_drho_dz the value drho_dz[i]
            # assign to maxindex the value i

    # assign to thermoDepth the average of depths[maxindex] and depths[maxindex+1]

    return thermoDepth
Test your thermocline_depth function using this test file. It should return a depth of 6.0m. The maximum change at that depth is 0.44; you do not need to report this value, but knowing it may help you debug if you run into problems.
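To make the difference-and-maximum idea in the outline above concrete, here is a small sketch that works directly on a list of densities instead of temperatures. The function name depth_of_max_gradient and the density values are made up for illustration; your thermocline_depth would first call density on the temperatures.

```python
def depth_of_max_gradient(rhos, depths):
    """Return the midpoint depth of the interval where density changes fastest."""
    # Change in density per unit depth between each pair of neighbouring readings.
    drho_dz = []
    for i in range(len(rhos) - 1):
        drho_dz.append((rhos[i + 1] - rhos[i]) / (depths[i + 1] - depths[i]))

    # Find the index of the largest change, starting from the same
    # -1.0 / -1 initial values used in the outline.
    max_drho_dz = -1.0
    max_index = -1
    for i in range(len(drho_dz)):
        if drho_dz[i] > max_drho_dz:
            max_drho_dz = drho_dz[i]
            max_index = i

    # The change happens between two readings, so report the depth halfway between them.
    return (depths[max_index] + depths[max_index + 1]) / 2


# Made-up densities: the changes per metre are 0.25, 0.75, and about 0.1,
# so the largest change lies between 3 m and 5 m.
print(depth_of_max_gradient([997.0, 997.5, 999.0, 999.2], [1, 3, 5, 7]))  # 4.0
```

Because each derivative value describes the interval between two readings, the result is the average of the two bracketing depths rather than either depth itself.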
- Compute the thermocline for each day in July
The final step is to write the main function that reads in data from the buoy file, extracts all of the temperature fields in order, computes the thermocline_depth and either prints the day and thermocline_depth value or saves them to a CSV file.
You can use this Goldie data file for this task. The file includes all of the data fields for the month of July and a single header line. The field indexes for the depths (m) [1, 3, 5, 7, 9, 11, 13, 15] are [10, 11, 16, 17, 15, 14, 13, 12]. You may want to double-check the field numbers before starting by looking at the header line.
The algorithm given below is not strictly line by line. Each comment will correspond to one or more lines of Python.
def main():
    # these are the fields corresponding to the temperatures in order by depth
    # note they use 0-indexing
    fields = [10, 11, 16, 17, 15, 14, 13, 12]

    # these are the depth values for each temperature measurement
    depths = [ 1, 3, 5, 7, 9, 11, 13, 15 ]

    # open the data file and read past the header line

    # assign to day the value 0

    # read the first data line

    # start a while loop while there is still data
        # split the line on commas and assign it to words
        # if the time is about noon (12:03:00 PM)
            # add one to the day variable
            # assign to temps the empty list
            # loop over the number of items in depths (loop variable i)
                # append to temps the result of casting words[ fields[i] ] to a float
            # assign to thermo_depth the result of calling thermocline_depth with temps and depths as arguments
            # print (or save to a file) the day of the month and thermo_depth separated by a comma
        # update line with readline

    return

if __name__ == "__main__":
    main()
Run your program and create a plot of the results with day on the x-axis and thermocline depth on the y-axis. Include this plot in your report.
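The read loop in the main outline above might be sketched as follows. This version only collects the noon temperatures for each day and leaves the thermocline computation out. The function name noon_temperatures, the helper make_row, and the sample rows are all made up, and the sketch assumes the text 12:03:00 PM appears somewhere in each noon line; check the real file to see which field holds the time.

```python
import io

# Field indexes from the assignment, ordered by depth (0-indexed).
FIELDS = [10, 11, 16, 17, 15, 14, 13, 12]


def noon_temperatures(stream, fields, noon_tag="12:03:00 PM"):
    """Return a list of (day, temps) pairs, one per line containing noon_tag."""
    results = []
    stream.readline()          # read past the header line
    day = 0
    line = stream.readline()   # first data line
    while line != "":          # readline returns "" at end of file
        words = line.strip().split(",")
        if noon_tag in line:   # assumption: the time text appears in the line
            day += 1
            temps = []
            for i in range(len(fields)):
                temps.append(float(words[fields[i]]))
            results.append((day, temps))
        line = stream.readline()
    return results


def make_row(stamp, base):
    """Build a made-up 18-field row whose field k (k >= 1) holds the value base + k."""
    return ",".join([stamp] + [str(float(base + k)) for k in range(1, 18)])


# Made-up data standing in for the buoy file: two noon rows and one other row.
text = "\n".join(["header",
                  make_row("7/1/2019 12:03:00 PM", 0),
                  make_row("7/1/2019 1:03:00 PM", 50),
                  make_row("7/2/2019 12:03:00 PM", 100)]) + "\n"
for day, temps in noon_temperatures(io.StringIO(text), FIELDS):
    print(day, temps)
```

In the full program, each temps list would be passed to thermocline_depth along with the depths list, and the day and result printed or written to a CSV file.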
- What is a command-line argument, and why are command-line arguments useful?
- What is the difference between a while loop and a for loop in Python?
- What is the difference between = and == in Python?
- Within a function, how would you control the number of times a for loop executes using a function parameter? Give an example.
Extensions are your opportunity to customize your project, learn something else of interest to you, and improve your grade. The following are some suggested extensions, but you are free to choose your own. Be sure to describe any extensions you complete in your report.
- Write functions in your stats.py file to compute more types of statistics.
- Use your code to compute statistics on a data set of your own choosing.
- Compare different times or time periods in the Goldie data.
- Add more command-line control options, such as specifying what time of day to compute the thermocline.
- Explore how the thermocline changes and why. What are the min and max thermocline values for July? What if you graph wind direction and thermocline together, is there a relationship?
- Automate the process of making a graph from data.
Submit your code
Turn in your code (all files ending with .py) by putting it in a directory in the Courses server. On the Courses server, you should have access to a directory called CS152, and within that, a directory with your user name. Within this directory is a directory named Private. Files that you put into that private directory you can edit, read, and write, and the professor can edit, read, and write, but no one else. To hand in your code and other materials, create a new directory, such as project3, and then copy your code into that project directory. Please submit only code that you want to be graded.
When submitting your code, double check the following.
- Is your name at the top of each code file?
- Does every function have a comment or docstring specifying what it does?
- Is your handin project directory inside your Private folder on Courses?
Write your project report
For CS 152, please use Google Docs to write your report. Create a new doc for each project. Start the doc with a title and your name. Attach the doc to your project on Google Classroom. Make sure you click submit when you are done. The graders cannot provide feedback unless you click submit.
Your intended audience for your report is your peers not in the class. From week to week you can assume your audience has read your prior reports. Your goal should be to be able to use it to explain to friends what you accomplished in this project and to give them a sense of how you did it.
Your project report should contain the following elements.
A brief summary of the project, in your own words. This should be no more than a few sentences. Give the reader context and identify the key purpose of the assignment.
Writing an effective abstract is an important skill. Consider the following questions while writing it.
- Does it describe the CS concepts of the project (e.g. writing well-organized and efficient code)?
- Does it describe the specific project application?
- Does it describe your solution or how it was developed (e.g. what code did you write)?
- Does it describe the results or outputs (e.g. did your code work as expected)?
- Is it concise?
- Are all of the terms well-defined?
- Does it read logically and in the proper order?
- A description of your solution to the tasks, including any text output or images you created (including the three required images mentioned above). This should be a description of the form and functionality of your final code. Note any unique computational solutions you developed or any insights you gained from your code's output. You may want to incorporate code snippets in your description to point out relevant features. Code snippets should be small segments of code--usually less than a whole function--that demonstrate a particular concept. If you find yourself including more than 5-10 lines of code, it's probably not a snippet.
- A description of any extensions you undertook, including text output or images demonstrating those extensions. If you added any modules, functions, or other design components, note their structure and the algorithms you used.
- The answers to any follow-up questions (there will be 3-4 for each project).
- A brief description (1-3 sentences) of what you learned. Think about the answer to this question in terms of the stated purpose of the project. What are some specific things you had to learn or discover in order to complete the project?
- A list of people you worked with, including TAs and professors. Include in that list anyone whose code you may have seen, such as those of friends who have taken the course in a previous semester.