Due: , 11:59 pm
The goal of this week's project is to start building more general functions that may be useful in many contexts. We're going to start using a concept called modular design where we build more complex programs on top of simpler functions.
The purpose of this lab time is to build a set of general functions in a library and then use that library in a separate Python file. This will be the first time you call a function from another Python file that you wrote by importing one file into another. You will also do a little experimentation with lists, which are a data structure for holding sequences of values.
For the project, you'll continue to work with the buoy data, doing some more sophisticated calculations and analyses, aided by our function library.
Start by creating a project3 directory to work in on the Personal server. Navigate to this directory in the Terminal.
Lists are an important data structure in Python. A list is an ordered sequence of values. In Python, the values in a list can be of any type, including other lists, and the elements of a list do not all have to be of the same type (that is not necessarily the case in other programming languages). For this assignment, we'll mostly be dealing with lists of numbers.
Create a new file in TextWrangler and put the following assignment into it.
a = [5, 3, 6, 1, 2]
In Python, a list is delimited by square brackets, and the elements of the list are separated by commas. To access one element of a list, use syntax called bracket notation. The first element of a list has index zero. To access the first element of a list named a:
a[0]
Add two more lines to your file to print out the 5 and the 1 (the first element in the list and the fourth element in the list).
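As a rough illustration of bracket notation and zero-based indexing, here is a minimal sketch. The list name temps and its values are hypothetical and are not part of the lab files.

```python
# Hypothetical list, used only to illustrate indexing.
temps = [21.5, 20.9, 22.3, 19.8]

first = temps[0]          # index 0 is the first element
third = temps[2]          # index 2 is the third element
count = len(temps)        # number of elements in the list
last = temps[count - 1]   # the highest valid index is len(list) - 1

print(first, third, last)
```

Note that the index of an element is always one less than its position when counting from one.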
To add items to a list, you use the append method. That means that you use the name of the list and then add .append() with the item to append inside the parentheses. For example, add the following two lines to your Python code:
a.append(7)
print(a)
You should now see a 7 at the end of the list.
To change the value of an element of a list, you use an assignment. Use bracket notation to specify in which element of the list to store the new data. For example, the following changes the first element of the list to a 4 and then prints the updated list.
a[0] = 4
print(a)
Try editing two other locations in the list in your Python file, then print out the list and make sure it did what you think it should have.
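The two ways of modifying a list described above (appending and assigning to an element) can be sketched together as follows. The list readings is a hypothetical example, separate from the list a in your file.

```python
# Hypothetical list, used only to illustrate list modification.
readings = [4, 8, 15]

readings.append(16)   # adds 16 to the end; the list grows by one element
readings[1] = 9       # overwrites the second element (index 1) in place

print(readings)       # [4, 9, 15, 16]
```

Appending changes the length of the list, while assigning through bracket notation only replaces an existing element.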
So far, all of our programs have processed data while reading it from an incoming data stream. For this project, we will instead store all of the incoming data in a list, and then we will process the data in the list. The idea is straightforward: start with an empty list, and then as your program reads through the input stream, store each successive value in the list.
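One possible shape for this idea is sketched below. It assumes one number per line and uses io.StringIO as a stand-in for a real input stream so that it can be run on its own; the function name read_floats and the sample values are hypothetical.

```python
import io

def read_floats(stream):
    """Collect one float per line until a blank line or end of input."""
    values = []                      # start with an empty list
    line = stream.readline()
    while line.strip() != "":        # readline returns "" at end of input
        values.append(float(line))   # float() ignores the trailing newline
        line = stream.readline()
    return values

# Stand-in for stdin, used only so the sketch is self-contained.
fake_input = io.StringIO("20.5\n21.0\n19.75\n")
data = read_floats(fake_input)
print(data)
```

Once the values are stored in the list, they can be processed as many times as needed without re-reading the input.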
Start a new Python file called storedata.py. As always, put your name, date, and course and project information in comments at the top of the file. Then copy and paste the following algorithm, and use the comments to guide you as you fill in the code.
import sys

def main(stdin):
    # assign to mylist the empty list

    # assign to buf the result of calling readline on stdin
    # while buf.strip() is not equal to the empty string
        # append to mylist the result of casting buf to a float
        # assign to buf the result of calling readline on stdin
    # print mylist
    return

if __name__ == "__main__":
    main(sys.stdin)
Download this file and test your program using the following command.
cat 2015-08-01-dosat.csv | python3 storedata.py
Take a look at the data file and make sure you're happy with the result.
Now we're going to create a file that holds a library of useful functions. In Python, a library (or "package", or "module") is a file that contains functions. When you import the module into another Python file, you can use those functions. We've already done this by importing the sys package into our programs, which enables us to use the stdin functions. Now we're going to create our own module and then import it into other Python files.
Create a new file called stats.py. Put your name, date, and course and project information at the top of the file.
Create a new function called sum() that takes one argument. You can assume that argument will be a list of numbers. This function should add together all of the values in the list and return the sum.
The algorithm is as follows. Create a variable to hold the sum and initialize it to 0.0 (explicitly make it a floating point number). Then loop over the list provided as the function parameter. Inside the loop, add each number to the variable holding the sum. Once the loop is complete, return the sum.
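The accumulator pattern described above might look like the following sketch. It is one reasonable reading of the steps, not the only way to write them. Note that defining a function named sum, as the lab asks, shadows Python's built-in sum within the file.

```python
def sum(values):
    """Return the total of the numbers in values as a float."""
    total = 0.0              # explicitly a floating point number
    for v in values:         # visit each number in the list once
        total = total + v
    return total

print(sum([1, 2, 3, 4]))     # 10.0
```

Because the accumulator starts at 0.0, the result is a float even when every value in the list is an integer.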
To test your function, make a second function called test() at the bottom of your stats.py file. The function does not require any arguments. As the first instruction in the test function, assign the list [1, 2, 3, 4] to a variable. For the second instruction assign to a second variable the result of calling sum with the list as the argument. For the third instruction, print out the variable holding the result. Run your program and make sure you get the value 10.0 as an output.
Put the following at the end of your stats.py file to call the test function only when you run the stats.py file directly. The test function will not run when stats.py is imported into another Python file.
if __name__ == "__main__":
    test()
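To show how this guard fits into a whole file, here is a minimal sketch of a module layout. The function double is hypothetical and stands in for your library functions. Having test() return a value is an extra used here for illustration; the lab only asks it to print.

```python
def double(x):
    """Hypothetical library function: return twice the argument."""
    return 2 * x

def test():
    """Exercise the library function and print what it computed."""
    result = double(21)
    print(result)
    return result

# __name__ is "__main__" only when this file is run directly, for example
# with python3 thisfile.py. When the file is imported, __name__ is the
# module's name instead, so test() is skipped.
if __name__ == "__main__":
    test()
```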
Create another function called mean() that computes the mean value of the numbers in the list. The only difference between the mean() function and the sum() function is that you want to return the sum divided by the number of values in the input list. Use your sum() function to compute the sum. You can use the len() function to get the number of elements in a list. Add a test of your mean function to the test function at the bottom of the file. Run it and make sure you get the right answer (2.5).
Create two more functions, max() and min(), that compute the maximum and minimum values in the list, respectively. The max() algorithm has the following steps:
The min() function is identical except you want to update minval only if the value in the list is less than the smallest value seen so far.
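A running-extreme loop of the kind described here could be sketched as below. The sketch assumes a non-empty list and assumes that the running value is initialized to the first element; the variable name maxval is chosen by analogy with minval. Like sum, these definitions shadow Python's built-in max and min within the file.

```python
def max(values):
    """Return the largest number in a non-empty list."""
    maxval = values[0]       # assumption: start from the first element
    for v in values:
        if v > maxval:       # update only when a larger value appears
            maxval = v
    return maxval

def min(values):
    """Return the smallest number in a non-empty list."""
    minval = values[0]       # assumption: start from the first element
    for v in values:
        if v < minval:       # update only when a smaller value appears
            minval = v
    return minval

print(max([5, 3, 6, 1, 2]), min([5, 3, 6, 1, 2]))   # 6 1
```

Starting from the first element rather than from a fixed number such as 0 keeps the functions correct for lists that contain only negative values.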
Test your max() and min() functions by adding the appropriate code to your test function. Make sure the answers you get make sense.
Create two more functions, variance() and stdev(), that compute the variance and standard deviation of the values in the list, respectively. The variance is defined as the following:

$$\mathrm{variance} = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2$$
You can use the mean() function to calculate the mean x̄ (pronounced "x bar"), then use a loop over the list of numbers (the x_i) to compute the sum of the squares of the differences between each number and the mean. (The mean x̄ remains constant throughout the loop.) Dividing this sum by the number N of items in the list yields the average squared distance of each list item from the list's mean value, which is one acceptable variance calculation. However, the variance equation above uses the divisor N-1 (referred to as Bessel's correction), a common method of bias correction. The resulting quotient is the unbiased sample variance, which is the calculation you will be expected to implement in this class.
Note that a summation in mathematical sigma notation is simply a loop over the list of numbers, summing the calculated value each time through the loop.
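To make the correspondence between sigma notation and a loop concrete, here is a small sketch that computes only the summation part of the variance. The function name sum_of_squared_differences and its center parameter are hypothetical.

```python
def sum_of_squared_differences(values, center):
    """Add up (x - center)**2 for every x in values."""
    total = 0.0
    for x in values:                    # one pass through the loop per term
        total += (x - center) ** 2      # the expression inside the sigma
    return total

# With the list [1, 2, 3, 4] and its mean 2.5, the terms are
# 2.25, 0.25, 0.25 and 2.25.
print(sum_of_squared_differences([1, 2, 3, 4], 2.5))   # 5.0
```

The value passed as center is computed once, before the loop, and does not change while the terms are being added.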
The standard deviation is the square root of the variance. You can call the variance function from the standard deviation function and return the square root of the variance. Note how you are building the standard deviation function on top of two layers of other functions: the stdev function calls the variance function, which calls the mean function, which calls the sum function. This avoids code duplication, which reduces the likelihood of coding errors and makes your coding task faster and easier. Also, note that you can use math.sqrt() to compute the square root (you need to import math first).
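The layering described above can be sketched as a chain of small functions, each built on the one before it. This is one possible arrangement under the definitions given in this lab (sample variance with divisor N-1). It assumes a list with at least two values.

```python
import math

def sum(values):
    """Return the total of the numbers in values as a float."""
    total = 0.0
    for v in values:
        total += v
    return total

def mean(values):
    """Return the sum divided by the number of values."""
    return sum(values) / len(values)

def variance(values):
    """Return the sample variance, using the divisor N - 1."""
    xbar = mean(values)                  # computed once, before the loop
    total = 0.0
    for x in values:
        total += (x - xbar) ** 2
    return total / (len(values) - 1)

def stdev(values):
    """Return the square root of the sample variance."""
    return math.sqrt(variance(values))

data = [1, 2, 3, 4]
print(round(variance(data), 2), round(stdev(data), 2))   # 1.67 1.29
```

Each function has a single job, so a correction made in sum or mean is picked up automatically by every function that depends on it.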
Test your variance and standard deviation functions by adding code to your test function. You should get 1.67 for the variance and 1.29 for the standard deviation (rounding to two decimals).
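For reference, the expected values can be checked by hand. The arithmetic below assumes the test list [1, 2, 3, 4] used earlier in the test function and the N-1 divisor.

```latex
\bar{x} = \frac{1 + 2 + 3 + 4}{4} = 2.5
\qquad
\mathrm{variance} = \frac{(1-2.5)^2 + (2-2.5)^2 + (3-2.5)^2 + (4-2.5)^2}{4 - 1}
                  = \frac{2.25 + 0.25 + 0.25 + 2.25}{3}
                  = \frac{5}{3} \approx 1.67
\qquad
\mathrm{stdev} = \sqrt{5/3} \approx 1.29
```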
The last step in the lab is to import your stats library into another file and use the functions. Reopen your storedata.py file. At the top, after the import sys, put the following line of code:
import stats
To call a function in your stats library, all you have to do is put stats. in front of the name of the function you want to use. Calculate the mean of the values in the list you read from stdin. To do this, after the line in the function that prints the list, assign to a variable the result of calling mean() from your stats library with the list of values as the argument. Then print out the value of that variable.
If you run your storedata.py program using the same command as above in Section 2, you should get 21.071875 as the answer.
When you are done with the lab exercises, you may start on the rest of the project.
© 2019 Eric Aaron (with contributions from Colby CS colleagues).