Due: , 11:59 pm
For project 4, we're going to move to the world of ecological modeling. In particular, we're going to model the change in the population of Galapagos Penguins over time, given certain ecological pressures. Our goal is to model the risk of extinction over some number of years. The model we will use is fairly simple, but you can choose to add complexity to the model once the basic system is working.
One new aspect for this project is the use of random values in the simulation. Because the world doesn't exhibit strong regularity, we're going to model this uncertainty by using a random number generator. For example, El Niño, an important weather factor in Galapagos Penguin survival, does not occur on a perfectly regular schedule, but it does tend to occur at least once every 5-7 years. We can model this by assigning a 1.0/5.0 or 1.0/7.0 probability of an El Niño to any given year. Then, each year of the simulation, we effectively roll a 5-sided or a 7-sided die and declare an El Niño year if it comes up with a 1. If the year is an El Niño year, then the penguin population drops significantly. If it is not an El Niño year, the population sees a small amount of growth.
Because of the random variables in the model, each time we model 100 years the computer will generate a different result. In one simulation, the population may go extinct in 20 years, in another simulation it may not go extinct at all. In order to achieve a reasonable estimate of the true probability of extinction over 100 years, we have to run the simulation many, many times and aggregate the results. For example, the probability of extinction in 100 years would be calculated as the number of simulations where the population goes extinct, divided by the total number of simulations. Researchers using simulation to evaluate the risk of extinction in actual populations might run the simulation a million times or more in order to get a stable estimate of extinction risk. Stability, in this case, means that if you run the simulation 1,000,000 times, then the difference between that average and the average of a different run of 1,000,000 simulations will be small.
To design the simulation, we're going to use hierarchical, modular design to keep each function simple to write and debug. Think about the project as having layers. The top layer is the controlling function and contains three parts. The first part sets up the parameters for the simulation. The second part runs the complete simulation many times and stores the results. The third part analyzes the results and generates the desired outputs.
The next layer handles a single simulated year of what will eventually be a multi-year simulation. This layer takes in the current population and the ecological parameters, and it determines how many individuals are in the next year's population. It then returns the new population.
The final layer handles running a multi-year simulation, one designed to simulate numYears years of population dynamics. This full simulation function has two main parts. First, it sets up the initial population, then it simulates up to numYears years using a loop; it repeatedly uses the layer that simulates a single year, once for every year to be simulated. If the population goes extinct before numYears years have completed, it stops simulating and returns the year in which the population went extinct. Otherwise, it simulates all numYears years, returning numYears as its return value.
By designing the simulation in this manner, each function is well-defined and simple to write and debug. The tasks below provide more details on each level and the overall task.
If you haven't already set yourself up for working on the project, then do so now:
Create a new file penguin.py. For this week, all of the simulation functions will be in this file, though you can use the functions in your stats.py file to analyze the results, if you choose to pursue one of those suggested extensions.
Create a function initPopulation() that takes in two parameters: the initial population size (call it size), and the probability of an individual being female (give that parameter a good, descriptive variable name in your code, such as probFemale). The function should return a list of the specified size. Note: Each entry in the list will be either an f or an m.
The function should first initialize a variable (e.g., population) to the empty list. Then it should use a loop to append size items to this list. Each time through the loop it should generate a random number using random.random(), which generates values between 0 and 1. If the value is less than the probability of an individual being female, append an f to the population list. Otherwise, append an m to the population list. Finally, return the population list.
You can use the following test function to see if your initPopulation() function is doing the right thing. It should print out a list of 10 individuals with an appropriate mix of f and m elements. Run it a few times to make sure you're getting randomized lists.
# test function for initPopulation def test(): popsize = 10 probFemale = 0.5 pop = initPopulation(popsize, probFemale) print(pop) if __name__ == "__main__": test()
Create a function simulateYear() with six parameters:
The first step in the function is to determine if it is an El Niño year. Set a variable (e.g., elNinoYear) to False. Then compare the result of random.random() to the El Niño probability. If the random.random() result is less, then set elNinoYear to True.
The second part of the function is a loop over the existing population list. Before starting the loop, create a newpop list and set it to the empty list. Inside the loop, implement the following algorithm.
# for each penguin in the original population list # if the length of the new population list is greater than or equal to maxCapacity # break, since we don't want to make any more penguins # if it is an El Nino year # if random.random() is less than the El Nino growth/reduction factor # append the penguin to the new population list # else # append the penguin to the new population list # if random.random() is less than the standard growth factor - 1.0 # if random.random() is less than the probability of a female # append an 'f' to the new population list # else # append an 'm' to the new population list
Finally, return the new population list.
To test your simulateYear() function, add the following code to your test function.
newpop = simulateYear(pop, 0.5, 1.0, 1.188, 0.4, 2000) print("El Nino year") print(newpop) newpop = simulateYear(pop, 0.5, 0.0, 1.188, 0.4, 2000) print("Standard year") print(newpop)
You should see the population reduce to 3-6 individuals in the El Niño year and grow to 11-14 individuals in the standard year. Run it a few times to see the variation.
The next step is to write the runSimulation() function. This function takes in 8 parameters, described below in this section.
def runSimulation( numYears, initPopSize, probFemale, elNinoProb, stdRho, elNinoRho, maxCapacity, minViable ):
The function should first assign to a variable (e.g., population) the result of calling initPopulation with the appropriate arguments. Then it should set another variable (e.g., endDate) to numYears (the number of years to run the simulation). The variable endDate will be what the function returns.
The main part of the function should be a loop that runs numYears times. Each time through the loop it should call simulateYear() with the appropriate arguments, assigning the result to a new list variable (e.g., newPopulation). Then it should test if there are enough individuals in the new population list to be a viable population. It should also test if there are no males or no females in the new population list. If any of these tests fails, then set endDate to the loop variable value and break out of the loop. The final step in the loop is to assign to the population variable the newPopulation variable, replacing the old population with the new one.
If you want to graph or view the population numbers over time, print out the length of the population list at the end of each loop. Alternatively, you could save the numbers to a file.
Add some test code to your test function that calls runSimulation() a few times with some appropriate arguments. Make sure the initPopSize argument is larger than the minViable argument.
Default values for the simulation are as follows:
numYears | 201 | |
---|---|---|
Initial Population Size | 500 | |
Probability of Females | 0.5 | |
Probability of an El Niño | 1.0/7.0 | |
Growth Factor in a standard year | 1.188 | |
Growth Factor in an El Niño year | 0.41 | |
Maximum Carrying Capacity | 2000 | |
Minimum Viable Population | 10 |
Use these values unless any of them are superseded by user input. (For example, in the next section, you'll be instructed to write code that will take command-line input containing the number of years between El Niño events. In that case, use the input value to come up with the probability of an El Niño, and disregard the default.)
The top level function is your main function. Give it one argument, argv, which (as demonstrated in lab) will be the list of strings from the command line.
Start your main function by testing if there are three arguments on the command line. The first argument will be the name of the program, the second should be the number of simulations to run, and the third should be the typical number of years between an El Niño event. If there are less than three arguments (len(argv) < 3), print out a usage statement and exit.
After the test for arguments, cast the second argument (argv[1]) to an integer and assign it to a variable that specifies the number of simulations to run (e.g., numSim). Cast the third argument (argv[2]) to an integer and assign it to a variable that specifies the typical number of years between an El Niño event.
In the next section of your program, create variables for each of the simulation parameters in the table above not already assigned and give them the default values. You should also create a variable to hold the results of the simulations and initialize it to the empty list.
Loop over the number of simulations. Inside the loop, append to your result list the result of calling runSimulation() with the appropriate arguments.
Finally, count how many of the results in your result list are less than numYears (the number of years to simulate). Divide that count by the total number of simulations and print out the result. This is the overall probability that the penguins will go extinct within the next numYears years.
Run your program using 100 simulations and see what you get. Does it change a lot? Run it with 1000 simulations (which might take a little while) and see if the results are more stable.
The next task is to write a function that takes the list of results, which is a set of numbers indicating the last year in which the population was viable, and convert it to a cumulative extinction probability distribution (CEPD). The CEPD will be a list that is as long as the number of years in the simulation. Each entry Y in the CEPD is the number of simulations where the population has gone extinct by year Y divided by the total number of simulations.
The computeCEPD() function should take two arguments: the list of results from runSimulation() (a list of integers), and the number of years the simulation ran (numYears).
The computeCEPD() function has three parts.
Add a call to the computeCEPD() function to the end of your top level function. Then have your top level function print out every 10th entry in the CEPD list (remember how the range function works). Run your penguin simulation 1000 times using the standard parameters. Repeat the process three times and show the results in your writeup. How much variation is there in the results?
The final step is to use your simulation to compare the extinction distribution curves--the data from the previous section, showing number of simulations going to extinction for each year simulated--for a 3-year El Niño cycle, a 5-year El Niño cycle, and a 7-year cycle. What do your results indicate?
Each assignment will have a set of suggested extensions. The required tasks and writeup constitute about 86% of the assignment. To earn a higher grade, you need to undertake one or more extensions. The difficulty, quality, and creativity of the extension or extensions will determine your final grade for the assignment. One complex extension, done well, or 2-3 simple extensions are typical.
These are only examples to help you start thinking of the unlimited possible ways you could extend the project. You are strongly encouraged to design your own extensions to suit your interests and show off your computational thinking skills.
Whichever extensions you choose, be sure to discuss your motivation, design process, implementation, and results in the writeup. A screenshot of your results is usually a great idea.
Add other model parameters to the command line. The complex version of this extension is to have flags for your program so that the user can use a flag, like -E, to specify a given parameter. For example, running
python3 penguin.py -E 5
would modify the El Niño frequency but leave all other parameters at their default values.
Turn in your code by putting it into your private hand-in directory on the Courses server. All files should be organized in a folder titled Project4 and you should include only those files necessary to run the program. We will grade all files turned in, so please do not turn in old, non-working, versions of files.
Make a new wiki page for your
assignment. Put the label cs152s19project4
in the label field on the
bottom of the page, and give the page a meaningful title (e.g., Eric's
Project 4).
In general, your intended audience for your write-up is your peers not in the class. Your goal should be to be able to use it to explain to friends what you accomplished in this project and to give them a sense of how you did it. Follow the outline below.
Thanks to Cathy Collins for the project idea and documentation. The original project concept and idea came from "Problem-Solving in Conservation Biology and Wildlife Management, 2nd Edition", by James P. Gibbs, Malcolm L. Hunter, Jr., Eleanor J. Sterling.
© 2019 Eric Aaron (with contributions from Colby CS colleagues).