CS 336: Project #9

We Can't Get Enough of Fish Schooling on the GPU

Log onto one of the computers hosting a GPU (gpu0, gpu1, gpu2, or gpu3), e.g.

ssh -X gpu0.cs.colby.edu

You can use XEmacs to edit files here. Notice that you need to use the -X flag for X-forwarding instead of -Y.

In this project, you will be finishing up your CUDA C version of the fish-schooling simulations. The goal of the project is to run lots of simulations, time them, and examine the output (i.e. determine how r affects elongation and polarization across multiple values of r, and how that changes with fish-school size).

  1. In kernel.cu, write computeElongation and computePolarization so that they work on the device (I say this under the assumption that you aren't using disp_sim in this directory).
  2. In kernel.cu, write a version of fillRandomOcean (call it something like fillRandomOceanOnGPU) that uses CUDA's random number generator, curand. To use the library, you will need to include its device header (curand_kernel.h for the device API). To ensure that different threads and different blocks have different pseudorandom numbers, each block needs a seed, so fillRandomOceanOnGPU will need to take as input an unsigned long seed. It will be called by a kernel function and must run on the device, so it needs __device__ before its definition. We will talk about this in class, but the documentation for the CURAND library is there for your perusal. I set it up so that all fish in all oceans use the same seed (which I set by calling time(NULL) on the host). Each fish uses its id for the sequence number (and by id, I mean global id, i.e. blockDim.x*blockIdx.x + threadIdx.x).
  3. In kernel.cu, write kernel runRandomSimulationWithStatistics. It should take as input an unsigned long seed, along with r, numFrames, and pointers to elongation and polarization arrays. The code should declare the same shared memory arrays as the simulation kernel from last week, but instead of copying an initial ocean from device global memory, it will create it by calling fillRandomOceanOnGPU, putting the data directly into shared memory. It can then proceed like last week's version, but it must also compute the elongation and the polarization, and place those results in appropriate slots in the elongation and polarization arrays.
  4. Make a file named collect_stats.cu, and put runSimulations in it. This file plays a role similar to the role collect_stats.c played in the pthreads project - it encapsulates all the code that gets the simulations running. In this case, it means runSimulations calls the runRandomSimulationWithStatistics kernel for NUM_OCEANS simulations with NUM_FISH in each. Add a definition for a NUM_OCEANS macro to fish_sizes.h (both collect_stats and sim_stats_nr will want it).
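  5. Write sim_stats_nr.cu. You can basically copy sim_stats_nr.c from the pthreads version. To use get_time_sec, take it out of utils.h, put it into a file named my_timing.cu, and include a file named my_timing.cuh. Also, remember to call cudaThreadSynchronize() after each kernel launch and before you read the end time: kernel launches return immediately, so without the synchronization you would be timing only the launch. (Newer CUDA releases deprecate cudaThreadSynchronize() in favor of cudaDeviceSynchronize().)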
  6. Run the code for oceans of 100 fish, and compare the timing results to those of the pthreads version. Which is faster? How many oceans can you simulate without seeing a significant performance hit?
  7. Increase the number of fish in the ocean and describe the effects of r on polarization and elongation -- does r have different effects, depending on the ocean size?
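
For step 2, the seed and sequence-number arrangement described above can be sketched as follows. This is only a minimal sketch, not the course's fillRandomOceanOnGPU: the names fillRandomOceanSketch, fillDemo, runFillDemo, SKETCH_NUM_FISH and SKETCH_NUM_OCEANS are hypothetical, and it assumes a fish is just an (x, y) position drawn uniformly from (0, 1]. Your real ocean representation will differ.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <curand_kernel.h>

#define SKETCH_NUM_FISH   100  // hypothetical stand-in for NUM_FISH
#define SKETCH_NUM_OCEANS 4    // hypothetical stand-in for NUM_OCEANS

// Each thread sets up its own generator: same seed everywhere,
// global id as the sequence number, so every fish gets its own stream.
__device__ void fillRandomOceanSketch(unsigned long seed, float *x, float *y)
{
    int gid = blockDim.x * blockIdx.x + threadIdx.x;
    curandState state;
    curand_init(seed, gid, 0, &state);
    x[threadIdx.x] = curand_uniform(&state);  // uniform in (0, 1]
    y[threadIdx.x] = curand_uniform(&state);
}

// One ocean per block, one fish per thread; the ocean lives in shared memory.
__global__ void fillDemo(unsigned long seed, float *outX, float *outY)
{
    __shared__ float x[SKETCH_NUM_FISH];
    __shared__ float y[SKETCH_NUM_FISH];
    fillRandomOceanSketch(seed, x, y);
    __syncthreads();
    // Copy out to global memory only so the host can inspect the result.
    int gid = blockDim.x * blockIdx.x + threadIdx.x;
    outX[gid] = x[threadIdx.x];
    outY[gid] = y[threadIdx.x];
}

// Host helper: fills hx and hy (each SKETCH_NUM_OCEANS * SKETCH_NUM_FISH floats).
// Returns 0 on success, nonzero if any CUDA call failed.
int runFillDemo(unsigned long seed, float *hx, float *hy)
{
    size_t bytes = SKETCH_NUM_OCEANS * SKETCH_NUM_FISH * sizeof(float);
    float *dx = NULL, *dy = NULL;
    if (cudaMalloc(&dx, bytes) != cudaSuccess) return 1;
    if (cudaMalloc(&dy, bytes) != cudaSuccess) { cudaFree(dx); return 1; }
    fillDemo<<<SKETCH_NUM_OCEANS, SKETCH_NUM_FISH>>>(seed, dx, dy);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaMemcpy(hx, dx, bytes, cudaMemcpyDeviceToHost);
    if (err == cudaSuccess) err = cudaMemcpy(hy, dy, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dx);
    cudaFree(dy);
    return err == cudaSuccess ? 0 : 1;
}
```

On the host you would pass something like (unsigned long) time(NULL) as the seed. Running twice with the same seed should reproduce the same oceans, which is handy when you are checking correctness.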

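For step 1, here is one possible shape for a device-side statistic. It is a sketch under loud assumptions: it takes polarization to mean the length of the average unit heading vector, which may not match the definition in your existing computePolarization, and it stores headings in two float arrays (dx, dy) rather than in whatever fish structure you are using. The names computePolarizationSketch, polarizationDemo, runPolarizationDemo and N_FISH_SKETCH are hypothetical.

```cuda
#include <cmath>
#include <cuda_runtime.h>

#define N_FISH_SKETCH 128  // hypothetical stand-in for NUM_FISH; a power of two

// Assumed definition: polarization = | (1/N) * sum of unit heading vectors |.
// Every thread of the block must call this. sumX and sumY are scratch arrays
// in shared memory with one slot per thread.
__device__ float computePolarizationSketch(const float *dx, const float *dy,
                                           float *sumX, float *sumY)
{
    int tid = threadIdx.x;
    sumX[tid] = dx[tid];
    sumY[tid] = dy[tid];
    __syncthreads();
    // Tree reduction; assumes blockDim.x is a power of two.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) {
            sumX[tid] += sumX[tid + stride];
            sumY[tid] += sumY[tid + stride];
        }
        __syncthreads();
    }
    float mx = sumX[0] / blockDim.x;
    float my = sumY[0] / blockDim.x;
    __syncthreads();  // everyone has read slot 0 before the scratch is reused
    return sqrtf(mx * mx + my * my);
}

// Demo kernel: fish i heads at angle i * spread; one ocean per block.
__global__ void polarizationDemo(float spread, float *polarization)
{
    __shared__ float dx[N_FISH_SKETCH], dy[N_FISH_SKETCH];
    __shared__ float sx[N_FISH_SKETCH], sy[N_FISH_SKETCH];
    dx[threadIdx.x] = cosf(threadIdx.x * spread);
    dy[threadIdx.x] = sinf(threadIdx.x * spread);
    float p = computePolarizationSketch(dx, dy, sx, sy);
    if (threadIdx.x == 0) polarization[blockIdx.x] = p;  // one slot per ocean
}

// Host helper: runs one ocean and returns its polarization (-1 on failure).
float runPolarizationDemo(float spread)
{
    float *d_p = NULL, h_p = -1.0f;
    if (cudaMalloc(&d_p, sizeof(float)) != cudaSuccess) return -1.0f;
    polarizationDemo<<<1, N_FISH_SKETCH>>>(spread, d_p);
    if (cudaMemcpy(&h_p, d_p, sizeof(float), cudaMemcpyDeviceToHost) != cudaSuccess)
        h_p = -1.0f;
    cudaFree(d_p);
    return h_p;
}
```

A spread of 0 puts every fish on the same heading, so the result should be very close to 1; spreading the headings evenly around the circle should give a value near 0. Sanity checks like these are the kind of thing worth describing in your write-up.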

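For step 5, the timing pattern looks roughly like this. It is a sketch, not the course's code: wallTimeSec is a hypothetical stand-in for get_time_sec (I am assuming a gettimeofday-based wall clock), busyKernel is a placeholder for the real simulation kernel, and the sketch uses cudaDeviceSynchronize(), the current name for the deprecated cudaThreadSynchronize().

```cuda
#include <cstdio>
#include <sys/time.h>
#include <cuda_runtime.h>

// Hypothetical stand-in for get_time_sec: wall-clock time in seconds.
static double wallTimeSec(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec * 1e-6;
}

// Placeholder for the simulation kernel: just burns some arithmetic.
__global__ void busyKernel(float *out, int iters)
{
    int gid = blockDim.x * blockIdx.x + threadIdx.x;
    float v = (float) gid;
    for (int i = 0; i < iters; i++)
        v = v * 0.999f + 1.0f;
    out[gid] = v;
}

// Returns the elapsed wall-clock seconds for one kernel run, or -1 on failure.
double timeKernelSec(int numBlocks, int threadsPerBlock, int iters)
{
    float *d_out = NULL;
    size_t bytes = (size_t) numBlocks * threadsPerBlock * sizeof(float);
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) return -1.0;

    cudaDeviceSynchronize();          // make sure earlier GPU work is finished
    double t0 = wallTimeSec();
    busyKernel<<<numBlocks, threadsPerBlock>>>(d_out, iters);
    cudaDeviceSynchronize();          // the launch is asynchronous: wait for it
    double t1 = wallTimeSec();

    cudaError_t err = cudaGetLastError();
    cudaFree(d_out);
    return err == cudaSuccess ? t1 - t0 : -1.0;
}
```

If you drop the second synchronization, the measured time usually collapses to a few microseconds no matter how much work the kernel does, which is a good sign that you are timing the launch rather than the simulation.
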
Perform a particularly in-depth analysis of the code or the results. I would really like to know if we can tell how many blocks are actually run at the same time. If you can figure that out, I will be particularly pleased.
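
One possible starting point for the blocks question is to ask the runtime for a theoretical upper bound. The sketch below assumes a CUDA toolkit recent enough to provide cudaOccupancyMaxActiveBlocksPerMultiprocessor (the toolkit on the gpu hosts may be older), and it uses a hypothetical placeholderSimKernel. The answer depends on the register and shared-memory use of the kernel you pass in, so you would substitute your own kernel. It also reports how many blocks can be resident at once, not a measurement of what actually happened during a run.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define SKETCH_NUM_FISH 100  // hypothetical stand-in for NUM_FISH

// Placeholder kernel with a shared-memory footprint loosely like one ocean.
__global__ void placeholderSimKernel(float *out)
{
    __shared__ float x[SKETCH_NUM_FISH];
    __shared__ float y[SKETCH_NUM_FISH];
    x[threadIdx.x] = (float) threadIdx.x;
    y[threadIdx.x] = (float) blockIdx.x;
    __syncthreads();
    int next = (threadIdx.x + 1) % blockDim.x;
    out[blockDim.x * blockIdx.x + threadIdx.x] = x[next] + y[next];
}

// Returns the maximum number of blocks of placeholderSimKernel that can be
// resident on the whole device at once, or -1 if a CUDA call fails.
int maxResidentBlocks(void)
{
    int device = 0;
    cudaDeviceProp prop;
    if (cudaGetDevice(&device) != cudaSuccess) return -1;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return -1;

    int blocksPerSM = 0;
    if (cudaOccupancyMaxActiveBlocksPerMultiprocessor(
            &blocksPerSM, placeholderSimKernel, SKETCH_NUM_FISH, 0) != cudaSuccess)
        return -1;

    printf("%s: %d multiprocessors, up to %d blocks each\n",
           prop.name, prop.multiProcessorCount, blocksPerSM);
    return blocksPerSM * prop.multiProcessorCount;
}
```

You could compare this bound against your timing results: if run time stays roughly flat until the number of oceans passes the bound and then steps up, that is at least consistent with the blocks running in waves of that size.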

Writeup and Handin

To hand in your project, you will gather all of the necessary files into a proj09 directory under your turnin directory on the gpu host.

  1. Create a file named README.txt for your project write-up. Include a description of the process you used to determine that your code produces correct results. Also include the analysis outlined earlier. The more thorough the analysis, the higher your grade will be.
  2. You should hand in all code necessary to run your solutions. Place all necessary source files (.cu, .cuh, .h, .c) and your Makefile in the proj09 directory. Stephanie will probably want to compile and run the code. It should be possible to do so without looking for any more files.

Email Stephanie to let her know your project is ready for grading.