CS 336: Project #7

Project 7: CUDA on the GPU

Log on to one of the computers hosting a GPU (rocketcalc-n0, -n1, -n2, or -n3), e.g.

ssh -X rocketcalc.cs.colby.edu

You can use Emacs to edit files here. Notice that you need to use the -X flag for X-forwarding instead of -Y.

Read about using the GPU at Colby in the Guide to GPGPU Programming at Colby.

In this project, you will write a CUDA C program that adds two vectors. The goal is to become familiar with the basic CUDA C components and with memory handling. You should also become aware of particularly inefficient strategies.

  1. Write a single CUDA C program that adds two float vectors and places the result into a third vector. Define two macros, NUM_THREADS_PER_BLOCK and NUM_BLOCKS, and compute N (the problem size) directly from them, so that each thread handles exactly one entry. The code should create the two vectors and fill them on the host, copy them to the device, call the kernel, copy the result back to the host, print part of the result (enough to convince you it worked), and finally free the memory allocated on both the host and the device.
  2. Time sections of your host code. Which parts take the longest? Is that independent of problem size? Is any of this surprising?
  3. If you need N threads, is it better to use lots of blocks or as few blocks as possible?
  4. What happens when you use too many threads or blocks?
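
The host-side steps in item 1, together with the timing asked for in item 2, might be sketched roughly as follows. This is only one possible layout, not a reference solution. The macro values (256 and 1024) are assumptions, and the names vec_add, check, elapsed_ms, and add_on_device are hypothetical helpers introduced for illustration. Timing here uses CUDA events; a host timer combined with cudaDeviceSynchronize() would be another reasonable choice.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

#define NUM_THREADS_PER_BLOCK 256   /* assumed value */
#define NUM_BLOCKS 1024             /* assumed value */
#define N (NUM_BLOCKS * NUM_THREADS_PER_BLOCK)

/* Hypothetical helper: abort with a message if a CUDA call failed. */
static void check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

/* One thread per entry. N equals the total thread count, so no bounds check. */
__global__ void vec_add(const float *a, const float *b, float *c)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    c[i] = a[i] + b[i];
}

/* Milliseconds between two events recorded on the default stream. */
static float elapsed_ms(cudaEvent_t start, cudaEvent_t stop)
{
    float ms = 0.0f;
    check(cudaEventSynchronize(stop), "cudaEventSynchronize");
    check(cudaEventElapsedTime(&ms, start, stop), "cudaEventElapsedTime");
    return ms;
}

/* Copy the inputs to the device, run the kernel, copy the result back. */
static void add_on_device(const float *h_a, const float *h_b, float *h_c)
{
    size_t bytes = (size_t)N * sizeof(float);
    float *d_a, *d_b, *d_c;
    cudaEvent_t t[4];

    check(cudaMalloc((void **)&d_a, bytes), "cudaMalloc a");
    check(cudaMalloc((void **)&d_b, bytes), "cudaMalloc b");
    check(cudaMalloc((void **)&d_c, bytes), "cudaMalloc c");
    for (int k = 0; k < 4; k++)
        check(cudaEventCreate(&t[k]), "cudaEventCreate");

    check(cudaEventRecord(t[0], 0), "cudaEventRecord");
    check(cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice), "copy a");
    check(cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice), "copy b");
    check(cudaEventRecord(t[1], 0), "cudaEventRecord");
    vec_add<<<NUM_BLOCKS, NUM_THREADS_PER_BLOCK>>>(d_a, d_b, d_c);
    check(cudaGetLastError(), "kernel launch");  /* e.g. invalid configuration */
    check(cudaEventRecord(t[2], 0), "cudaEventRecord");
    check(cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost), "copy c");
    check(cudaEventRecord(t[3], 0), "cudaEventRecord");

    float to_device = elapsed_ms(t[0], t[1]);
    float kernel = elapsed_ms(t[1], t[2]);
    float to_host = elapsed_ms(t[2], t[3]);
    printf("host->device %.3f ms, kernel %.3f ms, device->host %.3f ms\n",
           to_device, kernel, to_host);

    for (int k = 0; k < 4; k++)
        check(cudaEventDestroy(t[k]), "cudaEventDestroy");
    check(cudaFree(d_a), "cudaFree a");
    check(cudaFree(d_b), "cudaFree b");
    check(cudaFree(d_c), "cudaFree c");
}

int main(void)
{
    float *h_a = (float *)malloc(N * sizeof(float));
    float *h_b = (float *)malloc(N * sizeof(float));
    float *h_c = (float *)malloc(N * sizeof(float));
    if (!h_a || !h_b || !h_c) {
        fprintf(stderr, "host malloc failed\n");
        return EXIT_FAILURE;
    }
    for (int i = 0; i < N; i++) {   /* fill the inputs on the host */
        h_a[i] = (float)i;
        h_b[i] = 2.0f * (float)i;
    }

    add_on_device(h_a, h_b, h_c);

    for (int i = 0; i < 5; i++)     /* print part of the result */
        printf("%g + %g = %g\n", h_a[i], h_b[i], h_c[i]);

    free(h_a);
    free(h_b);
    free(h_c);
    return 0;
}
```

Checking cudaGetLastError() right after the launch is also a convenient way to explore item 4: if NUM_THREADS_PER_BLOCK or NUM_BLOCKS exceeds what the device supports, the launch fails and the error string says so, rather than the program silently printing stale results. Changing the two macros and rerunning gives the data needed for items 2 and 3.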

Extensions

What if we want to add vectors so large that we can't devote one thread to each entry? Then we need to assign multiple entries to each thread. Implement a version that adds vectors where numThreads = NUM_BLOCKS*NUM_THREADS_PER_BLOCK is less than N. Is it better to assign a contiguous chunk of the array to each thread, or to assign every numThreads'th element to a thread?
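
The two assignment strategies in this extension could be sketched as the pair of kernels below. This is a minimal illustration under stated assumptions, not a prescribed design: the problem size (1 << 20), the macro values, and the names add_chunked, add_strided, and run_and_check are all hypothetical. Both kernels take n as a parameter because the thread count no longer determines the problem size.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

#define NUM_THREADS_PER_BLOCK 128   /* assumed value */
#define NUM_BLOCKS 64               /* assumed value */
#define N (1 << 20)                 /* assumed size, larger than the thread count */

/* Strategy 1: thread t handles one contiguous chunk of the array. */
__global__ void add_chunked(const float *a, const float *b, float *c, int n)
{
    int num_threads = gridDim.x * blockDim.x;
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    int chunk = (n + num_threads - 1) / num_threads;   /* ceil(n / num_threads) */
    int begin = t * chunk;
    int end = (begin + chunk < n) ? begin + chunk : n;
    for (int i = begin; i < end; i++)
        c[i] = a[i] + b[i];
}

/* Strategy 2: thread t handles entries t, t + num_threads, t + 2*num_threads, ... */
__global__ void add_strided(const float *a, const float *b, float *c, int n)
{
    int num_threads = gridDim.x * blockDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += num_threads)
        c[i] = a[i] + b[i];
}

/* Run one strategy; return the number of wrong entries, or -1 on a CUDA error. */
static int run_and_check(int use_strided)
{
    size_t bytes = (size_t)N * sizeof(float);
    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    float *d_a = NULL, *d_b = NULL, *d_c = NULL;
    int errors = 0;

    if (!h_a || !h_b || !h_c)
        return -1;
    for (int i = 0; i < N; i++) {
        h_a[i] = (float)i;
        h_b[i] = 2.0f * (float)i;
    }

    if (cudaMalloc((void **)&d_a, bytes) != cudaSuccess ||
        cudaMalloc((void **)&d_b, bytes) != cudaSuccess ||
        cudaMalloc((void **)&d_c, bytes) != cudaSuccess)
        return -1;
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);
    cudaMemset(d_c, 0, bytes);

    if (use_strided)
        add_strided<<<NUM_BLOCKS, NUM_THREADS_PER_BLOCK>>>(d_a, d_b, d_c, N);
    else
        add_chunked<<<NUM_BLOCKS, NUM_THREADS_PER_BLOCK>>>(d_a, d_b, d_c, N);

    /* This copy waits for the kernel and reports any earlier failure. */
    if (cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost) != cudaSuccess)
        errors = -1;
    else
        for (int i = 0; i < N; i++)
            if (h_c[i] != h_a[i] + h_b[i])
                errors++;

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return errors;
}

int main(void)
{
    printf("chunked: %d wrong entries\n", run_and_check(0));
    printf("strided: %d wrong entries\n", run_and_check(1));
    return 0;
}
```

Both kernels compute the same result, so the comparison the extension asks for comes down to timing each one across a range of N and launch configurations. When interpreting the timings, it may help to think about which memory addresses neighboring threads touch at the same moment under each strategy.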

Writeup and Handin

To hand in your project, you will gather all of the necessary files into a proj07_ directory.

  1. Create a file named README.txt for your project write-up. Include the analysis outlined above: your answers to questions 2 through 4 and, if you attempted it, the extension. The more thorough the analysis, the higher your grade will be.
  2. Hand in all code necessary to run your solutions. Place all necessary source files (.h, .c, .cu) and your Makefile in the directory. Stephanie will probably want to compile and run the code, and it should be possible to do so without looking for any more files.

Tar up the directory and email the tarball to Stephanie.