CS 336: Project #11

### Bucket sort with MPI

In this project, you will write code to sort an array of integers across multiple, distributed processors using the bucket sort algorithm and MPI. The goal of the project is to become acquainted with some of the challenges of sorting data on many processors and to increase your confidence writing deadlock-free message-passing code.

1. We will be sorting integers in this project. To do so, we will need a sequential sorting function. Fortunately, the C standard library provides one, called qsort. qsort requires a comparison function to determine the order in which elements should appear. Below is code that uses qsort to sort ints in ascending order:
```
#include <stdlib.h>   // for qsort

// Comparison function used by qsort
int compare_ints(const void *arg1, const void *arg2)
{
    int a1 = *(const int *)arg1;
    int a2 = *(const int *)arg2;
    if (a1 < a2) return -1;
    else if (a1 == a2) return 0;
    else return 1;
}

// Sort the array in place
void qsort_ints(int *array, int array_len)
{
    qsort(array, (size_t)array_len, sizeof(int), compare_ints);
}
```
2. Because the bucket sort needs to know the range of the data values, and we want to keep this as simple as possible, let's create the original array in a way that makes it easy for us. Define a max value (e.g. a macro named MAX_VALUE). When you are creating the data, use
`rand() % MAX_VALUE`
for each entry. Then determine the minimum and maximum value for each bucket under the assumption that the values lie in the range 0 to MAX_VALUE-1. Note that this means the data are drawn from a uniform distribution.
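One way to map values to buckets is with equal-width ranges. The sketch below assumes equal-width buckets over [0, MAX_VALUE-1]; the names `bucket_for` and `num_buckets` are illustrative, not required by the assignment:
```
#define MAX_VALUE 1000   /* example value; pick your own */

// Map a value in [0, MAX_VALUE-1] to a bucket index in [0, num_buckets-1].
// Each bucket covers a contiguous range of roughly MAX_VALUE/num_buckets values.
int bucket_for(int value, int num_buckets)
{
    int width = (MAX_VALUE + num_buckets - 1) / num_buckets;  // ceiling division
    return value / width;
}
```
With MAX_VALUE of 1000 and 4 buckets, each bucket covers 250 values, so 0 lands in bucket 0 and 999 in bucket 3.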
3. Write bucket_sort.c using MPI. Create the array in the root process (as described above), determine which data belong on which processor, send each piece to the appropriate processor, sort the data on each processor, then gather the results in the root process. The root process should verify that the result is correct. To compile your code, use mpicc:
```
/usr/lib64/openmpi/1.4-gcc/bin/mpicc bucket_sort.c -o bucket_sort
```
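Before the root can send data, it needs to know how many values belong to each process. A sketch of counting per-bucket sizes, assuming the equal-width bucket scheme from step 2 (`count_buckets` is an illustrative name; the resulting counts could serve as send counts for a scatter):
```
#include <string.h>

#define MAX_VALUE 1000   /* example value; pick your own */

// Count how many entries of data[0..n-1] fall into each of num_buckets
// equal-width buckets over [0, MAX_VALUE-1]. counts[] must have room
// for num_buckets ints.
void count_buckets(const int *data, int n, int num_buckets, int *counts)
{
    int width = (MAX_VALUE + num_buckets - 1) / num_buckets;
    memset(counts, 0, num_buckets * sizeof(int));
    for (int i = 0; i < n; i++)
        counts[data[i] / width]++;
}
```
A second pass (or a running offset per bucket) can then copy each value into a per-bucket send buffer.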
4. To test your code, you should use 32 processes spread across nodes n3, n4, n5, and n6. E.g.
```
/usr/lib64/openmpi/1.4-gcc/bin/mpirun -mca btl ^openib -np 32 -host n3,n4,n5,n6 bucket_sort
```
5. Time the different sections of the code using MPI_Wtime. Which parts take the longest? (Exclude the correctness check from your timings.) Be sure to vary your problem size so that you can detect any trends.
6. What happens if you use different data-creation strategies? For example, use the square of each number pulled from a uniform distribution, rescaled back into the range 0 to MAX_VALUE-1. Squaring concentrates values in the lower buckets, so this should unbalance the load and slow down the sort. Does it?
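The skewed distribution in step 6 can be produced by squaring a uniform draw and rescaling it back into range. A sketch under the MAX_VALUE scheme from step 2 (`make_skewed_value` is an illustrative name):
```
#include <stdlib.h>

#define MAX_VALUE 1000   /* example value; pick your own */

// Draw x uniformly from [0, MAX_VALUE), then square and rescale back into
// [0, MAX_VALUE). Small results become much more common, so low-numbered
// buckets receive far more data than high-numbered ones.
int make_skewed_value(void)
{
    long x = rand() % MAX_VALUE;
    return (int)((x * x) / MAX_VALUE);
}
```
With this transform, a value lands in the lower half of the range whenever x < MAX_VALUE/sqrt(2), i.e. about 71% of the time, so the low buckets carry most of the sorting work.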

### Extensions

• Do a thorough analysis of certain MPI functions. For example, measure the time it takes to gather N integers, for many values of N and many values of P (the number of processes). Plot T (the time it takes to complete the gather) vs. N/P (the number of integers per process). What trends do you observe?
• Write a version of the code that creates the initial data on all processors (i.e. each process creates its own chunk), then has every process redistribute its data to the appropriate processes (i.e. everyone sends data to everyone else). Then proceed as before, accumulating the result on the root process. What is the difference in performance between the two approaches? Given the choice, under what conditions would you prefer each one?

### Writeup and Handin

To hand in your project, you will gather all of the necessary files into a proj11 directory under your turnin directory on nscc.

1. Create a file named README.txt for your project write-up. Include a description of the process you used to determine that your code produces correct results. Also include the analysis outlined earlier. The more thorough the analysis, the higher your grade will be.
2. You should hand in all code necessary to run your solutions. Place all necessary .h, .c, and Makefile files in the proj11 directory. Stephanie will probably want to compile and run the code, and it should be possible to do so without hunting for additional files.
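The correctness check you describe in the README could be as simple as the sketch below (`is_sorted` is an illustrative name; a stronger check would also compare the result against a copy of the original data sorted sequentially with qsort_ints):
```
// Return 1 if array[0..n-1] is in ascending order, 0 otherwise.
int is_sorted(const int *array, int n)
{
    for (int i = 1; i < n; i++)
        if (array[i - 1] > array[i])
            return 0;
    return 1;
}
```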