CS 231: Project #4

Title image Spring 2020

Comparing Sort Functions

The goal of this project is to give you practice writing, testing, and timing efficient code. The focus of the project is to compare various sorting algorithms.

You are free to use resources like textbooks, relevant wikipedia pages, or any other resources that provide pseudo-code for sorting algorithms. Please do not copy and paste java code from any source. Understand what you are coding up and cite the resources you used.

Your methods section this week should include a description of the sorting algorithms you implemented. Put most of your focus on the algorithm you chose yourself, possibly including pseudo-code for it.

Tasks

  1. Implement merge sort using recursion

    public static void merge_sort( long[] data ) - sorts the data in ascending order using merge sort. After executing the function, the values in data should be in ascending order.

    Write a recursive implementation of Mergesort. For this version, allocate a new array for the left half and a new array for the right half before making the recursive call, passing in the new array as the argument. Merge the two arrays by assigning their values in order back to the original array.

    Add code to your main function in your Sorter class to test your merge_sort algorithm. Declare and allocate a long array of some size (e.g. 20), fill it with random values, and print the array. Pass the array to your merge_sort and then print it out. Debug until your algorithm works properly.

    Include the output of this test in your report.

  2. Implement merge sort using recursion with efficient memory use

    public static void merge_sort_me( long[] data ) - sorts the data in ascending order using merge sort. After executing the function, the values in data should be in ascending order.

    Write a recursive implementation of Mergesort, but allocate extra memory only once. Start by creating a helper function.

    public static void merge_sort_me( long[] data, int start, int end, long[] storage )

    data is the array to be sorted, start is the index of the first element in the array to be processed, end is the index immediately after the end of the part of the array to be processed, and storage is an array of the same size as data.

    The helper function will do the actual work. The initial call to this version of merge_sort_me should pass in an allocated array for storage of length equal to data, 0 for start, and data.length for end. When making recursive calls, there is no need to copy the data, just modify the values of start and end. The storage array is used during merging to collect the values in order. The in-order values should then be copied back to their correct position in the data array.

    Add test code for merge_sort_me to your main function. Include the output of this test in your report.

  3. Implement quicksort using recursion

    public static void quick_sort( long[] data ) - sorts the data in ascending order using quicksort. After executing the function, the values in data should be in ascending order.

    Quicksort should not explicitly allocate any additional memory.

    Add test code for quick_sort to your main function. Include the output of this test in your report.

  4. Implement one other O(NlogN) sorting method of your choice

    There are many algorithms to choose from, a number of which are combinations of other sorting algorithms. The Wikipedia page has many examples.

    Wikipedia page for sorting algorithms.

    Your sort method should take in an array of type long and the array should be in ascending order when the method is complete.

    Add test code for your chosen sort method to your main function. Include the output of this test in your report.

  5. Execute a timing analysis for all of your sort methods

    Use arrays filled with random values for this analysis. The analysis should include insert_sort.

    To get an accurate estimate of complexity, you want to time the sort methods for a variety of values of N. Given that most of the methods are NlogN, you will want to use logarithmically spaced values. For example, start with an initial N of 100, and then execute some number of iterations, multiplying N by 1.5 or 2 each iteration (1.5 gives a little more resolution but takes a little longer). Longer execution times give more accurate estimates, so have your loop continue until it takes a second or two to test one of the sorting algorithms.

    To get an accurate average, time each algorithm at least five times, then use your robust average function to compute the average minus the high and low values.

    You may want to create a method to do all of this, since you will need to repeat the process many times. You could, for example, write a method that takes arguments for N, which method to use, and how many repeats to perform and returns the time taken in ms.

    You may also want to have your code print the values in csv format with a proper header row. You can always use the redirect operator to send output from the terminal to a file.

    java -Xint Sorter > timingdata.csv

    Include two graphs in your report for this task. One graph should show insert_sort and merge_sort. The second graph should show all of the methods with O(NlogN) average complexity (not insert_sort).

  6. Execute a timing analysis for in-order arrays

    Repeat the process above, but use arrays where the values are already in ascending order.

    Put all of the timing results in one graph, which should be included in your report.

  7. Execute a timing analysis for reverse-order arrays

    Repeat the process above, but use arrays where the values are all in descending order.

    Put all of the timing results in one graph, which should be included in your report.


  8. Extensions

    1. Compare the timing results with and without the -Xint flag. Explain your results.
    2. Implement another sorting method and show how it compares. O(N^2) sort methods are not interesting enough to try.
    3. Try to make one of the above sorting methods as efficient as possible by modifying the code. Use timing results to determine if any changes you make to your code actually make it faster. For example, does it make a difference if you use the built-in function System.arraycopy() in mergesort?
    4. Try a different method of computing complexity by using a variable to count how many times an operation like a comparison is executed. For example, increment a the variable whenever a comparison occurs.
    5. Uber extension: try to incorporate parallel processing in your sort method.

    Report

    Your intended audience for your report are your peers not in the class, but who have taken a CS course. Your report should explain the core CS concept used, present the results of your program, and discuss the meaning of the results: what did you discover and does it make sense?

    Your project report should contain the following elements. Please include a header for each section.

    • Abstract

      A brief summary of the project, in your own words. This should be no more than a few sentences. Give the reader context and identify the key purpose of the assignment. Each assignment will have both a core CS purpose--usually a new data structure--and an application such as a simulation or analysis.

      Writing an effective abstract is an important skill. Consider the following questions while writing it.

      • Does it describe the CS concepts of the project (e.g. writing well-organized and efficient code)?
      • Does it describe the specific project application?
      • Does it describe the data structure you used?
      • Does it describe the results of your simulation or analysis?
      • Is it concise?
      • Are all of the terms well-defined?
      • Does it read logically and in the proper order?
    • Methods

      The methods section should be a high level explanation of your solution. Give an explanation of the main CS concept or data structure for the project. Then explain how you used it as part of a simulation or analysis and why the data structure was appropriate for that task. One paragraph for each should be sufficient.

    • Results

      Present your results. Include text output, images, tables, graphs, or qualitative descriptions, as appropriate. Include a discussion as to whether the results make sense and what they mean. This week, present your timing results, explain whether they show the correct complexity, and compare the different cases/algorithms.

      Remember to include the results of your lab work as well on squaring and insert sort.

    • Extensions

      Describe any extensions you undertook, including text output, graphs, tables, or images demonstrating those extensions. If you added any modules, functions, or other design components, note their structure and the algorithms you used.

    • Reflection

      A brief (1-3 sentences) description of what you learned. Think about the answer to this question in terms of the stated purpose of the project. What are some specific things you had to learn or discover in order to complete the project?

    • References/Acknowledgements

      A list of people you worked with, including TAs and instructors. Include in that list anyone whose code you may have seen, such as those of friends who have taken the course in a previous semester.

    Handin

    Make your report for the project a wiki page in your personal space. If you have questions about making a wiki page, stop by my office or ask in lab.

    Once you have written up your assignment, give the page the label:

    cs231s20project4

    You can give any wiki page a label using the label field at the bottom of the page. The label is different from the title.

    Do not put code on your writeup page or anywhere it can be publicly accessed. To hand in code, put it in your folder on the Courses fileserver. Create a directory for each project inside the private folder inside your username folder.