Objectives

The main purpose of this project is to give you the opportunity to examine the efficiency of your Hashtable and your BSTMap when counting words in the Reddit comment files.

Tasks

  1. Add code to make it possible to time the reading of the original Reddit comments files. You will want to use System.currentTimeMillis() to determine the time it takes for your map to read in each Reddit comments file. Here is an example of how to do so using the WordCounter as described earlier (using the BSTMap).
            WordCounter wc = new WordCounter( );
            long startTime = System.currentTimeMillis();
            wc.loadFromOriginalWordsFile(inTextFilename);        
            long finishTime = System.currentTimeMillis();
            System.out.println("It took: "+(finishTime-startTime)/1000.0+ " second to read in the file of words");
    
  2. Use code like that above to have your BSTMap read in the 2008 Reddit file 5 times. Record the 5 times, drop the low time, drop the high time, and compute the mean of the remaining three. Repeat this process for the next 7 Reddit files. This should give you 8 run-times associated with your BSTMap.
  3. Repeat the process with your Hashtable, i.e. replace the BSTMap in your word counter with a hashtable.
  4. Use a spreadsheet program to report the numbers with a line graph (one line per data structure, the x-axis should be the year and the y-axis should be the runtime). If there are any trends, describe them.
  5. Analyze the performance of your BSTMap and Hashtable to see if they perform "ideally". For your Hashtable, that will involve counting collisions. For your BSTMap, that will involve determining the height of the tree. Report this information in your writeup.

Extensions

Each assignment will have a set of suggested extensions. The required tasks constitute about 85% of the assignment, and if you do only the required tasks and do them well you will earn a B+. To earn a higher grade, you need to undertake at least one extension. The difficulty and quality of the extension or extensions will determine your final grade for the assignment. One significant extension, or 2-3 smaller ones, done well, is typical.

  1. Use additional files to test the (time and/or space) efficiency of your implementations.
  2. Examine the space efficiency of your implementations. One way to do this would be to refrain from using the -Xmx512m flag. Count words of files of increasing sizes. See what is the size of the smallest size that crashes your program. Does a smaller file crash the Hashtable code or the BSTMap code? Why?
  3. Improve the time-efficiency of one of your data structures. Explain what you did and report the improvement in speed.
  4. For any assignment, a good extension will be to implement a Java class yourself and demonstrate that it has the same functionality as the Java class. For example, you could implement your own ArrayList class for this assignment.
  5. For any assignment, a good extension will be to annotate your code to indicate all places where memory is "lost" (in other words, each place where the last remaining reference to an object is either destroyed or is given a new value). If you do this extension, please indicate so in your write-up.

Handin

Make your writeup for the project a wiki page in your personal space. If you have questions about making a wiki page, stop by of our offices or ask in lab.

Your writeup should have a simple format.

Once you have written up your assignment, give the page the label:

cs231f17project7

You can give any wiki page a label using the label field at the bottom of the page. The label is different from the title.

Do not put code on your writeup page or anywhere it can be publicly accessed. To hand in code, put it in your folder on the Courses fileserver. Create a directory for each project inside the private folder inside your username folder.


When you are done with the lab exercises, you may start on the rest of the project.


© 2017 Caitrin Eaton.