The main purpose of this project is to give you the opportunity to examine the efficiency of your Hashtable and your BSTMap when counting words in the Reddit comment files.
Add code to make it possible to time the reading of the original Reddit comments files. You will want to use System.currentTimeMillis() to determine the time it takes for your map to read in each Reddit comments file. Here is an example of how to do so using the WordCounter as described earlier (using the BSTMap).
WordCounter wc = new WordCounter( );
long startTime = System.currentTimeMillis();
long finishTime = System.currentTimeMillis();
System.out.println("It took: "+(finishTime-startTime)/1000.0+ " second to read in the file of words");
Use code like that above to have your BSTMap read in the 2008 Reddit file 5 times. Record the 5 times, drop the low time, drop the high time, and compute the mean of the remaining three. Repeat this process for the next 7 Reddit files. This should give you 8 run-times associated with your BSTMap.
Repeat the process with your Hashtable, i.e. replace the BSTMap in your word counter with a hashtable.
Use a spreadsheet program to report the numbers with a line graph (one line per data structure, the x-axis should be the year and the y-axis should be the runtime). If there are any trends, describe them.
Analyze the performance of your BSTMap and Hashtable to see if they perform "ideally". For your Hashtable, that will involve counting collisions. For your BSTMap, that will involve determining the height of the tree. Report this information in your writeup.
Each assignment will have a set of suggested extensions. The required
tasks constitute about 85% of the assignment, and if you do only the
required tasks and do them well you will earn a B+. To earn a higher
grade, you need to undertake at least one extension. The difficulty
and quality of the extension or extensions will determine your final
grade for the assignment. One significant extension, or 2-3 smaller
ones, done well, is typical.
Use additional files to test the (time and/or space) efficiency of your implementations.
Examine the space efficiency of your implementations. One way to do this would be to refrain from using the -Xmx512m flag. Count words of files of increasing sizes. See what is the size of the smallest size that crashes your program. Does a smaller file crash the Hashtable code or the BSTMap code? Why?
Improve the time-efficiency of one of your data structures. Explain what you did and report the improvement in speed.
For any assignment, a good extension will be to implement a Java class
yourself and demonstrate that it has the same functionality as the
Java class. For example, you could implement your own ArrayList class
for this assignment.
For any assignment, a good extension will be to annotate your code to indicate all places where memory is "lost" (in other words, each place where the last remaining reference to an object is either destroyed or is given a new value). If you do this extension, please indicate so in your write-up.
Make your writeup for the project a wiki page in your personal
space. If you have questions about making a wiki page, stop by of our
offices or ask in lab.
Your writeup should have a simple format.
A brief description of the overall project, in your own words.
An explanation of your solution, focusing on the interesting bits. In
this assignment, for example, collision-handling is likely the most interesting bit. You can include code snippets in your writeup to help describe your solution. A code snippet is usually
less than 10 lines of code.
Printouts, pictures, or results to show what you did. For this
assignment, you should include the graph of run times.
Other results to demonstrate extensions you undertook.
A brief conclusion and description of what you learned.
A list of people you worked with, including TAs, and instructors. Include in that list anyone whose code you may have seen, such as those of friends who have taken the course in a previous semester.
Once you have written up your assignment, give the page the label:
You can give any wiki page a label using the label field at the bottom
of the page. The label is different from the title.
Do not put code on your writeup page or anywhere it can be publicly
accessed. To hand in code, put it in your folder on the Courses fileserver. Create a directory for each project inside the private folder inside your username folder.
When you are done with the lab exercises, you may start on the rest of the project.