CS 251: Assignment #8

Decision Trees and 1R Classifiers

Due Thursday, 29 April 2010


Consult with the professor before starting this assignment, either in person or by email. After that, consider consulting with your client as well.

  1. Using a categorical dependent variable and at least 3 independent variables, generate a one-rule (1R) decision tree for each of the independent variables, each one trying to predict the category of the dependent variable.

    Write your own code to do this. The code should be part of your data class or encapsulated in a new data analysis class. Calculate the error rate and information gain for each rule. Which rule is best?

  2. Write a function to write out part or all of your data to an ARFF file (usable by WEKA). A description of the format is given in the book. Ideally, you should be able to select a subset of the dimensions and a subset of the data points. It would be nice, for example, to be able to write out 75% of the data, randomly selected, as a training set. This may be a more complex process for some data sets than for others.

  3. Once you have an ARFF file, use WEKA to build more complex decision trees on at least 3 independent variables to predict a categorical dependent variable. If it makes more sense for your data set, you can try to predict a real-valued dependent variable using the appropriate WEKA tools.

    Your writeup should explain the resulting decision tree and provide results on the training set. If you have a test set, show the results on that set as well.
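
As a rough starting point for steps 1 and 2, here is a minimal sketch, assuming you are working in Python and that every variable is categorical (nominal) with values that contain no spaces or commas. The toy ROWS table and the function names entropy, one_rule, arff_text, and split_rows are all made up for illustration; they are not part of any provided course code, and your own data class will likely look different. Ties for the majority category are broken arbitrarily.

```python
import math
import random
from collections import Counter, defaultdict

# Made-up toy data for illustration only; substitute your own data set.
ROWS = [
    {"outlook": "sunny", "windy": "no", "humidity": "high", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "humidity": "high", "play": "no"},
    {"outlook": "overcast", "windy": "no", "humidity": "high", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "humidity": "high", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "humidity": "normal", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "humidity": "normal", "play": "no"},
    {"outlook": "overcast", "windy": "yes", "humidity": "normal", "play": "yes"},
    {"outlook": "sunny", "windy": "no", "humidity": "normal", "play": "yes"},
]


def entropy(labels):
    """Shannon entropy, in bits, of a list of category labels."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total)
               for n in Counter(labels).values())


def one_rule(rows, attr, target):
    """Build a 1R rule on attr: map each of its values to the majority
    category of target. Returns (rule, error_rate, information_gain)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attr]].append(row[target])

    rule, errors, remainder = {}, 0, 0.0
    for value, labels in groups.items():
        majority, count = Counter(labels).most_common(1)[0]
        rule[value] = majority
        errors += len(labels) - count            # rows the rule gets wrong
        remainder += len(labels) / len(rows) * entropy(labels)

    gain = entropy([row[target] for row in rows]) - remainder
    return rule, errors / len(rows), gain


def arff_text(relation, rows, attrs):
    """Return ARFF text for the chosen nominal attributes of rows."""
    lines = [f"@relation {relation}", ""]
    for attr in attrs:
        values = ",".join(sorted({row[attr] for row in rows}))
        lines.append(f"@attribute {attr} {{{values}}}")
    lines += ["", "@data"]
    lines += [",".join(row[attr] for attr in attrs) for row in rows]
    return "\n".join(lines) + "\n"


def split_rows(rows, fraction=0.75, seed=0):
    """Randomly split rows into (training, test) lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    for attr in ("outlook", "windy", "humidity"):
        rule, err, gain = one_rule(ROWS, attr, "play")
        print(f"{attr}: error={err:.3f} gain={gain:.3f} rule={rule}")
    train, test = split_rows(ROWS)
    print(arff_text("toy_train", train, ["outlook", "windy", "play"]))
```

To produce a file for WEKA, write the returned string out with something like open("train.arff", "w"). If you write separate training and test files, you will probably want to build the @attribute value lists from the full data set so that both files declare the same categories. Numeric dimensions would need an "@attribute name numeric" line instead of a value list, which this sketch does not handle.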



For this week's writeup, make one child page from the main data project page. On that page, describe the decision trees you built and summarize the results.


Once you have written up your assignment, give the page the label:


Put your code in the COMP/CS251 folder on fileserver1/Academics. Please make sure you are organizing your code by project.