Decision Trees and 1R Classifiers
Due Thursday, 29 April 2010
Consult with the prof before doing this assignment, either in person or via email. After that, it may also be nice to consult with your client.
Using a dependent category variable and at least 3 independent variables, generate a one-rule decision tree for each of the independent variables, trying to predict the dependent variable. Write your own code to do this. The code should be part of your data class or encapsulated in a new data analysis class. Calculate the error rate and information gain for each rule. Which rule is best?
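The 1R calculation can be sketched in a few lines of Python. This is only an illustration, not a required design: the function names `entropy` and `one_rule` are made up here, and the sketch assumes every independent variable is already categorical (a numeric variable would need to be binned first). For each value of the independent variable the rule predicts the most common dependent category, the error rate is the fraction of training points the rule gets wrong, and the information gain is the entropy of the dependent variable minus the weighted entropy of the groups.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of category labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def one_rule(values, labels):
    """Build a 1R rule for one categorical independent variable.

    Returns (rule, error_rate, info_gain). The rule maps each value of the
    independent variable to the most common dependent category seen with it.
    """
    # Group the dependent labels by the value of the independent variable.
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)

    # Each branch predicts its majority category.
    rule = {v: Counter(ys).most_common(1)[0][0] for v, ys in groups.items()}

    # Error rate on the data the rule was built from.
    errors = sum(1 for v, y in zip(values, labels) if rule[v] != y)
    error_rate = errors / len(labels)

    # Information gain = entropy before the split - weighted entropy after it.
    remainder = sum(len(ys) / len(labels) * entropy(ys) for ys in groups.values())
    info_gain = entropy(labels) - remainder
    return rule, error_rate, info_gain


if __name__ == "__main__":
    # Hypothetical toy data, only for illustration.
    outlook = ["sunny", "sunny", "overcast", "rainy", "rainy", "overcast"]
    windy = ["no", "yes", "no", "no", "yes", "yes"]
    play = ["no", "no", "yes", "yes", "no", "yes"]
    for name, column in [("outlook", outlook), ("windy", windy)]:
        rule, err, gain = one_rule(column, play)
        print(name, rule, round(err, 3), round(gain, 3))
```

Running `one_rule` once per independent variable and comparing the error rates (lower is better) and information gains (higher is better) gives one way to answer which rule is best.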
- Write a function to write out part or all of your data to an ARFF file (usable by WEKA). A description of the format is given in the book. Ideally, you should be able to select a subset of the dimensions and a subset of the data points. It would be nice, for example, to be able to write out a randomly selected 75% of the data as a training set. This may be a more complex process for some data sets than for others.
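One possible shape for the ARFF writer is sketched below. The function names (`split_indices`, `arff_text`, `write_arff`) and the way the data is passed in (a list of headers, a parallel list of types, and a list of rows) are assumptions made for this example; your data class will likely store things differently. The sketch also assumes attribute names and values contain no spaces, commas, or quotes, and that there are no missing values.

```python
import random


def split_indices(n, fraction=0.75, seed=None):
    """Randomly split the row indices 0..n-1 into (training, test) lists."""
    rng = random.Random(seed)
    train = sorted(rng.sample(range(n), round(fraction * n)))
    chosen = set(train)
    test = [i for i in range(n) if i not in chosen]
    return train, test


def arff_text(relation, headers, types, rows, columns=None, indices=None):
    """Return ARFF text for the selected columns and rows.

    headers: column names; types: "numeric" or "nominal" for each column;
    rows: list of lists; columns/indices: which columns and rows to keep
    (both default to everything).
    """
    if columns is None:
        columns = list(range(len(headers)))
    if indices is None:
        indices = list(range(len(rows)))

    lines = [f"@relation {relation}", ""]
    for c in columns:
        if types[c] == "numeric":
            lines.append(f"@attribute {headers[c]} numeric")
        else:
            # List every category in the full data set, so that a training
            # file and a test file declare the same set of values.
            cats = sorted({str(row[c]) for row in rows})
            lines.append(f"@attribute {headers[c]} {{{','.join(cats)}}}")
    lines += ["", "@data"]
    for i in indices:
        lines.append(",".join(str(rows[i][c]) for c in columns))
    return "\n".join(lines) + "\n"


def write_arff(filename, *args, **kwargs):
    """Write the output of arff_text to a file."""
    with open(filename, "w") as f:
        f.write(arff_text(*args, **kwargs))
```

For a 75% training set, something like `train, test = split_indices(len(rows), 0.75)` followed by two calls to `write_arff`, one with `indices=train` and one with `indices=test`, would produce matching training and test files.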
Once you have an ARFF file, use WEKA to build more complex decision trees on at least 3 independent variables to predict a dependent category variable. If it makes more sense for your data set, you can try to predict a real-valued dependent variable using the appropriate
Your writeup should explain the resulting decision tree and provide results on the training set. If you have a test set, show the results on that set as well.
- Explore multiple decision tree building methods and compare and contrast the results.
- Write your own Bayesian classifier and compare the results to the 1R rules and decision tree.
- Do exercises one or three on more than one dependent variable. If you do this extension, try to automate the process.
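For the Bayesian classifier extension, one common starting point is a naive Bayes classifier over categorical variables. The sketch below is that specific variant, with add-one (Laplace) smoothing so that unseen value and category combinations do not get zero probability; the assignment does not prescribe this choice, and the class and function names are invented for the example. It assumes each row is a sequence of categorical values.

```python
from collections import Counter, defaultdict
from math import log


class NaiveBayes:
    """Naive Bayes for categorical features with add-one smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.total = len(labels)
        n_features = len(rows[0])
        # counts[j][c][v] = number of class-c rows whose feature j equals v
        self.counts = [defaultdict(Counter) for _ in range(n_features)]
        # values[j] = set of distinct values seen for feature j
        self.values = [set() for _ in range(n_features)]
        for row, c in zip(rows, labels):
            for j, v in enumerate(row):
                self.counts[j][c][v] += 1
                self.values[j].add(v)
        return self

    def predict(self, row):
        """Return the class with the highest log posterior score."""
        best, best_score = None, float("-inf")
        for c, n_c in self.class_counts.items():
            score = log(n_c / self.total)  # log prior
            for j, v in enumerate(row):
                # Smoothed estimate of P(feature j = v | class c)
                num = self.counts[j][c][v] + 1
                den = n_c + len(self.values[j])
                score += log(num / den)
            if score > best_score:
                best, best_score = c, score
        return best


def error_rate(model, rows, labels):
    """Fraction of rows the model classifies incorrectly."""
    wrong = sum(1 for row, y in zip(rows, labels) if model.predict(row) != y)
    return wrong / len(labels)
```

Computing `error_rate` on the same training and test rows used for the 1R rules and the WEKA trees gives a like-for-like comparison.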
For this week's writeup, make one child page from the main data project page. On it, describe the decision trees you built and summarize the results.
Once you have written up your assignment, give the page the label:
Put your code in the COMP/CS251 folder on fileserver1/Academics. Please make sure you are organizing your code by project.