
Classification Tree

Using Classification Tree in XLMiner™:

In XLMiner™, select Classification --> Classification Tree. In the dialog box that appears, specify the data range to be processed, the input variables, and the output variable.

Variables: This box lists all the variables present in the dataset. If the "First row contains headers" box is checked, the header row above the data is used to identify variable names.

Variables in input data: Select one or more variables as independent variables from the Variables box by clicking on the corresponding selection button. These variables constitute the predictor variables.

Output Variable: Select one variable as the dependent variable from the Variables box by clicking on the corresponding selection button. This is the variable being classified.

Specify "Success" class: In a classification tree, the output variable takes categorical values. Enter the value that should count as a success, for example "1". Any record in the training data whose output variable equals 1 is then treated as a success.

Specify initial cutoff probability value for success: Enter the desired value here, say 0.5. A record is then classified as a success if its estimated probability of success is greater than this value.
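
The cutoff rule can be illustrated with a minimal Python sketch. This is not XLMiner™ code; the function name and the labels "1" and "0" are assumptions chosen for illustration. Note that a probability exactly equal to the cutoff is not counted as a success, because the rule requires the probability to be greater than the cutoff.

```python
def classify_with_cutoff(prob_success, cutoff=0.5, success="1", failure="0"):
    """Return the success label only when the probability exceeds the cutoff."""
    return success if prob_success > cutoff else failure


# Hypothetical estimated success probabilities for four records.
probs = [0.20, 0.50, 0.51, 0.90]
labels = [classify_with_cutoff(p) for p in probs]
print(labels)
```

Raising the cutoff makes the classifier more conservative about declaring a success; lowering it does the opposite.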

 

Click Next and the following dialog box appears:  

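Normalizing the data: This makes a difference only when linear combinations of the input variables are used for splitting. A split on a single variable compares that variable with a threshold, so rescaling the variable moves the threshold but leaves the resulting groups of records unchanged.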
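
A small Python sketch shows why normalization does not affect single-variable splits. It assumes z-score standardization (subtract the mean, divide by the standard deviation); the normalization XLMiner™ actually applies is not stated here, and the variable name and values are made up.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score standardization (an assumed form of normalization)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


# Hypothetical input variable with five records.
income = [20.0, 35.0, 50.0, 80.0, 120.0]
z = standardize(income)

# A single-variable split on the raw scale, halfway between 35 and 50.
raw_left = [i for i, v in enumerate(income) if v <= 42.5]

# The corresponding split on the standardized scale sends the same
# records to the left branch, because standardization preserves order.
z_threshold = (z[1] + z[2]) / 2
z_left = [i for i, v in enumerate(z) if v <= z_threshold]

print(raw_left, z_left)
```

A split on a linear combination such as a*x1 + b*x2 does not have this property, since the relative scales of x1 and x2 change which combinations separate the records.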

Minimum #records in a terminal node: Building a tree out to its maximum complexity reaches a point of diminishing returns in classification accuracy, and beyond that point the error rate of the tree tends to increase when the tree is applied to other data. This happens because the tree overfits: it fits the idiosyncrasies of the training data in a highly specific way. This option lets you halt the building of the tree when the number of "patterns" (rows or cases) in every terminal node reaches a certain minimum level.
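
The effect of this setting on tree size can be seen in a toy Python sketch. It is an illustration only: every node is simply split in half, whereas a real tree builder chooses splits by how well they separate the classes. The function name and the stopping test (no child may fall below the minimum) are assumptions.

```python
def count_leaves(n_records, min_records):
    """Count terminal nodes when a node is halved until a child would
    contain fewer than min_records records."""
    left = n_records // 2
    right = n_records - left
    if left < min_records or right < min_records:
        return 1  # splitting would violate the minimum, so this node is terminal
    return count_leaves(left, min_records) + count_leaves(right, min_records)


for minimum in (50, 10, 1):
    print(minimum, count_leaves(100, minimum))
```

With 100 records, a minimum of 50 allows only 2 terminal nodes, a minimum of 10 allows 8, and a minimum of 1 lets the tree grow until every record sits in its own terminal node, which is the overfitting case described above.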

Prune tree: Another way to correct for this overfitting is to build the full tree, then prune it back. Pruning trims back branches to yield a less complex tree.

Click Next, and the following dialog box comes up, where you have the option to display a full tree, or pruned trees. 

Trees: XLMiner™ lets you specify the maximum #levels in the tree. The "minimum error tree" is the tree that yields the minimum classification error rate when tested on the validation data. The "best pruned tree" has the fewest nodes, subject to the constraint that its error be kept below a specified level (that level is the minimum error rate plus the standard error of that error rate). Choose the appropriate options.
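
The relationship between the two trees can be sketched in Python. This is a hedged illustration, not XLMiner™'s implementation: the candidate trees and their validation error rates are invented, and the standard error is assumed to follow the usual binomial formula sqrt(e(1 - e)/n) for an error rate e measured on n validation records. The sketch also treats a tree whose error equals the limit as acceptable.

```python
from math import sqrt


def select_trees(candidates, n_validation):
    """candidates: list of (number_of_nodes, validation_error_rate).

    Returns (minimum_error_tree, best_pruned_tree).
    """
    # Minimum error tree: lowest validation error, ties broken by fewer nodes.
    min_tree = min(candidates, key=lambda c: (c[1], c[0]))
    min_err = min_tree[1]
    # Assumed binomial standard error of the minimum error rate.
    std_err = sqrt(min_err * (1 - min_err) / n_validation)
    # Best pruned tree: fewest nodes among trees within one standard error.
    eligible = [c for c in candidates if c[1] <= min_err + std_err]
    best_pruned = min(eligible, key=lambda c: c[0])
    return min_tree, best_pruned


# Hypothetical pruning sequence evaluated on 400 validation records.
candidates = [(1, 0.40), (3, 0.22), (5, 0.16), (9, 0.15), (15, 0.17)]
min_tree, best_pruned = select_trees(candidates, n_validation=400)
print(min_tree, best_pruned)
```

Here the 9-node tree has the lowest error (0.15), but the 5-node tree (0.16) falls within one standard error of it and is therefore preferred as the simpler model.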

Score training data:  Select this option to show an assessment of the performance of the tree in classifying the training data. The report is displayed according to your specifications - Detailed, Summary and Lift charts.

Score validation data:  Select this option to show an assessment of the performance of the tree in classifying the validation data. The report is displayed according to your specifications - Detailed, Summary and Lift charts.
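
As a rough idea of what a summary assessment contains, the following Python sketch tallies actual versus predicted classes and computes the overall error rate. The layout of XLMiner™'s own reports is not reproduced; the function name and the sample labels are assumptions.

```python
from collections import Counter


def summary_report(actual, predicted):
    """Return a confusion tally keyed by (actual, predicted) and the error rate."""
    counts = Counter(zip(actual, predicted))
    errors = sum(n for (a, p), n in counts.items() if a != p)
    return counts, errors / len(actual)


# Hypothetical actual and predicted classes for six records.
actual = ["1", "1", "0", "0", "1", "0"]
predicted = ["1", "0", "0", "0", "1", "1"]
confusion, error_rate = summary_report(actual, predicted)

for (a, p), n in sorted(confusion.items()):
    print(f"actual={a} predicted={p} count={n}")
print(f"error rate = {error_rate:.3f}")
```

Comparing the error rate on the training data with the error rate on the validation data gives a quick check for overfitting: a much lower training error suggests the tree is too complex.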

Score test data: The options in this group let you apply the model to the test partition, if one was created earlier. The option "Score Test Data" is available only if the dataset contains a test partition. Select it to apply the model to the test data.

Score new data: The options in this group let you apply the model to altogether new data. Specify where the new data is located. See the Example of Discriminant Analysis for detailed instructions on this.

Score new data in database: See the Example of Discriminant Analysis for detailed instructions on this.
