Neural Networks Classification

Example:

Data Size: Different versions of XLMiner™ have varying limits on the size of data. The size of data depicted in the example below may not be supported by your version. Refer to Data Handling Specifications for details.

  1. Open the file Wine.xls in Microsoft Excel. This file contains 13 quantitative variables measuring the chemical attributes of wine samples from 3 different wineries (the Type variable). The objective is to assign a wine classification to each record.

  2. In XLMiner™, click on Partition data --> Standard Partition.  In the ensuing dialog box, select all variables in the Variables box and move them to the "Variables in the partitioned data" box. Select Specify percentages and enter 80% for the training set and 20% for the validation set. Select OK. 

  3. In XLMiner™, select Classification, then the Neural Network (Multilayer feedforward) option. In the Neural network dialog box, move the variable "Type" to the "Output variable" box, and move the remaining variables to the "Input variables" box.

    A more detailed explanation of the above dialog box follows:

    Variables: This box lists all the variables present in the dataset. If the "First row contains headers" box is checked, the header row above the data is used to identify variable names.

    Variables in input data: Select one or more variables as independent variables from the Variables box by clicking on the corresponding selection button. These variables constitute the predictor variables.

    Weight variable: Use this option if your data contains multiple cases (objects) that share the same variable values; the weight variable gives the number of cases with those values.

    Output Variable: Select one variable as the dependent variable from the Variables box by clicking on the corresponding selection button. This is the variable being classified.

    Click Next, and the following dialog box appears. Here you specify the architecture for the neural network. 

  4. The second dialog box contains options to define the network architecture. For this example, accept the default values. These choices are explained below the dialog box.

    Normalize input data: Normalizing the data (subtracting the mean and dividing by the standard deviation) puts all input variables on a comparable scale; without normalization, the variable with the largest scale can dominate the weighted inputs to the hidden nodes. Check this box.

    Number of hidden layers: Up to four hidden layers can be specified; see the overview section for more detail on the layers of a neural network (input, hidden, and output). For this example, specify 1.

    # Nodes:  Specify the number of nodes in each hidden layer. Selecting the number of hidden layers and the number of nodes is largely a matter of trial and error.

    # Epochs:  An epoch is one sweep through all the records in the training set.

    Step size for gradient descent: This is the multiplying factor for the error correction during backpropagation; it is roughly equivalent to the learning rate for the neural network. A low value produces slow but steady learning; a high value produces rapid but erratic learning. Values for the step size typically range from 0.1 to 0.9.

    Weight change momentum: In each new round of error correction, a fraction of the previous weight change is carried over, so that an occasional outlier does not spoil the accumulated learning.

    Error tolerance: The error in a particular iteration is backpropagated only if it is greater than the error tolerance. Typically error tolerance is a small value in the range 0 to 1. The default value for error tolerance in XLMiner™ is 0.01.

    Weight decay: To prevent the network from overfitting the training data, weight decay penalizes the weights in each iteration: each calculated weight is multiplied by (1 - decay).

    Cost Function: XLMiner™ provides four cost functions: squared error, cross entropy, maximum likelihood, and perceptron convergence. Select the one appropriate for your problem.

    Hidden layer sigmoid: The output of every hidden node passes through a sigmoid function. The standard sigmoid is the logistic function, with a range of 0 to 1; the symmetric sigmoid is the tanh function, with a range of -1 to 1.

    Output layer sigmoid: The same two choices (standard logistic, range 0 to 1, or symmetric tanh, range -1 to 1) apply to the output nodes.

    Check the options as shown above. Click Next.

  5. The next dialog box contains options for scoring data. For this example, check the options shown below. Click on Finish.

    Score training data: Select this option to show an assessment of the network's performance in classifying the training data. The report is displayed according to your specifications: Detailed, Summary, and Lift charts.

    Score validation data: Select this option to show an assessment of the network's performance in classifying the validation data. The report is displayed according to your specifications: Detailed, Summary, and Lift charts.

    Score Test Data: The options in this group let you apply the model to the test partition (if one was created earlier). The "Score Test Data" option is available only if the dataset contains a test partition. Select it to apply the model to the test data.

    Score new Data: The options in this group let you apply the model to altogether new data. Specify where the new data is located. See the Example of Discriminant Analysis for detailed instructions.

    Score New data in database: See the Example of Discriminant Analysis for detailed instructions.

  6. Outputs of the neural network procedure are displayed in a separate sheet. You can use the Output Navigator to view various sections of the output. 
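The data preparation from steps 2 and 4 (an 80/20 random partition followed by normalization) and the two sigmoid choices can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not XLMiner™'s implementation: the helper names `partition` and `zscore`, the fixed random seed, and the decision to take the mean and standard deviation from the training partition only are all hypothetical.

```python
import math
import random
import statistics


def partition(records, train_frac=0.8, seed=12345):
    """Hypothetical helper: shuffle the records, then split into training and validation lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_train = round(train_frac * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def zscore(train, other):
    """Subtract each column's mean and divide by its standard deviation.

    Assumption: the statistics come from the training rows only. A column with
    zero standard deviation would raise ZeroDivisionError.
    """
    columns = list(zip(*train))
    means = [statistics.fmean(c) for c in columns]
    sds = [statistics.stdev(c) for c in columns]

    def scale(rows):
        return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

    return scale(train), scale(other)


def standard_sigmoid(z):
    """Logistic function, range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def symmetric_sigmoid(z):
    """tanh function, range (-1, 1)."""
    return math.tanh(z)


# Toy usage with made-up rows of two numeric inputs.
rows = [[float(i), float(i % 3)] for i in range(10)]
train_rows, valid_rows = partition(rows)
train_z, valid_z = zscore(train_rows, valid_rows)
```

Only the input variables would be normalized this way; the Type column remains the class label.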

The "data," "variables," and "architecture" sections all reflect inputs chosen by the user.  
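The training options from step 4 (step size, weight change momentum, error tolerance, and weight decay) all act inside each backpropagation update. The sketch below shows one common formulation for a single hidden layer with logistic activations and a squared-error cost. The class and parameter names, the random initial weights, the momentum value of 0.6, and the exact way tolerance and decay are applied are assumptions for illustration, not XLMiner™'s documented internals.

```python
import math
import random


def logistic(z):
    """Standard sigmoid, range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


class TinyNet:
    """Hypothetical one-hidden-layer network; each weight row ends with a bias weight."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w_hid = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w_out = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]
        # Previous weight changes, remembered for the momentum term.
        self.dw_hid = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
        self.dw_out = [[0.0] * (n_hidden + 1) for _ in range(n_out)]

    def forward(self, x):
        xb = list(x) + [1.0]  # constant 1 feeds the bias weight
        hb = [logistic(sum(w * v for w, v in zip(row, xb))) for row in self.w_hid] + [1.0]
        out = [logistic(sum(w * v for w, v in zip(row, hb))) for row in self.w_out]
        return xb, hb, out

    def train_record(self, x, target, step=0.1, momentum=0.6, tolerance=0.01, decay=0.0):
        """One online update; returns False if the error was within tolerance."""
        xb, hb, out = self.forward(x)
        errors = [t - y for t, y in zip(target, out)]
        # Error tolerance: backpropagate only if some output error exceeds it.
        if max(abs(e) for e in errors) <= tolerance:
            return False
        # Output deltas for a squared-error cost with logistic outputs.
        d_out = [e * y * (1.0 - y) for e, y in zip(errors, out)]
        # Hidden deltas use the output weights before they are changed.
        d_hid = [
            hb[j] * (1.0 - hb[j]) * sum(d * row[j] for d, row in zip(d_out, self.w_out))
            for j in range(len(self.w_hid))
        ]
        self._apply(self.w_out, self.dw_out, d_out, hb, step, momentum, decay)
        self._apply(self.w_hid, self.dw_hid, d_hid, xb, step, momentum, decay)
        return True

    @staticmethod
    def _apply(weights, prev, deltas, inputs, step, momentum, decay):
        for k, d in enumerate(deltas):
            for i, v in enumerate(inputs):
                # Step size scales the correction; momentum adds part of the previous change.
                change = step * d * v + momentum * prev[k][i]
                prev[k][i] = change
                # Weight decay: multiply the updated weight by (1 - decay).
                weights[k][i] = (weights[k][i] + change) * (1.0 - decay)
```

Calling `train_record` once per training record, in order, for every epoch gives the online scheme that the epoch log describes. A different cost function, such as cross entropy, would change the `d_out` line.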

Epoch information: Each pass of a single record through the network is one trial, and one sweep through all the training records is an epoch, so the total number of trials equals the number of training records multiplied by the number of epochs. These totals are noted in the "Epoch information" section, which also shows how the trials break down among the classes.
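The trial count is simple arithmetic. This sketch computes it, along with the per-class breakdown, for a handful of hypothetical labels; it assumes every training record is presented exactly once per epoch, and the function name `epoch_trials` is illustrative only.

```python
from collections import Counter


def epoch_trials(train_labels, n_epochs):
    """Total trials and trials per class, assuming each record is presented once per epoch."""
    total = len(train_labels) * n_epochs
    per_class = {cls: n * n_epochs for cls, n in Counter(train_labels).items()}
    return total, per_class


# Hypothetical figures: 5 training records in two classes, 30 epochs.
total, per_class = epoch_trials([1, 1, 1, 2, 2], n_epochs=30)
print(total, per_class)  # 150 {1: 90, 2: 60}
```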

XLMiner™ also provides intermediate information produced during the last pass through the net.  

Interlayer connections' weights: Recall that a key element of a neural network is the weights on the connections between nodes. In this example, we chose one hidden layer with 25 nodes. XLMiner's output contains a section with the final values of the weights between the input layer and the hidden layer, between hidden layers, and between the last hidden layer and the output layer. This information shows what the inside of a neural network looks like, though it is unlikely to be useful to the data analyst end user. Displayed below are the final connection weights between the input layer and the hidden layer for our example.
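To get a sense of how large these weight tables are, the count follows directly from the layer sizes. The sketch assumes 13 input nodes (one per input variable), 25 hidden nodes, one output node per class (3 for the Type variable), and one bias weight per hidden and output node; whether XLMiner™ reports bias weights this way is not stated here, so the `include_bias` flag lets you count both ways. The function name is hypothetical.

```python
def count_weights(layer_sizes, include_bias=True):
    """Count connection weights between consecutive layers of a fully connected network."""
    extra = 1 if include_bias else 0
    return sum((fan_in + extra) * fan_out
               for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]))


print(count_weights([13, 25, 3]))                      # 428 = 14*25 + 26*3
print(count_weights([13, 25, 3], include_bias=False))  # 400 = 13*25 + 25*3
```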

The training epochs log (shown below) lists the percent misclassified at the end of each epoch. 

During an epoch, each training record is fed forward through the network and classified; the error is then calculated and backpropagated to correct the weights. The weights are therefore adjusted continuously during the epoch, and the classification error in the log is tallied as the records pass through the network, not recomputed after the final weight adjustment. Because the training data is scored with the final weights, the training classification error may not exactly match the last epoch's error in the Epoch log.
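The mismatch comes from when the errors are counted. The toy sketch below makes this concrete with four one-dimensional records. A perceptron-style update stands in for backpropagation only to keep the arithmetic checkable by hand; the data and the function names are hypothetical.

```python
def train_one_epoch(data, w=0.0, b=0.0, lr=1.0):
    """Online epoch: count errors as records pass through, updating weights immediately."""
    wrong = 0
    for x, y in data:
        pred = 1 if w * x + b > 0 else 0
        if pred != y:
            wrong += 1
        # The update happens right away, so later records see different weights.
        w += lr * (y - pred) * x
        b += lr * (y - pred)
    return wrong / len(data), w, b


def error_rate(data, w, b):
    """Error rate when every record is scored with the same (final) weights."""
    wrong = sum(1 for x, y in data if (1 if w * x + b > 0 else 0) != y)
    return wrong / len(data)


data = [(1.0, 1), (-1.0, 0), (2.0, 1), (-2.0, 0)]
epoch_error, w, b = train_one_epoch(data)
final_error = error_rate(data, w, b)
print(epoch_error, final_error)  # 0.25 0.0
```

The epoch tally is 25% because the first record was classified before any correction, while scoring with the final weights (w = 1, b = 1) misclassifies nothing.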

NNC_Stored_1 : XLMiner™ generates this sheet along with the other outputs. Please refer to the Stored Model Sheets for details.