
 

Neural Networks Prediction

 

Example:

Data Size: Different versions of XLMiner™ have different limits on data size. The data used in the example below may exceed what your version supports. Refer to Data Handling Specifications for details.

  1. Open the file BostonHousing.xls in Microsoft Excel.  

  2. In XLMiner™, click on Partition data --> Standard Partition.  In the ensuing dialog box, use the following settings. 

  3. In XLMiner™, select Prediction, then the Neural Network (Multilayer feedforward) option.

    A more detailed explanation of the above dialog box follows:

    Variables in input data: This box lists all the variables present in the dataset. If the "First row contains headers" box is checked, the header row above the data is used to identify variable names.

    Input Variables: Select one or more variables as independent variables from the Variables box by clicking on the corresponding selection button. These variables constitute the predictor variables.

    Output Variable: Select one variable as the dependent variable from the Variables box by clicking on the corresponding selection button. This is the variable being predicted.

    Click on "Next" to get the next dialog box.

  4. The second dialog box contains options to define the network architecture. For this example, enter #epochs = 500 and accept all the other default values.  Details on these choices are shown below the dialog box.

    Normalize input data:  Normalizing the data (subtracting the mean and dividing by the standard deviation) puts all input variables on a comparable scale, so that a variable with a large numeric range does not dominate the others. Leave this option unchecked for this example.

    Number of hidden layers:  Up to four hidden layers can be specified; see the overview section for more detail on layers in a neural network (input, hidden and output).  

    # Nodes:  Specify the number of nodes in each hidden layer. Selecting the number of hidden layers and the number of nodes is largely a matter of trial and error.

    # Epochs:  An epoch is one sweep through all the records in the training set.

    Step size for gradient descent:  This is the multiplying factor for the error correction during backpropagation; it is roughly equivalent to the learning rate for the neural network.  A low value produces slow but steady learning; a high value produces rapid but erratic learning.  Values for the step size typically range from 0.1 to 0.9.

    Weight change momentum:  In each new round of error correction, some memory of the prior correction is retained so that an outlier that crops up does not spoil accumulated learning. 

    Error tolerance: The error in a particular iteration is backpropagated only if it is greater than the error tolerance. Typically error tolerance is a small value in the range 0 to 1. The default value for error tolerance in XLMiner™ is 0.01.

    Weight decay: To prevent over-fitting of the network to the training data set, a weight decay penalizes the weights in each iteration: each newly calculated weight is multiplied by (1-decay).
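
The options described above can be sketched in a few lines of Python. This is a minimal illustration of the descriptions in this step, not XLMiner™'s actual implementation. The function names, the use of the population standard deviation, the step size and momentum values, and the choice to drop the momentum memory when the error is within tolerance are all assumptions made for the example; only the error tolerance of 0.01 comes from the text.

```python
import statistics


def normalize(values):
    """Z-score one input column: subtract the mean, divide by the standard deviation."""
    mean = statistics.fmean(values)
    # Population standard deviation is an assumption; the tool may use the sample version.
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def update_weight(weight, gradient, prev_change, error,
                  step_size=0.1, momentum=0.6, tolerance=0.01, decay=0.0):
    """Return (new_weight, change) for one connection after one record.

    step_size and momentum defaults are illustrative values only.
    """
    if abs(error) <= tolerance:
        # Error within tolerance: nothing is backpropagated for this record.
        # Resetting the remembered change to 0.0 is an assumption.
        return weight, 0.0
    # Gradient step scaled by the step size, plus a fraction of the prior correction.
    change = -step_size * gradient + momentum * prev_change
    # Weight decay: multiply the calculated weight by (1 - decay).
    new_weight = (weight + change) * (1.0 - decay)
    return new_weight, change


if __name__ == "__main__":
    print(normalize([1.0, 2.0, 3.0]))
    print(update_weight(0.5, gradient=0.2, prev_change=0.1, error=0.3))
```

With a larger step size the `change` term swings more from record to record, while a larger momentum smooths it by carrying more of the previous correction forward.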

  5. The next dialog box contains options for scoring data. Select the following options and click on Finish.

    Score training data:  Select this option to show an assessment of the performance of the network in predicting the training data. The report is displayed according to your specifications: Detailed, Summary and Lift charts.

    Score validation data:  Select this option to show an assessment of the performance of the network in predicting the validation data. The report is displayed according to your specifications: Detailed, Summary and Lift charts.

    Score Test Data:  The options in this group let you apply the model to the test partition (if one was created earlier). The option "Score Test Data" is available only if the dataset contains a test partition. Select it to apply the model to the test data.

    Score new Data:  The options in this group let you apply the model to altogether new data. Specify where the new data is located. See the Example of Discriminant Analysis for detailed instructions on this.

    Score New data in database: See the Example of Discriminant Analysis for detailed instructions on this.

  6. Outputs of the neural network procedure are displayed in a separate sheet. You can use the Output Navigator to view various sections of the output. 

The "data," "variables," and "parameters" sections all reflect inputs chosen by the user.  

Epoch information:  Each time a record goes through the net, it is one trial; one sweep of all records is called an epoch.  So the total number of trials = # records * # epochs.  These totals are noted in the "Epoch information" section; the breakdown of all trials among the different classes is also reflected here.
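
As a quick check of the trial arithmetic above, here is a minimal Python sketch. The record count of 300 is a hypothetical figure, not the size of the training partition in this example; 500 epochs is the value entered in step 4. The function name is also made up for the illustration.

```python
def total_trials(n_records: int, n_epochs: int) -> int:
    """Each record passes through the net once per epoch."""
    return n_records * n_epochs


# 300 training records is a hypothetical count; 500 epochs is from step 4.
print(total_trials(300, 500))  # 150000
```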

XLMiner™ also provides intermediate information produced during the last pass through the net.  

Interlayer connections' weights:  Recall that a key element in a neural network is the weights for the connections between nodes.  In this example, we chose to have one hidden layer, and we also chose to have 25 nodes in that layer.  XLMiner™'s output contains a section that has the final values for the weights between the input layer and the hidden layer, between hidden layers, and between the last hidden layer and the output layer.  This information is useful to see what the inside of a neural net looks like;  it is unlikely to be of utility to the data analyst end-user.  Displayed below are the final connection weights between the input layer and the hidden layer for our example.
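
To get a feel for how many connection weights such a section contains, the following Python sketch counts the weights between each pair of adjacent layers. It assumes fully connected layers and ignores any bias terms, and the figure of 13 input variables is hypothetical; only the single hidden layer of 25 nodes comes from this example.

```python
def connection_weight_counts(n_inputs, hidden_nodes, n_outputs=1):
    """Return the number of weights between each pair of adjacent layers.

    hidden_nodes is a list with one entry per hidden layer.
    Assumes fully connected layers; bias terms are not counted.
    """
    sizes = [n_inputs] + list(hidden_nodes) + [n_outputs]
    return [a * b for a, b in zip(sizes, sizes[1:])]


# 13 input variables is a hypothetical count; one hidden layer of 25 nodes is from the example.
print(connection_weight_counts(13, [25]))  # [325, 25]
```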

The training epochs log (not shown) lists the percent wrong predictions at the end of each epoch. 

NNP_Stored_1 : XLMiner™ generates this sheet along with the other outputs. Please refer to the Stored Model Sheets for details.

Lift charts: Lift charts are visual aids for measuring model performance. They consist of a lift curve and a baseline. The greater the area between the lift curve and the baseline, the better the model.

Method of drawing: After the model is built using the training data set, it is used to score the training data set and the validation data set (if one exists). The data set(s) are then sorted by the predicted output variable value (or by the predicted probability of success in the logistic regression case). After sorting, the actual values of the output variable are cumulated, and the lift curve is drawn as the number of cases versus the cumulated value. The baseline is drawn as the number of cases versus the average of the actual output variable values multiplied by the number of cases. The decilewise lift curve is drawn as the decile number versus the cumulative actual output variable value divided by the decile's average output variable value.
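
The lift curve and baseline described above can be computed as in the following Python sketch. It is an illustration of the description, not XLMiner™'s code: the function name is hypothetical, sorting in descending order of predicted value is an assumption, and the decilewise chart is left out.

```python
def lift_curve(actual, predicted):
    """Return (cumulative, baseline) lists, one point per case.

    Cases are sorted by predicted value, largest first (assumed order),
    and the actual values are cumulated in that order.
    """
    order = sorted(range(len(actual)), key=lambda i: predicted[i], reverse=True)
    cumulative = []
    running = 0.0
    for i in order:
        running += actual[i]
        cumulative.append(running)
    # Baseline: average actual value multiplied by the number of cases so far.
    mean_actual = sum(actual) / len(actual)
    baseline = [mean_actual * (k + 1) for k in range(len(actual))]
    return cumulative, baseline


if __name__ == "__main__":
    # Made-up values for illustration only.
    cum, base = lift_curve([10, 30, 20, 40], [12, 28, 25, 35])
    print(cum)   # [40.0, 70.0, 90.0, 100.0]
    print(base)  # [25.0, 50.0, 75.0, 100.0]
```

Both curves end at the same point (the total of the actual values); a better model pushes the lift curve further above the baseline before they meet.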

The Lift charts for the training and validation data are shown below.

 
