Example:
Data Size: Different versions of XLMiner™ have different limits on data size, so the data used in the example below may not be supported by your version. Refer to Data Handling Specifications for details.
The figure below shows the data: flying fitness test results for 40 pilots. Five categorical variables (Var2 through Var6) indicate each pilot's performance on various physical and psychological tests.
Open the Flying_Fitness.xls dataset from the datasets folder.
Open the XLMiner™ menu and select Partition data --> Standard Partition. Use the following settings.
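Conceptually, a standard partition randomly assigns each record to either the training set or the validation set. The minimal sketch below illustrates that idea only; the function name `standard_partition`, the 60% training fraction, and the seed are hypothetical assumptions, not XLMiner's settings or its actual partitioning algorithm.

```python
import random


def standard_partition(records, train_frac=0.6, seed=12345):
    """Randomly split records into a training set and a validation set.

    Hypothetical helper: train_frac and seed are illustrative assumptions,
    not XLMiner's settings.
    """
    rng = random.Random(seed)          # fixed seed makes the split reproducible
    shuffled = list(records)
    rng.shuffle(shuffled)              # random order, so the split is random
    n_train = round(len(shuffled) * train_frac)
    return shuffled[:n_train], shuffled[n_train:]


# The 40 pilots, represented here only by their row indices 0..39
train, valid = standard_partition(range(40))
print(len(train), len(valid))  # 24 16
```

Every record lands in exactly one partition, which is what allows the validation data to serve as an independent check on the model.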
Open the XLMiner™ menu, select Classification, and click Naïve Bayes to open the Naïve Bayes dialog box. The first dialog box lists all variables available for selection. Select Var2 through Var6 as input variables and TestRes/Var1 as the output variable, as shown in the figure below. Click Next to proceed.

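In the second dialog box of Naïve Bayes, select Calculate according to relative occurrences, so that the prior class probabilities are computed from the relative frequency of each class in the training data. Click Next to proceed.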
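Computing priors "according to relative occurrences" amounts to dividing each class's count by the number of training records. The sketch below shows that arithmetic; the function name `prior_probabilities` and the label counts are hypothetical, chosen only so that they reproduce the 54.17% / 45.83% split this example reports for its training data.

```python
from collections import Counter


def prior_probabilities(labels):
    """Prior of each class = its relative frequency among the training labels."""
    counts = Counter(labels)           # class -> number of training records
    total = len(labels)
    return {cls: n / total for cls, n in counts.items()}


# Hypothetical training labels: 13 records of class 1 and 11 of class 0
labels = [1] * 13 + [0] * 11
priors = prior_probabilities(labels)
print({cls: round(p, 4) for cls, p in priors.items()})  # {1: 0.5417, 0: 0.4583}
```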

Select the score options that produce the required output; in this example, select all options for both the training and validation data.

Score Training Data: Select this option to assess how well the model classifies the training data. The report is displayed according to your specifications: Detailed, Summary, and Lift charts.
Score Validation Data: Select this option to assess how well the model classifies the validation data. The report is displayed according to your specifications: Detailed, Summary, and Lift charts.
Score Test Data: The options in this group apply the model to the test partition. "Score Test Data" is available only if a test partition was created when the data were partitioned; select it to apply the model to the test data.
Score New Data: The options in this group apply the model to an entirely new dataset. Specify where the new data are located; see the Example of Discriminant Analysis for detailed instructions.
Score New Data in Database: See the Example of Discriminant Analysis for detailed instructions.
The output of Naïve Bayes is displayed on a separate sheet; use the Output Navigator to view its sections.
The output for Classification of Validation Data is shown below. To predict the class of the output variable, XLMiner™ calculates, for each record, the conditional probability that the record belongs to each class given its input values. In this case the classes are 0 and 1, so for every record in the validation data the conditional probabilities of class 0 and of class 1 are calculated. The larger of the two is highlighted, and XLMiner™ assigns the record to the class with the maximum conditional probability.
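The assignment rule can be sketched with the standard naive Bayes calculation: multiply the class prior by the conditional probability of each input value given that class, normalize across classes, and pick the class with the largest result. This is a minimal sketch, not XLMiner's exact implementation (for example, its handling of input values never seen in a class may differ). The function `classify` and all probabilities in the example are hypothetical.

```python
def classify(record, priors, cond):
    """Naive Bayes: score each class as prior * product of per-variable
    conditional probabilities, normalize, and return the most probable class.

    record: {var: value}
    priors: {class: P(class)}
    cond:   {class: {var: {value: P(var = value | class)}}}
    """
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for var, value in record.items():
            # Values never seen in this class get probability 0 (no smoothing)
            score *= cond[cls][var].get(value, 0.0)
        scores[cls] = score
    total = sum(scores.values())
    # Normalize so the class probabilities for this record sum to 1;
    # if every score is 0, all stay 0 and max() returns the first class listed
    posteriors = {cls: (s / total if total > 0 else 0.0) for cls, s in scores.items()}
    best = max(posteriors, key=posteriors.get)   # class with the maximum probability
    return best, posteriors


# Hypothetical model with two input variables
priors = {0: 0.4, 1: 0.6}
cond = {
    0: {"Var2": {0: 0.5, 1: 0.5}, "Var3": {0: 0.8, 1: 0.2}},
    1: {"Var2": {0: 0.15, 1: 0.85}, "Var3": {0: 0.3, 1: 0.7}},
}
best, post = classify({"Var2": 1, "Var3": 0}, priors, cond)
print(best, {c: round(p, 4) for c, p in post.items()})  # 0 {0: 0.5112, 1: 0.4888}
```

Here class 1 has the larger prior, but the input values are more likely under class 0, so the record is assigned to class 0.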

In addition to the classified data above, you can also view the prior class probabilities. In this case, 54.17% of the training records were "1's" and 45.83% were "0's".
The conditional probabilities are also shown. In this case, of the training records in class "1," 15.38% had a value of "0" for Var2 and the remaining 84.62% had a value of "1." No class "1" record had a value of "2" for Var2.
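Each conditional probability is a relative frequency computed within one class: the number of training records in that class having a given value, divided by the number of training records in that class. The sketch below assumes that definition; the function `conditional_probabilities` and the records are hypothetical, chosen so that class 1 reproduces the 15.38% / 84.62% split reported for Var2.

```python
from collections import Counter, defaultdict


def conditional_probabilities(rows, labels, var):
    """Estimate P(var = value | class) by relative frequency within each class.

    rows:   list of dicts mapping variable name -> categorical value
    labels: the output class of each row (same order as rows)
    """
    counts = defaultdict(Counter)              # class -> Counter of values of var
    for row, cls in zip(rows, labels):
        counts[cls][row[var]] += 1
    table = {}
    for cls, value_counts in counts.items():
        n_cls = sum(value_counts.values())     # number of training rows in this class
        table[cls] = {value: c / n_cls for value, c in value_counts.items()}
    return table


# Hypothetical training rows: 13 class-1 rows (2 with Var2=0, 11 with Var2=1)
# followed by 11 class-0 rows
rows = [{"Var2": 0}] * 2 + [{"Var2": 1}] * 11 + [{"Var2": 0}] * 6 + [{"Var2": 2}] * 5
labels = [1] * 13 + [0] * 11
table = conditional_probabilities(rows, labels, "Var2")
print({v: round(p, 4) for v, p in table[1].items()})  # {0: 0.1538, 1: 0.8462}
print(table[1].get(2, 0.0))                            # 0.0 (never observed in class 1)
```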

NNB_Stored_1: XLMiner™ generates this sheet along with the other outputs. Refer to Stored Model Sheets for details.
Lift Charts: Lift charts are visual aids for measuring model performance. They consist of a lift curve and a baseline; the greater the area between the lift curve and the baseline, the better the model.
Method of Drawing: After the model is built using the training data, it is used to score the training data and the validation data (if any). Each dataset is then sorted in descending order of the predicted output variable value (or of the predicted probability of success, in the case of logistic regression). After sorting, the actual values of the output variable are accumulated, and the lift curve is drawn as the number of cases versus the cumulative actual value. The baseline is drawn as the number of cases versus the average actual output value multiplied by the number of cases. The decile-wise lift chart plots, for each decile, the sum of the actual output values in that decile divided by the sum expected at random (the decile's size multiplied by the overall average output value).
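The drawing steps can be sketched as follows, assuming 0/1 actual outcomes and a predicted probability of class 1 for each record. The function `lift_chart` and the sample data are hypothetical; the decile-wise lift uses the common definition of decile mean divided by overall mean, and XLMiner's rounding of decile boundaries may differ.

```python
def lift_chart(actual, predicted):
    """Points for a cumulative lift curve, its baseline, and a decile-wise lift chart.

    actual:    observed 0/1 outcomes (non-empty)
    predicted: predicted probability of class 1 for the same records
    """
    # Sort records by predicted probability, highest first
    order = sorted(range(len(actual)), key=lambda i: predicted[i], reverse=True)
    ys = [actual[i] for i in order]
    n = len(ys)
    mean = sum(ys) / n                       # average actual outcome

    # Lift curve: number of cases vs. cumulative actual outcome
    curve, running = [], 0
    for k, y in enumerate(ys, start=1):
        running += y
        curve.append((k, running))
    # Baseline: number of cases vs. average outcome * number of cases
    baseline = [(k, mean * k) for k in range(1, n + 1)]

    # Decile-wise lift: actual sum in each decile / sum expected at random
    deciles = []
    for d in range(10):
        chunk = ys[d * n // 10:(d + 1) * n // 10]
        if chunk and mean > 0:
            deciles.append((d + 1, sum(chunk) / (mean * len(chunk))))
    return curve, baseline, deciles


actual = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
predicted = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.15, 0.1]
curve, baseline, deciles = lift_chart(actual, predicted)
print(curve[-1], baseline[-1])   # (10, 4) (10, 4.0)
print(deciles[0])                # (1, 2.5)
```

The curve and the baseline always meet at the last case; a good model rises above the baseline early, and its first deciles show lift values well above 1.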

