Logistic Regression
Using Logistic Regression in XLMiner™:
In XLMiner™, select Classification --> Logistic Regression. The following dialog box appears, where you need to specify the data range to be processed, the input variables, the weight variable and the output variable.
Variables in input data: Select one or more variables from the Variables box as independent variables by clicking the corresponding selection button. These are the predictor variables.

Weight variable: Use this option when multiple cases (objects) in the data share the same variable values and the weight variable gives the number of cases with those values.

Output Variable: Select one variable from the Variables box as the dependent variable by clicking the corresponding selection button. This is the variable being classified.

Specify "Success" class: In logistic regression the output variable takes categorical values. For example, enter the value "1" here. Any record in the training data whose output variable equals 1 is then treated as a success.

Specify initial cutoff probability value for success: Enter the desired value here.

The second logistic regression dialog box:
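As an aside on the "Success" class and cutoff probability settings of the first dialog box, the following minimal Python sketch shows how a fitted probability is typically turned into a class label. It is not XLMiner™ code: the function names, the label values, and the choice to treat a probability exactly equal to the cutoff as a success are assumptions made for illustration.

```python
import math

def success_probability(intercept, coefs, x):
    """Logistic model: P(success) = 1 / (1 + exp(-(b0 + b1*x1 + ...)))."""
    z = intercept + sum(b * v for b, v in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-z))

def classify(prob, cutoff=0.5, success_label=1, other_label=0):
    """Assign the success class when the probability reaches the cutoff.

    Assumption: a probability equal to the cutoff counts as success.
    """
    return success_label if prob >= cutoff else other_label

# Hypothetical coefficients, for illustration only.
p = success_probability(intercept=-1.0, coefs=[0.8], x=[2.0])
label = classify(p, cutoff=0.5)
```

Lowering the cutoff classifies more records as successes, and raising it classifies fewer.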
Force constant term to zero: Selecting this option omits the constant term from the regression.

Set confidence level for odds: Use this option to change the confidence level of the confidence intervals displayed in the results for the odds ratios.

Clicking the Advanced button brings up the following dialog:
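To illustrate what the confidence level for odds controls, here is a hedged sketch of the usual Wald-type interval for an odds ratio, computed from a coefficient and its standard error. Whether XLMiner™ uses exactly this formula is an assumption. The function name and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist

def odds_ratio_interval(coef, std_err, confidence=0.95):
    """Return (odds_ratio, lower, upper) using exp(coef +/- z * std_err).

    Assumption: a normal-approximation (Wald) interval on the coefficient,
    exponentiated to the odds scale.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return (
        math.exp(coef),
        math.exp(coef - z * std_err),
        math.exp(coef + z * std_err),
    )

# Hypothetical coefficient and standard error.
ratio, lower, upper = odds_ratio_interval(0.5, 0.2, confidence=0.95)
```

Raising the confidence level widens the interval, and lowering it narrows the interval.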
Advanced Computational Settings

Maximum number of iterations: Estimating the coefficients in logistic regression requires an iterative non-linear maximization procedure. You can specify a maximum number of iterations to keep the program from running through very long iterative loops. The default is 50.

Initial Marquardt overshoot factor: This overshoot factor is part of the iterative non-linear maximization procedure. Reducing it speeds up the operation by reducing the number of iterations required, but increases the chance that the maximization procedure will fail due to overshoot.

Collinearity Diagnostics: Sometimes variables are highly correlated with one another, which can result in large standard errors for the affected coefficients. This diagnostics display provides information useful in dealing with this problem.

Click OK to go back to step 2 of 3. Select "Best Subset".

Best Subset: Often a subset of the variables (rather than all of them) does the best job of classification. Selecting Best Subset in the above dialog box brings up the Best Subset dialog box:
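The role of the maximum number of iterations in the Advanced settings can be seen in a small sketch of iterative maximum-likelihood fitting. The code below uses plain Newton-Raphson for a model with one predictor and a constant term. It is not XLMiner™'s procedure and it omits the Marquardt overshoot factor entirely. The function name, tolerance, and data are assumptions for illustration.

```python
import math

def fit_logistic_1d(xs, ys, max_iter=50, tol=1e-8):
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson.

    Returns (b0, b1, iterations_used, converged). The loop stops early once
    the largest coefficient change falls below tol, or gives up at max_iter.
    """
    b0, b1 = 0.0, 0.0
    for iteration in range(1, max_iter + 1):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p            # gradient with respect to b0
            g1 += (y - p) * x      # gradient with respect to b1
            h00 += w               # entries of the information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            raise ZeroDivisionError("singular information matrix")
        # Newton step: solve the 2x2 system by hand.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if max(abs(d0), abs(d1)) < tol:
            return b0, b1, iteration, True
    return b0, b1, max_iter, False

# Small, non-separable illustrative data set.
xs = [0, 1, 2, 3, 4, 5]
ys = [0, 0, 1, 0, 1, 1]
b0, b1, used, converged = fit_logistic_1d(xs, ys, max_iter=50)
```

If the iteration cap is reached first, the function reports `converged` as `False`, which is the situation the maximum-iterations setting guards against.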
Maximum size of Best subsets: Specify the maximum size of the best subset. (The best subset produced by XLMiner™ could be smaller.)

Number of best subsets: XLMiner™ allows up to 20 best subsets. Select the appropriate number.

Selection Procedures
FIN, FOUT: When variables are added and eliminated, an F-like statistic is calculated for the regression. For a variable to enter the regression, the F-like value must be greater than FIN (the default is 3.84). For a variable to leave the regression, the F-like value must be less than FOUT (the default is 2.71). The value you set for FIN must be greater than the value you set for FOUT.

Clicking OK dismisses the Best Subset dialog box; clicking Next then brings up the third dialog box:
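The FIN and FOUT rules described for the Best Subset dialog box can be written as a short decision sketch. This only encodes the thresholds stated above. How XLMiner™ computes the F-like statistic is not shown, and the function names are hypothetical.

```python
DEFAULT_FIN = 3.84   # default entry threshold stated in the text
DEFAULT_FOUT = 2.71  # default removal threshold stated in the text

def check_thresholds(fin=DEFAULT_FIN, fout=DEFAULT_FOUT):
    """Enforce the requirement that FIN is greater than FOUT."""
    if fin <= fout:
        raise ValueError("FIN must be greater than FOUT")

def may_enter(f_value, fin=DEFAULT_FIN):
    """A variable may enter when its F-like value exceeds FIN."""
    return f_value > fin

def may_leave(f_value, fout=DEFAULT_FOUT):
    """A variable may leave when its F-like value falls below FOUT."""
    return f_value < fout

check_thresholds()
decision = (may_enter(3.0), may_leave(3.0))  # value between FOUT and FIN
```

With the default thresholds, an F-like value of 3.0 neither enters nor leaves. Keeping FIN above FOUT leaves this gap, so a variable cannot qualify to enter and to leave at the same F-like value.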
Covariance matrix of coefficients: This option displays the coefficient covariance matrix with the output. Entries in the matrix are the covariances between the indicated coefficients. The "on-diagonal" values are the estimated variances of the corresponding coefficients.

Residuals: Produces a two-column array of fitted values and residuals.

Score training data: Select this option to show an assessment of performance in classifying the training data. The report is displayed according to your specifications: Detailed, Summary, and Lift charts.

Score validation data: Select this option to show an assessment of performance in classifying the validation data. The report is displayed according to your specifications: Detailed, Summary, and Lift charts.

Score Test Data: The options in this group let you apply the model to score the test partition (if one was created earlier). The "Score Test Data" option is available only if the dataset contains a test partition. Select it to apply the model to the test data.

Score new Data: The options in this group let you apply the model to score entirely new data. Specify where the new data is located. See the Example of Discriminant Analysis for detailed instructions.

Score New data in database: This procedure is similar to the one explained in the Example of Discriminant Analysis, except that the probability of the success class, rather than the class, is written to the database. See the Example of Discriminant Analysis for detailed instructions.
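As a note on reading the covariance matrix of coefficients, the sketch below derives standard errors from the on-diagonal variances and converts the off-diagonal covariances into correlations between coefficients. The matrix values and function names are made up for illustration and do not come from XLMiner™ output.

```python
import math

def standard_errors(cov):
    """Standard error of each coefficient: square root of its variance."""
    return [math.sqrt(cov[i][i]) for i in range(len(cov))]

def coefficient_correlations(cov):
    """Scale each covariance by the two standard errors involved."""
    se = standard_errors(cov)
    n = len(cov)
    return [[cov[i][j] / (se[i] * se[j]) for j in range(n)] for i in range(n)]

# Hypothetical 2x2 covariance matrix (constant term and one predictor).
cov = [[0.04, -0.01],
       [-0.01, 0.25]]
se = standard_errors(cov)
corr = coefficient_correlations(cov)
```

Correlations close to 1 or -1 between coefficients can be a sign of the collinearity problem that the Collinearity Diagnostics option addresses.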