
 

Multiple Linear Regression

 

Using Multiple Linear Regression in XLMiner™

 

In XLMiner™, select Prediction -> Multiple Linear Regression. The following dialog box appears, where you enter the data range to be processed. You also select the variables to use in the analysis: "Input variables" are the independent variables in the regression, and "Output variable" is the dependent variable. If a variable assigns weights to the different rows (cases), select it in the "weight variable" box.
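For readers who want to see what these selections amount to numerically, the following is a minimal NumPy sketch of a least-squares fit with optional case weights. It is not XLMiner's implementation. The function name `fit_linear_regression`, the `fit_intercept` flag, and the data are hypothetical, and the sketch assumes the weight variable acts as ordinary weighted-least-squares case weights.

```python
import numpy as np

def fit_linear_regression(X, y, weights=None, fit_intercept=True):
    """Least-squares fit; returns coefficients (constant term first if fitted)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if fit_intercept:
        # Prepend a column of ones so the first coefficient is the constant term.
        X = np.column_stack([np.ones(len(X)), X])
    if weights is None:
        weights = np.ones(len(y))
    sw = np.sqrt(np.asarray(weights, dtype=float))
    # Weighted least squares: scale each row by sqrt(weight), then solve OLS.
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef

# Made-up data in which the output is exactly 1 + 2*x1 + 3*x2.
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]]
y = [1 + 2 * a + 3 * b for a, b in X]
coef = fit_linear_regression(X, y)
```

Here the two columns of `X` play the role of the input variables and `y` the output variable. Passing `weights=[...]` corresponds to selecting a weight variable under the assumption stated above.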

 

 

 

 

Click Next. The following dialog box appears, where you specify the statistics to be included in the output. An explanation of each option is displayed in the adjoining comment.

 

Force constant term to zero: If this option is checked, there will be no constant term in the equation.

ANOVA Table: When this checkbox is checked, the ANOVA table is displayed in the output.
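As an illustration of what an ANOVA table for a regression contains, the sketch below computes the usual sums of squares, degrees of freedom, mean squares, and F statistic for a model with a constant term. The data and variable names are made up for this example, and the exact layout of XLMiner's table is not reproduced here.

```python
import numpy as np

# Illustrative data (made up for this sketch)
X = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5]], dtype=float)
y = np.array([9.2, 7.9, 19.1, 17.8, 29.3, 27.9])

n, k = X.shape                                  # rows, number of predictors
A = np.column_stack([np.ones(n), X])            # design matrix with constant term
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
fitted = A @ coef

ss_total = np.sum((y - y.mean()) ** 2)          # total variation around the mean
ss_regression = np.sum((fitted - y.mean()) ** 2)
ss_error = np.sum((y - fitted) ** 2)

df_regression, df_error = k, n - k - 1
ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error
f_statistic = ms_regression / ms_error

for source, df, ss in [("Regression", df_regression, ss_regression),
                       ("Error", df_error, ss_error),
                       ("Total", n - 1, ss_total)]:
    print(f"{source:<12}{df:>4}{ss:>14.4f}")
```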

Fitted values: When this checkbox is checked, the fitted values are displayed in the output.

Variance-Covariance matrix: When this checkbox is checked, the variance-covariance matrix of the estimated regression coefficients is displayed in the output.
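For ordinary least squares, the variance-covariance matrix of the estimated coefficients is conventionally the residual mean square multiplied by the inverse of X'X. The sketch below shows that calculation on made-up data. It assumes a model with a constant term and is not taken from XLMiner itself.

```python
import numpy as np

# Illustrative data (made up for this sketch)
X = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5]], dtype=float)
y = np.array([9.2, 7.9, 19.1, 17.8, 29.3, 27.9])

n = len(y)
A = np.column_stack([np.ones(n), X])       # design matrix with constant term
p = A.shape[1]                             # number of estimated coefficients

coef, *_ = np.linalg.lstsq(A, y, rcond=None)
residuals = y - A @ coef
s2 = np.sum(residuals ** 2) / (n - p)      # residual mean square

cov_coef = s2 * np.linalg.inv(A.T @ A)     # variance-covariance matrix
std_errors = np.sqrt(np.diag(cov_coef))    # standard errors of the coefficients
```

The square roots of the diagonal entries are the coefficient standard errors, and the off-diagonal entries show how the estimates vary together.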

Score training data:  Select this option to show an assessment of the performance in predicting the training data. The report is displayed according to your specifications - Detailed, Summary and Lift charts.

Score validation data:  Select this option to show an assessment of the performance in predicting the validation data. The report is displayed according to your specifications - Detailed, Summary and Lift charts.
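Scoring a partition means comparing the model's predictions with the actual values in that partition. The sketch below computes a few typical summary error measures. Which measures XLMiner's Summary report actually contains is not stated here, so the helper `score_summary`, its keys, and the numbers are hypothetical.

```python
import math

def score_summary(actual, predicted):
    """Summary error measures for a scored partition (hypothetical helper)."""
    errors = [a - p for a, p in zip(actual, predicted)]
    sse = sum(err ** 2 for err in errors)
    return {
        "total_sse": sse,                            # total sum of squared errors
        "rms_error": math.sqrt(sse / len(errors)),   # root mean squared error
        "average_error": sum(errors) / len(errors),  # mean signed error
    }

# Made-up actual and predicted values for a validation partition
summary = score_summary([10.0, 12.0, 15.0, 11.0], [9.0, 13.0, 14.0, 13.0])
```

Computing the same measures on the training and validation partitions and comparing them is a simple way to see whether the model predicts unseen data about as well as the data it was fitted on.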

Score Test Data:  The options in this group let you apply the model for scoring to the test partition (if one had been created earlier). The option "Score Test Data" is available only if the dataset contains a test partition. Select it to apply the model to test data.

Score new Data:  The options in this group let you apply the model for scoring to altogether new data. Specify where the new data is located. See the Example of Discriminant Analysis for detailed instructions on this.

Score New data in database: See the Example of Discriminant Analysis for detailed instructions on this.

Display of Residuals

 

Unstandardized: When this checkbox is checked, the Unstandardized Residuals are displayed in the output. Unstandardized residuals are computed by the formula:

Unstandardized residual = Actual response - Predicted response

Standardized: When this checkbox is checked, the Standardized Residuals are displayed in the output. Standardized residuals are obtained by dividing the unstandardized residuals by the respective standard deviations.
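The two kinds of residuals can be sketched as follows. The text does not spell out how the "respective standard deviations" are computed. This sketch assumes the common definition in which the standard deviation of the ith residual is s * sqrt(1 - h_i), where s is the residual standard error and h_i is the ith diagonal element of the hat matrix. Data and names are made up.

```python
import numpy as np

# Illustrative data (made up for this sketch)
X = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5]], dtype=float)
y = np.array([9.2, 7.9, 19.1, 17.8, 29.3, 27.9])

n = len(y)
A = np.column_stack([np.ones(n), X])       # design matrix with constant term
p = A.shape[1]

coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Unstandardized residual = actual response - predicted response
unstandardized = y - A @ coef

# Assumed standard deviation of each residual: s * sqrt(1 - h_i)
s = np.sqrt(np.sum(unstandardized ** 2) / (n - p))
h = np.sum((A @ np.linalg.inv(A.T @ A)) * A, axis=1)   # hat-matrix diagonal
standardized = unstandardized / (s * np.sqrt(1 - h))
```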

Advanced:

To bring up the dialog box for advanced selection, click on the Advanced button on the dialog box. 

Residuals:

Studentized: When this checkbox is checked, the Studentized Residuals are displayed in the output. Studentized residuals are computed by dividing the unstandardized residuals by quantities related to the diagonal elements of the hat matrix, using a common scale estimate computed without the ith case in the model. These residuals have t-distributions with (n-k-1) degrees of freedom, so any residual with absolute value exceeding 3 usually requires attention.

Deleted: When this checkbox is checked, the Deleted Residuals are displayed in the output. The residual for the ith observation is obtained by fitting the model with the ith observation omitted, using the model to predict the ith observation, and then computing the difference from the actual ith observation.
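Both residuals can be obtained without refitting the model n times, because standard identities express them through the ordinary residuals and the hat-matrix diagonal. The sketch below uses those textbook formulas on made-up data. It assumes that k in the text counts the estimated coefficients (called `p` here) and is not a description of XLMiner's internal code.

```python
import numpy as np

# Illustrative data (made up for this sketch)
X = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5]], dtype=float)
y = np.array([9.2, 7.9, 19.1, 17.8, 29.3, 27.9])

n = len(y)
A = np.column_stack([np.ones(n), X])       # design matrix with constant term
p = A.shape[1]                             # number of estimated coefficients

coef, *_ = np.linalg.lstsq(A, y, rcond=None)
e = y - A @ coef                           # unstandardized residuals
h = np.sum((A @ np.linalg.inv(A.T @ A)) * A, axis=1)   # hat-matrix diagonal

# Deleted residual: prediction error for case i when case i is left out.
deleted = e / (1 - h)

# Scale estimate computed without case i, then the studentized residual.
s2_without_i = (np.sum(e ** 2) - e ** 2 / (1 - h)) / (n - p - 1)
studentized = e / np.sqrt(s2_without_i * (1 - h))

# Cases that the rule of thumb in the text would flag for attention
flagged = np.flatnonzero(np.abs(studentized) > 3)
```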

Influence Statistics:

Cook's Distance: When this checkbox is checked, the Cook's Distance for each observation is displayed in the output. This is an overall measure of the impact of the ith data point on the estimated regression coefficients. In linear models, Cook's Distance has, approximately, an F distribution with k and (n-k) degrees of freedom.

DF fits: When this checkbox is checked, the DF fits (change in the regression fit) for each observation are displayed in the output. These reflect coefficient changes as well as forecasting effects when an observation is deleted.

Covariance Ratios: When this checkbox is checked, the covariance ratios are displayed in the output. This measure reflects the change in the variance-covariance matrix of the estimated coefficients when the ith observation is deleted.

Hat matrix Diagonal: When this checkbox is checked, the diagonal elements of the hat matrix are displayed in the output. This measure is also known as the leverage of the ith observation.
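The four influence statistics above are closely related, and each can be written in terms of the ordinary residuals and the hat-matrix diagonal. The sketch below uses the standard textbook formulas on made-up data. The variable names are hypothetical, `p` denotes the number of estimated coefficients, and XLMiner's own scaling conventions may differ.

```python
import numpy as np

# Illustrative data (made up for this sketch)
X = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5]], dtype=float)
y = np.array([9.2, 7.9, 19.1, 17.8, 29.3, 27.9])

n = len(y)
A = np.column_stack([np.ones(n), X])       # design matrix with constant term
p = A.shape[1]

coef, *_ = np.linalg.lstsq(A, y, rcond=None)
e = y - A @ coef                           # unstandardized residuals
s2 = np.sum(e ** 2) / (n - p)              # residual mean square

# Hat-matrix diagonal (leverage): diagonal of A (A'A)^-1 A'
h = np.sum((A @ np.linalg.inv(A.T @ A)) * A, axis=1)

# Scale estimate without case i and the studentized residual built from it
s2_without_i = ((n - p) * s2 - e ** 2 / (1 - h)) / (n - p - 1)
t = e / np.sqrt(s2_without_i * (1 - h))

cooks_distance = e ** 2 * h / (p * s2 * (1 - h) ** 2)
dffits = t * np.sqrt(h / (1 - h))
covariance_ratio = (s2_without_i / s2) ** p / (1 - h)
```

The leverages always sum to the number of estimated coefficients, so an observation whose leverage is far above the average p/n has an unusual combination of predictor values.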

Perform Collinearity Diagnostics: When this checkbox is checked, the collinearity diagnostics are displayed in the output.

Number of collinearity components: Enter the number of collinearity components. This number can be between 2 and the number of degrees of freedom for the model. When the model is fitted without an intercept, the model degrees of freedom is equal to the number of predictors in the model. When the model is fitted with an intercept, the model degrees of freedom is equal to the number of predictors in the model plus one.
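The text does not describe how the collinearity components are computed. One widely used approach, shown here only as an assumption, scales each column of the design matrix to unit length, takes its singular value decomposition, and reports a condition index and a set of variance-decomposition proportions for each component. Data and names are made up.

```python
import numpy as np

# Illustrative data (made up for this sketch)
X = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5]], dtype=float)

n = X.shape[0]
A = np.column_stack([np.ones(n), X])           # constant term plus predictors

# Scale every column to unit length so the diagnostics ignore units.
A_scaled = A / np.linalg.norm(A, axis=0)

# Singular values come back in descending order.
_, d, vt = np.linalg.svd(A_scaled, full_matrices=False)

# One condition index per component; large values suggest collinearity.
condition_indices = d[0] / d

# Share of each coefficient's variance associated with each component.
phi = (vt.T ** 2) / d ** 2                     # rows: coefficients, columns: components
proportions = phi / phi.sum(axis=1, keepdims=True)
```

With a constant term there are three columns here, so at most three components can be reported, which matches the upper limit described above (number of predictors plus one).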

Multicollinearity Criterion: Enter a value between 0 and 1.

Best Subset:

When you have a large number of predictors and would like to limit the model to those that matter the most, use this option to select the best subset of predictor variables.  Although the example data set has only four variables, it will serve to illustrate the best subset selection procedure. To bring up the dialog box for best subset selection, click the Best Subset button on the dialog box.

                                

Maximum size of best subset: Specify here the maximum size of the best subset.  (The best subset produced by XLMiner™ could be smaller.)

Number of best subsets: Specify here the number of subsets to be shown (XLMiner™ will first show the best, then the next-best, etc.). You can request up to 20 best subsets.

Selection Procedure

  • Backward elimination:  Variables are eliminated one at a time, starting with the least significant.
  • Forward selection:  Variables are added one at a time, starting with the most significant.
  • Exhaustive search:  Searches all combinations of variables for the best fit (can be quite time-consuming, depending on the number of variables). 

  • Sequential replacement:  For a given number of variables, variables are sequentially replaced and replacements that improve performance are retained.

  • Stepwise selection:  Like forward selection, but at each stage, variables can be dropped or added.  
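To make the exhaustive search concrete, the sketch below fits every combination of predictors up to a maximum size and keeps the combination with the smallest residual sum of squares for each size. Ranking by residual sum of squares is an assumption made for this example, since the criterion XLMiner uses is not stated here. The function names and the simulated data are hypothetical.

```python
from itertools import combinations

import numpy as np

def sse_for_subset(X, y, columns):
    """Residual sum of squares for a model using the given predictor columns."""
    A = np.column_stack([np.ones(len(y)), X[:, list(columns)]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.sum((y - A @ coef) ** 2))

def exhaustive_search(X, y, max_size):
    """Best subset (lowest SSE) of each size from 1 to max_size."""
    best = {}
    for size in range(1, max_size + 1):
        candidates = combinations(range(X.shape[1]), size)
        best[size] = min(candidates, key=lambda cols: sse_for_subset(X, y, cols))
    return best

# Simulated data with four predictors; only columns 0 and 2 affect the output.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
y = 2 + 3 * X[:, 0] - 2 * X[:, 2] + 0.1 * rng.normal(size=30)

best = exhaustive_search(X, y, max_size=3)
```

With four predictors there are only 15 non-empty subsets, but the count doubles with every added predictor, which is why the exhaustive search can be time-consuming and the sequential procedures in the list are offered as alternatives.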

FIN, FOUT:  In adding and eliminating variables, an F-like statistic is calculated for regression.  For a variable to come into the regression, the F-like value must be greater than FIN (the default is 3.84).  For a variable to leave the regression, the F-like value must be less than FOUT (the default is 2.71).  The value you set for FIN must be greater than the value you set for FOUT.
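The text does not define the F-like statistic exactly. A common choice, assumed in the sketch below, is the partial F statistic for a single variable: the reduction in the residual sum of squares when the variable is included, divided by the residual mean square of the model that includes it. The function names and the numbers are hypothetical. Only the default thresholds come from the text.

```python
def partial_f(sse_without, sse_with, n_rows, n_coefficients_with):
    """F-like statistic for one variable: drop in SSE over the residual mean square."""
    residual_mean_square = sse_with / (n_rows - n_coefficients_with)
    return (sse_without - sse_with) / residual_mean_square

FIN, FOUT = 3.84, 2.71   # defaults quoted in the text

def may_enter(f_value, fin=FIN):
    """A candidate variable comes into the regression only if F exceeds FIN."""
    return f_value > fin

def should_leave(f_value, fout=FOUT):
    """A variable already in the regression leaves if F falls below FOUT."""
    return f_value < fout

# Made-up example: adding a variable lowers SSE from 120 to 100 in a model
# that then has 5 coefficients, fitted on 25 rows.
f_candidate = partial_f(sse_without=120.0, sse_with=100.0,
                        n_rows=25, n_coefficients_with=5)
print(f_candidate, may_enter(f_candidate), should_leave(f_candidate))
```

Keeping FIN greater than FOUT leaves a gap between the two thresholds, so a variable that has just entered is not immediately removed again at the next step.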

Note:  If the constant term is forced to zero then all models reported by the best subset procedure will omit the constant term.

Clicking the OK button above will send you back to the earlier dialog box ("Multiple Linear Regression: Step 2 of 2").

 
