Contents

Missing Data Handling

Using the Missing Data Handling Utility

Most data mining projects use large volumes of data. Before building a model, you need to ensure that there are no missing values. This utility helps in fixing them the way you want.

Activate XLMiner --> Data Utilities -->Missing Data Handling. The following dialog appears on the screen.

Missing Values are represented by this value : You can specify a value here. Wherever it appears in the dataset, it will be treated as a missing value and handled as per your instruction on missing value treatment. Check the box and enter the value if required.

Note : XLMiner™ treats empty and invalid cells as missing values automatically.

Overwrite existing worksheet : If checked, XLMiner™ overwrites the data set with the one in which all the missing values are appropriately treated. 

Variable names in the first Row : When this box is checked, XLMiner™ picks up the headers from first row of the selected data range. When the box is unchecked, XLMiner™ follows the default naming convention, i.e., the variable in the first column of the selected range will be called "Var1", the second column "Var2," etc.

Variables : Lets you specify the way you would like XLMiner™ to handle the missing values for every variable. Select a variable and refer to the options below.

How do you want to handle missing values for the selected variable(s)? : When you select any variable under "variables" column, XLMiner™ activates this selection box. Click on the arrow for pull down menu. It displays the options you have for handling the missing values, as listed below.

Delete record : If this option is selected, XLMiner™ deletes the whole record on finding a missing value for that variable.

Mode : All missing values in the column for that variable are replaced by the mode (the value occurring most frequently) of the column on selecting this.

Mean : For a variable with all numeric values, XLMiner™  will replace the missing values with the mean of all other values of that variable on selecting this option.

Median : For a variable with all numeric values, XLMiner™  will replace the missing values with the median of all other values of that variable on selecting this option.

User specified value : You can select this and specify a value in the box that appears. XLMiner™  will replace all the missing values with the specified value.

No treatment : If this option is selected for a particular variable then XLMiner™ does not treat this column even if there are missing values in it.

Apply this option to selected variable(s) : After you select how you want to handle missing values for the selected variable(s), select this option to apply the same.

Reset : Selecting this will reset all the selections you made for handling the missing values. XLMiner™  will settle for the default, ie. "No Treatment" for all the variables. You can change the settings again as per the requirement.

Select OK and a new worksheet will be created with imputed values.

See also: