Most
data mining projects use large volumes of data. Before building a
model, you need to ensure that there are no missing values. This utility helps
in fixing them the way you want.
Activate XLMiner --> Data Utilities
-->Missing Data Handling. The following dialog appears on the screen.
Missing Values are represented by this
value : You can specify a value here. Wherever it appears in the dataset,
it will be treated as a missing value and handled as per your instruction on
missing value treatment. Check the box and enter the value if required.
Note : XLMiner™ treats empty and
invalid cells as missing values automatically.
Overwrite existing worksheet : If
checked, XLMiner™ overwrites the data set with the one in which all the
missing values are appropriately treated.
Variable names in the first Row : When this box is checked,
XLMiner™ picks up the headers from first row of the
selected data range. When the box is unchecked, XLMiner™ follows the default
naming convention, i.e., the variable in the first column of the selected range
will be called "Var1", the second column "Var2," etc.
Variables : Lets you
specify the way you would like XLMiner™ to handle the missing
values for every variable. Select a variable and refer to the options below.
How do you want to handle missing values
for the selected variable(s)? : When you select any variable under
"variables" column, XLMiner™ activates this selection box. Click
on the arrow for pull down menu. It
displays the options you have for handling the missing values, as listed
below.
Delete record : If this option is
selected, XLMiner™ deletes the whole
record on finding a missing value for that variable.
Mode : All missing values in the column
for that variable are replaced
by the mode (the value occurring most frequently) of the column on selecting this.
Mean : For a variable with all numeric
values, XLMiner™ will replace the
missing values with the mean of all other values of that variable on selecting
this option.
Median : For a variable with all
numeric values, XLMiner™ will
replace the missing values with the median of all other values of that
variable on selecting this option.
User specified value : You can select
this and specify a value in the box that appears. XLMiner™
will replace all the missing values with the specified value.
No treatment : If this option is
selected for a particular variable then XLMiner™ does not treat this column
even if there are missing values in it.
Apply this option to selected variable(s) :
After you select how you want to
handle missing values for the selected variable(s), select this option to
apply the same.
Reset : Selecting this will reset all
the selections you made for handling the missing values. XLMiner™
will settle for the default, ie. "No Treatment" for all the
variables. You can change the settings again as per the requirement.
Select OK and a new worksheet will be
created with imputed values.
See
also: