|
Examples:
Data Size: Different
versions of XLMiner™ have varying limits on size of data. The size of
data depicted in the example below may not be supported by your version. Refer
to
Data Handling
Specifications
for details.
Let us explore in
this section the various ways in which XLMiner™ can
treat the missing values . We will apply the utility on a few small datasets and
see the results.
Open the dataset Examples.xls.
It contains some worksheets which are small datasets. We will try the utility on
them to understand a lot of qualities this utility has.
-
Consider the
following dataset EX1 of Examples.xls. It contains a few empty cells, some invalid formulae
etc. XLMiner™ will treat all these as missing values, as you know.

Click XLMiner --> Data
Utilities --> Missing Data Handling. Let us see in this example how the defaults work.
While performing Missing Data Handling, XLMiner™
by default, does not treat any missing records.
To observe this, do not
make any changes in the dialog that appears.

Click OK. The results are as
shown below.

-
Note here that XLMiner™
has added a Row Id to every record (XLMiner™ highlights this
column). This comes handy when the data does not have a column for
record identification. Row Id makes it easier to find out which
records were deleted or changed as per the instructions in the
dialog.
-
A look at the column
RowID reveals that XLMiner™ has not treated any records yet as we have
selected the default option.
While using Missing Data
Handling, you have an option of overwriting the existing data set after
handling the missing values. For this you have to check the option,
Overwrite Existing Worksheet. XLMiner™ will then confirm with you if you
want to overwrite the current worksheet. On selecting Yes, it will do the
changes and you will see your data set after treatment. Like all other
EXCEL Worksheets, you have to save this if you want to have the
overwritten data set. Remember, once you save it, the original data set
will not exist. If
we had checked the option Overwrite Existing Worksheet in example 1 above,
XLMiner™ would overwrite the data set and inform us where to find the
summary of the output. It displays the summary on pressing OK.
-
Take a look at the dataset Ex2
of Examples.xls below.

Variable_1 and Variable_3
contain all numeric values and some missing values. Variable_2 has
non-numeric values and some missing values. In this example we will see
how the missing values can be replaced by the Mean and Median of the
column.
Click XLMiner -->
Data Utilities --> Missing Data Handling..
-
Select variable_1. Click on
the box for "How do you want to ---" and select Mean. Click on
"Apply this option to selected variable(s)".
Select Variable_3.
Click on the box for "How do you want to ---" and select Median.
Click on "Apply this option to selected variable(s)".
-
Let us maintain the default
option of No treatment for Variable_2.
Click "OK".
The results are shown below.

-
The missing values whether
empty or invalid cells, in the column for Variable_1 are replaced by the
mean of the remaining numbers in that column.
-
The missing values whether
empty or invalid cells, in the column for Variable_3 are replaced by the
median of the remaining numbers in that column.
-
The record containing
missing value in column for Variable_2 is not treated since we
maintained the default.
-
See the dataset Ex3 of Examples.xls.

Click XLMiner --> Data Utilities
--> Missing Data Handling.
We will replace the missing values by mode
for Variable_1 and Variable_2 and
Variable_3. Note that Variable_3 is a date variable.

-
Select Variable_2.
Click on the box for "How do you want to ---" and select Mode.
The options Mean and Median do not appear in the list since Variable_2
contains non-numeric values. Click on "Apply this option to selected variable(s)".
-
Select
Variable_3.
Click on the box for "How do you want to ---" and select Mode.
The options Mean and Median do not appear in the list since Variable_3
contains dates. Click on "Apply this option to selected variable(s)".
-
Click "OK".
The Results are shown below.

-
The missing values in the column
for Variable_2 are replaced by the mode, the most frequently occuring
value in that column. Though the data in this column is non-numeric,
XLMiner™
has done the replacement with the appropriate non-numeric
value.
-
Record no. 3 and 9 were the
missing values for Variable_3. Note that they are replaced by the mode,
ie the most frequently appearing date in the column. This indicates that
XLMiner™ maintained the type of variable here.
-
Here is dataset Ex4 of
Examples.xls..
Click XLMiner -->
Data Utilities --> Missing Data Handling.
We
will see how XLMiner™ replaces the missing values with User
Specified Values.
-
Select variable_1. Click on
the box for "How do you want to ---" and select User specified
value. Another box appears prompting you to enter the value you specify.
Let us enter 100 here. Click on
"Apply this option to selected variable(s)".
-
Select variable_2. Click on
the box for "How do you want to ---" and select User specified
value. Another box appears prompting you to enter the value you specify.
Let us enter 100 here. Click on
"Apply this option to selected variable(s)".
-
Let us maintain the default
option of No treatment for Variable_3.
-
Click "OK".
The Results are shown below.

-
The missing values in the
column for Variable_1 are replaced by 100 as specified by the user.
-
Same happens in the column
for Variable_2 though the data is non- numeric here.
-
The column for
Variable_3 remains untreated.
-
See the dataset Ex5 of Examples.xls. There are a few cells
containing a particular entry, -999 in this case. XLMiner provides a
facility to detect a given value and replace it the way you want. This
is especially useful when we have to handle the boundary cases.
XLMiner will identify this missing value and
replace it in the columns as specified.
Click XLMiner™ --> Data Utilities
--> Missing Data Handling.
-
Check "Missing values are
represented by this value". Enter -999 in the box provided.
-
Select variable_1 and instruct XLMiner
to
replace the missing values ie wrong or empty cells and -999 by mean of the column.
-
Similarly, enter "User specified
value" for variable_2 (enter "zzz" as the replacement )
and mode for variable_3.
The results are shown below.

- Note that in column 1, the specified
missing code is replaced by the mean of the column.
- In column 2, the missing values for
variable_2 are replaced by the
user specified value and for variable_3 by the mode of the column.
-
Let us take a look at one more dataset, Ex6 of
Examples.xls. Let us apply the missing value handling procedures on it.
Click XLMiner™ --> Data
Utilities --> Missing Data Handling.
Apply the following procedures to
the various columns.

Check "Missing values are
represented -- " and enter 33 there.
Select
"Delete record" for column for Variable_1.
Choose mode to replace the
missing values in column for Variable_2.
Specify a value, 9999, for
missing values in column for Variable_3.
See the output.

Record number 7 and 12 are
deleted because there were missing values for
variable_1 and we chose "Delete Record" for
variable_1.
In column 2 the missing
values are replaced by the mode. In this column record no. 7
was a missing value. Instead of getting replaced by mode the
record got entirely deleted as per the Delete record
instructions in column 1 (record 7 had a missing value in
column 1 as well). Note that "Delete
record" holds priority over any other instruction in
Missing Data Handling.
In column 3, we instructed
XLMiner™ to treat 33 as missing value. So 33, and the other missing values were replaced by the user specified
value. Again here, record no.12 had 33 which should have
been replaced by 9999. But since column 1 also contained a
missing value there, the instruction "Delete
record" got preference and the entire record was
deleted.
See
also:
|