Contents

Missing Data Handling

Examples:

Data Size: Different versions of XLMiner™  have varying limits on size of data. The size of data depicted in the example below may not be supported by your version. Refer to Data Handling Specifications for details.

Let us explore in this section the various ways in which XLMiner™ can treat the missing values . We will apply the utility on a few small datasets and see the  results.

Open the dataset Examples.xls. It contains some worksheets which are small datasets. We will try the utility on them to understand a lot of qualities this utility has.

  1. Consider the following dataset EX1 of Examples.xls. It contains a few empty cells, some invalid formulae etc. XLMiner™ will treat all these as missing values, as you know.

  2. Click XLMiner --> Data Utilities --> Missing Data Handling. Let us see in this example how the defaults work. While performing Missing Data Handling, XLMiner™ by default, does not treat any missing records. To observe this, do not make any changes in the dialog that appears.

    Click OK. The results are as shown below.

    • Note here that XLMiner™  has added a Row Id to every record (XLMiner™ highlights this column). This comes handy when the data does not have a column for record identification. Row Id makes it easier to find out which records were deleted or changed as per the instructions in the dialog. 

    • A look at the column RowID reveals that XLMiner™ has not treated any records yet as we have selected the default option.

  3. While using Missing Data Handling, you have an option of overwriting the existing data set after handling the missing values. For this you have to check the option, Overwrite Existing Worksheet. XLMiner™ will then confirm with you if you want to overwrite the current worksheet. On selecting Yes, it will do the changes and you will see your data set after treatment. Like all other EXCEL Worksheets, you have to save this if you want to have the overwritten data set. Remember, once you save it, the original data set will not exist.

    If we had checked the option Overwrite Existing Worksheet in example 1 above, XLMiner™ would overwrite the data set and inform us where to find the summary of the output. It displays the summary on pressing OK.

     

  4.  

  5. Take a look at the dataset Ex2 of Examples.xls below.

  6. Variable_1 and Variable_3 contain all numeric values and some missing values. Variable_2 has non-numeric values and some missing values. In this example we will see how the missing values can be replaced by the Mean and Median of the column.

    Click XLMiner --> Data Utilities -->  Missing Data Handling..

    • Select variable_1. Click on the box for "How do you want to ---" and select Mean. Click on "Apply this option to selected variable(s)".

    • Select Variable_3.  Click on the box for "How do you want to ---" and select Median. Click on "Apply this option to selected variable(s)".

    • Let us maintain the default option of No treatment for Variable_2.

    • Click "OK".

    The results are shown below.

    • The missing values whether empty or invalid cells, in the column for Variable_1 are replaced by the mean of the remaining numbers in that column.

    • The missing values whether empty or invalid cells, in the column for Variable_3 are replaced by the median of the remaining numbers in that column.

    • The record containing missing value in column for Variable_2 is not treated since we maintained the default.

  7. See the dataset Ex3 of Examples.xls.
  8. Click XLMiner --> Data Utilities -->  Missing Data Handling.

    We will replace the missing values by mode for Variable_1 and Variable_2 and Variable_3. Note that Variable_3 is a date variable.

    • Select Variable_2.  Click on the box for "How do you want to ---" and select Mode. The options Mean and Median do not appear in the list since Variable_2 contains non-numeric values. Click on "Apply this option to selected variable(s)".

    •  Select Variable_3.  Click on the box for "How do you want to ---" and select Mode. The options Mean and Median do not appear in the list since Variable_3 contains dates. Click on "Apply this option to selected variable(s)".

    • Click "OK".

    The Results are shown below.

    • The missing values in the column for Variable_2 are replaced by the mode, the most frequently occuring value in that column. Though the data in this column is non-numeric, XLMiner™ has done the replacement with the appropriate non-numeric value.

    • Record no. 3 and 9 were the missing values for Variable_3. Note that they are replaced by the mode, ie the most frequently appearing date in the column. This indicates that XLMiner™ maintained the type of variable here. 

  9. Here is dataset Ex4 of Examples.xls..

    Click XLMiner --> Data Utilities -->  Missing Data Handling.

    We will see how XLMiner™ replaces the missing values with User Specified Values.

    • Select variable_1. Click on the box for "How do you want to ---" and select User specified value. Another box appears prompting you to enter the value you specify. Let us enter 100 here. Click on "Apply this option to selected variable(s)".

    • Select variable_2. Click on the box for "How do you want to ---" and select User specified value. Another box appears prompting you to enter the value you specify. Let us enter 100 here. Click on "Apply this option to selected variable(s)".

    • Let us maintain the default option of No treatment for Variable_3.

    • Click "OK".

    • The Results are shown below.

    • The missing values in the column for Variable_1 are replaced by 100 as specified by the user.

    • Same happens in the column for Variable_2 though the data is non- numeric here.

    • The column for Variable_3 remains untreated.

  10. See the dataset Ex5 of Examples.xls. There are a few cells containing a particular entry, -999 in this case. XLMiner provides a facility to detect a given value and replace it the way you want. This is especially useful when we have to handle the boundary cases.

  11. XLMiner will identify this missing value and replace it in the columns as specified.

    Click XLMiner™ --> Data Utilities -->  Missing Data Handling.

    • Check "Missing values are represented by this value". Enter -999 in the box provided.

    • Select variable_1 and instruct XLMiner to replace the missing values ie wrong or empty cells and -999 by mean of the column.

    • Similarly, enter "User specified value" for variable_2 (enter "zzz" as the replacement ) and mode for variable_3.

    • The results are shown below.

    • Note that in column 1, the specified missing code is replaced by the mean of the column.
    • In column 2, the missing values for variable_2 are replaced by the user specified value and for variable_3 by the mode of the column.
  12. Let us take a look at one more dataset, Ex6 of Examples.xls. Let us apply the missing value handling procedures on it.

      Click XLMiner™ --> Data Utilities -->  Missing Data Handling.

      Apply the following procedures to the various columns.

      • Check "Missing values are represented -- " and enter 33 there. 

      • Select "Delete record" for column for Variable_1.

      • Choose mode to replace the missing values in column for Variable_2.

      • Specify a value, 9999, for missing values in column for Variable_3.

  13. See the output.

    • Record number 7 and 12 are deleted because there were missing values for variable_1 and we chose "Delete Record" for variable_1.

    • In column 2 the missing values are replaced by the mode. In this column record no. 7 was a missing value. Instead of getting replaced by mode the record got entirely deleted as per the Delete record instructions in column 1 (record 7 had a missing value in column  1 as well). Note that "Delete record" holds priority over any other instruction in Missing Data Handling.

    • In column 3, we instructed XLMiner™ to treat 33 as missing value. So 33, and the other missing values were replaced by the user specified value. Again here, record no.12 had 33 which should have been replaced by 9999. But since column 1 also contained a missing value there, the instruction "Delete record" got preference and the entire record was deleted.

    •  

      See also: