Transform Categorical Data

Examples:

Data Size: Different versions of XLMiner™ have different limits on the size of data they can handle. The size of the data used in the example below may not be supported by your version. Refer to Data Handling Specifications for details.

Let us apply this utility to Irisfacto.xls, a small dataset derived from Iris.xls, to understand the features of Create Dummies and Create Category Scores.

                            

In this dataset, Species_name is a string variable.

  1. Select XLMiner --> Data Utilities --> Transform Categorical Data --> Create Dummies.

  2. Select Species_name and click OK.

    See the output.

    Interpretation:

    As seen above, the variable Species_name is expressed as two dummy variables, Species_name_Verginica and Species_name_Versicolor. They act as switches: Species_name_Verginica takes the value 1 only when Species_name = "Verginica" in the dataset; otherwise it is 0. The same is true for the other dummy variable, Species_name_Versicolor.

    The variable Species_name takes one more value in the dataset, "Setosa", so you may wonder why there is no dummy variable Species_name_Setosa. Look at the values of the two dummy variables for Row Id = 3: both are zero, and the value of Species_name for this third record is "Setosa". Whenever both dummy variables are 0, the value must be "Setosa", so a separate column for Species_name_Setosa would be redundant and is not created.

    In this way, XLMiner™ converts a string variable into numeric 0/1 dummy variables, and the dataset is now numeric.

  3. Select XLMiner --> Data Utilities --> Transform Categorical Data --> Create Category Scores.

  4. Select Species_name and retain the default option, Assign numbers 1,2,3....

    Interpretation:

    The output shows that XLMiner™ sorts the values of this variable alphabetically and assigns the numbers 1,2,3... to them (starting from 1 because we selected Assign numbers 1,2,3...). A new variable, Species_name_ord, is created to store these assigned numbers. If we had selected Assign numbers 0,1,2... instead, Species_name_ord would take the values 0,1,2.... Thus the string values of Species_name are replaced by numeric category scores.
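For readers who want to check the two transformations outside the add-in, the logic described in the interpretations above can be sketched in a few lines of Python. This is only an illustration, not XLMiner's own implementation: the function names create_dummies and create_category_scores are hypothetical, the four sample rows are made up (with "Setosa" placed in the third record to mirror the example), and the rule of omitting the alphabetically first category is an assumption that happens to match the missing Species_name_Setosa column.

```python
# Illustrative sketch only; these helpers are hypothetical and are not part of XLMiner.

def create_dummies(values, prefix):
    """Return (baseline, columns): one 0/1 column per category except the baseline.

    Assumption: the alphabetically first category is the one left out.
    """
    categories = sorted(set(values))
    baseline, others = categories[0], categories[1:]
    columns = {
        f"{prefix}_{cat}": [1 if v == cat else 0 for v in values]
        for cat in others
    }
    return baseline, columns


def create_category_scores(values, start=1):
    """Sort the distinct values alphabetically and number them from `start`."""
    categories = sorted(set(values))
    score = {cat: i for i, cat in enumerate(categories, start=start)}
    return [score[v] for v in values]


# Made-up sample rows; the third record is "Setosa", as in the example above.
species = ["Verginica", "Versicolor", "Setosa", "Versicolor"]

baseline, dummies = create_dummies(species, "Species_name")
scores_from_1 = create_category_scores(species, start=1)  # Assign numbers 1,2,3...
scores_from_0 = create_category_scores(species, start=0)  # Assign numbers 0,1,2...

print("Omitted category:", baseline)
for name, column in dummies.items():
    print(name, column)
print("Species_name_ord (from 1):", scores_from_1)
print("Species_name_ord (from 0):", scores_from_0)
```

In this sketch the third record has 0 in both dummy columns, which is how "Setosa" is recognised without a column of its own, and the category scores differ between the two options only by the starting number.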