Transform Categorical Data

Introduction

Not all datasets are purely numeric. They often contain a few non-numeric columns, which makes it difficult to apply standard procedures to them. The Transform Categorical Data utility can be applied to such datasets: it provides options for converting their string variables into numeric values.

Transform Categorical Data provides the following options:

  1. Create Dummies: In this method, a string variable is transformed into dummy variables, one for each distinct value it takes except the lowest. XLMiner™ can handle string variables with up to 30 distinct values. These dummy variables act as switches. Suppose dummy variables are created for a variable Addr that takes four distinct string values, say value1, value2, value3 and value4. Assume further that, when sorted, their order is value3, value1, value2, value4 (that is, value3 is the string that comes first alphabetically, or has the lowest value). Three dummy variables, Addr_value1, Addr_value2 and Addr_value4, will be created. Addr_value1 equals 1 only when Addr = "value1" and equals 0 otherwise. The other dummy variables behave in the same way.

    In this way the variable Addr is converted into numeric binary variables. All three dummy variables equal zero when Addr = "value3". XLMiner™ avoids creating a fourth dummy variable here because Addr = "value3" is implied when all the other dummy variables are zero. Thus no column is created for the lowest value of Addr. This procedure can be applied to all the non-numeric categorical variables in the dataset at once, provided each has no more than 30 distinct values.

  2. Create Category Scores: In this procedure, the distinct values of a string variable are sorted and the numbers 1, 2, 3, ... are assigned to them, starting from the string with the lowest value. The assigned numbers are stored in a newly created variable, so the string variable is converted into a numeric categorical variable. XLMiner™ can assign the numbers 1, 2, 3, ... or 0, 1, 2, ... as the user requires.
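
The two options above can be sketched in plain Python. This is only an illustration of the logic as described here, not XLMiner's own implementation. The function names, the `max_distinct` parameter and the sample Addr values are hypothetical, and Python's default string ordering is assumed to stand in for whatever sort order XLMiner actually applies.

```python
def create_dummies(values, prefix, max_distinct=30):
    """Build 0/1 dummy columns for a string variable.

    The lowest category in sorted order gets no column: it is implied
    when every dummy is 0. The limit of 30 mirrors the text above.
    """
    categories = sorted(set(values))  # assumption: Python's default string order
    if len(categories) > max_distinct:
        raise ValueError(f"more than {max_distinct} distinct values")
    kept = categories[1:]  # drop the lowest category (the baseline)
    return {
        f"{prefix}_{cat}": [1 if v == cat else 0 for v in values]
        for cat in kept
    }


def create_category_scores(values, start=1):
    """Map each distinct string to start, start + 1, ... in sorted order."""
    categories = sorted(set(values))
    score = {cat: i for i, cat in enumerate(categories, start=start)}
    return [score[v] for v in values]


# Hypothetical sample column; sorted distinct values are
# east, north, south, west, so "east" is the baseline.
addr = ["north", "east", "west", "east", "south"]

dummies = create_dummies(addr, "Addr")
for name, column in dummies.items():
    print(name, column)

print(create_category_scores(addr))           # scores starting at 1
print(create_category_scores(addr, start=0))  # scores starting at 0
```

In this sample, the rows where Addr is "east" have 0 in all three dummy columns, which is how the omitted baseline category is recovered. Passing `start=0` corresponds to the 0, 1, 2, ... numbering option.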

See also: