Introduction
Not all datasets are purely numeric. Most of the time they contain a few
non-numeric columns, which makes it difficult to apply standard procedures
to them. Transform Categorical Data can be applied to such datasets: this
utility provides options for converting the string variables in such
datasets into numeric values.
Transform Categorical Data provides the following options:
- Create Dummies: In this method, a string variable is transformed into
dummy variables, one for every distinct value it takes except the lowest,
as explained below. XLMiner™ can handle string variables with up to 30
distinct values.
These dummy variables act as switches. Suppose dummy variables are created
for a variable Addr that takes four distinct string values, say value1,
value2, value3 and value4, and suppose further that when the actual strings
are sorted alphabetically, their order is value3, value1, value2, value4
(that is, value3 comes first alphabetically and so has the lowest value).
Three dummy variables, Addr_value1, Addr_value2 and Addr_value4, will be
created. Addr_value1 equals 1 only when Addr = "value1" and equals 0
otherwise; the other two dummy variables behave the same way. In this way,
Addr is converted into numeric binary variables. When Addr = "value3", all
three dummy variables equal 0. XLMiner™ does not create a fourth dummy
variable because Addr = "value3" is implied whenever all the other dummy
variables are 0; thus no column is created for the lowest value of Addr.
This procedure can be applied to all the non-numeric categorical variables
in the dataset at once, provided each has no more than 30 distinct values.
- Create Category Scores: In this procedure, the distinct values of a
string variable are sorted, and the numbers 1, 2, 3, ... are assigned to
them, starting with the string of lowest value. These numbers are stored in
a newly created variable, so the string variable is converted into a
numeric categorical variable. XLMiner™ can assign either 1, 2, 3, ... or
0, 1, 2, ..., as the user requires.
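The dummy-coding rule described under Create Dummies can be sketched in Python as follows. This is a minimal illustration, not XLMiner's implementation: the function name `create_dummies` and the `MAX_DISTINCT` constant are hypothetical, and Python's default string ordering (by Unicode code point, so uppercase letters sort before lowercase) stands in for whatever alphabetical sort XLMiner actually uses.

```python
# Hypothetical sketch of dummy coding as described above; not XLMiner code.
MAX_DISTINCT = 30  # distinct-value limit quoted in this help page


def create_dummies(column, name):
    """Return {dummy_name: list of 0/1} for a list of string values.

    One dummy is created per distinct value except the lowest (first in
    sorted order); that value is implied when every dummy is 0.
    """
    levels = sorted(set(column))  # code-point order, an assumption
    if len(levels) > MAX_DISTINCT:
        raise ValueError(
            f"{name} has {len(levels)} distinct values; limit is {MAX_DISTINCT}"
        )
    # levels[0] is the lowest value, so it gets no column of its own.
    return {
        f"{name}_{level}": [1 if value == level else 0 for value in column]
        for level in levels[1:]
    }


addr = ["Boston", "Albany", "Chicago", "Albany", "Denver"]
for dummy, values in create_dummies(addr, "Addr").items():
    print(dummy, values)
# Addr_Boston [1, 0, 0, 0, 0]
# Addr_Chicago [0, 0, 1, 0, 0]
# Addr_Denver [0, 0, 0, 0, 1]
```

Here the lowest value is "Albany", so no Addr_Albany column is created; the rows where Addr is "Albany" are the ones in which all three dummies are 0.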
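The Create Category Scores option can likewise be sketched as a small Python function. This too is an illustration under stated assumptions, not XLMiner's code: the name `create_category_scores` and its `start` parameter (1 for the numbering 1, 2, 3, ... or 0 for 0, 1, 2, ...) are hypothetical, and Python's code-point string ordering stands in for XLMiner's sort.

```python
# Hypothetical sketch of category scoring as described above; not XLMiner code.
def create_category_scores(column, start=1):
    """Replace each string with its rank among the sorted distinct values.

    start=1 numbers the values 1, 2, 3, ...; start=0 numbers them 0, 1, 2, ...
    """
    if start not in (0, 1):
        raise ValueError("start must be 0 or 1")
    levels = sorted(set(column))  # lowest string first
    score_of = {level: i for i, level in enumerate(levels, start=start)}
    return [score_of[value] for value in column]


addr = ["Boston", "Albany", "Chicago", "Albany", "Denver"]
print(create_category_scores(addr))           # [2, 1, 3, 1, 4]
print(create_category_scores(addr, start=0))  # [1, 0, 2, 0, 3]
```

The sorted distinct values are Albany, Boston, Chicago, Denver, so Albany receives the first score and Denver the last; the new list plays the role of the newly created numeric variable.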