Hierarchical Clustering

 

Using Hierarchical Clustering in XLMiner™:

In XLMiner™, select Data Reduction and Exploration -> Hierarchical Clustering. Specify the data range to be processed, and move the variables of interest into the selected variables list.

Data Type: Hierarchical clustering can be applied to Raw data as well as to data that is already in Distance Matrix format. Choose the option that matches your input.
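
To make the difference between the two input formats concrete, the following Python sketch derives a distance matrix from raw records. It is only an illustration, not XLMiner's implementation: the function name distance_matrix and the sample records are hypothetical, and Euclidean distance is assumed as the measure.

```python
import math

def distance_matrix(records):
    """Return the symmetric matrix of pairwise Euclidean distances."""
    n = len(records)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(records[i], records[j])  # requires Python 3.8+
            matrix[i][j] = d
            matrix[j][i] = d  # distance is symmetric
    return matrix

# Hypothetical raw data: three records with two numeric variables each.
raw = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
dm = distance_matrix(raw)
```

Each row and column of dm corresponds to one record, and the diagonal is zero because every record is at distance 0 from itself.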

Click Next. In the dialog that appears, choose the clustering method (see the introductory section for a review of the various methods).

Normalize input data:  Normalizing the data is important to ensure that the distance measure accords equal weight to each variable -- without normalization, the variable with the largest scale will dominate the measure.
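
The effect of scale can be seen in a small Python sketch. It assumes normalization means z-score standardization (subtract the column mean, divide by the sample standard deviation); the exact method XLMiner applies is not stated here, and the function name standardize and the sample records are hypothetical.

```python
import math
import statistics

def standardize(records):
    """Rescale each column to mean 0 and sample standard deviation 1."""
    columns = list(zip(*records))
    means = [statistics.mean(col) for col in columns]
    # A constant column has standard deviation 0; fall back to 1 to avoid
    # dividing by zero (an assumption made for this sketch).
    stdevs = [statistics.stdev(col) or 1.0 for col in columns]
    return [
        tuple((x - m) / s for x, m, s in zip(row, means, stdevs))
        for row in records
    ]

# Hypothetical records: (age in years, income in dollars).
records = [(25, 30000.0), (45, 31000.0), (26, 60000.0)]
scaled = standardize(records)

# Raw distance between records 0 and 1: the 1000-dollar income gap
# swamps the 20-year age gap.
raw_distance = math.dist(records[0], records[1])

# After standardizing, both variables are on a comparable scale.
scaled_distance = math.dist(scaled[0], scaled[1])
```

On the raw data, records 0 and 1 look close only because their incomes are similar. After standardizing, their large age gap contributes to the distance as well.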

Similarity Measures: For raw numeric data, hierarchical clustering uses Euclidean Distance as the similarity measure. When the data is binary, two further options, Jaccard's coefficient and the Matching coefficient, become available.

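Suppose all the xij values are binary. For individuals i and j, the p variables can be cross-tabulated in a 2 × 2 table of counts: a is the number of variables on which both individuals are 0, d is the number on which both are 1, and b and c are the numbers on which the two individuals differ, so that a + b + c + d = p.

The most useful similarity measures in this situation are: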

Jaccard’s coefficient = d/(b + c + d). This coefficient ignores zero matches.

The matching coefficient = (a + d)/p.

So if the data is binary, choose the similarity measure that is appropriate for the application.
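
The two coefficients can be worked through in a short Python sketch. The function names and sample records are hypothetical. The counts follow the usual convention: a counts 0-0 matches, d counts 1-1 matches, and b and c count the two kinds of mismatch. Which mismatch is called b and which c is an assumption here; both formulas give the same result either way.

```python
def binary_counts(u, v):
    """Return the 2 x 2 cell counts (a, b, c, d) for two binary records."""
    if len(u) != len(v) or not u:
        raise ValueError("records must be non-empty and of equal length")
    a = sum(1 for x, y in zip(u, v) if x == 0 and y == 0)  # 0-0 matches
    b = sum(1 for x, y in zip(u, v) if x == 0 and y == 1)  # mismatch
    c = sum(1 for x, y in zip(u, v) if x == 1 and y == 0)  # mismatch
    d = sum(1 for x, y in zip(u, v) if x == 1 and y == 1)  # 1-1 matches
    return a, b, c, d

def jaccard_coefficient(u, v):
    """d / (b + c + d): zero matches (a) are ignored."""
    a, b, c, d = binary_counts(u, v)
    denominator = b + c + d
    # Undefined when both records are all zeros; returning NaN is a
    # choice made for this sketch.
    return d / denominator if denominator else float("nan")

def matching_coefficient(u, v):
    """(a + d) / p, where p = a + b + c + d is the number of variables."""
    a, b, c, d = binary_counts(u, v)
    return (a + d) / (a + b + c + d)

# Hypothetical binary records over p = 8 variables.
u = [1, 0, 0, 1, 1, 0, 0, 0]
v = [1, 0, 1, 0, 1, 0, 0, 0]

counts = binary_counts(u, v)           # (4, 1, 1, 2)
jaccard = jaccard_coefficient(u, v)    # 2 / (1 + 1 + 2) = 0.5
matching = matching_coefficient(u, v)  # (4 + 2) / 8 = 0.75
```

The four shared zeros raise the matching coefficient but leave Jaccard's coefficient unchanged. This is why Jaccard's coefficient tends to suit data in which a shared 0 (a shared absence) carries little information.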

Click Next. In the dialog that appears, choose the desired output.

Show cluster membership: Check this to display the cluster number (ID) to which each record is assigned by the routine. 

# Clusters:  Recall that the agglomerative method of hierarchical clustering keeps forming clusters until only one cluster is left.  This option lets you stop the process at a given number of clusters.
