Open the file Wine.xls in Excel. The figure below shows the
data set, where each row represents a sample of wine taken from one of three wineries
(A, B and C). In this problem, the Type variable representing
the winery is ignored and the clustering is performed simply on the basis of
the properties of the wine samples (the remaining variables).

In XLMiner™, select Data Reduction and Exploration --> k-Means
clustering. The following dialog box will appear:

Data Range: Specifies the range of input data used for partitioning.
XLMiner™ automatically picks the active data range. You can also enter the
range address, or select it with the mouse.
Variables: This box lists all the variables present in the dataset. If the
"First row contains headers" box is checked, the header row above the data is
used to identify variable names.
Select all the variables except Type.
Click on Next to advance to the second dialog box.

Normalize input Data: Normalizing
the data is important to ensure that the distance measure accords equal
weight to each variable -- without normalization, the variable with the
largest scale will dominate the measure.
# Clusters: Select the number of final
clusters to be formed. This is the
parameter k in k-means clustering. The number of clusters should be at
least 2 and at most the number of observations in the data range.
Set this value based on your best estimate of how many clusters there will be;
it is a good idea to repeat the procedure with several different values.
# Iterations: This determines how
many times the program will start with an initial partition and follow through
with the clustering algorithm. The configuration of clusters (and how well
they separate the data) may differ from one starting partition to
another. The program will go through the specified number of iterations
and select the cluster configuration that minimizes the distance measure.
Options: With Fixed start,
XLMiner™ starts building the model from a single fixed starting point.
If you select Random starts, the algorithm starts from a random point.
You have to specify the No. of starts, and
XLMiner™ generates that many cluster sets. It then
decides which is the best one and reports the output generated using
the best cluster set. You also have the
option of fixing the seed when you select Random starts.
For this example, we will use the following
settings.

Select these options to display the corresponding output.

Click the Finish button to get the results. You will see:


XLMiner™ calculates the sum of squared
distances for each start and selects the Best start, the one with the smallest
sum. It then generates the remaining output
using the Best start as the starting point.
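
The normalization and best-start ideas described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not XLMiner™'s actual implementation: z-score scaling and Euclidean distance are assumed, the function names (zscore, kmeans_once, best_start) are hypothetical, and the toy data is made up rather than taken from Wine.xls.

```python
import math
import random


def zscore(rows):
    """Rescale every column to mean 0 and standard deviation 1 (assumed scheme)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 for a constant column to avoid dividing by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two records."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(row, centers):
    """Index of the center closest to the given record."""
    return min(range(len(centers)), key=lambda j: sq_dist(row, centers[j]))


def kmeans_once(rows, k, iterations, rng):
    """One start: pick k records as initial centers, then refine them."""
    centers = [list(r) for r in rng.sample(rows, k)]
    for _ in range(iterations):
        labels = [nearest(r, centers) for r in rows]
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    labels = [nearest(r, centers) for r in rows]
    total = sum(sq_dist(r, centers[lab]) for r, lab in zip(rows, labels))
    return total, centers, labels


def best_start(rows, k, iterations=10, starts=5, seed=12345):
    """Run several random starts; keep the smallest sum of squared distances."""
    rng = random.Random(seed)  # fixing the seed makes the runs repeatable
    runs = [kmeans_once(rows, k, iterations, rng) for _ in range(starts)]
    return min(runs, key=lambda run: run[0])


# Toy data (made up): two columns on very different scales.
data = [[1.0, 100.0], [1.2, 110.0], [0.9, 95.0],
        [5.0, 900.0], [5.2, 950.0], [4.8, 920.0]]
total, centers, labels = best_start(zscore(data), k=2)
print(labels, round(total, 4))
```

Without the zscore step, the second column would dominate every distance simply because its values are larger.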

In the output for "cluster centers"
above, the upper box shows the variable values at the cluster centers. The
lower box shows the distance between those cluster centers.
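
As a rough illustration of what the lower box contains, the pairwise distances between cluster centers can be computed as below. The sketch assumes Euclidean distance, and the center values are made-up numbers, not the ones from the Wine output.

```python
import math

# Hypothetical cluster centers (made-up values, two variables each).
centers = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Symmetric matrix: entry [i][j] is the distance from center i to center j.
center_distances = [[euclidean(a, b) for b in centers] for a in centers]
for row in center_distances:
    print([round(d, 2) for d in row])
```

The diagonal is always zero, and larger off-diagonal values indicate clusters that are further apart.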

Data summary shows how many
records (observations) there are in each cluster, and the average distance from
cluster members to the center of the cluster.
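
A summary of this kind could be derived as in the following sketch, which counts the records in each cluster and averages their distance to the cluster center. The records, labels and centers are hypothetical values chosen for illustration, and Euclidean distance is assumed.

```python
import math
from collections import defaultdict

# Hypothetical records, their cluster labels, and the cluster centers.
records = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 14.0], [10.0, 12.0]]
labels = [0, 0, 1, 1, 1]
centers = [[0.0, 1.0], [10.0, 12.0]]


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Collect each record's distance to the center of its own cluster.
distances = defaultdict(list)
for rec, lab in zip(records, labels):
    distances[lab].append(euclidean(rec, centers[lab]))

# Per cluster: (number of records, average distance to the center).
summary = {lab: (len(d), sum(d) / len(d)) for lab, d in sorted(distances.items())}
for lab, (count, avg) in summary.items():
    print(f"cluster {lab}: {count} records, average distance {avg:.3f}")
```

A small average distance suggests a tight cluster; a large one suggests the records are spread widely around the center.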

The final part of the output, above, shows the
cluster to which each record belongs and its distance to each of the
cluster centers. Note that, for record 5, the distance to cluster 1 is the
minimum distance, so record 5 is assigned to cluster 1.
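
The assignment rule can be sketched as follows: compute the record's distance to every cluster center and pick the smallest. The record and centers here are made-up values (not record 5 of the Wine data), and Euclidean distance is an assumption.

```python
import math

# Hypothetical cluster centers and a single record to classify.
centers = [[1.0, 1.0], [5.0, 5.0], [9.0, 1.0]]
record = [2.0, 1.0]


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Distance from the record to each cluster center, in cluster order.
distances = [euclidean(record, c) for c in centers]

# Clusters are numbered from 1, as in the output described above.
cluster = distances.index(min(distances)) + 1
print(distances, "-> cluster", cluster)
```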