|
System
Requirements
Hardware
requirements:
Minimum:
Pentium 133 MHz
processor.
1
GB hard drive with a minimum of 50 MB free disk space.
32
MB RAM
Recommended:
Pentium 200 MHz processor
(or higher).
1
GB hard drive with minimum 60 MB free disk space.
64
MB RAM (the more the better).
Software
requirements:
Operating
system: Microsoft Windows NT 4.0 / 2000/ XP.
Microsoft
Excel 2000/XP (Please note XLMiner™
will not work on Microsoft Excel 97).
|
|
Installing
XLMiner™
To
install XLMiner™, insert the XLMiner™ CD and click on "setup.exe" file from Windows
Explorer. XLMiner™
installation dialog boxes will guide you through the installation procedure. Default path of installation is
<Drive>:\Program Files\XLMiner. It is assumed in this documentation that this is the path
to which XLMiner™
was installed.
You
can change this default path to specify a different location where you want
XLMiner™
to be
installed. After installing
XLMiner™
, the
XLMiner™ program group appears under
Start --> Programs--> XLMiner.
Now,
you can either invoke XLMiner™ directly or select the option to Register XLMiner™
as an Add-in.
Precautions:
-
While installing
XLMiner™
on Microsoft Windows NT,
2000 or XP, you may require administrative privileges. Once
XLMiner™
is
installed, you no longer need to have administrative privileges. Any user
can use
XLMiner™
.
The
folder in which XLMiner™
is installed and all its subfolders should have READ/
WRITE privileges for the software,
irrespective of which user is logged in.
Do not use
worksheet names with leading or trailing spaces. This leads to ambiguity as there might be more
than one sheet names with one or more leading or trailing spaces. Using such
names will lead to an error. However, sheet names with spaces embedded
within (not leading or trailing) are allowed.
Notes:
-
XLMiner™
appears as an item in the menu after the first time registration as an
Add-in. So you can use XLMiner™ thereafter by just invoking Excel. See
the figure below. (Remember, the edition indicator along with the name
XLMiner™ can be different depending upon the edition you have).
-
Alternatively,
in Explorer, you can locate and double-click the file XLMiner.xla in the
location where XLMiner™ has been installed, or,
-
XLMiner™
can also be invoked by the usual method - Start => Programs =>
Register XLMiner as an Add-in.
|
|
Running
XLMiner™ Procedures
Open Excel and locate the
XLMiner
menu in the top
toolbar and select it..
By clicking on the appropriate menu item, you can run any of XLMiner's procedures.
Entering the licence key
To enter the
Licence key after it is sent to you through sales@xlminer.com,
invoke "About XLMiner" from the XLMiner™ menu.
Select "Enter Licence
key". You see the following dialog.
Enter the user name and the key
supplied to you (the best way to enter the key is copy it and paste so that
there is no scope for an error). Select continue. XLMiner™ is all set for you for the set
duration.
|
|
Data types supported in
XLMiner™
Method
|
Output
variable
|
Input
variables
|
|
Continuous
|
Categorical
|
Continuous
|
Categorical
|
|
Ordinal
|
Nominal
|
Ordinal
|
Nominal
|
|
Multiple
Linear Regression
|
Y
|
N
|
N
|
Y
|
Y
|
N
|
|
k-Nearest
Neighbor Prediction
|
Y
|
N
|
N
|
Y
|
Y
|
N
|
|
Regression
Tree
|
Y
|
N
|
N
|
Y
|
Y
|
N
|
|
Discriminant
Analysis
|
N
|
Y
|
Y
|
Y
|
Y
|
N
|
|
Logistic
Regression
|
N
|
Y
(Dichotomous)
|
Y
|
Y
|
Y
|
N
|
|
Classification
Tree
|
N
|
Y
|
Y
|
Y
|
Y
|
N
|
|
Naïve
Bayes
|
N
|
Y
|
Y
|
N
|
Y
|
Y
|
|
Neural
Networks Classification
|
N
|
Y
|
Y
|
Y
|
Y
|
N
|
|
Neural
Networks Prediction
|
Y
|
N
|
N
|
Y
|
Y
|
N
|
|
k-Nearest
Neighbor Classification
|
N
|
Y
|
Y
|
Y
|
Y
|
N
|
|
Association
Rules
|
NA
|
NA
|
NA
|
N
|
Y
(Binary Only)
|
Y
|
|
Principal
Component Analysis
|
NA
|
NA
|
NA
|
Y
|
Y
|
N
|
|
k-Means
Clustering
|
NA
|
NA
|
NA
|
Y
|
Y
|
N
|
|
Hierarchical
Clustering
|
NA
|
NA
|
NA
|
Y
|
Y
|
N
|
Legend: Y -
Supported; N - Not- Supported; Ordinal
: Ordinal includes binary (0/1) variables.
Note:
- Ordinal
means numerical and categorical. For example (0, 1, 2) is ordinal but (a, b,
c)
is not.
- Dichotomous
strictly means (0, 1).
Sample datasets with
XLMiner™
XLMiner™ comes with a collection of sample
datasets in the "Datasets"
folder, which is in the base directory where
XLMiner™
is installed. These
datasets are Excel files. A description of some datasets is provided in the table below.
XLMiner™
provides extensive online help with examples for each of the procedures. The sample datasets have been used in these
examples.
Despite the small size of some of
these datasets, they are still useful illustrations of data mining
techniques. In many cases, huge datasets can be effectively dealt with by
sampling a portion of the data to build a model, then applying the conclusions
from the model to the entire dataset.
|
Dataset
Name |
Filename |
Description |
| Associations
Dataset |
Associations.xls |
This
dataset is a subset of the Charles Book Club Dataset mentioned below.
It is used to study the Association rules. |
| Boston
Housing Dataset |
Boston_Housing.xls |
This dataset contains information collected by the
US Census Bureau concerning housing in the area of Boston Massachusetts. It was obtained from the StatLib archive
(http://lib.stat.cmu.edu/datasets/boston). The dataset has 506 cases.
The data were originally published by Harrison, D. and Rubinfeld, D.L.
"Hedonic prices and the demand for clean air", J. Environ. Economics &
Management, vol.5, 81-102, 1978.
There are 14 variables:
CRIM per capita crime rate by town
ZN proportion of residential land zoned for lots over 25,000
sq.ft.
INDUS proportion of non-retail business acres per town.
CHAS Charles River dummy variable (1 if tract bounds river; 0 otherwise)
NOX nitric oxides concentration (parts per 10 million)
RM average number of rooms per dwelling
AGE proportion of owner-occupied units built prior to 1940
DIS weighted distances to five Boston employment
centers
RAD index of accessibility to radial highways
TAX full-value property-tax rate per $10,000
PTRATIO pupil-teacher ratio by town
B 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
LSTAT % lower status of the population
MEDV Median value of owner-occupied homes in $1000's |
| Charles
book club |
Charles_BookClub.xls |
The
Charles Book Club (“CBC”) was established in December of 1986, on
the premise that a book club could differentiate itself through a deep
understanding of its customer base and by delivering uniquely tailored
offerings. CBC focused on selling specialty books by direct marketing
through a variety of channels, including media advertising (TV,
magazines, newspapers) and mailing. CBC is strictly a distributor and
does not publish any of the books that it sells. In line with its
commitment to understanding its customer base, CBC built and
maintained a detailed database of its club members. Upon enrollment,
readers were required to fill out an insert and mail it to CBC.
Through this process, CBC has created an active database of 500,000
readers. CBC acquired most of these customers through advertising in
specialty magazines.
This Dataset is part of a case prepared by Ms.
Vinni Bhandari, a data mining consultant and Dr. Nitin Patel, a
visiting professor of Operations Research at the MIT Sloan School of
Management. The case has been derived from a Case Study in Database
Marketing titled 'BBB - The Bookbinders Club' prepared by Nissan Levin
and Jacob Zahavi, Tel Aviv University for the Direct Marketing
Educational Foundation, Inc. (March 1995).
The problem : CBC sent mailings to its club members each month containing its latest
offering.
On the surface, CBC looked like they were very successful, mailing
volume was increasing, book selection was diversifying and growing,
their customer database was increasing; however, their bottom line
profits were falling. The decreasing profits led CBC to revisit their
original plan of using database marketing to improve its mailing
yields and to stay profitable.
|
|
Digits data |
Digits.xls |
This
dataset was used as an example to illustrate Classification trees in
the book CART by Leo Breiman et al. The variables x1,...,x7 indicate
the state of their corresponding LED's in a seven segment display. The
variable y takes the desired value of the integer to be displayed. The
data was generated such that the probability an LED is in the correct
state (0 or 1) is 0.9.
For full reference see Leo Breiman, Jerome
Friedman, Charles J Stone, R.A. Olshen. Classification and Regression
Trees. Chapman & Hall / CRC. 1984.
x1,...,x7: State of corresponding LED
in a seven segment display.
y: Desired integer to be displayed. |
|
Fisher's Iris data |
Iris.xls |
The
Iris dataset was introduced by R. A. Fisher as an example for
discriminant analysis. The data report four characteristics (sepal
width, sepal length, petal width and petal length) of three species of
Iris flower.
The data originally appeared in
Fisher, R. A. (1936). "The Use of Multiple Measurements in Axonomic
Problems," Annals of Eugenics 7, 179-188.
- Species_No: Flower species as a
code
- Species_Name: Species name
- Petal_Width: Petal Width
- Petal_Length: Petal Length
- Sepal_Width: Sepal Width
- Sepal_Length: Sepal Length
|
| Flying
fitness of pilots |
Flying_Fitness.xls |
The data are on test results on
flying fitness tests for 40 pilots (var 1). There are six other categorical variables (named var2,through var6) indicative of the performance of the pilots on various physical and psychological tests.
Output Variable: Var 1
Input variables: Var 2 to Var 6
No. of classes: 2 |
| Utilities |
Utilities.xls |
This
dataset gives corporate data on 22 US public utilities. The goal is to
form groups of similar utilities. The objects to be clustered are the
utilities. There are 8 measurements on each utility described below.
An example where clustering would be useful is a study to predict the
cost impact of deregulation. To do the requisite analysis economists
would need to build a detailed cost model for each firm. It would save
a considerable amount of time and effort if we could cluster similar
types of utility firms, build detailed cost models for just one ”typical”
firm in each cluster, then scale up from these models to estimate
results for all the utility firms. The objects to be clustered are the
utility firms and there are 8 measurements on each utility.
This dataset can be found in Dean W.
Wichern & Richard Arnold Johnson, Applied Multivariate Statistical
Analysis, Prentice Hall, 5th edition (2002)
X1: Fixed-charge
covering ratio (income/debt)
X2: Rate of return on
capital
X3: Cost per KW
capacity in place
X4: Annual Load
Factor
X5: Peak KWH demand
growth from 1974 to 1975
X6: Sales (KWH use
per year)
X7: Percent Nuclear
X8: Total fuel costs
(cents per KWH) |
| Wine
dataset |
Wine.xls |
The
wine dataset contains properties of wine taken from three different wineries in the same region. There are 13 variables describing various properties of wine and 3 classes. This
dataset can be used for classification with Type as an output variable.
It can also be used (omitting the Type variable) to perform clustering.
This dataset can be found in the UCI
Machine Learning Repository (http://www.ics.uci.edu/~mlearn/MLSummary.html
ftp://ftp.ics.uci.edu/pub/machine-learning-databases/wine/)
|
| DMEF
Dataset |
Catalog_multi.xls |
A
data set provided by Direct Marketing Educational Foundation, a
response to direct mail offer. Several variables are used as
information parameters about the customer and a binary output variable
shows if the customer is a buyer.
|
|