XLMiner Capabilities
XLMiner provides a comprehensive set of analysis features based on
both statistical and machine learning methods. A problem or data
set can be analyzed by several methods. It is usually a good idea to
try different approaches, compare their results, and then choose the
model that best suits the problem.
Limits
XLMiner can work with data sets that exceed Excel's size limits. The standard procedure is to sample data from a larger database, bring the sample into Excel to fit a model, and, for supervised learning routines, write the scored output back to the database. The standard edition of XLMiner supports this for Oracle, SQL Server and Access databases; it is not available in the education or free web trial versions. The free web trial version handles a maximum of 200 records per partition. More detailed information on XLMiner's capabilities and limits is available here.
Operations
There are six broad groups of operations in XLMiner:
| Partitioning |
A data set with known values of an outcome (response) variable is
necessary to train a data mining model. To train a model, we usually
choose (at random) a fraction of the available data -- the training
partition. Trained models can then be applied to another partition
-- the validation partition -- of the same data set to see how well
they do with data that they were not trained on. In this phase,
models can be adjusted and the best performing model selected. After
a final model is selected, it can be applied to a third partition
-- the test partition -- to test how well the final model will do
with data that have been used neither in training nor in validation.
XLMiner also supports partitioning with oversampling, used when rare
events are modeled and you need to ensure an adequate supply of those
events in the modeling process.
|
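The three-way split described above can be sketched in a few lines of Python. This is only an illustration of the idea, not XLMiner's own procedure: the function name `partition`, the default fractions, and the seed are all assumptions made for the example.

```python
import random

def partition(records, train_frac=0.5, valid_frac=0.25, seed=12345):
    """Randomly split records into training, validation, and test partitions.

    The fractions and seed are illustrative defaults, not XLMiner settings.
    """
    if train_frac < 0 or valid_frac < 0 or train_frac + valid_frac > 1:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = int(len(shuffled) * train_frac)
    n_valid = int(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # everything not used above
    return train, valid, test

train, valid, test = partition(range(100))
print(len(train), len(valid), len(test))
```

Every record lands in exactly one partition, so the test partition is untouched by both model fitting and model selection.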
| Classification |
When the outcome variable is discrete or
categorical, the objective of the data mining exercise is to classify
the records into the discrete classes or categories.
XLMiner offers several techniques for classification:
|
| Prediction |
When the outcome variable is continuous, the
objective is to predict the value of the outcome
variable for each of the data records.
XLMiner offers the following methods of prediction:
|
| Affinity Analysis |
Some problems involve detecting associations among the attributes
of data records. XLMiner supports generation of Association Rules,
which show which attributes of the data frequently occur together.
One common application is to determine groups of products customers
are likely to buy together, also known as Market Basket Analysis. |
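To make the idea of an association rule concrete, the sketch below counts how often pairs of items appear together and reports rules that clear a support and a confidence threshold. It is a deliberately simplified illustration restricted to item pairs; the function name `pair_rules`, the thresholds, and the sample baskets are hypothetical and do not describe XLMiner's algorithm.

```python
from collections import Counter
from itertools import combinations

def pair_rules(baskets, min_support=0.5, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) for item pairs."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicate items within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)

# Made-up baskets, for illustration only.
baskets = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
]
for lhs, rhs, support, confidence in pair_rules(baskets):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

Support measures how common the pair is overall, while confidence measures how often the consequent appears given the antecedent; a rule is usually reported only when both are high enough.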
| Time Series |
XLMiner offers time series forecasting, with the exploratory techniques ACF (Autocorrelation function) and PACF (Partial autocorrelation function), smoothing techniques (moving average, exponential, double exponential and Holt-Winters), as well as ARMA and ARIMA modeling. |
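As a rough illustration of three of the techniques named above, the following sketch implements a trailing moving average, simple exponential smoothing, and the sample autocorrelation at a single lag. The function names and the choice to initialise the smoother with the first observation are assumptions for this example, not a description of XLMiner's implementation.

```python
def moving_average(series, window):
    """Trailing moving average: one value for each full window."""
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]

def exponential_smoothing(series, alpha):
    """Simple exponential smoothing, started at the first observation."""
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

def acf(series, lag):
    """Sample autocorrelation of the series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t + lag] - mean)
              for t in range(n - lag))
    return num / denom

data = [1, 2, 3, 4, 5]
print(moving_average(data, 3))
print(exponential_smoothing(data, 0.5))
print(acf(data, 1))
```

A larger `window` or a smaller `alpha` gives a smoother but slower-reacting series; the autocorrelation values at successive lags are what an ACF plot displays.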
| Data Reduction and Exploration |
It is often useful or necessary to reduce the
dimensionality of the data to the few attributes that matter
more than others. In this situation, we do not attempt to
classify or predict an outcome variable. Instead, the objective is to
discover similarities in records and group them together using
the available attributes (variables).
One such method involves deciding which variables matter
most in explaining differences among records. Other methods
categorize data into clusters that can
be represented as a new categorical variable added to the
data.
XLMiner supports the following methods of data exploration
and reduction:
|
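The idea of grouping records into clusters and storing the result as a new categorical variable can be sketched as follows. k-means is used here purely as an assumed example of a clustering method; the function name `kmeans`, the naive initialisation, and the sample records are hypothetical.

```python
def kmeans(points, k, iterations=20):
    """Assign each point (a tuple of numbers) to one of k clusters.

    Naive initialisation: the first k points serve as starting centroids.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return labels

# Made-up two-attribute records, for illustration only.
records = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
labels = kmeans(records, k=2)

# Attach the cluster label to each record as a new categorical variable.
augmented = [{"x": x, "y": y, "cluster": lab}
             for (x, y), lab in zip(records, labels)]
print(augmented)
```

The `cluster` field can then be used like any other categorical attribute in later analysis.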
Output presentation and graphics
XLMiner provides special graphics to enhance the understanding of
the data and the analysis outcomes. For instance, tree diagrams in
classification and regression trees, and dendrograms in hierarchical
clustering give very useful insights.
You can also use Excel's built-in features to work with XLMiner
outputs. For instance, histograms, scatter plots and bubble plots
provide insight into the data and the fitted outcomes. Lift charts
and gain charts can be easily generated from XLMiner outputs to see
the benefit produced by the data mining exercise.
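The numbers behind a gain chart and a lift value can be computed directly from scored output. The sketch below assumes each record has a model score and a 0/1 actual outcome; the function names and the sample data are hypothetical, and the code is not meant to reproduce XLMiner's charts.

```python
def cumulative_gains(scores, actuals):
    """Cumulative count of positives with records ranked by descending score."""
    ranked = sorted(zip(scores, actuals), key=lambda pair: pair[0], reverse=True)
    gains, running = [], 0
    for _, actual in ranked:
        running += actual
        gains.append(running)
    return gains

def top_fraction_lift(scores, actuals, fraction=0.1):
    """Response rate in the top-scored fraction divided by the overall rate.

    Assumes at least one positive outcome; otherwise the overall rate is zero.
    """
    gains = cumulative_gains(scores, actuals)
    top_n = max(1, int(len(scores) * fraction))
    top_rate = gains[top_n - 1] / top_n
    overall_rate = gains[-1] / len(scores)
    return top_rate / overall_rate

# Made-up scores and outcomes, for illustration only.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
actuals = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(cumulative_gains(scores, actuals))
print(top_fraction_lift(scores, actuals, fraction=0.5))
```

Plotting the cumulative gains against the number of records gives the gain chart; a lift above 1 means the model finds positives faster than selecting records at random.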