XLMiner Capabilities

XLMiner provides a comprehensive set of analysis features based both on statistical and machine learning methods. A problem or a data set can be analyzed by several methods. It is usually a good idea to try different approaches, compare their results, and then choose a model that suits the problem well.

Limits

XLMiner can work with data sets that exceed Excel's limits. The standard procedure is to sample data from a larger database, bring the sample into Excel to fit a model, and, in the case of supervised learning routines, score output back out to the database. In the standard edition of XLMiner, this feature is supported for Oracle, SQL Server and Access databases. It is not available in the education or free web trial versions. The free web trial version handles a maximum of 200 records per partition.

Operations

There are six broad groups of operations in XLMiner:

Partitioning

A data set with known values of an outcome (response) variable is necessary to train a data mining model. To train a model, we usually choose, at random, a fraction of the available data: the training partition. Trained models can then be applied to another partition of the same data set, the validation partition, to see how well they do with data they were not trained on. In this phase, models can be adjusted and the best-performing model selected. After a final model is selected, it can be applied to a third partition, the test partition, to see how well it will do with data that were used in neither training nor validation. XLMiner also supports partitioning with oversampling, used when rare events are modeled and you need to ensure an adequate supply of those events in the modeling process.

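
The partitioning idea can be illustrated outside XLMiner with a short Python sketch. This is not XLMiner's implementation: the function names, the 50/30/20 split, the seed, and the rule of using half of the rare records for training are all assumptions made for illustration.

```python
import random

def partition(records, fractions=(0.5, 0.3, 0.2), seed=12345):
    """Shuffle records and split them into training, validation and test partitions.

    The fractions and the seed are illustrative defaults, not XLMiner settings.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * fractions[0])
    n_valid = int(len(shuffled) * fractions[1])
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # whatever remains
    return train, valid, test

def oversampled_training(records, is_rare, rare_share=0.5, seed=12345):
    """Draw a training partition in which rare records make up about `rare_share`.

    Assumption: half of the rare records go to training, so the other half
    stay available for validation. Sampling is without replacement.
    """
    if not 0 < rare_share <= 1:
        raise ValueError("rare_share must be in (0, 1]")
    rng = random.Random(seed)
    rare = [r for r in records if is_rare(r)]
    common = [r for r in records if not is_rare(r)]
    n_rare = len(rare) // 2
    n_common = min(len(common), round(n_rare * (1 - rare_share) / rare_share))
    train = rng.sample(rare, n_rare) + rng.sample(common, n_common)
    rng.shuffle(train)
    return train
```

Fixing the seed makes the partitions reproducible, which helps when comparing several models on the same split.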

Classification

When the outcome variable is discrete or categorical, the objective of the data mining exercise is to classify the records into the discrete classes or categories.

XLMiner offers several techniques for classification:

Prediction

When the outcome variable is continuous, the objective is to predict the value of the outcome variable for each of the data records.

XLMiner offers the following methods of prediction:

Affinity Analysis

Some problems involve detecting association among the properties of data records. XLMiner supports generation of Association Rules for showing which attributes of the data occur frequently together. One common application is to determine groups of products customers are likely to buy together, also known as Market Basket Analysis.
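
To make the idea of an association rule concrete, here is a minimal Python sketch that computes support and confidence for rules between pairs of single items. It is a simplified illustration, not XLMiner's algorithm; the function name `pair_rules` and the threshold defaults are assumptions.

```python
from itertools import combinations

def pair_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) for item pairs.

    support    = share of baskets containing both items
    confidence = share of baskets with the antecedent that also hold the consequent
    """
    baskets = [set(t) for t in transactions]
    n = len(baskets)
    items = sorted(set().union(*baskets))
    count = {item: sum(item in b for b in baskets) for item in items}
    rules = []
    for a, b in combinations(items, 2):
        both = sum(a in bk and b in bk for bk in baskets)
        support = both / n
        if support < min_support:
            continue  # the pair occurs too rarely to be of interest
        for x, y in ((a, b), (b, a)):
            confidence = both / count[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return rules

baskets = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
]
for antecedent, consequent, support, confidence in pair_rules(baskets):
    print(f"{antecedent} -> {consequent}: support={support:.2f}, confidence={confidence:.2f}")
```

In this toy data, every basket with butter also contains bread, so the rule from butter to bread has a confidence of 1.0 with a support of 0.5.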

Time Series

XLMiner offers time series forecasting, with the exploratory techniques ACF (autocorrelation function) and PACF (partial autocorrelation function), smoothing techniques (moving average, exponential, double exponential and Holt-Winters), as well as ARMA and ARIMA modeling.
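
Two of the techniques named above, simple exponential smoothing and the sample autocorrelation function, are short enough to sketch in Python. The code below uses the textbook definitions, not XLMiner's code; the function names and the choice to start the smoothed series at the first observation are assumptions.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: s[t] = alpha * x[t] + (1 - alpha) * s[t-1].

    Assumption: the smoothed series is initialized with the first observation.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]

print(exponential_smoothing([10, 20, 30], alpha=0.5))  # [10, 15.0, 22.5]
print(acf([1, 2, 3, 4, 5], max_lag=1))                 # [1.0, 0.4]
```

A larger `alpha` makes the smoothed series follow recent observations more closely; a smaller one gives more weight to history.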

Data Reduction and Exploration

It is often useful or necessary to reduce the dimensionality of data into only a few attributes that matter more than others. In this situation, we do not attempt to classify or predict an outcome variable. Instead, the objective is to discover similarities in records and group them together using the available attributes (variables).

One such method involves deciding which variables matter most in explaining differences among records. Other methods categorize data into clusters that can be represented as a new categorical variable added to the data.

XLMiner supports the following methods of data exploration and reduction:

Output presentation and graphics

XLMiner provides special graphics to enhance the understanding of the data and the analysis outcomes. For instance, tree diagrams in classification and regression trees, and dendrograms in hierarchical clustering give very useful insights.

You can also use Excel's built-in features to work with XLMiner output. For instance, histograms, scatter plots and bubble plots give useful insight into the data and the fitted outcomes. Lift charts and gain charts can be easily generated from XLMiner outputs to see the benefit produced by the data mining exercise.
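
The numbers behind a gain chart or a lift chart are simple to compute from scored records. The Python sketch below assumes each record has a model score and a 0/1 actual outcome; the function names and the sample data are hypothetical and do not reflect XLMiner's output format.

```python
def cumulative_gains(scores, actuals):
    """Rank records by score (highest first) and count positives cumulatively."""
    ranked = sorted(zip(scores, actuals), key=lambda pair: pair[0], reverse=True)
    gains, total = [], 0
    for _, actual in ranked:
        total += actual
        gains.append(total)
    return gains

def lift_at(scores, actuals, fraction):
    """Positive rate in the top `fraction` of ranked records over the overall rate."""
    gains = cumulative_gains(scores, actuals)
    k = max(1, round(fraction * len(gains)))
    model_rate = gains[k - 1] / k
    base_rate = gains[-1] / len(gains)
    return model_rate / base_rate

# Made-up scores and outcomes, for illustration only.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05, 0.6, 0.5]
actuals = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
print(cumulative_gains(scores, actuals))  # [1, 2, 2, 3, 3, 3, 3, 3, 3, 3]
print(round(lift_at(scores, actuals, 0.2), 2))  # 3.33
```

Plotting the cumulative counts against the number of records gives the gain chart; a lift above 1 in the top-ranked records means the model finds positives faster than random selection would.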

© 2006 Resampling Stats, Inc.