|
Example:
Data
Size: Different versions of
XLMiner™ have varying limits on size of data. The size of data depicted
in the example below may not be supported by your version. Refer to Data Handling
Specifications
for details.
While
handling a time series situation, we can observe the following steps in
general.
-
Partition
the data : Partitioning the data into 60% Training and 40% Validation
generally helps to go ahead.
-
We
apply the exploratory techniques like ACF and PACF and view the plots for
Training and Validation. If they are in synch then it is worth going ahead
for model fitting. If these plots are same then we can apply the same model
to both the training and validation sets. The plots also show if there is a trend and / or
seasonality in the data. Thus ACF and PACF plots help us to tentatively
identify the model.
-
We
use ARIMA technique to fit a model.
-
When
we fit a model, XLMiner™ displays the ACF and PACF plots for residuals. If
these plots are in the band of UCL and LCL then it indicates that the
residuals are random and the model is adequate.
-
If
the residuals are not in the band, then some
correlation exists between them, so we may need to improve the
model.
Open
the data set Income.xls. Let us do a step by step model fitting and see the
outputs. This data set contains an yearly record of the statewise personal
incomes. Let us do partitioning as shown below.
-
Invoke
XLMiner --> Time Series --> Partition Data . Make the following
settings --

-
In
the dialog shown above we have selected the options for specifying the
#records. It also shows the %ge -- 70% and 30%.The numbers we have selected
here are just for getting, sort of "near ideal" output in
the given situation. That will enable us see the various qualities and
properties of the techniques we are going to use. See the output below.

See
that the partitioning is sequential here, not random. First 50 observations
go to Training set and the rest to Validation set.
-
Let's see what
inputs we get on using the exploratory techniques.

Invoke
XLMiner --> Time Series --> ACF (Autocorrelations).. Select CA as the
variable. Enter the lags as shown. Check for plotting charts.
Click
OK for getting output. See the ACF plots for Training and Validation set. We
observe the following :

-
Both
the charts exhibit a similar pattern. This means we can fit the same
model to both the sets.
-
The
ACF function shows a definite pattern, it decreases with the lags. This
means there is a trend in the data. Since the pattern does not repeat ,
we can conclude that the data does not show any seasonality.
-
Invoke
Time Series --> PACF (Partial Autocorrelations). Do
the following settings in PACF dialog.
-
Select
OK and see the output. The PACF plots also show the similar pattern in
training and validation sets.

-
Both
the charts exhibit a similar pattern. This means we can fit the same
model to both the sets.
-
The
PACF function shows a definite pattern, it decreases with the lags. This
means there is a trend in the data. Since the pattern does not repeat ,
we can conclude that the data does not show any seasonality.
-
From
ACF and PACF we can conclude that ARIMA (1,1,0) will be the best candidate.
-
Let
us now try fitting ARIMA (1,1,0). Invoke Time Series --> ARIMA. Enter the following
parameters.
Select
Advanced .. and enter the following settings.
Select
OK. Select
OK for the first ARIMA dialog. See the output.
XLMiner™
calculates and displays various parameters and charts in two output
sheets. Let us see and interprete a few outputs.
XLMiner™
has calculated the constant term and the AR1 term for our model, as seen
above. These are the constant and f1
terms of our forecasting
equation. See the following output of the Chi - square test.
We
can conclude that the fit is good since the p-value is insignificant.
XLMiner™
provides some more information in addition to these parameters.
Open
the sheet ARIMA_Residuals1. The table presents the actual and fitted values
and the resulting residuals. See the Time plot of Actual Vs Forecast
Here
we see that the Actual and Forecasted values greatly match. The usefulness of the model in forecasting will depend upon how close
the actual and forecasted values are in the Time plot of validation set
which we will see later. Let
us take a look at the ACF and PACF plots for residuals.
 Most
of the correlations are within the UCL and LCL band. This indicates that the
residuals are random, they are not correlated. This is one indication in
turn, that
ARIMA (1,1,0) is adequate for this data.
Open
the sheet ARIMA_Output1. See the Forecast table.
The
table shows the actual and forecasted value. The "Lower" and
"upper" values represent a band around the forecast with 95%
confidence interval in which the actual value may lie. Let
us take a look at the Time plot below. It is plotted with the values in the
table above. It indicates how the model which we fitted using Training data
works on validation data.
The
actual and forecasted values are fairly close. This is
a confirmation that our model is good for forecasting. If we plot the values under
the "Lower" and "Upper" column in the same chart (using
the EXCEL™ chart facility) we get
the following.
The
plot shows that the Actual values lie well inside the band created by the
upper and lower values in the table, almost always at the center of the band.
So we have fitted an adequate model.
Seasonal
Data :
What
if the data show seasonality? Consider the data set Airpass.xls. It consists
monthly records of the no. of passengers in an airline. The data are bound to be
seasonal. The airtraffic increases in the vacation months. It is moderate in
other seasons. Let us try to fit seasonal model using this data set.
-
Select
XLMiner --> Time Series --> Partition Data.
Here
we select the options to specify #records. XLMiner™ displays the %ge in
training and validation set just besides the #records.
Select
OK and see the output. As we did in the earlier example, let us find the ACF
and PACF functions. Enter the settings shown below.
Select
OK. See the outputs.
The
plots clearly show a repetition in the pattern indicating that the
data are seasonal, there is periodicity after every 12 observations, ie they
show seasonality and trend in the data. Let us view the PACF charts.

The PACF
plots also show seasonality, trend .The plots for Training set and
Validation set are similar. We can try applying the
same seasonal model to Training and Validation set.
-
We
will try to fit a seasonal model to this data. Let us try applying ARIMA (1,
1, 0) (1, 1, 0)12. This means that we are applying a
seasonal model with period =12. Selection of the value of period depends on
nature of data. In this case we can make a fair guess that the # passengers
will be more in holidays, ie every 12 months. So we select period = 12.
Select
Advanced...
Enter
the following parameters in Advanced box.
-
Select
OK. Select
OK for the first ARIMA dialog. See the output.
XLMiner™
calculates and displays various parameters and charts in two output
sheets. Let us see and interpret a few outputs.
XLMiner™
has calculated the constant term and the AR1 term for our model, as seen
above. These are the constant and f1
terms of our forecasting
equation. See the following output of the Chi - square test.
Open the sheet ARIMA_Residuals1.
The
actual and forecasted values match very well in the time plot
above.

Most
of the residuals are within the UCL and LCL band. This indicates that the
residuals are random, they are not correlated. This is one indication that
the fit is quite good.
Open
the sheet ARIMA_Output1. See the Forecast table.
The
table shows the actual and forecasted value. The "Lower" and
"upper" values represent a band around the forecast with 95%
confidence interval in which the actual value may lie. Let
us take a look at the Time plot below. It is plotted with the values in the
table above. It indicates how the model which we fitted using Training data
works on validation data.
The
actual and forecasted values are fairly close, though not as well as in the
earlier example. Still, we can conclude that this
model fits well. If we plot the values under
the "Lower" and "Upper" column in the same chart we get
the following.
The
plot shows that the Actual values lie well inside the band created by the
upper and lower values in the table, almost always at the center of the band.
So we
have fitted an adequate model.
Smoothing
:
Smoothing
techniques are useful because they are non parametric. By fine tuning the values
of a, b
and g we
can see the output easily. We will view an example in which we will use the Holt
Winters' smoothing. Let's
use the same Airpass.xls. Use the same method of partitioning as in the example
above.
Invoke
Time Series --> Smoothing --> Holt
Winter --> Multiplicative.. Let us try some tried and tested values for
a, b
and g so that we can see a good fit. These values are arrived at
by trial and error. You can try some more values later. Enter the following specifications.
Select
OK. We receive a sheet with various parameters, forecasts and error measures.
See the forecast.
The
errors are considerably less than the earlier example. See the Time plot of
Actual Vs Forecast.
The
Actual and Forecasted values are matching very well, better than in our ARIMA
example! This is possibly the best fit.
The
error measures also give us an idea how good the fit is. See the following error
measures :

See also
|