Saturday, January 21, 2012

Time Series Intervention Analysis with R and SAS

In a previous post, I worked through the theory behind intervention analysis. In his time series course, University of Georgia political science professor Jamie Monogan demonstrates how to implement intervention analysis in R.  The following example is from this course. It investigates the impact of the terrorist attacks of 911 on president Bush's approval ratings. An excerpt from the data set follows:


   year month  approve  t s11
1  2001     1 45.41947  1   0
2  2001     2 55.83721  2   0
3  2001     3 55.91828  3   0
4  2001     4 58.12725  4   0
5  2001     5 55.79231  5   0
.
.
.
etc


Note in the plot of the series above, observation 10 represents October 2001, post 9/11/2011. It appears that there is a drastic shift in the series, that slowly decays and eventually returns to previous levels. This may indicate an intervention model with a pulse function.  Following the Box-Jenkins methodology, an ARIMA(1,0,0) model with the intervention can be specified in R as follows:

model <- arimax(bush$approve, order=c(1,0,0), xtransf=bush$s11, transfer=list(c(1,0)))

where:
bush$approve = Yt , Bush's approval ratings
order =c(1,0,0) indicates and AR(1) process
xtransf=bush#s11 is the intervention series, flagging Pt = 1 to indicate September 11, 2001
transfer = list(c(1,0))  indicates the functional form of the transfer function. The first parameter indicates the denominator, while the second indicates a numerator factor. As specified in the code above the transfer function is specified as follows:

[w0/ 1- δB ]

Based on SAS documentation (http://support.sas.com/documentation/cdl/en/etsug/60372/HTML/default/viewer.htm#etsug_arima_sect014.htm )  this could also be specified in SAS via PROC ARIMA as follows:

   proc arima data=bush;
      identify var=approve crosscorr=s11;
      estimate p =1  input=( / (1) s11 );
   run;

Using R, the estimated model coefficients are:
Coefficients:
         ar1  intercept  T1-AR1   T1-MA0
      0.8198    58.2875  0.8915  23.6921
s.e.  0.1184     4.6496  0.0166   3.9415

Based on the output above, w0 = 23.6921 and δ = .8915. The plot below shows the fitted values which look very similar the to graphs for this specification shown in the previous post.


The code below is was used to conduct this analysis, and is excerpted from Jamie Monogan's original R program.
#   ------------------------------------------------------------------
#  | PROGRAM NAME: R_ARIMAX
#  | DATE:1/21/12
#  | CREATED BY: Matt Bogard 
#  | PROJECT FILE: /Users/wkuuser/Desktop/Briefcase/R Programs              
#  |----------------------------------------------------------------
#  | PURPOSE: illustration of ARIMA model with transfer function/intervention series               
#  |
#  |
#  | 
#  |------------------------------------------------------------------
#  | COMMENTS: R code and data originally sourced from : http://monogan.myweb.uga.edu/teaching/ts/index.html
#  |------------------------------------------------------------------
 
#libraries
rm(list=ls())
library(foreign)
library(TSA)
 
setwd('/Users/wkuuser/Desktop/briefcase/R Data Sets') # mac 
 
#load data & view series
 
bush <- read.dta("BUSHJOB.DTA")
names(bush)
print(bush)
 
plot(y=bush$approve, x=bush$t, type='l')
 
#identify arima process
acf(bush$approve)
pacf(bush$approve)
 
#estimate arima model
mod.1 <- arima(bush$approve, order=c(0,1,0))
mod.1
 
#diagnose arima model
acf(mod.1$residuals)
pacf(mod.1$residuals)
Box.test(mod.1$residuals)
 
#Looks like I(1). Note: There is a theoretical case against public opinion data being integrated.
 
#estimate intervention analysis for september 11 (remember to start with a pulse
 
mod.2 <- arimax(bush$approve, order=c(0,1,0), xtransf=bush$s11, transfer=list(c(1,0))); mod.2
 
summary(mod.2)
 
#Notes: the second parameter directs the numerator. If 0 only, then concurrent effect only. The first parameter affects the denominator. c(0,0) replicates the effect of just doing "xreg"
#Our parameter estimates look good, no need to drop delta or switch to a step function.
 
#Graph the intervention model
y.diff <- diff(bush$approve)
t.diff <- bush$t[-1]
y.pred <- 24.3741*bush$s11 + 24.3741*(.9639^(bush$t-9))*as.numeric(bush$t>9)
y.pred <- y.pred[-1]
plot(y=y.diff, x=t.diff, type='l')
lines(y=y.pred, x=t.diff, lty=2)
 
#suppose an AR(1) process
mod.2b <- arimax(bush$approve, order=c(1,0,0), xtransf=bush$s11, transfer=list(c(1,0))); mod.2b
y.pred <- 58.2875 + 23.6921*bush$s11 + 23.2921*(.8915^(bush$t-9))*as.numeric(bush$t>9)
plot(y=bush$approve, x=bush$t, type='l')
lines(y=y.pred, x=bush$t, lty=2)
Created by Pretty R at inside-R.org

6 comments:

  1. This is very helpful! Thanks!

    ReplyDelete
  2. For anyone trying to follow along at home, you can get the approvals data set using the following command:

    require(foreign)
    bush <- read.dta("http://monogan.myweb.uga.edu/teaching/ts/BUSHJOB.DTA")

    In case that file ever goes offline, I've archived it with WebCite:

    require(foreign)
    bush <- read.dta("http://www.webcitation.org/67mAwDiBA")

    ReplyDelete
  3. Thank you at first, this post and the previous one is very helpful to understand the arima-intervention analysis. But I have two questions:

    1) I didn't get how to calculate the coefficients of y.pred function in this post's code part:

    y.pred <- 24.3741*bush$s11 + 24.3741*(.9639^(bush$t-9))*as.numeric(bush$t>9)

    I mean, where the 24.3741 coefficient come from? And also .9639 coefficient? I didn't calculate them when I try to replicate bush's approval example.

    2) How about arima (1,0,3) models? How can we apply intervention analysis into ar(2) or ma(3) or more complicated models? Is "transfer=list(c(1,0))" implies an ar(1) process? When I try "transfer=list(c(1,3))" code to my arima(1,0,3) model I meet an error says: "Error in optim(init[mask], armaCSS, method = "BFGS", hessian = FALSE, : initial value in 'vmmin' is not finite"

    Thank you again.

    ReplyDelete
  4. I really appreciated this excellent exposition, but I'm also unable to reproduce the coefficients in the call to arimax. I'm wondering if the dataset you used was modified. Using the BUSHJOB.dta files from Professor Monogan's site, the ARIMA 0,1,0 model (mod2) gives me an error message and the AR(1) model (mod 2b) yields the following prediction equation:
    y.pred <- 56.0327 + 27.6660*bush$s11 + 27.6660*(0.8984^(bush$t-9))*as.numeric(bush$t>9), which is the same as Professor Monogan's results. I know it's a minor detail, and thanks again for a nice review.

    ReplyDelete
  5. I still don't understand how can I tell whether 's11' has any impact on Bush approval rating. Is it determined by the p value of 'T1-AR1'?

    ReplyDelete