## Friday, April 22, 2011

### ARIMA Model Forecast of Annual Fertilizer Use in KY

I recently found some data on annual fertilizer use (measured by an index) in KY. This looked like a good series to try out  time series analysis in R (see program documentation for data source)as well as provide a basic demo for my earlier post on time series analysis.

The series is fertilizer use in Kentucky from 1960-2004, and is plotted below:

A look at the autocorrelation plot indicates a sinusoidal decay toward zero, which indicates an autoregressive process and partial autocorrelations show a cut off after the first lag, representing a possible AR(1) process(Newbold & Bos, 1990).

Fitting the ARMA(0,0,1) model in R based on values through 2003, yields a forecast of 2.447325 for 2004 vs. the actual observed value of 2.49. This is a much closer estimate than the naive estimate based soley on the previous value for 2003, which was 2.63. The forecast for 2005 is 2.298228, for which I have no data to verify. It would be interesting to build a more complex model that incorporates additional data, perhaps a vector auto-regression.

R code for for this post:
```# *------------------------------------------------------------------
# |  forecast for annual fertilizer use in KY (based on index)
# *-----------------------------------------------------------------

# data source: http://www.ers.usda.gov/Data/AgProductivity/

# read in year/time variable
yr <- c(1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004)
# read in fertilizer index
fertilizer <- c(1.4279, 1.4114, 1.5069, 1.6599, 1.8183, 1.8229, 2.1135, 1.9576, 1.3736, 1.5381, 1.5041, 1.5090, 1.4749, 1.6329, 1.9789, 1.8462, 2.0444, 1.9997, 1.9384, 1.8392, 2.1652, 1.9528, 1.4552, 1.3719, 1.3323, 1.3283, 1.5702, 1.5339, 1.4294, 1.4010, 1.4534, 1.4372, 1.4505, 1.5637, 1.4135, 1.6958, 1.7138, 1.7090, 2.3354, 2.4032, 2.7095, 1.9193, 2.3007, 2.6395, 2.4940)

kydat <- data.frame(yr,fertilizer) # create data frame, necessary for subsetting by yr
names(kydat) # list var names in data frame
dim(kydat)  # rows and columns
kydat2 <- kydat[kydat\$yr<2004,] # subset to remove last 5 values for forecasting
dim(kydat2) # rows and columns
ky_ftlzr <- ts(kydat2) # convert to a time series object
plot(ky_ftlzr[,2]) # plot the series objects
acf(ky_ftlzr[,2], 40,xlim=c(1,40)) # autocorrelation plot- sinusoidal decay to zero -indicates autoregressive process
pacf(ky_ftlzr[,2],40, xlim=c(1,40)) # consistent with a AR(1) model

# model
(x.fit = arima(ky_ftlzr[,2], order = c(1,0,0))) # fit a ARIMA(0,0,1)) AIC = 5084.73
tsdiag(x.fit, gof.lag = 20) # ARIMA model diagnostics
x.fore = predict(x.fit, n.ahead=2) # forecast the next 5 values
print(x.fore) # 2.447325 2.298228

# compare these values to the observed value
print(kydat[kydat\$yr>=2004,]) # 2004 = 2.49```
Created by Pretty R at inside-R.org