Sunday, December 26, 2010

Visualizing Taxes and Deficits

There has been a lot of debate about the impact of the early decade tax cuts on economic activity and deficits.

As the chart below depicts, from 2000-2009, we saw drastic increases in revenues (nearly 30% from 2001-2007) in the face of marginal tax cuts. Any deficit that resulted would have to be attributed to expenditures or outlays, and could not be attributed to cuts in marginal tax rates. As the graph shows, outlays also increased during this period, but even more drastically by 46%!


 As the next graphic shows, early on we saw a fairly rapid increase in the budget deficit from 2002-2003, a tapering off from 2003-2004 and  a rapid decline from 2004-2007, by as much as 61%! This is very impressive given the large amounts of spending increases depicted above. If it were not for the large influx of tax revenues during this period (in the face of marginal tax cuts) the deficit likely would have been on the increase vs. the precipitous fall depicted below.



However, on the heals of the financial crisis, going into 2008 & 2009, we start to see declining revenues, and unprecedented increases in spending and the deficit. From 2007 - 2009 we saw an increase in spending by about 28%, and an 88% increase over 2001 levels.  (indicated by the drastic upturn in outlays in the first graph)

But the impacts on the deficit were even more dramatic. From 2007-2008 we saw a 185% increase in the deficit, from 2008-2009 the deficit increased by 208%! Overall, compared to the 2002 levels that was an increase in the deficit of almost 800% over 7 years. If you compare to the 2007 low, considering the drastic reductions in the deficit after the tax cuts,  that is nearly an 800% increase in the deficit in just 3 years!

From 2004-2007 there was a rapid decline followed by a spike in the deficit during 2008 & 2009

Looking at the data, it appears that the reduction in marginal tax rates in the 2000's did not coincide with the rapid increase in the budget deficit that occurred at the end of the decade, but in fact were in step with the very rapid reduction in the budget deficit through 2007.

Most likely the deficit was the result from decreased revenues and increased expenditures associated with the financial crisis, not cuts in marginal tax rates. The real question becomes what was the cause of the financial crisis? There is no macroeconomic theory that I am aware of that links tax rates to business cycles, but many competing theories on business cycles as they relate to monetary policy or shocks to the production function. Pinning the blame for the deficit on one administration or the other during this transition period seems difficult, and a political subject that I am not interested in pursuing on this blog.

This adhoc analysis however does not prove that the effect of marginal tax cuts on the economy as a whole were positive or nrgative. Looking at one or two variables at a time (I haven't even included GDP or unemployment data) leaves one subject to mistakes. Only by building and testing models that specify multiple relationships among variables can you truly gauge the impact of the tax cuts on the deficit and economic output.  Lawrence Lindsey did this in 1987, looking specifically at revenue from income taxes paid by those earning over $200,000. Others have looked at the impact of tax cuts on economic activity, in terms of multipliers, and other research has been done relating taxes, spending, and unemployment (see references below). That is the proper context to view the impact of tax cuts, and as of yet, I am not aware of any empirical work that has been done to formally evaluate the true impact of the latest cuts in marginal income taxes.

References:

Lindsey, Lawrence B. 1987. “Individual Taxpayer Response to Taxcuts, 1982-1984.” J. of Public Economics 33 (July) 173-206

WHY DO EUROPEANS WORK (MUCH) LESS? IT IS TAXES AND GOVERNMENT SPENDING
Economic Inquiry, 2008, vol. 46, issue 2, pages 197-207

Economist Greg Mankiw gives a great review of the empirical work related to tax cuts and spending multipliers here on his blog:  http://gregmankiw.blogspot.com/2008/12/spending-and-tax-multipliers.html

Data Used: U.S. Budget Historical Tables http://www.whitehouse.gov/omb/budget/fy2009/hist.html (accessed Feb 2, 2009)


RECEIPTS  OUTLAYS      DEFICIT
2000 ............................................................................... 2,025,198  1,788,957 236,241
2001 ............................................................................... 1,991,142 1,862,906 128,236
2002 ............................................................................... 1,853,149 2,010,907 –157,758
2003 ............................................................................... 1,782,321 2,159,906 –377,585
2004 ............................................................................... 1,880,126 2,292,853 –412,727
2005 ............................................................................... 2,153,625 2,471,971 –318,346
2006 ............................................................................... 2,406,876 2,655,057 –248,181
2007 ............................................................................... 2,568,001 2,728,702 –160,701
2008 ............................................................................... 2,523,999 2,982,554 –458,555
2009 ............................................................................... 2,104,995 3,517,681 –1,412,686






































































Sunday, December 12, 2010

Visualizing Agricultural Subsidies by Kentucky County


In this post,  I provide results from my first full blown application of R to read, merge, clean, subset, manipulate, analyze and visualize data related to agricultural subsidies by Kentucky counties. This is very similar to the work I do on a daily basis, and was a great test of the capabilities of doing these tasks open source with R.  This is not breakthrough research, and doesn’t really even provide any new insights, but does demonstrate that farm subsidies accrue to those that grow the most crops.  It does demonstrate R’s usefulness in manipulating data sets and visualization tools.

Data

I extracted CSV files from the USDA Agricultural Statistics service related to the production (2008 data) of three major staple commodities grown in Kentucky and number of farm operations by county in Kentucky. I merged these data sets (via a series of left joins) with data from the Environmental Working Group related to subsidy accounts by Kentucky county.  No apparent way was available to extract this data to csv so it was copied, pasted, and cleaned up in excel to produce a csv file.  Additional transformations (conversion from character to numeric etc.) were conducted in R.

The data sets were combined through a series of left joins made possible from the ‘merge’ function in R, creating a data set called KyCropsAndSubsidies.

Overlapping Density Charts

The following is a histogram plotting the log acres planted by county with kernal density estimates for three populations: the entire KY county sample, those  counties receiving above median subsidies, and those receiving below median subsidies. It can be seen that the density curve for those receiving above median subsidies is to the right of the distributions of the population as a whole, and those receiving below median subsidy amounts (indicating more acres planted by this group of producers)

 

Bubble Charts

The following is essentially a scatter plot, plotting  acres planted in each county by the number of farms in each county. The size of each data point reflects the relative amount of subsidies received by all producers in each county. (note some obvious counties like Warren, Barren, Logan are missing due to missing data). It can be seen that the large circles (those counties receiving the most subsidies) float to the top indicating more acres planted. The number of farms per county seems to be unrelated to acres planed and subsidy amounts. 

 
Maps

The data set KyCropsAndSubsidies was merged with a spatial data frame (by county) which contains data related to county boundaries necessary for creating maps and plotting data by county.  The relationship between acres planted and subsidies by county can be seen below. Those counties planting the most acres tend to receive the most in subsidies. It is well known in Kentucky, that most of the row crops are grown in the western part of the state.



 

Code and References:

#   -------------------------------------------------------------
#  | PROGRAM NAME: Bubble_Crops
#  | DATE: 12-4-2010
#  | CREATED BY: Matt Bogard
#  | PROJECT FILE: /Users/user/Desktop/R Programs            
#  |-------------------------------------------------------------
#  | PURPOSE: Intitially to create bubble charts to demonstrate  
#  |  allocation             
#  | of farm subsides to producers by KY county, but expanded to 
#  | include
#  | numerous other visualization tools available using R
#  |
#  |-------------------------------------------------------------
#  | COMMENTS:              
#  |
#  |     1: Final data set for analysis is read in in the section
#  |     labeled 'basic statistical analysis'
#  | 2: In analysis section data for acres planted and subsidies
#  | are rescaled for
#  |     maps generated later in the program
#  |     3: Code for bubble charts adapted from:
#  | http://flowingdata.com/2010/11/23/how-to-make-bubble-charts/
#  |     4: Code for spatial analysis and mapping adapted from 
#  |  Harvard Applied Spatial Statistics workshop at:
#  |        
#  |http://www.people.fas.harvard.edu/~zhukov/spatial.html
#  |     5:
#  |
#  |
#  |-------------------------------------------------------------
#  | DATA USED:               
#  |     
#  |      1) various data sets downloaded from USDA NASS
#  |http://www.nass.usda.gov/#  |
#  |          2) data copied and pasted and formated in excel from EWG 
#  |subsidy data base: http://www.nass.usda.gov/
#  |      3) spatial data frame data used in the Harvard Applied 
#  |Spatial Statistics workshop at:
#  |       
#  | http://www.people.fas.harvard.edu/~zhukov/spatial.html
#  |      4) final data set located in project file:
#  |crops_and_subsidies.csv
#  |
#  |-------------------------------------------------------------
#  | CONTENTS:              
#  |
#  |          PART 1: get data    
#  |          PART 2: merge data sets
#  |          PART 3: recode and aggregate data sets
#  |          PART 4: basic statisticl analysis
#  |          PART 5: densisty plots
#  |          PART 6: bubble charts
#  |    PART 7: spatial analysis- Maps
#  |    
#  |
#  |-------------------------------------------------------------
#  | UPDATES:              
#  |
#  |
#  |
#   -------------------------------------------------------------


setwd('/Users/wkuuser/Desktop/R Data Sets')
rm(list=ls()) # get rid of any existing data
     ls() # view open data sets
    
#  *-------------------------------------------------------------
#  |
#  |
#  |
#  | get data
#  |
#  |
#  |
#  *-------------------------------------------------------------    
#  *-------------------------------------------------------------
#  | get farm # data
#  *-------------------------------------------------------------

farms <- read.csv("KyFarms.csv", na.strings=c(".", "NA", "", "?"), encoding="UTF-8")
names(farms)

# get only the most recentl available year (2007)
farms07 <- farms[farms$Year =="2007",]
dim(farms07) # n=120
print(farms07)
names(farms07)

# keep only variables you want, convert 'Value' to numeric 'Operations' and rename county code for common merge key
farms07 <-farms07[c("Value","County.Code")]
farms07<-transform(farms07, Operations=as.numeric(paste(farms07$Value)))
farms07 <- rename(farms07, c(County.Code="CoFips"))
dim(farms07)
names(farms07)
farms07



#  *-------------------------------------------------------------
#  | get corn data
#  *-------------------------------------------------------------

corn <- read.csv("CR08.csv", na.strings=c(".", "NA", "", "?"), encoding="UTF-8")
names(corn)

# subset- get only KY counties data

kycorn <- corn[corn$State =="Kentucky",]
dim(kycorn) # n=90
print(kycorn$County)
names(kycorn)

# keep and rename only relevant varaibles
# library(reshape)

kycorn <-kycorn[c("County","Yield","Harvested","CoFips","Planted.All.Purposes")]
names(kycorn)
kycorn <- rename(kycorn, c(Yield="cornYield",Harvested="cornHarvested",Planted.All.Purposes="cornPlanted"))
names(kycorn)
kycorn

#  *-------------------------------------------------------------
#  | get soybean data
#  *-------------------------------------------------------------

soybeans <- read.csv("SB08.csv", na.strings=c(".", "NA", "", "?"), encoding="UTF-8")
names(soybeans)


# subset- get only KY counties data

kysoybeans <- soybeans[soybeans$State =="Kentucky",]
dim(kysoybeans) # n=79
print(kysoybeans$County)
names(kysoybeans)

# keep and rename only relevant varaibles

library(reshape)

kysoybeans <-kysoybeans[c("County","Yield","Harvested","Planted.All.Purposes")]
names(kysoybeans)
kysoybeans <- rename(kysoybeans, c(Yield="soybeanYield",Harvested="soybeanHarvested",Planted.All.Purposes="soybeanPlanted"))
names(kysoybeans)
kysoybeans

#  *-------------------------------------------------------------
#  | get wheat data
#  *-------------------------------------------------------------

wheat <- read.csv("AW08.csv", na.strings=c(".", "NA", "", "?"), encoding="UTF-8")
names(wheat)

# subset- get only KY counties data

kywheat <- wheat[wheat$State =="Kentucky",]
dim(kywheat) # n= 70
print(kywheat$County)
names(kywheat)

# keep and rename only relevant varaibles
# library(reshape)

kywheat <-kywheat[c("County","Yield","Harvested","Planted.All.Purposes")]
names(kywheat)
kywheat <- rename(kywheat, c(Yield="wheatYield",Harvested="wheatHarvested",Planted.All.Purposes="wheatPlanted"))
names(kywheat)

#  *-------------------------------------------------------------
#  | get EWG subsidy data
#  *-------------------------------------------------------------

kysubsidies <- read.csv("EWGKYSubsidies.csv", na.strings=c(".", "NA", "", "?"), encoding="UTF-8")
names(kysubsidies)
kysubsidies


#  *-------------------------------------------------------------
#  |
#  |
#  |
#  | merge data sets
#  |
#  |
#  |
#  *-------------------------------------------------------------

# corn and soybeans left join
# note this will keep all corn growing counties (n=90) and add info about soybeans
# this will be the driver file for all subsequent merges, as a result, the staring base
# data is corn growing counties, i.e. the end data set will be based only on counties that
# grow corn in addition to other crops, but as a result counties that do not grow corn will be excluded
# from the analysis

corn_and_soybeans <- merge(kycorn,kysoybeans, by.kycorn=County,by.kysoybeans=County, all=FALSE, all.x=TRUE, all.y=FALSE)
dim(corn_and_soybeans)
names(corn_and_soybeans)
corn_and_soybeans

# all 3 crops- left join with wheat data
allKyCrops <- merge(corn_and_soybeans,kywheat, by.corn_and_soybeans=County,by.kywheat=County, all=FALSE, all.x=TRUE, all.y=FALSE)
dim(allKyCrops)
names(allKyCrops)

# left join with farm operations data on CoFips as key
# since County won't match due to case differences
farmAndCrops <- merge(allKyCrops,farms07, by.allKyCrops=CoFips,by.farms07=CoFips, all=FALSE, all.x=TRUE, all.y=FALSE)
dim(farmAndCrops)
names(farmAndCrops)
farmAndCrops

# crops and subsidies- left join with subsidy data
KyCropsAndSubsidies <- merge(farmAndCrops,kysubsidies, by.farmAndCrops=County,by.kysubsidies=County, all=FALSE, all.x=TRUE, all.y=FALSE)
dim(KyCropsAndSubsidies)
names(KyCropsAndSubsidies)

# quick report
KyCropsAndSubsidies[ c("County","cornPlanted", "soybeanPlanted","wheatPlanted","cornHarvested","soybeanHarvested","wheatHarvested","Operations","Subsidy")]



#  *-------------------------------------------------------------
#  |
#  |
#  |
#  | recode and aggregate variables
#  |
#  |
#  |
#  *-------------------------------------------------------------

# recode missing values-acres planted

KyCropsAndSubsidies$SoybeanAcresPlanted <- ifelse (is.na(KyCropsAndSubsidies$soybeanPlanted) == 'TRUE', (KyCropsAndSubsidies$SoybeanAcresPlanted <- 0),(KyCropsAndSubsidies$SoybeanAcresPlanted <- KyCropsAndSubsidies$soybeanPlanted))

KyCropsAndSubsidies[ c("County","SoybeanAcresPlanted")]

KyCropsAndSubsidies$WheatAcresPlanted <- ifelse (is.na(KyCropsAndSubsidies$wheatPlanted) == 'TRUE', (KyCropsAndSubsidies$WheatAcresPlanted <- 0),(KyCropsAndSubsidies$WheatAcresPlanted <- KyCropsAndSubsidies$wheatPlanted))

KyCropsAndSubsidies[ c("County","WheatAcresPlanted")]

# recode missing values - acres harvested

KyCropsAndSubsidies$SoybeanAcresHarvested <- ifelse (is.na(KyCropsAndSubsidies$soybeanHarvested) == 'TRUE', (KyCropsAndSubsidies$SoybeanAcresHarvested <- 0),(KyCropsAndSubsidies$SoybeanAcresHarvested <- KyCropsAndSubsidies$soybeanHarvested))

KyCropsAndSubsidies[ c("County","SoybeanAcresHarvested")]

KyCropsAndSubsidies$WheatAcresHarvested <- ifelse (is.na(KyCropsAndSubsidies$wheatHarvested) == 'TRUE', (KyCropsAndSubsidies$WheatAcresHarvested <- 0),(KyCropsAndSubsidies$WheatAcresHarvested <- KyCropsAndSubsidies$wheatHarvested))

KyCropsAndSubsidies[ c("County","WheatAcresHarvested")]

# aggregations

KyCropsAndSubsidies <- transform(KyCropsAndSubsidies, AcresPlanted = cornPlanted + SoybeanAcresPlanted + WheatAcresPlanted,
                                             AcresHarvested = cornHarvested + SoybeanAcresHarvested + WheatAcresHarvested)

KyCropsAndSubsidies <- transform(KyCropsAndSubsidies, LogAcres =log(AcresPlanted))