Wednesday, October 7, 2015

Metrics Monday with Marc Bellemare

I have been following Marc Bellemare on Twitter (@mfbellemare) for a while now, and I really became interested in his blog because of his prolific output of very good posts on applied econometrics. He also writes about a number of interesting topics in applied economics and in areas related to his own research. Over the last few weeks (months?), he has been running a series of posts titled 'Metrics Monday, where he addresses lots of issues in applied econometrics that aren't always addressed in typical theory-based courses. I think every advanced undergraduate, graduate student, and 'metrics or analytics practitioner should read all of his econometrics-related posts. Below are links to selected posts. I'll probably add to this list as he posts more, and as I discover older related posts I have not yet read.

Some of my favorite 'Metrics Monday posts: 

Friends *do* let friends do IV

When is heteroskedasticity (not) a problem

Hypothesis Testing in Theory and Practice

Data Cleaning


Rookie mistakes in empirical analysis

What to do with missing data

Other Applied Econometrics Posts by Marc Bellemare:

Love it or logit, Or: people really care about binary dependent variables

A rant on estimation with binary dependent variables

In defense of the cookbook approach to econometrics

Econometrics teaching needs an overhaul

Do Both

Wednesday, September 30, 2015

Big Data, IoT, Ag Finance, and Causal Inference

Over at my applied economics blog, I recently discussed an article from AgWeb, "How the Fed's Interest Rate Decision Affects Farmers." This got me questioning some of the ramifications of leveraging data analysis in the context of ag lending (from both a farmer and a lender perspective), which ultimately led me to some interesting questions that would be exciting to investigate:
  1. Is there a causal relationship between producers' leveraging of IoT and big data analytics applications and farm output/performance/productivity?
  2. How do we quantify the outcome? Is it some measure of efficiency or some financial ratio?
  3. If we find improvements in this measure, is it simply a matter of selection? Are great producers likely to be productive anyway, with or without the technology?
  4. Among the best producers, is there still a marginal impact (i.e., treatment effect) for those that adopt a technology/analytics-based strategy?
  5. Can we segment producers based on the kinds of data collected by IoT devices on equipment, apps, financial records, GPS, etc. (maybe this is not that much different from the TrueHarvest benchmarking done at FarmLink), and are there differentials in outcomes, farming practices, product use patterns, etc. by segment?
See also:
Big Ag Meets Big Data (Part 1 & Part 2)
Big Data- Causality and Local Expertise are Key in Agronomic Applications
Big Ag and Big Data-Marc Bellemare
Other Big Data and Agricultural related Application Posts at EconometricSense
Causal Inference and Experimental Design Roundup

Friday, September 25, 2015

Propensity Score Matching Meets Difference-in-Differences

I have recently stumbled across a number of studies incorporating both difference-in-differences (DD) and propensity score methods. As discussed before, DD is a special case of fixed effects panel methods.

In the World Bank's publication "Impact Evaluation in Practice" they give a nice summary of the power of DD in identification of causal effects:

"...we can conclude that many unobserved characteristics of individuals are also more or less constant over time. Consider, for example, a person's intelligence or such personality traits as motivation, optimism, self-discipline, or family health history...Interestingly, we are canceling out (or controlling for) not only the effect of observed time invariant characteristics but also the effect of unobserved time invariant characteristics such as those mentioned above"

So with DD we can actually control for unobserved characteristics that we may not have data on, couldn't measure appropriately, or perhaps couldn't even quantify! That's powerful. Because in this framework we are controlling for unobservable characteristics that may be contributing to selection bias, we achieve identification of treatment effects in a selection-on-unobservables context.

On the other hand, with propensity score matching, we are appealing to the conditional independence assumption: the idea that matched comparisons imply balance on observed covariates, which 'recreates' a situation similar to a randomized experiment where all subjects are essentially the same except for the treatment (Thoemmes and Kim, 2011). Propensity score matching can identify treatment effects in a selection-on-observables context.

But what if we combine both approaches? The Impact Evaluation book has a section on mixed methods that gives a really good treatment of the power of using both PSM and DD:

"Matched difference-in-differences is one example of combining methods. As discussed previously, simple propensity score matching cannot account for unobserved characteristics that might explain why a group chooses to enroll in a program and that might also affect outcomes. By contrast, matching combined with difference-in-differences at least takes care of any unobserved characteristics that are constant across time between the two groups"
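To make the mechanics concrete, here is a minimal sketch of a matched DD estimate (my own toy example in Python with made-up numbers, not code from the book): match each treated unit to its nearest control on a pre-period matching variable (a stand-in for an estimated propensity score), then average the treated-minus-control differences in pre/post changes across matched pairs.

```python
# Minimal matched difference-in-differences sketch with made-up data.
# Each unit has a matching variable x (stand-in for a propensity score),
# a pre-period outcome, and a post-period outcome.

treated = [
    {"x": 0.72, "pre": 10.0, "post": 14.0},
    {"x": 0.55, "pre": 9.0,  "post": 12.5},
]
controls = [
    {"x": 0.70, "pre": 10.2, "post": 11.2},
    {"x": 0.50, "pre": 8.8,  "post": 10.0},
    {"x": 0.20, "pre": 5.0,  "post": 5.8},
]

def matched_dd(treated, controls):
    """Nearest-neighbor match on x, then average the DD across matches."""
    effects = []
    for t in treated:
        # nearest control on the matching variable
        c = min(controls, key=lambda c: abs(c["x"] - t["x"]))
        # DD: pre/post change for treated unit minus change for its match
        effects.append((t["post"] - t["pre"]) - (c["post"] - c["pre"]))
    return sum(effects) / len(effects)

print(matched_dd(treated, controls))  # average effect on the treated
```

The matching step handles selection on the observables in x, while the differencing step nets out anything time invariant that matching missed, which is exactly the complementarity the quote above describes.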

Below are several papers that utilize the combination of DD and PSM:

Does Matching Overcome Lalonde’s Critique of Nonexperimental Estimators?
Jeffrey Smith and Petra Todd. University of Maryland. 2003.

Do Agricultural Land Preservation Programs Reduce Farmland Loss? Evidence from a Propensity Score Matching Estimator
Xiangping Liu and Lori Lynch. January 2010.

Measuring the Impact of Meat Packing and Processing Facilities in the Nonmetropolitan Midwest: A Difference-in-Differences Approach
Georgeanne M. Artz, Peter Orazem, and Daniel Otto.
Working Paper #03003. Iowa State University. November 2005.

How Effective is Health Coaching in Reducing Health Services Expenditures?
Yvonne Jonk, Karen Lawson, Heidi O’Connor, Kirsten S. Riise, David Eisenberg, Bryan Dowd, and Mary J. Kreitzer.
Medical Care, Volume 53, Number 2, February 2015.


Impact Evaluation in Practice
Paul J. Gertler, Sebastian Martinez, Patrick Premand, Laura B. Rawlings, and Christel M. J. Vermeersch.
The World Bank. December 2010.

Friday, September 11, 2015

Mastering Metrics....and the Grain Markets

I recently finished two great books: Mastering 'Metrics and Mastering the Grain Markets.

Mastering the Grain Markets

While I have a background in agricultural and applied economics, my interests were always related to public choice and the environmental implications of biotechnology, as well as econometrics (hence this blog). So I didn't really have much formal background in commodity markets, other than a little exposure to options through a couple of finance classes. I have certainly read some really good extension publications on futures, options, and hedging, but Mastering the Grain Markets by Elaine Kub really brings these issues to life. She brought me back to my crop scouting days with her many discussions of corn production and the agronomics of our major commodities. She also tackles some major issues and controversies associated with modern agriculture, everything from speculation to biotech to sustainability issues, gluten fad diets, and more. Prepare for a trip from gate to plate in this book that teaches like a textbook but reads like a novel!

Even if you think all you are interested in are the specifics around how futures and options work, you'll end up being convinced that the holistic approach is essential. To borrow one quote:

"..any participation in the grain markets is a form of participation in agriculture, and it should be regarded as one piece of a beautiful, challenging, miraculous whole."
A couple of areas that struck me as particularly interesting were her discussions of counterparty risk and over-the-counter contracts. I'll probably have a separate post on this blog or my ag econ blog regarding counterparty risk.

So, why share a review of a grain markets book on an applied econometrics blog? Well, all the discussion about OTCs and risk management rekindled my interest in copulas, which I have blogged about before, and also made me a little more curious about index based crop insurance. Risk modeling in commodities goes hand in hand with econometrics. Oh, and she even hits on precision agriculture and alludes to big data in agriculture:

"At the end of the growing season, he has every data point he could possibly need (seed population, seed depth, input rates, final yield, soil moisture, etc.) to fine tune his production practices on each GPS mapped square foot of his farm."

Mastering Metrics

Before reading MM, I had previously read Angrist and Pischke's Mostly Harmless Econometrics. It was my first rigorous introduction to the potential outcomes framework and causal inference. It took me a while to work through and I still reference it often. Even though Mastering 'Metrics was supposed to be a 'lite' version, or maybe an undergraduate version, of MHE, reading in 'reverse' order worked out well. What I really liked was their intro to regression, and the presentation of regression as a matching estimator became even clearer to me than it did in MHE. To borrow a quote:

"Specifically, regression estimates are weighted averages of multiple matched comparisons"

I really think a lot of people I encounter have a hard time thinking about that. I also got better insight and clarification on a number of issues related to instrumental variables, regression discontinuity, and difference-in-differences. Within the IV discussion, I really liked the causal chain-of-effects presentation and the discussion of 'intent to treat,' and I now better understand all the things related to compliers and noncompliers. They also really got me up to speed on the differences between parametric and non-parametric RD and an important distinction between fuzzy and sharp RD:

"....with fuzzy, applicants who cross a threshold are exposed to a more intense treatment, while with a sharp design, treatment switches cleanly on or off at the cutoff."

Another thing that stood out to me: in their DD chapter they made some clarifications about weighted regression and clustered standard errors that seemed very helpful. I also really liked their treatment of the regression anatomy formula and understood it much better in this reading. Their basic review and treatment of inference, standard errors, and t-statistics is really great and a good way to segue an undergraduate student from an introductory statistics class into the more advanced topics they present later in the text. I could also see certain graduate programs, even outside of economics, making use of this text.

Both Mastering the Grain Markets and Mastering Metrics end with a final chapter tying everything together.

I highly recommend both books.

More thoughts....

So above I mentioned that risk modeling and econometrics go hand in hand, but I have been thinking: were any of the techniques covered in MM useful for work related to the commodities markets? In terms of informing marketing and risk management strategies, I'm not sure. Maybe some readers have some idea. But in terms of policy analysis as it relates to commodity markets, perhaps. There are some that advocate that we should restrict speculation in commodity markets. Scott Irwin looked at the impact of index funds on commodity markets using Granger causality (although Granger causality was not discussed in MHE or MM). Other work has relied on panel methods. A quick Google search reveals some related work using the instrumental variables methods discussed in MM. For now I'll just say to be continued.....
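As an aside, the mechanics of a Granger causality test are easy to sketch by hand. Below is a toy simulation (my own illustration in Python/numpy, not Irwin and Sanders' actual method or data) where x leads y by one period: we compare the residual sum of squares from a regression of y on its own lag against one that also includes lagged x, which is the heart of the test.

```python
import numpy as np

# Hand-rolled one-lag Granger-causality check: does adding lagged x
# reduce the residual sum of squares from a regression of y on lagged y?
# Simulated data where x genuinely leads y by one period.
rng = np.random.default_rng(42)
n = 200
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.normal(scale=0.1)

def ssr(X, y):
    """Residual sum of squares from OLS of y on X (with intercept)."""
    X = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

y_t, y_lag, x_lag = y[1:], y[:-1], x[:-1]
ssr_restricted = ssr(y_lag.reshape(-1, 1), y_t)  # y on its own lag only
ssr_unrestricted = ssr(np.column_stack([y_lag, x_lag]), y_t)  # add lagged x

# F statistic for the single exclusion restriction (1 restriction,
# 3 parameters in the unrestricted model)
k = 3
F = (ssr_restricted - ssr_unrestricted) / (ssr_unrestricted / (len(y_t) - k))
print(F)  # a large F means lagged x helps predict y ("Granger causes" y)
```

In practice you would use more lags and a packaged routine, but the restricted-vs-unrestricted comparison above is all the test is really doing.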


Irwin, S. H. and D. R. Sanders (2010), "The Impact of Index and Swap Funds on Commodity Futures Markets: Preliminary Results", OECD Food, Agriculture and Fisheries Working Papers, No. 27, OECD Publishing. doi: 10.1787/5kmd40wl1t5f-en

Saturday, September 5, 2015

Econometrics, Math, and Machine Learning...what?

In a recent Bloomberg View piece titled "Economics Has a Math Problem," Noah Smith made some arguments that have caught a lot of attention lately. There were three I found particularly interesting.

#1 In economics, theory often takes a unique role in the determination of causality

"In most applied math disciplines -- computational biology, fluid dynamics, quantitative finance -- mathematical theories are always tied to the evidence. If a theory hasn’t been tested, it’s treated as pure conjecture....Not so in econ. Traditionally, economists have put the facts in a subordinate role and theory in the driver’s seat. "

This alone might seem controversial to some, but to many economists, causality is a theory-driven phenomenon and can never truly be determined by data alone. I won't expand on this any further. But the point is that economists, outside of a purely predictive or forecasting scenario, are often interested in answering causal questions, and despite all the work since the credibility revolution in terms of quasi-experimental designs, theory still plays an important role in determining causality and the direction of effects.

#2 In economics and econometrics, there is a huge emphasis on explaining causal relationships, both theoretically and empirically, but in machine learning the emphasis is on prediction, classification, and pattern recognition, devoid of theory or data generating processes

"Machine learning is a broad term for a collection of statistical data analysis techniques that identify key features of the data without committing to a theory. To use an old adage, machine learning “lets the data speak.”…machine learning techniques emphasized causality less than traditional economic statistical techniques, or what's usually known as econometrics. In other words, machine learning is more about forecasting than about understanding the effects of policy."

That really gets at what I have written before about machine learning vs. classical inference. (If Noah's article is interesting to you, then I highly recommend the Leo Breiman paper I reference in that post.) It's true that at first it might seem most economists interested in causal inference would sideline machine learning methods for their lack of emphasis on identification of causal effects or a data generating process. One of the biggest differences between the econometric theory most economists have been trained in and the new field of data science is, in effect, familiarity with and use of methods from machine learning. But if they are interested strictly in predictive modeling and forecasting, these methods might be quite appealing. (I've argued before that economists are ripe for being data scientists.) As we know, the methods and approaches we take to analyzing our data differ substantially depending on whether we are trying to explain vs. predict.

But then things start to get interesting:

#3 Recent work in econometrics has narrowed the gap between machine learning and econometrics

"But Athey and Imbens have also studied how machine learning techniques can be used to isolate causal effects, which would allow economists to draw policy implications."

I have not actually drilled into the references and details around this, but it is interesting. Thinking about it a little, I recalled that not long ago I worked on a project where I used gradient boosting (a machine learning algorithm) to estimate propensity scores in order to estimate treatment effects associated with a web app.
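The general idea can be sketched roughly as follows (a simulated toy example assuming scikit-learn is available; the data, parameter values, and variable names are all made up, and this is not the actual project code): fit a gradient boosting classifier to predict treatment from covariates, use the predicted probabilities as propensity scores, and weight outcomes by the inverse of those scores.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Sketch: propensity scores from gradient boosting, used for
# inverse-probability-of-treatment weighting (IPTW). Simulated data.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))
# treatment assignment depends on covariates (selection on observables)
p_true = 1 / (1 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1])))
d = rng.binomial(1, p_true)
y = 2.0 * d + X[:, 0] + rng.normal(size=n)  # true treatment effect = 2

gbm = GradientBoostingClassifier(n_estimators=100, max_depth=2,
                                 random_state=0)
ps = gbm.fit(X, d).predict_proba(X)[:, 1]  # estimated propensity scores
ps = np.clip(ps, 0.05, 0.95)               # trim extreme scores

# IPTW (weighted mean difference) estimate of the average treatment effect
ate = (np.average(y[d == 1], weights=1 / ps[d == 1])
       - np.average(y[d == 0], weights=1 / (1 - ps[d == 0])))
print(ate)
```

The appeal of boosting here is that it picks up interactions and nonlinearities in the treatment assignment model without you having to specify them, though trimming extreme scores (as above) matters more because flexible learners can push estimated probabilities toward 0 or 1.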

Even one of the masters of 'metrics and causal inference, Josh Angrist, is offering a course titled "Applied Econometrics: Mostly Harmless Big Data" via the MIT open course platform. And for a long time, economist Kenneth Sanford has been following this trend of emphasis on data science and machine learning in econometrics.

Overall, I think it will be interesting to see more examples of applications of machine learning in causal inference. But, when these applications involve big data and the internet of things, economists will really have to test their knowledge of a range of other big data tools that have little to do with building models or doing calculations.

See also:
Analytics vs Causal Inference
Big Data: Don't throw the baby out with the bath water
Propensity Score Weighting: Logistic vs CART vs Boosting vs Random Forests 
Data Cleaning
Got Data? Probably not like your econometrics textbook!
In God we trust, all others show me your code.
Data Science, 10% inspiration, 90% perspiration
 Big Ag Meets Big Data (Part 1 & Part 2)

Wednesday, August 12, 2015

Index Based Crop Insurance and Big Data

There is some interesting work going on currently at the intersection of risk management in the agriculture space and 'big data.'

"Agriculture risk management is about having access to ‘big data’ since growth conditions, risk types, climate and insurance terms vary largely in space. Solid crop models are based on large databases including simulated weather patterns with tempo-spatial correlations, crop planting areas, soil types, irrigation application, fertiliser use, crop rotation and planting calendars. … Similarly, livestock data need to include livestock densities which drive diseases, disease spread vectors and government contingency plans to address outbreaks of highly contagious diseases…Ultimately, big data initiatives will support selling agriculture insurance policies via smart phones based on highly sophisticated indices and will make agriculture insurance and risk management rapidly scalable and accessible to a large majority of those mostly affected – the farmers." - from The Actuary, March 2015

Similarly, in another issue of The Actuary there is more discussion related to this:

"In more recent years IBI (Index Based Insurance) has received a renewed interest, largely driven by advances in infrastructure (i.e., weather stations), technology (i.e., remote sensing and satellites), as well as computing power, which has enabled the development of new statistical and mathematical models. With an IBI contract, indemnities are paid based on some index level, which is highly correlated to actual losses. Possible indices include rainfall, yields, or vegetation levels measured by satellites. When an index exceeds a certain predetermined threshold, farmers receive a fast, efficient payout, in some cases delivered via mobile phones."
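A toy payout rule makes the index mechanics concrete (my own illustration with entirely hypothetical contract terms, not from the article): no payout while rainfall stays above a trigger, full payout at or below an exit level, and a linear payout in between.

```python
def index_payout(rainfall_mm, trigger_mm=300.0, exit_mm=100.0,
                 max_payout=10_000.0):
    """Toy rainfall-deficit index contract: no payout above the trigger,
    full payout at or below the exit level, linear in between."""
    if rainfall_mm >= trigger_mm:
        return 0.0
    if rainfall_mm <= exit_mm:
        return max_payout
    # linear interpolation between trigger (0% payout) and exit (100%)
    return max_payout * (trigger_mm - rainfall_mm) / (trigger_mm - exit_mm)

print(index_payout(350))  # adequate rain, no payout
print(index_payout(200))  # halfway between trigger and exit
print(index_payout(50))   # severe deficit, full payout
```

Note that nothing in the payout rule references the farmer's actual loss, only the index, which is precisely where the basis risk discussed below comes from.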

The article notes several benefits of IBI products, including decreased moral hazard and adverse selection as well as the ability to transfer risk. However, some challenges were noted related to 'basis' risk, where the index used to determine payments may not be directly linked to actual losses. In such cases, a farmer may receive a payment when no loss is realized, or may actually experience a loss for which the index values don't trigger a payment. In the latter case, the farmer is left feeling like they have paid for something without benefit. The article discusses three types of basis risk: variable, spatial, and temporal. Variable risk occurs when unmeasured factors impact a peril not captured by the index, maybe wind speed during pollination or some undocumented pest damage, vs. measured items like temperature or humidity. An example of spatial risk might be cases where the index is generated from meteorological stations too far from the field location to accurately trigger payments for perils related to rain or temperature. Temporal risk is really interesting to me in terms of the potential for big data:

"The temporal component of the basis risk is related to the fact that the sensitivity of yield to the insured peril often varies over the crops’ stages of growth. Factors such as changes in planting dates, where planting decisions are made based on the onset of rains, for example, can have a substantial impact on correlation as they can shift critical growth stages, which then do not align with the critical periods of risk assumed when the crop insurance product was designed." 

It would seem to me that the kinds of data elements being captured by services offered by companies like Climate Corp, FarmLink, John Deere, etc., in combination with other apps (drones, smartphones, and other modes of sensing and data collection), might be informative for creating and monitoring the performance of better indexes to help mitigate the basis risk associated with IBI-related products.


New frontiers in agricultural insurance. The Actuary. March 2015. Dr. Auguste Boissonnade.

Lysa Porth and Ken Seng Tan

See also:

Copula Based Agricultural Risk Models
 Big Ag Meets Big Data (Part 1 & Part 2)

Copula Based Agricultural Risk Models

I have written previously about copulas, with some very elementary examples (see here and here). Below are some papers I have added to my reading list, with applications in agricultural risk management. I'll likely follow up in the future with some annotation/review.
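As a quick refresher on the basic idea behind these papers (my own toy sketch in Python/numpy, not any particular paper's model): a Gaussian copula lets you impose dependence between two variables while choosing their marginal distributions freely, e.g. heavy-tailed loss distributions for two correlated crops.

```python
import math
import numpy as np

# Gaussian-copula sketch: generate two dependent uniform variables from
# correlated normals, then give them arbitrary marginals (here Exp(1),
# as stand-ins for loss distributions). Purely illustrative.
rng = np.random.default_rng(1)
rho = 0.8
cov = [[1.0, rho], [rho, 1.0]]
z = rng.multivariate_normal([0.0, 0.0], cov, size=5000)

# standard normal CDF applied elementwise -> dependent uniforms (the copula)
phi = np.vectorize(lambda v: 0.5 * (1.0 + math.erf(v / math.sqrt(2.0))))
u = phi(z)
# inverse-CDF transform to Exp(1) marginals
losses = -np.log(1.0 - u)

# the dependence survives the marginal transform
pearson = np.corrcoef(losses[:, 0], losses[:, 1])[0, 1]
print(pearson)
```

Separating the dependence structure (the copula) from the marginals is exactly what makes these models attractive for crop insurance and systemic risk work, where tail comovement matters more than ordinary correlation.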

Zimmer, D. M. (2015), Crop price comovements during extreme market downturns. Australian Journal of Agricultural and Resource Economics. doi: 10.1111/1467-8489.12119

Energy prices and agricultural commodity prices: Testing correlation using copulas method
Krishna H. Koirala, Ashok K. Mishra, Jeremy M. D'Antoni, and Joey E. Mehlhorn
Energy, 01/2015; DOI:10.1016/

Diversifying Systemic Risk in Agriculture: A Copula-based Approach
Xiaoguang Feng and Dermot J. Hayes

Mixed-Copula Based Extreme Dependence Analysis: A Case Study of Food and Energy Price Comovements
Feng Qiu and Jieyuan Zhao
Selected Paper prepared for presentation at the Agricultural & Applied Economics Association’s 2014 AAEA Annual Meeting, Minneapolis, MN, July 27-29, 2014.

Price asymmetry between different pork cuts in the USA: a copula approach
Panagiotou and Stavrakoudis Agricultural and Food Economics (2015) 3:6
DOI 10.1186/s40100-015-0029-2

Copula-Based Models of Systemic Risk in U.S. Agriculture: Implications for Crop Insurance and Reinsurance Contracts
Barry K. Goodwin
October 22, 2012