Econometric Sense

“The wrong kind of tools and technology mindset, a...

2025-01-05T22:03:59.304-05:00

“The wrong kind of tools and technology mindset, and obsequiousness toward the technology, and a second-handed tendency to believe that we can or should be outsourcing, nay, sacrificing our thinking to AI in exchange for misleading if not false promises about value, is philosophically and epistemically disturbing.” I could not agree more. This is an excellent article!

Thanks for this post. This is a very important top...

2023-03-02T07:12:36.409-05:00

Thanks for this post. This is a very important topic for event study research. In business studies, the estimation biases induced by treatment effect heterogeneity are always ignored, which probably often leads to misinterpretation of the results.

Should the average marginal effect be a decrease o...

2016-03-15T07:52:57.288-04:00

Should the average marginal effect be a decrease of 2.5% or 2.5 percentage points?
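
For a binary-outcome model, a marginal effect is a change in a probability, so it is naturally read in percentage points, and the two readings can differ a lot. A minimal sketch of the arithmetic, assuming a hypothetical baseline probability of 20% (an illustrative figure, not one from the post):

```python
def apply_percentage_point_change(baseline, points):
    """Shift a probability by an absolute number of percentage points."""
    return baseline + points / 100.0


def apply_percent_change(baseline, percent):
    """Scale a probability by a relative percent change."""
    return baseline * (1.0 + percent / 100.0)


baseline = 0.20  # hypothetical baseline probability, assumed for illustration

after_pp = apply_percentage_point_change(baseline, -2.5)
after_pct = apply_percent_change(baseline, -2.5)

print(f"2.5 percentage point decrease: {after_pp:.3f}")
print(f"2.5 percent decrease:          {after_pct:.3f}")
```

With this assumed baseline, a 2.5 percentage point decrease gives 17.5%, while a 2.5 percent decrease gives 19.5%.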

I still don't understand how I can tell whethe...

2015-09-21T04:40:57.973-04:00

I still don't understand how I can tell whether 's11' has any impact on Bush's approval rating. Is it determined by the p-value of 'T1-AR1'?

Fantastic post, Matt! I'm an accounting acad...

2015-09-11T14:17:55.958-04:00

Fantastic post, Matt!

I'm an accounting academic who does a lot of applied econometric work, and I've started reading up on the machine learning literature. As you and Noah Smith note, the focus is on prediction rather than estimating the effect of a particular predictor.

Aside from some basic terminology differences--e.g. I'd never heard of training vs. testing samples until I started reading ML books--I've found most of the techniques quite accessible. And it was heartening to see that deep connections have been established between bread-and-butter econometric techniques like logistic regression and SVM methods.

The only parts of ML that I'm still trying to get comfortable with are methods that might yield better prediction, but whose outputs aren't as readily interpretable, such as tree-based boosting. It seems like you'd have to feel quite confident that the underlying relationships are stable over time if you're going to use these less interpretable methods, no?

But I should already be thankful for ML methods, as they brought me to your blog...I found out about it through a suggested LinkedIn connection, a match developed via a ML-based algorithm. :)

Best,
George Batta

Thanks Matt. Here's the post by Kling on math...

2015-09-09T23:05:07.440-04:00

Thanks Matt.

Here's the post by Kling on math in the profession:

http://www.econlib.org/library/Columns/y2015/Klingmit.html

Thanks. I really think Levi Russell above does a g...

2015-09-09T21:57:06.462-04:00

Thanks. I really think Levi Russell above does a great job covering this in his comments. I'm still thinking about it, but I really like his stuff over at the Farmer Hayek blog and was kind of hoping he would do a post on this as well.

Thanks for the shout-out! :-) to many economists,...

2015-09-09T01:26:51.791-04:00

Thanks for the shout-out! :-)

to many economists, causality is a theory driven phenomenon, and can never truly be determined by data. I won't expand on this any further.

I would love to read a follow-up post about this...

While I'm certainly glad that we have more/bet...

2015-09-05T20:23:46.161-04:00

While I'm certainly glad that we have more/better data and methods for analyzing it, I don't think we should ever forget the lessons from masters of the past. A lot of their work is increasingly relevant as we are increasingly incentivized to think we can engineer policy (whether private or public) using data analysis. Ultimately the things we analyze as economists are too complex for data to deliver what some seem to expect of it (see Hayek).

I recently read Ronald Coase's paper "The Marginal Cost Controversy." This paper isn't emphasized in grad programs but it serves as a warning: what Coase calls "blackboard economics" can lead us to say things with confidence that are not so certain.

His interview in 2012 with Russ Roberts is definitely illuminating.
http://www.econtalk.org/archives/2012/05/coase_on_extern.html

Arnold Kling, a graduate of MIT, has some interesting thoughts about the mathematization of the profession that's really at the heart of this "data revolution."

I have some posts on this subject on the blog. Type "blackboard economics" into the search bar on the Farmer Hayek blog and you'll get a few of them!

My answer (so far) to the question has been (c) St...

2015-06-17T15:21:31.526-04:00

My answer (so far) to the question has been (c) Stata. :P

I need to look more into this Bayesian stuff.

2015-02-27T11:26:44.715-05:00

I need to look more into this Bayesian stuff.

Re: software implementation: It's also worth l...

2014-12-02T18:10:11.073-05:00

Re: software implementation: It's also worth looking into Python, especially for those new to applied work. There is a growing network of people using Python and it is a pretty easy language to pick up.

thank you for the reply. i really like your post; ...

2014-08-05T06:16:36.501-04:00

thank you for the reply. i really like your post; it helps clarify the difficult jargon in an intuitive way.

i have a quick, related question. is the phrase "correlated unobservables" referring to the same phenomenon as "unobserved heterogeneity"?

thanks again

YES! THANK YOU! It should say BIASED. I need to co...

2014-08-01T06:09:57.054-04:00

YES! THANK YOU! It should say BIASED. I need to correct that.

good stuff however, there is a typo in the first ...

2014-08-01T03:07:00.440-04:00

good stuff

however, there is a typo in the first sentence: "When we estimate a regression such as (1) above and leave out an important variable such as X2 then our estimate of β1 can become unbiased and inconsistent."

It should read: "our estimate of β1 can become BIASED and inconsistent."
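
The point behind the correction can be checked with a small simulation. This is only a sketch under assumed values: the coefficients, the 0.5 link between X1 and X2, and the sample size are all hypothetical, chosen to make the bias visible.

```python
import random

random.seed(0)
n = 50_000
beta1, beta2 = 1.0, 2.0  # hypothetical true coefficients

# X2 is correlated with X1 (assumed slope of 0.5), so omitting it matters.
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [0.5 * a + random.gauss(0, 1) for a in x1]
y = [beta1 * a + beta2 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]


def mean(v):
    return sum(v) / len(v)


def cov(u, v):
    """Sample covariance (population form; the scaling cancels in ratios)."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)


# Short regression: y on X1 only (X2 omitted).
b1_short = cov(x1, y) / cov(x1, x1)

# Long regression: y on X1 and X2, solved from the 2x2 normal equations.
v11, v22, v12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
c1y, c2y = cov(x1, y), cov(x2, y)
det = v11 * v22 - v12 ** 2
b1_long = (v22 * c1y - v12 * c2y) / det

print(f"beta1 with X2 included: {b1_long:.3f}")
print(f"beta1 with X2 omitted:  {b1_short:.3f}")
```

With X2 omitted, the estimate of β1 should settle near β1 + β2 × 0.5 = 2.0 rather than the true 1.0, and a larger sample does not remove the gap, which is why the estimate is both biased and inconsistent.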

Thank you for this. Have you also discussed differ...

2014-07-28T11:36:12.098-04:00

Thank you for this. Have you also discussed difference-in-differences estimation, especially the variety that covers multiple time periods for a recurring treatment (as opposed to the common variety, which involves only two periods)?

I really appreciated this excellent exposition, bu...

2014-04-15T11:59:21.779-04:00

I really appreciated this excellent exposition, but I'm also unable to reproduce the coefficients in the call to arimax. I'm wondering if the dataset you used was modified. Using the BUSHJOB.dta files from Professor Monogan's site, the ARIMA 0,1,0 model (mod2) gives me an error message and the AR(1) model (mod 2b) yields the following prediction equation:
y.pred <- 56.0327 + 27.6660*bush$s11 + 27.6660*(0.8984^(bush$t-9))*as.numeric(bush$t>9), which is the same as Professor Monogan's results. I know it's a minor detail, and thanks again for a nice review.
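
For readers who want to trace what that prediction equation does, here is a rough Python translation of the R expression above. The coefficients are the ones quoted in the comment; the function name is hypothetical, and treating s11 as an indicator equal to 1 only in period 9 is an assumption inferred from the `t-9` and `t>9` terms, not something the comment states.

```python
def predicted_approval(t, s11):
    """Mirror the R expression: intercept + pulse + geometrically decaying effect.

    t   : time index of the observation
    s11 : intervention indicator (assumed to be 1 only in period 9)
    """
    intercept = 56.0327
    pulse_effect = 27.6660
    decay_rate = 0.8984
    onset = 9  # period of the intervention, read off the R expression

    # After the onset period, the initial jump decays by decay_rate each period.
    decay_term = pulse_effect * decay_rate ** (t - onset) if t > onset else 0.0
    return intercept + pulse_effect * s11 + decay_term


# Predicted values before, at, and after the assumed intervention period.
for t in (8, 9, 10, 20):
    s11 = 1 if t == 9 else 0
    print(t, round(predicted_approval(t, s11), 4))
```

Read this way, the model predicts a jump of about 27.7 points in period 9 that shrinks by roughly 10% per period afterward.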

Manski presented at a forum at the University of K...

2014-04-10T17:04:34.305-04:00

Manski presented at a forum at the University of Kentucky back in 2012 on this topic - the paper is linked here http://www.nber.org/papers/w16207. I guess this general idea has been something on his agenda for many years since the similar paper you reference is 12 years earlier.

This may also be useful to your readers: The Las...

2014-01-20T12:43:41.680-05:00

This may also be useful to your readers:

The Last Shall Be First: Matching Patients by Choosing the Least Popular First
Stefanie J. Millar and David J. Pasta
ICON Clinical Research, San Francisco, CA

http://www.wuss.org/proceedings10/HOC/The%20Last%20Shall%20Be%20First%20-%20Matching%20Patients%20by%20Choosing%20the%20Least%20Popular%20First.pdf

Best,
Victoria

Hi Matt. Wow! Thank you for all the helpful resour...

2014-01-20T11:59:54.854-05:00

Hi Matt. Wow! Thank you for all the helpful resources. I think the easiest - but not entirely ideal - solution for me is to first determine the smallest follow-up period for my treatment group, and then observe all cases based on that follow-up period (first matching, then analysis). I can then move on to a longer follow-up period and take a look at who is left. It's not the cleanest method, but it appears to be the simplest. If I come up with something nicer, I will be sure to post here! Thanks again for all your help!

Best,
Victoria from Rutgers SCJ

I always thought a general rule of blogging was n...

2014-01-17T23:26:02.269-05:00

I always thought a general rule of blogging was not to dominate the comment section. I've violated that convention. But I wanted to point to a recent post that cites some articles that involve PS matching in the context of survival analysis: http://econometricsense.blogspot.com/2014/01/propensity-score-matching-meets.html

Also, in the STATA discussion I previously linked ...

2014-01-16T23:56:08.645-05:00

Also, in the STATA discussion I previously linked to, one commenter mentioned that time-varying propensity scores could introduce bias or other issues if the treatment itself impacts the observables, and hence the PS, in later periods. So, in this regard it would make sense to estimate the PS and implement the matches prior to the analysis without later updates. Again, an IPTW approach may be useful if you are concerned about censoring breaking down your matches.

I've also discussed IPTW regression on this bl...

2014-01-16T23:50:49.746-05:00

I've also discussed IPTW regression on this blog before, as a much more computationally tractable alternative to PS matching. My first thought when confronted with this issue was to implement some sort of IPTW-based survival analysis if possible. This would (possibly) take care of the issue related to censored matches, since IPTW methods utilize PS adjustments without explicit matching. This is discussed using R here: http://www.stata.com/statalist/archive/2012-05/msg00517.html
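
To make concrete why IPTW sidesteps explicit matching, here is a minimal sketch of the weighting step only. It assumes the propensity scores have already been estimated (for example by logistic regression); the data and function names are hypothetical, and it does not attempt the survival-analysis extension discussed here.

```python
def iptw_weights(treated, propensity):
    """Inverse probability of treatment weights.

    Treated units get 1 / p and untreated units get 1 / (1 - p),
    where p is the estimated probability of receiving treatment.
    """
    return [t / p + (1 - t) / (1 - p) for t, p in zip(treated, propensity)]


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def iptw_mean_difference(outcome, treated, propensity):
    """Weighted treated-minus-control difference in mean outcomes."""
    w = iptw_weights(treated, propensity)
    y1 = [(y, wi) for y, t, wi in zip(outcome, treated, w) if t == 1]
    y0 = [(y, wi) for y, t, wi in zip(outcome, treated, w) if t == 0]
    mean1 = weighted_mean([y for y, _ in y1], [wi for _, wi in y1])
    mean0 = weighted_mean([y for y, _ in y0], [wi for _, wi in y0])
    return mean1 - mean0


# Hypothetical data: treatment flags, assumed propensity scores, outcomes.
treated = [1, 1, 0, 0, 1, 0]
propensity = [0.8, 0.5, 0.2, 0.5, 0.6, 0.4]
outcome = [10.0, 12.0, 7.0, 9.0, 11.0, 8.0]

weights = iptw_weights(treated, propensity)
print([round(w, 3) for w in weights])
print(round(iptw_mean_difference(outcome, treated, propensity), 3))
```

Because every unit keeps its own weight, a censored observation drops out on its own terms instead of breaking a matched pair.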

Sorry for the typos "attaching" vs "...

2014-01-16T23:33:29.798-05:00

Sorry for the typos "attaching" vs "matching" above. I'm commenting via my iPad which tends to have latency issues. Regardless, time dependent covariates are challenging enough at least in SAS: http://econometricsense.blogspot.com/2012/04/analysis-of-gmo-vs-non-gmo-corn-hybrid.html

A colleague and I were discussing something very s...

2014-01-16T23:28:49.284-05:00

A colleague and I were discussing something very similar just last week. I really don't know, but in the meantime I will work on a solution. It may take a while before I find out. I think it depends. If you have a selection-on-observables situation that lends itself to attaching, I might venture to say that as long as the observables don't change during follow-up periods, then the estimated propensity scores and initial matching should hold throughout. If the observables, and hence the propensity scores, could somehow be perceived as time varying, then I would think you would have something similar to a survival analysis with time-varying covariates, and it may require different matches and PS estimated for each period. And another challenge, regardless of time-varying PS or not: what do you do with an initial matched comparison if one of the matches becomes censored and the other doesn't? This may be more of an issue with 1:1 matches. I really need to think more about this. The scenarios I have mentioned would seem to be difficult to code, and I'm not sure if any procedures in SAS would handle them by default. I'm not a STATA user, so I'm unsure there. If these are truly issues, I would not doubt someone has written an R package to deal with them.