Tuesday, February 21, 2017

Basic Data Manipulation and Statistics in R and Python

Below are links to a couple of gists with R and Python code for some very basic data manipulation and statistics. I have been using R and SAS for almost a decade, but the R code originates from some very basic scripts that I used when I was a beginning programmer. The Python script is just a translation from R to Python. This does not represent the best way to solve these problems, but it provides enough code for a beginner to get a feel for coding in one of these environments. This is 'starter' code in the crudest sense, intended to let one begin learning R or Python with as little intimidation and as simple a syntax as possible. Once started, one can google other sources or enroll in courses to expand one's programming skill set.

Basic Data Manipulation in R

Basic Data Manipulation in Python

Basic Statistics in R

Basic Statistics in Python 
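
To give a feel for the kind of starter code involved, here is a minimal Python sketch. It is not the code from the gists above; the toy data and all variable names are made up for illustration, and it uses only the standard library.

```python
from statistics import mean, median, stdev

# Hypothetical toy data: one dictionary per observation.
rows = [
    {"id": 1, "group": "a", "score": 150.0},
    {"id": 2, "group": "a", "score": 160.0},
    {"id": 3, "group": "a", "score": 170.0},
    {"id": 4, "group": "b", "score": 180.0},
    {"id": 5, "group": "b", "score": 190.0},
]

# Basic statistics on one variable.
scores = [r["score"] for r in rows]
overall_mean = mean(scores)
overall_median = median(scores)
overall_sd = stdev(scores)  # sample standard deviation

# Data manipulation: subset the rows and create a new variable.
high = [r for r in rows if r["score"] > 165]
for r in rows:
    r["centered"] = r["score"] - overall_mean

# Grouped summary: mean score within each group.
group_means = {}
for g in sorted({r["group"] for r in rows}):
    group_means[g] = mean(r["score"] for r in rows if r["group"] == g)

print("mean:", overall_mean, "median:", overall_median, "sd:", round(overall_sd, 2))
print("rows with score > 165:", [r["id"] for r in high])
print("group means:", group_means)
```

The same steps (subset, derive a variable, summarize by group) are what one would normally do with a data frame library once past the basics.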

For more advanced applications in R posted to this blog see all posts with the tag R Code.

Thursday, February 16, 2017

Machine Learning in Finance and Economics with Python

I recently caught a podcast from Chat with Traders, one of several episodes related to quantitative finance, that emphasized some basics of machine learning. It is a very good discussion of some fundamental concepts in machine learning, regardless of your interest in finance or algorithmic trading.

You can find this episode via iTunes. But here is a link with some summary information.

Q5: Good (and Not So Good) Uses of Machine Learning in Finance w/ Max Margenot & Delaney Mackenzie


Some of the topics covered include (swiping from the link above):

What is machine learning and how is it used in everyday life?

Supervised vs unsupervised machine learning, and when to use each class.    

Does machine learning offer anything more than traditional statistics methods?

Good (and not so good) uses of machine learning in trading and finance.

The balance between simplicity and complexity.

I believe the guests on the show were Quantopian data scientists; Quantopian is a platform for algorithmic trading and machine learning applied to finance. They do this stuff for real.

There was also some discussion of Python. Following up on that, there was a tweet from @chatwithtraders linking to a nice blog, python for finance, that covers some applications using Python. Very good stuff all around. I wish I still taught financial data modeling!

See also: Modeling Dependence with Copulas and Quantmod in R

Sunday, February 12, 2017

Molecular Genetics and Economics

A really interesting article in JEP:

A slice:

"In fact, the costs of comprehensively genotyping human subjects have fallen to the point where major funding bodies, even in the social sciences, are beginning to incorporate genetic and biological markers into major social surveys. The National Longitudinal Study of Adolescent Health, the Wisconsin Longitudinal Study, and the Health and Retirement Survey have launched, or are in the process of launching, datasets with comprehensively genotyped subjects…These samples contain, or will soon contain, data on hundreds of thousands of genetic markers for each individual in the sample as well as, in most cases, basic economic variables. How, if at all, should economists use and combine molecular genetic and economic data? What challenges arise when analyzing genetically informative data?"



Beauchamp JP, Cesarini D, Johannesson M, et al. Molecular Genetics and Economics. Journal of Economic Perspectives. 2011;25(4):57-82.

Saturday, February 11, 2017

Program Evaluation and Causal Inference with High Dimensional Data

Brand new from Econometrica-

Abstract: "In this paper, we provide efficient estimators and honest confidence bands for a variety of treatment effects including local average (LATE) and local quantile treatment effects (LQTE) in data-rich environments… We provide results on honest inference for (function-valued) parameters within this general framework where any high-quality, machine learning methods (e.g., boosted trees, deep neural networks, random forest, and their aggregated and hybrid versions) can be used to learn the nonparametric/high-dimensional components of the model." Read more...

Tuesday, January 24, 2017

Saturday, January 14, 2017

Identification Through Copulas

Recently I attended a talk (see Zimmer and Trivedi below) where a paper referenced work by Han and Vytlacil that used copulas to estimate probit models with dummy endogenous regressors. The seminar offered an extension to other types of models. However, here I wanted to summarize the approach more generally. You can find the referenced working paper below for more details, which I am told is forthcoming in the Journal of Econometrics.

Copula functions can be used to simulate a dependence structure independently from the marginal distributions.
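
A minimal sketch of this idea in Python follows. It assumes a Gaussian copula with a single correlation parameter (called theta here), and the exponential and normal marginals are arbitrary choices made for illustration. The point it shows is that the rank dependence comes from the copula alone: swapping in different marginals leaves it unchanged.

```python
import math
import random
from statistics import NormalDist


def gaussian_copula_sample(theta, n, seed=0):
    """Draw n pairs of uniforms whose dependence follows a Gaussian copula
    with correlation parameter theta (an assumed, illustrative choice)."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0, 1)
        z2 = theta * z1 + math.sqrt(1 - theta ** 2) * rng.gauss(0, 1)
        # The normal CDF maps each correlated normal draw to a uniform.
        pairs.append((std_normal.cdf(z1), std_normal.cdf(z2)))
    return pairs


def ranks(values):
    """Rank of each value (0 = smallest); assumes no ties."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result


def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman(xs, ys):
    """Rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


uniforms = gaussian_copula_sample(theta=0.7, n=2000)
u1 = [u for u, _ in uniforms]
u2 = [v for _, v in uniforms]

# Impose arbitrary marginals through their inverse CDFs.
x1 = [-math.log(1.0 - u) for u in u1]                   # exponential, rate 1
x2 = [NormalDist(mu=10, sigma=2).inv_cdf(v) for v in u2]  # normal(10, 2)

rho_uniform = spearman(u1, u2)
rho_marginals = spearman(x1, x2)
print(round(rho_uniform, 3), round(rho_marginals, 3))
```

Because the inverse CDFs are monotone, the two rank correlations agree: the dependence is set by theta, and the marginals are chosen separately.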

Based on Sklar's theorem the multivariate distribution F can be represented by copula C as follows:

F(x1, …, xp) = C{F1(x1), …, Fp(xp); θ}

The parameter θ represents the dependence between the marginal distributions (F1 and F2 in the bivariate case). Now let's set up the framework for what we are trying to model.

Suppose we want to predict some outcome Y. Let

Y = f(x,D)

where x is a vector of controls and D is a treatment indicator. We are interested in estimating the coefficient on D as our measure of the treatment effect. However, suppose that there is selection bias, such that those who choose to engage in the program indicated by D are more likely to have higher levels of Y regardless of treatment. (See the following for more on selection bias, unobserved heterogeneity, and endogeneity.)

We can model selection as follows:

D = g(x,z)

where x is a vector of controls and z is an instrument, correlated with the probability of D but uncorrelated with the unobserved factors that drive the outcome Y. We can jointly model the outcome and selection functions using copulas where:

P(Y, D|x,z) = C{ F(.), G(.); θ} 

As it turns out, the term θ captures the dependence between outcome and selection allowing for unbiased estimation of treatment effects associated with D. Han and Vytlacil extend the results to cases without instruments.
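
To see why the dependence parameter matters, here is a small simulation sketch in Python. It is only an illustration and not the estimator from Han and Vytlacil: it assumes bivariate normal errors (a Gaussian copula with normal marginals), a linear outcome with no controls x, and made-up parameter values. It compares the naive treated-minus-control difference in Y with the true effect for different values of theta.

```python
import math
import random
from statistics import mean


def naive_difference(theta, n=20000, effect=1.0, seed=0):
    """Simulate selection into treatment D and return the naive
    difference in mean outcomes between treated and untreated units.

    theta is the correlation between the selection error and the
    outcome error (all values here are hypothetical)."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)      # instrument: shifts D, not in the outcome equation
        e_d = rng.gauss(0, 1)    # unobserved driver of selection
        # Outcome error, correlated with the selection error through theta.
        e_y = theta * e_d + math.sqrt(1 - theta ** 2) * rng.gauss(0, 1)
        d = 1 if z + e_d > 0 else 0
        y = effect * d + e_y
        (treated if d else control).append(y)
    return mean(treated) - mean(control)


no_selection = naive_difference(theta=0.0)
with_selection = naive_difference(theta=0.6)
print("true effect: 1.0")
print("naive estimate, theta = 0.0:", round(no_selection, 2))
print("naive estimate, theta = 0.6:", round(with_selection, 2))
```

With theta equal to zero the naive comparison is close to the true effect of 1.0; with positive theta it overstates the effect, which is the bias that jointly modeling outcome and selection is meant to remove.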


Han, S. and E. Vytlacil (2015). Identification in a generalization of bivariate probit models with dummy endogenous regressors. Working paper, University of Texas at Austin.

Trivedi, P. K. and D. M. Zimmer (2016). A Note on Identification of Discrete Bivariate Copulas. August 5, 2016.

Tuesday, January 10, 2017

Mediators, Moderators, and Mechanisms

Recently Marc Bellemare shared a post highlighting an article in American Political Science Review, Explaining Causal Findings Without Bias: Detecting and Assessing Direct Effects. He does an awesome job giving an overview of the article. If you read his post, you will see that the paper emphasizes causal mechanisms and introduces this through controlled direct effects:

"their method not only tells you whether M is a mechanism through which D causes y, it can also tell you whether there is any significant amount of statistical variation left in the causal relationship flowing from D through y after M is accounted for"

Previously, I have been working on a post related to mediators and moderators, and his post motivated me to wrap it up today.

In the article Mediators and Mechanisms of Change in Psychotherapy Research, Kazdin provides some clarity about the differences and relationships between mediators, moderators, and mechanisms:

Mediator: an intervening variable that may account (statistically) for the relationship between the independent and dependent variable. Something that mediates change may not necessarily explain the processes of how change came about. Also, the mediator could be a proxy for one or more other variables or be a general construct that is not necessarily intended to explain the mechanisms of change. A mediator may be a guide that points to possible mechanisms but is not necessarily a mechanism.

Mechanism: the basis for the effect, i.e., the processes or events that are responsible for the change; the reasons why change occurred or how change came about.

Moderator: a characteristic that influences the direction or magnitude of the relationship between an independent and dependent variable. If the relationship between variable x and y is different for males and females, sex is a moderator of the relation. Moderators are related to mediators and mechanisms because they suggest that different processes might be involved (e.g., for males or females).
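
The statistical side of these definitions can be sketched with simulated data in Python. Everything below is hypothetical (the coefficients, the group labels, the sample size) and is meant only to show what "accounting statistically" for a relationship and "a relationship that differs by group" look like in a regression; it says nothing about mechanisms.

```python
import random


def slope(x, y):
    """OLS slope of y on x (with an intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(y, x):
    """Residuals from regressing y on x (with an intercept)."""
    b = slope(x, y)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return [yi - (my + b * (xi - mx)) for xi, yi in zip(x, y)]


rng = random.Random(42)
n = 5000

# Mediator: x affects m, and m affects y; there is no direct path from x to y.
x = [rng.gauss(0, 1) for _ in range(n)]
m = [0.8 * xi + rng.gauss(0, 1) for xi in x]
y = [0.5 * mi + rng.gauss(0, 1) for mi in m]

total_effect = slope(x, y)  # roughly 0.8 * 0.5 = 0.4
# Effect of x on y holding m fixed: partial out m from both, then regress.
direct_effect = slope(residuals(x, m), residuals(y, m))  # roughly 0

# Moderator: the slope of y on x differs between two groups.
group = [rng.choice(["female", "male"]) for _ in range(n)]
x2 = [rng.gauss(0, 1) for _ in range(n)]
y2 = [(1.0 if g == "male" else 0.2) * xi + rng.gauss(0, 1)
      for g, xi in zip(group, x2)]

slope_by_group = {}
for g in ("female", "male"):
    xs = [xi for xi, gi in zip(x2, group) if gi == g]
    ys = [yi for yi, gi in zip(y2, group) if gi == g]
    slope_by_group[g] = slope(xs, ys)

print("total effect:", round(total_effect, 2))
print("direct effect after accounting for m:", round(direct_effect, 2))
print("slope by group:", {g: round(b, 2) for g, b in slope_by_group.items()})
```

In the first part, m accounts for the x to y relationship (the slope on x drops to about zero once m is held fixed), which is what makes it a mediator in the statistical sense. In the second part, the x to y slope depends on group, which is what makes group a moderator.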


Kazdin AE. Mediators and Mechanisms of Change in Psychotherapy Research. Annu Rev Clin Psychol. 2007;3:1-27.