Friday, January 17, 2014

Propensity Score Matching Meets Survival Analysis

In one of my earlier posts regarding propensity score applications in higher ed research, a reader asked in the comment section about using propensity score methods in the context of survival analysis. Coincidentally, just a few days prior, I was having a similar discussion with another higher education researcher. I was not able to answer their questions adequately at the time, but I think this is an interesting topic. I've since located a few articles that deal with it. I have not yet had a chance to read through them, but they may be of interest; at the least I now have them bookmarked for future reference. Hopefully they address some of these issues.

Effect of radiation therapy on survival in surgically resected retroperitoneal sarcoma: a propensity score-adjusted SEER analysis
A. H. Choi, J. S. Barnholtz-Sloan, and J. A. Kim
Ann Oncol (2012)

Propensity score methods were used to perform survival analysis in patients who received radiation matched with patients who underwent surgery alone...Propensity scoring (309 matched pairs) and survival analysis using Kaplan–Meier methods demonstrated no difference between propensity score-matched patients receiving radiation therapy and those who did not (P = 0.35).

Propensity score applied to survival data analysis through proportional hazards models: a Monte Carlo study
Gayat E, Resche-Rigon M, Mary JY, Porcher R
Pharm Stat. 2012 Mar 12. doi: 10.1002/pst.537

A Monte Carlo simulation study was used to compare the performance of several survival models to estimate both marginal and conditional treatment effects. The impact of accounting or not for pairing when analysing propensity-score-matched survival data was assessed. In addition, the influence of unmeasured confounders was investigated....Our study showed that propensity scores applied to survival data can lead to unbiased estimation of both marginal and conditional treatment effect, when marginal and adjusted Cox models are used. In all cases, it is necessary to account for pairing when analysing propensity-score-matched data, using a robust estimator of the variance.

The performance of different propensity score methods for estimating marginal hazard ratios
Austin PC
Stat Med. 2013 Jul 20;32(16):2837-49. doi: 10.1002/sim.5705. Epub 2012 Dec 12

In biomedical research, time-to-event outcomes occur frequently. There is a paucity of research into the performance of different propensity score methods for estimating the effect of treatment on time-to-event outcomes....We conducted an extensive series of Monte Carlo simulations to examine the performance of propensity score matching (1:1 greedy nearest-neighbor matching within propensity score calipers), stratification on the propensity score, inverse probability of treatment weighting (IPTW) using the propensity score, and covariate adjustment using the propensity score to estimate marginal hazard ratios. We found that both propensity score matching and IPTW using the propensity score allow for the estimation of marginal hazard ratios with minimal bias. Of these two approaches, IPTW using the propensity score resulted in estimates with lower mean squared error when estimating the effect of treatment in the treated. Stratification on the propensity score and covariate adjustment using the propensity score result in biased estimation of both marginal and conditional hazard ratios. Applied researchers are encouraged to use propensity score matching and IPTW using the propensity score when estimating the relative effect of treatment on time-to-event outcomes.
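To make the weighting approach in these abstracts concrete, here is a minimal sketch of how IPTW weights are formed from estimated propensity scores. The data, scores, and function name are hypothetical; in practice the scores would come from a logistic regression of treatment on baseline covariates, and the resulting weights would then feed a weighted Cox model with a robust variance estimator, as the Gayat et al. and Austin papers recommend.

```python
# Minimal sketch of inverse probability of treatment weighting (IPTW),
# one of the approaches recommended above for marginal hazard ratios.
# Propensity scores here are hypothetical, pre-estimated values.

def iptw_weights(treated, propensity, estimand="ATE"):
    """Return one IPTW weight per subject.

    treated    : list of 0/1 treatment indicators
    propensity : list of estimated P(treatment = 1 | covariates)
    estimand   : "ATE" (marginal effect in the whole population) or
                 "ATT" (effect of treatment in the treated)
    """
    weights = []
    for t, e in zip(treated, propensity):
        if estimand == "ATE":
            # Treated weighted by 1/e, controls by 1/(1 - e)
            w = 1.0 / e if t == 1 else 1.0 / (1.0 - e)
        else:  # ATT
            # Treated keep weight 1; controls reweighted by the odds e/(1 - e)
            w = 1.0 if t == 1 else e / (1.0 - e)
        weights.append(w)
    return weights

# Hypothetical subjects: treatment indicators and estimated propensity scores
treated = [1, 1, 0, 0]
scores = [0.8, 0.5, 0.5, 0.2]

ate_w = iptw_weights(treated, scores, "ATE")  # [1.25, 2.0, 2.0, 1.25]
att_w = iptw_weights(treated, scores, "ATT")  # [1.0, 1.0, 1.0, 0.25]
```

The intuition: subjects who were unlikely to end up in the treatment arm they actually occupy get up-weighted, creating a pseudo-population in which treatment is independent of the measured covariates.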

Tuesday, January 14, 2014

Analytics vs. Causal Inference

When I think of analytics, I primarily think about what Leo Breiman referred to as an 'algorithmic' approach to analysis. Breiman states: "There are two cultures in the use of statistical modeling to reach conclusions from data." One "assumes that the data are generated by a given stochastic data model"; the other "uses algorithmic models and treats the data mechanism as unknown."

To take an example from higher education, let's look at a hypothetical I have proposed before. Assume we want to evaluate the causal effects of a summer camp designed to prepare high school graduates for their first year of college. If we are interested in making inferences about the causal impact of the camp on retention (this fits under the stochastic data modeling culture), we realize that the impact of the camp program itself (which is the causal effect of interest) is confounded with academic potential and motivation. Students who attend the camp may also be very motivated and academically strong, and likely to have high retention rates regardless of their attendance. We proceed with our analysis using some form of experimental or quasi-experimental design to attempt to statistically estimate or 'identify' the causal effects related to camp. We are concerned with things like standard errors, confidence intervals, statistical significance, etc. We may use the results to evaluate the effectiveness of the camp program and improve resource allocation.

On the other hand, we may simply be interested in building a model that gives a measure of the probability of retention for first-year students. We may want to take those results and segment the student population into strata based on their risk profile, and tailor programs to help with their academic success and improve resource allocation. The variable indicating 'camp' attendance, among others, might be a good predictor and aid in producing the probability estimates and well-calibrated stratifications. We might accomplish this with logistic regression, decision trees, neural networks, random forests, gradient boosting, or some other machine learning algorithm. We are concerned with things like predictive accuracy, discrimination, sensitivity, specificity, true positives, false positives, ranking, and calibration. We are not so concerned with p-values, statistical significance, standard errors, etc.
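As a small illustration of the evaluation criteria this second culture cares about, here is a sketch computing sensitivity and specificity from predicted retention labels. The data and function name are hypothetical, standing in for output from whatever classifier we fit.

```python
# Sketch: evaluating a retention classifier by sensitivity and specificity
# (hypothetical labels; 1 = retained, 0 = not retained).

def confusion_metrics(y_true, y_pred):
    """Sensitivity and specificity from 0/1 actuals and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return sensitivity, specificity

# Hypothetical actual outcomes and model predictions for eight students
actual = [1, 1, 1, 0, 0, 1, 0, 0]
predicted = [1, 1, 0, 0, 1, 1, 0, 0]
sens, spec = confusion_metrics(actual, predicted)  # 0.75, 0.75
```

Note that nothing here involves a standard error or a p-value; the model is judged entirely by how well it ranks and classifies.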

Both approaches are data driven, and one is not more meritorious than the other. People often adopt one culture or paradigm and impugn the other. As Breiman states:

"Approaching problems by looking for a data model imposes an a priori straight jacket that restricts the ability of statisticians to deal with a wide range of statistical problems."

In a past Campus Technology article "The Predictive Analytics Framework Moves Forward" the following comments depict this schism that sometimes occurs between cultures:

"For some in education what we’re doing might seem a bit heretical at first--we’ve all been warned in research methods classes that data snooping is bad! But the newer technologies and the sophisticated analyses that those technologies have enabled have helped us to move away from looking askance at pattern recognition. That may take a while in some research circles, but in decision-making circles it’s clear that pattern recognition techniques are making a real difference in terms of the ways numerous enterprises in many industries are approaching their work"

In fact, both paradigms can often be complementary in application. We might first develop an algorithm as indicated in the second approach and, based on those results, design a program or intervention targeted at students with a certain risk profile. We might then step into the world of inferential statistics and attempt to evaluate the effectiveness of this program.

In summary, both are going to utilize some sort of data-driven model development, and as the popular paraphrase of statistician George E. P. Box goes, all models are wrong, but some are useful. Regardless of the paradigm you are working in, the key is to produce something useful for solving the problem at hand.


The Predictive Analytics Reporting Framework Moves Forward
A Q &amp; A with WCET Executive Director Ellen Wagner on the PAR Framework. Mary Grush, 01/18/12, Campus Technology.

'Statistical Modeling: The Two Cultures' by L. Breiman. Statistical Science, 2001, Vol. 16, No. 3, 199–231.

See also: 

Predictive Analytics in Higher Ed

Culture War: Classical Statistics vs. Machine Learning

Economists as Data Scientists 

Saturday, January 4, 2014

The Oregon Medicaid Experiment and Linear Probability Models

I recently discussed the methodology used in some papers analyzing the Oregon Medicaid expansion (see: ). This was one of the papers:

"The Oregon Experiment--Effects of Medicaid on Clinical Outcomes," by Katherine Baicker, et al. New England Journal of Medicine, 2013; 368:1713-1722. 

If you read the supplementary appendix you will find the following:

In all of our ITT estimates and in our subsequent instrumental variable estimates (see below), we fit linear models even though a number of our outcomes are binary. Because we are interested in the difference in conditional means for the treatments and controls, linear probability models would pose no concerns in the absence of covariates or in fully saturated models (Angrist 2001, Angrist and Pischke 2009). Our models are not fully saturated, however, so it is possible that results could be affected by this functional form choice, especially for outcomes with very low or very high mean probability. We therefore explore the sensitivity of our results to an alternate specification using logistic regression and calculating average marginal effects for all binary outcomes, and are reassured that the results look very similar (see Table S15a-d below). 
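The appendix's point about saturated models can be illustrated with a toy calculation (hypothetical 0/1 outcomes, not the study's data): with a single binary treatment and no covariates, the linear probability model coefficient on treatment is just the difference in conditional means, and the average marginal effect from a logistic fit recovers exactly the same number.

```python
# Sketch of why linear probability models "pose no concerns" in the
# saturated binary-treatment case. Outcomes are hypothetical 0/1 values.
import math

treatment_outcomes = [1, 1, 0, 1, 1]  # outcomes among the treated
control_outcomes = [1, 0, 0, 0, 1]    # outcomes among the controls

p_treat = sum(treatment_outcomes) / len(treatment_outcomes)  # 0.8
p_ctrl = sum(control_outcomes) / len(control_outcomes)       # 0.4

# LPM coefficient on the treatment dummy = difference in conditional means
lpm_effect = p_treat - p_ctrl  # 0.4

# In the saturated logistic model, the MLE fitted probabilities equal the
# same group means, so the average marginal effect coincides:
beta0 = math.log(p_ctrl / (1 - p_ctrl))            # logit intercept
beta1 = math.log(p_treat / (1 - p_treat)) - beta0  # log odds ratio
fitted_treat = 1 / (1 + math.exp(-(beta0 + beta1)))
fitted_ctrl = 1 / (1 + math.exp(-beta0))
ame = fitted_treat - fitted_ctrl  # equals lpm_effect
```

Once covariates enter and the model is no longer saturated, the two estimates can diverge, which is exactly why the authors ran the logistic sensitivity check.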

You will find a similar methodology in the more recent article in Science previously discussed. This dovetails with some of my past posts:

Linear Regression and Analysis of Variance with Binary Dependent Variables

Regression as an Empirical Tool (matching and linear probability models)

The Oregon Medicaid Experiment, Applied Econometrics, & Causal Inference

Recently, a paper published in Science finding an increase in ER visits among patients benefiting from expanded Medicaid in Oregon has been in the news. I like this work because it represents a great example of applied econometrics and causal inference in the field of health care:

"The result, said Finkelstein, was that the groups of people with or without insurance were identical, "except for the fact that some have insurance and some don't. You've literally randomized the allocation of insurance coverage."

If you are not familiar with the context, the state of Oregon expanded Medicaid coverage (pre-PPACA), but only to randomly selected winners of a lottery. About half the winners did not apply for and utilize the expanded coverage, so the only TRUE RANDOM comparisons are of lottery winners to losers. So, as Finkelstein is quoted, it is literally a randomization of the allocation of insurance. This is a valid analysis under an 'intent-to-treat' framework, but bear in mind it is NOT necessarily a comparison of groups of identical people with or without insurance. Further, you cannot compare those 50% or so lottery winners that took the new coverage to losers without coverage and appeal to randomization or claim that the groups are identical and comparable (there could be huge issues related to selection bias). The authors therefore used instrumental variables to get an estimate of 'local average treatment effects,' comparing winners that took the new coverage to statistically similar losers who likely would have taken coverage had they won:

"We compare outcomes between the “treatment group” (those randomly selected in the lottery) and the “control group” (those not randomly selected)......Our intent-to-treat analysis, comparing the outcomes in the treatment and control groups, provides an estimate of the causal effect of winning the lottery (and being permitted to apply for OHP Standard)."

"Of greater interest may be the effect of Medicaid coverage itself. Not everyone selected by the lottery enrolled in Medicaid; some did not apply and some who applied were not eligible for coverage. To estimate the causal effect of Medicaid coverage, we use a standard instrumental-variable approach with lottery selection as an instrument for Medicaid coverage. This analysis uses the lottery’s random assignment to isolate the causal effect of Medicaid coverage."

So, what is the practical difference between intent-to-treat and the local average treatment effect (via instrumental variables) in the context of this research? The authors explain that very well:

"The intent-to-treat estimate may be a relevant parameter for gauging the effect of the ability to apply for Medicaid; the local-average-treatment-effect estimate is the relevant parameter for evaluating the causal effect of Medicaid for those actually covered."
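The mechanics behind that distinction can be sketched with the Wald (instrumental variable) estimator: the local average treatment effect is the intent-to-treat effect scaled up by the first stage, i.e., the effect of winning the lottery on actually having Medicaid coverage. The numbers below are hypothetical, not figures from the Oregon study.

```python
# Sketch of the Wald IV estimator relating ITT and LATE.
# All inputs are hypothetical mean outcomes and enrollment rates.

def wald_late(y_winners, y_losers, enroll_winners, enroll_losers):
    """LATE = (ITT / first stage).

    y_*      : mean outcome by lottery status (reduced form)
    enroll_* : Medicaid enrollment rate by lottery status (first stage)
    """
    itt = y_winners - y_losers                    # effect of winning the lottery
    first_stage = enroll_winners - enroll_losers  # effect of winning on coverage
    return itt / first_stage

# Hypothetical: winning raises the ER-visit rate from 0.10 to 0.12 (ITT = 0.02)
# and raises Medicaid enrollment from 0.05 to 0.55 (first stage = 0.50),
# so LATE = 0.02 / 0.50 = 0.04.
late = wald_late(0.12, 0.10, 0.55, 0.05)
```

Because only about half of lottery winners actually took up coverage, the LATE for compliers is roughly double the ITT effect, which is exactly the kind of rescaling the authors' instrumental-variable approach performs.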

Also, as discussed in the supplementary appendix, both the ITT and IV estimates are based on linear probability models, with comparisons to marginal effects derived from logistic regression (see Oregon Medicaid Experiment and Linear Probability Models).

You can find a good discussion about this experiment, intent to treat, etc., in the context of the NEJM paper in an EconTalk podcast with Jim Manzi and Russ Roberts this past year:

Also, a nice profile of MIT economist Amy Finkelstein in a related story from Bloomberg: "MIT Economist Seeks Facts in Health-Care Policy Debate." 


"The Oregon Experiment--Effects of Medicaid on Clinical Outcomes," by Katherine Baicker, et al. New England Journal of Medicine, 2013; 368:1713-1722.

Medicaid Increases Emergency-Department Use: Evidence from Oregon's Health Insurance Experiment. Sarah L. Taubman, Heidi L. Allen, Bill J. Wright, Katherine Baicker, and Amy N. Finkelstein. Science. Published online 2 January 2014. DOI:10.1126/science.1246183.