Friday, May 8, 2015

Mendelian Instruments (Applied Econometrics meets Bioinformatics)

Recently I defended the use of quasi-experimental methods in wellness studies, and a while back I sort of speculated that genomic data might be useful in a quasi-experimental setting-but wasn’t sure how: 

If causality is the goal, then merge 'big data' from the gym app with biometrics and the SNP profiles and employ some quasi-expermental methodology to investigate causality.”

Then this morning at marginal revolution I ran across a link to a blog post that mentioned exploiting mendelian variation as instruments for a particular study related to alcohol consumption.

This piece gives a nice intro I think:

Stat Med. 2008 Apr 15;27(8):1133-63. Mendelian randomization: using genes as instruments for making causal inferences in epidemiology.

Lawlor DA1, Harbord RM, Sterne JA, Timpson N, Davey Smith G


“Observational epidemiological studies suffer from many potential biases, from confounding and from reverse causation, and this limits their ability to robustly identify causal associations. Several high-profile situations exist in which randomized controlled trials of precisely the same intervention that has been examined in observational studies have produced markedly different findings. In other observational sciences, the use of instrumental variable (IV) approaches has been one approach to strengthening causal inferences in non-experimental situations. The use of germline genetic variants that proxy for environmentally modifiable exposures as instruments for these exposures is one form of IV analysis that can be implemented within observational epidemiological studies. The method has been referred to as 'Mendelian randomization', and can be considered as analogous to randomized controlled trials. This paper outlines Mendelian randomization, draws parallels with IV methods, provides examples of implementation of the approach and discusses limitations of the approach and some methods for dealing with these.”

Tuesday, April 28, 2015

Healthcare Analytics at SAS Global Forum 2015

I was not able to attend this year's SAS Global Forum, but have had a chance to browse the numerous session papers as well as enjoy some live content. The conference page has a searchable catalog and for each paper you will find links to similar sessions in the sidebar. There were around 5000 attendees at this year's conference and over 600 papers. For a 2 1/2 day conference that's more than 200 papers to cover per day. Below is a selection of some papers related to healthcare analytics. If we widen the search to include papers related to other fields, but applications in healthcare analtyics the selection would probably double. I'm sure I missed something, and would be glad to know if you had a favorite paper or presentation you'd like to share in the comments.

1329 - Causal Analytics: Testing, Targeting, and Tweaking to Improve Outcomes This session is an introduction to predictive analytics and causal analytics in the context of improving outcomes. The session covers the following topics: 1) Basic... View More 20 minutes Breakout Jason Pieratt

1340 - Using SAS® Macros to Flag Claims Based on Medical Codes Many epidemiological studies use medical claims to identify and describe a population. But finding out who was diagnosed, and who received treatment, isn't always simple.... View More 50 minutes Breakout Andy Karnopp

2382 - Reducing the Bias: Practical Application of Propensity Score Matching in Health-Care Program Evaluation To stay competitive in the marketplace, health-care programs must be capable of reporting the true savings to clients. This is a tall order, because most health-care programs... View More 20 minutes Breakout Amber Schmitz

2920 - Text Mining Kaiser Permanente Member Complaints with SAS® Enterprise Miner™ This presentation details the steps involved in using SAS® Enterprise Miner™ to text mine a sample of member complaints. Specifically, it describes how the Text Parsing, Text... View More 30 minutes E-Poster Amanda Pasch

3214 - How is Your Health? Using SAS® Macros, ODS Graphics, and GIS Mapping to Monitor Neighborhood and Small-Area Health Outcomes With the constant need to inform researchers about neighborhood health data, the Santa Clara County Health Department created socio-demographic and health profiles for 109... View More 20 minutes Breakout Roshni Shah

3254 - Predicting Readmission of Diabetic Patients Using the High-Performance Support Vector Machine Algorithm of SAS® Enterprise Miner™ 13.1 Diabetes is a chronic condition affecting people of all ages and is prevalent in around 25.8 million people in the U.S. The objective of this research is to predict the... View More 20 minutes Breakout Hephzibah Munnangi

3281 - Using SAS® to Create Episodes-of-Hospitalization for Health Services Research An essential part of health services research is describing the use and sequencing of a variety of health services. One of the most frequently examined health services is... View More 20 minutes Breakout MeriƧ Osman

3282 - A Case Study: Improve Classification of Rare Events with SAS® Enterprise Miner™ Imbalanced data are frequently seen in fraud detection, direct marketing, disease prediction, and many other areas. Rare events are sometimes of primary interest. Classifying... View More 20 minutes Breakout Ruizhe Wang

3411 - Identifying Factors Associated with High-Cost Patients Research has shown that the top five percent of patients can account for nearly fifty percent of the total healthcare expenditure in the United States. Using SAS® Enterprise... View More 30 minutes E-Poster Jialuo Cheng

3488 - Text Analytics on Electronic Medical Record Data This session describes our journey from data acquisition to text analytics on clinical, textual data. 50 minutes Breakout Mark Pitts

3560 - A SAS Macro to Calculate the PDC Adjustment of Inpatient Stays The Centers for Medicare & Medicaid Services (CMS) uses the Proportion of Days Covered (PDC) to measure medication adherence. There is also some PDC-related research based on... View More 20 minutes Breakout anping chang

3600 - When Two Are Better Than One: Fitting Two-Part Models Using SAS In many situations, an outcome of interest has a large number of zero outcomes and a group of nonzero outcomes that are discrete or highly skewed.  For example, in modeling... View More 20 minutes Breakout Laura Kapitula

3740 - Risk-Adjusting Provider Performance Utilization Metrics Pay-for-performance programs are putting increasing pressure on providers to better manage patient utilization through care coordination, with the philosophy that good... View More 50 minutes Breakout Tracy Lewis

3741 - The Spatio-Temporal Impact of Urgent Care Centers on Physician and ER Use The unsustainable trend in healthcare costs has led to efforts to shift some healthcare services to less expensive sites of care. In North Carolina, the expansion of urgent... View More 50 minutes Breakout Laurel Trantham

3760 - Methodological and Statistical Issues in Provider Performance Assessment With the move to value-based benefit and reimbursement models, it is essential toquantify the relative cost, quality, and outcome of a service. Accuratelymeasuring the cost... View More 50 minutes Breakout Daryl Wansink

SAS1855 - Using the PHREG Procedure to Analyze Competing-Risks Data Competing risks arise in studies in which individuals are subject to a number of potential failure events and the occurrence of one event might impede the occurrence of other... View More 20 minutes Breakout Ying So

SAS1900 - Establishing a Health Analytics Framework Medicaid programs are the second largest line item in each state’s budget. In 2012, they contributed $421.2 billion, or 15 percent of total national healthcare expenditures.... View More 20 minutes Breakout Krisa Tailor

SAS1951 - Using SAS® Text Analytics to Examine Labor and Delivery Sentiments on the Internet In today’s society, where seemingly unlimited information is just a mouse click away, many turn to social media, forums, and medical websites to research and understand how mothers feel about the birthing process. Mining the data in these resources helps provide an understanding of what mothers value and how they feel. This paper shows the use of SAS® Text Analytics to gather, explore, and analyze reports from mothers to determine their sentiment about labor and delivery topics. Results of this analysis could aid in the design and development of a labor and delivery survey and be used to understand what characteristics of the birthing process yield the highest levels of importance. These resources can then be used by labor and delivery professionals to engage with mothers regarding their labor and delivery preferences. View Less 20 minutes Breakout Michael Wallis

Saturday, March 28, 2015

Using the R MatchIt package for propensity score analysis

Descriptive analysis between treatment and control groups can reveal interesting patterns or relationships, but we cannot always take descriptive statistics at face value. Regression and matching methods allow us to make controlled comparisons to reduce selection bias in observational studies.
For a couple good references that I am basically tracking in this post see here and here. These are links to the pages of the package authors and a nice paper (A Step by Step Guide to Propensity Score Matching in R) from higher education evaluation research respectively.

In both Mostly Harmless Econometrics and Mastering Metrics Angrist and Pischke discuss the similarities between matching and regression. From MM:

"Specifically, regression estimates are weighted averages of multiple matched comparisons"

In this post I borrow from some of the previous references, and try to follow closely the dialogue in chapter 3 of MHE. So, conveniently the R matchit propensity score matching package comes with a subset of the Lalonde data set referenced in MHE. Based on descriptives, it looks like this data matches columns (1) and (4) in table 3.3.2. The Lalonde data set basically consists of a treatment variable indicator, an outcome re78 or real earnings in 1978 as well as other data that can be used for controls. (see previous links above for more details). If we use regression to look at basic uncontrolled raw differences between treatment and control groups, it appears that the treatment (a job training program) produces negative results (on the order of $635):

R code:

            Estimate Std. Error t value            Pr(>|t|)   
(Intercept)     6984        361   19.36 <0.0000000000000002 ***
treat           -635  <---      657   -0.97                0.33 

Once we implement matching in R, the output provides comparisons between the balance in covariates for the treatment and control groups before and after matching. Matching is based on propensity scores estimated with logistic regression. (see previous post on propensity score analysis for further details). The output below indicates that the propensity score matching creates balance among covariates/controls as if we were explicitly trying to match on the controls themselves.

R Code:
m.out1 <- matchit(treat ~ age + educ + black + hispan + nodegree + married + re74 + re75, data = lalonde, method = "nearest", distance = "logit")

Summary of balance for all data:
         Means Treated Means Control SD Control  Mean Diff   eQQ Med  eQQ Mean   eQQ Max
distance        0.5774        0.1822     0.2295     0.3952    0.5176    0.3955    0.5966
age            25.8162       28.0303    10.7867    -2.2141    1.0000    3.2649   10.0000
educ           10.3459       10.2354     2.8552     0.1105    1.0000    0.7027    4.0000
black           0.8432        0.2028     0.4026     0.6404    1.0000    0.6432    1.0000
hispan          0.0595        0.1422     0.3497    -0.0827    0.0000    0.0811    1.0000
nodegree        0.7081        0.5967     0.4911     0.1114    0.0000    0.1135    1.0000
married         0.1892        0.5128     0.5004    -0.3236    0.0000    0.3243    1.0000
re74         2095.5737     5619.2365  6788.7508 -3523.6628 2425.5720 3620.9240 9216.5000
re75         1532.0553     2466.4844  3291.9962  -934.4291  981.0968 1060.6582 6795.0100

Summary of balance for matched data:
         Means Treated Means Control SD Control Mean Diff  eQQ Med eQQ Mean    eQQ Max
distance        0.5774        0.3629     0.2533    0.2145   0.1646   0.2146     0.4492
age            25.8162       25.3027    10.5864    0.5135   3.0000   3.3892     9.0000
educ           10.3459       10.6054     2.6582   -0.2595   0.0000   0.4541     3.0000
black           0.8432        0.4703     0.5005    0.3730   0.0000   0.3730     1.0000
hispan          0.0595        0.2162     0.4128   -0.1568   0.0000   0.1568     1.0000
nodegree        0.7081        0.6378     0.4819    0.0703   0.0000   0.0703     1.0000
married         0.1892        0.2108     0.4090   -0.0216   0.0000   0.0216     1.0000
re74         2095.5737     2342.1076  4238.9757 -246.5339 131.2709 545.1182 13121.7500
re75         1532.0553     1614.7451  2632.3533  -82.6898 152.1774 349.5371 11365.7100

Estimation of treatment effects can be obtained via paired or matched comparisons (Lanehart et al, 2012; Austin, 2010-see previous posts here and here)


t = 1.2043, df = 184, p-value = 0.23
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -579.6904 2396.0948
sample estimates:
mean of the differences
               908.2022 <---

This indicates an estimated treatment effect of about $900.00, which is quite a reversal from the raw uncontrolled/unmatched comparisons. In Mostly Harmless Econometrics, as part of the dialogue relating regression to matching, Angrist and Pischke present results in table 3.3.3 for regressions utilizing data that has been 'screened' by eliminating observations where ps > .90 or < .10. Similar results were obtained in R below:

summary(lm(re78~treat + age + educ + black + hispan + nodegree + married + re74 + re75, data = m.data3))

              Estimate Std. Error t value Pr(>|t|) 
(Intercept)    23.6095  3920.6599    0.01    0.995 
treat        1229.8619 <---  849.1439    1.45    0.148 
age            -3.4488    45.4815   -0.08    0.940 
educ          506.9958   240.1989    2.11    0.036 *
black       -1030.4883  1255.1766   -0.82    0.412 
hispan        926.5288  1498.4450    0.62    0.537

Again we get a reversal of the values from the raw comparisons, and a much larger estimated treatment effect than the non-paramterically matched comparisons above.

Note: Slightly different results were obtained from A&P, partly because I am not sure the exact specification of their ps model, which may have impacted the screening and ultimately differences in the data used.  Full R code is provided below with the ps model specification. Also, I have replicated similar results in SAS using the Mayo Clinic %gmatch macro as well as using approaches outlined by Lanehart et al (2012).  These results may be shared in a later post or white paper. 

See also: A Toy Instrumental Variable Application

R Program:

# *------------------------------------------------------------------
# | PROGRAM NAME: ex ps match mostly harmless R
# | DATE: 3/24/15 
# | PROJECT FILE: Tools and References/Rcode             
# *----------------------------------------------------------------
# | PURPOSE: Use R matchit and glm to mimic the conversation in 
# | chapter 3 of Mostly Harmless Econometrics              
# | NOTE: because of random sorting within matching application different results may be
# | obtained with each iteration of matchit in R
# *------------------------------------------------------------------
rm(list=ls()) # get rid of any existing data 
ls() # view open data sets
library(MatchIt) # load matching package
options("scipen" =100, "digits" = 4) # override R's tendency to use scientific notation
# raw differences in means for treatments and controls using regression
# nearest neighbor matching via MatchIt
# estimate propensity scores and create matched data set using 'matchit' and lalonde data
m.out1 <- matchit(treat ~ age + educ + black + hispan + nodegree + married + re74 + re75, data = lalonde, method = "nearest", distance = "logit")
summary(m.out1) # check balance
m.data1 <-,distance ="pscore") # create ps matched data set from previous output
hist(m.data1$pscore) # distribution of propenisty scores
# perform paired t-tests on matched data
# lets look at regression on the ps restricted data on the entire sample per MHE chapter 3
m.data2 <- lalonde # copy lalonde for ps estimation
# generate propensity scores for all of the data in Lalonde 
ps.model <- glm(treat ~ age + educ + black + hispan + nodegree + married + re74 +  
re75, data = m.data2, family=binomial(link="logit"),
# add pscores to study data
m.data2$pscore <- predict(ps.model, newdata = m.data2, type = "response")
hist(m.data2$pscore) # distribution of ps
# restrict data to ps range .10 <= ps <= .90
m.data3 <- m.data2[m.data2$pscore >= .10 & m.data2$pscore <=.90,]
# regression with controls on propensity score screened data set
summary(lm(re78~treat + age + educ + black + hispan + nodegree + married + re74 + re75, data = m.data3))
# unrestricted regression with controls
summary(lm(re78~treat + age + educ + black + hispan + nodegree + married + re74 + re75, data = lalonde))
Created by Pretty R at

Applied Propensity Score Analysis with SAS: SAS Global Forum Papers

Below are two interesting SAS Global Forum papers implementing propensity score methods.

Paper 314-2012
Rheta E. Lanehart, Patricia Rodriguez de Gil, Eun Sook Kim, Aarti P. Bellara, Jeffrey D. Kromrey, and Reginald S. Lee, University of South Florida

This paper is a very good paper with a succinct review of the literature and very clear SAS code for implementation of several propensity score methods including caliper based matching, inverse probability of treatment weighted regression, and analysis of covariance. 

Paper 220-2013
Propensity Score-based Analysis of Short-Term Complications in Patients with Lumbar Discectomy in the ACS-NSQIP Database
Yubo Gao The University of Iowa Hospitals and Clinics, Iowa City, Iowa

What I find interesting about the above paper is the use of the hash object in the matching implementation (vs. arrays used in the previous paper)

I am currently working on a suite of SAS programs that utilize the Mayo Clinic %gmatch macro as well as parametrization of the programs presented in Lanehart (suitable for a macro call)

Friday, March 27, 2015

Propensity Score Matching: Optimal Caliper Width and Paired Comparisons

Below are abstracts from some really good propensity score matching papers (among many). Sure there are a lot of good papers out there (think Rosenbaum/Rubin etc.) but these are nice applied papers that I appreciate as a practitioner.

Optimal caliper widths for propensity-score matching when estimating differences in means and differences in proportions in observational studies. Pharmaceut. Statist. 2011, 10 150–161
Peter C. Austin

In a study comparing the effects of two treatments, the propensity score is the probability of assignment to one treatment conditional on a subject’s measured baseline covariates. Propensity-score matching is increasingly being used to estimate the effects of exposures using observational data. In the most common implementation of propensity-score matching, pairs of treated and untreated subjects are formed whose propensity scores differ by at most a pre-specified amount (the caliper width). There has been a little research into the optimal caliper width. We conducted an extensive series of Monte Carlo simulations to determine the optimal caliper width for estimating differences in means (for continuous outcomes) and risk differences (for binary outcomes). When estimating differences in means or risk differences, we recommend that researchers match on the logit of the propensity score using calipers of width equal to 0.2 of the standard deviation of the logit of the propensity score. When at least some of the covariates were continuous, then either this value, or one close to it, minimized the mean square error of the resultant estimated treatment effect. It also eliminated at least 98% of the bias in the crude estimator, and it resulted in confidence intervals with approximately the correct coverage rates. Furthermore, the empirical type I error rate was approximately correct. When all of the covariates were binary, then the choice of caliper width had a much smaller impact on the performance of estimation of risk differences and differences in means.

Comparing paired vs non-paired statistical methods of analyses when making inferences about absolute risk reductions in propensity-score matched samples. Statist. Med. 2011, 30 1292--1301
Peter C. Austin

Propensity-score matching allows one to reduce the effects of treatment-selection bias or confounding when estimating the effects of treatments when using observational data. Some authors have suggested that methods of inference appropriate for independent samples can be used for assessing the statistical significance of treatment effects when using propensity-score matching. Indeed, many authors in the applied medical literature use methods for independent samples when making inferences about treatment effects using propensity-score matched samples. Dichotomous outcomes are common in healthcare research. In this study, we used Monte Carlo simulations to examine the effect on inferences about risk differences (or absolute risk reductions) when statistical methods for independent samples are used compared with when statistical methods for paired samples are used in propensity-score matched samples. We found that compared with using methods for independent samples, the use of methods for paired samples resulted in: (i) empirical type I error rates that were closer to the advertised rate; (ii) empirical coverage rates of 95 per cent confidence intervals that were closer to the advertised rate; (iii) narrower 95 per cent confidence intervals; and (iv) estimated standard errors that more closely reflected the sampling variability of the estimated risk difference. Differences between the empirical and advertised performance of methods for independent samples were greater when the treatment-selection process was stronger compared with when treatment-selection process was weaker. We recommend using statistical methods for paired samples when using propensity-score matched samples for making inferences on the effect of treatment on the reduction in the probability of an event occurring.

Elizabeth Stuart has lots of addtional resources and links on her page, for instance:


Software (R/SAS etc.)

See also: Considerations in Propensity Score Matching

Thursday, March 26, 2015

To Explain or Predict

Some people may not make the important distinctions between prediction vs inference when it comes to modeling approaches/methodologies/data handling/assumptions.  I recently ran across a blog post by Rob J. Hyndman that pointed to the following article in the journal Statistical Science:

 Statist. Sci.
 Volume 25, Number 3 (2010), 289-310.

"Statistical modeling is a powerful tool for developing and testing theories by way of causal explanation, prediction, and description. In many disciplines there is near-exclusive use of statistical modeling for causal explanation and the assumption that models with high explanatory power are inherently of high predictive power. Conflation between explanation and prediction is common, yet the distinction must be understood for progressing scientific knowledge. While this distinction has been recognized in the philosophy of science, the statistical literature lacks a thorough discussion of the many differences that arise in the process of modeling for an explanatory versus a predictive goal. The purpose of this article is to clarify the distinction between explanatory and predictive modeling, to discuss its sources, and to reveal the practical implications of the distinction to each step in the modeling process."

This is a nice article which I think complements Leo Brieman's paper discussed here before regarding two cultures of predictive modeling. Rob gives a nice synopsis of some of the main points from the paper:

  1. The AIC is better suited to model selection for prediction as it is asymptotically equivalent to leave-​​one-​​out cross-​​validation in regression, or one-​​step-​​cross-​​validation in time series. On the other hand, it might be argued that the BIC is better suited to model selection for explanation, as it is consistent.
  2. P-​​values are associated with explanation, not prediction. It makes little sense to use p-​​values to determine the variables in a model that is being used for prediction. (There are problems in using p-​​values for variable selection in any context, but that is a different issue.)
  3. Multicollinearity has a very different impact if your goal is prediction from when your goal is estimation. When predicting, multicollinearity is not really a problem provided the values of your predictors lie within the hyper-​​region of the predictors used when estimating the model.
  4. An ARIMA model has no explanatory use, but is great at short-​​term prediction.
  5. How to handle missing values in regression is different in a predictive context compared to an explanatory context. For example, when building an explanatory model, we could just use all the data for which we have complete observations (assuming there is no systematic nature to the missingness). But when predicting, you need to be able to predict using whatever data you have. So you might have to build several models, with different numbers of predictors, to allow for different variables being missing.
  6. Many statistics and econometrics textbooks fail to observe these distinctions. In fact, a lot of statisticians and econometricians are trained only in the explanation paradigm, with prediction an afterthought. That is unfortunate as most applied work these days requires predictive modelling, rather than explanatory modelling.

Rob also links Galit Shmueli's web page, (the author of the article above) who apparently has done some extensive research related to these distinctions.  Lots of additional resources (blog) here in this regard. Galit states:

"My thesis is that statistical modeling, from the early stages of study design and data collection to data usage and reporting, takes a different path and leads to different results, depending on whether the goal is predictive or explanatory." 

I touched on these distinctions before, but did not realize the extent of the actual work being done in this area by Galit.

Analytics vs Causal Inference
Big Data: Don't throw the baby out with the bath water

See also: Paul Allison on multicollinearity

Sunday, March 8, 2015

Are observational studies off the table or can you have your eggs and eat them too?

From the New York Times:

"But the primary problem is that nutrition policy has long relied on a very weak kind of science: epidemiological, or “observational,” studies in which researchers follow large groups of people over many years. But even the most rigorous epidemiological studies suffer from a fundamental limitation. At best they can show only association, not causation. Epidemiological data can be used to suggest hypotheses but not to prove them."

I remember being in a discussion once about the safety of GMO foods and someone made the remark that epidemiological studies were unreliable. I really was perplexed that someone could throw an entire field like epidemiology under the bus. They were most likely thinking about some of the claims made in an article I have referred to before here recently:

Deming, data and observational studies: A process out of control and needing fixing. 2011 Royal Statistical Society.  (link)

“Any claim coming from an observational study is most likely to be wrong.” Startling, but true. Coffee causes pancreatic cancer. Type A personality causes heart attacks. Trans-fat is a killer. Women who eat breakfast cereal give birth to more boys. All these claims come from observational studies; yet when the studies are carefully examined, the claimed links appear to be incorrect. What is going wrong? Some have suggested that the scientific method is failing, that nature itself is playing tricks on us. But it is our way of studying nature that is broken and that urgently needs mending, say S. Stanley Young and Alan Karr; and they propose a strategy to fix it."

This criticism of observational studies is interesting. I have discussed this before in relation to wellness program evaluation studies and generally concluded that those criticisms sort look like straw man arguments trying to hold observational studies against the golden standard of the randomized clinical trial. Obviously you would expect results from a RCT to be more reliable and replicable. Does this mean that we refuse to ask interesting questions or analyze important policy decisions just because a RCT is too expensive or impractical, when there are quasi-experimental approaches that could be used to advance our knowledge and understanding of important issues? Or in these cases are we just better off basing our understanding of the world on circumstantial and anecdotal evidence alone? 

Jayson Lusk has blogged about this recently: 

"What about the health impacts of meat consumption?  It is true that many observational, epidemiological studies show a correlation between red meat eating and adverse health outcomes (interestingly there is a fair amount of overlap on the authors of the dietary studies and the environmental studies on meat eating).  But, this is a pretty weak form of evidence, and much of this work reminds of the kinds of regression analyses done in the 1980s and 90s in economics before the so-called "credibility revolution."

So, perhaps some bad practices associated with p-hacking and lack of identification has given epidemiological research, and observational studies in general, a worse reputation than they deserve. And if Jayson is correct, a lot of these poor study designs lacking identification may have misled us about our diets and healthy food choices.