Thursday, January 24, 2019

Modeling Claims with Linear vs. Non-Linear Difference-in-Difference Models

Previously I have discussed the issues with modeling claims costs. Typically, medical claims are non-negative and highly skewed, with a large mass at zero and heteroskedasticity. The approach most commonly suggested in the literature for addressing these distributional concerns calls for non-linear GLM models. However, as previously discussed (see here and here), there are challenges with using difference-in-difference models in the context of GLM models. So once again, the gap between theory and application presents challenges, tradeoffs, and compromises that need to be made by the applied econometrician.

In the past I have written about the accepted (although controversial in some circles) practice of leveraging linear probability models to estimate marginal effects in applied work when outcomes are dichotomous. But what about doing this in the context of claims analysis? In my original post regarding the challenges of using difference-in-differences with claims, I speculated:

"So as Angrist and Pischke might ask, what is an applied guy to do? One approach even in the context of skewed distributions with high mass points (as is common in the healthcare econometrics space) is to specify a linear model. For count outcomes (utilization like ER visits or hospital admissions are often dichotomized and modeled by logit or probit models) you can just use a linear probability model. For skewed distributions with heavy mass points, dichotomization with a LPM may also be an attractive alternative."

But at the time I really had not seen examples in the healthcare econometrics literature where this was actually recommended. Recently I have found that this advice is pretty consistent with the social norms and practices in the field.

In their analysis of the ACA, Cantor et al. (2012) leverage linear probability models in a difference-in-differences analysis of health insurance coverage, stating:

"Linear probability models are fit to produce coefficients that are direct estimates of the relevant policy impacts and are easily interpreted as percentage point changes in coverage outcomes. This approach has been applied in earlier evaluations of insurance market reforms (Buchmueller and DiNardo 2002; Monheit and Steinberg Schone 2004;  Levine, McKnight, and Heep 2011;  Monheit et al. 2011). It also avoids complications associated with estimation and interpretation of multiple interaction terms and their standard errors in logit or probit models (Ai and Norton 2003)."

Jhamb et al. (2015) use LPMs for dichotomous outcomes as well as OLS models for counts in a DID framework.
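
To make the "percentage point" interpretation concrete, here is a minimal sketch of a linear probability DID on made-up data (the counts, the variable names, and the small `solve` helper are all my own assumptions, not taken from any of the papers above). With a saturated two-group, two-period design, the OLS coefficient on the interaction term should equal the difference-in-differences of the four cell proportions, so the code computes it both ways.

```python
from fractions import Fraction

# Hypothetical toy data (not from any study): (treated, post, any_visit)
rows = (
    [(0, 0, 1)] * 20 + [(0, 0, 0)] * 80    # control, pre:  20% with a visit
    + [(0, 1, 1)] * 25 + [(0, 1, 0)] * 75  # control, post: 25%
    + [(1, 0, 1)] * 30 + [(1, 0, 0)] * 70  # treated, pre:  30%
    + [(1, 1, 1)] * 28 + [(1, 1, 0)] * 72  # treated, post: 28%
)

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination (exact fractions)."""
    n = len(b)
    m = [[Fraction(v) for v in a[i]] + [Fraction(b[i])] for i in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[n] for row in m]

# Design matrix: intercept, treated, post, treated x post
X = [[1, t, p, t * p] for t, p, _ in rows]
y = [v for _, _, v in rows]

# OLS via the normal equations (X'X) beta = X'y
xtx = [[sum(x[i] * x[j] for x in X) for j in range(4)] for i in range(4)]
xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(4)]
beta = solve(xtx, xty)

def cell_mean(t, p):
    """Share of observations with a visit in the (treated, post) cell."""
    vals = [v for tt, pp, v in rows if (tt, pp) == (t, p)]
    return Fraction(sum(vals), len(vals))

# Difference-in-differences computed directly from the four cell proportions
did = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

print(f"LPM interaction coefficient: {float(beta[3]) * 100:.1f} percentage points")
print(f"DID from cell proportions:   {float(did) * 100:.1f} percentage points")
```

In applied work you would of course use a regression routine with covariates and appropriate standard errors; the point here is only that the interaction coefficient is already on the probability scale.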

Interestingly, Deb and Norton (2018) discuss an approach to address the challenges of DID in a GLM framework head on:

"Puhani argued, using the potential outcomes framework, that the treatment effect on the treated in the difference-in-difference regression equals the expected value of the dependent variable for the treatment group in the post period with treatment compared with the hypothetical expected value of the dependent variable for the treatment group in the post period if they had not received treatment. In nonlinear models, the treatment effect on the treated equals the difference in two predicted values. It always has the same sign as the coefficient on the interaction term. Because we estimate many nonlinear models using a difference-in-differences study design, we report the treatment effect on the treated in all tables of results."

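As a rough illustration of the calculation described in that quote, the sketch below assumes a saturated two-group, two-period model with a log link, E[y] = exp(b0 + b1*T + b2*P + b3*T*P), in which case the fitted cell means reproduce the sample cell means. The spending figures are hypothetical and the variable names are mine, not Deb and Norton's. The treatment effect on the treated is the difference between the prediction with the interaction term and the prediction with it switched off.

```python
import math

# Hypothetical mean spending per cell, keyed by (treated, post); illustrative only
mean_cost = {(0, 0): 1000.0, (0, 1): 1100.0, (1, 0): 1200.0, (1, 1): 1150.0}

# Coefficients implied by a saturated log-link model fitted to these cell means
b0 = math.log(mean_cost[0, 0])
b1 = math.log(mean_cost[1, 0]) - b0
b2 = math.log(mean_cost[0, 1]) - b0
b3 = math.log(mean_cost[1, 1]) - b0 - b1 - b2

# Treated group, post period: prediction with and without the interaction term
with_treatment = math.exp(b0 + b1 + b2 + b3)
without_treatment = math.exp(b0 + b1 + b2)  # hypothetical no-treatment outcome
att = with_treatment - without_treatment

# For comparison: the DID of the raw cell means, as a linear model would report
linear_did = (mean_cost[1, 1] - mean_cost[1, 0]) - (mean_cost[0, 1] - mean_cost[0, 0])

print(f"interaction coefficient b3:      {b3:.4f}")
print(f"treatment effect on the treated: {att:.2f}")
print(f"linear DID of cell means:        {linear_did:.2f}")
```

With these made-up numbers the interaction coefficient and the treatment effect on the treated are both negative, consistent with the sign claim in the quote, while the dollar effect differs from the linear DID because the counterfactual is built on the multiplicative scale.
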
In presenting their results they compare their GLM-based approach to results from linear models of healthcare expenditures. They argue that the differences are substantial and support their approach. However, I did not find the OLS estimate (-$323.4) to be practically different from the second part (conditional on positive expenditures) of the two-part GLM model (-$321.4), although the combined results from the two-part model did differ from OLS by a practically large amount. It does not appear that they compared a two-part GLM to a two-part linear model (which could be problematic if the first-part OLS model gave predicted probabilities greater than one or less than zero). In their paper they cite a number of authors who use linear difference-in-differences models for claims.
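
For readers less familiar with two-part models, here is a minimal sketch of how the two parts combine, using hypothetical numbers that are not taken from Deb and Norton. The overall expected cost is the probability of any spending times the expected cost given positive spending, so the combined effect can differ from the second-part effect alone. The function name and the range check are my own; the check simply flags the out-of-range probabilities that a first-part linear model could produce.

```python
def two_part_mean(p_positive, mean_if_positive):
    """Overall expected cost: P(y > 0) * E[y | y > 0]."""
    if not 0.0 <= p_positive <= 1.0:
        # A linear first part can predict values outside [0, 1]
        raise ValueError(f"predicted probability out of range: {p_positive}")
    return p_positive * mean_if_positive

# Hypothetical predictions for the treated group in the post period
p_untreated, cost_untreated = 0.80, 2000.0  # counterfactual without treatment
p_treated, cost_treated = 0.70, 1700.0      # with treatment

# Effect from the second part only (conditional on positive spending)
conditional_effect = cost_treated - cost_untreated

# Effect on overall expected spending, combining both parts
combined_effect = (two_part_mean(p_treated, cost_treated)
                   - two_part_mean(p_untreated, cost_untreated))

print(f"second-part (conditional) effect: {conditional_effect:.1f}")
print(f"combined two-part effect:         {combined_effect:.1f}")
```

In this made-up case the conditional effect is smaller in magnitude than the combined effect because treatment also lowers the probability of any spending.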

See the references below for a number of examples (including those cited above).

Related: Linear Literalism and Fundamentalist Econometrics


Cantor JC, Monheit AC, DeLia D, Lloyd K. Early impact of the Affordable Care Act on health insurance coverage of young adults. Health Serv Res. 2012;47(5):1773-90.

Deb P, Norton EC. 2018. Modeling health care expenditures and use. Annu. Rev. Public Health 39:489–505

Buchmueller T, DiNardo J. “Did Community Rating Induce an Adverse Selection Death Spiral? Evidence from New York, Pennsylvania and Connecticut” American Economic Review. 2002;92(1):280–94.

Monheit AC, Cantor JC, DeLia D, Belloff D. “How Have State Policies to Expand Dependent Coverage Affected the Health Insurance Status of Young Adults?” Health Services Research. 2011;46(1 Pt 2):251–67

Amuedo-Dorantes C, Yaya ME. 2016. The impact of the ACA’s extension of coverage to dependents on young adults’ access to care and prescription drugs. South. Econ. J. 83:25–44

Barbaresco S, Courtemanche CJ, Qi Y. 2015. Impacts of the Affordable Care Act dependent coverage provision on health-related outcomes of young adults. J. Health Econ. 40:54–68

Jhamb J, Dave D, Colman G. 2015. The Patient Protection and Affordable Care Act and the utilization of health care services among young adults. Int. J. Health Econ. Dev. 1:8–25

Sommers BD, Buchmueller T, Decker SL, Carey C, Kronick R. 2013. The Affordable Care Act has led to significant gains in health insurance and access to care for young adults. Health Aff. 32:165–74
