Friday, May 31, 2013

References: Social Network Analysis and Student Integration and Persistence

See also: An Introduction to SNA using R and NetDraw, SNA & Predictive Modeling, Using Twitter to Demonstrate Basic Concepts from SNA,  An Introduction to SNA with Applications

Social Psychology of Education
June 2012, Volume 15, Issue 2, pp 165-180
A social network analysis of student retention using archival data
James E. Eckles,
Eric G. Stradley 

This study attempts to determine if a relationship exists between first-to-second-year retention and social network variables for a cohort of first-year students at a small liberal arts college. The social network is reconstructed using not survey data as is most common, but rather using archival data from a student information system. Each student is given a retention score and an attrition score based on the behavior of their immediate relationships in the network. Those scores are then entered into a logistic regression that includes traditional background and performance variables that are typically significantly related to retention. Students' friends' retention and attrition behaviors are found to have a greater impact on retention than any background or performance variable.
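To make the idea of neighbor-based scores concrete, here is a minimal Python sketch. The abstract does not give the authors' exact scoring rule, so simple counts of retained and departed direct ties are my assumption, and the students, ties, and outcomes below are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical toy data: undirected ties between students, and whether
# each student returned for the second year.
edges = [("ann", "bob"), ("ann", "cal"), ("bob", "cal"),
         ("cal", "dee"), ("dee", "eve")]
retained = {"ann": True, "bob": True, "cal": False, "dee": True, "eve": False}

def neighbor_scores(edges, retained):
    """For each student, count direct ties who stayed and direct ties who left.

    Assumption: plain counts stand in for the paper's retention and
    attrition scores, which the abstract does not define precisely.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    scores = {}
    for student in retained:
        friends = neighbors[student]
        kept = sum(1 for f in friends if retained[f])
        scores[student] = {"retention": kept, "attrition": len(friends) - kept}
    return scores

scores = neighbor_scores(edges, retained)
for student, s in sorted(scores.items()):
    print(student, s)
```

Scores like these could then enter a logistic regression alongside the usual background and performance variables, which is the general approach the abstract describes.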

University of San Francisco, California

 Used social network analysis to examine the role of social support networks in student persistence among residential and commuter students. Found that commuter students are less likely to persist, while residential students who reported making greater numbers of new friends with connections to the school also reported attaining personal and academic goals at a significantly greater rate.

The Journal of Higher Education   
Vol. 71, No. 5, Sep. - Oct., 2.
Ties That Bind: A Social Network Approach to Understanding Student Integration and Persistence
Scott L. Thomas

This study examined the social networks of college students and how such networks affect student commitment and persistence. The study's theoretical framework was based on application of the social network paradigm to Tinto's Student Integration Model, in which a student's initial commitment is modified over time as a result of the student's integration into the campus community. Freshmen enrolled for the spring 1993 semester responded (322 of 379) to the First-Year Experiences Survey, which involved identifying students with whom they frequently spoke and the dimensions on which they related to these students. Results were compared with enrollment data for the fall 1993 semester to identify students returning for their sophomore year. The largest effect on persistence was associated with the number of nominations received from other students, and this factor operated indirectly through enhanced social integration, institutional commitment, and intention. Overall, students with broader, well-connected networks were more likely to persist, whereas students with a higher proportion of ties falling within their social peer group were less likely to persist.

Physics Education Research Conference 2009
Part of the PER Conference series
Ann Arbor, Michigan: July 29-30, 2009
Volume 1179, Pages 105-108
Investigating Student Communities with Network Analysis of Interactions in a Physics Learning Center written by Eric Brewe, Laird H. Kramer, and George O'Brien 

Developing a sense of community among students is one of the three pillars of an overall reform effort to increase participation in physics, and the sciences more broadly, at Florida International University. The emergence of a research and learning community, embedded within a course reform effort, has contributed to increased recruitment and retention of physics majors. Finn and Rock [1] link the academic and social integration of students to increased rates of retention. We utilize social network analysis to quantify interactions in Florida International University's Physics Learning Center (PLC) that support the development of academic and social integration. The tools of social network analysis allow us to visualize and quantify student interactions, and characterize the roles of students within a social network. After providing a brief introduction to social network analysis, we use sequential multiple regression modeling to evaluate factors which contribute to participation in the learning community. Results of the sequential multiple regression indicate that the PLC learning community is an equitable environment as we find that gender and ethnicity are not significant predictors of participation in the PLC. We find that providing students space for collaboration provides a vital element in the formation of a supportive learning community.

Selection Bias and the Rubin Causal Model and Potential Outcomes Framework

The problem of selection bias is best characterized within the Rubin Causal Model or potential outcomes framework (Angrist & Pischke, 2008; Rubin, 1974; Imbens & Wooldridge, 2009; Klaiber & Smith, 2009).

Suppose Yi is the measured outcome of interest. This can be written in terms of potential outcomes as:

$$
Y_i =
\begin{cases}
Y_{1i} & \text{if } d_i = 1 \\
Y_{0i} & \text{if } d_i = 0
\end{cases}
\;=\; Y_{0i} + (Y_{1i} - Y_{0i})\, d_i
$$

The causal effect of interest is Y1i - Y0i, but it is unobservable because we never see both outcomes for a single individual. Reality forces us to compare outcomes for different individuals (those treated vs. untreated).

Let di = choice, selection, or treatment indicator (1 if treated, 0 if not)
Y0i = baseline potential outcome
Y1i = potential treatment outcome

What we actually measure is E[Yi|di=1] - E[Yi|di=0], the observed effect or observed difference between means for treated vs. untreated groups. The problem of non-random treatment selection can be characterized as follows:

$$
E[Y_i \mid d_i = 1] - E[Y_i \mid d_i = 0] = E[Y_{1i} - Y_{0i} \mid d_i = 1] + \{E[Y_{0i} \mid d_i = 1] - E[Y_{0i} \mid d_i = 0]\}
$$

The observed effect or difference is equal to the average treatment effect on the treated, E[Y1i - Y0i | di=1], plus the bracketed term for selection bias. When the treatment effect is the same for everyone, or when treatment is randomly assigned, this first term equals the population average treatment effect (ATE), E[Y1i - Y0i]. If the baseline potential outcomes Y0i for those who are treated (di=1) differ from the baseline potential outcomes Y0i for those who are not treated or don't self-select (di=0), then the term {E[Y0i|di=1] - E[Y0i|di=0]} could have a positive or negative value, creating selection bias. When we calculate the observed difference between treated and untreated groups, selection bias becomes confounded with the actual treatment effect. Note that if the potential outcomes of the treated and control groups were the same on average, then the selection bias term would equal zero, and the observed difference would represent the population average treatment effect.

If the selection bias term {E[Y0i|di=1] - E[Y0i|di=0]} is large enough, it can overpower the actual treatment effect and lead the naïve researcher to conclude (based on the observed effect E[Yi|di=1] - E[Yi|di=0]) that the intervention or treatment was ineffectual, or to under- or overestimate the true treatment effect, depending on the direction of the bias.
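A small simulation can make the decomposition tangible. The Python sketch below is only an illustration under assumptions of my own: a constant treatment effect of 2.0, an unobserved "ability" variable that drives both the baseline outcome and the decision to take the treatment, and arbitrary coefficients. Because both potential outcomes are generated for everyone, we can compute the pieces that are never observable in real data.

```python
import random
from statistics import mean

random.seed(0)

TRUE_EFFECT = 2.0  # hypothetical constant treatment effect (Y1i - Y0i)

# Simulate both potential outcomes for everyone (impossible in real data).
people = []
for _ in range(20000):
    ability = random.gauss(0, 1)                     # unobserved confounder
    y0 = 10 + 3 * ability + random.gauss(0, 1)       # baseline potential outcome
    y1 = y0 + TRUE_EFFECT                            # potential treatment outcome
    d = 1 if ability + random.gauss(0, 1) > 0 else 0 # self-selection on ability
    people.append((y0, y1, d))

treated = [p for p in people if p[2] == 1]
untreated = [p for p in people if p[2] == 0]

# What a researcher actually sees: Y1 for the treated, Y0 for the untreated.
observed_diff = mean(y1 for _, y1, _ in treated) - mean(y0 for y0, _, _ in untreated)

# The two unobservable pieces of the decomposition.
att = mean(y1 - y0 for y0, y1, _ in treated)
selection_bias = mean(y0 for y0, _, _ in treated) - mean(y0 for y0, _, _ in untreated)

print(f"observed difference: {observed_diff:.2f}")
print(f"treatment effect on the treated: {att:.2f}")
print(f"selection bias: {selection_bias:.2f}")
```

In this setup higher-ability people are more likely to select into treatment, so the selection bias term is positive and the observed difference overstates the true effect. Flipping the sign of the selection rule would produce the opposite bias.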

Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5), 688-701.

Angrist, J. D., & Pischke, J. (2008). Mostly harmless econometrics: An empiricist's companion. Princeton University Press.

Imbens, G. W., & Wooldridge, J. M. (2009). Recent developments in the econometrics of program evaluation. Journal of Economic Literature, 47(1), 5-86.

Klaiber, H. A., & Smith, V. K. (2009). Evaluating Rubin's causal model for measuring the capitalization of environmental amenities. NBER Working Paper No. 14957. National Bureau of Economic Research.

Wednesday, May 22, 2013

Regression Discontinuity Designs in Higher Ed Research

 Previously I posted a brief introduction to RD designs. Here are some applications in Higher Ed research in the areas of developmental education and financial aid:

van der Klaauw (2002). Estimating the effect of financial aid offers on college enrollment: A regression-discontinuity approach. International Economic Review.  43(4), 1249–1287.

An important problem faced by colleges and universities, that of evaluating the effect of their financial aid offers on student enrollment decisions, is complicated by the likely endogeneity of the aid offer variable in a student enrollment equation. This article shows how discontinuities in an East Coast college’s aid assignment rule can be exploited to obtain credible estimates of the aid effect without having to rely on arbitrary exclusion restrictions and functional form assumptions. Semiparametric estimates based on a regression–discontinuity (RD) approach affirm the importance of financial aid as an effective instrument in competing with other colleges for students.

Brian G. Moss  and William H. Yeaton 
Shaping Policies Related to Developmental Education: An Evaluation Using the Regression-Discontinuity Design. EDUCATIONAL EVALUATION AND POLICY ANALYSIS September 21, 2006 vol. 28 no. 3 215-229

Utilizing the regression-discontinuity research design, this article explores the effectiveness of a developmental English program in a large, multicampus community college. Routinely collected data were extracted from existing records of a cohort of first-time college students followed for approximately 6 years (N = 1,473). Results are consistent with a conclusion that students’ participation in the program increases English academic achievement to levels similar to those of students not needing developmental coursework. The findings are also consistent with a conclusion that those students in greatest need of developmental English benefit the most from the program. This study provides an inexpensive, inferentially rigorous, program evaluation strategy that can be applied with few additional efforts to assess existing programs and to guide policy decisions.

Lesik, S. A. (2008). Evaluating developmental education programs in higher education. ASHE/Lumina Policy Brief, Issue 4.

A key benefit of the regression-discontinuity design is that it effectively assesses the extent that developmental programs result in improving student retention and academic success.

Calcagno, J. C., & Long, B. T. (2008). The Impact of Postsecondary Remediation Using a Regression Discontinuity Approach: Addressing Endogenous Sorting and Noncompliance. NCPR Working Paper.

Remedial or developmental courses are the most common policy instruments used to assist underprepared postsecondary students who are not ready for college-level coursework. However, despite its important role in higher education and its substantial costs, there is little rigorous evidence on the effectiveness of college remediation on the outcomes of students. This study uses a detailed dataset to identify the causal effect of remediation on the educational outcomes of nearly 100,000 college students in Florida, an important state that reflects broader national trends in remediation policy and student diversity. Moreover, using a Regression Discontinuity design, we discuss concerns about endogenous sorting around the policy cutoff, which poses a threat to the assumptions of the model in multiple research contexts. To address this concern, we implement methods proposed by McCrary (2008) and discuss the strengths of this approach. The results suggest math and reading remedial courses have mixed benefits. Being assigned to remediation appears to increase persistence to the second year and the total number of credits completed for students on the margin of passing out of the requirement, but it does not increase the completion of college-level credits or eventual degree completion. Taken together, the results suggest that remediation might promote early persistence in college, but it does not necessarily help students on the margin of passing the placement cutoff make long-term progress toward earning a degree.

Keith Smolkowski also provides a nice bibliography of research related to RD designs.

I’ve pulled out most of the references related to higher education research and education in general:

Jacob, B. A., & Lefgren, L. (2004). Remedial education and student achievement: A regression-discontinuity analysis. Review of Economics and Statistics, 86(1), 226 -244.

Seaver, W. B., & Quarton, R. J. (1976). Regression-discontinuity analysis of Dean's List effects. Journal of Educational Psychology, 68(4), 459-465.

Wong, V.C., Cook, T. D., Barnett, W.S., & Jung, K. (2008). An effectiveness-based evaluation of five state pre-k programs. Journal of Policy Analysis and Management, 27(1), 122-154

Bloom, H. S. (2012). Modern regression discontinuity analysis. Journal of Research on Educational Effectiveness, 5(1), 43-82.

Ludwig, J., & Miller, D. L. (2007). Does Head Start improve children's life chances: Evidence from a regression discontinuity approach. Quarterly Journal of Economics, 122(1), 159-208.

Gorard, S., & Cook, T. D. (2007). Where does good evidence come from? International Journal of Research and Method in Education, 30(3), 307-323.

Lesik, S. (2006). Applying the regression-discontinuity design to infer causality with non-random assignment. Review Of Higher Education, 30(1), 1-19.

Marsh, H. W. (1998). Simulation study of nonequivalent group-matching and regression-discontinuity designs: Evaluations of gifted and talented programs. Journal of Experimental Education, 66(2), 163-192.

Reardon, S. F., & Robinson, J. P. (2012). Regression discontinuity designs with multiple rating-score variables. Journal of Research on Educational Effectiveness, 5(1), 83-104.

Schochet, P. Z. (2008). Technical methods report: Statistical power for regression discontinuity designs in education evaluations (NCEE 2008-4026). Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education. Retrieved from the National Center for Education Evaluation:

Schochet, P. Z. (2009). Statistical power for regression discontinuity designs in education evaluations. Journal Of Educational And Behavioral Statistics, 34(2), 238-266

Schochet, P., Cook, T., Deke, J., Imbens, G., Lockwood, J.R., Porter, J., Smith, J. (2010). Standards for regression discontinuity designs. Washington DC: U.S. Department of Education, Institute of Education Sciences, What Works Clearinghouse. Retrieved from

Schumacker, R. E., & Mount, R. E. (2007). Regression discontinuity: Examining model misspecification. Paper presented at the 2007 Annual Meeting of the American Educational Research Association, April, 2007 Chicago,Illinois

Thistlethwaite, D. L., & Campbell, D. T. (1960). Regression discontinuity analysis: An alternative to the ex post facto experiment. Journal of Educational Psychology, 51(6), 309-317

Angrist, J. D., & Lavy, V. (1999). Using Maimonides' Rule to estimate the effect of class size on scholastic achievement. Quarterly Journal of Economics, 114(2), 533-576.

Gamse, B. C., Jacob, R. T., Horst, M., Boulay, B., & Unlu, F. (2008). Reading First Impact Study Final Report (NCEE 2009-4038). Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education. Retrieved from 

Regression Discontinuity Designs

Suppose a policy, intervention, or treatment is assigned based on whether some observed covariate X falls above an arbitrary cutoff value X0. If there is a positive relationship between X and the outcome Y, then a simple comparison of treated subjects (X > X0) with untreated subjects is biased, since subjects with higher values of X are more likely to exhibit higher levels of Y anyway. Is it valid to make comparisons of observed outcomes (Y) between groups with differing values of X? One solution would be to implement matched comparisons between groups with similar values of covariates. Regression discontinuity designs instead allow us to compare groups in the neighborhood of the cutoff value X0, giving us unbiased estimates of the treatment effect at the cutoff.

  • Treatment effects can be characterized by a change in intercept (main effect) at the discontinuity.
  • Treatment assignment is equivalent to random assignment within the neighborhood of the cutoff (Lee & Lemieux, 2010).
  • More complicated functional forms may be estimated, for example Y = f(X) + ρD + e, where D is the treatment indicator and f(X) may be a pth-order polynomial.
  • Comparisons of outcomes in the neighborhood of X0 provide estimates of the treatment effect ρ that do not depend on an exactly correct specification of the functional form of E[Y|X] (Angrist & Pischke, 2009).
  • Even more complicated methods, including local linear regression, may be implemented.
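The points above can be sketched with a short simulation. This is only an illustration, not a recommended estimator: the cutoff of 50, the true jump of 5, the bandwidth of 10, and the linear data-generating process are all assumptions I made up, and the "local linear regression" here is just an ordinary least squares line fit separately on each side of the cutoff within a fixed window (a rectangular kernel).

```python
import random
from statistics import mean

random.seed(1)

CUTOFF = 50.0     # hypothetical cutoff X0
TRUE_JUMP = 5.0   # hypothetical treatment effect (rho)
BANDWIDTH = 10.0  # hypothetical window around the cutoff

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

# Outcome rises with X; treatment D = 1 whenever X >= cutoff.
data = []
for _ in range(5000):
    x = random.uniform(0, 100)
    d = 1 if x >= CUTOFF else 0
    y = 20 + 0.3 * x + TRUE_JUMP * d + random.gauss(0, 2)
    data.append((x, y, d))

# Naive comparison of all treated vs. all untreated subjects.
naive_diff = mean(y for _, y, d in data if d == 1) - mean(y for _, y, d in data if d == 0)

# Keep observations near the cutoff and center X at X0, so each fitted
# intercept is the predicted outcome exactly at the cutoff.
left = [(x - CUTOFF, y) for x, y, d in data if d == 0 and x >= CUTOFF - BANDWIDTH]
right = [(x - CUTOFF, y) for x, y, d in data if d == 1 and x <= CUTOFF + BANDWIDTH]

a_left, _ = fit_line(*zip(*left))
a_right, _ = fit_line(*zip(*right))
rd_estimate = a_right - a_left  # estimated jump in intercept at X0

print(f"naive difference in means: {naive_diff:.2f}")
print(f"RD estimate at the cutoff: {rd_estimate:.2f}")
```

The naive difference in means mixes the treatment effect with the underlying relationship between X and Y, while the gap between the two fitted intercepts at the cutoff should land close to the simulated jump. In practice the choice of bandwidth and functional form matters a great deal.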

The above illustrates only one potential visualization of RD designs. As illustrated below, treatment effects can be visualized as discontinuities or changes in either the intercept or slope or both at the cutoff X0.


In Shaping Policies Related to Developmental Education: An Evaluation Using the Regression-Discontinuity Design, the authors use an RD design to assess the impact of developmental education on student success in subsequent-level English courses:

They find that ‘students’ participation in the program increases English academic achievement to levels similar to those of students not needing developmental coursework.’ Note in this case, the treatment (developmental course work) is applied where X < X0 = 85, vs. where X > X0 in the cases I presented above. The discontinuity/treatment effect in this case is represented by a change in slope/interaction at the cutoff.
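A design where treatment falls below the cutoff and shows up as a change in slope can be sketched the same way. To be clear, this does not use or reproduce the Moss and Yeaton data: only the cutoff of 85 comes from the discussion above, while the score range, slopes, grade scale, and noise level are hypothetical values chosen for illustration.

```python
import random

random.seed(2)

CUTOFF = 85.0  # cutoff mentioned in the post; all other numbers are made up

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

data = []
for _ in range(4000):
    x = random.uniform(60, 100)           # hypothetical placement score
    treated = x < CUTOFF                  # developmental course assigned BELOW the cutoff
    slope = 0.005 if treated else 0.020   # hypothetical slopes on each side
    y = 2.5 + slope * (x - CUTOFF) + random.gauss(0, 0.3)  # hypothetical later course grade
    data.append((x - CUTOFF, y, treated))  # center the score at the cutoff

below = [(x, y) for x, y, t in data if t]
above = [(x, y) for x, y, t in data if not t]

a_below, b_below = fit_line(*zip(*below))
a_above, b_above = fit_line(*zip(*above))

intercept_gap = a_below - a_above  # jump in level at the cutoff
slope_change = b_below - b_above   # change in slope at the cutoff

print(f"intercept gap at cutoff: {intercept_gap:.3f}")
print(f"slope change at cutoff: {slope_change:.4f}")
```

Fitting a separate line on each side is equivalent to a single regression with a treatment indicator and a treatment-by-score interaction. Here the simulated effect appears as a flatter slope among treated students with essentially no gap in level at the cutoff.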


Brian G. Moss  and William H. Yeaton 
Shaping Policies Related to Developmental Education: An Evaluation Using the Regression-Discontinuity Design. EDUCATIONAL EVALUATION AND POLICY ANALYSIS September 21, 2006 vol. 28 no. 3 215-229

Imbens, Guido W. & Lemieux, Thomas, 2008. "Regression discontinuity designs: A guide to practice," Journal of Econometrics, Elsevier, vol. 142(2), pages 615-635, February.

Regression Discontinuity Designs in Economics
David S. Lee and Thomas Lemieux.
Journal of Economic Literature 48 (June 2010), 281-355


Mostly Harmless Econometrics. Angrist & Pischke. 2009.