In the article “Impact of social network structure on content propagation: A study using YouTube data,” the authors investigate the relationship between sociometric measures such as degree centrality and the diffusion of videos across a network. In other words, they wanted to know whether there is a causal relationship between the network properties of those who share videos and the likelihood that a video goes viral. What first interested me about this article was that it is a very good example of an application of social network analysis and viral seeding. However, it also provides very good examples of applications of the generalized method of moments (GMM), instrumental variables, unobserved heterogeneity and endogeneity, and causal inference. I was not previously aware of the GMM style of dynamic panel data models that instrument with lags, which is apparently quite popular in econometric applications (see references below).
As the authors point out, any model that relates network properties to video dissemination requires a careful estimation strategy if we are interested in making causal inferences. They identify several sources of endogeneity and unobserved heterogeneity. If we are trying to infer dissemination from one’s position in the network, we have to consider that unobserved factors related to both network position and video type could also affect dissemination. It may be the case that all we are trying to do is predict video shares from network position, and perhaps that is OK as long as these correlations hold over time.
In contrast, if we want to make causal inferences, these sources of endogeneity must be accounted for, and they make econometric estimation difficult. In this case what we really want to estimate is the independent causal effect of network position on video shares, so we are interested only in the ‘quasi-experimental’ variation in network position.
A natural solution is an instrumental variables approach, but finding an ‘external’ instrument that is correlated with the network and video properties of interest, yet uncorrelated with the unobserved effects, is rather daunting. Ultimately the authors propose a generalized method of moments dynamic panel estimator that uses lagged variables as instruments.
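To make the idea of instrumenting with lags concrete, here is a minimal sketch on simulated data. It is not the authors' estimator and it does not use YouTube data: it implements the simpler Anderson and Hsiao style IV estimator, which first-differences away the unobserved unit effect and then uses the level lagged twice as an instrument for the lagged difference. The function names, the data-generating process, and every parameter value are assumptions made purely for illustration.

```python
import numpy as np


def simulate_panel(n_units=5000, n_periods=8, rho=0.5, seed=0):
    """Simulate y_it = rho * y_i,t-1 + alpha_i + e_it (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    alpha = rng.normal(size=n_units)  # unobserved, time-invariant unit effect
    burn = 50                         # burn-in so the series is near stationary
    y = np.zeros((n_units, n_periods + burn))
    y[:, 0] = alpha / (1 - rho) + rng.normal(size=n_units)
    for t in range(1, n_periods + burn):
        y[:, t] = rho * y[:, t - 1] + alpha + rng.normal(size=n_units)
    return y[:, burn:]


def pooled_ols(y):
    """Naive slope of y_t on y_t-1, ignoring the unit effect."""
    x = y[:, :-1].ravel()
    z = y[:, 1:].ravel()
    x_c = x - x.mean()
    return (x_c @ (z - z.mean())) / (x_c @ x_c)


def anderson_hsiao(y):
    """Just-identified IV on first differences, instrumenting with y_t-2."""
    dy = (y[:, 2:] - y[:, 1:-1]).ravel()       # y_t - y_t-1
    dy_lag = (y[:, 1:-1] - y[:, :-2]).ravel()  # y_t-1 - y_t-2 (endogenous)
    inst = y[:, :-2].ravel()                   # y_t-2 (the lagged instrument)
    return (inst @ dy) / (inst @ dy_lag)


if __name__ == "__main__":
    panel = simulate_panel()
    print("true rho:        0.5")
    print("pooled OLS:     ", round(pooled_ols(panel), 3))
    print("lag-instrument: ", round(anderson_hsiao(panel), 3))
```

In this simulation the naive regression overstates the persistence parameter because the unit effect is correlated with the lagged outcome, while the lag-instrumented estimate should land close to the true value. The GMM estimators in the references build on the same logic by using more lags as instruments and weighting the resulting moment conditions.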
Anderson, T. W., & Hsiao, C. (1981). Estimation of dynamic models with error components. Journal of the American Statistical Association, 76(375), 598–606.
Arellano, M., & Bond, S. (1991). Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations. The Review of Economic Studies, 58, 277–97.
Dynamic panel data models: A guide to micro data methods and practice. The Institute for Fiscal Studies, Department of Economics, UCL. cemmap working paper CWP09/02.
Impact of social network structure on content propagation: A study using YouTube data. Quant Mark Econ (2012) 10:111–150.