Sociometric measures such as betweenness, k-betweenness, or eigenvector centrality demand both extensive data and substantial processing, and we cannot always characterize an entire network because of missing nodes and edges. This raises two questions. First, how robust are metrics derived from incomplete network data? Second, since degree centrality is much easier to obtain than eigenvector centrality or even betweenness, in terms of both computation and data requirements, is degree similar enough to the other measures of centrality to use in our analysis instead? Although not a complete literature review, the articles below investigate these issues.
The stability of centrality measures when networks are sampled
Elizabeth Costenbader, Thomas W. Valente.
Social Networks 25 (2003) 283–307
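The kind of comparison this article makes, centrality computed on a random sub-sample versus centrality computed on the full network, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the network is a synthetic undirected random graph rather than the article's data, only ties among sampled nodes are kept, only raw degree is computed, and the helper names (`induced_subgraph`, `sampling_correlation`) are hypothetical.

```python
import math
import random


def random_graph(n, p, rng):
    """Undirected random graph as {node: set(neighbours)} (synthetic data only)."""
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def induced_subgraph(adj, keep):
    """Keep only the sampled nodes and the ties among them."""
    keep = set(keep)
    return {v: adj[v] & keep for v in keep}


def pearson(xs, ys):
    """Pearson correlation; NaN when either list has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def sampling_correlation(adj, fraction, rng):
    """Correlate degree in the full graph with degree in a random node sample."""
    nodes = sorted(adj)
    k = min(len(nodes), max(2, round(fraction * len(nodes))))
    sample = rng.sample(nodes, k)
    sub = induced_subgraph(adj, sample)
    full_deg = [len(adj[v]) for v in sample]   # "actual" degree
    sub_deg = [len(sub[v]) for v in sample]    # degree seen in the sample
    return pearson(full_deg, sub_deg)


rng = random.Random(42)
graph = random_graph(100, 0.08, rng)
for fraction in (0.8, 0.6, 0.4, 0.2):
    runs = [sampling_correlation(graph, fraction, rng) for _ in range(20)]
    valid = [r for r in runs if not math.isnan(r)]
    mean_r = sum(valid) / len(valid) if valid else float("nan")
    print(f"{fraction:.0%} of nodes sampled: mean r = {mean_r:.2f}")
```

Other centrality measures could be substituted for degree in `sampling_correlation` to compare how each holds up as the sampled fraction shrinks.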
"Our results indicate relatively high correlation, albeit in some instances substantial absolute differences, between actual network properties and those calculated on randomly selected sub-samples for some network measures. This indicates that under some circumstances researchers may be still be able to use network data for which some data are missing to study network properties or create network-based interventions. In other words, researchers who do not interview all members of a community or network may still be able to take advantage of some aspects of network theory and techniques."
Their remarks on eigenvector centrality are particularly interesting.
"As noted previously, the stability of eigenvector centrality when calculated as a simple raw score may indicate that it is the preferred centrality measure when the network data are incomplete. However, the fact that sampling has less effect on this centrality measure may be due to the fact that in comparison to the other centrality measures, which measure the ones (i.e. the actual nominations), this measure is able to effectively capture the similarity of zeros. Since many of the studies restrict nominations to five people, there are a lot of zeros in the original networks. Consequently, eigenvector centrality as a simple raw score is less affected by sampling from the networks as the zeros are preserved."
How Correlated Are Network Centrality Measures?
Thomas W. Valente, PhD; Kathryn Coronges, MPH; Cynthia Lakon, PhD (University of Southern California, Department of Prevention Research, Los Angeles); Elizabeth Costenbader, PhD (Research Triangle Institute, Raleigh, North Carolina)
Connect (Tor). 2008 January 1; 28(1): 16–26.
This article investigates the correlation among four centrality measures (degree, betweenness, closeness, and eigenvector) and calculates 9 versions of these measures for 58 existing social networks previously analyzed by Costenbader and Valente (2003).
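To make the idea of correlating centrality measures concrete, here is a minimal sketch that computes degree, betweenness, and eigenvector centrality on one network and correlates each pair. It is not the article's procedure: the graph is a synthetic undirected random graph, only three raw measures are computed rather than the article's 9 versions, and the function names are chosen for this illustration. Betweenness uses Brandes' algorithm, and eigenvector centrality uses power iteration on A + I, both standard techniques swapped in here for self-containment.

```python
import math
import random
from collections import deque


def random_graph(n, p, seed):
    """Undirected random graph as {node: set(neighbours)} (synthetic data only)."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def degree_centrality(adj):
    """Raw degree: number of neighbours of each node."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def eigenvector_centrality(adj, iterations=200):
    """Power iteration on A + I (same leading eigenvector as A, no oscillation)."""
    score = {v: 1.0 for v in adj}
    for _ in range(iterations):
        new = {v: score[v] + sum(score[u] for u in adj[v]) for v in adj}
        norm = math.sqrt(sum(x * x for x in new.values()))
        score = {v: x / norm for v, x in new.items()}
    return score


def betweenness_centrality(adj):
    """Brandes' algorithm for an unweighted, undirected graph."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack, queue = [], deque([s])
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}    # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        while queue:                   # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                   # accumulate dependencies in reverse order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each pair was counted twice


def pearson(xs, ys):
    """Pearson correlation; NaN when either list has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else float("nan")


graph = random_graph(60, 0.1, seed=1)
nodes = sorted(graph)
measures = {
    "degree": degree_centrality(graph),
    "betweenness": betweenness_centrality(graph),
    "eigenvector": eigenvector_centrality(graph),
}
names = list(measures)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        r = pearson([measures[a][v] for v in nodes], [measures[b][v] for v in nodes])
        print(f"{a} vs {b}: r = {r:.2f}")
```

Correlations on a random graph say nothing about the 58 empirical networks in the article; the sketch only shows the mechanics of the comparison.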
From the article:
"We correlated the 9 measures for each network and then calculated the average correlation, standard deviation, and range across centrality measures. We also calculated the overall correlation and compared it by study to assess the degree of variation in average correlation between studies."
"We find strong but varied correlations among the 9 centrality measures presented here. The average of the average correlations was 0.53 with a standard deviation of 0.14, indicating that most correlations would be considered strong. The level of correlation among measures seems nearly optimal - too high a correlation would indicate redundancy and too low, an indication that the variables measured different things. The amount of correlation between degree, betweenness, closeness, and eigenvector indicates that these measures are distinct, yet conceptually related."
A summary of the correlations for degree, betweenness and eigenvector centrality as reported in the article can be found below:
Borgatti, S.P., Carley, K., and Krackhardt, D. (2006). Robustness of Centrality Measures under Conditions of Imperfect Data. Social Networks 28: 124–136.
Many studies have examined the relationship among centrality measures, both across networks and in the presence of missing data, using empirical data from actual networks. Borgatti et al. point out a limitation of this approach:
"A limitation of this approach is that the sampling errors contained in the data are likely to be systematic, but the pattern is unknown. Another limitation is that the sample of networks is necessarily very limited. To overcome these limitations, we take a statistical computational approach and examine robustness in a very large sample of random graphs, into which we introduce a controlled amount and kind of error."
In the article, measures of degree, betweenness, closeness and eigenvector centrality from 'sampled' networks were compared to the 'actual' values from the complete networks. These comparisons were based on 5 measures of robustness (discussed in the article).
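The simulation design described here (generate random graphs, inject a controlled amount of error, compare 'sampled' centrality with 'actual' centrality) can be sketched as follows. This is a simplified illustration under stated assumptions, not the article's code: random edge deletion stands in as a single example of a controlled error, only degree centrality is compared, and the two scores used (squared correlation and whether the observed top node is truly a top node) are illustrative choices for this sketch. The article itself should be consulted for its five robustness measures.

```python
import math
import random


def random_edges(n, p, rng):
    """Edge list of an undirected random graph on nodes 0..n-1 (synthetic data)."""
    return [(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < p]


def remove_edges(edges, error_rate, rng):
    """Controlled error: delete a fixed proportion of edges at random."""
    k = round(error_rate * len(edges))
    drop = set(rng.sample(range(len(edges)), k))
    return [e for i, e in enumerate(edges) if i not in drop]


def degree(n, edges):
    """Raw degree of every node, as a list indexed by node id."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def r_squared(xs, ys):
    """Squared Pearson correlation; NaN when either list has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov * cov / (vx * vy) if vx and vy else float("nan")


def top_match(true, observed):
    """True if the highest-scoring observed node is also a top node in truth."""
    top_observed = observed.index(max(observed))
    return true[top_observed] == max(true)


def simulate(n, p, error_rate, trials, seed):
    """Average both scores over many random graphs at one error level."""
    rng = random.Random(seed)
    r2_total, hits = 0.0, 0
    for _ in range(trials):
        edges = random_edges(n, p, rng)
        observed = remove_edges(edges, error_rate, rng)
        true_deg, obs_deg = degree(n, edges), degree(n, observed)
        r2_total += r_squared(true_deg, obs_deg)
        hits += top_match(true_deg, obs_deg)
    return r2_total / trials, hits / trials


for error_rate in (0.05, 0.10, 0.25, 0.50):
    r2, top = simulate(n=50, p=0.1, error_rate=error_rate, trials=200, seed=7)
    print(f"{error_rate:.0%} edges removed: mean R^2 = {r2:.2f}, top-node match = {top:.2f}")
```

The same loop could be extended with other error types or other centrality measures by swapping `remove_edges` or `degree` for alternatives.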
"the four centrality measures behave virtually identically in the face of measurement error. This suggests that the distinction between local and global measures of centrality (Scott, 2000) is not as important as previously thought. These results are consistent with those of Everett and Borgatti (2004) who found that betweenness calculated on ego networks (a local measure) was, on average, nearly identical to betweenness calculated on the full network in which ego networks were embedded (a global measure)."
Seeding Strategies for Viral Marketing: An Empirical Comparison
Oliver Hinz, Bernd Skiera, Christian Barrot, & Jan U. Becker
Journal of Marketing, Volume 75, Number 6, November 2011
Consistent with the results from some of the previous research, this article concludes:
"Remarkably, we reveal that to target a particular subnetwork (e.g., students of a particular university, Study 2) with a viral marketing message, the use of the respective subnetwork’s sociometric measures is not absolutely required to implement the desired seeding strategies. Instead, because the sociometric measures of subnetworks and their total network are highly correlated, marketers can use the sociometric measures of the total network, without undertaking the complex task of determining exact network boundaries. Conversely, this appealing result also allows marketers to feel confident in inferring the connectivity of a person in an overall network from information about his or her connectivity in a natural subnetwork."