Tuesday, November 20, 2012

An Intuitive Approach to Eigenvector Centrality using SAS IML

A network can be thought of in terms of graph theory as a set of vertices connected by ties. The vertices can include individuals, teams, government agencies, organizations, facebook group members, topics, patents,etc. (Coulon,2005). The ties, represented as lines connecting the vertices are referred to as ‘edges.’ Edges therefore indicate the connections between individuals, organizations etc. Vertices and connections can be represented by what’s referred to as an adjacency matrix ‘A’, where Aij = 1 if there is an edge between vertices ‘i’ and ‘j’, and Aij = 0 otherwise. Adjacency matrices are symmetric, in that Aij= Aji. For example, the following adjacency matrix corresponds to the network depicted below:



Who’s who in a network?
The role a given vertice plays in the cohesiveness of the network (its ‘importance’ or ‘centrality’ )can be assessed by a number of social network metrics. 

Degree Centrality is simply the number of connections (or edges) a vertex has to other vertices. Its clear from counting that #1’s degree centrality is 3, while vertex #4 has a measure of degree centrality equal to 1. 

In terms of matrix algebra, if we can obtain this result as a vector (one entry per vertice) as follows:
x= (1,1,1,1)  degree = A’x 

Eigenvector Centrality is a measure that reflects the fact that not all connections are equal, and in fact, connections to people that are more influential are more important (Newman, 2012).  Mathematically, the measure of eigenvector centrality is derived from the weights that consist of the values from the leading eigenvector  (the eigenvector associated with the largest eigenvalue) of the adjacency matrix depicting the network. 

Recall, the value λ is an eigenvalue (or vector of eigenvalues) of the matrix A, and x is an eigenvector of A if the following equation holds:

λ x = A x

So, for the network depicted above, the centrality measures for each vertex = 1,2,3,4 can be derived from the components (x1,x2,x3,x4) that comprise the eigenvector x, and what we are interested in for this metric is the leading eigenvector (associated with the largest eigenvalue of A). 

Eigenvectors and eigenvalues can be derived by solving:

(A-λ I)x = 0
This metric can be found using canned functions in SAS and R. In SAS IML, the eigen function will allow you to obtain the leading eigenvector  x =( 0.6116285,0.5227207,0.5227207,0.2818452). 

A Path Centric Approach to Centrality

One way of looking at centrality in terms of connectivity is counting the number of paths emanating from a vertice to other vertices in the network.  For example, degree represents the number of paths of length 1 to other vertices in the network.  To look at this further, lets define the matrix B as:
B = A + I, where I is the identity matrix of A. The elements of B represent the number of paths of length 1 from a vertice ‘i’ to another vertice ‘j’. The row sums of B tell us the total number of paths of length one emanating for a given vertice (counting each vertice itself as 1). We can store this sum for each vertice in a vector ‘d’ obtained by multiplyting B’x



This gives us B’x = d1 = (4,3,3,2)

Now let’s define the matrix power of such that B2 = B*B


Each entry in B2 represents the number of paths of length 2 from vertice i to vertice j.
The sum  d2 = B2 *x = d2 = (12,10,10,6) is the total number of paths of length 2 originating from each corresponding vertice.

So Bk is a matrix with elements indicating the number of paths of length k between vertices i and j. The product Bk*x gives the sum of all paths of length k emanating from a given vertice. What we find is that as we consider longer and longer paths (as k becomes large) the normalized vector:

v = Bk*x / || Bk*x||   will approach the leading eigenvector of B (which consequently is also the leading eigenvector of A)

v1 = B1*x / || Bk*x||    = (0.6488857,0.4866643,0.4866643,0.3244428)
v5 = B5*x / || Bk*x||   =  (0.6121647,0.521951,    0.521951,0.2835289)
v10 = B10*x / || Bk*x||   =( 0.6116348,0.5227115,0.5227115,0.2818657)

Recall the leading eigenvector of A:

 x = ( 0.6116285,0.5227207,0.5227207,0.2818452)

So, what we have is a notion of eigenvector centrality as a metric that captures the number of paths emanating from a given vertice. A vertice that is connected to other vertices that also have lots of connections will have a greater number of k-length paths originating from it. This is information is reflected by eigenvector centrality. It turns out that the iterative  calculation of v above is essentially  the power iteration algorithm for calculating the leading eigenvector of a matrix B.

The following code from SAS IML and output provides an illustration of the preceding discussion.  In addition I have provided the function written by Rick Wicklin that implements the power iteration algorithm used to obtain the leading eigenvector and eigenvalue of a given matrix A.

REFERENCES:
Justification and Application of Eigenvector Centrality by Leo Spizzirri     https://www.math.washington.edu/~morrow/336_11/papers/leo.pdf

The power method: compute only the largest eigenvalue of a matrix.  The Do Loop. By  Rick Wicklin.

SAS IML MATRIX PROGRAMMING: 
 
proc iml;

*specify adjacency matrix A;
A = {0 1 1 1,
     1 0 1 0,
      1 1 0 0,
      1 0 0 0};

x = {1,1,1,1};

* Degree centraltiy;

d1 = A*x;
print(A);
print(x);
print(d1);

call eigen(u,v,A);
print v;

B = A + I(4);
print B;

d1 = B*x;
print d1;

B2 = B**2;
print B2;

* Degree Centrality including stopovers;

d1 = B*x;
print(d1);

d2 = (B**2)*x;
print(d2);

d3 = (B**3)*x;
print(d3);

* as k--> infinity, the normalized vector v approaches the leading eigenvector of A and/or B;
v1 =  d1/sqrt(d1[##]);
print v1;

d5 = (B**5)*x;
print(d5);

v5 =  d5/sqrt(d5[##]);
print v5;

d10 = (B**10)*x;
print(d5);

v10 =  d10/sqrt(d10[##]);
print v10;

run; quit;

* Rick Wicklin's Power Method Function;
proc iml;
     start PowerMethod(v, A, maxIters);
   /* specify relative tolerance used for convergence */
   tolerance = 1e-6;
   v = v / sqrt( v[##] );  /* normalize */
   iteration = 0; lambdaOld = 0;

   do while ( iteration <= maxIters);
      z = A*v;                /* transform */
      v = z / sqrt( z[##] );  /* normalize */
      lambda = v` * z;
      iteration = iteration + 1;
      if abs((lambda - lambdaOld)/lambda) < tolerance then
         return ( lambda );
      lambdaOld = lambda;
   end;
   return ( . ); /* no convergence */
finish;

* try our data;
A = {0 1 1 1,
     1 0 1 0,
      1 1 0 0,
      1 0 0 0};

v = {1,1,1,1}; 

lambda = PowerMethod(v, A, 40 );
print lambda;
print v;

run;quit;