EigenCentrality & PageRank

2nd November, 2015

In among the new features and enhancements of KeyLines 2.11 were two great new social network analysis measures: EigenCentrality and PageRank.

How do they work? When should you use them? Let’s take a closer look…

EigenCentrality: Network Influence

EigenCentrality is one of the Social Network Analysis (SNA) centrality measures available in the KeyLines SDK. They help you pinpoint important nodes.

Like degree centrality, EigenCentrality measures a node’s influence by counting the number of links it has to other nodes within the network. However, EigenCentrality goes a step further by also taking into account how well connected a node’s neighbors are, how many links their connections have, and so on through the network.

KeyLines calculates each node’s EigenCentrality by converging on an eigenvector using the power iteration method. That means our algorithm starts from a random vector and repeatedly multiplies it by an adjacency matrix (a matrix summary of the connections between nodes), rescaling as it goes, until the vector stops changing. The result approximates the matrix’s principal eigenvector, and each entry in it is the score of the corresponding node.
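The power iteration described above can be sketched in a few lines of Python. This is a minimal illustration, not the KeyLines implementation: the function name `eigen_centrality` and its parameters are hypothetical, the starting vector is uniform rather than random so that the result is repeatable, and each step multiplies by the adjacency matrix plus the identity (a common trick that leaves the eigenvector unchanged but helps the iteration settle on graphs such as chains and stars).

```python
import math

def eigen_centrality(adjacency, max_iterations=100, tolerance=1e-9):
    """Approximate eigenvector centrality for a symmetric 0/1 adjacency matrix."""
    n = len(adjacency)
    # Uniform starting vector (an assumption made here for repeatability).
    scores = [1.0 / math.sqrt(n)] * n
    for _ in range(max_iterations):
        # Multiply by (A + I): each node keeps its own score and adds its neighbors'.
        updated = [
            scores[i] + sum(adjacency[i][j] * scores[j] for j in range(n))
            for i in range(n)
        ]
        # Rescale to unit length so the values do not grow without bound.
        norm = math.sqrt(sum(value * value for value in updated))
        updated = [value / norm for value in updated]
        # Stop once another pass no longer changes the vector.
        if max(abs(a - b) for a, b in zip(updated, scores)) < tolerance:
            return updated
        scores = updated
    return scores

# Three nodes in a line: 0 - 1 - 2
adjacency = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
scores = eigen_centrality(adjacency)
```

In this three-node chain the middle node ends up with the highest score, since it is the only node linked to both of the others.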

What does EigenCentrality tell me?

A high EigenCentrality score indicates a strong influence over other nodes in the network. It is useful because it indicates not just direct influence, but also implies influence over nodes more than one ‘hop’ away.

A good example is Bill Williams in our Enron Demo – a visualization of the 1.6 million emails published by the Federal Energy Regulatory Commission.


Degree centrality (left) and EigenCentrality (right)

The left image shows nodes sized by degree (i.e. their number of links) which makes Bill look important.

The right screenshot sizes nodes by EigenCentrality. This view makes Bill seem much less important. This is because he has only one connection back to the wider network – via Timothy Belden – who himself is relatively disconnected from the network’s powerbase.


So, a node may have a high degree score (i.e. many connections) but a relatively low EigenCentrality score if many of those connections are with similarly low-scored nodes.
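To make that contrast concrete, here is a small sketch on a made-up graph (not the Enron data). A "hub" node has the most links, but five of them lead to dead-end leaves and its only route to the tightly knit core runs through a single "gateway" node. The helper `centralities` and all node names are hypothetical, and the calculation is a plain power iteration on the adjacency matrix plus the identity rather than the KeyLines implementation.

```python
import math
from itertools import combinations

def centralities(edges, iterations=500):
    """Return (eigenvector scores, degree counts) for an undirected edge list."""
    nodes = sorted({node for edge in edges for node in edge})
    neighbors = {node: set() for node in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    scores = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Each node keeps its own score and adds the scores of its neighbors.
        updated = {
            node: scores[node] + sum(scores[m] for m in neighbors[node])
            for node in nodes
        }
        # Rescale to unit length after every pass.
        norm = math.sqrt(sum(value * value for value in updated.values()))
        scores = {node: value / norm for node, value in updated.items()}
    degree = {node: len(neighbors[node]) for node in nodes}
    return scores, degree

core = [f"core{i}" for i in range(5)]
leaves = [f"leaf{i}" for i in range(5)]
edges = list(combinations(core, 2))           # every core node links to every other
edges += [("hub", leaf) for leaf in leaves]   # hub with many dead-end links
edges += [("hub", "gateway"), ("gateway", "core0")]

scores, degree = centralities(edges)
for node in ("hub", "gateway", "core0", "core1"):
    print(f"{node:8s} degree={degree[node]}  eigen={scores[node]:.3f}")
```

The hub has the highest degree in this toy graph, yet every core node outscores it, because the hub’s links mostly lead to nodes that are themselves poorly connected.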

Also, a node may have a high betweenness score (indicating it connects disparate parts of a network) but a low EigenCentrality score because it is still some distance from the centers of power in the network.

We can see that here with John Lavorato – he’s in the center of the network topologically, but lacks Tana Jones’ volume of connections to high-powered nodes.

PageRank: The Google Algorithm

PageRank is a variant of EigenCentrality, designed and made famous by Google founders Larry Page and Sergey Brin.

Designed for ranking webpages, PageRank uses links between pages as a measure of importance. Each webpage is treated as a node in a network, and is assigned a score based upon its number of incoming links (its ‘indegree’). Each link is also weighted by the relative score of its originating node.

The result is that nodes with many incoming links are influential, and the nodes they link to share some of that influence.
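The scoring idea above can be sketched as a short iterative calculation. This is the textbook form of PageRank rather than KeyLines code: the function name `pagerank` is hypothetical, the damping factor of 0.85 is a commonly used default that the text above does not specify, and nodes with no outgoing links are assumed to spread their score evenly across the whole network.

```python
def pagerank(links, damping=0.85, iterations=100):
    """links: list of (source, target) pairs. Returns {node: score}; scores sum to 1."""
    nodes = sorted({node for link in links for node in link})
    out_links = {node: [] for node in nodes}
    for source, target in links:
        out_links[source].append(target)
    n = len(nodes)
    scores = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Score held by nodes without outgoing links is shared by everyone.
        dangling = sum(scores[node] for node in nodes if not out_links[node])
        updated = {node: (1 - damping + damping * dangling) / n for node in nodes}
        # Every node passes its score on, split equally between its outgoing links.
        for source in nodes:
            targets = out_links[source]
            for target in targets:
                updated[target] += damping * scores[source] / len(targets)
        scores = updated
    return scores

# a and b both link to c, and c links on to d
links = [("a", "c"), ("b", "c"), ("c", "d")]
scores = pagerank(links)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Here d comes out on top even though it has a single incoming link, because that link comes from c, which is itself well linked. Nodes a and b, with no incoming links at all, share the lowest score.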

What does PageRank tell me?

Like EigenCentrality, PageRank can help uncover influential or important nodes whose reach extends beyond just their direct connections.

The main difference from EigenCentrality, in KeyLines at least, is that PageRank takes link direction and weight into account*. This makes it a more useful measure in certain scenarios, including:

  • Understanding citations (e.g. patent citations, academic citations)
  • Visualizing network activity / propagation of malware
  • Modeling the impact of SEO and link building activity (although PageRank is now just one of many ranking algorithms used by Google)

* Although the algorithm can be modified not to consider link direction.
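The effect of link direction and weight can be shown with a small variation on the standard PageRank iteration. The sketch below is an illustration under stated assumptions, not the KeyLines API: the function name and its `directed` flag are hypothetical, link weights are assumed to be positive, the damping factor of 0.85 is a conventional default, and "ignoring direction" is modeled simply by adding a mirrored copy of every link.

```python
def weighted_pagerank(links, directed=True, damping=0.85, iterations=100):
    """links: list of (source, target, weight) triples with positive weights."""
    if not directed:
        # Assumption: treat each link as two-way by adding its mirror image.
        links = links + [(target, source, w) for source, target, w in links]
    nodes = sorted({node for source, target, _ in links for node in (source, target)})
    out_weight = {node: 0.0 for node in nodes}
    for source, _, w in links:
        out_weight[source] += w
    n = len(nodes)
    scores = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Nodes with nothing outgoing share their score with the whole network.
        dangling = sum(scores[node] for node in nodes if out_weight[node] == 0)
        updated = {node: (1 - damping + damping * dangling) / n for node in nodes}
        # A link carries a share of its source's score in proportion to its weight.
        for source, target, w in links:
            updated[target] += damping * scores[source] * w / out_weight[source]
        scores = updated
    return scores

# One sender emails three recipients; r1 receives three emails, the others one each.
emails = [("sender", "r1", 3), ("sender", "r2", 1), ("sender", "r3", 1)]
directed = weighted_pagerank(emails, directed=True)
undirected = weighted_pagerank(emails, directed=False)
```

With direction respected, the sender receives nothing and scores lowest, while r1 outranks the other recipients thanks to its heavier link. With direction ignored, the same sender sits at the center of a star and scores highest.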

Let’s take a look at PageRank in action with our Enron data demo.


Let’s follow one employee. This screenshot shows the network with no centrality measures applied. We’ve selected Barry Tycholiz. Let’s see how he appears with EigenCentrality applied:


He remains a fairly small node in the network. He has relatively few connections (to only 9 other nodes) and seems pretty insignificant.

Let’s try PageRank:

Despite his limited connections, Barry balloons to the largest node in the network when PageRank is applied. He is one of the few nodes in the network receiving incoming links from highly influential nodes, which pushes his PageRank score up significantly.

A quick Google confirms that Barry Tycholiz was VP of Enron North America – an important node in the network that we may not have identified with the other centrality measures.

Find the right measure for the job

Understanding network dynamics and influence can be a game of trial and error. Different measures are better suited to certain scenarios or datasets.

KeyLines has five different SNA measures, each designed to uncover different kinds of influence. To learn more…

Download the White Paper
