How to visualize very large networks

30th June, 2014

At Cambridge Intelligence, we’re frequently asked about KeyLines’ limits – usually with a request for the maximum number of nodes you can see at once.

It’s a common misconception that if a few dozen nodes generate some insight, then a few hundred thousand nodes will generate lots of insight.

Visualizing huge datasets

This is what happens when you try to visualize a dataset of 2000 nodes:

An example of an unhelpful chart – 2000 nodes & 2000 links

The problems with visualizing large networks

We think there are four main problems with visualizing very large networks:

  • Limited number of pixels – your monitor only has a limited number of screen pixels. Even using tiny nodes, eventually you won’t be able to discern between them.
  • Limited computer processing power – if you expect your machine to process large networks, then you should be willing to accept a reduction in performance (e.g. very slow layouts and laggy applications).
  • Limited user brainpower – we also need to consider the processing abilities of the users. Realistically, the human brain can only interpret at most a few hundred nodes in one chart. This reduces to a few dozen nodes when dealing with detail.
  • The network hairball – in a large connected dataset, the number of links can grow much faster than the number of nodes (n nodes can have up to n(n-1)/2 links between them). Eventually this results in such a densely connected network that it’s beyond the help of any automated layout:
A network visualization hairball
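
To put rough numbers on the hairball problem, here is a minimal Python sketch of the upper bound on links in a simple undirected network (no self-links, no duplicate links). The function name `max_links` is ours, chosen for illustration; real datasets are usually far sparser than this bound.

```python
def max_links(node_count: int) -> int:
    """Upper bound on links in a simple undirected network of node_count nodes."""
    # Each of the n nodes can link to the other n - 1; halve to avoid double counting.
    return node_count * (node_count - 1) // 2


for n in (20, 200, 2000):
    print(n, "nodes ->", max_links(n), "possible links")
```

Multiplying the node count by ten multiplies the possible links by roughly a hundred, which is why a layout that copes with a few dozen nodes can fail badly on a few thousand.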

Tools for visualizing large networks

When looking around for network visualization software, it’s more important to look for the functionality you need to manage large networks than to compare (artificial) node limits. There are, broadly, two strategies for visualizing very large networks:

1. Reduce network density

Grouping nodes – nodes can be grouped based on shared properties. In this chart, for example, we group nodes by country. Groups can then be expanded on demand to view relationships between individuals, or between individuals and groups:

Combining individuals by their country, reducing the network’s density
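
As a rough illustration of the idea (not KeyLines code), the Python sketch below collapses nodes into one group per country and counts the links that run between groups. The data shape, the `group_by` helper and the sample names are all assumptions made for the example.

```python
from collections import Counter


def group_by(nodes, links, key):
    """Collapse nodes into one group per value of `key` and count links between groups."""
    group_of = {node["id"]: node[key] for node in nodes}
    members = Counter(group_of.values())  # group -> number of nodes inside it
    group_links = Counter()               # (group, group) -> number of underlying links
    for source, target in links:
        a, b = group_of[source], group_of[target]
        if a != b:  # links inside a group stay hidden until the group is expanded
            group_links[tuple(sorted((a, b)))] += 1
    return members, group_links


nodes = [
    {"id": "ann", "country": "UK"},
    {"id": "bob", "country": "UK"},
    {"id": "cat", "country": "FR"},
    {"id": "dan", "country": "US"},
]
links = [("ann", "bob"), ("ann", "cat"), ("bob", "cat"), ("cat", "dan")]
members, group_links = group_by(nodes, links, "country")
print(dict(members))
print(dict(group_links))
```

Four nodes and four links become three groups and two weighted links, and the link counts can be used to scale link widths in the grouped view.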

Merging nodes – charts often contain dirty data, especially if it is being pulled from multiple sources. By merging duplicate nodes, or multiple nodes that can be considered as one unit (e.g. a gang, cell, business unit, etc.), large datasets can be simplified. This allows the user to explore the graph at the level of detail they require:

This example shows a network of email accounts, with nodes being merged by department
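
A minimal sketch of merging, assuming duplicates can be detected with a key function (here, a normalized email address). The `merge_nodes` helper and the sample records are hypothetical; a real dataset will usually need fuzzier matching than this.

```python
def merge_nodes(nodes, links, key):
    """Merge nodes that share the same key(node) and rewire links to the survivor."""
    canonical = {}    # key value -> id of the first node seen with that key
    replaced_by = {}  # every node id -> id of the node that now represents it
    for node in nodes:
        k = key(node)
        canonical.setdefault(k, node["id"])
        replaced_by[node["id"]] = canonical[k]

    merged_nodes = [n for n in nodes if replaced_by[n["id"]] == n["id"]]
    merged_links = set()
    for source, target in links:
        source, target = replaced_by[source], replaced_by[target]
        if source != target:  # drop links that collapse into self-links
            merged_links.add((source, target))
    return merged_nodes, sorted(merged_links)


nodes = [
    {"id": 1, "email": "Jo@Example.com"},
    {"id": 2, "email": "jo@example.com "},
    {"id": 3, "email": "sam@example.com"},
]
links = [(1, 3), (2, 3), (1, 2)]
merged_nodes, merged_links = merge_nodes(
    nodes, links, key=lambda n: n["email"].strip().lower()
)
print([n["id"] for n in merged_nodes], merged_links)
```

The same function merges by department, gang or business unit if the key function returns that property instead.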

2. Show sub-sections of the network

Filters – the ability to hide certain nodes from the chart is key to digesting large graphs. With KeyLines, it’s possible to incorporate filters based on any logic you choose, allowing users to focus on smaller details within the network.
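
The general shape of a filter is a yes/no test applied to each node, plus a rule that a link is only shown when both of its ends are. The Python sketch below illustrates that logic on plain lists; it is not the KeyLines API, and `apply_filter` and the sample data are assumptions for the example.

```python
def apply_filter(nodes, links, keep):
    """Return only the nodes passing keep(node) and the links between them."""
    visible = {node["id"] for node in nodes if keep(node)}
    shown_nodes = [node for node in nodes if node["id"] in visible]
    # A link with a hidden end would dangle, so it is hidden too.
    shown_links = [(s, t) for s, t in links if s in visible and t in visible]
    return shown_nodes, shown_links


nodes = [
    {"id": "ann", "type": "person"},
    {"id": "acme", "type": "company"},
    {"id": "bob", "type": "person"},
]
links = [("ann", "acme"), ("ann", "bob"), ("bob", "acme")]
shown_nodes, shown_links = apply_filter(nodes, links, lambda n: n["type"] == "person")
print([n["id"] for n in shown_nodes], shown_links)
```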

Centrality measures – social network analysis centrality measures help users identify the most important nodes in a network – for example, those with the most connections (degree centrality), or those that most frequently lie on paths between other nodes (betweenness centrality). Using this information, it’s possible to isolate the connections and sub-networks most likely to be of interest.
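
Degree centrality is the simplest of these measures to compute: count the links touching each node. The sketch below does that in Python and then picks out the hubs; the function name and the threshold of three links are arbitrary choices for illustration. Betweenness needs shortest-path calculations and is better left to a graph library.

```python
from collections import Counter


def degree_centrality(links):
    """Count how many links touch each node (undirected, unweighted)."""
    degree = Counter()
    for source, target in links:
        degree[source] += 1
        degree[target] += 1
    return degree


links = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("d", "e")]
degree = degree_centrality(links)
hubs = sorted(node for node, count in degree.items() if count >= 3)
print(dict(degree), hubs)
```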

One example of this is kCores – repeatedly removing nodes that have fewer than k links, until every remaining node has at least k links to other remaining nodes. Raising k leaves us with just the highly connected clusters of nodes:

In this example, a network is being filtered by kCore score – reducing the network until we only see the few most densely-connected nodes.
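
The peeling process can be sketched in a few lines of Python. This is a minimal illustration of the standard k-core idea on an undirected link list, not the implementation used in KeyLines, and `k_core` is a name we have chosen for the example.

```python
def k_core(links, k):
    """Return the nodes left after repeatedly removing nodes with fewer than k links."""
    neighbours = {}
    for source, target in links:
        if source != target:  # ignore self-links
            neighbours.setdefault(source, set()).add(target)
            neighbours.setdefault(target, set()).add(source)

    to_remove = [node for node, adj in neighbours.items() if len(adj) < k]
    while to_remove:
        node = to_remove.pop()
        if node not in neighbours:  # already removed via an earlier entry
            continue
        for other in neighbours.pop(node):
            neighbours[other].discard(node)
            if len(neighbours[other]) < k:  # removal may push a neighbour below k
                to_remove.append(other)
    return set(neighbours)


links = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")]
core_2 = k_core(links, 2)
core_3 = k_core(links, 3)
print(sorted(core_2), sorted(core_3))
```

With k = 2 the tail (d and e) is peeled away and only the triangle a, b, c remains; with k = 3 nothing survives, because no node keeps three links once its neighbours start to go.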

What if I really need to see a huge network?

There are a few valid use cases for looking at larger networks. Examples include investigating fraud rings with hundreds of nodes, or looking at the effect of a malware or botnet incident on an IT network.

In these cases, you should use KeyLines’ WebGL renderer, which offers the best performance for large networks, but you should also plan for some degradation. Layouts will take a few seconds longer than usual, and interactions will be less responsive than normal.

Your solution should be able to degrade gracefully, reassuring the user with a loading screen or progress bar when performing a heavy process.

A progress bar can reassure users when loading a very large network
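
One way to drive a progress bar is to load the data in batches and report after each one. The Python sketch below shows the pattern only; `load_in_batches`, `add_batch` and `on_progress` are hypothetical names, and in a browser each batch would also need to yield to the event loop so the bar can repaint.

```python
def load_in_batches(items, batch_size, add_batch, on_progress):
    """Feed items to add_batch in chunks, reporting (done, total) after each chunk."""
    total = len(items)
    for start in range(0, total, batch_size):
        batch = items[start:start + batch_size]
        add_batch(batch)
        on_progress(min(start + batch_size, total), total)


loaded = []
progress = []
load_in_batches(
    items=list(range(2500)),
    batch_size=1000,
    add_batch=loaded.extend,                                # stand-in for adding to the chart
    on_progress=lambda done, total: progress.append((done, total)),
)
print(progress)
```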

So… How many nodes can KeyLines load?

If you’ve made it this far and still want to know a number then, sorry, we are going to have to disappoint you.

The capacity of KeyLines is entirely dependent on your server hardware, database speed, the specification of the user’s machine and the browser being used, among other factors.

It’s worth noting, however, that KeyLines supports holding much larger volumes of data in memory than it is feasible to draw at once. Tens of thousands of nodes of data can be passed to the browser with sub-sets visualized on demand.
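
A common way to pick a sub-set on demand is to show only the nodes within a few hops of whatever the user has selected. The Python sketch below does this with a breadth-first search over data already held in memory; the `neighbourhood` function and its parameters are assumptions for illustration, not part of KeyLines.

```python
from collections import deque


def neighbourhood(links, seed, max_hops):
    """Return the nodes within max_hops of seed, and the links between those nodes."""
    neighbours = {}
    for source, target in links:
        neighbours.setdefault(source, set()).add(target)
        neighbours.setdefault(target, set()).add(source)

    hops = {seed: 0}  # node -> distance from the seed
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:  # do not expand beyond the hop limit
            continue
        for other in neighbours.get(node, ()):
            if other not in hops:
                hops[other] = hops[node] + 1
                queue.append(other)

    visible_links = [(s, t) for s, t in links if s in hops and t in hops]
    return set(hops), visible_links


links = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
visible_nodes, visible_links = neighbourhood(links, seed="c", max_hops=1)
print(sorted(visible_nodes), visible_links)
```

The full dataset stays in memory, and only the selected neighbourhood is handed to the chart to draw.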

Try it for yourself

The best way to test KeyLines’ performance is to try it for yourself.

Request a free trial account to get started!
