At Cambridge Intelligence, we’re frequently asked about KeyLines’ limits – usually with a request for the maximum number of nodes you can see at once.
It’s a common misconception that if a few dozen nodes generate some insight, then a few hundred thousand nodes will generate lots of insight.
This is what happens when you try to visualize a dataset of 2000 nodes:
We think there are four main problems with visualizing very large networks:
When looking around for network visualization software, it’s more important to seek the functionality required to manage large networks, rather than just looking at the (artificial) node limits. There are, broadly, two strategies for visualizing very large networks:
Grouping nodes – nodes can be grouped based on shared properties, for example in this chart we group nodes based on their country. These can then be expanded on demand to view relationships between individuals, or between individuals and groups:
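As a rough illustration of the grouping idea, independent of any particular library, the sketch below folds individual nodes into one group node per country. The `PersonNode` and `GroupNode` shapes and the `groupByCountry` function are hypothetical names invented for this example; they are not part of the KeyLines API.

```typescript
// Hypothetical node shapes for illustration only (not KeyLines types).
interface PersonNode {
  id: string;
  country: string;
}

interface GroupNode {
  id: string;
  label: string;
  members: string[]; // ids of the individual nodes folded into this group
}

// Collapse individual nodes into one group node per country.
function groupByCountry(nodes: PersonNode[]): GroupNode[] {
  // Prototype-free lookup so that any country string is a safe key.
  const byCountry: { [country: string]: GroupNode } = Object.create(null);
  const groups: GroupNode[] = [];
  for (const node of nodes) {
    let group = byCountry[node.country];
    if (group === undefined) {
      group = { id: "group:" + node.country, label: node.country, members: [] };
      byCountry[node.country] = group;
      groups.push(group);
    }
    group.members.push(node.id);
  }
  return groups;
}

const people: PersonNode[] = [
  { id: "a", country: "UK" },
  { id: "b", country: "FR" },
  { id: "c", country: "UK" },
];
const countryGroups = groupByCountry(people);
```

Keeping the member ids on each group is what makes expansion on demand possible: the chart can draw the group first and swap in its members only when the user asks for them.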
Merging nodes – charts often contain dirty data, especially when it is pulled from multiple sources. By merging duplicate nodes, or multiple nodes that can be considered as one unit (e.g. a gang, cell or business unit), large datasets can be simplified. This allows the user to explore the graph at the level of detail they require:
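A minimal sketch of duplicate merging, assuming that two nodes count as duplicates when their names match after trimming and lower-casing. That matching rule, and the `RawNode`, `MergeLink` and `mergeDuplicates` names, are assumptions made for this example; real data usually needs a more careful rule.

```typescript
// Hypothetical shapes for illustration only (not KeyLines types).
interface RawNode {
  id: string;
  name: string;
}

interface MergeLink {
  from: string;
  to: string;
}

// Merge nodes with the same normalized name and rewire links to the survivor.
function mergeDuplicates(nodes: RawNode[], links: MergeLink[]) {
  const canonicalByKey: { [key: string]: string } = Object.create(null);
  const canonicalById: { [id: string]: string } = Object.create(null);
  const mergedNodes: RawNode[] = [];
  for (const node of nodes) {
    const key = node.name.trim().toLowerCase();
    let canonical = canonicalByKey[key];
    if (canonical === undefined) {
      // First node seen with this name becomes the surviving node.
      canonical = node.id;
      canonicalByKey[key] = canonical;
      mergedNodes.push(node);
    }
    canonicalById[node.id] = canonical;
  }

  const seen: { [pair: string]: boolean } = Object.create(null);
  const mergedLinks: MergeLink[] = [];
  for (const link of links) {
    const from = canonicalById[link.from];
    const to = canonicalById[link.to];
    // Skip links to unknown nodes and links that became self-loops.
    if (from === undefined || to === undefined || from === to) continue;
    // Treat links as undirected when checking for duplicates.
    const pair = from < to ? JSON.stringify([from, to]) : JSON.stringify([to, from]);
    if (seen[pair] === true) continue;
    seen[pair] = true;
    mergedLinks.push({ from, to });
  }
  return { nodes: mergedNodes, links: mergedLinks };
}

const mergedChart = mergeDuplicates(
  [
    { id: "1", name: "Acme Ltd" },
    { id: "2", name: " acme ltd" },
    { id: "3", name: "Bob" },
  ],
  [
    { from: "1", to: "3" },
    { from: "2", to: "3" },
    { from: "1", to: "2" },
  ]
);
```

In this example the two Acme records collapse into one node, so three nodes and three links become two nodes and a single link.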
Filters – the ability to hide certain nodes from the chart is key to digesting large graphs. With KeyLines, it’s possible to incorporate filters based on any logic you choose, allowing users to focus on smaller details within the network.
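In general terms, a filter is a predicate over nodes, with links hidden whenever either of their end nodes is hidden. The sketch below shows that idea on plain arrays. The `ChartNode`, `ChartLink` and `filterChart` names are hypothetical and do not represent the KeyLines filtering API.

```typescript
// Hypothetical shapes for illustration only (not KeyLines types).
interface ChartNode {
  id: string;
  kind: string;
}

interface ChartLink {
  id1: string;
  id2: string;
}

// Keep the nodes that pass the predicate, plus links whose two ends both survive.
function filterChart(
  nodes: ChartNode[],
  links: ChartLink[],
  keep: (node: ChartNode) => boolean
) {
  const keptNodes = nodes.filter(keep);
  const visible: { [id: string]: boolean } = Object.create(null);
  for (const node of keptNodes) {
    visible[node.id] = true;
  }
  const keptLinks = links.filter(
    (link) => visible[link.id1] === true && visible[link.id2] === true
  );
  return { nodes: keptNodes, links: keptLinks };
}

const peopleOnly = filterChart(
  [
    { id: "p1", kind: "person" },
    { id: "p2", kind: "person" },
    { id: "c1", kind: "company" },
  ],
  [
    { id1: "p1", id2: "p2" },
    { id1: "p1", id2: "c1" },
  ],
  (node) => node.kind === "person"
);
```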
Centrality measures – social network analysis centrality measures help users identify the most important nodes in a network – for example, those with the most connections (degree centrality), or those that most frequently lie on paths between other nodes (betweenness centrality). Using this information, it’s possible to isolate the connections and sub-networks most likely to be of interest.
One example of this is kCores – repeatedly filtering nodes from the chart by their degree centrality, until we’re left just with highly connected clusters of nodes:
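To make the idea concrete, here is a small sketch of degree centrality and of k-core filtering built on top of it: nodes with fewer than k connections are removed, degrees are recomputed, and the process repeats until nothing else drops out. The `Edge`, `degreeCentrality` and `kCore` names are hypothetical and written for this example; this is not KeyLines' own implementation, and it assumes each pair of nodes is linked at most once.

```typescript
// Hypothetical undirected edge shape for illustration only.
interface Edge {
  a: string;
  b: string;
}

// Degree centrality: number of links touching each node.
// Self-loops and edges to unknown nodes are ignored.
function degreeCentrality(nodeIds: string[], edges: Edge[]): { [id: string]: number } {
  const degree: { [id: string]: number } = Object.create(null);
  for (const id of nodeIds) {
    degree[id] = 0;
  }
  for (const edge of edges) {
    const da = degree[edge.a];
    const db = degree[edge.b];
    if (da === undefined || db === undefined || edge.a === edge.b) continue;
    degree[edge.a] = da + 1;
    degree[edge.b] = db + 1;
  }
  return degree;
}

// k-core: repeatedly drop nodes with degree below k until the set is stable.
function kCore(nodeIds: string[], edges: Edge[], k: number): string[] {
  let remaining = nodeIds.slice();
  let remainingEdges = edges.slice();
  while (true) {
    const degree = degreeCentrality(remaining, remainingEdges);
    const kept = remaining.filter((id) => (degree[id] ?? 0) >= k);
    if (kept.length === remaining.length) return kept;
    remaining = kept;
    // Keep only edges whose two ends both survived this round.
    remainingEdges = remainingEdges.filter(
      (edge) => (degree[edge.a] ?? 0) >= k && (degree[edge.b] ?? 0) >= k
    );
  }
}

// A triangle (a, b, c) with a two-node tail (d, e) hanging off c.
const sampleIds = ["a", "b", "c", "d", "e"];
const sampleEdges: Edge[] = [
  { a: "a", b: "b" },
  { a: "b", b: "c" },
  { a: "a", b: "c" },
  { a: "c", b: "d" },
  { a: "d", b: "e" },
];
const sampleDegrees = degreeCentrality(sampleIds, sampleEdges);
const twoCore = kCore(sampleIds, sampleEdges, 2);
```

With k set to 2, the tail is peeled away one node at a time (first e, then d), leaving only the triangle.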
There are a few valid use cases for looking at larger networks. One such use case is when investigating fraud rings with hundreds of nodes, or looking at the effect of a malware or botnet incident on an IT network.
In these cases, you should use KeyLines’ WebGL renderer, which offers the best performance for large networks, but you should also plan for some degradation. Layouts will take a few seconds longer than usual, and interactions will be less responsive than normal.
Your solution should be able to degrade gracefully, reassuring the user with a loading screen or progress bar when performing a heavy process.
If you’ve made it this far and still want to know a number then, sorry, we are going to have to disappoint.
The capacity of KeyLines is entirely dependent on your server hardware, database speed, the specification of the user’s machine and the browser being used, among other factors.
It’s worth noting, however, that KeyLines supports holding much larger volumes of data in memory than it is feasible to draw at once. Tens of thousands of nodes of data can be passed to the browser with sub-sets visualized on demand.
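One way to picture this pattern is a client-side store that indexes the full dataset and hands the chart only the neighborhood the user asks for. The sketch below is an assumption about how such a store might look; the `GraphStore` class, its `expand` method and the `StoredLink` shape are hypothetical and not part of KeyLines.

```typescript
// Hypothetical link shape for illustration only.
interface StoredLink {
  source: string;
  target: string;
}

// Holds the whole dataset in memory and serves small subsets on demand.
class GraphStore {
  private adjacency: { [id: string]: string[] } = Object.create(null);

  constructor(links: StoredLink[]) {
    for (const link of links) {
      this.connect(link.source, link.target);
      this.connect(link.target, link.source);
    }
  }

  private connect(from: string, to: string): void {
    const neighbors = this.adjacency[from];
    if (neighbors === undefined) {
      this.adjacency[from] = [to];
    } else if (neighbors.indexOf(to) === -1) {
      neighbors.push(to);
    }
  }

  // Ids to draw when the user expands `id`: the node itself plus its direct
  // neighbors. An id with no stored links yields just that id.
  expand(id: string): string[] {
    return [id].concat(this.adjacency[id] ?? []);
  }
}

const store = new GraphStore([
  { source: "a", target: "b" },
  { source: "a", target: "c" },
  { source: "c", target: "d" },
]);
const toDraw = store.expand("a");
```

Only the ids returned by `expand` need to be passed to the chart, so the amount drawn stays small even when the stored dataset is large.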
The best way to test KeyLines’ performance is to try it for yourself.
Request a free trial account to get started!