How to visualize very large networks and still uncover insight

We’re often asked about visualizing very large networks. It’s a common belief that if a few nodes can generate some insight, a few thousand nodes will generate lots of insight. That’s not automatically the case. Many networks are too densely connected to be usefully visualized in one go.

In this blog, we’ll explore some strategies for visualizing large networks in a meaningful way.

Visualizing huge graph datasets

Here’s a network with 20,000 nodes and 20,000 links.

A randomly-generated network of 20,000 nodes and 20,000 links

Although the powerful graph layout does a good job of highlighting the overall network structure, the amount of insight we can get from this chart is limited. It demonstrates the four main challenges of visualizing very large networks:

Limited pixels – your computer monitor has a limited number of screen pixels. The more nodes and links you try to cram into a chart, the less you’ll be able to learn about them.

Limited human brainpower – the human brain is an incredible thing, but most adults struggle to hold more than about seven items in their short-term memory at once, which makes it almost impossible to interpret a noisy chart with 20,000 nodes.

Graph hairball – in a large connected dataset, the number of possible links grows much faster than the number of nodes (quadratically, in the worst case). Eventually, you’ll get such a densely connected graph that it’s beyond the help of any automated layout.

Limited computer processing – while processors and graphics rendering technologies are getting more and more powerful, it’s still the case that bigger graphs will mean reduced performance – with slower layouts and a laggy user experience.
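
To put rough numbers on the hairball problem, here is a small sketch. It assumes a simple undirected graph (no self-links, no duplicate links), where the upper bound on links is n(n-1)/2. The function name max_links is ours, purely for illustration.

```python
def max_links(node_count: int) -> int:
    """Upper bound on links in a simple undirected graph with node_count nodes."""
    return node_count * (node_count - 1) // 2


# Each tenfold increase in nodes allows roughly a hundredfold increase in links.
for n in (20, 200, 2_000, 20_000):
    print(f"{n:>6} nodes -> up to {max_links(n):>11,} links")
```

Real datasets rarely get near this bound, but it shows why density, rather than node count alone, is what overwhelms a layout.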

There are two ways to get around these challenges: reduce the density of the graph, and show only sub-sections of it.

Reduce graph density

Before tackling the scale of this data, we should first look to reduce its density. This means taking away everything unnecessary that’s cluttering the chart.

Aggregate your data

The obvious starting point for decluttering your data is to remove or merge duplicate and unneeded nodes.

Normally this happens on the back-end of your application, where database queries can tidy away millions of unnecessary data points in seconds. But you can also give users some front-end functionality to merge duplicates that materialize during the investigation process.

Aggregating data by combining nodes
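
As a rough illustration of merging duplicates, here is a minimal sketch in plain Python. It assumes nodes are dictionaries with an "id" and a "name", links are pairs of ids, and two nodes count as duplicates when their normalized names match. The function name, the data and the matching rule are all hypothetical; a real application would use whatever rule suits its data.

```python
def merge_duplicates(nodes, links, key):
    """Merge nodes that share the same key, then rewire and de-duplicate links."""
    canonical = {}     # key value -> id of the node we keep
    remap = {}         # original node id -> id of the node we keep
    merged_nodes = []
    for node in nodes:
        k = key(node)
        if k not in canonical:
            canonical[k] = node["id"]
            merged_nodes.append(node)
        remap[node["id"]] = canonical[k]

    merged_links = set()
    for source, target in links:
        a, b = remap[source], remap[target]
        if a != b:  # drop links that collapse onto a single merged node
            merged_links.add((min(a, b), max(a, b)))  # treat links as undirected
    return merged_nodes, sorted(merged_links)


# Hypothetical data: the same garage entered twice with different spellings.
nodes = [
    {"id": "g1", "name": "Acme Garage"},
    {"id": "g2", "name": "acme garage "},
    {"id": "c1", "name": "J. Smith"},
    {"id": "c2", "name": "A. Jones"},
]
links = [("c1", "g1"), ("c2", "g2"), ("c1", "g2")]

merged_nodes, merged_links = merge_duplicates(
    nodes, links, key=lambda n: n["name"].strip().lower()
)
```

Here four nodes and three links collapse to three nodes and two links, with both claimants now pointing at the same garage.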

Simplifying the data model

It’s tempting to use the same data model in your back-end and in your visualization, but this is rarely the best approach for your users.

Take this insurance fraud example. Initially, we’re loading all the data related to a case – policies, claims, claimants, incidents, repair garages, etc. In the second view, we show only claimants and garages, removing all the intermediate nodes and connections.

Simplifying the data model removes chart clutter to reveal insight

If the end-user needs to understand relationships between claimants and garages, the second view is much more effective.
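
One way to build that second view is to project the full graph onto the node types you care about. The sketch below is an assumption about how this might be done, not how any particular toolkit does it: it walks outward from each kept node through intermediate nodes only, and links it to the first kept nodes it reaches. The node names and types are invented for the example.

```python
from collections import defaultdict, deque


def project(node_types, links, keep=("claimant", "garage")):
    """Link kept nodes that are connected through intermediate nodes only."""
    adjacency = defaultdict(set)
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)

    kept = {node for node, kind in node_types.items() if kind in keep}
    projected = set()
    for start in kept:
        seen = {start}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for neighbour in adjacency[current]:
                if neighbour in seen:
                    continue
                seen.add(neighbour)
                if neighbour in kept:
                    # Reached another kept node: record the link and stop here.
                    projected.add(tuple(sorted((start, neighbour))))
                else:
                    # Intermediate node (policy, claim, ...): keep walking.
                    queue.append(neighbour)
    return sorted(projected)


# Hypothetical case data.
node_types = {
    "alice": "claimant", "bob": "claimant",
    "policy1": "policy", "policy2": "policy",
    "claim1": "claim", "claim2": "claim",
    "fixit": "garage",
}
links = [
    ("alice", "policy1"), ("policy1", "claim1"), ("claim1", "fixit"),
    ("bob", "policy2"), ("policy2", "claim2"), ("claim2", "fixit"),
]

simplified = project(node_types, links)
```

Seven nodes and six links reduce to three nodes and two links, and the fact that both claimants used the same garage becomes obvious.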

Combine nodes

Combos are visual groupings of nodes and links. They can be opened, closed and nested, giving a really powerful way to tidy up charts.

Combos are also an example of the second strategy for visualizing large graph datasets.

Combine nodes to reduce clutter without losing nodes
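
To show the idea behind a closed combo, here is a minimal sketch that groups nodes by an attribute and summarizes the links between groups. It is not the combo API of any toolkit; the function name, the grouping attribute (country) and the data are hypothetical.

```python
from collections import Counter


def build_combos(node_group, links):
    """Group nodes by attribute and count the links running between groups."""
    members = {}
    for node, group in node_group.items():
        members.setdefault(group, []).append(node)

    summary = Counter()
    for a, b in links:
        group_a, group_b = node_group[a], node_group[b]
        if group_a != group_b:  # links inside a closed combo are hidden
            summary[tuple(sorted((group_a, group_b)))] += 1
    return members, dict(summary)


# Hypothetical data: five nodes grouped by country.
node_group = {"a1": "UK", "a2": "UK", "b1": "France", "b2": "France", "c1": "Spain"}
links = [("a1", "a2"), ("a1", "b1"), ("a2", "b2"), ("b1", "c1")]

members, summary = build_combos(node_group, links)
```

The closed view has three combos and two summary links, and opening a combo simply means drawing its members and their original links again, so no data is lost.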

Show sub-sections of the graph

Now we’ve tackled graph density, we can start exploring the data in a meaningful way.

Just as combos can help users tidy noisy charts, they also let users explore graphs on-demand, digging deeper into datasets.

Network filtering

Filters are also great at this, giving users the ability to add or remove data from their view. One handy technique is to present users with an empty chart, and allow them to add data iteratively as required:

Start with an empty chart, and bring data in on demand
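
The "start empty, expand on demand" pattern can be sketched as follows. This assumes the full dataset sits behind the chart (here, an in-memory adjacency map; in practice, more likely a database query) and only the nodes a user has asked for are drawn. The class and method names are ours, not part of any toolkit.

```python
class OnDemandChart:
    """Keeps the full graph in the background and tracks what is visible."""

    def __init__(self, links):
        self._adjacency = {}
        for a, b in links:
            self._adjacency.setdefault(a, set()).add(b)
            self._adjacency.setdefault(b, set()).add(a)
        self.visible = set()  # the chart starts empty

    def add(self, node):
        """Show a single node, for example a search result."""
        self.visible.add(node)

    def expand(self, node):
        """Show a node's neighbours and return the ones that were newly added."""
        self.visible.add(node)
        new_nodes = self._adjacency.get(node, set()) - self.visible
        self.visible |= new_nodes
        return new_nodes

    def visible_links(self):
        """Links whose two ends are both currently visible."""
        return sorted(
            (a, b)
            for a in self.visible
            for b in self._adjacency.get(a, ())
            if b in self.visible and a < b
        )


# Hypothetical usage: the user searches for "alice", then expands twice.
chart = OnDemandChart(
    [("alice", "bob"), ("bob", "carol"), ("carol", "dave"), ("alice", "erin")]
)
chart.add("alice")
first = chart.expand("alice")
second = chart.expand("bob")
```

After two expansions the chart shows four of the five nodes, and "dave" stays hidden until someone expands "carol".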

Social network centrality measures

You could also consider combining filters with social network analysis centrality measures. These algorithms identify the most important nodes in a network, based on their relative connectivity.
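
As a simple illustration, degree centrality scores each node by the share of other nodes it links to directly. The sketch below is a plain Python version under two assumptions: links are undirected pairs, and only nodes that appear in at least one link are counted. The 0.5 cut-off is arbitrary and would need tuning for real data.

```python
def degree_centrality(links):
    """Score each node by its degree divided by the number of other nodes."""
    degree = {}
    for a, b in links:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    others = max(len(degree) - 1, 1)  # avoid dividing by zero on tiny graphs
    return {node: count / others for node, count in degree.items()}


# Hypothetical data: a hub with three neighbours, two of which are also linked.
links = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("a", "b")]
centrality = degree_centrality(links)

# Use the scores as a filter: keep only the better-connected nodes.
important = {node for node, score in centrality.items() if score >= 0.5}
```

Other measures, such as betweenness or closeness, answer different questions about importance, so the right choice depends on what your users are looking for.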

This information reveals the connections and sub-graphs most likely to be of interest. One example of this is kCores – repeatedly removing the nodes with the lowest degree from the chart, until we’re left with just the highly connected clusters of nodes:

Removing nodes from a chart based on their connectivity, revealing a single densely connected core
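
The pruning described above is usually called a k-core decomposition. Here is a minimal sketch of it in plain Python, assuming undirected links given as pairs: it keeps removing nodes with fewer than k neighbours until every remaining node has at least k. The function name and sample data are ours.

```python
def k_core(links, k):
    """Return the nodes that survive repeated removal of nodes with degree < k."""
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    while True:
        weak = [node for node, nbrs in adjacency.items() if len(nbrs) < k]
        if not weak:
            break
        for node in weak:
            # Removing a node lowers its neighbours' degrees, which may
            # make them candidates for removal on the next pass.
            for nbr in adjacency.pop(node):
                if nbr in adjacency:
                    adjacency[nbr].discard(node)
    return set(adjacency)


# Hypothetical data: a fully connected group of four, plus a two-node tail.
links = [
    ("w", "x"), ("w", "y"), ("w", "z"), ("x", "y"), ("x", "z"), ("y", "z"),
    ("z", "p"), ("p", "q"),
]
core = k_core(links, k=3)
```

With k=3 the tail ("p" and "q") is stripped away and only the tightly connected group remains. Raising k peels off further layers until nothing is left.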

What if I really need to see a huge network?

Despite everything we’ve said above, it can sometimes be useful to visualize a very large network before digging into specific details.

In these cases, you should make sure your visualization tool harnesses modern rendering technology and layouts optimized for performance.

There are some visual tricks you can deploy too. Link gradients make network clusters more visible from a distance, and you can adapt your node, link and labeling styles based on zoom level.

Adaptive styling makes sure your network looks good at any zoom level
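
Adaptive styling usually comes down to a function from zoom level to style settings. The sketch below assumes a zoom value where 1.0 means "actual size"; the thresholds, property names and sizes are placeholders to adjust for your own charts, not settings from any toolkit.

```python
def style_for_zoom(zoom: float) -> dict:
    """Pick node, link and label styling for the current zoom level."""
    if zoom < 0.3:
        # Far out: show overall structure only, with tiny nodes and no labels.
        return {"show_labels": False, "node_size": 2, "link_width": 0.5}
    if zoom < 1.0:
        # Mid-range: nodes are distinguishable, but labels would still overlap.
        return {"show_labels": False, "node_size": 6, "link_width": 1.0}
    # Zoomed in: there is room for full-size nodes and readable labels.
    return {"show_labels": True, "node_size": 12, "link_width": 1.5}


overview = style_for_zoom(0.1)
detail = style_for_zoom(2.0)
```

You would call something like this from the chart’s zoom or viewport-change handler and apply the result to the visible items.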

Finally, try to ensure your solution degrades gracefully too. Reassure users with a loading screen or progress bar when performing a heavy process.

A loading screen reassures users that their graph is coming

Try it for yourself

Want to turn your big graph datasets into insight? We’ve built two network visualization toolkits, optimized for your biggest graph datasets. KeyLines is our network visualization toolkit for JavaScript, and ReGraph is for React developers.

Request a trial of our network visualization toolkits to get started.

This post was originally published some time ago. It’s still popular, so we’ve updated it with fresh content to keep it useful and relevant.

Registered in England and Wales with Company Number 07625370 | VAT Number 113 1740 61 | 6-8 Hills Road, Cambridge, CB2 1JP. All material © Cambridge Intelligence 2020.