A few weeks ago, four investigative journalists from BuzzFeed released an intriguing new dataset called TrumpWorld.
It’s a valiant attempt to document the sprawl of organizations and individuals connected to the new administration. Released as a series of spreadsheets covering more than 1500 entities, the dataset came with a request: that others explore, enrich and extend the data to help build a complete picture of the network surrounding President Trump. So we decided to take a look with KeyLines.
It’s always fun to play with a new dataset, especially one as relevant to current events as the business interests of the US President. But this dataset was interesting because it posed a common graph visualization challenge: the hairball. This is when connections become so dense that they cannot be usefully visualized.
In this blog post, we’ll explore how we used KeyLines’ graph visualization capability to detangle the TrumpWorld dataset.
Step 1: Defining the graph model
The first step is to model the graph. With any graph, it’s important to keep this simple, especially with a potential hairball network.
The BuzzFeed team made this task easier by releasing the data as a set of connections:
- Organization to organization connections
- People to people connections
- People to organization connections
This means we only have two types of node: people and organizations. We’ll color-code them for a simple visual:
Ideally, we’d also color-code the links, but the data includes nearly 500 categories of connections, making that unfeasible without further classification. Instead, we can show link labels on hover, to avoid overwhelming the user.
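As a rough illustration of this model (plain Python, not KeyLines code), the three connection lists can be merged into one set of typed nodes and one list of labeled links. The sample rows, the color values and the names `Edge`, `NODE_COLORS` and `build_graph` are all hypothetical and stand in for the real spreadsheets:

```python
from typing import NamedTuple


class Edge(NamedTuple):
    source: str
    target: str
    label: str  # free-text connection category, intended for on-hover display


# Hypothetical sample rows, one list per connection spreadsheet.
org_org = [("Org A", "Org B", "Ownership")]
person_person = [("Person X", "Person Y", "Business partner")]
person_org = [("Person X", "Org A", "President")]

# Illustrative colors only; any two distinguishable colors would do.
NODE_COLORS = {"person": "#1f77b4", "organization": "#ff7f0e"}


def build_graph(org_org, person_person, person_org):
    """Merge the three connection lists into typed nodes and labeled edges."""
    nodes = {}  # node name -> "person" or "organization"
    edges = []
    sources = (
        (org_org, ("organization", "organization")),
        (person_person, ("person", "person")),
        (person_org, ("person", "organization")),
    )
    for rows, (type_a, type_b) in sources:
        for a, b, label in rows:
            nodes.setdefault(a, type_a)
            nodes.setdefault(b, type_b)
            edges.append(Edge(a, b, label))
    return nodes, edges


nodes, edges = build_graph(org_org, person_person, person_org)
```

Keeping the connection category as a plain label on each edge, rather than as a style, mirrors the decision above to leave the links uncolored.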
Scrolling through the data, we noticed how many of the companies are named after Donald Trump himself. Using a halo, we’ll highlight nodes containing any combination of Donald, Trump, DJT or DT:
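One way to pick out those nodes is a case-insensitive pattern match on the node name. The sketch below is plain Python rather than KeyLines code, and it assumes whole-word matching so that the letters "dt" inside an unrelated word do not count. Apart from DJT Holdings, the example names are made up:

```python
import re

# Whole-word, case-insensitive match on the four terms (an assumption about
# how strict the matching should be).
TRUMP_NAME = re.compile(r"\b(donald|trump|djt|dt)\b", re.IGNORECASE)


def needs_halo(node_name: str) -> bool:
    """Return True if the node name mentions Donald, Trump, DJT or DT."""
    return TRUMP_NAME.search(node_name) is not None


# Illustrative names only.
example_names = ["Trump Example LLC", "DJT Holdings", "DT Example Inc", "Midtown Realty"]
haloed = [name for name in example_names if needs_halo(name)]
```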
The standard automatic graph layout does a good job of revealing some network features like clusters around the network periphery. It also reveals a large cluster to the left of the central node (US Secretary of Commerce, Wilbur Ross). But we’re still left with an unusable hairball.
Step 2: Managing supernodes
Although the layout reveals some structure, a central ‘supernode’ distorts it. This is a common problem with graph datasets that focus on an individual data point: the hub creates a hairball of connections that conceals possible connections elsewhere.
To fix this, we can:
- use a filter to remove the Trump Node from the chart, or add it back
- add a glyph to those nodes previously directly connected to the Trump Node
This removes potential clutter from the chart, yet still lets us see which nodes have a direct connection to Donald Trump.
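The two steps above can be sketched in plain Python (this is not the KeyLines filter API). The graph is assumed to be an undirected adjacency map from node name to a set of neighbor names; `hide_node` and the sample data are hypothetical:

```python
def hide_node(adjacency, hub):
    """Remove `hub` from the graph and report which nodes it was linked to.

    Returns (remaining_graph, flagged), where `flagged` holds the former
    direct neighbors of the hub, i.e. the nodes that should receive a glyph.
    """
    flagged = set(adjacency.get(hub, ()))
    remaining = {
        node: neighbors - {hub}
        for node, neighbors in adjacency.items()
        if node != hub
    }
    return remaining, flagged


# Hypothetical sample: "Hub" plays the role of the supernode.
graph = {
    "Hub": {"A", "B", "C"},
    "A": {"Hub", "B"},
    "B": {"Hub", "A"},
    "C": {"Hub"},
    "D": {"E"},
    "E": {"D"},
}
without_hub, flagged = hide_node(graph, "Hub")
```

Because the function returns a new map and leaves the input untouched, restoring the hub is just a matter of going back to the original graph.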
Step 3: Removing orphan nodes
Removing the Trump Node reveals a large number of disconnected nodes, called orphan nodes:
Because they’re disconnected from the remaining network, these nodes are often of less interest to an investigator.
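Filtering them out amounts to dropping every node that has no remaining connections. A minimal plain-Python sketch, assuming the same kind of adjacency map (node name to set of neighbors) and using hypothetical sample data:

```python
def remove_orphans(adjacency):
    """Keep only the nodes that still have at least one connection."""
    return {node: neighbors for node, neighbors in adjacency.items() if neighbors}


# Hypothetical state after a hub has been hidden: "C" has no links left.
graph = {"A": {"B"}, "B": {"A"}, "C": set(), "D": {"E"}, "E": {"D"}}
connected = remove_orphans(graph)
```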
Step 4: Pruning leaf nodes
Another efficient way to remove unnecessary chart clutter is to filter out leaf nodes: nodes that have no child nodes of their own, i.e. no downward connections. In most circumstances, these nodes are less important parts of the network structure, and removing them reveals core structures:
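As a sketch of this kind of pruning in plain Python (not KeyLines code), the example below assumes an undirected graph, where a leaf is taken to be a node with exactly one connection. It makes a single pass, so nodes that only become leaves after pruning are kept. The function name and sample data are hypothetical:

```python
def prune_leaves(adjacency):
    """Remove every node with exactly one connection (single pass)."""
    leaves = {node for node, neighbors in adjacency.items() if len(neighbors) == 1}
    return {
        node: neighbors - leaves
        for node, neighbors in adjacency.items()
        if node not in leaves
    }


# Hypothetical sample: a triangle A-B-C with one leaf, D, hanging off C.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
core = prune_leaves(graph)
```

Note that an isolated pair of nodes linked only to each other counts as two leaves under this definition, so both are removed.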
Now we have a slimmed-down network ready to explore:
Much of the network is made up of unremarkable pairs of nodes, mostly companies acting as parent companies for subsidiaries. With the Trump Node and leaf nodes hidden, the chart shows that there’s really not much to see:
There’s also this star-shaped structure of organizations, bottom-left:
On closer inspection, this is almost entirely made up of grey halos and red T-glyphs. It’s just a more-complicated-than-most ownership chain of Trump subsidiaries.
Depending on your investigative goal, you may want to further explore these inter-organizational relationships. In this case, however, the relationships between people and organizations are more interesting – drawing us to the larger network in the top left of the chart.
Step 5: Running SNA measures
We’ve already reduced our network’s complexity significantly, so now let’s use KeyLines’ social network analysis measures to pick out nodes of interest.
In this example, we’ve calculated the betweenness of nodes – showing which nodes act as bridges across the network:
We can see this network is formed of two halves – DJT Holdings & its subsidiaries (right) and a more varied network of people and organizations (left).
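For readers who want to see what a betweenness score measures, here is a minimal plain-Python sketch using Brandes' algorithm on an unweighted, undirected graph. It is not how KeyLines computes the measure, which the post does not describe. The sample graph is hypothetical: two small clusters joined by a single "Bridge" node.

```python
from collections import deque


def betweenness(adjacency):
    """Unnormalized betweenness centrality for an unweighted, undirected graph."""
    scores = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # Breadth-first search from `source`, counting shortest paths.
        order = []                                   # nodes in order of discovery
        predecessors = {node: [] for node in adjacency}
        path_count = {node: 0 for node in adjacency}
        distance = {node: -1 for node in adjacency}
        path_count[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if distance[w] < 0:                  # first time we reach w
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:   # v lies on a shortest path to w
                    path_count[w] += path_count[v]
                    predecessors[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        dependency = {node: 0.0 for node in adjacency}
        while order:
            w = order.pop()
            for v in predecessors[w]:
                dependency[v] += path_count[v] / path_count[w] * (1 + dependency[w])
            if w != source:
                scores[w] += dependency[w]
    # Each unordered pair was counted from both ends, so halve the totals.
    return {node: score / 2 for node, score in scores.items()}


# Hypothetical sample: clusters {L1, L2} and {R1, R2} joined through "Bridge".
graph = {
    "L1": {"L2", "Bridge"},
    "L2": {"L1", "Bridge"},
    "Bridge": {"L1", "L2", "R1", "R2"},
    "R1": {"Bridge", "R2"},
    "R2": {"Bridge", "R1"},
}
scores = betweenness(graph)
```

In this sample every shortest path between a left-hand node and a right-hand node passes through "Bridge", so it gets the only non-zero score. That is the kind of bridging role the measure is meant to surface.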
Step 6: Finding shortest paths
One final technique available to pick through the dense connections is to use KeyLines’ shortest path function to highlight connections between two points.
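On an unweighted graph, a shortest path can be found with a breadth-first search. The sketch below is plain Python, not the KeyLines shortest path function, and it treats every link as having equal length. The function name and sample graph are hypothetical:

```python
from collections import deque


def shortest_path(adjacency, start, goal):
    """Return one shortest path from start to goal as a list, or None."""
    if start not in adjacency or goal not in adjacency:
        return None
    previous = {start: None}  # node -> node it was first reached from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the chain of predecessors back to the start.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in adjacency[node]:
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical sample: A reaches D via B and C (3 links) or via E (2 links).
graph = {
    "A": ["B", "E"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["A", "D"],
    "F": [],
}
path = shortest_path(graph, "A", "D")
```

When several shortest paths exist, this returns whichever one the search reaches first. Highlighting all of them would need the search to record every predecessor rather than just one.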
At the moment, the TrumpWorld dataset is probably more interesting for the visualization challenges it poses than the insight it reveals. But you’re welcome to try KeyLines to explore it for yourself!
Request a trial to get started with the KeyLines visualization toolkit.