Untangling the hairball: Visualizing Donald Trump’s network

by Christian Miles, 6th February 2017

A few weeks ago, four investigative journalists from BuzzFeed released an intriguing new dataset called TrumpWorld.

It’s a valiant attempt to document the sprawl of organizations and individuals connected to the new administration. Released as a series of spreadsheets covering more than 1500 entities, the dataset came with a request: that others explore, enrich and extend the data to help build a complete picture of the network surrounding President Trump. So we decided to take a look with KeyLines.

It’s always fun to play with a new dataset, especially one as relevant to current events as the business interests of the US President. But this dataset was interesting because it posed a common graph visualization challenge: the hairball. A hairball forms when connections become so dense that they can no longer be usefully visualized.

In this blog post, we’ll explore how we used KeyLines’ graph visualization capability to untangle the TrumpWorld dataset.

Step 1: Defining the graph model

The first step is to model the graph. With any graph, it’s important to keep this simple, especially with a potential hairball network.

The BuzzFeed team made this task easier by releasing the data as a set of connections:

Organization to organization connections
People to people connections
People to organization connections

This means we only have two types of node: people and organizations. We’ll color-code them for a simple visual:

2 node data model example
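To make the model concrete, here is a minimal sketch in plain Python of how the three connection lists could be merged into a single graph with two node types. It is not KeyLines code: the `Graph` class, the `build_graph` helper, the color names and the sample rows are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field

PERSON = "person"
ORGANIZATION = "organization"

# Assumed colors echoing the chart: orange for people, yellow for companies.
NODE_COLORS = {PERSON: "orange", ORGANIZATION: "yellow"}


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)  # node name -> node type
    links: list = field(default_factory=list)  # (source, target, label)

    def add_link(self, a, a_type, b, b_type, label):
        # Register each endpoint once, then record the labeled link.
        self.nodes.setdefault(a, a_type)
        self.nodes.setdefault(b, b_type)
        self.links.append((a, b, label))


def build_graph(org_org, person_person, person_org):
    """Merge the three connection lists into one two-type graph."""
    graph = Graph()
    for a, b, label in org_org:
        graph.add_link(a, ORGANIZATION, b, ORGANIZATION, label)
    for a, b, label in person_person:
        graph.add_link(a, PERSON, b, PERSON, label)
    for a, b, label in person_org:
        graph.add_link(a, PERSON, b, ORGANIZATION, label)
    return graph


# Made-up placeholder rows, not taken from the TrumpWorld spreadsheets.
graph = build_graph(
    org_org=[("Org A", "Org B", "Ownership")],
    person_person=[("Person X", "Person Y", "Friend")],
    person_org=[("Person X", "Org A", "Director")],
)
colors = {name: NODE_COLORS[kind] for name, kind in graph.nodes.items()}
```

Keeping the link label as plain data on each link is what would let a chart show it on hover rather than encoding it visually.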

Ideally, we’d also color-code the links, but the data includes nearly 500 categories of connection, which makes that unfeasible without further classification. Instead, we can show the category as a label on hover to avoid overwhelming the user.

Scrolling through the data, we noticed how many of the companies are named after Donald Trump himself. Using a halo, we’ll highlight nodes containing any combination of Donald, Trump, DJT or DT:

Orange nodes represent people and yellow nodes represent companies. Companies named after Donald Trump have grey halos
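The name matching behind the halos could look something like the sketch below. It assumes a whole-word, case-insensitive match on the four fragments, which is our guess at a sensible rule rather than a description of the exact one used for the chart. The `needs_halo` helper and "Example Trump Ventures" are invented for illustration; the other names appear elsewhere in this post.

```python
import re

# Whole-word, case-insensitive match on the four name fragments.
# Word boundaries stop "DT" from matching inside unrelated words.
TRUMP_NAME = re.compile(r"\b(donald|trump|djt|dt)\b", re.IGNORECASE)


def needs_halo(name, node_type):
    """Return True for organizations whose name matches a fragment."""
    return node_type == "organization" and TRUMP_NAME.search(name) is not None


examples = [
    ("DJT Holdings", "organization"),
    ("Example Trump Ventures", "organization"),  # made-up name
    ("Deutsche Bank", "organization"),
    ("Donald Trump", "person"),  # people never get a halo in this sketch
]
halos = {name: needs_halo(name, kind) for name, kind in examples}
```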

The standard automatic graph layout does a good job of revealing some network features, like clusters around the network periphery. It also reveals a large cluster to the left of the central node (US Secretary of Commerce, Wilbur Ross). But we’re still left with an unusable hairball.

Step 2: Managing super nodes

Although the network looks good so far, a central ‘supernode’ distorts it. This is a common problem with graph datasets that focus on an individual data point: the supernode creates a hairball of connections that conceals possible connections elsewhere.

To fix this, we can:

use a filter to remove / add the Trump Node from the chart
add a glyph to those nodes previously directly connected to the Trump Node

This removes potential clutter from the chart, yet still lets us see which nodes have a direct connection to Donald Trump.

Removing the Trump Node
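In library-agnostic terms, the two steps amount to something like the following sketch. It is plain Python, not the KeyLines filter API, and the `hide_supernode` helper and node names are placeholders we made up.

```python
def hide_supernode(nodes, links, supernode):
    """Drop the supernode and its links, remembering its direct neighbors."""
    flagged = set()
    for a, b in links:
        if a == supernode:
            flagged.add(b)
        elif b == supernode:
            flagged.add(a)
    flagged.discard(supernode)  # ignore a self-link, if one exists

    kept_nodes = {n for n in nodes if n != supernode}
    kept_links = [(a, b) for a, b in links if supernode not in (a, b)]
    return kept_nodes, kept_links, flagged


# Placeholder data: "Hub" stands in for the Trump Node.
nodes = {"Hub", "A", "B", "C"}
links = [("Hub", "A"), ("Hub", "B"), ("A", "C")]

kept_nodes, kept_links, glyph_nodes = hide_supernode(nodes, links, "Hub")
# glyph_nodes holds the nodes that would carry the glyph on the chart.
```

Note that "B" survives the filter but loses its only link, which is exactly the situation the next step deals with.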

Step 3: Removing orphan nodes

Removing the Trump Node reveals a large number of disconnected nodes, called orphan nodes:

Orphan nodes displayed along the right-hand side of this chart

Because they’re disconnected from the remaining network, these nodes are often of less interest to an investigator.
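Finding orphans is a simple degree check: any node that no longer appears in a link is an orphan. A minimal sketch, with an assumed `split_orphans` helper and placeholder data:

```python
def split_orphans(nodes, links):
    """Separate nodes that appear in at least one link from those in none."""
    connected = set()
    for a, b in links:
        connected.update((a, b))
    orphans = set(nodes) - connected
    return set(nodes) - orphans, orphans


# Placeholder data: "B" was only ever linked to the hidden supernode.
nodes = {"A", "B", "C"}
links = [("A", "C")]

connected_nodes, orphan_nodes = split_orphans(nodes, links)
```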

Step 4: Pruning leaf nodes

Another efficient way to remove unnecessary chart clutter is to filter leaf nodes – nodes that sit at the end of a branch, with no child connections of their own. In most circumstances, these nodes are less important parts of the network structure, and removing them reveals core structures:

Removing orphan and leaf nodes
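As a rough sketch of leaf pruning, the Python below treats links as undirected and assumes a leaf is any node with exactly one neighbor. That definition, the `prune_leaves` helper and the sample data are our assumptions for illustration. The pruning runs once only, so nodes that become leaves as a result are kept.

```python
from collections import defaultdict


def prune_leaves(nodes, links):
    """Remove nodes with exactly one neighbor, in a single pass."""
    neighbors = defaultdict(set)
    for a, b in links:
        if a != b:  # a self-link does not count as a neighbor
            neighbors[a].add(b)
            neighbors[b].add(a)

    leaves = {n for n in nodes if len(neighbors[n]) == 1}
    kept_links = [
        (a, b) for a, b in links if a not in leaves and b not in leaves
    ]
    return set(nodes) - leaves, kept_links, leaves


# Placeholder data: a hub with two leaves, attached to a triangle.
nodes = {"H", "A", "B", "X", "Y", "Z"}
links = [("H", "A"), ("H", "B"), ("H", "X"),
         ("X", "Y"), ("Y", "Z"), ("Z", "X")]

core_nodes, core_links, leaf_nodes = prune_leaves(nodes, links)
```

Running the function repeatedly until no leaves remain would strip whole branches back to the cyclic core, which is a more aggressive choice than the single pass shown here.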

Now we have a slimmed-down network ready to explore:

Our simplified chart gives us a clearer view of the core network components

Much of the network is made up of unremarkable pairs of nodes, mostly companies acting as parent companies for subsidiaries. With the Trump Node and leaf nodes hidden, there’s really not much to see here:

This loose collection of companies does not reveal much interesting insight

There’s also this star-shaped structure of organizations, bottom-left:

An unusual star-shaped network structure

On closer inspection, this is almost entirely made up of grey halos and red T-glyphs. It’s just a more-complicated-than-most ownership chain of Trump subsidiaries.

Depending on your investigative goal, you may want to further explore these inter-organizational relationships. In this case, however, the relationships between people and organizations are more interesting – drawing us to the larger network in the top left of the chart.

Step 5: Run SNA measures

We’ve already reduced our network’s complexity significantly, so now let’s use KeyLines’ social network analysis measures to pick out nodes of interest.

In this example, we’ve calculated the betweenness of nodes – showing which nodes act as bridges across the network:

Applying betweenness measure to the network highlights nodes that are key bridges between the different structures

We can see this network is formed of two halves – DJT Holdings & its subsidiaries (right) and a more varied network of people and organizations (left).
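For readers curious about what a betweenness score measures, here is a compact sketch of Brandes' algorithm for an unweighted, undirected graph. It is an illustration only: we are not claiming this is how KeyLines computes the measure. The `betweenness` helper and the sample graph are made up, and every link endpoint is assumed to be listed in `nodes`.

```python
from collections import defaultdict, deque


def betweenness(nodes, links):
    """Unnormalized betweenness for an unweighted, undirected graph."""
    adjacent = defaultdict(set)
    for a, b in links:
        if a != b:
            adjacent[a].add(b)
            adjacent[b].add(a)

    score = {n: 0.0 for n in nodes}
    for source in nodes:
        # Breadth-first search counting shortest paths from the source.
        order = []
        predecessors = {n: [] for n in nodes}
        path_count = {n: 0 for n in nodes}
        distance = {n: -1 for n in nodes}
        path_count[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacent[v]:
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    path_count[w] += path_count[v]
                    predecessors[w].append(v)

        # Walk back from the farthest nodes, accumulating dependencies.
        dependency = {n: 0.0 for n in nodes}
        while order:
            w = order.pop()
            for v in predecessors[w]:
                share = path_count[v] / path_count[w]
                dependency[v] += share * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]

    # Each unordered pair was counted from both ends, so halve the totals.
    return {n: total / 2 for n, total in score.items()}


# Placeholder data: two triangles joined through a single bridge node "D".
nodes = ["A", "B", "C", "D", "E", "F", "G"]
links = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"),
         ("D", "E"), ("E", "F"), ("F", "G"), ("E", "G")]

scores = betweenness(nodes, links)
```

In this toy graph the bridge node "D" gets the highest score, because every shortest path between the two triangles passes through it.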

Step 6: Finding shortest paths

One final technique available to pick through the dense connections is to use KeyLines’ shortest path function to highlight connections between two points.

The shortest path, excluding family ties, between Donald Trump and his son-in-law Jared Kushner, runs through a mutual use of Deutsche Bank

Similarly, the shortest connection between Donald Trump Holdings & Vladimir Putin is via several loose connections

In the data, Steve Bannon’s connection to the Russian president is equally insubstantial
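A shortest path that ignores certain kinds of link, such as family ties, can be sketched as a breadth-first search over the links that remain. The Python below is illustrative rather than the KeyLines function itself. The `shortest_path` helper, the link labels and the node names are placeholders, not values from the dataset.

```python
from collections import deque


def shortest_path(links, start, goal, excluded_labels=frozenset()):
    """Breadth-first search over undirected links, skipping some labels."""
    adjacent = {}
    for a, b, label in links:
        if label in excluded_labels:
            continue
        adjacent.setdefault(a, set()).add(b)
        adjacent.setdefault(b, set()).add(a)

    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Rebuild the route by following the chain of predecessors.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in adjacent.get(node, ()):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None  # no route between the two nodes


# Placeholder data: two relatives who also share a lender.
links = [
    ("Person A", "Person B", "Family"),
    ("Person A", "Bank", "Loan"),
    ("Person B", "Bank", "Loan"),
]

direct = shortest_path(links, "Person A", "Person B")
indirect = shortest_path(links, "Person A", "Person B",
                         excluded_labels={"Family"})
```

With the family link excluded, the search falls back to the route through the shared bank, mirroring the first example above.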

To conclude…

At the moment, the TrumpWorld dataset is probably more interesting for the visualization challenges it poses than the insight it reveals. But you’re welcome to try KeyLines to explore it for yourself!

Request a trial to get started with the KeyLines visualization toolkit.