Data breach visualization: exploring cyber security graphs

Let’s see how our graph visualization toolkits help cyber security analysts explore thousands of attack records in a single data breach visualization.

Data sharing presents a huge challenge to the cyber security industry. When organizations are compromised, the common response is to switch to self-preservation mode. The full details of breaches are rarely disclosed, which limits collective intelligence and arguably makes life easier for attackers.

Happily, there are several projects focused on cyber security threat intelligence and working to fix this.

VERIS (the Vocabulary for Event Recording and Incident Sharing) aims to provide a common taxonomy for organizations to share information about their breaches. By helping organizations exchange war stories, the project hopes to facilitate cooperation and improve risk management.

Alongside this project is the VERIS Community Database: a project to collate and disseminate information about all publicly disclosed data breaches. Excitingly for data visualization enthusiasts like us, its cyber security graph data is openly available on GitHub.

Let’s take a look.

The data model

The VERIS team has designed a schema that helps organizations record breaches in a ‘structured and repeatable’ way. It uses the A4 model to describe and classify incidents by:

  • Actor – i.e. who performed the attack?
  • Action – i.e. what was the attack vector?
  • Asset – i.e. who or what was the attack victim?
  • Attribute – i.e. what was the outcome / impact of the attack?

For each of the 5500+ attacks listed there are more than 150 data points, so we need to design a visual model that will enable us to explore the cyber security data set and answer some key questions.

As actors (attackers) and actions (vectors) are grouped into categories, we will model our graph in the following way:

The visual data model of our data breach visualization
Our visual data model. Nodes represent attackers and victims, with attack vectors shown as links.
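
To make the model concrete, here is a minimal sketch of how incident records could be turned into nodes and links before loading them into a chart. It is plain Python and does not use the KeyLines API. The field names (`actors`, `vectors`, `victim`) and the sample records are our own simplification for illustration; the real VERIS Community Database records are more deeply nested and would need flattening first.

```python
def build_graph(incidents):
    """Turn simplified incident records into (nodes, links).

    Nodes are attackers and victims; each link is one attack vector
    connecting an attacker category to a victim.
    """
    nodes = {}
    links = []
    for incident in incidents:
        victim_id = f"victim:{incident['victim']}"
        nodes.setdefault(
            victim_id,
            {"id": victim_id, "kind": "victim", "label": incident["victim"]},
        )
        for actor in incident["actors"]:
            actor_id = f"actor:{actor}"
            nodes.setdefault(
                actor_id, {"id": actor_id, "kind": "actor", "label": actor}
            )
            # One link per (actor, vector) pair within the incident
            for vector in incident["vectors"]:
                links.append(
                    {
                        "id": f"{incident['id']}:{actor}:{vector}",
                        "source": actor_id,
                        "target": victim_id,
                        "vector": vector,
                    }
                )
    return nodes, links


# Hypothetical sample records, not real database entries
incidents = [
    {"id": "inc-1", "actors": ["Activist group"],
     "vectors": ["Advanced technology"], "victim": "Org A"},
    {"id": "inc-2", "actors": ["End-user"],
     "vectors": ["Physical access", "Email"], "victim": "Org A"},
    {"id": "inc-3", "actors": ["Activist group"],
     "vectors": ["Advanced technology"], "victim": "Org B"},
]

nodes, links = build_graph(incidents)
print(len(nodes), "nodes,", len(links), "links")
```

Because attackers are keyed by category rather than by incident, repeated attacks from the same group converge on a single node, which is what produces the clusters in the overview.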

Let’s take the VERIS Community Database (handily available from GitHub in JSON format) and load it into a KeyLines chart.

Step 1: Data overview

An overview of the entire dataset, ready to be filtered and explored.

This is a fairly large dataset, but KeyLines’ powerful Web Graphics Library (WebGL) engine delivers impressive rendering speed and quality. We have color-coded the links (attack vectors) by the categories supplied in the dataset, which helps us pick out some early patterns:

  • The large red group shows that Activist Groups favor ‘Advanced Technology’ including remote access, command shell and VPNs.
  • The yellow/purple/orange clusters indicate breaches originating from end-users or employees. These are more likely to be caused by carelessness, physical access or basic technology, like desktop sharing or document theft.
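
The color coding itself is a simple lookup from category to color. The sketch below shows one way to do it in plain Python, assuming each link carries a `vector` field holding its category. The palette values and field names are illustrative assumptions, not the colors or properties used in the demo.

```python
# Illustrative palette: red, yellow, purple, orange, blue
PALETTE = ["#d62728", "#ffdd57", "#9467bd", "#ff7f0e", "#1f77b4"]


def color_links(links, palette=PALETTE, fallback="#999999"):
    """Return copies of the links with a 'color' chosen per vector category.

    Categories receive palette colors in order of first appearance;
    once the palette runs out, remaining categories get the fallback.
    """
    colors = {}
    for link in links:
        category = link["vector"]
        if category not in colors:
            index = len(colors)
            colors[category] = palette[index] if index < len(palette) else fallback
    return [{**link, "color": colors[link["vector"]]} for link in links]


# Hypothetical sample links
links = [
    {"id": "l1", "vector": "Advanced technology"},
    {"id": "l2", "vector": "Email"},
    {"id": "l3", "vector": "Advanced technology"},
]

colored = color_links(links)
for link in colored:
    print(link["id"], link["vector"], link["color"])
```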

Step 2: Temporal patterns

One of the data points collated by VERIS is a date stamp of when the breach was reported. Let’s add this to our chart with the KeyLines time bar:

The KeyLines time bar showing peaks in network activity

The overall picture here is quite lumpy, with peaks in February ’13 and ’14. But using the filters, let’s take a sub-network view of how different vectors change through the months:

A filtered view of the cyber security data showing how more breaches seem to come from advanced technologies not emails
Comparing advanced technologies (red) to email (pale purple) – email seemingly accounts for fewer of the recorded breaches.
A data breach visualization showing how basic attack methods seem to be less popular towards the end of our cyber security dataset
Basic methods seem to be less popular towards the end of our cyber security dataset – perhaps a consequence of companies being more aware of simple document theft?
A data breach visualization showing how intruders are more of a threat than suppliers and partners
Comparing Physical Access (purple) to Third party facilities (blue) seems to suggest intruders are more of a threat than suppliers and partners.
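
The comparisons above boil down to counting breaches per month for each vector. Here is a minimal sketch of that aggregation in plain Python, independent of the time bar itself. The flat `year`, `month` and `vector` fields and the sample values are assumptions for illustration only.

```python
from collections import Counter, defaultdict


def monthly_counts(records):
    """Count records per vector per calendar month.

    Returns {vector: Counter({"YYYY-MM": count})}.
    """
    counts = defaultdict(Counter)
    for record in records:
        month = f"{record['year']:04d}-{record['month']:02d}"
        counts[record["vector"]][month] += 1
    return counts


# Hypothetical sample records, not real figures from the dataset
records = [
    {"year": 2013, "month": 2, "vector": "Email"},
    {"year": 2013, "month": 2, "vector": "Advanced technology"},
    {"year": 2013, "month": 2, "vector": "Advanced technology"},
    {"year": 2014, "month": 2, "vector": "Advanced technology"},
]

counts = monthly_counts(records)
for vector in sorted(counts):
    for month in sorted(counts[vector]):
        print(vector, month, counts[vector][month])
```

Zero-padded `YYYY-MM` keys sort chronologically as plain strings, which keeps the series in order without any date parsing.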


Step 3: Attack vectors by attacker group

The advantage of a graph-based visualization is we can see our data in its full connected environment. Using simple filters, we can find some other trends. For example:

Our data breach visualization shows that email is seemingly most popular with organized criminal groups.
A data breach visualization showing how cyber attacks by nation states are a mix of advanced techniques and the unknown.
Attacks by nation states are a mix of advanced techniques and the unknown.
A data breach visualization showing how cyber attackers focus on high profile public sector targets.
They also focus, unsurprisingly, on high profile public sector targets.
Visualizing the tactics of network administrators and software developers
The tactics of network administrators and software developers are equally unknown.
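
Behind each of these filtered views is the same question: for a given attacker group, which vectors appear most often? A minimal sketch of that tally in plain Python follows. The `actor` and `vector` fields and the sample links are hypothetical and only illustrate the technique.

```python
from collections import Counter


def vectors_for_actor(links, actor):
    """Tally the attack vectors on links belonging to one attacker group."""
    return Counter(link["vector"] for link in links if link["actor"] == actor)


# Hypothetical sample links, not real figures from the dataset
links = [
    {"actor": "Organized crime", "vector": "Email"},
    {"actor": "Organized crime", "vector": "Email"},
    {"actor": "Organized crime", "vector": "Advanced technology"},
    {"actor": "Nation-state", "vector": "Unknown"},
]

tally = vectors_for_actor(links, "Organized crime")
print(tally.most_common())
```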

Step 4: Find the unlucky victims in our data breach visualization

We’ve looked at the attackers and the attack vectors. The third entity type in this data is the victim. By sizing victim nodes by degree (number of connections to other nodes in the network) we can get an idea of the most frequently breached organizations:

A data breach visualization showing that the most attacked organizations are multinationals
In this dataset, the most attacked organizations are mostly multinationals, although some smaller organizations have also been hit repeatedly.
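
Sizing by degree can be sketched in a few lines. The plain-Python example below counts the links touching each node, then scales victim nodes linearly between a minimum and maximum size. It counts every link, so two attacks from the same group count twice. The `source` and `target` fields, the `victim:` id prefix and the size range are assumptions for illustration, not KeyLines settings.

```python
from collections import Counter


def node_degrees(links):
    """Count how many links touch each node id."""
    degrees = Counter()
    for link in links:
        degrees[link["source"]] += 1
        degrees[link["target"]] += 1
    return degrees


def size_by_degree(degrees, min_size=1.0, max_size=4.0):
    """Map each degree linearly onto the range [min_size, max_size]."""
    low, high = min(degrees.values()), max(degrees.values())
    span = high - low
    if span == 0:
        # All nodes have the same degree, so give them all the minimum size
        return {node: min_size for node in degrees}
    return {
        node: min_size + (degree - low) / span * (max_size - min_size)
        for node, degree in degrees.items()
    }


# Hypothetical sample links
links = [
    {"source": "actor:A", "target": "victim:X"},
    {"source": "actor:B", "target": "victim:X"},
    {"source": "actor:A", "target": "victim:Y"},
]

degrees = node_degrees(links)
victim_degrees = {n: d for n, d in degrees.items() if n.startswith("victim:")}
sizes = size_by_degree(victim_degrees)
print(sizes)
```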

Create your own data breach visualization

With our toolkit technology, you can visualize and explore thousands of attack records in a single chart. By configuring the data model to show different aspects of the VERIS schema we can find new trends – a process made faster and more interactive with the help of automatic layouts, clever filters, social network analysis and the time bar.

Want to explore the data for yourself? You can find this demo and over 80 others on our KeyLines SDK. Ready to start an evaluation?

We first published this popular post some time ago. This version features brand new example visualizations to keep it useful and relevant.

Registered in England and Wales with Company Number 07625370 | VAT Number 113 1740 61
6-8 Hills Road, Cambridge, CB2 1JP. All material © Cambridge Intelligence 2022.