Visualizing crime patterns data as a graph

7th June, 2017

In a previous post on law enforcement and data visualization, we saw how successful law enforcement agencies understand the wealth of data they have at their disposal.

Graphs can reveal trends and give insight into relationships between people, times and locations. Graph visualization is a vital tool for exploring and understanding graph data at scale.

This blog post describes another law enforcement graph visualization use case, one that I recently presented with our partners at Neo4j.

Exploring the crime data set

One of the many sources of data available to the police is their RMS, or Records Management System. It’s often the core repository, containing details of crimes, individuals, officers, vehicles and more.

In recent years, many agencies have made some of their RMS data available to the public – a great resource for anyone interested in law enforcement activity. Obviously, identifying details are stripped out, but there’s still enough data to build a valuable graph visualization.

One example of this is an open-data initiative detailing real-life crime incidents in the city of Boston.

Creating the data model and loading into Neo4j

Once I’d loaded this dataset into the Neo4j graph database, I could use the Cypher query language to query the data quickly and easily.

I used the Neo4j Awesome Procedures on Cypher (APOC) library to call the API, pulling in a JSON object detailing 1000 records. APOC provides a flexible way to ingest data into Neo4j with a few lines of Cypher. I used the Incident Number field to create the core nodes, with additional attributes such as offense group, district, date, longitude and latitude.

Finally, I added another set of nodes – crime descriptions – plus a link between the two node types. Here’s our data model:

Our graph data model

And here’s the original Cypher:

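	// $url: the Boston crime data API endpoint, with query string ?$limit=1000&$offset=0
	CALL apoc.load.json($url)
	YIELD value AS crime
	MERGE (c:Crime {incidentnum: crime.incident_num})
	ON CREATE SET c.offense = crime.offense_code_group,
	              c.district = crime.district,
	              c.longitude = crime.long
	MERGE (d:Description {name: crime.offense_code_group})
	// Link each crime to its description (relationship type is illustrative, hypothetical)
	MERGE (c)-[:HAS_DESCRIPTION]->(d)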
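If it helps to see what MERGE and ON CREATE SET do with repeated incident numbers, here is a minimal in-memory Python sketch of the same logic. It isn't how APOC or Neo4j work internally; it assumes each JSON record is a dict with the `incident_num`, `offense_code_group`, `district` and `long` keys referenced in the Cypher, and the class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CrimeGraph:
    crimes: dict = field(default_factory=dict)      # incident number -> properties
    descriptions: set = field(default_factory=set)  # offense group names
    links: set = field(default_factory=set)         # (incident number, offense group)


def load_records(graph: CrimeGraph, records: list) -> CrimeGraph:
    """Mimic the MERGE / ON CREATE SET logic of the Cypher, in memory."""
    for crime in records:
        key = crime["incident_num"]
        # MERGE on the incident number: only create the node if it is new
        if key not in graph.crimes:
            # ON CREATE SET: properties are written once, when the node is created
            graph.crimes[key] = {
                "offense": crime["offense_code_group"],
                "district": crime["district"],
                "longitude": crime["long"],
            }
        desc = crime["offense_code_group"]
        graph.descriptions.add(desc)   # MERGE the Description node
        graph.links.add((key, desc))   # MERGE the crime-description link
    return graph


# Toy records (made up for illustration): one incident with two offense groups
records = [
    {"incident_num": "I1", "offense_code_group": "Missing Person Reported",
     "district": "B2", "long": -71.08},
    {"incident_num": "I1", "offense_code_group": "Missing Person Located",
     "district": "B2", "long": -71.08},
    {"incident_num": "I2", "offense_code_group": "Larceny",
     "district": "C6", "long": -71.05},
]
graph = load_records(CrimeGraph(), records)
print(len(graph.crimes), len(graph.descriptions), len(graph.links))  # 2 3 3
```

Because ON CREATE SET only runs when a node is first created, the `offense` property keeps the first offense group seen for an incident; the Description nodes and their links are what record every offense group attached to it.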

Loading the data into KeyLines

Next, I wanted to translate 1000 lines of data into something more compelling – a KeyLines visualization:

The initial data load into KeyLines

These are Crime Categories (the central nodes) connected to actual crimes (the outer nodes). We can spot some basic patterns at a glance. For example, in our dataset there’s only one located missing person:

There’s only one crime reference number connected to both “Missing Person Reported” and “Missing Person Located”

This doesn’t suggest that the police have problems locating missing persons; it’s probably due to our limited dataset. With full access to the data, we could double-click a node to expand it and see the full picture.
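Spotting an incident like this doesn’t have to rely on the eye. As a rough sketch, assuming the graph is held in Python as (incident number, category) link pairs, a hypothetical helper can list the incidents connected to every category of interest:

```python
def incidents_linked_to_all(links, categories):
    """Return incident numbers connected to every category in `categories`."""
    wanted = set(categories)
    categories_by_incident = {}
    for incident, category in links:
        categories_by_incident.setdefault(incident, set()).add(category)
    # Keep incidents whose category set contains all the wanted categories
    return sorted(
        incident
        for incident, cats in categories_by_incident.items()
        if wanted <= cats
    )


# Toy links, made up for illustration
links = {
    ("I1", "Missing Person Reported"),
    ("I1", "Missing Person Located"),
    ("I2", "Missing Person Reported"),
}
print(incidents_linked_to_all(
    links, ["Missing Person Reported", "Missing Person Located"]))  # ['I1']
```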

Looking at Crime Category connections

We can also see that some categories are more densely interconnected than others. Filtering out ‘leaf nodes’ (nodes with only a single connection) reveals a network of interrelated crime categories:

We can clearly see groups of crime categories that tend to appear together in the same incidents
There’s a close relationship between Vandalism, Disorderly Conduct and Simple Assault
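The leaf filter, and the category network it leaves behind, can be sketched in a few lines of Python. This is a hypothetical in-memory version using (incident number, category) link pairs. It only removes crime nodes with a single connection, on the assumption that category nodes are well connected, and then counts how many incidents each pair of categories shares.

```python
from collections import Counter
from itertools import combinations


def drop_leaf_crimes(links):
    """Remove crime nodes (and their links) that connect to only one category."""
    degree = Counter(incident for incident, _ in links)
    return {(i, c) for i, c in links if degree[i] > 1}


def category_pairs(links):
    """Count how many incidents each pair of categories shares."""
    by_incident = {}
    for incident, category in links:
        by_incident.setdefault(incident, set()).add(category)
    pairs = Counter()
    for cats in by_incident.values():
        # Sort so each unordered pair is always counted under the same key
        for a, b in combinations(sorted(cats), 2):
            pairs[(a, b)] += 1
    return pairs


# Toy links, made up for illustration
links = {
    ("I1", "Vandalism"), ("I1", "Disorderly Conduct"),
    ("I2", "Vandalism"), ("I2", "Simple Assault"),
    ("I3", "Larceny"),  # a leaf: only one connection
}
core = drop_leaf_crimes(links)
print(category_pairs(core))
```

The pair counts could act as link weights between category nodes, which is one way to make clusters like Vandalism, Disorderly Conduct and Simple Assault explicit.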

Inspecting crime by district

There was a District property in our original dataset, so we can incorporate that into our visualization.

The KeyLines combos feature makes it easy to group nodes by any property. Let’s combine by district, to see which geographic areas report the most crimes:

Combining nodes by district
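Conceptually, combining by district amounts to grouping nodes on a property value. The Python sketch below shows that grouping, assuming crime nodes are stored as a dict of incident number to properties with a `district` key; it is not the KeyLines combos API, and the function names are hypothetical.

```python
from collections import defaultdict


def group_by_property(crimes, prop):
    """Group incident numbers by the value of `prop` (one group per combo)."""
    groups = defaultdict(list)
    for incident, props in crimes.items():
        groups[props.get(prop, "unknown")].append(incident)
    return dict(groups)


def largest_groups(groups):
    """(value, size) pairs, biggest group first."""
    return sorted(((k, len(v)) for k, v in groups.items()),
                  key=lambda kv: kv[1], reverse=True)


# Toy crime nodes, made up for illustration
crimes = {
    "I1": {"district": "C6"},
    "I2": {"district": "C6"},
    "I3": {"district": "A7"},
}
districts = group_by_property(crimes, "district")
print(largest_groups(districts))  # [('C6', 2), ('A7', 1)]
```

Records with no district land in an "unknown" group rather than being dropped, so missing values stay visible.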

We can also use the latitude and longitude information on the nodes to see the crimes in a geospatial view:

Crime volumes in different districts

I’ve also added donuts to the nodes to show the volumes of different reported crimes. I’ve included only two crime categories – Larceny in blue, Motor Vehicle Crime in purple – but even so we can see some stark differences in our dataset. South Boston (C6) has a much higher proportion of Larceny than East Boston (A7).
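The donut segments boil down to per-district proportions. As a sketch, assuming each crime node has `district` and `offense` properties (the counts below are made up, not the Boston data), this computes each chosen category’s share of a district’s crimes:

```python
from collections import Counter, defaultdict


def category_shares(crimes, categories):
    """For each district, the fraction of its crimes in each chosen category."""
    counts = defaultdict(Counter)
    for props in crimes.values():
        counts[props["district"]][props["offense"]] += 1
    shares = {}
    for district, offense_counts in counts.items():
        total = sum(offense_counts.values())
        # Categories not chosen still count towards the total,
        # so the shares need not add up to 1
        shares[district] = {cat: offense_counts[cat] / total for cat in categories}
    return shares


# Toy crime nodes, made up for illustration
crimes = {
    "I1": {"district": "C6", "offense": "Larceny"},
    "I2": {"district": "C6", "offense": "Larceny"},
    "I3": {"district": "C6", "offense": "Motor Vehicle Crime"},
    "I4": {"district": "C6", "offense": "Vandalism"},
    "I5": {"district": "A7", "offense": "Larceny"},
    "I6": {"district": "A7", "offense": "Motor Vehicle Crime"},
    "I7": {"district": "A7", "offense": "Motor Vehicle Crime"},
    "I8": {"district": "A7", "offense": "Vandalism"},
}
print(category_shares(crimes, ["Larceny", "Motor Vehicle Crime"]))
```

Working with proportions rather than raw counts is what keeps a comparison like South Boston against East Boston fair when the districts report different numbers of crimes.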

Adding people, phones, vehicles and times

We don’t have access to real-world data about the people, phones and vehicles involved in crimes. To give you some idea of what a police officer might see, I used GraphAware’s excellent GraphGen tool to supplement the data with additional (fake) attributes:

Our data with some additional (fake) attributes

The fake data doesn’t reveal any real insight, but it does demonstrate the potential value of this approach. An officer can visually explore the phones, addresses and individuals connected to an incident to see how they relate to previous incidents.

A close-up of our data
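Moving from one incident to the phones, addresses and people attached to it, and back out to earlier incidents, is a two-hop traversal. Here’s a minimal Python sketch, assuming incident-to-entity links stored as pairs and a date per incident; the entity ids such as "phone:1" and the function name are hypothetical.

```python
from datetime import date


def earlier_related_incidents(links, dates, incident):
    """Earlier incidents sharing an entity (person, phone, address) with `incident`.

    Returns {other incident: set of shared entities}.
    """
    # Hop 1: entities attached to the starting incident
    entities = {e for i, e in links if i == incident}
    related = {}
    # Hop 2: other, earlier incidents attached to any of those entities
    for i, e in links:
        if e in entities and i != incident and dates[i] < dates[incident]:
            related.setdefault(i, set()).add(e)
    return related


# Toy data, made up for illustration
links = [
    ("I1", "phone:1"), ("I1", "person:A"),
    ("I2", "phone:1"),
    ("I3", "person:A"),
    ("I4", "person:B"),
]
dates = {
    "I1": date(2017, 5, 1),
    "I2": date(2017, 3, 1),
    "I3": date(2017, 5, 20),  # later than I1, so excluded
    "I4": date(2017, 1, 1),   # earlier, but shares nothing with I1
}
print(earlier_related_incidents(links, dates, "I1"))  # {'I2': {'phone:1'}}
```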

We can enhance this view with the KeyLines Time Bar, which lets us drill down to specific time periods:

Filtering the graph by time, using the time bar
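One way to think about a time filter is: keep the timestamped items inside the selected range, plus whatever links them together. The sketch below is one plausible rule written in Python, not the KeyLines Time Bar’s actual behaviour. It assumes untimed nodes such as people or phones stay visible only if they link directly to a visible, timestamped node.

```python
from datetime import datetime


def filter_by_time(times, links, start, end):
    """Return (visible nodes, visible links) for the range [start, end).

    `times` maps node ids to a datetime, or None for untimed nodes;
    ids missing from `times` are also treated as untimed.
    """
    timed = {n for n, t in times.items() if t is not None and start <= t < end}
    # Untimed neighbours of a visible timed node stay on screen
    neighbours = set()
    for a, b in links:
        if a in timed and times.get(b) is None:
            neighbours.add(b)
        if b in timed and times.get(a) is None:
            neighbours.add(a)
    visible = timed | neighbours
    kept = {(a, b) for a, b in links if a in visible and b in visible}
    return visible, kept


# Toy data, made up for illustration
times = {
    "I1": datetime(2017, 1, 10),
    "I2": datetime(2017, 3, 5),
    "person:A": None,
}
links = {("I1", "person:A"), ("I2", "person:A"), ("I2", "phone:1")}
visible, kept = filter_by_time(
    times, links, datetime(2017, 1, 1), datetime(2017, 2, 1))
print(sorted(visible), kept)  # ['I1', 'person:A'] {('I1', 'person:A')}
```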

Storing data in a Neo4j graph database means we can run complex graph queries that would otherwise be incredibly time-consuming.

“Return all individuals in district D4 who have previously been associated with a vandalism crime in 2017 and who drive a red Ford” becomes a fast and simple Cypher query. Together, KeyLines and Neo4j make an effective, efficient tool to help officers make the best use of their data.
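In Neo4j that question would be a single Cypher MATCH pattern. To make its three constraints explicit, here is the same logic as a Python sketch over an in-memory graph. The schema is an assumption: hypothetical people, crime and vehicle dicts plus "involved in" and "drives" link pairs, with "in district D4" read as the person’s home district.

```python
def red_ford_vandals(people, crimes, vehicles, involved, drives):
    """People in D4 linked to a 2017 vandalism crime who drive a red Ford."""
    # Constraint 1: associated with a vandalism crime in 2017
    vandals = {
        person for person, incident in involved
        if crimes[incident]["offense"] == "Vandalism"
        and crimes[incident]["year"] == 2017
    }
    # Constraint 2: drives a red Ford
    red_ford_drivers = {
        person for person, vehicle in drives
        if vehicles[vehicle]["colour"] == "red"
        and vehicles[vehicle]["make"] == "Ford"
    }
    # Constraint 3: lives in district D4 (assumed meaning of "in district D4")
    return sorted(
        p for p in vandals & red_ford_drivers if people[p]["district"] == "D4"
    )


# Toy data, made up for illustration
people = {"A": {"district": "D4"}, "B": {"district": "D4"}, "C": {"district": "C6"}}
crimes = {
    "I1": {"offense": "Vandalism", "year": 2017},
    "I2": {"offense": "Larceny", "year": 2017},
}
vehicles = {
    "V1": {"colour": "red", "make": "Ford"},
    "V2": {"colour": "blue", "make": "Ford"},
}
involved = {("A", "I1"), ("B", "I2"), ("C", "I1")}
drives = {("A", "V1"), ("B", "V1"), ("C", "V2")}
print(red_ford_vandals(people, crimes, vehicles, involved, drives))  # ['A']
```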

Want to try it for yourself?

This post is just an illustration of how graph visualization techniques can help law enforcement understand complex connected data. We’d love to see how this approach works with real-world data. If you’d like to try KeyLines for yourself, just start a trial or contact us.
