My first experience of visualizing data with KeyLines

2nd April, 2019

When I joined the Cambridge Intelligence team back in January, one of my first tasks was to learn about our graph visualization technology.

I’m not a typical first-time KeyLines user. With limited coding experience (competent SQL, some rusty C#), my only previous exposure to JavaScript was completing an online course some years back.

In this blog post, I’ll describe my first experience of creating a visualization. Using KeyLines to analyze greenhouse gas emissions in the US, I’ll share insight into two things I’m passionate about: cutting-edge technology and the environment.

Where do US greenhouse gas emissions come from?

Getting started with KeyLines

From the minute I started with KeyLines, the developer documentation was my friend.

The newly-updated quick start guide on the KeyLines SDK site took me quickly through the basics. I had to refer to the API reference and Google a few things to get the JavaScript syntax right, but in half an hour I managed to create a chart, load some data, run a few layouts and include some basic events.

The KeyLines SDK site features all the documentation you need

Keen to learn a little more, I also looked through the ‘Basics’ documentation pages aimed at KeyLines newcomers. Now I was ready to find the data I wanted to visualize.

United States Environmental Protection Agency data

Interested in the factors contributing to climate change, I started hunting for connected data on greenhouse gas emissions. (If you don’t already know why too much of these gases is a bad thing, read more.)

The United States Environmental Protection Agency (EPA) runs a Greenhouse Gas Reporting Program. The latest report shows that in 2017, over 7,500 facilities across nine industries released 2.91 billion metric tons of carbon dioxide equivalent directly into the atmosphere. That’s roughly half of all US greenhouse gas emissions for the year.

According to the World Health Organization (WHO), air pollution poses the greatest environmental risk to health in 2019

There are lots of connections worth exploring there – which individual facilities emit the highest proportion of what gas? Which US state is responsible for the greatest emissions?

This data is available to download from the EPA. Next I had to work out how to present it.

Choosing the right data model

There’s so much detail in the EPA data. I had to work out what was the most important information to visualize, what to ignore, and how best to represent it on my chart.

Having never modelled data before, I found useful tips on keeping this practical and simple in the Graph data modelling 101 blog post.

I decided on:

  • nodes for each of the three main greenhouse gas emission types: carbon dioxide (CO2), methane (CH4) and nitrous oxide (N2O)
  • other nodes for each of the facilities, with properties including industry sector, US state and total emissions reported
  • links to show the emissions reported by each facility
The data model
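
As a rough illustration of that model (not the code behind the chart), the three kinds of item could be written as plain JavaScript objects. The values are made up for the example, and the property names (`id`, `type`, `t`, `d`, `id1`, `id2`) are assumptions about the KeyLines item format that should be checked against the API reference.

```javascript
// Illustrative values only: these are NOT real EPA figures.
// Property names are assumed to follow the KeyLines item format.

// One node per greenhouse gas.
const gasNodes = ['CO2', 'CH4', 'N2O'].map((gas) => ({
  id: `gas-${gas}`,
  type: 'node',
  t: gas,              // label
  d: { kind: 'gas' },  // custom data
}));

// One node per facility, carrying the properties chosen in the model.
const facilityNode = {
  id: 'facility-1',
  type: 'node',
  t: 'Example Plant',
  d: {
    kind: 'facility',
    sector: 'Power Plants',
    state: 'TX',
    totalEmissions: 1200000, // metric tons CO2e (hypothetical)
  },
};

// One link per facility and gas, holding the reported emissions.
const emissionLink = {
  id: 'facility-1-gas-CO2',
  type: 'link',
  id1: facilityNode.id,
  id2: 'gas-CO2',
  d: { emissions: 1150000 }, // hypothetical
};

const items = [...gasNodes, facilityNode, emissionLink];
```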

Straightforward data loading

KeyLines works with data from any source, so once I’d converted my .xlsx spreadsheet into the JSON format KeyLines needs, I could load nodes and links into a chart.
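
The conversion step might look something like the sketch below. It assumes the spreadsheet has already been read into an array of row objects, and the column names (`facilityId`, `name`, `sector`, `state`, `co2`, `ch4`, `n2o`) are hypothetical rather than the EPA’s own headers. The output shape is an assumption about the format KeyLines loads.

```javascript
// Hypothetical row shape: { facilityId, name, sector, state, co2, ch4, n2o }
// with emissions in metric tons of CO2 equivalent.
const GASES = ['co2', 'ch4', 'n2o'];

function rowsToChartData(rows) {
  // Start with one node per gas.
  const items = GASES.map((gas) => ({ id: gas, type: 'node', t: gas.toUpperCase() }));

  for (const row of rows) {
    const facilityId = `f${row.facilityId}`;
    // Treat blank or non-numeric cells as zero.
    const total = GASES.reduce((sum, gas) => sum + (Number(row[gas]) || 0), 0);

    items.push({
      id: facilityId,
      type: 'node',
      t: row.name,
      d: { sector: row.sector, state: row.state, total },
    });

    // Only link a facility to the gases it actually reported.
    for (const gas of GASES) {
      const amount = Number(row[gas]) || 0;
      if (amount > 0) {
        items.push({ id: `${facilityId}-${gas}`, type: 'link', id1: facilityId, id2: gas, d: { amount } });
      }
    }
  }
  // Assumed wrapper for the data KeyLines loads into a chart.
  return { type: 'LinkChart', items };
}
```

The returned object is the kind of thing I would expect to hand to the chart’s load function.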

Visualizing the entire dataset

There’s so much data! The only instantly recognizable nodes are those representing gas emission types. I had some work to do.

I knew how I wanted my chart to look, but I wasn’t sure how to make it happen. Here’s where the demos on the KeyLines SDK saved me time and effort. I could search for the effects I wanted to achieve, play around with examples, and (the best bit for a JavaScript novice) copy the code to reuse in my own visualization.

Effective grouping

My first challenge was figuring out how to manage data clutter. Even after I’d included a filter to focus on the highest emissions, there were still hundreds of facilities to deal with.
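
A filter like that boils down to a threshold test on each facility. The sketch below applies the test to a plain array of items instead of a live chart, and assumes each facility node stores its total in a hypothetical `d.total` property. It also drops any link left without both endpoints.

```javascript
// Keep gas nodes (no total) and facilities at or above the threshold,
// then keep only the links whose two endpoints both survived.
function filterByEmissions(items, minTotal) {
  const keepNode = (node) => {
    const total = node.d ? node.d.total : undefined;
    return total === undefined || total >= minTotal;
  };
  const nodes = items.filter((item) => item.type === 'node' && keepNode(item));
  const keptIds = new Set(nodes.map((node) => node.id));
  const links = items.filter(
    (item) => item.type === 'link' && keptIds.has(item.id1) && keptIds.has(item.id2)
  );
  return [...nodes, ...links];
}
```

In a real chart, the same test would more likely be passed to the library’s own filter function than applied to the raw data.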

Using combos to group facilities from the same state made things clearer. Glyphs show how many facilities each state contains.
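
The grouping itself is simple to sketch: collect facility ids by state and count them. The shape of the returned definitions (`ids`, `label`, and a `glyph` with a text label `t`) is an assumption about what a combo definition looks like, so treat it as illustrative.

```javascript
// Group facility node ids by the (hypothetical) d.state property and
// build one combo-style definition per state, with a count for the glyph.
function groupByState(facilityNodes) {
  const byState = new Map();
  for (const node of facilityNodes) {
    const state = node.d.state;
    if (!byState.has(state)) byState.set(state, []);
    byState.get(state).push(node.id);
  }
  return [...byState].map(([state, ids]) => ({
    ids,
    label: state,
    glyph: { t: String(ids.length) }, // assumed glyph shape: text label only
  }));
}
```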

Structural layout automatically groups similar nodes (like these US states) closer together

I can open a combo to show the facilities when I want to drill into the detail. Facility nodes are sized according to their emission levels: the greater the emissions, the larger the node. A concentric arrangement inside the combo places the larger nodes at the center, so they’re easier to spot.
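
One way to turn emission totals into node sizes is sketched below. The square-root scale is my own choice for the example (it stops the very largest facilities from dwarfing everything else), and the `e` enlargement property and `d.total` field are assumptions, not a record of how the chart was built.

```javascript
// Map each facility's total emissions onto a size between minSize and maxSize
// using a square-root scale, and return the nodes largest first.
function sizeByEmissions(nodes, minSize = 1, maxSize = 4) {
  if (nodes.length === 0) return [];
  const roots = nodes.map((node) => Math.sqrt(node.d.total));
  const lo = Math.min(...roots);
  const hi = Math.max(...roots);
  return nodes
    .map((node, i) => {
      // If every facility has the same total, give them all the minimum size.
      const ratio = hi === lo ? 0 : (roots[i] - lo) / (hi - lo);
      return { ...node, e: minSize + ratio * (maxSize - minSize) }; // e: assumed enlargement property
    })
    .sort((a, b) => b.e - a.e);
}
```

Sorting largest first mirrors the order in which a concentric arrangement would place nodes, working outward from the center.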

Open a combo to reveal the facilities emitting greenhouse gases in that US state

I also added tooltips on hover so I can reveal node property information about each facility when I need it, without cluttering the chart with too much information up front.
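
The text for a tooltip like this can be built from the node’s properties. The sketch below assumes hypothetical `sector`, `state` and `total` fields on the node’s `d` object, and only produces the text; positioning and showing the tooltip on hover is left to the chart’s event handling.

```javascript
// Build the tooltip text for a facility node from its stored properties.
function tooltipText(node) {
  const { sector, state, total } = node.d;
  return [
    node.t,
    `Sector: ${sector}`,
    `State: ${state}`,
    `Total emissions: ${total.toLocaleString('en-US')} metric tons CO2e`,
  ].join('\n');
}
```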

Node property information in tooltips provides detail on demand

Customized visual styling

Looking through the KeyLines SDK demos gave me lots of ideas for styling. The hardest bit is working out which customization options will bring the data to life. It’s easy to make the mistake of cramming in lots of cool styles that don’t help with the end goal.

I wanted my visualization to show at a glance which states are emitting the highest proportions of each greenhouse gas. Color-coded donuts do a good job of making the answer clearer.

The color of each of the highest-emitting states matches the corresponding donut segment on the greenhouse gas nodes. The majority ‘blue’ segment represents emissions from all other states combined. In this way, we can see the six states with the highest emissions, and which gases they’re responsible for.
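
The arithmetic behind segments like these is a “top N plus other” split. The sketch below works on a hypothetical object of per-state totals for one gas. The `donut` shape with parallel value (`v`) and color (`c`) arrays is an assumption about the KeyLines node format, so check it against the API reference.

```javascript
// Rank states by emissions of one gas, keep the top N as their own
// segments, and fold every other state into a single 'Other' segment.
function donutSegments(totalsByState, palette, topN = 6, otherColor = 'blue') {
  const ranked = Object.entries(totalsByState).sort((a, b) => b[1] - a[1]);
  const segments = ranked.slice(0, topN).map(([state, value], i) => ({
    state,
    value,
    color: palette[i % palette.length],
  }));
  const other = ranked.slice(topN).reduce((sum, [, value]) => sum + value, 0);
  if (other > 0) segments.push({ state: 'Other', value: other, color: otherColor });
  return segments;
}

// Assumed donut format: parallel arrays of values and colors.
function toDonut(segments) {
  return { v: segments.map((s) => s.value), c: segments.map((s) => s.color) };
}
```

Reusing the same palette for the state nodes is what lets each state’s color match its segment.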

Color-coded nodes make it easy to spot which states emit the most greenhouse gas

Meaningful alerts on links

To focus on the subset of connections I’m interested in, I can click on a particular node. As the image above shows, the items it’s linked to stand out and everything else is sent to the background.
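
The logic behind that interaction is a neighbor lookup: find the clicked node, its links, and the nodes at the other end of them. The sketch below does this over a plain array of items with assumed `id1`/`id2` link endpoints. A real chart would more likely use the library’s own graph and foreground functions.

```javascript
// Return the ids that should stay in the foreground when a node is clicked:
// the node itself, its links, and the nodes at the other end of those links.
function foregroundIds(items, clickedId) {
  const ids = new Set([clickedId]);
  for (const item of items) {
    if (item.type !== 'link') continue;
    if (item.id1 === clickedId || item.id2 === clickedId) {
      ids.add(item.id);
      ids.add(item.id1);
      ids.add(item.id2);
    }
  }
  return ids;
}
```

Anything whose id is not in the returned set would be faded into the background.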

I also learned that glyphs aren’t just for nodes – they can also alert you to important information about links. The ! glyphs show which states contain a facility that emits more than 16,000,000 metric tons of CO2.
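
A rule like this can be sketched in two steps: find the states with at least one facility over the threshold, then mark the links from those states. The facility and link shapes here are hypothetical, and the `g` array with a text (`t`) and color (`c`) per glyph is an assumption about the KeyLines glyph format.

```javascript
const CO2_ALERT_THRESHOLD = 16000000; // metric tons of CO2, from the rule above

// States containing at least one facility above the threshold.
function statesWithAlert(facilities, threshold = CO2_ALERT_THRESHOLD) {
  const flagged = new Set();
  for (const facility of facilities) {
    if (facility.co2 > threshold) flagged.add(facility.state);
  }
  return flagged;
}

// Add a '!' glyph to every link that starts at a flagged state.
function addAlertGlyphs(stateLinks, flagged) {
  return stateLinks.map((link) =>
    flagged.has(link.id1) ? { ...link, g: [{ t: '!', c: 'red' }] } : link
  );
}
```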

Indiana as a state is a high emitter of CO2 – we know that already, because its color matches a color-coded donut segment on the CO2 node. But it’s also home to one of the two highest-emitting facilities.

Interestingly, the other facility responsible for the largest total direct CO2 emissions is in Alabama, which isn’t a top emitting state.

Direct emissions from a facility in Alabama

Why analyze greenhouse gas emissions data?

Greenhouse gases trap the sun’s heat in the atmosphere and increase the earth’s surface temperature. Without them, our planet wouldn’t be warm enough for habitation by millions of species, including humans.

But greenhouse gases are now at their highest levels in history, and they’re causing the earth’s temperature to rise at an unprecedented rate.

Over the last 150 years, the increase in greenhouse gases has almost entirely been caused by human activities such as the production and burning of fossil fuels. In the last 30 years, global emissions of carbon dioxide alone have increased by almost 50%.

As a result, ice caps are melting, seas are warming, sea levels are rising, and there are more extreme weather events than ever before. The United Nations leads the initiative for every country to work together to limit the global temperature rise to below 2 degrees Celsius.

Monitoring reported greenhouse gas emissions worldwide is an important part of this project.

Next steps

I’m happy with how quickly I managed to visualize the data, but I’ve only scratched the surface of what’s possible.

Next I could use the data’s latitude and longitude location properties to visualize emissions on a map. Geospatial analysis might reveal the impact of states on greenhouse emissions more clearly. Loading emissions data from previous years would give us an opportunity to use time-based analysis to see how emission levels have changed. Social network analysis methods applied to sectors and organizations across states could reveal interesting patterns at a national level.

With detailed documentation to guide me, there are many interesting ways to develop my project into something even more insightful.

Ready to start your KeyLines journey?

My first KeyLines visualization took me from complete novice to enthusiastic apprentice in a short space of time. I really enjoyed the experience.

Whether you’re an occasional coder or a qualified expert, if you’re ready to see what KeyLines can do for your data, request a free trial or get in touch.
