In this blog post, I’ll describe my first experience of creating a visualization. Using KeyLines to analyze greenhouse gas emissions in the US, I’ll share insight into two things I’m passionate about: cutting-edge technology and the environment.
From the minute I started with KeyLines, the developer documentation was my friend.
Keen to learn a little more, I also looked through the ‘Basics’ documentation pages aimed at KeyLines newcomers. Now I was ready to find the data I wanted to visualize.
Interested in the factors contributing to climate change, I started hunting for connected data on greenhouse gas emissions. (If you don’t already know why an excess of them is a bad thing, there’s a short explanation later in this post.)
The United States Environmental Protection Agency (EPA) runs a Greenhouse Gas Reporting Program. The latest report shows that in 2017, over 7,500 facilities across nine industries released 2.91 billion metric tons of carbon dioxide equivalent directly into the atmosphere. That’s roughly half of all US greenhouse gas emissions for the year.
There are lots of connections worth exploring there – which individual facilities emit the highest proportion of what gas? Which US state is responsible for the greatest emissions?
This data is available to download from the EPA. Next I had to work out how to present it.
There’s so much detail in the EPA data. I had to work out what was the most important information to visualize, what to ignore, and how best to represent it on my chart.
Having never modelled data before, I found useful tips on keeping this practical and simple in the Graph data modelling 101 blog post.
I decided on:
KeyLines works with data from any source, so once I’d copied over my .xlsx spreadsheet and KeyLines parsed it into the JSON format it needs, I could load nodes and links into a chart.
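To make the nodes-and-links idea concrete, here is a minimal sketch of turning spreadsheet rows into chart items. It is not the code from this project. The `EmissionRow` shape and the `rowsToItems` helper are hypothetical, and the item fields (`id`, `type`, `t`, `id1`, `id2`, `d`) reflect my understanding of the KeyLines item format, so check them against the KeyLines documentation.

```typescript
// Hypothetical row shape: one spreadsheet row per facility and gas.
interface EmissionRow {
  facility: string;
  state: string;
  gas: string;
  tonnes: number; // metric tons of CO2 equivalent
}

// Assumed item shapes; verify the field names against the KeyLines docs.
interface NodeItem {
  id: string;
  type: "node";
  t: string; // label
  d: { kind: "facility" | "gas"; state?: string };
}
interface LinkItem {
  id: string;
  type: "link";
  id1: string;
  id2: string;
  d: { tonnes: number };
}
type ChartItem = NodeItem | LinkItem;

// Builds one node per facility, one node per gas, and one link per row.
function rowsToItems(rows: EmissionRow[]): ChartItem[] {
  const seen: Record<string, boolean> = {};
  const nodes: NodeItem[] = [];
  const links: LinkItem[] = [];
  for (const row of rows) {
    const facilityId = `facility:${row.facility}`;
    const gasId = `gas:${row.gas}`;
    if (!seen[facilityId]) {
      seen[facilityId] = true;
      nodes.push({ id: facilityId, type: "node", t: row.facility, d: { kind: "facility", state: row.state } });
    }
    if (!seen[gasId]) {
      seen[gasId] = true;
      nodes.push({ id: gasId, type: "node", t: row.gas, d: { kind: "gas" } });
    }
    links.push({ id: `${facilityId}->${gasId}`, type: "link", id1: facilityId, id2: gasId, d: { tonnes: row.tonnes } });
  }
  const items: ChartItem[] = [];
  return items.concat(nodes, links);
}
```

The sketch assumes each facility and gas pair appears only once in the rows. Duplicate pairs would produce duplicate link ids and would need aggregating first.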
There’s so much data! The only instantly recognizable nodes are those representing gas emission types. I had some work to do.
My first challenge was figuring out how to manage data clutter. Even after I’d included a filter to focus on the highest emissions, there were still hundreds of facilities to deal with.
Using combos to group facilities from the same state made things clearer. Glyphs show how many facilities each state contains.
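The grouping step can be sketched as a plain function that collects facility ids by state and works out the count each glyph would display. The names `Facility`, `StateGroup` and `groupByState` are hypothetical, and handing the result to the KeyLines combos API is left out.

```typescript
interface Facility {
  id: string;
  state: string;
}

interface StateGroup {
  state: string;
  ids: string[]; // facility ids that would go inside the state's combo
  glyphLabel: string; // facility count to display on the combo glyph
}

// Groups facilities by state, returning the groups in alphabetical order.
function groupByState(facilities: Facility[]): StateGroup[] {
  const byState: Record<string, string[]> = {};
  for (const facility of facilities) {
    const ids = byState[facility.state] || [];
    ids.push(facility.id);
    byState[facility.state] = ids;
  }
  return Object.keys(byState)
    .sort()
    .map((state) => {
      const ids = byState[state] || [];
      return { state, ids, glyphLabel: String(ids.length) };
    });
}
```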
I can open a combo to show the facilities when I want to drill into the detail. Facility nodes are sized depending on their emission levels: the greater the emissions, the larger the node. Notice how the concentric arrangement inside the combo places the larger nodes at the center so they’re easier to spot.
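One way to size nodes by emissions is a simple scaling function. This is a sketch under my own assumptions rather than the rule used in the chart: the function name, the size range of 1 to 4 and the square-root scaling are all illustrative choices.

```typescript
// Maps an emission total onto a node size multiplier between minSize and maxSize.
// Square-root scaling compresses the range so the largest emitters
// do not dwarf every other node.
function sizeForEmissions(
  tonnes: number,
  maxTonnes: number,
  minSize: number = 1, // hypothetical smallest multiplier
  maxSize: number = 4 // hypothetical largest multiplier
): number {
  if (maxTonnes <= 0 || tonnes <= 0) {
    return minSize;
  }
  const ratio = Math.min(tonnes / maxTonnes, 1);
  return minSize + (maxSize - minSize) * Math.sqrt(ratio);
}
```

With these defaults, a facility emitting a quarter of the maximum gets a multiplier of 2.5, halfway between the smallest and largest sizes.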
I also added tooltips on hover so I can reveal node property information about each facility when I need it, without cluttering the chart with too much information up front.
Looking through the KeyLines SDK demos gave me lots of ideas for styling. The hardest bit is working out which customization options will bring the data to life. It’s easy to make the mistake of cramming in lots of cool styles that don’t help with the end goal.
I wanted my visualization to show at a glance which states are emitting the highest proportions of each greenhouse gas. Color-coded donuts do a good job of making the answer clearer.
The color of each of the highest-emitting states matches the corresponding donut segments on the greenhouse gas nodes. The majority ‘blue’ segment represents emissions from all other states combined. In this way, we can see the six states with the highest emissions, and which gases they’re responsible for.
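The arithmetic behind the donut for a single gas can be sketched like this: keep the top states as their own segments and fold everything else into one combined segment. `StateTotal`, `DonutSegment` and `donutSegments` are hypothetical names, and mapping the segments onto KeyLines donut styling is not shown.

```typescript
interface StateTotal {
  state: string;
  tonnes: number;
}

interface DonutSegment {
  label: string;
  tonnes: number;
  share: number; // fraction of the gas total, between 0 and 1
}

// Returns one segment per top-emitting state plus a combined segment for the rest.
function donutSegments(totals: StateTotal[], topN: number = 6): DonutSegment[] {
  const sorted = totals.slice().sort((a, b) => b.tonnes - a.tonnes);
  const grandTotal = sorted.reduce((sum, s) => sum + s.tonnes, 0);
  if (grandTotal <= 0) {
    return [];
  }
  const segments: DonutSegment[] = sorted.slice(0, topN).map((s) => ({
    label: s.state,
    tonnes: s.tonnes,
    share: s.tonnes / grandTotal,
  }));
  const restTonnes = sorted.slice(topN).reduce((sum, s) => sum + s.tonnes, 0);
  if (restTonnes > 0) {
    segments.push({ label: "All other states", tonnes: restTonnes, share: restTonnes / grandTotal });
  }
  return segments;
}
```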
To focus on the subset of connections I’m interested in, I can click on a particular node. As the image above shows, the items it’s linked to stand out and everything else is sent to the background.
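The selection logic behind that interaction is small: find the clicked node and everything directly linked to it. The sketch below uses hypothetical names (`LinkRef`, `foregroundIds`) and covers only that lookup. Wiring it to a click event and to the chart’s foreground and background styling is left to the KeyLines API.

```typescript
interface LinkRef {
  id1: string;
  id2: string;
}

// Returns a lookup of node ids to keep in the foreground:
// the clicked node plus every node directly linked to it.
function foregroundIds(clickedId: string, links: LinkRef[]): Record<string, boolean> {
  const keep: Record<string, boolean> = {};
  keep[clickedId] = true;
  for (const link of links) {
    if (link.id1 === clickedId) {
      keep[link.id2] = true;
    }
    if (link.id2 === clickedId) {
      keep[link.id1] = true;
    }
  }
  return keep;
}
```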
I also learned that glyphs aren’t just for nodes – they can also alert you to important information about links. The ! glyphs show which states contain a facility that emits more than 16,000,000 metric tons of CO2.
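Deciding which states earn the ! glyph is a threshold check. In this sketch the 16,000,000 figure comes from the text above, while `FacilityEmission` and `statesNeedingAlert` are hypothetical names and attaching the glyph to the relevant links is not shown.

```typescript
interface FacilityEmission {
  id: string;
  state: string;
  co2Tonnes: number; // metric tons of CO2
}

// Returns, in alphabetical order, the states containing at least one
// facility that emits more than the threshold.
function statesNeedingAlert(
  facilities: FacilityEmission[],
  threshold: number = 16000000
): string[] {
  const flagged: Record<string, boolean> = {};
  for (const facility of facilities) {
    if (facility.co2Tonnes > threshold) {
      flagged[facility.state] = true;
    }
  }
  return Object.keys(flagged).sort();
}
```

The comparison is strict, so a facility emitting exactly 16,000,000 metric tons would not be flagged.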
Indiana as a state is a high emitter of CO2 – we know that already, because its color matches a color-coded halo on the CO2 node. But it’s also home to one of the two highest-emitting facilities.
Interestingly, the other facility responsible for the largest total direct CO2 emissions is in Alabama, which isn’t one of the top-emitting states.
Greenhouse gases trap the sun’s heat in the atmosphere and increase the earth’s surface temperature. Without them, our planet wouldn’t be warm enough for habitation by millions of species, including humans.
But greenhouse gases are now at their highest levels in history, and they’re causing the earth’s temperature to rise at an unprecedented rate.
Over the last 150 years, the increase in greenhouse gases has almost entirely been caused by human activities such as the production and burning of fossil fuels. In the last 30 years, global emissions of carbon dioxide alone have increased by almost 50%.
As a result, ice caps are melting, seas are warming, sea levels are rising, and there are more extreme weather events than ever before. The United Nations leads the initiative for every country to work together to limit the global temperature rise to below 2 degrees Celsius.
Monitoring reported greenhouse gas emissions worldwide is an important part of this project.
I’m happy with how quickly I managed to visualize the data, but I’ve only scratched the surface of what’s possible.
Next I could use the data’s latitude and longitude properties to visualize emissions on a map. Geospatial analysis might reveal the impact of individual states on greenhouse gas emissions more clearly. Loading emissions data from previous years would give us an opportunity to use time-based analysis to see how emission levels have changed. Social network analysis methods applied to sectors and organizations across states could reveal interesting patterns at a national level.
With detailed documentation to guide me, there are many interesting ways to develop my project into something even more insightful.
My first KeyLines visualization took me from complete novice to enthusiastic apprentice in a short space of time. I really enjoyed the experience.