Visualizing NYC taxi cab data

18th June, 2018

Working in the data visualization field, I’m intrigued by different datasets and by how graph visualization can help us explore, understand and interact with them. I came across this interesting New York taxi cab database and this study, and I was curious to see what I could uncover using KeyLines, the graph visualization toolkit.

The data includes information on taxi trips taken in the city, and the study found an increase in cab activity between the Federal Reserve Bank of New York and major Wall Street banks around the time of central bank policy meetings.

In this blog, we’ll see how the graph visualization approach can be useful when working with large and complex datasets like this one. Let’s take a look!

NYC taxi cab data

A quick intro to KeyLines

It’s difficult to gain any insight from data laid out as a table, like the one above. KeyLines is a toolkit for building interactive visualizations that unlock insights in complex graph data. Visualizing connected data as a graph makes it quick to find and investigate anomalies.

Our data and data model

Before we start, we need to decide on our data and visual model.

The visual model you choose depends on the questions you need to ask of your data (graph data modelling 101). For this dataset, I used:

  • Green nodes to represent a pick-up location, with a value indicating the number of passengers.
  • Red nodes to represent a drop-off location, with values showing the total fare and any tips.
  • Grey glyphs on the links to highlight the distance of the journey in miles.
Node-link model: cab journey
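To make the data model concrete, here is a minimal sketch that turns raw trip records into green pick-up nodes, red drop-off nodes and links carrying a grey distance glyph. The `TaxiTrip`, `LocationNode` and `TripLink` shapes and the `buildGraph` helper are hypothetical, written for illustration; they are not the KeyLines item format or the dataset’s real column names. The sketch also assumes that trips starting or ending at nearly the same coordinates should share a node, approximated by rounding to three decimal places.

```typescript
// Hypothetical trip record; field names are illustrative, not the dataset's column names.
interface TaxiTrip {
  id: string;
  pickupLat: number;
  pickupLon: number;
  dropoffLat: number;
  dropoffLon: number;
  passengerCount: number;
  fare: number;
  tip: number;
  distanceMiles: number;
}

// Hypothetical chart shapes (not the KeyLines item format).
interface LocationNode {
  id: string;
  role: "pickup" | "dropoff";
  color: "green" | "red";
  lat: number;
  lon: number;
  passengers: number; // summed on pick-up nodes
  fare: number; // summed on drop-off nodes
  tips: number; // summed on drop-off nodes
}

interface TripLink {
  id: string;
  from: string;
  to: string;
  glyph: { color: "grey"; text: string }; // journey distance
}

// Round to 3 decimal places (roughly 100 m) so nearby trips share a node; an arbitrary choice.
function locationKey(role: "pickup" | "dropoff", lat: number, lon: number): string {
  return `${role}:${lat.toFixed(3)},${lon.toFixed(3)}`;
}

function buildGraph(trips: TaxiTrip[]): { nodes: Map<string, LocationNode>; links: TripLink[] } {
  const nodes = new Map<string, LocationNode>();
  const links: TripLink[] = [];

  // Fetch the node for a location, creating it (green pick-up or red drop-off) on first use.
  const nodeAt = (role: "pickup" | "dropoff", lat: number, lon: number): LocationNode => {
    const id = locationKey(role, lat, lon);
    let node = nodes.get(id);
    if (node === undefined) {
      node = { id, role, color: role === "pickup" ? "green" : "red", lat, lon, passengers: 0, fare: 0, tips: 0 };
      nodes.set(id, node);
    }
    return node;
  };

  for (const trip of trips) {
    const pickup = nodeAt("pickup", trip.pickupLat, trip.pickupLon);
    pickup.passengers += trip.passengerCount;

    const dropoff = nodeAt("dropoff", trip.dropoffLat, trip.dropoffLon);
    dropoff.fare += trip.fare;
    dropoff.tips += trip.tip;

    // One link per trip, labelled with a grey distance glyph.
    links.push({
      id: `trip-${trip.id}`,
      from: pickup.id,
      to: dropoff.id,
      glyph: { color: "grey", text: `${trip.distanceMiles.toFixed(1)} mi` },
    });
  }
  return { nodes, links };
}
```

Sharing nodes between trips matters later on: centrality measures only say something about popular locations if repeat visits land on the same node.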

Now that we have our data and data model, we need to load the data into KeyLines and run a layout. I loaded a subset of trips from 15th April 2016, between 11AM and 2PM. It looks something like this:

New York taxi cab dataset topographic view
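Picking out that three-hour subset is a simple time-window filter. This sketch assumes each trip already carries a parsed `pickupTime` and that filtering on pick-up time (rather than drop-off time) is what we want; `TimedTrip` and `selectWindow` are hypothetical names, not part of KeyLines.

```typescript
interface TimedTrip {
  id: string;
  pickupTime: Date;
}

// Keep trips whose pick-up time falls in the half-open window [start, end),
// so back-to-back windows never count the same trip twice.
function selectWindow<T extends TimedTrip>(trips: T[], start: Date, end: Date): T[] {
  const from = start.getTime();
  const to = end.getTime();
  return trips.filter((trip) => {
    const t = trip.pickupTime.getTime();
    return t >= from && t < to;
  });
}

// 15th April 2016, 11AM to 2PM New York time (Eastern Daylight Time, UTC-4).
const windowStart = new Date("2016-04-15T11:00:00-04:00");
const windowEnd = new Date("2016-04-15T14:00:00-04:00");
```

Calling `selectWindow(allTrips, windowStart, windowEnd)` returns only the trips picked up in that window. Spelling out the UTC offset avoids the result depending on the time zone of the machine doing the filtering.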

Using KeyLines geospatial – an integration for visualizing connected data on maps – I was able to visualize the data on a map. Depending on your needs or preference, KeyLines allows you to switch from a map view to a topographic view in one click.

New York taxi cab dataset map view

Once the dataset is visualized, we can begin to spot patterns that would be hard to see in a table of numbers, and get to those insights quickly.

Our chart is quite cluttered, so let’s clear it up a bit.

Visualize dynamic networks with the time bar

Data is often large-scale, complex and noisy. Using filters, we can reduce some of that noise and focus on the data of interest. Most graph data has some kind of temporal element, so the time bar is a really useful tool. It lets us filter the chart to isolate activity within a specific time period:

Chart showing the time bar

The above chart gives us a great overview, but let’s zoom in to gain more insight. When we select a glyph, the time bar shows information about the selected trip as a yellow trend line in the histogram.

Explore data dynamically with the time bar
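Conceptually, a time bar histogram is a count of events per time interval, and a selection trend line is the same count restricted to the selected items. The sketch below illustrates that idea in plain TypeScript; it is not how KeyLines computes its histogram, and `histogram`, `selectionSeries` and the bin width parameter are assumptions made for illustration.

```typescript
interface TimedTrip {
  id: string;
  pickupTime: Date;
}

// Count trips per fixed-width bin across [start, end); trips outside the range are skipped.
function histogram(trips: TimedTrip[], start: Date, end: Date, binMinutes: number): number[] {
  const binMs = binMinutes * 60 * 1000;
  const from = start.getTime();
  const to = end.getTime();
  const counts: number[] = new Array(Math.ceil((to - from) / binMs)).fill(0);
  for (const trip of trips) {
    const t = trip.pickupTime.getTime();
    if (t < from || t >= to) continue;
    counts[Math.floor((t - from) / binMs)] += 1;
  }
  return counts;
}

// Second series over the same bins, restricted to the selected trips
// (the data behind a selection trend line).
function selectionSeries(
  trips: TimedTrip[],
  selectedIds: Set<string>,
  start: Date,
  end: Date,
  binMinutes: number,
): number[] {
  return histogram(trips.filter((trip) => selectedIds.has(trip.id)), start, end, binMinutes);
}
```

Because both series share the same bins, the selection line can be drawn directly over the bars for all trips.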

Let’s see how social network analysis (SNA) measures can help us understand this dataset better.

Understand influence with centrality measures

By default, all the nodes are the same size, but with KeyLines’ SNA measures we can resize them according to their importance. Here I’ve sized them by their EigenCentrality, a measure of influence in which a node scores highly if it is connected to other high-scoring nodes. This helps us quickly spot the most popular pick-up and drop-off locations in our dataset:

Nodes sized by their EigenCentrality
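Eigenvector centrality can be approximated by power iteration: repeatedly replace each node’s score with the sum of its neighbors’ scores, then rescale. The sketch below is a generic implementation for an undirected graph, not KeyLines’ EigenCentrality, whose normalization and options may differ. Since every link in this data model runs from a pick-up node to a drop-off node, the graph is bipartite, and plain power iteration can oscillate on bipartite graphs; iterating with (A + I) instead of the adjacency matrix A keeps the same eigenvectors but converges. `eigenCentrality` and `Edge` are hypothetical names.

```typescript
type Edge = [string, string];

// Eigenvector centrality for an undirected graph via power iteration.
// Iterates x <- (A + I)x: same eigenvectors as the adjacency matrix A, but the shift
// stops the oscillation plain power iteration shows on bipartite graphs.
// Parallel edges (repeat trips on one route) each count, acting as weights.
function eigenCentrality(
  nodeIds: string[],
  edges: Edge[],
  maxIterations = 200,
  tolerance = 1e-9,
): Map<string, number> {
  const neighbors = new Map<string, string[]>();
  for (const id of nodeIds) neighbors.set(id, []);
  for (const [a, b] of edges) {
    neighbors.get(a)!.push(b);
    neighbors.get(b)!.push(a);
  }

  let scores = new Map<string, number>();
  for (const id of nodeIds) scores.set(id, 1);

  for (let i = 0; i < maxIterations; i++) {
    const next = new Map<string, number>();
    let max = 0;
    for (const id of nodeIds) {
      let sum = scores.get(id)!; // the "+ I" term: a node's own current score
      for (const n of neighbors.get(id)!) sum += scores.get(n)!;
      next.set(id, sum);
      max = Math.max(max, sum);
    }
    // Rescale so the most central node scores 1, and track the largest change.
    let change = 0;
    for (const id of nodeIds) {
      const value = next.get(id)! / max;
      change = Math.max(change, Math.abs(value - scores.get(id)!));
      next.set(id, value);
    }
    scores = next;
    if (change < tolerance) break;
  }
  return scores;
}
```

The resulting scores, between 0 and 1, could then be mapped onto node sizes. Nodes in small, disconnected parts of the graph fade towards zero, which is expected behavior for eigenvector centrality.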

Now, let’s focus our attention on combos, KeyLines’ exclusive node-grouping functionality.

Reduce clutter with combos

Combos are an intuitive and powerful way to explore your most complex networks. By combining nodes into groups, we can clear up some of the clutter and understand some of the macro-trends in our data.

I used combos to group the nodes into geographic zones, and added donuts to show the volume of pick-ups (green) and drop-offs (red) in each zone.

I’ve also grouped the links and weighted them by volume. This helps us easily spot the more popular routes:

Combos in action
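The numbers behind the donuts and the weighted links come down to two tallies: pick-ups and drop-offs per zone, and trips per zone-to-zone route. In this sketch the zone lookup is passed in as a function. `crudeZone`, which splits the city into three latitude bands, is a made-up placeholder, since the zone boundaries behind the chart aren’t given. `summarizeZones` and the linear width scaling are likewise assumptions for illustration, not KeyLines functionality.

```typescript
interface Trip {
  pickupLat: number;
  pickupLon: number;
  dropoffLat: number;
  dropoffLon: number;
}

interface ZoneSummary {
  zone: string;
  pickups: number; // green donut segment
  dropoffs: number; // red donut segment
}

interface ZoneLink {
  from: string;
  to: string;
  trips: number;
  width: number; // display width, scaled against the busiest route
}

type ZoneOf = (lat: number, lon: number) => string;

// Hypothetical placeholder: three latitude bands, not real NYC zone boundaries.
const crudeZone: ZoneOf = (lat) => (lat < 40.72 ? "Downtown" : lat < 40.76 ? "Midtown" : "Uptown");

function summarizeZones(trips: Trip[], zoneOf: ZoneOf, maxWidth = 10) {
  const zones = new Map<string, ZoneSummary>();
  const routes = new Map<string, ZoneLink>();

  // Fetch a zone's summary, creating an empty one on first use.
  const zone = (name: string): ZoneSummary => {
    let summary = zones.get(name);
    if (summary === undefined) {
      summary = { zone: name, pickups: 0, dropoffs: 0 };
      zones.set(name, summary);
    }
    return summary;
  };

  for (const trip of trips) {
    const from = zoneOf(trip.pickupLat, trip.pickupLon);
    const to = zoneOf(trip.dropoffLat, trip.dropoffLon);
    zone(from).pickups += 1;
    zone(to).dropoffs += 1;

    // Directed route key; trips within one zone produce a from === to route.
    const key = `${from}->${to}`;
    const route = routes.get(key) ?? { from, to, trips: 0, width: 0 };
    route.trips += 1;
    routes.set(key, route);
  }

  // Scale widths linearly so the busiest route gets maxWidth and every route at least 1.
  const links = Array.from(routes.values());
  const busiest = Math.max(1, ...links.map((link) => link.trips));
  for (const link of links) {
    link.width = Math.max(1, Math.round((link.trips / busiest) * maxWidth));
  }
  return { zones, links };
}
```

Scaling against the busiest route keeps widths readable however many trips are loaded; a log scale would be a reasonable alternative if a few routes dominate.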

Charts with many nodes and connections are a challenge for any analyst. Using these simple techniques, we can reduce a complex scenario to a single visualization that can be understood at a glance, which makes a great starting point for any data investigation.

Get started with KeyLines

Visualizing data in an interactive and dynamic way can help you uncover patterns and recognize connections that other methods of analysis might miss. Discover the true power of graph visualization by trying out KeyLines yourself. Request a free trial here.
