Visualizing the Amazon Neptune database with KeyLines

19th February, 2018

In November 2017, Amazon launched a limited preview of Amazon Neptune, a hosted graph database service with an engine optimized for storing billions of relationships and querying the graph with milliseconds of latency. This new service lets developers focus more on their applications and less on database management.

What’s special about Neptune is that it supports two different open standards for describing and querying data:

  • Gremlin – a graph traversal language from Apache TinkerPop
  • Resource Description Framework (RDF) queried with SPARQL, a declarative language based on Semantic Web standards from W3C

We’re big fans of both approaches here at Cambridge Intelligence, and KeyLines can work with either. So, we thought we’d check out Neptune to see how easily it can be integrated with KeyLines.

KeyLines’ Amazon Neptune integration. Visit the KeyLines SDK to try it for yourself.

Integrating KeyLines with Amazon Neptune

Step 1: Launch Amazon Neptune

Launching the Amazon Neptune database was pretty straightforward, thanks to the quick start guide.

Neptune runs inside your own Amazon Virtual Private Cloud (VPC), and you access it from an Amazon EC2 instance in the same VPC. You manage all of that using a launch wizard in the Neptune console.

Once it’s launched, you can configure database options (parameter group, port, cluster name, etc.). In our example, we used the connection endpoint Amazon provides:
neptune-test.cxmaujvq0cze.us-east-1-beta.rds.amazonaws.com
That’s all we need to know to start using the database instance.

Step 2: Load some data

Next, we need to load data into the Neptune database. Your data files have to be in one of the following formats:

  • CSV for the property graph / Gremlin
  • N-Triples, N-Quads, RDF/XML or Turtle for RDF / SPARQL

As we’ve mentioned, there are two different query engines that can be used with Neptune. In this example we’ll connect to the SPARQL endpoint, which is served from the /sparql path on the cluster endpoint.

We used a movies dataset representing films and the actors in them, in Turtle format (.ttl), a textual representation of an RDF graph. Here’s what it looks like:

@prefix imdb: <http://www.imdb.com/>.
@prefix dbo: <http://dbpedia.org/ontology/>.
@prefix mo: <http://www.movieontology.org/2009/10/01/movieontology.owl#>.

<http://imdb.com/movie/Avatar> a mo:Movie;
    imdb:hasTitle "Avatar";
    mo:hasActor <http://imdb.com/actor/Sam_Worthington>;
    imdb:imageUrl "http://cf1.imgobject.com/posters/374/4bd29ddd017a3c63e8000374/avatar-mid.jpg".
<http://imdb.com/actor/Sam_Worthington> a dbo:Actor;
  imdb:hasName "Sam Worthington".
<http://imdb.com/movie/Pirates_of_the_Caribbean:_The_Curse_of_the_Black_Pearl> a mo:Movie;
    imdb:hasTitle "Pirates of the Caribbean: The Curse of the Black Pearl";
    mo:hasActor <http://imdb.com/actor/Zoe_Saldana>;
    imdb:imageUrl "http://cf1.imgobject.com/posters/242/4bc9018b017a3c57fe000242/pirates-of-the-caribbean-the-curse-of-the-black-pearl-mid.jpg".
<http://imdb.com/actor/Zoe_Saldana> a dbo:Actor;
  imdb:hasName "Zoe Saldana".
<http://imdb.com/movie/Avatar> a mo:Movie;
    imdb:hasTitle "Avatar";
    mo:hasActor <http://imdb.com/actor/Zoe_Saldana>;
    imdb:imageUrl "http://cf1.imgobject.com/posters/374/4bd29ddd017a3c63e8000374/avatar-mid.jpg".
[...]

Step 3: Send queries to Amazon Neptune

The next step is to copy the data to an Amazon S3 (Simple Storage Service) bucket. It’s important to remember that the S3 bucket must be in the same AWS Region (us-east-1 is the only region available at the time of writing) as the cluster that loads the data.
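If you’d rather script that step than use the S3 console or the aws CLI, here’s a minimal sketch using the AWS SDK for JavaScript. The bucket name and key are the ones used in the loader request below; swap in your own.

// Rough sketch: upload the Turtle file to an S3 bucket with the AWS SDK for
// JavaScript (aws-sdk), so the Neptune bulk loader can read it from there
var AWS = require('aws-sdk');
var fs = require('fs');

var s3 = new AWS.S3({ region: 'us-east-1' });

s3.upload({
  Bucket: 'camintel-neptune',
  Key: 'movies.ttl',
  Body: fs.createReadStream('movies.ttl')
}, function (err, data) {
  if (err) { throw err; }
  console.log('Uploaded to ' + data.Location);
});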

To run the Neptune loader, at the command line, enter:

curl -X POST \
    -H 'Content-Type: application/json' \
    http://neptune-test.cxmaujvq0cze.us-east-1-beta.rds.amazonaws.com:8182/loader -d '
    { 
      "source" : "s3://camintel-neptune/movies.ttl", 
      "format" : "turtle", 
      "region" : "us-east-1", 
      "failOnError" : "FALSE"
    }'

which returned:

{
    "status" : "200 OK",
    "payload" : {
        "loadId" : "2cafaa88-5cce-43c9-89cd-c1e68f4d0f53"
    }
}

It’s not the most informative response, but the 200 status tells us the loader accepted our job, and it gives us a loadId we can use to track it.
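For bigger datasets that’s useful: the loader also exposes a status endpoint keyed by the loadId, so we can check whether the load has finished or failed. A rough sketch of checking it from JavaScript (adjust the host to your own cluster):

// Poll the Neptune bulk loader for the status of a load job, using the
// loadId returned when the job was submitted
function getLoadStatus(loadId) {
  var endpoint = 'https://neptune-test.cxmaujvq0cze.us-east-1-beta.rds.amazonaws.com:8182';
  return fetch(endpoint + '/loader/' + loadId).then(function (response) {
    // the response describes the current state of the load
    // (in progress, completed or failed)
    return response.json();
  });
}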

Now that we have successfully loaded data into the Neptune instance, we can use SPARQL queries to retrieve information and explore the database. The template query we used was:

curl -X POST --data-binary 'my-query' \
    https://your-neptune-endpoint:8182/sparql

where ‘my-query’ is of the form:

query=prefix mo: <http://www.movieontology.org/2009/10/01/movieontology.owl#>
prefix imdb: <http://www.imdb.com/>
SELECT DISTINCT ?actor ?title ?img ?name
WHERE {
  <http://imdb.com/movie/The_Matrix> mo:hasActor ?actor;
    imdb:hasTitle ?title;
    imdb:imageUrl ?img.
  ?actor imdb:hasName ?name.
}

The template above returns the actors in a given movie (e.g. “The Matrix”). To go the other way and find the movies a certain actor appeared in (e.g. “Gloria Foster”), we submitted a query like this:

query=prefix mo: <http://www.movieontology.org/2009/10/01/movieontology.owl#>
prefix imdb: <http://www.imdb.com/>
SELECT DISTINCT ?movie ?title ?img ?name
WHERE {
  ?movie mo:hasActor <http://imdb.com/actor/Gloria_Foster>;
    imdb:hasTitle ?title;
    imdb:imageUrl ?img.
  <http://imdb.com/actor/Gloria_Foster> imdb:hasName ?name.
}
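In our application we send these queries from JavaScript rather than from the command line. A minimal sketch of what such a helper might look like (the host is our example cluster endpoint, and because Neptune is only reachable from inside the VPC, in practice you’d route the request through your own server):

// Minimal sketch: POST a SPARQL query to Neptune's /sparql endpoint as
// form-encoded data (the same format the curl examples above use) and
// resolve with the parsed JSON results.
// `query` here is just the SPARQL text, without the query= prefix shown above
function runSparqlQuery(query) {
  var endpoint = 'https://neptune-test.cxmaujvq0cze.us-east-1-beta.rds.amazonaws.com:8182/sparql';
  return fetch(endpoint, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/x-www-form-urlencoded',
      'Accept': 'application/sparql-results+json'
    },
    body: 'query=' + encodeURIComponent(query)
  }).then(function (response) {
    return response.json();
  });
}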

Now let’s parse the data.

Step 4: Format the data

In our case, we just need to format the JSON data returned from our SPARQL queries into a KeyLines JSON object that details nodes and links.

In the code below, we handle the response to a query like the ones above, which requests the nodes and links connected to a specific node (baseNode). The response we get back – either the actors in a specific movie, or the movies a specific actor appeared in – is stored in the object called json. KeyLines nodes and links are created by calling makeNode and makeLink.
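For reference, SELECT queries come back in the standard W3C SPARQL JSON results format. Given the sample data above, running the Step 3 template query with Avatar in place of The Matrix would return something along these lines (trimmed to a single binding):

// Shape of the SPARQL JSON response that the parsing code below works with
var json = {
  head: { vars: ['actor', 'title', 'img', 'name'] },
  results: {
    bindings: [{
      actor: { type: 'uri', value: 'http://imdb.com/actor/Sam_Worthington' },
      title: { type: 'literal', value: 'Avatar' },
      img: { type: 'literal', value: 'http://cf1.imgobject.com/posters/374/4bd29ddd017a3c63e8000374/avatar-mid.jpg' },
      name: { type: 'literal', value: 'Sam Worthington' }
    }]
  }
};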

// Convert the SPARQL JSON response into an array of KeyLines items,
// starting with the node the query was run against
function makeKeyLinesItems(json, baseNode){
  var items = [];
  items.push(baseNode);
  if (json.results.bindings) {
    json.results.bindings.forEach(function(item) {
      // Each binding is either an actor (movie query) or a movie (actor query)
      var node = item.actor ? makeNode(item.actor.value, "actor", item) : 
        makeNode(item.movie.value, "movie", item);
      items.push(node);
      items.push(makeLink(item, node, baseNode));
    });
  }
  return items;
}

// Build a KeyLines node from a SPARQL result binding
function makeNode (id, type, item) {
  var isActor = type === 'actor';
  var node = {
    type: 'node',
    id,
    ci: true,
    e: isActor ? 1 : 2,          // draw movie nodes larger than actors
    d: {
      type: type                 // keep the item type as data on the node
    },
    t: isActor ? item.name.value : item.title.value   // label: actor name or movie title
  };

  return node;
}

// Build a KeyLines link between the base node and a result node.
// id1 is always the actor and id2 the movie, whichever one was the base node
function makeLink (item, node, baseNode) {
  const id1 = item.actor ? node.id : baseNode.id;
  const id2 = item.actor ? baseNode.id : node.id;
  // sort so the same actor-movie pair always produces the same link id
  const id = [id1, id2].sort().join('-');
  var link = {
      type: 'link',
      id,
      id1,
      id2,
      fc: 'rgba(52,52,52,0.9)',
      c: 'rgb(0,153,255)',       // link colour
      w: 2                       // link width
    };

  return link;
}

Step 5: Load the data into KeyLines and start customizing your app

All we need to do now is load our data into the KeyLines chart. The makeKeyLinesItems() function gives us the Neptune results as KeyLines nodes and links, which we can load into the chart using chart.load().
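Putting the pieces together, the glue code is only a few lines. A rough sketch, assuming chart is a KeyLines chart instance created elsewhere and runSparqlQuery() is the hypothetical helper from Step 3:

// Run a query against Neptune, convert the results into KeyLines items
// and load them into the chart
function showResults(chart, query, baseNode) {
  runSparqlQuery(query).then(function (json) {
    chart.load({
      type: 'LinkChart',
      items: makeKeyLinesItems(json, baseNode)
    });
  });
}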

By now, we have a simple working prototype of a graph visualization tool, running on an Amazon Neptune back-end.

It might look fairly basic right now, but it’s easy to get your KeyLines app looking good through customization and styling. Need some inspiration? We have plenty of demos to get you started on the KeyLines SDK.

Register here to request a free evaluation account for the KeyLines SDK.
