You would be forgiven for thinking that relational databases don’t work for network visualization.
Whilst it’s true that the tabular data model and schema restrictions sometimes make relational databases an inefficient option for visualizing connected data, with the right data modeling it is possible. In fact, sometimes a tabular view may be the most efficient storage and query mechanism.
This page introduces the topic of relational databases, and how they can be used for graph / network visualization.
What is a relational database?
A relational database, as the name suggests, is one storing related data, represented in two-dimensional tables of columns and rows. Each data entity is represented with its attributes in the columns, and data instances in the rows:
|AB123123||123 Road Street||19UYA1234L000213||Coupe||2012|
|ZN 987987||42 Mill Crescent||1005||Estate||2007|
|JP 456456||12 Sample Street||19UYA1234L000213||Coupe||2012|
Creating a network with SQL Joins
These tables, known as relations, can rapidly grow to millions, or billions of rows. It’s possible to get data from multiple tables. For example, suppose the data shown above was stored in two separate tables:
|1||AB123123||123 Road Street||1001|
|2||ZN987987||42 Mill Crescent||1005|
|3||JP456456||12 Sample Street||1001|
As long as there is a column that appears in both tables (in this case, the vehicle ID), we can cross-reference using an SQL operator called “JOIN”. In the vehicle table, this column is known as the Primary Key; in the policy table it is the foreign key.
If the primary key is deleted or altered, the database may trigger an action on cascade to keep the consistency of the references.
Using the JOIN operator, we can start to build up a complex connected data structure, resembling a network (or graph) that KeyLines can work with. Joins, however, can be expensive in terms of performance.
Optimizing SQL joins for data visualization
Once we understand how a join works, it’s possible to fine-tune the logic to speed up system performance. For example:
FROM Policy JOIN Vehicle
ON Policy.Vehicle_ID = Vehicle.ID
The SQL statement above will search all the Vehicle IDs in the Policy table to find a match in the Vehicles table. If the table contains 1,000,000 rows, the user will have to wait for the system to finish scanning all of them.
Database administrators can overcome this by:
- Designing data schemas to reducing reliance on joins (this article details specific strategies).
- Using indexes to improve lookup speed.
Taking these steps can help ensure database performance isn’t too much of a bottleneck when using a relational datastore for network visualization.
Converting relational data structure to a network
How relational data is modeled into a graph depends entirely on the data being visualized and the questions that need to be answered.
Taking the above vehicle / policy example, there would be four generic steps required:
- Understand the data – know which entity attributes and record formats are present in the data, and how they relate to one another.
- Combine required data relations – where possible, move the data into a single table to reduce the reliance on joins.
- Identify key relationships – understand which relationships users need to understand to answer the questions they are asking.
- Model data around key relationships – For example, an insurance fraud analyst may need to identify vehicles with more than one insurance policy, in which case the Vehicle Identification Numbers and Policy Numbers would be mapped to nodes, with the remaining data entities included as properties:
When mapping tabular data to a connected graph format, it’s important to consider how to manage IDs – important because KeyLines will automatically merge new data into an existing node if the ID already exists on the chart.
For example, in the case above we could set the vehicle make as the node ID field, so all Nissan vehicles appear as a single node. This could be useful for understanding patterns on a brand level.
Alternatively, the car number plate could be set as the node ID. This way, each unique number plate would appear as a unique node, which is potentially more useful to an analyst. However, it would still be necessary to carefully plan this structure and consider potential anomalies – stolen, shared or transferred number plates, for example.
Relational Database Visualization infrastructure
The architecture for connecting KeyLines to a relational database is relatively simple:
A user interaction in the KeyLines component raises an AJAX request for data from the database, via a REST API. Depending on the choice of database, this might be available out of the box, or need to be configured as part of the set up.
Once the data is retrieved from the database, it is parsed into KeyLines’ own published JSON format and merged or loaded into the KeyLines chart.
To learn more about the KeyLines network visualization toolkit and how to visualize your complex connected data, download a copy of our KeyLines white paper.
This blog post shows how quick and easy it is to integrate Stardog with KeyLines using Academy Awards data.
KeyLines is flexible enough to work with any datastore, but it’s an especially good fit for graph databases. In this blog post, we’re showcasing how to integrate KeyLines with the Memgraph databas
This blog walks you through the commands you need to integrate Angular with KeyLines and start coding.