Relational database visualization

with KeyLines

You may think that relational databases aren’t suited to graph analysis and visualization.

Whilst it’s true that the tabular data model and schema restrictions can sometimes make relational databases an inefficient option for visualizing connected data, with the right data modeling it is possible. In fact, sometimes a tabular view may be the most efficient storage and query mechanism.

What is a relational database?

A relational database stores related data in two-dimensional tables of columns and rows. Each data entity is represented with its attributes in the columns, and data instances in the rows:

Policy_number Policy_Address Vehicle_VIN Vehicle_Model Vehicle_Year
AB123123 123 Road Street 19UYA1234L000213 Coupe 2012
ZN 987987 42 Mill Crescent 1005 Estate 2007
JP 456456 12 Sample Street 19UYA1234L000213 Coupe 2012


Using a non-relational datastore?

We support NoSQL too


Creating a network with SQL Joins

These tables, known as relations, can rapidly grow to millions, or billions of rows. It’s possible to get data from multiple tables. For example, suppose the data shown above was stored in two separate tables:

Policy

ID Number Address Vehicle_ID
1 AB123123 123 Road Street 1001
2 ZN987987 42 Mill Crescent 1005
3 JP456456 12 Sample Street 1001


Vehicle

ID VIN Model Year
1001 19UYA1234L000213 Coupe 2012
1005 14PLU1234L120213 Estate 2007
1007 16MYA1234L000215 SUV 2014


As long as there is a column that appears in both tables (in this case, the vehicle ID), we can cross-reference using the JOIN SQL operator. In the vehicle table, this column is known as the Primary Key. In the policy table, it’s the foreign key.

Using the JOIN operator, we can start to build up a complex connected data structure, resembling a network, or graph, that we can visualize in KeyLines. Joins, however, can be expensive in terms of performance.

Optimizing SQL joins for relational database visualization

Once we understand how a join works, it’s possible to fine-tune the logic to speed up system performance. For example:

SELECT *
FROM Policy JOIN Vehicle
ON Policy.Vehicle_ID = Vehicle.ID

The SQL statement above will search all the Vehicle IDs in the Policy table to find a match in the Vehicles table. If the table contains 1,000,000 rows, the user will have to wait for the system to finish scanning all of them.

Database administrators can overcome this by:

  • Designing data schemas to reducing reliance on joins
  • Using indexes to improve lookup speed

Taking these steps can help ensure database performance doesn’t become too much of a bottleneck.

Converting relational data structure to a network

How relational data is modeled as a graph depends entirely on the data being visualized and the questions you’re asking.

Taking the above vehicle / policy example, there would be four generic steps required:

  • Understand the data – know which entity attributes and record formats are present in the data, and how they relate to one another.
  • Combine required data relations – where possible, move the data into a single table to reduce the reliance on joins.
  • Identify key relationships – understand which relationships users need to understand to answer the questions they are asking.
  • Model data around key relationships – For example, an insurance fraud analyst may need to identify vehicles with more than one insurance policy, in which case the Vehicle Identification Numbers and Policy Numbers would be mapped to nodes, with the remaining data entities included as properties:

Relational database visualization: network visualization makes it easy to collate, explore and communicate complex connected data from multiple data silos

When mapping tabular data to a connected graph format, it’s important to consider how to manage IDs – important because KeyLines will automatically merge new data into an existing node if the ID already exists on the chart.

For example, in the case above we could set the vehicle make as the node ID field, so all Nissan vehicles appear as a single node. This could be useful for understanding patterns on a brand level.

Alternatively, the car number plate could be set as the node ID. This way, each unique number plate would appear as a unique node, which is potentially more useful to an analyst. However, it would still be necessary to carefully plan this structure and consider potential anomalies – stolen, shared or transferred number plates, for example.

This blog post explains the basics of modeling tabular data as a graph.

Registered in England and Wales with Company Number 07625370 | VAT Number 113 1740 61
6-8 Hills Road, Cambridge, CB2 1JP. All material © Cambridge Intelligence 2024.
Read our Privacy Policy.