Link analysis for fraud detection

30th January, 2020

Insurance fraud detection is a matter of understanding connections.

To uncover scams, investigators look for unusual links between people, events, locations and times. They scour huge, noisy, complex, and often incomplete, datasets to understand which connections are genuine, and which could indicate fraud.

Network visualization – or ‘link analysis’ as it is more commonly called by fraud teams – has long been a vital part of the fraud investigator’s arsenal. A powerful, well-designed link analysis tool helps investigators overcome data challenges and explore suspected fraud in an interactive and intuitive way. If you’re looking for a way to visualize your fraud data, request a trial of our toolkits.

Below is an illustration of how our link analysis toolkits are used to investigate insurance claims.

Reviewing insurance claims with link analysis

Most insurance fraud detection systems work in a similar way. Data is collated, rule scored and sorted into three categories: fraud, not fraud and unsure.

A team of analysts then manually reviews the ‘unsures’ – a careful balancing act between keeping genuine customers happy with fast, accurate decisions and preventing real frauds from getting through.
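As a rough illustration of the three-way sort described above, here is a minimal Python sketch. The score range, the two thresholds and the claim IDs are assumptions made up for this example; a real system would use its own rules and tune its own cut-offs.

```python
# Hypothetical thresholds on a rule score in the range 0 to 1.
FRAUD_THRESHOLD = 0.8
CLEAR_THRESHOLD = 0.2


def triage(score: float) -> str:
    """Sort a rule score into one of the three categories."""
    if score >= FRAUD_THRESHOLD:
        return "fraud"
    if score <= CLEAR_THRESHOLD:
        return "not fraud"
    return "unsure"


# Placeholder scores for three made-up claims.
scores = {"claim-A": 0.05, "claim-B": 0.55, "claim-C": 0.93}

# Only the 'unsures' go forward for manual review.
review_queue = [claim_id for claim_id, s in scores.items() if triage(s) == "unsure"]
print(review_queue)
```

Widening the gap between the two thresholds sends more claims to manual review; narrowing it automates more decisions at the cost of more mistakes in either direction.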

Link analysis, powered by our toolkits, is a great way to view these complex scenarios in a simple format, directly embedded in the investigation workflow.

The fraud detection visual data model

In a real-world link analysis tool for fraud detection, an investigator would be interested in a large number of data points.

Their precise data model would depend on their business processes, structures, and operations, but some typical elements include:

  • Policy – policy car registration, policyholder email, policyholder address, policyholder phone
  • Third party – third party car registration, third party address, third party email, third party phone

This model places claims and policies at the top of a hierarchy, with the third party and policyholder information on the next level.

For the purposes of this example, however, we’ll simplify the visual data model:

A simpler visual model
  • Claim – being investigated
  • Vehicle – involved in the claim
  • Claimant – associated with the vehicle (and claim)
  • Address – at which the claimant lives
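One way to picture this simplified model in code is as a small set of typed nodes joined by links. The sketch below is only an illustration: the `Node` and `Link` classes and the placeholder keys are assumptions for this example, not part of any toolkit API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    kind: str  # "claim", "vehicle", "claimant" or "address"
    key: str   # value that identifies the real-world entity


@dataclass(frozen=True)
class Link:
    source: Node
    target: Node


# Placeholder values, invented for illustration only.
claim = Node("claim", "CLAIM-1")
vehicle = Node("vehicle", "VEHICLE-1")
claimant = Node("claimant", "CLAIMANT-1")
address = Node("address", "ADDRESS-1")

# Claim -> vehicle -> claimant -> address, following the list above.
links = [
    Link(claim, vehicle),
    Link(vehicle, claimant),
    Link(claimant, address),
]

for link in links:
    print(f"{link.source.kind} -> {link.target.kind}")
```

Making the nodes immutable and comparable by `(kind, key)` means two records describing the same vehicle or address compare as equal, which is what later matching and merging steps rely on.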

Step 1: Load a claim

This claim folder involves two vehicles and three claimants, associated with three separate addresses.

Our first step is to load our disputed claim, using the hierarchy layout to simplify the view.

In this example, we have two people (Stephen Porter and Julia Rodriguez) claiming for damage to their vehicles. An additional person, Everett Page, is named in the claim as a witness.

Step 2: Find matches

After loading a claim, the user is offered the ability to ‘find matches’. This runs a query back to the database (in this example we used DataStax DSE Graph and the Gremlin query language under the hood) to find all other claims that share any attributes with it:

Here we can see claims with shared attributes side-by-side

Doing so returns two other claims – 2015-06-07 and 2015-03-16 – which show matches on the vehicle and address of a claimant in the original case being investigated.
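The logic behind a ‘find matches’ query can be sketched without a graph database. The Python below is an in-memory stand-in, not the Gremlin query used in the example: the claim IDs, attribute values and the `find_matches` function are all hypothetical.

```python
# Each claim maps to the (kind, value) attributes attached to it.
# All IDs and values are placeholders invented for this sketch.
claims = {
    "CLAIM-1": {("vehicle", "VEH-1"), ("vehicle", "VEH-2"), ("address", "ADDR-1")},
    "CLAIM-2": {("vehicle", "VEH-1"), ("address", "ADDR-7")},
    "CLAIM-3": {("vehicle", "VEH-9"), ("address", "ADDR-1")},
    "CLAIM-4": {("vehicle", "VEH-5"), ("address", "ADDR-8")},
}


def find_matches(claim_id, claims):
    """Return {other_claim_id: shared attributes} for every overlapping claim."""
    target = claims[claim_id]
    matches = {}
    for other_id, attributes in claims.items():
        if other_id == claim_id:
            continue
        shared = target & attributes  # set intersection
        if shared:
            matches[other_id] = shared
    return matches


for other_id, shared in sorted(find_matches("CLAIM-1", claims).items()):
    print(other_id, sorted(shared))
```

In a graph database the same idea becomes a traversal: start at the claim, step out to its attribute nodes, then step back in to any other claims attached to them.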

Step 3: Combine matches

To emphasize unusual connections, we can use our toolkits’ combine feature to merge identical nodes:

Combining identical / duplicate nodes will remove chart clutter and make anomalies easier to detect
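Independent of any toolkit, merging identical nodes amounts to picking one representative per real-world entity and rewiring the links to it. The following Python sketch shows that idea under assumed data shapes; the `combine` function and the node IDs are hypothetical and do not represent the toolkit’s own combine feature.

```python
def combine(nodes, links):
    """Merge nodes that share the same (kind, value) and rewire their links.

    nodes: {node_id: (kind, value)}
    links: list of (source_id, target_id) pairs
    """
    canonical = {}  # (kind, value) -> first node id seen with that identity
    remap = {}      # every node id -> the node id it is merged into
    for node_id, identity in nodes.items():
        remap[node_id] = canonical.setdefault(identity, node_id)

    merged_nodes = {node_id: nodes[node_id] for node_id in canonical.values()}
    # A set drops links that become duplicates once their ends are merged.
    merged_links = sorted({(remap[s], remap[t]) for s, t in links})
    return merged_nodes, merged_links


# Two claims that each loaded their own copy of the same vehicle (placeholder data).
nodes = {
    "c1": ("claim", "CLAIM-1"),
    "v1": ("vehicle", "VEH-1"),
    "c2": ("claim", "CLAIM-2"),
    "v1_copy": ("vehicle", "VEH-1"),
}
links = [("c1", "v1"), ("c2", "v1_copy")]

merged_nodes, merged_links = combine(nodes, links)
print(merged_links)
```

After merging, both claims point at a single vehicle node, so the shared vehicle shows up as a connection between the claims rather than as two unrelated items.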

This adjusts the layout of our network so we can more easily see unusual connections.

The hierarchy layout reveals relationships between nodes, highlighting unusual connections

Here we can see that our original claimant’s address in Colnbrook Street is also associated with an earlier claim. Because the people involved share a surname, however, this match is not especially suspicious.

To learn more about link analysis and fraud detection, download our updated white paper. It contains more in-depth examples and best practice advice.


Step 4: Escalate or accept

The next match is more suspicious.

A suspicious connection – why does Everett Page (witness) share an address with Walter Stewart (claimant)?

Our witness, Everett Page, shares an address with a man named Walter Stewart, who has previously made a claim relating to one of the vehicles involved in this incident.
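The reasoning an investigator applies here, and in the benign shared-surname case earlier, can be written down as a simple heuristic: flag people on different claims who share an address but not a surname. The Python sketch below is one possible reading of that rule, not a production check. The record layout, claim IDs, address labels and the two ‘Example’ names are invented for illustration.

```python
from collections import defaultdict
from itertools import combinations

# (person, role, claim_id, address). Claim IDs and addresses are placeholders.
people = [
    ("Everett Page", "witness", "CLAIM-1", "ADDR-2"),
    ("Walter Stewart", "claimant", "CLAIM-2", "ADDR-2"),
    ("Sam Example", "claimant", "CLAIM-1", "ADDR-1"),   # hypothetical person
    ("Alex Example", "claimant", "CLAIM-3", "ADDR-1"),  # hypothetical relative
]


def shared_address_flags(people):
    """Flag pairs on different claims who share an address but not a surname."""
    by_address = defaultdict(list)
    for record in people:
        by_address[record[3]].append(record)

    flags = []
    for address, residents in by_address.items():
        for a, b in combinations(residents, 2):
            different_claims = a[2] != b[2]
            same_surname = a[0].split()[-1] == b[0].split()[-1]
            if different_claims and not same_surname:
                flags.append((a[0], b[0], address))
    return flags


print(shared_address_flags(people))
```

A surname comparison is a crude proxy for a family relationship, so a rule like this is best treated as a prompt for human review rather than a verdict.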

At this point, an investigator would probably want to escalate this case for special investigation. We can use a context menu to help our analyst progress this further through the workflow:

Link analysis tools should be embedded directly into the fraud analyst’s workflow

Representing data as a network offers an engaging way for analysts to rapidly understand events. By incorporating KeyLines into an existing claims management workflow, we have made the process simple and intuitive.

Find fraud in your own data

This is a simple example, using synthesized data, of how our graph visualization toolkits can make a complex and high-risk exercise simpler and more intuitive.

With the help of a number of graph visualization techniques, including graph layouts and combos, we have brought simplified data from multiple policies into a single chart and identified a potential instance of fraud.

If you want to try our toolkits on your own data, just request a trial account.

Want to learn more?
Fighting Fraud with Graph Databases webinar recording

This post was originally published some time ago. It’s still popular, so we’ve updated it with new example visualizations to keep it useful and relevant.
