Why visualize financial transactions?
Our toolkits are for visualizing networks (or graphs) in data. Transactions fit the graph model perfectly:
- Nodes represent accounts
- Links represent transactions
- Value is represented by node size or link weight
Real transaction data is difficult to obtain and share, for obvious reasons. So instead we decided to take a look at Bitcoin – the peer-to-peer electronic currency system.
Mapping Bitcoin transactions to a graph
A common misconception about Bitcoin is that it is anonymous. Anonymity isn’t actually built into the Bitcoin model; it is a by-product of the complex public/private key system used to facilitate payments and deter fraud and theft.
With a lot of processing and investigation, individual Bitcoin accounts can be identified.
In a 2011 paper, the Clique Graph and Network Analysis Research Cluster at University College Dublin demonstrated how public keys could be identified from private transaction keys – effectively allowing Bitcoin transactions to be tracked and mapped as a graph.
During the project, the Dublin team was able to identify certain key accounts – including the victim of a theft. We’ll use our technology to visualize it.
Parsing from a text file to JSON
We downloaded two user-friendly, pre-processed text files containing a copy of the data collected between 03 Jan 2009 and 12 Jul 2011 – one for nodes and the other for links.
In the node file, each line is a user followed by their transaction keys:
17BptPvonJVA3pLDVjgzLEq7Aujgb1LjPS 1Mp3qWVVjBLCsJhmH65EjvAosViTF13aY8 1BorkLa6yrk1TRwqELhkzi4nCWm8BhXWzL 1AMNhMZC7hCyb8rMVda9E8bEf7FB1RpDAF 13EJ9b8qLH7TARcssSZnZVmyW864ar8J3i 1DnsBgY9KkWWp2xw9pL1Xv1QT145UR5TWp
The link file contains transaction value and timestamp data:
905914 20572 0.01 2011-06-23-19-10-01
905914 622803 220.07592886 2011-06-23-19-10-01
823336 118969 2.12 2011-05-16-01-58-01
823336 330686 0.56210609 2011-05-16-01-58-01
We simply parse this data into a JSON format our toolkits can understand.
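As a rough illustration of that parsing step, here is a minimal Python sketch. It assumes the first two fields of each link line are the sending and receiving user IDs, followed by the value and a timestamp in year-month-day-hour-minute-second order. The output schema (`nodes`, `links`, `source`, `target`) and the function names are hypothetical, chosen for illustration; they are not the format our toolkits require.

```python
import json
from datetime import datetime


def parse_link_line(line):
    """Turn one 'source target value timestamp' line into a dict."""
    source, target, value, stamp = line.split()
    when = datetime.strptime(stamp, "%Y-%m-%d-%H-%M-%S")
    return {
        "source": source,            # assumed: paying user ID
        "target": target,            # assumed: receiving user ID
        "value": float(value),       # transaction value in btc
        "time": when.isoformat(),
    }


def build_graph(link_lines):
    """Build a node list and a link list from raw link-file lines."""
    links = [parse_link_line(l) for l in link_lines if l.strip()]
    node_ids = sorted({l["source"] for l in links} | {l["target"] for l in links})
    return {"nodes": [{"id": n} for n in node_ids], "links": links}


# The four sample records shown above.
sample = [
    "905914 20572 0.01 2011-06-23-19-10-01",
    "905914 622803 220.07592886 2011-06-23-19-10-01",
    "823336 118969 2.12 2011-05-16-01-58-01",
    "823336 330686 0.56210609 2011-05-16-01-58-01",
]

graph = build_graph(sample)
print(json.dumps(graph, indent=2))
```

Deriving the node list from the link endpoints keeps the sketch short; in practice the node file would supply the keys belonging to each user.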
Visualizing a major Bitcoin account
Once we’d loaded the data, we did a search for the transactions linked to a single account. At first the result is somewhat overwhelming:
The account we searched for is present here as a central node, with many thousands of inbound transactions placed around it. A yellow link indicates a transaction below 10k btc. Red is used for transactions above 10k btc.
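The search and coloring described here could be sketched in Python along these lines. The link dictionaries, the account name `x` and both function names are hypothetical, and the sample values are invented for illustration rather than taken from the dataset. The text does not say which color a transaction of exactly 10k btc receives, so treating it as yellow is an assumption.

```python
THRESHOLD_BTC = 10_000  # the 10k btc cut-off used in the chart


def links_for_account(links, account):
    """Return every link that has the account at either end."""
    return [l for l in links if account in (l["source"], l["target"])]


def link_color(value):
    """Yellow at or below the threshold, red above it (assumed boundary)."""
    return "red" if value > THRESHOLD_BTC else "yellow"


# Hypothetical links, not real transactions.
links = [
    {"source": "a", "target": "x", "value": 0.5},
    {"source": "b", "target": "x", "value": 12.0},
    {"source": "c", "target": "d", "value": 25000.0},
    {"source": "x", "target": "d", "value": 15000.0},
]

# Keep only the links touching account "x" and attach a color to each.
styled = [
    {**link, "color": link_color(link["value"])}
    for link in links_for_account(links, "x")
]
```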
Immediately it stands out that there is a second prominent node with its own large orbit of inbound transactions. Our random naming method labeled this account as ‘Mr U’.
Filtering by value
If we filter out all the transactions below 10k btc, a more discernible pattern emerges:
The central node is completely isolated – it did not participate in any 10k+ btc transactions during the data collection period. It also does not appear to pay money out – it only collects it in small increments. This could indicate an online store or service receiving payment in btc. Or it could indicate the beginnings of a money-laundering operation.
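The same reasoning can be expressed as a small Python sketch: drop links below a threshold, list the nodes left with no links, and count an account's outbound payments. The graph structure, the node names and the function names are hypothetical, and the sketch assumes each link runs from payer (`source`) to payee (`target`).

```python
def filter_by_value(graph, min_value):
    """Keep every node, but only links worth at least min_value."""
    kept = [l for l in graph["links"] if l["value"] >= min_value]
    return {"nodes": graph["nodes"], "links": kept}


def isolated_nodes(graph):
    """Return IDs of nodes that no remaining link touches."""
    connected = set()
    for l in graph["links"]:
        connected.update((l["source"], l["target"]))
    return [n["id"] for n in graph["nodes"] if n["id"] not in connected]


def outbound_count(graph, account):
    """Count links on which the account is the payer."""
    return sum(1 for l in graph["links"] if l["source"] == account)


# Hypothetical data: a 'shop' that only receives small payments.
graph = {
    "nodes": [{"id": n} for n in ["shop", "p1", "p2", "A", "D"]],
    "links": [
        {"source": "p1", "target": "shop", "value": 2.0},
        {"source": "p2", "target": "shop", "value": 0.3},
        {"source": "A", "target": "D", "value": 25000.0},
    ],
}

filtered = filter_by_value(graph, 10_000)
print(isolated_nodes(filtered))       # nodes with no 10k+ btc links
print(outbound_count(graph, "shop"))  # payments made by 'shop'
```

An account that is isolated after filtering and has an outbound count of zero matches the "collects in small increments" pattern described above; on its own, that pattern does not distinguish a store from anything more suspicious.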
Poor Mr A
Although Mr U still stands out as a key node in the graph, perhaps more interesting is the chain of very high-value transactions above him, emanating from Mr A.
This is actually a theft (see details of Mr A’s bitcoin fraud). On 13 June 2011, ‘Mr A’s’ slush fund was compromised and the payout address was changed to ‘Mr D’s’ account.
Here is the same network, with this chain isolated and expanded back to our suspected money launderer (just two hops from the ‘thief’).
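One way to check a statement like "two hops away" programmatically is a breadth-first search over the links. The Python sketch below ignores link direction, which is an assumption, and uses hypothetical node names and links rather than the real chain.

```python
from collections import deque


def hop_distance(links, start, goal):
    """Return the fewest links between two nodes, or None if unreachable."""
    neighbors = {}
    for l in links:
        neighbors.setdefault(l["source"], set()).add(l["target"])
        neighbors.setdefault(l["target"], set()).add(l["source"])

    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == goal:
            return hops
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None


# Hypothetical chain: thief -> intermediary -> store.
links = [
    {"source": "thief", "target": "intermediary"},
    {"source": "intermediary", "target": "store"},
    {"source": "customer", "target": "store"},
]

print(hop_distance(links, "thief", "store"))
```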
Visualizing transactions in this way makes it easier to find anomalies and outliers in vast quantities of data. An investigator could also use these charts as a case management tool, adding notes and comments to nodes during the investigation.
Visualize your own transaction data?
This example demonstrates how transaction visualization using tools like KeyLines and ReGraph can rapidly identify interesting areas of activity for more detailed investigation, and improve understanding of events that would be harder to analyze with other techniques.