This time we look at another exciting new way to understand your connected data: Clustering!
Currently in beta, the clustering function can be used to identify communities in your networks. It has been carefully optimized to balance speed and quality, providing insight into potential community structures.
Let’s take a closer look…
To understand clustering, we need to understand a network (or ‘graph’) concept called modularity.
Modularity is a way to measure how readily a network can be divided into sub-networks, which we call modules. A high modularity score means there are tightly connected modules, with relatively few links connecting the modules together. A low modularity score indicates the opposite: links are spread relatively evenly between nodes across the network.
In KeyLines, we calculate network modularity as the fraction of the links whose ends fall inside a group, minus the expected fraction if links were distributed at random. This gives us a score between 0 and 1.
For example, imagine a network with 100 nodes and 200 links. If one cluster has 25 nodes and 100 links – i.e. a quarter of the nodes, but half of the links – then half of the links fall inside the cluster where we would expect a quarter, so the modularity would be ½ - ¼ = ¼.
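To make the definition concrete, here is a minimal Python sketch of a modularity score. It is not the KeyLines implementation, which this post does not show. It assumes the textbook Newman-Girvan form, where the expected fraction for a cluster comes from its share of link ends rather than its share of nodes, and it assumes an undirected, unweighted network. The names `modularity`, `edges` and `cluster_of` are made up for this illustration.

```python
from collections import defaultdict


def modularity(edges, cluster_of):
    """Textbook modularity for an undirected, unweighted network.

    edges      -- list of (node, node) pairs, one per link
    cluster_of -- dict mapping each node to its cluster label
    (Assumed inputs for this sketch, not a KeyLines data format.)
    """
    m = len(edges)
    inside = defaultdict(int)  # links with both ends in the same cluster
    degree = defaultdict(int)  # link ends attached to each cluster
    for u, v in edges:
        degree[cluster_of[u]] += 1
        degree[cluster_of[v]] += 1
        if cluster_of[u] == cluster_of[v]:
            inside[cluster_of[u]] += 1
    # Observed fraction of links inside each cluster, minus the fraction
    # expected if the same link ends were paired up at random.
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by a single bridging link (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "all" for node in range(6)}

print(modularity(edges, two_groups))  # roughly 0.357 (exactly 5/14)
print(modularity(edges, one_group))   # 0.0
```

Splitting the network along the bridge scores well above lumping everything into one cluster, which is the intuition behind using modularity to compare partitions.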
Our clustering algorithm works by finding network partitions that maximize the modularity score. At the beginning, the algorithm treats each node as its own cluster. It then runs through the candidate configurations by moving nodes into clusters, keeping a configuration only if the modularity score increases.
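The steps described above can be sketched in a few lines of Python. This is a simplified greedy search written for illustration, not the optimized KeyLines algorithm: it only tries moving a node into the clusters of its direct neighbours, and it recomputes the whole score after every trial move, which would be far too slow for a large network. The function names and the input format are assumptions made for this example.

```python
from collections import defaultdict


def modularity(edges, label):
    """Textbook modularity for an undirected, unweighted network."""
    m = len(edges)
    inside = defaultdict(int)  # links with both ends in the same cluster
    degree = defaultdict(int)  # link ends attached to each cluster
    for u, v in edges:
        degree[label[u]] += 1
        degree[label[v]] += 1
        if label[u] == label[v]:
            inside[label[u]] += 1
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def greedy_clusters(nodes, edges):
    """Start with one cluster per node, then keep any move that raises the score."""
    label = {n: n for n in nodes}  # every node begins as its own cluster
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)

    best = modularity(edges, label)
    improved = True
    while improved:  # repeat full passes until no move helps
        improved = False
        for n in nodes:
            original = label[n]
            best_label = original
            # Try each neighbouring cluster in a fixed order.
            for candidate in sorted({label[x] for x in neighbours[n]}):
                if candidate == original:
                    continue
                label[n] = candidate
                score = modularity(edges, label)
                if score > best + 1e-12:  # keep only strict improvements
                    best, best_label = score, candidate
            label[n] = best_label  # commit the best move, or restore
            if best_label != original:
                improved = True
    return label, best


# Two triangles joined by a single bridging link (2, 3).
nodes = [0, 1, 2, 3, 4, 5]
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
label, score = greedy_clusters(nodes, edges)
print(label, score)
```

On this small network the search settles on the two triangles as separate clusters. A greedy search like this can stop at a partition that is good rather than the best possible, which is one reason practical implementations trade off speed against quality.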
The result is ‘optimal’ partitions for different numbers of modules:
Note: This works for both connected and disconnected graphs, and can also take link weightings into account.
Uncovering and understanding communities is a great way of gaining network insight.
Of course, these three use cases are just a tiny fraction of the potential ways clustering can help you find insight in your complex connected data.
Why not try KeyLines clustering capabilities for yourself?