What is Sparsification?

Data Mining Database Data Structure

The m by m proximity matrix for m data points can be defines as a dense graph in which each node is linked to some others and the weight of the edge between some group of nodes follow their pairwise proximity. Although each object has some method of similarity to each other object, for most data sets, objects are hugely same to a small number of objects and weakly same to most other objects.

This feature can be used to sparsify the proximity graph (matrix), by setting some low-similarity (high-dissimilarity) values to 0 before starting the actual clustering process. The sparsification can be implemented, for instance, by dividing all links that have a same (dissimilarity) below (above) a defined threshold or by maintaining only links to the k nearest neighbors of point. This method creates what is known as knearest neighbor graph.

The benefits of sparsification is as follows −

Data size is reduced − The amount of data that required to be processed to cluster the data is extremely decreased. Sparsification can remove more than 99% of the entries in a proximity matrix. Accordingly, the size of problems that can be managed is enhanced.

Clustering may work better − Sparsification methods keep the link to their closest neighbors of an object while dividing the connections to more distinct objects. This is in maintaining with the closest neighbor principle that the closest neighbors of an object influence belong to the similar class (cluster) as the object itself. This decrease the impact of noise and outliers and file the distinction among clusters.

Graph partitioning algorithms can be used − There has been a large amount of work on heuristic algorithms for discovering min-cut partitionings of sparse graphs, particularly in the space of parallel computing and the design of integrated circuits. Sparsification of the proximity graph creates it applicable to use graph partitioning algorithms for the clustering phase such as Opossum and Chameleon need graph partitioning.

Sparsification of the proximity graph must be regarded as an original step before the need of actual clustering algorithms. A best sparsification can leave the proximity matrix divide into connected elements correlating to the desired clusters, but in practice, this appears.

It is simply for an individual edge to connect two clusters or for an individual cluster to be divided into multiple disconnected subclusters. Indeed, it can see when Jarvis-Patrick and SNN use density-based clustering, the sparse proximity graph is changed to yield a new proximity graph. This new proximity graph can be sparsified. Clustering algorithms operate with the proximity graph that is the outcomes of all these preprocessing procedure.

Ginni

Updated on: 14-Feb-2022

445 Views

Kickstart Your Career

Get certified by completing the course

Get Started