Data ultrametricity and clusterability

28 Aug 2019  ·  Dan Simovici, Kaixun Hua ·

The increasing needs of clustering massive datasets and the high cost of running clustering algorithms poses difficult problems for users. In this context it is important to determine if a data set is clusterable, that is, it may be partitioned efficiently into well-differentiated groups containing similar objects. We approach data clusterability from an ultrametric-based perspective. A novel approach to determine the ultrametricity of a dataset is proposed via a special type of matrix product, which allows us to evaluate the clusterability of the dataset. Furthermore, we show that by applying our technique to a dissimilarity space will generate the sub-dominant ultrametric of the dissimilarity.

PDF Abstract
No code implementations yet. Submit your code now

Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here