Princeton University Library Catalog

A practical clustering algorithm based on k-NN density mode estimation with provable learning bounds

Jiang, Hanxi Heinrich [Browse]
Senior thesis
Kpotufe, Samory [Browse]
Singer, Amit [Browse]
Princeton University. Department of Mathematics [Browse]
Class year:
69 pages
Summary note:
We present a new clustering algorithm based on first estimating the modes of the unknown density f from sample points generated by f. Our notion of mode is generalized to include points with densities similar to that of a local mode of f. Thus, we first identify sample points which can be confidently clustered together. We call these the cluster cores. The next step then consists of assigning the remaining points to cluster cores. We make mild assumptions on the underlying distribution and the shapes of the clusters. We develop mode estimation procedures based on the k-NN density estimate and use high probability finite-sample uniform bounds on the k-NN density estimator to obtain theoretically strong guarantees. The procedure can be implemented in O(nk log n) time. We show that the clustering algorithm is competitive against the state-of-art (including DBSCAN, mean-shift, and spectral clustering) based on simulations and real data experiments.