A practical clustering algorithm based on k-NN density mode estimation with provable learning bounds

Author/​Artist
Jiang, Hanxi Heinrich [Browse]
Format
Senior thesis
Language
English
Description
69 pages

Details

Advisor(s)
Kpotufe, Samory [Browse]
Contributor(s)
Singer, Amit [Browse]
Department
Princeton University. Department of Mathematics [Browse]
Class year
2015
Summary note
We present a new clustering algorithm based on first estimating the modes of the unknown density f from sample points generated by f. Our notion of mode is generalized to include points with densities similar to that of a local mode of f. Thus, we first identify sample points which can be confidently clustered together. We call these the cluster cores. The next step then consists of assigning the remaining points to cluster cores. We make mild assumptions on the underlying distribution and the shapes of the clusters. We develop mode estimation procedures based on the k-NN density estimate and use high probability finite-sample uniform bounds on the k-NN density estimator to obtain theoretically strong guarantees. The procedure can be implemented in O(nk log n) time. We show that the clustering algorithm is competitive against the state-of-art (including DBSCAN, mean-shift, and spectral clustering) based on simulations and real data experiments.

Supplementary Information