- Jiang, Hanxi Heinrich [Browse]
- Senior thesis
- 69 pages
- Kpotufe, Samory [Browse]
- Singer, Amit [Browse]
- Princeton University. Department of Mathematics [Browse]
- Class year
- Summary note
- We present a new clustering algorithm based on first estimating the modes of the unknown
density f from sample points generated by f. Our notion of mode is generalized to include
points with densities similar to that of a local mode of f. Thus, we first identify sample points which
can be confidently clustered together. We call these the cluster cores. The next step then consists
of assigning the remaining points to cluster cores.
We make mild assumptions on the underlying distribution and the shapes of the clusters. We develop
mode estimation procedures based on the k-NN density estimate and use high probability finite-sample
uniform bounds on the k-NN density estimator to obtain theoretically strong guarantees.
The procedure can be implemented in O(nk log n) time. We show that the clustering algorithm is
competitive against the state-of-art (including DBSCAN, mean-shift, and spectral clustering) based
on simulations and real data experiments.