Accessibility navigation


Space partitioning for scalable K-means

Pettinger, D. and Di Fatta, G. (2010) Space partitioning for scalable K-means. In: Draghici, S., Khoshgoftaar, T. M., Palade, V., Pedrycz, W., Wani, M. A. and Zhu, X. (eds.) The Ninth International Conference on Machine Learning and Applications (ICMLA 2010). IEEE Press, pp. 319-324. ISBN 9781424492114

Full text not archived in this repository.

To link to this article DOI: 10.1109/ICMLA.2010.46

Abstract/Summary

K-Means is a popular clustering algorithm which adopts an iterative refinement procedure to determine data partitions and to compute their associated centres of mass, called centroids. The straightforward implementation of the algorithm is often referred to as `brute force' since it computes a proximity measure from each data point to each centroid at every iteration of the K-Means process. Efficient implementations of the K-Means algorithm have been predominantly based on multi-dimensional binary search trees (KD-Trees). A combination of an efficient data structure and geometrical constraints allow to reduce the number of distance computations required at each iteration. In this work we present a general space partitioning approach for improving the efficiency and the scalability of the K-Means algorithm. We propose to adopt approximate hierarchical clustering methods to generate binary space partitioning trees in contrast to KD-Trees. In the experimental analysis, we have tested the performance of the proposed Binary Space Partitioning K-Means (BSP-KM) when a divisive clustering algorithm is used. We have carried out extensive experimental tests to compare the proposed approach to the one based on KD-Trees (KD-KM) in a wide range of the parameters space. BSP-KM is more scalable than KDKM, while keeping the deterministic nature of the `brute force' algorithm. In particular, the proposed space partitioning approach has shown to overcome the well-known limitation of KD-Trees in high-dimensional spaces and can also be adopted to improve the efficiency of other algorithms in which KD-Trees have been used.

Item Type:Book or Report Section
Refereed:Yes
Divisions:Faculty of Science > School of Systems Engineering
ID Code:17498
Uncontrolled Keywords:KD-tree, approximate hierarchical clustering, binary space partitioning k-means, binary space partitioning trees, brute force algorithm, centroid, data partition, data point, data structure, distance computation, geometrical constraint, iterative refinement procedure, multidimensional binary search tree, scalable k-means clustering algorithm
Publisher:IEEE Press

Centaur Editors: Update this record

Page navigation