Space partitioning for scalable K-means

Pettinger, D; Di Fatta, Giuseppe

Download

Full text not archived in this repository.

Advice

Please see our End User Agreement.

It is advisable to refer to the publisher's version if you intend to cite from this work. See Guidance on citing.

Tools

Lists

Pettinger, D. and Di Fatta, G. (2010) Space partitioning for scalable K-means. In: Draghici, S., Khoshgoftaar, T. M., Palade, V., Pedrycz, W., Wani, M. A. and Zhu, X. (eds.) The Ninth International Conference on Machine Learning and Applications (ICMLA 2010). IEEE Press, pp. 319-324. ISBN 9781424492114 doi: 10.1109/ICMLA.2010.46

Abstract/Summary

K-Means is a popular clustering algorithm which adopts an iterative refinement procedure to determine data partitions and to compute their associated centres of mass, called centroids. The straightforward implementation of the algorithm is often referred to as `brute force' since it computes a proximity measure from each data point to each centroid at every iteration of the K-Means process. Efficient implementations of the K-Means algorithm have been predominantly based on multi-dimensional binary search trees (KD-Trees). A combination of an efficient data structure and geometrical constraints allow to reduce the number of distance computations required at each iteration. In this work we present a general space partitioning approach for improving the efficiency and the scalability of the K-Means algorithm. We propose to adopt approximate hierarchical clustering methods to generate binary space partitioning trees in contrast to KD-Trees. In the experimental analysis, we have tested the performance of the proposed Binary Space Partitioning K-Means (BSP-KM) when a divisive clustering algorithm is used. We have carried out extensive experimental tests to compare the proposed approach to the one based on KD-Trees (KD-KM) in a wide range of the parameters space. BSP-KM is more scalable than KDKM, while keeping the deterministic nature of the `brute force' algorithm. In particular, the proposed space partitioning approach has shown to overcome the well-known limitation of KD-Trees in high-dimensional spaces and can also be adopted to improve the efficiency of other algorithms in which KD-Trees have been used.

Altmetric Badge

Item Type	Book or Report Section
URI	https://centaur.reading.ac.uk/id/eprint/17498
Identification Number/DOI	10.1109/ICMLA.2010.46
Refereed	Yes
Divisions	Science > School of Mathematical, Physical and Computational Sciences > Department of Computer Science
Uncontrolled Keywords	KD-tree, approximate hierarchical clustering, binary space partitioning k-means, binary space partitioning trees, brute force algorithm, centroid, data partition, data point, data structure, distance computation, geometrical constraint, iterative refinement procedure, multidimensional binary search tree, scalable k-means clustering algorithm
Publisher	IEEE Press
Download/View statistics	View download statistics for this item

Related URLs

Deposit Details

University Staff: Request a correction | Centaur Editors: Update this record

Date Deposited:	04 May 2011 09:17	Date item deposited into CentAUR
Last Modified:	24 Mar 2024 05:13	Date item last modified