Scalability of efficient parallel K-Means

Lists

Tools

Pettinger, D. and Di Fatta, G. (2009) Scalability of efficient parallel K-Means. In: 5th IEEE International Conference on E-Science Workshops, 2009, 9-11 Dec. 2009, Oxford, United Kingdom, pp. 96-101, 10.1109/ESCIW.2009.5407991.

[thumbnail of DiFatta09-Pettinger-IEEE-eScience-CS-WS2009.pdf]

Preview

Text - Published Version
· Please see our End User Agreement before downloading.
126kB

It is advisable to refer to the publisher's version if you intend to cite from this work. See Guidance on citing.

To link to this item DOI: 10.1109/ESCIW.2009.5407991

Abstract/Summary

Clustering is defined as the grouping of similar items in a set, and is an important process within the field of data mining. As the amount of data for various applications continues to increase, in terms of its size and dimensionality, it is necessary to have efficient clustering methods. A popular clustering algorithm is K-Means, which adopts a greedy approach to produce a set of K-clusters with associated centres of mass, and uses a squared error distortion measure to determine convergence. Methods for improving the efficiency of K-Means have been largely explored in two main directions. The amount of computation can be significantly reduced by adopting a more efficient data structure, notably a multi-dimensional binary search tree (KD-Tree) to store either centroids or data points. A second direction is parallel processing, where data and computation loads are distributed over many processing nodes. However, little work has been done to provide a parallel formulation of the efficient sequential techniques based on KD-Trees. Such approaches are expected to have an irregular distribution of computation load and can suffer from load imbalance. This issue has so far limited the adoption of these efficient K-Means techniques in parallel computational environments. In this work, we provide a parallel formulation for the KD-Tree based K-Means algorithm and address its load balancing issues.

Item Type:	Conference or Workshop Item (Paper)
Refereed:	Yes
Divisions:	Science > School of Mathematical, Physical and Computational Sciences > Department of Computer Science
ID Code:	4499
Publisher Statement:	"©2009 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE."

Download Statistics

Downloads

Downloads per month over past year

Altmetric

Related URLs

Deposit Details

University Staff: Request a correction | Centaur Editors: Update this record

University of Reading

CentAUR: Central Archive at the University of Reading

Accessibility navigation

Scalability of efficient parallel K-Means

Abstract/Summary

Downloads

Page navigation

See also

Footer navigation