Optimisation techniques for parallel K-Means on MapReduce

Al Ghamdi, S., Di Fatta, G. and Stahl, F. (2015) Optimisation techniques for parallel K-Means on MapReduce. In: Proceedings of the 8th International Conference on Internet and Distributed Computing Systems - Volume 9258, pp. 193-200.

Full text not archived in this repository.

It is advisable to refer to the publisher's version if you intend to cite from this work.

Official URL: http://dx.doi.org/10.1007/978-3-319-23237-9_17

Abstract/Summary

The K-Means algorithm is one of the most efficient and widely used algorithms for clustering data. However, K-Means becomes slower as datasets grow in size. Moreover, the rapid increase in the size of data has motivated the scientific and industrial communities to develop novel technologies for storing, managing, and analysing large-scale datasets, known as Big Data. This paper describes the implementation of parallel K-Means on the MapReduce framework, a distributed framework best known for its reliability in processing large-scale datasets. It also presents a detailed analysis of the effect of distance computations on the performance of K-Means on MapReduce. Finally, two optimisation techniques are suggested that accelerate K-Means on MapReduce by reducing the number of distance computations per iteration while achieving the same deterministic results.
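
As an illustration only, a single K-Means iteration can be sketched as a map step and a reduce step. This is a minimal in-memory sketch, not the authors' implementation and not either of the two optimisation techniques, which this record does not describe. The function names (`map_point`, `reduce_cluster`, `kmeans_iteration`) are hypothetical, and the choice to keep the old centroid for an empty cluster is an assumption. In this plain form, every iteration computes one distance per point per centroid, which is the cost the abstract refers to.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_point(point: Point, centroids: List[Point]) -> Tuple[int, Tuple[Point, int]]:
    """Map step (hypothetical name): emit (nearest centroid index, (point, 1)).

    This computes one distance per centroid, so len(centroids) distances per point.
    """
    nearest = min(range(len(centroids)),
                  key=lambda i: squared_distance(point, centroids[i]))
    return nearest, (point, 1)


def reduce_cluster(values: Iterable[Tuple[Point, int]]) -> Point:
    """Reduce step (hypothetical name): average all points sharing one key."""
    total: List[float] = []
    count = 0
    for vector, n in values:
        total = list(vector) if not total else [t + v for t, v in zip(total, vector)]
        count += n
    return tuple(t / count for t in total)


def kmeans_iteration(points: List[Point], centroids: List[Point]) -> List[Point]:
    """Run one iteration: map every point, group by key, reduce each group."""
    groups: Dict[int, List[Tuple[Point, int]]] = defaultdict(list)
    for point in points:
        key, value = map_point(point, centroids)
        groups[key].append(value)  # stands in for the framework's shuffle phase
    # Assumption: a centroid with no assigned points is left unchanged.
    return [reduce_cluster(groups[i]) if i in groups else centroids[i]
            for i in range(len(centroids))]


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
    start = [(0.0, 0.0), (10.0, 10.0)]
    print(kmeans_iteration(data, start))
```

In a real MapReduce job the grouping would be done by the framework's shuffle phase across machines, and the iteration would be repeated until the centroids stop changing.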

Item Type: Conference or Workshop Item (Paper)
Refereed: Yes
Divisions: Faculty of Science > School of Mathematical, Physical and Computational Sciences > Department of Computer Science
ID Code: 68356
Uncontrolled Keywords: Clustering, K-Means, MapReduce, Parallel K-Means
Publisher: Springer-Verlag New York, Inc.
