Optimisation techniques for parallel K-Means on MapReduce
Al Ghamdi, S., Di Fatta, G. and Stahl, F. Full text not archived in this repository. It is advisable to refer to the publisher's version if you intend to cite from this work. See Guidance on citing. Official URL: http://dx.doi.org/10.1007/978-3-319-23237-9_17 Abstract/SummaryThe K-Means algorithm is one the most efficient and widely used algorithms for clustering data. However, K-Means performance tends to get slower as data grows larger in size. Moreover, the rapid increase in the size of data has motivated the scientific and industrial communities to develop novel technologies that meet the needs of storing, managing, and analysing large-scale datasets known as Big Data. This paper describes the implementation of parallel K-Means on the MapReduce framework, which is a distributed framework best known for its reliability in processing large-scale datasets. Moreover, a detailed analysis of the effect of distance computations on the performance of K-Means on MapReduce is introduced. Finally, two optimisation techniques are suggested to accelerate K-Means on MapReduce by reducing distance computations per iteration to achieve the same deterministic results.
Deposit Details University Staff: Request a correction | Centaur Editors: Update this record |