Optimisation techniques for parallel K-Means on MapReduce

Al Ghamdi, S., Di Fatta, G. and Stahl, F. ORCID: https://orcid.org/0000-0002-4860-0203 (2015) Optimisation techniques for parallel K-Means on MapReduce. In: Proceedings of the 8th International Conference on Internet and Distributed Computing Systems - Volume 9258, pp. 193-200.

Full text not archived in this repository. It is advisable to refer to the publisher's version if you intend to cite from this work.

Official URL: http://dx.doi.org/10.1007/978-3-319-23237-9_17

Abstract/Summary

The K-Means algorithm is one of the most efficient and widely used algorithms for clustering data. However, K-Means tends to run more slowly as datasets grow larger. Moreover, the rapid increase in the size of data has motivated the scientific and industrial communities to develop novel technologies that meet the needs of storing, managing, and analysing large-scale datasets known as Big Data. This paper describes the implementation of parallel K-Means on the MapReduce framework, a distributed framework best known for its reliability in processing large-scale datasets. It also presents a detailed analysis of the effect of distance computations on the performance of K-Means on MapReduce. Finally, two optimisation techniques are suggested to accelerate K-Means on MapReduce by reducing the number of distance computations per iteration while still producing the same deterministic results.
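To make the map/reduce structure mentioned in the abstract concrete, the following is a minimal single-machine sketch of one standard K-Means iteration written as a map step and a reduce step. It is an illustration only, not the authors' implementation and not either of the paper's two optimisation techniques, which this record does not describe. The function names (`nearest`, `map_phase`, `reduce_phase`, `kmeans_iteration`) are hypothetical, and Euclidean distance is assumed.

```python
import math
from collections import defaultdict


def nearest(point, centroids):
    """Return the index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def map_phase(points, centroids):
    """Map step: emit (cluster_index, (point, 1)) for every input point."""
    for p in points:
        yield nearest(p, centroids), (p, 1)


def reduce_phase(pairs, centroids):
    """Reduce step: average the points assigned to each cluster.

    A cluster that received no points keeps its previous centroid
    (an assumption of this sketch).
    """
    sums = {}
    counts = defaultdict(int)
    for idx, (p, n) in pairs:
        if idx in sums:
            sums[idx] = [a + b for a, b in zip(sums[idx], p)]
        else:
            sums[idx] = list(p)
        counts[idx] += n
    return [
        tuple(s / counts[i] for s in sums[i]) if i in sums else tuple(c)
        for i, c in enumerate(centroids)
    ]


def kmeans_iteration(points, centroids):
    """Run one map/reduce round and return the updated centroids."""
    return reduce_phase(map_phase(points, centroids), centroids)
```

In this unoptimised form, the map step computes one distance per point per centroid, so each iteration over n points and k centroids costs n × k distance computations. That per-iteration cost is what the abstract says the paper analyses and aims to reduce.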