Accessibility navigation


Fault tolerant decentralised K-Means clustering for asynchronous large-scale networks

Di Fatta, G., Blasa, F., Cafiero, S. and Fortino, G. (2013) Fault tolerant decentralised K-Means clustering for asynchronous large-scale networks. Journal of Parallel and Distributed Computing, 73 (3). pp. 317-329. ISSN 07437315

[img]
Preview
Text - Accepted Version
· Please see our End User Agreement before downloading.

198kB

It is advisable to refer to the publisher's version if you intend to cite from this work. See Guidance on citing.

To link to this item DOI: 10.1016/j.jpdc.2012.09.009

Abstract/Summary

The K-Means algorithm for cluster analysis is one of the most influential and popular data mining methods. Its straightforward parallel formulation is well suited for distributed memory systems with reliable interconnection networks, such as massively parallel processors and clusters of workstations. However, in large-scale geographically distributed systems the straightforward parallel algorithm can be rendered useless by a single communication failure or high latency in communication paths. The lack of scalable and fault tolerant global communication and synchronisation methods in large-scale systems has hindered the adoption of the K-Means algorithm for applications in large networked systems such as wireless sensor networks, peer-to-peer systems and mobile ad hoc networks. This work proposes a fully distributed K-Means algorithm (EpidemicK-Means) which does not require global communication and is intrinsically fault tolerant. The proposed distributed K-Means algorithm provides a clustering solution which can approximate the solution of an ideal centralised algorithm over the aggregated data as closely as desired. A comparative performance analysis is carried out against the state of the art sampling methods and shows that the proposed method overcomes the limitations of the sampling-based approaches for skewed clusters distributions. The experimental analysis confirms that the proposed algorithm is very accurate and fault tolerant under unreliable network conditions (message loss and node failures) and is suitable for asynchronous networks of very large and extreme scale.

Item Type:Article
Refereed:Yes
Divisions:Science > School of Mathematical, Physical and Computational Sciences > Department of Computer Science
ID Code:29421
Publisher:Elsevier

Downloads

Downloads per month over past year

University Staff: Request a correction | Centaur Editors: Update this record

Page navigation