
Intelligent agents for fault tolerance: from multi-agent simulation to cluster-based implementation

Varghese, B., McKee, G. and Alexandrov, V. (2010) Intelligent agents for fault tolerance: from multi-agent simulation to cluster-based implementation. In: 2010 IEEE 24th International Conference on Advanced Information Networking and Applications Workshops (WAINA). IEEE, pp. 985-990. ISBN 9781424467013

Full text not archived in this repository.

To link to this article DOI: 10.1109/WAINA.2010.21


Recent research in multi-agent systems incorporates fault tolerance concepts but does not explore the extension and implementation of such ideas for large-scale parallel computing systems. The work reported in this paper investigates a swarm-array computing approach, namely 'Intelligent Agents'. A task to be executed on a parallel computing system is decomposed into sub-tasks, which are mapped onto agents that traverse an abstracted hardware layer. The agents intercommunicate across processors to share information in the event of a predicted core or processor failure, and to complete the task successfully. The feasibility of the approach is validated by simulations on an FPGA using a multi-agent simulator, and by implementing a parallel reduction algorithm on a computer cluster using the Message Passing Interface (MPI).
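The idea of mapping sub-tasks onto agents that move away from a failing core can be illustrated with a small, single-process sketch. This is not the authors' FPGA simulation or their MPI code: message passing is replaced by in-process data structures. It assumes that failures are predicted before computation starts, and it adopts a "move to the nearest healthy core" migration policy chosen here purely for illustration. The names `Agent` and `reduce_with_migration` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Agent:
    """Hypothetical agent carrying one sub-task (a slice of the input) and its partial result."""
    agent_id: int
    data: list[int]
    core: int
    partial: int = 0

    def compute(self) -> None:
        # The sub-task here is a local sum; the agent keeps the result with it.
        self.partial = sum(self.data)


def reduce_with_migration(values: list[int], n_cores: int, failing_cores: set[int]):
    """Sum `values` over `n_cores` simulated cores.

    Assumption: cores in `failing_cores` are predicted to fail before computation,
    so their agents migrate to the nearest healthy core and no sub-task is lost.
    Returns (total, per-core partial sums for healthy cores).
    """
    healthy = [c for c in range(n_cores) if c not in failing_cores]
    if not healthy:
        raise RuntimeError("no healthy core left to host agents")

    # Decompose: one strided slice of the input per core, one agent per slice.
    agents = [Agent(i, values[i::n_cores], core=i) for i in range(n_cores)]

    for agent in agents:
        if agent.core in failing_cores:
            # Predicted failure: move to the nearest healthy core (ties go to the lower index).
            agent.core = min(healthy, key=lambda c: (abs(c - agent.core), c))
        agent.compute()

    # Reduction: each healthy core combines the partials of the agents it hosts,
    # then the per-core results are combined into the final total.
    per_core = {c: 0 for c in healthy}
    for agent in agents:
        per_core[agent.core] += agent.partial
    return sum(per_core.values()), per_core


if __name__ == "__main__":
    total, per_core = reduce_with_migration(list(range(1, 101)), 4, {2})
    print(total, per_core)  # 5050 {0: 1225, 1: 2525, 3: 1300}
```

In this run, the agent on core 2 moves to core 1, and core 1 then reports both its own partial sum and the migrated one. The total therefore matches a failure-free reduction. In a cluster setting, the final combining step would correspond to a collective reduction over processes rather than a dictionary sum.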

Item Type: Book or Report Section
Divisions: Faculty of Science > School of Systems Engineering
ID Code: 17489
Uncontrolled Keywords: cluster-based implementation; fault tolerance; intelligent agents; swarm-array computing
Additional Information: Conference was held in Perth, Australia, 20-23 Apr 2010.

