A scalable expressive ensemble learning using Random Prism: a MapReduce approach
Stahl, F. ORCID: https://orcid.org/0000-0002-4860-0203, May, D., Mills, H., Bramer, M. and Gaber, M. M. (2015) A scalable expressive ensemble learning using Random Prism: a MapReduce approach. Transactions on Large-Scale Data- and Knowledge-Centered Systems (Lecture Notes in Computer Science, vol. 9070), pp. 90-107.
It is advisable to refer to the publisher's version if you intend to cite from this work. DOI: 10.1007/978-3-662-46703-9_4

Abstract/Summary
The induction of classification rules from previously unseen examples is one of the most important data mining tasks in science as well as in commercial applications. In order to reduce the influence of noise in the data, ensemble learners are often applied. However, most ensemble learners are based on decision tree classifiers, which are affected by noise. The Random Prism classifier has recently been proposed as an alternative to the popular Random Forests classifier, which is based on decision trees. Random Prism is based on the Prism family of algorithms, which is more robust to noise. However, like most ensemble classification approaches, Random Prism does not scale well to large training data. This paper presents a thorough discussion of Random Prism and a recently proposed parallel version of it called Parallel Random Prism. Parallel Random Prism is based on the MapReduce programming paradigm. The paper provides, for the first time, a novel theoretical analysis of the proposed technique and an in-depth experimental study showing that Parallel Random Prism scales well on a large number of training examples, a large number of data features and a large number of processors. The expressiveness of the decision rules that our technique produces makes it a natural choice for Big Data applications, where informed decision making increases the user's trust in the system.
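To make the MapReduce framing of the abstract concrete, the sketch below shows, in plain Python, how a bagged ensemble of rule-based base classifiers can be arranged as a map phase (each worker trains one base learner on a bootstrap sample) followed by a reduce phase (collecting the rule sets for majority voting). This is an illustration only, not the authors' Parallel Random Prism implementation: the one-rule base learner and names such as train_base_classifier and reduce_rulesets are hypothetical stand-ins for the Prism-based learners described in the paper.

```python
# Illustrative MapReduce-style bagged ensemble (not the paper's implementation).
import random
from collections import Counter
from concurrent.futures import ProcessPoolExecutor


def train_base_classifier(sample):
    # "Map" task: train one stand-in base learner on a bootstrap sample.
    # It learns a trivial rule set keyed on the first feature value.
    counts = {}
    for features, label in sample:
        counts.setdefault(features[0], Counter())[label] += 1
    # Rule set: feature-0 value -> majority class among matching examples.
    return {value: cnt.most_common(1)[0][0] for value, cnt in counts.items()}


def bootstrap(data, rng):
    # Sample with replacement, as in bagging.
    return [rng.choice(data) for _ in range(len(data))]


def reduce_rulesets(rulesets):
    # "Reduce" task: gather the rule sets produced by the mappers.
    return list(rulesets)


def predict(ensemble, features, default="?"):
    # Majority vote across all base classifiers in the ensemble.
    votes = Counter(rules.get(features[0], default) for rules in ensemble)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy dataset of (features, label) pairs.
    data = [(("sunny",), "no"), (("rainy",), "yes"), (("sunny",), "no"),
            (("overcast",), "yes"), (("rainy",), "yes"), (("sunny",), "yes")]
    samples = [bootstrap(data, rng) for _ in range(10)]
    with ProcessPoolExecutor() as pool:  # mappers run in parallel processes
        ensemble = reduce_rulesets(pool.map(train_base_classifier, samples))
    print(predict(ensemble, ("sunny",)))  # majority vote of the ensemble
```

In an actual MapReduce deployment the bootstrap samples would be distributed to mapper nodes and the reducer would merge the emitted rule sets; the process pool here only stands in for that distribution to keep the example self-contained.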