PMCRI: a parallel modular classification rule induction framework

Stahl, F. ORCID: https://orcid.org/0000-0002-4860-0203, Bramer, M. and Adda, M. (2009) PMCRI: a parallel modular classification rule induction framework. In: Machine Learning and Data Mining in Pattern Recognition. Lecture Notes in Computer Science (5632). Springer, pp. 148-162. ISBN 9783642030697

Full text not archived in this repository. It is advisable to refer to the publisher's version if you intend to cite from this work.

To link to this item DOI: 10.1007/978-3-642-03070-3_12

Abstract/Summary

In a world where massive amounts of data are recorded on a large scale, we need data mining technologies to gain knowledge from the data in a reasonable time. The Top Down Induction of Decision Trees (TDIDT) algorithm is a very widely used technology to predict the classification of newly recorded data. However, alternative technologies have been derived that often produce better rules but do not scale well on large datasets. One such alternative to TDIDT is the PrismTCS algorithm. PrismTCS performs particularly well on noisy data but does not scale well on large datasets. In this paper we introduce Prism and investigate its scaling behaviour. We describe how we improved the scalability of the serial version of Prism and investigate its limitations. We then describe our work to overcome these limitations by developing a framework to parallelise algorithms of the Prism family and similar algorithms. We also present the scale-up results of a first prototype implementation.