Accessibility navigation


J-measure based hybrid pruning for complexity reduction in classification rules

Liu, H., Gegov, A. and Stahl, F. (2013) J-measure based hybrid pruning for complexity reduction in classification rules. WSEAS Transactions on Systems, 12 (9). pp. 433-446. ISSN 2224-2678

[img]
Preview
Text - Published Version
· Please see our End User Agreement before downloading.

764kB

It is advisable to refer to the publisher's version if you intend to cite from this work. See Guidance on citing.

Official URL: http://www.wseas.org/cms.action?id=6952

Abstract/Summary

Prism is a modular classification rule generation method based on the ‘separate and conquer’ approach that is alternative to the rule induction approach using decision trees also known as ‘divide and conquer’. Prism often achieves a similar level of classification accuracy compared with decision trees, but tends to produce a more compact noise tolerant set of classification rules. As with other classification rule generation methods, a principle problem arising with Prism is that of overfitting due to over-specialised rules. In addition, over-specialised rules increase the associated computational complexity. These problems can be solved by pruning methods. For the Prism method, two pruning algorithms have been introduced recently for reducing overfitting of classification rules - J-pruning and Jmax-pruning. Both algorithms are based on the J-measure, an information theoretic means for quantifying the theoretical information content of a rule. Jmax-pruning attempts to exploit the J-measure to its full potential because J-pruning does not actually achieve this and may even lead to underfitting. A series of experiments have proved that Jmax-pruning may outperform J-pruning in reducing overfitting. However, Jmax-pruning is computationally relatively expensive and may also lead to underfitting. This paper reviews the Prism method and the two existing pruning algorithms above. It also proposes a novel pruning algorithm called Jmid-pruning. The latter is based on the J-measure and it reduces overfitting to a similar level as the other two algorithms but is better in avoiding underfitting and unnecessary computational effort. The authors conduct an experimental study on the performance of the Jmid-pruning algorithm in terms of classification accuracy and computational efficiency. The algorithm is also evaluated comparatively with the J-pruning and Jmax-pruning algorithms.

Item Type:Article
Refereed:Yes
Divisions:Faculty of Science > School of Mathematical, Physical and Computational Sciences > Department of Computer Science
ID Code:34529
Uncontrolled Keywords:Data Mining, Machine Learning, Classification Rules, J-pruning, Jmax-pruning, Jmid-pruning, if-then rules, overfitting, J-measure
Publisher:WESAS

Downloads

Downloads per month over past year

University Staff: Request a correction | Centaur Editors: Update this record

Page navigation