On expressiveness and uncertainty awareness in rule-based classification for data streams

Le, T., Stahl, F., Gaber, M. M., Gomes, J. B. and Di Fatta, G. (2017) On expressiveness and uncertainty awareness in rule-based classification for data streams. Neurocomputing, 265. pp. 127-141. ISSN 0925-2312

Text (Open Access) - Published Version
· Available under License Creative Commons Attribution.

Text - Accepted Version
· Restricted to Repository staff only
· Available under License Creative Commons Attribution Non-commercial No Derivatives.

It is advisable to refer to the publisher's version if you intend to cite from this work.

To link to this item DOI: 10.1016/j.neucom.2017.05.081

Abstract/Summary

Mining data streams is a core element of Big Data Analytics. It addresses the velocity of large datasets, one of the four aspects of Big Data, the other three being volume, variety and veracity. As data streams in, models are constructed using data mining techniques tailored towards continuous and fast model updates. The Hoeffding Inequality has been among the most successful approaches in learning theory for data streams. In this context, it is typically used to provide a statistical bound on the number of examples needed in each step of an incremental learning process. It has been applied to both classification and clustering problems. Despite the success of the Hoeffding Tree classifier and other data stream mining methods, such models fall short of explaining how their results (i.e. classifications) are reached (black boxing). The expressiveness of decision models in data streams is an area of research that has attracted less attention, despite its paramount practical importance. In this paper, we address this issue, adopting the Hoeffding Inequality as an upper bound to build decision rules which can help decision makers with informed predictions (white boxing). We term our novel method Hoeffding Rules, in reference to its use of the Hoeffding Inequality for estimating whether a rule induced from a smaller sample would be of the same quality as a rule induced from a larger sample. The new method brings a number of novel contributions, including handling uncertainty through abstaining, dealing with continuous data through Gaussian statistical modelling, and an experimentally proven fast algorithm. We conducted a thorough experimental study using benchmark datasets, showing the efficiency and expressiveness of the proposed technique when compared with the state-of-the-art.
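
For readers unfamiliar with the bound mentioned in the abstract, the following is a minimal sketch of the standard one-sided Hoeffding bound as it is commonly used in stream mining. It is not the authors' Hoeffding Rules implementation; the function names `hoeffding_bound` and `examples_needed`, and the parameters `value_range`, `delta` and `epsilon`, are hypothetical names chosen for illustration.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon for the one-sided Hoeffding bound.

    For n independent observations of a variable whose values span
    `value_range`, the true mean is, with probability at least 1 - delta,
    no more than epsilon away (in one direction) from the observed mean.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def examples_needed(value_range: float, delta: float, epsilon: float) -> int:
    """Smallest n for which hoeffding_bound(...) is at most epsilon."""
    if epsilon <= 0.0:
        raise ValueError("epsilon must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return math.ceil(
        value_range ** 2 * math.log(1.0 / delta) / (2.0 * epsilon ** 2)
    )


if __name__ == "__main__":
    # Illustrative values only: a quantity in [0, 1] and 95% confidence.
    print(hoeffding_bound(value_range=1.0, delta=0.05, n=1000))
    print(examples_needed(value_range=1.0, delta=0.05, epsilon=0.05))
```

The bound shrinks as more examples arrive, which is why an incremental learner can commit to a decision once the observed margin exceeds epsilon rather than waiting for the full stream.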

Item Type: Article
Refereed: Yes
Divisions: Faculty of Science > School of Mathematical, Physical and Computational Sciences > Department of Computer Science
ID Code: 71000
Uncontrolled Keywords: Modular Classification Rule Induction
Publisher: Elsevier
