

Scalable real-time classification of data streams with concept drift

Tennant, M., Stahl, F., Rana, O. and Gomes, J. B. (2017) Scalable real-time classification of data streams with concept drift. Future Generation Computer Systems, 75. pp. 187-199. ISSN 0167-739X

Text (Open Access) - Published Version
· Available under License Creative Commons Attribution.

Text - Accepted Version
· Restricted to Repository staff only
· Available under License Creative Commons Attribution Non-commercial No Derivatives.

It is advisable to refer to the publisher's version if you intend to cite from this work.

DOI: 10.1016/j.future.2017.03.026

Abstract/Summary

Inducing adaptive predictive models in real time from high-throughput data streams is one of the most challenging areas of Big Data Analytics. The fact that data streams may contain concept drifts (changes of the pattern encoded in the stream over time) and are unbounded imposes unique challenges in comparison with predictive data mining from batch data. Several real-time predictive data stream algorithms exist; however, most approaches are not naturally parallel and are thus limited in their scalability. This paper highlights the Micro-Cluster Nearest Neighbour (MC-NN) data stream classifier. MC-NN is based on statistical summaries of the data stream and a nearest neighbour approach, which makes MC-NN naturally parallel. In its serial version, MC-NN is able to handle data streams: the data does not need to reside in memory and is processed incrementally. MC-NN is also able to adapt to concept drifts. This paper provides an empirical study of the serial algorithm's speed, adaptivity and accuracy. Furthermore, this paper discusses the new parallel implementation of MC-NN and its parallel properties, and provides an empirical scalability study.
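
The abstract describes MC-NN only at a high level. The following is a minimal, illustrative Python sketch of the general idea it names (incremental statistical summaries plus nearest-neighbour classification), written under explicit assumptions: each micro-cluster keeps additive sums (count, per-feature linear sum, per-feature squared sum), an instance is classified by the nearest centroid, and a per-cluster error counter with a hypothetical `error_threshold` triggers a split along the highest-variance feature as a crude response to drift. The names `MicroCluster`, `MicroClusterNN`, `learn_one`, `merge` and `error_threshold`, the counter updates and the split rule are all assumptions for illustration, not the paper's published algorithm.

```python
import math
from dataclasses import dataclass


@dataclass(eq=False)  # identity equality, so list.remove() drops exactly this object
class MicroCluster:
    """Additive statistical summary of absorbed instances (hypothetical layout)."""
    label: str
    n: int
    linear_sum: list[float]   # per-feature sum of values
    squared_sum: list[float]  # per-feature sum of squared values
    errors: int = 0           # misclassification counter, assumed here as a drift signal

    @classmethod
    def from_instance(cls, x: list[float], label: str) -> "MicroCluster":
        return cls(label, 1, list(x), [v * v for v in x])

    def centroid(self) -> list[float]:
        return [s / self.n for s in self.linear_sum]

    def variances(self) -> list[float]:
        return [max(sq / self.n - m * m, 0.0)
                for sq, m in zip(self.squared_sum, self.centroid())]

    def add(self, x: list[float]) -> None:
        """Absorb one instance incrementally; the raw instance is not stored."""
        self.n += 1
        for i, v in enumerate(x):
            self.linear_sum[i] += v
            self.squared_sum[i] += v * v

    def merge(self, other: "MicroCluster") -> None:
        """Combine summaries built on different workers: the sums simply add up."""
        self.n += other.n
        self.linear_sum = [a + b for a, b in zip(self.linear_sum, other.linear_sum)]
        self.squared_sum = [a + b for a, b in zip(self.squared_sum, other.squared_sum)]


class MicroClusterNN:
    def __init__(self, error_threshold: int = 2):
        self.error_threshold = error_threshold  # hypothetical split trigger
        self.clusters: list[MicroCluster] = []

    def _nearest(self, x, label=None):
        """Nearest centroid overall, or among clusters of one class if label is given."""
        pool = [c for c in self.clusters if label is None or c.label == label]
        return min(pool, key=lambda c: math.dist(x, c.centroid()), default=None)

    def predict(self, x: list[float]):
        nearest = self._nearest(x)
        return None if nearest is None else nearest.label

    def learn_one(self, x: list[float], label: str) -> None:
        nearest = self._nearest(x)
        if nearest is not None and nearest.label == label:
            nearest.add(x)
            nearest.errors = max(nearest.errors - 1, 0)  # reward a correct match
            return
        if nearest is not None:
            nearest.errors += 1  # penalise the wrongly labelled nearest cluster
        same_class = self._nearest(x, label)
        if same_class is None:
            self.clusters.append(MicroCluster.from_instance(x, label))
        else:
            same_class.add(x)
            same_class.errors += 1
        for c in [c for c in self.clusters if c.errors > self.error_threshold]:
            self._split(c)

    def _split(self, cluster: MicroCluster) -> None:
        """Replace a cluster by two seeds one standard deviation either side of
        its centroid along its highest-variance feature (assumed split rule)."""
        var = cluster.variances()
        dim = max(range(len(var)), key=var.__getitem__)
        offset = math.sqrt(var[dim])
        self.clusters.remove(cluster)
        for sign in (-1.0, 1.0):
            seed = cluster.centroid()
            seed[dim] += sign * offset
            self.clusters.append(MicroCluster.from_instance(seed, cluster.label))
```

Because each summary in this sketch is a set of sums, combining two summaries is plain addition, so partial summaries could be built on separate workers and combined in any order. That property is one plausible reading of why summary-based classifiers of this kind are described as naturally parallel.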

Item Type: Article
Refereed: Yes
Divisions: Faculty of Science > School of Mathematical, Physical and Computational Sciences > Department of Computer Science
ID Code: 70047
Publisher: Elsevier
