Scaling up data mining techniques to large datasets using parallel and distributed processing

Stahl, F., Gabber, M. M. and Max, B. (2013) Scaling up data mining techniques to large datasets using parallel and distributed processing. In: Rausch, P., Sheta, A. F. and Ayesh, A. (eds.) Business Intelligence and Performance Management. Springer, pp. 243-259. ISBN 9781447148654

Full text not archived in this repository.

It is advisable to refer to the publisher's version if you intend to cite from this work.

To link to this item DOI: 10.1007/978-1-4471-4866-1_16


Advances in hardware and software technology enable us to collect, store and distribute data on a very large scale. Automatically discovering and extracting hidden knowledge in the form of patterns from these large data volumes is known as data mining. Data mining technology is not only a part of business intelligence, but is also used in many other application areas such as research, marketing and financial analytics. For example, medical scientists can use patterns extracted from historic patient data to determine whether a new patient is likely to respond positively to a particular treatment; marketing analysts can use patterns extracted from customer data to plan future advertisement campaigns; and finance experts are interested in patterns that forecast the development of certain stock market shares for investment recommendations. However, extracting knowledge in the form of patterns from massive data volumes imposes a number of computational challenges in terms of processing time, memory, bandwidth and power consumption. These challenges have led to the development of parallel and distributed data analysis approaches and the utilisation of Grid and Cloud computing. This chapter gives an overview of parallel and distributed computing approaches and how they can be used to scale up data mining to large datasets.

Item Type: Book or Report Section
Divisions: Science > School of Mathematical, Physical and Computational Sciences > Department of Computer Science
ID Code: 31267
