A text mining framework for Big Data

Pavlopoulou, Niki; Abushwashi, Aeham; Stahl, Frederic; Scibetta, Vittorio

Text
- Published Version


Pavlopoulou, N., Abushwashi, A., Stahl, F. ORCID: https://orcid.org/0000-0002-4860-0203 and Scibetta, V. (2017) A text mining framework for Big Data. Expert Update, 17 (1). ISSN 1465-4091 (Special Issue on the 1st BCS SGAI Workshop on Data Stream Mining Techniques and Applications)

Abstract/Summary

Text Mining is the ability to generate knowledge (insight) from text. This is a challenging task, especially when the target text databases are very large. Big Data has attracted much attention lately, both from academia and industry. A number of distributed databases, search engines and frameworks have been developed to handle the memory and time constraints, which are required to process a large amount of data. However, there is no open-source end-to-end framework that can combine near real-time and batch processing of ingested big textual data along with user-defined options and provision of specific, reliable insight from the data. This is important as this way new unstructured information is made accessible in near real-time, more personalised customer products can be created and novel unusual patterns can be found and actioned on quickly. This work focuses on a proprietary complete near real-time automated classification framework for unstructured data with the use of Natural Language Processing and Machine Learning algorithms on Apache Spark. The evaluation of our framework shows that it achieves a comparable accuracy with respect to some of the best approaches presented in the literature.

Item Type	Article
URI	https://centaur.reading.ac.uk/id/eprint/70108
Official URL	http://www.expertupdate.org/
Refereed	Yes
Divisions	Science > School of Mathematical, Physical and Computational Sciences > Department of Computer Science
Publisher	BCS Specialist Group on Artificial Intelligence

Funders:	Technology Strategy Board
Projects:	The University of Reading and Exonar Limited Funded by: Technology Strategy Board (509447 - £85,417) 1 January 2015 - 31 December 2016

Date Deposited:	28 Apr 2017 11:14
Last Modified:	10 May 2026 08:18