A text mining framework for Big Data

Pavlopoulou, N., Abushwashi, A., Stahl, F. ORCID: https://orcid.org/0000-0002-4860-0203 and Scibetta, V. (2017) A text mining framework for Big Data. Expert Update, 17 (1). ISSN 1465-4091 (Special Issue on the 1st BCS SGAI Workshop on Data Stream Mining Techniques and Applications)
It is advisable to refer to the publisher's version if you intend to cite from this work.

Official URL: http://www.expertupdate.org/

Abstract/Summary

Text Mining is the ability to generate knowledge (insight) from text. This is a challenging task, especially when the target text databases are very large. Big Data has attracted much attention lately, both from academia and industry. A number of distributed databases, search engines and frameworks have been developed to handle the memory and time constraints involved in processing large amounts of data. However, there is no open-source end-to-end framework that can combine near real-time and batch processing of ingested big textual data along with user-defined options and the provision of specific, reliable insight from the data. This matters because such a framework makes new unstructured information accessible in near real-time, allows more personalised customer products to be created, and allows novel, unusual patterns to be found and acted on quickly. This work focuses on a proprietary, complete, near real-time automated classification framework for unstructured data that uses Natural Language Processing and Machine Learning algorithms on Apache Spark. The evaluation of our framework shows that its accuracy is comparable to that of some of the best approaches presented in the literature.