Text data augmentations: permutation, antonyms and negation

Haralabopoulos, G. (ORCID: https://orcid.org/0000-0002-2142-4975), Torres, M. T., Anagnostopoulos, I. and McAuley, D. (2021) Text data augmentations: permutation, antonyms and negation. Expert Systems with Applications, 177, 114769. ISSN 0957-4174
It is advisable to refer to the publisher's version if you intend to cite from this work. DOI: 10.1016/j.eswa.2021.114769

Abstract/Summary

Text has traditionally been used to train automated classifiers for a multitude of purposes, such as classification, topic modelling and sentiment analysis. State-of-the-art LSTM classifiers require a large number of training examples to avoid biases and generalise successfully. Labelled data greatly improves classification results, but not all modern datasets include large numbers of labelled examples. Labelling is a complex task that can be expensive, time-consuming, and can introduce biases. Data augmentation methods create synthetic data based on existing labelled examples, with the goal of improving classification results. These methods have been used successfully in image classification tasks, and recent research has extended them to text classification. We propose a method that uses sentence permutations to augment an initial dataset while retaining key statistical properties of that dataset. We evaluate our method with eight different datasets and a baseline Deep Learning process. This permutation method significantly improves classification accuracy, by an average of 4.1%. We also propose two further text augmentations that reverse the classification of each augmented example: antonym and negation. We test these two augmentations on three eligible datasets, and the results suggest an improvement in classification accuracy, averaged across all datasets, of 0.35% for antonym and 0.4% for negation, compared to our proposed permutation augmentation.
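To illustrate the core idea of the permutation augmentation described above, the sketch below generates synthetic examples by reordering the sentences of a labelled document: each permutation keeps the same bag of sentences (and hence the word-level statistics) while presenting them in a new order, and inherits the original label. This is a minimal illustration, not the authors' implementation; the function name, the naive full-stop sentence splitter, and the sampling strategy are our own assumptions.

```python
import math
import random


def permutation_augment(document, label, max_new=5, seed=0):
    """Create label-preserving synthetic examples by permuting sentences.

    Splits the document on full stops (a deliberately naive splitter for
    illustration), then samples distinct sentence orderings. The original
    ordering is excluded, so every returned example differs from the input.
    """
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    rng = random.Random(seed)
    augmented = []
    seen = {tuple(sentences)}  # exclude the original ordering
    # Sample distinct orderings rather than enumerating all n! permutations.
    while len(augmented) < max_new:
        order = sentences[:]
        rng.shuffle(order)
        key = tuple(order)
        if key not in seen:
            seen.add(key)
            augmented.append((". ".join(order) + ".", label))
        # Stop early once every distinct permutation has been seen.
        if len(seen) >= math.factorial(len(sentences)):
            break
    return augmented
```

The antonym and negation augmentations mentioned above would instead modify the text (e.g. replacing a word with its antonym, or negating a clause) and flip the example's label, rather than preserving it as the permutation method does.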