Simplified inverse filter tracked affective acoustic signals classification incorporating deep convolutional neural networks

Kuang, Y., Wu, Q., Wang, Y., Dey, N., Shi, F., Crespo, R. G. and Sherratt, S. (2020) Simplified inverse filter tracked affective acoustic signals classification incorporating deep convolutional neural networks. Applied Soft Computing, 97 (A). 106775. ISSN 1568-4946

Text - Accepted Version
· Available under License Creative Commons Attribution Non-commercial No Derivatives.


It is advisable to refer to the publisher's version if you intend to cite from this work.

To link to this item DOI: 10.1016/j.asoc.2020.106775


Facial expressions, verbal cues, behaviors such as limb movements, and physiological features are vital channels for affective human interaction. Over the past decades, researchers have given machines the ability to recognize affective communication through these modalities. In addition to facial expressions, changes in the level, strength, weakness, and turbulence of sound also convey affect. Extracting affective feature parameters from acoustic signals has been widely applied in customer service, education, and the medical field. In this research, an improved AlexNet-based deep convolutional neural network (A-DCNN) is presented for acoustic signal recognition. First, the signals were preprocessed using simplified inverse filter tracking (SIFT) and the short-time Fourier transform (STFT); Mel-frequency cepstral coefficients (MFCC) and waveform-based segmentation, which are widely used to preprocess signals for neural networks, were then deployed to create the input for the deep neural network (DNN). Second, acoustic signals were acquired from the public Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) affective speech audio system. Using the acoustic signal preprocessing tools, the basic features of these sound signals were calculated and extracted. The proposed DNN, based on the improved AlexNet, achieves 95.88% accuracy in classifying eight affective classes of acoustic signals. Compared with linear classification methods, such as the decision table (DT) and Bayesian inference (BI), and with other deep neural networks, such as AlexNet+SVM and the recurrent convolutional neural network (R-CNN), the proposed method achieves high effectiveness in terms of accuracy (A), sensitivity (S1), positive predictive value (PP), and F1-score (F1).
Affective recognition and classification of acoustic signals can potentially be applied in industrial product design by measuring consumers’ affective responses to products: relevant affective sound data can be collected to understand the popularity of a product and, furthermore, to improve the product design and increase market responsiveness.
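
The abstract reports accuracy (A), sensitivity (S1), PP, and F1 without stating how they are computed. The following is a minimal sketch of the usual one-vs-rest definitions taken from a multi-class confusion matrix; it assumes PP denotes the positive predictive value (precision), and the function names and the three-class toy matrix are hypothetical illustrations, not the paper's code or data.

```python
from typing import Dict, List


def overall_accuracy(confusion: List[List[int]]) -> float:
    """Fraction of samples on the diagonal (rows = true class, columns = predicted)."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    return correct / total


def per_class_metrics(confusion: List[List[int]]) -> Dict[str, List[float]]:
    """One-vs-rest sensitivity, positive predictive value, and F1 for each class."""
    n = len(confusion)
    out: Dict[str, List[float]] = {"sensitivity": [], "ppv": [], "f1": []}
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                        # true class k, predicted otherwise
        fp = sum(confusion[r][k] for r in range(n)) - tp   # predicted k, true class otherwise
        sens = tp / (tp + fn) if tp + fn else 0.0
        ppv = tp / (tp + fp) if tp + fp else 0.0
        f1 = 2 * sens * ppv / (sens + ppv) if sens + ppv else 0.0
        out["sensitivity"].append(sens)
        out["ppv"].append(ppv)
        out["f1"].append(f1)
    return out


# Hypothetical 3-class confusion matrix (made-up counts, not results from the paper).
toy = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 1, 9],
]
metrics = per_class_metrics(toy)
print(overall_accuracy(toy), metrics["sensitivity"], metrics["ppv"], metrics["f1"])
```

For the eight RAVDESS classes the same functions would take an 8 x 8 matrix. Whether the paper averages the per-class values (macro or weighted) is not stated in the abstract, so no averaging is applied here.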

Item Type: Article
Divisions: Life Sciences > School of Biological Sciences > Biomedical Sciences
Life Sciences > School of Biological Sciences > Department of Bio-Engineering
ID Code: 93153

