Recognize basic emotional states in speech by machine learning techniques using mel-frequency cepstral coefficient features

Yang, N., Dey, N., Sherratt, R. S. and Shi, F. (2019) Recognize basic emotional states in speech by machine learning techniques using mel-frequency cepstral coefficient features. Journal of Intelligent & Fuzzy Systems. ISSN 1875-8967 (In Press)

Text - Accepted Version (826kB)
· Restricted to Repository staff only
· The Copyright of this document has not been checked yet. This may affect its availability.

It is advisable to refer to the publisher's version if you intend to cite from this work. See Guidance on citing.

To link to this item, use DOI: 10.3233/JIFS-179963

Abstract/Summary

Speech Emotion Recognition (SER) has been widely used in many fields, such as the smart home assistants commonly found on the market. A smart home assistant that could detect the user's emotion would improve communication between the user and the assistant, enabling the assistant to offer more productive feedback. The aim of this work is therefore to analyze emotional states in speech and to propose a suitable algorithm, weighing performance versus complexity, for deployment in smart home devices. Four emotional speech sets were selected from the Berlin Emotional Database (EMO-DB) as experimental data, and 26 MFCC features were extracted from each type of emotional speech to identify the emotions of happiness, anger, sadness and neutrality. Speaker-independent SER experiments were then conducted using a Back Propagation Neural Network (BPNN), an Extreme Learning Machine (ELM), a Probabilistic Neural Network (PNN) and a Support Vector Machine (SVM). Taking both recognition accuracy and processing time into account, this work shows that SVM performed best among the four methods, making it a good candidate for deploying SER in smart home devices. SVM achieved an overall accuracy of 92.4% while having low computational requirements for both training and testing. We conclude that MFCC features and SVM classification models are highly effective for the automatic prediction of emotion in speaker-independent experiments.
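The abstract names the pipeline (MFCC features, a classifier, speaker-independent testing) but not its settings. As a rough illustration only, the sketch below computes an utterance-level MFCC vector and classifies it with a Probabilistic Neural Network, one of the four classifiers compared, under leave-one-speaker-out evaluation. It is not the authors' code and does not reproduce their SVM. The following are all assumptions: the 16 kHz sample rate, 25 ms Hamming frames with a 10 ms hop, 40 mel filters, reading "26 MFCC features" as 26 cepstral coefficients averaged over frames, the PNN smoothing width `sigma`, and the helper names `mfcc_vector`, `pnn_predict` and `leave_one_speaker_out`. Pre-emphasis and delta coefficients are omitted.

```python
import math

# Hypothetical settings (not from the paper): 16 kHz audio, 25 ms frames, 10 ms hop.
SAMPLE_RATE, FRAME_LEN, HOP, N_FFT = 16000, 400, 160, 512


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft=N_FFT, sample_rate=SAMPLE_RATE):
    """Triangular filters evenly spaced on the mel scale, one row per filter."""
    n_bins = n_fft // 2 + 1
    lo, hi = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mels = [lo + i * (hi - lo) / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * n_bins
        for k in range(left, centre):
            row[k] = (k - left) / (centre - left)
        for k in range(centre, right):
            row[k] = (right - k) / (right - centre)
        bank.append(row)
    return bank


def power_spectrum(frame, n_fft=N_FFT):
    """Naive O(n_fft^2) DFT power spectrum; an FFT would be used in practice."""
    x = frame + [0.0] * (n_fft - len(frame))
    out = []
    for k in range(n_fft // 2 + 1):
        re = sum(v * math.cos(2 * math.pi * k * n / n_fft) for n, v in enumerate(x))
        im = sum(v * math.sin(2 * math.pi * k * n / n_fft) for n, v in enumerate(x))
        out.append((re * re + im * im) / n_fft)
    return out


def mfcc_vector(signal, n_filters=40, n_ceps=26):
    """Per-frame MFCCs averaged into one n_ceps-dimensional utterance vector."""
    if len(signal) < FRAME_LEN:
        raise ValueError("signal is shorter than one frame")
    bank = mel_filterbank(n_filters)
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (FRAME_LEN - 1)) for n in range(FRAME_LEN)]
    frames = []
    for start in range(0, len(signal) - FRAME_LEN + 1, HOP):
        frame = [s * w for s, w in zip(signal[start:start + FRAME_LEN], window)]
        spec = power_spectrum(frame)
        # Log mel energies, floored to avoid log(0) for empty filters.
        log_e = [math.log(max(sum(f * p for f, p in zip(row, spec)), 1e-10)) for row in bank]
        # Unnormalised DCT-II of the log energies; keep the first n_ceps coefficients.
        frames.append([sum(e * math.cos(math.pi * i * (m + 0.5) / n_filters)
                           for m, e in enumerate(log_e)) for i in range(n_ceps)])
    return [sum(col) / len(frames) for col in zip(*frames)]


def pnn_predict(train, x, sigma=1.0):
    """PNN: Gaussian Parzen density estimate per class (in log space), pick the largest."""
    best_label, best_score = None, -math.inf
    for label in sorted({y for _, y in train}):
        logs = [-sum((a - b) ** 2 for a, b in zip(f, x)) / (2 * sigma ** 2)
                for f, y in train if y == label]
        peak = max(logs)
        score = peak + math.log(sum(math.exp(v - peak) for v in logs)) - math.log(len(logs))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def leave_one_speaker_out(samples, sigma=1.0):
    """samples: (features, emotion, speaker); each speaker is tested on a model trained without them."""
    correct = 0
    for spk in sorted({s for _, _, s in samples}):
        train = [(f, y) for f, y, s in samples if s != spk]
        correct += sum(pnn_predict(train, f, sigma) == y for f, y, s in samples if s == spk)
    return correct / len(samples)
```

A call such as `leave_one_speaker_out([(mfcc_vector(wav), emotion, speaker), ...])` returns the fraction of utterances classified correctly when each speaker is held out in turn, which is what makes the evaluation speaker-independent. In practice the naive DFT would be replaced by an FFT (for example `numpy.fft.rfft`), and the features would be standardised before classification, because the Gaussian kernel is sensitive to feature scale.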

Item Type: Article
Refereed: Yes
Divisions: Faculty of Life Sciences > School of Biological Sciences > Biomedical Sciences
Faculty of Life Sciences > School of Biological Sciences > Department of Bio-Engineering
ID Code: 88046
Publisher: IOS Press
Publisher Statement: Special issue: Applied Machine Learning & Management of Volatility, Uncertainty, Complexity and Ambiguity
