Affective computing: detecting emotional facial expressions and classifying user affective state

Sannasi, M. V. (2024) Affective computing: detecting emotional facial expressions and classifying user affective state. PhD thesis, University of Reading.
It is advisable to refer to the publisher's version if you intend to cite from this work.

DOI: 10.48683/1926.00118402

Abstract/Summary

The ability to identify emotional facial expressions has several areas of application concerning health, safety, and wellbeing (e.g., road safety management, national/homeland security, healthcare monitoring). A frightened crowd or a person in tears can signal concerns about safety, and an angry mob, potential danger. However, human behavioural studies have observed delayed action by victims and onlookers due to fight-flight-freeze responses in such threatening situations. Thus, automation of emotion recognition can only be regarded as advantageous. However, Convolutional Neural Networks (CNNs), the artificial neural network algorithms frequently used for recognising facial expressions in humans, require large training sets and high-resolution images to achieve high classification power. Furthermore, low image quality has been found to be a technical challenge for CNNs. Additionally, large numbers of high-quality images impose heavy storage, transmission, and computational requirements. This has led to techniques such as image compression to reduce image size. Still, little consideration is given to the tasks performed after compression, leading to misclassification through loss of vital information. Hence the current technology for automatically identifying emotional expressions is severely limited, especially in lower-quality images.

Conversely, the human visual system is known to detect emotional facial expressions confidently within roughly 100-250 milliseconds and presumably automatically, as demonstrated by masking experiments (e.g., when a fearful face is quickly replaced by a neutral face) in which a physiological response to the fearful face persisted even in the absence of conscious awareness. In order to identify the features selected by the human visual system to make such accurate decisions, two approaches were used: 1) a singular value decomposition technique to decompose images into 20 levels of image degradation and ascertain the level(s) with the best accuracy (see the illustrative sketch following this abstract); and 2) eye-tracking technology to determine the type of information (facial features/areas) used to classify facial expressions in the presence of image degradation. The decomposed images were also used to test three different CNN frameworks to understand their performance in the presence of image degradation. Hence, this research was designed to understand CNNs' capabilities in facial expression recognition in the presence of image degradation and compare them with those of humans, in order to identify the most effective features (level and type of information) and thereby support affective automation.

The results suggest that the lowest level of granularity has a statistically significant negative influence on both human and CNN predictions. Yet, for human predictions, when granularity is considered together with facial expression type, the latter has a statistically significant influence, with the granularity levels either hindering or facilitating the decision depending on the image quality and the type of information available. Differences amongst human participants' confidence scores in their decisions are statistically significant for some emotional expressions, but fearful and sad expressions did not elicit such a difference. Pupil size differences are also statistically significant amongst some emotional expressions (e.g. fear), and deeper analyses suggest subconscious emotion processing. The features selected by both humans and CNNs indicate that identification of evidence-based characteristic facial features correlated with accurate classification of the emotional expressions. The recommendation, therefore, is to ensure that images retain the typical features required to classify emotional expressions, especially in the presence of image degradation, so that CNNs can classify them accurately. The features, however, must be representative of the population and should not be constrained by demographics.
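The degradation step described in the abstract rests on truncated singular value decomposition. The following is a minimal, illustrative sketch of that idea rather than the thesis pipeline: the count of 20 levels follows the abstract, but the linear rank schedule, greyscale input, and function names are assumptions introduced here.

# Minimal sketch (not the thesis pipeline): degrading a greyscale face image
# by truncated SVD. The 20-level count follows the abstract; the rank
# schedule and function names are assumptions for illustration only.
import numpy as np

def degrade_by_svd(image: np.ndarray, rank: int) -> np.ndarray:
    """Reconstruct a greyscale image from its top `rank` singular values."""
    U, s, Vt = np.linalg.svd(image.astype(float), full_matrices=False)
    approx = U[:, :rank] @ np.diag(s[:rank]) @ Vt[:rank, :]
    return np.clip(approx, 0, 255).astype(np.uint8)

def degradation_levels(image: np.ndarray, n_levels: int = 20) -> list:
    """Produce n_levels reconstructions, from coarsest (rank 1) to finest."""
    max_rank = min(image.shape)
    ranks = np.linspace(1, max_rank, n_levels, dtype=int)
    return [degrade_by_svd(image, int(r)) for r in ranks]

Each successive level retains more singular values and hence more spatial detail: rank 1 corresponds to the coarsest granularity, while the full rank reproduces the original image.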