
What does a typical CNN “see” in an emotional facial image?

Sannasi, M. V., Kyritsis, M. ORCID: https://orcid.org/0000-0002-7151-1698 and Gray, K. L. H. ORCID: https://orcid.org/0000-0002-6071-4588 (2023) What does a typical CNN “see” in an emotional facial image? In: Proceedings of the 9th World Congress on Electrical Engineering and Computer Systems and Sciences (EECSS’23), 3-5 Aug 2023, London, United Kingdom, https://doi.org/10.11159/mvml23.114.


It is advisable to refer to the publisher's version if you intend to cite this work.

To link to this item, use DOI: 10.11159/mvml23.114

Abstract/Summary

The objective of this research is to understand the current capabilities of artificial neural network algorithms and contrast them with the human visual system, in order to identify the most effective features to support affective automation. This can, in turn, help optimise the resources used for storage and transmission by revealing what level of information is needed to support, and potentially accelerate, accurate identification of emotional facial expressions. In the first part of our experiment, presented in this paper, we evaluated feature selection on facial expression images using machine learning. Seventy images (10 examples of each basic emotion) were randomly selected from the NIMSTIM dataset and split into training (56) and test (14) sets. The test images were then processed using Singular Value Decomposition (SVD) to vary the level of information shown in each image. Next, the training set was used to train a Convolutional Neural Network (CNN) with 18 layers (convolutional, max pooling, dropout, flattening and activation layers) and 66,884,615 trainable parameters. The validation accuracy was 45%, and the confusion matrix showed that disgust was predicted with almost 100% accuracy, surprise with 55%, and sorrow, happiness and neutral with 46-47%. As expected, the granularity of the test images affected prediction success. A feature map visualisation was performed to show what the CNN "sees" (i.e., the features it selects) in an image in order to predict the emotional expression type. In the next phase of our experiment, we plan to contrast these features and this performance with those of the human visual system, using an experimental design with eye tracking.
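The abstract does not state exactly how SVD was used to vary the information in each test image. A common approach, assumed here, is truncated SVD: keep only the k largest singular values and their singular vectors, which gives the best rank-k approximation of the image in the least-squares sense (the Eckart-Young theorem). The sketch below is a minimal illustration in Python with NumPy, assuming greyscale images stored as 2-D arrays. The function names `svd_approximation` and `retained_energy`, the random stand-in image and the rank levels are hypothetical and not taken from the paper.

```python
import numpy as np


def svd_approximation(image: np.ndarray, k: int) -> np.ndarray:
    """Rank-k approximation of a 2-D greyscale image via truncated SVD (assumed method)."""
    if image.ndim != 2:
        raise ValueError("expected a 2-D greyscale image")
    u, s, vt = np.linalg.svd(image.astype(np.float64), full_matrices=False)
    k = max(1, min(k, s.size))  # clamp k to the valid range [1, number of singular values]
    # Scale the first k left singular vectors by their singular values,
    # then recombine with the first k right singular vectors.
    return (u[:, :k] * s[:k]) @ vt[:k, :]


def retained_energy(image: np.ndarray, k: int) -> float:
    """Fraction of the squared singular-value 'energy' kept by the rank-k approximation."""
    s = np.linalg.svd(image.astype(np.float64), compute_uv=False)
    return float(np.sum(s[:k] ** 2) / np.sum(s ** 2))


# Hypothetical example: a random 64x64 array stands in for a NIMSTIM photo.
rng = np.random.default_rng(0)
img = rng.random((64, 64))
levels = [1, 5, 20, 64]  # illustrative ranks, not the values used in the paper
approximations = {k: svd_approximation(img, k) for k in levels}
for k in levels:
    err = np.linalg.norm(img - approximations[k])
    print(f"k={k:2d}  energy={retained_energy(img, k):.3f}  error={err:.3f}")
```

Because the smallest singular values are discarded first, sweeping k from small to large yields a graded series of test images running from coarse structure to full detail, with reconstruction error falling as k grows.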
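The abstract does not describe how the feature maps were extracted or which framework was used. As a hypothetical illustration only, the NumPy sketch below computes the feature map that a single convolutional filter produces (convolution, then ReLU activation, then 2x2 max pooling), which is the kind of intermediate activation a feature map visualisation displays. The hand-picked edge kernel, the toy 8x8 image and all function names are assumptions. In the trained CNN the filters are learned, and the maps would be read from its intermediate layers.

```python
import numpy as np


def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """'Valid' 2-D cross-correlation (stride 1, no padding), as computed by a CNN conv layer."""
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


def max_pool_2x2(x: np.ndarray) -> np.ndarray:
    """Non-overlapping 2x2 max pooling; a trailing odd row or column is dropped."""
    h, w = (x.shape[0] // 2) * 2, (x.shape[1] // 2) * 2
    return x[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


def feature_map(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Conv -> ReLU -> max-pool: the activation map produced by one filter."""
    return max_pool_2x2(relu(conv2d_valid(image, kernel)))


# Hypothetical input: an 8x8 image that is dark on the left and bright on the right.
image = np.zeros((8, 8))
image[:, 4:] = 1.0
# Sobel-style vertical-edge kernel (hand-picked here; a trained CNN learns its own filters).
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-2.0, 0.0, 2.0],
                   [-1.0, 0.0, 1.0]])
fmap = feature_map(image, kernel)
print(fmap)  # non-zero activations appear only in the column containing the edge
```

Displaying such maps as images, one per filter, is what lets a reader see which regions of a face (for example, edges around the mouth or eyes) drive a layer's response.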

Item Type: Conference or Workshop Item (Paper)
Refereed: Yes
Divisions: Life Sciences > School of Psychology and Clinical Language Sciences > Department of Psychology
Henley Business School > Digitalisation, Marketing and Entrepreneurship
ID Code: 116961
