What does a typical CNN “see” in an emotional facial image?

Sannasi, M. V., Kyritsis, M. ORCID: https://orcid.org/0000-0002-7151-1698 and Gray, K. L. H. ORCID: https://orcid.org/0000-0002-6071-4588 (2023) What does a typical CNN “see” in an emotional facial image? In: Proceedings of the 9th World Congress on Electrical Engineering and Computer Systems and Sciences (EECSS’23), 3-5 Aug 2023, London, United Kingdom, https://doi.org/10.11159/mvml23.114.


DOI: https://doi.org/10.11159/mvml23.114

Abstract/Summary

The objective of this research is to understand the current capabilities of artificial neural network algorithms and contrast them with the human visual system, in order to identify the most effective features to support affective automation. This can, in turn, help optimise storage and transmission resources by establishing what level of image information is sufficient to support, and potentially accelerate, accurate identification of emotional facial expressions. The first part of our experiment, presented in this paper, focused on evaluating feature selection of facial expression images using machine learning. 70 images (10 examples of each of the seven basic emotions) were randomly selected from the NimStim dataset and split into training (56) and test (14) sets. The test images were then processed using Singular Value Decomposition to vary the level of information shown in each image. Next, the training set was used to train a Convolutional Neural Network with 18 layers (comprising convolutional, max pooling, dropout, flattening and activation layers) and 66,884,615 trainable parameters. The validation accuracy was 45%, and the confusion matrix showed that disgust was predicted at almost 100% accuracy, surprise at 55%, and sorrow/happiness/neutral at 46-47%. As expected, the granularity level of the test images affected the success of the predictions. A feature map visualisation was performed to demonstrate what the CNN "sees" in an image (i.e., its feature selection) in order to predict the type of emotional expression accurately. For the next phase of our experiment, we plan to contrast these features and this performance with those of the human visual system, using an experimental design with eye tracking.
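
As a concrete illustration of the image-degradation step described above, the following is a minimal Python sketch (not the authors' code) of truncated Singular Value Decomposition applied to a grayscale face image: keeping only the k largest singular values removes fine detail, approximating the varying "levels of information" in the test images. The file name and rank values are placeholders.

import numpy as np
from PIL import Image

def svd_truncate(img: np.ndarray, k: int) -> np.ndarray:
    # Reconstruct the image matrix from its k largest singular values only.
    U, s, Vt = np.linalg.svd(img.astype(float), full_matrices=False)
    approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    return np.clip(approx, 0, 255).astype(np.uint8)

face = np.asarray(Image.open("face.png").convert("L"))  # placeholder file name
for k in (5, 20, 80):                                   # illustrative granularity levels
    Image.fromarray(svd_truncate(face, k)).save(f"face_rank{k}.png")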
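
The abstract names the layer types but not the full architecture, so the following Keras sketch is illustrative only: it combines the listed layer types (convolution, max pooling, dropout, flattening and dense/activation layers) into a 7-way softmax classifier, but the filter counts, depth and input size are assumptions and will not reproduce the reported 18 layers or 66,884,615 trainable parameters.

import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(224, 224, 1)),        # assumed grayscale input size
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(7, activation="softmax"),    # one output per basic emotion class
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()                               # prints the trainable parameter count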
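
Feature map visualisation can be performed in several ways; one common approach (not necessarily the one used in the paper) is to build a probe model that returns the activations of each convolutional layer for a single test image, then plot a few channels per layer:

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

# Assumes `model` is the CNN sketched above and `face` is a 224x224
# grayscale test image as a NumPy array with values in 0-255.
conv_outputs = [layer.output for layer in model.layers
                if isinstance(layer, tf.keras.layers.Conv2D)]
probe = tf.keras.Model(inputs=model.inputs, outputs=conv_outputs)

x = face[np.newaxis, :, :, np.newaxis] / 255.0   # shape (1, 224, 224, 1)
for maps in probe.predict(x):                    # one activation array per conv layer
    n = min(8, maps.shape[-1])                   # plot the first few channels
    fig, axes = plt.subplots(1, n, figsize=(2 * n, 2))
    for i in range(n):
        axes[i].imshow(maps[0, :, :, i], cmap="viridis")
        axes[i].axis("off")
    plt.show()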

Item Type: Conference or Workshop Item (Paper)
Refereed: Yes
Divisions: Life Sciences > School of Psychology and Clinical Language Sciences > Department of Psychology; Henley Business School > Business Informatics, Systems and Accounting
ID Code: 116961
