Facial Expression Recognition Using Convolutional Neural Network with Attention Module

Habib Bahari Khoirullah - Brawijaya University, Malang, East Java, 65145, Indonesia
Novanto Yudistira - Brawijaya University, Malang, East Java, 65145, Indonesia
Fitra Abdurrachman Bachtiar - Brawijaya University, Malang, East Java, 65145, Indonesia


DOI: http://dx.doi.org/10.30630/joiv.6.4.963

Abstract


Human Activity Recognition (HAR) is the recognition of human activities, that is, the movements an individual performs with specific body parts. One branch of HAR is the recognition of human emotion. Facial expressions are vital in human communication, helping to convey emotional states and intentions. Facial Expression Recognition (FER) is therefore crucial to understanding how humans communicate, since misinterpreting facial expressions can lead to misunderstanding and difficulty reaching common ground. Deep learning can help in recognizing these facial expressions. To improve the performance of FER, we propose a ResNet with an attached attention module. This approach performs better than a standalone ResNet because the localization network and sampling grid allow the model to learn spatial transformations of the input image. Consequently, the model gains geometric invariance and picks up expression features from the human face, which yields better classification results. This study shows that the proposed method with attention outperforms the same method without it, reaching a test accuracy of 0.7789 on the FER dataset and 0.8327 on the FER+ dataset. We conclude that the attention module is essential when recognizing facial expressions with a Convolutional Neural Network (CNN). For further research we advise, first, evaluating on more datasets besides FER and FER+, and second, adding a scheduler that decreases the learning rate during training.
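The localization network and sampling grid described above match the spatial transformer mechanism of Jaderberg et al. (2015). As a rough illustration, the following is a minimal PyTorch sketch of such an attention block attached to a ResNet backbone; the class names (SpatialAttention, FERModel), the ResNet-18 depth, and the assumption of FER-style 48x48 face images replicated to three channels are ours for illustration, not the paper's exact configuration.

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

class SpatialAttention(nn.Module):
    # Spatial-transformer-style attention: a small localization network
    # predicts the 6 parameters of an affine transform, and a sampling
    # grid resamples the input so the backbone sees a spatially
    # normalized face region.
    def __init__(self, in_channels=3):
        super().__init__()
        self.localization = nn.Sequential(
            nn.Conv2d(in_channels, 8, kernel_size=7),
            nn.MaxPool2d(2), nn.ReLU(True),
            nn.Conv2d(8, 10, kernel_size=5),
            nn.MaxPool2d(2), nn.ReLU(True),
            nn.AdaptiveAvgPool2d(4),
        )
        self.fc_loc = nn.Sequential(
            nn.Linear(10 * 4 * 4, 32), nn.ReLU(True),
            nn.Linear(32, 6),
        )
        # Start from the identity transform so early training is stable.
        self.fc_loc[-1].weight.data.zero_()
        self.fc_loc[-1].bias.data.copy_(
            torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

    def forward(self, x):
        theta = self.fc_loc(self.localization(x).flatten(1)).view(-1, 2, 3)
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)

class FERModel(nn.Module):
    # Attention module in front of a ResNet classifier
    # (7 classes for FER2013; FER+ uses 8).
    def __init__(self, num_classes=7):
        super().__init__()
        self.stn = SpatialAttention(in_channels=3)
        self.backbone = torchvision.models.resnet18(num_classes=num_classes)

    def forward(self, x):
        return self.backbone(self.stn(x))

The learning-rate scheduler suggested for future work would correspond, in a setup like this, to wrapping the optimizer in something such as torch.optim.lr_scheduler.StepLR so the rate decays as training progresses.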

Keywords


Facial expression recognition; attention; CNN.
