Facial Expression Recognition Using Convolutional Neural Network with Attention Module

Habib Bahari Khoirullah - Brawijaya University, Malang, East Java, 65145, Indonesia
Novanto Yudistira - Brawijaya University, Malang, East Java, 65145, Indonesia
Fitra Abdurrachman Bachtiar - Brawijaya University, Malang, East Java, 65145, Indonesia


DOI: http://dx.doi.org/10.30630/joiv.6.4.963

Abstract


Human Activity Recognition (HAR) is the recognition of human activities, that is, the movements an individual performs with specific body parts. One branch of HAR is the recognition of human emotion. Facial expressions are vital in human communication, helping to convey emotional states and intentions. Facial Expression Recognition (FER) is therefore crucial to understanding how humans communicate, since misinterpreting facial expressions can lead to misunderstanding and difficulty reaching common ground. Deep learning can help in recognizing these facial expressions. To improve the performance of FER, we propose a ResNet with an attached attention module. This approach performs better than a standalone ResNet because the localization network and sampling grid allow the model to learn spatial transformations of the input image. Consequently, the model gains geometric invariance and picks up expression features from the human face, which yields better classification results. This study shows that the proposed method with attention outperforms the same method without it, reaching a test accuracy of 0.7789 on the FER dataset and 0.8327 on the FER+ dataset. We conclude that the attention module is essential when recognizing facial expressions with a Convolutional Neural Network (CNN). For further research we advise, first, evaluating on more datasets besides FER and FER+, and second, adding a scheduler that decreases the learning rate during training.
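The localization network and sampling grid described above match the spatial transformer mechanism of Jaderberg et al. (2015). As a rough illustration, the following is a minimal PyTorch sketch of such an attention block attached to a ResNet backbone; the class names (SpatialAttention, FERModel), the ResNet-18 depth, and the assumption of FER-style 48x48 face images replicated to three channels are ours for illustration, not the paper's exact configuration.

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

class SpatialAttention(nn.Module):
    # Spatial-transformer-style attention: a small localization network
    # predicts the 6 parameters of an affine transform, and a sampling
    # grid resamples the input so the backbone sees a spatially
    # normalized face region.
    def __init__(self, in_channels=3):
        super().__init__()
        self.localization = nn.Sequential(
            nn.Conv2d(in_channels, 8, kernel_size=7),
            nn.MaxPool2d(2), nn.ReLU(True),
            nn.Conv2d(8, 10, kernel_size=5),
            nn.MaxPool2d(2), nn.ReLU(True),
            nn.AdaptiveAvgPool2d(4),
        )
        self.fc_loc = nn.Sequential(
            nn.Linear(10 * 4 * 4, 32), nn.ReLU(True),
            nn.Linear(32, 6),
        )
        # Start from the identity transform so early training is stable.
        self.fc_loc[-1].weight.data.zero_()
        self.fc_loc[-1].bias.data.copy_(
            torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

    def forward(self, x):
        theta = self.fc_loc(self.localization(x).flatten(1)).view(-1, 2, 3)
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)

class FERModel(nn.Module):
    # Attention module in front of a ResNet classifier
    # (7 classes for FER2013; FER+ uses 8).
    def __init__(self, num_classes=7):
        super().__init__()
        self.stn = SpatialAttention(in_channels=3)
        self.backbone = torchvision.models.resnet18(num_classes=num_classes)

    def forward(self, x):
        return self.backbone(self.stn(x))

The learning-rate scheduler suggested for future work would correspond, in a setup like this, to wrapping the optimizer in something such as torch.optim.lr_scheduler.StepLR so the rate decays as training progresses.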

Keywords


Facial expression recognition; attention; CNN.
