Using Multi-inception CNN for Face Emotion Recognition

Document Type: Original Article


1 Department of Computer and Electrical Engineering and Computer Science (CEECS), Florida Atlantic University, Boca Raton, FL, USA

2 Department of Computer and Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL, USA

3 Department of Computer Science, Memorial University of Newfoundland, NL, Canada



Emotion is an integral part of human behavior and shapes the way people communicate. Although human beings recognize and interpret facial expressions with ease, correctly identifying facial expressions remains a challenging task for computer systems. The main difficulties stem from the non-uniform structure of the face and from variations in lighting, facial structure, and pose. Several Convolutional Neural Network (CNN) approaches have been introduced for Face Emotion Recognition (FER), but these methods cannot fully capture the variations in facial characteristics.
In this study, we use the CMU face image dataset, covering four emotion classes (happy, sad, angry, and neutral), to develop a method for identifying facial emotions. Raw pixel values are fed into neural networks with different architectures, and the resulting accuracies are compared. The architectures evaluated are a Restricted Boltzmann Machine (RBM), Deep Belief Networks (DBN), a plain Convolutional Neural Network (CNN), and a multi-inception ensemble CNN. The last of these achieves considerably higher accuracy than the others: the proposed multi-inception CNN reaches slightly more than 87 percent, while the RBM model reaches 26.1 percent, the DBN performs almost identically at slightly more than 26 percent, and the plain CNN model reaches 55 percent.
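The abstract does not spell out the multi-inception architecture itself, but the building block the name refers to is well known: parallel convolution branches with different kernel sizes, concatenated along the channel axis. As a rough, illustrative sketch only (not the authors' implementation), the following NumPy code implements a generic inception-style module with arbitrary placeholder widths and random weights:

```python
import numpy as np

def conv2d_same(x, w):
    """Naive 'same'-padded 2-D convolution: x is (H, W, Cin), w is (k, k, Cin, Cout)."""
    k, _, _, c_out = w.shape
    h, wd, _ = x.shape
    p = k // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    out = np.zeros((h, wd, c_out))
    for i in range(h):
        for j in range(wd):
            # Dot each k x k x Cin patch with the filter bank.
            out[i, j] = np.tensordot(xp[i:i + k, j:j + k, :], w,
                                     axes=([0, 1, 2], [0, 1, 2]))
    return out

def maxpool_same(x, k=3):
    """k x k max pooling, stride 1, 'same' padding (pad with -inf so it is ignored)."""
    p = k // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)), constant_values=-np.inf)
    h, wd, _ = x.shape
    out = np.empty_like(x)
    for i in range(h):
        for j in range(wd):
            out[i, j] = xp[i:i + k, j:j + k, :].max(axis=(0, 1))
    return out

def inception_block(x, c1, c3, c5, cp, rng):
    """Inception-style block: parallel 1x1, 3x3, 5x5 conv branches plus a
    max-pool + 1x1 branch, ReLU'd and concatenated along the channel axis.
    Branch widths c1/c3/c5/cp and the random weights are placeholders."""
    cin = x.shape[2]
    branches = [
        conv2d_same(x, 0.1 * rng.standard_normal((1, 1, cin, c1))),
        conv2d_same(x, 0.1 * rng.standard_normal((3, 3, cin, c3))),
        conv2d_same(x, 0.1 * rng.standard_normal((5, 5, cin, c5))),
        conv2d_same(maxpool_same(x), 0.1 * rng.standard_normal((1, 1, cin, cp))),
    ]
    return np.concatenate([np.maximum(b, 0.0) for b in branches], axis=2)

rng = np.random.default_rng(0)
face = rng.standard_normal((8, 8, 1))        # stand-in for a tiny grayscale face patch
feat = inception_block(face, c1=4, c3=8, c5=2, cp=2, rng=rng)
print(feat.shape)                            # spatial size preserved; channels = 4+8+2+2 = 16
```

In a full model, several such blocks would be stacked and followed by a softmax layer over the four emotion classes; here the point is only that each block sees the input at multiple receptive-field sizes at once, which is what lets inception-style CNNs cope with the facial-structure variation the abstract mentions.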

Keywords: Face Emotion Recognition (FER), Restricted Boltzmann Machine (RBM), Deep Belief Network (DBN), multi-inception CNN.


