Multi-lingual Speech Emotion Recognition System Using Machine Learning

Çolakoğlu E., Hızlısoy S., Arslan R. S.

Selcuk University Journal of Engineering Sciences, vol.23, no.1, pp.1-11, 2024 (Peer-Reviewed Journal)


Predicting emotions from speech across different languages with high accuracy has remained a challenging task in recent years. Existing studies in this field generally recognize emotions from speech in a single language, typically the researchers' native language, so their results do not generalize to multi-lingual environments. In this study, the Turkish emotional speech dataset created in our previous work was further expanded, and the Emo-db dataset was used to benchmark the proposed model. Pre-processing steps such as standardization, sorting, and resampling were applied to the data to improve model performance. The openSMILE toolbox, which is widely used in the literature, was used to extract features that carry information about the emotion in speech; thousands of features were obtained from the emobase2010 and emo_large feature sets. Eight machine learning algorithms were used to classify four emotions in the Turkish dataset and seven emotions in the Emo-db dataset. The best recognition rates, 92.73% for the Turkish dataset (1099 records) and 96.3% for the Emo-db dataset (535 records), were achieved using emobase2010 as the feature set and Logistic Regression as the classifier.
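
The standardization and Logistic Regression steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the data below is synthetic and merely stands in for acoustic feature vectors, the four classes only mirror the number of emotions in the Turkish dataset, and every name (`CENTERS`, `fit_scaler`, `train_logreg`, and so on) is hypothetical. The classifier is a plain multinomial logistic regression fitted by batch gradient descent, written with the standard library only.

```python
import math
import random

# Synthetic stand-in for acoustic feature vectors: 4 emotion classes,
# 2 features on very different scales (hypothetical values, not real data).
rng = random.Random(0)
CENTERS = [(0.0, 0.0), (0.0, 100.0), (6.0, 0.0), (6.0, 100.0)]
data = [([rng.gauss(cx, 0.5), rng.gauss(cy, 8.0)], label)
        for label, (cx, cy) in enumerate(CENTERS) for _ in range(40)]
rng.shuffle(data)
split = int(0.75 * len(data))
train, test = data[:split], data[split:]


def fit_scaler(rows):
    """Return per-feature mean and standard deviation of the given rows."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 for constant features to avoid division by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def scale(row, means, stds):
    """Z-score one feature vector using previously fitted statistics."""
    return [(v - m) / s for v, m, s in zip(row, means, stds)]


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(W, x):
    """Linear score per class; the last weight in each row is the bias."""
    return [sum(w * v for w, v in zip(row, x + [1.0])) for row in W]


def train_logreg(X, y, n_classes, lr=0.5, epochs=100):
    """Fit multinomial logistic regression by batch gradient descent."""
    n_inputs = len(X[0]) + 1
    W = [[0.0] * n_inputs for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * n_inputs for _ in range(n_classes)]
        for x, label in zip(X, y):
            probs = softmax(class_scores(W, x))
            for k in range(n_classes):
                err = probs[k] - (1.0 if k == label else 0.0)
                for j, v in enumerate(x + [1.0]):
                    grad[k][j] += err * v
        for k in range(n_classes):
            for j in range(n_inputs):
                W[k][j] -= lr * grad[k][j] / len(X)
    return W


def predict(W, x):
    """Return the index of the highest-scoring class."""
    scores = class_scores(W, x)
    return scores.index(max(scores))


# Scaling statistics come from the training split only, then apply to both.
means, stds = fit_scaler([x for x, _ in train])
X_train = [scale(x, means, stds) for x, _ in train]
y_train = [label for _, label in train]
W = train_logreg(X_train, y_train, n_classes=len(CENTERS))

hits = sum(predict(W, scale(x, means, stds)) == label for x, label in test)
accuracy = hits / len(test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Fitting the scaler on the training split alone keeps information from the held-out records out of the model. With real data, the synthetic vectors would be replaced by features extracted with openSMILE, and an established library implementation of the scaler and classifier would normally be preferred over this hand-written one.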