JASIC Volume 4, Issue 2 (2023)


Taiwo Kolajo, Emeka Ogbuju, Joy Ojochide Bello & Francisca Oladipo


Keywords: Speech analysis, Speech emotion recognition, Machine learning, Deep learning, RAVDESS


Speech Emotion Recognition Model Using Deep Learning

Abstract: Speech has long been recognized as the main form of communication between people and computers, and the development of human-computer interfaces has made such interaction possible. Although speech emotion recognition systems have advanced quickly in recent years, difficulties remain, such as the inability to recognize the emotions associated with depression and mood swings, which therapists could use to track their patients' moods. A model that detects the many emotions contributing to depression is therefore needed to improve doctor-patient relationships and increase the effectiveness of speech emotion recognition systems. In this paper, over 2,000 audio files were compiled: a locally curated dataset accounts for 60% of the total, and the remaining 40% was obtained from RAVDESS. To extract the relevant vocal features, we employed Mel-Frequency Cepstral Coefficients (MFCC), Zero Crossing Rate (ZCR), and Root Mean Square (RMS) energy. The model was built with TensorFlow using several sequential Conv1D layers with the ReLU activation function, and was trained with a batch size of 64 for 50 epochs. Seven emotional states were classified: anger, disgust, sadness, happiness, surprise, fear, and neutrality. Evaluated with a confusion matrix as the performance metric, the model achieved an accuracy of 96%.
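Two of the features named in the abstract, ZCR and RMS, can be computed directly from the raw waveform. The sketch below is a minimal illustration of those two definitions only — not the authors' implementation — using NumPy and a synthetic 440 Hz sine tone as input; in practice, libraries such as librosa compute these per short frame alongside the MFCCs.

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.sign(frame)
    signs[signs == 0] = 1  # treat exact zeros as positive
    return np.mean(signs[:-1] != signs[1:])

def root_mean_square(frame):
    """RMS energy of the frame."""
    return np.sqrt(np.mean(frame ** 2))

# Toy input: one second of a 440 Hz sine sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
frame = np.sin(2 * np.pi * 440 * t)

# A 440 Hz tone crosses zero about 2 * 440 times per second,
# so ZCR is roughly 2 * 440 / sr; a full-scale sine has RMS 1/sqrt(2).
print(zero_crossing_rate(frame))  # ≈ 0.055
print(root_mean_square(frame))    # ≈ 0.7071
```

Applied per short frame (e.g. 25 ms windows), these two scalars are stacked with the MFCC vectors to form the feature sequence fed to the Conv1D layers.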