An improved multi-labelled LSTM toxic comment classification

Muhammad Abubakar; Aminu Tukur; Usman Bukar Usman

Abstract:

Text classification dates back to the early 1960s. It assigns text to predefined categories. One technique used for text classification is long short-term memory (LSTM), an artificial recurrent neural network architecture. Today, people all around the world express their opinions and discuss them with others on social media. In such a setting, disagreements naturally arise from differences in opinion. These discussions can turn hostile, escalate into fights on social media platforms, and lead to offensive language, termed toxic comments. To identify online hate speech, a large number of scientific studies have applied Natural Language Processing in combination with Machine Learning and Deep Learning methods. One challenge for toxic comment classifiers is the out-of-vocabulary problem: the occurrence of words that are not present in the training data. Long-range dependencies are another challenge: the toxicity of a comment often depends on expressions made in its early parts, which is especially problematic for longer comments. A further challenge is the low accuracy of comment classification techniques. In this work, the number of training epochs was used to improve the accuracy of the LSTM classifier. The number of epochs tends to improve the accuracy of the classifier because it positively affects the speed and quality of the learning process. We obtained an improvement of 0.4068 in precision, 0.2871 in recall, 0.2293 in F1, and 0.4291 in accuracy.
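The four reported metrics can be illustrated for a multi-label setting with a minimal sketch. The abstract does not state how the scores were averaged across labels, so micro-averaging for precision, recall and F1, and exact-match (subset) accuracy are assumptions here, and the function name `multilabel_metrics` and the toy data are hypothetical.

```python
from typing import Dict, List


def multilabel_metrics(y_true: List[List[int]], y_pred: List[List[int]]) -> Dict[str, float]:
    """Micro-averaged precision/recall/F1 and exact-match accuracy.

    Each row is one comment; each column is one binary label (1 = label applies).
    The averaging scheme is an assumption, not taken from the paper.
    """
    tp = fp = fn = 0
    exact_matches = 0
    for true_row, pred_row in zip(y_true, y_pred):
        # Count label-level outcomes, pooled over all labels (micro-averaging).
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
        # A comment counts as correct only if every label matches.
        if list(true_row) == list(pred_row):
            exact_matches += 1

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    accuracy = exact_matches / len(y_true) if y_true else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Toy example: three comments, three hypothetical labels.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 1]]
scores = multilabel_metrics(y_true, y_pred)
```

An improvement figure such as those reported above would then be the difference between the scores of two models evaluated on the same test set.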

JASIC Volume 1, Issue 2 (2020)
