An improved multi-labelled LSTM toxic comment classification
Abstract: The origins of text classification date back to the early 1960s.
Text classification assigns text to predefined categories. One of the
techniques used for text classification is long short-term memory (LSTM),
an artificial recurrent neural network architecture. Today, all around the
world, people express their opinions and discuss them with others via social
media. In such a setting, discussions naturally arise from differences in
opinion. These discussions can turn hostile, escalate into fights on social
media platforms, and lead to offensive language, termed toxic comments. To
identify online hate speech, a large number of scientific studies have been
devoted to using Natural Language Processing in combination with Machine
Learning and Deep Learning methods. Among the challenges facing toxic comment
classifiers is the out-of-vocabulary (OOV) problem, the occurrence of words
that are not present in the training data. Long-range dependencies are also a
challenge for toxic comment classification: the toxicity of a comment often
depends on expressions made in its early parts, which is especially
problematic for longer comments. Another
challenge is the low accuracy of comment classification techniques. The
number of training epochs was adjusted to improve the accuracy of the long
short-term memory model, since the epoch setting affects the speed and
quality of the learning process. We obtained an improvement of 0.4068 in
precision, 0.2871 in recall, 0.2293 in F1, and 0.4291 in accuracy.
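
As a rough illustration of how precision, recall, F1, and accuracy can be computed for multi-label predictions, the following minimal Python sketch counts true and false positives and negatives over every comment/label pair. The micro-averaging scheme, the function name `multilabel_metrics`, and the toy label matrices are assumptions made for illustration only; the abstract does not state how the reported values were calculated.

```python
from typing import Dict, List


def multilabel_metrics(y_true: List[List[int]],
                       y_pred: List[List[int]]) -> Dict[str, float]:
    """Micro-averaged precision, recall, F1, and per-label accuracy.

    Each row is one comment and each column is one binary label
    (hypothetical labels, e.g. toxic / insult / threat).
    """
    tp = fp = fn = tn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1      # label correctly predicted
            elif t == 0 and p == 1:
                fp += 1      # label predicted but not present
            elif t == 1 and p == 0:
                fn += 1      # label present but missed
            else:
                tn += 1      # label correctly left out

    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / total if total else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Toy data (assumed): 3 comments, 3 labels each.
y_true = [[1, 0, 1], [0, 0, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 1, 0]]
metrics = multilabel_metrics(y_true, y_pred)
print(metrics)
```

An improvement such as those quoted above would then be the difference between the metric values of two classifiers evaluated on the same test set. Other conventions (macro-averaging per label, or exact-match accuracy per comment) give different numbers, so the choice should be stated alongside any reported result.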