JASIC Volume 1, Issue 1 (2020)

Contributor(s)

Lawal Abbas Maryam & Ajiboye Adeleke Raheem

Keywords

Dimensionality reduction, clustering techniques, feature selection, traffic datasets, algorithm

The Effects of Dimensionality Reduction in the Classification of Network Traffic Datasets Via Clustering

Abstract:

Unsupervised learning has emerged as an alternative approach capable of accurately classifying the massive volumes of data generated by modern applications. It helps network administrators monitor traffic actively and deliver improved quality of service. Because class labels are absent, selecting the optimal, most discriminative features remains one of the critical challenges in unsupervised learning. The main objective of this research is to determine the effects of dimensionality reduction, through feature selection, on the clustering of internet traffic datasets. To this end, internet traffic datasets were retrieved, analyzed and clustered into application classes. Reduced versions of these datasets were then obtained using feature selection techniques and clustered, and the results for the original and reduced datasets were compared and evaluated. The effects of two feature selection techniques, Correlation-based Feature Selection (CFS) and the Information Gain Attribute Evaluator, were examined on the K-means, Expectation Maximization (EM) and Farthest-First clustering algorithms. The algorithms were evaluated on overall accuracy, precision, recall and Receiver Operating Characteristic (ROC) area. The results revealed that both CFS and information gain significantly increased the performance of all three algorithms.
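The general pipeline summarized above (rank features, keep a reduced subset, cluster the original and reduced data, compare accuracy) can be illustrated with a minimal Python sketch. It covers only two of the named components, information-gain ranking and Farthest-First clustering; CFS, K-means and EM are not shown. Everything in it is an assumption for illustration and not the authors' implementation: the helper names (`entropy`, `information_gain`, `rank_by_information_gain`, `farthest_first`, `purity`), the toy feature values, the fixed first centre, and the use of cluster purity as a simplified stand-in for overall accuracy. Information gain is computed against class labels, so the sketch assumes labelled flows are available for the ranking and evaluation steps, and it assumes the features are already discrete.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """IG(C; A) = H(C) - sum over values v of P(A = v) * H(C | A = v).

    Assumes the feature column is already discrete (e.g. binned).
    """
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    n = len(labels)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_by_information_gain(rows, labels):
    """Return (feature indices sorted by descending IG, per-feature IG scores)."""
    scores = [information_gain([r[j] for r in rows], labels)
              for j in range(len(rows[0]))]
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order, scores


def farthest_first(rows, k, first=0):
    """Farthest-first traversal: each new centre is the point farthest from all
    centres chosen so far; every point then joins its nearest centre.
    The first centre is fixed at index `first` (an assumption made here for
    reproducibility; it is often chosen at random)."""
    centres = [rows[first]]
    while len(centres) < k:
        nxt = max(rows, key=lambda p: min(math.dist(p, c) for c in centres))
        centres.append(nxt)
    return [min(range(k), key=lambda i: math.dist(p, centres[i])) for p in rows]


def purity(assignments, labels):
    """Fraction of points whose class is the majority class of their cluster."""
    clusters = {}
    for a, label in zip(assignments, labels):
        clusters.setdefault(a, []).append(label)
    correct = sum(Counter(m).most_common(1)[0][1] for m in clusters.values())
    return correct / len(labels)


# Hypothetical toy "flows": feature 0 separates the classes, feature 1 is a
# large-scale, weakly informative feature, and feature 2 is constant.
rows = [[1, 0, 0], [1, 30, 0], [2, 0, 0], [8, 30, 0], [9, 0, 0], [8, 30, 0]]
labels = ["web", "web", "web", "p2p", "p2p", "p2p"]

order, scores = rank_by_information_gain(rows, labels)
reduced = [[r[j] for j in order[:1]] for r in rows]  # keep the top-ranked feature

full_acc = purity(farthest_first(rows, k=2), labels)
reduced_acc = purity(farthest_first(reduced, k=2), labels)
print(order, round(full_acc, 3), round(reduced_acc, 3))  # [0, 1, 2] 0.667 1.0
```

On these hypothetical values, the large-scale second feature dominates the Euclidean distances, so clustering on all three features mixes the two classes; keeping only the top-ranked feature separates them. This only illustrates the mechanism by which feature selection can help a distance-based clusterer and is not a reproduction of the study's experiments.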