JASIC Volume 5, Issue 1 (2024)

Contributor(s)

Akampurira Paul, Semalulu Paul, Elly Gamukama & Kareyo Margaret
 

Keywords

High-dimensional datasets, early diagnosis, breast cancer, dimensionality reduction, artificial intelligence, and machine learning.
 


Exploring Dimensionality Reduction Techniques for Improved Breast Cancer Diagnosis

Abstract: Breast cancer diagnosis is a critical area of medical research, where the challenge lies not only in accurate identification but also in managing the complexity of high-dimensional datasets. This paper addresses that challenge by exploring dimensionality reduction techniques to enhance diagnostic accuracy. The primary objective was to apply dimensionality reduction methods to refine breast cancer diagnosis, with a focus on improving accuracy and interpretability. The study investigates the impact of preprocessing techniques on a high-dimensional dataset, aiming to uncover meaningful patterns for effective diagnostic models. The dataset comprises 569 observations and 30 attributes, and examination reveals a class imbalance (63% benign, 37% malignant). To address multicollinearity, we use Pearson correlation coefficients to identify and eliminate highly correlated features. The data are then transformed with min-max normalization so that all features are weighted uniformly. Principal Component Analysis (PCA) is then applied for dimensionality reduction. Scree plots and bi-plots show that the early principal components are effective in distinguishing benign from malignant cases. Our results demonstrate a 24% reduction in data dimensionality, affirming the efficiency of the process. The detailed findings emphasize the pivotal role of dimensionality reduction in refining breast cancer diagnosis toward more accurate, efficient, and interpretable models.
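The three preprocessing stages named in the abstract (Pearson correlation filtering, min-max normalization, PCA) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the 0.9 correlation threshold, the function names, and the synthetic data are all assumptions, since the abstract specifies neither a threshold nor the number of components retained.

```python
import numpy as np

def drop_correlated(X, threshold=0.9):
    """Return indices of columns kept after dropping any column whose absolute
    Pearson correlation with an already-kept column reaches the threshold.
    The threshold value is an assumption; the abstract does not state one."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in keep):
            keep.append(j)
    return keep

def min_max(X):
    """Rescale each column to the [0, 1] range."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant columns
    return (X - lo) / span

def pca(X, n_components):
    """PCA via SVD of the centered data.
    Returns the projected scores and the explained-variance ratio of every
    component (the quantity a scree plot displays)."""
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    scores = Xc @ Vt[:n_components].T
    return scores, explained

# Synthetic stand-in data (hypothetical): 569 rows, four independent features
# plus a fifth that is nearly a scaled copy of the first.
rng = np.random.default_rng(0)
base = rng.normal(size=(569, 4))
near_copy = 2.0 * base[:, 0] + rng.normal(scale=0.01, size=569)
X = np.column_stack([base, near_copy])

kept = drop_correlated(X, threshold=0.9)   # the redundant fifth column is dropped
X_scaled = min_max(X[:, kept])
scores, explained = pca(X_scaled, n_components=2)
```

Plotting `explained` against the component index gives a scree plot, and a scatter of the two columns of `scores` colored by class label gives the kind of view the abstract describes for separating benign from malignant cases.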