Performance Comparison Between the Naïve Bayes and K-Nearest Neighbor Algorithms for Breast Cancer Classification

Annisa Nugraheni; Rima Dias Ramadhani; Amalia Beladinna Arifa; Agi Prasetiadi

doi:10.20895/dinda.v2i1.391

Annisa Nugraheni, Institut Teknologi Telkom Purwokerto
Rima Dias Ramadhani, Institut Teknologi Telkom Purwokerto
Amalia Beladinna Arifa, Institut Teknologi Telkom Purwokerto
Agi Prasetiadi, Institut Teknologi Telkom Purwokerto

Keywords: Breast Cancer, Performance Test, Naïve Bayes, K-Nearest Neighbor, Confusion Matrix

Abstract

Breast cancer is the second most common cause of cancer death, after lung cancer. It occurs when cells in breast tissue begin to grow uncontrollably and disrupt the surrounding healthy tissue. A classification method is therefore needed to distinguish breast cancer patients from healthy people. Based on previous research, the Naïve Bayes and K-Nearest Neighbor algorithms are considered capable of classifying breast cancer. This study uses the Breast Cancer Coimbra dataset (2018) from the UCI Machine Learning Repository, which contains 116 records. The methods are evaluated using confusion matrix metrics (accuracy, precision, and recall) and the ROC-AUC curve. The purpose of this study is to compare the performance of the Naïve Bayes and K-Nearest Neighbor algorithms. Both algorithms are tested under several scenarios: testing on the data before and after normalization, testing with different ratios of training data to testing data, testing with different values of K in K-Nearest Neighbor, and testing with the strongest attributes selected by a Pearson correlation test. The results indicate that the highest average results of the Naïve Bayes algorithm are 69.12% accuracy, 64.90% healthy precision, 83% sick precision, 88% healthy recall, 61.11% sick recall, and an AUC of 0.82, which falls in the good classification category. Meanwhile, the highest average results of the K-Nearest Neighbor algorithm are 76.83% accuracy, 76% healthy precision, 80.21% sick precision, 74.18% healthy recall, 80.81% sick recall, and an AUC of 0.91, which falls in the excellent classification category.
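To make the evaluation terms in the abstract concrete, the following is a minimal sketch of min-max normalization, a K-Nearest Neighbor vote, and per-class precision and recall. It is not the authors' implementation. The function names, the two-feature toy values, and the choice of K = 3 are all assumptions made for illustration, and none of the numbers come from the Breast Cancer Coimbra dataset.

```python
from collections import Counter
from math import dist


def fit_min_max(rows):
    """Return per-column minima and maxima, learned from training rows only."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def apply_min_max(rows, lows, highs):
    """Rescale each column with the given bounds; constant columns map to 0.0."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points nearest to query (Euclidean)."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: dist(p[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def class_metrics(y_true, y_pred, positive):
    """Return (accuracy, precision, recall), treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical toy data: two features on very different scales.
train_x = [[1.0, 200.0], [2.0, 220.0], [1.5, 210.0],
           [8.0, 900.0], [9.0, 950.0], [8.5, 880.0]]
train_y = ["healthy", "healthy", "healthy", "sick", "sick", "sick"]
test_x = [[1.2, 205.0], [8.8, 940.0], [2.5, 300.0], [7.0, 800.0]]
test_y = ["healthy", "sick", "healthy", "healthy"]

# Normalize with bounds from the training split to avoid leaking test data.
lows, highs = fit_min_max(train_x)
train_n = apply_min_max(train_x, lows, highs)
test_n = apply_min_max(test_x, lows, highs)

y_pred = [knn_predict(train_n, train_y, q, k=3) for q in test_n]
sick_metrics = class_metrics(test_y, y_pred, positive="sick")
healthy_metrics = class_metrics(test_y, y_pred, positive="healthy")
print(y_pred)
print("sick:", sick_metrics)
print("healthy:", healthy_metrics)
```

Reporting precision and recall separately for the healthy and sick classes, as the abstract does, shows whether a classifier favors one class, which a single accuracy figure would hide.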