Random Forest Machine Learning for Spam Email Classification

Rizky Ageng; Rafdhani Faisal; Solahuddin Ihsan

doi:10.20895/dinda.v4i1.1363

Rizky Ageng, Institut Teknologi Telkom Purwokerto
Rafdhani Faisal, Institut Teknologi Telkom Purwokerto
Solahuddin Ihsan, Institut Teknologi Telkom Purwokerto

Keywords: Spam Email, Random Forest, Confusion Matrix, ROC-AUC, Randomized Search CV

Abstract

Email is a core element of digital communication: it facilitates information transfer and also serves as an advertising platform. Email spam, the sending of unsolicited commercial messages, has negative impacts such as consuming large amounts of resources and disrupting the user experience. Because sending messages to thousands of recipients is cheap and easy, spam is used to distribute product promotions, pornographic material, viruses, and irrelevant content. The impact includes lost time and damage to users' computer resources. To address this problem, email services provide spam filters that combine email content analysis with machine learning techniques. This research focuses on the Random Forest classification algorithm as the basis for filtering spam emails. Although Random Forest is known for strong classification performance, overfitting remains a risk. This study therefore adopts the Randomized Search CV method to identify the best hyperparameter combination, with the aim of keeping the model reliable across diverse and complex email datasets. With this approach, the research contributes to the development of effective solutions for reducing the impact of email spam in digital communication.
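
The approach described in the abstract, a Random Forest classifier tuned with a randomized hyperparameter search and evaluated with a confusion matrix and ROC-AUC, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn, uses a synthetic dataset from `make_classification` as a stand-in for real email features, and the search space, fold count, and iteration count are hypothetical choices made only for this example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic stand-in for an email feature matrix (assumption: the paper's
# dataset is not available here). Label 1 plays the role of "spam".
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=8,
    weights=[0.7, 0.3], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical search space. Limiting tree depth and raising the minimum
# leaf size are common ways to restrain overfitting in a random forest.
param_distributions = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 5, 10, 20],
    "min_samples_leaf": [1, 2, 4],
    "max_features": ["sqrt", "log2"],
}

# Sample 5 parameter combinations and score each with 3-fold cross-validation.
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,
    cv=3,
    scoring="roc_auc",
    random_state=0,
)
search.fit(X_train, y_train)

# Evaluate the refitted best model on the held-out split.
best_model = search.best_estimator_
cm = confusion_matrix(y_test, best_model.predict(X_test))
auc = roc_auc_score(y_test, best_model.predict_proba(X_test)[:, 1])

print("Best parameters:", search.best_params_)
print("Confusion matrix:\n", cm)
print(f"ROC-AUC: {auc:.3f}")
```

Randomized search evaluates only a sample of the parameter grid, which keeps tuning cost bounded as the search space grows. Scoring on a held-out split rather than the training data is what makes overfitting visible in the reported metrics.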