Implementation of Random Forest Classification and Support Vector Machine Algorithms for Phishing Link Detection
Abstract
This study compares two machine learning methods, Support Vector Machine (SVM) and Random Forest Classification (RFC), for detecting phishing links. Phishing is an attempt to obtain sensitive information by masquerading as a trustworthy entity in electronic communication, so detecting phishing links is an important part of protecting users from this threat. We used a dataset of features extracted from URLs, such as URL length, the use of special characters, and domain information, and split it into training and testing sets at an 80:20 ratio. Both models were trained on the training set and evaluated on the testing set. The results show that each method has its own advantages. SVM, which handles high-dimensional data well and seeks a maximum-margin decision boundary, achieved high accuracy in detecting phishing links but required a longer training time than RFC. RFC, an ensemble method known for its resilience to overfitting, achieved accuracy nearly comparable to that of SVM, with a shorter training time and better interpretability. These findings suggest that RFC is better suited to scenarios that require quick results and easy interpretation, while SVM is more appropriate when accuracy is critical and computational resources are sufficient. The choice of phishing link detection method should therefore be matched to the specific needs and resource constraints at hand. These results can inform the development of more effective and efficient phishing detection systems.
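The data preparation described in the abstract can be illustrated with a minimal sketch that uses only the Python standard library. It is not the authors' pipeline: the feature set, the `SPECIAL_CHARS` list, the helper names `extract_features` and `split_80_20`, and the sample URLs are all assumptions chosen for illustration. The sketch turns each URL into a small numeric vector (length, special-character count, simple domain cues) and then applies a shuffled 80:20 split.

```python
import random
from urllib.parse import urlparse

# Assumed set of "special" characters; the paper does not list the exact ones.
SPECIAL_CHARS = "@-_?=&%"


def extract_features(url):
    """Map a URL to a small numeric feature vector (illustrative features only)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return [
        len(url),                                  # URL length
        sum(url.count(c) for c in SPECIAL_CHARS),  # special-character count
        host.count("."),                           # dots in the host name
        int(host.replace(".", "").isdigit()),      # host looks like a raw IPv4 address
        int(parsed.scheme == "https"),             # URL uses HTTPS
    ]


def split_80_20(rows, seed=42):
    """Shuffle a copy of rows and split it into 80% training and 20% testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical labelled URLs: 1 = phishing, 0 = legitimate.
samples = [
    ("https://www.example.com/", 0),
    ("https://docs.example.org/guide/intro", 0),
    ("https://shop.example.net/cart?id=42", 0),
    ("https://example.edu/admissions", 0),
    ("https://news.example.com/2024/article", 0),
    ("http://192.168.0.1/login?user=a@b.com", 1),
    ("http://secure-login.example.com.verify-account.test/update", 1),
    ("http://example.test/%61ccount/confirm?token=abc&next=pay", 1),
    ("http://10.0.0.7/bank/signin", 1),
    ("http://paypa1-support.example.test/reset_password?session=1", 1),
]

dataset = [(extract_features(url), label) for url, label in samples]
train, test = split_80_20(dataset)
print(len(train), len(test))  # 8 2
```

The resulting `train` and `test` lists hold `(feature_vector, label)` pairs that could be passed to any SVM or random forest implementation. A fixed seed is used so the split is reproducible, which matters when the training times and accuracies of two models are compared on the same data.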
Article Details
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Copyright Notice
Authors who publish with Journal of Informatics, Information System, Software Engineering and Applications (INISTA) agree to the following terms:
- Authors retain copyright and grant the journal right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike License (CC BY-SA 4.0) that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.