Data Mining Analysis of K-means Algorithm and Decision Tree for Early Detection of Students at Risk of Dropping Out
Abstract
Dropout in higher education occurs when students are unable to complete their studies within a specified timeframe. It has become a significant concern because of its substantial impact on individuals, institutions, and society. This study aims to develop a model for the early prediction of student dropout (D.O.) potential using the K-means algorithm and decision trees. The research method consists of five stages: dataset collection, data preprocessing, K-means implementation, student data labeling, and decision tree implementation. The clustering produced four clusters. Students in Cluster 1 have an excellent average GPA, a substantial number of credits, and are very active. Students in Cluster 2 have a lower average GPA and are less active than those in Cluster 1. Students in Cluster 3 have a relatively good average GPA, though lower than in Clusters 1 and 2, and their activity status indicates that they are much less active, and therefore more at risk of D.O., than students in Clusters 1 and 2. Students in Cluster 4 have a very low average GPA, often close to zero, and are generally inactive in academic activities, so they are significantly at risk of D.O. at Universitas Muhammadiyah Enrekang. The results are significant in terms of both accuracy and data interpretation. The resulting insights enable universities to make more strategic and targeted decisions, thereby reducing dropout rates, increasing resource efficiency, and supporting the overall educational success of students. The resulting model achieves an accuracy of 98.52%, which indicates excellent performance in classifying students at risk of D.O.
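The stages named in the abstract (preprocessing, K-means clustering, labeling students with their cluster, then training a decision tree on those labels) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which library, features, or hyperparameters were used, so scikit-learn, the three feature columns (GPA, credits, active semesters), the synthetic data, and settings such as `max_depth=4` are all hypothetical choices made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def make_synthetic_students(n_per_group=100, seed=0):
    """Return made-up data with columns [GPA, credits earned, active semesters]."""
    rng = np.random.default_rng(seed)
    # Hypothetical group centres, loosely echoing the four profiles in the abstract.
    centres = np.array([[3.6, 120.0, 8.0],
                        [3.0, 100.0, 6.0],
                        [2.7, 60.0, 3.0],
                        [0.3, 10.0, 1.0]])
    spread = np.array([0.15, 8.0, 0.5])
    X = np.vstack([rng.normal(c, spread, size=(n_per_group, 3)) for c in centres])
    X = np.clip(X, 0.0, None)             # no negative credits or semesters
    X[:, 0] = np.clip(X[:, 0], 0.0, 4.0)  # keep GPA on a 0-4 scale
    return X


def cluster_then_classify(X, n_clusters=4, seed=0):
    """Cluster with K-means, then fit a decision tree to predict the cluster label."""
    # Preprocessing: standardise so credits do not dominate the distance metric.
    X_scaled = StandardScaler().fit_transform(X)

    # K-means: each student receives a cluster ID, which serves as the class label.
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = kmeans.fit_predict(X_scaled)

    # Decision tree: trained on the raw features so its split thresholds stay readable.
    X_train, X_test, y_train, y_test = train_test_split(
        X, labels, test_size=0.3, random_state=seed, stratify=labels)
    tree = DecisionTreeClassifier(max_depth=4, random_state=seed)
    tree.fit(X_train, y_train)

    # Hold-out accuracy measures agreement with the K-means labels only.
    accuracy = accuracy_score(y_test, tree.predict(X_test))
    return labels, tree, accuracy


X = make_synthetic_students()
labels, tree, accuracy = cluster_then_classify(X)
print(f"clusters found: {len(set(labels.tolist()))}, hold-out accuracy: {accuracy:.3f}")
```

Two caveats apply to a sketch like this. K-means cluster IDs are arbitrary, so deciding which cluster corresponds to "at risk of D.O." requires inspecting each cluster's mean GPA and activity afterwards. The accuracy printed here reflects how well the tree reproduces cluster labels on synthetic data and is not comparable to the 98.52% reported in the abstract.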
Article Details

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.