Implementation of Random Forest Algorithm with RFE and SMOTE on Cardiotocography Dataset

  • Muhammad Ahsani Nur Taqwimi, Nahdlatul Ulama Islamic University Jepara
  • Buang Budi Wahono, Nahdlatul Ulama Islamic University Jepara
  • Harminto Mulyo, Nahdlatul Ulama Islamic University Jepara
Keywords: Cardiotocography, Random Forest, SMOTE, RFE, Fetal Health

Abstract

Every expectant mother hopes for a healthy baby. However, maternal and fetal mortality rates remain high, so more accurate fetal health monitoring is needed to prevent pregnancy complications. One of the tools used is cardiotocography (CTG), which records data on the condition of the fetus. The CTG dataset used in this study suffers from class imbalance and a large number of features, both of which can reduce classification performance. This study addresses these challenges by combining the Random Forest algorithm with the Synthetic Minority Oversampling Technique (SMOTE) for class balancing and Recursive Feature Elimination (RFE) for feature selection. The dataset is "Fetal Health Classification" from the Kaggle platform, which contains 2,126 records in three classes: Normal, Suspect, and Pathological. The results show that RFE reduces the number of features from 22 to 18, while SMOTE increases the proportion of minority-class data. The resulting model achieves an accuracy of 95%, a precision of 93%, a recall of 89%, and an F1-score of 91%. The ROC-AUC values are 0.9881 for the Normal class, 0.9789 for Suspect, and 0.9985 for Pathological. Although the model predicts the Normal and Pathological classes with high accuracy, its performance on the Suspect class still needs improvement. Overall, integrating Random Forest with SMOTE and RFE proved effective in improving the accuracy of fetal health classification.
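
To make the class-balancing step concrete, the following is a minimal sketch of the interpolation idea behind SMOTE, written with the Python standard library only. It is not the authors' implementation, which the abstract does not describe. The function name `smote`, its parameters, and the toy data are assumptions chosen for illustration. Each synthetic sample is placed at a random position on the line segment between a minority-class sample and one of its k nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Hypothetical helper for illustration; a sketch of the SMOTE idea only.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point (excluding itself)
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy two-feature data, made up for illustration (not CTG values)
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote(minority, n_new=6, k=2)
```

Because every synthetic point lies between two existing minority samples, oversampling in this way enlarges the minority class without duplicating records exactly. In practice, oversampling is normally applied to the training split only, so that no synthetic data reaches the test set.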

Published
2025-06-28