Journal of Dinda : Data Science, Information Technology, and Data Analytics

Systematic Literature Review : Population Density Mapping Using Data Mining

Naufal Maftuh — 2025-08-05

Mapping population density plays a crucial role in designing and developing urban policies. Traditional methods are often unable to capture complex spatial patterns, making the application of data mining techniques crucial. In this study, we conducted a Systematic Literature Review (SLR) of various data mining techniques, including K-Means, KDE, DBSCAN, Random Forest, linear regression, Cellular Automata, and Fuzzy C-Means. The findings of this study show that although K-Means proved to be effective, it is quite sensitive to the presence of outliers. On the other hand, DBSCAN successfully detects irregular distributions, while KDE is able to track trends despite being computationally intensive. Random Forest and linear regression can predict growth, but both require large datasets to provide accurate results. Meanwhile, Cellular Automata and Fuzzy C-Means offer flexibility, but also require comprehensive data. For future optimization, we recommend using AI-GIS hybrid models.

Implementation of Random Forest Algorithm with RFE and SMOTE on Cardiotocography Dataset

Muhammad Ahsani Nur Taqwimi — 2025-08-05

Having a healthy baby is a dream for mothers. However, the high rate of maternal and fetal mortality is still a serious problem, so more accurate fetal health monitoring is needed to prevent pregnancy complications. One of the devices used is Cardiotocography (CTG), which produces data on fetal conditions. The CTG dataset used in this study faces challenges in the form of class imbalance and a high number of features, which can reduce classification performance. This study aims to overcome these challenges by implementing the Random Forest algorithm combined with the Synthetic Minority Oversampling Technique (SMOTE) for class balancing and Recursive Feature Elimination (RFE) for feature selection. The dataset used is "Fetal Health Classification" from the Kaggle platform, which consists of 2,126 records in three classes: Normal, Suspect, and Pathological. The test results show that RFE reduces the number of features from 22 to 18, while SMOTE increases the proportion of minority-class data. The resulting model achieves an accuracy of 95%, precision of 93%, recall of 89%, and F1-score of 91%. The ROC-AUC value is 0.9881 for the Normal class, 0.9789 for Suspect, and 0.9985 for Pathological. Although the model predicts the Normal and Pathological classes with high accuracy, performance on the Suspect class still needs to be improved. Overall, the integration of Random Forest with SMOTE and RFE has proven effective in improving the accuracy of fetal health classification.
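
The class-balancing step can be illustrated with a minimal sketch of the interpolation idea behind SMOTE. This is not the study's code and not the imbalanced-learn implementation; the function name `smote_sketch`, its parameters, and the sample points are hypothetical and chosen only for illustration.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical two-feature minority samples (not taken from the CTG dataset).
minority_class = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4], [0.9, 2.1]]
new_points = smote_sketch(minority_class, n_new=6)
print(len(new_points))
```

Each synthetic point lies on a line segment between two existing minority samples, which is why oversampling of this kind raises the minority proportion without duplicating records exactly.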

Evaluation of the Information System (Smart Deer System) at BKPSDMD of Bangka Belitung Islands Province

Aditya Ahmad Fauzi — 2025-08-05

Improving the quality of human resources (HR) is an important factor in regional development. To develop superior, competent, intelligent, and educated human resources, a fast, easy, and useful information system is needed to manage further education. BKPSDMD Prov. BaBel therefore introduced an information system called "SI Pelanduk Cerdik", which aims to make it easier for State Civil Apparatus (ASN) to submit competency development applications. The purpose of this research is to provide feedback for correcting the shortcomings of the "SI Pelanduk Cerdik" application. The study uses a qualitative descriptive method. The results show that "SI Pelanduk Cerdik" is very useful at BKPSDMD Prov. BaBel. The application makes it easier for ASN to submit competency development applications, with quick and easy access anytime and anywhere. ASN satisfaction with the application is also very high. Before this application, applying for further education was a manual and time-consuming process. With "SI Pelanduk Cerdik", the time ASN need to apply for competency development is significantly reduced, to about 30 minutes. The application meets expectations.

Unveiling Risk Patterns of Disability Progression A Clustering Based Transition Matrix Analysis Using Indonesian National Data

Ariyono Setiawan — 2025-08-05

This study investigates the progression of disability severity from "some difficulty" to "a lot of difficulty" using a transition matrix framework. It aims to identify risk patterns and classify severity clusters based on national survey data from Indonesia between 2010 and 2023. The study draws on the theory of functional limitation progression, which assumes that individuals with mild disabilities face varying probabilities of developing severe limitations depending on contextual and demographic factors. It also incorporates clustering theory to group similar progression behaviors. We utilize 20,604 data points from multiple disability types (cognitive, hearing, mobility, etc.). The transition rate is computed as the ratio of individuals with "a lot" difficulty to the total with "some" and "a lot" difficulty. Statistical analyses include descriptive summaries, Pearson correlation, and K-Means clustering via the FASTCLUS procedure. Heatmaps are generated to observe annual and typological patterns. The average transition rate is 66.77%, with a maximum of 99.6% in some subgroups. Three distinct severity clusters emerged, centered at 31.27%, 58.62%, and 82.20%. Transition rate negatively correlates with "some difficulty" prevalence (r = –0.45, p < .0001), indicating progressive concentration of severity in smaller populations. Heatmaps reveal consistent risk escalation over time, especially in cognitive and self-care disabilities. This study enables policy actors to stratify intervention priorities and monitor disability risk more accurately using dynamic, data-driven indicators. This is the first study in Indonesia to apply a large-scale transition matrix combined with clustering to map functional disability progression. It offers a novel quantitative method to uncover hidden severity patterns and informs future decision-support systems for inclusive health planning.
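
As a minimal sketch of the indicator described above, the transition rate can be computed as the share of "a lot of difficulty" cases among all "some" plus "a lot" cases, and a rate can then be assigned to the nearest of the three reported cluster centres. The function names and the example counts are hypothetical; only the ratio definition and the centre values (31.27%, 58.62%, 82.20%) come from the abstract.

```python
REPORTED_CENTRES = (31.27, 58.62, 82.20)  # cluster centres in percent, from the abstract


def transition_rate(some_difficulty, a_lot_difficulty):
    """Ratio of 'a lot' cases to the total of 'some' and 'a lot' cases."""
    total = some_difficulty + a_lot_difficulty
    if total == 0:
        raise ValueError("no individuals with 'some' or 'a lot' difficulty")
    return a_lot_difficulty / total


def nearest_cluster(rate_percent, centres=REPORTED_CENTRES):
    """Index of the centre closest to the rate (the K-Means assignment step)."""
    return min(range(len(centres)), key=lambda i: abs(rate_percent - centres[i]))


# Hypothetical counts for one subgroup (illustrative only).
rate = transition_rate(some_difficulty=250, a_lot_difficulty=750)
print(rate, nearest_cluster(100 * rate))
```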

Enhancing Prediction Accuracy of the Happiness Index Using Multi-Estimator Stacking Regressor and Web Application Integration

Rofi Nafiis Zain — 2025-08-05

This study proposes a novel approach to enhance the prediction accuracy of the Happiness Index using a multi-estimator stacking regressor model and web application integration. By combining diverse regression models, such as decision tree, random forest, gradient boosting, LGBM, and support vector regressor (SVR), the proposed ensemble architecture achieved superior predictive performance with an R² score of 0.9814. A custom Happiness Score was formulated using weighted indicators derived from Pearson's correlation analysis. Furthermore, SHapley Additive exPlanations (SHAP) were used to interpret model predictions, revealing the Human Development Index, Female Labour Force Rate, and Life Expectancy as key contributing features. The final model was deployed via a Python Flask-based web dashboard, enabling stakeholders to visualize happiness metrics interactively. The results suggest that stacking-based regression, when combined with interpretability techniques and real-time deployment, can offer a powerful solution for socioeconomic modeling and supporting urban policy.
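
The abstract does not state how the Pearson correlations were turned into indicator weights. The sketch below shows one plausible scheme, assuming weights proportional to the absolute correlation of each indicator with a reference target; the function names, the indicator labels, and all numbers are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def correlation_weights(indicators, target):
    """Assumed scheme: weight each indicator by |r|, normalised to sum to 1."""
    strength = {name: abs(pearson(vals, target)) for name, vals in indicators.items()}
    total = sum(strength.values())
    return {name: s / total for name, s in strength.items()}


# Hypothetical indicator values for four regions (illustrative only).
indicators = {
    "hdi": [0.70, 0.75, 0.80, 0.85],
    "life_expectancy": [68.0, 71.0, 70.0, 74.0],
}
target = [5.1, 5.6, 6.0, 6.4]
weights = correlation_weights(indicators, target)
print(weights)
```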

The Utilizing GPT-4o Mini in Designing a WhatsApp Chatbot to Support the New Student Admission Process at Telkom University

Muhammad Lutfi Ruhallah — 2025-08-05

The rapid adoption of Artificial Intelligence (AI) in higher education has revolutionized student support services, yet delivering scalable, real-time assistance through familiar platforms remains a challenge. This study presents the design, implementation, and evaluation of a WhatsApp-based chatbot powered by a fine-tuned GPT-4o Mini model to streamline the new student admission process at Telkom University. A specialized dataset comprising frequently asked questions and admission-related dialogues was curated and preprocessed for model fine-tuning over 288 epochs. The chatbot system integrates the WhatsApp Business API, a Webhook interface, and the n8n automation platform, all deployed on a Virtual Private Server (VPS) to ensure reliability and low-latency communication. Functional and performance testing involved manual scenario-based assessments and quantitative measurements of response accuracy and latency. Results indicate that the chatbot consistently delivers contextually relevant answers—achieving an average accuracy above 85%—and reduces average response time to under 3 seconds. User interaction studies with prospective and current students revealed high satisfaction levels, highlighting improvements in accessibility and reduction of administrative workload. Challenges identified include occasional misinterpretation of complex queries and the need for enhanced scalability under peak loads. Future work will focus on periodic dataset updates, advanced prompt engineering, scalability stress testing, and the integration of multimodal features such as voice and image recognition. By aligning AI-driven conversational interfaces with users’ existing digital habits, this chatbot demonstrates a viable approach for enhancing admission services and operational efficiency in Indonesian higher education institutions.

Illegal Motorcycle Parking Detection in The Car Area

Nenen - Isnaeni — 2025-08-06

Illegal motorcycle parking in designated car areas at Politeknik Manufaktur Negeri Bangka Belitung (Polman Babel) disrupts campus parking management, reduces space availability, and poses safety risks. This paper proposes an automated detection system using computer vision and license plate recognition to identify motorcycles parked in car areas and notify their owners via WhatsApp and email alerts. The system integrates CCTV cameras with YOLOv11 for vehicle detection and EasyOCR for license plate recognition, coupled with a database for owner identification. Upon detection, owners receive immediate notifications to rectify the violation. Experiments in Polman Babel’s parking lot show a 94% accuracy in motorcycle detection and 88% in license plate recognition under diverse conditions. The system enhances parking enforcement efficiency, reduces manual intervention, and supports smart campus initiatives. This work offers a scalable, cost-effective solution adaptable to other institutions facing similar parking challenges.

AI-Based Hotel Front Office Training Application Game Concept for Hospitality Students

Tito Pandu Raharjo — 2025-08-13

The advancement of Artificial Intelligence (AI) technology presents numerous opportunities in vocational education, particularly in the hospitality sector. The front office is a department studied by hospitality students. However, many educational institutions face challenges in providing authentic front office training, whether due to limited access to actual hotel environments, budget constraints, or a lack of opportunities to interact directly with guests. This study proposes a conceptual design that uses AI as an interactive virtual guest in an educational game application for front office training. The concept also integrates speech recognition as the means of communicating with the AI virtual guest to create a realistic and interactive learning experience. The model is designed to support independent and repetitive practice through various guest scenarios such as reservations, check-in/check-out services, and providing information. A qualitative descriptive method was employed through literature review and needs analysis. The findings recommend the use of AI-based simulation as a complement to live training and as a foundation for future development of hospitality education applications. Preliminary validation using the User Experience Questionnaire (UEQ) indicates that the concept received a score of 2.0 for attractiveness, 1.82 for pragmatic quality, and 1.72 for hedonic quality, all of which fall in the Positive category. These results suggest that the application concept could serve as an alternative solution for vocational learning by offering a simulated experience that closely resembles real-world front office operations.

Heart Failure Classification Using a Hybrid Model Based on SVM and Random Forest

Muh Sajid Abdilllah — 2025-08-15

This study discusses the development of a model to classify heart failure disease by combining two algorithms in the field of data mining: Support Vector Machine (SVM) and Random Forest (RF). The dataset used is the Heart Failure Prediction Dataset, consisting of 918 patient records containing medical information such as blood pressure, cholesterol levels, and heart rate. The research process began with data cleaning, normalization using MinMaxScaler, and data balancing with the SMOTE technique to equalize the number of cases between heart failure patients and non-patients. The data was then split into training and testing sets. Each model (SVM and RF) was tested individually and also combined into a hybrid model. Validation was performed using 5-Fold Cross Validation to ensure consistent results. The results show that SVM performed better in terms of precision for detecting heart failure after applying SMOTE, while RF remained stable even without data balancing. The hybrid model combining both algorithms achieved the best performance, with an accuracy of 91.20%, precision of 90.85%, recall of 92.44%, and an AUC score of 0.961. These results indicate that the hybrid model can detect heart failure more accurately and in a more balanced manner. With its high and consistent performance, this model is suitable for use as a decision support system in the medical field, particularly for early detection of heart failure.
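
Two of the preprocessing and validation steps named above can be sketched in plain Python: min-max normalization (the formula behind MinMaxScaler) and a 5-fold split of the 918 records. The function names are hypothetical, and the split shown here is contiguous with no shuffling or stratification, which the study may well have used.

```python
def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: map everything to 0
    return [(v - lo) / (hi - lo) for v in values]


def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    # The first n_samples % k folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# Hypothetical blood pressure readings (illustrative only).
print(min_max_scale([120, 140, 160]))
folds = list(k_fold_indices(918, k=5))
print([len(test) for _, test in folds])
```

Every record appears in exactly one test fold, so each of the five evaluations is made on data the model did not see during that round of training.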

Comparison of Accuracy of Linear Regression and Random Forest Models in Predicting Bitcoin Prices

Ahmad Habib Awwaluddin — 2025-08-15

Bitcoin is a digital asset that has experienced significant growth in value since its launch in 2009. However, its high price volatility makes predicting Bitcoin's price movements a challenge for investors and financial analysts. Therefore, a data-driven approach capable of capturing patterns in historical Bitcoin price data is needed to support more accurate investment decision-making. This study aims to evaluate and compare the performance of two prediction algorithms, namely Linear Regression and Random Forest, in predicting Bitcoin prices based on daily historical data from 2018 to 2025. The dataset was obtained from the Kaggle platform and processed through pre-processing, predictive feature formation, and data normalization. Two validation schemes were used: a 70:30 data split and 10-fold cross-validation. Model performance evaluation was carried out using three main metrics: Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE). The results show that the Linear Regression model produces better performance than Random Forest, both on split data and cross-validation, even though Random Forest has been optimized using GridSearchCV. The lowest RMSE value was obtained from Linear Regression in the K-Fold scheme, at 1314.47. These findings indicate that a simple model such as Linear Regression can still be effective in predicting Bitcoin prices if the data is properly processed. This research is expected to serve as a reference for developers of digital asset price prediction systems and stakeholders in data-driven decision-making.

Keywords: Bitcoin, Price Prediction, Linear Regression, Random Forest, Model Evaluation, Machine Learning, K-Fold Cross Validation
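
The three evaluation metrics named in the abstract can be written out in a few lines. This is a minimal sketch with hypothetical prices, not the study's evaluation code; the function name `regression_errors` is an assumption.

```python
import math


def regression_errors(actual, predicted):
    """Return MAE, MSE, and RMSE for paired actual and predicted values."""
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse)}


# Hypothetical prices (illustrative only).
metrics = regression_errors(actual=[100, 200, 300], predicted=[110, 190, 330])
print(metrics)
```

Because RMSE squares the errors before averaging, it penalizes large misses more heavily than MAE and is never smaller than MAE on the same predictions.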

Classification of Indonesian Disasters with Decision Trees Based on Spatial and Text Data

Ridwan Ramadhan — 2025-08-19

Indonesia is one of the countries with a very high level of natural disaster vulnerability. The types of disasters that frequently occur include earthquakes, floods, landslides, volcanic eruptions, and others. This is because Indonesia is located where three of the world's tectonic plates meet and has tropical climate conditions that make it prone to disasters. Therefore, Indonesia needs a system that can classify disaster types automatically and accurately to support fast decision-making. This research aims to develop a natural disaster classification model based on information such as location (regency and province), time of occurrence (date), and the causes that lead to disasters. The classification method used in this research is the Decision Tree algorithm, because it can handle both numerical and categorical data and has high interpretability. The textual cause data is converted with the Term Frequency-Inverse Document Frequency (TF-IDF) technique into a numerical form that machine learning algorithms can process. The dataset was obtained from the National Disaster Management Agency (BNPB) and is openly available. Test results show that the trained Decision Tree model can classify disaster types with an accuracy of 87%. The model also shows good precision, recall, and F1-score values in each disaster category. It is hoped that the results of this research can help in developing disaster detection systems based on historical data and assist government and society in responding to disasters more effectively and efficiently.
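
The text-to-number step can be illustrated with a minimal TF-IDF sketch using the plain definition, term frequency times log(N / document frequency). Library implementations such as scikit-learn's TfidfVectorizer apply a smoothed variant with normalization, so their numbers differ. The function name and the cause descriptions are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenised = [doc.lower().split() for doc in documents]
    n_docs = len(tokenised)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical cause descriptions (not taken from the BNPB dataset).
causes = [
    "heavy rain river overflow",
    "heavy rain slope collapse",
    "tectonic plate movement",
]
vectors = tf_idf(causes)
print(vectors[0])
```

Terms shared by several records, such as "heavy" and "rain" here, receive lower weights than terms that distinguish one record from the others, which gives the Decision Tree more informative numeric features to split on.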

Analysis of Public Sentiment Toward the Increase in VAT Rates Using the SVM Algorithm

Elsa Azila Rahman — 2025-08-19

The policy of increasing the Value Added Tax (VAT), particularly on luxury goods as stipulated in Minister of Finance Regulation (PMK) Number 131 of 2024, has sparked various public responses, many of which are captured through social media. In today's digital era, social media has become a primary platform for the public to express their opinions openly, including on government policies. This study aims to analyze public sentiment toward the VAT policy in order to provide insights for more responsive policymaking. A total of 4,000 comments were collected from the X platform using web crawling techniques, followed by preprocessing, resulting in 3,553 clean comments. Sentiment labeling was conducted automatically using a lexicon-based approach, which revealed that the majority of comments expressed positive sentiment (73.3%), while the remainder were negative (26.7%). Sentiment classification was performed using the Support Vector Machine (SVM) algorithm with a polynomial kernel and an 80:20 training-testing data split. Evaluation results showed that the model achieved an accuracy of 76.65%. The SVM model demonstrated excellent performance in detecting positive sentiment (precision 76.18%, recall 100%, and F1-score 86.51%), but was less effective in identifying negative sentiment (precision 100%, recall 7.78%, and F1-score 14.44%). These findings indicate that while the model is effective in recognizing positive opinions, further optimization is needed to improve performance in detecting negative sentiments.
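
The gap between the negative class's perfect precision and its low F1-score follows from the F1 definition, the harmonic mean of precision and recall. The sketch below applies that formula to the negative-class figures reported in the abstract; the function name `f1_score` is chosen for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given as fractions)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Negative-class precision and recall as reported in the abstract.
negative_f1 = f1_score(precision=1.0, recall=0.0778)
print(f"{negative_f1:.2%}")  # 14.44%
```

The harmonic mean is pulled toward the smaller of its two inputs, so a recall below 8% keeps the F1-score low even when every negative prediction the model makes is correct.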

ROC and COPRAS Methods in New Student Admissions Application (PPDB) MAN HUMBANG HASUNDUTAN

Anri Hafiz Tua — 2025-08-19

The development of information and communication technology, especially in the education sector, has opened up opportunities to increase efficiency and transparency in various processes, including New Student Admissions (PPDB). MAN Humbang Hasundutan faces challenges in manually screening hundreds of prospective students every year, which often introduces bias and inaccuracies in the selection process. Therefore, this research aims to develop a web-based PPDB application with the integration of the Rank Order Centroid (ROC) method for weighting criteria and Complex Proportional Assessment (COPRAS) for ranking. The ROC method assigns weights to criteria based on their level of importance, while the COPRAS method determines the ranking by taking into account the level of significance and utility of alternatives. The implementation of this application enables the processing of prospective student data quickly and objectively, as well as increasing the fairness and transparency of the selection process. Based on the results of previous research, the COPRAS method with ROC weighting has proven to be effective in assisting decision making in various fields. The proposed PPDB application is expected to simplify the selection process at MAN Humbang Hasundutan while increasing the credibility of the educational institution.
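
The weighting and ranking steps described above can be sketched as follows, using the standard ROC weight formula and the usual COPRAS steps (sum normalization, weighted benefit and cost sums, relative significance, utility degree). This is not the application's code: the function names, the three criteria (exam score, interview score, distance to school as a cost criterion), and the applicant values are hypothetical.

```python
def roc_weights(n_criteria):
    """Rank Order Centroid: w_k = (1/n) * sum of 1/i for i = k..n, rank k = 1..n."""
    n = n_criteria
    return [sum(1 / i for i in range(k, n + 1)) / n for k in range(1, n + 1)]


def copras_rank(alternatives, weights, is_benefit):
    """Return (name, utility degree in percent) pairs, best alternative first.
    Assumes every criterion value is positive."""
    names = list(alternatives)
    n_crit = len(weights)
    col_sums = [sum(alternatives[a][j] for a in names) for j in range(n_crit)]
    s_plus, s_minus = {}, {}
    for a in names:
        # Sum-normalise each value, then apply the criterion weight.
        weighted = [weights[j] * alternatives[a][j] / col_sums[j] for j in range(n_crit)]
        s_plus[a] = sum(v for v, b in zip(weighted, is_benefit) if b)
        s_minus[a] = sum(v for v, b in zip(weighted, is_benefit) if not b)
    total_minus = sum(s_minus.values())
    if total_minus > 0:
        inv_sum = sum(1 / s_minus[a] for a in names)
        q = {a: s_plus[a] + total_minus / (s_minus[a] * inv_sum) for a in names}
    else:
        q = dict(s_plus)  # no cost criteria: significance is the benefit sum
    q_max = max(q.values())
    return sorted(((a, 100 * q[a] / q_max) for a in names),
                  key=lambda pair: pair[1], reverse=True)


# Hypothetical applicants: [exam score, interview score, distance in km (cost)].
applicants = {"A": [90, 85, 2], "B": [80, 80, 5], "C": [70, 75, 10]}
weights = roc_weights(3)
ranking = copras_rank(applicants, weights, is_benefit=[True, True, False])
print(weights)
print(ranking)
```

ROC needs only the priority order of the criteria, not numeric importance scores, which makes it convenient when a selection committee can agree on a ranking of criteria but not on exact weights.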

Academic Monitoring Information System Using Task Centered System Design Method Based On Web

Nur Haliza — 2025-08-31

Manual academic monitoring systems at SMA Swasta Teladan Cinta Damai present several challenges, such as delayed information delivery, data entry errors, and lack of transparency in academic records. This study aims to design and develop a web-based Academic Monitoring Information System using the Task Centered System Design (TCSD) approach, which focuses on the actual needs and tasks of users such as teachers, students, and parents. The system is developed using PHP as the programming language and MySQL as the database, and follows the Waterfall development model, which includes stages such as requirements analysis, system design, implementation, and testing. The results show that the system can present academic information in real time, improve monitoring efficiency, and facilitate access to information for all stakeholders. With its intuitive interface and task-oriented features, this system provides a digital solution that enhances the quality of academic management in the school environment.