Analysis of the Effect of Data Imbalance on the Performance of Heart Disease Classification Models
Abstract
Heart disease remains one of the leading causes of mortality, which makes data-driven predictive models important for risk analysis. However, medical datasets commonly suffer from class imbalance and weak predictive signals, both of which can limit model performance. This study evaluates a Logistic Regression model for heart attack prediction by comparing imbalanced and balanced versions of the dataset under two train–test split ratios, 80:20 and 90:10. Model performance was evaluated using accuracy, precision, recall, F1-score, and the confusion matrix. The experimental results show that models trained on imbalanced data achieved higher accuracy but were biased toward the majority class, with particularly low recall for the minority class. After data balancing was applied, accuracy decreased, but performance across the classes became more even, with improved recall and F1-score for the minority class. These findings indicate that accuracy alone does not adequately represent model performance on imbalanced medical datasets. Moreover, the results suggest that the relationship between the medical attributes and heart attack occurrence in the dataset is relatively weak, which limits the model's ability to establish clear decision boundaries. Therefore, appropriate evaluation metrics and representative clinical datasets are essential for developing reliable heart disease risk prediction models.
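The gap between accuracy and minority-class recall described in the abstract can be illustrated with a minimal sketch. The labels and predictions below are hypothetical (90 negative and 10 positive cases, chosen only for illustration) and are not taken from the study's dataset or results; `binary_metrics` is an illustrative helper, not the authors' code.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Return confusion-matrix counts and derived metrics.

    Class 1 is treated as the positive (minority) class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0

    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": accuracy, "precision": precision,
        "recall": recall, "f1": f1,
    }


# Hypothetical imbalanced test set: 90 negatives followed by 10 positives.
y_true = [0] * 90 + [1] * 10
# Hypothetical predictions from a majority-biased model:
# 89 true negatives, 1 false positive, 2 true positives, 8 false negatives.
y_pred = [0] * 89 + [1] + [1] * 2 + [0] * 8

metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up numbers, accuracy is 0.91 while recall for the positive class is only 0.20, so a model can look strong on accuracy and still miss most minority-class cases.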
License
Copyright (c) 2026 Maulana Samodro

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Copyright Notice
An author who publishes in the Journal of Software Engineering and Information System (SEIS) agrees to the following terms:
- Author retains the copyright and grants the journal the right of first publication, with the work simultaneously licensed under the Creative Commons Attribution-ShareAlike 4.0 License, which allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Author is able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book) with the acknowledgement of its initial publication in this journal.
- Author is permitted and encouraged to post his/her work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of the published work (See The Effect of Open Access).
Read more about the Creative Commons Attribution-ShareAlike 4.0 Licence here: https://creativecommons.org/licenses/by-sa/4.0/.






