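Analysis of the Effect of Data Imbalance on the Performance of Heart Disease Classification Models (Analisis Pengaruh Ketidakseimbangan Data terhadap Kinerja Model Klasifikasi Penyakit Jantung)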

Authors

  • Maulana Samodro, Universitas Safin Pati
Keywords: logistic regression, heart disease, imbalanced data, classification

Abstract

Heart disease remains one of the leading causes of mortality, which makes data-driven predictive models important for risk analysis. However, medical datasets commonly suffer from class imbalance and weak predictive signals, both of which can limit model performance. This study evaluates a Logistic Regression model for heart attack prediction by comparing imbalanced and balanced versions of the dataset under two train–test split ratios, 80:20 and 90:10. Model performance was assessed using accuracy, precision, recall, F1-score, and the confusion matrix. The experimental results show that models trained on imbalanced data achieved higher accuracy but were biased toward the majority class, with particularly low recall for the minority class. After data balancing was applied, accuracy decreased, but performance across classes became more even, with improved recall and F1-score for the minority class. These findings indicate that accuracy alone does not adequately represent model performance on imbalanced medical datasets. The results also suggest that the relationship between the medical attributes and heart attack occurrence in this dataset is relatively weak, which limits the model's ability to establish clear decision boundaries. Appropriate evaluation metrics and representative clinical datasets are therefore essential for developing reliable heart disease risk prediction models.
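
The point that accuracy can hide poor minority-class performance can be shown with a small worked example. The sketch below computes the metrics named in the abstract from the four cells of a binary confusion matrix. The function name `binary_metrics` and all counts are hypothetical, chosen only for illustration; they are not the study's data or results, and the abstract does not state which balancing technique was used.

```python
# Illustrative sketch only. The confusion-matrix counts below are invented
# for this example and are NOT taken from the study.

def binary_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 for the positive class.

    tp/fp/fn/tn are the confusion-matrix counts, with the minority
    (heart attack) class treated as the positive class.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical test set: 100 samples, 10 positive (minority), 90 negative.

# Scenario A (assumed): a model trained on imbalanced data that predicts
# the majority class almost every time.
majority_biased = binary_metrics(tp=1, fp=1, fn=9, tn=89)

# Scenario B (assumed): a model trained on balanced data that finds more
# positives at the cost of more false alarms.
rebalanced = binary_metrics(tp=7, fp=20, fn=3, tn=70)

for name, metrics in [("majority-biased", majority_biased),
                      ("rebalanced", rebalanced)]:
    summary = ", ".join(f"{k}={v:.3f}" for k, v in metrics.items())
    print(f"{name}: {summary}")
```

In this made-up case the majority-biased model has the higher accuracy but misses nine of the ten positive samples, while the rebalanced model has lower accuracy and clearly higher minority-class recall and F1-score. This is the same qualitative pattern the abstract reports.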

Published

2026-02-04

How to Cite

Samodro, M. (2026). Analisis Pengaruh Ketidakseimbangan Data terhadap Kinerja Model Klasifikasi Penyakit Jantung. Journal of Software Engineering and Information System (SEIS), 6(1), 56–62. Retrieved from https://ejurnal.umri.ac.id/index.php/SEIS/article/view/11050
