OPTIMIZATION OF THE XGBOOST MODEL FOR HEART DISEASE PREDICTION USING OPTUNA

Authors

  • Yasni Optarina, Informatics Engineering, STMIK IKMI Cirebon
  • Nana Suarna, Informatics Engineering, STMIK IKMI Cirebon
  • Agus Bahtiar, Information Systems, STMIK IKMI Cirebon
  • Nining Rahaningsih, Computerized Accounting, STMIK IKMI Cirebon
  • Willy Prihartono, Computerized Accounting, STMIK IKMI Cirebon

DOI:

https://doi.org/10.37859/seis.v6i1.10527

Keywords: XGBoost, Optuna, Hyperparameter Optimization, Heart Disease, SMOTE

Abstract

Heart disease is one of the leading causes of death worldwide, which makes accurate early detection important. Machine learning models such as XGBoost perform well in medical classification tasks, but their effectiveness depends heavily on hyperparameter configuration. This study aims to improve XGBoost performance for heart disease classification through hyperparameter optimization with the Optuna framework and its Tree-structured Parzen Estimator (TPE) algorithm. The study uses the UCI Heart Disease dataset, which consists of 918 records. To address class imbalance, the Synthetic Minority Oversampling Technique (SMOTE) is applied to the training data. Model performance is evaluated using accuracy, precision, recall, F1-score, and ROC-AUC. The optimized XGBoost model achieves an accuracy of 89.13%, compared with 87.50% for the baseline model, and improves recall from 87.50% to 89.10%. The optimized model also attains a higher ROC-AUC of 0.9319, indicating improved discrimination between the two classes. These findings indicate that Optuna-based hyperparameter optimization improves the performance and reliability of XGBoost and supports its use for early heart disease diagnosis in medical decision support systems.
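
The five evaluation metrics named in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' code: the function names (`confusion_counts`, `classification_metrics`, `roc_auc`), the 0.5 decision threshold, and the eight toy labels and scores are all assumptions made for illustration and are not taken from the study's data.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1-score from hard predictions."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): 1 = heart disease, 0 = healthy.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]  # predicted probabilities

# Assumed decision threshold of 0.5 to turn probabilities into labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

On this toy input, accuracy, precision, recall, and F1-score all equal 0.75, and ROC-AUC is 0.8125. Recall is the metric most directly tied to missed diagnoses, since it counts how many true positive cases the model recovers, while ROC-AUC is computed from the raw scores and therefore does not depend on the chosen threshold.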


Published

2026-01-31

How to Cite

Optarina, Y., Suarna, N., Bahtiar, A., Rahaningsih, N., & Prihartono, W. (2026). OPTIMASI MODEL XGBOOST UNTUK PREDIKSI PENYAKIT JANTUNG MENGGUNAKAN OPTUNA. Journal of Software Engineering and Information System (SEIS), 6(1), 50–55. https://doi.org/10.37859/seis.v6i1.10527

Section

Articles