OPTIMIZATION OF THE XGBOOST MODEL FOR HEART DISEASE PREDICTION USING OPTUNA
DOI: https://doi.org/10.37859/seis.v6i1.10527
Abstract
Heart disease is one of the leading causes of mortality worldwide, which makes accurate early detection systems essential. Machine learning models such as XGBoost have demonstrated strong performance in medical classification tasks; however, their effectiveness depends heavily on the hyperparameter configuration. This study aims to improve the performance of XGBoost for heart disease classification by applying hyperparameter optimization using the Optuna framework with the Tree-structured Parzen Estimator (TPE) algorithm. The study uses the UCI Heart Disease dataset, which consists of 918 records. To address class imbalance, the Synthetic Minority Oversampling Technique (SMOTE) is applied to the training data only. Model performance is evaluated using accuracy, precision, recall, F1-score, and ROC-AUC. The experimental results show that the optimized XGBoost model achieves an accuracy of 89.13%, compared with 87.50% for the baseline model, and improves recall from 87.50% to 89.10%. In addition, the optimized model attains a higher ROC-AUC of 0.9319, indicating a better ability to separate the two classes. These findings suggest that Optuna-based hyperparameter optimization enhances the performance and reliability of XGBoost, making it suitable for supporting early heart disease diagnosis in medical decision support systems.
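As a brief illustration of the threshold-based metrics named in the abstract, the sketch below derives accuracy, precision, recall, and F1-score from the four cells of a binary confusion matrix. The class name `ConfusionCounts`, the function `classification_metrics`, and the example counts are hypothetical and chosen only for illustration; they are not the study's confusion matrix or code. ROC-AUC is omitted because it is computed from predicted scores rather than from these counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix cells (positive class = heart disease)."""
    tp: int  # diseased patients predicted as diseased
    fp: int  # healthy patients predicted as diseased
    tn: int  # healthy patients predicted as healthy
    fn: int  # diseased patients predicted as healthy


def classification_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return accuracy, precision, recall, and F1 for the given counts.

    A ratio with a zero denominator is reported as 0.0.
    """
    total = c.tp + c.fp + c.tn + c.fn
    predicted_pos = c.tp + c.fp
    actual_pos = c.tp + c.fn

    accuracy = (c.tp + c.tn) / total if total else 0.0
    precision = c.tp / predicted_pos if predicted_pos else 0.0
    recall = c.tp / actual_pos if actual_pos else 0.0
    pr_sum = precision + recall
    f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for a 100-record test split (not the paper's results).
example = ConfusionCounts(tp=45, fp=5, tn=40, fn=10)
metrics = classification_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

In a screening setting, recall is often the metric to watch most closely, since a false negative corresponds to a patient with heart disease who is not flagged.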
License
Copyright (c) 2026 Yasni Optarina, Nana Suarna, Agus Bahtiar, Nining Rahaningsih, Willy Prihartono

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Copyright Notice
An author who publishes in the Journal of Software Engineering and Information System (SEIS) agrees to the following terms:
- Author retains the copyright and grants the journal the right of first publication of the work simultaneously licensed under the Creative Commons Attribution-ShareAlike 4.0 License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Author is able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book) with the acknowledgement of its initial publication in this journal.
- Author is permitted and encouraged to post his/her work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of the published work (See The Effect of Open Access).
Read more about the Creative Commons Attribution-ShareAlike 4.0 Licence here: https://creativecommons.org/licenses/by-sa/4.0/.