Lead Scoring Prediction for Sales Optimization Using Random Forest and the SMOTE Technique
DOI: https://doi.org/10.37859/jf.v16i1.11292
Abstract
Accurate lead scoring has become a strategic necessity for organizations in data-driven marketing environments, because it allows high-value customer prospects to be identified systematically and sales conversion efficiency to be maximized. A fundamental challenge for conventional classification models is the class imbalance inherent in real-world marketing data, which biases predictions toward the majority class and substantially reduces sensitivity to minority-class prospects. This study proposes a Random Forest (RF)-based lead scoring prediction model integrated with the Synthetic Minority Over-sampling Technique (SMOTE) to address this limitation. The dataset is the Lead Scoring Dataset from Kaggle, comprising 9,240 customer prospect records from an educational company with a class imbalance ratio of 1.59:1. Preprocessing consisted of missing-value treatment, removal of attributes with more than 40% missing values, mode-based imputation, and categorical feature encoding. Following an 80:20 stratified split, SMOTE was applied exclusively to the training set to produce a balanced class distribution and prevent data leakage. The RF model was configured with n_estimators = 100, max_features = 'sqrt', and class_weight = 'balanced'. The proposed RF+SMOTE model achieved an accuracy of 88.80%, precision of 86.44%, recall of 84.13%, F1-score of 85.27%, and AUC-ROC of 0.9453, outperforming the baseline on four of the five evaluation metrics. The most notable improvement was in recall, with a gain of 1.26 percentage points. Stratified 5-fold cross-validation confirmed robust generalization, with AUC-ROC values consistently between 0.94 and 0.95. These findings indicate that the hybrid RF+SMOTE approach improves the detection of high-potential prospects while maintaining overall model stability for real-world Customer Relationship Management (CRM) deployment.
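The ordering described in the abstract (stratified split first, then SMOTE on the training portion only) can be illustrated with a minimal standard-library sketch. This is not the authors' code: the toy data, the 159:100 class counts chosen to mimic the 1.59:1 ratio, and the helper names `stratified_split`, `smote`, and `f1` are all assumptions made for illustration. The hyperparameter names in the abstract match scikit-learn's `RandomForestClassifier`, so a real pipeline would presumably use library implementations (for example imbalanced-learn's `SMOTE`) rather than these hand-written helpers.

```python
import math
import random
from collections import Counter


def stratified_split(X, y, test_frac=0.2, seed=0):
    """Split each class separately so both parts keep the class ratio."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == label]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    pick = lambda ids: ([X[i] for i in ids], [y[i] for i in ids])
    return pick(train_idx), pick(test_idx)


def smote(X, y, minority, k=5, seed=0):
    """SMOTE-style oversampling for binary labels: add interpolated
    minority rows until both classes have the same count."""
    rng = random.Random(seed)
    pool = [x for x, lab in zip(X, y) if lab == minority]
    n_new = (len(y) - len(pool)) - len(pool)  # majority count minus minority count
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(pool)
        # k nearest minority neighbours of the base row (excluding the row itself)
        neighbours = sorted((p for p in pool if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # random position on the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return X + synthetic, y + [minority] * n_new


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Toy two-feature data (hypothetical), 159 majority vs 100 minority rows.
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(159)] + \
    [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(100)]
y = [0] * 159 + [1] * 100

# 1) Stratified 80:20 split; 2) oversample the training part only.
(X_train, y_train), (X_test, y_test) = stratified_split(X, y)
X_bal, y_bal = smote(X_train, y_train, minority=1)

print("train before SMOTE:", Counter(y_train))
print("train after SMOTE: ", Counter(y_bal))
print("test (untouched):  ", Counter(y_test))
print("F1 from reported precision/recall:", round(f1(0.8644, 0.8413), 4))
```

Because the test rows never pass through `smote`, no synthetic sample can be derived from a held-out record, which is the leakage the abstract says the authors avoided. The last line is a consistency check only: the reported precision and recall reproduce the reported F1-score of about 85.27%.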
License
Copyright (c) 2026 Daffa Pratama Putra, Dimas Agil Kusuma, M. Rizki Al Akbar, Ali Ibrahim, Fathoni Fathoni

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Copyright Notice
An author who publishes in the Jurnal FASILKOM (teknologi inFormASi dan ILmu KOMputer) agrees to the following terms:
- Author retains the copyright and grants the journal the right of first publication of the work, simultaneously licensed under the Creative Commons Attribution-ShareAlike 4.0 License, which allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Author is able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book) with the acknowledgement of its initial publication in this journal.
- Author is permitted and encouraged to post his/her work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of the published work (See The Effect of Open Access).
Read more about the Creative Commons Attribution-ShareAlike 4.0 Licence here: https://creativecommons.org/licenses/by-sa/4.0/.













