Implementation of an Ensemble Learning Algorithm (Stacking Regressor) in a PM2.5 Prediction Model
Abstract
This study develops a PM2.5 concentration prediction model using ensemble learning, focusing on the stacking regressor. The model is compared with several base models, namely LSTM, Random Forest, XGBoost, and GBM, in terms of prediction accuracy. The results show that the stacking regressor predicts more accurately than each of these base models. The LSTM model has a Mean Squared Error (MSE) of 80.42 and a coefficient of determination (R²) of -0.29. Although LSTM is designed to capture temporal patterns, this performance is unsatisfactory: a negative R² means its predictions are worse than simply predicting the mean of the observed values. Random Forest performs better, with an MSE of 10.90 and an R² of 0.83, which is attributed to its ability to handle heterogeneous data. XGBoost and GBM perform similarly, with MSEs of 9.01 and 8.81, respectively, and an R² of 0.86 each. The stacking regressor with a RidgeCV meta-learner achieves the best results, with an MSE of 8.07 and an R² of 0.87. This indicates that stacking successfully combines the strengths of the base models to improve prediction accuracy. In conclusion, the stacking regressor is effective at improving PM2.5 prediction accuracy, making it a potential tool for air quality monitoring and for supporting public health policy making. This research contributes to the development of ensemble learning-based air quality prediction methods, and its results can serve as a reference for future research.
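The stacking setup summarized above can be sketched with scikit-learn's `StackingRegressor` and a `RidgeCV` meta-learner. This is a minimal illustration under stated assumptions, not the study's pipeline. The data are synthetic, with four hypothetical predictor columns. The hyperparameters are arbitrary. XGBoost, which requires the separate `xgboost` package, and the LSTM are left out, so only Random Forest and GBM serve as base learners. The sketch also computes MSE = (1/n) Σ (y_i - ŷ_i)² and R² = 1 - SS_res / SS_tot by hand. This shows why a negative R², such as the one reported for the LSTM, means a model does worse than a constant prediction equal to the mean of the test targets.

```python
import numpy as np
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import RidgeCV
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data standing in for PM2.5 predictors (NOT the study's dataset).
rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 4))  # assumed columns, e.g. temperature, humidity, wind, PM10
y = (30 + 5 * X[:, 0] - 3 * X[:, 1] ** 2 + 4 * np.sin(X[:, 2])
     + 2 * X[:, 3] + rng.normal(scale=2.0, size=n))

# No shuffling: the last 20% of rows are held out, mimicking a time-ordered split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)

# Base learners (illustrative hyperparameters, chosen as assumptions).
base_models = [
    ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
    ("gbm", GradientBoostingRegressor(random_state=0)),
]

# RidgeCV meta-learner trained on 5-fold out-of-fold predictions of the base learners.
stack = StackingRegressor(
    estimators=base_models,
    final_estimator=RidgeCV(alphas=np.logspace(-3, 3, 13)),
    cv=5,
)

def mse_manual(y_true, y_pred):
    """Mean squared error: average of squared residuals."""
    return np.mean((y_true - y_pred) ** 2)

def r2_manual(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot; negative when worse than predicting the mean."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

# Fit each base model on its own, then the stack (which refits clones internally).
results = {}
for name, model in base_models + [("stacking", stack)]:
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = (mean_squared_error(y_test, pred), r2_score(y_test, pred))
    # The hand-written formulas agree with scikit-learn's metrics.
    assert np.isclose(results[name][0], mse_manual(y_test, pred))
    assert np.isclose(results[name][1], r2_manual(y_test, pred))

for name, (mse, r2) in results.items():
    print(f"{name:>8}: MSE={mse:.2f}  R2={r2:.2f}")

# A reversed prediction is worse than the mean, so R^2 drops below zero.
print(r2_manual(np.array([1.0, 2.0, 3.0]), np.array([3.0, 2.0, 1.0])))  # -3.0
```

With `cv=5`, the meta-learner is fitted on out-of-fold predictions. It therefore learns linear weights for the base models without seeing predictions those models made on their own training rows. This is what lets stacking combine the base models instead of simply averaging them. The printed MSE and R² values reflect the synthetic data only and say nothing about the study's reported numbers.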