Article type: Research article
Comparison Between Quantile Mapping Downscaling Method and Random Forest Model
Extended abstract
Introduction
Accurate prediction of daily minimum air temperature matters in regions with complex topography, where elevation and terrain irregularities strongly affect temperature variability, because minimum temperature is directly linked to frost periods. However, Global Climate Models (GCMs) have coarse spatial resolution and inherent systematic biases, so they cannot accurately represent regional climatic processes and their output requires bias correction. In recent years, statistical models and machine learning approaches have attracted considerable attention for bias correction and downscaling. To date, however, no study has compared these two classes of models for predicting minimum temperature in Iran.
Materials and Methods
Monthly data from the Birjand synoptic station for the period 1991–2020 were used as observational records to compare the quantile mapping and random forest methods. Output from the IPSL-CM6A-LR climate model of the CMIP6 project under the SSP2-4.5 scenario was used for the historical period (1991–2020) and for future projections (2030–2059). The statistical quantile mapping approach was implemented in the R environment using the qmap package with the exponential asymptotic transfer function (expasympt), while the machine learning model used the Random Forest package. The performance of both models was evaluated with the statistical metrics R², RMSE, MAE, NSE, and KGE, and through analysis of empirical cumulative distribution function (ECDF) plots and scatter diagrams.
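The five scores named above have widely used definitions. The following is a minimal Python sketch of them, not the authors' R code. The function name `evaluation_metrics` is hypothetical, R² is assumed here to be the squared Pearson correlation, and KGE is assumed to follow the original three-component form (correlation, variability ratio, mean ratio). The KGE mean ratio becomes unstable when the observed mean is close to 0 °C, which is worth checking for temperature series.

```python
import math
from statistics import mean, pstdev


def evaluation_metrics(obs, sim):
    """Return R2, RMSE, MAE, NSE and KGE for paired observed/simulated values.

    Hypothetical helper for illustration; definitions are assumptions
    stated in the accompanying text, not taken from the paper.
    """
    if len(obs) != len(sim) or len(obs) < 2:
        raise ValueError("obs and sim must be paired series of length >= 2")

    mean_obs, mean_sim = mean(obs), mean(sim)
    errors = [s - o for o, s in zip(obs, sim)]

    # Error magnitudes, in the units of the data (e.g. degrees Celsius).
    rmse = math.sqrt(mean(e * e for e in errors))
    mae = mean(abs(e) for e in errors)

    # Nash-Sutcliffe efficiency: 1 minus residual variance over observed variance.
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    nse = 1.0 - ss_res / ss_tot

    # Pearson correlation from population covariance and standard deviations.
    cov = mean((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    sd_obs, sd_sim = pstdev(obs), pstdev(sim)
    r = cov / (sd_obs * sd_sim)

    # Kling-Gupta efficiency: distance from the ideal point (r, alpha, beta) = (1, 1, 1).
    alpha = sd_sim / sd_obs      # variability ratio
    beta = mean_sim / mean_obs   # mean ratio (ill-conditioned if mean_obs is near 0)
    kge = 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

    return {"R2": r * r, "RMSE": rmse, "MAE": mae, "NSE": nse, "KGE": kge}
```

A simulated series that is uniformly 1 °C too warm keeps a perfect correlation but is penalized by RMSE, MAE, NSE, and the mean-ratio term of KGE, which is why the abstract reports several scores rather than R² alone.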
Results and Discussion
Bias correction of monthly minimum temperature at the Birjand synoptic station was examined using two approaches. The statistical quantile mapping approach, referred to below as BCSD, employed the exponential asymptotic transfer function (expasympt) from the qmap package, while the machine learning approach used the Random Forest algorithm. Data from 1991–2010 were used for calibration and data from 2011–2020 for validation. Both methods were evaluated over the validation period using statistical indices and diagnostic plots.
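The calibrate-then-validate workflow described above can be illustrated with a small Python sketch. It deliberately swaps in empirical quantile mapping (piecewise-linear interpolation between matched quantiles) in place of the parametric expasympt transfer function used in the study, so it shows the general idea rather than the authors' method. All function names, the number of quantiles, and the constant-offset extrapolation outside the calibration range are assumptions made for illustration.

```python
def split_by_year(records, last_calibration_year):
    """Split (year, value) records into calibration and validation values."""
    cal = [v for y, v in records if y <= last_calibration_year]
    val = [v for y, v in records if y > last_calibration_year]
    return cal, val


def quantile(sorted_vals, p):
    """Linearly interpolated quantile of an already sorted list, 0 <= p <= 1."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def fit_quantile_map(model_cal, obs_cal, n_quantiles=21):
    """Match model and observed quantiles at the same probabilities.

    n_quantiles is an arbitrary illustrative choice.
    """
    probs = [i / (n_quantiles - 1) for i in range(n_quantiles)]
    model_sorted, obs_sorted = sorted(model_cal), sorted(obs_cal)
    model_q = [quantile(model_sorted, p) for p in probs]
    obs_q = [quantile(obs_sorted, p) for p in probs]
    return model_q, obs_q


def apply_quantile_map(value, model_q, obs_q):
    """Map one model value onto the observed distribution."""
    # Outside the calibration range: reuse the correction of the nearest tail
    # (an assumption; other extrapolation rules are possible).
    if value <= model_q[0]:
        return value + (obs_q[0] - model_q[0])
    if value >= model_q[-1]:
        return value + (obs_q[-1] - model_q[-1])
    # Inside the range: interpolate between the bracketing quantile pairs.
    for i in range(1, len(model_q)):
        if value <= model_q[i]:
            span = model_q[i] - model_q[i - 1]
            frac = 0.0 if span == 0 else (value - model_q[i - 1]) / span
            return obs_q[i - 1] + frac * (obs_q[i] - obs_q[i - 1])


# Synthetic illustration only: a model that runs a constant 2 degrees too warm.
obs_cal_demo = [float(t) for t in range(-5, 15)]
model_cal_demo = [t + 2.0 for t in obs_cal_demo]
model_q_demo, obs_q_demo = fit_quantile_map(model_cal_demo, obs_cal_demo)
corrected_demo = [apply_quantile_map(v, model_q_demo, obs_q_demo) for v in (0.0, 7.5)]
```

In the study's setup, the mapping would be fitted on the 1991–2010 pairs and then applied unchanged to the 2011–2020 model output before computing the validation scores.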
The Random Forest model (R² = 0.94, NSE = 0.93, KGE = 0.91) agreed more closely with the observed data than the BCSD method (R² = 0.90, NSE = 0.90, KGE = 0.90). Its error metrics (RMSE = 1.96 °C, MAE = 1.63 °C) were also lower than those of the BCSD method (RMSE = 2.39 °C, MAE = 1.93 °C). In the scatter plots, Random Forest predictions fell close to the 1:1 line against observed values, whereas the BCSD results showed greater dispersion and deviation from that line. Similarly, in the empirical cumulative distribution function (ECDF) plots, the Random Forest curve closely matched the observational curve and reproduced the statistical distribution of minimum temperature more accurately than the BCSD model, particularly in the middle and upper portions of the distribution (moderate to warmer temperatures). The BCSD approach performed less effectively in the lower tail of the distribution (colder temperatures).
The superior performance of the Random Forest model can be attributed to its structure: it builds an ensemble of decision trees and aggregates their outputs, which enables it to identify and generalize hidden, non-linear relationships between input and output variables. This capability is particularly advantageous in climate datasets influenced by complex and variable factors. In contrast, the BCSD method relies on predefined transfer functions and cannot learn data-driven relationships, so it does not provide the same level of adaptability and precision.
Based on the statistical evidence obtained in this research, the Random Forest model demonstrates strong potential for bias correction of climate data, particularly minimum temperature, and offers greater accuracy and stability than traditional statistical methods.
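The ECDF comparison discussed above can also be expressed numerically. The sketch below is an illustrative Python example, not part of the study's analysis: it builds an ECDF from a sample and reports the largest vertical gap between the observed and corrected curves. The function names `ecdf` and `max_ecdf_gap` are hypothetical.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return a function F where F(x) is the fraction of sample values <= x."""
    ordered = sorted(sample)
    n = len(ordered)

    def cumulative(x):
        # bisect_right counts how many sorted values are <= x.
        return bisect_right(ordered, x) / n

    return cumulative


def max_ecdf_gap(obs, sim):
    """Largest absolute difference between the two ECDFs.

    Both ECDFs are step functions that only change at data values, so it is
    enough to evaluate them at every value present in either sample.
    """
    f_obs, f_sim = ecdf(obs), ecdf(sim)
    grid = sorted(set(obs) | set(sim))
    return max(abs(f_obs(x) - f_sim(x)) for x in grid)
```

Restricting the evaluation grid to the coldest values would give a tail-specific version of the same measure, which corresponds to the lower-tail behaviour noted for the BCSD results.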
Conclusion
The findings of this study show that the Random Forest (RF) model can capture complex and nonlinear relationships, which leads to better performance than the Quantile Mapping (QM) model in bias correction of minimum temperature. Machine learning algorithms such as RF can therefore be an effective approach for improving climate predictions in arid regions such as Iran. Future research should explore more advanced models, including XGBoost, deep neural networks, and hybrid methods.
Keywords:
Bias correction, downscaling, Coupled Model Intercomparison Project Phase 6 (CMIP6), quantile mapping, machine learning.