Improving the Accuracy of Missing Monthly Rainfall Data Estimation with Genetic and Ant Colony Algorithms

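Article type: Research article

Authors

1 Department of Water Engineering, Faculty of Agriculture, Ferdowsi University of Mashhad

2 Professor, Department of Water Science and Engineering, Faculty of Agriculture, Ferdowsi University of Mashhad

3 Professor, Department of Water Engineering, Ferdowsi University of Mashhad

4 Associate Professor, Department of Statistics, Ferdowsi University of Mashhad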

Abstract

Precipitation is one of the most important meteorological and climatological variables and is directly related to the climatic conditions of a region. Because this variable fluctuates strongly, the accuracy of its simulation is of great importance. Observational records from Iran's first synoptic stations, beginning in 1330 (1951 CE), are available on the website of the Iran Meteorological Organization. Older, long-term monthly temperature and precipitation records for five Iranian cities, including Mashhad, were measured by the American and British embassies from the Qajar period (around 1880) and recorded in books. Unfortunately, these records contain missing data. The missing monthly data occur mainly during World War II (1941-1949) and sporadically throughout the record period. Stations in neighboring countries were selected as base stations according to the criteria of distance, correlation, and completeness of data in the periods with missing data. This study fitted ten multiple regression models to the monthly precipitation of the Mashhad station and then optimized the parameters of these models with the genetic algorithm and the ant colony algorithm. The results showed that the genetic and ant colony algorithms substantially increase the accuracy of estimating missing precipitation data. The lowest RMSE among the regression models is 9.79, which decreases to 2.560 after optimization with the genetic algorithm and to 2.559 with the ant colony algorithm.

Keywords


Article title [English]

Improving the Accuracy of Missing Monthly Rainfall Data Imputation Using Genetic and Ant Colony Algorithms

Authors [English]

  • Mahboobeh Farzandi 1
  • Seyed Hossein Sanaei Nejad 2
  • Bijan Ghahreman 3
  • Majid Sarmad 4
1 Department of Water Engineering, Faculty of Agriculture, Ferdowsi University of Mashhad
2 Professor, Water Engineering, College of Agriculture, Ferdowsi University of Mashhad
3 Professor, Ferdowsi University of Mashhad
4 Associate Professor, Ferdowsi University of Mashhad
Abstract [English]

Precipitation is one of the most important meteorological and climatic parameters and a basic factor in water resource management. It is directly related to the regional climate, and because it varies widely, the accuracy with which it is simulated is very important. Observational data from Iran's first synoptic stations, starting in 1330 (1951), are available on the Iranian Meteorological Organization website. Older, long-term monthly temperature and precipitation data for five Iranian cities, including Mashhad, were measured by the embassies of the United States and Britain from the Qajar period (around 1880) and recorded in World Weather Records. Unfortunately, these records contain missing values. The missing monthly data fall mainly during World War II (1941-1949) and sporadically throughout the rest of the record. Stations in neighboring countries were selected as base stations on three criteria: distance to the Mashhad station, correlation with it, and completeness of data in the missing months. Monthly precipitation at the Ashgabat, Sarakhs, Kooshkah, Bayram Ali, Kerki and Repetek stations in Turkmenistan served as the independent variables for reconstructing the missing rainfall in Mashhad. This research fitted ten multiple regression models to the monthly rainfall of the Mashhad station and then optimized the parameters of these models with genetic and ant colony algorithms. In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables when the focus is on the relationship between a dependent variable and one or more independent variables (or 'predictors'). A genetic algorithm (GA) is a metaheuristic inspired by the process of natural selection that belongs to the larger class of evolutionary algorithms (EA).
Genetic algorithms are commonly used to generate high-quality solutions to optimization and search problems by relying on bio-inspired operators such as mutation, crossover and selection. The ant colony optimization (ACO) algorithm is a probabilistic technique for solving computational problems that can be reduced to finding good paths through graphs. It belongs to the family of ant colony algorithms within swarm intelligence methods and is a metaheuristic optimization technique.
The monthly precipitation of Mashhad was reconstructed from these stations using ten linear, semi-logarithmic and logarithmic regression models, programmed in the RStudio environment. The parameters of the five selected models were then optimized by the genetic algorithm and the ant colony algorithm, both simulated in MATLAB 2017. The results showed that, compared with the regression methods alone, the genetic and ant colony algorithms dramatically increase the accuracy of estimating missing rainfall data. The lowest RMSE among the regression models is 9.79, which optimization reduces to 2.66 with the genetic algorithm and to 2.659 with the ant colony algorithm.
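The two-stage idea described in the abstract, fitting a regression model and then refining its parameters with a genetic algorithm to lower RMSE, can be illustrated with a minimal Python sketch. This is not the authors' R or MATLAB code. The data are synthetic, the log-log model form, the function names (`predict`, `rmse`, `genetic_refine`) and all GA settings are assumptions chosen for illustration, and only the genetic algorithm stage is shown. The regression is fitted by least squares in log space, so a search that minimizes RMSE in the original rainfall units has room to improve on it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the paper's records): monthly rainfall at
# three hypothetical base stations and at a hypothetical target station.
n_months, n_stations = 120, 3
X = rng.gamma(shape=2.0, scale=10.0, size=(n_months, n_stations))
y = 0.5 * X[:, 0] + 0.3 * X[:, 1] + 0.1 * X[:, 2] + rng.gamma(2.0, 2.0, n_months)


def predict(params, X):
    """Assumed log-log model: log(1 + y) = b0 + sum_i b_i * log(1 + x_i)."""
    z = params[0] + np.log1p(X) @ params[1:]
    return np.expm1(np.clip(z, -50.0, 50.0))  # clip guards against overflow


def rmse(params, X, y):
    """Root mean square error in the original rainfall units."""
    return float(np.sqrt(np.mean((predict(params, X) - y) ** 2)))


# Stage 1: ordinary least squares on the log-transformed variables.
A = np.column_stack([np.ones(n_months), np.log1p(X)])
ols_params, *_ = np.linalg.lstsq(A, np.log1p(y), rcond=None)


def genetic_refine(seed_params, X, y, pop_size=40, generations=200, sigma=0.05):
    """Stage 2: a small real-coded GA that minimizes RMSE, seeded with OLS."""
    dim = seed_params.size
    pop = seed_params + rng.normal(0.0, sigma, size=(pop_size, dim))
    pop[0] = seed_params  # keep the regression solution in the population
    for _ in range(generations):
        fitness = np.array([rmse(p, X, y) for p in pop])
        pop = pop[np.argsort(fitness)]
        elite = pop[: pop_size // 4]  # selection: best quarter survives unchanged
        children = []
        while len(children) < pop_size - len(elite):
            i, j = rng.integers(0, len(elite), size=2)
            mask = rng.random(dim) < 0.5  # uniform crossover of two parents
            child = np.where(mask, elite[i], elite[j])
            # Gaussian mutation applied to roughly 30% of the genes
            child = child + rng.normal(0.0, sigma, size=dim) * (rng.random(dim) < 0.3)
            children.append(child)
        pop = np.vstack([elite, np.array(children)])
    fitness = np.array([rmse(p, X, y) for p in pop])
    return pop[np.argmin(fitness)]


ga_params = genetic_refine(ols_params, X, y)
rmse_ols = rmse(ols_params, X, y)
rmse_ga = rmse(ga_params, X, y)
print(f"RMSE after regression only: {rmse_ols:.3f}")
print(f"RMSE after GA refinement:   {rmse_ga:.3f}")
```

Because the best individuals are carried over unchanged and the regression solution is part of the initial population, the refined RMSE in this sketch can never exceed the regression RMSE. The size of the improvement depends entirely on the synthetic data and says nothing about the values reported in the abstract.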

 