Gold Standard Agreement Model for Precipitation Forecast in Paraná Using Bootstrap

Demand for quality weather forecasts has increased in the last decades, leading national meteorological centers to develop new forecasting models. These models have parameterizations which can produce different predictions for the same location and agrometeorological variable. In the state of Paraná Brazil, studies on rain forecasting are important for planning the soybean crop. The objective of this study was to compare, based on a gold-standard and using bootstrapping residuals, forecasts of total rainfall by virtual stations of the following centers: Canadian Meteorological Center (CMC), European Center for Medium-Range Weather Forecasts (ECMWF), National Centers for Environmental Prediction (NCEP) and Center for Weather Forecasting and Climate Studies (CPTEC). Gold-standard measurements were obtained from Meteorological System of Paraná (SIMEPAR) meteorological stations. The studied region was the state of Paraná, in October–March of the harvest years 2011/2012–2015/2016; forecast ranges were 24 and 240 hours. Knowledge Discovery in Databases (KDD), focused on data mining techniques, was the chosen methodology. In the data preprocessing stage, spatial and temporal stratification, cleansing and grouping were performed. For the comparisons, 24 h and 240 h weather forecasts were used, being grouped in five-day and ten-day periods, respectively, and coefficients of agreement with the gold-standard measure were calculated. The choice of forecast center should consider the geographic location of a certain pluviometric station, and the temporal range of the forecast, according to its measure of agreement with the gold standard measure. Spatial variations of forecasting centers were identified within the mesoregions, which suggests the employment of different forecasting centers in a certain mesoregion.


Introduction
Availability of water is an essential factor for the development of agricultural crops and with its prediction, in part by means of rainfall models, which might be useful in areas assisted by irrigation systems or as covariables in harvest prediction models. Agriculture is an important part of the productive chain and depends on environmental conditions. Soil and climate attributes largely determine the growth and development of plants (Meireles et al., 2007;Tamburino et al., 2020). Within the different crops, soybean agriculture stands out as a source of foreign exchange for Brazil, contributing an important share in Brazilian exports. The state of Paraná is the second largest soybean producer in Brazil. During the 2015/2016 harvest it produced 16.84 million metric tons, covering an area of 5.45 million hectares (CONAB 2016).
In soybean culture, water availability is important, mainly, during two developmental periods: germination-emergence and flowering-filling of grains. Flowering-filling is the period in which there is greater evapotranspiration of the crop but is also responsible for the definition of productivity. Therefore, precipitation forecasts, especially for areas that have irrigation systems, is important for an irrigation management which adequately considers evapotranspiration (Kullberg et al., 2017). During this first developmental period, both excess water and water scarcity are detrimental to good uniformity in the plant population. Thus, the advance knowledge of rainfall behavior in the period immediate after planting is important, given the hydric requirements of the germination-emergence period of soybean (Rezende et al., 2003;. In the state of Paraná, the Agricultural Defense Agency of Paraná (ADAPAR, 2017) establishes September 11-December 31 of each agricultural year as the proper period for sowing soybean crops. This guidance is necessary in view of the potential losses caused by the pest Phakopsora pachyrhizi, the causal agent of Asian rust. Thus, accurate precipitation forecasts of the October-March period are fundamental to the agricultural sector in the state of Paraná-Brazil.
In terms of forecast models of meteorological variables, a few names stand out worldwide. The Observing System Research and Predictability Experiment (THORPEX) is an international research program, sponsored by the World Meteorological Organization (WMO), which aims to improvements in one to fifteen days, high impact forecasts. One of its main components is The International Grand Global Ensemble (TIGGE). TIGGE provides for the cooperation of several global centers in the development of (sets of forecast models) "ensembles" (Bougeault et al., 2010;Huang & Luo, 2017;Colston et al., 2018). Ensembles are global and have variables such as air temperature, pressure, wind velocity, precipitation, soil moisture and surface temperature (Berrisford et al., 2011;Jha et al., 2018). SIMEPAR's surface hydro-meteorological telemetry network consists of a series of automatic meteorological and hydrological stations, throughout Paraná (SIMEPAR, 2018). Meteorological stations take climatic measurements Krüger & Rossi (2015) in an hourly-schedule and send data referring to several variables -air temperature, precipitation, wind speed and direction, among others.
Considering all the advances in meteorology technologies, many researchers have been analyzing forecast models (Cunningham et al., 2015;Shahrban, et al., 2016;Chang et al., 2017;Cáceres et al., 2018), employing stochastic representations with parameterized physical processes. This has improved the probabilistic predictive abilities of ensembles (Ollinaho et al., 2017), with positive predictive capacities in comparison to reference forecasts being shown up to the 240 hours range (Chessa & Lalaurette, 2001). Global climatic centers use models with particular parameterizations, which can produce different predictions for the same location and agrometeorological variable. Thus, comparison methods using a gold standard can assess the degree of agreement of each center's forecast with the reference measure (Zaki et al., 2012;Galea, 2013).
The objective of this study was to compare control forecast predictions of total precipitation made by the following climatic centers: Canadian Meteorological Center (CMC), European Center for Medium-Range Weather Forecasts (ECMWF), National Centers for Environmental Prediction (NCEP) and Center for Weather Forecasting and Climate Studies (Centro de Previsão de Tempo e Estudos Climáticos -CPTEC), with measurements made by SIMEPAR meteorological stations, using the comparison model proposed by Laurent (1998), which propose a measure of agreement between two or more methods of measurement of some quantity where a gold standard is present, in October 1-March 31 of the 2011/2012-2015/2016 harvest years, in the state of Paraná.

Material and Methods
According to Deepthi & Anuradha (2016), the Knowledge Discovery in Databases (KDD) methodology is suggested for the analysis of large databases such as those from CMC, ECMWF, NCEP, CPTEC and SIMEPAR. KDD is divided into three stages: Preprocessing, Pattern Extraction and Postprocessing. The flowchart adapted from Deepthi & Anuradha (2016), first two was specified (Fig. 1).

Integration Extraction
The data were stratified spatially by selection of the TIGGE base of geographic coordinates (55ºW, 48ºW, 27ºS, 22ºS) corresponding to the rectangle containing Paraná (Fig. 2). The pixels have a dimension of 0.5º x 0.5º, with 165 pixels in total. Values from each pixel were obtained by database interpolation. Paraná has 72 virtual stations, indicated by the centroids of pixels in the stratified area. Data extraction was based on virtual stations corresponding to the 39 SIMEPAR meteorological stations. Therefore, there are 39 points of correspondence between virtual stations and physical stations (Fig. 2).

Selection and Reduction of Data
The TIGGE database provides rainfall forecasts in the 24-360 hours range, starting from October 2006. Comparisons disregarded missing data. For the 240 h range, 18 predictions in 10-day groupings available for each harvest year (October 1-March 31) were used. Predictions came from the same centers that provided data in the 24 h range, with locations of virtual stations corresponding to SIMEPAR physical stations. In total, there were 90 10-day periods. Ten-day groupings for the gold standard measures were also performed. This process generated another 90 ten-day groups, for comparison with predictions within the 240 h range, from each station.

Pattern Extraction
During pattern extraction, the model for obtaining patterns of agreement between gold standard measures and the centers' predictions was specified. This procedure is characterized by the sequence of tasks related to data mining and to the data pattern detection (Devakumari & Punithavathi, 2018;Janani & Poonguzhali, 2018).
The comparison model used to evaluate the measure of agreement between two or more instruments in respect to a gold standard, or reference measure, was proposed by Laurent (1998). , in which, is the measurement error for the unity of the method j = 1, ..., p; is the measurement performed via the j-th method on the i-th unity; is the gold standard-based measurement on the i-th unity, with a mean of and a variance of ; considering independent of , with a mean of and a variance of .
This study considered n=165 5-day periods (or n = 90 10-day periods): i = 1,...,165(90). The methods are the precipitation forecast models (ECMWF, CMC, NCEP, CPTEC), therefore j = 1,...,4 (p = 4). The grid has a total of 39 pixels: k = 1,...,39. The model of (Equation 1) can be written in matrix notation (Equation 2). , in which, is the vector of measures p x 1 of the p approximate methods in the unit i; 1p is a vector ; is the vector of random errors of the p methods in the unit i, considering i = 1,...,n.
The coefficient of agreement proposed by Laurent (1998) -which is given by , in which is the gold standard variance and is the variance of the errors between the gold standard and the j-th approximate method -can be used to evaluate the measure of agreement between p approximate methods and the gold standard.
Considering the vector , in which q = p + 1 measurements made via gold standard and via approximate methods in unit i. Suppose that the random vectors are independent and identically distributed with a multivariate normal distribution with a means vector and covariance matrix , that is, , then it is verified that the random variables and are independent, with distributions and , respectively. In this case, the maximum likelihood estimator (MLE) of , and are given, respectively, by , and , in which (Galea, 2013). Therefore, the MLE of the coefficient of agreement between the approximate methods and the gold standard is given by , in which and global measure of concordance is given by (Laurent, 1998).
If the multivariate normality of the vector cannot be verified, an alternative is the non-parametric approach, as proposed by the bootstrap methodology (Chernick & LaBudde, 2011). In this case, errors calculated via model (2) are re-sampled, with repetition, to generate a pseudo-sample , in which are the errors obtained via bootstrap, with b = 1,...,1000 (Dalposso et al., 2016). The errors were used to calculate the coefficients of agreement and confidence intervals (Efron, 1982).
Pattern extraction was done by analysis of the coefficients of agreement . In this way, it was possible to identify, for each station and within the mesoregions, the centers with prediction ranges of 24 h (5-day) and 240 h (10-day) that had forecast values closest to the gold standard (SIMEPAR). The R software was used to develop a computational routine that calculates, element after element, the descriptive statistics, the bootstrap coefficients of agreement and their respective confidence intervals (R Core Team, 2020).

The Study for 24 h Range
Rainfall forecast data grouped in five-day periods, for the 24 h range, obtained from the CMC, ECMWF, NCEP and CPTEC centers, as well as precipitation records from k=39 SIMEPAR meteorological stations, were characterized by positive asymmetry and absence of normality with p-values equal to zero according to the Kolgomorov-Smirnov test (Fig. 3).
The data had discrepant values. The average of the data is close to zero. The presence of mean values close to zero is related to the occurrence of drought days during the period. On the other hand, discrepant data points indicate events of intense precipitation that in some cases reaches values close to 150 mm, ECMWF (Fig. 3B), or above 250 mm (Fig 3E), SIMEPAR (gold standard).  (k=39) were compared with the model (Equation 2); the gold standard consisted of the 5-day groupings of the SIMEPAR meteorological stations data. As the data did not present normality, a resampling bootstrap with 1000 repetitions was used to generate 1000 pseudo-samples. The bootstrap residuals method, which defines the residuals as observed value less the estimated value of the adjusted regression model in each bootstrap repetition, was applied. The linear models were adjusted, using R software's 'lm' function, a part of its 'stats' package.
The estimated coefficient of agreement at each location (k=39) between the model predictions of each center CMC, ECMWF, NCEP and CPTEC with the SIMEPAR (gold standard), defined by CMC, ECMWF, NCEP and CPTEC, are presented in Fig. 4A to Fig. 4D, respectively, and the global measure of agreement ρG is presented in Fig. 4E.
The indices from 1 to 39 in Fig. 6 are the centroids of the virtual stations corresponding to SIMEPAR. The value of the agreement in each station is indicated by the color scale that varies from 0.2 to 0.76 (Fig 4). The lighter colors indicate a higher degree of agreement with SIMEPAR (gold standard). For example, in station 5 (central north), the highest agreement was observed for the ECMWF model with a value of 0.7; in station 13 (central north) the   At each location, the Efron (1982) percentile was calculated to construct a non-parametric interval with 95% confidence and plotted with the respective agreement (Fig. 5). The largest Efron interval corresponds to the location with the highest precipitation, with a gold standard close to 250 mm, suggesting the occurrence of extreme events may widen the interval. The equality of the coefficients of agreement of the centers was rejected by the test proposed by Cahoy (2010) for homogeneity of bootstrap variances. The agreements of each center show different values according to the spatial location of the stations spatial (Fig. 4). Considering the principle of parsimony, the standard deviations and Efron's percentile (1982) were calculated for the linear regression parameters , and for the coefficient of agreement in each center, considering the stations with the lowest and highest agreement, defined by MIN and MAX , respectively (Table 1).
In each station, 1000 bootstrap pseudo-samples were used, and 1000 linear models were adjusted. Parameters and of linear models with higher and the standard deviation, respectively, suggest similar variations in these parameters in stations with lowest and highest agreement ρ. The standard deviations of the bootstrap concordances ρ with an amplitude of 0.03 indicate low dispersion around the mean in the stations. Predictions of stations from centers with highest gold standard agreement were identified (Fig.  6). In the mesoregions, stations having some percentage of coverage inside the polygon are to be considered (Table 2). In the same mesoregion one can obtain two or more models producing forecasts with highest agreement with the gold standard, depending on the geographical location, for example, the northwest (NO) where 40% of the selection frequency was of the NCEP, 40% was of the CPTEC and 20% of the ECMWF; the southwest (SO) with 25% of ECMWF, 25% of NCEP and 50% of CPTEC. In Paraná, percentages of selection by center were: 17.94% (CMC), 30.77% (ECMWF), 17.95% (NCEP) and 33.33% (CPTEC). The value of the coefficient of agreement, as in Fig. 4, in each station was indicated by the color scale that varies from 0.2 to 0.76 (Fig. 6). The lighter colors indicate a higher degree of agreement with SIMEPAR (gold standard). For example, in station 14 (Center-east), the selected model was the CPTEC with highest agreement, value of 0.5, in relation of the others (CMC, ECMWF and NCEP) in the same mesoregion; in station 5 (Central north) the selected model was the ECMWF with highest agreement, value of 0.7. Thus, in each station the selected model and the respective agreement were indicated in the color scale (Fig. 6).  ISSN 2166-0379 2021 Data grouped in ten-day periods, for the 240 h range of rainfall forecasts, obtained from the CMC, ECMWF, NCEP and CPTEC centers, as well as precipitation records in SIMEPAR meteorological stations, were characterized by positive asymmetry and absence of normality according to the Kolgomorov-Smirnov test (Fig. 7). This result was similar to the one obtained in Fig. 3.

Journal of Agricultural Studies
The 10-day data have discrepant values, similar to what is shown in Fig. 3, with a mean close to 50 mm. The discrepant points indicate intense rainfall in the period, in some cases reaching values above 300 mm, ECMWF (Fig. 3B), or above 450 mm, SIMEPAR (gold standard) (E), revealing the occurrence of extreme events.  The data did not present normality and a bootstrap resampling, analogous to that of section 3.1 (24 h range), with 1000 repetitions, was used to generate 1000 pseudo-samples to which linear models were adjusted. Error variance of the linear models was used to calculate the coefficient of agreement between the predictions of each center (CMC, ECMWF, NCEP and CPTEC) with the gold standard in each location (k=39), and also the global measure of agreement (Fig. 8).
The indices from 1 to 39 in Fig. 8 are the centroids of the virtual stations corresponding to SIMEPAR. The value of the coefficient of agreement in each station was indicated by the color scale, as in Fig. 4, that varies from 0.2 to 0.76 (Fig. 8)  Each location's 95% confidence Efron (1982) non-parametric interval is shown alongside the respective agreement in Fig. 9. The largest Efron interval corresponds to the location with the highest precipitation, with a gold standard near 450 mm, indicating a relationship between amplitude and precipitation intensity. The equality of the coefficients of agreement of the centers was rejected by the test analogous to that of section 3.1 (24 h range). The agreements of each center show different values (Fig. 8) according to the spatial location of the stations spatial as in (Fig. 4). The principle of parsimony was used analogous to that of section 3.1 (24 h range), and in the stations with the lowest and highest agreement, defined by MIN e MAX , respectively, the standard deviations and confidence intervals were obtained for regression parameters , and for coefficient of agreement (Table 3).
For each station, 1000 bootstrap pseudo-samples were used, and 1000 models were adjusted.
Parameters and of models with higher and the standard deviation, respectively, suggest similar variations in these parameters in stations with lowest and highest agreement ρ. The standard deviations of the bootstrap agreement with an amplitude of 0.03 indicate low dispersion. For each station, predictions of centers with highest gold standard agreement were identified (Fig. 10). In the mesoregions, stations having some percentage of coverage inside the polygon were considered (Table 4). Analogous to that of section 3.1 (24 h range), in the same mesoregion one can obtain two or more models producing forecasts with highest agreement with the gold standard, depending on the geographical location, for example, the northwest (NO) where 80% of the selection frequency was of the CMC, 20% was of the ECMWF; the southwest (SO) with 25% of CMC and 75% of ECMWF. In Paraná, percentages of selection by center were: 17.94% (CMC), 30.77% (ECMWF), 17.95% (NCEP) and 57.14% (CPTEC). The value of the coefficient of agreement, as in Fig. 6, in each station was indicated by the color scale that varies from 0.2 to 0.76 (Fig. 10). The interpretation is analogous to that of Fig. 6. For example, in station 14 (Center-east), the selected model was the ECMWF with highest agreement, value of 0.66, in relation of the others (CMC and ECMWF) in the same mesoregion; in station 5 (Central north) the selected model was the ECMWF with highest agreement, value of 0.67. Thus, in each station the selected model and the respective agreement were indicated in the color scale (Fig. 10 Figure 10. Selection of precipitation forecast centers with the highest gold standard agreement (240 h range), for each station with 1: CMC, 2: ECMWF, 3: NC

Discussion
In the state of Paraná, dates for soybean planting differ according to mesoregion, although most of the sowing (65.16%) for the 2011/2012 harvest year took place between the third ten-day of October and the first ten-day of November. During this period, it was also observed that most of the soybean cultivation area of Paraná (65.46%) had its maximum vegetative development in January and harvest in March (53.92%). The soybean culture contributes to the production chain and is dependent of the climatic events that can negatively affect the productivity and price of the product (da Costa et al., 2021). Variations in temperature and precipitation can lead to losses in soybean productivity Lu et al., 2021) Different dates for sowing in the spring-summer period Yokoyama, et al., 2020) notwithstanding, it is possible to observe that sowing generally follows the ADAPAR (2017) recommendation, motivating the study of total rainfall in this period, considering the water requirements of the plant in the germination-emergence developmental phase (Rezende et al., 2003;Araya et al., 2015;Battisti & Sentelhas, 2015), and losses that may occur due to extreme events. Losses in crop productivity can be also associated with soil nutrient dynamics which are influenced by soil moisture whose main source is rainfall (Yang et al., 2021). Thus, the results obtained contribute to actions aimed at balancing water availability and variables associated with soil fertility.
Extreme precipitation events were identified in predictions for the 24 h and 240 h ranges, and also in the respective 5-day and 10-day groupings of SIMEPAR observations (Figs. 3 and 7). These observations agree with the results which indicate for the southern region of Brazil, a higher percentage of extreme precipitation events in comparison to extreme drought events (Cera & Ferraz 2015;Zandonadi et al., 2016;. In the analysis of total rainfall in the state of Paraná, a non-parametric approach was used, with comparison based on a gold standard, allowing for the evaluation of agreement between approximate measures (models predictions) and reference measurements with negligible error (gold standard). This approach, which aims to determine the measure closer to a gold standard, were presented by (Lin 1989;Grünheid T et al., 2014;Darroudi et al., 2017;Oliveira et al., 2020). This methodology allows the selection of precipitation forecasting centers, considering the geographic location of the virtual stations. It suggests the use of forecasts that have greater agreement with gold standard reference observations. Eldardiry and Habib (2020) obtained similar relationship about extreme precipitation events in a study carried out on the precipitation frequency estimates. They concluded that the regional spatial bootstrap approach surpasses the pixel based approach in terms of robustness for outliers identified in the maximum annual series.
For the 24 h range, the mean amplitude of agreements was 0.01 and the standard deviation was 0.03. The coefficient of variation indicates low dispersion for forecasts made by all centers. In the 240 h range the mean amplitude was 0.06 higher than in the 24 h range, which was expected due to the longer temporal range. Agreements in this range presented average dispersion, except in the case of CPTEC, in which low dispersion was shown. In Tables 3 and  6, presenting selection predominance of centers in each mesoregion, it is noteworthy that in the 24 h and 240 h ranges there was a greater predominance of the ECMWF in the north-central and north pioneer mesoregions; for the other mesoregions there were different percentages of selection predominance. This result suggests that the use of forecasts of a certain center for a mesoregion must consider not only the geographical location of the virtual station, but also the temporal range of the forecast. When overlapping measures of agreement with mean annual precipitation in Paraná (Silva et al., 2015;Nascimento et al., 2017), it was not possible to identify a relationship, indicating that agreements are not associated with the annual average rainfall pattern. The determination of a single model as the best in all the studied macro-regions was not possible. In other studies, on precipitation predictions in different regions, this also occurred, as reported by Bhardwaj et al. (2021) when concluding that, in general, the performance of a multimodal linear regression model, based on precipitation data from six centers, remains better than any single member model from day 1 to day 10 in rainfall forecasts for the Indian subcontinent.

Conclusion
The choice of forecast center should consider the geographic location of a certain pluviometric station, and the temporal range of the forecast, according to its measure of agreement with the gold standard measure. Spatial variations of forecasting centers were identified within the mesoregions, which suggests the employment of different forecasting centers in a certain mesoregion.