Concordance Modeling With a Gold Standard for Variables From the Three-Parameter Gamma Distribution

Two or more measurements of the same random variable can be compared by using a reference measurement with negligible error, called the gold standard, obtained by consolidated measurement methods. This paper presents a new methodology for comparing measurements in the presence of a gold standard for random variables from the multivariate three-parameter (shape, scale, and location) gamma distribution. The errors between the gold standard measurements and the approximate measurements follow a gamma difference distribution with the same three parameters of the gamma distribution. Concordance was measured by means of a coefficient that expresses the degree of agreement as a ratio between the variances of the gold standard and the errors. The methodology is illustrated with climate data divided into four ranges. The measurements analyzed are rainfall forecasts from four national centers: Canadian Meteorological Center (CMC), European Center for Medium-Range Weather Forecasts (ECMWF), National Centers for Environmental Prediction (NCEP), and Center for Weather Forecasting and Climate Studies (CPTEC). The forecast range was 240 hours for the West mesoregion of Paraná, Brazil, over the October 1–March 31 period of the 2010/2011–2015/2016 harvest years. This period was selected because it corresponds to soybean crop development in the region and because several crop estimation models use rainfall forecast data for it. Applied spatially, the methodology indicated which center to select at each geographical location for each rainfall range. The gamma model fitted the data well and is an alternative to the normal model for rainfall, in particular for estimating concordances between rainfall forecasts and the gold standard, which can improve the selection of rainfall forecast centers.


Introduction
The three-parameter (shape, scale, and location) gamma probability distribution (Johnson et al., 1994) has several applications in stochastic modeling and hydrology. The three-parameter gamma probability density function, with shape $\alpha$, scale $\beta$, and location $\gamma$, is defined by Mathai and Moschopoulos (1992) as
$$ f(x;\alpha,\beta,\gamma) = \frac{(x-\gamma)^{\alpha-1}\, e^{-(x-\gamma)/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)}, \qquad x > \gamma, \tag{1} $$
in which $\alpha > 0$, $\beta > 0$, $\gamma \in \mathbb{R}$, and $\Gamma(\cdot)$ is the gamma function. The function (1) is central in defining the various forms of the multivariate gamma distribution whose marginals are gamma distributions. The literature presents several particular applications of the multivariate gamma distribution, including bivariate cases, which are discussed in McKay (1934), Cherian (1941), Jensen (1970), Royen (1991), Mathai and Moschopoulos (1992), and references cited therein.
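As a quick numerical check of density (1), the sketch below codes it directly and compares it with SciPy's `scipy.stats.gamma`. In SciPy, the arguments `a`, `loc`, and `scale` correspond to the shape $\alpha$, the location $\gamma$, and the scale $\beta$. The function name `gamma3_pdf` and the parameter values are illustrative assumptions, not estimates from this study.

```python
import math

import numpy as np
from scipy import stats


def gamma3_pdf(x, alpha, beta, gamma):
    """Three-parameter gamma density (1): shape alpha > 0, scale beta > 0, location gamma.

    Returns 0 outside the support x > gamma.
    """
    x = np.asarray(x, dtype=float)
    z = x - gamma
    out = np.zeros_like(x)
    inside = z > 0
    zi = z[inside]
    out[inside] = zi ** (alpha - 1) * np.exp(-zi / beta) / (beta ** alpha * math.gamma(alpha))
    return out


# Hypothetical parameters, chosen only for the comparison.
x = np.linspace(0.0, 200.0, 9)
ours = gamma3_pdf(x, alpha=2.5, beta=20.0, gamma=5.0)
ref = stats.gamma.pdf(x, a=2.5, loc=5.0, scale=20.0)
print(np.allclose(ours, ref))  # True
```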
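To evaluate the degree of agreement (concordance) of measurements of a gamma-distributed random variable obtained by approximation methods, one can use the standard reproducibility (agreement) model (Lord & Novick, 1968; Donner, 1986; Fleiss, 1999; Galea, 2013) with respect to a reference measurement, called the gold standard,
$$ Y_{ij} = X_i + \varepsilon_{ij}, \qquad i = 1, \ldots, n, \quad j = 1, \ldots, p, \tag{2} $$
in which $\varepsilon_{ij}$ is the random measurement error of method $j = 1, \ldots, p$ on the $i$-th unit, $Y_{ij}$ is the measurement performed via the $j$-th method on the $i$-th unit, and $X_i$ is the gold-standard measurement on the $i$-th unit, with mean $\mu_X$ and variance $\sigma_X^2$. The error $\varepsilon_{ij}$ is considered independent of $X_i$, with mean $\mu_j$ and variance $\sigma_j^2$.
The model (2) can be written in matrix notation (Laurent, 1998) as
$$ \mathbf{Y}_i = X_i \mathbf{1}_p + \boldsymbol{\varepsilon}_i, \tag{3} $$
in which $\mathbf{Y}_i = (Y_{i1}, \ldots, Y_{ip})^\top$, $\mathbf{1}_p$ is a $p \times 1$ vector of ones, and $\boldsymbol{\varepsilon}_i = (\varepsilon_{i1}, \ldots, \varepsilon_{ip})^\top$. Let $\mathbf{W}_i = (X_i, \mathbf{Y}_i^\top)^\top$, with $i = 1, \ldots, n$, be the vector of dimension $p + 1$ of the measurements performed via the gold standard and the approximation methods on the $i$-th unit.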

Gamma Model Specification
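Suppose that the random variables in the vector $\mathbf{V} = (V_1, \ldots, V_k)^\top$ are independent, with $V_i \sim G(\alpha_i, \beta, \gamma_i)$, $i = 1, \ldots, k$. That is, each $V_i$ has density (1) with shape $\alpha_i$, common scale $\beta$, and location $\gamma_i$. Let $\mathbf{Z} = (Z_1, \ldots, Z_k)^\top$, with $Z_i = V_1 + \cdots + V_i$. The joint distribution of $\mathbf{Z}$ is the $k$-variate gamma distribution of Mathai and Moschopoulos (1992), defined by the density function
$$ f(z_1, \ldots, z_k) = \frac{\prod_{i=1}^{k} (z_i - z_{i-1} - \gamma_i)^{\alpha_i - 1} \exp\!\left[-\left(z_k - \sum_{i=1}^{k} \gamma_i\right)/\beta\right]}{\beta^{\alpha_1 + \cdots + \alpha_k} \prod_{i=1}^{k} \Gamma(\alpha_i)}, \tag{4} $$
in which $z_0 = 0$, $\alpha_i > 0$, $\beta > 0$, and $\gamma_i \in \mathbb{R}$, for $z_i - z_{i-1} > \gamma_i$, $i = 1, \ldots, k$, and zero elsewhere.
The multivariate gamma distribution given in (4) has several important properties (Mathai & Moschopoulos, 1992). Some of them are:
i) The marginal distribution of $Z_i$ is a three-parameter gamma with density function given in (1), i.e., $Z_i \sim G(\alpha_i^*, \beta, \gamma_i^*)$, in which $\alpha_i^* = \alpha_1 + \cdots + \alpha_i$ and $\gamma_i^* = \gamma_1 + \cdots + \gamma_i$ for $i = 1, \ldots, k$.
ii) The mean and variance of $Z_i$ are, respectively, given by
$$ E(Z_i) = \alpha_i^* \beta + \gamma_i^*, \tag{5} $$
$$ \operatorname{Var}(Z_i) = \alpha_i^* \beta^2. \tag{6} $$
iii) The correlation matrix $\mathbf{R} = (\rho_{ij})$ has a positive correlation between $Z_i$ and $Z_j$, given by
$$ \rho_{ij} = \sqrt{\alpha_i^* / \alpha_j^*}, \tag{7} $$
in which $i \le j$.
iv) The covariance of $Z_i$ and $Z_j$, for $i \le j$, is
$$ \operatorname{Cov}(Z_i, Z_j) = \alpha_i^* \beta^2. \tag{8} $$
v) The covariance matrix of the vector $\mathbf{Z}$, denoted by $\boldsymbol{\Sigma}$, is given by
$$ \boldsymbol{\Sigma} = (\sigma_{ij}), \tag{9} $$
in which $\sigma_{ij} = \alpha^*_{\min(i,j)} \beta^2$ and $i, j = 1, \ldots, k$.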
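Properties i) to v) can be checked by simulation. The sketch below assumes the partial-sum construction of Mathai and Moschopoulos (1992). The $V_i \sim G(\alpha_i, \beta, \gamma_i)$ are independent with a common scale $\beta$, and $Z_i = V_1 + \cdots + V_i$. The parameter values are hypothetical. The sample moments should approach $E(Z_i) = \alpha_i^*\beta + \gamma_i^*$, $\operatorname{Cov}(Z_i, Z_j) = \alpha^*_{\min(i,j)}\beta^2$, and $\rho_{ij} = \sqrt{\alpha^*_{\min(i,j)}/\alpha^*_{\max(i,j)}}$.

```python
import numpy as np

rng = np.random.default_rng(2020)

# Hypothetical parameters: shapes alpha_i, common scale beta, locations gamma_i.
alpha = np.array([2.0, 1.5, 3.0])
beta = 10.0
gam = np.array([1.0, 0.5, 2.0])
n = 200_000

# Independent three-parameter gammas V_i ~ G(alpha_i, beta, gamma_i), one column each.
V = rng.gamma(shape=alpha, scale=beta, size=(n, alpha.size)) + gam
# Partial sums Z_i = V_1 + ... + V_i.
Z = np.cumsum(V, axis=1)

a_star = np.cumsum(alpha)  # alpha_i^* = alpha_1 + ... + alpha_i
g_star = np.cumsum(gam)    # gamma_i^* = gamma_1 + ... + gamma_i

mean_theory = a_star * beta + g_star                          # property ii)
cov_theory = beta**2 * np.minimum.outer(a_star, a_star)       # properties iv) and v)
corr_theory = np.sqrt(np.minimum.outer(a_star, a_star)
                      / np.maximum.outer(a_star, a_star))     # property iii)

print(np.round(Z.mean(axis=0), 2), mean_theory)
print(np.round(np.corrcoef(Z, rowvar=False), 3))
print(np.round(corr_theory, 3))
```

All off-diagonal correlations are positive, as property iii) states, because each $Z_j$ contains $Z_i$ for $i < j$.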
Suppose , , then in which is the Jacobian of the transformation of order given by then, note that the , thus one has
Journal of Agricultural Studies ISSN 2166-0379 2020
Figure 1. Location map of the West mesoregion of Paraná, containing the ANA physical meteorological stations and the virtual stations corresponding to CMC, ECMWF, NCEP, and CPTEC.
A temporal stratification was performed, selecting October 1–March 31 of the 2010/2011–2015/2016 harvest years as the temporal range. This range was selected because the state of Paraná is one of the largest soybean producers in Brazil, and the Agricultural Defense Agency of Paraná, Brazil (ADAPAR, 2018) establishes the proper soybean sowing period for each agricultural year. The period is therefore related to soybean crop development in the region, and several crop estimation models use rainfall forecast data for it. Agrometeorological variables directly influence crop yield estimation models (Battisti et al., 2018). In soybean, water availability is especially important during two development stages: germination-emergence and flowering-grain filling (Rodrigues et al., 2017).
The rainfall forecast models from the TIGGE database used in this research were the Canadian CMC, the European ECMWF, the North American NCEP, and the Brazilian CPTEC. The reference dataset (gold standard) consisted of daily precipitation from ANA meteorological stations. Missing data were excluded when matching forecasts to reference measurements.
Table 1 presents descriptive statistics for the data grouped in ten-day periods, corresponding to the 13 pixels in the West mesoregion of Paraná, Brazil, over the October 1–March 31 period of the 2010/2011–2015/2016 harvest years. This period includes soybean sowing in Paraná (Meotti et al., 2012; Bornhofen et al., 2015).
Several studies reveal the relation between the spatial variability of rainfall and crop yield (Bezabih & Di Falco, 2012; Moraes et al., 2014; Jajoria et al., 2015). Crop yield estimation models are sensitive to precipitation variability (Cera et al., 2017). No ten-day periods were missing from the ANA stations. The numbers of missing ten-day periods for the centers were 2 for CMC, 1 for ECMWF, 10 for NCEP, and 79 for CPTEC. A value of 100 mm was used as an indicator of extreme events (Zandonadi et al., 2016).
The highest incidence of extreme precipitation events was identified in pixel 12, and pixel 1 was the least affected by these events. The lowest coefficient of variation (CV) of the gold standard, 36.80%, was obtained in pixel 7, and the highest value of … For the forecast centers, the lowest CV, 55.60%, was obtained for CPTEC in pixel 13, and the highest, 73.30%, for NCEP in pixel 6. These CV values indicate heterogeneity in both the ANA station data and the TIGGE data.
Water stress is the main cause of losses in soybean crops (Confalone et al., 2010; Nunes et al., 2016; Souza et al., 2016). In non-irrigated areas, water deficits, mainly during drought periods, can increase losses in agricultural crops (Nunes et al., 2016; Pugh et al., 2019). The spatial variability of soybean and other agricultural crops follows water availability (Iglesias et al., 2012; Vivan et al., 2013; Zanon et al., 2016). Thus, to increase their reliability, crop yield estimation models should use the climate center whose rainfall forecasts are closest to the gold standard measurements in the study region. Rainfall, the main source of water for agricultural systems, can be modeled with a gamma distribution (Sadiq, 2014; Cristaldo, 2017; Hasan et al., 2019).
The three-parameter multivariate gamma distribution given in (4) can be used to model a group of gamma-distributed variables. The degree of agreement of the j-th approximate measure with the gold standard is given by the coefficient in equation (21). Computing this coefficient requires the variance of the difference of gamma random variables, which is obtained from the model of Mathai (1993) given in (17).
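The variance required here can be illustrated for the simplest case, a difference $D = U_1 - U_2$ of two independent three-parameter gamma variables. Independence gives $E(D) = \alpha_1\beta_1 + \gamma_1 - \alpha_2\beta_2 - \gamma_2$ and $\operatorname{Var}(D) = \alpha_1\beta_1^2 + \alpha_2\beta_2^2$. This is a minimal sketch under that independence assumption, with hypothetical parameters and a Monte Carlo check. It does not reproduce equation (17).

```python
import numpy as np


def gamma3_mean_var(alpha, beta, gamma):
    """Mean and variance of a three-parameter gamma G(alpha, beta, gamma)."""
    return alpha * beta + gamma, alpha * beta**2


def gamma_difference_moments(p1, p2):
    """Mean and variance of D = U1 - U2 for independent U1 ~ G(*p1) and U2 ~ G(*p2)."""
    m1, v1 = gamma3_mean_var(*p1)
    m2, v2 = gamma3_mean_var(*p2)
    # Independence removes the covariance term: Var(U1 - U2) = Var(U1) + Var(U2).
    return m1 - m2, v1 + v2


# Hypothetical (shape, scale, location) parameters, in mm.
p1 = (1.8, 30.0, 2.0)
p2 = (2.2, 25.0, 0.0)
mean_d, var_d = gamma_difference_moments(p1, p2)  # 1.0 and 2995.0, up to rounding

# Monte Carlo check with simulated independent gamma variables.
rng = np.random.default_rng(1)
n = 500_000
d = (rng.gamma(p1[0], p1[1], n) + p1[2]) - (rng.gamma(p2[0], p2[1], n) + p2[2])
print(mean_d, d.mean())
print(var_d, d.var())
```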
Table 2 presents descriptive statistics for the agreements of the four centers in the 13 pixels, for each rainfall range, with data grouped in ten-day periods. The ranges were defined as follows: range 1 [0.00, 61.09) mm, range 2 [61.09, 122.18) mm, range 3 [122.18, 183.27) mm, and range 4 [183.27, 244.36) mm. Figure 2 shows the spatial variability of the concordances over the study area for ranges 1, 2, 3, and 4. The concordances clearly differed among ranges, and each range showed a distinct spatial pattern across the pixels.
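The four ranges have equal widths of 61.09 mm. Each range is closed on the left and open on the right. The sketch below assigns ten-day rainfall totals to ranges with `numpy.digitize`. Returning 0 for totals outside [0, 244.36) mm is an arbitrary convention chosen only for this sketch, and the helper name `rainfall_range` is hypothetical.

```python
import numpy as np

# Break points (mm) of the four ranges; each interval is [lower, upper).
EDGES = np.array([0.00, 61.09, 122.18, 183.27, 244.36])


def rainfall_range(total_mm):
    """Return the range number (1-4) for ten-day totals, or 0 outside [0, 244.36)."""
    total_mm = np.asarray(total_mm, dtype=float)
    # With right=False, digitize returns i such that EDGES[i-1] <= x < EDGES[i].
    idx = np.digitize(total_mm, EDGES, right=False)
    return np.where((idx >= 1) & (idx <= 4), idx, 0)


print(rainfall_range([0.0, 61.08, 61.09, 150.0, 244.35, 244.36]))  # [1 1 2 3 4 0]
```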
The gold-standard comparison method, which evaluates the ratio between the variances of the gold standard and the errors (Lin, 1989; Feng et al., 2015; Chabert et al., 2019), was applied to each range in each pixel. The results presented in Figure 2 suggest that a calibration procedure must be applied before a precipitation forecast is used. According to Li et al. (2008), calibration procedures are required to remove bias and increase the accuracy of spatial data. The forecast model with the highest concordance with the gold standard should be selected (Harris et al., 2001; Barnhart et al., 2007).
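The sketch below illustrates the selection rule with one common form of a ratio-of-variances agreement coefficient for a gold standard, $\sigma_X^2/(\sigma_X^2 + \sigma_\varepsilon^2)$ with $\varepsilon = Y - X$. This form is an assumption and may differ from equation (21). The sketch also uses sample variances instead of variances from fitted gamma models. The center names (`center_A`, `center_B`) and all data are synthetic and hypothetical.

```python
import numpy as np


def gold_standard_concordance(x, y):
    """Assumed ratio-of-variances agreement of forecasts y with gold standard x.

    Coefficient: var(X) / (var(X) + var(eps)), with eps = Y - X (sample variances).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ok = ~(np.isnan(x) | np.isnan(y))  # drop unmatched (missing) periods
    var_x = np.var(x[ok], ddof=1)
    var_e = np.var(y[ok] - x[ok], ddof=1)
    return var_x / (var_x + var_e)


def select_center(x, forecasts):
    """Return the center with the highest concordance, plus every center's score."""
    scores = {name: gold_standard_concordance(x, y) for name, y in forecasts.items()}
    return max(scores, key=scores.get), scores


rng = np.random.default_rng(7)
x = rng.gamma(2.0, 30.0, size=180)                    # synthetic gold standard (mm)
forecasts = {
    "center_A": x + rng.normal(0.0, 10.0, x.size),    # small forecast errors
    "center_B": x + rng.normal(0.0, 40.0, x.size),    # large forecast errors
}
best, scores = select_center(x, forecasts)
print(best, {k: round(v, 3) for k, v in scores.items()})
```

The coefficient equals 1 when the forecasts reproduce the gold standard exactly and decreases as the error variance grows. Taking the maximum therefore selects the center closest to the gold standard.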

The percentages of selected centers for each range, indicated by circles in Figure 2, were as follows:
For the concordances of the selected centers in each range, indicated by circles in Figure 2, 95% confidence limits were estimated using the bootstrapping pairs method (Chernick & LaBudde, 2011). Table 3 presents the 95% lower confidence limits (LCL) and upper confidence limits (UCL). Therefore, the choice among the CMC, ECMWF, NCEP, and CPTEC rainfall forecasts for yield estimation models of crops such as soybean grown in the study period should follow Table 3.
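The bootstrapping pairs method resamples the matched (gold standard, forecast) pairs together, with replacement, and recomputes the coefficient on each resample. The sketch below uses the percentile interval, which is one of several bootstrap intervals. Several other choices are assumptions: the interval type, the number of resamples, the ratio-of-variances coefficient form, and the synthetic data. They are not taken from the paper.

```python
import numpy as np


def concordance(x, y):
    """Assumed ratio-of-variances coefficient var(X) / (var(X) + var(Y - X))."""
    var_x = np.var(x, ddof=1)
    return var_x / (var_x + np.var(y - x, ddof=1))


def bootstrap_pairs_ci(x, y, n_boot=2000, level=0.95, seed=0):
    """Point estimate and percentile CI from resampling (x_i, y_i) pairs with replacement."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n = x.size
    boot = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # one index per draw keeps each pair together
        boot[b] = concordance(x[idx], y[idx])
    tail = (1.0 - level) / 2.0
    lcl, ucl = np.quantile(boot, [tail, 1.0 - tail])
    return concordance(x, y), lcl, ucl


rng = np.random.default_rng(3)
x = rng.gamma(2.0, 30.0, size=150)          # synthetic gold standard (mm)
y = x + rng.normal(0.0, 20.0, size=x.size)  # synthetic forecast
est, lcl, ucl = bootstrap_pairs_ci(x, y)
print(round(est, 3), round(lcl, 3), round(ucl, 3))
```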
Figure 3 presents QQ-plots for the three-parameter gamma distribution, using the gold standard data grouped in ten-day periods and the corresponding 240-h forecasts of the CMC, ECMWF, NCEP, and CPTEC centers. The three-parameter gamma distribution fitted the data better than the normal distribution.
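This kind of comparison can be sketched with SciPy. `scipy.stats.gamma.fit` estimates shape, location, and scale by maximum likelihood. `scipy.stats.probplot` returns the correlation of the QQ-plot points, which is closer to 1 when the distribution fits better. The sample is synthetic, and the probability-plot correlation is used here only as a simple numeric summary. It is not necessarily the criterion used in the paper.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
# Synthetic right-skewed "ten-day rainfall" sample; not the study data.
sample = stats.gamma.rvs(a=2.0, loc=3.0, scale=25.0, size=300, random_state=rng)

# Maximum-likelihood fit of shape, location, and scale (three-parameter gamma).
a_hat, loc_hat, scale_hat = stats.gamma.fit(sample)

# Correlation of the QQ-plot points for each candidate distribution.
_, (_, _, r_gamma) = stats.probplot(sample, sparams=(a_hat, loc_hat, scale_hat), dist=stats.gamma)
_, (_, _, r_norm) = stats.probplot(sample, dist="norm")
print(f"gamma r = {r_gamma:.4f}, normal r = {r_norm:.4f}")
```

For right-skewed data like rainfall totals, the normal QQ-plot typically bends in the upper tail, which lowers its correlation relative to the gamma fit.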
Table 3. Estimated concordances with lower and upper 95% confidence limits for the selected centers, which were indicated by circles in Figure 2, considering each range in the 13 pixels of the West mesoregion of Paraná, Brazil.
Figure 3. QQ-plots (theoretical quantiles) for the three-parameter gamma distribution, considering the gold standard and the selected centers for each range in the 13 pixels of the West mesoregion of Paraná.

Conclusions
The population variances of the gold standard measurements and of the measurement errors, estimated from their respective distributions, made it possible to detect spatial variability of the concordances in the study area. This variability was detected regardless of the forecast center (CMC, ECMWF, NCEP, or CPTEC). The geographical location and the precipitation range should be considered when choosing a forecast center.
The concordances estimated with the multivariate gamma distribution suggest that a calibration procedure, which aims to increase the accuracy of spatial data, must be applied to the forecast data before a precipitation forecast is used in a crop yield estimation model.
Soybean yield estimates should use the forecasts of the center selected for the pixel that contains each location. Before rainfall forecasts are used in yield estimation models for other crops, a concordance analysis matched to each crop's development cycle is required to select the center.
Matching the reference measurements from meteorological stations with the climate model data from the CMC, ECMWF, NCEP, and CPTEC centers requires matching both geographical location and precipitation range. Spatial correspondence between the reference measurements and the climate model data can be obtained from the average precipitation of the meteorological stations covering the pixel area, taking into account their distance from the pixel centroid. For matching precipitation ranges, data grouped in ten-day periods can be used.
The confidence intervals for the concordances of the selected centers (CMC, ECMWF, NCEP, or CPTEC) indicated small variability for each precipitation range. Ranges 3 and 4, with values between 122.18 and 244.36 mm, generally presented the highest agreement with the gold standard measurements. These higher concordances in ranges 3 and 4 suggest that the forecast models used by the CMC, ECMWF, NCEP, and CPTEC centers are more suitable for detecting extreme precipitation events above 100 mm.
The multivariate gamma and the gamma difference distributions were used as an alternative to the normal distribution. The fitted gamma distribution for precipitation data from gold