Spatial-temporal Analysis of Soybean Productivity Using Geostatistical Methods

To have information about the soybean productivity over the crop years is essential to define strategies to increase profits and reduce costs and most important to reduce environmental impacts. One form of monitoring is the use of Geostatistical methods, which allow us to obtain maps with more accurate predictions. In this paper, an area of 127.16 ha was studied during six crop years between 2012/2013 and 2017/2018. We found that productivity values vary between crop years, mainly due to uncontrollable climatic factors. The removal of influential points caused changes in the predicted values showed in the maps, and the use of scaled semivariograms allowed us to obtain similar maps to those obtained considering the model without influential points, then there was no need to exclude observations. The use of a model with replicates helped to identify regions where productivity was lower. The use of explanatory variables allowed us to elaborate a more accurate thematic map in the 2017/2018 crop year, which was well evidenced by the prediction standard error map.

one of the most important crops worldwide. Brazil is currently the second-largest soybean producer, standing behind only the USA, and contributes approximately 34.7% of global production (Schwalbert et al., 2020). Given the importance of soy for agribusiness, research is often carried out to develop productivity maps, one of the most widely used precision farming techniques (Vega et al., 2019).
Because the production of soybean has a high cost due to the use of fertilizers, pesticides, machines, and seeds (Costa and Santana, 2018), the use of productivity maps help in identifying regions with greater productive potential. These regions can be managed differently, optimizing inputs, improving profits, and reducing environmental impacts (Vatsanidou et al., 2017). Among the many methodologies used to create thematic maps, geostatistical methods have been widely applied in precision agriculture (Lamour et al., 2020).
Although the result of a geostatistical analysis is, in most cases, a map, it can include a high number of techniques that aim to obtain more accurate maps when compared to maps generated by deterministic methods. The literature presents works that produced maps using, for example, local influence techniques (Grzegozewski et al., 2013 andUribe-Opazo et al., 2021), scaled semivariograms (Wendpap et al., 2015), spatial linear models with replicates (De Bastiani et al., 2017), and bootstrap methods (Dalposso et al., 2019). Among these works, we also find geostatistical studies effectively related to soybean productivity in a spatial-temporal context. Bakhsh et al. (2000) and Bottega et al. (2017) studied three crop years, and De Bastiani et al. (2017) studied five crop years. In these analyses, obtaining sample data becomes a complicated, time-consuming, and often uncertain task, given that the soybean culture often depends on unpredictable and uncontrollable factors.
In this context, the objective of this paper was to use geostatistical methods to monitor an agricultural area cultivated with soybean for six consecutive crop years. We investigated the impact of removing influential points and the use of scaled semivariograms in the elaboration of maps. We designed a map of soybean productivity ranging from the 2012/2013 to 2016/2017 crop years. Also, we have considered the use of explanatory variables to obtain a more accurate map in 2017/2018, given the missing data this crop year.

Figure 1. Geographical information of the study area
According to the Köppen classification, the region's climate is type Cfa with an average annual temperature of 21º C. The soil is classified as dystrophic Red Latosol, with a clay texture (EMBRAPA, 2013). A defined sample configuration in this experimental is lattice plus close pairs (Chipeta et al., 2017), with a minimum distance between the points of the regular grid of 141 meters and in some locals, randomly selection, sampling was carried out with minor distances of 75 meters and 50 meters between pairs of points, totalizing 74 sampling points (in each crop year). These samples are located and geo-referenced by means of a signal receiver as a global positioning system (GPS), in a UTM spatial coordinate system. Because of the issues that occurred during the collection in the 2017/2018 crop year, only 37 sampling points were collected this crop year (Fig. 1).
The productivity of soybeans was collected based on the amount of grains harvested from the plants distributed in two rows, along a meter in the length of each point, representing the plot. After sorting, grains are weighed for each plot, checking the water content and correcting for 13% moisture, and then converted in t ha -1 .

Geostatistical Analysis
To study the spatial dependence was considered an isotropic and second-order stationary stochastic process where , with as a bi-dimensional Euclidian space, with an random vector of the response variable, which corresponds to the spatial locations known in , with . It can be written matrix form as , in which is an matrix of columns with full rank, with its lines as a vector of explanatory variables in position , is a vector of unknown parameters to be estimated, and , is an vector of random errors. It is assumed that the random errors have a zero mean, that is, , and the variation between points in space is determined by a covariance function , for .
The spatial modeling in depends on the structure of the covariance matrix , in which, for , of the Gaussian stochastic process .
The covariance function is used in the spatial dependence study of the stationary process and is specified by a vector , of the form , (Mardia and Marshall, 1984), in which, is the identity matrix, is the nugget effect or variance error ; is the contribution or dispersion variance ; , is a symmetrical matrix, whose elements are in function of , De Bastiani et al., 2015). The parametric form of covariance matrix occurs for many stationary and isotropic processes, in which the covariance is defined according to the covariance function .
In the covariance function , the variance of the Gaussian stochastic process is given by . On the hypothesis that the errors present normal distribution of n-variate probabilities , we have that the spatial linear model has an n-variate normal distribution, , in which the vector of unknown parameters of the model , can be estimated by maximizing the logarithm of the likelihood function, defined in (1): ISSN 2166-0379 2021 (1)

Journal of Agricultural Studies
The parameters of the model were estimated, considering the maximum likelihood method (ML). In modeling considering the scaled semivariogram (see Subsection 2.5) the ordinary least squares method was used. To observe the behavior of the data , we performed an exploratory analysis, identifying the discrepant points (outliers). To determine the spatial dependence structure of the data in each agricultural year, it was constructed omnidirectional experimental semivariograms using the Matheron estimator. To model the spatial dependency structures, we used models of the Maté rn family with different shape parameters (Maté rn, 1986). The best fits were chosen by cross-validation (Faraco et al., 2008) and the predictions of unsampled data were obtained by ordinary kriging. In the analysis involving explanatory variables (see Subsection 2.7), kriging with external drift (KED) was used.

Local Influence on the Response Variable
In modeling studies, the removal of points (Jackknife) is one of the best-known techniques to assess the impact of a possible influential observation on the parameter estimates. However, a problem that can occur with the individualized withdrawal of observations is the effect of failing to detect jointly influential points. There are several robust methods for detecting influential points which low computational cost (Peña and Yohai, 1999). Cook (1986) presents a very innovative proposal in the field of diagnosis, where he proposes to evaluate the joint influence of the observations on small disturbances in the model, instead of the evaluation by ISSN 2166-0379 2021 the individual or joint removal of points. This methodology is called local influence and was widely accepted by researchers in the area.

Journal of Agricultural Studies
The local influence method proposed by Cook (1986)  De Bastiani et al. (2015) considered the perturbation scheme , which is based on Zhu et al. (2007), in which is the response vector, and is the covariance matrix. To evaluate the existence of influential points, we used the graph versus (order of data) presented by Poon and Poon (1999) as a diagnostic measure.

Scaled Semivariogram
The literature (Mercante et al., 2003, Vieira et al., 2010and Wendpap et al., 2015 brings studies on spatial and temporal processes that have used the technique of scheduling semivariograms as a proposal for modeling dependence in space and time. This technique consists of estimating parameters that define the structure of spatial and temporal dependence by fitting a single model to the semivariograms, called scaled semivariogram. The experimental semivariance function of soybean productivity (t ha -1 ) from crop years 2012/2013 to 2017/2018 was scaled using the equation , , (Vieira et al., 2010). In this equation, is the number of periods (crop years) of the study, indicates the distance between point pairs, represents the original semivariance function of the i-th period, and is the sample variance of the observations of the i-th period.

Spatial Linear Model With Replicates
De Bastiani et al. (2017) proposed modeling the structure of spatial and temporal dependence using linear spatial models with multiple independent replications and using the method of maximum likelihood for parameter estimation. They consider as  et al. (2017) and the samples of soybean productivity obtained from these crop years, we fit the models of the Maté rn family with different shape parameters , and the best model is chosen by cross-validation (Faraco et al., 2008).

Kriging with External Drift
Kriging with external drift (KED) (Wackernagel, 1995) is a methodology that can be used to obtain more accurate thematic maps since, in the model , the process mean (β) is described by auxiliary variables, unlike in ordinary kriging, which considers the constant mean. In the case of KED, predictions are made as in ordinary kriging, with the difference that the residual covariance matrix is extended with auxiliary predictors (Webster and Oliver, 2007).
As explained by Hengl et al. (2003), predictions in a new location are made by , where is the vector of the KED weights, and is the vector of the initial observations.
Considering the soybean productivity data (t ha -1 ) collected at the same 37 points sampled in 2017/2018 as explanatory variables, the estimation of the model parameters was performed using the maximum likelihood (ML) method and the criteria for selecting the geostatistical model for the covariance matrix was cross-validation (Faraco et al., 2008).

Prediction Standard Error Maps
The standard error is considered a summary measure of prediction precision since it characterizes how close they are, on average, to the unknown real values. The calculation of prediction variances is necessary to elaborate prediction standard error maps. Kyriakidis and Goodchild (2006) present the theoretical concepts for determining the prediction variance ISSN 2166-0379 2021 when interpolation is performed by universal kriging, while Lloyd and Atkinson (2001) described the concept for ordinary kriging. This study presents the prediction variance when interpolation is performed by KED.

Map Comparison
It was compared the soybean productivity maps (t ha -1 ) using the error matrix and the calculate the Kappa index ( ) (Cohen, 1960) following the levels of agreement proposed by Landis and Kock (1977) which considers the similarity is low if , the similarity is moderate if and the similarity is high if . It was prepared the soybean productivity (t ha -1 ) difference maps in absolute value, highlighting the differences between the map considering all points in 2017/2018 and the map generated using KED.

Computational resources
The analyses were done using the Microsoft R Open software (Microsoft R Core Team, 2021). The methods were implemented by the authors using resources available in the geoR package (Ribeiro Jr. and Diggle, 2001).

Exploratory Analysis
The average soybean productivity obtained in the agricultural year 2012/2013 (3.26 t ha -1 ) was satisfactory, considering that it was close to the average obtained in Paraná (3.35 t ha -1 ), the Brazilian state that received the second-highest average productivity in this crop year (CONAB, 2019). In the crop year 2013/2014, the average soybean productivity (4.23 t ha -1 ) was higher than the average in the state of Roraima (3.18 t ha -1 ), the largest producer in this crop year (CONAB, 2019). The average productivity in the crop year 2014/2015 was low when compared to the national (2.30 t ha -1 ) and state (3.29 t ha -1 ) averages in the same crop year (CONAB, 2019).
The low productivity in the crop year 2014/2015 can be explained by the occurrence of water stress that occurred during crop development. In the crop year 2015/2016, the average productivity (2.46 t ha -1 ) was lower than the state average (3.09 t ha -1 ), which is also a reflection of climatic adversities, which, in turn, limited production and caused losses of 10% of the harvest (SEAB, 2016). The productivities from crop years 2016/2017 and 2017/2018 (3.06 t ha -1 and 3.07 t ha -1 , respectively) were low when compared to the state (3.73 t ha -1 and 3.5 t ha -1 ) and national (3.36 t ha -1 and 3.39 t ha -1 ) productivities.

Spatial Analysis
In the crop year 2012/2013, the geostatistical model selected by cross-validation was Maté rn model with shape parameter k=2 (M2.0), presented a range of 101.25 m and spatial dependency index (SDI) of 1.24%, indicating a weak spatial dependence (Table 1) (Table 1).
In the last crop year studied (2017/2018), cross-validation indicated the exponential model (M0.5) as the best fit, which presented a moderate spatial dependence (SDI=6.01%) and the largest spatial dependency radius (326.23 m) among all fitted models. indicates that the spatial continuity quickly disappears (Vieira and Gonzalez, 2003). When analyzing the ranges of geostatistical models (Table 1), the range varies considerably (260.98 m) over the studied agricultural years, indicating a lack of temporal stability in the dependence radius of the spatial structure of the data. This fact can also be observed in the works conducted by Bakhsh et al. (2000) and Bottega et al. (2017), who obtained a variation range of 91 m and 248 m, respectively.

Local Influence Analysis
Local influence analysis (Fig. 3)  As stated by Dalposso et al. (2012), influential samples can change a decision in the determination of geostatistical models or the construction of thematic maps. Regarding the sample elements of soybean productivity identified as outliers (Fig. 2), no point was classified as influential. The wide-spread and non-regularity of these sample points in the initial three crop years stand out when analyzing the positioning of outliers and influential points in Fig. 1. In crop year 2012/2013, sampling point 28 (4.51 t ha -1 ) was classified as an outlier and is close to the central region of the studied area. Influential points 5 (2.89 t ha -1 ) and 61 (3.61 t ha -1 ) are in the southwest and northeast of the studied area, respectively. ISSN 2166-0379 2021 In the crop year 2013/2014, sampling point 7 (5.76 t ha -1 ), classified as an outlier, is in the southwest, while the influential points 15 (3.80 t ha -1 ) and 69 (3.97 t ha -1 ) are in the southeast and north of the studied area, respectively. In the crop year 2014/2015, the outlier point 33 (3.17 t ha -1 ) is in the west, and the influential points 42 (2.24 t ha -1 ) and 55 (2.12 t ha -1 ) are in the central and northwest regions, respectively. From crop years 2015/2016 to 2017/2018, there is a tendency for outliers in a stretch that extends from northwest to southwest. In the crop year 2015/2016, sampling point 66 (0.67 t ha -1 ), classified as an outlier, is in the northwest region, very close to sampling points 54 (2.42 t ha -1 ) and 67 (2.71 t ha -1 ), classified as influential.

Journal of Agricultural Studies
In the crop year 2016/2017, the sampling points classified as outliers 69 (1.79 t ha -1 ), 65 (1.67 t ha -1 ), 67 (1.66 t ha -1 ), and 68 (1.52 t ha -1 ) are also in the northwest region, above the points 34 (2.05 t ha -1 ) and 54 (2.92 t ha -1 ), classified as influential in the crop year 2016/2017, and points 34 (3.24 t ha -1 ) and 35 (3.77 t ha -1 ), classified as influential in 2017/2018. It is essential to highlight the sampling points classified as influential in two consecutive harvests, namely, The fact that outliers are not influential points can occur, as explained by Tóth et al. (2013), considering that the concepts are different. An outlier is an extreme value that does not follow the general tendency. However, this point will not necessarily produce changes in the model, while an influential point is one that, when excluded, causes a significant change in parameter estimates. Therefore, it was decided to conduct new statistical modeling by removing potentially influential sample elements (Table 2). In the crop year 2012/2013, after removing potentially influential points, cross-validation indicated the M0.5 model as the best fit, with a range of 144.87 m and weak spatial dependence (Table 2). Similar to what occurred in the initial modeling (Table 1), we identified the Minf model as the best fit in most of the studied agricultural years (83%) when the influential points were removed, highlighting a moderate spatial dependence in the crop years 2014/2015 and 2017/2018 (Table 2).
Although the removal of the influential points did not change the model selected by ISSN 2166-0379 2021 cross-validation in the crop year 2014/2015, the degree of spatial dependence changed from weak to moderate and the range passing from 270.74 m (Table 1) for 220.45 m (Table 2). In crop year 2015/2016, the removal of the influential points provided an increase in range from 207.55 m (Table 1) to 231.05 m (Table 2), and the degree of spatial dependence remains weak.

Journal of Agricultural Studies
Comparing the fitted models without the influencing points (Table 2) with the complete models (Table 1), it is observed that the parameter estimates were affected for the crop years in which cross-validation maintained the Gaussian model as the best fit (namely from crop years 2013/2014 to 2016/2017). For example, in the crop year 2016/2017, the estimated nugget effect of the complete model was 0.066 (Table 1) and 0.184 in the model without the influential points (Table 2). This small difference caused a change in the radius of spatial dependence (from 186.19 m to 178.87 m). However, the interpretation of these models remains similar because both have a weak spatial dependence.
In a similar analysis, considering the crop year 2012/2013, the removal of the influential points provided a considerable change. In addition to the difference in the models selected by cross-validation, the spatial dependency radius increased by 43.62 m. These changes corroborate the results found by Uribe-Opazo et al. (2012), in which the points identified as influential affected the parameter estimates.

Scaled Semivariogram
The purpose of considering a scaled semivariogram to model the spatial variability of soybean productivity in the studied crop years is to obtain more accurate estimates, with a view to better characterizing spatial variability. As explained by Silva et al. (2018), one of the advantages of the scaled semivariogram is that several semivariograms can be drawn on the same graph. Although the sample set is not small (Fig. 1), depending on the pre-defined distances (lags) used to calculate the semivariogram, the amount of semivariances obtained may be low. As in this case, the parameters estimates were obtained by least squares, the results obtained may be inaccurate since the regression models can produce biased estimates (Dalposso et al., 2016).
After adjusting the models to the scaled semivariogram, the cross-validation values indicated the M0.5 as the best adjustment, with parameters , , and a range of 247.09 m. The strong spatial dependence stands out when analyzing the parameters of this model.

Comparisons Between Maps
The soybean productivity maps of the studied crop years were elaborated using the scaled semivariogram to observe the behavior of soybean productivity in non-sampled locations and quantify the impacts generated by the exclusion of influential points (Fig. 4).
Comparing the 2012/2013 maps (Fig. 4), it was observed that the region of the sampling point identified as an outlier (point 28 in Fig. 1) was more evident in the maps obtained without the influential points and by the scaled semivariogram. Visually, it was verified a higher similarity between the map generated without the influential points and the map created with the scaled semivariogram. This similarity is evidenced by the Kappa index ( ) since, according to the ISSN 2166-0379 2021 classification established by Ladis and Kock (1997), the similarity is regular ( ) when comparing the map obtained with all points with the map obtained without the influential points, and too the similarity is low ( ) when comparing with the map obtained from the scaled semivariogram. The similarity is good ( ) when comparing the map obtained without the influential points with the one obtained from the scaled semivariogram.

Journal of Agricultural Studies
A high similarity ( ) was prominent in the crop year 2013/2014 between the map obtained with all points and obtained without the influential points. However, circular regions formed close to the sample points stand out, a phenomenon not observed in the map obtained by the scaled semivariogram. The circular regions close to the sampling points as explained by Menezes et al. (2016), represent the phenomenon known as the "bull eyes effect." This phenomenon was also observed by Dalposso et al. (2018) and occurs when the geostatistical model has a range close to the minimum distance between sample points. Because the models used to create the maps above had a range of 65.25 m (Table 1) and 77.16 m (Table 2), respectively, and the shortest distance between points was 49.1 m, the presence of circular regions is justified.
In the crop year 2014/2015, the highest calculated was 0.76 when comparing the map obtained with all points and the map obtained without the influential points, indicating a very good similarity. In crop year 2015/2016, all comparisons between maps indicated a good similarity, with emphasis on the correlation between the map obtained without influential points and the map obtained by the scaled semivariogram, which provided the highest Kappa index ( ).
In crop year 2016/2017, there is a good similarity between the map obtained with all points and the map obtained without influential points ( ) and a moderate similarity between the map obtained without influential points and the map obtained by the scaled semivariogram ( ). In the last crop year studied 2017/2018, the removal of the influential points (points 34 and 35 in Fig. 1) provided a considerable change in the thematic map, resulting on the map obtained with all points and the map without the influential points being considered as presenting low similarity ( ). ISSN 2166-0379 2021 Figure 4. Maps of soybean productivity (t ha -1 ) in the studied area

Journal of Agricultural Studies
The differences found between the maps obtained with all points and maps without the influential points in 2012/2013, 2014/2015 to 2016/2017, and 2017/2018 allow us to state that the exclusion of influential points causes changes in the levels of soybean productivity. This fact corroborates the results obtained by Uribe-Opazo et al. (2012) in the elaboration of maps for soybean productivity. The authors concluded that the use of techniques to evaluate the influence of observations should be part of the geostatistical analysis. ISSN 2166-0379 2021 The high similarity observed between the maps obtained without the influential points and the maps obtained by the scaled semivariogram indicates that using the scaled semivariogram is an alternative to obtain representative maps. A similar result was found by Wendpap et al. (2015). They obtained maps using scaled semivariograms classified by the Kappa index as presenting high similarity when compared to maps obtained by linear spatial models with replicates.

Spatial Linear Models with Replicates
A temporal comparison of the maps (Fig. 4) shows that some regions have similarities. For example, the northwestern region can be identified as with lower productivity, which is more , and a range of 163.44 m. This model has a strong spatial dependence, and the thematic map of soybean productivity associated with it ( Fig. 5a) shows the northwest and southeastern regions as presenting the lowest productivity values. The areas between the west and southwestern regions also stand out for presenting the highest productivity values. However, in most of the areas (71.43%), the soybean productivity (t ha -1 ) ranged between 3.012 t ha -1 and 3.121 t ha -1 .
Because the thematic map of soybean productivity from 2012/2013 to 2016/2017 (Fig. 5a) presents the average behavior of productivity over a long period, we expect the possibility of defining zones of more reliable management. This is possible because, as explained by Fleming et al. (2000), considering the production history allows the identification of different management zones in the agricultural area. In practice, the northwest and southeastern regions had the lowest productivity values. Therefore, a different treatment in these regions tends to increase productivity in future harvests. It is worth mentioning that small regions with an indicator of lower productivity, scattered in the studied area (Fig. 5a), can hinder a localized application. As highlighted by Al-Kaisi et al. (2016), this is one of the problems related to maps that have small regions, resulting in difficulty in defining management zones for a localized application of fertilizers. ISSN 2166-0379 2021 . 4) is less reliable than the thematic productivity maps of the other agricultural years since it is made with a lower number of sample points.

Journal of Agricultural Studies
Finding that the prediction standard error is lower near the points of interpolation and that it gradually increases with the distance of the points (Fig 5b) is a characteristic of this type of map. Similar results were reported by Meusburguer et al. (2012) and Olea (2018).

Prediction Model
An alternative to obtaining more accurate productivity estimates in 2017/2018, in a position within the studied area, is to develop a Gaussian spatial linear model considering soybean productivity information from previous agricultural years (2012/2013 to 2016/2017) as explanatory variables. After fitting the models, the cross-validation values indicated the spatial linear model as the best fit (Equation 2): ( It was adjusted this model considering the Maté rn covariance function with parameters , , , , and a range of 610.40 m. The structure has a moderate spatial dependence (SDI = 15.14%), and its spatial dependence radius is 87% greater than that of the model generated only with the sampling points collected in 2017/2018 (326.23 m). The soybean productivity map (t ha -1 ) for this model (Fig. 6a) shows apparent differences when compared to the map generated only with the sampling points collected in 2017/2018 (Fig. 4). ISSN 2166-0379 2021  Numerically, these differences are quantified by the Kappa index ( ), which indicates a low similarity between the maps. Visually, the differences between both maps are highlighted in the map of the productivity differences (t ha -1 ) in absolute values (Fig. 6b). The differences between both maps are lower than 0.106 t ha -1 in most places where there were sample points in 2017/2018 (Fig. 6b). However, the differences are bigger (Fig. 6b), where data were not collected in 2017/2018 (highlighted in Fig. 1).

Journal of Agricultural Studies
Comparing the predicting standard error maps ( Fig.7a and Fig.7b), it is evident that the map obtained by the model considering the information from previous crop years (2012/2013 to 2016/2017) is more accurate given the lower standard errors.
The lower precision of the thematic map of soybean productivity (t ha -1 ) in 2017/2018 (Fig. 4) is shown in the prediction standard error map (Fig. 7a). The reduction in the number of sample points (50% less) implied a considerable area in which the prediction errors are high (last class of the legend). In this sense, using a linear spatial model considering the productivity information of previous agricultural years as explanatory variables was useful for elaborating a more accurate thematic map. This was verified in the prediction standard error map of the soybean productivity (t ha -1 ) in 2017/2018 generated using KED (Fig 7b).  4) and (b) Prediction standard error map of the soybean productivity (t ha -1 ) in 2017/2018 generated using KED (Fig 6a)

Conclusion
The results showed that in the same agricultural area, soybean productivity could vary considerably, often due to uncontrollable factors such as those related to climatic events. It was found that the applied methodology was efficient in detecting influential points, and the removal of influential points affected the estimates of the parameters that define the spatial dependence structure and influence on the construction of the maps, which can promote changes in a thematic map. However, the use of scaled semivariograms allows us to obtain representative maps without excluding sample points. Although soybean productivity has shown a non-homogeneous behavior between crop years, we identified regions where productivity was lower (northwest and southeast of the studied area), which was shown in the map generated with replicates. The elaboration of the prediction standard error maps was essential to visually highlight that regions without sampling points present less precise values, showing that the use of explanatory provides predictions with a higher level of accuracy.