Spatial and Multivariate Analysis of Soybean Yield in the State of Paraná-Brazil

In this work, the aim was to evaluate the existence of spatial association of the municipal average official soybean yield (t ha) with agrometeorological data and vegetation indices. The information was observed by ten-day periods, in crop years 2010/2011, 2011/2012 and 2012/2013 in the State of Paraná. Local univariate spatial correlation (LISA index), as well as global bivariate correlation (L statistics) were calculated. With this study, we identified neighboring municipalities with high yield in the West as well as municipalities that are located with low-low yield Northwestern, showing positive spatial autocorrelation (( , significative (p-value < 0.05). In addition, there were differences between seeding times in different regions, and climate irregularity during flowering periods and grain filling in crop year 2011/2012 throughout the state, which caused a large drop in production in all municipalities of the state of Paraná. The analysis of local spatial association showed that in the three crop years, the Northwest region presented a significant low yield potential of soybean (p-value < 0.05). In addition, it was observed that the period from the 3 ten-day period of October to the 2 ten-day period of January was essential for the soybean cycle in the different regions of the state, since this period encompasses the critical phases of crop. Differences were also observed between the crop years studied, regarding the agrometeorological variables, which affected soybean yield mainly in the Western region of Paraná – Brazil.


Introduction
Brazil stands out in world agriculture, as the second largest soybean producer and the expansion of production of this oilseed in the country is due to the physical conditions of the soil, the favorable climate for its production, the advancing its cultivation in cerrado areas, and both the development and availability of scientific knowledge and technologies to the productive sector (Nunes, 2013).
Due to the complex interrelationships that soybean yield has with other factors of this agricultural activity, it is difficult to define standards for their respective production areas (Araújo et al., 2013). Studies in the literature demonstrate the importance of the investigation regarding the existence of a relationship or an effect of agrometeorological characteristics with soybean yield, aiming to establish a better management of the production of this agricultural commodity (Araújo et al., 2013;Araújo et al., 2014;Bilbas et al., 2017).
One way to analyze the relationship between soybean yield and agrometeorological characteristics of the productive region is to consider the spatial variability of these variables and the spatial relationships between them. In this context, methods of spatial area statistics are used, which simultaneously consider information regarding the value and geographical location of variables. Buss et al. (2019) verified the spatial variability of soybean productivity in relation to soil attributes using multivariate and geostatistical techniques. The result showed the efficiency of the statistical methods covered in the study.
The verification of a possible pattern of association between two georeferenced variables can be measured by measures of global bivariate autocorrelation, such as the bivariate Moran index and the L statistic (Lee, 2001;Lee, 2004). The purpose of these measurements is to show whether the values of one variable observed in one region reveal a relationship with the values of another variable observed in neighboring regions. Studies using these indicators have shown for the state of Paraná : the soybean crop profile at different seeding dates (Dalposso et al., 2013, Cima et al., 2018; how municipalities relate spatially to soybean production; which are the main municipalities producing soy (Prudente et al., 2014); and the analysis of the spatial relationship of soybean yield with agrometeorological characteristics (Grzegozewski et al., 2017) In this context, regarding the simultaneous analysis of a set of variables, the multivariate analysis techniques can be helpful in finding patterns generated by the set of variables. In the Western region of Paraná , Araújo et al. (2013) carried out a cluster analysis using information on the local Moran index (LISA) applied to agrometeorological and soybean yield data in crop year 2005/2006, and the formation of groups of similar municipalities was identified regarding their spatial distribution in relation to soybean yield and all agrometeorological elements analyzed.
Moreover in this focus of analysis, Vidigal et al. (2018), when analyzing the spatial arrangement of soybean productivity in the municipalities of Rio Grande do Sul in the period of 1990 to 2013 trough exploratory spatial analysis, concluded that soybean productivity was not homogeneous throughout the analysed period.
In addition to yield information, vegetation indices have been employed as indicators to discriminate areas with soybean cultivation (Grzegozewski et al., 2016), as well as areas with other types of land use (Araújo et al., 2014), and to analyze the conditions of soybean crop growth on a regional scale.
Thus, the objective of this work was to analyze the spatial variability of the official municipal average yield of soybean (t ha -1 ), and its spatial relationship with agrometeorological data and vegetation indices, grouped by ten-day periods in the crop years 2010/2011, 2011/2012, and 2012/2013 in the State of Paraná , through an analysis of univariate spatial autocorrelation (local) and bivariate spatial correlation (global), as well as using multivariate techniques.

Determination of the Study Area and Data Acquisition
The study area comprises the 399 municipalities of the state of Paraná , Southern Brazil, limited by the coordinates 22º 29' S and 26º 43' S and 48º 2' W and 54º 38' W. This area was divided into 10 mesoregions of according to IBGE (Brazilian Institute of Geography and Statistics); these mesoregions are presented in Grzegozewski et al. (2017).
The images used to obtain the vegetation index (EVI) were obtained from the MODIS product database (Moderate Resolution Imaging Spectroradiometer) of the Brazilian Agricultural Research Corporation (EMBRAPA, 2010) which is related to the MOD13Q1.5 product coupled to the Terra and Aqua satellite, "Tile" h13v11 from EVI (Enhanced Vegetation Index), which has a resolution of 250 m and a temporal resolution of 16 days.
The mapping of soybean cultivated areas for crop years 2010/2011 and 2011/2012 was provided by Souza et al. (2015), and for crop year 2012/2013 was provided by Grzegozewski et al. (2016). From these mappings, the pixel values of the average EVI vegetation index of the crop years studied were extracted for each municipality, and the details of the procedure adopted to calculate this average EVI for each municipality are described in Grzegozewski et al. (2017).  Figure 1A).
It was necessary to adjust the agrometeorological data of 303 virtual stations for a spatial scale of 399 municipalities. This procedure was adopted to provide information on agrometeorological characteristics in the same municipalities for which soybean average yield data were collected. Thus, for each ten-day period and for each municipality under study, an average value was calculated, using the values of the virtual stations of each agrometeorological variable, contained in the center, near or around the polygon, as exemplified in Figure 1B.

Analysis of Spatial and Multivariate Statistics
In order to verify the existence of spatial dependence in each study variable, the Moran Global Index ( -Equation 1) (Moran, 1950) was calculated and, based on the obtained value, a global statistical test was performed, in which the null hypothesis is the existence of random distribution of the variable under study, and the alternative hypothesis defines the existence of the association of similar or different values to a defined significance level (Anselin, 1995;Bailley, 1995;Nunes, 2013 ). Moran index is a value in the range [-1, 1], in which: it indicates a perfect negative autocorrelation, indicates the absence of autocorrelation, and indicates a perfect positive autocorrelation.
, ISSN 2166-0379 2020 wherein is the sample size, and are the attribute values observed respectively in areas and ( ); is the average value of the attribute in the region under study; and is a measure of spatial proximity between populations (municipalities) and ( ) (Moran, 1950;Anselin et al., 2004). Given that in this work we adopted the Queen type contiguity, so that a common boundary is shared with by analyzing this through the nodes (vertices) ( Figure 1B). Otherwise, ; to .

Journal of Agricultural Studies
In addition, for each variable, Moran scattering maps were constructed to identify points with a positive or non-positive spatial association (Anselin, 1995). This map was constructed based on the Moran scattering diagram, which is divided according to four quadrants: High- Although able to point to the general tendency of data clustering, Moran's I is a global measure and therefore does not reveal local patterns of spatial association; that is, they are generally hidden by global autocorrelation statistics.
Thus, in order to capture local association patterns (spatial clusters or outliers), the value of the indicator in the subarea composed of neighbors was also calculated for each study variable, using the function Local Indicator of Spatial Association (LISA) (Anselin, 1995) (Equation 2).

;
( ; wherein is the sample size, is the value of the attribute observed in area ( ); is the average value of the attribute in the region under study; and is a measure of spatial ISSN 2166-0379 2020 proximity between populations (municipalities) and ( ), calculated by the Queen contiguity criterion type, described in Equation 1.

Journal of Agricultural Studies
Using the local Moran index values, a map named LISA MAP (Nunes, 2013) was prepared for each variable, in which it can be considered that there is no spatial autocorrelation if the p-value is greater than 0.05. Otherwise, autocorrelation is significant. Then, the areas were classified considering the significance levels equal to 0.05, 0.1, and 0.01.
We also calculated a bivariate spatial association measure called statistics (Equation 3) (Lee, 2001). This measurement integrates Pearson's linear correlation coefficient with the univariate global spatial association measure of the Moran Index (Equation 1) and aims to observe whether there is spatial correlation between two variables, that is, whether the values of a variable in a given region are associated with the values of another variable observed in neighboring regions (Lee, 2001;Anselin et al., 2004). , wherein is the sample size, and are the values respectively of the attributes and observed in area ( ); and are the average values of attributes and , respectively, in the region under study; and is a measure of spatial proximity between populations (municipalities) and ( ), calculated by the Queen contiguity criterion type.
In addition, for each crop year under study and considering yield, agrometeorological variables and vegetation indices in each ten-day period, a principal component analysis was performed. This analysis was performed in order to understand the relationship between these variables, as well as reducing the dimensionality of the number of variables. The selection criterion for choosing the main number of components was to obtain a minimum percentage of 70% of the total variability of the variables (Furtado, 1996).
For the selected main components, their respective scores were calculated in each municipality and with these values the analysis of clustering of the municipalities was performed. The k-means cluster method was used, and the number of formed groups was defined by the following measures: C and Dunn indices, and Silhouette coefficient (Dunn, 1973;Rousseuw and Silhouette 1986).
All statistical analyses were performed using R Studio version 3.3.5 software (R Development ISSN 2166-0379 2020 Core Team, 2018). The representation on maps was performed in the software ArcGIS version 10.3 (ESRI, 2018).

Results and Discussions
The In addition, in these two crop years, the global Moran scattering map indicates that there is a low-low distribution of yield in the Center-South and Northwest of the state. This result indicates that the lowest yield values were concentrated in these regions of the state.
Moreover, for the three crop years, the Northwest region had clusters of municipalities with a significant similarity in yield (p-value 0.05, Figure 3). Concomitant with Moran scattering map, this result indicates that this region is formed by municipalities with low productive potential, as well as a region susceptible to summer, which harms soybean yield, corroborating Cima et al. (2018).   ISSN 2166-0379 2020 found that there is a positive and significant spatial interdependence in the regions of the State, and that the local standards are different.

Journal of Agricultural Studies
The spatial variations found for the EVI variable in the crop years ( Figure 4) can be explained by the differences between seeding dates and vegetation cover during the period under analysis. This variability is evident in the State, which has almost half of its municipalities with positive spatial autocorrelation containing crops in these periods, and the other half of the municipalities with negative spatial autocorrelation in the municipalities with no high vegetation index. In all crop years, the positive spatial autocorrelation of the EVI in the municipalities of the Western region occurred from the second decade of November, with the formation of clusters with significant local spatial association (Figures 4 and 5). The municipalities belonging to the West region presented similar characteristics for the EVI (Figure 4).
This result is associated with the early start of seeding, due to the increased levels of vegetation indices in these municipalities, demonstrating the difference between this and other regions of the state.
However, the variation of vegetation indices in the state is associated with water balance, rainfall, light intensity, and temperature. This can be verified by spatial autocorrelation for such variables during the period studied (Figures 4 to 6). It was also verified that soybean seeding occurred in municipalities with Cw, Te, and Gr had positive spatial autocorrelation [Q (+/+)] from 2 nd ten-day period of October (Figure 4 to 6).
The spatial autocorrelation of the EVI in 2011/2012 was similar to the previous year (Figures 3  and 4). In addition, it is observed that the Western region began soybean cultivation before the other regions, due to the occurrence of appropriate water balance for seeding. However, the positive spatial autocorrelation distribution of Cw, conducive to crop development, was between the 2 nd and 3 rd ten-day period of November, initial phase and implementation of seeding in the rest of the state. In December, a negative and significant spatial autocorrelation of Cw prevails (Figures 4 and 5) in regions where the crop is more advanced in this period, therefore being caused by water deficit. Due to the low rainfall during the crop cycle, in the first two ten-day periods of November, there was negative Cw and spatial autocorrelation with distribution ( Figure 5), especially in regions where soy was already in a vegetative stage. However, the irregularities that occurred did not harm the crop in its critical moments (emergence-germination and flowering-fruiting). Therefore, it was not considered detrimental to the final yield in almost the entire state, except for the Northwest region, where significant low-low spatial autocorrelation was found in several ten-day period of the three crop years, combined with the low productive potential of the region, which directly affected yield.

Journal of Agricultural Studies
Regarding solar radiation (Gr) in 2010/2011 and 2012/2013 (Figure 6), there were significant low-low (Q (-/-)) spatial autocorrelation values in part of the State, i.e., due to the positive soil water balance, evapotranspiration was lower with low light. In general, Gr presented high-high spatial autocorrelation (Q (+/+)) in the Northwest, West, Midwest, and North-Central regions in almost all three-day periods, which corroborates SEAB (2015).
In crop year 2011/2012, the autocorrelation values of Gr had adequate levels for seeding from the 3 rd ten-days period of September ( Figure 5). However, due to the Cw [Q (-/-)] that occurred between the 2 nd and 3 rd three-day periods of November homogenized the seeding in the State regions, besides the high light intensity causing high evapotranspiration, which affected the photosynthetic activity and, consequently, the development of the crop.
In each region of the state, temperatures for the three crop years studied were similar between the ten-day periods ( Figure 6). The Northwest and Western parts throughout the period showed high/high spatial autocorrelation [Q (+/+)], with several regions forming significant clusters (Figure 7). These results indicate that these regions present municipalities with high temperature values surrounded by municipalities with the same characteristic.
In the rest of the state, the temperature presented low/low spatial autocorrelation [Q (-/-)], with several regions forming significant clusters. That is, they form a cluster of municipalities with low temperature values. This finding corroborates temperature data obtained by IAPAR (2011) for Paraná .
The problems that occurred with the climate in the 2011/2012 crop year caused yield losses in almost the entire state, except for the Central-Eastern region (cities of Ponta Grossa, Castro, and Tibagi) whose production was higher than the national average, with the following results: 3.35, 3.55, and 3.30 ha -1 . In these municipalities, none of the agrometeorological variables varied during the cycle (Figures 3 and 5), so soybean yield was not compromised as in other regions of Paraná .  ISSN 2166-0379 2020 The results showed that the climatic problems that occurred in the 2011/2012 crop years compromised soybean productivity in this period, which can influence the profitability of that culture (Figures 6 and 7).

Bivariate Global Spatial Correlation
Analyzing the bivariate spatial correlation of yield with agrometeorological variables and vegetation index EVI (Figure 8) for the three crop years, we observed that there was a negative spatial correlation between yield and EVI of the first ten-day period of August until November 1 st , which means there were municipalities with low and/or high productivity in relation to the EVI surrounded by municipalities with similar characteristics. This occurs because vegetation index levels are too low or absent. From the 2 nd ten-day period of November, the spatial correlation between these variables is positive. However, for 2011/2012, for almost every ten-day period, the spatial correlation index between yield and EVI was lower, compared to other crop years, indicating a problem in crop development.  ISSN 2166-0379 2020 municipalities were more heterogeneous in relation to the 2012/2013 crop year. It is identified, therefore, that from the 2 nd ten-day period of January, the crop enters the phase of senescence and harvest, which corroborate Araújo et al. (2014).
For crop years 2011/2012 and 2012/2013, there was an inverse spatial association of yield and Cw (Figure 8), from the 1 st ten days of August to the 2 nd ten-day period of October. That is, in this period, the soil was with water deficit, which affected the seeding. In 2010/2011, there was less variation of the indices between these two variables, with values close to 0, from the 2 nd ten-day period of September to October 2 nd . For these two variables, both low (close to zero) and negative values of the bivariate spatial correlation index were identified from November 1 st to December 2 nd , as well as from January 1 st to February 1 st . However, for this crop year, this result did not affect soybean yield even though they occurred during a period of crop development. However, this result influenced the crop year 2011/2012, which affected the development of critical phases of the plants and soybean yield, which corroborates Caetano et al. (2018), who concluded that the climatic conditions influence soybean productivity.
The results obtained between yield and Gr, as well as between yield and Te, bivariate spatial

Cluster Analysis
To perform the cluster analysis, the dimensionality was firstly reduced by principal components (PCs). Five main components were generated for the 2010/2011 and 2011/2012 crop years, considering 70% of the total variability of the studied variables; and three main components for crop year 2012/2013, considering 76% of the total variability of the studied variables. Considering the relevance of each variable in the formation of CPs and the values of the scores of each CP in each group, Table 1  According to this profile and the groups formed (Table 1 and Figure 9), in the three crop years there was a well-defined formation of Western Paraná (group 1, Figure 8). Cluster analysis showed that this region has higher temperatures in common for all crop years, which is favorable to crop development. Thus, especially in crop year 2012/2013, the West region stood out for the concentration of municipalities with high yield values, and this was also observed by the Moran scattering map (Figure 3).
The West region also had high solar radiation values in the 2010/2011 and 2012/2013 crop years, as well as low vegetation index values in the first ten-day periods of each crop year (until November) to high EVI values in the December to January for the 2010/2011 crop year, and from November to December in the two subsequent crop years. However, in 2011/2012, in addition to high temperature values, low Cw values and high Gr values were identified in the West in most of the ten-day periods (from crop implantation to fruiting phase). Thus, this set of climatic results in 2011/2012 was responsible for a drop in yield in these Western municipalities (Figure 3), which are considered the largest producers in the state, that corroborates Kahlon et al. (2018).
Similarly to the Western region, the group of municipalities in the Northwest region of the state (and part of the Northern region in the 2011/2012 and 2012/2013 crop years) (group 2, Figure 9, and Table 1) is characterized by high values of temperature and solar radiation. However, this region differs from the Western region in the period of occurrence of low and high EVI values, which can be explained by the differences between seeding dates and vegetation cover in these two regions. These results corroborate the results presented by the Moran scattering maps of these variables (Figures 3 and 5).
The cluster analysis also shows that the North Central, North Pioneer and West Central regions formed group 3 in the three crop years (Figure 9). There were high EVI values in this region from December to January ( Two other groups that were formed (groups 4 and 5, Figure 9) correspond to the Southern and metropolitan region of the state of Paraná . These regions have lower temperatures, which were not ideal for seeding, and during the development period the high radiation values (Gr) in crop year 2012/2013 impaired the yield obtained in several municipalities of these regions ( Figure  3). In addition, in most of the municipalities in these regions, there were high EVI values until November (Table 1), according to Rosa et al. (2017). Compared to the crop years analyzed, the importance of identifying climate anomalies during the crop cycle is highlighted. Differences are seen for crop years affecting yields mainly in the Western region of the state. The lack of rainfalls associated with high temperatures and radiation caused losses in yield in this region. This fact can also be attributed to the climate differences in the growth seasons, being delimited by clusters obtained in the different crop years studied.
Therefore, the clusters analysis made it clear that soybean productivity depends on climatic conditions, mainly on precipitation associated with different temperatures at different growing seasons, which will determine the profitability of that crop.

Conclusions
Greater similarities were identified in the crop years 2011/2012, with lower productivity from the analysis of significant positive spatial autocorrelation ( , with p-value < 0.05, that is, due to climate variations, the spatial autocorrelation indices were similar in the municipalities of Paraná . Another finding was in relation to the Northwest region, where municipalities with low yield were identified in the three crop years studied, thus highlighting the low productive potential of the region. In the bivariate analysis, the essential period, for the entire states, was from 3 rd ten-day period of October to the 2 nd of January in the cycle of the culture, showing a significant positive spatial correlation (p-value < 0.05). In the cluster analysis, the importance and variability of climate for the State were observed, causing losses mainly in crop year 2011/2012.
The results also showed that in the three crop years there was a well-defined formation from the Western region of Paraná -Brazil. The cluster analysis made it clear that this region presented higher temperatures for all crop years, which is ideal for the development of the soybean culture. Finally, it is evident that in the 2012/2013 crop years, the West region stood out for the concentration of municipalities with high productivity.