Reduction of Sample Size in the Soil Physical-Chemical Attributes Using the Multivariate Effective Sample Size

The cost of collecting soil samples and analyzing them in the laboratory is an important factor when mapping agricultural areas planted with soybean. One alternative is to use the spatial autocorrelation between sample points to reduce the number of sampled elements, thus limiting the collection of redundant information. This work aimed to reduce the sample size of an agricultural area with 102 sample points and to use the reduced sample to analyze the spatial dependence of soil macro- and micronutrients, as well as of soil penetration resistance. The agricultural area has 167.35 ha cultivated with soybean, its soil is a Red Dystroferric Latosol, and the sampling design used is lattice plus close pairs. The sample size was reduced using the multivariate effective sample size (ESSmulti) methodology. The studies with simulated data and with the soil attributes showed an inverse relationship between the practical range and the estimated value of the univariate effective sample size. With the calculation of ESSmulti, the sample configuration was reduced to 53 points. The Overall Accuracy and the Tau concordance index showed differences between the thematic maps produced with the original and reduced sampling designs. However, the analysis of the variance inflation factor and of the standard error of the spatial dependence parameters showed efficient results with the reduced sample size.


Introduction
Throughout economic cycles, Brazilian agribusiness has proven fundamental to the country's development, in addition to securing a prominent position capable of influencing the international market. Currently, soy (Glycine max (L.) Merrill) is the main agricultural product in Brazil and, in recent years, it has shown continuous growth in production and in planted area (Balbinot Junior et al., 2017). Expansion of the planted area is encouraged by the favorable climate and topography and by competitive hectare prices in some states in the North and Northeast of the country (Bolfe et al., 2016).
In the last decade, the area planted with soybeans in Brazil went from 23.4 million hectares to 36.8 million hectares, an increase of 57.2% (CONAB, 2020), whereas production of this oilseed increased from 68.9 million tons to 124.2 million tons, an increase of 80.2% (CONAB, 2020). Paraná ranks among the states with the highest soybean productivity in the country and accounted for approximately 17% of the national planted area of this crop, as well as 18% of the national production of this grain, between the 2009/10 and 2019/20 crop years (CONAB, 2020). These figures demonstrate the importance of the commodity in the national and state context.
The chemical and physical properties of the soil can affect root growth and the production of sensitive crops, such as soybeans (Debiasi et al., 2013). Thus, knowledge about the spatial variability of soil attributes is essential for the future development of soybeans and should be considered in crop management (Sanches et al., 2019).
The theory of regionalized variables is considered as the basis of geostatistics, in which, from a set of georeferenced sampling elements, the spatial dependence structure is analyzed and interpolated thematic maps are constructed that represent the spatial variability of physical-chemical attributes of the soil (Cressie, 2015). However, the efficiency of geostatistical techniques in the spatial characterization of soil properties is directly conditioned by the quality of the soil sampling performed (Cherubin et al., 2014).
[…] properties in the study area (Benedetti et al., 2015). For the study of spatial data, the literature describes several sample configurations, such as simple random, stratified, regular (square, triangular, hexagonal), lattice plus close pairs, and lattice plus in-fill (Diggle and Lophaven, 2006). The sample configuration must consider mainly the shape of the area, the relief characteristics, and the available resources, which are mainly influenced by the costs of laboratory analyses (Gao et al., 2016).
The costs of collecting and analyzing samples of soil attributes have led to many studies on sample resizing, which aim to reduce sampling costs with a minimal loss of information in spatial prediction. The approaches range from optimization algorithms (Guedes et al., 2016; Wadoux et al., 2017; Maltauro et al., 2019) to the use of the effective sample size (ESS). The calculation of the effective sample size considers the effect of spatial autocorrelation between the sampled points. To calculate the value of the effective sample size, a previous sample survey of the physical-chemical attributes of the soil in the desired agricultural area is necessary (Griffith, 2005; Vallejos and Osorio, 2014).
Studies and applications of the effective sample size have been developed, especially with a univariate approach and with space-time models (Griffith, 2008; Acosta et al., 2016). However, in the agricultural context, several soil attributes are generally considered in the study of spatial variability, which makes it impractical to adopt a different sample resizing for each variable.
Therefore, the objective of this work was to redefine the number of sample points using the multivariate effective sample size (ESSmulti). This proposal makes it possible to obtain a single sample size that takes into account the spatial dependence structure of the soil physical-chemical attributes collected in an agricultural area.

Material and Methods
Two studies were carried out: the first with simulated data, and the second with data on physical-chemical attributes of the soil obtained in an agricultural area. The simulation study aimed to reproduce a range of situations present in the real data, in addition to adding practical and theoretical knowledge about sample resizing for soil attributes with a spatial dependence structure.

Description of the simulations
We considered 14 georeferenced variables with different spatial dependence structures (Figure 1-B). Given a stationary and isotropic Gaussian linear spatial model with fixed mean ($\mu = 5$), we performed 100 simulations for each of the 14 variables using a Monte Carlo experiment (Cressie, 2015). In these simulations, we fixed the same sampling configuration (lattice plus close pairs) and the same sample size (102 sampling points) as in the commercial agricultural area used for the experimental data (Figure 1-C). We also fixed the exponential semivariance model (Diggle and Ribeiro […]). […] the spatial dependence structure of the simulated variables are described in Figure 1-B. The scheme in Figure 1-A presents other features of the simulations.
After estimating the parameter vector by the maximum likelihood method for each simulated variable, the value of the univariate effective sample size (ESSuni) was estimated according to Equation 1 (Vallejos and Osorio, 2014):
$$\widehat{ESS}_{uni} = \mathbf{1}^{\top} \hat{R}^{-1} \mathbf{1} \qquad (1)$$
where $n$ is the number of simulated sampling points in the original grid ($n = 102$); $\mathbf{1}$ is an $n \times 1$ unit vector; and $\hat{R}$ is the $n \times n$ estimated spatial correlation matrix of the sample points, whose element $(i, j)$ is the estimated spatial correlation between the $i$-th and the $j$-th sample points.
Also, we created four scenarios (S1, S2, S3, S4) (Figure 1-B), each with a different number of variables and corresponding to a subset of the 14 variables (Figure 1-B). The first scenario (S1) aggregates variables with moderate to strong intensity of spatial dependence (defined by the relative nugget effect, RNE (%), the ratio of the nugget effect to the sill) and intermediate spatial dependence radius ($a$). The second and third scenarios (S2 and S3, respectively) have a strong spatial dependence. However, S2 only has high range values, whereas in S3 the range varies between intermediate and high. Finally, the fourth scenario (S4) brings together all the variables.
For these scenarios, we estimated the value of the multivariate effective sample size (ESSmulti) considering the methodological scheme presented in Figure 1-A and the proposal developed by Vallejos & Osorio (2014) […] representing the spatial correlation matrix of the […]-th variable […].
ESSuni: univariate effective sample size; ESSmulti: multivariate effective sample size.
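The univariate estimate can be sketched in a few lines. The example below is a minimal illustration, assuming that Equation 1 takes the form ESS = 1'R⁻¹1 given by Vallejos and Osorio (2014) and that the correlation follows an exponential model without nugget effect, written in terms of the practical range a as ρ(h) = exp(-3h/a). The function names, the 5 × 5 grid, and the two range values are hypothetical and are not the routine developed by the authors in R.

```python
import math


def exponential_correlation(coords, practical_range):
    """Exponential correlation matrix, rho(h) = exp(-3 h / a), with no nugget (assumption)."""
    n = len(coords)
    return [
        [math.exp(-3.0 * math.dist(coords[i], coords[j]) / practical_range) for j in range(n)]
        for i in range(n)
    ]


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def ess_uni(corr):
    """Univariate effective sample size: 1' R^{-1} 1, i.e. the sum of the solution of R x = 1."""
    ones = [1.0] * len(corr)
    return sum(solve(corr, ones))


# Hypothetical 5 x 5 lattice with 141 m spacing (not the field design itself).
grid = [(141.0 * i, 141.0 * j) for i in range(5) for j in range(5)]
for a in (100.0, 1000.0):
    print(f"practical range {a:6.0f} m -> ESS = {ess_uni(exponential_correlation(grid, a)):.2f}")
```

With uncorrelated points the correlation matrix is the identity and the function returns n, which is a convenient sanity check before applying it to a fitted model.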

Description of the Experimental Data
Data were collected in the 2015/2016 crop year in a commercial area of 167.35 hectares cultivated with soybean, where direct planting has been carried out since 1994 (Figure 1-C). The area is located in the municipality of Cascavel, in Western Paraná, Brazil, with approximate geographical coordinates of 24.95° South and 53.37° West, and an average altitude of 650 meters. The soil is a Red Dystroferric Latosol with a clay texture and has an average slope of approximately 4%, categorized as smooth undulating. The climate of the region is temperate mesothermic and superhumid, of the Cfa type (Koeppen), with a mean annual temperature of 21 °C.
We used a lattice plus close pairs sampling design with 102 sampling points. This design consisted of a regular grid (with a minimum distance between points of 141 meters), to which we added 19 sample points. These added locations lay at shorter distances (50 and 75 m) from some points of the regular grid. The sample points were georeferenced and located with the aid of a GeoExplorer 3 (Trimble®) Global Positioning System (GPS) receiver set up for the Universal Transverse Mercator (UTM) coordinate system.
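The construction of such a design can be sketched as follows. This is an illustration only: it assumes a hypothetical rectangular lattice (the real field is irregular, so the number of lattice points differs), picks the grid points that receive a close neighbor at random, and places each added point at 50 or 75 m in a random direction. The function name and all parameters are assumptions.

```python
import math
import random


def lattice_plus_close_pairs(n_rows, n_cols, spacing, n_close, close_distances, seed=0):
    """Return (lattice, close): a regular grid plus extra points near randomly chosen grid points."""
    rng = random.Random(seed)
    lattice = [(col * spacing, row * spacing) for row in range(n_rows) for col in range(n_cols)]
    anchors = rng.sample(lattice, n_close)  # grid points that receive a close neighbor
    close = []
    for x, y in anchors:
        distance = rng.choice(close_distances)
        angle = rng.uniform(0.0, 2.0 * math.pi)
        # Added points near the border may fall slightly outside the rectangle.
        close.append((x + distance * math.cos(angle), y + distance * math.sin(angle)))
    return lattice, close


lattice, close = lattice_plus_close_pairs(9, 10, 141.0, 19, (50.0, 75.0), seed=1)
print(len(lattice), "lattice points +", len(close), "close points")
```

The close pairs supply short-distance information that a regular grid alone cannot, which helps in estimating the behavior of the spatial dependence structure near the origin.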
The data set contained the following: as physical attribute, the soil penetration resistance (in MPa) at depths of 0-10 cm (SPR1), 11-20 cm (SPR2), 21-30 cm (SPR3), and 31-40 cm (SPR4); as chemical attributes, the following soil macro- and micronutrients: calcium (Ca, cmol dm⁻³), carbon (C, g dm⁻³), copper (Cu, mg dm⁻³), manganese (Mn, cmol dm⁻³), and zinc (Zn, mg dm⁻³). The experimental data for these physical and chemical attributes belong to the database of the Laboratory of Spatial Statistics of the Western Paraná State University (UNIOESTE), Cascavel. The analysis of these soil attributes is important because an imbalance in their quantities in the soil can alter the growth and development phases of the plant, thus affecting the grain and, consequently, soybean productivity (Taiz et al., 2017). Moreover, to better understand the nutritional characteristics of the soil, it is important to combine samples of macro- and micronutrients with physical attributes such as soil penetration resistance (SPR), which is related to soil compaction. Compacted soils tend to hinder the availability of nutrients and water to the plant, which also interferes with plant development (Colombi and Keller, 2019).
At each point, five replicate soil samples were collected at a depth of 0 to 20 cm in the vicinity of the point; these were mixed and placed in plastic bags of approximately 500 g, so that the composite sample was homogeneous and representative of the parcel. Micronutrients were extracted by the Mehlich-1 method, carbon by the Walkley-Black method, and calcium by KCl 1 mol L⁻¹. Soil penetration resistance was measured with a penetrograph, as follows: for each sampling point, we performed three readings per centimeter, from 0 to 40 cm, covering the four depths considered (0-10 cm, 11-20 cm, 21-30 cm, and 31-40 cm). The data obtained were converted to MPa, and the value of the soil penetration resistance at each depth was the arithmetic mean of the three measurements.

Journal of Agricultural Studies
Considering the original sample design, we performed the exploratory and geostatistical analyses of each physical-chemical attribute of the soil, as detailed in Figure 2. Details on the geostatistical methodologies used in this research are in Cressie (2015).
We estimated the values of the univariate effective sample size (ESSuni, Equation 1) and of the multivariate effective sample size (ESSmulti, Equation 3) (Figure 2) by the same methodology applied to the simulated data (Figure 1-A). The effective sample size methodology uses Fisher's information matrix, which only considers the information on the spatial dependence structure; that is, it does not consider any information on spatial prediction in its sample resizing process (Vallejos and Osorio, 2014).

ISSN 2166-0379 2021, Vol. 9, No. 1
Finally, we compared the results obtained with the two sample configurations (original and reduced), using the methodologies presented in Figure 2-B.
The simulations and the statistical and geostatistical analyses were carried out in the R software (R Development Core Team, 2020), using the geoR package (Ribeiro Jr and Diggle, 2001). To estimate the univariate (ESSuni) and multivariate (ESSmulti) effective sample sizes, we developed a computational routine in R using the geoR and matrixcalc (Novomestky, 2012) packages.

Simulation Studies
The estimated values of the univariate effective sample size (ESSuni) showed that the simulated variables fell into three groups (Figure 3-A): variables V5, V7, V8, V13, and V14 made up the first group; variables V1, V2, V3, V4, and V6, the second; and variables V9, V10, V11, and V12, the third. Regarding the estimated values of the multivariate effective sample size (ESSmulti), there was a similarity between scenarios S1 and S4 (Figure 3-B).
Scenario S2 only aggregated variables with high values for the spatial dependence radius (between 1,000 and 1,200 m) and differed from the other scenarios because it presented the lowest estimated mean value of ESSmulti (20 sample points) (Figure 3-B). Scenario S3 diverged from the others because it showed the highest estimated mean value of ESSmulti (68 sample points) (Figure 3-B). In this scenario, the variables had the lowest practical range values, between 50 and 110 m, which are low compared to the maximum distance in the area (approximately 1,800 m).

Application of the Methodology in Physical-Chemical Attributes of the Soil
When estimating the effective sample size of each physical-chemical attribute of the soil, four groups were identified (Figure 4-A). These groups were obtained by simultaneously analyzing the estimated values of the spatial dependence radius (a), the intensity of spatial dependence (RNE), and the ESSuni of each attribute.
In the first group, the attributes exhibited high intensity of spatial dependence (RNE between 4% and 30%; Cambardella et al., 1984) and low values for the spatial dependence radius (between 100 and 116 m), which gave this group the highest estimated values of ESSuni (approximately 90 sample points) (Figure 4-A). This group was composed of the SPRs at depths of 11-20 and 21-30 cm.
The second and third groups exhibited estimated ESSuni values between 22 and 50 sample points (Figure 4-A). These two groups contained the following attributes: carbon (C), manganese (Mn), and zinc (Zn) (group 2); and calcium (Ca) and the SPRs at depths of 0-10 cm and 31-40 cm (group 3). The estimated values of the spatial dependence radius were close (between 220 and 390 m) for all attributes of these groups, except for carbon. Thus, the intensity of spatial dependence, i.e., the estimated value of the RNE, drove the difference in the estimated value of ESSuni between these groups: in the second group, the attributes showed a lower intensity of spatial dependence (higher RNE) than in the third group (Figure 4-A).
Isolated from the other attributes, the copper (Cu) content in the soil presented the lowest estimated value of ESSuni (11 sample points) and the highest estimated value of practical range (538.7 m) (Figure 4-A).
[…] Figure 4-B), was selected for the study of the spatial dependence of the physical-chemical attributes of the soil.

We observed that the reduction in the sample size did not influence the dispersion, given by the coefficient of variation, or the mean value of the physical-chemical attributes of the soil (Table 1). The chemical analysis of the soil showed high mean levels for most of the macro- and micronutrients of the soil. Also, the most superficial soil layers (0 to 20 cm) presented some limitations to root growth (Canarache, 1991).
In the geostatistical analysis, we verified that the spatial dependence structure depends only on the distance separating the observed locations and does not differ with direction. Therefore, the spatial dependence structure can be considered isotropic for all the physical-chemical attributes of the soil (Guedes et al., 2013).
Comparing the original and reduced sampling designs, we found that the chemical attributes of the soil showed greater variation in the estimated values of the parameters of the spatial dependence structure than the physical ones (Table 2).
For most soil attributes, the standard error of the parameters of the spatial dependence structure increased after sample resizing (Table 2). Also, the estimated standard errors are of the same order of magnitude as the estimated parameters (Table 2).
All the soil attributes (except zinc) showed a reduction in the estimated value of the spatial dependence radius, with a practical range that was, on average, 27% smaller with the reduced sample configuration (Table 2). However, even with a smaller radius of spatial dependence, most of the soil attributes showed a higher intensity of spatial dependence (RNE) (Table 2). Finally, the variance inflation factor (VIF) (Griffith, 2008) was lower for 80% of the physical-chemical attributes of the soil when the reduced sample configuration was used (Table 2).
With the original sampling configuration, there was one sample for approximately every 1.6 hectares. Considering the sample resizing obtained and the size of the experimental agricultural area, approximately one sample would be collected every 3 hectares. As a result, both the visual analysis of the thematic maps of the soil attributes and the quantitative results signaled some differences between the original and reduced sampling grids.
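The sampling densities and the relative reduction follow directly from the area and the two sample sizes reported in the text. The short sketch below only repeats that arithmetic; the function names are hypothetical.

```python
AREA_HA = 167.35   # area of the field, in hectares
N_ORIGINAL = 102   # points in the original design
N_REDUCED = 53     # points in the reduced design


def hectares_per_sample(area_ha, n_points):
    """Average area represented by one sample point."""
    return area_ha / n_points


def relative_reduction(n_before, n_after):
    """Fraction of the original sample points that is no longer collected."""
    return (n_before - n_after) / n_before


print(f"original: {hectares_per_sample(AREA_HA, N_ORIGINAL):.2f} ha per sample")  # 1.64
print(f"reduced:  {hectares_per_sample(AREA_HA, N_REDUCED):.2f} ha per sample")   # 3.16
print(f"reduction: {100 * relative_reduction(N_ORIGINAL, N_REDUCED):.1f}%")       # 48.0%
```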

For Soil Penetration Resistance (SPR), medium to high accuracy was found between maps with the original and reduced sample configurations at depths of 11-20 cm, 21-30 cm, and 31-40 cm (T ≥ 67%; Ma and Redmond, 1995; De Bastiani et al., 2012) (Figure 5-G, H, and I). For the chemical attributes of the soil, lower accuracy indexes were found (T < 67%; Ma and Redmond, 1995) (Figure 5-A, B, C, D, and E).
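Both indexes can be computed from a confusion matrix that cross-tabulates, pixel by pixel, the class assigned by the map from the original design against the class assigned by the map from the reduced design. The sketch below assumes the form of the Tau index with equal a priori class probabilities, T = (Po - 1/M) / (1 - 1/M), where Po is the overall accuracy and M the number of classes. The function names and the 3-class matrix are hypothetical, not results from this study.

```python
def confusion_matrix(reference, compared, n_classes):
    """Cross-tabulate two maps given as equal-length lists of class labels 0..n_classes-1."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for ref_class, cmp_class in zip(reference, compared):
        matrix[ref_class][cmp_class] += 1
    return matrix


def overall_accuracy(matrix):
    """Share of pixels that fall in the same class in both maps."""
    total = sum(sum(row) for row in matrix)
    agree = sum(matrix[k][k] for k in range(len(matrix)))
    return agree / total


def tau_index(matrix):
    """Tau with equal a priori class probabilities (assumption): (Po - 1/M) / (1 - 1/M)."""
    random_agreement = 1.0 / len(matrix)
    return (overall_accuracy(matrix) - random_agreement) / (1.0 - random_agreement)


# Hypothetical 3-class comparison of 100 pixels.
example = [[30, 5, 0], [4, 25, 6], [1, 5, 24]]
print(f"OA = {overall_accuracy(example):.3f}, Tau = {tau_index(example):.3f}")  # OA = 0.790, Tau = 0.685
```

Under the threshold quoted above, this hypothetical pair of maps (T = 68.5%) would fall just inside the medium to high accuracy range.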
However, we observed that, even with a reduction of approximately 48% in the sample size, the thematic maps obtained with the resized sample configuration maintained the pattern of spatial variability described by the maps with the original sample size. This can be observed through the points that represent the spatial distribution of the contents of the macro- and micronutrients in the soil, as well as of the SPR at a depth of 0-10 cm, according to each class of the thematic map (Figure 5-A, B, C, D, E, and F). The pattern was also maintained for the SPR at depth layers between 11 and 40 cm (Figure 5-G, H, and I), where the maps presented circular regions around the sample points.

Analysis of Simulation Studies
Comparing the three groups formed by the simulated variables, we observed an inverse relationship between the radius of spatial dependence and the estimated values of ESSuni. In other words, the greater the radius of spatial dependence, the lower the estimated value of ESSuni, and thus the greater the reduction in the number of sample points. Vallejos & Osorio (2014) found similar results, even with a different sample configuration. Moreover, when the value of the spatial dependence radius was fixed and the value of the nugget effect was varied, which changes the intensity of spatial dependence, no relevant changes were observed in the interval given by the mean ± standard deviation of the estimated value of ESSuni (Figure 3-A).
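The direction of this relationship can be seen in a simplified limiting case. The derivation below is only a sketch: it assumes that ESS takes the form 1'R⁻¹1 (Vallejos and Osorio, 2014) and replaces the fitted exponential correlation with a hypothetical equicorrelation matrix, in which every pair of points shares the same correlation ρ.

```latex
% Simplifying assumption (not the exponential model fitted in the study):
% every pair of points has the same correlation \rho, with 0 \le \rho < 1.
R = (1-\rho)\, I_n + \rho\, \mathbf{1}\mathbf{1}^{\top}
\quad\Longrightarrow\quad
R\,\mathbf{1} = \bigl[\, 1 + (n-1)\rho \,\bigr]\, \mathbf{1}
\quad\Longrightarrow\quad
ESS = \mathbf{1}^{\top} R^{-1} \mathbf{1} = \frac{n}{1 + (n-1)\rho}

% Limiting cases:
%   \rho = 0      gives  ESS = n   (no redundancy between points)
%   \rho \to 1    gives  ESS \to 1 (all points carry the same information)
```

A larger radius of spatial dependence raises the correlation between most pairs of points, which in this simplified setting corresponds to a larger ρ and therefore to a smaller ESS.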
The spatial dependence radius also proved to be a parameter that inversely influences the estimated values of the univariate effective sample size (ESSuni) and of the multivariate effective sample size (ESSmulti).

Analysis of the Physical-Chemical Attributes of the Soil
The results showed that the greater the radius of spatial dependence, the lower the estimated value of the univariate effective sample size (ESSuni). This result corroborates those obtained in the simulation studies of the present study. It also agrees with Griffith (2005) and Vallejos & Osorio (2014), who used different geostatistical models to explore the spatial variability of the biomass, arsenic, and lead index in the soil and to resize the sample, and who found the same inverse relationship between the radius of spatial dependence and ESSuni. Furthermore, when the practical range values were close, the intensity of the spatial dependence had a direct influence; that is, the higher the RNE, the greater the estimated value of ESSuni.
The reduction of the sample size to 53 points represents a reduction of approximately 48% of the original sample size. This result is consistent with the studies carried out by Griffith (2005, 2008), in which, considering soil macro- and micronutrients and calculating the univariate effective sample size, sample reductions between 30% and 48% were obtained. Also, this new sample size is similar to those used by Pelissari et al. (2014) and by Siqueira et al. (2014), who considered between 46 and 60 sample points for the analysis of the spatial variability of physical-chemical attributes in experimental areas with, respectively, a smaller dimension (less than 100 ha) and a similar size (150 ha) compared to this research.
Comparing the original and reduced sampling designs (Table 2), the greater variation in the values of the spatial dependence parameters of the chemical attributes in relation to the physical ones can be explained by the agricultural practices of soil and crop management, as well as by the natural variability of the soil. Both factors contribute to the chemical attributes having greater spatial variability than the physical ones and therefore being more sensitive to the reduction in the number of sample elements (Jacob and Klute, 1956).
The increase in the standard error of the parameters after sample resizing indicates that the reduction in the number of sample points influenced the spatial dependence structure of the physical-chemical attributes of the soil (Table 2). Moreover, standard errors of the same order of magnitude as the estimated parameters are also found in the results of Schemmer et al. (2017) and Fagundes et al. (2018), who used both the Gaussian linear spatial model (as in this study) and the Slash and Student-t linear spatial models, applied to variables related to soil and plants.
Although the results indicated that sample resizing changed the spatial dependence structure of the physical-chemical attributes of the soil, the lower values of the VIF (Table 2) showed that the soil attributes were less affected by spatial autocorrelation after the reduction in the number of sample points. This suggests that the original sample configuration contained sample elements with redundant spatial information.
The lower accuracy in the maps of the chemical attributes of the soil in relation to the physical ones (Figure 5) agrees with several studies about spatial variability carried out with chemical attributes of the soil using different sample sizes (from 9 to 164 samples) and sample densities (from 25 × 25 to 173 × 173), in different Brazilian states, therefore subject to different climates and soil types, and with other crops besides soy (corn and oats) (Cherubin et al., 2014, 2015; Kestring et al., 2015; Guarçoni et al., 2017). This is because the chemical attributes of the soil are more sensitive than the physical ones (and therefore than the SPR) when spatial variability is described with a reduced number of samples (Jacob and Klute, 1956). Furthermore, the chemical attributes of the soil are often corrected by the application of inputs, during and between harvests.
The circular regions around the sample points (Figure 5-G, H, and I) are a phenomenon known as the 'bull's-eye effect' (Menezes et al., 2016), which is explained by the small dependence radius in these depth layers of SPR in both the original and the reduced sample grids (Table 2).
The thematic maps with the reduced configuration maintained the pattern of spatial variability (Figure 5); this implies that soil management and correction can continue to be carried out in the necessary places, while using a smaller number of collected samples and without requiring equipment with a high cost of acquisition and maintenance, such as machinery with a harvest monitor.
Therefore, this work showed that, with a single reduction in the number of sample points obtained with the proposed multivariate effective sample size (ESSmulti), it was possible to characterize the spatial dependence of the agricultural area. Also, the methodology showed that spatially redundant information was collected in the area, which in future collections would imply unnecessary costs to the production process. We used physical-chemical attributes of the soil with different spatial dependence structures and obtained a considerable resizing of the sample, from 102 to 53 sample elements. In practice, reducing the number of samples by approximately 50% implies a reduction in costs by the same proportion, as it reduces the demand for time, labor, and laboratory analysis.
National Council for Scientific and Technological Development (CNPq). The authors would also like to thank the Spatial Statistics Laboratory -UNIOESTE -Cascavel, PR, Brazil.