Time Series Analysis and Forecasting of Rainfall for Agricultural Crops in India: An Application of Artificial Neural Network

Indian agriculture depends heavily on rainfall. Rainfall influences not only agricultural production but also the prices of all agricultural commodities. It is an exogenous variable beyond farmers' control, and its fluctuation is quite natural. It has been observed that fluctuations in rainfall bring about fluctuations in output, leading to price changes. Considering the importance of rainfall in determining agricultural production and prices, the study attempts to forecast monthly rainfall in India through time series analysis of monthly rainfall data. Both linear and non-linear models have been used. The diagnostic checking parameters (MAE, MSE, RMSE) are lower for the non-linear model than for the linear ones. The non-linear Artificial Neural Network (ANN) model has therefore been chosen over the linear models, namely simple seasonal exponential smoothing and Seasonal Auto-Regressive Integrated Moving Average (SARIMA), to forecast rainfall. The forecasts can help identify the proper cropping pattern.


Introduction
Rainfall is one of the major factors of agricultural production. In recent years, global climate change has markedly affected rainfall, and its distribution has fluctuated throughout the world: in some places it has increased, whereas elsewhere it has decreased. Farmers find it difficult to cope with this change, primarily because most crops are seasonal and rainfall-dependent. At present, about 60 percent of the world's staple food production, and 90 percent of that of Sub-Saharan Africa, depends directly on rainfall. Agricultural productivity can rise with increased rainfall and improved agricultural technology. However, regional variation in rainfall across the world makes it difficult to recommend a single agricultural technology for adoption everywhere. Agriculture in high-rainfall regions does not face problems arising from water scarcity; there, rainwater can be stored in dams, ponds and reservoirs to irrigate the surrounding areas and thereby increase crop production. Regions with low rainfall, by contrast, suffer from poor availability of water for cultivation (www.sourcetrace.com). Rainfall therefore shapes cropping patterns differently in different parts of the world.
The Economic Survey 2017-2018 pointed out that agriculture in India remains vulnerable to fluctuations of the monsoon to this day, because the all-India ratio of net irrigated area to total cropped area is as low as 34.5 percent. Around 52 percent of agricultural land (73.2 million hectares out of 141.4 million hectares of net sown area) is still un-irrigated and dependent on the monsoon (Financial Express Bureau, 2018). According to the Agricultural Ministry of India, the area under kharif (June-September) cultivation fell by 7 percent in 2019 compared to 2018 because of a rainfall shortage: it was 56.7 million hectares in 2019 against 60.9 million hectares in 2018. The lack of rainfall affected kharif paddy and pulses more than any other crop; the 2019 shortfalls in kharif paddy and pulses cultivation were about 9 percent and 16 percent respectively, larger than those of 2018 (Chowdhury, 2019). The monsoon rainfall influences not only kharif cultivation but also groundwater storage and reservoir levels, which makes it critical for the irrigation of rabi crops as well. What matters is the distribution of rainfall across regions within a particular season, so fluctuations in that distribution over the cropping season adversely affect rain-fed crops. If severe drought or flooding occurs during the reproductive stages, crop failure becomes very likely. Rainfall thus has a strong positive impact on crop production. The effect of a poor or failed monsoon is generally assessed by comparing output in years of monsoon failure with output in normal years (Chand and Raju, 2009). Hence, the rainfall pattern is one of the most important limiting factors for rain-fed crop production.
In order to select the proper cropping pattern in India, it is therefore very important to determine the exact relationship between rainfall and crop production. It is equally important to know the expected monthly rainfall in order to select the most suitable crops for cultivation.
The present study analyses and forecasts monthly rainfall on the basis of time series data on rainfall. Monthly rainfall forecasts will inform farmers in advance of the expected volume of rainfall, helping them decide what to cultivate, and how much, during a particular season. The specific objective of the study is as follows.
i) To analyze the nature of monthly rainfall in India and forecast future monthly rainfall.
Gulati et al. (2013) attempted to project the likely impact of the robust monsoon rains of 2013 on agricultural Gross Domestic Product (GDP) growth in India. Their model hypothesizes that the performance of Indian agriculture depends on (1) investment in agriculture (private and public), (2) agricultural price incentives and (3) rainfall. A log-linear model fitted over the period 1996-97 to 2012-13 explains 95 percent of the variation in agri-GDP, with all variables statistically significant. The model forecasts agri-GDP growth for the agricultural year (July-June) 2013-14 of between 5.2% and 5.7%, about three times the previous year's growth. This growth is expected to come mainly from the oilseeds, pulses, cotton and coarse cereals belt of central and western India, which is less irrigated and hence more dependent on rain. Any damage to kharif crops from excess rainfall (with extended monsoons and cyclones) would very likely be offset by a bumper rabi harvest, given excellent soil moisture and ample surplus water in reservoirs.
has been evaluated on the basis of the lowest values of the mean absolute forecast error (MAE) and root mean square forecast error (RMSE), together with the coefficient of determination. The study reveals that the FTS model is an appropriate tool for predicting rainfall, since it outperforms ARIMA and Theil's regression. The SARIMA model yields more accurate results for short time series, whereas Holt-Winters' exponential smoothing is more appropriate for forecasting seasonal time series data, whatever their pattern and trend.
In their study, they compared the Mean Absolute Deviation (MAD) of the two models and concluded that the model with the smaller MAD, i.e., SARIMA(1,1,0)(0,1,0)12, is the better one. Kistner et al. (2018) found that temperature and precipitation fluctuations across the Midwest directly affect the quantity and quality of specialty crops (which are generally more sensitive to climatic stressors and require more comprehensive management than traditional row crops) and indirectly influence the timing of crucial farm operations. They observed that increasingly variable weather and climate conditions pose a serious threat to specialty crop production in the Midwest. Their results indicate that weather-induced losses vary between Midwestern states, with excessive moisture producing the highest total number of claims across all states, followed by freeze and drought events. Specialty crop growers are aware of the increased production risk under changing climate conditions and have identified the need for crop-specific weather, production and financial risk management tools and for increased crop insurance coverage.

Karuiru Elias
Rossetti (2019) discussed time series approaches to forecasting sales of console games in the Italian market. More specifically, he evaluated two univariate techniques, exponential smoothing and SARIMA, with the aim of exploiting these statistical methods to compare their results and choose the most accurate model through an ex-post evaluation. Using monthly time series data from November 2005 to September 2017, the most suitable model was identified by the smallest values of the accuracy measures (MAPE, RMSE) for the out-of-sample observations from October 2017 to September 2018. On these measures the preferred option for this series is SARIMA(2,1,0)(1,1,0)12, so the SARIMA model is the most reliable and relevant model for that analysis.
ISSN 1948-5433 2020

Data and Methodology
The study covers the period from 1992-93 to 2016-17. The all-India monthly, seasonal and annual rainfall series were constructed from the area-weighted rainfall of all 306 stations for 1989-2013 and from IMD sub-divisional rainfall for 2014-2016 (Kothawale and Rajeevan, 2017). Both linear and non-linear time series models are used to address the research objective.

Linear Time Series Models
Several forecasting methods are used in time series analysis. The most common are the moving average method, linear regression on time, non-seasonal and seasonal exponential smoothing, and the autoregressive integrated moving average (ARIMA) model.
The monthly rainfall data of India exhibit seasonality (as shown later). Given the nature of the data, this study concentrates on two linear models: simple seasonal exponential smoothing and the Seasonal Autoregressive Integrated Moving Average (SARIMA) model.

Simple Seasonal Exponential Smoothing
When a series shows no trend but has a seasonal component, simple seasonal exponential smoothing is a suitable forecasting method. It comprises a forecast equation and two smoothing equations, one for the level Lt and one for the seasonal component St, with smoothing parameters α and δ respectively.
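In the standard additive form without a trend term, which is assumed here to be the form used, the forecast equation and the two smoothing equations read as follows, with Ŷt(k) the k-step-ahead forecast made at time t:

```latex
\begin{aligned}
\hat{Y}_t(k) &= L_t + S_{t+k-s}, \qquad k = 1, \ldots, s,\\
L_t &= \alpha\,(Y_t - S_{t-s}) + (1-\alpha)\,L_{t-1},\\
S_t &= \delta\,(Y_t - L_t) + (1-\delta)\,S_{t-s},
\end{aligned}
```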
where s (s = 12 for monthly data) is the length of the seasonal cycle, with 0 ≤ α ≤ 1 and 0 ≤ δ ≤ 1.
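The level and seasonal updates described here can be sketched in Python. This is a minimal illustration under stated assumptions: the function name simple_seasonal_es is hypothetical, α and δ are supplied rather than estimated, and the level and seasonal indices are initialised from the first cycle. SPSS Expert Modeler estimates the parameters itself and may initialise differently.

```python
def simple_seasonal_es(y, s, alpha, delta, horizon):
    """Additive seasonal exponential smoothing without trend (illustrative sketch).

    Level:    L_t = alpha*(y_t - S_{t-s}) + (1 - alpha)*L_{t-1}
    Seasonal: S_t = delta*(y_t - L_t) + (1 - delta)*S_{t-s}
    Forecast: yhat_{t+k} = L_t + S_{t+k-s}
    Returns the one-step-ahead fitted values and `horizon` out-of-sample forecasts.
    """
    if len(y) < 2 * s:
        raise ValueError("need at least two full seasonal cycles")
    if not (0 <= alpha <= 1 and 0 <= delta <= 1):
        raise ValueError("alpha and delta must lie in [0, 1]")
    # Assumed initialisation: level = mean of the first cycle,
    # seasonal indices = deviations of the first cycle from that mean.
    level = sum(y[:s]) / s
    seasonals = [v - level for v in y[:s]]  # slot t % s holds S_{t-s} when time t arrives
    fitted = []
    for t in range(s, len(y)):
        i = t % s
        s_prev = seasonals[i]
        fitted.append(level + s_prev)  # one-step-ahead forecast of y_t
        new_level = alpha * (y[t] - s_prev) + (1 - alpha) * level
        seasonals[i] = delta * (y[t] - new_level) + (1 - delta) * s_prev
        level = new_level
    n = len(y)
    # The forecast for time n + k uses the latest seasonal index of the same position in the cycle.
    forecasts = [level + seasonals[(n + k) % s] for k in range(horizon)]
    return fitted, forecasts
```

For monthly rainfall one would call simple_seasonal_es(series, 12, alpha, delta, 12) to obtain a year of forecasts; a purely periodic series is reproduced exactly, because neither the level nor the seasonal indices change.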

Seasonal Autoregressive Integrated Moving Average (SARIMA)
ARMA is an extrapolative method (one that forecasts using only past data) and requires historical time series data on the underlying variable. The model may be expressed in specific and general forms as follows.
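A standard way of writing the ARMA(p, q) model for a zero-mean series, first term by term and then compactly with the backshift operator B, is given below. The minus-sign convention for the moving average terms follows Box and Jenkins and is assumed to match the rest of this section; here φp(B) = 1 - φ1B - ... - φpB^p and θq(B) = 1 - θ1B - ... - θqB^q.

```latex
\begin{aligned}
Y_t &= \phi_1 Y_{t-1} + \cdots + \phi_p Y_{t-p} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \cdots - \theta_q \varepsilon_{t-q},\\
\phi_p(B)\,Y_t &= \theta_q(B)\,\varepsilon_t .
\end{aligned}
```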
Box and Jenkins argue that a non-stationary series can be transformed into a stationary, or nearly stationary, series by differencing it an appropriate number of times (d). Fitting an ARMA model to the differenced series gives the ARIMA model.
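Differencing d times applies the operator (1 - B)^d, and seasonal differencing applies (1 - B^s)^D. A minimal Python sketch of both, using a hypothetical helper named difference:

```python
def difference(y, lag=1, times=1):
    """Apply (1 - B**lag)**times to a sequence.

    lag=1 gives ordinary differencing (d = times); lag=12 gives seasonal
    differencing of monthly data (D = times). Each pass drops `lag` values.
    """
    out = list(y)
    for _ in range(times):
        if len(out) <= lag:
            raise ValueError("series too short for this difference")
        out = [out[t] - out[t - lag] for t in range(lag, len(out))]
    return out
```

For example, difference(series, lag=12) removes a stable monthly pattern, which is the seasonal difference D = 1 used later for the rainfall series.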
The mathematical form of the ARIMA(p, d, q) model is given in Equation (4):
$$\phi_p(B)\,(1-B)^d\,Y_t = \theta_q(B)\,\varepsilon_t \qquad (4)$$
If a time series is seasonal with period s, Box and Jenkins (1976) proposed the multiplicative model in Equation (5):
$$\phi_p(B)\,\Phi_P(B^s)\,(1-B)^d\,(1-B^s)^D\,Y_t = \theta_q(B)\,\Theta_Q(B^s)\,\varepsilon_t \qquad (5)$$
with the lag polynomials
$$\phi_p(B) = 1 - \sum_{i=1}^{p} \phi_i B^i,\quad \Phi_P(B^s) = 1 - \sum_{i=1}^{P} \Phi_i B^{is},\quad \theta_q(B) = 1 - \sum_{j=1}^{q} \theta_j B^j,\quad \Theta_Q(B^s) = 1 - \sum_{j=1}^{Q} \Theta_j B^{js}$$
where B is the backshift operator ($BY_t = Y_{t-1}$, $B^2 Y_t = Y_{t-2}$, $B^s Y_t = Y_{t-s}$ and so on), s is the seasonal lag and ε is a sequence of independent normal error variables with mean 0 and variance σ². φ and Φ are the non-seasonal and seasonal autoregressive parameters respectively, and θ and Θ are the non-seasonal and seasonal moving average parameters respectively. p and q are the orders of the non-seasonal autoregressive and moving average polynomials, whereas P and Q are the orders of the seasonal autoregressive and moving average polynomials. d and D denote the numbers of non-seasonal and seasonal differences respectively. In its general form, the seasonal ARIMA model is written SARIMA(p, d, q)(P, D, Q)s.
Both models (SARIMA and simple seasonal exponential smoothing) are based on past values of the series and error terms; one is described as parametric and the other as non-parametric. Researchers disagree over which is more appropriate. Some studies consider simple seasonal exponential smoothing more useful than SARIMA because of the way it assigns weights to past observations, while others regard SARIMA as the better alternative. In either case, the better model is the one that describes the time series best without over-fitting. Both, however, capture only linear structure in the data.
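Because Equation (5) is multiplicative, the non-seasonal and seasonal operators multiply out into cross terms at lags such as s + 1. The sketch below, with hypothetical helper names, expands lag polynomials stored as coefficient lists (index k holds the coefficient of B^k); the parameter values in the usage note are purely illustrative.

```python
def poly_mul(a, b):
    """Multiply two lag polynomials stored as coefficient lists (index k <-> B**k)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def ar_operator(coefs, step=1):
    """Coefficients of 1 - c_1 B**step - c_2 B**(2*step) - ... (an AR-type operator)."""
    out = [0.0] * (len(coefs) * step + 1)
    out[0] = 1.0
    for k, c in enumerate(coefs, start=1):
        out[k * step] = -c
    return out
```

With φ1 = 0.5 and Φ1 = 0.3, poly_mul(ar_operator([0.5]), ar_operator([0.3], step=12)) gives 1 - 0.5B - 0.3B^12 + 0.15B^13, and ar_operator([1.0], step=12) is the seasonal difference operator (1 - B^12).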

Non-linear Time Series Model
Time series often contain nonlinear components, in which case linear models are not adequate for modeling and forecasting. Non-linear models have been used successfully to overcome this difficulty. Once the linearity restriction on the model form is relaxed, the number of possible nonlinear structures (ARCH, GARCH, EGARCH, TAR, NAR, NMA models, etc.) that can be used to describe and forecast a time series is enormous. A good nonlinear model should be "general enough to capture some of the nonlinear phenomena in the data" (De Gooijer and Kumar, 1992). The artificial neural network (ANN) model is one such model, able to approximate various nonlinearities in the data. Because it does not require the linear or nonlinear structure of a series to be specified in advance, the ANN model is particularly effective when that structure is uncertain.

Artificial Neural Networks (ANNs)
Artificial neural networks are a class of flexible nonlinear models that can discover patterns adaptively from data. It has been shown theoretically that, given an appropriate number of nonlinear processing units, neural networks can learn from experience and estimate any complex functional relationship with high accuracy (Zhang and Qi, 2005). The ANNs most widely used in forecasting problems are multi-layer perceptrons (MLPs) with a single hidden layer feed-forward network (Zhang et al., 1998). Such a model is characterized by a network of three layers, viz. input, hidden and output layers, connected by acyclic links; there may be more than one hidden layer. The nodes in the various layers are also known as processing elements. The three-layer feed-forward architecture of ANN models can be depicted diagrammatically.
The relationship between the output yt and the inputs (yt-1, yt-2, ..., yt-p) has the following mathematical representation:
$$y_t = \alpha_0 + \sum_{j=1}^{q} \alpha_j \, g\!\left(\beta_{0j} + \sum_{i=1}^{p} \beta_{ij}\, y_{t-i}\right) + \varepsilon_t \qquad (6)$$
Here yt-i (i = 1, 2, ..., p) are the p inputs and yt is the output. αj (j = 1, 2, ..., q) and βij (i = 1, 2, ..., p; j = 1, 2, ..., q) are the connection weights, εt is the random shock, and α0 and β0j are the bias terms. The integers p and q are the numbers of input and hidden nodes respectively. The logistic sigmoid function is commonly used as the hidden-layer transfer function (nonlinear activation function), that is,
$$g(x) = \frac{1}{1 + e^{-x}} \qquad (7)$$
Hence, the ANN model of (6) in fact performs a nonlinear functional mapping from the past observations (yt-1, yt-2, ..., yt-p) to the future value yt, i.e.,
$$y_t = f(y_{t-1}, y_{t-2}, \ldots, y_{t-p}, w) + \varepsilon_t \qquad (8)$$
where w is a vector of all parameters and f is a function determined by the network structure and connection weights. Thus, the neural network is equivalent to a nonlinear autoregressive model. Note that expression (6) implies one output node in the output layer, which is typically used for one-step-ahead forecasting.
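Expression (6) can be evaluated directly. The following sketch (function and argument names are hypothetical) computes the network output for one input vector, leaving out the random shock εt; it illustrates the formula and is not the implementation in the software used later.

```python
import math


def logistic(x):
    """Logistic sigmoid g(x) = 1 / (1 + exp(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def ann_output(lags, alpha0, alpha, beta0, beta):
    """alpha0 + sum_j alpha_j * g(beta0_j + sum_i beta_ij * y_{t-i}), i.e. (6) without the shock.

    lags  : [y_{t-1}, ..., y_{t-p}]
    alpha : q hidden-to-output weights; alpha0: output bias
    beta0 : q hidden-node biases; beta: p rows of q input-to-hidden weights (beta[i][j])
    """
    p, q = len(lags), len(alpha)
    if len(beta) != p or len(beta0) != q or any(len(row) != q for row in beta):
        raise ValueError("weight shapes do not match p inputs and q hidden nodes")
    out = alpha0
    for j in range(q):
        net = beta0[j] + sum(beta[i][j] * lags[i] for i in range(p))
        out += alpha[j] * logistic(net)
    return out
```

A 36-12-1 network, as selected later, corresponds to lags of length 36 and alpha, beta0 of length 12.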

Research in Applied Economics
The choice of q is data-dependent, and there is no systematic rule for deciding it. Besides choosing an appropriate number of hidden nodes, another important task in ANN modeling of a time series is selecting the number of lagged observations p (the dimension of the input vector). This is perhaps the most important parameter to be estimated in an ANN model, as it plays a major role in determining the nonlinear autocorrelation structure of the time series. There is also no standard rule to guide the selection of p, so experiments are often conducted to select appropriate values of both p and q. Once a network structure (p, q) is specified, the network is ready for training, a process of parameter estimation in which the parameters are chosen to minimize an overall accuracy criterion such as the mean squared error.
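Choosing p fixes how the series is turned into training patterns. A minimal sketch (the helper make_lagged_pairs is hypothetical) that builds the input vectors [yt-1, ..., yt-p] and the targets yt:

```python
def make_lagged_pairs(y, p):
    """Build (input, target) pairs: input [y_{t-1}, ..., y_{t-p}], target y_t."""
    if not 1 <= p < len(y):
        raise ValueError("p must satisfy 1 <= p < len(y)")
    inputs, targets = [], []
    for t in range(p, len(y)):
        inputs.append([y[t - i] for i in range(1, p + 1)])  # most recent lag first
        targets.append(y[t])
    return inputs, targets
```

In this construction, a series of 300 monthly observations with p = 36 yields 300 - 36 = 264 input/target pairs, since the first 36 months only serve as inputs.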
In other words, an ANN is determined by two essential elements: its architecture and its learning algorithm. The architecture is fixed by deciding the number of layers and the number of neurons (nodes) in each layer (Aladag et al., 2009), and there is no general rule for determining the best architecture (Zurada, 1992). The links that connect the neurons of one layer to those of another are called weights, and their values are determined and updated by a learning algorithm. Nonlinear least-squares procedures are used to estimate the connection weights. Back-propagation is one of the most widely used learning algorithms; it updates the weights based on the difference between the output of the ANN and the desired (actual) value.
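Back-propagation with a momentum term can be sketched for the single-hidden-layer network of expression (6), minimising mean squared error by full-batch gradient descent. This is an assumed, simplified scheme (one learning rate for all layers, random initial weights); the function name train_ann is hypothetical and the code is not the algorithm of any particular software package.

```python
import math
import random


def logistic(x):
    """Overflow-safe logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def train_ann(inputs, targets, q, lr=0.005, momentum=0.05, epochs=200, seed=0):
    """Back-propagation with momentum for a p-q-1 network (logistic hidden, linear output).

    Full-batch gradient descent on the mean squared error. Returns the weights
    (alpha0, alpha, beta0, beta) and the training MSE recorded before each update.
    """
    rng = random.Random(seed)
    p, n = len(inputs[0]), len(inputs)
    beta = [[rng.uniform(-0.5, 0.5) for _ in range(q)] for _ in range(p)]
    beta0 = [rng.uniform(-0.5, 0.5) for _ in range(q)]
    alpha = [rng.uniform(-0.5, 0.5) for _ in range(q)]
    alpha0 = 0.0
    # Momentum "velocity" terms, one per weight.
    v_beta = [[0.0] * q for _ in range(p)]
    v_beta0, v_alpha, v_alpha0 = [0.0] * q, [0.0] * q, 0.0
    history = []
    for _ in range(epochs):
        g_beta = [[0.0] * q for _ in range(p)]
        g_beta0, g_alpha, g_alpha0 = [0.0] * q, [0.0] * q, 0.0
        sse = 0.0
        for x, y in zip(inputs, targets):
            # Forward pass, as in (6).
            h = [logistic(beta0[j] + sum(beta[i][j] * x[i] for i in range(p))) for j in range(q)]
            err = alpha0 + sum(alpha[j] * h[j] for j in range(q)) - y
            sse += err * err
            # Backward pass: accumulate err * d(output)/d(weight) for every weight.
            g_alpha0 += err
            for j in range(q):
                g_alpha[j] += err * h[j]
                delta_j = err * alpha[j] * h[j] * (1.0 - h[j])
                g_beta0[j] += delta_j
                for i in range(p):
                    g_beta[i][j] += delta_j * x[i]
        history.append(sse / n)
        step = lr * 2.0 / n  # the MSE gradient is (2/n) times the accumulated sums
        v_alpha0 = momentum * v_alpha0 - step * g_alpha0
        alpha0 += v_alpha0
        for j in range(q):
            v_alpha[j] = momentum * v_alpha[j] - step * g_alpha[j]
            alpha[j] += v_alpha[j]
            v_beta0[j] = momentum * v_beta0[j] - step * g_beta0[j]
            beta0[j] += v_beta0[j]
            for i in range(p):
                v_beta[i][j] = momentum * v_beta[i][j] - step * g_beta[i][j]
                beta[i][j] += v_beta[i][j]
    return (alpha0, alpha, beta0, beta), history
```

Each weight moves by its velocity, which is a fraction `momentum` of the previous move minus `lr` times the current gradient; a small momentum such as 0.05 only slightly smooths plain gradient descent.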

Descriptive Statistics
For the time series analysis, the study uses monthly rainfall statistics for India from January 1992 to December 2016, a total of 300 observations. Descriptive statistics of monthly, seasonal and yearly rainfall in India are also presented (Source: Authors' calculation; mm: millimeters).
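Descriptive statistics of this kind can be computed per calendar month from the flat monthly series. A sketch using Python's statistics module; the helper name monthly_profile is hypothetical and the coefficient of variation (CV) is reported in percent.

```python
import statistics


def monthly_profile(series, start_month=1):
    """Mean, standard deviation and CV (%) of rainfall for each calendar month.

    `series` is a flat list of consecutive monthly values starting in `start_month`.
    """
    by_month = {m: [] for m in range(1, 13)}
    for k, value in enumerate(series):
        by_month[(start_month - 1 + k) % 12 + 1].append(value)
    profile = {}
    for month, values in by_month.items():
        if len(values) < 2:
            continue  # a standard deviation needs at least two years
        mean = statistics.mean(values)
        sd = statistics.stdev(values)
        cv = 100.0 * sd / mean if mean else float("nan")
        profile[month] = (mean, sd, cv)
    return profile
```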

Time Series Models
When analyzing time series data for forecasting, it is very important to know the pattern of the data. A series may show trend, seasonality, cyclical and random variability. A trend can be identified and confirmed with statistical techniques such as fitting a trend line or examining the autocorrelation function, and it may be positive or negative. Seasonality, the main concern here, is a structured pattern of change within a year, i.e., regular wave-like fluctuations of constant length that repeat themselves. Cyclicality belongs to the realm of economics and is not dealt with here (Davey et al., 1993). The graph of the data in figure 2 shows no trend, but seasonality is evident. The values of the autocorrelation function calculated from our observations (Annexure Table 1) also clearly indicate that the series is seasonal. The study therefore needs time series models capable of handling the seasonal factor.
After analyzing the nature of the data, the study chooses an appropriate linear model to explain and forecast monthly rainfall in India. Using the SPSS Expert Modeler, it selects the simple seasonal model as the best among the candidates. All parameters and the model summary of the simple seasonal model are given in Table 2.
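The sample autocorrelation function used to detect seasonality can be sketched as follows. The ±1.96/√n band is the usual large-sample approximation under a white-noise assumption, and the helper names are hypothetical.

```python
import math


def acf(y, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag of a series."""
    n = len(y)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must be smaller than the series length")
    mean = sum(y) / n
    dev = [v - mean for v in y]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("series is constant")
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom for k in range(max_lag + 1)]


def significant_lags(y, max_lag):
    """Lags 1..max_lag with |r_k| above the approximate 95% white-noise band 1.96/sqrt(n)."""
    band = 1.96 / math.sqrt(len(y))
    r = acf(y, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(r[k]) > band]
```

For a monthly series with an annual cycle, r at lag 12 is large and positive while r at lag 6 is strongly negative; the same band is what residual ACF plots compare against.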

Source: Authors' Calculation
The results of the selected model indicate that the coefficient of the level factor (α) is statistically significant, but that of the seasonal factor (δ) is insignificant. In addition, the residual ACF and PACF values do not lie within the prescribed range. The study therefore rejects the simple seasonal exponential smoothing model for the analysis and forecasting of rainfall.
The study then looks for an appropriate alternative model that can explain data exhibiting seasonality; the Seasonal Autoregressive Integrated Moving Average (SARIMA) model is one such alternative.

Seasonal Autoregressive Integrated Moving Average
The SARIMA model requires a stationary series. Figure 2 clearly suggests that the series is stationary: it shows no upward or downward trend, although clear seasonal fluctuation is present.
The Augmented Dickey-Fuller (ADF) test has been used to test for stationarity. The test statistic is -7.60, with a p-value of approximately 0.000. Since the test statistic is below the 5 percent critical value, the null hypothesis H0 (the series is non-stationary) is rejected in favour of the alternative H1 (the series is stationary). The series can therefore be treated as stationary, and no non-seasonal differencing is needed. Table 1 in the Appendix shows that the autocorrelation coefficients at lags up to 12 (between t and t-12) are significant, so the series has seasonal fluctuation. To remove this fluctuation, seasonal differences have to be taken. The seasonally differenced series is presented in figure 3.
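For illustration, the core of the test is a regression of the first difference on the lagged level. The sketch below computes the plain Dickey-Fuller statistic with a constant and no augmentation lags, a simplification of the ADF test used here (the ADF adds lagged differences to absorb serial correlation). The helper name is hypothetical, and the statistic is compared with Dickey-Fuller critical values (about -2.86 at 5 percent for large samples with a constant), not with the normal distribution.

```python
import math


def dickey_fuller_tau(y):
    """Dickey-Fuller t-statistic for gamma in  dy_t = a + gamma * y_{t-1} + e_t  (OLS).

    Plain DF test with a constant and no lagged differences (a simplification of ADF).
    Compare with DF critical values, about -2.86 at 5% for large samples, not N(0, 1).
    """
    x = list(y[:-1])                                  # lagged level y_{t-1}
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]  # first difference
    n = len(dy)
    if n < 3:
        raise ValueError("series too short")
    mx, md = sum(x) / n, sum(dy) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    gamma = sum((xi - mx) * (di - md) for xi, di in zip(x, dy)) / sxx
    a = md - gamma * mx
    resid = [di - a - gamma * xi for xi, di in zip(x, dy)]
    s2 = sum(e * e for e in resid) / (n - 2)  # residual variance, 2 fitted coefficients
    return gamma / math.sqrt(s2 / sxx)        # gamma divided by its standard error
```

A strongly mean-reverting series gives a large negative statistic, which leads to rejecting the unit-root null hypothesis, as reported above for the rainfall series.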

In the alternative model, by contrast, all the coefficients are statistically significant. In addition, the residual ACF and PACF of both models fall inside the 95 percent confidence limits. The appropriate model is selected from the two alternative SARIMA models, (0,0,0)(4,1,0)12 and (0,0,0)(5,1,0)12, on the basis of diagnostic checking.

Diagnostic Checking
After estimating the parameters, the study tests the adequacy of the models using Ljung-Box (Q) statistics. In both models, the Q statistics are statistically significant. Comparing the two, the SARIMA(0,0,0)(5,1,0)12 model is found to be best suited for forecasting Indian rainfall, since its RMSE, MAE and BIC values are lower than those of SARIMA(0,0,0)(4,1,0)12. A summary of the results is given in Table 4.
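The Ljung-Box statistic is Q = n(n + 2) Σ r_k²/(n - k), summed over lags k = 1, ..., h of the residual autocorrelations r_k. A minimal sketch with a hypothetical helper name:

```python
def ljung_box_q(residuals, h):
    """Ljung-Box Q = n(n+2) * sum_{k=1..h} r_k**2 / (n-k) for model residuals."""
    n = len(residuals)
    if not 1 <= h < n:
        raise ValueError("need 1 <= h < len(residuals)")
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("residuals are constant")
    total = 0.0
    for k in range(1, h + 1):
        r_k = sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        total += r_k * r_k / (n - k)
    return n * (n + 2) * total
```

Q is conventionally compared with a chi-square distribution whose degrees of freedom are h minus the number of estimated ARMA parameters (five seasonal AR coefficients for SARIMA(0,0,0)(5,1,0)12); a small Q (large p-value) is consistent with white-noise residuals.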

Source: Authors' Calculation
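On the basis of the above statistics, SARIMA(0,0,0)(5,1,0)12 is the best model. It takes the form
$$\left(1 - \Phi_1 B^{12} - \Phi_2 B^{24} - \Phi_3 B^{36} - \Phi_4 B^{48} - \Phi_5 B^{60}\right)(1 - B^{12})\,Y_t = \varepsilon_t$$
where Φ1, ..., Φ5 are the estimated seasonal autoregressive coefficients. This model is a special case of the SARIMA model called a seasonal integrated autoregressive model.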
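Forecasts from a model of this form are produced recursively: the seasonally differenced value wt = Yt - Yt-12 is predicted from its own values 12, 24, ..., 60 months earlier, future shocks are set to zero, and the seasonal difference is then undone. A sketch in which the Φ coefficients are supplied by the user (the helper name is hypothetical, and no constant term is assumed):

```python
def forecast_seasonal_ar_diff(y, phis, s, horizon):
    """Recursive forecasts from (1 - Phi_1 B^s - ... - Phi_P B^{Ps})(1 - B^s) Y_t = e_t.

    w_t = Y_t - Y_{t-s} is predicted as sum_k Phi_k * w_{t-ks}; future shocks are set to
    zero, and each forecast is appended so later steps can use it.
    """
    y = list(y)
    P = len(phis)
    if len(y) < s * (P + 1):
        raise ValueError("need at least s*(P+1) observations")
    n_obs = len(y)
    for _ in range(horizon):
        t = len(y)
        w_hat = sum(phi * (y[t - k * s] - y[t - (k + 1) * s])
                    for k, phi in enumerate(phis, start=1))
        y.append(y[t - s] + w_hat)  # undo the seasonal difference
    return y[n_obs:]
```

For the selected model one would pass s = 12 and a list of the five estimated Φ values; with all Φ equal to zero the forecasts reduce to repeating the last observed year.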

Forecasting
Using the identified SARIMA(0,0,0)(5,1,0)12 model, the study forecasts monthly rainfall in India up to December 2021. The forecast monthly rainfall is presented in Table 5.
Source: Authors' Calculation

The forecasts show maximum rainfall in July-August and minimum rainfall in December-January, which coincides with the actual series. The forecast annual rainfall is 1034.318 mm, 993.715 mm and 1004.34 mm in 2019, 2020 and 2021 respectively.

Artificial Neural Network
To find the optimal numbers of input nodes (time lags) and hidden nodes, the study trains the network with different learning rates and momentum rates until the mean squared error and mean absolute error on the training set are minimized, and then checks performance on the testing set. The parameter values selected are those for which the mean squared errors of both the training and testing sets are lowest.
Various learning rates and momentum rates have been considered. It was observed that different learning rates in the hidden and output layers, with a constant momentum rate, lead to fast training. The study therefore fixed the momentum rate at 0.05 and varied the learning rate; a learning rate of 0.005 gives better results than one of 0.009. The Zaitun software has been used to train and test the network.
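The tuning described here, holding the momentum fixed while trying different learning rates together with different numbers of lags and hidden nodes, is a grid search. A sketch in which fit_and_score is a hypothetical user-supplied function that trains a network and returns its test-set MSE:

```python
import itertools


def grid_search(fit_and_score, lag_options, hidden_options, learning_rates, momentum=0.05):
    """Try every (p, q, learning rate) with a fixed momentum; keep the lowest test MSE.

    fit_and_score(p, q, lr, momentum) is a hypothetical user-supplied function that
    trains a p-q-1 network and returns its test-set mean squared error.
    Returns (best_mse, p, q, lr).
    """
    best = None
    for p, q, lr in itertools.product(lag_options, hidden_options, learning_rates):
        mse = fit_and_score(p, q, lr, momentum)
        if best is None or mse < best[0]:
            best = (mse, p, q, lr)
    return best
```

Because each call retrains a network, the grid size multiplies training time; selecting on test-set error, as described above, also means the test set is no longer a fully independent check.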
The study developed and trained several time-lagged feed-forward neural network (TLFN) models for forecasting monthly rainfall, and 36-12-1 is found to be the best of them: the input layer has 36 nodes (time lags), the hidden layer 12 nodes and the output layer one node. In the 36-12-1 model, 74.36 percent of the observations are used as training data and 25.64 percent as testing data. All the results are given in Table 6. Comparing the diagnostic checking parameters of the 36-12-1 model at different learning rates, 0.005 gives better results than 0.009.
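Because the data form a time series, the training/testing division is chronological rather than random. A minimal sketch (the helper name is hypothetical); passing train_fraction=0.7436 would mirror the share reported above:

```python
def chronological_split(inputs, targets, train_fraction):
    """Keep time order: the earliest patterns train the network, the latest test it."""
    if not 0 < train_fraction < 1:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    cut = round(len(inputs) * train_fraction)
    return (inputs[:cut], targets[:cut]), (inputs[cut:], targets[cut:])
```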

Hence, 36-12-1, with a learning rate of 0.005 and a momentum of 0.05, is the best ANN model for Indian monthly rainfall. Table 7 shows the forecast monthly rainfall in India for 2019, 2020 and 2021 from the ANN (36-12-1) model.
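A network with a single output node forecasts one step ahead, so multi-year forecasts are commonly obtained by feeding each forecast back in as the newest lag. The sketch below assumes this recursive scheme; predict_one is a hypothetical stand-in for a trained 36-12-1 network.

```python
def recursive_forecast(history, p, predict_one, horizon):
    """Multi-step forecasts from a one-step model by feeding forecasts back as inputs.

    predict_one(lags) receives [y_{t-1}, ..., y_{t-p}] (most recent first) and returns
    a forecast of y_t; it stands in for a trained p-q-1 network (hypothetical callable).
    """
    if len(history) < p:
        raise ValueError("need at least p past observations")
    window = list(history[-p:])  # oldest ... newest
    forecasts = []
    for _ in range(horizon):
        y_hat = predict_one(window[::-1])
        forecasts.append(y_hat)
        window = window[1:] + [y_hat]  # slide the window forward by one month
    return forecasts
```

One consequence of this design is that errors can compound: later forecasts depend increasingly on earlier forecasts rather than on observed rainfall.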

Source: Authors' Calculation
This rainfall is represented graphically in fig. 4, where the Y-axis shows the volume of rainfall in millimeters and the X-axis shows time in years. The black-and-white part of the figure indicates actual rainfall and the blue (highlighted) portion represents predicted rainfall.

Linear Model (3.1) Vs. Non Linear Model (3.2)
The question now is which model performs better in analyzing time series data on rainfall and forecasting future rainfall. To decide this, the study considers the model selection parameters MAE, MSE and RMSE, presented in Table 8. All of them are lower for the non-linear (ANN) model than for the linear (SARIMA) model. Hence, it can be concluded that the non-linear artificial neural network model is the better one for explaining India's monthly rainfall and forecasting it accurately.
Source: Authors' Calculation
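The three comparison measures can be computed from actual and forecast values as follows (a minimal sketch with a hypothetical helper name):

```python
import math


def accuracy_measures(actual, predicted):
    """MAE, MSE and RMSE of forecasts against actual values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [a - f for a, f in zip(actual, predicted)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse)}
```

The comparison is only fair when both models are scored on the same observations; MSE and RMSE penalise large monsoon-month misses more heavily than MAE does.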

Discussion
The study has estimated the year-to-year fluctuations in rainfall and agricultural production.
The results are presented in Table 1 in the Appendix. The table shows that the years 1995, 2002, 2004, 2009, 2012 and 2014 received less rainfall than their respective previous years (a deviation of more than 10 percent), while the years 1994, 2003, 2010, 2013 and 2016 received more rainfall than the preceding years (a deviation of more than 10 percent). Agricultural production changed with the change in rainfall: in most cases, fluctuation in rainfall influenced the level of output, with production increasing as rainfall increased and vice versa.
Table 1 in the Appendix also shows that good rainfall positively influences kharif production and at the same time provides water for irrigation during the rabi season. Hence, good and timely rainfall points to better production and stable prices. Keeping in mind this impact of rainfall on agricultural production, the study has attempted to forecast monthly rainfall for India through time series analysis of monthly rainfall data.
The diagnostic checking parameters of the linear (SARIMA) and non-linear (ANN) time series models are presented in Table 8. They show clearly that the non-linear model performs better in explaining and forecasting monthly rainfall. The results match those obtained by Karuiru et al. (2016).

Conclusion
After 72 years of independence, Indian agriculture still depends on rainfall. The volume of rainfall and its distribution over time and across regions are uneven, and such vagaries of the monsoon affect not only agricultural production but also the prices of agricultural commodities. The study has observed that fluctuations in rainfall and in output are highly correlated: variation in rainfall leads to output fluctuations and thereby affects prices. The study has attempted to forecast monthly rainfall for the country as a whole through time series analysis of monthly rainfall data, using both linear and non-linear models. The diagnostic checking parameters MAE, MSE and RMSE are lower for the non-linear model than for the linear ones, and the non-linear Artificial Neural Network (ANN) model has given better results in explaining rainfall data than the linear models, namely simple seasonal exponential smoothing and Seasonal Auto-Regressive Integrated Moving Average. It can therefore be concluded that forecasting rainfall with the ANN model is more accurate than with the SARIMA model.
a. The underlying process assumed is independence (white noise). b. Based on the asymptotic chi-square approximation.