Robust Regression in the Presence of Leverage: An Application to the Baseball Data Author: Amit Saha, K.N. Singh, Bishal Gurung, Santosha Rathod and Md. Yeasin Pages: 181-186
It is well known that linear regression analysis is one of the most commonly used statistical tools in various fields. The ordinary least squares (OLS) method is generally adopted to estimate the model parameters, provided all the necessary assumptions are satisfied. OLS is widely used because of desirable properties such as unbiasedness, minimum variance, consistency and asymptotic unbiasedness. However, OLS results may be affected if some of the assumptions do not hold. The presence of outliers is one of the main reasons for poor OLS results, so it is important to use a robust parameter estimation method that is not much affected by outliers. Robust regression is a statistical technique that improves on least squares estimation by coping with, or detecting, outliers; in other words, it performs well when the data do not satisfy the assumptions. Variables can be transformed when some of the assumptions are substantially violated, but transformation often fails to attenuate the influence of outliers. It is therefore better to use robust regression, which is resistant to the influence of outliers. In this paper, OLS and two robust regression methods (M- and S-estimation) are discussed and applied to fit a regression model to baseball data. The S-estimation method was found to outperform both OLS and M-estimation. Keywords: M-estimation, OLS estimation, Robust regression and S-estimation.
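The M-estimation idea above can be illustrated with a minimal sketch: a Huber M-estimator for a straight-line model, fitted by iteratively reweighted least squares (IRLS) in pure Python. The tuning constant c = 1.345 and the scale estimate (median absolute residual divided by 0.6745) are common textbook choices assumed here; they are not necessarily the settings used in the paper, and S-estimation is not shown.

```python
import statistics


def wls_line(x, y, w):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return ybar - b * xbar, b


def ols_line(x, y):
    """Ordinary least squares: every observation gets weight 1."""
    return wls_line(x, y, [1.0] * len(x))


def huber_m_line(x, y, c=1.345, max_iter=100, tol=1e-10):
    """Huber M-estimate of (a, b) by iteratively reweighted least squares (assumed settings)."""
    a, b = ols_line(x, y)  # start from the OLS fit
    for _ in range(max_iter):
        r = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Robust residual scale: median absolute residual / 0.6745 (MAD-type estimate)
        s = statistics.median([abs(ri) for ri in r]) / 0.6745
        if s == 0:  # at least half the residuals are exactly zero; stop reweighting
            break
        # Huber weights: 1 when |r/s| <= c, otherwise c / |r/s|
        w = [1.0 if abs(ri / s) <= c else c / abs(ri / s) for ri in r]
        a_new, b_new = wls_line(x, y, w)
        converged = abs(a_new - a) + abs(b_new - b) < tol
        a, b = a_new, b_new
        if converged:
            break
    return a, b


# Hypothetical toy data: exactly y = 2 + 3x except one gross outlier at x = 9
x = list(range(10))
y = [2.0 + 3.0 * xi for xi in x]
y[9] = 100.0
print("OLS:  ", ols_line(x, y))
print("Huber:", huber_m_line(x, y))
```

On this toy data the OLS slope is pulled well away from 3, while the Huber fit progressively downweights the outlier and returns a slope close to 3.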
2
Prediction of Finite Population Total for Geo-referenced Data Author: Samir Barman, Pradip Basak and Hukum Chandra Pages: 195-200
In many surveys (for example, agriculture, forestry, environmental and ecological surveys), data are spatially correlated and the independence assumption is questionable. As a result, existing estimators of the population total (or mean) based on standard survey estimation methods can be biased and less efficient. Using spatial information in sample surveys is expected to provide better estimates of population parameters. This paper develops estimators of the finite population total that incorporate spatial information. The proposed estimators are evaluated through simulation studies. The empirical results show that the developed estimators have smaller bias and higher efficiency than the existing estimators. Keywords: Geo-referenced data, Population total, Spatial information.
3
Fixing the Sample-Size in Direct and Randomized Response Surveys Author: Arijit Chaudhuri and Aritra Sen Pages: 201-208
To estimate a population total or mean by an unbiased estimator from a Simple Random Sampling With Replacement or Simple Random Sampling Without Replacement sample, with an estimation error not exceeding a pre-assigned fraction of the estimand parameter with a high probability, we apply Chebyshev's inequality to fix the sample size. This gives a reasonable solution for a Direct Response (DR) survey, but extending it to a Randomized Response (RR) survey covering a sensitive feature yields an absurd solution. The same exercise is extended, again applying Chebyshev's inequality, to unequal probability sampling with suitably different estimators. Keywords: Chebyshev's inequality; Finite population; Randomized response surveys; Sample-size; Sensitive issues.
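For the direct-response case, the Chebyshev argument can be sketched as follows. This is the standard textbook derivation for the sample mean, assuming the population coefficient of variation (cv) is known in advance; it is not the authors' randomized-response or unequal-probability extension. With error fraction f and failure probability alpha, Chebyshev gives P(|ybar - Ybar| >= f * Ybar) <= Var(ybar) / (f * Ybar)^2, which is then forced to be at most alpha.

```python
import math


def n_srswr(cv, f, alpha):
    """Smallest n for an SRSWR sample mean such that Chebyshev guarantees
    P(|ybar - Ybar| >= f * Ybar) <= alpha.

    Var(ybar) = sigma^2 / n, so the bound is cv^2 / (n * f^2) <= alpha,
    with cv = sigma / Ybar (assumed known).
    """
    return math.ceil(cv ** 2 / (alpha * f ** 2))


def n_srswor(cv, f, alpha, N):
    """Same criterion under SRSWOR, where Var(ybar) = (1/n - 1/N) * S^2 and cv = S / Ybar.

    Solving (1/n - 1/N) * cv^2 / f^2 <= alpha for n gives n >= 1 / (alpha * f^2 / cv^2 + 1/N).
    """
    return math.ceil(1.0 / (alpha * f ** 2 / cv ** 2 + 1.0 / N))


# Hypothetical inputs: cv = 0.5, error at most 10% of the mean with probability >= 0.95
print(n_srswr(0.5, 0.1, 0.05))         # 500
print(n_srswor(0.5, 0.1, 0.05, 1000))  # 334: the finite population correction lowers n
```

Because Chebyshev's inequality uses only the variance, these sample sizes are conservative compared with a normal-approximation rule.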
4
Application of Machine Learning Techniques with GARCH Model for Forecasting Volatility in Agricultural Commodity Prices Author: Tanima Das, Ranjit Kumar Paul, L.M. Bhar and A.K. Paul Pages: 187-194
Food price forecasting is useful to farmers, consumers, policy makers and industrialists alike. In recent years, the management of food security in agriculture has become critically important in India. Volatility forecasting is an integral part of commodity trading and price analysis. Many studies have shown that a single parametric model is inefficient at capturing the volatility in a series. In this context, non-parametric nonlinear models such as Support Vector Regression (SVR) and Neural Networks (NN) may be used to improve forecasting performance. Hence, in search of improved alternatives to classical econometric methods, the machine learning techniques SVR and NN are applied, both alone and in combination with the GARCH model. The superior performance of this new approach is established by means of the Root Mean Square Error (RMSE), Mean Absolute Error (MAE) and R². Keywords: GARCH, Hybrid model, Neural network, Onion price, SVR.
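The three accuracy measures named above can be computed with a short sketch. R² is taken here as 1 - SS_res / SS_tot, which is an assumption: some forecasting studies instead report the squared correlation between actual and predicted values.

```python
import math


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical hold-out prices and one model's forecasts
actual = [1.0, 2.0, 3.0]
forecast = [1.0, 2.0, 4.0]
print(rmse(actual, forecast), mae(actual, forecast), r_squared(actual, forecast))
```

Lower RMSE and MAE and higher R² indicate a better model; comparing a plain GARCH forecast with a hybrid forecast on the same hold-out period would use these functions unchanged.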
5
Modelling Volatility Influenced by Exogenous Factors using an Improved GARCH-X Model Author: Md Yeasin, K.N. Singh, Achal Lama and Ranjit Kumar Paul Pages: 209-216
The Generalized Autoregressive Conditional Heteroscedastic (GARCH) model has gained popularity since its inception because of its ability to forecast volatility. The GARCH model captures volatility from past conditional variances and past squared residuals, but owing to its univariate nature it does not consider the effect of exogenous variable(s) on the volatility process. In econometric modelling, where exogenous variables play a crucial role, a GARCH model that incorporates exogenous variable(s) is more appropriate than the traditional GARCH model. Hence, this study introduces and empirically implements an improved GARCH-X model that accounts for the effects of influencing factors (X) in both the mean and the variance equations of the standard GARCH model simultaneously. In this manuscript, we briefly discuss the GARCH and GARCH-X models along with their implementation procedure. The proposed model is compared with the traditional GARCH model using the domestic price index of edible oils in India, with influencing factors such as the foreign exchange rate and the international price index of edible oils as exogenous variables (X). The comparison demonstrates the advantage of using exogenous factors in volatility modelling. Keywords: Exogenous variables, GARCH model, GARCH-X model, Price index, Volatility.
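To make "X in both the mean and variance equations" concrete, here is a minimal sketch of a GARCH(1,1)-X filter and its Gaussian log-likelihood. The specification below (delta * x_t in the mean, gamma * x_{t-1} in the variance) is one common form, assumed for illustration; the paper's improved GARCH-X may differ, and parameter estimation (maximising the log-likelihood numerically) is omitted.

```python
import math


def garch_x_filter(y, x, mu, delta, omega, alpha, beta, gamma):
    """Residuals and conditional variances of a GARCH(1,1)-X model.

    Assumed specification (X enters both equations):
        mean:     y_t = mu + delta * x_t + eps_t
        variance: h_t = omega + alpha * eps_{t-1}^2 + beta * h_{t-1} + gamma * x_{t-1}
    x is assumed non-negative (e.g. squared or absolute changes) so that h_t stays positive.
    """
    eps = [yt - mu - delta * xt for yt, xt in zip(y, x)]
    h = [sum(e * e for e in eps) / len(eps)]  # start-up value: mean squared residual
    for t in range(1, len(eps)):
        h.append(omega + alpha * eps[t - 1] ** 2 + beta * h[t - 1] + gamma * x[t - 1])
    return eps, h


def gaussian_loglik(eps, h):
    """Gaussian log-likelihood, to be maximised over (mu, delta, omega, alpha, beta, gamma)."""
    return -0.5 * sum(math.log(2 * math.pi * ht) + et * et / ht for et, ht in zip(eps, h))


# Setting delta = gamma = 0 recovers a plain constant-mean GARCH(1,1)
eps, h = garch_x_filter([1.0, 2.0, 3.0], [0.0, 0.0, 0.0], 0, 0, 0.1, 0.1, 0.8, 0.5)
print(h, gaussian_loglik(eps, h))
```

Comparing the maximised log-likelihood (or an information criterion) of the fit with gamma and delta free against the fit with both fixed at zero is one way to judge whether the exogenous variables help.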
6
An Alternative Sampling Methodology for Estimation of Cotton Yield using Double Sampling Approach Author: Tauqueer Ahmad, U.C. Sud, Anil Rai and Prachi Misra Sahoo Pages: 217-226
Cotton, a multiple-picking crop, is grown in around nine States in India. The existing procedure for estimating the average yield of cotton is based on the crop cutting experiment (CCE) approach, which uses data on all pickings and is cumbersome and cost-prohibitive. The double sampling approach can be gainfully employed here by collecting data on the picking that has the highest correlation with the total yield of all pickings on a larger sample, and total-pickings yield data on a smaller sample. Accordingly, a stratified two-stage two-phase sampling design has been proposed for selecting a representative sample, and an appropriate estimation procedure based on the double sampling regression estimator has been developed for estimating the average yield of cotton at district level. The methodology was applied to data from a survey conducted in the Aurangabad and Amravati districts of Maharashtra State and the Adilabad and Guntur districts of Andhra Pradesh, in which third-picking data were collected on a larger sample and total-pickings yield data on a smaller sample. Expressions for the optimum numbers of villages in the larger and smaller samples have been obtained by minimizing cost subject to a fixed percentage standard error of the estimates, and these have also been worked out empirically. Keywords: Yield, Cotton, Multiple picking, Sampling methodology, Double sampling, Percentage standard error, Crop Cutting Experiment (CCE).
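The core of the double sampling regression estimator can be sketched for the simplest case: an unstratified, single-stage design in which x (for example, third-picking yield) is observed on a large first-phase sample and both x and y (total-pickings yield) on a smaller second-phase subsample. This is an illustration under those simplifying assumptions; the paper's estimator for a stratified two-stage two-phase design carries design weights that are not reproduced here.

```python
def double_sampling_regression(x_phase1, x_phase2, y_phase2):
    """Two-phase regression estimator of the mean of y (simple, unstratified sketch).

    x_phase1: auxiliary variable on the large first-phase sample
    x_phase2, y_phase2: auxiliary and study variable on the small second-phase subsample
    Estimator: ybar_2 + b * (xbar_1 - xbar_2), with b the OLS slope fitted on phase 2.
    """
    n2 = len(x_phase2)
    xbar1 = sum(x_phase1) / len(x_phase1)
    xbar2 = sum(x_phase2) / n2
    ybar2 = sum(y_phase2) / n2
    sxy = sum((xi - xbar2) * (yi - ybar2) for xi, yi in zip(x_phase2, y_phase2))
    sxx = sum((xi - xbar2) ** 2 for xi in x_phase2)
    b = sxy / sxx
    # Adjust the small-sample mean of y by the gap between the two x means
    return ybar2 + b * (xbar1 - xbar2)


# Hypothetical plot yields: 10 first-phase plots, 3 of them also harvested for all pickings
print(double_sampling_regression(list(range(1, 11)), [2, 4, 6], [4, 8, 12]))
```

The stronger the correlation between the single picking and the total yield, the more the cheap first-phase information reduces the variance relative to using the small sample alone.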
7
Spatial Bootstrap Variance Estimation Method for Missing Survey Data Author: Ankur Biswas, Anil Rai and Tauqueer Ahmad Pages: 227-236
In this study, an attempt was made to develop a bootstrap variance estimation procedure for the Spatial Estimator (SE) of the finite population mean in the presence of missing observations under Simple Random Sampling Without Replacement. The Proportional Spatial Bootstrap (PSB) method has been proposed, taking into account the spatial relationship between sampling units. Under this technique, different spatial imputation techniques based on the spatial dependency of the data were used to impute missing observations in the observed sample. The statistical properties of the proposed PSB techniques were studied empirically through a simulation study. The simulation results reveal that, with appropriate spatial data-dependent imputation techniques, the proposed PSB technique performed better than existing techniques available in the literature. Keywords: Spatial estimator, Rescaled spatial bootstrap, Spatial imputation, Inverse distance weighting, Ordinary kriging, Spatial simulation.
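Inverse distance weighting, one of the spatial imputation techniques listed in the keywords, can be sketched as below. The power parameter of 2 is a common default assumed here, and the helper names are hypothetical; ordinary kriging and the PSB resampling step itself are not shown.

```python
import math


def idw_impute(points, values, target, power=2.0):
    """Inverse distance weighting estimate at `target` from observed (x, y) points."""
    num = den = 0.0
    for (px, py), v in zip(points, values):
        d = math.hypot(px - target[0], py - target[1])
        if d == 0:
            return v  # target coincides with an observed location
        w = 1.0 / d ** power  # nearer units get larger weights
        num += w * v
        den += w
    return num / den


def impute_missing(coords, values):
    """Fill None entries in `values` by IDW over the units whose values are observed."""
    obs_pts = [c for c, v in zip(coords, values) if v is not None]
    obs_vals = [v for v in values if v is not None]
    return [v if v is not None else idw_impute(obs_pts, obs_vals, c)
            for c, v in zip(coords, values)]


# Hypothetical sample: the middle unit's response is missing
print(impute_missing([(0, 0), (1, 0), (2, 0)], [1.0, None, 3.0]))
```

In a bootstrap variance procedure, an imputation step like this would be applied to each resample so that the variance estimate reflects imputation as well as sampling.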
8
Integration of Survey Data and Satellite Data for Acreage Estimation of Mango (Mangifera indica) Author: Ashis Ranjan Udgata, Prachi Misra Sahoo, Tauqueer Ahmad, Anil Rai, Ankur Biswas and Gopal Krishna Pages: 237-242
Timely and accurate estimates of crop area are critical for enhancing agricultural management and ensuring national food security. This study aims to combine remote-sensing data and survey data to improve the accuracy of crop area estimates and reduce the cost of crop surveys at district level. An estimate of mango area was obtained using Sentinel-2 satellite images of 2017 for West Godavari district of Andhra Pradesh. The area estimate was then improved by integrating satellite data and survey data using ratio and regression estimators. The regression estimator was found to be the best, with the lowest standard error. Keywords: Food security, Remote sensing, Satellite data, Ratio and regression estimator.
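The ratio and regression estimators mentioned above can be sketched for a simple random sample of n units out of N, where y is the surveyed mango area of a unit and x is the satellite-classified mango area of the same unit, with the satellite total X known for all N units. The unit definition and the simple random sampling design are assumptions for illustration, not the study's exact setup.

```python
def ratio_estimate_total(y_s, x_s, X_total):
    """Ratio estimator of the population total: (ybar / xbar) * X."""
    return sum(y_s) / sum(x_s) * X_total


def regression_estimate_total(y_s, x_s, X_total, N):
    """Linear regression estimator of the total: N * (ybar + b * (Xbar - xbar))."""
    n = len(y_s)
    xbar, ybar = sum(x_s) / n, sum(y_s) / n
    b = (sum((x - xbar) * (y - ybar) for x, y in zip(x_s, y_s))
         / sum((x - xbar) ** 2 for x in x_s))
    return N * (ybar + b * (X_total / N - xbar))


# Hypothetical data where surveyed area = 1 + satellite area for every unit
y_s, x_s = [2.0, 3.0, 4.0], [1.0, 2.0, 3.0]
print(ratio_estimate_total(y_s, x_s, 60.0))           # ratio estimate
print(regression_estimate_total(y_s, x_s, 60.0, 20))  # regression estimate
```

With N = 20 units and X = 60, the true total in this toy case is 20 + 60 = 80: the regression estimator recovers it because it allows an intercept, whereas the ratio estimator (90 here) implicitly assumes the relationship passes through the origin.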
9
Tree Network-balanced Designs for Agroforestry Trials Author: Peter T Birteeb, Cini Varghese, Seema Jaggi, Mohd. Harun and Eldho Varghese Pages: 243-254
This study was carried out to develop designs for agroforestry experiments involving multiple trees and multiple crop species. A linear network effects model incorporating tree effects from adjacent plots has been considered, and a general method has been developed for constructing a class of designs balanced for tree network effects. The proposed designs are partially variance balanced for estimating the direct effects of tree-crop combinations, with the tree-crop combinations following a two-associate-class Group Divisible (GD) association scheme. These designs have been shown to be highly efficient. Moreover, because the designs are laid out in separate arrays, each replication can be suitably adapted to a different location. Hence, the designs have promising potential for agroforestry experiments involving multiple trees and multiple crop species under resource limitations. Keywords: Adjacency matrix, Canonical efficiency factor, Group divisible association scheme, Partially variance balanced.
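A two-associate-class Group Divisible association scheme places v = m*n treatments in m groups of n: two treatments in the same group are first associates, and two in different groups are second associates. The sketch below builds that association matrix. Grouping the tree-crop combinations by tree (index = tree * n + crop) is a hypothetical labelling chosen for illustration; the paper's actual grouping is not specified in this abstract.

```python
def gd_association(m, n):
    """Group divisible association matrix for v = m*n treatments in m groups of size n.

    A[i][j] = 0 on the diagonal, 1 for first associates (same group),
    2 for second associates (different groups).
    """
    v = m * n
    group = [i // n for i in range(v)]  # treatments 0..n-1 form group 0, and so on
    return [[0 if i == j else (1 if group[i] == group[j] else 2) for j in range(v)]
            for i in range(v)]


def combination_label(i, n):
    """Hypothetical mapping of treatment index i to a (tree, crop) pair."""
    return divmod(i, n)


# Hypothetical example: 2 trees x 3 crops = 6 tree-crop combinations
A = gd_association(2, 3)
for i, row in enumerate(A):
    print(combination_label(i, 3), row)
```

Each treatment has n - 1 first associates and n(m - 1) second associates; under partial variance balance, pairwise contrasts within each associate class are estimated with a common variance.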
10
New Quartile based Variant of the Ranked Set Sampling Scheme Author: Immad A. Shah, Shakeel Ahmad Mir and Imran Khan Pages: 255-264
A new variant of ranked set sampling, namely Stratified Balanced Groups Quartile Ranked Set Sampling (SBGQRSS), is proposed for estimating the population mean with samples of size m = 3k (k = 1, 2, 3, ...) drawn independently from each stratum. The SBGQRSS scheme yields an unbiased estimator for symmetric distributions and a biased but consistent estimator for asymmetric distributions. The performance of the SBGQRSS mean estimator is compared with simple random sampling (SRS), both theoretically and by simulation. The simulation study revealed that SBGQRSS is more efficient than ranked set sampling (RSS) and balanced groups ranked set sampling (BGRSS) for estimating the population mean. A real data set on parameters of high-density plantation (HDP) apple is used to illustrate the SBGQRSS scheme. Keywords: Ranked set sampling, Balanced groups ranked set sampling, Population mean, High-density plantation, Simulation.
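For context, the classical ranked set sampling scheme that the new variant is compared against can be sketched as follows, together with a small simulation comparing it with SRS of the same size. Perfect ranking and a normal population are assumptions of this illustration; the quartile-based, stratified balanced-group selection of SBGQRSS is not reproduced here.

```python
import random
import statistics


def rss_mean(population, m, rng):
    """One cycle of classical ranked set sampling with set size m (perfect ranking assumed).

    Draw m independent sets of m units; from the i-th set keep the i-th smallest unit.
    """
    chosen = []
    for i in range(m):
        ranked = sorted(rng.sample(population, m))  # rank the i-th set
        chosen.append(ranked[i])
    return statistics.mean(chosen)


def srs_mean(population, m, rng):
    """Mean of a simple random sample of the same size m."""
    return statistics.mean(rng.sample(population, m))


# Hypothetical normal population; compare the two estimators over repeated samples
rng = random.Random(1)
population = [rng.gauss(50, 10) for _ in range(10000)]
rss = [rss_mean(population, 3, rng) for _ in range(2000)]
srs = [srs_mean(population, 3, rng) for _ in range(2000)]
print("relative efficiency (Var SRS / Var RSS):",
      statistics.variance(srs) / statistics.variance(rss))
```

Only m units are measured in each case; RSS gains precision because ranking, which is assumed cheap, spreads the measured units across the distribution.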