The purpose of this study is to present a generalized class of estimators using the three-stage Optional Randomized Response Technique (ORRT) in the presence of non-response and measurement errors on a sensitive study variable. The proposed estimator makes use of dual auxiliary information. Expressions for the bias and mean square error of the proposed estimator are derived using Taylor series expansion. The applicability of the proposed estimator is demonstrated using real data sets. A numerical study is used to compare the efficiency of the proposed estimator with adapted estimators of the finite population mean. The suggested estimator performs better than the adapted ordinary, ratio, and exponential ratio-type estimators in the presence of both non-response and measurement errors. The efficiency of the proposed estimator of the population mean declines as the inverse sampling rate, the non-response rate, and the sensitivity level of the survey question increase.
In a survey, it’s challenging to collect accurate information on a sensitive study variable that has a socially stigmatizing characteristic such as “Have you ever had an abortion?”, “How much money do you make?” “Have you ever been infected with sexually transmitted diseases?” or “Are you a drug addict?” among others. Obtaining correct answers to such questions in an interview involving direct questioning is difficult since the respondent’s privacy is not protected. Most respondents will either purposefully give a false answer or refuse to respond to such questions due to fear of embarrassment or loss of social status.
The Randomized Response Technique (RRT) was pioneered by Warner [1], and its main objective was to reduce response bias in surveys involving a sensitive question. In RRT, a scrambled variable is used to estimate the finite population mean of a sensitive variable. The scrambled variable is assumed to be independent of both the study variable and the non-sensitive auxiliary variable. The respondent is expected to give a direct response for the non-sensitive auxiliary variable and a scrambled response for the study variable.
The Optional Randomized Response Technique (ORRT) was pioneered by Chaudhuri and Mukherjee [2]. The technique involves giving the respondent an option to provide a scrambled or direct response to a sensitive question. In one-stage ORRT [3], a respondent is expected to give a scrambled response if they feel the question is sensitive and a direct response if they feel it is non-sensitive.
Two-stage ORRT aims at increasing respondent privacy and participation in a survey involving a sensitive study variable [4]. In two-stage ORRT, a known proportion of respondents, \(t_h\), is requested to respond directly to a sensitive question while maintaining anonymity. The remaining proportion of respondents provide a scrambled response using an additive model. The main drawback of two-stage ORRT is that it requires a large value of \(t_h\), especially when the underlying question is highly sensitive.
The three-stage ORRT [5] aims to promote respondent privacy and cooperation in a survey involving a sensitive question. A known predetermined proportion of respondents, \(t_h\), is requested to give a true response to a sensitive question. Another predetermined proportion, \(f_h\), is requested to provide a scrambled response. The remaining proportion is given the option of providing either a scrambled or a direct response to the sensitive question. In additive three-stage ORRT, the scrambled response provided is defined as \[\label{GrindEQ__1_} Z_{hi}=\left\{ \begin{array}{ll} Y_{hi}, & \mathrm{with\ probability}\ t_h+\left(1-t_h-f_h\right)\left(1-{\psi }_h\right) \\ Y_{hi}+S_{hi}, & \mathrm{with\ probability}\ f_h+\left(1-t_h-f_h\right){\psi }_h \end{array} \right. \tag{1}\] where \(Y_{hi}\) is the study variable, \({\psi }_h\) is the sensitivity level, and \(S_{hi}\) is a scrambling variable that is normally distributed with mean 0 and variance \(S^2_{Sh}\). The mean and variance of \(Z_{hi}\) are given as \(E\left(Z_{hi}\right)=E\left(Y_{hi}\right)\) and \(S^2_{Zh}=S^2_{Yh}+{\varphi }_hS^2_{Sh}\), respectively, where \({\varphi }_h=f_h+{\psi }_h\left(1-t_h-f_h\right)\).
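The additive three-stage scrambling mechanism can be illustrated with a minimal Python sketch. It assumes \({\varphi }_h=f_h+{\psi }_h\left(1-t_h-f_h\right)\), as used in the expected-value derivation later in the paper; the function names `phi` and `scramble` and all numerical values are hypothetical and are not taken from the study.

```python
import random


def phi(t_h, f_h, psi_h):
    """Probability that a respondent reports Y + S (assumed form of phi_h)."""
    return f_h + psi_h * (1.0 - t_h - f_h)


def scramble(y, t_h, f_h, psi_h, sd_s, rng):
    """Return one additive three-stage ORRT response for the true value y."""
    if rng.random() < phi(t_h, f_h, psi_h):
        # Scrambled response: Y + S with S ~ N(0, sd_s^2).
        return y + rng.gauss(0.0, sd_s)
    # Direct response: the true value is reported.
    return y


# Hypothetical illustration: many respondents sharing the same true value.
rng = random.Random(2024)
responses = [scramble(10.0, 0.2, 0.3, 0.8, 2.0 ** 0.5, rng) for _ in range(50000)]
mean_z = sum(responses) / len(responses)
var_z = sum((z - mean_z) ** 2 for z in responses) / (len(responses) - 1)
print(mean_z, var_z)  # var_z should be near phi * S^2_S because S^2_Y = 0 here
```

Because every simulated respondent has the same true value, the empirical variance of the responses isolates the contribution \({\varphi }_hS^2_{Sh}\) of the scrambling step.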
In the literature, researchers who have studied the estimation of the finite population mean of a sensitive study variable using non-optional RRT include Eichhorn and Hayre [6], Gupta and Shabbir [7], Gupta et al. [8], Sousa et al. [9], Zatezalo [10], Mushtaq et al. [11], Mushtaq et al. [12], and Mushtaq and Noor-Ul-Amin [13].
The inability of specific units to provide information due to their unwillingness to participate, illness, or absence is referred to as non-response. Non-response in a survey reduces the sample size, raising the variance of an estimator of the finite population mean. Hansen and Hurwitz [14] proposed a strategy for getting data from non-responding units in a postal survey called subsampling. The approach included more effort to gather data directly from a subsample of non-responding units. The strategy is used in this study to handle the problem of non-response.
During the data collection and recording phases of a survey, measurement errors can occur. The difference between a variable's true values and those reported in a survey is known as measurement error. When measurement errors occur in a survey, the data become contaminated, resulting in under- or overestimated parameters during analysis.
Khalil [15] discusses the issue of estimation of the finite population mean in the presence of measurement errors in simple random sampling based on non-optional RRT. The problem of estimation of the finite population mean using a non-optional RRT model in the presence of measurement errors under stratified random sampling is addressed by Khalil et al. [16]. Khalil et al. [17] extended the work of Khalil et al. [16] to estimate the finite population mean in the presence of measurement errors using one-stage optional RRT. Recently, Onyango et al. [18] studied the problem of estimating the finite population mean in the presence of measurement errors using the non-optional RRT in stratified two-phase sampling.
Naeem and Shabbir [19] and Zahid and Shabbir [20] discuss the problem of estimation of the finite population mean of a sensitive variable using non-optional RRT in the presence of measurement errors and non-response simultaneously. Zhang et al. [21] studied the problem of estimating the finite population mean using one-stage RRT in the presence of measurement errors and non-response simultaneously in simple random sampling. Recently, Zhang et al. [22] studied the estimation of the finite population mean of a sensitive study variable using ORRT under measurement errors and non-response in stratified random sampling.
The present study fills the existing gap in the literature on estimating the finite population mean of a sensitive study variable using an additive three-stage ORRT model in the presence of measurement errors and non-response simultaneously.
In this paper, section two describes the population and notations used in this study. The existing estimators of population mean under three-stage ORRT models are described in section three. The proposed estimator and its properties of bias and mean square error are discussed in section four. Section five looks at the theoretical efficiency of the proposed estimator. Section six performs a numerical analysis of the proposed estimator’s performance. Finally, section seven contains the conclusions of the study.
Consider a population \(U=\left\{U_1,\ U_2,\ \dots ,\ U_N\right\}\) of size N. The population comprises a sensitive study variable, an auxiliary variable, and a scrambled response, denoted as Y, X, and Z, respectively. Let \(\left(X_{hi},Y_{hi},Z_{hi}\right)\) and \(\left(x_{hi},y_{hi},z_{hi}\right)\) denote the \(i^{th}\) values of X, Y, and Z in the \(h^{th}\) stratum of the population and of the sample, respectively. Furthermore, let \({\overline{X}}_h\), \({\overline{Y}}_h\), and \({\overline{Z}}_h\) denote the population means of X, Y, and Z, respectively, in the \(h^{th}\) stratum. The variances of the scrambled response and the auxiliary variable are obtained using \(S^2_{Zh}=\frac{1}{N_h-1}\sum^{N_h}_{i=1}{{\left(Z_{hi}-{\overline{Z}}_h\right)}^2}\) and \(S^2_{Xh}=\frac{1}{N_h-1}\sum^{N_h}_{i=1}{{\left(X_{hi}-{\overline{X}}_h\right)}^2}\), respectively. Let \(S_{ZXh}\) denote the covariance between the auxiliary variable and the scrambled response. Also, let \({\rho }_{ZXh}\) denote the correlation coefficient between the scrambled response and the auxiliary variable.
Auxiliary information may be available in a survey as an attribute. Let \({\tau }_{hij}\) denote the value of the \(j^{th}\) attribute for the \(i^{th}\) unit (\(i=1,\ 2,\ \dots\) and \(j=1,\ 2,\ \dots\)) in the \(h^{th}\) stratum. The auxiliary attribute takes the value 1 if the \(i^{th}\) population unit possesses the attribute and 0 if it does not. Furthermore, let \(A_{hj}=\sum^{N_h}_{i=1}{{\tau }_{hij}}\) and \(P_h=\frac{A_{hj}}{N_h}\) be the number of units that have the attribute and the proportion of units possessing the attribute in the population, respectively. Additionally, let \(S^2_{Ph}\) denote the population variance of the auxiliary attribute. Also, let \(\left(S_{ZPh},\ S_{XPh}\right)\) and \(\left({\rho }_{ZPh},\ {\rho }_{XPh}\right)\) denote the covariances and correlation coefficients between the variables in their subscripts, respectively.
In the presence of non-response, let \(N_{1h}\) and \(N_{2h}\) be the population sizes of the responding and non-responding units, respectively. Let \(S^2_{Zh(2)}\) denote the population variance of the scrambled response for the non-responding units in the \(h^{th}\) stratum.
A relatively large first-phase sample of size \(n'_h\) is drawn from the \(h^{th}\) stratum using simple random sampling without replacement (SRSWOR). The sample mean of the auxiliary variable in the first-phase sample is given as \({\overline{x}}'_h=\frac{1}{n'_h}\sum^{n'_h}_{i=1}{x_{hi}}\) and the proportion of units in the first-phase sample possessing the auxiliary attribute as \(p'_h=\frac{a_{hj}}{n'_h}\). A second-phase random sample of size \(n_h\) is drawn from the first-phase sample; \(n_{1h}\) units are observed to respond, and non-response is observed on the remaining \(n_{2h}\) units. Let \({\overline{x}}_h=\frac{1}{n_h}\sum^{n_h}_{i=1}{x_{hi}}\) be the sample mean and \(p_h=\frac{a_{hj}}{n_h}\) be the proportion of units in the second-phase sample that possess the auxiliary attribute. Also, let \({\overline{z}}_{1h}=\frac{1}{n_{1h}}\sum^{n_{1h}}_{i=1}{z_{hi}}\) be the sample mean for the responding group in the second-phase sample. A sub-sample of size \(r_{2h}=\frac{n_{2h}}{k_{2h}}\), where \(k_{2h}\) is the inverse sampling rate, is drawn from the non-responding units. Let \({\overline{z}}_{2h}=\frac{1}{r_{2h}}\sum^{r_{2h}}_{i=1}{z_{hi}}\) be the sub-sample mean for the non-responding units. The estimate of the population mean for the scrambled response is given as
\(\ {\overline{z}}^*_h=w_{1h}{\overline{z}}_{1h}+\ w_{2h}{\overline{z}}_{2h}\), where\(\ w_{1h}=\frac{n_{1h}}{{\ n}_h}\), and\(\ \ w_{2h}=\frac{n_{2h}}{{\ n}_h}\).
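The weighted combination \({\overline{z}}^*_h=w_{1h}{\overline{z}}_{1h}+w_{2h}{\overline{z}}_{2h}\) can be sketched in Python as follows. This is only an illustration for a single stratum; the function name `hansen_hurwitz_mean` and the sample values are hypothetical.

```python
def hansen_hurwitz_mean(z_resp, z_sub, n_2h):
    """Combine respondents and sub-sampled non-respondents in one stratum.

    z_resp : scrambled responses from the n_1h responding units
    z_sub  : scrambled responses from the r_2h followed-up non-respondents
    n_2h   : number of non-responding units in the second-phase sample
    """
    n_1h = len(z_resp)
    n_h = n_1h + n_2h
    w_1h = n_1h / n_h            # weight of the responding group
    w_2h = n_2h / n_h            # weight of the non-responding group
    zbar_1h = sum(z_resp) / n_1h
    zbar_2h = sum(z_sub) / len(z_sub)
    return w_1h * zbar_1h + w_2h * zbar_2h


# Hypothetical stratum: 3 respondents, 2 non-respondents, 1 followed up (k_2h = 2).
estimate = hansen_hurwitz_mean([1.0, 2.0, 3.0], [10.0], n_2h=2)
print(estimate)
```

The non-respondent mean is weighted by \(n_{2h}/n_h\) rather than by the sub-sample size, so the followed-up units stand in for all non-responding units in the stratum.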
In the presence of measurement errors, let \(Z^*_{hi}\) and \(z^*_{hi}\) be the true and observed values, respectively, of the scrambled response. Furthermore, let \(T^*_{hi}=z^*_{hi}-Z^*_{hi}\) denote the measurement errors associated with the scrambled response. These measurement errors are assumed to be normally distributed with mean 0 and variance \(S^2_{Th}\). Furthermore, let \(S^2_{Th(2)}\) be the measurement error variance associated with the non-responding units.
In this study, the following conventional and non-conventional measures of the auxiliary variable are used in developing the special cases of the proposed generalized class of estimators:
Coefficient of variation, defined as \(C_{Xh}=\frac{S_{Xh}}{{\overline{X}}_h}\).
Coefficient of correlation, denoted \({\rho }_{XYh}\).
Coefficient of skewness, defined as \({\beta }_{1h}(x)=\frac{N_h\sum^{N_h}_{i=1}{{\left(X_{hi}-{\overline{X}}_h\right)}^3}}{\left(N_h-1\right)\left(N_h-2\right)S^3_{Xh}}\).
Coefficient of kurtosis, defined as \({\beta }_{2h}(x)=\frac{N_h\left(N_h+1\right)\sum^{N_h}_{i=1}{{\left(X_{hi}-{\overline{X}}_h\right)}^4}}{\left(N_h-1\right)\left(N_h-2\right)\left(N_h-3\right)S^4_{Xh}}-\frac{3{\left(N_h-1\right)}^2}{\left(N_h-2\right)\left(N_h-3\right)}\).
Mid-range, defined as \({MR}_h(x)=\frac{x_{h\left(1\right)}+x_{h\left(N_h\right)}}{2}\), where \(x_{h\left(1\right)}\) and \(x_{h\left(N_h\right)}\) are the minimum and maximum values in the data set.
Quartile deviation, defined as \({QD}_h(x)=\frac{Q_{3h}(x)-Q_{1h}(x)}{2}\).
Tri-mean, proposed by Tukey [23] and defined as \({TM}_h(x)=\frac{Q_{1h}(x)+2Q_{2h}(x)+Q_{3h}(x)}{4}\), where \(Q_{1h}(x)\), \(Q_{2h}(x)\), and \(Q_{3h}(x)\) are the first, second, and third quartiles, respectively.
Hodges-Lehmann [24] estimator, defined as \({HL}_h(x)=\mathrm{Median}\left(\frac{x_{jh}+x_{kh}}{2}\right),\ 1\le j\le k\le N_h\).
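The non-conventional measures listed above can be computed with the Python standard library, as in the following sketch. The quartile convention is an assumption: the paper does not state which one it uses, and the "inclusive" method of `statistics.quantiles` is chosen here only for illustration. All function names are hypothetical.

```python
import statistics
from itertools import combinations_with_replacement


def mid_range(x):
    """Average of the smallest and largest observation."""
    return (min(x) + max(x)) / 2


def quartiles(x):
    """First, second, and third quartiles (assumed 'inclusive' convention)."""
    return statistics.quantiles(x, n=4, method="inclusive")


def quartile_deviation(x):
    q1, _, q3 = quartiles(x)
    return (q3 - q1) / 2


def tri_mean(x):
    q1, q2, q3 = quartiles(x)
    return (q1 + 2 * q2 + q3) / 4


def hodges_lehmann(x):
    """Median of all pairwise averages (x_j + x_k) / 2 with j <= k."""
    pairs = combinations_with_replacement(x, 2)
    return statistics.median((a + b) / 2 for a, b in pairs)


# Hypothetical auxiliary values for one stratum.
x = [1, 2, 3, 4, 5]
print(mid_range(x), quartile_deviation(x), tri_mean(x), hodges_lehmann(x))
```

The Hodges-Lehmann computation enumerates all \(N_h(N_h+1)/2\) pairwise averages, which is fine for a sketch but may be slow for strata as large as those in the numerical study.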
Some Existing Estimators
The adapted ordinary estimator of population mean is defined as \[\label{GrindEQ__2_} {\overline{Y}}_0\mathrm{=}\sum^L_{h=1}{w_h{\overline{z}}^*_h} \tag{2}\]
The variance of the estimator is given as \[\label{GrindEQ__3_} Var({\overline{Y}}_0)\cong \sum^L_{h=1}{W^2_h}B_h \tag{3}\]
The adapted Cochran [25] ratio estimator is defined as \[\label{GrindEQ__4_} {\overline{Y}}_R\mathrm{=}\sum^L_{h=1}{w_h{\overline{z}}^*_h\ \frac{{\overline{x}}'_h}{{\overline{x}}_h}} \tag{4}\]
The expression for the bias is given as \[\label{GrindEQ__5_} Bias({\overline{Y}}_R)\cong \sum^L_{h=1}{\frac{W_h}{{\overline{X}}_h}\left[\frac{9}{8}R_h\left(A_h-C_h\right)-\left(E_h-D_h\right)\right]} \tag{5}\]
The expression of the mean square error is given as
\[\label{GrindEQ__6_} MSE({\overline{Y}}_R)\cong \sum^L_{h=1}{W^2_h\left[B_h+R^2_h\left(A_h-C_h\right)-2R_h\left(E_h-D_h\right)\right]} \tag{6}\] where \(A_h={\theta }_hS^2_{Xh}\), \(C_h={\theta }'_hS^2_{Xh}\), \(D_h={\theta }'_hS_{ZXh}\), \(E_h={\theta }_hS_{ZXh}\), \({\theta }_h=\left(\frac{1}{n_h}-\frac{1}{N_h}\right)\), and \({\theta }'_h=\left(\frac{1}{n'_h}-\frac{1}{N_h}\right)\).
The adapted Bahl and Tuteja [26] exponential ratio-type estimator is defined as \[\label{GrindEQ__7_} t_{ER}=\sum^L_{h=1}{W_h{\overline{z}}^*_h\exp\left(\frac{{\overline{x}}'_h-{\overline{x}}_h}{{\overline{x}}'_h+{\overline{x}}_h}\right)} \tag{7}\]
The expression for the bias is given as \[\label{GrindEQ__8_} Bias(t_{ER})\cong \sum^L_{h=1}{\frac{W_h}{2{\overline{X}}_h}\left[\frac{3}{4}R_h\left(A_h-C_h\right)-\left(E_h-D_h\right)\right]} \tag{8}\]
The expression for the mean square error is given as
\[\label{GrindEQ__9_} MSE(t_{ER})\cong \sum^L_{h=1}{W^2_h\left[B_h+\frac{1}{4}R^2_h\left(A_h-C_h\right)-R_h\left(E_h-D_h\right)\right]} \tag{9}\]
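Equations (3), (6), and (9) differ only in how the per-stratum terms \(A_h,\dots,E_h\) enter the sum, which the following Python sketch makes explicit. The function names and the single-stratum input values are hypothetical; \(R_h\) is passed in as a given quantity because the section does not define it.

```python
def var_ordinary(W, B):
    """Equation (3): sum over strata of W_h^2 * B_h."""
    return sum(w ** 2 * b for w, b in zip(W, B))


def mse_ratio(W, B, R, A, C, D, E):
    """Equation (6): MSE of the adapted ratio estimator."""
    return sum(
        w ** 2 * (b + r ** 2 * (a - c) - 2 * r * (e - d))
        for w, b, r, a, c, d, e in zip(W, B, R, A, C, D, E)
    )


def mse_exp_ratio(W, B, R, A, C, D, E):
    """Equation (9): MSE of the adapted exponential ratio-type estimator."""
    return sum(
        w ** 2 * (b + 0.25 * r ** 2 * (a - c) - r * (e - d))
        for w, b, r, a, c, d, e in zip(W, B, R, A, C, D, E)
    )


# Hypothetical one-stratum inputs, chosen only to exercise the formulas.
W, B, R = [1.0], [10.0], [2.0]
A, C, D, E = [3.0], [1.0], [1.0], [2.0]
print(var_ordinary(W, B), mse_ratio(W, B, R, A, C, D, E), mse_exp_ratio(W, B, R, A, C, D, E))
```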
From equation (1) the expected value of the scrambled response under randomization mechanisms (Onyango et al. [27]) is defined as \[\label{GrindEQ__10_} E_R\left(Z_{hi}\right)=E_R\left[Y_{hi}\left(1-{\varphi }_h\right)+\left(Y_{hi}+S_{hi}\right){\varphi }_h\right] \tag{10}\] \[\label{GrindEQ__11_} E_R\left(Z_{hi}\right)=Y_{hi}+{\varphi }_h\ {\overline{S}}_h\ \tag{11}\] , where\(\ {\varphi }_h=f_h+{\psi }_h\left(1-t_h-f_h\right).\ \ \)
The variance of the response variable under the randomization mechanism is given as \[V_R\left(Z_{hi}\right)=V_R\left(Y_{hi}+{\varphi }_hS_{hi}\right)\] \[\label{GrindEQ__12_} V_R\left(Z_{hi}\right)={\varphi }_h\left(S^2_{Sh}+{\overline{S}}^2_h\right)-{\varphi }^2_h{\overline{S}}^2_h \tag{12}\] which, since the scrambling variable has mean zero, reduces to \[\label{GrindEQ__13_} V_R\left(Z_{hi}\right)={\varphi }_hS^2_{Sh} \tag{13}\] The transformed value of the randomized response is given as \[\label{GrindEQ__14_} {\hat{y}}_{hi}=z_{hi}-{\varphi }_h{\overline{S}}_h \tag{14}\] with \(E_R\left({\hat{y}}_{hi}\right)=y_{hi}\) and \(V_R\left({\hat{y}}_{hi}\right)={\varphi }_hS^2_{Sh}\), where \(y_{hi}\) is the true response. Therefore, the modified Hansen and Hurwitz [14] technique with an additive three-stage ORRT is defined as \[\label{GrindEQ__15_} {\widehat{\overline{y}}}_h=w_{1h}{\widehat{\overline{y}}}_{1h}+\ w_{2h}{\widehat{\overline{y}}}_{2h} \tag{15}\] \[\label{GrindEQ__16_} E\left({\widehat{\overline{y}}}_h\right)={\overline{Y}}_h \tag{16}\] \[\label{GrindEQ__17_} var\left({\widehat{\overline{y}}}_h\right)=E_1\left[V_2\left({\widehat{\overline{y}}}_h\right)\right]+V_1\left[E_2\left({\widehat{\overline{y}}}_h\right)\right] \tag{17}\] \[\label{GrindEQ__18_} var\left({\widehat{\overline{y}}}_h\right)=var\left({\overline{y}}_h\right)+E_1\left[\frac{n^2_{1h}}{n^2_h}\frac{{\varphi }_hS^2_{Sh}}{n_{1h}}\right]+E_1\left[\frac{n_{2h}}{n^2_h}k_{2h}{\varphi }_hS^2_{Sh}\right] \tag{18}\] \[\label{GrindEQ__19_} var\left({\widehat{\overline{y}}}_h\right)=var\left({\overline{y}}_h\right)+{\mathrm{\Omega }}_h \tag{19}\] where \({\mathrm{\Omega }}_h=\frac{{\varphi }_hS^2_{Sh}}{n_h}\left(W_{1h}+k_{2h}W_{2h}\right)\) is the contribution of the three-stage ORRT to the variance of the Hansen and Hurwitz [14] estimator.
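A short Python sketch of the ORRT variance contribution \({\mathrm{\Omega }}_h\) is given below. It assumes that \(W_{1h}\) and \(W_{2h}\) are the population proportions of responding and non-responding units in the stratum, so that \(W_{2h}=1-W_{1h}\); this interpretation is not stated explicitly in the text. The function name `omega` and the inputs are hypothetical.

```python
def omega(phi_h, s2_s, n_h, W_1h, k_2h):
    """ORRT contribution to the variance in equation (19).

    phi_h : probability of a scrambled response
    s2_s  : variance of the scrambling variable, S^2_{Sh}
    n_h   : second-phase sample size in the stratum
    W_1h  : assumed proportion of responding units (W_2h = 1 - W_1h)
    k_2h  : inverse sampling rate for the non-respondent sub-sample
    """
    W_2h = 1.0 - W_1h
    return phi_h * s2_s / n_h * (W_1h + k_2h * W_2h)


# Hypothetical inputs: the contribution grows with the inverse sampling rate.
for k in (2, 4, 8):
    print(k, omega(0.7, 2.0, 100, 0.8, k))
```

With \(k_{2h}=1\) (every non-respondent followed up) the expression collapses to \({\varphi }_hS^2_{Sh}/n_h\), the scrambling variance of a full-response sample mean.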
The suggested randomized response estimator of the finite population mean for a sensitive variable in the presence of non-response and measurement errors simultaneously is defined as \[\label{GrindEQ__20_} {\overline{Y}}_g\mathrm{=}\sum^L_{h=1}{w_h\left[{\overline{z}}^*_h+{\alpha }_h\left({\overline{x}}'_h-{\overline{x}}_h\right)+{\beta }_h\left(p'_h-p_h\right)\right]\exp\left(\frac{a_h({\overline{x}}'_h-{\overline{x}}_h)}{a_h({\overline{x}}'_h+{\overline{x}}_h)+2b_h}\right)} \tag{20}\] where \({\alpha }_h\) and \({\beta }_h\) are appropriately chosen constants, and \(a_h\) and \(b_h\) are either real numbers or known conventional and non-conventional measures of the auxiliary variable. Let \[\label{GrindEQ__21_} {\sigma }_{Zh}={\overline{z}}^*_h-{\overline{Z}}_h \tag{21}\] \[\label{GrindEQ__22_} {\sigma }_{X1h}={\overline{x}}'_h-{\overline{X}}_h \tag{22}\] \[\label{GrindEQ__23_} {\sigma }_{P1h}=p'_h-P_h \tag{23}\] \[\label{GrindEQ__24_} {\sigma }_{Xh}={\overline{x}}_h-{\overline{X}}_h \tag{24}\] \[\label{GrindEQ__25_} {\sigma }_{Ph}=p_h-P_h \tag{25}\]
such that \[\label{GrindEQ__26_} E\left({\sigma }_{Zh}\right)=E\left({\sigma }_{Xh}\right)=E\left({\sigma }_{X1h}\right)=E\left({\sigma }_{P1h}\right)=E\left({\sigma }_{Ph}\right)=0 \tag{26}\] Furthermore, let \[\label{GrindEQ__27_} E\left({\sigma }^2_{Xh}\right)={\theta }_hS^2_{Xh}=A_h \tag{27}\] \[\label{GrindEQ__28_} E\left({\sigma }^2_{Zh}\right)={\theta }_h\left(S^2_{Yh}+S^2_{Th}\right)+{\theta }^*_h\left(S^2_{Yh(2)}+S^2_{Th(2)}\right)+{\mathrm{\Omega }}_h=B_h \tag{28}\] \[\label{GrindEQ__29_} E\left({\sigma }^2_{X1h}\right)={\theta }'_hS^2_{Xh}=C_h \tag{29}\] \[\label{GrindEQ__30_} E\left({\sigma }_{X1h}{\sigma }_{Zh}\right)={\theta }'_hS_{ZXh}=D_h \tag{30}\] \[\label{GrindEQ__31_} E\left({\sigma }_{Xh}{\sigma }_{Zh}\right)={\theta }_hS_{ZXh}=E_h \tag{31}\] \[\label{GrindEQ__32_} E\left({\sigma }^2_{Ph}\right)={\theta }_hS^2_{Ph}=F_h \tag{32}\] \[\label{GrindEQ__33_} E\left({\sigma }^2_{P1h}\right)={\theta }'_hS^2_{Ph}=G_h \tag{33}\] \[\label{GrindEQ__34_} E\left({\sigma }_{Ph}{\sigma }_{Zh}\right)={\theta }_hS_{ZPh}=H_h \tag{34}\] \[\label{GrindEQ__35_} E\left({\sigma }_{P1h}{\sigma }_{Zh}\right)={\theta }'_hS_{ZPh}=I_h \tag{35}\] \[\label{GrindEQ__36_} E\left({\sigma }_{Xh}{\sigma }_{Ph}\right)={\theta }_hS_{XPh}=J_h \tag{36}\] \[\label{GrindEQ__37_} E\left({\sigma }_{P1h}{\sigma }_{Xh}\right)=E\left({\sigma }_{X1h}{\sigma }_{Ph}\right)=E\left({\sigma }_{X1h}{\sigma }_{P1h}\right)={\theta }'_hS_{XPh}=L_h \tag{37}\] where \({\theta }'_h=\left(\frac{1}{n'_h}-\frac{1}{N_h}\right)\), \({\theta }_h=\left(\frac{1}{n_h}-\frac{1}{N_h}\right)\), \({\theta }^*_h=\frac{W_h\left(k_{2h}-1\right)}{n_h}\), and \(W_h=\frac{N_h}{N}\).
Substituting equations (21)-(25) into (20), simplifying, and ignoring terms of order greater than two gives the approximation for the bias as \[\label{GrindEQ__38_} Bias({\overline{Y}}_g)\cong \sum^L_{h=1}{\frac{W_h{\lambda }_h}{2}}\left[\frac{3}{4}{\lambda }_h{\overline{Z}}_h\left(A_h-C_h\right)+{\alpha }_h\left(A_h-C_h\right)-\left(E_h-D_h\right)+{\beta }_h\left(J_h-L_h\right)\right] \tag{38}\] where \({\lambda }_h=\frac{a_h}{a_h{\overline{X}}_h+b_h}\). The approximation for the MSE is given as \[\label{GrindEQ__39_} MSE({\overline{Y}}_g)\cong \sum^L_{h=1}{W^2_h}\left[B_h+{\vartheta }_{1h}+{\alpha }^2_h{\vartheta }_{2h}+{\beta }^2_h{\vartheta }_{3h}+{\beta }_h{\vartheta }_{4h}+{\alpha }_h{\vartheta }_{5h}+2{\alpha }_h{\beta }_h{\vartheta }_{6h}\right] \tag{39}\] where \[{\vartheta }_{1h}=\frac{1}{4}{\lambda }^2_h{\overline{Y}}^2_h\left(A_h-C_h\right)-{\lambda }_h{\overline{Y}}_h\left(E_h-D_h\right),\] \[{\vartheta }_{2h}=\left(A_h-C_h\right),\] \[{\vartheta }_{3h}=\left(F_h-G_h\right),\] \[{\vartheta }_{4h}={\overline{Y}}_h{\lambda }_h\left(J_h-L_h\right)-2\left(H_h-I_h\right),\] \[{\vartheta }_{5h}={\overline{Y}}_h{\lambda }_h\left(A_h-C_h\right)-2\left(E_h-D_h\right),\] \[{\vartheta }_{6h}=\left(J_h-L_h\right).\] The optimum values of \({\alpha }_h\) and \({\beta }_h\), obtained by minimizing (39), are given as \[\label{GrindEQ__40_} {\alpha }^{(opt)}_h=\frac{{\vartheta }_{4h}{\vartheta }_{6h}-{\vartheta }_{5h}{\vartheta }_{3h}}{2\left({\vartheta }_{2h}{\vartheta }_{3h}-{\vartheta }^2_{6h}\right)} \tag{40}\] and \[\label{GrindEQ__41_} {\beta }^{(opt)}_h=\frac{{\vartheta }_{5h}{\vartheta }_{6h}-{\vartheta }_{4h}{\vartheta }_{2h}}{2\left({\vartheta }_{2h}{\vartheta }_{3h}-{\vartheta }^2_{6h}\right)} \tag{41}\] Substituting equations (40) and (41) into (39) gives the minimum MSE as \[\label{GrindEQ__42_} {MSE({\overline{Y}}_g)}_{min}\cong \sum^L_{h=1}{W^2_h}\left[B_h+{\vartheta }_{1h}-\frac{{\vartheta }^2_{4h}}{4{\vartheta }_{3h}}-\frac{{\left({\vartheta }_{5h}{\vartheta }_{3h}-{\vartheta }_{4h}{\vartheta }_{6h}\right)}^2}{4{\vartheta }_{3h}\left({\vartheta }_{2h}{\vartheta }_{3h}-{\vartheta }^2_{6h}\right)}\right] \tag{42}\] Table 1 shows some special cases of the proposed generalized class of estimators.
Proposed generalized class of estimators | \(a_h\) | \(b_h\) |
\(\overline{Y}_1=\sum^L_{h=1}w_h\left[\overline{z}^*_h+\alpha_h\left(\overline{x}'_h-\overline{x}_h\right)+\beta_h\left(p'_h-p_h\right)\right]\exp\left(\frac{\overline{x}'_h-\overline{x}_h}{\overline{x}'_h+\overline{x}_h}\right)\) | 1 | 0 |
\(\overline{Y}_2=\sum^L_{h=1}w_h\left[\overline{z}^*_h+\alpha_h\left(\overline{x}'_h-\overline{x}_h\right)+\beta_h\left(p'_h-p_h\right)\right]\exp\left(\frac{\overline{x}'_h-\overline{x}_h}{\left(\overline{x}'_h+\overline{x}_h\right)+2C_{Xh}}\right)\) | 1 | \(C_{Xh}\) |
\(\overline{Y}_3=\sum^L_{h=1}w_h\left[\overline{z}^*_h+\alpha_h\left(\overline{x}'_h-\overline{x}_h\right)+\beta_h\left(p'_h-p_h\right)\right]\exp\left(\frac{C_{Xh}\left(\overline{x}'_h-\overline{x}_h\right)}{C_{Xh}\left(\overline{x}'_h+\overline{x}_h\right)+2\rho_{XYh}}\right)\) | \(C_{Xh}\) | \(\rho_{XYh}\) |
\(\overline{Y}_4=\sum^L_{h=1}w_h\left[\overline{z}^*_h+\alpha_h\left(\overline{x}'_h-\overline{x}_h\right)+\beta_h\left(p'_h-p_h\right)\right]\exp\left(\frac{\beta_{1h}(x)\left(\overline{x}'_h-\overline{x}_h\right)}{\beta_{1h}(x)\left(\overline{x}'_h+\overline{x}_h\right)+2\rho_{XYh}}\right)\) | \(\beta_{1h}(x)\) | \(\rho_{XYh}\) |
\(\overline{Y}_5=\sum^L_{h=1}w_h\left[\overline{z}^*_h+\alpha_h\left(\overline{x}'_h-\overline{x}_h\right)+\beta_h\left(p'_h-p_h\right)\right]\exp\left(\frac{\beta_{2h}(x)\left(\overline{x}'_h-\overline{x}_h\right)}{\beta_{2h}(x)\left(\overline{x}'_h+\overline{x}_h\right)+2\beta_{1h}(x)}\right)\) | \(\beta_{2h}(x)\) | \(\beta_{1h}(x)\) |
\(\overline{Y}_6=\sum^L_{h=1}w_h\left[\overline{z}^*_h+\alpha_h\left(\overline{x}'_h-\overline{x}_h\right)+\beta_h\left(p'_h-p_h\right)\right]\exp\left(\frac{QD_h(x)\left(\overline{x}'_h-\overline{x}_h\right)}{QD_h(x)\left(\overline{x}'_h+\overline{x}_h\right)+2TM_h(x)}\right)\) | \(QD_h(x)\) | \(TM_h(x)\) |
\(\overline{Y}_7=\sum^L_{h=1}w_h\left[\overline{z}^*_h+\alpha_h\left(\overline{x}'_h-\overline{x}_h\right)+\beta_h\left(p'_h-p_h\right)\right]\exp\left(\frac{QD_h(x)\left(\overline{x}'_h-\overline{x}_h\right)}{QD_h(x)\left(\overline{x}'_h+\overline{x}_h\right)+2MR_h(x)}\right)\) | \(QD_h(x)\) | \(MR_h(x)\) |
\(\overline{Y}_8=\sum^L_{h=1}w_h\left[\overline{z}^*_h+\alpha_h\left(\overline{x}'_h-\overline{x}_h\right)+\beta_h\left(p'_h-p_h\right)\right]\exp\left(\frac{HL_h(x)\left(\overline{x}'_h-\overline{x}_h\right)}{HL_h(x)\left(\overline{x}'_h+\overline{x}_h\right)+2TM_h(x)}\right)\) | \(HL_h(x)\) | \(TM_h(x)\) |
\(\overline{Y}_9=\sum^L_{h=1}w_h\left[\overline{z}^*_h+\alpha_h\left(\overline{x}'_h-\overline{x}_h\right)+\beta_h\left(p'_h-p_h\right)\right]\exp\left(\frac{\rho_{XYh}\left(\overline{x}'_h-\overline{x}_h\right)}{\rho_{XYh}\left(\overline{x}'_h+\overline{x}_h\right)+2\beta_{2h}(x)}\right)\) | \(\rho_{XYh}\) | \(\beta_{2h}(x)\) |
\(\overline{Y}_{10}=\sum^L_{h=1}w_h\left[\overline{z}^*_h+\alpha_h\left(\overline{x}'_h-\overline{x}_h\right)+\beta_h\left(p'_h-p_h\right)\right]\exp\left(\frac{\overline{x}'_h-\overline{x}_h}{\left(\overline{x}'_h+\overline{x}_h\right)+2\rho_{XYh}}\right)\) | 1 | \(\rho_{XYh}\) |
\(\overline{Y}_{11}=\sum^L_{h=1}w_h\left[\overline{z}^*_h+\alpha_h\left(\overline{x}'_h-\overline{x}_h\right)+\beta_h\left(p'_h-p_h\right)\right]\exp\left(\frac{\overline{x}'_h-\overline{x}_h}{\left(\overline{x}'_h+\overline{x}_h\right)+2QD_h(x)}\right)\) | 1 | \(QD_h(x)\) |
The expressions for the biases and mean square errors of the members of the proposed generalized class are obtained by substituting the corresponding values of \(a_h\) and \(b_h\) in equations (38) and (42), respectively.
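The optimization leading from (39) to (42) can be checked numerically for one stratum. The Python sketch below evaluates the bracketed term of (39) at the optimum constants from (40) and (41) and compares it with the closed form in (42). All function names and input values are hypothetical and chosen only so that \({\vartheta }_{2h}{\vartheta }_{3h}-{\vartheta }^2_{6h}>0\); they are not taken from the study.

```python
def lam(a_h, b_h, xbar_h):
    """lambda_h = a_h / (a_h * Xbar_h + b_h) for a chosen member of the class."""
    return a_h / (a_h * xbar_h + b_h)


def vartheta_terms(ybar, lam_h, A, C, D, E, F, G, H, I, J, L):
    """Return (v1, ..., v6) as defined below equation (39)."""
    v1 = 0.25 * lam_h ** 2 * ybar ** 2 * (A - C) - lam_h * ybar * (E - D)
    v2 = A - C
    v3 = F - G
    v4 = ybar * lam_h * (J - L) - 2 * (H - I)
    v5 = ybar * lam_h * (A - C) - 2 * (E - D)
    v6 = J - L
    return v1, v2, v3, v4, v5, v6


def mse_bracket(B, v, alpha, beta):
    """Bracketed term of equation (39) for given alpha_h and beta_h."""
    v1, v2, v3, v4, v5, v6 = v
    return (B + v1 + alpha ** 2 * v2 + beta ** 2 * v3
            + beta * v4 + alpha * v5 + 2 * alpha * beta * v6)


def optimum(v):
    """Optimum alpha_h and beta_h from equations (40) and (41)."""
    _, v2, v3, v4, v5, v6 = v
    det = v2 * v3 - v6 ** 2
    return (v4 * v6 - v5 * v3) / (2 * det), (v5 * v6 - v4 * v2) / (2 * det)


def min_mse_bracket(B, v):
    """Bracketed term of equation (42)."""
    v1, v2, v3, v4, v5, v6 = v
    det = v2 * v3 - v6 ** 2
    return B + v1 - v4 ** 2 / (4 * v3) - (v5 * v3 - v4 * v6) ** 2 / (4 * v3 * det)


# Hypothetical single-stratum inputs with a_h = 1 and b_h = 0.
v = vartheta_terms(10.0, lam(1.0, 0.0, 100.0), A=5.0, C=2.0, D=0.4, E=1.0,
                   F=0.02, G=0.01, H=-0.05, I=-0.02, J=-0.1, L=-0.04)
alpha_opt, beta_opt = optimum(v)
print(mse_bracket(3.0, v, alpha_opt, beta_opt), min_mse_bracket(3.0, v))
```

The two printed values should agree up to floating-point error, and moving either constant away from its optimum should increase the first one.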
The proposed estimators perform better than the other estimators when the following conditions are satisfied for each stratum:
From equations (3) and (42), \(MSE{\left({\overline{Y}}_g\right)}_{min}<Var\left({\overline{Y}}_0\right)\) if \[\label{GrindEQ__43_} \left[{\vartheta }_{1h}-\frac{{\vartheta }^2_{4h}}{4{\vartheta }_{3h}}-\frac{{\left({\vartheta }_{5h}{\vartheta }_{3h}-{\vartheta }_{4h}{\vartheta }_{6h}\right)}^2}{4{\vartheta }_{3h}\left({\vartheta }_{2h}{\vartheta }_{3h}-{\vartheta }^2_{6h}\right)}\right]<0 \tag{43}\]
From equations (6) and (42), \(MSE{\left({\overline{Y}}_g\right)}_{min}<MSE\left({\overline{Y}}_R\right)\) if \[\label{GrindEQ__44_} \left[{\vartheta }_{1h}-\frac{{\vartheta }^2_{4h}}{4{\vartheta }_{3h}}-\frac{{\left({\vartheta }_{5h}{\vartheta }_{3h}-{\vartheta }_{4h}{\vartheta }_{6h}\right)}^2}{4{\vartheta }_{3h}\left({\vartheta }_{2h}{\vartheta }_{3h}-{\vartheta }^2_{6h}\right)}-R^2_h\left(A_h-C_h\right)+2R_h\left(E_h-D_h\right)\right]<0 \tag{44}\]
From equations (9) and (42), \(MSE{\left({\overline{Y}}_g\right)}_{min}<MSE\left(t_{ER}\right)\) if \[\label{GrindEQ__45_} \left[{\vartheta }_{1h}-\frac{{\vartheta }^2_{4h}}{4{\vartheta }_{3h}}-\frac{{\left({\vartheta }_{5h}{\vartheta }_{3h}-{\vartheta }_{4h}{\vartheta }_{6h}\right)}^2}{4{\vartheta }_{3h}\left({\vartheta }_{2h}{\vartheta }_{3h}-{\vartheta }^2_{6h}\right)}-\frac{1}{4}R^2_h\left(A_h-C_h\right)+R_h\left(E_h-D_h\right)\right]<0 \tag{45}\]
The efficiency of the proposed estimator is compared to that of the adapted estimators in a numerical study. The real data used for the numerical analysis are COVID-19 data obtained from www.worldometer.com and data from Rosner [28]. The R programming language is used for data simulation and coding. Each population unit is subjected to measurement errors, which are normally distributed with mean 2 and variance 5. The efficiency of the proposed estimator is compared to that of the adapted estimators using the least-variance criterion and the percent relative efficiency (PRE). The PRE of an estimator of the population mean is calculated using the formula \[\label{GrindEQ__46_} PRE({\overline{Y}}_g)=\frac{var({\overline{Y}}_0)}{MSE({\overline{Y}}_g)}\times 100 \tag{46}\] where \(g={\overline{Y}}_R,\ {\overline{Y}}_{ER},\ 1,\ 2,\ \dots ,\ 11\). The estimator with the highest PRE is considered the most efficient. The PREs are obtained at 20% and 80% sensitivity levels of the survey question and at 20% and 30% non-response rates.
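The PRE in equation (46) is a simple ratio, sketched below in Python (the study itself reports using R; Python is used here purely for illustration). The function name `pre` and the MSE values in the dictionary are hypothetical and do not reproduce the paper's results.

```python
def pre(var_ordinary, mse_estimator):
    """Percent relative efficiency with respect to the ordinary estimator."""
    return 100.0 * var_ordinary / mse_estimator


# Hypothetical variance and MSE values for three estimators.
var_y0 = 2.0
mse_values = {"Y_R": 1.8, "Y_ER": 1.6, "Y_1": 1.5}
pre_values = {name: pre(var_y0, m) for name, m in mse_values.items()}

# The estimator with the largest PRE is the most efficient.
best = max(pre_values, key=pre_values.get)
print(pre_values, best)
```

A PRE above 100 indicates that the estimator has a smaller MSE than the adapted ordinary estimator, whose own PRE is 100 by construction.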
The data consist of six strata: the African Region (\(N_1\)=31200), the American Region (\(N_2\)=34944), the Eastern Mediterranean Region (\(N_3\)=13728), the European Region (\(N_4\)=38688), the South-East Asia Region (\(N_5\)=6864), and the Western Pacific Region (\(N_6\)=21840). X is the number of new cases, Y is the number of deaths recorded in a given day, and P is an attribute indicating whether the number of deaths in a given day is less than one. Scrambled responses, normally distributed with mean 0 and variance 2, are generated for each value of Y. Table 2 shows summary statistics for the responding units and Table 3 for the non-responding units.
Parameter | Stratum 1 | Stratum 2 | Stratum 3 | Stratum 4 | Stratum 5 | Stratum 6 |
\(\overline{X}_h\) | 188.9035 | 2502.012 | 1120.151 | 1757.061 | 6175.008 | 356.2095 |
\(\overline{Y}_h\) | 4.543181 | 61.90972 | 20.51225 | 33.79095 | 97.12205 | 4.833472 |
\(S^2_{Xh}\) | 1094471 | 187408859 | 8526375 | 24712119 | 817189958 | 318940 |
\(S^2_{Zh}\) | 926.4621 | 76639.99 | 2937.237 | 11588.58 | 145353 | 849.8079 |
\(S^2_{Ph}\) | 0.2017896 | 0.2328431 | 0.2253055 | 0.2467874 | 0.247146 | 0.1323922 |
\(\rho_{XZh}\) | 0.8171398 | 0.7944946 | 0.834325 | 0.6559524 | 0.8679977 | 0.7237861 |
\(\rho_{XPh}\) | -0.2608673 | -0.2379639 | -0.265470 | -0.2982271 | -0.239344 | -0.4403104 |
\(\rho_{ZPh}\) | -0.2386865 | -0.2924192 | -0.2729946 | -0.2802833 | -0.2839612 | -0.3832064 |
\(S^2_{Th}\) | 24.72743 | 24.91892 | 25.03186 | 25.29474 | 24.74669 | 18.97865 |
Non-response rate | Stratum | \(S^2_{Xh(2)}\) | \(S^2_{Ph(2)}\) |
20% | 1 | 989315.2 | 0.2050127 |
20% | 2 | 199477087 | 0.2343414 |
20% | 3 | 8176575 | 0.2249875 |
20% | 4 | 25298233 | 0.2462478 |
20% | 5 | 708141536 | 0.2460244 |
20% | 6 | 708141536 | 0.1284571 |
30% | 1 | 1071816 | 0.2047748 |
30% | 2 | 206269098 | 0.2334903 |
30% | 3 | 8525833 | 0.2250371 |
30% | 4 | 24546992 | 0.2462213 |
30% | 5 | 681811147 | 0.2462587 |
30% | 6 | 681811147 | 0.1297743 |
Tables 4 and 5 present the PREs of the estimators of the population mean in the cases without and with measurement errors, respectively. From the tables, the PREs decrease as the inverse sampling rate and the non-response rate increase. Additionally, the PREs decrease in the presence of non-response and measurement errors simultaneously. The PREs also decrease as the sensitivity level of the survey question increases. The proposed estimator \({\overline{Y}}_6\) has the highest PRE among all the estimators in this study. In general, the proposed estimators perform better than the adapted estimators.
Table 4. PREs of the estimators without measurement errors, by non-response rate and inverse sampling rate \(k_{2h}\).
Estimator | \(\psi_h\) | 20%, \(k_{2h}=2\) | 20%, \(k_{2h}=4\) | 20%, \(k_{2h}=8\) | 30%, \(k_{2h}=2\) | 30%, \(k_{2h}=4\) | 30%, \(k_{2h}=8\) |
\(\overline{Y}_0\) | 0.2, 0.8 | 100 | 100 | 100 | 100 | 100 | 100 |
\(\overline{Y}_R\) | 0.2 | 121.8078 | 111.8264 | 106.1745 | 117.4540 | 108.4124 | 104.1318 |
\(\overline{Y}_R\) | 0.8 | 121.8064 | 111.8257 | 106.1743 | 117.4531 | 108.4121 | 104.1317 |
\(\overline{Y}_{ER}\) | 0.2 | 126.6716 | 113.9508 | 107.1407 | 121.7205 | 110.2754 | 105.0030 |
\(\overline{Y}_{ER}\) | 0.8 | 126.6685 | 113.9495 | 107.1397 | 121.7184 | 110.2746 | 105.0028 |
\(\overline{Y}_1\) | 0.2 | 129.4792 | 115.5386 | 107.9859 | 123.3008 | 110.9481 | 105.3139 |
\(\overline{Y}_1\) | 0.8 | 129.4760 | 115.5372 | 107.9855 | 123.2988 | 110.9473 | 105.3137 |
\(\overline{Y}_2\) | 0.2 | 129.4734 | 115.5359 | 107.9846 | 123.2965 | 110.9462 | 105.3131 |
\(\overline{Y}_2\) | 0.8 | 129.4702 | 115.5345 | 107.9842 | 123.2945 | 110.9455 | 105.3129 |
\(\overline{Y}_3\) | 0.2 | 129.4790 | 115.5385 | 107.9859 | 123.3007 | 110.9480 | 105.3139 |
\(\overline{Y}_3\) | 0.8 | 129.4758 | 115.5371 | 107.9854 | 123.2987 | 110.9473 | 105.3137 |
\(\overline{Y}_4\) | 0.2 | 125.7381 | 113.7546 | 107.1225 | 120.4678 | 109.7355 | 104.7521 |
\(\overline{Y}_4\) | 0.8 | 125.7354 | 113.7534 | 107.1221 | 120.4660 | 109.7349 | 104.7519 |
\(\overline{Y}_5\) | 0.2 | 129.4792 | 115.5386 | 107.9859 | 123.3008 | 110.9481 | 105.3139 |
\(\overline{Y}_5\) | 0.8 | 129.4760 | 115.5372 | 107.9855 | 123.2988 | 110.9473 | 105.3137 |
\(\overline{Y}_6\) | 0.2 | 129.5812 | 115.5866 | 108.009 | 123.3776 | 110.9805 | 105.3289 |
\(\overline{Y}_6\) | 0.8 | 129.5780 | 115.5852 | 108.0085 | 123.3756 | 110.9798 | 105.3287 |
\(\overline{Y}_7\) | 0.2 | 128.8921 | 115.2619 | 107.8528 | 122.8585 | 110.7607 | 105.2276 |
\(\overline{Y}_7\) | 0.8 | 128.8889 | 115.2605 | 107.8524 | 122.8565 | 110.7600 | 105.2273 |
\(\overline{Y}_8\) | 0.2 | 129.4789 | 115.5385 | 107.9859 | 123.3007 | 110.9480 | 105.3139 |
\(\overline{Y}_8\) | 0.8 | 129.4757 | 115.5371 | 107.9854 | 123.2986 | 110.9473 | 105.3137 |
\(\overline{Y}_9\) | 0.2 | 129.1671 | 115.3917 | 123.0658 | 123.0658 | 110.8486 | 105.2681 |
\(\overline{Y}_9\) | 0.8 | 129.1640 | 115.3903 | 107.9148 | 123.0638 | 110.8479 | 105.2679 |
\(\overline{Y}_{10}\) | 0.2 | 129.4783 | 115.5382 | 107.9857 | 123.3001 | 110.9478 | 105.3138 |
\(\overline{Y}_{10}\) | 0.8 | 129.4751 | 115.5368 | 107.9853 | 123.2981 | 110.9470 | 105.3136 |
\(\overline{Y}_{11}\) | 0.2 | 129.2297 | 115.4211 | 107.9295 | 123.1130 | 110.8686 | 105.2773 |
\(\overline{Y}_{11}\) | 0.8 | 129.2265 | 115.4197 | 107.9290 | 123.1110 | 110.8678 | 105.2771 |
Table 5. PREs of the estimators with measurement errors, by non-response rate and inverse sampling rate \(k_{2h}\).
Estimator | \(\psi_h\) | 20%, \(k_{2h}=2\) | 20%, \(k_{2h}=4\) | 20%, \(k_{2h}=8\) | 30%, \(k_{2h}=2\) | 30%, \(k_{2h}=4\) | 30%, \(k_{2h}=8\) |
\(\overline{Y}_0\) | 0.2, 0.8 | 100 | 100 | 100 | 100 | 100 | 100 |
\(\overline{Y}_R\) | 0.2 | 121.7851 | 111.8152 | 106.1689 | 117.4368 | 108.4050 | 104.1283 |
\(\overline{Y}_R\) | 0.8 | 121.7837 | 111.8149 | 106.1689 | 117.4360 | 108.4048 | 104.1283 |
\(\overline{Y}_{ER}\) | 0.2 | 126.6432 | 113.9378 | 107.1339 | 119.4025 | 109.2678 | 104.5540 |
\(\overline{Y}_{ER}\) | 0.8 | 126.6402 | 113.9369 | 107.1336 | 119.3995 | 109.2669 | 104.5535 |
\(\overline{Y}_1\) | 0.2 | 127.1178 | 114.4185 | 107.4453 | 121.5170 | 110.1884 | 104.9627 |
\(\overline{Y}_1\) | 0.8 | 127.1123 | 114.4164 | 107.4444 | 121.5132 | 110.1871 | 104.9622 |
\(\overline{Y}_2\) | 0.2 | 127.1235 | 114.4213 | 107.4466 | 121.5214 | 110.1902 | 104.9635 |
\(\overline{Y}_2\) | 0.8 | 127.1181 | 114.4191 | 107.4457 | 121.5176 | 110.1889 | 104.9630 |
\(\overline{Y}_3\) | 0.2 | 127.1179 | 114.4186 | 107.4453 | 121.5171 | 110.1884 | 104.9627 |
\(\overline{Y}_3\) | 0.8 | 127.1125 | 114.4165 | 107.4444 | 121.5133 | 110.1871 | 104.9622 |
\(\overline{Y}_4\) | 0.2 | 123.8424 | 112.8318 | 106.6707 | 119.0193 | 109.1042 | 104.4571 |
\(\overline{Y}_4\) | 0.8 | 123.8376 | 112.8299 | 106.6699 | 119.0159 | 109.1030 | 104.4566 |
\(\overline{Y}_5\) | 0.2 | 127.1178 | 114.4185 | 107.4453 | 121.5170 | 110.1884 | 104.9627 |
\(\overline{Y}_5\) | 0.8 | 127.1123 | 114.4164 | 107.4444 | 121.5132 | 110.1871 | 104.9622 |
\(\overline{Y}_6\) | 0.2 | 127.2173 | 114.4661 | 107.4683 | 121.5925 | 110.2207 | 104.9777 |
\(\overline{Y}_6\) | 0.8 | 127.2118 | 114.4640 | 107.4675 | 121.5887 | 110.2194 | 104.9772 |
\(\overline{Y}_7\) | 0.2 | 126.8219 | 114.2767 | 107.3765 | 121.2925 | 110.0918 | 104.9179 |
\(\overline{Y}_7\) | 0.8 | 126.8165 | 114.2746 | 107.3756 | 121.2887 | 110.0906 | 104.9174 |
\(\overline{Y}_8\) | 0.2 | 127.1180 | 114.4186 | 107.4453 | 121.5172 | 110.1884 | 104.9627 |
\(\overline{Y}_8\) | 0.8 | 127.1125 | 114.4165 | 107.4444 | 121.5134 | 110.1871 | 104.9622 |
\(\overline{Y}_9\) | 0.2 | 126.8822 | 114.3056 | 107.3905 | 121.3382 | 110.1115 | 104.927 |
\(\overline{Y}_9\) | 0.8 | 126.8767 | 114.3035 | 107.3896 | 121.3345 | 110.1102 | 104.9265 |
\(\overline{Y}_{10}\) | 0.2 | 127.1186 | 114.4189 | 107.4455 | 121.5177 | 110.1886 | 104.9628 |
\(\overline{Y}_{10}\) | 0.8 | 127.1132 | 114.4168 | 107.4446 | 121.5139 | 110.1873 | 104.9623 |
\(\overline{Y}_{11}\) | 0.2 | 126.9326 | 114.3298 | 107.4022 | 121.3765 | 110.1280 | 104.9347 |
\(\overline{Y}_{11}\) | 0.8 | 126.9272 | 114.3277 | 107.4014 | 121.3728 | 110.1267 | 104.9341 |
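The trends described for the first data set can be checked mechanically against the tabulated values. The following is a minimal sketch in Python, not part of the original study: it copies the PREs of \(\overline{Y}_6\) at \(\psi_h = 0.2\) from the two PRE tables above, and the dictionary and function names are illustrative assumptions.

```python
# PREs of the proposed estimator Y_6 at psi_h = 0.2, copied from the two
# PRE tables above. Keys are (non-response rate in percent, k_2h).
pre_without_me = {
    (20, 2): 129.5812, (20, 4): 115.5866, (20, 8): 108.009,
    (30, 2): 123.3776, (30, 4): 110.9805, (30, 8): 105.3289,
}
pre_with_me = {
    (20, 2): 127.2173, (20, 4): 114.4661, (20, 8): 107.4683,
    (30, 2): 121.5925, (30, 4): 110.2207, (30, 8): 104.9777,
}

K_VALUES = (2, 4, 8)


def decreasing_in_k(table, rate):
    """True if the PRE strictly decreases as k_2h grows at a fixed rate."""
    values = [table[(rate, k)] for k in K_VALUES]
    return all(a > b for a, b in zip(values, values[1:]))


def decreasing_in_rate(table):
    """True if the PRE at 30% non-response is below the PRE at 20% for every k_2h."""
    return all(table[(20, k)] > table[(30, k)] for k in K_VALUES)


def lower_with_me(without, with_me):
    """True if every PRE with measurement errors is below its counterpart without."""
    return all(with_me[key] < without[key] for key in without)


checks = {
    "decreasing in k_2h (no ME, 20%)": decreasing_in_k(pre_without_me, 20),
    "decreasing in k_2h (no ME, 30%)": decreasing_in_k(pre_without_me, 30),
    "decreasing in k_2h (ME, 20%)": decreasing_in_k(pre_with_me, 20),
    "decreasing in k_2h (ME, 30%)": decreasing_in_k(pre_with_me, 30),
    "decreasing in non-response rate (no ME)": decreasing_in_rate(pre_without_me),
    "decreasing in non-response rate (ME)": decreasing_in_rate(pre_with_me),
    "lower with measurement errors": lower_with_me(pre_without_me, pre_with_me),
}

for name, ok in checks.items():
    print(f"{name}: {ok}")
```

The same helpers can be pointed at any other row of the tables by swapping in that estimator's twelve values.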
The data consist of two strata of sizes \(N_1=480\) and \(N_2=174\), with Y as forced expiratory volume, X as age (in years), and gender as an auxiliary attribute. The scrambling variable is taken to be smoking status (Yes = 1, No = 0) and is used to generate the response variable. The study variable, the auxiliary variable, and the auxiliary attribute are all positively correlated (bi-serial correlation where the attribute is involved). Tables 6 and 7 present the population statistics for this data set.
Parameter | \(\overline{X}_h\) | \(\overline{Y}_h\) | \(S^2_{Xh}\) | \(S^2_{Yh}\) | \(S^2_{Ph}\) | \(\rho_{XZh}\) | \(\rho_{XPh}\) | \(\rho_{ZPh}\) | \(S^2_{Th}\) |
Stratum 1 | 8.558333 | 2.363715 | 3.604106 | 0.5254207 | 0.2503653 | 0.7239923 | 0.2999931 | 0.8365375 | 26.04856 |
Stratum 2 | 13.71839 | 3.763615 | 3.301741 | 0.7556429 | 0.2511461 | 0.3619965 | 0.7201403 | 0.4809902 | 20.19661 |
Non-response rate | Stratum | \(S^2_{Yh(2)}\) | \(S^2_{Th(2)}\) |
20% | 1 | 0.5833481 | 25.62859 |
20% | 2 | 0.4723521 | 17.39198 |
30% | 1 | 0.5804701 | 27.00072 |
30% | 2 | 0.6125016 | 20.19661 |
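As a quick illustration of how the stratum-level statistics combine, the sketch below computes the stratum weights \(W_h = N_h/N\) and the implied overall population means of X and Y from the stratum sizes and stratum means reported above. It assumes the usual stratified-sampling definitions; the variable and function names are illustrative and do not come from the original study.

```python
# Stratum sizes and stratum means copied from the text and the population
# statistics table above (X = age in years, Y = forced expiratory volume).
N_h = [480, 174]
x_bar_h = [8.558333, 13.71839]
y_bar_h = [2.363715, 3.763615]


def stratum_weights(sizes):
    """Stratum weights W_h = N_h / N, where N is the total population size."""
    total = sum(sizes)
    return [n / total for n in sizes]


def stratified_mean(sizes, means):
    """Overall population mean as the weighted sum of stratum means."""
    return sum(w * m for w, m in zip(stratum_weights(sizes), means))


W_h = stratum_weights(N_h)
overall_x = stratified_mean(N_h, x_bar_h)
overall_y = stratified_mean(N_h, y_bar_h)

print("Stratum weights:", [round(w, 4) for w in W_h])
print("Overall mean of X (age):", round(overall_x, 4))
print("Overall mean of Y (FEV):", round(overall_y, 4))
```

With \(N = 480 + 174 = 654\), the first stratum carries roughly 73% of the weight, so the overall means sit closer to the stratum 1 values.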
Tables 8 and 9 summarize the PREs without and with measurement errors, respectively, at different sensitivity levels. In both tables, the PREs decrease as the inverse sampling rate and the non-response rate increase. The PREs also decline when non-response and measurement errors are present simultaneously. For example, at 20% non-response, \(k_{2h}=2\), and \(\psi_h=0.2\), the PRE of \(\overline{Y}_{10}\) is 114.2515 without measurement errors and decreases to 100.2539 when non-response and measurement errors are present simultaneously.
Furthermore, the PREs decrease as the sensitivity level increases in the case without measurement errors. The proposed estimators perform better than the adapted estimators both without and with measurement errors.
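To make the size of this drop concrete, the sketch below converts the two PRE values quoted for \(\overline{Y}_{10}\) into implied MSE ratios. It assumes that the PRE is defined relative to \(\overline{Y}_0\) as \(100 \times \mathrm{MSE}(\overline{Y}_0)/\mathrm{MSE}(\overline{Y}_i)\), which is consistent with \(\overline{Y}_0\) having a PRE of 100 in every table; the function name is illustrative.

```python
def mse_ratio(pre):
    """MSE(estimator) / MSE(Y_0), assuming PRE = 100 * MSE(Y_0) / MSE(estimator)."""
    return 100.0 / pre


# PREs of Y_10 at 20% non-response, k_2h = 2, psi_h = 0.2, as quoted in the text.
pre_without_me = 114.2515
pre_with_me = 100.2539

for label, pre in [("without measurement errors", pre_without_me),
                   ("with measurement errors", pre_with_me)]:
    ratio = mse_ratio(pre)
    reduction = 100.0 * (1.0 - ratio)
    print(f"{label}: MSE ratio = {ratio:.4f}, MSE reduction = {reduction:.2f}%")
```

Under this assumption, the MSE of \(\overline{Y}_{10}\) is roughly 12% below that of \(\overline{Y}_0\) without measurement errors, but well under 1% below it once measurement errors are present.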
Table 8. PREs of the estimators without measurement errors, by non-response rate and inverse sampling rate \(k_{2h}\).
Estimator | \(\psi_h\) | 20%, \(k_{2h}=2\) | 20%, \(k_{2h}=4\) | 20%, \(k_{2h}=8\) | 30%, \(k_{2h}=2\) | 30%, \(k_{2h}=4\) | 30%, \(k_{2h}=8\) |
\(\overline{Y}_0\) | 0.2, 0.8 | 100 | 100 | 100 | 100 | 100 | 100 |
\(\overline{Y}_R\) | 0.2 | 113.6230 | 107.9472 | 104.3350 | 111.4729 | 105.9844 | 103.0583 |
\(\overline{Y}_R\) | 0.8 | 112.4303 | 107.3953 | 104.0855 | 110.5494 | 105.6100 | 102.8971 |
\(\overline{Y}_{ER}\) | 0.2 | 110.8210 | 106.3781 | 103.5022 | 109.1488 | 104.8202 | 102.4767 |
\(\overline{Y}_{ER}\) | 0.8 | 109.8950 | 105.9412 | 103.3022 | 108.4265 | 104.5218 | 102.3468 |
\(\overline{Y}_1\) | 0.2 | 114.2630 | 108.3011 | 104.5213 | 112.0013 | 106.2459 | 103.1881 |
\(\overline{Y}_1\) | 0.8 | 113.0079 | 107.7228 | 104.2606 | 111.0310 | 105.8542 | 103.0198 |
\(\overline{Y}_2\) | 0.2 | 114.2589 | 108.2988 | 104.5201 | 111.9979 | 106.2442 | 103.1873 |
\(\overline{Y}_2\) | 0.8 | 113.0042 | 107.7207 | 104.2595 | 111.0279 | 105.8527 | 103.0190 |
\(\overline{Y}_3\) | 0.2 | 114.1979 | 108.2652 | 104.5024 | 111.9476 | 106.2194 | 103.1749 |
\(\overline{Y}_3\) | 0.8 | 112.9492 | 107.6896 | 104.2429 | 110.9821 | 105.8295 | 103.0074 |
\(\overline{Y}_4\) | 0.2 | 113.8930 | 108.0967 | 104.4138 | 111.6959 | 106.0949 | 103.1132 |
\(\overline{Y}_4\) | 0.8 | 112.6741 | 107.5337 | 104.1596 | 110.7528 | 105.7133 | 102.9490 |
\(\overline{Y}_5\) | 0.2 | 114.2630 | 108.3011 | 104.5213 | 112.0013 | 106.2459 | 103.1881 |
\(\overline{Y}_5\) | 0.8 | 113.0079 | 107.7228 | 104.2606 | 111.0310 | 105.8542 | 103.0198 |
\(\overline{Y}_6\) | 0.2 | 114.1794 | 108.2549 | 104.4970 | 111.9323 | 106.2118 | 103.1712 |
\(\overline{Y}_6\) | 0.8 | 112.9324 | 107.6801 | 104.2378 | 110.9682 | 105.8224 | 103.0039 |
\(\overline{Y}_7\) | 0.2 | 114.1764 | 108.2533 | 104.4962 | 111.9298 | 106.2106 | 103.1706 |
\(\overline{Y}_7\) | 0.8 | 112.9298 | 107.6786 | 104.2370 | 110.9660 | 105.8213 | 103.0033 |
\(\overline{Y}_8\) | 0.2 | 114.2354 | 108.2858 | 104.5133 | 111.9785 | 106.2346 | 103.1825 |
\(\overline{Y}_8\) | 0.8 | 112.9830 | 107.7087 | 104.2531 | 111.0103 | 105.8437 | 103.0146 |
\(\overline{Y}_9\) | 0.2 | 114.1152 | 108.2195 | 104.4784 | 111.8793 | 106.1856 | 103.1582 |
\(\overline{Y}_9\) | 0.8 | 112.8745 | 107.6473 | 104.2203 | 110.9199 | 105.7980 | 102.9916 |
\(\overline{Y}_{10}\) | 0.2 | 114.2515 | 108.2947 | 104.5179 | 111.9917 | 106.2412 | 103.1858 |
\(\overline{Y}_{10}\) | 0.8 | 112.9975 | 107.7169 | 104.2575 | 111.0224 | 105.8498 | 103.0176 |
\(\overline{Y}_{11}\) | 0.2 | 114.1859 | 108.2585 | 104.4989 | 111.9377 | 106.2145 | 103.1725 |
\(\overline{Y}_{11}\) | 0.8 | 112.9383 | 107.6834 | 104.2396 | 110.9731 | 105.8249 | 103.0051 |
Table 9. PREs of the estimators with measurement errors, by non-response rate and inverse sampling rate \(k_{2h}\).
Estimator | \(\psi_h\) | 20%, \(k_{2h}=2\) | 20%, \(k_{2h}=4\) | 20%, \(k_{2h}=8\) | 30%, \(k_{2h}=2\) | 30%, \(k_{2h}=4\) | 30%, \(k_{2h}=8\) |
\(\overline{Y}_0\) | 0.2, 0.8 | 100 | 100 | 100 | 100 | 100 | 100 |
\(\overline{Y}_R\) | 0.2 | 100.2356 | 100.1438 | 100.0808 | 100.1971 | 100.1059 | 100.0550 |
\(\overline{Y}_R\) | 0.8 | 100.2351 | 100.1436 | 100.0807 | 100.1968 | 100.1058 | 100.0550 |
\(\overline{Y}_{ER}\) | 0.2 | 100.2074 | 100.1266 | 100.0711 | 100.1735 | 100.0933 | 100.0485 |
\(\overline{Y}_{ER}\) | 0.8 | 100.2070 | 100.1264 | 100.0710 | 100.1732 | 100.0931 | 100.0484 |
\(\overline{Y}_1\) | 0.2 | 100.2542 | 100.1551 | 100.0872 | 100.2127 | 100.1143 | 100.0594 |
\(\overline{Y}_1\) | 0.8 | 100.2536 | 100.1548 | 100.0870 | 100.2123 | 100.1141 | 100.0593 |
\(\overline{Y}_2\) | 0.2 | 100.2541 | 100.1551 | 100.0871 | 100.2126 | 100.1142 | 100.0593 |
\(\overline{Y}_2\) | 0.8 | 100.2536 | 100.1548 | 100.0870 | 100.2122 | 100.1141 | 100.0593 |
\(\overline{Y}_3\) | 0.2 | 100.2530 | 100.1544 | 100.0868 | 100.2117 | 100.1137 | 100.0591 |
\(\overline{Y}_3\) | 0.8 | 100.2525 | 100.1541 | 100.0866 | 100.2113 | 100.1136 | 100.0590 |
\(\overline{Y}_4\) | 0.2 | 100.2474 | 100.1510 | 100.0848 | 100.2070 | 100.1112 | 100.0578 |
\(\overline{Y}_4\) | 0.8 | 100.2469 | 100.1507 | 100.0847 | 100.2066 | 100.1111 | 100.0577 |
\(\overline{Y}_5\) | 0.2 | 100.2542 | 100.1551 | 100.0872 | 100.2127 | 100.1143 | 100.0594 |
\(\overline{Y}_5\) | 0.8 | 100.2536 | 100.1548 | 100.0870 | 100.2123 | 100.1141 | 100.0593 |
\(\overline{Y}_6\) | 0.2 | 100.2526 | 100.1542 | 100.0866 | 100.2114 | 100.1136 | 100.0590 |
\(\overline{Y}_6\) | 0.8 | 100.2521 | 100.1539 | 100.0865 | 100.2110 | 100.1134 | 100.0589 |
\(\overline{Y}_7\) | 0.2 | 100.2526 | 100.1541 | 100.0866 | 100.2113 | 100.1136 | 100.0590 |
\(\overline{Y}_7\) | 0.8 | 100.2521 | 100.1539 | 100.0865 | 100.2110 | 100.1134 | 100.0589 |
\(\overline{Y}_8\) | 0.2 | 100.2537 | 100.1548 | 100.0870 | 100.2122 | 100.1141 | 100.0592 |
\(\overline{Y}_8\) | 0.8 | 100.2531 | 100.1545 | 100.0869 | 100.2119 | 100.1139 | 100.0592 |
\(\overline{Y}_9\) | 0.2 | 100.2515 | 100.1535 | 100.0862 | 100.2104 | 100.1131 | 100.0587 |
\(\overline{Y}_9\) | 0.8 | 100.2509 | 100.1532 | 100.0861 | 100.2100 | 100.1129 | 100.0587 |
\(\overline{Y}_{10}\) | 0.2 | 100.2539 | 100.1550 | 100.0871 | 100.2125 | 100.1142 | 100.0593 |
\(\overline{Y}_{10}\) | 0.8 | 100.2534 | 100.1547 | 100.0870 | 100.2121 | 100.1140 | 100.0592 |
\(\overline{Y}_{11}\) | 0.2 | 100.2527 | 100.1542 | 100.0867 | 100.2115 | 100.1137 | 100.0590 |
\(\overline{Y}_{11}\) | 0.8 | 100.2522 | 100.1540 | 100.0866 | 100.2111 | 100.1135 | 100.0590 |
This study addresses the challenge of estimating the finite population mean of a sensitive study variable when non-response and measurement errors are present simultaneously. A general class of estimators is proposed using an auxiliary attribute and an auxiliary variable. The bias and mean squared error (MSE) of the proposed estimator are derived up to the first degree of approximation. In the numerical study, the proposed estimator outperforms the adapted ordinary estimator, ratio estimator, and exponential ratio-type estimator. Furthermore, the mean squared errors of the proposed estimators increase as the non-response rate and the inverse sampling rate increase. Finally, the efficiency of the estimators of the population mean decreases when non-response and measurement errors are present simultaneously.
Warner, S. L. (1965). Randomized response: A survey for eliminating evasive answer bias. Journal of the American Statistical Association, 60(309), 63-69.
Chaudhuri, A., & Mukherjee, R. (1988). Randomized Response: Theory and Techniques. Marcel Dekker, New York.
Gupta, S., & Shabbir, J. (2004). Sensitivity estimation for personal interview survey questions. Statistica, 64(4), 643-653.
Gupta, S., Shabbir, J., & Sehra, S. (2010). Mean and sensitivity estimation in optional randomized response models. Journal of Statistical Planning and Inference, 140(10), 2870-2874.
Mehta, S., Dass, B. K., Shabbir, J., & Gupta, S. (2012). A three-stage optional randomized response model. Journal of Statistical Theory and Practice, 6(3), 412-427.
Eichhorn, B. H., & Hayre, L. S. (1983). Scrambled randomized response methods for obtaining sensitive quantitative data. Journal of Statistical Planning and Inference, 7(4), 307-316.
Gupta, S., & Shabbir, J. (2008). On improvement in estimating the population mean in simple random sampling. Journal of Applied Statistics, 35(5), 559-566.
Gupta, S., Shabbir, J., & Sehra, S. (2012). Mean and sensitivity estimation in optional randomized response models. Journal of Statistical Planning and Inference, 140, 2870-2874.
Sousa, R., Shabbir, J., Rael, P. C., & Gupta, S. (2010). Ratio estimation of the mean of a sensitive variable in the presence of auxiliary information. Journal of Statistics Theory and Practice, 4(3), 495-507.
Zatezalo, T. (2017). Generalized mixture estimator of the mean of a sensitive variable in the presence of non-sensitive auxiliary information. Statistics and Applications, 15(1&2), 23-36.
Mushtaq, N., Noor-ul-Amin, M., & Hanif, M. (2016). Estimation of population mean of a sensitive variable in stratified two-phase sampling. Pakistan Journal of Statistics, 32(1), 393-404.
Mushtaq, N., Noor-ul-Amin, M., & Hanif, M. (2017). A family of estimators of a sensitive variable using auxiliary information in stratified random sampling. Pakistan Journal of Operation Research, 13(1), 141-155.
Mushtaq, N., & Noor-ul-Amin, M. (2020). Joint influence of double sampling and randomized response technique on estimation method of mean. Applied Mathematics, 10(1), 12-19.
Hansen, M., & Hurwitz, W. (1946). The problem of non-response in sample surveys. Journal of American Statistical Association, 41, 517-529.
Khalil, S., Noor-ul-Amin, M., & Hanif, M. (2018). Estimation of population mean for a sensitive variable in the presence of measurement error. Journal of Statistics and Management Systems, 21(1), 81-91.
Khalil, S., Gupta, S., & Hanif, M. (2018). Estimation of finite population mean in stratified sampling using scrambled responses in the presence of measurement errors. Communications in Statistics – Theory and Methods, 48(6), 1553-1561.
Khalil, S., Zhang, Q., & Gupta, S. (2019). Mean estimation of sensitive variables under measurement errors using optional RRT models. Communications in Statistics – Simulation and Computation. DOI: 10.1080/03610918.2019.1584298
Onyango, R., Oduor, B., & Odundo, F. (2021). Joint influence of measurement errors and randomized response technique on mean estimation under stratified double sampling. Open Journal of Mathematical Science, 5(1), 192-199.
Naeem, N., & Shabbir, J. (2018). Use of a scrambled response on two occasion’s successive sampling under nonresponse. Hacettepe Journal of Mathematics and Statistics, 47(3), 675-684.
Zahid, E., & Shabbir, J. (2019). Estimation of finite population mean for a sensitive variable using dual auxiliary information in the presence of measurement errors. PLoS ONE, 14(2), e0212111.
Zhang, Q., Khalil, S., & Gupta, S. (2020). Mean estimation of sensitive variables under non-response and measurement errors using optional RRT models. Journal of Statistical Theory and Practice, 15.
Zhang, Q., Khalil, S., & Gupta, S. (2021). Mean estimation in the simultaneous presence of measurement errors and non-response using optional RRT models under stratified sampling. Journal of Statistical Computation and Simulation, 91, 3492-3504.
Tukey, J. W. (1970). Exploratory Data Analysis. Addison-Wesley Publishing Co., Reading, MA, USA.
Hodges, J. L., & Lehmann, E. L. (1963). Estimates of location based on rank tests. E., Salvemini, T., Eds.; Libreria Eredi Virgilio Veschi: Rome, Italy,
Cochran, W. G. (1940). The estimation of the yields of the cereal experiments by sampling for the ratio of grain to total produce. Journal of Agricultural Science, 59, 1225-1226.
Bahl, S., & Tuteja, R. (1991). Ratio and product type exponential estimators. Journal of Information and Optimization Sciences, 12(1), 159-164.
Onyango, R. (2022). Mean estimation of a sensitive variable under nonresponse using three-stage RRT model in stratified two-phase sampling. Journal of Probability and Statistics, 2022. https://doi.org/10.1155/2022/4530120
Rosner, B. (2015). Fundamentals of Biostatistics. Duxbury Press.