
Estimation of finite population mean of a sensitive variable using three-stage ORRT in the presence of non-response and measurement errors

Author(s): Ronald Onyango¹, Samuel B. Apima², Amos Wanjara²
¹Department of Applied Statistics, Financial Mathematics and Actuarial Science, Jaramogi Oginga Odinga University of Science and Technology, Kenya
²Department of Mathematics and Statistics, Kaimosi Friends University, Kenya
Copyright © Ronald Onyango, Samuel B. Apima, Amos Wanjara. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

The purpose of this study is to present a generalized class of estimators of the finite population mean of a sensitive study variable using the three-stage Optional Randomized Response Technique (ORRT) in the presence of non-response and measurement errors. The proposed estimator makes use of dual auxiliary information (an auxiliary variable and an auxiliary attribute). Expressions for the bias and mean square error of the proposed estimator are derived using a Taylor series expansion. The applicability of the proposed estimator is demonstrated using real data sets. A numerical study is used to compare the efficiency of the proposed estimator with that of adapted estimators of the finite population mean. The suggested estimator performs better than the adapted ordinary, ratio, and exponential ratio-type estimators in the presence of both non-response and measurement errors. The efficiency of the proposed estimator declines as the inverse sampling rate, the non-response rate, and the sensitivity level of the survey question increase.

Keywords: Sensitivity level; non-response; measurement errors; bias; efficiency

1. Introduction

In a survey, it is challenging to collect accurate information on a sensitive study variable that has a socially stigmatizing characteristic, for example “Have you ever had an abortion?”, “How much money do you make?”, “Have you ever been infected with a sexually transmitted disease?”, or “Are you a drug addict?”. Obtaining correct answers to such questions through direct questioning is difficult because the respondent’s privacy is not protected. Most respondents will either deliberately give a false answer or refuse to respond for fear of embarrassment or loss of social status.

The Randomized Response Technique (RRT) was pioneered by Warner [1]; its main objective is to reduce response bias in surveys involving a sensitive question. In RRT, a scrambled variable is used to estimate the finite population mean of a sensitive variable. The scrambling variable is assumed to be independent of both the study variable and the non-sensitive auxiliary variable. The respondent gives a direct response on the non-sensitive auxiliary variable and a scrambled response on the study variable.

The Optional Randomized Response Technique (ORRT) was pioneered by Chaudhuri and Mukherjee [2]. The technique gives the respondent the option of providing either a scrambled or a direct response to a sensitive question. In one-stage ORRT [3], a respondent gives a scrambled response if they consider the question sensitive and a direct response otherwise.

Two-stage ORRT aims at increasing respondent privacy and participation in a survey involving a sensitive study variable [4]. In two-stage ORRT, a known proportion \(t_h\) of respondents is requested to respond directly to the sensitive question while maintaining anonymity, and the remaining respondents provide a scrambled response using an additive model. The main drawback of two-stage ORRT is that it requires a large value of \(t_h\), especially when the underlying question is highly sensitive.

The three-stage ORRT [5] aims to promote respondent privacy and cooperation in a survey involving a sensitive question. A known, predetermined proportion \(t_h\) of respondents is asked to give a true response to the sensitive question, and another predetermined proportion \(f_h\) is asked to provide a scrambled response. The remaining proportion \(1-t_h-f_h\) is given the option of providing either a scrambled or a direct response. In the additive three-stage ORRT, the reported response is defined as \[\label{GrindEQ__1_} Z_{hi}=\left\{ \begin{array}{ll} Y_{hi}, & \text{with probability } t_h+\left(1-t_h-f_h\right)\left(1-{\psi }_h\right), \\ Y_{hi}+S_{hi}, & \text{with probability } f_h+\left(1-t_h-f_h\right){\psi }_h, \end{array} \right. \tag{1}\] where \(Y_{hi}\) is the study variable, \({\psi }_h\) is the sensitivity level, and \(S_{hi}\) is a scrambling variable that is normally distributed with mean 0 and variance \(S^2_{Sh}\). The mean and variance of \(Z_{hi}\) are \(E\left(Z_{hi}\right)=E\left(Y_{hi}\right)\) and \(S^2_{Zh}=S^2_{Yh}+{\varphi }_hS^2_{Sh}\), respectively, where \({\varphi }_h=f_h+{\psi }_h\left(1-t_h-f_h\right)\).
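The three-stage mechanism in equation (1) can be simulated directly. The following is a minimal Python sketch, not the authors' R code. It takes \({\varphi }_h=f_h+{\psi }_h\left(1-t_h-f_h\right)\), and the function names, seed, and demo settings are illustrative assumptions: \(t_h=0.3\), \(f_h=0.2\), \({\psi }_h=0.8\), \(S^2_{Sh}=2\), and a single fixed true value \(Y=10\). Because the true value is held fixed, the sample mean of the reported responses should be close to \(Y\) and their variance close to \({\varphi }_hS^2_{Sh}\).

```python
import random
import statistics


def phi(t_h, f_h, psi_h):
    # Probability that a respondent reports Y + S (forced or optional scrambling)
    return f_h + psi_h * (1.0 - t_h - f_h)


def orrt_response(y, t_h, f_h, psi_h, sd_s, rng):
    """Return one reported value Z for a respondent whose true value is y."""
    u = rng.random()
    if u < t_h:                  # stage 1: asked to answer truthfully
        return y
    if u < t_h + f_h:            # stage 2: asked to give a scrambled answer
        return y + rng.gauss(0.0, sd_s)
    if rng.random() < psi_h:     # stage 3: scrambles only if the question feels sensitive
        return y + rng.gauss(0.0, sd_s)
    return y                     # stage 3: answers directly


# Illustrative settings (hypothetical, chosen only for this demo)
rng = random.Random(2024)
t_h, f_h, psi_h, var_s = 0.3, 0.2, 0.8, 2.0
y_true = 10.0                    # one fixed true value, so Var(Y) = 0
z = [orrt_response(y_true, t_h, f_h, psi_h, var_s ** 0.5, rng) for _ in range(50_000)]

print(phi(t_h, f_h, psi_h))      # about 0.6
print(statistics.fmean(z))       # close to E(Z) = E(Y) = 10
print(statistics.variance(z))    # close to phi_h * S^2_Sh = 1.2
```

Drawing a single uniform number for the stage assignment keeps the three stages mutually exclusive, with probabilities \(t_h\), \(f_h\), and \(1-t_h-f_h\).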

In the literature, researchers who have studied the estimation of the finite population mean of a sensitive study variable using non-optional RRT include Eichhorn and Hayre [6], Gupta and Shabbir [7], Gupta et al. [8], Sousa et al. [9], Zatezalo [10], Mushtaq et al. [11], Mushtaq et al. [12], and Mushtaq and Noor-Ul-Amin [13].

Non-response refers to the failure of specific units to provide information because of unwillingness to participate, illness, or absence. Non-response in a survey reduces the number of observed units, which increases the variance of an estimator of the finite population mean. Hansen and Hurwitz [14] proposed a subsampling strategy for obtaining data from non-responding units in a postal survey, in which additional effort is made to collect data directly from a subsample of the non-responding units. This strategy is used in this study to handle non-response.

Measurement errors can occur during the data collection and recording phases of a survey. Measurement error is the difference between the true value of a variable and the value reported in the survey. When measurement errors occur, the data become contaminated, resulting in under- or overestimated parameters during analysis.

Khalil [15] discusses the estimation of the finite population mean in the presence of measurement errors in simple random sampling based on non-optional RRT. The estimation of the finite population mean using a non-optional RRT model in the presence of measurement errors under stratified random sampling is addressed by Khalil et al. [16]. Khalil et al. [17] extended this work to estimate the finite population mean using one-stage optional RRT in the presence of measurement errors. Recently, Onyango et al. [18] studied the estimation of the finite population mean in the presence of measurement errors using non-optional RRT in stratified two-phase sampling.

Naeem and Shabbir [19] and Zahid and Shabbir [20] discuss the problem of estimation of the finite population mean of a sensitive variable using non-optional RRT in the presence of measurement errors and non-response simultaneously. Zhang et al. [21] studied the problem of estimating the finite population mean using one-stage RRT in the presence of measurement errors and non-response simultaneously in simple random sampling. Recently, Zhang et al. [22] studied the estimation of the finite population mean of a sensitive study variable using ORRT under measurement errors and non-response in stratified random sampling.

The present study fills a gap in the literature by estimating the finite population mean of a sensitive study variable using an additive three-stage ORRT model in the presence of measurement errors and non-response simultaneously.

In this paper, section two describes the population and notations used in this study. The existing estimators of population mean under three-stage ORRT models are described in section three. The proposed estimator and its properties of bias and mean square error are discussed in section four. Section five looks at the theoretical efficiency of the proposed estimator. Section six performs a numerical analysis of the proposed estimator’s performance. Finally, section seven contains the conclusions of the study.

2. Notations

Consider a finite population \(U=\left\{U_1,\ U_2,\ \dots ,\ U_N\right\}\) of size \(N\), divided into \(L\) strata, where the \(h^{th}\) stratum contains \(N_h\) units. The population comprises a sensitive study variable, an auxiliary variable, and a scrambled response, denoted by Y, X, and Z, respectively. Let \(\left(X_{hi},Y_{hi},Z_{hi}\right)\) and \(\left(x_{hi},y_{hi},z_{hi}\right)\) denote the \(i^{th}\) values of X, Y, and Z in the \(h^{th}\) population stratum and sample stratum, respectively. Furthermore, let \({\overline{X}}_h\), \({\overline{Y}}_h\), and \({\overline{Z}}_h\) denote the population means of X, Y, and Z, respectively, in the \(h^{th}\) stratum. The population variances of the scrambled response and the auxiliary variable are \(S^2_{Zh}=\frac{1}{N_h-1}\sum^{N_h}_{i=1}{{\left(Z_{hi}-{\overline{Z}}_h\right)}^2}\) and \(S^2_{Xh}=\frac{1}{N_h-1}\sum^{N_h}_{i=1}{{\left(X_{hi}-{\overline{X}}_h\right)}^2}\), respectively. Let \(S_{ZXh}\) denote the covariance between the scrambled response and the auxiliary variable, and let \({\rho }_{ZXh}\) denote the corresponding correlation coefficient.

Auxiliary information may also be available in a survey as an attribute. Let \({\tau }_{hij}\) denote the value of the \(j^{th}\) attribute for the \(i^{th}\) unit (\(i=1,2,\dots ,N_h\)) in the \(h^{th}\) stratum. It takes the value 1 if the unit possesses the attribute and 0 otherwise. Furthermore, let \(A_{hj}=\sum^{N_h}_{i=1}{{\tau }_{hij}}\) and \(P_h=\frac{A_{hj}}{N_h}\) be the number and the proportion, respectively, of units in the \(h^{th}\) stratum population possessing the attribute. Additionally, let \(S^2_{Ph}\) denote the population variance of the auxiliary attribute. Also, let \(S_{ZPh}\) and \(S_{XPh}\) denote the covariances, and \({\rho }_{ZPh}\) and \({\rho }_{XPh}\) the correlation coefficients, between the variables indicated by their subscripts.

In the presence of non-response, let \(N_{1h}\) and \(N_{2h}\) be the numbers of responding and non-responding units, respectively, in the \(h^{th}\) stratum population. Let \(S^2_{Yh(2)}\) denote the population variance of the sensitive variable for the non-responding units in the \(h^{th}\) stratum.

A relatively large first-phase sample of size \(n'_h\) is drawn from the \(h^{th}\) stratum population using simple random sampling without replacement (SRSWOR). The first-phase sample mean of the auxiliary variable is \({\overline{x}}'_h=\frac{1}{n'_h}\sum^{n'_h}_{i=1}{x_{hi}}\), and the proportion of first-phase units possessing the auxiliary attribute is \(p'_h=\frac{a_{hj}}{n'_h}\), where \(a_{hj}\) is the number of sampled units possessing the attribute. A second-phase random sample of size \(n_h\) is drawn from the first-phase sample. Of these units, \(n_{1h}\) respond and the remaining \(n_{2h}\) do not. Let \({\overline{x}}_h=\frac{1}{n_h}\sum^{n_h}_{i=1}{x_{hi}}\) be the second-phase sample mean and \(p_h=\frac{a_{hj}}{n_h}\) the proportion of second-phase units possessing the auxiliary attribute. Also, let \({\overline{z}}_{1h}=\frac{1}{n_{1h}}\sum^{n_{1h}}_{i=1}{z_{hi}}\) be the sample mean of the responding group in the second-phase sample. A subsample of size \(r_{2h}=\frac{n_{2h}}{k_{2h}}\), where \(k_{2h}\) is the inverse sampling rate, is drawn from the non-responding units, and \({\overline{z}}_{2h}=\frac{1}{r_{2h}}\sum^{r_{2h}}_{i=1}{z_{hi}}\) denotes the subsample mean. The estimator of the population mean of the scrambled response is

\({\overline{z}}^*_h=w_{1h}{\overline{z}}_{1h}+w_{2h}{\overline{z}}_{2h}\), where \(w_{1h}=\frac{n_{1h}}{n_h}\) and \(w_{2h}=\frac{n_{2h}}{n_h}\).
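The Hansen and Hurwitz combination \({\overline{z}}^*_h\) can be computed as in the minimal Python sketch below. It is illustrative only. The data values are hypothetical, the non-respondent list stands for values that would be obtained only if those units were re-contacted, and rounding \(r_{2h}\) down to an integer (at least one unit) is an assumption that the text does not specify.

```python
import random
import statistics


def hansen_hurwitz_mean(z_respondents, z_subsample, n_2h):
    """z*_h = w_1h * zbar_1h + w_2h * zbar_2h with w_1h = n_1h/n_h, w_2h = n_2h/n_h."""
    n_1h = len(z_respondents)
    n_h = n_1h + n_2h
    w_1h, w_2h = n_1h / n_h, n_2h / n_h
    return w_1h * statistics.fmean(z_respondents) + w_2h * statistics.fmean(z_subsample)


def draw_subsample(z_nonrespondents, k_2h, rng):
    """SRSWOR subsample of size r_2h = n_2h / k_2h (rounded down, at least one unit)."""
    r_2h = max(1, int(len(z_nonrespondents) / k_2h))
    return rng.sample(z_nonrespondents, r_2h)


respondents = [4.0, 6.0, 5.0, 5.0]          # n_1h = 4 responding units (hypothetical)
nonrespondents = [10.0, 12.0, 14.0, 16.0]   # n_2h = 4 non-responding units (hypothetical)
sub = draw_subsample(nonrespondents, 2, random.Random(1))   # r_2h = 2 units re-contacted
print(hansen_hurwitz_mean(respondents, sub, len(nonrespondents)))

# With a fixed subsample {12, 16}: 0.5 * 5 + 0.5 * 14 = 9.5
print(hansen_hurwitz_mean(respondents, [12.0, 16.0], 4))
```

The subsample mean receives the full non-response weight \(w_{2h}\). Only \(r_{2h}\) of the \(n_{2h}\) non-respondents are observed, but each represents \(k_{2h}\) of them.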

In the presence of measurement errors, let \(Z^*_{hi}\) and \(z^*_{hi}\) be the true and observed values, respectively, of the scrambled response, and let \(T^*_{hi}=z^*_{hi}-Z^*_{hi}\) denote the associated measurement error. These measurement errors are assumed to be normally distributed with mean 0 and variance \(S^2_{Th}\). Furthermore, let \(S^2_{Th(2)}\) be the measurement-error variance for the non-responding units.
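The effect of the measurement-error model on the observed values can be illustrated with a short simulation. This is a Python sketch under hypothetical assumptions: the true values are drawn from \(N(50,10^2)\), and the errors have mean 0 and \(S^2_{Th}=25\). Adding independent errors inflates the variance of the observed values by roughly \(S^2_{Th}\), which is why \(S^2_{Th}\) appears alongside \(S^2_{Yh}\) in the variance term \(B_h\) (equation (28)).

```python
import random
import statistics

rng = random.Random(11)
S2_T = 25.0                                                    # hypothetical S^2_Th
true_z = [rng.gauss(50.0, 10.0) for _ in range(50_000)]        # hypothetical Z*_hi
observed = [z + rng.gauss(0.0, S2_T ** 0.5) for z in true_z]   # z*_hi = Z*_hi + T*_hi
errors = [o - t for o, t in zip(observed, true_z)]             # T*_hi

print(statistics.fmean(errors))                                      # close to 0
print(statistics.variance(observed) - statistics.variance(true_z))   # roughly S^2_Th = 25
```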

In this study, the following conventional and non-conventional measures of the auxiliary variable are used to develop special cases of the proposed generalized class of estimators:

  1. \(C_{Xh}=\frac{S_{Xh}}{{\overline{X}}_h}\), coefficient of variation,

  2. Coefficient of correlation defined as \({\rho }_{XYh}\),

  3. Coefficient of skewness, defined as \({\beta }_{1h}(x)=\frac{N_h\sum^{N_h}_{i=1}{{\left(X_{hi}-{\overline{X}}_h\right)}^3}}{\left(N_h-1\right)\left(N_h-2\right)S^3_{Xh}}\),

  4. Coefficient of kurtosis, defined as \({\beta }_{2h}(x)=\frac{N_h\left(N_h+1\right)\sum^{N_h}_{i=1}{{\left(X_{hi}-{\overline{X}}_h\right)}^4}}{\left(N_h-1\right)\left(N_h-2\right)\left(N_h-3\right)S^4_{Xh}}-\frac{3{\left(N_h-1\right)}^2}{\left(N_h-2\right)\left(N_h-3\right)}\),

  5. Mid-range, defined as \({MR}_h(x)=\frac{x_{h\left(1\right)}+x_{h\left(N_h\right)}}{2}\), where \(x_{h\left(1\right)}\) and \(x_{h\left(N_h\right)}\) are the minimum and maximum values in the \(h^{th}\) stratum,

  6. Quartile deviation, defined as \({QD}_h(x)=\frac{Q_{3h}(x)-Q_{1h}(x)}{2}\),

  7. Tri-mean, proposed by Tukey [23] and defined as \({TM}_h(x)=\frac{Q_{1h}(x)+2Q_{2h}(x)+Q_{3h}(x)}{4}\), where \(Q_{1h}(x)\), \(Q_{2h}(x)\), and \(Q_{3h}(x)\) are the first, second, and third quartiles, respectively,

  8. Hodges-Lehmann [24] estimator, defined as \({HL}_h(x)=\mathrm{Median}\left(\frac{x_{hj}+x_{hk}}{2}\right)\), \(1\le j\le k\le N_h\).
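The measures listed above can be computed for one stratum as in the Python sketch below. It is a sketch only. The quartile rule (`statistics.quantiles` with `method="inclusive"`) is an assumption, since the text does not say which quartile definition was used, and the function name `aux_measures` is hypothetical. The skewness and kurtosis lines follow the formulas in items 3 and 4, with \(S^2_{Xh}\) computed using the divisor \(N_h-1\).

```python
import itertools
import statistics


def aux_measures(x):
    """Conventional and non-conventional measures of an auxiliary variable in one stratum."""
    n = len(x)
    mean = statistics.fmean(x)
    s2 = statistics.variance(x)                     # divisor N_h - 1, as for S^2_Xh
    dev = [xi - mean for xi in x]
    skew = n * sum(d ** 3 for d in dev) / ((n - 1) * (n - 2) * s2 ** 1.5)
    kurt = (n * (n + 1) * sum(d ** 4 for d in dev) / ((n - 1) * (n - 2) * (n - 3) * s2 ** 2)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    q1, q2, q3 = statistics.quantiles(x, n=4, method="inclusive")  # assumed quartile rule
    # Hodges-Lehmann: median of all pairwise averages (x_j + x_k) / 2 with j <= k
    walsh = [(a + b) / 2 for a, b in itertools.combinations_with_replacement(x, 2)]
    return {
        "CV": s2 ** 0.5 / mean,
        "beta1": skew,
        "beta2": kurt,
        "MR": (min(x) + max(x)) / 2,
        "QD": (q3 - q1) / 2,
        "TM": (q1 + 2 * q2 + q3) / 4,
        "HL": statistics.median(walsh),
    }


print(aux_measures([1, 2, 3, 4, 5]))   # symmetric data: beta1 = 0, beta2 approximately -1.2
```

For symmetric data, the mid-range, tri-mean, and Hodges-Lehmann estimator all coincide with the mean. They diverge when the auxiliary variable is skewed, as the COVID-19 case counts are likely to be.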

3. Some Existing Estimators

  1. The adapted ordinary estimator of the population mean is defined as \[\label{GrindEQ__2_} {\overline{Y}}_0=\sum^L_{h=1}{w_h{\overline{z}}^*_h} \tag{2}\]

The variance of the estimator is given as \[\label{GrindEQ__3_} Var({\overline{Y}}_0)\cong \sum^L_{h=1}{W^2_h}B_h \tag{3}\] where \(B_h\) is defined in equation (28).

  2. The adapted Cochran [25] ratio estimator is defined as \[\label{GrindEQ__4_} {\overline{Y}}_R=\sum^L_{h=1}{w_h{\overline{z}}^*_h\frac{{\overline{x}}'_h}{{\overline{x}}_h}} \tag{4}\]

The expression for the bias is given as \[\label{GrindEQ__5_} Bias({\overline{Y}}_R)\cong \sum^L_{h=1}{\frac{W_h}{{\overline{X}}_h}\left[R_h\left(A_h-C_h\right)-\left(E_h-D_h\right)\right]} \tag{5}\]

The expression for the mean square error is given as

\[\label{GrindEQ__6_} MSE({\overline{Y}}_R)\cong \sum^L_{h=1}{W^2_h\left[B_h+R^2_h\left(A_h-C_h\right)-2R_h\left(E_h-D_h\right)\right]} \tag{6}\] where \(R_h={\overline{Y}}_h/{\overline{X}}_h\), \(A_h={\theta }_hS^2_{Xh}\), \(C_h={\theta }'_hS^2_{Xh}\), \(D_h={\theta }'_hS_{ZXh}\), \(E_h={\theta }_hS_{ZXh}\), \({\theta }_h=\left(\frac{1}{n_h}-\frac{1}{N_h}\right)\), and \({\theta }'_h=\left(\frac{1}{n'_h}-\frac{1}{N_h}\right)\).

  3. The adapted Bahl and Tuteja [26] exponential ratio-type estimator is defined as \[\label{GrindEQ__7_} {\overline{Y}}_{ER}=\sum^L_{h=1}{W_h{\overline{z}}^*_h\exp\left(\frac{{\overline{x}}'_h-{\overline{x}}_h}{{\overline{x}}'_h+{\overline{x}}_h}\right)} \tag{7}\]

The expression for the bias is given as \[\label{GrindEQ__8_} Bias({\overline{Y}}_{ER})\cong \sum^L_{h=1}{\frac{W_h}{2{\overline{X}}_h}\left[\frac{3}{4}R_h\left(A_h-C_h\right)-\left(E_h-D_h\right)\right]} \tag{8}\]

The expression for the mean square error is given as

\[\label{GrindEQ__9_} MSE({\overline{Y}}_{ER})\cong \sum^L_{h=1}{W^2_h\left[B_h+\frac{1}{4}R^2_h\left(A_h-C_h\right)-R_h\left(E_h-D_h\right)\right]} \tag{9}\]
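The per-stratum brackets of equations (3), (6), and (9) can be evaluated side by side, as in the Python sketch below. It uses hypothetical stratum values (\(N_h=1000\), \(n'_h=200\), \(n_h=80\), \(S^2_{Xh}=100\), \(S_{ZXh}=40\), \(B_h=1\)) and treats \(R_h\) as the ratio \({\overline{Y}}_h/{\overline{X}}_h\), here set to 0.5. None of these values come from the study.

```python
def theta(n, N):
    """Finite-population factor 1/n - 1/N (theta_h with n = n_h, theta'_h with n = n'_h)."""
    return 1.0 / n - 1.0 / N


def adapted_brackets(B, A, C, D, E, R):
    """Stratum-h brackets of equations (3), (6) and (9)."""
    var_ordinary = B
    mse_ratio = B + R ** 2 * (A - C) - 2 * R * (E - D)
    mse_exp_ratio = B + 0.25 * R ** 2 * (A - C) - R * (E - D)
    return var_ordinary, mse_ratio, mse_exp_ratio


def stratified(W, brackets):
    """Sum over strata of W_h^2 times the stratum bracket."""
    return sum(w ** 2 * b for w, b in zip(W, brackets))


# Hypothetical stratum: N_h = 1000, n'_h = 200, n_h = 80, S^2_Xh = 100, S_ZXh = 40
th, tp = theta(80, 1000), theta(200, 1000)
A, C = th * 100.0, tp * 100.0
E, D = th * 40.0, tp * 40.0
print(adapted_brackets(B=1.0, A=A, C=C, D=D, E=E, R=0.5))
# approximately (1.0, 0.8875, 0.896875)
```

The ratio and exponential ratio-type estimators gain over \({\overline{Y}}_0\) only through the terms in \(A_h-C_h\) and \(E_h-D_h\). Both differences vanish when the two phases have the same sample size.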

4. The Proposed Strategy of Mean Estimation

4.1. Modified Hansen and Hurwitz [14] technique

From equation (1), the expected value of the scrambled response under the randomization mechanism (Onyango et al. [27]) is \[\label{GrindEQ__10_} E_R\left(Z_{hi}\right)=E_R\left[Y_{hi}\left(1-{\varphi }_h\right)+\left(Y_{hi}+S_{hi}\right){\varphi }_h\right] \tag{10}\] \[\label{GrindEQ__11_} E_R\left(Z_{hi}\right)=Y_{hi}+{\varphi }_h{\overline{S}}_h \tag{11}\] where \({\varphi }_h=f_h+{\psi }_h\left(1-t_h-f_h\right)\) and \({\overline{S}}_h\) is the mean of the scrambling variable.

Because \(Y_{hi}\) is fixed under the randomization mechanism and the scrambling variable \(S_{hi}\) is added with probability \({\varphi }_h\), the variance of the reported response under the randomization mechanism is \[\label{GrindEQ__12_} V_R\left(Z_{hi}\right)={\varphi }_h\left(S^2_{Sh}+{\overline{S}}^2_h\right)-{\varphi }^2_h{\overline{S}}^2_h \tag{12}\] \[\label{GrindEQ__13_} V_R\left(Z_{hi}\right)={\varphi }_hS^2_{Sh}+{\varphi }_h\left(1-{\varphi }_h\right){\overline{S}}^2_h \tag{13}\] which reduces to \({\varphi }_hS^2_{Sh}\) when the scrambling variable has mean zero, as assumed in Section 1. The transformed value of the randomized response is \[\label{GrindEQ__14_} {\hat{y}}_{hi}=z_{hi}-{\varphi }_h{\overline{S}}_h \tag{14}\] with \(E_R\left({\hat{y}}_{hi}\right)=y_{hi}\) and, for a zero-mean scrambling variable, \(V_R\left({\hat{y}}_{hi}\right)={\varphi }_hS^2_{Sh}\), where \(y_{hi}\) is the true response. Therefore, the modified Hansen and Hurwitz [14] estimator under the additive three-stage ORRT is \[\label{GrindEQ__15_} {\widehat{\overline{y}}}_h=w_{1h}{\widehat{\overline{y}}}_{1h}+w_{2h}{\widehat{\overline{y}}}_{2h} \tag{15}\] where \({\widehat{\overline{y}}}_{1h}\) and \({\widehat{\overline{y}}}_{2h}\) are the means of the transformed responses for the responding group and for the subsample of non-respondents, respectively, with \[\label{GrindEQ__16_} E\left({\widehat{\overline{y}}}_h\right)={\overline{Y}}_h \tag{16}\] \[\label{GrindEQ__17_} var\left({\widehat{\overline{y}}}_h\right)=E_1\left[V_2\left({\widehat{\overline{y}}}_h\right)\right]+V_1\left[E_2\left({\widehat{\overline{y}}}_h\right)\right] \tag{17}\] \[\label{GrindEQ__18_} var\left({\widehat{\overline{y}}}_h\right)=var\left({\overline{y}}_h\right)+E_1\left[\frac{n_{1h}}{n^2_h}{\varphi }_hS^2_{Sh}\right]+E_1\left[\frac{n_{2h}}{n^2_h}k_{2h}{\varphi }_hS^2_{Sh}\right] \tag{18}\] \[\label{GrindEQ__19_} var\left({\widehat{\overline{y}}}_h\right)=var\left({\overline{y}}_h\right)+{\mathrm{\Omega }}_h \tag{19}\] Here \({\mathrm{\Omega }}_h=\frac{{\varphi }_hS^2_{Sh}}{n_h}\left(W_{1h}+k_{2h}W_{2h}\right)\), with \(W_{1h}=N_{1h}/N_h\) and \(W_{2h}=N_{2h}/N_h\), is the contribution of the three-stage ORRT to the variance of the Hansen and Hurwitz [14] estimator.
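The transformation (14) and the ORRT variance contribution \({\mathrm{\Omega }}_h\) in (19) can be checked numerically, as in the Python sketch below. All settings are hypothetical: \(t_h=0.3\), \(f_h=0.2\), \({\psi }_h=0.8\), a scrambling variable with mean \({\overline{S}}_h=1\) and variance 2, \(n_h=100\), \(W_{1h}=0.8\), and \(W_{2h}=0.2\). The simulation checks only that the transformed responses are centred on the true value.

```python
import random
import statistics


def transformed_response(z, phi_h, s_bar_h):
    """Equation (14): remove the expected scrambling shift phi_h * S_bar_h from z."""
    return z - phi_h * s_bar_h


def orrt_variance_contribution(phi_h, var_s, n_h, w1h, w2h, k2h):
    """Omega_h = phi_h * S^2_Sh / n_h * (W_1h + k_2h * W_2h), equation (19)."""
    return phi_h * var_s / n_h * (w1h + k2h * w2h)


# Hypothetical settings: t_h = 0.3, f_h = 0.2, psi_h = 0.8, so phi_h = 0.6
t_h, f_h, psi_h = 0.3, 0.2, 0.8
phi_h = f_h + psi_h * (1 - t_h - f_h)
s_bar, var_s, y = 1.0, 2.0, 10.0      # scrambling variable with non-zero mean S_bar_h = 1

rng = random.Random(7)
z = [y + (rng.gauss(s_bar, var_s ** 0.5) if rng.random() < phi_h else 0.0)
     for _ in range(50_000)]
y_hat = [transformed_response(zi, phi_h, s_bar) for zi in z]
print(statistics.fmean(y_hat))        # close to the true value y = 10

# Omega_h grows linearly with the inverse sampling rate k_2h
for k in (2, 4, 8):
    print(k, orrt_variance_contribution(phi_h, var_s, 100, 0.8, 0.2, k))
```

Because \({\varphi }_h\) increases with \({\psi }_h\) and \({\mathrm{\Omega }}_h\) increases with both \({\varphi }_h\) and \(k_{2h}\), a more sensitive question or a smaller subsample of non-respondents adds variance through this term.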

4.2. The proposed generalized class of estimators

The suggested randomized response estimator of the finite population mean of a sensitive variable in the presence of non-response and measurement errors simultaneously is defined as \[\label{GrindEQ__20_} {\overline{Y}}_g=\sum^L_{h=1}{w_h\left[{\overline{z}}^*_h+{\alpha }_h\left({\overline{x}}'_h-{\overline{x}}_h\right)+{\beta }_h\left(p'_h-p_h\right)\right]\exp\left(\frac{a_h({\overline{x}}'_h-{\overline{x}}_h)}{a_h({\overline{x}}'_h+{\overline{x}}_h)+2b_h}\right)} \tag{20}\] where \({\alpha }_h\) and \({\beta }_h\) are suitably chosen constants, and \(a_h\) and \(b_h\) are either real numbers or known conventional and non-conventional measures of the auxiliary variable. Define the sampling errors \[\label{GrindEQ__21_} {\sigma }_{Zh}={\overline{z}}^*_h-{\overline{Z}}_h \tag{21}\] \[\label{GrindEQ__22_} {\sigma }_{X1h}={\overline{x}}'_h-{\overline{X}}_h \tag{22}\] \[\label{GrindEQ__23_} {\sigma }_{P1h}=p'_h-P_h \tag{23}\] \[\label{GrindEQ__24_} {\sigma }_{Xh}={\overline{x}}_h-{\overline{X}}_h \tag{24}\] \[\label{GrindEQ__25_} {\sigma }_{Ph}=p_h-P_h \tag{25}\]

\[\label{GrindEQ__26_} E\left({\sigma }_{Zh}\right)=E\left({\sigma }_{Xh}\right)=E\left({\sigma }_{X1h}\right)=E\left({\sigma }_{P1h}\right)=E\left({\sigma }_{Ph}\right)=0 \tag{26}\] Furthermore, let
\[\label{GrindEQ__27_} E\left({\sigma }^2_{Xh}\right)={\theta }_hS^2_{Xh}=A_h \tag{27}\]
\[\label{GrindEQ__28_} E\left({\sigma }^2_{Zh}\right)={\theta }_h\left(S^2_{Yh}+S^2_{Th}\right)+{\theta }^*_h\left(S^2_{Yh(2)}+S^2_{Th(2)}\right)+{\mathrm{\Omega }}_h=B_h \tag{28}\]
\[\label{GrindEQ__29_} E\left({\sigma }^2_{X1h}\right)={\theta }'_hS^2_{Xh}=C_h \tag{29}\]
\[\label{GrindEQ__30_} E\left({\sigma }_{X1h}{\sigma }_{Zh}\right)={\theta }'_hS_{ZXh}=D_h \tag{30}\]
\[\label{GrindEQ__31_} E\left({\sigma }_{Xh}{\sigma }_{Zh}\right)={\theta }_hS_{ZXh}=E_h \tag{31}\]
\[\label{GrindEQ__32_} E\left({\sigma }^2_{Ph}\right)={\theta }_hS^2_{Ph}=F_h \tag{32}\]
\[\label{GrindEQ__33_} E\left({\sigma }^2_{P1h}\right)={\theta }'_hS^2_{Ph}=G_h \tag{33}\]
\[\label{GrindEQ__34_} E\left({\sigma }_{Ph}{\sigma }_{Zh}\right)={\theta }_hS_{ZPh}=H_h \tag{34}\]
\[\label{GrindEQ__35_} E\left({\sigma }_{P1h}{\sigma }_{Zh}\right)={\theta }'_hS_{ZPh}=I_h \tag{35}\]
\[\label{GrindEQ__36_} E\left({\sigma }_{Xh}{\sigma }_{Ph}\right)={\theta }_hS_{XPh}=J_h \tag{36}\]
\[\label{GrindEQ__37_} E\left({\sigma }_{P1h}{\sigma }_{Xh}\right)=E\left({\sigma }_{X1h}{\sigma }_{Ph}\right)=E\left({\sigma }_{X1h}{\sigma }_{P1h}\right)={\theta }'_hS_{XPh}=L_h \tag{37}\]
where \({\theta }'_h=\left(\frac{1}{n'_h}-\frac{1}{N_h}\right)\), \({\theta }_h=\left(\frac{1}{n_h}-\frac{1}{N_h}\right)\), \({\theta }^*_h=\frac{W_{2h}\left(k_{2h}-1\right)}{n_h}\), \(W_{2h}=\frac{N_{2h}}{N_h}\), and \(W_h=\frac{N_h}{N}\).

Substituting equations (21)-(25) into (20), expanding, and ignoring terms of order greater than two gives the approximate bias \[\label{GrindEQ__38_} Bias({\overline{Y}}_g)\cong \sum^L_{h=1}{\frac{W_h{\lambda }_h}{2}}\left[\frac{3}{4}{\lambda }_h{\overline{Z}}_h\left(A_h-C_h\right)+{\alpha }_h\left(A_h-C_h\right)-\left(E_h-D_h\right)+{\beta }_h\left(J_h-L_h\right)\right] \tag{38}\] where \({\lambda }_h=\frac{a_h}{a_h{\overline{X}}_h+b_h}\). The approximate MSE is \[\label{GrindEQ__39_} MSE({\overline{Y}}_g)\cong \sum^L_{h=1}{W^2_h}\left[B_h+{\vartheta }_{1h}+{\alpha }^2_h{\vartheta }_{2h}+{\beta }^2_h{\vartheta }_{3h}+{\beta }_h{\vartheta }_{4h}+{\alpha }_h{\vartheta }_{5h}+2{\alpha }_h{\beta }_h{\vartheta }_{6h}\right] \tag{39}\] where \[{\vartheta }_{1h}=\frac{1}{4}{\lambda }^2_h{\overline{Y}}^2_h\left(A_h-C_h\right)-{\lambda }_h{\overline{Y}}_h\left(E_h-D_h\right),\] \[{\vartheta }_{2h}=\left(A_h-C_h\right),\] \[{\vartheta }_{3h}=\left(F_h-G_h\right),\] \[{\vartheta }_{4h}={\overline{Y}}_h{\lambda }_h\left(J_h-L_h\right)-2\left(H_h-I_h\right),\] \[{\vartheta }_{5h}={\overline{Y}}_h{\lambda }_h\left(A_h-C_h\right)-2\left(E_h-D_h\right),\] \[{\vartheta }_{6h}=\left(J_h-L_h\right).\] Setting the partial derivatives of (39) with respect to \({\alpha }_h\) and \({\beta }_h\) equal to zero gives the optimum values \[\label{GrindEQ__40_} {\alpha }^{(opt)}_h=\frac{{\vartheta }_{4h}{\vartheta }_{6h}-{\vartheta }_{5h}{\vartheta }_{3h}}{2\left({\vartheta }_{2h}{\vartheta }_{3h}-{\vartheta }^2_{6h}\right)} \tag{40}\] and \[\label{GrindEQ__41_} {\beta }^{(opt)}_h=\frac{{\vartheta }_{5h}{\vartheta }_{6h}-{\vartheta }_{4h}{\vartheta }_{2h}}{2\left({\vartheta }_{2h}{\vartheta }_{3h}-{\vartheta }^2_{6h}\right)} \tag{41}\] Substituting equations (40) and (41) into (39) gives the minimum MSE \[\label{GrindEQ__42_} {MSE({\overline{Y}}_g)}_{min}\cong \sum^L_{h=1}{W^2_h}\left[B_h+{\vartheta }_{1h}-\frac{{\vartheta }^2_{4h}}{4{\vartheta }_{3h}}-\frac{{\left({\vartheta }_{5h}{\vartheta }_{3h}-{\vartheta }_{4h}{\vartheta }_{6h}\right)}^2}{4{\vartheta }_{3h}\left({\vartheta }_{2h}{\vartheta }_{3h}-{\vartheta }^2_{6h}\right)}\right] \tag{42}\] Table 1 shows some special cases of the proposed generalized class of estimators.

Table 1 Some members of the proposed generalized class of estimators
Estimator \(a_h\) \(b_h\)
\(\overline{Y}_1=\sum^L_{h=1}w_h\left[{\overline{z}}^*_h+{\alpha }_h\left({\overline{x}}'_h-{\overline{x}}_h\right)+{\beta }_h\left(p'_h-p_h\right)\right]\exp\left(\frac{{\overline{x}}'_h-{\overline{x}}_h}{{\overline{x}}'_h+{\overline{x}}_h}\right)\) 1 0
\(\overline{Y}_2=\sum^L_{h=1}w_h\left[{\overline{z}}^*_h+{\alpha }_h\left({\overline{x}}'_h-{\overline{x}}_h\right)+{\beta }_h\left(p'_h-p_h\right)\right]\exp\left(\frac{{\overline{x}}'_h-{\overline{x}}_h}{{\overline{x}}'_h+{\overline{x}}_h+2C_{Xh}}\right)\) 1 \(C_{Xh}\)
\(\overline{Y}_3=\sum^L_{h=1}w_h\left[{\overline{z}}^*_h+{\alpha }_h\left({\overline{x}}'_h-{\overline{x}}_h\right)+{\beta }_h\left(p'_h-p_h\right)\right]\exp\left(\frac{C_{Xh}({\overline{x}}'_h-{\overline{x}}_h)}{C_{Xh}({\overline{x}}'_h+{\overline{x}}_h)+2{\rho }_{XYh}}\right)\) \(C_{Xh}\) \({\rho }_{XYh}\)
\(\overline{Y}_4=\sum^L_{h=1}w_h\left[{\overline{z}}^*_h+{\alpha }_h\left({\overline{x}}'_h-{\overline{x}}_h\right)+{\beta }_h\left(p'_h-p_h\right)\right]\exp\left(\frac{{\beta }_{1h}(x)({\overline{x}}'_h-{\overline{x}}_h)}{{\beta }_{1h}(x)({\overline{x}}'_h+{\overline{x}}_h)+2{\rho }_{XYh}}\right)\) \({\beta }_{1h}(x)\) \({\rho }_{XYh}\)
\(\overline{Y}_5=\sum^L_{h=1}w_h\left[{\overline{z}}^*_h+{\alpha }_h\left({\overline{x}}'_h-{\overline{x}}_h\right)+{\beta }_h\left(p'_h-p_h\right)\right]\exp\left(\frac{{\beta }_{2h}(x)({\overline{x}}'_h-{\overline{x}}_h)}{{\beta }_{2h}(x)({\overline{x}}'_h+{\overline{x}}_h)+2{\beta }_{1h}(x)}\right)\) \({\beta }_{2h}(x)\) \({\beta }_{1h}(x)\)
\(\overline{Y}_6=\sum^L_{h=1}w_h\left[{\overline{z}}^*_h+{\alpha }_h\left({\overline{x}}'_h-{\overline{x}}_h\right)+{\beta }_h\left(p'_h-p_h\right)\right]\exp\left(\frac{{QD}_h(x)({\overline{x}}'_h-{\overline{x}}_h)}{{QD}_h(x)({\overline{x}}'_h+{\overline{x}}_h)+2{TM}_h(x)}\right)\) \({QD}_h(x)\) \({TM}_h(x)\)
\(\overline{Y}_7=\sum^L_{h=1}w_h\left[{\overline{z}}^*_h+{\alpha }_h\left({\overline{x}}'_h-{\overline{x}}_h\right)+{\beta }_h\left(p'_h-p_h\right)\right]\exp\left(\frac{{QD}_h(x)({\overline{x}}'_h-{\overline{x}}_h)}{{QD}_h(x)({\overline{x}}'_h+{\overline{x}}_h)+2{MR}_h(x)}\right)\) \({QD}_h(x)\) \({MR}_h(x)\)
\(\overline{Y}_8=\sum^L_{h=1}w_h\left[{\overline{z}}^*_h+{\alpha }_h\left({\overline{x}}'_h-{\overline{x}}_h\right)+{\beta }_h\left(p'_h-p_h\right)\right]\exp\left(\frac{{HL}_h(x)({\overline{x}}'_h-{\overline{x}}_h)}{{HL}_h(x)({\overline{x}}'_h+{\overline{x}}_h)+2{TM}_h(x)}\right)\) \({HL}_h(x)\) \({TM}_h(x)\)
\(\overline{Y}_9=\sum^L_{h=1}w_h\left[{\overline{z}}^*_h+{\alpha }_h\left({\overline{x}}'_h-{\overline{x}}_h\right)+{\beta }_h\left(p'_h-p_h\right)\right]\exp\left(\frac{{\rho }_{XYh}({\overline{x}}'_h-{\overline{x}}_h)}{{\rho }_{XYh}({\overline{x}}'_h+{\overline{x}}_h)+2{\beta }_{2h}(x)}\right)\) \({\rho }_{XYh}\) \({\beta }_{2h}(x)\)
\(\overline{Y}_{10}=\sum^L_{h=1}w_h\left[{\overline{z}}^*_h+{\alpha }_h\left({\overline{x}}'_h-{\overline{x}}_h\right)+{\beta }_h\left(p'_h-p_h\right)\right]\exp\left(\frac{{\overline{x}}'_h-{\overline{x}}_h}{{\overline{x}}'_h+{\overline{x}}_h+2{\rho }_{XYh}}\right)\) 1 \({\rho }_{XYh}\)
\(\overline{Y}_{11}=\sum^L_{h=1}w_h\left[{\overline{z}}^*_h+{\alpha }_h\left({\overline{x}}'_h-{\overline{x}}_h\right)+{\beta }_h\left(p'_h-p_h\right)\right]\exp\left(\frac{{\overline{x}}'_h-{\overline{x}}_h}{{\overline{x}}'_h+{\overline{x}}_h+2{QD}_h(x)}\right)\) 1 \({QD}_h(x)\)
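Each member of Table 1 enters the bias and MSE expressions only through \({\lambda }_h=a_h/(a_h{\overline{X}}_h+b_h)\). The Python sketch below evaluates \({\lambda }_h\) for \(\overline{Y}_1\) and \(\overline{Y}_2\) using the stratum 1 values of \({\overline{X}}_h\) and \(S^2_{Xh}\) reported in Table 2. The function name is hypothetical, and the choice of stratum and members is only for illustration.

```python
import math


def lambda_h(a_h, b_h, xbar_h):
    """lambda_h = a_h / (a_h * Xbar_h + b_h), as defined below equation (38)."""
    return a_h / (a_h * xbar_h + b_h)


xbar_1, s2x_1 = 188.9035, 1094471.0        # stratum 1 values from Table 2
cx_1 = math.sqrt(s2x_1) / xbar_1           # C_X1 = S_X1 / Xbar_1

# (a_h, b_h) pairs for two members of Table 1
members = {"Y_1": (1.0, 0.0), "Y_2": (1.0, cx_1)}
for name, (a, b) in members.items():
    print(name, lambda_h(a, b, xbar_1))
```

Since \({\lambda }_h=1/({\overline{X}}_h+b_h/a_h)\), a member matters only through the ratio \(b_h/a_h\). When that ratio is small relative to \({\overline{X}}_h\), \({\lambda }_h\) is close to \(1/{\overline{X}}_h\), which may help explain why some members have nearly identical efficiencies.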

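The bias and minimum mean square error of each member of the proposed generalized class are obtained by substituting the corresponding values of \(a_h\) and \(b_h\) (through \({\lambda }_h\)) into equations (38) and (42), respectively. The bias in (38) is evaluated at the chosen values of \({\alpha }_h\) and \({\beta }_h\), such as the optimum values (40) and (41).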
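Equations (40)-(42) can be checked numerically by minimizing the per-stratum bracket of (39) directly. The Python sketch below does this with hypothetical stratum quantities: the \({\theta }\) factors, population moments, \(B_h=1\), \({\overline{Y}}_h=20\), and \({\lambda }_h=0.01\) are all made up for the check. It confirms that (39) evaluated at (40) and (41) equals (42), and that moving away from the optimum never lowers the MSE. The single-letter names mirror the paper's notation.

```python
def varthetas(A, C, D, E, F, G, H, I, J, L, ybar, lam):
    """vartheta_1h ... vartheta_6h as defined below equation (39)."""
    v1 = 0.25 * lam ** 2 * ybar ** 2 * (A - C) - lam * ybar * (E - D)
    v2 = A - C
    v3 = F - G
    v4 = ybar * lam * (J - L) - 2 * (H - I)
    v5 = ybar * lam * (A - C) - 2 * (E - D)
    v6 = J - L
    return v1, v2, v3, v4, v5, v6


def mse_bracket(B, v, alpha, beta):
    """Stratum-h bracket of equation (39) for given alpha_h and beta_h."""
    v1, v2, v3, v4, v5, v6 = v
    return (B + v1 + alpha ** 2 * v2 + beta ** 2 * v3 + beta * v4
            + alpha * v5 + 2 * alpha * beta * v6)


def optimum(v):
    """Equations (40) and (41)."""
    _, v2, v3, v4, v5, v6 = v
    den = 2 * (v2 * v3 - v6 ** 2)
    return (v4 * v6 - v5 * v3) / den, (v5 * v6 - v4 * v2) / den


def min_mse_bracket(B, v):
    """Stratum-h bracket of equation (42)."""
    v1, v2, v3, v4, v5, v6 = v
    return (B + v1 - v4 ** 2 / (4 * v3)
            - (v5 * v3 - v4 * v6) ** 2 / (4 * v3 * (v2 * v3 - v6 ** 2)))


# Hypothetical stratum quantities (theta_h, theta'_h and population moments are made up)
th, tp = 0.0115, 0.004
S2X, S2P, SZX, SZP, SXP = 100.0, 0.2, 40.0, -1.0, -1.3
A, C, E, D = th * S2X, tp * S2X, th * SZX, tp * SZX
F, G, H, I = th * S2P, tp * S2P, th * SZP, tp * SZP
J, L = th * SXP, tp * SXP
B = 1.0

v = varthetas(A, C, D, E, F, G, H, I, J, L, ybar=20.0, lam=0.01)
alpha_opt, beta_opt = optimum(v)
print(alpha_opt, beta_opt)
print(mse_bracket(B, v, alpha_opt, beta_opt), min_mse_bracket(B, v))  # the two values agree
```

The optimum is a minimum only when \({\vartheta }_{2h}>0\) and \({\vartheta }_{2h}{\vartheta }_{3h}-{\vartheta }^2_{6h}>0\), as holds for these inputs. Setting \({\alpha }_h={\beta }_h=0\) recovers the bracket of the exponential-type member with the same \({\lambda }_h\).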

5. Efficiency Comparison

The proposed estimator performs better than the adapted estimators when the following conditions hold. Each condition is stated for stratum \(h\), and it is sufficient that it holds in every stratum:

  1. From equations (3) and (42), \(MSE{\left({\overline{Y}}_g\right)}_{min}<Var\left({\overline{Y}}_0\right)\) if \[\label{GrindEQ__43_} {\vartheta }_{1h}-\frac{{\vartheta }^2_{4h}}{4{\vartheta }_{3h}}-\frac{{\left({\vartheta }_{5h}{\vartheta }_{3h}-{\vartheta }_{4h}{\vartheta }_{6h}\right)}^2}{4{\vartheta }_{3h}\left({\vartheta }_{2h}{\vartheta }_{3h}-{\vartheta }^2_{6h}\right)}<0 \tag{43}\]

  2. From equations (6) and (42), \(MSE{\left({\overline{Y}}_g\right)}_{min}<MSE\left({\overline{Y}}_R\right)\) if \[\label{GrindEQ__44_} {\vartheta }_{1h}-\frac{{\vartheta }^2_{4h}}{4{\vartheta }_{3h}}-\frac{{\left({\vartheta }_{5h}{\vartheta }_{3h}-{\vartheta }_{4h}{\vartheta }_{6h}\right)}^2}{4{\vartheta }_{3h}\left({\vartheta }_{2h}{\vartheta }_{3h}-{\vartheta }^2_{6h}\right)}-R^2_h\left(A_h-C_h\right)+2R_h\left(E_h-D_h\right)<0 \tag{44}\]

  3. From equations (9) and (42), \(MSE{\left({\overline{Y}}_g\right)}_{min}<MSE\left({\overline{Y}}_{ER}\right)\) if \[\label{GrindEQ__45_} {\vartheta }_{1h}-\frac{{\vartheta }^2_{4h}}{4{\vartheta }_{3h}}-\frac{{\left({\vartheta }_{5h}{\vartheta }_{3h}-{\vartheta }_{4h}{\vartheta }_{6h}\right)}^2}{4{\vartheta }_{3h}\left({\vartheta }_{2h}{\vartheta }_{3h}-{\vartheta }^2_{6h}\right)}-\frac{1}{4}R^2_h\left(A_h-C_h\right)+R_h\left(E_h-D_h\right)<0 \tag{45}\]

6. Empirical study

The efficiency of the proposed estimator is compared with that of the adapted estimators in a numerical study. The real data used for the numerical analysis are COVID-19 data obtained from www.worldometer.com and data from Rosner [28]. The R programming language is used for data simulation and coding. Each population unit is subjected to measurement errors, which are normally distributed with mean 2 and variance 5. The estimators are compared using their mean square errors and the percent relative efficiency (PRE), calculated as \[\label{GrindEQ__46_} PRE({\overline{Y}}_g)=\frac{var({\overline{Y}}_0)}{MSE({\overline{Y}}_g)}\times 100 \tag{46}\] where \(g=R,\ ER,\ 1,\ 2,\ \dots ,\ 11\). The estimator with the highest PRE is considered the most efficient. PREs are obtained at 20% and 80% sensitivity levels of the survey question and at 20% and 30% non-response rates.
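Equation (46) compares stratified totals of the form \(\sum^L_{h=1}W^2_h(\cdot)\). The Python sketch below computes a PRE from hypothetical per-stratum brackets. The weights and bracket values are invented for illustration and are not results from the study.

```python
def stratified(W, brackets):
    """Sum over strata of W_h^2 times a per-stratum variance or MSE bracket."""
    return sum(w * w * b for w, b in zip(W, brackets))


def pre(var_ordinary, mse_candidate):
    """Equation (46): percent relative efficiency with respect to Y_0."""
    return 100.0 * var_ordinary / mse_candidate


W = [0.5, 0.3, 0.2]            # hypothetical stratum weights W_h
B = [1.0, 2.0, 4.0]            # hypothetical brackets for Var(Y_0)
M = [0.8, 1.6, 3.2]            # hypothetical MSE brackets of a competing estimator
print(round(pre(stratified(W, B), stratified(W, M)), 4))   # 125.0: each bracket is 80% of B_h
```

A PRE above 100 means the competing estimator has a smaller MSE than \({\overline{Y}}_0\). By construction, \({\overline{Y}}_0\) itself scores exactly 100.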

6.1. COVID-19 data (www.worldometer.com)

The data consist of six strata: the African Region (\(N_1=31200\)), the American Region (\(N_2=34944\)), the Eastern Mediterranean Region (\(N_3=13728\)), the European Region (\(N_4=38688\)), the South-East Asia Region (\(N_5=6864\)), and the Western Pacific Region (\(N_6=21840\)). X is the number of new cases and Y is the number of deaths recorded on a given day. P is the auxiliary attribute indicating whether the number of deaths recorded on a given day is less than one. A scrambling variable, normally distributed with mean 0 and variance 2, is generated for each value of Y. Table 2 shows summary statistics for the responding units and Table 3 for the non-responding units.
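The stratum weights \(W_h=N_h/N\) that multiply every MSE expression follow directly from the stratum sizes listed above. A short Python sketch of that arithmetic is shown below. The dictionary keys are just labels for the six regions.

```python
# Stratum sizes as listed in the text
N_h = {
    "African": 31200, "American": 34944, "Eastern Mediterranean": 13728,
    "European": 38688, "South-East Asia": 6864, "Western Pacific": 21840,
}
N = sum(N_h.values())
W_h = {region: size / N for region, size in N_h.items()}

print(N)                                   # 147264
for region, w in W_h.items():
    print(f"{region}: {w:.4f}")
```

Because the MSE terms are weighted by \(W^2_h\), the largest strata (the European and American regions) dominate the stratified comparisons.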

Table 2 Parameters for COVID-19 data
Parameter Stratum 1 Stratum 2 Stratum 3 Stratum 4 Stratum 5 Stratum 6
\(\overline{X}_h\) 188.9035 2502.012 1120.151 1757.061 6175.008 356.2095
\(\overline{Y}_h\) 4.543181 61.90972 20.51225 33.79095 97.12205 4.833472
\(S^2_{Xh}\) 1094471 187408859 8526375 24712119 817189958 318940
\(S^2_{Zh}\) 926.4621 76639.99 2937.237 11588.58 145353 849.8079
\(S^2_{Ph}\) 0.2017896 0.2328431 0.2253055 0.2467874 0.247146 0.1323922
\(\rho_{XZh}\) 0.8171398 0.7944946 0.834325 0.6559524 0.8679977 0.7237861
\(\rho_{XPh}\) -0.2608673 -0.2379639 -0.265470 -0.2982271 -0.239344 -0.4403104
\(\rho_{ZPh}\) -0.2386865 -0.2924192 -0.2729946 -0.2802833 -0.2839612 -0.3832064
\(S^2_{Th}\) 24.72743 24.91892 25.03186 25.29474 24.74669 18.97865
Table 3 Parameters for non-responding units for COVID-19 data
Non-response rate Stratum \(S^2_{Xh(2)}\) \(S^2_{Ph(2)}\)
20% 1 989315.2 0.2050127
2 199477087 0.2343414
3 8176575 0.2249875
4 25298233 0.2462478
5 708141536 0.2460244
6 708141536 0.1284571
30% 1 1071816 0.2047748
2 206269098 0.2334903
3 8525833 0.2250371
4 24546992 0.2462213
5 681811147 0.2462587
6 681811147 0.1297743
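The response-generation setup described above has three ingredients: optional scrambling with a scrambling variable of mean 0 and variance 2, three-stage optional RRT, and additive measurement errors with mean 2 and variance 5. A minimal Python sketch of how such recorded responses could be simulated is given below. It rests on several assumptions, flagged here and in the code comments. First, it follows the general three-stage ORRT idea (a fraction of respondents is directed to answer truthfully, and the rest scramble only if they find the question sensitive). Second, it assumes additive scrambling. Third, it assumes the measurement error is added to every recorded value. The function and parameter names (`simulate_recorded_responses`, `truthful_prob`, `sensitivity_level`) are hypothetical, and the paper's own R code may differ.

```python
import math
import random


def simulate_recorded_responses(y, truthful_prob, sensitivity_level,
                                scramble_var=2.0, error_mean=2.0,
                                error_var=5.0, seed=None):
    """Hypothetical sketch of three-stage optional scrambling plus measurement error.

    Assumptions (not taken from the paper's code):
      * with probability truthful_prob the respondent reports Y directly;
      * otherwise the respondent scrambles with probability sensitivity_level,
        reporting Y + S with S ~ N(0, scramble_var);
      * every recorded value then carries an additive measurement error
        U ~ N(error_mean, error_var).
    """
    rng = random.Random(seed)
    s_sd = math.sqrt(scramble_var)  # random.gauss expects a standard deviation
    u_sd = math.sqrt(error_var)
    recorded = []
    for value in y:
        z = value
        directed_truthful = rng.random() < truthful_prob
        if not directed_truthful and rng.random() < sensitivity_level:
            z += rng.gauss(0.0, s_sd)         # additive scrambling with mean 0
        z += rng.gauss(error_mean, u_sd)      # measurement error on the record
        recorded.append(z)
    return recorded


# Hypothetical usage: constant true values, so the average shift reflects the error mean.
sample = simulate_recorded_responses([4.0] * 20000, truthful_prob=0.3,
                                     sensitivity_level=0.8, seed=1)
print(sum(sample) / len(sample))
```

Because the scrambling variable has mean 0, scrambling adds variance but no bias. In this sketch, the nonzero mean of the measurement error shifts the recorded values upward by about 2 on average.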

Tables 4 and 5 report the PREs of the estimators of the population mean without and with measurement errors, respectively. In both tables, the PREs decrease as the inverse sampling rate \(k_{2h}\) and the non-response rate increase. The PREs are also lower when non-response and measurement errors are present simultaneously. They decrease slightly as the sensitivity level of the survey question increases: for example, in Table 4 the PRE of \(\overline{Y}_R\) at 20% non-response and \(k_{2h}=2\) falls from 121.8078 to 121.8064. The proposed estimator \(\overline{Y}_6\) has the highest PRE of all the estimators in this study. In general, the proposed estimators perform better than the adapted estimators. The exception is \(\overline{Y}_4\), whose PREs are below those of \(\overline{Y}_{ER}\) for these data.

Table 4 Percent Relative Efficiencies (PREs) of different estimators for COVID-19 data without measurement errors at \(t_h=0.3\) and \(f_h=0.2\)
Estimator \(\psi_h\) 20% non-response (\(k_{2h}\) = 2, 4, 8) 30% non-response (\(k_{2h}\) = 2, 4, 8)
\(\overline{Y}_0\) 0.2/0.8 100 100 100 100 100 100
\(\overline{Y}_R\) 0.2 121.8078 111.8264 106.1745 117.4540 108.4124 104.1318
\(\overline{Y}_R\) 0.8 121.8064 111.8257 106.1743 117.4531 108.4121 104.1317
\(\overline{Y}_{ER}\) 0.2 126.6716 113.9508 107.1407 121.7205 110.2754 105.0030
\(\overline{Y}_{ER}\) 0.8 126.6685 113.9495 107.1397 121.7184 110.2746 105.0028
\(\overline{Y}_1\) 0.2 129.4792 115.5386 107.9859 123.3008 110.9481 105.3139
\(\overline{Y}_1\) 0.8 129.4760 115.5372 107.9855 123.2988 110.9473 105.3137
\(\overline{Y}_2\) 0.2 129.4734 115.5359 107.9846 123.2965 110.9462 105.3131
\(\overline{Y}_2\) 0.8 129.4702 115.5345 107.9842 123.2945 110.9455 105.3129
\(\overline{Y}_3\) 0.2 129.4790 115.5385 107.9859 123.3007 110.9480 105.3139
\(\overline{Y}_3\) 0.8 129.4758 115.5371 107.9854 123.2987 110.9473 105.3137
\(\overline{Y}_4\) 0.2 125.7381 113.7546 107.1225 120.4678 109.7355 104.7521
\(\overline{Y}_4\) 0.8 125.7354 113.7534 107.1221 120.4660 109.7349 104.7519
\(\overline{Y}_5\) 0.2 129.4792 115.5386 107.9859 123.3008 110.9481 105.3139
\(\overline{Y}_5\) 0.8 129.4760 115.5372 107.9855 123.2988 110.9473 105.3137
\(\overline{Y}_6\) 0.2 129.5812 115.5866 108.009 123.3776 110.9805 105.3289
\(\overline{Y}_6\) 0.8 129.5780 115.5852 108.0085 123.3756 110.9798 105.3287
\(\overline{Y}_7\) 0.2 128.8921 115.2619 107.8528 122.8585 110.7607 105.2276
\(\overline{Y}_7\) 0.8 128.8889 115.2605 107.8524 122.8565 110.7600 105.2273
\(\overline{Y}_8\) 0.2 129.4789 115.5385 107.9859 123.3007 110.9480 105.3139
\(\overline{Y}_8\) 0.8 129.4757 115.5371 107.9854 123.2986 110.9473 105.3137
\(\overline{Y}_9\) 0.2 129.1671 115.3917 123.0658 123.0658 110.8486 105.2681
\(\overline{Y}_9\) 0.8 129.1640 115.3903 107.9148 123.0638 110.8479 105.2679
\(\overline{Y}_{10}\) 0.2 129.4783 115.5382 107.9857 123.3001 110.9478 105.3138
\(\overline{Y}_{10}\) 0.8 129.4751 115.5368 107.9853 123.2981 110.9470 105.3136
\(\overline{Y}_{11}\) 0.2 129.2297 115.4211 107.9295 123.1130 110.8686 105.2773
\(\overline{Y}_{11}\) 0.8 129.2265 115.4197 107.9290 123.1110 110.8678 105.2771
Table 5 Percent Relative Efficiencies (PREs) of different estimators for COVID-19 data with measurement errors at \(t_h=0.3\) and \(f_h=0.2\)
Estimator \(\psi_h\) 20% non-response (\(k_{2h}\) = 2, 4, 8) 30% non-response (\(k_{2h}\) = 2, 4, 8)
\(\overline{Y}_0\) 0.2/0.8 100 100 100 100 100 100
\(\overline{Y}_R\) 0.2 121.7851 111.8152 106.1689 117.4368 108.4050 104.1283
\(\overline{Y}_R\) 0.8 121.7837 111.8149 106.1689 117.4360 108.4048 104.1283
\(\overline{Y}_{ER}\) 0.2 126.6432 113.9378 107.1339 119.4025 109.2678 104.5540
\(\overline{Y}_{ER}\) 0.8 126.6402 113.9369 107.1336 119.3995 109.2669 104.5535
\(\overline{Y}_1\) 0.2 127.1178 114.4185 107.4453 121.5170 110.1884 104.9627
\(\overline{Y}_1\) 0.8 127.1123 114.4164 107.4444 121.5132 110.1871 104.9622
\(\overline{Y}_2\) 0.2 127.1235 114.4213 107.4466 121.5214 110.1902 104.9635
\(\overline{Y}_2\) 0.8 127.1181 114.4191 107.4457 121.5176 110.1889 104.9630
\(\overline{Y}_3\) 0.2 127.1179 114.4186 107.4453 121.5171 110.1884 104.9627
\(\overline{Y}_3\) 0.8 127.1125 114.4165 107.4444 121.5133 110.1871 104.9622
\(\overline{Y}_4\) 0.2 123.8424 112.8318 106.6707 119.0193 109.1042 104.4571
\(\overline{Y}_4\) 0.8 123.8376 112.8299 106.6699 119.0159 109.1030 104.4566
\(\overline{Y}_5\) 0.2 127.1178 114.4185 107.4453 121.5170 110.1884 104.9627
\(\overline{Y}_5\) 0.8 127.1123 114.4164 107.4444 121.5132 110.1871 104.9622
\(\overline{Y}_6\) 0.2 127.2173 114.4661 107.4683 121.5925 110.2207 104.9777
\(\overline{Y}_6\) 0.8 127.2118 114.4640 107.4675 121.5887 110.2194 104.9772
\(\overline{Y}_7\) 0.2 126.8219 114.2767 107.3765 121.2925 110.0918 104.9179
\(\overline{Y}_7\) 0.8 126.8165 114.2746 107.3756 121.2887 110.0906 104.9174
\(\overline{Y}_8\) 0.2 127.1180 114.4186 107.4453 121.5172 110.1884 104.9627
\(\overline{Y}_8\) 0.8 127.1125 114.4165 107.4444 121.5134 110.1871 104.9622
\(\overline{Y}_9\) 0.2 126.8822 114.3056 107.3905 121.3382 110.1115 104.927
\(\overline{Y}_9\) 0.8 126.8767 114.3035 107.3896 121.3345 110.1102 104.9265
\(\overline{Y}_{10}\) 0.2 127.1186 114.4189 107.4455 121.5177 110.1886 104.9628
\(\overline{Y}_{10}\) 0.8 127.1132 114.4168 107.4446 121.5139 110.1873 104.9623
\(\overline{Y}_{11}\) 0.2 126.9326 114.3298 107.4022 121.3765 110.1280 104.9347
\(\overline{Y}_{11}\) 0.8 126.9272 114.3277 107.4014 121.3728 110.1267 104.9341
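One way to see why the PREs in Tables 4 and 5 shrink toward 100 as \(k_{2h}\) and the non-response rate grow is through the Hansen and Hurwitz [14] variance under simple random sampling without replacement: \((1-f)S^2/n + W_2(k-1)S^2_{(2)}/n\), with \(f=n/N\). The sketch below is a hypothetical toy model, not the paper's MSE expressions. It assumes that auxiliary information removes a share \(\rho^2\) of the first term only. The sub-sampling term is common to both estimators, so as \(k\) or \(W_2\) grows it dominates, and the ratio in equation (46) moves toward 100. Function names and inputs are illustrative.

```python
def hansen_hurwitz_variance(n, N, w2, k, s2, s2_nonresp):
    """Variance of the Hansen-Hurwitz mean under SRSWOR:
    (1 - f)/n * S^2 + W2 * (k - 1)/n * S^2_(2), with f = n/N."""
    f = n / N
    return (1 - f) / n * s2 + w2 * (k - 1) / n * s2_nonresp


def toy_pre(n, N, w2, k, s2, s2_nonresp, rho):
    """Hypothetical PRE when auxiliary information removes a share rho**2
    of the first (full-response) term only; the sub-sampling term is untouched."""
    f = n / N
    baseline = hansen_hurwitz_variance(n, N, w2, k, s2, s2_nonresp)
    improved = (1 - f) / n * s2 * (1 - rho ** 2) + w2 * (k - 1) / n * s2_nonresp
    return 100.0 * baseline / improved


# Illustrative, made-up inputs: n = 200, N = 2000, unit variances, rho = 0.8.
for w2 in (0.2, 0.3):
    print(w2, [round(toy_pre(200, 2000, w2, k, 1.0, 1.0, 0.8), 2) for k in (2, 4, 8)])
```

In this toy setting the PRE falls as \(k\) increases and is lower at \(W_2=0.3\) than at \(W_2=0.2\). This matches the direction of the trends in the tables, though not their magnitudes.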

4.2. Rosner [28] data

The data consist of two strata of sizes \(N_1=480\) and \(N_2=174\). Y is the forced expiratory volume, X is age (in years) and gender is the auxiliary attribute. The scrambling variable is taken to be smoking status (Yes = 1, No = 0) and is used to generate the scrambled responses. The study variable, the auxiliary attribute and the auxiliary variable are all positively correlated, with bi-serial correlation in the case of the attribute. Tables 6 and 7 present the parameters of the Rosner [28] data; Table 7 gives those for the non-responding units.

Table 6 Parameters for Rosner [28] data
Parameter \(\overline{X}_h\) \(\overline{Y}_h\) \(S^2_{Xh}\) \(S^2_{Yh}\) \(S^2_{Ph}\) \(\rho_{XZh}\) \(\rho_{XPh}\) \(\rho_{ZPh}\) \(S^2_{Th}\)
Stratum 1 8.558333 2.363715 3.604106 0.5254207 0.2503653 0.7239923 0.2999931 0.8365375 26.04856
Stratum 2 13.71839 3.763615 3.301741 0.7556429 0.2511461 0.3619965 0.7201403 0.4809902 20.19661
Table 7 Parameters for non-responding units for Rosner [28] data
Non-response rate Stratum \(S^2_{Yh(2)}\) \(S^2_{Th(2)}\)
20% 1 0.5833481 25.62859
2 0.4723521 17.39198
30% 1 0.5804701 27.00072
2 0.6125016 20.19661

Tables 8 and 9 summarize the PREs without and with measurement errors, respectively, at different sensitivity levels. The PREs decrease as the inverse sampling rate and the non-response rate increase. They decline further when non-response and measurement errors are present simultaneously. For example, at 20% non-response, \(k_{2h}=2\) and \(\psi_h=0.2\), the PRE of \(\overline{Y}_{10}\) is 114.2515 without measurement errors and falls to 100.2539 when non-response and measurement errors are present simultaneously.

Furthermore, without measurement errors the PREs decrease as the sensitivity level increases. The proposed estimators outperform the adapted estimators both without and with measurement errors. With measurement errors, however, the gains are small: every PRE in Table 9 lies between 100.0484 and 100.2542.

Table 8 Percent Relative Efficiencies (PREs) of different estimators for Rosner [28] data without measurement errors at \(t_h=0.3\) and \(f_h=0.2\)
Estimator \(\psi_h\) 20% non-response (\(k_{2h}\) = 2, 4, 8) 30% non-response (\(k_{2h}\) = 2, 4, 8)
\(\overline{Y}_0\) 0.2/0.8 100 100 100 100 100 100
\(\overline{Y}_R\) 0.2 113.6230 107.9472 104.3350 111.4729 105.9844 103.0583
\(\overline{Y}_R\) 0.8 112.4303 107.3953 104.0855 110.5494 105.6100 102.8971
\(\overline{Y}_{ER}\) 0.2 110.8210 106.3781 103.5022 109.1488 104.8202 102.4767
\(\overline{Y}_{ER}\) 0.8 109.8950 105.9412 103.3022 108.4265 104.5218 102.3468
\(\overline{Y}_1\) 0.2 114.2630 108.3011 104.5213 112.0013 106.2459 103.1881
\(\overline{Y}_1\) 0.8 113.0079 107.7228 104.2606 111.0310 105.8542 103.0198
\(\overline{Y}_2\) 0.2 114.2589 108.2988 104.5201 111.9979 106.2442 103.1873
\(\overline{Y}_2\) 0.8 113.0042 107.7207 104.2595 111.0279 105.8527 103.0190
\(\overline{Y}_3\) 0.2 114.1979 108.2652 104.5024 111.9476 106.2194 103.1749
\(\overline{Y}_3\) 0.8 112.9492 107.6896 104.2429 110.9821 105.8295 103.0074
\(\overline{Y}_4\) 0.2 113.8930 108.0967 104.4138 111.6959 106.0949 103.1132
\(\overline{Y}_4\) 0.8 112.6741 107.5337 104.1596 110.7528 105.7133 102.9490
\(\overline{Y}_5\) 0.2 114.2630 108.3011 104.5213 112.0013 106.2459 103.1881
\(\overline{Y}_5\) 0.8 113.0079 107.7228 104.2606 111.0310 105.8542 103.0198
\(\overline{Y}_6\) 0.2 114.1794 108.2549 104.4970 111.9323 106.2118 103.1712
\(\overline{Y}_6\) 0.8 112.9324 107.6801 104.2378 110.9682 105.8224 103.0039
\(\overline{Y}_7\) 0.2 114.1764 108.2533 104.4962 111.9298 106.2106 103.1706
\(\overline{Y}_7\) 0.8 112.9298 107.6786 104.2370 110.9660 105.8213 103.0033
\(\overline{Y}_8\) 0.2 114.2354 108.2858 104.5133 111.9785 106.2346 103.1825
\(\overline{Y}_8\) 0.8 112.9830 107.7087 104.2531 111.0103 105.8437 103.0146
\(\overline{Y}_9\) 0.2 114.1152 108.2195 104.4784 111.8793 106.1856 103.1582
\(\overline{Y}_9\) 0.8 112.8745 107.6473 104.2203 110.9199 105.7980 102.9916
\(\overline{Y}_{10}\) 0.2 114.2515 108.2947 104.5179 111.9917 106.2412 103.1858
\(\overline{Y}_{10}\) 0.8 112.9975 107.7169 104.2575 111.0224 105.8498 103.0176
\(\overline{Y}_{11}\) 0.2 114.1859 108.2585 104.4989 111.9377 106.2145 103.1725
\(\overline{Y}_{11}\) 0.8 112.9383 107.6834 104.2396 110.9731 105.8249 103.0051
Table 9 Percent Relative Efficiencies (PREs) of different estimators for Rosner [28] data with measurement errors at \(t_h=0.3\) and \(f_h=0.2\)
Estimator \(\psi_h\) 20% non-response (\(k_{2h}\) = 2, 4, 8) 30% non-response (\(k_{2h}\) = 2, 4, 8)
\(\overline{Y}_0\) 0.2/0.8 100 100 100 100 100 100
\(\overline{Y}_R\) 0.2 100.2356 100.1438 100.0808 100.1971 100.1059 100.0550
\(\overline{Y}_R\) 0.8 100.2351 100.1436 100.0807 100.1968 100.1058 100.0550
\(\overline{Y}_{ER}\) 0.2 100.2074 100.1266 100.0711 100.1735 100.0933 100.0485
\(\overline{Y}_{ER}\) 0.8 100.2070 100.1264 100.0710 100.1732 100.0931 100.0484
\(\overline{Y}_1\) 0.2 100.2542 100.1551 100.0872 100.2127 100.1143 100.0594
\(\overline{Y}_1\) 0.8 100.2536 100.1548 100.0870 100.2123 100.1141 100.0593
\(\overline{Y}_2\) 0.2 100.2541 100.1551 100.0871 100.2126 100.1142 100.0593
\(\overline{Y}_2\) 0.8 100.2536 100.1548 100.0870 100.2122 100.1141 100.0593
\(\overline{Y}_3\) 0.2 100.2530 100.1544 100.0868 100.2117 100.1137 100.0591
\(\overline{Y}_3\) 0.8 100.2525 100.1541 100.0866 100.2113 100.1136 100.0590
\(\overline{Y}_4\) 0.2 100.2474 100.1510 100.0848 100.2070 100.1112 100.0578
\(\overline{Y}_4\) 0.8 100.2469 100.1507 100.0847 100.2066 100.1111 100.0577
\(\overline{Y}_5\) 0.2 100.2542 100.1551 100.0872 100.2127 100.1143 100.0594
\(\overline{Y}_5\) 0.8 100.2536 100.1548 100.0870 100.2123 100.1141 100.0593
\(\overline{Y}_6\) 0.2 100.2526 100.1542 100.0866 100.2114 100.1136 100.0590
\(\overline{Y}_6\) 0.8 100.2521 100.1539 100.0865 100.2110 100.1134 100.0589
\(\overline{Y}_7\) 0.2 100.2526 100.1541 100.0866 100.2113 100.1136 100.0590
\(\overline{Y}_7\) 0.8 100.2521 100.1539 100.0865 100.2110 100.1134 100.0589
\(\overline{Y}_8\) 0.2 100.2537 100.1548 100.0870 100.2122 100.1141 100.0592
\(\overline{Y}_8\) 0.8 100.2531 100.1545 100.0869 100.2119 100.1139 100.0592
\(\overline{Y}_9\) 0.2 100.2515 100.1535 100.0862 100.2104 100.1131 100.0587
\(\overline{Y}_9\) 0.8 100.2509 100.1532 100.0861 100.2100 100.1129 100.0587
\(\overline{Y}_{10}\) 0.2 100.2539 100.1550 100.0871 100.2125 100.1142 100.0593
\(\overline{Y}_{10}\) 0.8 100.2534 100.1547 100.0870 100.2121 100.1140 100.0592
\(\overline{Y}_{11}\) 0.2 100.2527 100.1542 100.0867 100.2115 100.1137 100.0590
\(\overline{Y}_{11}\) 0.8 100.2522 100.1540 100.0866 100.2111 100.1135 100.0590
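The comparisons drawn from Tables 8 and 9 can be checked mechanically for a single cell. The sketch below covers only the cell at 20% non-response, \(k_{2h}=2\) and \(\psi_h=0.2\). It uses values transcribed from the two tables, checks whether every proposed estimator beats the better of \(\overline{Y}_R\) and \(\overline{Y}_{ER}\), and computes the relative drop in PRE for \(\overline{Y}_{10}\). The dictionary and function names are hypothetical. This is a spot check, not a re-analysis of the study.

```python
# PREs transcribed from Tables 8 and 9 (Rosner data) at 20% non-response,
# k_2h = 2 and psi_h = 0.2. Keys are estimator subscripts.
WITHOUT_ME = {"R": 113.6230, "ER": 110.8210, "1": 114.2630, "2": 114.2589,
              "3": 114.1979, "4": 113.8930, "5": 114.2630, "6": 114.1794,
              "7": 114.1764, "8": 114.2354, "9": 114.1152, "10": 114.2515,
              "11": 114.1859}
WITH_ME = {"R": 100.2356, "ER": 100.2074, "1": 100.2542, "2": 100.2541,
           "3": 100.2530, "4": 100.2474, "5": 100.2542, "6": 100.2526,
           "7": 100.2526, "8": 100.2537, "9": 100.2515, "10": 100.2539,
           "11": 100.2527}
ADAPTED = ("R", "ER")


def proposed_beat_adapted(pres):
    """True if every proposed estimator has a higher PRE than the best adapted one."""
    best_adapted = max(pres[name] for name in ADAPTED)
    return all(value > best_adapted
               for name, value in pres.items() if name not in ADAPTED)


def relative_decline(before, after):
    """Percentage drop from `before` to `after`."""
    return 100.0 * (before - after) / before


print(proposed_beat_adapted(WITHOUT_ME), proposed_beat_adapted(WITH_ME))
print(round(relative_decline(WITHOUT_ME["10"], WITH_ME["10"]), 2))
```

In this cell, the PRE of \(\overline{Y}_{10}\) drops by roughly 12% once measurement errors are added. For these data, the largest PRE in the cell belongs to \(\overline{Y}_1\) and \(\overline{Y}_5\), which are tied.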

5. Conclusion

This study addresses the estimation of the finite population mean of a sensitive study variable when non-response and measurement errors are present simultaneously. A general class of estimators is proposed using an auxiliary attribute and an auxiliary variable. The bias and mean squared error (MSE) of the proposed estimator are derived up to the first order of approximation. In the numerical study, the proposed estimator outperforms the adapted ordinary, ratio and exponential ratio-type estimators. Furthermore, the MSEs of the proposed estimators increase as the non-response rate and the inverse sampling rate increase. Finally, the efficiency of the estimators of the population mean decreases when non-response and measurement errors are present simultaneously.

References

  1. Warner, S. L. (1965). Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60(309), 63-69.

  2. Chaudhuri, A., & Mukherjee, R. (1988). Randomized response: Theory and techniques. Marcel Dekker, New York.

  3. Gupta, S. & Shabbir, J. (2004). Sensitivity estimation for personal interview survey questions. Statistica, 64(4), 643-653.

  4. Gupta, S., Shabbir, J., & Sehra, S. (2010). Mean and sensitivity estimation in optional randomized response models. Journal of Statistical Planning and Inference, 140(10), 2870-2874.

  5. Mehta, S., Dass, B. K., Shabbir, J., & Gupta, S. (2012). A three-stage optional randomized response model. Journal of Statistical Theory and Practice, 6(3), 412-427.

  6. Eichhorn, B. H., & Hayre, L. S. (1983). Scrambled randomized response methods for obtaining sensitive quantitative data. Journal of Statistical Planning and Inference, 7(4), 307-316.

  7. Gupta, S., & Shabbir, J. (2008). On improvement in estimating the population mean in simple random sampling. Journal of Applied Statistics, 35(5), 559-566.

  8. Gupta, S., Shabbir, J., & Sehra, S. (2012). Mean and sensitivity estimation in optional randomized response models. Journal of Statistical Planning and Inference, 140, 2870-2874.

  9. Sousa, R., Shabbir, J., Rael, P. C., & Gupta, S. (2010). Ratio estimation of the mean of a sensitive variable in the presence of auxiliary information. Journal of Statistical Theory and Practice, 4(3), 495-507.

  10. Zatezalo, T. (2017). Generalized mixture estimator of the mean of a sensitive variable in the presence of non-sensitive auxiliary information. Statistics and Applications, 15(1&2), 23-36.

  11. Mushtaq, N., Noor-ul-Amin, M., & Hanif, M. (2016). Estimation of population mean of a sensitive variable in stratified two-phase sampling. Pakistan Journal of Statistics, 32(1), 393-404.

  12. Mushtaq, N., Noor-ul-Amin, M., & Hanif, M. (2017). A family of estimators of a sensitive variable using auxiliary information in stratified random sampling. Pakistan Journal of Statistics and Operation Research, 13(1), 141-155.

  13. Mushtaq, N., & Noor-ul-Amin, M. (2020). Joint influence of double sampling and randomized response technique on estimation method of mean. Applied Mathematics, 10(1), 12-19.

  14. Hansen, M., & Hurwitz, W. (1946). The problem of non-response in sample surveys. Journal of the American Statistical Association, 41, 517-529.

  15. Khalil, S., Noor-ul-Amin, M., & Hanif, M. (2018). Estimation of population mean for a sensitive variable in the presence of measurement error. Journal of Statistics and Management Systems, 21(1), 81-91.

  16. Khalil, S., Gupta, S., & Hanif, M. (2018). Estimation of finite population mean in stratified sampling using scrambled responses in the presence of measurement errors. Communications in Statistics - Theory and Methods, 48(6), 1553-1561.

  17. Khalil, S., Zhang, Q., & Gupta, S. (2019). Mean estimation of sensitive variables under measurement errors using optional RRT models. Communications in Statistics - Simulation and Computation, DOI: 10.1080/03610918.2019.1584298.

  18. Onyango, R., Oduor, B., & Odundo, F. (2021). Joint influence of measurement errors and randomized response technique on mean estimation under stratified double sampling. Open Journal of Mathematical Science, 5(1), 192-199.

  19. Naeem, N., & Shabbir, J. (2018). Use of a scrambled response on two occasion’s successive sampling under nonresponse. Hacettepe Journal of Mathematics and Statistics, 47(3), 675-684.

  20. Zahid, E., & Shabbir, J. (2019). Estimation of finite population mean for a sensitive variable using dual auxiliary information in the presence of measurement errors. PloS one, 14(2): e0212111.

  21. Zhang, Q., Khalil, S., & Gupta, S. (2020). Mean estimation of sensitive variables under non-response and measurement errors using optional RRT models. Journal of Statistical Theory and Practice, 15.

  22. Zhang, Q., Khalil, S., & Gupta, S. (2021). Mean estimation in the simultaneous presence of measurement errors and non-response using optional RRT models under stratified sampling. Journal of Statistical Computation and Simulation, 91, 3492-3504.

  23. Tukey, J. W. (1970). Exploratory Data Analysis. Addison-Wesley Publishing Co., Reading, MA, USA.

  24. Hodges, J. L., & Lehmann, E. L. (1963). Estimates of location based on rank tests. E., Salvemini, T., Eds.; Libreria Eredi Virgilio Veschi: Rome, Italy,

  25. Cochran, W. G. (1940). The estimation of the yields of the cereal experiments by sampling for the ratio of grain to total produce. Journal of Agricultural Science, 59, 1225-1226.

  26. Bahl, S., & Tuteja, R. (1991). Ratio and product type exponential estimators. Journal of Information and Optimization Sciences, 12(1), 159-164.

  27. Onyango, R. (2022). Mean estimation of a sensitive variable under nonresponse using three-stage RRT model in stratified two-phase sampling. Journal of Probability and Statistics, 2022. https://doi.org/10.1155/2022/4530120

  28. Rosner, B. (2015). Fundamentals of Biostatistics. Duxbury Press.