Engineering and Applied Science Letters (EASL)

Volume 6 (2023) Issue 1
Pages: 37-48

ISSN: 2617-9709 (Online) 2617-9695 (Print)

DOI: https://www.doi.org/10.30538/psrp-easl2023.0094

Research article

Estimation of finite population mean of a sensitive variable using three-stage ORRT in the presence of non-response and measurement errors

Author(s): Ronald Onyango¹, Samuel B. Apima², Amos Wanjara²

¹Department of Applied Statistics, Financial Mathematics and Actuarial Science, Jaramogi Oginga Odinga University of Science and Technology, Kenya

²Department of Mathematics and Statistics, Kaimosi Friends University, Kenya

Copyright © Ronald Onyango, Samuel B. Apima, Amos Wanjara. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Received: 06/05/2023
Accepted: 20/06/2023
Published: 30/06/2023

Abstract

The purpose of this study is to present a generalized class of estimators using the three-stage Optional Randomized Response Technique (ORRT) in the presence of non-response and measurement errors on a sensitive study variable. The proposed estimator makes use of dual auxiliary information. The expressions for the bias and mean square error of the proposed estimator are derived using Taylor series expansion. The applicability of the proposed estimator is demonstrated using real data sets. A numerical study is used to compare the efficiency of the proposed estimator with adapted estimators of the finite population mean. The suggested estimator performs better than adapted ordinary, ratio, and exponential ratio-type estimators in the presence of both non-response and measurement errors. The efficiency of the proposed estimator of the population mean declines as the inverse sampling rate, non-response rate, and sensitivity level of the survey question increase.

Keywords: Sensitivity level; non-response; measurement errors; bias; efficiency

1. Introduction

In a survey, it is challenging to collect accurate information on a sensitive study variable, that is, one with a socially stigmatizing characteristic, elicited by questions such as “Have you ever had an abortion?”, “How much money do you make?”, “Have you ever been infected with a sexually transmitted disease?”, or “Are you a drug addict?”. Obtaining correct answers to such questions in an interview involving direct questioning is difficult since the respondent’s privacy is not protected. Most respondents will either purposefully give a false answer or refuse to respond to such questions due to fear of embarrassment or loss of social status.

The Randomized Response Technique (RRT) was pioneered by Warner [1], and its main objective was to reduce response bias in surveys involving a sensitive question. In RRT, a scrambled variable is used to estimate the finite population mean of a sensitive variable. The scrambled variable is assumed to be independent of the study variable and the non-sensitive auxiliary variable. The respondent must give a direct response for the non-sensitive auxiliary variable and a scrambled response for the study variable.

The Optional Randomized Response Technique (ORRT) was pioneered by Chaudhuri and Mukherjee [2]. The technique involves giving the respondent an option to provide a scrambled or direct response to a sensitive question. In one-stage ORRT [3], a respondent is expected to give either a scrambled response if they feel the question is sensitive or a direct response if they feel the question is non-sensitive.

Two-stage ORRT aims at increasing respondent privacy and participation in a survey involving a sensitive study variable [4]. In two-stage ORRT, a known proportion of respondents, \(t_h\), is requested to respond directly to a sensitive question while maintaining anonymity. The remaining proportion of respondents provide a scrambled response using an additive model. The main drawback of two-stage ORRT is that it requires a large value of \(t_h\), especially when the underlying question is highly sensitive.

The three-stage ORRT [5] aims to promote respondent privacy and cooperation in a survey involving a sensitive question. A known predetermined proportion of respondents, \(t_h\), is requested to give a true response to a sensitive question. Another predetermined proportion, \(f_h\), is requested to provide a scrambled response. The remaining proportion, \(1-t_h-f_h\), is given the option of providing either a scrambled or a direct response to the sensitive question. In additive three-stage ORRT, the reported response is defined as \[\label{GrindEQ__1_} Z_{hi}=\left\{ \begin{array}{ll} Y_{hi}, & \text{with probability } t_h+\left(1-t_h-f_h\right)\left(1-{\psi }_h\right) \\ Y_{hi}+S_{hi}, & \text{with probability } f_h+\left(1-t_h-f_h\right){\psi }_h \end{array} \right. \tag{1}\] where \(Y_{hi}\) is the study variable, \({\psi }_h\) is the sensitivity level, and \(S_{hi}\) is a scrambling variable that is normally distributed with mean 0 and variance \(S^2_{Sh}\). The mean and variance of \(Z_{hi}\) are given as \(E\left(Z_{hi}\right)=E\left(Y_{hi}\right)\) and \(S^2_{Zh}=S^2_{Yh}+{\varphi }_hS^2_{Sh}\), respectively, where \({\varphi }_h=f_h+{\psi }_h\left(1-t_h-f_h\right)\).
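The additive three-stage ORRT model can be illustrated with a small simulation. The following is a minimal sketch, not the authors' code: the helper names `phi` and `scramble`, the seed, and the hypothetical study variable (normal with mean 50 and standard deviation 10) are all assumptions made for illustration. It checks numerically that scrambling leaves the mean unchanged and inflates the variance by approximately \({\varphi }_hS^2_{Sh}\).

```python
import random
import statistics


def phi(t_h, f_h, psi_h):
    """Probability that a reported value is scrambled: f_h + psi_h * (1 - t_h - f_h)."""
    return f_h + psi_h * (1.0 - t_h - f_h)


def scramble(y, t_h, f_h, psi_h, sd_s, rng):
    """Return one additive three-stage ORRT response for the true value y."""
    if rng.random() < phi(t_h, f_h, psi_h):
        return y + rng.gauss(0.0, sd_s)  # scrambled branch: Y + S
    return y                             # direct branch: Y


rng = random.Random(2023)                            # fixed seed (assumption)
t_h, f_h, psi_h, sd_s = 0.3, 0.2, 0.8, 2.0 ** 0.5    # illustrative values, S has variance 2
y = [rng.gauss(50.0, 10.0) for _ in range(100_000)]  # hypothetical study variable
z = [scramble(v, t_h, f_h, psi_h, sd_s, rng) for v in y]

phi_h = phi(t_h, f_h, psi_h)
var_y = statistics.pvariance(y)
var_z = statistics.pvariance(z)
predicted_var_z = var_y + phi_h * sd_s ** 2          # S^2_Zh = S^2_Yh + phi_h * S^2_Sh

print(f"phi_h = {phi_h:.2f}")
print(f"mean(y) = {statistics.fmean(y):.3f}, mean(z) = {statistics.fmean(z):.3f}")
print(f"var(z) = {var_z:.3f}, predicted = {predicted_var_z:.3f}")
```

Raising `psi_h` increases `phi_h`, so more responses are scrambled and the variance of the reported values grows.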

In the literature, researchers who have studied the estimation of the finite population mean of a sensitive study variable using non-optional RRT include Eichhorn and Hayre [6], Gupta and Shabbir [7], Gupta et al. [8], Sousa et al. [9], Zatezalo [10], Mushtaq et al. [11], Mushtaq et al. [12], and Mushtaq and Noor-Ul-Amin [13].

The inability of specific units to provide information due to their unwillingness to participate, illness, or absence is referred to as non-response. Non-response in a survey reduces the sample size, raising the variance of an estimator of the finite population mean. Hansen and Hurwitz [14] proposed a strategy for getting data from non-responding units in a postal survey called subsampling. The approach included more effort to gather data directly from a subsample of non-responding units. The strategy is used in this study to handle the problem of non-response.

During the data collection and recording phases of a survey, measurement errors can occur. The difference between a variable’s true values and those reported in a survey is known as measurement error. When measurement errors occur in a survey, the data become contaminated, resulting in under- or overestimated parameters during analysis.

Khalil [15] discusses the estimation of the finite population mean in the presence of measurement errors in simple random sampling based on non-optional RRT. The estimation of the finite population mean using a non-optional RRT model in the presence of measurement errors under stratified random sampling is addressed by Khalil et al. [16]. Khalil et al. [17] extended the work of Khalil et al. [16] to estimate the finite population mean in the presence of measurement errors using one-stage optional RRT. Recently, Onyango et al. [18] studied the problem of estimating the finite population mean in the presence of measurement errors using non-optional RRT in stratified two-phase sampling.

Naeem and Shabbir [19] and Zahid and Shabbir [20] discuss the problem of estimation of the finite population mean of a sensitive variable using non-optional RRT in the presence of measurement errors and non-response simultaneously. Zhang et al. [21] studied the problem of estimating the finite population mean using one-stage RRT in the presence of measurement errors and non-response simultaneously in simple random sampling. Recently, Zhang et al. [22] studied the estimation of the finite population mean of a sensitive study variable using ORRT under measurement errors and non-response in stratified random sampling.

The present study fills the existing gap in the literature on estimating the finite population mean of a sensitive study variable using an additive three-stage ORRT model in the presence of measurement errors and non-response simultaneously.

In this paper, section two describes the population and notations used in this study. The existing estimators of population mean under three-stage ORRT models are described in section three. The proposed estimator and its properties of bias and mean square error are discussed in section four. Section five looks at the theoretical efficiency of the proposed estimator. Section six performs a numerical analysis of the proposed estimator’s performance. Finally, section seven contains the conclusions of the study.

Notations

Consider a finite population \(U=\left\{U_1,\ U_2,\ \dots ,\ U_N\right\}\) of size \(N\), divided into \(L\) strata, with \(N_h\) units in the \(h^{th}\) stratum. The population comprises a sensitive study variable, an auxiliary variable, and a scrambled response denoted as Y, X, and Z, respectively. Let \(\left(X_{hi},Y_{hi},Z_{hi}\right)\) and \(\left(x_{hi},y_{hi},z_{hi}\right)\) denote the \(i^{th}\) values of X, Y, and Z in the \(h^{th}\) population and sample stratum, respectively. Furthermore, let \({\overline{X}}_h\), \({\overline{Y}}_h\), and \({\overline{Z}}_h\) denote the population means of X, Y, and Z, respectively, in the \(h^{th}\) stratum. The variances of the scrambled response and the auxiliary variable are obtained using \(S^2_{Zh}=\frac{1}{N_h-1}\sum^{N_h}_{i=1}{{\left(Z_{hi}-{\overline{Z}}_h\right)}^2}\) and \(S^2_{Xh}=\frac{1}{N_h-1}\sum^{N_h}_{i=1}{{\left(X_{hi}-{\overline{X}}_h\right)}^2}\), respectively. Let \(S_{ZXh}\) denote the covariance between the auxiliary variable and the scrambled response. Also, let \({\rho }_{ZXh}\) denote the correlation coefficient between the scrambled response and the auxiliary variable.

Auxiliary information may be available in a survey as an attribute. Let \({\tau }_{hij}\) denote the value of the \(j^{th}\) attribute for the \(i^{th}\) unit (i=1, 2, … and j=1, 2, …) in the \(h^{th}\) stratum. The auxiliary attribute takes the value 1 if the \(i^{th}\) population unit possesses the attribute and 0 otherwise. Furthermore, let \(A_{hj}=\sum^{N_h}_{i=1}{{\tau }_{hij}}\) and \(P_h=\frac{A_{hj}}{N_h}\) be the number of units that have the attribute and the proportion of units possessing the attribute in the population, respectively. Additionally, let \(S^2_{Ph}\) denote the population variance of the auxiliary attribute. Also, let \(S_{ZPh}\) and \(S_{XPh}\) denote the covariances, and \({\rho }_{ZPh}\) and \({\rho }_{XPh}\) the correlation coefficients, between the variables in their subscripts.

In the presence of non-response, let \(N_{1h}\) and \(N_{2h}\) be the population sizes of the responding and non-responding units, respectively. Let \(S^2_{Zh(2)}\) denote the population variance of the sensitive variable for the non-responding units in the \(h^{th}\) stratum.

A relatively large first-phase sample of size \(n'_h\) is drawn from the \(h^{th}\) stratum using simple random sampling without replacement (SRSWOR). The sample mean of the auxiliary variable in the first-phase sample is \({\overline{x}}'_h=\frac{1}{n'_h}\sum^{n'_h}_{i=1}{x_{hi}}\), and the proportion of units in the first-phase sample possessing the auxiliary attribute is \(p'_h=\frac{a_{hj}}{n'_h}\). A second-phase random sample of size \(n_h\) is drawn from the first-phase sample; \(n_{1h}\) units respond, and non-response is observed on the remaining \(n_{2h}\) units. Let \({\overline{x}}_h=\frac{1}{n_h}\sum^{n_h}_{i=1}{x_{hi}}\) be the sample mean and \(p_h=\frac{a_{hj}}{n_h}\) be the proportion of units in the second-phase sample that possess the auxiliary attribute. Also, let \({\overline{z}}_{1h}=\frac{1}{n_{1h}}\sum^{n_{1h}}_{i=1}{z_{hi}}\) be the sample mean for the responding group in the second-phase sample. A sub-sample of size \(r_{2h}=\frac{n_{2h}}{k_{2h}}\), where \(k_{2h}\) is the inverse sampling rate, is drawn from the non-responding units. Let \({\overline{z}}_{2h}=\frac{1}{r_{2h}}\sum^{r_{2h}}_{i=1}{z_{hi}}\) be the sub-sample mean for the non-responding units. The estimate of the population mean of the scrambled response is given as

\({\overline{z}}^*_h=w_{1h}{\overline{z}}_{1h}+w_{2h}{\overline{z}}_{2h}\), where \(w_{1h}=\frac{n_{1h}}{n_h}\) and \(w_{2h}=\frac{n_{2h}}{n_h}\).
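The weighting in \({\overline{z}}^*_h\) can be sketched for a single stratum as follows. The function name `hansen_hurwitz_mean` and the toy numbers are hypothetical; the sketch assumes the sub-sample of non-respondents has already been followed up and its values recorded.

```python
def hansen_hurwitz_mean(z_respondents, z_subsample, n_2h):
    """Return w_1h * zbar_1h + w_2h * zbar_2h for one stratum.

    z_respondents: values from the n_1h responding units
    z_subsample:   values from the r_2h followed-up non-respondents
    n_2h:          number of non-responding units in the second-phase sample
    """
    n_1h = len(z_respondents)
    n_h = n_1h + n_2h
    zbar_1h = sum(z_respondents) / n_1h
    zbar_2h = sum(z_subsample) / len(z_subsample)
    return (n_1h / n_h) * zbar_1h + (n_2h / n_h) * zbar_2h


# Toy stratum: 6 respondents, 4 non-respondents, inverse sampling rate k_2h = 2,
# so r_2h = 4 / 2 = 2 non-respondents are followed up.
z_1 = [10.0, 12.0, 11.0, 9.0, 13.0, 11.0]  # mean 11
z_2 = [20.0, 22.0]                          # mean 21
estimate = hansen_hurwitz_mean(z_1, z_2, n_2h=4)
print(estimate)  # 0.6 * 11 + 0.4 * 21
```

The non-respondent sub-sample mean is weighted by the full non-response share \(w_{2h}\), not by the sub-sample size, which is what lets a small follow-up stand in for all non-responding units.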

In the presence of measurement errors, let \(Z^*_{hi}\) and \(z^*_{hi}\) be the true and observed values, respectively, of the scrambled response. Furthermore, let \(T^*_{hi}=z^*_{hi}-Z^*_{hi}\) denote the measurement errors associated with the scrambled response. These measurement errors are assumed to be normally distributed with mean 0 and variance \(S^2_{Th}\). Furthermore, let \(S^2_{Th(2)}\) be the measurement error variance associated with the non-responding units.

In this study, the following conventional and non-conventional measures of the auxiliary variable are used in developing the special cases of the proposed generalized class of estimators:

Coefficient of variation, defined as \(C_{Xh}=\frac{S_{Xh}}{{\overline{X}}_h}\).
Coefficient of correlation, denoted as \({\rho }_{XYh}\).
Coefficient of skewness, defined as \({\beta }_{1h}(x)=\frac{N_h\sum^{N_h}_{i=1}{{\left(X_{hi}-{\overline{X}}_h\right)}^3}}{\left(N_h-1\right)\left(N_h-2\right)S^3_{Xh}}\).
Coefficient of kurtosis, defined as \({\beta }_{2h}(x)=\frac{N_h\left(N_h+1\right)\sum^{N_h}_{i=1}{{\left(X_{hi}-{\overline{X}}_h\right)}^4}}{\left(N_h-1\right)\left(N_h-2\right)\left(N_h-3\right)S^4_{Xh}}-\frac{3{\left(N_h-1\right)}^2}{\left(N_h-2\right)\left(N_h-3\right)}\).
Mid-range, defined as \({MR}_h(x)=\frac{x_{h\left(1\right)}+x_{h\left(N_h\right)}}{2}\), where \(x_{h\left(1\right)}\) and \(x_{h\left(N_h\right)}\) are the minimum and maximum values in the data set.
Quartile deviation, defined as \({QD}_h(x)=\frac{Q_{3h}(x)-Q_{1h}(x)}{2}\).
Tri-mean, proposed by Tukey [23] and defined as \({TM}_h(x)=\frac{Q_{1h}(x)+2Q_{2h}(x)+Q_{3h}(x)}{4}\), where \(Q_{1h}(x)\), \(Q_{2h}(x)\), and \(Q_{3h}(x)\) are the first, second, and third quartiles, respectively.
Hodges-Lehmann [24] estimator, defined as \({HL}_h(x)=\mathrm{Median}\left(\frac{x_{jh}+x_{kh}}{2}\right),\ 1\le j\le k\le N_h\).
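As an illustration of the measures listed above, the sketch below computes them for the values of the auxiliary variable in one stratum. It is a hedged example rather than the authors' implementation: the function name `auxiliary_measures` is hypothetical, and the quartile convention (the `"inclusive"` method of Python's `statistics.quantiles`) is an assumption, since the paper does not state which quartile definition is used.

```python
import statistics
from itertools import combinations_with_replacement


def auxiliary_measures(x):
    """Conventional and non-conventional measures of one stratum's auxiliary values."""
    n = len(x)
    mean = statistics.fmean(x)
    s = statistics.stdev(x)                      # uses the N_h - 1 divisor
    m3 = sum((v - mean) ** 3 for v in x)
    m4 = sum((v - mean) ** 4 for v in x)
    # Quartile convention is an assumption (see lead-in).
    q1, q2, q3 = statistics.quantiles(x, n=4, method="inclusive")
    # Walsh averages (x_j + x_k) / 2 for all pairs with j <= k.
    walsh = [(a + b) / 2 for a, b in combinations_with_replacement(x, 2)]
    return {
        "cv": s / mean,
        "skewness": n * m3 / ((n - 1) * (n - 2) * s ** 3),
        "kurtosis": n * (n + 1) * m4 / ((n - 1) * (n - 2) * (n - 3) * s ** 4)
        - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)),
        "mid_range": (min(x) + max(x)) / 2,
        "quartile_deviation": (q3 - q1) / 2,
        "tri_mean": (q1 + 2 * q2 + q3) / 4,
        "hodges_lehmann": statistics.median(walsh),
    }


measures = auxiliary_measures([1.0, 2.0, 3.0, 4.0, 5.0])
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

The Hodges-Lehmann computation enumerates all \(N_h(N_h+1)/2\) pairwise averages, so for strata with tens of thousands of units a sampling or sorting-based approximation would be needed in practice.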
Some Existing Estimators
The adapted ordinary estimator of population mean is defined as \[\label{GrindEQ__2_} {\overline{Y}}_0\mathrm{=}\sum^L_{h=1}{w_h{\overline{z}}^*_h} \tag{2}\]

The variance of the estimator is given as \[\label{GrindEQ__3_} Var({\overline{Y}}_0)\cong \sum^L_{h=1}{W^2_h}B_h \tag{3}\]

The adapted Cochran [25] ratio estimator is defined as \[\label{GrindEQ__4_} {\overline{Y}}_R=\sum^L_{h=1}{w_h{\overline{z}}^*_h\ \frac{{\overline{x}}'_h}{{\overline{x}}_h}} \tag{4}\]

The expression for the bias is given as \[\label{GrindEQ__5_} Bias({\overline{Y}}_R)\cong \sum^L_{h=1}{\frac{W_h}{{\overline{X}}_h}\left[\frac{9}{8}R_h\left(A_h-C_h\right)-\left(E_h-D_h\right)\right]} \tag{5}\]

The expression of the mean square error is given as

\[\label{GrindEQ__6_} MSE({\overline{Y}}_R)\cong \sum^L_{h=1}{W^2_h\left[B_h+R^2_h\left(A_h-C_h\right)-2R_h\left(E_h-D_h\right)\right]} \tag{6}\] where \(A_h={\theta }_hS^2_{Xh}\), \(C_h={\theta }'_hS^2_{Xh}\), \(D_h={\theta }'_hS_{ZXh}\), \(E_h={\theta }_hS_{ZXh}\), \({\theta }_h=\left(\frac{1}{n_h}-\frac{1}{N_h}\right)\), and \({\theta }'_h=\left(\frac{1}{n'_h}-\frac{1}{N_h}\right)\).

The adapted Bahl and Tuteja [26] exponential ratio-type estimator is defined as \[\label{GrindEQ__7_} t_{ER}=\sum^L_{h=1}{W_h{\overline{z}}^*_h\exp\left(\frac{{\overline{x}}'_h-{\overline{x}}_h}{{\overline{x}}'_h+{\overline{x}}_h}\right)} \tag{7}\]

The expression for the bias is given as \[\label{GrindEQ__8_} Bias(t_{ER})\cong \sum^L_{h=1}{\frac{W_h}{2{\overline{X}}_h}\left[\frac{3}{4}R_h\left(A_h-C_h\right)-\left(E_h-D_h\right)\right]} \tag{8}\]

The expression for the mean square error is given as

\[\label{GrindEQ__9_} MSE(t_{ER})\cong \sum^L_{h=1}{W^2_h\left[B_h+\frac{1}{4}R^2_h\left(A_h-C_h\right)-R_h\left(E_h-D_h\right)\right]} \tag{9}\]

2. The Proposed Strategy of Mean Estimation

2.1. Modified Hansen and Hurwitz [14] technique

From equation (1), the expected value of the scrambled response under the randomization mechanism (Onyango et al. [27]) is defined as \[\label{GrindEQ__10_} E_R\left(Z_{hi}\right)=E_R\left[Y_{hi}\left(1-{\varphi }_h\right)+\left(Y_{hi}+S_{hi}\right){\varphi }_h\right] \tag{10}\] \[\label{GrindEQ__11_} E_R\left(Z_{hi}\right)=Y_{hi}+{\varphi }_h{\overline{S}}_h \tag{11}\] where \({\varphi }_h=f_h+{\psi }_h\left(1-t_h-f_h\right)\) and \({\overline{S}}_h\) is the mean of the scrambling variable.

The variance of the response variable under the randomization mechanism is given as \[V_R\left(Z_{hi}\right)=V_R\left(Y_{hi}+{\varphi }_hS_{hi}\right)\] \[\label{GrindEQ__12_} V_R\left(Z_{hi}\right)={\varphi }_h\left(S^2_{Sh}+{\overline{S}}^2_h\right)-{\varphi }^2_h{\overline{S}}^2_h \tag{12}\] \[\label{GrindEQ__13_} V_R\left(Z_{hi}\right)={\varphi }_hS^2_{Sh} \tag{13}\] where (13) follows from (12) because the scrambling variable has mean \({\overline{S}}_h=0\). The transformed value of the randomized response is given as \[\label{GrindEQ__14_} {\hat{y}}_{hi}=z_{hi}-{\varphi }_h{\overline{S}}_h \tag{14}\] with \(E_R\left({\hat{y}}_{hi}\right)=y_{hi}\) and \(V_R\left({\hat{y}}_{hi}\right)={\varphi }_hS^2_{Sh}\), where \(y_{hi}\) is the true response. Therefore, the modified Hansen and Hurwitz [14] technique with an additive three-stage ORRT is defined as \[\label{GrindEQ__15_} {\widehat{\overline{y}}}_h=w_{1h}{\widehat{\overline{y}}}_{1h}+w_{2h}{\widehat{\overline{y}}}_{2h} \tag{15}\] with expectation \[\label{GrindEQ__16_} E\left({\widehat{\overline{y}}}_h\right)={\overline{Y}}_h \tag{16}\] and variance \[\label{GrindEQ__17_} var\left({\widehat{\overline{y}}}_h\right)=E_1\left[V_2\left({\widehat{\overline{y}}}_h\right)\right]+V_1\left[E_2\left({\widehat{\overline{y}}}_h\right)\right] \tag{17}\] \[\label{GrindEQ__18_} var\left({\widehat{\overline{y}}}_h\right)=var\left({\overline{y}}_h\right)+E_1\left[\frac{n^2_{1h}}{n^2_h}\frac{{\varphi }_hS^2_{Sh}}{n_{1h}}\right]+E_1\left[\frac{n_{2h}}{n^2_h}k_{2h}{\varphi }_hS^2_{Sh}\right] \tag{18}\] \[\label{GrindEQ__19_} var\left({\widehat{\overline{y}}}_h\right)=var\left({\overline{y}}_h\right)+{\mathrm{\Omega }}_h \tag{19}\] where \({\mathrm{\Omega }}_h=\frac{{\varphi }_hS^2_{Sh}}{n_h}\left(W_{1h}+k_{2h}W_{2h}\right)\) is the contribution of the three-stage ORRT to the variance of the Hansen and Hurwitz [14] estimator.
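The variance contribution \({\mathrm{\Omega }}_h\) is straightforward to evaluate. The sketch below is illustrative only: the function names `phi` and `omega` and all input values (sample size, response shares, scrambling variance) are assumptions chosen to show how \({\mathrm{\Omega }}_h\) responds to the sensitivity level and the inverse sampling rate.

```python
def phi(t_h, f_h, psi_h):
    """Scrambling probability: f_h + psi_h * (1 - t_h - f_h)."""
    return f_h + psi_h * (1.0 - t_h - f_h)


def omega(phi_h, s2_s, n_h, w_1h, w_2h, k_2h):
    """ORRT contribution to the variance: phi_h * S^2_Sh / n_h * (W_1h + k_2h * W_2h)."""
    return phi_h * s2_s / n_h * (w_1h + k_2h * w_2h)


# Illustrative inputs (assumptions): t_h = 0.3, f_h = 0.2, scrambling variance 2,
# n_h = 100, and a 20% non-response share (W_1h = 0.8, W_2h = 0.2).
low_sensitivity = omega(phi(0.3, 0.2, 0.2), 2.0, 100, 0.8, 0.2, k_2h=2)
high_sensitivity = omega(phi(0.3, 0.2, 0.8), 2.0, 100, 0.8, 0.2, k_2h=2)
high_k = omega(phi(0.3, 0.2, 0.8), 2.0, 100, 0.8, 0.2, k_2h=8)

print(low_sensitivity, high_sensitivity, high_k)
```

Under these inputs \({\mathrm{\Omega }}_h\) grows with both \({\psi }_h\) and \(k_{2h}\), which is the direction of the efficiency loss reported in the empirical study.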

2.2. The proposed generalized class of estimators

The suggested randomized response estimator of the finite population mean of a sensitive variable in the presence of non-response and measurement errors simultaneously is defined as \[\label{GrindEQ__20_} {\overline{Y}}_g=\sum^L_{h=1}{w_h\left[{\overline{z}}^*_h+{\alpha }_h\left({\overline{x}}'_h-{\overline{x}}_h\right)+{\beta }_h\left(p'_h-p_h\right)\right]\exp\left(\frac{a_h({\overline{x}}'_h-{\overline{x}}_h)}{a_h({\overline{x}}'_h+{\overline{x}}_h)+2b_h}\right)} \tag{20}\] where \({\alpha }_h\) and \({\beta }_h\) are appropriately chosen constants, and \(a_h\) and \(b_h\) are either real numbers or known conventional and non-conventional measures of the auxiliary variable. Let \[\label{GrindEQ__21_} {\sigma }_{Zh}={\overline{z}}^*_h-{\overline{Z}}_h \tag{21}\] \[\label{GrindEQ__22_} {\sigma }_{X1h}={\overline{x}}'_h-{\overline{X}}_h \tag{22}\] \[\label{GrindEQ__23_} {\sigma }_{P1h}=p'_h-P_h \tag{23}\] \[\label{GrindEQ__24_} {\sigma }_{Xh}={\overline{x}}_h-{\overline{X}}_h \tag{24}\] \[\label{GrindEQ__25_} {\sigma }_{Ph}=p_h-P_h \tag{25}\]

such that \[\label{GrindEQ__26_} E\left({\sigma }_{Zh}\right)=E\left({\sigma }_{Xh}\right)=E\left({\sigma }_{X1h}\right)=E\left({\sigma }_{P1h}\right)=E\left({\sigma }_{Ph}\right)=0 \tag{26}\] Furthermore, let \[\label{GrindEQ__27_} E\left({\sigma }^2_{Xh}\right)={\theta }_hS^2_{Xh}=A_h \tag{27}\] \[\label{GrindEQ__28_} E\left({\sigma }^2_{Zh}\right)={\theta }_h\left(S^2_{Yh}+S^2_{Th}\right)+{\theta }^*_h\left(S^2_{Yh(2)}+S^2_{Th(2)}\right)+{\mathrm{\Omega }}_h=B_h \tag{28}\] \[\label{GrindEQ__29_} E\left({\sigma }^2_{X1h}\right)={\theta }'_hS^2_{Xh}=C_h \tag{29}\] \[\label{GrindEQ__30_} E\left({\sigma }_{X1h}{\sigma }_{Zh}\right)={\theta }'_hS_{ZXh}=D_h \tag{30}\] \[\label{GrindEQ__31_} E\left({\sigma }_{Xh}{\sigma }_{Zh}\right)={\theta }_hS_{ZXh}=E_h \tag{31}\] \[\label{GrindEQ__32_} E\left({\sigma }^2_{Ph}\right)={\theta }_hS^2_{Ph}=F_h \tag{32}\] \[\label{GrindEQ__33_} E\left({\sigma }^2_{P1h}\right)={\theta }'_hS^2_{Ph}=G_h \tag{33}\] \[\label{GrindEQ__34_} E\left({\sigma }_{Ph}{\sigma }_{Zh}\right)={\theta }_hS_{ZPh}=H_h \tag{34}\] \[\label{GrindEQ__35_} E\left({\sigma }_{P1h}{\sigma }_{Zh}\right)={\theta }'_hS_{ZPh}=I_h \tag{35}\] \[\label{GrindEQ__36_} E\left({\sigma }_{Xh}{\sigma }_{Ph}\right)={\theta }_hS_{XPh}=J_h \tag{36}\] \[\label{GrindEQ__37_} E\left({\sigma }_{P1h}{\sigma }_{Xh}\right)=E\left({\sigma }_{X1h}{\sigma }_{Ph}\right)=E\left({\sigma }_{X1h}{\sigma }_{P1h}\right)={\theta }'_hS_{XPh}=L_h \tag{37}\] where \({\theta }'_h=\left(\frac{1}{n'_h}-\frac{1}{N_h}\right)\), \({\theta }_h=\left(\frac{1}{n_h}-\frac{1}{N_h}\right)\), \({\theta }^*_h=\frac{W_h\left(k_{2h}-1\right)}{n_h}\), and \(W_h=\frac{N_h}{N}\).

Substituting equations (21)-(25) in (20), simplifying, and ignoring terms of order greater than two gives the approximation for the bias as \[\label{GrindEQ__38_} Bias({\overline{Y}}_g)\cong \sum^L_{h=1}{\frac{W_h{\lambda }_h}{2}}\left[\frac{3}{4}{\lambda }_h{\overline{Z}}_h\left(A_h-C_h\right)+{\alpha }_h\left(A_h-C_h\right)-\left(E_h-D_h\right)+{\beta }_h\left(J_h-L_h\right)\right] \tag{38}\] where \({\lambda }_h=\frac{a_h}{a_h{\overline{X}}_h+b_h}\). The approximation for the MSE is given as \[\label{GrindEQ__39_} MSE({\overline{Y}}_g)\cong \sum^L_{h=1}{W^2_h}\left[B_h+{\vartheta }_{1h}+{\alpha }^2_h{\vartheta }_{2h}+{\beta }^2_h{\vartheta }_{3h}+{\beta }_h{\vartheta }_{4h}+{\alpha }_h{\vartheta }_{5h}+2{\alpha }_h{\beta }_h{\vartheta }_{6h}\right] \tag{39}\] where \[{\vartheta }_{1h}=\frac{1}{4}{\lambda }^2_h{\overline{Y}}^2_h\left(A_h-C_h\right)-{\lambda }_h{\overline{Y}}_h\left(E_h-D_h\right),\] \[{\vartheta }_{2h}=\left(A_h-C_h\right),\] \[{\vartheta }_{3h}=\left(F_h-G_h\right),\] \[{\vartheta }_{4h}={\overline{Y}}_h{\lambda }_h\left(J_h-L_h\right)-2\left(H_h-I_h\right),\] \[{\vartheta }_{5h}={\overline{Y}}_h{\lambda }_h\left(A_h-C_h\right)-2\left(E_h-D_h\right),\] \[{\vartheta }_{6h}=\left(J_h-L_h\right).\] The optimum values of \({\alpha }_h\) and \({\beta }_h\) are given as \[\label{GrindEQ__40_} {\alpha }^{(opt)}_h=\frac{{\vartheta }_{4h}{\vartheta }_{6h}-{\vartheta }_{5h}{\vartheta }_{3h}}{2\left({\vartheta }_{2h}{\vartheta }_{3h}-{\vartheta }^2_{6h}\right)} \tag{40}\] and \[\label{GrindEQ__41_} {\beta }^{(opt)}_h=\frac{{\vartheta }_{5h}{\vartheta }_{6h}-{\vartheta }_{4h}{\vartheta }_{2h}}{2\left({\vartheta }_{2h}{\vartheta }_{3h}-{\vartheta }^2_{6h}\right)} \tag{41}\] Substituting equations (40) and (41) in (39) gives the minimum MSE as \[\label{GrindEQ__42_} {MSE({\overline{Y}}_g)}_{min}\cong \sum^L_{h=1}{W^2_h}\left[B_h+{\vartheta }_{1h}-\frac{{\vartheta }^2_{4h}}{4{\vartheta }_{3h}}-\frac{{\left({\vartheta }_{5h}{\vartheta }_{3h}-{\vartheta }_{4h}{\vartheta }_{6h}\right)}^2}{4{\vartheta }_{3h}\left({\vartheta }_{2h}{\vartheta }_{3h}-{\vartheta }^2_{6h}\right)}\right] \tag{42}\] Table 1 shows some special cases of the proposed generalized class of estimators.

Table 1 Some members of the proposed generalized class of estimators
Proposed generalized class of estimators	\(a_h\)	\(b_h\)
\({\overline{Y}}_1=\sum^L_{h=1}w_h\left[{\overline{z}}^*_h+{\alpha }_h\left({\overline{x}}'_h-{\overline{x}}_h\right)+{\beta }_h\left(p'_h-p_h\right)\right]\exp\left(\frac{{\overline{x}}'_h-{\overline{x}}_h}{{\overline{x}}'_h+{\overline{x}}_h}\right)\)	1	0
\({\overline{Y}}_2=\sum^L_{h=1}w_h\left[{\overline{z}}^*_h+{\alpha }_h\left({\overline{x}}'_h-{\overline{x}}_h\right)+{\beta }_h\left(p'_h-p_h\right)\right]\exp\left(\frac{{\overline{x}}'_h-{\overline{x}}_h}{({\overline{x}}'_h+{\overline{x}}_h)+2C_{Xh}}\right)\)	1	\(C_{Xh}\)
\({\overline{Y}}_3=\sum^L_{h=1}w_h\left[{\overline{z}}^*_h+{\alpha }_h\left({\overline{x}}'_h-{\overline{x}}_h\right)+{\beta }_h\left(p'_h-p_h\right)\right]\exp\left(\frac{C_{Xh}({\overline{x}}'_h-{\overline{x}}_h)}{C_{Xh}({\overline{x}}'_h+{\overline{x}}_h)+2{\rho }_{XYh}}\right)\)	\(C_{Xh}\)	\({\rho }_{XYh}\)
\({\overline{Y}}_4=\sum^L_{h=1}w_h\left[{\overline{z}}^*_h+{\alpha }_h\left({\overline{x}}'_h-{\overline{x}}_h\right)+{\beta }_h\left(p'_h-p_h\right)\right]\exp\left(\frac{{\beta }_{1h}(x)({\overline{x}}'_h-{\overline{x}}_h)}{{\beta }_{1h}(x)({\overline{x}}'_h+{\overline{x}}_h)+2{\rho }_{XYh}}\right)\)	\({\beta }_{1h}(x)\)	\({\rho }_{XYh}\)
\({\overline{Y}}_5=\sum^L_{h=1}w_h\left[{\overline{z}}^*_h+{\alpha }_h\left({\overline{x}}'_h-{\overline{x}}_h\right)+{\beta }_h\left(p'_h-p_h\right)\right]\exp\left(\frac{{\beta }_{2h}(x)({\overline{x}}'_h-{\overline{x}}_h)}{{\beta }_{2h}(x)({\overline{x}}'_h+{\overline{x}}_h)+2{\beta }_{1h}(x)}\right)\)	\({\beta }_{2h}(x)\)	\({\beta }_{1h}(x)\)
\({\overline{Y}}_6=\sum^L_{h=1}w_h\left[{\overline{z}}^*_h+{\alpha }_h\left({\overline{x}}'_h-{\overline{x}}_h\right)+{\beta }_h\left(p'_h-p_h\right)\right]\exp\left(\frac{{QD}_h(x)({\overline{x}}'_h-{\overline{x}}_h)}{{QD}_h(x)({\overline{x}}'_h+{\overline{x}}_h)+2{TM}_h(x)}\right)\)	\({QD}_h(x)\)	\({TM}_h(x)\)
\({\overline{Y}}_7=\sum^L_{h=1}w_h\left[{\overline{z}}^*_h+{\alpha }_h\left({\overline{x}}'_h-{\overline{x}}_h\right)+{\beta }_h\left(p'_h-p_h\right)\right]\exp\left(\frac{{QD}_h(x)({\overline{x}}'_h-{\overline{x}}_h)}{{QD}_h(x)({\overline{x}}'_h+{\overline{x}}_h)+2{MR}_h(x)}\right)\)	\({QD}_h(x)\)	\({MR}_h(x)\)
\({\overline{Y}}_8=\sum^L_{h=1}w_h\left[{\overline{z}}^*_h+{\alpha }_h\left({\overline{x}}'_h-{\overline{x}}_h\right)+{\beta }_h\left(p'_h-p_h\right)\right]\exp\left(\frac{{HL}_h(x)({\overline{x}}'_h-{\overline{x}}_h)}{{HL}_h(x)({\overline{x}}'_h+{\overline{x}}_h)+2{TM}_h(x)}\right)\)	\({HL}_h(x)\)	\({TM}_h(x)\)
\({\overline{Y}}_9=\sum^L_{h=1}w_h\left[{\overline{z}}^*_h+{\alpha }_h\left({\overline{x}}'_h-{\overline{x}}_h\right)+{\beta }_h\left(p'_h-p_h\right)\right]\exp\left(\frac{{\rho }_{XYh}({\overline{x}}'_h-{\overline{x}}_h)}{{\rho }_{XYh}({\overline{x}}'_h+{\overline{x}}_h)+2{\beta }_{2h}(x)}\right)\)	\({\rho }_{XYh}\)	\({\beta }_{2h}(x)\)
\({\overline{Y}}_{10}=\sum^L_{h=1}w_h\left[{\overline{z}}^*_h+{\alpha }_h\left({\overline{x}}'_h-{\overline{x}}_h\right)+{\beta }_h\left(p'_h-p_h\right)\right]\exp\left(\frac{{\overline{x}}'_h-{\overline{x}}_h}{({\overline{x}}'_h+{\overline{x}}_h)+2{\rho }_{XYh}}\right)\)	1	\({\rho }_{XYh}\)
\({\overline{Y}}_{11}=\sum^L_{h=1}w_h\left[{\overline{z}}^*_h+{\alpha }_h\left({\overline{x}}'_h-{\overline{x}}_h\right)+{\beta }_h\left(p'_h-p_h\right)\right]\exp\left(\frac{{\overline{x}}'_h-{\overline{x}}_h}{({\overline{x}}'_h+{\overline{x}}_h)+2{QD}_h(x)}\right)\)	1	\({QD}_h(x)\)

The expressions for the biases and mean square errors of the members of the proposed generalized class are obtained by substituting the values of \(a_h\) and \(b_h\) from Table 1 in equations (38) and (42), respectively.
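The optimum constants in equations (40) and (41) and the minimum in equation (42) can be checked numerically for a single stratum. In the sketch below the function names and the values assigned to \(B_h\) and \({\vartheta }_{1h},\dots ,{\vartheta }_{6h}\) are purely illustrative assumptions, not quantities from the paper's data; the code evaluates the bracketed term of (39) at the optimum and compares it with the closed form in (42).

```python
def optimum_constants(v2, v3, v4, v5, v6):
    """Optimum alpha_h and beta_h from equations (40) and (41)."""
    det = v2 * v3 - v6 ** 2
    alpha = (v4 * v6 - v5 * v3) / (2 * det)
    beta = (v5 * v6 - v4 * v2) / (2 * det)
    return alpha, beta


def mse_bracket(b, v1, v2, v3, v4, v5, v6, alpha, beta):
    """Bracketed per-stratum term of equation (39)."""
    return (b + v1 + alpha ** 2 * v2 + beta ** 2 * v3
            + beta * v4 + alpha * v5 + 2 * alpha * beta * v6)


def min_mse_bracket(b, v1, v2, v3, v4, v5, v6):
    """Bracketed per-stratum term of equation (42)."""
    det = v2 * v3 - v6 ** 2
    return (b + v1 - v4 ** 2 / (4 * v3)
            - (v5 * v3 - v4 * v6) ** 2 / (4 * v3 * det))


# Illustrative per-stratum values (assumptions, not taken from the paper).
b, v1, v2, v3, v4, v5, v6 = 10.0, -1.0, 2.0, 3.0, -2.0, -4.0, 0.5

alpha_opt, beta_opt = optimum_constants(v2, v3, v4, v5, v6)
at_optimum = mse_bracket(b, v1, v2, v3, v4, v5, v6, alpha_opt, beta_opt)
closed_form = min_mse_bracket(b, v1, v2, v3, v4, v5, v6)
perturbed = mse_bracket(b, v1, v2, v3, v4, v5, v6, alpha_opt + 0.1, beta_opt)

print(alpha_opt, beta_opt)
print(at_optimum, closed_form, perturbed)
```

The two evaluations agree, and moving \({\alpha }_h\) away from its optimum increases the bracketed term, as expected when \({\vartheta }_{2h}{\vartheta }_{3h}-{\vartheta }^2_{6h}>0\) and \({\vartheta }_{2h}>0\).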

3. Efficiency Comparison

The proposed estimators perform better than the other estimators when the following conditions are satisfied:

From equations (3) and (42), \(MSE{\left({\overline{Y}}_g\right)}_{min}<Var\left({\overline{Y}}_0\right)\) if \[\label{GrindEQ__43_} \left[{\vartheta }_{1h}-\frac{{\vartheta }^2_{4h}}{4{\vartheta }_{3h}}-\frac{{\left({\vartheta }_{5h}{\vartheta }_{3h}-{\vartheta }_{4h}{\vartheta }_{6h}\right)}^2}{4{\vartheta }_{3h}\left({\vartheta }_{2h}{\vartheta }_{3h}-{\vartheta }^2_{6h}\right)}\right]<0 \tag{43}\]
From equations (6) and (42), \(MSE{\left({\overline{Y}}_g\right)}_{min}<MSE\left({\overline{Y}}_R\right)\) if \[\label{GrindEQ__44_} \left[{\vartheta }_{1h}-\frac{{\vartheta }^2_{4h}}{4{\vartheta }_{3h}}-\frac{{\left({\vartheta }_{5h}{\vartheta }_{3h}-{\vartheta }_{4h}{\vartheta }_{6h}\right)}^2}{4{\vartheta }_{3h}\left({\vartheta }_{2h}{\vartheta }_{3h}-{\vartheta }^2_{6h}\right)}-R^2_h\left(A_h-C_h\right)+2R_h\left(E_h-D_h\right)\right]<0 \tag{44}\]
From equations (9) and (42), \(MSE{\left({\overline{Y}}_g\right)}_{min}<MSE\left({\overline{Y}}_{ER}\right)\) if \[\label{GrindEQ__45_} \left[{\vartheta }_{1h}-\frac{{\vartheta }^2_{4h}}{4{\vartheta }_{3h}}-\frac{{\left({\vartheta }_{5h}{\vartheta }_{3h}-{\vartheta }_{4h}{\vartheta }_{6h}\right)}^2}{4{\vartheta }_{3h}\left({\vartheta }_{2h}{\vartheta }_{3h}-{\vartheta }^2_{6h}\right)}-\frac{1}{4}R^2_h\left(A_h-C_h\right)+R_h\left(E_h-D_h\right)\right]<0 \tag{45}\]

4. Empirical study

The efficiency of the proposed estimator is compared with that of the adapted estimators in a numerical study. The real data used in the numerical analysis are COVID-19 data obtained from www.worldometer.com and data from Rosner [28]. The R programming language is used for data simulation and coding. Each population unit is subjected to measurement errors, which are normally distributed with mean 2 and variance 5. The efficiency of the proposed estimator is compared with that of the adapted estimators using the minimum-variance and percent relative efficiency (PRE) criteria. The PRE of an estimator of the population mean, relative to the adapted ordinary estimator, is calculated using the formula \[\label{GrindEQ__46_} PRE({\overline{Y}}_g)=\frac{var({\overline{Y}}_0)}{MSE({\overline{Y}}_g)}\times 100 \tag{46}\] where \(g=R,\ ER,\ 1,\ 2,\ \dots ,\ 11\). The estimator with the highest PRE is considered the most efficient. The PREs are obtained at 20% and 80% sensitivity levels of the survey question and at 20% and 30% non-response rates.
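Equation (46) amounts to a one-line computation. The following sketch uses a hypothetical helper name and made-up variance and MSE values, solely to show how the ranking by PRE works; it is not a reproduction of the paper's R code.

```python
def percent_relative_efficiency(var_ordinary, mse_estimator):
    """PRE of an estimator relative to the adapted ordinary estimator, equation (46)."""
    return var_ordinary / mse_estimator * 100.0


# Made-up values (assumptions): variance of the ordinary estimator and
# MSEs of three competing estimators.
var_y0 = 50.0
mse_by_estimator = {"Y_0": 50.0, "Y_R": 42.0, "Y_g": 40.0}

pre_by_estimator = {
    name: percent_relative_efficiency(var_y0, mse)
    for name, mse in mse_by_estimator.items()
}
most_efficient = max(pre_by_estimator, key=pre_by_estimator.get)

print(pre_by_estimator)
print(most_efficient)
```

By construction the ordinary estimator has a PRE of 100, so any value above 100 indicates a gain in efficiency over it.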

4.1. COVID-19 data (www.worldometer.com)

The data consist of six strata: the African Region (\(N_1\)=31200), the American Region (\(N_2\)=34944), the Eastern Mediterranean Region (\(N_3\)=13728), the European Region (\(N_4\)=38688), the South-East Asia Region (\(N_5\)=6864), and the Western Pacific Region (\(N_6\)=21840). X is the number of new cases, Y is the number of deaths recorded in a given day, and P is the attribute indicating fewer than one death in a given day. Scrambled responses, normally distributed with mean 0 and variance 2, are generated for each value of Y. Table 2 shows summary statistics for the responding units and Table 3 for the non-responding units.

Table 2 Parameters for COVID-19 data
Parameter	Stratum 1	Stratum 2	Stratum 3	Stratum 4	Stratum 5	Stratum 6
\({\overline{X}}_h\)	188.9035	2502.012	1120.151	1757.061	6175.008	356.2095
\({\overline{Y}}_h\)	4.543181	61.90972	20.51225	33.79095	97.12205	4.833472
\(S^2_{Xh}\)	1094471	187408859	8526375	24712119	817189958	318940
\(S^2_{Zh}\)	926.4621	76639.99	2937.237	11588.58	145353	849.8079
\(S^2_{Ph}\)	0.2017896	0.2328431	0.2253055	0.2467874	0.247146	0.1323922
\({\rho }_{XZh}\)	0.8171398	0.7944946	0.834325	0.6559524	0.8679977	0.7237861
\({\rho }_{XPh}\)	-0.2608673	-0.2379639	-0.265470	-0.2982271	-0.239344	-0.4403104
\({\rho }_{ZPh}\)	-0.2386865	-0.2924192	-0.2729946	-0.2802833	-0.2839612	-0.3832064
\(S^2_{Th}\)	24.72743	24.91892	25.03186	25.29474	24.74669	18.97865

Table 3 Parameters for non-responding units for COVID-19 data
Non-response rate	Stratum	\(S^2_{Xh(2)}\)	\(S^2_{Ph(2)}\)
20%	1	989315.2	0.2050127
	2	199477087	0.2343414
	3	8176575	0.2249875
	4	25298233	0.2462478
	5	708141536	0.2460244
	6	708141536	0.1284571
30%	1	1071816	0.2047748
	2	206269098	0.2334903
	3	8525833	0.2250371
	4	24546992	0.2462213
	5	681811147	0.2462587
	6	681811147	0.1297743

Tables 4 and 5 present the PREs of the estimators of the population mean without and with measurement errors, respectively. In both tables, the PREs decrease as the inverse sampling rate and the non-response rate increase. The PREs are also lower when non-response and measurement errors are present simultaneously, and they decrease as the sensitivity level of the survey question increases. The proposed estimator \({\overline{Y}}_6\) has the highest PRE among all estimators in this study. In general, the proposed estimators perform better than the adapted estimators.
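These trends can be checked directly from the tabulated values. The short Python sketch below copies the PREs of \({\overline{Y}}_6\) at \(\psi_h=0.2\) from Tables 4 and 5 and tests each stated pattern; the variable and function names are illustrative.

```python
# PREs of the proposed estimator Y_6 at psi_h = 0.2, copied from Tables 4 and 5.
# Keys are (non-response rate in percent, inverse sampling rate k_2h).
pre_without_me = {
    (20, 2): 129.5812, (20, 4): 115.5866, (20, 8): 108.009,
    (30, 2): 123.3776, (30, 4): 110.9805, (30, 8): 105.3289,
}
pre_with_me = {
    (20, 2): 127.2173, (20, 4): 114.4661, (20, 8): 107.4683,
    (30, 2): 121.5925, (30, 4): 110.2207, (30, 8): 104.9777,
}
tables = (pre_without_me, pre_with_me)


def decreasing_in_k(table, rate):
    """True if the PRE falls strictly as k_2h goes from 2 to 4 to 8."""
    values = [table[(rate, k)] for k in (2, 4, 8)]
    return all(a > b for a, b in zip(values, values[1:]))


falls_with_k = all(decreasing_in_k(t, r) for t in tables for r in (20, 30))
falls_with_nonresponse = all(t[(30, k)] < t[(20, k)] for t in tables for k in (2, 4, 8))
falls_with_me = all(pre_with_me[key] < pre_without_me[key] for key in pre_without_me)

print(falls_with_k, falls_with_nonresponse, falls_with_me)
```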

Table 4 Percent Relative Efficiencies (PREs) of different estimators for COVID-19 data without measurement errors at \(t_h=0.3\) and \(f_h=0.2\)
Estimator	\(\psi_h\)	20% non-response			30% non-response
		\(k_{2h}=2\)	\(k_{2h}=4\)	\(k_{2h}=8\)	\(k_{2h}=2\)	\(k_{2h}=4\)	\(k_{2h}=8\)
\(\overline{Y}_0\)	0.2, 0.8	100	100	100	100	100	100
\(\overline{Y}_R\)	0.2	121.8078	111.8264	106.1745	117.4540	108.4124	104.1318
	0.8	121.8064	111.8257	106.1743	117.4531	108.4121	104.1317
\(\overline{Y}_{ER}\)	0.2	126.6716	113.9508	107.1407	121.7205	110.2754	105.0030
	0.8	126.6685	113.9495	107.1397	121.7184	110.2746	105.0028
\(\overline{Y}_1\)	0.2	129.4792	115.5386	107.9859	123.3008	110.9481	105.3139
	0.8	129.4760	115.5372	107.9855	123.2988	110.9473	105.3137
\(\overline{Y}_2\)	0.2	129.4734	115.5359	107.9846	123.2965	110.9462	105.3131
	0.8	129.4702	115.5345	107.9842	123.2945	110.9455	105.3129
\(\overline{Y}_3\)	0.2	129.4790	115.5385	107.9859	123.3007	110.9480	105.3139
	0.8	129.4758	115.5371	107.9854	123.2987	110.9473	105.3137
\(\overline{Y}_4\)	0.2	125.7381	113.7546	107.1225	120.4678	109.7355	104.7521
	0.8	125.7354	113.7534	107.1221	120.4660	109.7349	104.7519
\(\overline{Y}_5\)	0.2	129.4792	115.5386	107.9859	123.3008	110.9481	105.3139
	0.8	129.4760	115.5372	107.9855	123.2988	110.9473	105.3137
\(\overline{Y}_6\)	0.2	129.5812	115.5866	108.009	123.3776	110.9805	105.3289
	0.8	129.5780	115.5852	108.0085	123.3756	110.9798	105.3287
\(\overline{Y}_7\)	0.2	128.8921	115.2619	107.8528	122.8585	110.7607	105.2276
	0.8	128.8889	115.2605	107.8524	122.8565	110.7600	105.2273
\(\overline{Y}_8\)	0.2	129.4789	115.5385	107.9859	123.3007	110.9480	105.3139
	0.8	129.4757	115.5371	107.9854	123.2986	110.9473	105.3137
\(\overline{Y}_9\)	0.2	129.1671	115.3917	123.0658	123.0658	110.8486	105.2681
	0.8	129.1640	115.3903	107.9148	123.0638	110.8479	105.2679
\(\overline{Y}_{10}\)	0.2	129.4783	115.5382	107.9857	123.3001	110.9478	105.3138
	0.8	129.4751	115.5368	107.9853	123.2981	110.9470	105.3136
\(\overline{Y}_{11}\)	0.2	129.2297	115.4211	107.9295	123.1130	110.8686	105.2773
	0.8	129.2265	115.4197	107.9290	123.1110	110.8678	105.2771

Table 5 Percent Relative Efficiencies (PREs) of different estimators for COVID-19 data with measurement errors at \(t_h=0.3\) and \(f_h=0.2\)
Estimator	\(\psi_h\)	20% non-response			30% non-response
		\(k_{2h}=2\)	\(k_{2h}=4\)	\(k_{2h}=8\)	\(k_{2h}=2\)	\(k_{2h}=4\)	\(k_{2h}=8\)
\(\overline{Y}_0\)	0.2, 0.8	100	100	100	100	100	100
\(\overline{Y}_R\)	0.2	121.7851	111.8152	106.1689	117.4368	108.4050	104.1283
	0.8	121.7837	111.8149	106.1689	117.4360	108.4048	104.1283
\(\overline{Y}_{ER}\)	0.2	126.6432	113.9378	107.1339	119.4025	109.2678	104.5540
	0.8	126.6402	113.9369	107.1336	119.3995	109.2669	104.5535
\(\overline{Y}_1\)	0.2	127.1178	114.4185	107.4453	121.5170	110.1884	104.9627
	0.8	127.1123	114.4164	107.4444	121.5132	110.1871	104.9622
\(\overline{Y}_2\)	0.2	127.1235	114.4213	107.4466	121.5214	110.1902	104.9635
	0.8	127.1181	114.4191	107.4457	121.5176	110.1889	104.9630
\(\overline{Y}_3\)	0.2	127.1179	114.4186	107.4453	121.5171	110.1884	104.9627
	0.8	127.1125	114.4165	107.4444	121.5133	110.1871	104.9622
\(\overline{Y}_4\)	0.2	123.8424	112.8318	106.6707	119.0193	109.1042	104.4571
	0.8	123.8376	112.8299	106.6699	119.0159	109.1030	104.4566
\(\overline{Y}_5\)	0.2	127.1178	114.4185	107.4453	121.5170	110.1884	104.9627
	0.8	127.1123	114.4164	107.4444	121.5132	110.1871	104.9622
\(\overline{Y}_6\)	0.2	127.2173	114.4661	107.4683	121.5925	110.2207	104.9777
	0.8	127.2118	114.4640	107.4675	121.5887	110.2194	104.9772
\(\overline{Y}_7\)	0.2	126.8219	114.2767	107.3765	121.2925	110.0918	104.9179
	0.8	126.8165	114.2746	107.3756	121.2887	110.0906	104.9174
\(\overline{Y}_8\)	0.2	127.1180	114.4186	107.4453	121.5172	110.1884	104.9627
	0.8	127.1125	114.4165	107.4444	121.5134	110.1871	104.9622
\(\overline{Y}_9\)	0.2	126.8822	114.3056	107.3905	121.3382	110.1115	104.927
	0.8	126.8767	114.3035	107.3896	121.3345	110.1102	104.9265
\(\overline{Y}_{10}\)	0.2	127.1186	114.4189	107.4455	121.5177	110.1886	104.9628
	0.8	127.1132	114.4168	107.4446	121.5139	110.1873	104.9623
\(\overline{Y}_{11}\)	0.2	126.9326	114.3298	107.4022	121.3765	110.1280	104.9347
	0.8	126.9272	114.3277	107.4014	121.3728	110.1267	104.9341

4.2. Rosner [28] data

The data consist of two strata of sizes \(N_1=480\) and \(N_2=174\), with \(Y\) as forced expiratory volume, \(X\) as age (in years), and gender as an auxiliary attribute. The scrambling variable is taken to be smoking (Yes=1, No=0) and is used to generate the response variable. The study variable, the auxiliary variable, and the auxiliary attribute all have positive bi-serial correlations. Tables 6 and 7 present the population statistics for the responding and non-responding units, respectively.

Table 6 Parameters for Rosner [28] data
Parameter	\(\overline{X}_h\)	\(\overline{Y}_h\)	\(S^2_{Xh}\)	\(S^2_{Yh}\)	\(S^2_{Ph}\)	\(\rho_{XZh}\)	\(\rho_{XPh}\)	\(\rho_{ZPh}\)	\(S^2_{Th}\)
Stratum 1	8.558333	2.363715	3.604106	0.5254207	0.2503653	0.7239923	0.2999931	0.8365375	26.04856
Stratum 2	13.71839	3.763615	3.301741	0.7556429	0.2511461	0.3619965	0.7201403	0.4809902	20.19661
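As a quick worked example, the stratum sizes and stratum means of \(Y\) in Table 6 can be combined into an overall population mean. The sketch assumes the usual stratified weighting \(W_h=N_h/N\); the variable names are illustrative.

```python
# Stratum sizes from the text and stratum means of Y from Table 6.
stratum_sizes = {1: 480, 2: 174}
stratum_means_y = {1: 2.363715, 2: 3.763615}

total_size = sum(stratum_sizes.values())

# Assumed stratum weights W_h = N_h / N.
weights = {h: n / total_size for h, n in stratum_sizes.items()}

# Weighted combination of the stratum means.
population_mean_y = sum(weights[h] * stratum_means_y[h] for h in stratum_sizes)

print(weights, population_mean_y)
```

With these inputs the first stratum carries about 73% of the weight, and the combined mean of forced expiratory volume is about 2.74.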

Table 7 Parameters for non-responding units for Rosner [28] data
Non-response rate	Stratum	\(S^2_{Yh(2)}\)	\(S^2_{Th(2)}\)
20%	1	0.5833481	25.62859
	2	0.4723521	17.39198
30%	1	0.5804701	27.00072
	2	0.6125016	20.19661

Tables 8 and 9 show the PREs without and with measurement errors, respectively, at different sensitivity levels. In both tables, the PREs decrease as the inverse sampling rate and the non-response rate increase. The PREs also decline when non-response and measurement errors are present simultaneously. For example, at 20% non-response, \(k_{2h}=2\), and \(\psi_h=0.2\), the PRE of \({\overline{Y}}_{10}\) is 114.2515 without measurement errors and decreases to 100.2539 when non-response and measurement errors are both present.

Furthermore, the PREs decrease as the sensitivity level increases in the case without measurement errors. The proposed estimators perform better than the adapted estimators both without and with measurement errors.
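The example for \({\overline{Y}}_{10}\) can be read as a relative reduction in MSE. Rearranging Eq. (46) gives \(MSE({\overline{Y}}_g)/var({\overline{Y}}_0)=100/PRE\), so the short Python sketch below converts the two quoted PREs into percent reductions relative to the ordinary estimator in the same scenario. The function name is illustrative.

```python
def implied_mse_reduction(pre):
    """Percent by which the MSE falls below var(Y_0), given a PRE from Eq. (46)."""
    return 100.0 * (1.0 - 100.0 / pre)


# PREs of Y_10 at 20% non-response, k_2h = 2, psi_h = 0.2 (Tables 8 and 9).
without_me = implied_mse_reduction(114.2515)
with_me = implied_mse_reduction(100.2539)

print(round(without_me, 2), round(with_me, 2))
```

The gain over the ordinary estimator shrinks from roughly 12.5% without measurement errors to roughly 0.25% with them.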

Table 8 Percent Relative Efficiencies (PREs) of different estimators for Rosner [28] data without measurement errors at \(t_h=0.3\) and \(f_h=0.2\)
Estimator	\(\psi_h\)	20% non-response			30% non-response
		\(k_{2h}=2\)	\(k_{2h}=4\)	\(k_{2h}=8\)	\(k_{2h}=2\)	\(k_{2h}=4\)	\(k_{2h}=8\)
\(\overline{Y}_0\)	0.2, 0.8	100	100	100	100	100	100
\(\overline{Y}_R\)	0.2	113.6230	107.9472	104.3350	111.4729	105.9844	103.0583
	0.8	112.4303	107.3953	104.0855	110.5494	105.6100	102.8971
\(\overline{Y}_{ER}\)	0.2	110.8210	106.3781	103.5022	109.1488	104.8202	102.4767
	0.8	109.8950	105.9412	103.3022	108.4265	104.5218	102.3468
\(\overline{Y}_1\)	0.2	114.2630	108.3011	104.5213	112.0013	106.2459	103.1881
	0.8	113.0079	107.7228	104.2606	111.0310	105.8542	103.0198
\(\overline{Y}_2\)	0.2	114.2589	108.2988	104.5201	111.9979	106.2442	103.1873
	0.8	113.0042	107.7207	104.2595	111.0279	105.8527	103.0190
\(\overline{Y}_3\)	0.2	114.1979	108.2652	104.5024	111.9476	106.2194	103.1749
	0.8	112.9492	107.6896	104.2429	110.9821	105.8295	103.0074
\(\overline{Y}_4\)	0.2	113.8930	108.0967	104.4138	111.6959	106.0949	103.1132
	0.8	112.6741	107.5337	104.1596	110.7528	105.7133	102.9490
\(\overline{Y}_5\)	0.2	114.2630	108.3011	104.5213	112.0013	106.2459	103.1881
	0.8	113.0079	107.7228	104.2606	111.0310	105.8542	103.0198
\(\overline{Y}_6\)	0.2	114.1794	108.2549	104.4970	111.9323	106.2118	103.1712
	0.8	112.9324	107.6801	104.2378	110.9682	105.8224	103.0039
\(\overline{Y}_7\)	0.2	114.1764	108.2533	104.4962	111.9298	106.2106	103.1706
	0.8	112.9298	107.6786	104.2370	110.9660	105.8213	103.0033
\(\overline{Y}_8\)	0.2	114.2354	108.2858	104.5133	111.9785	106.2346	103.1825
	0.8	112.9830	107.7087	104.2531	111.0103	105.8437	103.0146
\(\overline{Y}_9\)	0.2	114.1152	108.2195	104.4784	111.8793	106.1856	103.1582
	0.8	112.8745	107.6473	104.2203	110.9199	105.7980	102.9916
\(\overline{Y}_{10}\)	0.2	114.2515	108.2947	104.5179	111.9917	106.2412	103.1858
	0.8	112.9975	107.7169	104.2575	111.0224	105.8498	103.0176
\(\overline{Y}_{11}\)	0.2	114.1859	108.2585	104.4989	111.9377	106.2145	103.1725
	0.8	112.9383	107.6834	104.2396	110.9731	105.8249	103.0051

Table 9 Percent Relative Efficiencies (PREs) of different estimators for Rosner [28] data with measurement errors at \(t_h=0.3\) and \(f_h=0.2\)
Estimator	\(\psi_h\)	20% non-response			30% non-response
		\(k_{2h}=2\)	\(k_{2h}=4\)	\(k_{2h}=8\)	\(k_{2h}=2\)	\(k_{2h}=4\)	\(k_{2h}=8\)
\(\overline{Y}_0\)	0.2, 0.8	100	100	100	100	100	100
\(\overline{Y}_R\)	0.2	100.2356	100.1438	100.0808	100.1971	100.1059	100.0550
	0.8	100.2351	100.1436	100.0807	100.1968	100.1058	100.0550
\(\overline{Y}_{ER}\)	0.2	100.2074	100.1266	100.0711	100.1735	100.0933	100.0485
	0.8	100.2070	100.1264	100.0710	100.1732	100.0931	100.0484
\(\overline{Y}_1\)	0.2	100.2542	100.1551	100.0872	100.2127	100.1143	100.0594
	0.8	100.2536	100.1548	100.0870	100.2123	100.1141	100.0593
\(\overline{Y}_2\)	0.2	100.2541	100.1551	100.0871	100.2126	100.1142	100.0593
	0.8	100.2536	100.1548	100.0870	100.2122	100.1141	100.0593
\(\overline{Y}_3\)	0.2	100.2530	100.1544	100.0868	100.2117	100.1137	100.0591
	0.8	100.2525	100.1541	100.0866	100.2113	100.1136	100.0590
\(\overline{Y}_4\)	0.2	100.2474	100.1510	100.0848	100.2070	100.1112	100.0578
	0.8	100.2469	100.1507	100.0847	100.2066	100.1111	100.0577
\(\overline{Y}_5\)	0.2	100.2542	100.1551	100.0872	100.2127	100.1143	100.0594
	0.8	100.2536	100.1548	100.0870	100.2123	100.1141	100.0593
\(\overline{Y}_6\)	0.2	100.2526	100.1542	100.0866	100.2114	100.1136	100.0590
	0.8	100.2521	100.1539	100.0865	100.2110	100.1134	100.0589
\(\overline{Y}_7\)	0.2	100.2526	100.1541	100.0866	100.2113	100.1136	100.0590
	0.8	100.2521	100.1539	100.0865	100.2110	100.1134	100.0589
\(\overline{Y}_8\)	0.2	100.2537	100.1548	100.0870	100.2122	100.1141	100.0592
	0.8	100.2531	100.1545	100.0869	100.2119	100.1139	100.0592
\(\overline{Y}_9\)	0.2	100.2515	100.1535	100.0862	100.2104	100.1131	100.0587
	0.8	100.2509	100.1532	100.0861	100.2100	100.1129	100.0587
\(\overline{Y}_{10}\)	0.2	100.2539	100.1550	100.0871	100.2125	100.1142	100.0593
	0.8	100.2534	100.1547	100.0870	100.2121	100.1140	100.0592
\(\overline{Y}_{11}\)	0.2	100.2527	100.1542	100.0867	100.2115	100.1137	100.0590
	0.8	100.2522	100.1540	100.0866	100.2111	100.1135	100.0590

5. Conclusion

This study addresses the challenge of estimating the finite population mean of a sensitive study variable when non-response and measurement errors are present simultaneously. A general class of estimators is proposed using auxiliary attributes and variables. The bias and mean squared error (MSE) of the proposed estimator are derived up to the first degree of approximation. In the numerical study, the proposed estimator outperforms the adapted ordinary, ratio, and exponential ratio-type estimators. Furthermore, the mean squared errors of the proposed estimators increase as the non-response rate and the inverse sampling rate increase. Finally, when non-response and measurement errors are present simultaneously, the efficiency of the estimators of the population mean decreases.

References

Warner, S. L. (1965). Randomized response: A survey for eliminating evasive answer bias. Journal of the American Statistical Association, 60(309), 63-69.
Chaudhuri, A., & Mukherjee, R. (1988). Randomized Response: Theory and Techniques. Marcel Dekker, New York.
Gupta, S., & Shabbir, J. (2004). Sensitivity estimation for personal interview survey questions. Statistica, 64(4), 643-653.
Gupta, S., Shabbir, J., & Sehra, S. (2010). Mean and sensitivity estimation in optional randomized response models. Journal of Statistical Planning and Inference, 140(10), 2870-2874.
Mehta, S., Dass, B. K., Shabbir, J., & Gupta, S. (2012). A three-stage optional randomized response model. Journal of Statistical Theory and Practice, 6(3), 412-427.
Eichhorn, B. H., & Hayre, L. S. (1983). Scrambled randomized response methods for obtaining sensitive quantitative data. Journal of Statistical Planning and Inference, 7(4), 307-316.
Gupta, S., & Shabbir, J. (2008). On improvement in estimating the population mean in simple random sampling. Journal of Applied Statistics, 35(5), 559-566.
Gupta, S., Shabbir, J., & Sehra, S. (2012). Mean and sensitivity estimation in optional randomized response models. Journal of Statistical Planning and Inference, 140, 2870-2874.
Sousa, R., Shabbir, J., Real, P. C., & Gupta, S. (2010). Ratio estimation of the mean of a sensitive variable in the presence of auxiliary information. Journal of Statistical Theory and Practice, 4(3), 495-507.
Zatezalo, T. (2017). Generalized mixture estimator of the mean of a sensitive variable in the presence of non-sensitive auxiliary information. Statistics and Applications, 15(1&2), 23-36.
Mushtaq, N., Noor-ul-Amin, M., & Hanif, M. (2016). Estimation of population mean of a sensitive variable in stratified two-phase sampling. Pakistan Journal of Statistics, 32(1), 393-404.
Mushtaq, N., Noor-ul-Amin, M., & Hanif, M. (2017). A family of estimators of a sensitive variable using auxiliary information in stratified random sampling. Pakistan Journal of Operation Research, 13(1), 141-155.
Mushtaq, N., & Noor-ul-Amin, M. (2020). Joint influence of double sampling and randomized response technique on estimation method of mean. Applied Mathematics, 10(1), 12-19.
Hansen, M., & Hurwitz, W. (1946). The problem of non-response in sample surveys. Journal of the American Statistical Association, 41, 517-529.
Khalil, S., Noor-ul-Amin, M., & Hanif, M. (2018). Estimation of population mean for a sensitive variable in the presence of measurement error. Journal of Statistics and Management Systems, 21(1), 81-91.
Khalil, S., Gupta, S., & Hanif, M. (2018). Estimation of finite population mean in stratified sampling using scrambled responses in the presence of measurement errors. Communications in Statistics – Theory and Methods, 48(6), 1553-1561.
Khalil, S., Zhang, Q., & Gupta, S. (2019). Mean estimation of sensitive variables under measurement errors using optional RRT models. Communications in Statistics – Simulation and Computation. DOI: 10.1080/03610918.2019.1584298
Onyango, R., Oduor, B., & Odundo, F. (2021). Joint influence of measurement errors and randomized response technique on mean estimation under stratified double sampling. Open Journal of Mathematical Science, 5(1), 192-199.
Naeem, N., & Shabbir, J. (2018). Use of a scrambled response on two occasion’s successive sampling under nonresponse. Hacettepe Journal of Mathematics and Statistics, 47(3), 675-684.
Zahid, E., & Shabbir, J. (2019). Estimation of finite population mean for a sensitive variable using dual auxiliary information in the presence of measurement errors. PLoS ONE, 14(2), e0212111.
Zhang, Q., Khalil, S., & Gupta, S. (2020). Mean estimation of sensitive variables under non-response and measurement errors using optional RRT models. Journal of Statistical Theory and Practice, 15.
Zhang, Q., Khalil, S., & Gupta, S. (2021). Mean estimation in the simultaneous presence of measurement errors and non-response using optional RRT models under stratified sampling. Journal of Statistical Computation and Simulation, 91, 3492-3504.
Tukey, J. W. (1970). Exploratory Data Analysis. Addison-Wesley Publishing Co., Reading, MA, USA.
Hodges, J. L., & Lehmann, E. L. (1963). Estimates of location based on rank tests. The Annals of Mathematical Statistics, 34(2), 598-611.
Cochran, W. G. (1940). The estimation of the yields of the cereal experiments by sampling for the ratio of grain to total produce. Journal of Agricultural Science, 59, 1225-1226.
Bahl, S., & Tuteja, R. (1991). Ratio and product type exponential estimators. Journal of Information and Optimization Sciences, 12(1), 159-164.
Onyango, R. (2022). Mean estimation of a sensitive variable under nonresponse using three-stage RRT model in stratified two-phase sampling. Journal of Probability and Statistics, 2022. https://doi.org/10.1155/2022/4530120
Rosner, B. (2015). Fundamentals of Biostatistics. Duxbury Press.
