Joint influence of measurement errors and randomized response technique on mean estimation under stratified double sampling

Author(s): Ronald Onyango1, Brian Oduor1, Francis Odundo1
1Department of Applied Statistics, Financial Mathematics and Actuarial Science, Jaramogi Oginga Odinga University of Science and Technology, P.O. Box 210, Bondo, Kenya.
Copyright © Ronald Onyango, Brian Oduor, Francis Odundo. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

The present study proposes a generalized mean estimator for a sensitive variable using a non-sensitive auxiliary variable in the presence of measurement errors based on the Randomized Response Technique (RRT). Expressions for the bias and mean squared error of the proposed estimator are derived up to the first order of approximation. Furthermore, the optimum conditions and the minimum mean squared error of the proposed estimator are determined. The efficiency of the proposed estimator is studied both theoretically and numerically using simulated and real data sets. The numerical study reveals that the presence of measurement errors in a survey based on the Randomized Response Technique (RRT) increases the variances and mean squared errors of estimators of the finite population mean.

Keywords: Population mean; Sensitive variable; Measurement errors; Randomized Response Technique (RRT).

1. Introduction

Auxiliary variables are closely related to the survey variable and are used at the design and estimation stages of a survey to improve the efficiency of estimators of the finite population mean. The difference between the true value of a variable and the value recorded in a survey is referred to as measurement error. Measurement errors are caused by memory loss, prestige bias, over-reporting, under-reporting, processing errors, and incorrect values supplied by the respondent. In the literature, most researchers assume that the data collected in a survey are error-free. However, this is rarely the case, since measurement errors are inherent in survey sampling.

In a survey, the researcher often faces the problem of estimating the finite population mean for a sensitive question that carries a social stigma, such as “Have you ever had an abortion?”, “Are you a drug addict?” and “Have you ever been infected with a sexually transmitted disease?”. It is difficult to obtain correct responses to such questions in personal interviews that involve direct questioning because the respondent’s privacy is unprotected. Consequently, this may result in measurement errors. Warner [1] proposed the Randomized Response Technique (RRT), which aims at reducing answer bias in a survey involving a sensitive variable by protecting the privacy of the respondents. In the Randomized Response Technique (RRT), a scrambling variable that is independent of the survey and auxiliary variables is used in the estimation of the finite population mean of a sensitive variable. The respondent is expected to provide a true response for the non-sensitive auxiliary variable and a scrambled response for the survey variable. The scrambled response is obtained by adding a random number to the true response to the sensitive question. The value added is unknown to the survey practitioner, but the probability distribution of the scrambling variable is assumed to be known.

The problem of estimation of the finite population mean for a non-sensitive variable using an auxiliary variable in the presence of measurement errors under simple random sampling is addressed by Shalabh [2], Diwakar et al., [3] and Yadav et al., [4]. Additionally, Gajendra et al., [5] used calibrated weights to propose ratio and regression type mean estimators for a non-sensitive variable under stratified random sampling.

The problem of estimation of the finite population mean for a sensitive variable based on Randomized Response Technique (RRT) under different sampling schemes is addressed by Eichhorn and Hayre [6], Gupta and Shabbir [7], Gupta et al., [8], Sousa et al., [9] and Tanveer and Housila [10].

Mushtaq et al., [11] and Mushtaq et al., [12] have proposed different estimators of the finite population mean for a sensitive variable using a non-sensitive auxiliary variable under stratified random sampling. The problem of estimation of the finite population mean under stratified two-phase sampling is discussed by Mushtaq et al., [12]. The joint influence of double sampling and the Randomized Response Technique (RRT) on the estimation of the finite population mean under simple random sampling is addressed by Mushtaq and Noor-Ul-Amin [13]. Additionally, the problem of estimation of the finite population mean for a sensitive variable in the presence of non-response based on the Randomized Response Technique (RRT) is discussed by Naeem and Shabbir [14]. Zahid and Shabbir [15] proposed a generalized class of estimators of the finite population mean using a non-sensitive auxiliary variable in the presence of non-response and measurement errors under simple random sampling and stratified random sampling.

Sadia [16] proposed generalized estimators of the finite population mean in the presence of measurement errors under simple random sampling and stratified random sampling. The performances of the proposed estimators were studied in the presence and absence of the measurement errors. Recently, Zhang [17] addressed the problem of mean estimation for a sensitive variable based on optional Randomized Response Technique (RRT) in the presence of non-response and measurement errors under simple random sampling and stratified random sampling.

Handling sensitive survey questions and measurement errors is a major challenge for survey practitioners especially when both occur simultaneously in a survey. The present study fills the existing gap in the literature on mean estimation for a sensitive variable using a non-sensitive auxiliary variable in the presence of measurement errors under stratified double sampling. Also, the combined effect of measurement errors and Randomized Response Technique (RRT) on estimators of the finite population mean is investigated.

The study considers an additive Randomized Response Technique (RRT) model in which the respondent adds a random number to the true answer of a sensitive question to give a scrambled response. Further, the probability distribution of the scrambling variable is assumed to be known by the survey practitioner. The proposed strategy assumes that measurement errors are present in both first and second-phase samples of stratified double sampling.
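
A minimal Python sketch of the additive model may help fix ideas. The population of true answers, the value `scramble_sd` and all variable names are assumptions made for illustration and do not come from the study. The sketch only shows that a mean-zero scrambling variable leaves the mean of the reported values close to the true mean while inflating their variance.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical true answers Y to a sensitive quantitative question.
true_y = [random.gauss(50.0, 10.0) for _ in range(20000)]

# Each respondent privately draws S from a known mean-zero distribution
# and reports only the scrambled value Z = Y + S.
scramble_sd = 2.0  # assumed known to the survey practitioner
reported_z = [y + random.gauss(0.0, scramble_sd) for y in true_y]

mean_y = statistics.fmean(true_y)
mean_z = statistics.fmean(reported_z)
var_y = statistics.variance(true_y)
var_z = statistics.variance(reported_z)

print(f"mean of Y: {mean_y:.3f}, mean of Z: {mean_z:.3f}")
print(f"variance of Y: {var_y:.3f}, variance of Z: {var_z:.3f}")
```

The individual values of `true_y` are never needed to estimate the mean, which is what protects the respondent; the price is the extra variance carried by `reported_z`.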

In the present paper, Section 2 gives a detailed description of the population under study. The ordinary mean estimator of the finite population mean for a sensitive variable is discussed in Section 3. Section 4 describes the properties of the proposed estimator of the finite population mean for a sensitive variable using a non-sensitive auxiliary variable in the presence of measurement error. In Section 5, members of the family of the proposed generalized estimator are discussed. The efficiency of the proposed estimator is studied theoretically in Section 6. Finally, a numerical analysis of the performance of the proposed estimator is done in Section 7.

2. Population description and notations

Consider a heterogeneous finite population \(U=\{1, 2,\dots ,N\}\) of size \(N\) on which a survey variable \(Y\) and an auxiliary variable \(X\) are defined. The population is partitioned into \(L\) homogeneous groups, known as strata, of sizes \(N_h\), \(h=1, 2,\dots ,L\). In a survey, direct observations cannot be made on a sensitive variable with socially stigmatizing characteristics; hence the Randomized Response Technique (RRT) is used for obtaining unbiased estimates of the finite population parameters. Let \(S\) be a scrambling variable that is normally distributed with mean 0 and variance \(S^2_{Sh}\) in \(h^{th}\) stratum. The respondent is expected to provide a true response for the auxiliary variable and a scrambled response for the sensitive variable. Let \(Z_{hi}=Y_{hi}+S_{hi}\) denote the \(i^{th}\) scrambled response in \(h^{th}\) stratum, and let \(X_{hi}\) denote the \(i^{th}\) value of \(X\) in \(h^{th}\) stratum. Additionally, let \({\overline{Z}}_h\) and \({\overline{X}}_h\) be the population means for \(Z\) and \(X\) respectively in \(h^{th}\) stratum. Further, let \(S^2_{Zh}\) and \(S^2_{Xh}\) be the population variances of \(Z\) and \(X\) respectively in \(h^{th}\) stratum. Let \(S_{ZXh}\) and \({\rho }_{ZXh}\) denote the covariance and the coefficient of correlation between \(Z\) and \(X\) in \(h^{th}\) stratum.

In the presence of measurement errors, let \((x^*_{hi},{\ z}^*_{hi})\) and \((X^*_{hi},{\ Z}^*_{hi})\) be the observed and true values of \(X\) and \(Z\) respectively in \(h^{th}\) stratum. Let \(T^*_{hi}=z^*_{hi}-Z^*_{hi}\) and \(V^*_{hi}=x^*_{hi}-X^*_{hi}\) denote the measurement errors associated with \(Z\) and \(X\) respectively in \(h^{th}\) stratum. The measurement errors are assumed to be normally distributed with mean zero and variances \(S^2_{Th}\) and \(S^2_{Vh}\), for \(Z\) and \(X\) respectively in \(h^{th}\) stratum.

A relatively large first-phase sample of size \(n'\) is drawn from the population using simple random sampling without replacement (SRSWOR), and the units are classified into \(L\) homogeneous strata of sizes \(n'_h\). A second-phase random sample of size \(n_h\) is then drawn from the first-phase sample in \(h^{th}\) stratum using SRSWOR, and both the survey and auxiliary variables are studied. Let \({\overline{x}}'_h\) denote the first-phase \(h^{th}\) stratum sample mean for \(X.\) Further, let \({\overline{x}}_h\) and \({\overline{z}}_h\) denote the second-phase \(h^{th}\) stratum sample means for \(X\) and \(Z\) respectively. Let

\begin{equation} \label{GrindEQ__1_} {\sigma }_{X1h}={\overline{x}}'_h-{\overline{X}}_h, \end{equation}
(1)
\begin{equation} \label{GrindEQ__2_} {\sigma }_{Xh}={\overline{x}}_h-{\overline{X}}_h, \end{equation}
(2)
and
\begin{equation} \label{GrindEQ__3_} {\sigma }_{Zh}={\overline{z}}_h-{\overline{Z}}_h. \end{equation}
(3)
Take expectation on both sides of Equations (1)-(3) to obtain
\begin{equation} \label{GrindEQ__4_} E\left({\sigma }_{X1h}\right)=E\left({\sigma }_{Xh}\right)=E\left({\sigma }_{Zh}\right)=0. \end{equation}
(4)
Square and cross-multiply Equations (1)-(3), then take expectations to obtain
\begin{align} \label{GrindEQ__5_} A_h&=E\left({\sigma }^2_{Xh}\right)={\theta }_h\left(S^2_{Xh}+S^2_{Vh}\right), \end{align}
(5)
\begin{align} \label{GrindEQ__6_} B_h&=E\left({\sigma }^2_{Zh}\right)={\theta }_h\left(S^2_{Zh}+S^2_{Th}\right), \end{align}
(6)
\begin{align} \label{GrindEQ__7_} C_h&=E\left({\sigma }^2_{X1h}\right)=E\left({\sigma }_{X1h}{\sigma }_{Xh}\right)={\theta }'_h\left(S^2_{Xh}+S^2_{Vh}\right), \end{align}
(7)
\begin{align} \label{GrindEQ__8_} D_h&=E\left({\sigma }_{X1h}{\sigma }_{Zh}\right)={\theta }'_hS_{ZXh}, \end{align}
(8)
\begin{align} \label{GrindEQ__9_} E_h&=E\left({\sigma }_{Xh}{\sigma }_{Zh}\right)={\theta }_hS_{ZXh}, \end{align}
(9)
where \({\theta }'_h=\left(\frac{1}{n'_h}-\frac{1}{N_h}\right)\) and \({\theta }_h=\left(\frac{1}{n_h}-\frac{1}{N_h}\right)\).
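
As an illustration of Equations (5)-(9), the following Python sketch evaluates the five quantities for a single stratum. The variances and the correlation are those reported for stratum 1 of population I in Table 1, and \(N_h=100\) is the size of that simulated stratum. The sample sizes `n1_h = 60` and `n_h = 20` and the function name `sampling_moments` are assumptions made here for illustration only.

```python
import math

def sampling_moments(N_h, n1_h, n_h, S2_X, S2_Z, rho_ZX, S2_V, S2_T):
    """Return (A_h, B_h, C_h, D_h, E_h) as defined in Equations (5)-(9)."""
    theta = 1.0 / n_h - 1.0 / N_h     # second-phase factor theta_h
    theta1 = 1.0 / n1_h - 1.0 / N_h   # first-phase factor theta'_h
    S_ZX = rho_ZX * math.sqrt(S2_Z * S2_X)  # covariance recovered from the correlation
    A = theta * (S2_X + S2_V)
    B = theta * (S2_Z + S2_T)
    C = theta1 * (S2_X + S2_V)
    D = theta1 * S_ZX
    E = theta * S_ZX
    return A, B, C, D, E

# Stratum 1 of population I; n1_h and n_h are hypothetical sample sizes.
A, B, C, D, E = sampling_moments(
    N_h=100, n1_h=60, n_h=20,
    S2_X=227.9771, S2_Z=81.01574, rho_ZX=0.8406767,
    S2_V=20.40296, S2_T=22.31754,
)
print(A, B, C, D, E)
```

Because the second-phase sample is smaller than the first-phase sample, \({\theta }_h>{\theta }'_h\), so the sketch returns \(A_h>C_h\) and \(E_h>D_h\).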

3. Existing estimators in the literature

The ordinary mean estimator in the presence of measurement errors in stratified double sampling is defined as
\begin{equation} \label{GrindEQ__10_} t_0=\sum^L_{h=1}{w_h{\overline{z}}_h}\,. \end{equation}
(10)
The variance is given as
\begin{equation} \label{GrindEQ__11_} Var(t_0)\cong \sum^L_{h=1}{W^2_hB_h}\,. \end{equation}
(11)

4. Proposed estimator

Let \({\overline{x}}'_h=\frac{1}{n'_h}\sum^{n'_h}_{i=1}{x_{hi}}\) and \({\overline{x}}_h=\frac{1}{n_h}\sum^{n_h}_{i=1}{x_{hi}}\) denote the first and second-phase stratum sample means for the auxiliary variable respectively. Further, let \({\overline{z}}_h=\frac{1}{n_h}\sum^{n_h}_{i=1}{z_{hi}}\) denote the mean of the scrambled responses in the second-phase stratum sample and \(w_h\) denote the \(h^{th}\) stratum weight. The proposed estimator of the finite population mean in the presence of measurement errors is given as
\begin{equation} \label{GrindEQ__12_} t_g=\sum^L_{h=1}{w_h{\overline{z}}_h}\left(\frac{{\overline{x}}'_h}{{\overline{x}}_h}\right)\left[{\alpha }_h\exp\left(\frac{{\overline{x}}'_h-{\overline{x}}_h}{{\overline{x}}'_h+{\overline{x}}_h}\right)+\left(1-{\alpha }_h\right)\exp\left(\frac{{\overline{x}}_h-{\overline{x}}'_h}{{\overline{x}}_h+{\overline{x}}'_h}\right)\right], \end{equation}
(12)
where \({\alpha }_h\) is a suitably chosen constant whose value is to be determined.

Let \(R_h={\overline{Z}}_h/{\overline{X}}_h\) denote the ratio of the stratum population means. Substitute Equations (1)-(3) in (12), expand using Taylor’s approximation while ignoring terms of order greater than two, and then subtract the population mean to obtain

\begin{align} \label{GrindEQ__13_} t_g-\overline{Z}\cong \sum^L_{h=1}w_h\Bigg[&{\sigma }_{Zh}-\frac{1}{2}R_h{\sigma }_{Xh}+\frac{1}{2}R_h{\sigma }_{X1h}+\frac{3}{8}\frac{R_h}{{\overline{X}}_h}{\sigma }^2_{Xh}-\frac{1}{8}\frac{R_h}{{\overline{X}}_h}{\sigma }^2_{X1h}-\frac{{\sigma }_{Xh}{\sigma }_{Zh}}{2{\overline{X}}_h}\notag\\ &+\frac{{\sigma }_{X1h}{\sigma }_{Zh}}{2{\overline{X}}_h}-\frac{R_h}{4{\overline{X}}_h}{\sigma }_{Xh}{\sigma }_{X1h}+{\alpha }_hR_h{\sigma }_{X1h}-{\alpha }_hR_h{\sigma }_{Xh}+\frac{{\alpha }_h{\sigma }_{X1h}{\sigma }_{Zh}}{{\overline{X}}_h}-\frac{{\alpha }_h{\sigma }_{Xh}{\sigma }_{Zh}}{{\overline{X}}_h}\notag\\ &+\frac{{\alpha }_hR_h{\sigma }^2_{X1h}}{2{\overline{X}}_h}+\frac{3{\alpha }_hR_h{\sigma }^2_{Xh}}{2{\overline{X}}_h}-\frac{2{\alpha }_hR_h}{{\overline{X}}_h}{\sigma }_{Xh}{\sigma }_{X1h}\Bigg]. \end{align}
(13)
Take expectations on both sides of Equation (13) and substitute Equations (4)-(9) to obtain the approximation for the bias as
\begin{equation} \label{GrindEQ__14_} Bias(t_g)\cong \sum^L_{h=1}{\frac{W_h}{{\overline{X}}_h}}\left[R_h\left(\frac{3}{8}+\frac{3}{2}{\alpha }_h\right)\ \left(A_h-C_h\right)-\left(\frac{1}{2}+{\alpha }_h\right)\ \left(E_h-D_h\right)\right]. \end{equation}
(14)
Square both sides of Equation (13) and simplify while ignoring terms of order greater than two, and then take expectations to obtain the approximation for the mean squared error as
\begin{equation} \label{GrindEQ__15_} MSE(t_g)\cong \sum^L_{h=1}{W^2_h\left[B_h+R^2_h\left(\frac{1}{4}+{\alpha }_h+{\alpha }^2_h\right)\ \left(A_h-C_h\right)-R_h\left(1+2{\alpha }_h\right)\left(E_h-D_h\right)\right]}. \end{equation}
(15)
Differentiate Equation (15) partially with respect to \({\alpha }_h\) and then equate to zero to obtain
\begin{equation} \label{GrindEQ__16_} {\alpha }^*_h=\frac{E_h-D_h}{R_h(A_h-C_h)}-\frac{1}{2}. \end{equation}
(16)
Substitute Equation (16) in (15) to obtain the minimum mean squared error as
\begin{equation} \label{GrindEQ__17_} {MSE(t_g)}_{min}\cong \sum^L_{h=1}{W^2_h}\left[B_h-\frac{{(E_h-D_h)}^2}{(A_h-C_h)}\right]. \end{equation}
(17)
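
The relationship between Equations (15), (16) and (17) can be checked numerically. The Python sketch below codes the bracketed per-stratum terms of the three equations and confirms that the mean squared error evaluated at \({\alpha }^*_h\) coincides with the closed-form minimum. The function names and the stratum quantities are hypothetical and were chosen only for illustration.

```python
def mse_stratum(alpha, A, B, C, D, E, R):
    """Bracketed term of Equation (15) for one stratum."""
    return (B
            + R**2 * (0.25 + alpha + alpha**2) * (A - C)
            - R * (1 + 2 * alpha) * (E - D))

def alpha_opt(A, C, D, E, R):
    """Optimum constant from Equation (16)."""
    return (E - D) / (R * (A - C)) - 0.5

def mse_min_stratum(A, B, C, D, E):
    """Bracketed term of Equation (17)."""
    return B - (E - D) ** 2 / (A - C)

# Hypothetical stratum quantities A_h, B_h, C_h, D_h, E_h and R_h.
A, B, C, D, E, R = 9.9, 4.1, 1.7, 0.8, 4.6, 0.5

a_star = alpha_opt(A, C, D, E, R)
at_opt = mse_stratum(a_star, A, B, C, D, E, R)
closed = mse_min_stratum(A, B, C, D, E)
print(a_star, at_opt, closed)
```

Evaluating `mse_stratum` at other values of `alpha`, for example 0, 0.5 and 1, gives values that are never below `closed`, as expected for a quadratic in \({\alpha }_h\) with a positive leading coefficient.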

5. Members of the family of the proposed generalized estimator

Members of the family of the proposed estimator are obtained as follows:
  • (i) For \({\alpha }_h=\frac{1}{2}\), the proposed estimator reduces, up to the first order of approximation, to the ratio estimator given as
    \begin{equation} \label{GrindEQ__18_} t_r=\sum^L_{h=1}{w_h{\overline{z}}_h}\left(\frac{{\overline{x}}'_h}{{\overline{x}}_h}\right) \end{equation}
    (18)
    The bias and mean squared error are given as
    \begin{equation} \label{GrindEQ__19_} Bias\left(t_r\right)\cong \sum^L_{h=1}{\frac{W_h}{{\overline{X}}_h}}\left[\frac{9}{8}\ R_h\left(A_h-C_h\right)-\left(E_h-D_h\right)\right], \end{equation}
    (19)
    and
    \begin{equation} \label{GrindEQ__20_} MSE(t_r)\cong \sum^L_{h=1}{W^2_h\left[B_h+R^2_h\left(A_h-C_h\right)-2R_h(E_h-D_h)\right]} \end{equation}
    (20)
  • (ii) For \({\alpha }_h=1\), the proposed estimator reduces to the exponential ratio-type estimator given as
    \begin{equation} \label{GrindEQ__21_} t_{err}=\sum^L_{h=1}{w_h{\overline{z}}_h}\left(\frac{{\overline{x}}'_h}{{\overline{x}}_h}\right)\exp\left(\frac{{\overline{x}}'_h-{\overline{x}}_h}{{\overline{x}}'_h+{\overline{x}}_h}\right) \end{equation}
    (21)
    The bias and mean squared error are given as
    \begin{equation} \label{GrindEQ__22_} Bias\left(t_{err}\right)\cong \sum^L_{h=1}{\frac{W_h}{{\overline{X}}_h}}\left[\frac{15}{8}\ R_h\left(A_h-C_h\right)-\frac{3}{2}\left(E_h-D_h\right)\right],\ \ \ \ \ \ \ \ \ \ \end{equation}
    (22)
    and
    \begin{equation} \label{GrindEQ__23_} MSE(t_{err})\cong \sum^L_{h=1}{W^2_h\left[B_h+\frac{9}{4}R^2_h\left(A_h-C_h\right)-3R_h(E_h-D_h)\right]} \end{equation}
    (23)
  • (iii) For \({\alpha }_h=0\), the proposed estimator reduces to the exponential ratio-product-type estimator given as
    \begin{equation} \label{GrindEQ__24_} t_{erp}=\sum^L_{h=1}{w_h{\overline{z}}_h}\left(\frac{{\overline{x}}'_h}{{\overline{x}}_h}\right)\exp\left(\frac{{\overline{x}}_h-{\overline{x}}'_h}{{\overline{x}}_h+{\overline{x}}'_h}\right) \end{equation}
    (24)
    The bias and mean squared error are given as
    \begin{equation} \label{GrindEQ__25_} Bias\left(t_{erp}\right)\cong \sum^L_{h=1}{\frac{W_h}{{\overline{X}}_h}}\left[\frac{3}{8}\ R_h\left(A_h-C_h\right)-\frac{1}{2}\left(E_h-D_h\right)\right], \end{equation}
    (25)
    and
    \begin{equation} \label{GrindEQ__26_} MSE(t_{erp})\cong \sum^L_{h=1}{W^2_h\left[B_h+\frac{1}{4}R^2_h\left(A_h-C_h\right)-R_h(E_h-D_h)\right]} \end{equation}
    (26)
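
The three special cases can be verified by direct substitution; the short check below is a worked illustration and not an additional result. Since \(\frac{1}{4}+{\alpha }_h+{\alpha }^2_h={\left({\alpha }_h+\frac{1}{2}\right)}^2\), the bracketed term of Equation (15) may be written as
\[B_h+R^2_h{\left({\alpha }_h+\tfrac{1}{2}\right)}^2\left(A_h-C_h\right)-2R_h\left({\alpha }_h+\tfrac{1}{2}\right)\left(E_h-D_h\right).\]
Setting \({\alpha }_h=\frac{1}{2},\ 1\) and \(0\) gives \({\alpha }_h+\frac{1}{2}=1,\ \frac{3}{2}\) and \(\frac{1}{2}\), so the coefficients of \(R^2_h\left(A_h-C_h\right)\) are \(1,\ \frac{9}{4}\) and \(\frac{1}{4}\), and the coefficients of \(R_h\left(E_h-D_h\right)\) are \(2,\ 3\) and \(1\), in agreement with Equations (20), (23) and (26).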

6. Efficiency comparison

In this section, the performances of the proposed estimators are studied theoretically.
  • i. From Equations (11) and (17), \({MSE(t_g)}_{min}-Var\left(t_0\right)< 0\) if \[{\left(E_h-D_h\right)}^2 >0.\]
  • ii. From Equations (17) and (20), \({MSE(t_g)}_{min}-MSE\ \left(t_r\right)< 0\) if \[{(D_h\ -E_h)}^2+R^2_h\ {\left(A_h\ -C_h\right)}^2-2R_{h\ }\left(E_h\ -D_h\right)\left(A_h\ -C_h\right)>0.\]
  • iii. From Equations (17) and (23), \({MSE(t_g)}_{min}-MSE\ \left(t_{err}\right)< 0\) if \[{(D_h\ -E_h)}^2+\frac{9}{4}R^2_h{\left(A_h\ -C_h\right)}^2-{3R}_{h}\left(E_h-D_h\right)\left(A_h-C_h\right)>0.\]
  • iv. From Equations (17) and (26), \({MSE(t_g)}_{min}-MSE\left(t_{erp}\right)< 0\) if \[{(D_h\ -E_h)}^2+\frac{1}{4}R^2_h{\left(A_h\ -C_h\right)}^2-R_{h}\left(E_h-D_h\right)\left(A_h-C_h\right)>0.\]
The stated inequalities provide the conditions under which the proposed optimum estimator is more efficient than the existing estimators of the finite population mean. The numerical study in Section 7 shows that these conditions hold for the populations considered; hence the proposed optimum estimator is recommended for use by survey practitioners whenever the conditions hold. Furthermore, the proposed strategy is useful for the construction of accurate confidence intervals for unknown population parameters in a survey based on the Randomized Response Technique (RRT) and contaminated with measurement errors.
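
The comparisons in this section can be collected into a single identity, offered here as a worked check rather than as a new result. Write \(k_h={\alpha }_h+\frac{1}{2}\) and assume \(A_h-C_h>0\), which holds whenever \(n_h< n'_h\). Subtracting the bracketed term of Equation (17) from that of Equation (15) gives
\begin{align*} &B_h+R^2_hk^2_h\left(A_h-C_h\right)-2R_hk_h\left(E_h-D_h\right)-\left[B_h-\frac{{\left(E_h-D_h\right)}^2}{A_h-C_h}\right]\\ &\qquad =\frac{{\left[R_hk_h\left(A_h-C_h\right)-\left(E_h-D_h\right)\right]}^2}{A_h-C_h}\ \geq \ 0. \end{align*}
Equality holds only when \(k_h=\frac{E_h-D_h}{R_h\left(A_h-C_h\right)}\), that is, when \({\alpha }_h={\alpha }^*_h\) in Equation (16). Taking \(k_h=1,\ \frac{3}{2}\) and \(\frac{1}{2}\) gives the comparisons with \(t_r\), \(t_{err}\) and \(t_{erp}\) respectively, so each of those estimators can match the proposed optimum estimator only if its fixed constant happens to equal \({\alpha }^*_h\) in every stratum.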

7. Numerical study

7.1. Introduction

A numerical study is conducted using both simulated and real data sets to compare the performance of the proposed estimator with some existing estimators in the literature. The real data set is obtained from Sarndal et al., [18]. The simulated data are generated using the R programming language. Each data set consists of the survey variable \(Y\) and the auxiliary variable \(X.\) A normally distributed scrambling value, \(S_{hi}\ \sim \ N\left(0,\ 2\right)\), is generated for each unit in the data set. Thereafter, the response variable is obtained as \(Z_{hi}=Y_{hi}+S_{hi}\). Finally, normally distributed measurement errors with mean 2 and standard deviation 5 are introduced to each unit of the response and auxiliary variables. The efficiency of the proposed estimator is compared with that of the other estimators using the minimum variance and the Percent Relative Efficiency (PRE) approaches. The Percent Relative Efficiency (PRE) of each estimator is obtained using the expression
\begin{equation} \label{GrindEQ__27_} PRE=\frac{Var(t_0)}{MSE(t_j)}\times 100, \end{equation}
(27)
where \(t_j=t_g,\ t_r,\ t_{err}\) and \(t_{erp}\) denote estimators of the finite population mean. The estimator with the highest PRE is considered to be the most efficient. The performances of the estimators are compared both with and without measurement errors. The populations are described as follows:

Population I: Simulated data

Stratum 1

\begin{align*}X_1&=rnorm(100,\ 450,\ 15),\\ x_1&=X_1+rnorm(100,\ 2,\ 5),\\ Y_1&=0.8+0.5X_1+rnorm(100,\ 0,\ 1),\\ Z_1&=Y_1+rnorm(100,\ 0,\ \ 0.2), \ \ \ \text{ and}\\ z_1&=Z_1+rnorm(100,\ 2,\ 5).\end{align*}

Stratum 2

\begin{align*} X_2&=rnorm(250,\ 50,\ 15),\\ x_2&=X_2+rnorm(250,\ 2,\ 5),\\ Y_2&=0.8+0.5X_2+rnorm(250,\ 0,\ 1),\\ Z_2&=Y_2+rnorm(250,\ 0,\ \ 0.2),\ \ \ \text{and}\\ z_2&=Z_2+rnorm(250,\ 2,\ 5).\end{align*}

Stratum 3

\begin{align*}X_3&=rnorm(300,\ 920,\ 25),\\ x_3&=X_3+rnorm(300,\ 2,\ 5),\\ Y_3&=0.8+0.5X_3+rnorm(300,\ 0,\ 1),\\ Z_3&=Y_3+rnorm(300,\ 0,\ \ 0.2),\ \ \ \text{and}\\ z_3&=Z_3+rnorm(300,\ 2,\ 5).\end{align*}

Stratum 4

\begin{align*} X_4&=rnorm(350,\ 500,\ 8),\\ x_4&=X_4+rnorm(350,\ 2,\ 5),\\ Y_4&=0.8+0.5X_4+rnorm(350,\ 0,\ 1),\\ Z_4&=Y_4+rnorm(350,\ 0,\ \ 0.2),\ \ \ \text{and}\\ z_4&=Z_4+rnorm(350,\ 2,\ 5).\end{align*}
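
For readers who do not use R, the recipe for one stratum can be sketched in Python with the standard library. R's `rnorm(n, mean, sd)` takes a standard deviation as its third argument, and `random.gauss(mu, sigma)` does the same. The function name `simulate_stratum` and the seed are assumptions made for illustration, and the draws will not reproduce the data used in the study.

```python
import random
import statistics

random.seed(2024)  # arbitrary seed, only for reproducibility of this sketch

def simulate_stratum(size, mean_x, sd_x):
    """Generate true and observed values for one stratum, following the recipe above."""
    X = [random.gauss(mean_x, sd_x) for _ in range(size)]   # true auxiliary values
    x = [Xi + random.gauss(2, 5) for Xi in X]               # observed auxiliary values
    Y = [0.8 + 0.5 * Xi + random.gauss(0, 1) for Xi in X]   # sensitive survey variable
    Z = [Yi + random.gauss(0, 0.2) for Yi in Y]             # scrambled responses
    z = [Zi + random.gauss(2, 5) for Zi in Z]               # observed scrambled responses
    return {"X": X, "x": x, "Y": Y, "Z": Z, "z": z}

stratum1 = simulate_stratum(100, 450, 15)

# Measurement errors on the auxiliary variable: observed minus true values.
V = [obs - true for obs, true in zip(stratum1["x"], stratum1["X"])]
print(statistics.fmean(stratum1["X"]), statistics.variance(V))
```

The sample variance of `V` should lie near 25, the square of the standard deviation passed to the generator.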

Population II: Sarndal et al., [18]

The population consists of five strata of sizes \(N_1=38,\ N_2=14,\ N_3=11,\ N_4=33\) and \(N_5=24\). Table 1 presents summary statistics for populations I and II.

Table 1. Parameters for populations I and II.

| Population | Stratum | \({\overline{X}}_h\) | \({\overline{Z}}_h\) | \(S^2_{Xh}\) | \(S^2_{Zh}\) | \({\rho }_{ZXh}\) | \(S^2_{Th}\) | \(S^2_{Vh}\) |
|---|---|---|---|---|---|---|---|---|
| I | 1 | 450.2457 | 227.7285 | 227.9771 | 81.01574 | 0.8406767 | 22.31754 | 20.40296 |
| I | 2 | 577.5290 | 291.1661 | 3583.724 | 929.5202 | 0.9824869 | 30.78505 | 27.55788 |
| I | 3 | 921.7221 | 463.7038 | 643.6014 | 212.0282 | 0.9236006 | 30.68011 | 25.67958 |
| I | 4 | 499.8988 | 252.5883 | 61.48334 | 46.27591 | 0.613076 | 28.21903 | 22.29580 |
| II | 1 | 1029.158 | 16.09219 | 3667896 | 327.0976 | 0.7177369 | 30.54519 | 22.30025 |
| II | 2 | 25671.57 | 29.88566 | 6568461403 | 3617.208 | 0.9645813 | 26.56678 | 25.47327 |
| II | 3 | 5028.818 | 28.29478 | 63348743 | 1493.623 | 0.979968 | 22.46011 | 19.04889 |
| II | 4 | 7533.939 | 82.67373 | 440717912 | 45688.17 | 0.3021371 | 29.60195 | 17.35155 |
| II | 5 | 16315.25 | 22.62072 | 408441212 | 405.7601 | 0.8939683 | 16.27989 | 21.05136 |

7.2. Discussion

Tables 2 and 3 show the contribution of measurement errors and the Randomized Response Technique (RRT) to the bias, mean squared error (MSE), and Percent Relative Efficiency (PRE) of the mean estimators. The numerical study shows that the MSE of every estimator is lower without measurement errors and increases when measurement errors are introduced into the survey. Moreover, the PRE of the mean estimators decreases when measurement errors are present in the survey. For population II, the proposed generalized estimator has the smallest bias among the estimators considered; for population I, its bias is close to the smallest in absolute value, which is attained by \(t_{erp}\). The main finding of the study is that the proposed estimator attains the lowest MSE and the highest PRE, both with and without measurement errors, for both the real and the simulated data.

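Table 2. Bias (in brackets), MSE and PRE of estimators for population I.

| Estimator | MSE (bias), without measurement errors | PRE, without measurement errors | MSE (bias), with measurement errors | PRE, with measurement errors |
|---|---|---|---|---|
| \(t_0\) | 0.1389270 | 100 | 0.1964786 | 100 |
| \(t_g\) | 0.0620388 (0.0001302) | 223.8805 | 0.1075719 (0.0001629) | 182.6487 |
| \(t_r\) | 0.07016 (0.0001325) | 198.0145 | 0.1118722 (0.0001789) | 175.6281 |
| \(t_{err}\) | 0.0817005 (0.0003902) | 170.0022 | 0.1322906 (0.0004957) | 148.5205 |
| \(t_{erp}\) | 0.0811046 (-0.0001238) | 171.2513 | 0.1291244 (-0.0001379) | 152.1623 |
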
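Table 3. Bias (in brackets), MSE and PRE of estimators for population II.

| Estimator | MSE (bias), without measurement errors | PRE, without measurement errors | MSE (bias), with measurement errors | PRE, with measurement errors |
|---|---|---|---|---|
| \(t_0\) | 62.54516 | 100 | 63.04892 | 100 |
| \(t_g\) | 58.18433 (0.124300) | 107.4949 | 58.95167 (0.132101) | 106.9502 |
| \(t_r\) | 76.12109 (1.721466) | 82.16552 | 78.65928 (1.929885) | 80.15446 |
| \(t_{err}\) | 110.9480 (3.065028) | 56.36996 | 116.9480 (3.414083) | 53.91193 |
| \(t_{erp}\) | 59.98440 (0.377905) | 104.2690 | 60.83360 (0.4456877) | 103.6416 |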
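
The PRE entries follow from the MSE entries through Equation (27). As a small check, the Python sketch below recomputes two of the PRE values for population II from the first MSE column of Table 3; the function name `percent_relative_efficiency` is an assumption made for illustration.

```python
def percent_relative_efficiency(var_t0, mse_tj):
    """Equation (27): PRE of estimator t_j relative to the ordinary mean estimator t_0."""
    return var_t0 / mse_tj * 100.0

# Values copied from the first MSE column of Table 3 (population II).
var_t0 = 62.54516
mse = {"t_g": 58.18433, "t_erp": 59.98440}

pre = {name: percent_relative_efficiency(var_t0, value) for name, value in mse.items()}
for name, value in pre.items():
    print(name, round(value, 4))
```

A PRE above 100 means that the estimator has a smaller mean squared error than \(t_0\); a value below 100, as for \(t_r\) and \(t_{err}\) in population II, means that it is less efficient than \(t_0\).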

7.3. Conclusion

The study proposes a generalized estimator of the finite population mean for a sensitive variable using a non-sensitive auxiliary variable in the presence of measurement errors based on the Randomized Response Technique (RRT). Expressions for the bias and Mean Squared Error (MSE) of the proposed estimator have been derived up to the first order of approximation. The performance of the proposed estimator has been studied both theoretically and numerically. The numerical study reveals that the presence of measurement errors in a survey based on the Randomized Response Technique (RRT) increases the variance and Mean Squared Error (MSE), resulting in biased estimates of the finite population mean. Finally, the proposed strategy is applicable in surveys involving sensitive variables such as bribery, cheating in examinations, drug abuse, homosexuality, habitual tax evasion, reckless driving, abortion, and indiscriminate gambling, among others.

Acknowledgments

The authors are thankful to the anonymous referee for the constructive comments and feedback.

Conflicts of Interest

The authors declare no conflict of interest.

References:

  1. Warner, S. L. (1965). Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60(309), 63-69.
  2. Shalabh. (1997). Ratio method of estimation in the presence of measurement errors. Journal of Indian Society of Agricultural Statistics, 50(2), 150-155.
  3. Diwakar, S., Sharad, P., & Narendra, S. T. (2012). An Estimator for Mean Estimation in Presence of measurement error. Research and Reviews: A Journal of Statistics, 1(1), 1-8.
  4. Yadav, D., Sheela, M., & Dipika. (2017). Estimation of population mean using auxiliary information in presence of measurement errors. International Journal of Engineering Sciences and Research Technology, 6(6), DOI: 10.5281/zenodo.817860.
  5. Gajendra, K. V., Abhishek, S., & Neha, S. (2020). Calibration under measurement errors. Journal of King Saud University Science, 32(7), 2950-2961.
  6. Eichhorn, B. H., & Hayre, L. S. (1983). Scrambled randomized response methods for obtaining sensitive quantitative data. Journal of Statistical Planning and Inference, 7, 307-316.
  7. Gupta, S., & Shabbir, J. (2004). Sensitivity estimation for personal interview survey questions. Statistica, 64(4), 643-653.
  8. Gupta, S., Shabbir, J., & Sehra, S. (2010). Mean and sensitivity estimation in optional randomized response models. Journal of Statistical Planning and Inference, 140(10), 2870-2874.
  9. Sousa, R., Shabbir, J., Real, P. C., & Gupta, S. (2010). Ratio estimation of the mean of a sensitive variable in the presence of auxiliary information. Journal of Statistical Theory and Practice, 4(3), 495-507.
  10. Tanveer, A. T., & Housila, P. S. (2015). A general procedure for estimating the mean of a sensitive variable using auxiliary information. Revista Investigacion Operacional, 36(3), 268-279.
  11. Mushtaq, N., Noor-Ul-Amin, M., & Hanif, M. (2017). A family of estimators of a sensitive variable using auxiliary information in stratified random sampling. Pakistan Journal of Operation Research, 13(1), 141-155.
  12. Mushtaq, N., Noor-Ul-Amin, M., & Hanif, M. (2016). Estimation of population mean of a sensitive variable in stratified two-phase sampling. Pakistan Journal of Statistics, 32, 393-404.
  13. Mushtaq, N., & Noor-Ul-Amin, M. (2020). Joint influence of double sampling and randomized response technique on estimation method of mean. Applied Mathematics, 10(1), 12-19.
  14. Naeem, N., & Shabbir, J. (2018). Use of a scrambled response on two occasion’s successive sampling under nonresponse. Hacettepe Journal of Mathematics and Statistics, 47(3), 675-684.
  15. Zahid, E., & Shabbir, J. (2019). Estimation of finite population mean for a sensitive variable using dual auxiliary information in the presence of measurement errors. PLoS ONE, 14(2), e0212111.
  16. Sadia K. (2017). Generalized mean estimators for sensitive and non-sensitive variables in the presence of measurement errors. PhD Thesis, National College of Business Administration and Economics, Lahore.
  17. Zhang Q. (2020). Mean estimation of sensitive variables under measurement errors and non-response. PhD Thesis. The University of North Carolina at Greensboro.
  18. Sarndal, C., Swensson, B., & Wretman, J. (1992). Model assisted survey sampling. New York: Springer.