1. Introduction
Auxiliary information is used either at the estimation stage or in the construction of an estimator to obtain improved designs and increase the efficiency of estimators in sampling. Laplace [1] initiated the use of auxiliary information in ratio-type estimation. Statisticians have since devoted considerable attention to the construction of new and efficient estimators of population parameters [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]. Khan and Al-Hossain [14] suggested a generalized chain ratio-in-regression estimator for the population mean using two auxiliary variables.
In this research work, a modified difference-type estimator for the population mean under two-phase sampling is proposed [15].
Firstly, we give some definitions and notions.
Consider a finite population \(U =\{U_1,U_2,U_3,\dots,U_N\}\) of \(N\) distinct units. Let \(x\) and \(y\) be the auxiliary and study variables, taking values \(x_i\) and \(y_i\) respectively on the \(i^{th}\) unit, \(i=1,2,3,\dots,N\), with population means \[\overline{Y}= \frac{1}{N}\sum^N_i{y_i}\quad\text{and}\quad\overline{X}= \frac{1}{N}\sum^N_i{x_i}\] of the study and auxiliary variables respectively.
Also let \[S^2_x= \frac{1}{N-1}\sum^N_i{{(x_i-\overline{X})}^2}\quad\text{and}\quad S^2_y = \frac{1}{N-1}\sum^N_i{{(y_i-\overline{Y})}^2}\] be the population variances of the auxiliary and study variables respectively, let \(C_x\) and \(C_y\) be the corresponding coefficients of variation, and let \({\rho }_{yx}\) be the correlation coefficient between \(x\) and \(y\).
Let the sample means of \(x\) and \(y\) be
\[\overline{x}=\frac{1}{n}\sum^{n}_{i}{x}_{i}\quad\text{and}\quad\overline{y}=\frac{1}{n}\sum^{n}_{i}{y}_{i},\]
respectively. Also let \[{\widehat{S}}^{2}_{y}= \frac{1}{n-1}\sum^{n}_{i}({y}_{i}- \overline{y})^{2}\quad\text{and}\quad{\widehat{S}}^{2}_{x} =\frac{1}{n-1}\sum^n_i (x_i- \overline{x})^2\] be the corresponding sample variances of the study and auxiliary variables respectively. Let \[ S_{yx} = \frac{\sum^N_i{\left(y_i-\overline{Y}\right)(x_i-\overline{X})}}{N-1},\quad S_{yz} = \frac{\sum^N_i \left(y_i-\overline{Y}\right)(z_i-\overline{Z})}{N-1}\quad\text{and}\quad S_{xz} =\frac{\sum^N_i \left(z_i-\overline{Z}\right)(x_i-\overline{X})}{N-1}\] be the covariances between the variables indicated by their subscripts. Similarly, \[b_{yx}=\frac{{\hat{S}}_{yx}}{{\hat{S}}^2_x}\] is the sample regression coefficient of \(y\) on \(x\) based on a sample of size \(n\). Also,
\[C_y=\frac{S_y}{\overline{Y}},\ C_x=\frac{S_x}{\overline{X}}\ \ \text{and}\ \ C_z=\frac{S_z}{\overline{Z}}\]
are the coefficients of variation of the study and the auxiliary variables respectively. Also, \(\theta=\frac{1}{n}-\frac{1}{N}\), \(\theta_1=\frac{1}{n'}-\frac{1}{N}\) and \(\theta_2=\frac{1}{n}-\frac{1}{n'}\).
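These three finite-population correction fractions satisfy the identity \(\theta=\theta_1+\theta_2\), which is used implicitly throughout the MSE expressions below. A quick numeric check (the sizes \(N=500\), \(n'=100\), \(n=40\) are illustrative, not from any data set used here):

```python
# Finite-population correction fractions in two-phase sampling.
# The sizes below are hypothetical, chosen only for illustration.
N, n_prime, n = 500, 100, 40

theta  = 1/n - 1/N          # second-phase sample vs. population
theta1 = 1/n_prime - 1/N    # first-phase sample vs. population
theta2 = 1/n - 1/n_prime    # second-phase vs. first-phase sample

# The three fractions are linked by theta = theta1 + theta2.
assert abs(theta - (theta1 + theta2)) < 1e-12
print(theta, theta1, theta2)
```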
2. Some existing estimators
Consider a finite population of \(N\) units. To estimate the population mean \(\overline{Y}\), it is assumed that the correlation between \(y\) and \(x\) is stronger than the correlation between \(y\) and \(z\) (i.e., \({\rho }_{yx} > {\rho }_{yz}\)). Suppose the population mean \(\overline{X}\) of the auxiliary variable \(x\) is unknown, but information on another, cheaply measured auxiliary variable \(z\), closely related to \(x\) though only remotely to \(y\), is available for all units of the population. The use of two-phase sampling is imperative in such a situation. In this double sampling scheme, a large first-phase sample of size \(n'\) (\(n' < N\)) is drawn from the population \(U\) by simple random sampling without replacement (SRSWOR), and \(x\) and \(z\) are measured to estimate \(\overline{X}\) and \(\overline{Z}\). In the second phase, a subsample of size \(n\) (\(n < n'\)) is drawn from the first-phase sample by SRSWOR (or directly from the population \(U\)) and the study variable \(y\) is observed.
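As a hedged illustration of this scheme (the population, model, and sample sizes below are synthetic, chosen only for the sketch; `numpy.random.Generator.choice` with `replace=False` plays the role of SRSWOR):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic finite population: z is cheap and correlated with x,
# and x is correlated with y (illustrative, not the paper's data).
N = 1000
z = rng.normal(50, 10, N)
x = 0.8 * z + rng.normal(0, 4, N)
y = 1.2 * x + rng.normal(0, 6, N)

# Phase 1: large SRSWOR sample of size n' -- observe x and z only.
n_prime = 200
idx1 = rng.choice(N, n_prime, replace=False)
xbar_p, zbar_p = x[idx1].mean(), z[idx1].mean()

# Phase 2: subsample of size n from the phase-1 sample -- observe y (and x).
n = 50
idx2 = rng.choice(idx1, n, replace=False)
ybar, xbar = y[idx2].mean(), x[idx2].mean()
print(ybar, xbar, xbar_p, zbar_p)
```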
The variance of the usual mean per unit estimator \(t_o = {\overline{y}}=\frac{1}{n}\sum^n_i{y_i}\), up to first order of approximation, is given by
\begin{equation}\label{eqn1}
V(t_o) = \theta S^2_y.
\end{equation}
The ratio and regression estimators in two-phase sampling and their mean square errors up to first order of approximation are given by
\begin{equation}\label{eqn2}
t_1 = \frac{{\overline{y}}\,{\overline{x'}}}{\overline{x}},
\end{equation}
\begin{equation}\label{eqn3}
\text{MSE}(t_1) = {\overline{Y}}^2\left[\theta C^{2}_y+\theta_2\left(C^2_x - 2C_{yx}\right)\right],
\end{equation}
\begin{equation}\label{eqn4}
t_2 = {\overline{y}} + b_{yx\left(n\right)}({\overline{x'}}-{\overline{x}}),
\end{equation}
\begin{equation}\label{eqn5}
\text{MSE}(t_2) = S^2_y\left[\theta(1-\rho^2_{yx})+\theta_1(\rho^2_{yx})\right].
\end{equation}
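A minimal sketch of computing \(t_1\) and \(t_2\) on synthetic two-phase data (all variables, sizes, and the random seed are illustrative assumptions, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
N, n_prime, n = 1000, 200, 50

# Illustrative population (not the paper's data set).
x = rng.normal(50, 10, N)
y = 1.2 * x + rng.normal(0, 6, N)

idx1 = rng.choice(N, n_prime, replace=False)   # phase 1: observe x
idx2 = rng.choice(idx1, n, replace=False)      # phase 2: observe y and x
xbar_p = x[idx1].mean()
xbar, ybar = x[idx2].mean(), y[idx2].mean()

# Two-phase ratio estimator: t1 = ybar * xbar' / xbar
t1 = ybar * xbar_p / xbar

# Two-phase regression estimator: t2 = ybar + b_yx * (xbar' - xbar),
# with b_yx the sample regression coefficient from the phase-2 sample.
b_yx = np.cov(y[idx2], x[idx2], ddof=1)[0, 1] / np.var(x[idx2], ddof=1)
t2 = ybar + b_yx * (xbar_p - xbar)
print(t1, t2)
```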
Chand [5] proposed the following chain ratio-type estimator:
\begin{equation}\label{eqn6}
t_3 = \frac{{\overline{y}}\,{\overline{x'}}}{{\overline{x}}\,{\overline{z'}}}{\overline{Z}},
\end{equation}
\begin{equation}\label{eqn7}
\text{MSE}(t_3) = {\overline{Y}}^2\left[\theta C^{2}_y+\theta_2\left(C^2_x-2C_{yx}\right)+\theta_1\left(C^2_z-2C_{yz}\right)\right].
\end{equation}
Singh and Majhi [15] formulated a chain-type exponential estimator for \(\overline{Y}\) given by
\begin{equation}\label{eqn8}
t_5 = \frac{{\overline{y}}\,{\overline{x'}}}{{\overline{x}}} \exp \left(\frac{{\overline{Z}}-{\overline{z'}}}{{\overline{Z}}+{\overline{z'}}}\right),
\end{equation}
\begin{equation}\label{eqn9}
\text{MSE}(t_5) = {\overline{Y}}^2\left[\theta C^{2}_y+\theta_2\left(C^2_x-2C_{yx}\right)+\frac{\theta_1}{4}\left(C^2_z-2C_{yz}\right)\right].
\end{equation}
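Both chain estimators can be sketched the same way, again on synthetic data, with \(\overline{Z}\) treated as known (everything below is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(2)
N, n_prime, n = 1000, 200, 50

# Illustrative population; Zbar is assumed known, Xbar is not.
z = rng.normal(50, 10, N)
x = 0.8 * z + rng.normal(0, 4, N)
y = 1.2 * x + rng.normal(0, 6, N)
Zbar = z.mean()

idx1 = rng.choice(N, n_prime, replace=False)   # phase 1: observe x and z
idx2 = rng.choice(idx1, n, replace=False)      # phase 2: observe y and x
xbar_p, zbar_p = x[idx1].mean(), z[idx1].mean()
xbar, ybar = x[idx2].mean(), y[idx2].mean()

# Chand's chain ratio estimator t3
t3 = ybar * xbar_p / xbar * Zbar / zbar_p

# Singh-Majhi chain exponential estimator t5
t5 = ybar * xbar_p / xbar * np.exp((Zbar - zbar_p) / (Zbar + zbar_p))
print(t3, t5)
```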
Khan and Al-Hossain [14] gave a difference-type estimator for the population mean in a two-phase sampling scheme using two auxiliary variables as
\begin{equation}\label{eqn10}
t_m = {\overline{y}} + k_1\left({\overline{x'}}\frac{{\overline{Z}}}{{\overline{z'}}}-{\overline{x}}\right)+ k_2\left({\overline{Z}}\frac{{\overline{x'}}}{\overline{x}}-{\overline{z}}\right),
\end{equation}
\begin{align}
\text{MSE}(t_m) &={\overline{Y}}^2\theta C^{2}_y+ k^2_1{\overline{X}}^2(\theta_1C^2_z+\theta_2C^2_x) +k^2_2{\overline{Z}}^2(\theta C^{2}_z+\theta_2C^2_x+2 \theta_2{C}_{xz})+2k_1k_2{\overline{X}}{\overline{Z}}({\theta}_2C^{2}_x+{\theta}_1C^2_z+{\theta}_2{C}_{xz})\nonumber \\
&\;\;\;-2k_1{\overline{X}}{\overline{Y}}(\theta_2{C}_{yx}+\theta_1{C}_{yz}) - 2k_2{\overline{Z}}{\overline{Y}}(\theta_2{C}_{yx}+\theta {C}_{yz}).\label{eqn11}
\end{align}
3. The proposed estimator
On the basis of Khan and Al-Hossain [14], a modified difference-type estimator for the population mean in a two-phase sampling scheme using two auxiliary variables is proposed as
\begin{equation}\label{eqn12}
t_{ae} = {\overline{y}} + k_1\left({\overline{x'}}-\frac{\overline{Z}}{{\overline{z'}}}{\overline{x}}\right)+ k_2\left({\overline{z}}-{\overline{Z}}\frac{{\overline{x'}}}{\overline{x}}\right)\,,
\end{equation}
where \(k_1\) and \(k_2\) are unknown constants.
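A direct transcription of (12) as a function, with hypothetical sample means supplied only to exercise it (in practice \(k_1\) and \(k_2\) would be set to their optimum values, derived later in this section):

```python
# The means passed below are made up purely for illustration;
# they are not taken from any data set referenced in the paper.
def t_ae(ybar, xbar, zbar, xbar_p, zbar_p, Zbar, k1, k2):
    """Proposed difference-type estimator, Eq. (12):
    t_ae = ybar + k1*(xbar' - (Zbar/zbar')*xbar) + k2*(zbar - Zbar*xbar'/xbar)."""
    return (ybar
            + k1 * (xbar_p - (Zbar / zbar_p) * xbar)
            + k2 * (zbar - Zbar * xbar_p / xbar))

print(t_ae(60.1, 49.8, 50.3, 50.0, 50.1, 50.2, k1=0.5, k2=0.3))
```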
Let
\[\begin{cases} e_o = \frac{{\overline{y}}-{\overline{Y}}}{\overline{Y}},\\ e_1 = \frac{{\overline{x}}-{\overline{X}}}{\overline{X}},\\ e'_1= \frac{{\overline{x'}}-{\overline{X}}}{\overline{X}},\\ e_2 = \frac{{\overline{z}}-{\overline{Z}}}{\overline{Z}},\\ e'_2 = \frac{{\overline{z'}}-{\overline{Z}}}{\overline{Z}},\end{cases}\]
hence
\[\begin{cases}E(e_o)= E(e_1)= E(e'_1) = E(e_2)= E(e'_2) = 0,\\
E(e^2_o) = \theta C^2_y,\ E(e^2_1) = \theta C^2_x,\\
E(e^2_2)=\theta C^2_z,\ E({e'}^2_1)=\theta_1 C^2_x,\\
E(e_1e'_1)=\theta_1 C^2_x,\ E(e_oe'_2)=\theta_1C_{yz},\\
E(e_oe_1) = \theta C_{yx},\\
E(e_oe'_1) = {\theta}_1C_{yx},\\
E(e_oe_2) = \theta C_{yz},\\
E(e_1e'_2)= E(e'_1e'_2) = E(e'_1e_2)={\theta}_1C_{xz},\\
E(e_1e_2) = \theta C_{xz},\\
E({e'}^2_2) = E(e_2e'_2) = {\theta}_1 C^2_z.\end{cases}\]
Now, the MSE(\(t_{ae}\)) is given as
\begin{align}
\text{MSE}(t_{ae})&= {\overline{Y}}^2{\theta} C^{2}_y+ k^2_1{\overline{X}}^2({\theta}_1 C^2_z+{\theta}_2C^2_x)+k^2_2{\overline{Z}}^2({\theta} C^{2}_z+{\theta}_2C^2_x+2{\theta}_2{C}_{xz})- 2 k_1k_2{\overline{X}}{\overline{Z}}({\theta}_2C^{2}_x-{\theta}_1 C^2_z+{\theta}_2 {C}_{xz})\nonumber \\
&-2 k_1{\overline{X}}{\overline{Y}}({\theta}_2{C}_{yx}-{\theta}_1 {C}_{yz}) + 2k_2{\overline{Z}}{\overline{Y}}({\theta}_2{C}_{yx}+{\theta} {C}_{yz}).\label{eqn13}
\end{align}
To find the minimum mean squared error of the estimator \(t_{ae}\), we differentiate (13) with respect to \(k_1\) and \(k_2\) and set the derivatives equal to zero, that is
\[\frac{\partial (\text{MSE}\left(t_{ae}\right))}{\partial k_1}= 0\ \ \ \ \text{and}\ \ \ \ \frac{\partial (\text{MSE}\left(t_{ae}\right))}{\partial k_2}= 0,\]
\[k_{1(opt)}=\frac{{\overline{Y}}\left(CB-DE\right)}{{\overline{X}}\left(AB-E^2\right)}\ \ \text{and}\ \ k_{2(opt)}= \frac{{\overline{Y}}\left(EC-AD\right)}{{\overline{Z}}\left(AB-E^2\right)},\]
where \[\begin{cases}A ={\theta}_1C^2_z+{\theta}_2C^2_x,\\ B =\ \theta C^{2}_z+{\theta}_2C^2_x+2 {\theta}_2{C}_{xz},\\ C ={\theta}_2 {C}_{yx}-{\theta}_1{C}_{yz},\\ D = {\theta}_2{C}_{yx}+{\theta} {C}_{yz},\\ E = {\theta}_2C^{2}_x-{\theta}_1C^2_z+{\theta }_2{C}_{xz}.\end{cases}\]
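The first-order conditions can be checked symbolically. A sketch with SymPy that minimizes the quadratic part of (13) in \(k_1\) and \(k_2\), with symbols mirroring the shorthand \(A\) to \(E\) above:

```python
import sympy as sp

k1, k2, A, B, C, D, E, Xb, Zb, Yb = sp.symbols('k1 k2 A B C D E Xb Zb Yb')

# Quadratic part of MSE(t_ae) from Eq. (13); the constant term
# theta*Cy^2*Ybar^2 does not affect the location of the minimum.
mse = (k1**2 * Xb**2 * A + k2**2 * Zb**2 * B
       - 2*k1*k2*Xb*Zb*E
       - 2*k1*Xb*Yb*C + 2*k2*Zb*Yb*D)

# Solve the first-order conditions dMSE/dk1 = dMSE/dk2 = 0.
sol = sp.solve([sp.diff(mse, k1), sp.diff(mse, k2)], [k1, k2], dict=True)[0]
print(sp.simplify(sol[k1]))   # k1 at the stationary point
print(sp.simplify(sol[k2]))   # k2 at the stationary point
```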
Substituting the optimum values of \(k_1\) and \(k_2\) into Equation (13), the minimum MSE of \(t_{ae}\) is derived as:
\[ \text{MSE}\,{(t_{ae})}_{min} = {\overline{Y}}^2\left[\theta C^{2}_y-\frac{BC^2+AD^2-2CDE}{AB-E^2}\right]\,.\]
4. Comparison of efficiency
In this section, the proposed estimator is compared with other existing estimators.
- By (1) and (13),
\[V(t_o)-\text{MSE}\,{(t_{ae})}_{min} > 0\,.\]
- By (11) and (13),
\[\text{MSE}(t_m)-\text{MSE}\,{(t_{ae})}_{min} > 0\,.\]
- By (3) and (13),
\[\text{MSE}(t_1)-\text{MSE}\,{(t_{ae})}_{min} > 0\,.\]
- By (7) and (13),
\[\text{MSE}(t_3)-\text{MSE}\,{(t_{ae})}_{min} > 0\,.\]
5. Numerical comparison
Utilizing the data set given in [14], the mean square errors (MSEs) together with the percent relative efficiencies (PREs) of the proposed and existing estimators with respect to \(t_0\) are given in Table 1.
Table 1. MSEs and PREs of the estimators with respect to \(t_0\).
\begin{tabular}{lcc}
\hline
Estimator & MSE & PRE \\
\hline
\(t_0\) & 1.7525 & 100.00 \\
\(t_1\) & 1.5032 & 116.59 \\
\(t_3\) & 1.2793 & 137.00 \\
\(t_5\) & 1.1312 & 154.92 \\
\(t_m\) & 0.8206 & 213.56 \\
\(t_{ae}\) & 0.6693 & 261.84 \\
\hline
\end{tabular}
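The PRE column can be reproduced from the MSE column as \(\mathrm{PRE}=100\cdot\mathrm{MSE}(t_0)/\mathrm{MSE}(t)\); a quick check with the tabulated values:

```python
# MSE values copied from Table 1; PRE = 100 * MSE(t0) / MSE(t).
mse = {'t0': 1.7525, 't1': 1.5032, 't3': 1.2793,
       't5': 1.1312, 'tm': 0.8206, 'tae': 0.6693}

pre = {name: round(100 * mse['t0'] / m, 2) for name, m in mse.items()}
print(pre)
# Agrees with Table 1 to within 0.01 (the tabulated PREs were
# presumably computed from unrounded MSEs).
```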
6. Conclusion
Table 1 shows that the proposed estimator has a smaller mean squared error and a higher percent relative efficiency than the other existing estimators. Hence, the proposed estimator is efficient and recommended for practical use in difference-type estimation.
Author Contributions
All authors contributed equally to the writing of this paper. All authors read and approved the final manuscript.
Conflict of Interests
The authors declare no conflict of interest.