การประมาณค่ารวมประชากรจากตัวอย่างสุ่มอย่างง่าย ที่มีบางหน่วยมีค่าสูงมากโดยใช้ตัวประมาณความถดถอย / อโนทัย ตรีวานิช = Estimation of population total from simple random samples containing some very large units using regression estimator / Anotai Trevanich
This thesis considers the estimation of the population total from a simple random sample containing some very large units that are actually present in the population. The objective is to investigate the properties of the proposed estimators of the population total (Ŷ[subscript k] ; k = 1, 2, 3), which are designed to increase efficiency by changing the weight given to very large observations and by using an auxiliary variate X that is linearly related to Y. These estimators are compared, in terms of relative efficiency, with the simple mean estimator Ŷ[subscript o] = Nȳ and with the estimators suggested by Michael & Kadaba (Ŷ[subscript mkt] ; t = 1, 2, 3, 4). The data were obtained by Monte Carlo simulation. A computer program was written to calculate the relative efficiency of Ŷ[subscript k] or Ŷ[subscript mkt] to Ŷ[subscript o] for Lognormal and Gamma distributions, with the correlation coefficient between Y and X set to 0.1, 0.3, 0.5, 0.7 and -0.1, -0.3, -0.5, -0.7. Populations of sizes 500 and 1000 were used; the percentages of very large observations were 1.8%, 2.8% and 3.2% in the population of size 500 and 1.8%, 2.8% and 3.3% in the population of size 1000. The sample sizes considered were 50, 100 and 200, and the number of very large observations in the sample ranged from 2 up to the number of very large observations in the population. Each experimental condition was repeated 100 times for population size 500 and 50 times for population size 1000. The results, classified by form of inference, can be summarized as follows. For conditional inference, when the number of very large observations in the population is known, the relative efficiency of Ŷ[subscript 1] to Ŷ[subscript o] is the highest, and it decreases as the percentage of very large observations in the population increases.
If Y is linearly related to X, or as the percentage of very large units in the sample increases, Ŷ[subscript 1] is clearly more efficient than Ŷ[subscript mk4].
In particular, when the correlation coefficient between Y and X is 0.7 or -0.7, the relative efficiency of Ŷ[subscript 1] to Ŷ[subscript o] is about twice that of Ŷ[subscript mk4] to Ŷ[subscript o]. When the number of very large observations in the population is unknown, the relative efficiencies of Ŷ[subscript 3], Ŷ[subscript mk1] and Ŷ[subscript mk3] to Ŷ[subscript o] decrease as the percentage of very large observations in the population increases. However, the relative efficiency of Ŷ[subscript 2] to Ŷ[subscript o] increases, and that of Ŷ[subscript mk2] to Ŷ[subscript o] decreases, when the percentage of very large units in the sample is small. When the linear relationship between Y and X is strong or the percentage of very large units in the sample is large, the relative efficiency of Ŷ[subscript 3] to Ŷ[subscript o] is the highest. For unconditional inference, when the number of very large observations in the population is known, the relative efficiencies of the estimators to Ŷ[subscript o] are the same as in the case of conditional inference. When the number of very large observations in the population increases, the relative efficiency of Ŷ[subscript 1] to Ŷ[subscript o] is the highest. However, as the percentage of very large observations in the population increases, this relative efficiency decreases for the Lognormal distribution but increases for the Gamma distribution. In particular, as the linear relationship between Y and X becomes stronger, the relative efficiency of Ŷ[subscript 1] to Ŷ[subscript o] is clearly higher than that of Ŷ[subscript mk4]. When the number of very large observations in the population is unknown, as in conditional inference, the relative efficiencies of Ŷ[subscript i] ; i = 2, 3, and Ŷ[subscript mkj] ; j = 1, 2, 3, to Ŷ[subscript o] decrease as the number of very large observations in the population increases.
When the linear relationship between Y and X is weak, the relative efficiency of Ŷ[subscript mk2] to Ŷ[subscript o] is the highest. However, when the linear relationship between Y and X is strong, the relative efficiency of Ŷ[subscript 2] to Ŷ[subscript o] is the highest.
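
The kind of Monte Carlo comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis program: the abstract does not define the estimators Ŷ[subscript k] or Ŷ[subscript mkt], so the sketch substitutes the standard textbook linear regression estimator of a total, N[ȳ + b(X̄ - x̄)], and compares it with Nȳ. The function name, the way X is generated from Y, the lognormal parameters and the use of a mean-squared-error ratio as the relative efficiency are all assumptions made for illustration.

```python
import numpy as np


def simulate_relative_efficiency(N=500, n=50, rho=0.7, reps=100, seed=0):
    """Monte Carlo relative efficiency of a regression estimator to N * ybar.

    Hypothetical setup (not taken from the thesis): Y is lognormal, and X is
    built so that its population correlation with Y is approximately rho.
    """
    rng = np.random.default_rng(seed)

    # Assumed skewed population: lognormal Y produces a few very large units.
    y = rng.lognormal(mean=0.0, sigma=1.0, size=N)

    # Assumed auxiliary variate: standardized Y mixed with independent noise,
    # which gives corr(X, Y) close to rho (also works for negative rho).
    y_std = (y - y.mean()) / y.std()
    x = rho * y_std + np.sqrt(1.0 - rho**2) * rng.standard_normal(N)

    true_total = y.sum()
    x_pop_mean = x.mean()

    sq_err_mean = 0.0  # accumulated squared error of N * ybar
    sq_err_reg = 0.0   # accumulated squared error of the regression estimator

    for _ in range(reps):
        # Simple random sample without replacement.
        idx = rng.choice(N, size=n, replace=False)
        ys, xs = y[idx], x[idx]

        # Simple mean (expansion) estimator of the total.
        total_mean = N * ys.mean()

        # Textbook regression estimator with the sample least-squares slope.
        b = np.cov(xs, ys)[0, 1] / np.var(xs, ddof=1)
        total_reg = N * (ys.mean() + b * (x_pop_mean - xs.mean()))

        sq_err_mean += (total_mean - true_total) ** 2
        sq_err_reg += (total_reg - true_total) ** 2

    # Relative efficiency taken here as a ratio of Monte Carlo MSEs;
    # values above 1 favor the regression estimator.
    return float(sq_err_mean / sq_err_reg)


if __name__ == "__main__":
    for rho in (0.1, 0.3, 0.5, 0.7):
        print(rho, simulate_relative_efficiency(rho=rho, reps=100))
```

Under these assumptions, a ratio above 1 means the regression estimator had the smaller Monte Carlo mean squared error. The replication counts in the abstract (100 and 50) are small, so individual ratios from a sketch like this can vary noticeably with the seed.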