Estimation of population total from simple random samples containing some very large units using regression estimator / Anotai Trevanich

Author	Anotai Trevanich
Title	Estimation of population total from simple random samples containing some very large units using regression estimator / Anotai Trevanich
Imprint	2529 B.E. (1986)
Connect to	http://cuir.car.chula.ac.th/handle/123456789/34766
Descript	[ก-ล], 251 leaves : tables

SUMMARY

The problem considered in this thesis is the estimation of a population total from simple random samples drawn without replacement that contain some very large units, where these very large values genuinely occur in the population. The study examines the properties of the population-total estimators proposed by the author ($\hat{Y}_k$; k = 1, 2, 3). These estimators adjust the weights given to the group of very large observations of Y. They also use an auxiliary variable X, linearly related to Y, to increase estimation efficiency. They are compared with the population-total estimators proposed by Michael and Kadaba ($\hat{Y}_{mkt}$; t = 1, 2, 3, 4) and with the mean-based estimator $\hat{Y}_o = N\bar{y}$. The comparison uses relative efficiency, computed from data simulated on a computer with the Monte Carlo technique, assuming that Y follows a lognormal or a gamma distribution. The correlation coefficient between Y and X was set to 0.1, 0.3, 0.5, 0.7 and -0.1, -0.3, -0.5, -0.7. Two population sizes, 500 and 1000, were used. Very large observations made up 1.8%, 2.8% and 3.2% of the population of size 500, and 1.8%, 2.8% and 3.3% of the population of size 1000. Sample sizes were 50, 100 and 200. The number of very large units in the sample ranged from 2 up to the number of very large observations in the population. Each situation was replicated 100 times for the population of size 500 and 50 times for the population of size 1000.

Under conditional inference, when the number of very large observations in the population is known, the relative efficiency of $\hat{Y}_1$ to $\hat{Y}_o$ decreases as the percentage of very large observations in the population increases. Even so, it always remains higher than the relative efficiency of $\hat{Y}_{mk4}$ to $\hat{Y}_o$. If Y is linearly related to X, or if the percentage of very large units in the sample increases, the relative efficiency of $\hat{Y}_1$ to $\hat{Y}_o$ is clearly higher than that of $\hat{Y}_{mk4}$. It reaches about twice that of $\hat{Y}_{mk4}$ when the correlation coefficient between Y and X is 0.7 or -0.7. When the number of very large observations in the population is unknown, an increase in their percentage in the population lowers the relative efficiency of $\hat{Y}_3$, $\hat{Y}_{mk1}$ or $\hat{Y}_{mk3}$ to $\hat{Y}_o$. The same increase raises the relative efficiency of $\hat{Y}_2$ to $\hat{Y}_o$. The relative efficiency of $\hat{Y}_{mk2}$ to $\hat{Y}_o$ decreases when only a few very large units appear in the sample. If Y and X are linearly related, or if the number of very large units in the sample increases, $\hat{Y}_3$ has the highest relative efficiency to $\hat{Y}_o$.

Under unconditional inference, when the number of very large observations in the population is known, $\hat{Y}_1$ still has a higher relative efficiency to $\hat{Y}_o$ than $\hat{Y}_{mk4}$. As the number of very large observations in the population increases, this relative efficiency decreases under the lognormal distribution but increases under the gamma distribution. As the linear relationship between Y and X becomes stronger, it is clearly higher than that of $\hat{Y}_{mk4}$. When the number of very large observations in the population is unknown, the results resemble those for conditional inference: the relative efficiency of $\hat{Y}_k$ (k = 2, 3) or $\hat{Y}_{mkt}$ (t = 1, 2, 3) to $\hat{Y}_o$ decreases as the percentage of very large observations in the population increases. When the linear relationship between Y and X is weak, $\hat{Y}_{mk2}$ has the highest relative efficiency to $\hat{Y}_o$. When the relationship is stronger, $\hat{Y}_2$ has the highest.
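The comparison above rests on relative efficiency estimated by Monte Carlo simulation. The Python sketch below illustrates the idea for two estimators under simple random sampling without replacement:

- the mean-based estimator $\hat{Y}_o = N\bar{y}$;
- a textbook linear regression estimator of the total, $N[\bar{y} + b(\bar{X} - \bar{x})]$.

It is not the thesis's $\hat{Y}_k$, whose reweighting of very large observations is not specified in this record. Several choices here are assumptions, not the author's method:

- the population model (lognormal Y correlated with X on the log scale);
- all parameter values;
- the definition of relative efficiency as MSE($\hat{Y}_o$) / MSE(estimator);
- all function names, which are hypothetical.

Each replicate is an ordinary SRSWOR draw, so the number of very large units in the sample varies between replicates, as in an unconditional setting.

```python
import numpy as np

# Illustrative sketch only: names, parameters and the population model are assumptions.

def make_population(N, rho, rng):
    """Hypothetical population: X normal, log(Y) normal with corr(X, log Y) = rho.

    Y is lognormal (right-skewed); rho is a log-scale correlation,
    not the Y-X correlation coefficient used in the thesis.
    """
    z1 = rng.standard_normal(N)
    z2 = rng.standard_normal(N)
    x = 50.0 + 10.0 * z1
    y = np.exp(3.0 + rho * z1 + np.sqrt(1.0 - rho**2) * z2)
    return x, y

def mean_estimator(N, y_s):
    """Y_hat_o = N * ybar."""
    return N * y_s.mean()

def regression_estimator(N, X_bar, x_s, y_s):
    """Textbook SRS regression estimator of a total: N * (ybar + b * (X_bar - xbar))."""
    b = np.cov(x_s, y_s, ddof=1)[0, 1] / np.var(x_s, ddof=1)  # sample least-squares slope
    return N * (y_s.mean() + b * (X_bar - x_s.mean()))

def relative_efficiency(N=500, n=50, rho=0.7, reps=2000, seed=1):
    """Monte Carlo RE = MSE(Y_hat_o) / MSE(regression); values > 1 favour regression."""
    rng = np.random.default_rng(seed)
    x, y = make_population(N, rho, rng)
    Y, X_bar = y.sum(), x.mean()
    sq_o, sq_r = [], []
    for _ in range(reps):
        s = rng.choice(N, size=n, replace=False)  # simple random sample without replacement
        sq_o.append((mean_estimator(N, y[s]) - Y) ** 2)
        sq_r.append((regression_estimator(N, X_bar, x[s], y[s]) - Y) ** 2)
    return np.mean(sq_o) / np.mean(sq_r)

if __name__ == "__main__":
    print(f"RE(regression vs N*ybar) ~ {relative_efficiency():.2f}")
```

If Y were exactly linear in X, the regression estimator would return the true total from any sample. This is a convenient sanity check before running the simulation.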
The problem considered in this thesis is the estimation of the population total from simple random samples containing some very large units that are actually present in the population. The objective of this study is to investigate the properties of the suggested estimators of the population total ($\hat{Y}_k$; k = 1, 2, 3). These are designed to increase efficiency by changing the weights of very large observations and by using an auxiliary variate X that is linearly related to Y. They are then compared, by relative efficiency, with the simple mean estimator $\hat{Y}_o = N\bar{y}$ and with the estimators suggested by Michael & Kadaba ($\hat{Y}_{mkt}$; t = 1, 2, 3, 4). The data were obtained through Monte Carlo simulation. A computer program was designed to calculate the relative efficiency of $\hat{Y}_k$ or $\hat{Y}_{mkt}$ to $\hat{Y}_o$ for Lognormal and Gamma distributions, with the correlation coefficient between Y and X being 0.1, 0.3, 0.5, 0.7 and -0.1, -0.3, -0.5, -0.7. Populations of size 500 and 1000 were used. Very large observations made up 1.8%, 2.8% and 3.2% of the population of size 500, and 1.8%, 2.8% and 3.3% of the population of size 1000. The sample sizes considered were 50, 100 and 200. The number of very large observations in the sample ranged from 2 to the number of very large observations in the population. Each situation was replicated 100 times for population size 500 and 50 times for population size 1000. The results, classified by form of inference, can be summarized as follows. For conditional inference, when the number of very large observations in the population is known, the relative efficiency of $\hat{Y}_1$ to $\hat{Y}_o$ is the highest, although it decreases as the percent of very large observations in the population increases.
If Y is linearly related to X, or the percent of very large units in the sample increases, the relative efficiency of $\hat{Y}_1$ is clearly higher than that of $\hat{Y}_{mk4}$.
This holds particularly when the correlation coefficient between Y and X is 0.7 or -0.7, where the relative efficiency of $\hat{Y}_1$ to $\hat{Y}_o$ is about twice the relative efficiency of $\hat{Y}_{mk4}$ to $\hat{Y}_o$. When the number of very large observations in the population is unknown, the relative efficiency of $\hat{Y}_3$, $\hat{Y}_{mk1}$ or $\hat{Y}_{mk3}$ to $\hat{Y}_o$ decreases as the percent of very large observations in the population increases. Over the same range, the relative efficiency of $\hat{Y}_2$ to $\hat{Y}_o$ increases. The relative efficiency of $\hat{Y}_{mk2}$ to $\hat{Y}_o$ decreases when the percent of very large units in the sample is small. However, when the linear relationship between Y and X is strong or the percent of very large units in the sample is large, the relative efficiency of $\hat{Y}_3$ to $\hat{Y}_o$ is the highest. For unconditional inference, when the number of very large observations in the population is known, the relative efficiencies of the estimators to $\hat{Y}_o$ follow the same pattern as in conditional inference, and the relative efficiency of $\hat{Y}_1$ to $\hat{Y}_o$ remains the highest. However, as the percent of very large observations in the population increases, this relative efficiency decreases for the Lognormal distribution but increases for the Gamma distribution. As the linear relationship between Y and X becomes stronger, the relative efficiency of $\hat{Y}_1$ to $\hat{Y}_o$ is clearly higher than that of $\hat{Y}_{mk4}$. When the number of very large observations in the population is unknown, the pattern matches conditional inference: the relative efficiency of $\hat{Y}_k$ (k = 2, 3) or $\hat{Y}_{mkt}$ (t = 1, 2, 3) to $\hat{Y}_o$ decreases as the number of very large observations in the population increases.
When the linear relationship between Y and X is weak, the relative efficiency of $\hat{Y}_{mk2}$ to $\hat{Y}_o$ is the highest. However, when the linear relationship between Y and X is strong, the relative efficiency of $\hat{Y}_2$ to $\hat{Y}_o$ is the highest.
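Conditional inference, as described in the summary, fixes the number m of very large units that appear in the sample. The Python sketch below shows one way to simulate this. It compares $N\bar{y}$ with a simple reweighting that uses a known population count M of very large units: the post-stratified estimator $M\bar{y}_L + (N - M)\bar{y}_S$.

This post-stratified estimator is a stand-in chosen for illustration. It is not the thesis's $\hat{Y}_1$ or Michael and Kadaba's $\hat{Y}_{mk4}$. The following are hypothetical:

- the rule that the round(pct·N) largest values count as "very large";
- the lognormal population;
- the parameter values;
- all function names.

```python
import numpy as np

# Illustrative sketch only: the 'very large' rule, population and names are hypothetical.

def split_large(y, pct):
    """Flag the round(pct * N) largest values as 'very large'; return (mask, M)."""
    N = len(y)
    M = int(round(pct * N))
    large = np.zeros(N, dtype=bool)
    large[np.argsort(y)[::-1][:M]] = True
    return large, M

def conditional_sample(large, n, m, rng):
    """SRSWOR of size n conditioned on holding exactly m very large units.

    Conditioning an SRSWOR sample on m leaves it uniform over such subsets, which
    equals drawing m large and n - m other units, each without replacement.
    """
    big, rest = np.flatnonzero(large), np.flatnonzero(~large)
    if not (1 <= m <= len(big) and 1 <= n - m <= len(rest)):
        raise ValueError("need 1 <= m <= M and 1 <= n - m <= N - M")
    return np.concatenate([rng.choice(big, size=m, replace=False),
                           rng.choice(rest, size=n - m, replace=False)])

def poststratified_total(y_s, large_s, N, M):
    """Reweighted total: sampled large units stand for M units, the others for N - M."""
    return M * y_s[large_s].mean() + (N - M) * y_s[~large_s].mean()

def conditional_re(N=500, pct=0.028, n=50, m=2, reps=2000, seed=2):
    """Monte Carlo RE = MSE(N * ybar) / MSE(post-stratified), both conditional on m."""
    rng = np.random.default_rng(seed)
    y = rng.lognormal(mean=3.0, sigma=1.0, size=N)  # hypothetical skewed population
    large, M = split_large(y, pct)
    Y = y.sum()
    sq_o, sq_ps = [], []
    for _ in range(reps):
        s = conditional_sample(large, n, m, rng)
        sq_o.append((N * y[s].mean() - Y) ** 2)
        sq_ps.append((poststratified_total(y[s], large[s], N, M) - Y) ** 2)
    return np.mean(sq_o) / np.mean(sq_ps)

if __name__ == "__main__":
    for m in (2, 4, 8):
        print(m, round(conditional_re(m=m), 2))
```

The reweighted estimator needs M, so it belongs to the known-count case. It is undefined when no very large unit is sampled, which is why the sketch requires m ≥ 1. Replacing the conditional draw with a plain SRSWOR draw would let m vary between replicates, giving the unconditional counterpart.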

SUBJECT

Estimation theory (Statistics)
Monte Carlo method
Vital statistics
Population -- Estimates
Demography