การเปรียบเทียบวิธีการตรวจสอบค่าผิดปกติในการวิเคราะห์การถดถอยเชิงเส้น / วศิรินทร์ วารีเศวตสุวรรณ = A comparison on detecting outlier methods in linear regression analysis / Wasirin Wareesawedsuwan
This study compares the performance of outlier detection methods in linear regression analysis when outliers occur in the independent variable. The methods compared are the Kianifard and Swallow methods (Sequential Recursive Method, SRM, and Modified Recursive Method, MRM), the S. R. Paul and Karen Y. Fung method (PK), and the Daniel Peña and Victor Yohai method (PY). The comparison was carried out under the following conditions. The random errors follow a normal distribution (when no outliers are present) or a contaminated normal distribution (when outliers are present). The outliers in the dependent variable are at small, medium and large levels, corresponding to contamination proportions of 0.05, 0.10 and 0.15. The number of independent variables is 1 or 3. The sample sizes are 20, 30, 40, 50, 60, 80 and 100. The significance levels are 0.01, 0.05 and 0.10. The data were generated by Monte Carlo simulation, with 500 replications under each condition. The methods were compared on the following measures: the probability of correct detection when the data contain no outliers (P1), the probability of incorrect detection when the data contain no outliers (P2), the probability of correct detection when the data contain outliers (P3), the probability of incorrect detection when the data contain outliers (P4), and the percentage of total correct detection (TP%), which is calculated from P1, P2, P3 and P4.

The results, stated in terms of TP%, fall into two cases. 1) The random errors follow a location-contaminated normal distribution. When the contamination proportion is small and the sample size is 20, the MRM method has the highest TP% for every number of independent variables and every significance level, followed by the SRM, PK and PY methods, in that order.
For larger sample sizes, the SRM method has the highest TP%, followed by the PK, PY and MRM methods, in that order. When the contamination proportion is medium or large, the PY method has the highest TP% for all sample sizes, numbers of independent variables and significance levels, followed by the SRM, PK and MRM methods, in that order.

2) The random errors follow a scale-contaminated normal distribution. When the contamination proportion is small, the results are the same as for the location-contaminated normal distribution. When the contamination proportion is medium or large, the SRM method has the highest TP% for all sample sizes, numbers of independent variables and significance levels, followed by the PK, PY and MRM methods, in that order.
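
The simulation design summarized in the abstract can be illustrated with a minimal Python sketch. It is not the thesis code, and several choices are assumptions made only for illustration: the detector is a plain studentized-residual cutoff standing in for SRM, MRM, PK and PY (whose procedures the abstract does not describe), the shift size, regression coefficients and cutoff are arbitrary, and "correct detection" is taken to mean that no point is flagged in clean data and that exactly the contaminated points are flagged in contaminated data. The function names are hypothetical. TP% is not computed because the abstract does not give its formula.

```python
import numpy as np


def simulate_errors(n, contamination, shift, rng, sigma=1.0):
    """Draw location-contaminated normal errors.

    Each error is N(0, sigma^2); with probability `contamination`
    it is shifted by `shift` (the shift size is an assumption).
    Returns the errors and a boolean mask of the contaminated points.
    """
    is_outlier = rng.random(n) < contamination
    errors = rng.normal(0.0, sigma, size=n)
    errors[is_outlier] += shift
    return errors, is_outlier


def flag_outliers(X, y, cutoff=2.5):
    """Stand-in detector (NOT SRM, MRM, PK or PY).

    Flags observations whose internally studentized OLS residual
    exceeds `cutoff` in absolute value.
    """
    n = X.shape[0]
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    # Leverages are the diagonal of the hat matrix, computed via QR.
    q, _ = np.linalg.qr(design)
    leverage = np.sum(q ** 2, axis=1)
    s2 = resid @ resid / (n - design.shape[1])
    studentized = resid / np.sqrt(s2 * (1.0 - leverage))
    return np.abs(studentized) > cutoff


def estimate_rates(n, k, contamination, shift, reps=500, seed=0):
    """Monte Carlo estimates of correct-detection rates.

    Returns (rate for clean data, rate for contaminated data) under
    the illustrative definitions given in the lead-in.
    """
    rng = np.random.default_rng(seed)
    correct_clean = 0
    correct_contaminated = 0
    for _ in range(reps):
        X = rng.normal(size=(n, k))
        signal = 1.0 + X.sum(axis=1)  # all coefficients set to 1 (assumption)

        clean_errors, _ = simulate_errors(n, 0.0, shift, rng)
        if not flag_outliers(X, signal + clean_errors).any():
            correct_clean += 1

        errors, truth = simulate_errors(n, contamination, shift, rng)
        if np.array_equal(flag_outliers(X, signal + errors), truth):
            correct_contaminated += 1
    return correct_clean / reps, correct_contaminated / reps


# One condition from the abstract's grid: n = 20, one independent
# variable, contamination proportion 0.05, 500 replications.
p_clean, p_contaminated = estimate_rates(n=20, k=1, contamination=0.05, shift=5.0)
print(f"clean data: {p_clean:.3f}, contaminated data: {p_contaminated:.3f}")
```

Looping `estimate_rates` over the sample sizes, contamination proportions and numbers of independent variables listed in the abstract would reproduce the shape of the experimental grid, though not the thesis results, since the detector here is only a placeholder.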