22s:152 Applied Linear Regression
Chapter 8: ANOVA
----------------------------------------
NOTE: We will meet in the lab on Monday October 13.

• One-way ANOVA
  – Focuses on testing for differences among group means.
  – Take random samples from each of $m$ populations.
  – $n_i$ is the sample size in the $i$th population for $i = 1, \ldots, m$.
  – $y_{ij}$ is the $j$th observation in the $i$th population.
  – There are a couple of commonly used models for a one-way ANOVA with $m$ groups.

----------------------------------------
– The cell means model:

  $$Y_{ij} = \mu_i + \epsilon_{ij} \quad \text{with} \quad \epsilon_{ij} \overset{iid}{\sim} N(0, \sigma^2)$$
  $$i = 1, 2, \ldots, m \qquad j = 1, 2, \ldots, n_i$$

  So, $E[Y_{1j}] = \mu_1$, and all observations from group 1 have the same mean, $\mu_1$. The mean of group $i$ is $\mu_i$.

  The mean parameters to be estimated are: $\mu_1, \mu_2, \ldots, \mu_m$
  There is 1 noise parameter to estimate: $\sigma^2$

  Estimators:

  $$\hat{\mu}_i = \bar{Y}_{i\cdot} = \frac{\sum_{j=1}^{n_i} Y_{ij}}{n_i}$$

  The estimated $\hat{\mu}_i$ for a group is just the sample group mean.

  $\sigma^2$ is estimated using a pooled estimate because constant variance is assumed.

  $$\hat{\sigma}^2 = s_P^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_m - 1)s_m^2}{N - m}$$

  where $s_i^2$ is the sample variance in the $i$th group and $N = n_1 + n_2 + \cdots + n_m$ is the total sample size.

  Pooled estimate of $\sigma$: $s_P = \sqrt{s_P^2}$

– The effects model:

  $$Y_{ij} = \mu + \alpha_i + \epsilon_{ij} \quad \text{with} \quad \epsilon_{ij} \overset{iid}{\sim} N(0, \sigma^2)$$
  $$i = 1, 2, \ldots, m \qquad j = 1, 2, \ldots, n_i$$

  So, $E[Y_{1j}] = \mu + \alpha_1$, and all observations from group 1 have the same mean, $\mu + \alpha_1$.

  In this model, there are $m$ groups (estimated means), and we're using $m + 1$ parameters to define the mean structure. This is an 'over-parameterization'.

  Different sets of parameter values $(\mu, \alpha_1, \ldots, \alpha_m)$ can give the same fitted values (i.e. can give the same estimated group means).

  For example, suppose $m = 3$, and $\bar{Y}_{1\cdot} = 10$, $\bar{Y}_{2\cdot} = 20$, and $\bar{Y}_{3\cdot} = 30$.

  In the over-parameterized effects model,

  $$\hat{Y}_{ij} = \hat{\mu} + \hat{\alpha}_i \quad \text{for } i = 1, 2, 3$$

  many different combinations of $(\hat{\mu}, \hat{\alpha}_1, \hat{\alpha}_2, \hat{\alpha}_3)$ estimates will give these same estimated group means of (10, 20, 30), for example:
  $\hat{\mu}$    $\hat{\alpha}_1$    $\hat{\alpha}_2$    $\hat{\alpha}_3$
    0     10     20     30
  -10     20     30     40
   20    -10      0     10

  This means we have to use a constraint or restriction to make the parameters in the model 'identifiable' (uniquely determined).

– The effects model: $Y_{ij} = \mu + \alpha_i + \epsilon_{ij}$

  The $\alpha_m = 0$ constraint:

  Set the last group parameter to zero. (Essentially, delete the parameter for the last category.)

  Under this constraint, the last group, group $m$, is seen as the baseline group:

  $$\alpha_m = 0, \quad \text{so} \quad E[Y_{mj}] = \mu + \alpha_m = \mu$$

  $\mu$ represents the mean of the $m$th group under this constraint.

  $\alpha_i$ is the distance of group $i$ from group $m$. (The $\alpha_i$'s give distance from the baseline group.)

  This may or may not be a useful interpretation for your situation.

  Dummy Regressor Coding for the $\alpha_m = 0$ constraint with $m = 3$:

  Category   D1   D2
  group 1     1    0
  group 2     0    1
  group 3     0    0

  Regression Model:

  $$Y_i = \mu + \alpha_1 D_{1i} + \alpha_2 D_{2i} + \epsilon_i$$

  Model by group:
  Group 1: $Y_i = \mu + \alpha_1 + \epsilon_i$
  Group 2: $Y_i = \mu + \alpha_2 + \epsilon_i$
  Group 3: $Y_i = \mu + \epsilon_i$

  This is what we've been using so far with our dummy regressor coding (we've had a baseline group).

– The effects model: $Y_{ij} = \mu + \alpha_i + \epsilon_{ij}$

  There is another often used constraint that produces easily interpretable parameters.

  The sum-to-zero constraint:

  $$\sum_{i=1}^{m} \alpha_i = 0 \;\Rightarrow\; \alpha_m = -\underbrace{(\alpha_1 + \alpha_2 + \cdots + \alpha_{m-1})}_{m-1 \text{ dummy variables needed}}$$

  $\mu$ is seen as the grand mean, or the average of the population means (nice interpretation).

  If you have balanced data:
  $\hat{\mu} = \bar{Y}$, the overall mean of the sample

  If you have unbalanced data:
  $\hat{\mu} = \dfrac{\sum_{i=1}^{m} \bar{Y}_{i\cdot}}{m}$, the mean of the sample means
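The two constraints can be compared numerically. The sketch below is not from the course materials: it uses NumPy and a small made-up balanced data set (two observations per group, chosen so the group means are 10, 20, 30) and fits the effects model by least squares under each constraint. The names `fit`, `D`, and `S` are illustrative, and the -1 rows used for the sum-to-zero coding are the usual deviation coding, assumed here because the notes only state that m-1 dummy variables are needed.

```python
import numpy as np

# Hypothetical balanced data: 3 groups, 2 observations each,
# chosen so that the sample group means are 10, 20, 30.
y = np.array([9.0, 11.0, 19.0, 21.0, 29.0, 31.0])
group = np.array([0, 0, 1, 1, 2, 2])  # 0-based group labels
m = 3

def fit(X, y):
    """Ordinary least squares coefficients for design matrix X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

ones = np.ones(len(y))

# alpha_m = 0 constraint: D_k is 1 for observations in group k, else 0;
# the last group has all dummies equal to 0 (baseline group).
D = np.column_stack([(group == k).astype(float) for k in range(m - 1)])
baseline = fit(np.column_stack([ones, D]), y)  # [mu, alpha_1, alpha_2]

# Sum-to-zero constraint (assumed deviation coding): same columns as D,
# except rows belonging to the last group are set to -1.
S = D.copy()
S[group == m - 1, :] = -1.0
sum_zero = fit(np.column_stack([ones, S]), y)  # [mu, alpha_1, alpha_2]
alpha_last = -sum_zero[1:].sum()               # alpha_3 = -(alpha_1 + alpha_2)

# Pooled variance estimate from the cell means model.
n = np.array([np.sum(group == k) for k in range(m)])
s2 = np.array([y[group == k].var(ddof=1) for k in range(m)])
pooled_var = np.sum((n - 1) * s2) / (len(y) - m)

print("baseline coding:   ", baseline)
print("sum-to-zero coding:", sum_zero, "alpha_3 =", alpha_last)
print("pooled variance:   ", pooled_var)
```

Under the baseline coding the intercept is the mean of group 3 and the other coefficients are distances from it; under the sum-to-zero coding the intercept is the mean of the group means. Both codings give the same fitted group means, which is the sense in which the constraint only changes how the parameters are read.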