Preface
Linear models is a broad and diversified subject area. Because the subject area is so vast, no attempt was made in this text to cover all possible linear models topics. Rather, the objective of this book is to cover a series of introductory topics that give a student a solid foundation in the study of linear models. The text is intended for graduate students who are interested in linear statistical modeling. It has been my experience that students in this group enter a linear models course with some exposure to mathematical statistics, linear algebra, normal distribution theory, linear regression, and design of experiments. The attempt here is to build on that experience and to develop these subject areas within the linear models framework. The early chapters of the text concentrate on the linear algebra and normal distribution theory needed for a linear models study. Examples of experiments with complete, balanced designs are introduced early in the text to give the student a familiar foundation on which to build. Chapter 4 of the text concentrates entirely on complete, balanced models. This early dedication to complete, balanced models is intentional. It has been my experience that students are generally more comfortable learning structured material. Therefore, the structured rules that apply to complete, balanced designs give the student a set of learnable tools on which to
build confidence. Later chapters of the text then expand the discussion to more complicated incomplete, unbalanced and mixed models. The same tools learned for the balanced, complete models are simply expanded to apply to these more complicated cases. The hope is that the text progresses in an orderly manner with one topic building on the next. I thank all the people who contributed to this text. First, I thank Virgil Anderson for introducing me to the wonders of statistics. Special thanks go to Julie Sawyer, Laura Coombs, and David Weeks. Julie and Laura helped edit the text. Julie also contributed heavily to the development of Chapters 4 and 8. David has generally served as a sounding board during the writing process. He has listened to my ideas and contributed many of his own. Finally, and most importantly, I thank my wife, Diane, for her generosity and support.
1 Linear Algebra and Related Introductory Topics
A summary of relevant linear algebra concepts is presented in this chapter. Throughout the text boldfaced letters such as A, U, T, X, Y, t, g, u are used to represent matrices and vectors, italicized capital letters such as Y, U, T, E, F are used to represent random variables, and lowercase italicized letters such as r, s, t, n, c are used as constants.
1.1 ELEMENTARY MATRIX CONCEPTS
The following list of definitions provides a brief summary of some useful matrix operations.
Definition 1.1.1 Matrix: An r x s matrix A is a rectangular array of elements with r rows and s columns. An r x 1 vector Y is a matrix with r rows and 1 column. Matrix elements are restricted to real numbers throughout the text.

Definition 1.1.2 Transpose: If A is an n x s matrix, then the transpose of A, denoted by A', is an s x n matrix formed by interchanging the rows and columns of A.
Definition 1.1.3 Identity Matrix, Matrix of Ones and Zeros: In represents an n x n identity matrix, Jn is an n x n matrix of ones, 1n is an n x 1 vector of ones, and 0(m x n) is an m x n matrix of zeros.

Definition 1.1.4 Multiplication of Matrices: Let aij represent the ij th element of an r x s matrix A with i = 1, ..., r rows and j = 1, ..., s columns. Likewise, let bjk represent the jk th element of an s x t matrix B with j = 1, ..., s rows and k = 1, ..., t columns. The matrix multiplication of A and B is represented by AB = C where C is an r x t matrix whose ik th element cik = Σ(j=1 to s) aij bjk. If the r x s matrix A is multiplied by a scalar d, then the resulting r x s matrix dA has ij th element d aij.
Example 1.1.1 The following matrix multiplications commonly occur.

    1'n 1n = n
    1n 1'n = Jn
    Jn Jn = n Jn
    1'n (In - (1/n)Jn) = 0(1 x n)
    Jn (In - (1/n)Jn) = 0(n x n)
    (In - (1/n)Jn)(In - (1/n)Jn) = In - (1/n)Jn.

Definition 1.1.5 Addition of Matrices: The sum of two r x s matrices A and B is represented by A + B = C where C is the r x s matrix whose ij th element cij = aij + bij.

Definition 1.1.6 Inverse of a Matrix: An n x n matrix A has an inverse if A A^-1 = A^-1 A = In where the n x n inverse matrix is denoted by A^-1.

Definition 1.1.7 Singularity: If an n x n matrix A has an inverse then A is a nonsingular matrix. If A does not have an inverse then A is a singular matrix.
Definition 1.1.8 Diagonal Matrix: Let aii be the i th diagonal element of an n x n matrix A. Let aij be the ij th off-diagonal element of A for i ≠ j. Then A is a diagonal matrix if all the off-diagonal elements aij equal zero.

Definition 1.1.9 Trace of a Square Matrix: The trace of an n x n matrix A, denoted by tr(A), is the sum of the diagonal elements of A. That is, tr(A) = Σ(i=1 to n) aii.
It is assumed that the reader is familiar with the definition of the determinant of a square matrix. Therefore, a rigorous definition is omitted; the next definition simply establishes the notation used for a determinant.
Definition 1.1.10 Determinant of a Square Matrix: Let det(A) = |A| denote the determinant of an n x n matrix A. Note det(A) = 0 if A is singular.

Definition 1.1.11 Symmetric Matrix: An n x n matrix A is symmetric if A = A'.

Definition 1.1.12 Linear Dependence and the Rank of a Matrix: Let A be an n x s matrix (s ≤ n) where a1, ..., as represent the s n x 1 column vectors of A. The s vectors a1, ..., as are linearly dependent provided there exist s elements k1, ..., ks, not all zero, such that k1 a1 + ... + ks as = 0. Otherwise, the s vectors are linearly independent. Furthermore, if there are exactly r ≤ s vectors of the set a1, ..., as which are linearly independent, while the remaining s - r can be expressed as linear combinations of these r vectors, then the rank of A, denoted by rank(A), is r.

The following results follow from the preceding definitions and are stated without proof:
Result 1.1: Let A and B each be n x n nonsingular matrices. Then (AB)^-1 = B^-1 A^-1.

Result 1.2: Let A and B be any two matrices such that AB is defined. Then (AB)' = B'A'.

Result 1.3: Let A be any matrix. Then A'A and AA' are symmetric.

Result 1.4: Let A and B each be n x n matrices. Then det(AB) = [det(A)][det(B)].

Result 1.5: Let A and B be m x n and n x m matrices, respectively. Then tr(AB) = tr(BA).
Quadratic forms play a key role in linear model theory. The following definitions introduce quadratic forms.

Definition 1.1.13 Quadratic Forms: A function f(x1, ..., xn) is a quadratic form if f(x1, ..., xn) = Σ(i=1 to n) Σ(j=1 to n) aij xi xj = X'AX where X = (x1, ..., xn)' is an n x 1 vector and A is an n x n symmetric matrix whose ij th element is aij.
Example 1.1.2 Let f(x1, x2, x3) = x1^2 + 3 x2^2 + 4 x3^2 + x1 x2 + 2 x2 x3. Then f(x1, x2, x3) = X'AX is a quadratic form with X = (x1, x2, x3)' and

    A = [ 1    0.5  0 ]
        [ 0.5  3    1 ]
        [ 0    1    4 ].

The symmetric matrix A is constructed by setting aij and aji equal to one-half the coefficient on the xi xj term for i ≠ j.

Example 1.1.3
Quadratic forms are very useful for defining sums of squares. For example, let

    f(x1, ..., xn) = Σ(i=1 to n) xi^2 = X'X = X'In X

where X = (x1, ..., xn)' is an n x 1 vector. The sum of squares around the sample mean is another common example. Let

    f(x1, ..., xn) = Σ(i=1 to n) (xi - x̄)^2
                   = Σ(i=1 to n) xi^2 - n x̄^2
                   = X'X - (1/n)(X'1n)(1'n X)
                   = X'[In - (1/n)Jn]X.
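The quadratic forms in Examples 1.1.2 and 1.1.3 can be checked numerically. The following is a short sketch in numpy (not part of the text's development; the particular vectors are arbitrary illustrations):

```python
import numpy as np

# Symmetric matrix A from Example 1.1.2:
# f(x1, x2, x3) = x1^2 + 3 x2^2 + 4 x3^2 + x1 x2 + 2 x2 x3
A = np.array([[1.0, 0.5, 0.0],
              [0.5, 3.0, 1.0],
              [0.0, 1.0, 4.0]])

x = np.array([2.0, -1.0, 3.0])
f_direct = x[0]**2 + 3*x[1]**2 + 4*x[2]**2 + x[0]*x[1] + 2*x[1]*x[2]
f_quad = x @ A @ x                      # X'AX
print(np.isclose(f_direct, f_quad))     # the two forms agree

# Example 1.1.3: sum of squares about the mean as X'[In - (1/n)Jn]X
n = 5
X = np.array([3.0, 1.0, 4.0, 1.0, 5.0])
center = np.eye(n) - np.ones((n, n)) / n
print(np.isclose(X @ center @ X, np.sum((X - X.mean())**2)))
```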
Definition 1.1.14 Orthogonal Matrix: An n x n matrix P is orthogonal if and only if P^-1 = P'. Therefore, PP' = P'P = In. If P is written as (p1, p2, ..., pn) where pi is an n x 1 column vector of P for i = 1, ..., n, then necessary and sufficient conditions for P to be orthogonal are

(i) p'i pi = 1 for each i = 1, ..., n and
(ii) p'i pj = 0 for any i ≠ j.
Example 1.1.4 Let the n x n matrix

    P = [ 1/√n   1/√2   1/√6   ...    1/√(n(n-1))     ]
        [ 1/√n  -1/√2   1/√6   ...    1/√(n(n-1))     ]
        [ 1/√n    0    -2/√6   ...    1/√(n(n-1))     ]
        [ 1/√n    0      0     ...    1/√(n(n-1))     ]
        [  ...   ...    ...           ...             ]
        [ 1/√n    0      0     ...   -(n-1)/√(n(n-1)) ]

where PP' = P'P = In. The columns of P are created as follows:

    p1 = (1/√(1^2 + ... + 1^2)) (1, 1, 1, ..., 1)' = (1/√n) 1n
    p2 = (1/√(1^2 + (-1)^2)) (1, -1, 0, 0, ..., 0)'
    p3 = (1/√(1^2 + 1^2 + (-2)^2)) (1, 1, -2, 0, ..., 0)'
    ...
    pn = (1/√(1^2 + 1^2 + ... + 1^2 + (-(n-1))^2)) (1, 1, ..., 1, -(n-1))'.

The matrix P' in Example 1.1.4 is generally referred to as an n-dimensional Helmert matrix. The Helmert matrix has some interesting properties. Write P as P = (p1 | Pn) where the n x 1 vector p1 = (1/√n) 1n and the n x (n-1) matrix Pn = (p2, p3, ..., pn). Then

    p1 p'1 = (1/n) Jn
    p'1 p1 = 1
    p'1 Pn = 0(1 x (n-1))
    P'n Pn = I(n-1).

The (n-1) x n matrix P'n will be referred to as the lower portion of an n-dimensional Helmert matrix.

If X is an n x 1 vector and A is an n x n matrix, then AX defines n linear combinations of the elements of X. Such transformations from X to AX are very useful in linear models. Of particular interest are transformations of the vector X that produce multiples of X. That is, we are interested in transformations that satisfy the relationship AX = λX where λ is a scalar multiple. The above relationship holds if and only if |λIn - A| = 0. But the determinant of λIn - A is an n th degree polynomial in λ. Thus, there are exactly n values of λ that satisfy |λIn - A| = 0. These n values of λ are called the n eigenvalues of the matrix A. They are denoted by λ1, λ2, ..., λn. Corresponding to each eigenvalue λi there is an n x 1 vector Xi that satisfies

    A Xi = λi Xi

where Xi is called the i th eigenvector of the matrix A corresponding to the eigenvalue λi.
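The Helmert construction of Example 1.1.4 can be sketched in a few lines of numpy (the function name `helmert` is ours, not the text's); the properties PP' = In, P'n Pn = I(n-1), and p1 p'1 = (1/n)Jn are then easy to verify:

```python
import numpy as np

def helmert(n):
    """Full n x n orthogonal matrix P of Example 1.1.4 (columns p1, ..., pn)."""
    P = np.zeros((n, n))
    P[:, 0] = 1.0 / np.sqrt(n)                # p1 = (1/sqrt(n)) 1n
    for k in range(2, n + 1):                 # pk: k-1 ones, then -(k-1), then zeros
        v = np.zeros(n)
        v[:k - 1] = 1.0
        v[k - 1] = -(k - 1)
        P[:, k - 1] = v / np.linalg.norm(v)   # norm of v is sqrt((k-1)k)
    return P

n = 6
P = helmert(n)
p1, Pn = P[:, :1], P[:, 1:]                   # p1 and the "lower portion" columns
print(np.allclose(P @ P.T, np.eye(n)))        # PP' = In
print(np.allclose(Pn.T @ Pn, np.eye(n - 1)))  # P'n Pn = I(n-1)
print(np.allclose(p1 @ p1.T, np.ones((n, n)) / n))  # p1 p1' = (1/n) Jn
```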
Example 1.1.5 Find the eigenvalues and eigenvectors of the 3 x 3 matrix A = 0.6 I3 + 0.4 J3. First, set |λI3 - A| = 0. This relationship produces the cubic equation

    λ^3 - 3λ^2 + 2.52λ - 0.648 = 0    or    (λ - 1.8)(λ - 0.6)(λ - 0.6) = 0.

Therefore, the eigenvalues of A are λ1 = 1.8, λ2 = λ3 = 0.6. Next, find vectors Xi that satisfy (A - λi I3)Xi = 0(3 x 1) for each i = 1, 2, 3. For λ1 = 1.8, (A - 1.8 I3)X1 = 0(3 x 1) or (-1.2 I3 + 0.4 J3)X1 = 0(3 x 1). The vector X1 = (1/√3) 13 satisfies this relationship. For λ2 = λ3 = 0.6, (A - 0.6 I3)Xi = 0(3 x 1), i.e., 1'3 Xi = 0, for i = 2, 3. The vectors X2 = (1/√2, -1/√2, 0)' and X3 = (1/√6, 1/√6, -2/√6)' satisfy this condition. Note that the vectors X1, X2, X3 are normalized and orthogonal since X'1 X1 = X'2 X2 = X'3 X3 = 1 and X'1 X2 = X'1 X3 = X'2 X3 = 0.
The following theorems address the uniqueness or nonuniqueness of the eigenvector associated with each eigenvalue.

Theorem 1.1.1 There exists at least one eigenvector corresponding to each eigenvalue.

Theorem 1.1.2 If an n x n matrix A has n distinct eigenvalues, then there exist exactly n linearly independent eigenvectors, one associated with each eigenvalue.
In the next theorem and corollary a symmetric matrix is defined in terms of its eigenvalues and eigenvectors.
Theorem 1.1.3 Let A be an n x n symmetric matrix. There exists an n x n orthogonal matrix P such that P'AP = D where D is a diagonal matrix whose diagonal elements are the eigenvalues of A and where the columns of P are the orthogonal, normalized eigenvectors of A. The i th column of P (i.e., the i th eigenvector of A) corresponds to the i th diagonal element of D for i = 1, ..., n.

Example 1.1.6 Let A be the 3 x 3 matrix from Example 1.1.5. Then P'AP = D or

    [ 1/√3   1/√3   1/√3 ] [ 1    0.4  0.4 ] [ 1/√3   1/√2   1/√6 ]   [ 1.8  0    0   ]
    [ 1/√2  -1/√2   0    ] [ 0.4  1    0.4 ] [ 1/√3  -1/√2   1/√6 ] = [ 0    0.6  0   ]
    [ 1/√6   1/√6  -2/√6 ] [ 0.4  0.4  1   ] [ 1/√3   0     -2/√6 ]   [ 0    0    0.6 ].
Theorem 1.1.3 can be used to relate the trace and determinant of a symmetric matrix to its eigenvalues.

Theorem 1.1.4 If A is an n x n symmetric matrix then tr(A) = Σ(i=1 to n) λi and det(A) = Π(i=1 to n) λi.

Proof: Let P'AP = D. Then

    tr(A) = tr(APP') = tr(P'AP) = tr(D) = Σ(i=1 to n) λi
    det(A) = det(APP') = det(P'AP) = det(D) = Π(i=1 to n) λi. ■
The number of times an eigenvalue occurs is called the multiplicity of the value. This idea is formalized in the next definition.

Definition 1.1.15 Multiplicity: The n x n matrix A has eigenvalue λ* with multiplicity m ≤ n if m of the eigenvalues of A equal λ*.
Example 1.1.7 All the n eigenvalues of the identity matrix In equal 1. Therefore, In has eigenvalue 1 with multiplicity n.

Example 1.1.8 Find the eigenvalues and eigenvectors of the n x n matrix G = (a - b)In + bJn. First, note that

    G[(1/√n) 1n] = [a + (n-1)b](1/√n) 1n.

Therefore, a + (n-1)b is an eigenvalue of matrix G with corresponding normalized eigenvector (1/√n) 1n. Next, take any n x 1 vector X such that 1'n X = 0. (One set of n - 1 vectors that satisfies 1'n X = 0 are the column vectors p2, p3, ..., pn from Example 1.1.4.) Rewrite G = (a - b)In + b 1n 1'n. Therefore,

    GX = (a - b)X + b 1n 1'n X = (a - b)X

and matrix G has eigenvalue a - b. Furthermore,

    |λIn - G| = |(λ - a + b)In - bJn| = [λ - a - (n-1)b](λ - a + b)^(n-1).

Therefore, eigenvalue a + (n-1)b has multiplicity 1 and eigenvalue a - b has multiplicity n - 1. Note that the 3 x 3 matrix A in Example 1.1.5 is a special case of matrix G with a = 1, b = 0.4, and n = 3.

It will be convenient at times to separate a matrix into its submatrix components. Such a separation is called partitioning.
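The eigenvalue pattern derived in Example 1.1.8 can be checked numerically. A short numpy sketch, using the special case a = 1, b = 0.4, n = 3 of Example 1.1.5:

```python
import numpy as np

# Eigenvalues of G = (a - b) In + b Jn (Example 1.1.8), with the
# illustrative choice a = 1, b = 0.4, n = 3 from Example 1.1.5.
a, b, n = 1.0, 0.4, 3
G = (a - b) * np.eye(n) + b * np.ones((n, n))

vals = np.sort(np.linalg.eigvalsh(G))
# a - b = 0.6 with multiplicity n - 1, and a + (n - 1)b = 1.8 with multiplicity 1
print(np.allclose(vals, [0.6, 0.6, 1.8]))

# (1/sqrt(n)) 1n is the eigenvector for a + (n - 1)b
x1 = np.ones(n) / np.sqrt(n)
print(np.allclose(G @ x1, (a + (n - 1) * b) * x1))
```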
Definition 1.1.16 Partitioning a Matrix: If A is an m x n matrix then A can be separated or partitioned as

    A = [ A11  A12 ]
        [ A21  A22 ]

where Aij is an mi x nj matrix for i, j = 1, 2, m = m1 + m2, and n = n1 + n2.

Most of the square matrices used in this text are either positive definite or positive semidefinite. These two general matrix types are described in the following definitions.
Definition 1.1.17 Positive Semidefinite Matrix: An n x n matrix A is positive semidefinite if

(i) A = A',
(ii) Y'AY ≥ 0 for all n x 1 real vectors Y, and
(iii) Y'AY = 0 for at least one n x 1 nonzero real vector Y.

Example 1.1.9 The matrix Jn is positive semidefinite because Jn = J'n, Y'JnY = (1'n Y)'(1'n Y) = (Σ(i=1 to n) yi)^2 ≥ 0 for Y = (y1, ..., yn)', and Y'JnY = 0 for Y = (1, -1, 0, ..., 0)'.
Definition 1.1.18 Positive Definite Matrix: An n x n matrix A is positive definite if

(i) A = A' and
(ii) Y'AY > 0 for all nonzero n x 1 real vectors Y.

Example 1.1.10 The n x n identity matrix In is positive definite because In is symmetric and Y'InY > 0 for all nonzero n x 1 real vectors Y.

Theorem 1.1.5 Let A be an n x n positive definite matrix. Then

(i) there exists an n x n matrix B of rank n such that A = BB' and
(ii) the eigenvalues of A are all positive.
The following example demonstrates how the matrix B in Theorem 1.1.5 can be constructed.

Example 1.1.11 Let A be an n x n positive definite matrix. Thus, A = A' and by Theorem 1.1.3 there exist n x n matrices P and D such that P'AP = D where P is the orthogonal matrix whose columns are the eigenvectors of A, and D is the corresponding diagonal matrix of eigenvalues. Therefore, A = PDP' = P D^(1/2) D^(1/2) P' = BB' where D^(1/2) is an n x n diagonal matrix whose i th diagonal element is λi^(1/2) and B = P D^(1/2).
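The construction in Example 1.1.11 is easy to carry out numerically. A sketch in numpy, where the particular positive definite matrix A is an arbitrary illustration:

```python
import numpy as np

# Construct B with A = BB' for a positive definite A (Example 1.1.11).
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])        # symmetric, strictly diagonally dominant

vals, P = np.linalg.eigh(A)            # P'AP = D: columns of P are eigenvectors
print(np.all(vals > 0))                # Theorem 1.1.5(ii): eigenvalues all positive

B = P @ np.diag(np.sqrt(vals))         # B = P D^(1/2)
print(np.allclose(B @ B.T, A))         # A = BB'
print(np.linalg.matrix_rank(B))        # rank n = 3
```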
Certain square matrices have the characteristic that A^2 = A. For example, let A = In - (1/n)Jn. Then

    A^2 = (In - (1/n)Jn)(In - (1/n)Jn)
        = In - (1/n)Jn - (1/n)Jn + (1/n^2)Jn Jn
        = In - (1/n)Jn
        = A.

Matrices of this type are introduced in the next definition.

Definition 1.1.19 Idempotent Matrices: Let A be an n x n matrix. Then

(i) A is idempotent if A^2 = A and
(ii) A is symmetric, idempotent if A = A^2 and A = A'.

Note that if A is idempotent of rank n then A = In. In linear model applications, idempotent matrices generally occur in the context of quadratic forms. Since the matrix in a quadratic form is symmetric, we generally restrict our attention to symmetric, idempotent matrices.
Theorem 1.1.6 Let B be an n x n symmetric, idempotent matrix of rank r < n. Then B is positive semidefinite.

The next theorem will prove useful when examining sums of squares in ANOVA problems.

Theorem 1.1.7 Let A1, ..., Am be n x n symmetric matrices where rank(As) = ns for s = 1, ..., m and Σ(s=1 to m) As = In. If Σ(s=1 to m) ns = n, then

(i) Ar As = 0(n x n) for r ≠ s, r, s = 1, ..., m and
(ii) As = As^2 for s = 1, ..., m.
The eigenvalues of the matrix In - (1/n)Jn are derived in the next example.

Example 1.1.12 The symmetric, idempotent matrix In - (1/n)Jn takes the form (a - b)In + bJn with a = 1 - 1/n and b = -1/n. Therefore, by Example 1.1.8, the eigenvalues of In - (1/n)Jn are a + (n-1)b = (1 - 1/n) + (n-1)(-1/n) = 0 with multiplicity 1 and a - b = (1 - 1/n) - (-1/n) = 1 with multiplicity n - 1.

The result that the eigenvalues of an idempotent matrix are all zeros and ones is generalized in the next theorem.
Theorem 1.1.8 The eigenvalues of an n x n symmetric matrix A of rank r < n are 1 with multiplicity r and 0 with multiplicity n - r if and only if A is idempotent.

Proof: The proof is given by Graybill (1976, p. 39). ■
The following theorem relates the trace and the rank of a symmetric, idempotent matrix.

Theorem 1.1.9 If A is an n x n symmetric, idempotent matrix then tr(A) = rank(A).

Proof: The proof is left to the reader. ■
In the next example the matrix In - (1/n)Jn is written as a function of n - 1 of its eigenvectors.

Example 1.1.13 The n x n matrix In - (1/n)Jn takes the form (a - b)In + bJn with a = 1 - 1/n and b = -1/n. By Examples 1.1.8 and 1.1.12, the n - 1 eigenvalues equal to 1 have corresponding eigenvectors equal to the n - 1 columns of Pn where P'n is the (n-1) x n lower portion of an n-dimensional Helmert matrix. Further,

    In - (1/n)Jn = Pn P'n.
This representation of a symmetric, idempotent matrix is generalized in the next theorem.

Theorem 1.1.10 If A is an n x n symmetric, idempotent matrix of rank r then A = PP' where P is an n x r matrix whose columns are the eigenvectors of A associated with the r eigenvalues equal to 1.

Proof: Let A be an n x n symmetric, idempotent matrix of rank r. By Theorem 1.1.3,

    R'AR = [ Ir  0 ]
           [ 0   0 ]

where R = [P|Q] is the n x n matrix of eigenvectors of A, P is the n x r matrix whose r columns are the eigenvectors associated with the r eigenvalues 1, and Q is the n x (n - r) matrix of eigenvectors associated with the (n - r) eigenvalues 0. Therefore,

    A = [P Q] [ Ir  0 ] [ P' ] = PP'.
              [ 0   0 ] [ Q' ]

Furthermore, P'P = Ir because P is an n x r matrix of orthogonal eigenvectors. ■
If X'AX is a quadratic form with n x 1 vector X and n x n symmetric matrix A, then X'AX is a quadratic form constructed from an n-dimensional vector. The following example uses Theorem 1.1.10 to show that if A is an n x n symmetric, idempotent matrix of rank r < n then the quadratic form X'AX can be rewritten as a quadratic form constructed from an r-dimensional vector.

Example 1.1.14 Let X'AX be a quadratic form with n x 1 vector X and n x n symmetric, idempotent matrix A of rank r < n. By Theorem 1.1.10, X'AX = X'PP'X = Z'Z where P is an n x r matrix of eigenvectors of A associated with the eigenvalues 1 and Z = P'X is an r x 1 vector. For a more specific example, note that X'(In - (1/n)Jn)X = X'Pn P'n X = Z'Z where P'n is the (n-1) x n lower portion of an n-dimensional Helmert matrix and Z = P'n X is an (n-1) x 1 vector.
Later sections of the text cover covariance matrices and quadratic forms within the context of complete, balanced data structures. A data set is complete if all combinations of the levels of the factors contain data. A data set is balanced if the number of observations in each level of any factor is constant. Kronecker product notation will prove very useful when discussing covariance matrices and quadratic forms for balanced data structures. The next section of the text therefore provides some useful Kronecker product results.
1.2 KRONECKER PRODUCTS

Kronecker products will be used extensively in this text. In this section the Kronecker product operation is defined and a number of related theorems are listed without proof.
Definition 1.2.1 Kronecker Product: If A is an r x s matrix with ij th element aij for i = 1, ..., r and j = 1, ..., s, and B is any t x v matrix, then the Kronecker product of A and B, denoted by A ⊗ B, is the rt x sv matrix formed by multiplying each aij element by the entire matrix B. That is,

    A ⊗ B = [ a11 B  a12 B  ...  a1s B ]
            [ a21 B  a22 B  ...  a2s B ]
            [  ...    ...   ...   ...  ]
            [ ar1 B  ar2 B  ...  ars B ].
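numpy provides this operation directly as `np.kron`; a brief sketch of the block structure of Definition 1.2.1 (the matrices are arbitrary illustrations):

```python
import numpy as np

# Definition 1.2.1 via numpy's kron: block (i, j) of A ⊗ B is aij * B.
A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 5],
              [6, 7]])

K = np.kron(A, B)                 # 4 x 4: [[1B, 2B], [3B, 4B]]
print(K.shape)                    # (4, 4)
print(np.allclose(K[:2, :2], 1 * B) and np.allclose(K[:2, 2:], 2 * B))

# Theorem 1.2.1: (A ⊗ B)' = A' ⊗ B'
print(np.allclose(np.kron(A, B).T, np.kron(A.T, B.T)))
```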
Theorem 1.2.1 Let A and B be any matrices. Then (A ⊗ B)' = A' ⊗ B'.

Example 1.2.1 [1a ⊗ (2, 1, 4)]' = 1'a ⊗ (2, 1, 4)'.
Theorem 1.2.2 Let A, B, and C be any matrices and let a be a scalar. Then aA ⊗ B ⊗ C = a(A ⊗ B ⊗ C) = A ⊗ (aB ⊗ C).

Example 1.2.2 (1/a)Ja ⊗ Jb ⊗ Jc = (1/a)[(Ja ⊗ Jb) ⊗ Jc] = Ja ⊗ ((1/a)Jb ⊗ Jc).

Theorem 1.2.3 Let A and B be any square matrices. Then tr(A ⊗ B) = [tr(A)][tr(B)].
Example 1.2.3 tr[(Ia - (1/a)Ja) ⊗ (In - (1/n)Jn)] = tr[(Ia - (1/a)Ja)] tr[(In - (1/n)Jn)] = (a - 1)(n - 1).
Theorem 1.2.4 Let A be an r x s matrix, B be a t x u matrix, C be an s x v matrix, and D be a u x w matrix. Then (A ⊗ B)(C ⊗ D) = AC ⊗ BD.

Example 1.2.4 [Ia ⊗ (1/n)Jn][(Ia - (1/a)Ja) ⊗ (1/n)Jn] = Ia(Ia - (1/a)Ja) ⊗ (1/n^2)Jn Jn = (Ia - (1/a)Ja) ⊗ (1/n)Jn.

Theorem 1.2.5 Let A and B be m x m and n x n nonsingular matrices, respectively. Then the inverse of A ⊗ B is (A ⊗ B)^-1 = A^-1 ⊗ B^-1.

Example 1.2.5 [(Ia + αJa) ⊗ (In + βJn)]^-1 = [Ia - (α/(1 + aα))Ja] ⊗ [In - (β/(1 + nβ))Jn] for 0 < α, β.
Theorem 1.2.6 Let A and B be m x n matrices and let C be a p x q matrix. Then (A + B) ⊗ C = (A ⊗ C) + (B ⊗ C).

Example 1.2.6 (Ia - (1/a)Ja) ⊗ (In - (1/n)Jn) = [Ia ⊗ (In - (1/n)Jn)] - [(1/a)Ja ⊗ (In - (1/n)Jn)] = Ia ⊗ In - Ia ⊗ (1/n)Jn - (1/a)Ja ⊗ In + (1/a)Ja ⊗ (1/n)Jn.
Theorem 1.2.7 Let A be an m x m matrix with eigenvalues α1, ..., αm and let B be an n x n matrix with eigenvalues β1, ..., βn. Then the eigenvalues of A ⊗ B (or B ⊗ A) are the mn values αi βj for i = 1, ..., m and j = 1, ..., n.

Example 1.2.7 From Example 1.1.8, the eigenvalues of In - (1/n)Jn are 1 with multiplicity n - 1 and 0 with multiplicity 1. Likewise, Ia - (1/a)Ja has eigenvalues 1 with multiplicity a - 1 and 0 with multiplicity 1. Therefore, the eigenvalues of (Ia - (1/a)Ja) ⊗ (In - (1/n)Jn) are 1 with multiplicity (a - 1)(n - 1) and 0 with multiplicity a + n - 1.

Theorem 1.2.8 Let A be an m x n matrix of rank r and let B be a p x q matrix of rank s. Then A ⊗ B has rank rs.
Example 1.2.8 rank[(Ia - (1/a)Ja) ⊗ In] = [rank(Ia - (1/a)Ja)][rank(In)] = (a - 1)n.

Theorem 1.2.9 Let A be an m x m symmetric, idempotent matrix of rank r and let B be an n x n symmetric, idempotent matrix of rank s. Then A ⊗ B is an mn x mn symmetric, idempotent matrix where tr(A ⊗ B) = rank(A ⊗ B) = rs.

Example 1.2.9 tr[(Ia - (1/a)Ja) ⊗ In] = rank[(Ia - (1/a)Ja) ⊗ In] = (a - 1)n.
The following example demonstrates that Kronecker products are useful for describing sums of squares in complete, balanced ANOVA problems.

Example 1.2.10 Consider a one-way classification where there are r replicate observations nested in each of the t levels of a fixed factor. Let Yij represent the j th replicate observation in the i th level of the fixed factor for i = 1, ..., t and j = 1, ..., r. Define the tr x 1 vector of observations Y = (Y11, ..., Y1r, ..., Yt1, ..., Ytr)'. The layout for this experiment is given in Figure 1.2.1. The ANOVA table is presented in Table 1.2.1. Note that the sums of squares are written in summation notation and as quadratic forms, Y'AmY, for m = 1, ..., 4. The objective is to demonstrate that the tr x tr matrices Am can be expressed as Kronecker products. Each matrix Am is derived next.

    Figure 1.2.1 One-Way Layout

                        Fixed Factor (i)
                        1      2      ...    t
    Replicates (j)      Y11    Y21    ...    Yt1
                        Y12    Y22    ...    Yt2
                        ...    ...           ...
                        Y1r    Y2r    ...    Ytr

    Table 1.2.1 One-Way ANOVA Table

    Source               df         SS
    Mean                 1          Σi Σj ȳ..^2 = Y'A1Y
    Fixed factor         t - 1      Σi Σj (ȳi. - ȳ..)^2 = Y'A2Y
    Nested replicates    t(r - 1)   Σi Σj (Yij - ȳi.)^2 = Y'A3Y
    Total                tr         Σi Σj Yij^2 = Y'A4Y

Note

    ȳ.. = [1/(rt)] 1'tr Y = [1/(rt)](1t ⊗ 1r)'Y

and

    (ȳ1., ..., ȳt.)' = [It ⊗ (1/r) 1'r] Y.

Therefore, the sum of squares due to the mean is given by

    Σ(i=1 to t) Σ(j=1 to r) ȳ..^2 = rt ȳ..^2
        = rt {[1/(rt)](1t ⊗ 1r)'Y}' {[1/(rt)](1t ⊗ 1r)'Y}
        = Y'A1Y

where the tr x tr matrix A1 = (1/t)Jt ⊗ (1/r)Jr. The sum of squares due to the fixed factor is

    Σ(i=1 to t) Σ(j=1 to r) (ȳi. - ȳ..)^2 = r Σ(i=1 to t) ȳi.^2 - rt ȳ..^2
        = Y'[It ⊗ (1/r)Jr]Y - Y'[(1/t)Jt ⊗ (1/r)Jr]Y
        = Y'[(It - (1/t)Jt) ⊗ (1/r)Jr]Y
        = Y'A2Y

where the tr x tr matrix A2 = (It - (1/t)Jt) ⊗ (1/r)Jr. The sum of squares due to the nested replicates is

    Σ(i=1 to t) Σ(j=1 to r) (Yij - ȳi.)^2 = Σ(i=1 to t) Σ(j=1 to r) Yij^2 - r Σ(i=1 to t) ȳi.^2
        = Y'[It ⊗ Ir]Y - Y'[It ⊗ (1/r)Jr]Y
        = Y'[It ⊗ (Ir - (1/r)Jr)]Y
        = Y'A3Y

where the tr x tr matrix A3 = It ⊗ (Ir - (1/r)Jr). Finally, the sum of squares total is

    Σ(i=1 to t) Σ(j=1 to r) Yij^2 = Y'[It ⊗ Ir]Y = Y'A4Y
where the tr x tr matrix A4 = It ⊗ Ir.

The derivations of the sums of squares matrices Am can be tedious. In Chapter 4 an algorithm is provided for determining the sums of squares matrices for complete, balanced designs with any number of main effects, interactions, or nested factors. This algorithm makes the calculation of the sums of squares matrices Am very simple.

This section concludes with a matrix operator that will prove useful in Chapter 8.

Definition 1.2.2 BIB Product: If B is a c x d matrix and A is an a x b matrix where each column of A has c ≤ a nonzero elements and a - c zero elements, then the BIB product of matrices A and B, denoted by A □ B, is the a x bd matrix formed by multiplying each zero element in the i th column of A by a 1 x d row vector of zeros and multiplying the j th nonzero element in the i th column of A by the j th row of B for i = 1, ..., b and j = 1, ..., c.

Example 1.2.11 Let the 3 x 3 matrix A = J3 - I3 and the 2 x 2 matrix B = I2 - (1/2)J2. Then the 3 x 6 BIB product matrix

    A □ B = [ 0  1  1 ]     [  1/2  -1/2 ]
            [ 1  0  1 ]  □  [ -1/2   1/2 ]
            [ 1  1  0 ]

          = [   0     0    1/2  -1/2   1/2  -1/2 ]
            [  1/2  -1/2    0     0   -1/2   1/2 ]
            [ -1/2   1/2  -1/2   1/2    0     0  ].
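The BIB product of Definition 1.2.2 is not a standard library operation, but it is short to code. A sketch in numpy (the function name `bib_product` is ours), reproducing the matrix of Example 1.2.11:

```python
import numpy as np

def bib_product(A, B):
    """BIB product A □ B of Definition 1.2.2: A is a x b with c nonzero
    elements per column, B is c x d.  (A sketch; not from the text.)"""
    a_rows, b_cols = A.shape
    c, d = B.shape
    out = np.zeros((a_rows, b_cols * d))
    for i in range(b_cols):                     # i th column of A
        j = 0                                   # counts nonzero elements so far
        for row in range(a_rows):
            if A[row, i] != 0:
                out[row, i*d:(i+1)*d] = A[row, i] * B[j, :]
                j += 1
            # zero elements of column i contribute a 1 x d row of zeros
    return out

A = np.ones((3, 3)) - np.eye(3)                 # J3 - I3
B = np.eye(2) - 0.5 * np.ones((2, 2))           # I2 - (1/2)J2
C = bib_product(A, B)
expected = np.array([[ 0.0,  0.0,  0.5, -0.5,  0.5, -0.5],
                     [ 0.5, -0.5,  0.0,  0.0, -0.5,  0.5],
                     [-0.5,  0.5, -0.5,  0.5,  0.0,  0.0]])
print(np.allclose(C, expected))                 # matches Example 1.2.11
```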
Theorem 1.2.10 If A is an a x b matrix with c ≤ a nonzero elements per column and a - c zeros per column; B1, B2, and B are each c x d matrices; D is a b x b diagonal matrix of rank b; Z is a d x 1 vector; and Y is a c x 1 vector, then

(i) [A □ B][D ⊗ Z] = AD □ BZ
(ii) [A □ Y][D ⊗ Z'] = AD □ YZ'
(iii) D □ Y' = D ⊗ Y'
(iv) [A □ (B1 + B2)] = [A □ B1] + [A □ B2].
Example 1.2.12 Let A be any 3 x 3 matrix with two nonzero elements per column and let B = I2 - (1/2)J2. Then

    [A □ B][I3 ⊗ 12][I3 ⊗ 1'2] = [A □ ((I2 - (1/2)J2) 12)][I3 ⊗ 1'2]
                               = [A □ 0(2x1)][I3 ⊗ 1'2]
                               = [A □ 0(2x1) 1'2]
                               = [A □ 0(2x2)]
                               = 0(3x6).

1.3 RANDOM VECTORS
Let the n x 1 random vector Y = (Y1, Y2, ..., Yn)' where Yi is a random variable for i = 1, ..., n. The vector Y is a random entity. Therefore, Y has an expectation; each element of Y has a variance; and any two elements of Y have a covariance
(assuming the expectations, variances, and covariances exist). The following definitions and theorems describe the structure of random vectors.
Definition 1.3.1 Joint Probability Distribution: The probability distribution of the n x 1 random vector Y = (Y1, ..., Yn)' equals the joint probability distribution of Y1, ..., Yn. Denote the distribution of Y by fY(y) = fY(y1, ..., yn).

Definition 1.3.2 Expectation of a Random Vector: The expected value of the n x 1 random vector Y = (Y1, ..., Yn)' is given by E(Y) = [E(Y1), ..., E(Yn)]'.

Definition 1.3.3 Covariance Matrix of a Random Vector Y: The n x 1 random vector Y = (Y1, ..., Yn)' has n x n covariance matrix given by

    cov(Y) = E{[Y - E(Y)][Y - E(Y)]'}.

The ij th element of cov(Y) equals E{[Yi - E(Yi)][Yj - E(Yj)]} for i, j = 1, ..., n.

Definition 1.3.4 Linear Transformations of a Random Vector Y: If B is an m x n matrix of constants and Y is an n x 1 random vector, then the m x 1 random vector BY represents m linear transformations of Y.
The following theorem provides the covariance matrix of linear transformations of a random vector.

Theorem 1.3.1 If B is an m x n matrix of constants, Y is an n x 1 random vector, and cov(Y) is the n x n covariance matrix of Y, then the m x 1 random vector BY has an m x m covariance matrix given by B[cov(Y)]B'.

Proof:

    cov(BY) = E{[BY - E(BY)][BY - E(BY)]'}
            = E{B[Y - E(Y)][Y - E(Y)]'B'}
            = B E{[Y - E(Y)][Y - E(Y)]'} B'
            = B[cov(Y)]B'. ■
The next theorem provides the expected value of a quadratic form.

Theorem 1.3.2 Let Y be an n x 1 random vector with mean vector μ = E(Y) and n x n covariance matrix Σ = cov(Y). Then E(Y'AY) = tr(AΣ) + μ'Aμ where A is any n x n symmetric matrix of constants.

Proof: Since (Y - μ)'A(Y - μ) is a scalar, Result 1.5 gives

    (Y - μ)'A(Y - μ) = tr[(Y - μ)'A(Y - μ)] = tr[A(Y - μ)(Y - μ)'].

Therefore,

    E[Y'AY] = E[(Y - μ)'A(Y - μ) + 2Y'Aμ - μ'Aμ]
            = E{tr[A(Y - μ)(Y - μ)']} + 2E(Y'Aμ) - μ'Aμ
            = tr[A E{(Y - μ)(Y - μ)'}] + μ'Aμ
            = tr[AΣ] + μ'Aμ. ■
The moment generating function of a random vector is used extensively in the next chapter. The following definitions and theorems provide some general moment generating function results.

Definition 1.3.5 Moment Generating Function (MGF) of a Random Vector Y: The MGF of an n x 1 random vector Y is given by mY(t) = E(e^(t'Y)) where the n x 1 vector of constants t = (t1, ..., tn)', if the expectation exists for -h < ti < h where h > 0 and i = 1, ..., n.

There is a one-to-one correspondence between the probability distribution of Y and the MGF of Y, if the MGF exists. Therefore, the probability distribution of Y can be identified if the MGF of Y can be found. The following two theorems and corollary are used to derive the MGF of a random vector Y.

Theorem 1.3.3 Let the n x 1 random vector Y = (Y'1, Y'2, ..., Y'm)' where Yi is an ni x 1 random vector for i = 1, ..., m and n = Σ(i=1 to m) ni. Let mY(.), mY1(.), ..., mYm(.) represent the MGFs of Y, Y1, ..., Ym, respectively. The vectors Y1, ..., Ym are mutually independent if and only if

    mY(t) = mY1(t1) mY2(t2) ... mYm(tm)

for all t = (t'1, ..., t'm)' on an open rectangle around 0.
Theorem 1.3.4 If Y is an n x 1 random vector, g is an n x 1 vector of constants, and c is a scalar constant, then m(g'Y)(c) = mY(cg).

Proof: m(g'Y)(c) = E[e^(c g'Y)] = E[e^((cg)'Y)] = mY(cg). ■
Corollary 1.3.4 Let mY1(.), ..., mYm(.) represent the MGFs of the independent random variables Y1, ..., Ym, respectively. If Z = Σ(i=1 to m) Yi then the MGF of Z is given by

    mZ(s) = Π(i=1 to m) mYi(s).

Proof:

    mZ(s) = m(1'mY)(s)
          = mY(s 1m)                by Theorem 1.3.4
          = Π(i=1 to m) mYi(s)      by Theorem 1.3.3. ■
Moment generating functions are used in the next example to derive the distribution of the sum of independent chi-square random variables.

Example 1.3.1 Let Y1, ..., Ym be m independent central chi-square random variables where Yi has ni degrees of freedom for i = 1, ..., m. For any i,

    mYi(t) = E(e^(t Yi))
           = ∫(0 to ∞) e^(t yi) [Γ(ni/2) 2^(ni/2)]^-1 yi^(ni/2 - 1) e^(-yi/2) dyi
           = [Γ(ni/2) 2^(ni/2)]^-1 ∫(0 to ∞) yi^(ni/2 - 1) e^(-yi(1/2 - t)) dyi
           = [Γ(ni/2) 2^(ni/2)]^-1 [Γ(ni/2)(1/2 - t)^(-ni/2)]
           = (1 - 2t)^(-ni/2).

Let Z = Σ(i=1 to m) Yi. By Corollary 1.3.4,

    mZ(t) = Π(i=1 to m) mYi(t) = (1 - 2t)^(-Σ(i=1 to m) ni / 2).

Therefore, Σ(i=1 to m) Yi is distributed as a central chi-square random variable with Σ(i=1 to m) ni degrees of freedom.

The next theorem is useful when dealing with functions of independent random vectors.
Theorem 1.3.5 Let $g_1(Y_1), \ldots, g_m(Y_m)$ be $m$ functions of the random vectors $Y_1, \ldots, Y_m$, respectively. If $Y_1, \ldots, Y_m$ are mutually independent, then $g_1, \ldots, g_m$ are mutually independent.

The next example demonstrates that the sum of squares of $n$ independent $N_1(0, 1)$ random variables has a central chi-square distribution with $n$ degrees of freedom.

Example 1.3.2 Let $Z_1, \ldots, Z_n$ be a random sample of normally distributed random variables with mean 0 and variance 1. Let $Y_i = Z_i^2$ for $i = 1, \ldots, n$. The moment generating function of $Y_i$ is

$$m_{Y_i}(t) = m_{Z_i^2}(t) = E(e^{t Z_i^2}) = \int_{-\infty}^\infty (2\pi)^{-1/2} e^{t z_i^2 - z_i^2/2} \, dz_i$$
$$= \int_{-\infty}^\infty (2\pi)^{-1/2} e^{-(1 - 2t) z_i^2 / 2} \, dz_i = (1 - 2t)^{-1/2}.$$

That is, each $Y_i$ has a central chi-square distribution with one degree of freedom. Furthermore, by Theorem 1.3.5, the $Y_i$'s are independent random variables. Therefore, by Example 1.3.1, $\sum_{i=1}^n Y_i = \sum_{i=1}^n Z_i^2$ is a central chi-square random variable with $n$ degrees of freedom.
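As a quick numerical sanity check (a sketch, not part of the text), the construction of Examples 1.3.1 and 1.3.2 can be simulated with NumPy: the sum of $n$ squared independent $N_1(0,1)$ draws should behave like a central chi-square variable with mean $n$ and variance $2n$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 8, 200_000

# Each row holds n iid N(0,1) draws; the row's sum of squares is one
# chi-square(n) draw, per Example 1.3.2 combined with Example 1.3.1.
Z = rng.standard_normal((reps, n))
chi = (Z ** 2).sum(axis=1)

# A central chi-square with n degrees of freedom has mean n and variance 2n
print(round(chi.mean(), 2), round(chi.var(), 2))
```

With 200,000 replicates the sample mean and variance land within a few hundredths of the theoretical values 8 and 16.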
EXERCISES

1. Find an $mn \times (n-1)$ matrix $P$ such that $PP' = \frac{1}{m} J_m \otimes (I_n - \frac{1}{n} J_n)$.
2. Let $S_1 = \sum_{i=1}^n U_i (V_i - \bar V)$, $S_2 = \sum_{i=1}^n (U_i - \bar U)^2$, and $S_3 = \sum_{i=1}^n (V_i - \bar V)^2$. If $A = (I_n - \frac{1}{n} J_n)(I_n - VV')(I_n - \frac{1}{n} J_n)$, $U = (U_1, \ldots, U_n)'$, and $V = (V_1, \ldots, V_n)'$, is the statement $S_2 - (S_1^2 / S_3) = U'AU$ true or false? If the statement is true, verify that $A$ is correct. If the statement is false, find the correct form of $A$.
3. Let $V = I_m \otimes [\sigma_1^2 I_n + \sigma_2^2 J_n]$, $A_1 = (I_m - \frac{1}{m} J_m) \otimes \frac{1}{n} J_n$, and $A_2 = I_m \otimes (I_n - \frac{1}{n} J_n)$.

(a) Show $A_1 A_2 = 0_{mn \times mn}$.
(b) Define an $mn \times mn$ matrix $C$ such that $I_{mn} = A_1 + A_2 + C$.
(c) Let the $mn \times 1$ vector $Y = (Y_{11}, \ldots, Y_{mn})'$ and let $C$ be defined as in part (b). Is the following statement true or false?: $Y'CY = [\sum_{i=1}^m \sum_{j=1}^n Y_{ij}]^2 / (mn)$. If the statement is true, verify it. If the statement is false, redefine $Y'CY$ in terms of the $Y_{ij}$'s.
(d) Define constants $k_1$, $k_2$ and $mn \times mn$ idempotent matrices $C_1$ and $C_2$ such that $A_1 V = k_1 C_1$ and $A_2 V = k_2 C_2$. Verify that $C_1$ and $C_2$ are idempotent.

4. Find the inverse of the matrix $0.4 I_4 + 0.6 J_4$.
5. Show that the inverse of the matrix $(I_n + VV')$ is $\left( I_n - \frac{VV'}{1 + V'V} \right)$ where $V$ is an $n \times 1$ vector.

6. Use the result in Exercise 5 to find the inverse of the matrix $(a - b) I_n + b J_n$ where $a$ and $b$ are positive constants.

7. Let $V = I_n \otimes [(1 - \rho) I_2 + \rho J_2]$ where $-1 < \rho < 1$.

(a) Find the $2n$ eigenvalues of $V$.
(b) Find a nonsingular $2n \times 2n$ matrix $Q$ such that $V = QQ'$.
8. Let $A_1 = \frac{1}{m} J_m \otimes \frac{1}{n} J_n$, $A_2 = (I_m - \frac{1}{m} J_m) \otimes \frac{1}{n} J_n$, and $A_3 = I_m \otimes (I_n - \frac{1}{n} J_n)$. Find $\sum_{i=1}^3 A_i$, $A_i A_j$ for all $i, j = 1, 2, 3$, and $\sum_{i=1}^3 c_i A_i$ where $c_1 = c_2 = 1 + (m-1)b$ and $c_3 = 1 - b$ for $-\frac{1}{m-1} < b < 1$.

9. Define $n \times n$ matrices $P$ and $D$ such that $(a - b) I_n + b J_n = PDP'$ where $D$ is a diagonal matrix and $a$ and $b$ are constants.

10. Let

$$B = \begin{bmatrix} I_{n_1} & 0 \\ -\Sigma_{21} \Sigma_{11}^{-1} & I_{n_2} \end{bmatrix} \quad \text{and} \quad \Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}$$

where $\Sigma$ is a symmetric matrix and the $\Sigma_{ij}$ are $n_i \times n_j$ matrices. Find $B \Sigma B'$.
11. Let $A_1 = \frac{1}{m} J_m \otimes \frac{1}{n} J_n$, $A_2 = X(X'X)^{-1} X'$, $A_3 = I_m \otimes (I_n - \frac{1}{n} J_n)$, and $X = X^+ \otimes 1_n$ where $X^+$ is an $m \times p$ matrix such that $1_m' X^+ = 0_{1 \times p}$. Find the $mn \times mn$ matrix $A_4$ such that $I_m \otimes I_n = \sum_{i=1}^4 A_i$. Express $A_4$ in its simplest form.

12. Is the matrix $A_4$ in Exercise 11 idempotent?

13. Let $Y$ be an $n \times 1$ random vector with $n \times n$ covariance matrix $\mathrm{cov}(Y) = \sigma_1^2 I_n + \sigma_2^2 J_n$. Define $Z = P'Y$ where the $n \times n$ matrix $P = (\frac{1}{\sqrt{n}} 1_n \mid P_n)$ and the $n \times (n-1)$ matrix $P_n$ are defined in Example 1.1.4. Find $\mathrm{cov}(P'Y)$.
14. Let $Y$ be a $bt \times 1$ random vector with $E(Y) = 1_b \otimes (\mu_1, \ldots, \mu_t)'$ and $\mathrm{cov}(Y) = \sigma^2 [I_b \otimes J_t] + \sigma_T^2 [I_b \otimes (I_t - \frac{1}{t} J_t)]$. Define $A_1 = \frac{1}{b} J_b \otimes \frac{1}{t} J_t$, $A_2 = (I_b - \frac{1}{b} J_b) \otimes \frac{1}{t} J_t$, and $A_3 = I_b \otimes (I_t - \frac{1}{t} J_t)$. Find $E(Y' A_i Y)$ for $i = 1, 2, 3$.

15. Let the $btr \times 1$ vector $Y = (Y_{111}, \ldots, Y_{11r}, Y_{121}, \ldots, Y_{12r}, \ldots, Y_{bt1}, \ldots, Y_{btr})'$ and let

$$S_1 = \sum_{i=1}^b \sum_{j=1}^t \sum_{k=1}^r \bar Y_{\cdots}^2,$$
$$S_2 = \sum_{i=1}^b \sum_{j=1}^t \sum_{k=1}^r (\bar Y_{i\cdot\cdot} - \bar Y_{\cdots})^2,$$
$$S_3 = \sum_{i=1}^b \sum_{j=1}^t \sum_{k=1}^r (\bar Y_{\cdot j \cdot} - \bar Y_{\cdots})^2,$$
$$S_4 = \sum_{i=1}^b \sum_{j=1}^t \sum_{k=1}^r (\bar Y_{ij\cdot} - \bar Y_{i\cdot\cdot} - \bar Y_{\cdot j\cdot} + \bar Y_{\cdots})^2,$$
$$S_5 = \sum_{i=1}^b \sum_{j=1}^t \sum_{k=1}^r (Y_{ijk} - \bar Y_{ij\cdot})^2.$$

Derive $btr \times btr$ matrices $A_m$ for $m = 1, \ldots, 5$ where $S_m = Y' A_m Y$.
2
Multivariate Normal Distribution
In later chapters we will investigate linear models with normally distributed error structures. Therefore, this chapter concentrates on some important concepts related to the multivariate normal distribution.
2.1 MULTIVARIATE NORMAL DISTRIBUTION FUNCTION
Let $Z_1, \ldots, Z_n$ be independent, identically distributed normal random variables with mean 0 and variance 1. The marginal distribution of $Z_i$ is

$$f_{Z_i}(z_i) = (2\pi)^{-1/2} e^{-z_i^2/2}, \quad -\infty < z_i < \infty,$$

for $i = 1, \ldots, n$. Since the $Z_i$'s are independent random variables, the joint probability distribution of the $n \times 1$ random vector $Z = (Z_1, \ldots, Z_n)'$ is

$$f_Z(z) = (2\pi)^{-n/2} e^{-\sum_{i=1}^n z_i^2/2} = (2\pi)^{-n/2} e^{-z'z/2}, \quad -\infty < z_i < \infty,$$

for $i = 1, \ldots, n$. Let the $n \times 1$ vector $Y = GZ + \mu$ where $G$ is an $n \times n$ nonsingular matrix and $\mu$ is an $n \times 1$ vector. The joint distribution of the $n \times 1$ random vector $Y$ is

$$f_Y(y) = |\Sigma|^{-1/2} (2\pi)^{-n/2} e^{-\{(y - \mu)' \Sigma^{-1} (y - \mu)\}/2}$$

where $\Sigma = GG'$ is an $n \times n$ positive definite matrix and the Jacobian of the transformation $Z = G^{-1}(Y - \mu)$ is $|GG'|^{-1/2} = |\Sigma|^{-1/2}$. The function $f_Y(y)$ is the multivariate normal distribution of an $n \times 1$ random vector $Y$ with $n \times 1$ mean vector $\mu$ and $n \times n$ positive definite covariance matrix $\Sigma$. The following notation will be used to represent this distribution: the $n \times 1$ random vector $Y \sim N_n(\mu, \Sigma)$. The moment generating function of an $n$-dimensional multivariate normal random vector is provided in the next theorem.
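The construction $Y = GZ + \mu$ is also how multivariate normal samples are generated in practice: take $G$ to be a Cholesky factor of $\Sigma$. A small NumPy sketch (the particular $\mu$ and $\Sigma$ values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([1.0, 4.0, -2.0])
Sigma = np.array([[1.0, 0.5, 0.4],
                  [0.5, 1.0, 0.2],
                  [0.4, 0.2, 1.0]])

G = np.linalg.cholesky(Sigma)           # Sigma = G G', with G nonsingular
Z = rng.standard_normal((3, 100_000))   # columns are iid N_3(0, I_3) draws
Y = (G @ Z).T + mu                      # rows are N_3(mu, Sigma) draws

print(Y.mean(axis=0))                   # close to mu
print(np.cov(Y.T))                      # close to Sigma
```

The empirical mean vector and covariance matrix of the transformed draws recover $\mu$ and $\Sigma$ to within sampling error.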
Theorem 2.1.1 Let the $n \times 1$ random vector $Y \sim N_n(\mu, \Sigma)$. The MGF of $Y$ is

$$m_Y(t) = e^{t'\mu + t'\Sigma t/2}$$

where the $n \times 1$ vector $t = (t_1, \ldots, t_n)'$ for $-h < t_i < h$, $h > 0$, and $i = 1, \ldots, n$.

Proof: Let the $n \times 1$ random vector $Z = (Z_1, \ldots, Z_n)'$ where the $Z_i$ are independent, identically distributed $N_1(0, 1)$ random variables. The $n \times 1$ random vector $Y = GZ + \mu \sim N_n(\mu, \Sigma)$ where $\Sigma = GG'$. Therefore,

$$m_Y(t) = E_Y[e^{t'Y}] = E_Z[e^{t'(GZ + \mu)}] = \int \cdots \int (2\pi)^{-n/2} e^{t'(Gz + \mu)} e^{-z'z/2} \, dz$$
$$= e^{t'\mu + t'\Sigma t/2} \int \cdots \int (2\pi)^{-n/2} e^{-\{(z - G't)'(z - G't)\}/2} \, dz = e^{t'\mu + t'\Sigma t/2}. \qquad \blacksquare$$
We will now consider distributions of linear transformations of the random vector $Y$ when $Y \sim N_n(\mu, \Sigma)$. The following theorem provides the joint probability distribution of the $m \times 1$ random vector $BY + b$ where $B$ is an $m \times n$ matrix of constants and $b$ is an $m \times 1$ vector of constants.

Theorem 2.1.2 If $Y$ is an $n \times 1$ random vector distributed $N_n(\mu, \Sigma)$, $B$ is an $m \times n$ matrix of constants with $m \le n$, and $b$ is an $m \times 1$ vector of constants, then the $m \times 1$ random vector $BY + b \sim N_m(B\mu + b, B\Sigma B')$.

Proof:
$$m_{BY+b}(t) = E_Y[e^{t'(BY + b)}] = e^{t'b} E_Y[e^{(B't)'Y}] = e^{t'b} e^{(B't)'\mu + (B't)'\Sigma (B't)/2} = e^{t'(B\mu + b) + t'(B\Sigma B')t/2}.$$

The MGF of $BY + b$ takes the form of a multivariate normal random vector with dimension $m$, mean vector $B\mu + b$, and covariance matrix $B\Sigma B'$, and the proof is complete. $\blacksquare$
Example 2.1.1 Find the distribution of $\bar Y = (1/n) 1_n' Y$ when the $n \times 1$ random vector $Y = (Y_1, \ldots, Y_n)' \sim N_n(\alpha 1_n, \sigma^2 I_n)$. By Theorem 2.1.2 with $1 \times n$ matrix $B = (1/n) 1_n'$ and scalar $b = 0$, $\bar Y \sim N_1(\alpha, \sigma^2/n)$ since

$$B(\alpha 1_n) + b = (1/n) 1_n' (\alpha 1_n) + 0 = \alpha$$

and

$$B(\sigma^2 I_n) B' = (1/n) 1_n' (\sigma^2 I_n) (1_n/n) = \sigma^2/n.$$

Example 2.1.2 Let the $n \times 1$ random vector $Y$ be defined as in Example 2.1.1. Find the distribution of the $(n-1) \times 1$ random vector $U = (U_1, \ldots, U_{n-1})' = (1/\sigma) P_n' Y$ where $P_n'$ is the $(n-1) \times n$ lower portion of an $n$-dimensional Helmert matrix. By Theorem 2.1.2 with $(n-1) \times n$ matrix $B = (1/\sigma) P_n'$ and $(n-1) \times 1$ vector $b = 0_{(n-1) \times 1}$, $U \sim N_{n-1}(0, I_{n-1})$ since

$$B(\alpha 1_n) + b = (1/\sigma) P_n' (\alpha 1_n) = 0_{(n-1) \times 1}$$

and

$$B(\sigma^2 I_n) B' = (1/\sigma) P_n' (\sigma^2 I_n) [(1/\sigma) P_n']' = I_{n-1}.$$
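Example 2.1.2 rests on the algebraic properties of the Helmert matrix: its lower $(n-1) \times n$ portion $P_n'$ satisfies $P_n' 1_n = 0$, $P_n' P_n = I_{n-1}$, and $P_n P_n' = I_n - \frac{1}{n} J_n$. A short NumPy construction (a sketch; the text's Example 1.1.4 defines the matrix formally) verifies these identities:

```python
import numpy as np

def helmert_lower(n: int) -> np.ndarray:
    """Lower (n-1) x n portion of the n-dimensional Helmert matrix."""
    rows = []
    for k in range(1, n):
        # k-th row: ones in the first k entries, -k next, zeros after,
        # normalized so the row has unit length: norm^2 = k + k^2 = k(k+1)
        r = np.zeros(n)
        r[:k] = 1.0
        r[k] = -k
        rows.append(r / np.sqrt(k * (k + 1)))
    return np.vstack(rows)

n = 5
Pn_t = helmert_lower(n)                            # this is P_n'
I, J = np.eye(n), np.ones((n, n))

print(np.allclose(Pn_t @ np.ones(n), 0))           # P_n' 1_n = 0
print(np.allclose(Pn_t @ Pn_t.T, np.eye(n - 1)))   # P_n' P_n = I_{n-1}
print(np.allclose(Pn_t.T @ Pn_t, I - J / n))       # P_n P_n' = I_n - (1/n)J_n
```

All three checks print `True`; the last identity is what turns the centering matrix into a product $P_n P_n'$ in the quadratic-form arguments later in the chapter.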
The $n \times 1$ random vector $Y$ can be partitioned as $Y = (Y_1', Y_2')'$ where $Y_i$ is an $n_i \times 1$ vector for $i = 1, 2$ and $n = n_1 + n_2$. Theorem 2.1.2 is used to derive the marginal distributions of the $n_i \times 1$ random vectors $Y_i$.
Theorem 2.1.3 Let the $n \times 1$ random vector $Y = (Y_1', Y_2')' \sim N_n(\mu, \Sigma)$ where $\mu = (\mu_1', \mu_2')'$ is the $n \times 1$ mean vector,

$$\Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}$$

is the $n \times n$ covariance matrix, $Y_i$ and $\mu_i$ are $n_i \times 1$ vectors, $\Sigma_{ij}$ is an $n_i \times n_j$ matrix for $i, j = 1, 2$, and $n = n_1 + n_2$. The marginal distribution of the $n_i \times 1$ random vector $Y_i$ is $N_{n_i}(\mu_i, \Sigma_{ii})$.

Proof: By Theorem 2.1.2 with $n_1 \times n$ matrix $B = [I_{n_1} \mid 0_{n_1 \times n_2}]$ and $n_1 \times 1$ vector $b = 0_{n_1 \times 1}$, $Y_1 = BY + b \sim N_{n_1}(\mu_1, \Sigma_{11})$. The marginal distribution of $Y_2$ is derived in the same way with $n_2 \times n$ matrix $B = [0_{n_2 \times n_1} \mid I_{n_2}]$ and $n_2 \times 1$ vector $b = 0_{n_2 \times 1}$. $\blacksquare$
The results of Theorem 2.1.3 are generalized as follows. If the $n \times 1$ random vector $Y \sim N_n(\mu, \Sigma)$, then any subset of elements of $Y$ has a multivariate normal distribution, where the mean vector of the subset is obtained by choosing the corresponding elements of $\mu$, and the covariance matrix of the subset is obtained by choosing the corresponding rows and columns of $\Sigma$.

Two normally distributed random vectors have the unique characteristic that the two vectors are independent if and only if they are uncorrelated. This result is given in the next theorem.

Theorem 2.1.4 Let the $n \times 1$ random vector $Y = (Y_1', \ldots, Y_m')' \sim N_n(\mu, \Sigma)$ where $\mu = (\mu_1', \ldots, \mu_m')'$ is the $n \times 1$ mean vector,

$$\Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} & \cdots & \Sigma_{1m} \\ \Sigma_{21} & \Sigma_{22} & \cdots & \Sigma_{2m} \\ \vdots & \vdots & & \vdots \\ \Sigma_{m1} & \Sigma_{m2} & \cdots & \Sigma_{mm} \end{bmatrix}$$

is the $n \times n$ covariance matrix, $Y_i$ and $\mu_i$ are $n_i \times 1$ vectors, $\Sigma_{ij}$ is an $n_i \times n_j$ matrix for $i, j = 1, \ldots, m$, and $n = \sum_{i=1}^m n_i$. The random vectors $Y_1, \ldots, Y_m$ are independent if and only if $\Sigma_{ij} = 0_{n_i \times n_j}$ for all $i \ne j$.

Proof: First assume $\Sigma_{ij} = 0_{n_i \times n_j}$ for all $i \ne j$. The moment generating function of $Y$ is

$$m_Y(t) = E[e^{t'Y}] = e^{t'\mu + t'\Sigma t/2} = e^{t'\mu + \sum_{i=1}^m \sum_{j=1}^m t_i' \Sigma_{ij} t_j/2} = e^{t'\mu + \sum_{i=1}^m t_i' \Sigma_{ii} t_i/2} = \prod_{i=1}^m e^{t_i'\mu_i + t_i' \Sigma_{ii} t_i/2} = \prod_{i=1}^m m_{Y_i}(t_i)$$

where $t = (t_1', \ldots, t_m')'$ with $n_i \times 1$ vectors $t_i$ for $i = 1, \ldots, m$. Therefore, by Theorem 1.3.3, the vectors $Y_i$ are mutually independent. Now assume the vectors $Y_i$ are mutually independent. For any $i \ne j$,

$$\Sigma_{ij} = E\{[Y_i - \mu_i][Y_j - \mu_j]'\} = E[Y_i - \mu_i] \, E[Y_j - \mu_j]' = 0_{n_i \times n_j}. \qquad \blacksquare$$
In the following examples mean vectors and covariance matrices are derived for a few common problems.
Example 2.1.3 Let $Y_1, \ldots, Y_n$ be independent, identically distributed $N_1(\alpha, \sigma^2)$ random variables. By Theorem 2.1.4, $\mathrm{cov}(Y_i, Y_j) = 0$ for $i \ne j$. Furthermore, $E(Y_i) = \alpha$ and $\mathrm{var}(Y_i) = \sigma^2$ for all $i = 1, \ldots, n$. Therefore, the $n \times 1$ random vector $Y = (Y_1, \ldots, Y_n)' \sim N_n(\alpha 1_n, \sigma^2 I_n)$.

Example 2.1.4 Consider the one-way classification described in Example 1.2.10. Let $Y_{ij}$ be a random variable representing the $j$th replicate observation in the $i$th level of the fixed factor for $i = 1, \ldots, t$ and $j = 1, \ldots, r$. Let the $tr \times 1$ random vector $Y = (Y_{11}, \ldots, Y_{1r}, \ldots, Y_{t1}, \ldots, Y_{tr})'$ where the $Y_{ij}$'s are assumed to be independent, normally distributed random variables with $E(Y_{ij}) = \mu_i$ and $\mathrm{var}(Y_{ij}) = \sigma^2$. This experiment can be characterized with the model

$$Y_{ij} = \mu_i + R(T)_{(i)j}$$

where the $R(T)_{(i)j}$ are independent, identically distributed normal random variables with mean 0 and variance $\sigma^2$. The letter $R$ signifies replicates and the letter $T$ signifies the fixed factor or fixed treatments. Therefore, $R(T)$ represents the effect of the random replicates nested in the fixed treatment levels. The parentheses around $T$ identify the nesting. By Theorems 2.1.2 and 2.1.4, $Y \sim N_{tr}(\mu, \Sigma)$ where the $tr \times 1$ mean vector $\mu$ is given by

$$\mu = [E(Y_{11}), \ldots, E(Y_{1r}), \ldots, E(Y_{t1}), \ldots, E(Y_{tr})]' = [\mu_1, \ldots, \mu_1, \ldots, \mu_t, \ldots, \mu_t]' = (\mu_1 1_r', \ldots, \mu_t 1_r')' = (\mu_1, \ldots, \mu_t)' \otimes 1_r$$

and, using Definition 1.3.3, the elements of the $tr \times tr$ covariance matrix $\Sigma$ are

$$\mathrm{cov}(Y_{ij}, Y_{i'j'}) = E[(Y_{ij} - E(Y_{ij}))(Y_{i'j'} - E(Y_{i'j'}))] = E[(\mu_i + R(T)_{(i)j} - \mu_i)(\mu_{i'} + R(T)_{(i')j'} - \mu_{i'})]$$
$$= E[R(T)_{(i)j} R(T)_{(i')j'}] = \begin{cases} \sigma^2 & \text{if } i = i' \text{ and } j = j' \\ 0 & \text{otherwise.} \end{cases}$$

That is, $\Sigma = \sigma^2 I_t \otimes I_r$.

The preceding model is composed of two parts: a fixed portion represented by the $t$ fixed constants $\mu_i$ and a random portion represented by the $tr$ random variables $R(T)_{(i)j}$. In models of this type, each constant in the fixed portion equals
the expected value of observations in particular combinations of the levels of the fixed factor(s). Models whose fixed portions are represented in this way are called mean models. Numerous examples of mean models are presented in this text and a specific discussion on mean models is provided in Chapters 9 and 10.
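The Kronecker-product forms of $\mu$ and $\Sigma$ in Example 2.1.4 are easy to verify numerically; a small NumPy sketch with illustrative values $t = 3$, $r = 2$, and hypothetical treatment means:

```python
import numpy as np

t, r, sigma2 = 3, 2, 2.5
mu_levels = np.array([10.0, 12.0, 9.0])   # mu_1, ..., mu_t (illustrative)

# Mean vector: (mu_1, ..., mu_t)' kron 1_r, so each mu_i repeats r times
mu = np.kron(mu_levels, np.ones(r))

# Covariance: sigma^2 (I_t kron I_r) = sigma^2 I_{tr}
Sigma = sigma2 * np.kron(np.eye(t), np.eye(r))

print(mu)                                          # [10 10 12 12  9  9]
print(np.allclose(Sigma, sigma2 * np.eye(t * r)))  # True
```

The ordering convention matters: `np.kron(mu_levels, np.ones(r))` repeats each mean $r$ times in a row, matching the text's ordering $Y = (Y_{11}, \ldots, Y_{1r}, \ldots, Y_{t1}, \ldots, Y_{tr})'$.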
Example 2.1.5 Consider a two-way cross classification where both factors are random. Let $Y_{ij}$ be a random variable representing the observation in the $i$th level of the first random factor $S$ and the $j$th level of the second random factor $T$ for $i = 1, \ldots, s$ and $j = 1, \ldots, t$. Let the $st \times 1$ random vector $Y = (Y_{11}, \ldots, Y_{1t}, \ldots, Y_{s1}, \ldots, Y_{st})'$. This experiment can be characterized with the model

$$Y_{ij} = \alpha + S_i + T_j + ST_{ij}$$

where $\alpha$ is a constant representing the overall mean; the random variable $S_i$ represents the effect of the $i$th level of the first random factor; the random variable $T_j$ represents the effect of the $j$th level of the second random factor; and the random variable $ST_{ij}$ represents the interaction effect of the $i$th level of factor $S$ and the $j$th level of factor $T$. Furthermore, $S_1, \ldots, S_s$, $T_1, \ldots, T_t$, and $ST_{11}, \ldots, ST_{st}$ are assumed to be independent normal random variables with zero expectations and variances given by $\mathrm{var}(S_i) = \sigma_S^2$, $\mathrm{var}(T_j) = \sigma_T^2$, and $\mathrm{var}(ST_{ij}) = \sigma_{ST}^2$ for $i = 1, \ldots, s$ and $j = 1, \ldots, t$. Therefore, by Theorems 2.1.2 and 2.1.4, $Y \sim N_{st}(\mu, \Sigma)$ where the $st \times 1$ mean vector $\mu$ is given by

$$\mu = [E(Y_{11}), \ldots, E(Y_{1t}), \ldots, E(Y_{s1}), \ldots, E(Y_{st})]' = [\alpha, \ldots, \alpha, \ldots, \alpha, \ldots, \alpha]' = \alpha 1_s \otimes 1_t$$

and the elements of the $st \times st$ covariance matrix $\Sigma$ are

$$\mathrm{cov}(Y_{ij}, Y_{i'j'}) = E[(Y_{ij} - E(Y_{ij}))(Y_{i'j'} - E(Y_{i'j'}))]$$
$$= E[(\alpha + S_i + T_j + ST_{ij} - \alpha)(\alpha + S_{i'} + T_{j'} + ST_{i'j'} - \alpha)]$$
$$= E[S_i S_{i'}] + E[T_j T_{j'}] + E[ST_{ij} ST_{i'j'}]$$
$$= \begin{cases} \sigma_S^2 + \sigma_T^2 + \sigma_{ST}^2 & \text{if } i = i' \text{ and } j = j' \\ \sigma_S^2 & \text{if } i = i' \text{ but } j \ne j' \\ \sigma_T^2 & \text{if } j = j' \text{ but } i \ne i' \\ 0 & \text{otherwise.} \end{cases}$$

That is, $\Sigma = \sigma_S^2 (I_s \otimes J_t) + \sigma_T^2 (J_s \otimes I_t) + \sigma_{ST}^2 (I_s \otimes I_t)$.

Direct derivation of the covariance matrix $\Sigma$ can be difficult even for balanced design structures. In Chapter 4 a simple algorithm is provided for determining
the covariance matrix for complete, balanced designs with any number of fixed or random main effects, interactions, and nested factors.
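The closed form $\Sigma = \sigma_S^2 (I_s \otimes J_t) + \sigma_T^2 (J_s \otimes I_t) + \sigma_{ST}^2 (I_s \otimes I_t)$ from Example 2.1.5 can be checked against the case-by-case covariance rule with a few lines of NumPy (the variance component values are illustrative):

```python
import numpy as np

s, t = 3, 4
sS2, sT2, sST2 = 2.0, 1.5, 0.5           # sigma_S^2, sigma_T^2, sigma_ST^2

Sigma = (sS2 * np.kron(np.eye(s), np.ones((t, t)))
         + sT2 * np.kron(np.ones((s, s)), np.eye(t))
         + sST2 * np.kron(np.eye(s), np.eye(t)))

def cov_rule(i, j, i2, j2):
    """Case-by-case covariance of Y_ij and Y_i'j' from Example 2.1.5."""
    if i == i2 and j == j2:
        return sS2 + sT2 + sST2
    if i == i2:
        return sS2
    if j == j2:
        return sT2
    return 0.0

# With Y ordered (Y_11, ..., Y_1t, ..., Y_s1, ..., Y_st), cell (i, j)
# corresponds to index i*t + j (zero-based)
ok = all(np.isclose(Sigma[i * t + j, i2 * t + j2], cov_rule(i, j, i2, j2))
         for i in range(s) for j in range(t)
         for i2 in range(s) for j2 in range(t))
print(ok)  # True
```

Every one of the $st \times st$ entries built from the Kronecker products agrees with the four-case rule, which is exactly what the algebraic derivation asserts.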
2.2 CONDITIONAL DISTRIBUTIONS OF MULTIVARIATE NORMAL RANDOM VECTORS
In this section conditional multivariate normal distributions are discussed.

Theorem 2.2.1 Let the $n \times 1$ random vector $Y = (Y_1', Y_2')' \sim N_n(\mu, \Sigma)$ where $\mu = (\mu_1', \mu_2')'$ is the $n \times 1$ mean vector,

$$\Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}$$

is the $n \times n$ positive definite covariance matrix, $Y_i$ and $\mu_i$ are $n_i \times 1$ vectors, $\Sigma_{ij}$ is an $n_i \times n_j$ matrix for $i, j = 1, 2$, and $n = n_1 + n_2$. The conditional distribution of the $n_1 \times 1$ random vector $Y_1$ given the $n_2 \times 1$ vector of constants $Y_2 = c_2$ is

$$N_{n_1}\!\left[ \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (c_2 - \mu_2), \; \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} \right].$$

Proof: Define the $n_1 \times 1$ vector $V_1$ and the $n_2 \times 1$ vector $V_2$ as

$$\begin{bmatrix} V_1 \\ V_2 \end{bmatrix} = \begin{bmatrix} Y_1 - \Sigma_{12} \Sigma_{22}^{-1} Y_2 \\ Y_2 \end{bmatrix}.$$

By Theorem 2.1.2 with the $n \times 1$ vector $b = 0_{n \times 1}$ and the $n \times n$ matrix

$$B = \begin{bmatrix} I_{n_1} & -\Sigma_{12} \Sigma_{22}^{-1} \\ 0 & I_{n_2} \end{bmatrix},$$

the $n \times 1$ random vector $(V_1', V_2')' \sim N_n(\mu^*, \Sigma^*)$ where the $n \times 1$ mean vector $\mu^*$ is

$$\mu^* = B \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix} = \begin{bmatrix} \mu_1 - \Sigma_{12} \Sigma_{22}^{-1} \mu_2 \\ \mu_2 \end{bmatrix}$$

and the $n \times n$ covariance matrix $\Sigma^*$ is

$$\Sigma^* = B \Sigma B' = \begin{bmatrix} \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} & 0 \\ 0 & \Sigma_{22} \end{bmatrix}.$$

Thus, $V_1$ and $V_2$ are independent multivariate normal random vectors. That is, the joint distribution of $V_1$ and $V_2$ can be written as the product of the two marginal distributions $f_{V_1, V_2}(v_1, v_2) = f_{V_1}(v_1) f_{V_2}(v_2)$ where $f_{V_1}(v_1)$ is an $N_{n_1}(\mu_1 - \Sigma_{12} \Sigma_{22}^{-1} \mu_2, \; \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21})$ distribution and $f_{V_2}(v_2)$ is an $N_{n_2}(\mu_2, \Sigma_{22})$ distribution. The conditional distribution of $Y_1 \mid Y_2 = c_2$ is derived by utilizing the transformation from $Y_1, Y_2$ to $V_1, V_2$ and noting that the Jacobian of this transformation is 1:

$$f_{Y_1 \mid Y_2 = c_2}(y_1 \mid Y_2 = c_2) = f_{Y_1, Y_2}(y_1, c_2)/f_{Y_2}(c_2) = f_{V_1, V_2}(v_1 + \Sigma_{12} \Sigma_{22}^{-1} c_2, c_2)/f_{V_2}(c_2)$$
$$= f_{V_1}(v_1 + \Sigma_{12} \Sigma_{22}^{-1} c_2) f_{V_2}(c_2)/f_{V_2}(c_2) = f_{V_1}(v_1 + \Sigma_{12} \Sigma_{22}^{-1} c_2).$$

But the distribution of $f_{V_1}(v_1 + \Sigma_{12} \Sigma_{22}^{-1} c_2)$ is the distribution of $V_1$ plus the constant vector $\Sigma_{12} \Sigma_{22}^{-1} c_2$. By Theorem 2.1.2 with $B = I_{n_1}$ and $b = \Sigma_{12} \Sigma_{22}^{-1} c_2$, the proof is complete. $\blacksquare$

Theorem 2.2.1 is applied in the next few examples.
Example 2.2.1 Let $Y = (Y_1, Y_2, Y_3)' \sim N_3(\mu, \Sigma)$ where $\mu = (1, 4, -2)'$ and

$$\Sigma = \begin{bmatrix} 1 & 0.5 & 0.4 \\ 0.5 & 1 & 0.2 \\ 0.4 & 0.2 & 1 \end{bmatrix}.$$

Then by Theorem 2.2.1 the conditional distribution of $Y_1, Y_2 \mid Y_3 = 1$ is $N_2(\mu_c, \Sigma_c)$ where the $2 \times 1$ conditional mean vector $\mu_c$ is given by

$$\mu_c = \begin{bmatrix} 1 \\ 4 \end{bmatrix} + \begin{bmatrix} 0.4 \\ 0.2 \end{bmatrix} (1)^{-1} [1 - (-2)] = \begin{bmatrix} 2.2 \\ 4.6 \end{bmatrix}$$

and the $2 \times 2$ conditional covariance matrix $\Sigma_c$ is given by

$$\Sigma_c = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix} - \begin{bmatrix} 0.4 \\ 0.2 \end{bmatrix} (1)^{-1} [0.4, 0.2] = \begin{bmatrix} 0.84 & 0.42 \\ 0.42 & 0.96 \end{bmatrix}.$$
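The arithmetic in Example 2.2.1 is a direct instance of Theorem 2.2.1 and is easy to reproduce. A NumPy sketch of the general conditioning formula (the helper function is an illustration, not from the text), checked against the example's numbers:

```python
import numpy as np

def conditional_mvn(mu, Sigma, idx1, idx2, c2):
    """Mean and covariance of Y[idx1] given Y[idx2] = c2 (Theorem 2.2.1)."""
    mu1, mu2 = mu[idx1], mu[idx2]
    S11 = Sigma[np.ix_(idx1, idx1)]
    S12 = Sigma[np.ix_(idx1, idx2)]
    S22 = Sigma[np.ix_(idx2, idx2)]
    W = S12 @ np.linalg.inv(S22)          # Sigma_12 Sigma_22^{-1}
    return mu1 + W @ (c2 - mu2), S11 - W @ S12.T

mu = np.array([1.0, 4.0, -2.0])
Sigma = np.array([[1.0, 0.5, 0.4],
                  [0.5, 1.0, 0.2],
                  [0.4, 0.2, 1.0]])

mc, Sc = conditional_mvn(mu, Sigma, [0, 1], [2], np.array([1.0]))
print(mc)   # [2.2 4.6]
print(Sc)   # [[0.84 0.42]
            #  [0.42 0.96]]
```

The computed conditional mean $(2.2, 4.6)'$ and covariance match the hand calculation above.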
Example 2.2.2 Use the distribution of $Y = (Y_1, Y_2, Y_3)'$ from Example 2.2.1 to find the conditional distribution of $2Y_1 + Y_2 \mid Y_1 + 2Y_2 + 3Y_3 = 2$. First, let the $2 \times 1$ vector $b = 0_{2 \times 1}$ and let the $2 \times 3$ matrix

$$B = \begin{bmatrix} 2 & 1 & 0 \\ 1 & 2 & 3 \end{bmatrix}.$$

By Theorem 2.1.2, the joint distribution of

$$BY = \begin{bmatrix} 2Y_1 + Y_2 \\ Y_1 + 2Y_2 + 3Y_3 \end{bmatrix}$$

is $N_2(\mu^*, \Sigma^*)$ where the $2 \times 1$ mean vector $\mu^*$ is

$$\mu^* = B\mu = \begin{bmatrix} 2(1) + 1(4) + 0(-2) \\ 1(1) + 2(4) + 3(-2) \end{bmatrix} = \begin{bmatrix} 6 \\ 3 \end{bmatrix}$$

and the $2 \times 2$ covariance matrix $\Sigma^*$ is

$$\Sigma^* = B \Sigma B' = \begin{bmatrix} 2 & 1 & 0 \\ 1 & 2 & 3 \end{bmatrix} \begin{bmatrix} 1 & 0.5 & 0.4 \\ 0.5 & 1 & 0.2 \\ 0.4 & 0.2 & 1 \end{bmatrix} \begin{bmatrix} 2 & 1 \\ 1 & 2 \\ 0 & 3 \end{bmatrix} = \begin{bmatrix} 7 & 9.5 \\ 9.5 & 20.8 \end{bmatrix}$$
where $\mu$ and $\Sigma$ are given in Example 2.2.1. Applying Theorem 2.2.1 to the distribution of $BY$, the conditional distribution of $2Y_1 + Y_2 \mid Y_1 + 2Y_2 + 3Y_3 = 2$ is $N_1(\mu_c, \sigma_c^2)$ where the conditional mean $\mu_c$ is

$$\mu_c = 6 + 9.5 (20.8)^{-1} (2 - 3) = 5.54$$

and the conditional variance $\sigma_c^2$ is given by

$$\sigma_c^2 = 7 - 9.5 (20.8)^{-1} 9.5 = 2.66.$$

Example 2.2.3 Let the $n \times 1$ random vector $Y = (Y_1, \ldots, Y_n)' \sim N_n(\alpha 1_n, \sigma^2 I_n)$. Find the conditional distribution of $Y_1, \ldots, Y_{n-1} \mid \bar Y = \bar y$. By Theorem 2.1.2 with $n \times 1$ vector $b = 0_{n \times 1}$ and $n \times n$ matrix

$$B = \begin{bmatrix} I_{n-1} & 0 \\ (1/n) 1_{n-1}' & 1/n \end{bmatrix},$$

the $n \times 1$ random vector $BY = (Y_1, \ldots, Y_{n-1}, \bar Y)' \sim N_n(\mu^*, \Sigma^*)$ where the $n \times 1$ mean vector $\mu^*$ is

$$\mu^* = B(\alpha 1_n) = \alpha 1_n$$

and the $n \times n$ covariance matrix $\Sigma^*$ is

$$\Sigma^* = B(\sigma^2 I_n) B' = \sigma^2 \begin{bmatrix} I_{n-1} & (1/n) 1_{n-1} \\ (1/n) 1_{n-1}' & 1/n \end{bmatrix}.$$

Applying Theorem 2.2.1 to the distribution of $BY$, the conditional distribution of $Y_1, \ldots, Y_{n-1} \mid \bar Y = \bar y$ is $N_{n-1}(\mu_c, \Sigma_c)$ where the $(n-1) \times 1$ conditional mean vector $\mu_c$ is

$$\mu_c = \alpha 1_{n-1} + (1/n) 1_{n-1} (1/n)^{-1} (\bar y - \alpha) = \bar y \, 1_{n-1}$$

and the $(n-1) \times (n-1)$ conditional covariance matrix $\Sigma_c$ is

$$\Sigma_c = \sigma^2 \left[ I_{n-1} - (1/n) 1_{n-1} (1/n)^{-1} (1/n) 1_{n-1}' \right] = \sigma^2 \left[ I_{n-1} - \frac{1}{n} J_{n-1} \right].$$

Applying Theorem 2.1.2 [with scalar $b = 0$ and $1 \times (n-1)$ matrix $B = (1, 0, \ldots, 0)$] to the conditional distribution of $Y_1, \ldots, Y_{n-1} \mid \bar Y = \bar y$, the conditional distribution of $Y_1 \mid \bar Y = \bar y$ is $N_1(\alpha_c, \sigma_c^2)$ where the conditional mean $\alpha_c$ is

$$\alpha_c = B \mu_c = (1, 0, \ldots, 0)(\bar y \, 1_{n-1}) = \bar y$$

and the conditional variance $\sigma_c^2$ is

$$\sigma_c^2 = B \Sigma_c B' = (1, 0, \ldots, 0) \, \sigma^2 \left[ I_{n-1} - \frac{1}{n} J_{n-1} \right] (1, 0, \ldots, 0)' = \sigma^2 (n-1)/n.$$

The distribution theory of quadratic forms is presented in Chapter 3. However, a number of interesting quadratic form problems are solved in the next section before the general distribution theory is developed.
2.3 DISTRIBUTIONS OF CERTAIN QUADRATIC FORMS

The distributions of several quadratic forms are derived in the following examples.
Example 2.3.1 Let the $n \times 1$ random vector $Y = (Y_1, \ldots, Y_n)' \sim N_n(\alpha 1_n, \sigma^2 I_n)$. Define $U = \sum_{i=1}^n (Y_i - \bar Y)^2/\sigma^2$ and $V = n(\bar Y - \alpha)^2/\sigma^2$ where $\bar Y = (1/n) 1_n' Y$. Find the distributions of $U$ and $V$ and show these two random variables are independent. First, note $\sqrt{n}(\bar Y - \alpha)/\sigma = [1/(\sigma\sqrt{n})] 1_n' Y - (\alpha\sqrt{n})/\sigma$. By Theorem 2.1.2 with $1 \times n$ matrix $B = [1/(\sigma\sqrt{n})] 1_n'$ and scalar $b = -(\alpha\sqrt{n})/\sigma$, $\sqrt{n}(\bar Y - \alpha)/\sigma \sim N_1(0, 1)$ since

$$B(\alpha 1_n) + b = [\alpha/(\sigma\sqrt{n})] 1_n' 1_n - \alpha\sqrt{n}/\sigma = 0$$

and

$$B(\sigma^2 I_n) B' = \sigma^2 \{[1/(\sigma\sqrt{n})] 1_n'\} \{[1/(\sigma\sqrt{n})] 1_n'\}' = 1.$$
Therefore, by Example 1.3.2, $n(\bar Y - \alpha)^2/\sigma^2$ is distributed as a central chi-square random variable with 1 degree of freedom. Next, rewrite $U$ as

$$U = \sum_{i=1}^n (Y_i - \bar Y)^2/\sigma^2 = (1/\sigma^2) \, Y' \left( I_n - \frac{1}{n} J_n \right) Y = Y'[(1/\sigma) P_n][(1/\sigma) P_n']Y = X'X = \sum_{i=1}^{n-1} X_i^2$$

where the $(n-1) \times 1$ vector $X = [(1/\sigma) P_n'] Y = (X_1, \ldots, X_{n-1})'$ and the $(n-1) \times n$ matrix $P_n'$ is the lower portion of an $n$-dimensional Helmert matrix with $P_n P_n' = (I_n - \frac{1}{n} J_n)$, $P_n' P_n = I_{n-1}$, and $1_n' P_n = 0_{1 \times (n-1)}$. By Theorem 2.1.2 with $(n-1) \times n$ matrix $B = (1/\sigma) P_n'$ and $(n-1) \times 1$ vector $b = 0_{(n-1) \times 1}$, $X \sim N_{n-1}(0, I_{n-1})$ since

$$B(\alpha 1_n) = (1/\sigma) P_n'(\alpha 1_n) = 0_{(n-1) \times 1}$$

and

$$B(\sigma^2 I_n) B' = (1/\sigma) P_n'(\sigma^2 I_n)[(1/\sigma) P_n']' = I_{n-1}.$$

Therefore, $U$ equals the sum of squares of $n-1$ independent $N_1(0, 1)$ random variables. By Example 1.3.2, $U$ has a central chi-square distribution with $n-1$ degrees of freedom. Finally, by Theorem 2.1.2 with $n \times n$ matrix $B$ and $n \times 1$ vector $b$ given by

$$B = \begin{bmatrix} [1/(\sigma\sqrt{n})] 1_n' \\ (1/\sigma) P_n' \end{bmatrix} \quad \text{and} \quad b = \begin{bmatrix} -\sqrt{n}\alpha/\sigma \\ 0_{(n-1) \times 1} \end{bmatrix},$$

the $n \times 1$ vector $(\sqrt{n}(\bar Y - \alpha)/\sigma, X')' = BY + b \sim N_n(0, I_n)$ since

$$B(\alpha 1_n) + b = \begin{bmatrix} [\alpha/(\sigma\sqrt{n})] 1_n' 1_n - \sqrt{n}\alpha/\sigma \\ (\alpha/\sigma) P_n' 1_n + 0_{(n-1) \times 1} \end{bmatrix} = 0_{n \times 1}$$

and

$$B(\sigma^2 I_n) B' = \begin{bmatrix} [1/(\sigma\sqrt{n})] 1_n' \\ (1/\sigma) P_n' \end{bmatrix} (\sigma^2 I_n) \left[ [1/(\sigma\sqrt{n})] 1_n \mid (1/\sigma) P_n \right] = I_n.$$

By Theorem 2.1.4, $\sqrt{n}(\bar Y - \alpha)/\sigma$ and $X$ are independent. Therefore, by Theorem 1.3.5, $U$ and $V$ are independent.
[Figure 2.3.1 Two-Way Layout: an $s \times t$ table with rows indexed by the levels of factor $S$ ($i = 1, \ldots, s$), columns indexed by the levels of factor $T$ ($j = 1, \ldots, t$), and entry $Y_{ij}$ in cell $(i, j)$.]
Example 2.3.2 Consider the two-way cross classification described in Example 2.1.5 where the $st \times 1$ random vector

$$Y = (Y_{11}, \ldots, Y_{1t}, \ldots, Y_{s1}, \ldots, Y_{st})' \sim N_{st}(\alpha 1_s \otimes 1_t, \; \sigma_S^2 I_s \otimes J_t + \sigma_T^2 J_s \otimes I_t + \sigma_{ST}^2 I_s \otimes I_t).$$

The layout for this experiment is given in Figure 2.3.1 and the ANOVA table is provided in Table 2.3.1, where the sums of squares are written in summation notation and as quadratic forms, $Y' A_m Y$, for $m = 1, \ldots, 5$.

Table 2.3.1 Two-Way ANOVA Table

| Source | df | SS |
|---|---|---|
| Mean | $1$ | $\sum_{i=1}^s \sum_{j=1}^t \bar Y_{..}^2 = Y' A_1 Y$ |
| Factor S | $s - 1$ | $\sum_{i=1}^s \sum_{j=1}^t (\bar Y_{i.} - \bar Y_{..})^2 = Y' A_2 Y$ |
| Factor T | $t - 1$ | $\sum_{i=1}^s \sum_{j=1}^t (\bar Y_{.j} - \bar Y_{..})^2 = Y' A_3 Y$ |
| Interaction ST | $(s-1)(t-1)$ | $\sum_{i=1}^s \sum_{j=1}^t (Y_{ij} - \bar Y_{i.} - \bar Y_{.j} + \bar Y_{..})^2 = Y' A_4 Y$ |
| Total | $st$ | $\sum_{i=1}^s \sum_{j=1}^t Y_{ij}^2 = Y' A_5 Y$ |

The matrices $A_1$, $A_2$, and $A_5$ for the sums of squares due to the mean, random factor $S$, and the total, respectively, were already derived in Example 1.2.10. Note that $(\bar Y_{.1}, \ldots, \bar Y_{.t})' = [(1/s) 1_s' \otimes I_t] Y$. Therefore, the sum of squares due to the random factor $T$ is

$$\sum_{i=1}^s \sum_{j=1}^t (\bar Y_{.j} - \bar Y_{..})^2 = s \sum_{j=1}^t \bar Y_{.j}^2 - st \bar Y_{..}^2$$
$$= s \{[(1/s) 1_s' \otimes I_t] Y\}' \{[(1/s) 1_s' \otimes I_t] Y\} - Y' \left[ \frac{1}{s} J_s \otimes \frac{1}{t} J_t \right] Y$$
$$= Y' \left[ \frac{1}{s} J_s \otimes \left( I_t - \frac{1}{t} J_t \right) \right] Y = Y' A_3 Y$$

where the $st \times st$ matrix $A_3 = [\frac{1}{s} J_s \otimes (I_t - \frac{1}{t} J_t)]$. The matrix $A_4$ for the sum of squares due to the random interaction can be obtained by subtraction, $A_4 = A_5 - A_1 - A_2 - A_3$; that is,

$$A_4 = [I_s \otimes I_t] - \left[ \frac{1}{s} J_s \otimes \frac{1}{t} J_t \right] - \left[ \left( I_s - \frac{1}{s} J_s \right) \otimes \frac{1}{t} J_t \right] - \left[ \frac{1}{s} J_s \otimes \left( I_t - \frac{1}{t} J_t \right) \right]$$
$$= [I_s \otimes I_t] - \left[ I_s \otimes \frac{1}{t} J_t \right] - \left[ \frac{1}{s} J_s \otimes I_t \right] + \left[ \frac{1}{s} J_s \otimes \frac{1}{t} J_t \right]$$
$$= \left( I_s - \frac{1}{s} J_s \right) \otimes \left( I_t - \frac{1}{t} J_t \right).$$

To find the distribution of $Y' A_2 Y$, note that $A_2$ is an idempotent matrix of rank $s - 1$. Thus, $Y' A_2 Y = Y' P P' Y = X'X$ where the $(s-1) \times 1$ vector $X = (X_1, \ldots, X_{s-1})' = P'Y$, the $st \times (s-1)$ matrix $P = P_s \otimes (1/\sqrt{t}) 1_t$, and the $(s-1) \times s$ matrix $P_s'$ is the lower portion of an $s$-dimensional Helmert matrix with $P_s P_s' = (I_s - \frac{1}{s} J_s)$, $1_s' P_s = 0_{1 \times (s-1)}$, and $P_s' P_s = I_{s-1}$. By Theorem 2.1.2 with $(s-1) \times st$ matrix $B = P'$ and $(s-1) \times 1$ vector $b = 0_{(s-1) \times 1}$, $X \sim N_{s-1}(0, (\sigma_{ST}^2 + t \sigma_S^2) I_{s-1})$ since

$$B(\alpha 1_s \otimes 1_t) + b = [P_s \otimes (1/\sqrt{t}) 1_t]'(\alpha 1_s \otimes 1_t) = 0_{(s-1) \times 1}$$

and

$$B[\sigma_S^2 I_s \otimes J_t + \sigma_T^2 J_s \otimes I_t + \sigma_{ST}^2 I_s \otimes I_t] B'$$
$$= \sigma_S^2 [P_s' P_s \otimes (1/\sqrt{t}) 1_t' J_t ((1/\sqrt{t}) 1_t)] + \sigma_T^2 [P_s' J_s P_s \otimes (1/\sqrt{t}) 1_t'((1/\sqrt{t}) 1_t)] + \sigma_{ST}^2 [P_s' P_s \otimes (1/\sqrt{t}) 1_t'((1/\sqrt{t}) 1_t)]$$
$$= (\sigma_{ST}^2 + t \sigma_S^2) I_{s-1}.$$

Therefore, $[1/\sqrt{\sigma_{ST}^2 + t \sigma_S^2}\,] X \sim N_{s-1}(0, I_{s-1})$. From Example 1.3.2, $Y' A_2 Y/(\sigma_{ST}^2 + t \sigma_S^2) = X'X/(\sigma_{ST}^2 + t \sigma_S^2)$ has a central chi-square distribution with $s - 1$
degrees of freedom. The distributions of $Y' A_3 Y$ and $Y' A_4 Y$ and the independence of $Y' A_2 Y$, $Y' A_3 Y$, and $Y' A_4 Y$ are left to the reader.

The preceding examples suggest the following general technique for finding the distribution of the quadratic form $Y'AY$ when $Y \sim N_n(\mu, \Sigma)$ and $A$ is an $n \times n$ idempotent matrix of rank $r$:

1. Set $A = PP'$ where $P$ is an $n \times r$ matrix of eigenvectors corresponding to the $r$ eigenvalues of $A$ equal to 1.
2. Let $Y'AY = Y'PP'Y = X'X = \sum_{i=1}^r X_i^2$ where the $r \times 1$ vector $X = (X_1, \ldots, X_r)' = P'Y$. Then find the distribution of $X$.
3. Find the distribution of $Y'AY$ using the distribution of $X$.

This technique is used in the next chapter to prove some general theorems about the distributions of quadratic forms.
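The three-step technique can be sketched numerically: factor an idempotent $A$ as $PP'$ from its eigendecomposition and confirm the quadratic-form identity (the centering matrix serves as the illustrative $A$):

```python
import numpy as np

n = 5
A = np.eye(n) - np.ones((n, n)) / n          # idempotent, rank n - 1

# Step 1: A = P P', with P holding the eigenvectors for the unit eigenvalues
vals, vecs = np.linalg.eigh(A)
P = vecs[:, np.isclose(vals, 1.0)]           # n x r
r = P.shape[1]

# Step 2: X = P'Y turns Y'AY into a sum of r squares
y = np.array([1.0, -2.0, 0.5, 3.0, 2.5])
x = P.T @ y

print(r)                                     # 4
print(np.allclose(P @ P.T, A))               # A = P P'
print(np.allclose(P.T @ P, np.eye(r)))       # P'P = I_r
print(np.isclose(x @ x, y @ A @ y))          # Y'AY = X'X
```

Step 3 is then the distributional argument of the text: since $P'P = I_r$, the vector $X = P'Y$ inherits a spherical normal distribution whenever $Y$ does, so $Y'AY = X'X$ is a sum of $r$ independent squared normals.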
EXERCISES

1. Let the $2 \times 1$ random vector $Y \sim N_2(\mu, \Sigma)$ where $\mu = (0, 0)'$ and

$$\Sigma = \begin{bmatrix} 1 & 0.3 \\ 0.3 & 1 \end{bmatrix}.$$

Find a $2 \times 2$ triangular matrix $T$ such that $TY \sim N_2(0, I_2)$.

2. Let the $6 \times 1$ random vector $Y = (Y_1, Y_2, Y_3, Y_4, Y_5, Y_6)' \sim N_6(\mu, \Sigma)$ where $\mu = (\mu_1 1_3', \mu_2 1_2', \mu_3)'$ and

$$\Sigma = \begin{bmatrix} 0.5 I_3 + 0.5 J_3 & 0 & 0 \\ 0 & 0.3 I_2 + 0.7 J_2 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

Find the distribution of the $3 \times 1$ random vector $\bar Y = (\bar Y_1, \bar Y_2, \bar Y_3)'$ where $\bar Y_1 = \frac{1}{3}(Y_1 + Y_2 + Y_3)$, $\bar Y_2 = \frac{1}{2}(Y_4 + Y_5)$, and $\bar Y_3 = Y_6$.

3. Let the $n \times 1$ random vector $Y = (Y_1, \ldots, Y_n)' \sim N_n(0, \Sigma)$ where the $n \times n$ covariance matrix $\Sigma = \mathrm{diag}(w_1^{-1}, \ldots, w_n^{-1})$ for positive constants $w_1, \ldots, w_n$, and let

$$Y^* = \sum_{i=1}^n w_i Y_i \Big/ \Big[ \sum_{i=1}^n w_i \Big]^{1/2}.$$
(a) Find the distribution of $Y^*$. Rewrite $Y^*$ in terms of the $n \times 1$ vector $W = (w_1, \ldots, w_n)'$ and the $n \times 1$ vector $Y$.
(b) Find the distribution of $\sum_{i=1}^n w_i Y_i^2$.

4. Let the $9 \times 1$ random vector $Y \sim N_9(\mu, \Sigma)$ where the mean vector $\mu = (\mu_1 1_2', \mu_2 1_4', \mu_3 1_3')'$ and covariance matrix

$$\Sigma = \begin{bmatrix} \Sigma_1 & 0 & 0 \\ 0 & \Sigma_2 & 0 \\ 0 & 0 & \Sigma_3 \end{bmatrix}$$

with $\Sigma_i = (a - b) I_{n_i} + b J_{n_i}$ for $i = 1, 2, 3$ and $n_1 = 2$, $n_2 = 4$, and $n_3 = 3$. Also let

$$A = \begin{bmatrix} A_1 & 0 & 0 \\ 0 & A_2 & 0 \\ 0 & 0 & A_3 \end{bmatrix} \quad \text{where } A_i = I_{n_i} - \frac{1}{n_i} J_{n_i} \text{ for } i = 1, 2, 3.$$
(a) Define a $9 \times 6$ matrix $P$ such that $A = PP'$.
(b) Find the distribution of $P'Y$.
(c) Find the distribution of $Y'AY$.

5. Let $Y_{ijk} = \mu_i + S_{ij} + T_{ijk}$ for $i = 1, \ldots, a$; $j = 1, \ldots, s$; and $k = 1, \ldots, t$, where the $S_{ij} \sim \text{iid } N_1(0, \sigma_S^2)$ are independent of the $T_{ijk} \sim \text{iid } N_1(0, \sigma_T^2)$.

(a) Find the distribution of the $a \times 1$ random vector $(\bar Y_{1..}, \ldots, \bar Y_{a..})'$ where $\bar Y_{i..} = \sum_{j=1}^s \sum_{k=1}^t Y_{ijk}/(st)$.
(b) Find the distribution of the $as \times 1$ random vector $(\bar Y_{11.} - \bar Y_{1..}, \ldots, \bar Y_{1s.} - \bar Y_{1..}, \ldots, \bar Y_{a1.} - \bar Y_{a..}, \ldots, \bar Y_{as.} - \bar Y_{a..})'$ where $\bar Y_{ij.} = \sum_{k=1}^t Y_{ijk}/t$.

6. Let the $n \times 1$ random vector $Y = (Y_1, \ldots, Y_n)' \sim N_n(\mu, \Sigma)$ where $\mu = \alpha 1_n$ and $\Sigma = (a - b) I_n + b J_n$. Let $\bar Y_. = \sum_{i=1}^n Y_i/n$ and $U = \sum_{i=1}^n (Y_i - \bar Y_.)^2$.

(a) Find the distribution of $\bar Y_.$.
(b) Find the distribution of $U$.
(c) Are $\bar Y_.$ and $U$ independent? Prove your answer.
7. Let the $3 \times 1$ random vector $Y = (Y_1, Y_2, Y_3)' \sim N_3(\mu, \Sigma)$ where $\mu = (0, 1, 2)'$ and

$$\Sigma = \begin{bmatrix} 1 & 1 & 2 \\ 1 & 4 & 3 \\ 2 & 3 & 9 \end{bmatrix}.$$

Let $Z = (2, -1, -1) Y$. Find a random variable $U = b'Y$ where $b = (b_1, b_2, b_3)'$ such that $E(U) = 0$ and $U$ is independent of $Z$.

8. Let the $9 \times 1$ random vector $Y = (Y_{11}, Y_{12}, Y_{21}, Y_{22}, Y_{23}, Y_{31}, Y_{32}, Y_{33}, Y_{34})' \sim N_9(\mu, \sigma^2 I_9)$ where $\mu = (\mu_1 1_2', \mu_2 1_3', \mu_3 1_4')'$. Let $\bar Y_{i.} = \sum_{j=1}^{r_i} Y_{ij}/r_i$ for $r_1 = 2$, $r_2 = 3$, and $r_3 = 4$.

(a) Find a linear combination of $\bar Y_{1.}$, $\bar Y_{2.}$, and $\bar Y_{3.}$ that is independent of $\bar Y_{1.} - \bar Y_{2.}$. Prove your result.
(b) What is the distribution of the linear combination you found in part (a)?

9. Let the $(n_1 + n_2) \times 1$ random vector $Y = (Y_{11}, \ldots, Y_{1n_1}, Y_{21}, \ldots, Y_{2n_2})' \sim N_{n_1 + n_2}(\alpha 1_{n_1 + n_2}, \sigma^2 I_{n_1 + n_2})$.

(a) Find the distribution of $\bar Y_{1.} - \bar Y_{2.}$ where $\bar Y_{i.} = \sum_{j=1}^{n_i} Y_{ij}/n_i$ for $i = 1, 2$.
(b) Find the distribution of $(\bar Y_{1.} - \bar Y_{2.})^2$.

10. Let $Y = X\beta + E$ where $Y$ is an $n \times 1$ random vector, $X$ is an $n \times p$ matrix of known constants, $\beta$ is a $p \times 1$ vector of unknown constants, and $E$ is an $n \times 1$ random vector. Assume $E \sim N_n(0, \sigma^2 I_n)$.

(a) Find the distribution of $\hat\beta = (X'X)^{-1} X' Y$.
(b) Find the rank of $\mathrm{cov}(\hat Y)$ where $\hat Y = X\hat\beta$.

11. Let the $3 \times 1$ random vector $Y = (Y_1, Y_2, Y_3)' \sim N_3(0, \sigma^2 I_3)$.

(a) Find the distribution of $Y_1 + 2Y_2 - Y_3$.
(b) Show that $Y_1 + 2Y_2 - Y_3$ is independent of $2Y_1^2 + Y_2^2 + 2Y_3^2 - 2Y_1 Y_2 + 2Y_2 Y_3$.
(c) Find a constant $c$ such that $c[5Y_1^2 + 2Y_2^2 + 5Y_3^2 - 4Y_1 Y_2 + 2Y_1 Y_3 + 4Y_2 Y_3]$ has a central chi-square distribution.

12. Let the $4 \times 1$ random vector $Y = (Y_1, Y_2, Y_3, Y_4)' \sim N_4(\mu, \Sigma)$ where $\mu = (2, 3, 0, -1)'$ and

$$\Sigma = \begin{bmatrix} 11 & 5 & 1 & -1 \\ 5 & 11 & -1 & 1 \\ 1 & -1 & 11 & 5 \\ -1 & 1 & 5 & 11 \end{bmatrix}.$$
(a) Find the conditional distribution of $Y_1 + Y_2 \mid Y_3 + Y_4 = 1$.
(b) Show that the conditional mean of $Y_3 \mid Y_1 = y_1, Y_2 = y_2$ has the form $\beta_0 + \beta_1 y_1 + \beta_2 y_2$ and find the values of the $\beta$'s.

13. Let the $3 \times 1$ random vector $Y = (Y_1, Y_2, Y_3)' \sim N_3(\mu, \Sigma)$ where $\mu = (1, 2, -2)'$ and

$$\Sigma = \begin{bmatrix} 2 & 0 & -1 \\ 0 & 3 & 1 \\ -1 & 1 & 4 \end{bmatrix}.$$

Find the conditional distribution of $Y_1 + Y_2 + Y_3 \mid Y_2 + Y_3 = 1$.

14. Let the $4 \times 1$ random vector $Y = (Y_1, Y_2, Y_3, Y_4)' \sim N_4(\mu, \Sigma)$ where $\mu = (1, 0, 2, -1)'$ and

$$\Sigma = \begin{bmatrix} 4 & 1 & 1 & 2 \\ 1 & 1 & 0 & 1 \\ 1 & 0 & 3 & 1 \\ 2 & 1 & 1 & 4 \end{bmatrix}.$$
(a) Find the marginal distribution of $(Y_1, Y_3)'$.
(b) Find the partial (i.e., conditional) correlation of $Y_1$ and $Y_2$ given $Y_3 = 2$.
(c) Find the distribution of $4Y_1 - Y_2 - 2$.
(d) Find a normally distributed random variable $Z$ which is a (nontrivial) function of $Y_1$ and $Y_2$ and is independent of $4Y_1 - Y_2 - 2$.

15. Let the $n \times 1$ random vector $Y = (Y_1, \ldots, Y_n)' \sim N_n(\alpha 1_n, \sigma^2 I_n)$ for $n \ge 3$.

(a) Find the conditional expectation of $\frac{1}{2}(Y_2 + Y_3) \mid \bar Y$ where $\bar Y = \sum_{i=1}^n Y_i/n$; that is, find $E[\frac{1}{2}(Y_2 + Y_3) \mid \bar Y]$.
(b) Show that the variance of $E[\frac{1}{2}(Y_2 + Y_3) \mid \bar Y]$ is smaller than the variance of $\frac{1}{2}(Y_2 + Y_3)$.

16. Let $F$ and $G$ be independent normal random variables with 0 means and variances $\sigma^2$ and $1 - \sigma^2$, respectively, $0 < \sigma^2 < 1$. Find the conditional distribution of $F$ given $F + G = c$.

17. Let the $3 \times 1$ random vector $Y = (Y_1, Y_2, Y_3)' \sim N_3(\mu, \Sigma)$. Show that the variance of $Y_1$ in the conditional distribution of $Y_1$ given $Y_2$ is greater than or equal to the variance of $Y_1$ in the conditional distribution of $Y_1$ given $Y_2$ and $Y_3$.
3

Distributions of Quadratic Forms

The distribution of the quadratic form $Y'AY$ is first derived when $Y \sim N_n(0, I_n)$. Later, the distribution of $Y'AY$ is developed when $Y \sim N_n(\mu, \Sigma)$ for any positive definite $n \times n$ matrix $\Sigma$.
3.1 QUADRATIC FORMS OF NORMAL RANDOM VECTORS

A chi-square random variable with $n$ degrees of freedom and noncentrality parameter $\lambda$ will be designated by $\chi_n^2(\lambda)$. Therefore, a central chi-square random variable with $n$ degrees of freedom is denoted by $\chi_n^2(\lambda = 0)$ or $\chi_n^2(0)$.
Theorem 3.1.1 Let the n × 1 random vector Y ~ Nn(0, In). Then Y'AY ~ χ²_p(λ = 0) if and only if A is an n × n idempotent matrix of rank p.

Proof: First assume A is an n × n idempotent matrix of rank p. By Theorem 1.1.10, A = PP' where P is an n × p matrix of eigenvectors with P'P = Ip. Let the p × 1 random vector X = P'Y. By Theorem 2.1.2 with p × n matrix B = P' and p × 1 vector b = 0_{p×1}, X = (X1, ..., Xp)' ~ Np(0, Ip). Therefore, by Example 1.3.2, Y'AY = Y'PP'Y = X'X = ∑_{i=1}^p Xi² ~ χ²_p(λ = 0). Next assume that Y'AY ~ χ²_p(λ = 0). Therefore, the moment generating function of Y'AY is (1 - 2t)^{-p/2}. But the moment generating function of Y'AY is also defined as

m_{Y'AY}(t) = E[e^{tY'AY}]
            = ∫···∫ (2π)^{-n/2} e^{tY'AY - Y'Y/2} dy1 ··· dyn
            = ∫···∫ (2π)^{-n/2} e^{-Y'(In - 2tA)Y/2} dy1 ··· dyn
            = |In - 2tA|^{-1/2}.

The final equality holds since the last integral equation is the integral of a multivariate normal distribution (without the Jacobian |In - 2tA|^{1/2}) with mean vector 0_{n×1} and covariance matrix (In - 2tA)^{-1}. The two forms of the moment generating function must be equal for all t in some neighborhood of zero. Therefore,

(1 - 2t)^{-p/2} = |In - 2tA|^{-1/2}

or

(1 - 2t)^p = |In - 2tA|.

Let Q be the n × n matrix of eigenvectors and D be the n × n diagonal matrix of eigenvalues of A, where the eigenvalues are given by λ1, ..., λn. By Theorem 1.1.3, Q'AQ = D, Q'Q = In, and

|In - 2tA| = (|Q'Q|)(|In - 2tA|) = |Q'(In - 2tA)Q| = |In - 2tQ'AQ| = |In - 2tD| = ∏_{i=1}^n (1 - 2tλi).

Therefore, (1 - 2t)^p = ∏_{i=1}^n (1 - 2tλi). The left side of the equation is a polynomial in 2t with highest power p. The right side of the equation therefore must have highest power p in 2t also, implying that (n - p) of the λi's are zero. Thus, the equation becomes (1 - 2t)^p = ∏_{i=1}^p (1 - 2tλi). Taking logarithms of each side and equating coefficients produces 1 - 2t = 1 - 2tλi for i = 1, ..., p. The solution to these equations is λ1 = ··· = λp = 1. Therefore, by Theorem 1.1.8, A is an idempotent matrix. ∎
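As a numerical aside (not part of the original text), the determinant identity at the heart of the proof is easy to check: A = PP' built from an orthonormal P is idempotent of rank p, and |In - 2tA| then equals (1 - 2t)^p. A minimal numpy sketch, with n, p chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 6, 3

# Build an n x p matrix P with orthonormal columns via QR, so that
# A = PP' is symmetric and idempotent of rank p (cf. Theorem 1.1.10).
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
P = Q[:, :p]
A = P @ P.T

assert np.allclose(A @ A, A)                # idempotent
assert np.linalg.matrix_rank(A) == p        # rank p

# Verify |I_n - 2tA| = (1 - 2t)^p, the key step in the proof.
for t in [0.0, 0.1, 0.2, 0.4]:
    lhs = np.linalg.det(np.eye(n) - 2 * t * A)
    assert np.isclose(lhs, (1 - 2 * t) ** p)
print("determinant identity verified")
```

Because the eigenvalues of an idempotent A are p ones and n - p zeros, the determinant factors exactly as the proof requires.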
Thus far we have concentrated on central chi-square random variables (i.e., λ = 0). However, in general, if the n × 1 random vector Y ~ Nn(μ, In), then Y'AY ~ χ²_p(λ) where μ is any n × 1 mean vector and the noncentrality parameter is given by λ = μ'Aμ/2. The next theorem considers the distribution of Y'AY when Y ~ Nn(μ, Σ) and Σ is a positive definite matrix of rank n.
Theorem 3.1.2 Let the n × 1 random vector Y ~ Nn(μ, Σ) where Σ is an n × n positive definite matrix of rank n. Then Y'AY ~ χ²_p(λ = μ'Aμ/2) if and only if either of the following conditions is satisfied: (1) AΣ (or ΣA) is an idempotent matrix of rank p, or (2) AΣA = A and A has rank p.

Proof: Let Z = T^{-1}(Y - μ) where Σ = TT'. By Theorem 2.1.2 with n × n matrix B = T^{-1} and n × 1 vector b = -T^{-1}μ, Z ~ Nn(0, In). Furthermore, Y'AY = (TZ + μ)'A(TZ + μ) = (Z + T^{-1}μ)'T'AT(Z + T^{-1}μ) = V'RV where V = Z + T^{-1}μ and R = T'AT. By Theorem 2.1.2 with n × n matrix B = In and n × 1 vector b = T^{-1}μ, V ~ Nn(T^{-1}μ, In). By Theorem 3.1.1, V'RV ~ χ²_p(λ) if and only if R is idempotent of rank p. But R is idempotent if and only if (T'AT)(T'AT) = T'AT, or equivalently AΣ = AΣAΣ; that is, AΣ is idempotent. Therefore, Y'AY ~ χ²_p(λ) if and only if AΣ is idempotent. Finally, p = rank(R) = rank(T'AT) = rank(ATT') = rank(AΣ) since T is nonsingular. Also, λ = (T^{-1}μ)'R(T^{-1}μ)/2 = μ'(T^{-1})'T'ATT^{-1}μ/2 = μ'Aμ/2. The proofs for ΣA idempotent of rank p and for AΣA = A with A of rank p are left to the reader. ∎

It is convenient to make an observation at this point. In most applications it is more natural to show that AΣ is a multiple of an idempotent matrix. We therefore state the following two corollaries, which are direct consequences of Theorem 3.1.2.
Corollary 3.1.2(a) Let the n × 1 random vector Y ~ Nn(μ, Σ) where Σ is an n × n positive definite matrix of rank n. Then Y'AY ~ cχ²_p(λ = μ'Aμ/(2c)) if and only if (1) AΣ (or ΣA) is a multiple of an idempotent matrix of rank p, where the multiple is c, or (2) AΣA = cA and A has rank p.

Corollary 3.1.2(b) If the n × 1 random vector Y ~ Nn(0, σ²V) where V is an n × n positive definite matrix of known constants, then Y'V^{-1}Y/σ² ~ χ²_n(λ = 0).

There are cases where Y ~ Nn(μ, Σ) but AΣ is not a multiple of an idempotent matrix. Such situations are covered by the next theorem.
Theorem 3.1.3 If the n × 1 random vector Y ~ Nn(μ, Σ) where Σ is a positive definite matrix of rank n, then Y'AY ~ ∑_{i=1}^p λiWi², where p = rank(AΣ); the Wi² are independent χ²_1(δi) random variables for i = 1, ..., p; and λ1, ..., λp are the nonzero eigenvalues of AΣ.

Proof: Let Z = T^{-1}(Y - μ) where Σ = TT'. By Theorem 2.1.2 with n × n matrix B = T^{-1} and n × 1 vector b = -T^{-1}μ, Z ~ Nn(0, In). Furthermore, Y'AY = (TZ + μ)'A(TZ + μ) = (Z + T^{-1}μ)'T'AT(Z + T^{-1}μ) = (Z + T^{-1}μ)'ΓDΓ'(Z + T^{-1}μ) where T'AT is an n × n symmetric matrix, Γ is the n × n orthogonal matrix of eigenvectors of T'AT, and D is the n × n diagonal matrix of eigenvalues of T'AT such that T'AT = ΓDΓ'. The eigenvalues of T'AT are λ1, ..., λp, 0, ..., 0 and rank(T'AT) = p. Let W = (W1, ..., Wn)' = Γ'(Z + T^{-1}μ). Therefore, Y'AY = W'DW = ∑_{i=1}^p λiWi². By Theorem 2.1.2 with n × n matrix B = Γ' and n × 1 vector b = Γ'T^{-1}μ, W ~ Nn(Γ'T^{-1}μ, In). Therefore, the Wi² are independent χ²_1(δi ≥ 0) random variables for i = 1, ..., p. Furthermore, p = rank(T'AT) = rank(ATT') = rank(AΣ) because T is nonsingular. Finally, the eigenvalues of T'AT are found by solving the polynomial equation |In - λT'AT| = 0. Premultiplying the above expression by |T'^{-1}| and postmultiplying by |T'| we obtain

|T'^{-1}|(|In - λT'AT|)|T'| = 0
|T'^{-1}T' - λT'^{-1}T'ATT'| = 0
|In - λAΣ| = 0.

Thus, the eigenvalues of T'AT are the eigenvalues of AΣ. ∎

We now reexamine the distributions of a number of quadratic forms previously derived in Section 2.3.
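The closing step of the proof — that T'AT and AΣ share the same eigenvalues — can be illustrated numerically. In the sketch below (the matrices are arbitrary illustrations, not from the text), Σ is factored as TT' using a Cholesky factor:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5

# An arbitrary symmetric A and a positive definite Sigma = T T'.
M = rng.standard_normal((n, n))
A = (M + M.T) / 2
S = rng.standard_normal((n, n))
Sigma = S @ S.T + n * np.eye(n)      # shifted to guarantee positive definiteness
T = np.linalg.cholesky(Sigma)        # lower-triangular, Sigma = T T'

# T'AT is symmetric; A*Sigma generally is not, but the two matrices are
# similar ((T')^{-1} (T'AT) T' = A T T' = A Sigma), so eigenvalues agree.
eig_TAT = np.sort(np.linalg.eigvalsh(T.T @ A @ T))
eig_ASigma = np.sort(np.linalg.eigvals(A @ Sigma).real)
assert np.allclose(eig_TAT, eig_ASigma, atol=1e-8)
print("eigenvalues of T'AT match eigenvalues of A*Sigma")
```

The similarity argument is exactly the determinant manipulation in the proof, restated in matrix form.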
Example 3.1.1 From Example 2.3.1 let the n × 1 random vector Y ~ Nn(α1n, σ²In). By Corollary 3.1.2(a), ∑_{i=1}^n (Yi - Ȳ)² = Y'(In - (1/n)Jn)Y ~ σ²χ²_{n-1}(λ = 0) since

λ = (α1n)'(In - (1/n)Jn)(α1n)/(2σ²) = 0

and

(In - (1/n)Jn)(σ²In) = σ²(In - (1/n)Jn),

which is a multiple of an idempotent matrix of rank n - 1. Furthermore, let n(Ȳ - α)² = n[(1/n)1'n(Y - α1n)]'[(1/n)1'n(Y - α1n)] = (Y - α1n)'((1/n)Jn)(Y - α1n). By Theorem 2.1.2 with n × n matrix B = In and n × 1 vector b = -α1n, Y - α1n ~ Nn(0, σ²In). Therefore, by Corollary 3.1.2(a), n(Ȳ - α)² ~ σ²χ²_1(λ = 0) since

λ = (0_{n×1})'((1/n)Jn)(0_{n×1})/(2σ²) = 0

and

((1/n)Jn)(σ²In) = σ²((1/n)Jn),

which is a multiple of an idempotent matrix of rank 1.

Example 3.1.2 Consider the two-way cross classification from Example 2.3.2 where the st × 1 random vector Y = (Y11, ..., Y1t, ..., Ys1, ..., Yst)' ~ Nst(μ, Σ), μ = α1s ⊗ 1t, and Σ = σ²_S Is ⊗ Jt + σ²_T Js ⊗ It + σ²_{ST} Is ⊗ It. The sum of squares due to the random factor S is Y'A2Y where A2 = (Is - (1/s)Js) ⊗ (1/t)Jt. By Corollary 3.1.2(a), Y'A2Y ~ (σ²_{ST} + tσ²_S)χ²_{s-1}(λ2 = 0) since

A2Σ = tσ²_S[(Is - (1/s)Js) ⊗ (1/t)Jt] + σ²_{ST}[(Is - (1/s)Js) ⊗ (1/t)Jt] = (σ²_{ST} + tσ²_S)A2,

λ2 = (α1s ⊗ 1t)'[(Is - (1/s)Js) ⊗ (1/t)Jt](α1s ⊗ 1t) = 0,

and A2 is an idempotent matrix of rank s - 1. The sum of squares due to the mean is Y'A1Y where A1 = (1/s)Js ⊗ (1/t)Jt. By Corollary 3.1.2(a), Y'A1Y ~ (tσ²_S + sσ²_T + σ²_{ST})χ²_1(λ1) since A1Σ = (tσ²_S + sσ²_T + σ²_{ST})A1, A1 is an idempotent matrix of rank 1, and

λ1 = {1/[2(tσ²_S + sσ²_T + σ²_{ST})]}(α1s ⊗ 1t)'[(1/s)Js ⊗ (1/t)Jt](α1s ⊗ 1t) = stα²/[2(tσ²_S + sσ²_T + σ²_{ST})].

The distributions of the other quadratic forms in Example 2.3.2 can be derived in a similar manner and are left to the reader.
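The Kronecker product algebra in Example 3.1.2 is easy to confirm with numpy. The sketch below uses arbitrary small values for s, t and the variance components, and checks that A2Σ = (σ²_ST + tσ²_S)A2 with A2 idempotent of rank s - 1:

```python
import numpy as np

s, t = 4, 3
sS2, sT2, sST2 = 2.0, 1.5, 0.5   # sigma_S^2, sigma_T^2, sigma_ST^2 (arbitrary)

I = np.eye
J = lambda m: np.ones((m, m))    # m x m matrix of ones

Sigma = (sS2 * np.kron(I(s), J(t))
         + sT2 * np.kron(J(s), I(t))
         + sST2 * np.kron(I(s), I(t)))
A2 = np.kron(I(s) - J(s) / s, J(t) / t)

c2 = sST2 + t * sS2
assert np.allclose(A2 @ Sigma, c2 * A2)       # A2*Sigma = (sigma_ST^2 + t*sigma_S^2) A2
assert np.allclose(A2 @ A2, A2)               # A2 idempotent
assert np.linalg.matrix_rank(A2) == s - 1     # rank s - 1
print("Example 3.1.2 matrix identities verified")
```

The Js factor annihilates the σ²_T term because (Is - (1/s)Js)Js = 0, which is why only σ²_ST and tσ²_S survive in the multiple.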
3.2
INDEPENDENCE

The independence of two quadratic forms is examined in the next theorem.

Theorem 3.2.1 Let A and B be n × n constant matrices. Let the n × 1 random vector Y ~ Nn(μ, Σ). The quadratic forms Y'AY and Y'BY are independent if and only if AΣB = 0 (or BΣA = 0).

Proof: The matrices A, Σ, and B are symmetric. Therefore, AΣB = 0 is equivalent to BΣA = 0. Assume AΣB = 0. Since Σ is positive definite, by Theorem 1.1.5 there exists an n × n nonsingular matrix S such that SΣS' = In. Then Z = SY ~ Nn(Sμ, In). Let G = (S^{-1})'AS^{-1} and H = (S^{-1})'BS^{-1}. Therefore, Y'AY = Z'GZ, Y'BY = Z'HZ, and AΣB = S'GSΣS'HS = S'GHS. Thus, the statement "AΣB = 0 implies Y'AY and Y'BY are independent" is equivalent to the statement "GH = 0 implies Z'GZ and Z'HZ are independent." Since G is symmetric, there exists an orthogonal matrix P such that G = P'DP, where a = rank(A) = rank(G) and D is a diagonal matrix with a nonzero diagonal elements. Without loss of generality, assume that

D = [ Da  0 ]
    [ 0   0 ]

where Da is the a × a diagonal matrix containing the nonzero elements of D. Let X = PZ ~ Nn(PSμ, In) and partition X as [X'1, X'2]', where X1 is a × 1. Note that X1 and X2 are independent. Then Z'GZ = Z'P'DPZ = X'DX = X'1DaX1 and Z'HZ = X'PHP'X = X'CX where the symmetric matrix C = PHP'. If GH = 0 then P'DPP'CP = 0, which implies DC = 0. Partitioning C to conform with D,

0 = DC = [ Da  0 ] [ C11   C12 ]   =   [ DaC11  DaC12 ]
         [ 0   0 ] [ C'12  C22 ]       [ 0      0     ],

which implies C11 = 0 and C12 = 0. Therefore,

C = [ 0  0   ]
    [ 0  C22 ],

which implies X'CX = X'2C22X2, which is independent of X'1DaX1. Therefore, Z'GZ and Z'HZ are independent. The proof of the converse statement is supplied by Searle (1971). ∎

The following theorem considers the independence of a quadratic form and linear combinations of a normally distributed random vector.

Theorem 3.2.2 Let A and B be n × n and m × n constant matrices, respectively. Let the n × 1 random vector Y ~ Nn(μ, Σ). The quadratic form Y'AY and the set of linear combinations BY are independent if and only if BΣA = 0 (or AΣB' = 0).

Proof: The "if" portion can be proven by the same method used in the proof of Theorem 3.2.1. The proof of the converse statement is supplied by Searle (1971). ∎
In the following examples the independence of certain quadratic forms and linear combinations is examined.
Example 3.2.1 Consider the one-way classification described in Examples 1.2.10 and 2.1.4. The sum of squares due to the fixed factor is Y'A2Y where A2 = (It - (1/t)Jt) ⊗ (1/r)Jr is an idempotent matrix of rank t - 1. Furthermore, A2Σ = [(It - (1/t)Jt) ⊗ (1/r)Jr][σ²It ⊗ Ir] = σ²A2. The sum of squares due to the nested replicates is Y'A3Y where A3 = It ⊗ (Ir - (1/r)Jr) is an idempotent matrix of rank t(r - 1). Likewise, A3Σ = [It ⊗ (Ir - (1/r)Jr)][σ²It ⊗ Ir] = σ²A3. Therefore, by Corollary 3.1.2(a), Y'A2Y ~ σ²χ²_{t-1}(λ2) and Y'A3Y ~ σ²χ²_{t(r-1)}(λ3) where

λ2 = [(μ1, ..., μt)' ⊗ 1r]'[(It - (1/t)Jt) ⊗ (1/r)Jr][(μ1, ..., μt)' ⊗ 1r]/(2σ²) = r∑_{i=1}^t (μi - μ̄.)²/(2σ²) with μ̄. = ∑_{i=1}^t μi/t

and

λ3 = [(μ1, ..., μt)' ⊗ 1r]'[It ⊗ (Ir - (1/r)Jr)][(μ1, ..., μt)' ⊗ 1r]/(2σ²) = 0.

Finally, by Theorem 3.2.1, Y'A2Y and Y'A3Y are independent since A2ΣA3 = σ²A2A3 = σ²[(It - (1/t)Jt) ⊗ (1/r)Jr][It ⊗ (Ir - (1/r)Jr)] = 0_{tr×tr}.

Example 3.2.2 Reconsider Example 2.3.1 where Y = (Y1, ..., Yn)' ~ Nn(α1n, σ²In), U = ∑_{i=1}^n (Yi - Ȳ)²/σ² = Y'[(1/σ²)(In - (1/n)Jn)]Y, and Ȳ = (1/n)1'nY. By Theorem 3.2.2, Ȳ and U are independent since (1/n)1'n[σ²In][(1/σ²)(In - (1/n)Jn)] = 0_{1×n}.
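Both independence conditions reduce to a matrix product vanishing, which can be checked directly. A short sketch with arbitrary dimensions t, r, n:

```python
import numpy as np

t, r, n = 3, 4, 6
sigma2 = 2.0

J = lambda m: np.ones((m, m))  # m x m matrix of ones

# Example 3.2.1: one-way classification with Sigma = sigma^2 I_{tr}.
A2 = np.kron(np.eye(t) - J(t) / t, J(r) / r)
A3 = np.kron(np.eye(t), np.eye(r) - J(r) / r)
Sigma = sigma2 * np.eye(t * r)
# Theorem 3.2.1: Y'A2Y and Y'A3Y are independent since A2 Sigma A3 = 0.
assert np.allclose(A2 @ Sigma @ A3, 0)

# Example 3.2.2: Ybar = (1/n)1'Y and U = Y'[(1/sigma^2)(I - J/n)]Y.
B = np.ones((1, n)) / n
A = (np.eye(n) - J(n) / n) / sigma2
# Theorem 3.2.2: BY and Y'AY are independent since B Sigma A = 0.
assert np.allclose(B @ (sigma2 * np.eye(n)) @ A, 0)
print("independence conditions verified")
```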
3.3
THE t AND F DISTRIBUTIONS
The normal and chi-square distributions were discussed at length in the previous sections. We now examine the distributions of certain functions of chi-square and normal random variables.

Definition 3.3.1 Noncentral t Random Variable: Let the random variable Y ~ N1(α, σ²) and the random variable U ~ χ²_n(0). If Y and U are independent, then the random variable T = (Y/σ)/√(U/n) is distributed as a noncentral t random variable with n degrees of freedom and noncentrality parameter λ = α²/(2σ²). Denote this noncentral t random variable as tn(λ).

Definition 3.3.2 Noncentral F Random Variable: Let the random variable U1 ~ χ²_{n1}(λ) and the random variable U2 ~ χ²_{n2}(0). If U1 and U2 are independent, then the random variable F = (U1/n1)/(U2/n2) is distributed as a noncentral F random variable with n1 and n2 degrees of freedom and noncentrality parameter λ. Denote this noncentral F random variable as F_{n1,n2}(λ).

A t random variable with n degrees of freedom and a noncentrality parameter equal to zero [i.e., tn(λ = 0)] has a central t distribution. Likewise, an F random variable with n1 and n2 degrees of freedom and a noncentrality parameter equal to zero [i.e., F_{n1,n2}(λ = 0)] has a central F distribution.
In recent years Smith and Lewis (1980, 1982), Pavur and Lewis (1983), Scariano, Neill, and Davenport (1984), and Scariano and Davenport (1984) have developed the theory of the corrected F random variable. The definition of the corrected F random variable is given next.

Definition 3.3.3 Noncentral Corrected F Random Variable: Let the random variable U1 ~ c1χ²_{n1}(λ) and the random variable U2 ~ c2χ²_{n2}(0). If U1 and U2 are independent, then the random variable Fc = (c2/c1)[(U1/n1)/(U2/n2)] ~ F_{n1,n2}(λ) is called a corrected F random variable, where the ratio c2/c1 is the correction factor.

In practice, we often encounter independent random variables U1 and U2, which are distributed as multiples of chi-square random variables (U2 being a multiple of a central chi-square). The random variable F = (U1/n1)/(U2/n2) in this case will be distributed as a noncentral F random variable if and only if c1 = c2 (i.e., c2/c1 = 1). Generally, c1 and c2 will be linear combinations of unknown variance parameters. In the following examples a number of central and noncentral t and F random variables are derived.
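As a small illustration (not from the text), the correction factor can be coded directly; in practice c1 and c2 would be linear combinations of variance components, and the numeric values used below are arbitrary:

```python
import numpy as np

def corrected_F(U1, n1, U2, n2, c1, c2):
    """Corrected F statistic Fc = (c2/c1) * (U1/n1) / (U2/n2).

    If U1 ~ c1 * chi-square(n1, lam) and U2 ~ c2 * chi-square(n2, 0)
    are independent, Fc follows a (noncentral) F distribution with
    n1 and n2 degrees of freedom (Definition 3.3.3).
    """
    return (c2 / c1) * (U1 / n1) / (U2 / n2)

# When c1 = c2 the correction factor is 1 and Fc reduces to the usual F.
assert np.isclose(corrected_F(6.0, 3, 8.0, 4, 2.0, 2.0), (6.0 / 3) / (8.0 / 4))
print(corrected_F(6.0, 3, 8.0, 4, 1.0, 2.0))
```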
Example 3.3.1 Let the n × 1 random vector Y = (Y1, ..., Yn)' ~ Nn(α1n, σ²In). By Example 2.1.1, Ȳ ~ N1(α, σ²/n). By Example 3.1.1, ∑_{i=1}^n (Yi - Ȳ)² = Y'(In - (1/n)Jn)Y ~ σ²χ²_{n-1}(0). By Example 3.2.2, Ȳ and ∑_{i=1}^n (Yi - Ȳ)² are independent. Therefore,

T = √n Ȳ/S = [Ȳ/(σ/√n)]/√(S²/σ²) ~ t_{n-1}(λ)

where S² = ∑_{i=1}^n (Yi - Ȳ)²/(n - 1) and λ = nα²/(2σ²).

Example 3.3.2 Consider the one-way classification described in Example 3.2.1. It was shown that the sum of squares due to the fixed factor Y'A2Y = Y'[(It - (1/t)Jt) ⊗ (1/r)Jr]Y ~ σ²χ²_{t-1}(λ2) and the sum of squares due to the nested replicates Y'A3Y = Y'[It ⊗ (Ir - (1/r)Jr)]Y ~ σ²χ²_{t(r-1)}(0). Furthermore, Y'A2Y and Y'A3Y are independent. Therefore, the statistic

F* = [Y'A2Y/(t - 1)] / {Y'A3Y/[t(r - 1)]} ~ F_{t-1,t(r-1)}(λ2)

where λ2 = r∑_{i=1}^t (μi - μ̄.)²/(2σ²). The hypothesis H0: λ2 = 0 versus H1: λ2 > 0 is equivalent to the hypothesis H0: μ1 = μ2 = ··· = μt versus H1: the μi's are not all equal. Thus, under H0, the statistic F* has a central F distribution with t - 1 and t(r - 1) degrees of freedom. A γ level rejection region for the hypothesis H0 versus H1 is as follows: Reject H0 if F* > F^γ_{t-1,t(r-1)} where F^γ_{t-1,t(r-1)} is the 100(1 - γ)th percentile point of a central F distribution with t - 1 and t(r - 1) degrees of freedom.
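The F statistic of Example 3.3.2 can be computed either from the quadratic forms Y'A2Y and Y'A3Y or from the classical between/within sums of squares. The sketch below, with simulated data of arbitrary dimensions, confirms the two routes agree:

```python
import numpy as np

rng = np.random.default_rng(2)
t, r = 3, 5
mu = np.array([1.0, 2.0, 4.0])                  # arbitrary treatment means
Y = np.repeat(mu, r) + rng.standard_normal(t * r)

J = lambda m: np.ones((m, m))
A2 = np.kron(np.eye(t) - J(t) / t, J(r) / r)    # SS for the fixed factor
A3 = np.kron(np.eye(t), np.eye(r) - J(r) / r)   # SS for nested replicates

F_quad = (Y @ A2 @ Y / (t - 1)) / (Y @ A3 @ Y / (t * (r - 1)))

# Classical one-way ANOVA computation for comparison.
groups = Y.reshape(t, r)
gmeans = groups.mean(axis=1)
ss_between = r * ((gmeans - gmeans.mean()) ** 2).sum()
ss_within = ((groups - gmeans[:, None]) ** 2).sum()
F_classic = (ss_between / (t - 1)) / (ss_within / (t * (r - 1)))

assert np.isclose(F_quad, F_classic)
print(f"F* = {F_quad:.4f}")
```

Under H0 this statistic would be compared against the central F_{t-1, t(r-1)} percentile point.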
Example 3.3.3 Consider the two-way cross classification described in Example 3.1.2. The sums of squares due to the random factor S and due to the random interaction ST are given by Y'A2Y and Y'A4Y, respectively. It was shown that Y'A2Y = Y'[(Is - (1/s)Js) ⊗ (1/t)Jt]Y ~ (σ²_{ST} + tσ²_S)χ²_{s-1}(0). Furthermore,

A4Σ = [(Is - (1/s)Js) ⊗ (It - (1/t)Jt)][σ²_S Is ⊗ Jt + σ²_T Js ⊗ It + σ²_{ST} Is ⊗ It] = σ²_{ST}A4

where A4 is an idempotent matrix of rank (s - 1)(t - 1). Therefore, by Corollary 3.1.2(a), Y'A4Y = Y'[(Is - (1/s)Js) ⊗ (It - (1/t)Jt)]Y ~ σ²_{ST}χ²_{(s-1)(t-1)}(λ4) where λ4 = (α1s ⊗ 1t)'[(Is - (1/s)Js) ⊗ (It - (1/t)Jt)](α1s ⊗ 1t)/(2σ²_{ST}) = 0. Finally, A2ΣA4 = (σ²_{ST} + tσ²_S)A2A4 = 0_{st×st}. Therefore, by Theorem 3.2.1, Y'A2Y and Y'A4Y are independent. By Definition 3.3.3, the statistic

F* = [Y'A2Y/(s - 1)] / {Y'A4Y/[(s - 1)(t - 1)]} ~ [(σ²_{ST} + tσ²_S)/σ²_{ST}] F_{s-1,(s-1)(t-1)}(0).

Under the hypothesis H0: σ²_S = 0, the statistic F* has a central F distribution with s - 1 and (s - 1)(t - 1) degrees of freedom. A γ level rejection region for the hypothesis H0: σ²_S = 0 versus H1: σ²_S > 0 follows: Reject H0 if F* > F^γ_{s-1,(s-1)(t-1)}.
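The matrix claims in Example 3.3.3 can likewise be checked numerically; s, t and the variance components below are arbitrary illustrations:

```python
import numpy as np

s, t = 4, 3
sS2, sT2, sST2 = 1.2, 0.8, 0.5   # arbitrary variance components

I = np.eye
J = lambda m: np.ones((m, m))

Sigma = (sS2 * np.kron(I(s), J(t))
         + sT2 * np.kron(J(s), I(t))
         + sST2 * np.kron(I(s), I(t)))
A2 = np.kron(I(s) - J(s) / s, J(t) / t)
A4 = np.kron(I(s) - J(s) / s, I(t) - J(t) / t)

assert np.allclose(A4 @ Sigma, sST2 * A4)                # A4*Sigma = sigma_ST^2 A4
assert np.linalg.matrix_rank(A4) == (s - 1) * (t - 1)    # rank (s-1)(t-1)
assert np.allclose(A2 @ Sigma @ A4, 0)                   # Y'A2Y, Y'A4Y independent
print("Example 3.3.3 identities verified")
```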
3.4
BHAT'S LEMMA
The following lemma by Bhat (1962) is applicable in many ANOVA and regression problems. The lemma provides necessary and sufficient conditions for sums of squares to be distributed as multiples of independent chi-square random variables.

Lemma 3.4.1 Let k and n denote fixed positive integers such that 1 ≤ k ≤ n. Suppose In = ∑_{i=1}^k Ai, where each Ai is an n × n symmetric matrix of rank ni with ∑_{i=1}^k ni = n. If the n × 1 random vector Y ~ Nn(μ, Σ) and the sums of squares are S²_i = Y'AiY for i = 1, ..., k, then

(a) S²_i ~ ciχ²_{ni}(λi = μ'Aiμ/(2ci)), and
(b) {S²_i, i = 1, ..., k} are mutually independent

if and only if Σ = ∑_{i=1}^k ciAi where ci > 0.
Proof: This proof is due to Scariano et al. (1984). Assume that the quadratic forms S²_i satisfy (a) and (b) given in Lemma 3.4.1. By Theorems 3.1.2 and 3.2.1, (i) the matrices (1/ci)AiΣ are idempotent for i = 1, ..., k and (ii) AiΣAj = 0_{n×n} for i ≠ j, i, j = 1, ..., k. Furthermore, by Theorem 1.1.7, Ai = A²_i and AiAj = 0_{n×n} for i ≠ j, i, j = 1, ..., k. But (i) and (ii) imply that ∑_{i=1}^k (1/ci)AiΣ is idempotent of rank n and thus equal to In. Hence, Σ = [∑_{i=1}^k (1/ci)Ai]^{-1} = ∑_{i=1}^k ciAi. Conversely, assume Σ = ∑_{i=1}^k ciAi. But Ai = A²_i and AiAj = 0_{n×n}, so (i) and (ii) hold. Therefore, by Theorems 3.1.2 and 3.2.1, (a) and (b) hold. ∎
In the next example, Bhat's lemma is applied to the two-way cross classification described in Example 2.3.2.

Example 3.4.1 From Example 2.3.2, Σ = σ²_S Is ⊗ Jt + σ²_T Js ⊗ It + σ²_{ST} Is ⊗ It, A1 = (1/s)Js ⊗ (1/t)Jt, A2 = (Is - (1/s)Js) ⊗ (1/t)Jt, A3 = (1/s)Js ⊗ (It - (1/t)Jt), A4 = (Is - (1/s)Js) ⊗ (It - (1/t)Jt), st = ∑_{m=1}^4 rank(Am) = 1 + (s - 1) + (t - 1) + (s - 1)(t - 1), and Is ⊗ It = ∑_{m=1}^4 Am. Furthermore, AmΣ = cmAm for m = 1, ..., 4 where c1 = σ²_{ST} + tσ²_S + sσ²_T, c2 = σ²_{ST} + tσ²_S, c3 = σ²_{ST} + sσ²_T, and c4 = σ²_{ST}. Therefore, (∑_{m=1}^4 Am)Σ = ∑_{m=1}^4 (AmΣ) = ∑_{m=1}^4 cmAm. Thus, by Bhat's lemma, the quadratic forms Y'AmY are distributed as independent cmχ²_{rank(Am)}{λm = (α1s ⊗ 1t)'Am(α1s ⊗ 1t)/(2cm)} random variables for m = 1, ..., 4.
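All of the conditions of Bhat's lemma for Example 3.4.1 — the decomposition of Ist, idempotency, mutual orthogonality, and Σ = ∑cmAm — can be verified in a few lines (the dimensions and variance components below are arbitrary):

```python
import numpy as np

s, t = 3, 4
sS2, sT2, sST2 = 1.0, 2.0, 0.5   # arbitrary variance components

I = np.eye
J = lambda m: np.ones((m, m))
Sigma = (sS2 * np.kron(I(s), J(t))
         + sT2 * np.kron(J(s), I(t))
         + sST2 * np.kron(I(s), I(t)))

A = [np.kron(J(s) / s, J(t) / t),
     np.kron(I(s) - J(s) / s, J(t) / t),
     np.kron(J(s) / s, I(t) - J(t) / t),
     np.kron(I(s) - J(s) / s, I(t) - J(t) / t)]
c = [sST2 + t * sS2 + s * sT2, sST2 + t * sS2, sST2 + s * sT2, sST2]

assert np.allclose(sum(A), np.eye(s * t))                 # I_st = A1 + A2 + A3 + A4
for m in range(4):
    assert np.allclose(A[m] @ A[m], A[m])                 # each Am idempotent
    assert np.allclose(A[m] @ Sigma, c[m] * A[m])         # Am*Sigma = cm*Am
    for j in range(m + 1, 4):
        assert np.allclose(A[m] @ A[j], 0)                # mutual orthogonality
assert np.allclose(Sigma, sum(cm * Am for cm, Am in zip(c, A)))
print("Bhat's lemma conditions verified")
```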
EXERCISES

1. Use Corollary 3.1.2(a) to find the distribution of ∑_{i=1}^n wiYi² from Exercise 3b in Chapter 2.

2. Use Corollary 3.1.2(a) to find the distribution of Y'AY from Exercise 4c in Chapter 2.

3. Consider the model presented in Exercise 5 of Chapter 2.
(a) Find the distribution of V1 = ∑_{i=1}^a ∑_{j=1}^s ∑_{k=1}^t (Ȳij. - Ȳi..)².
(b) Find the distribution of V2 = ∑_{i=1}^a ∑_{j=1}^s ∑_{k=1}^t (Yijk - Ȳij.)².
(c) Find the distribution of {V1/[a(s - 1)]}/{V2/[as(t - 1)]}.

4. Consider Exercise 6 of Chapter 2.
(a) Use Corollary 3.1.2(a) to find the distribution of U.
(b) Use Theorem 3.2.2 to show that U and Ȳ. are independent.

5. Prove Theorem 3.1.2, part (2).

6. Use Corollary 3.1.2(a) to find the distribution of (Ȳ1. - Ȳ2.)² from Exercise 9b in Chapter 2.

7. Consider Exercise 11 of Chapter 2.
(a) Use Theorem 3.2.2 to show that Y1 + 2Y2 - Y3 is independent of 2Y1² + Y2² + 2Y3² - 2Y1Y2 + 2Y2Y3.
(b) Use Corollary 3.1.2(a) to find a constant c such that c[5Y1² + 2Y2² + 5Y3² - 4Y1Y2 + 2Y1Y3 + 4Y2Y3] has a central chi-square distribution.

8. Derive the distributions of Y'A3Y and Y'A4Y from Example 2.3.2.

9. Calculate the noncentrality parameters λ1, ..., λ4 in Example 3.4.1.

10. Let the 3 × 1 random vector Y = (Y1, Y2, Y3)' ~ N3(α13, σ²I3) and define Z1 = Y1² + Y3² - 2Y1Y3 and Z2 = Y1² + Y2² + Y3² - Y1Y2 - Y1Y3 - Y2Y3.
(a) Find the distributions of Z1 and Z2.
(b) Find E(Z_i^k) for i = 1, 2 and any positive integer k.

11. Let the n × 1 random vector Y = (Y1, ..., Yn)' ~ Nn(α1n, Σ) where Σ = (a - b)In + bJn.
(a) Find the distribution of V = ∑_{i=1}^{n-1} (Yi - Ȳ*)² where Ȳ* = ∑_{i=1}^{n-1} Yi/(n - 1).
(b) Find the distribution of (Ȳ* - Yn)/V^{1/2}.

12. Let the n × 1 random vector Y ~ Nn(μ, In). Let X = AY where A is an n × n orthogonal matrix whose first row is μ'/√(μ'μ). Let V = X1² (X1 is the first element of vector X) and U = X'X - V.
(a) Find the distributions of U and V.
(b) Are U and V independent? Prove your answer.

13. Let Ui ~ χ²_{ni}(λi) for i = 1, 2 where U1 and U2 are independent. Let a and b be two positive constants. Under what conditions is aU1 + bU2 ~ cχ²_n(λ)? Provide the values of c and λ.

14. Consider the model Yij = μi + R(T)(i)j where R(T)(i)j ~ iid N1(0, σ²_{R(T)}) for i = 1, ..., 3, j = 1, ..., ni with n1 = 3, n2 = 4, and n3 = 2.
(a) Find the distribution of U = ∑_{i=1}^3 ∑_{j=1}^{ni} (Yij - Ȳi.)² where Ȳi. = ∑_{j=1}^{ni} Yij/ni.
(b) Find the expected value of V = ∑_{i=1}^3 (Ȳi. - Ȳ*)² where Ȳ* = ∑_{i=1}^3 Ȳi./3. [Hint: Write V = Ȳ'(I3 - (1/3)J3)Ȳ where Ȳ = (Ȳ1., Ȳ2., Ȳ3.)'.]

15. (Paired t-Test Problem) Consider an experiment with n experimental units. Suppose two observations are made on each unit. The first observation corresponds to the first level of a fixed factor, the second observation to the second level of the fixed factor. Let Yij be a random variable representing the jth observation on the ith experimental unit for i = 1, ..., n and j = 1, 2. Let the 2n × 1 random vector Y = (Y11, Y12, Y21, Y22, ..., Yn1, Yn2)'. Let E(Yij) = μj and var(Yij) = σ² for i = 1, ..., n and j = 1, 2; and let cov(Yi1, Yi2) = σ²ρ for all i = 1, ..., n. Assume Y ~ N2n(μ, Σ).
(a) Define μ and Σ in terms of μj, σ², and ρ. (Hint: Use Kronecker products.)
(b) Let T = D̄/(S_D/√n) where Di = Yi1 - Yi2 for i = 1, ..., n; D̄ = ∑_{i=1}^n Di/n; and S²_D = ∑_{i=1}^n (Di - D̄)²/(n - 1). Find the distribution of T. [Hint: Start by finding the distribution of D = (D1, ..., Dn)'.]

16. Let the 6n × 1 random vector Y = (Y111, ..., Y11n, Y121, ..., Y12n, Y131, ..., Y13n, Y211, ..., Y21n, Y221, ..., Y22n, Y231, ..., Y23n)' ~ N6n(12 ⊗ (μ1, μ2, μ3)' ⊗ 1n, Σ) where

Σ = σ²_1[I2 ⊗ J3 ⊗ Jn] + σ²_2[I2 ⊗ (I3 - (1/3)J3) ⊗ Jn] + σ²_3[I2 ⊗ I3 ⊗ In].

(a) Show Ȳ.1. - Ȳ.2. = [(1/2)1'2 ⊗ (1, -1, 0) ⊗ (1/n)1'n]Y where Ȳ.j. = ∑_{i=1}^2 ∑_{k=1}^n Yijk/(2n).
(b) Find the distribution of Ȳ.1. - Ȳ.2..
(c) Find the distribution of Y'[(I2 - (1/2)J2) ⊗ (I3 - (1/3)J3) ⊗ (1/n)Jn]Y.

17. Let the (n1 + n2) × 1 random vector Y = (Y11, ..., Y1n1, Y21, ..., Y2n2)' ~ N_{n1+n2}(μ, Σ) where μ = (μ1 1'_{n1}, μ2 1'_{n2})' and

Σ = [ σ²_1 I_{n1}   0           ]
    [ 0             σ²_2 I_{n2} ].

If σ²_1 ≠ σ²_2, this problem is called the Behrens-Fisher problem.
(a) Find the distribution of

V = [(Ȳ1. - Ȳ2.)²/(1/n1 + 1/n2)] / [∑_{i=1}^2 ∑_{j=1}^{ni} (Yij - Ȳi.)²/(n1 + n2 - 2)]

when σ²_1 = σ²_2.
(b) Describe the distribution of V when σ²_1 ≠ σ²_2.
4
Complete, Balanced Factorial Experiments
The main objective of this chapter is to provide sum of squares and covariance matrix algorithms for complete, balanced factorial experiments. The algorithm rules are dependent on the model used in the analysis and on the model assumptions. Therefore, before the algorithms are presented we will discuss two different model formulations: models that admit restrictions on the random variables and models that do not admit restrictions.
4.1
MODELS THAT ADMIT RESTRICTIONS (FINITE MODELS)
We begin our model discussion with an example. Consider a group of btr experimental units. Separate the units into b homogeneous groups with tr units per group. In each group (or random block) randomly assign r replicate units to each of the t fixed treatment levels. The observed data for this two-way mixed experiment with replication are given in Figure 4.1.1.
Figure 4.1.1 Two-Way Mixed Experimental Layout with Replication. [The figure shows a b × t grid of random blocks B1, ..., Bb by fixed treatments, with the r replicate observations Yij1, ..., Yijr and the nested replicate effects R(BT)(ij)k displayed in each block treatment combination.]
A model for this experiment is
Yijk = μj + Bi + BTij + R(BT)(ij)k

for i = 1, ..., b, j = 1, ..., t, and k = 1, ..., r, where Yijk is a random variable representing the kth replicate value in the ijth block treatment combination; μj is a constant representing the mean effect of the jth fixed treatment; Bi is a random variable representing the effect of the ith random block; BTij is a random variable representing the interaction of the ith random block and the jth fixed treatment; and R(BT)(ij)k is a random variable representing the effect of the kth replicate unit nested in the ijth block treatment combination. We now attempt to develop a reasonable set of distributional assumptions for the random variables Bi, BTij, and R(BT)(ij)k. Start by considering the btr observed data points in the experiment as a collection of values sampled from an entire population of possible values. The population for this experiment can be viewed as a rectangular grid with an infinite number of columns, exactly t rows, and an infinite number of possible observed values in each row-column combination (see Figure 4.1.2). The infinite number of columns represents the infinite number of blocks in the population. Each block (or column) contains exactly t rows, one for each level of the fixed treatments. The population then contains an infinite number of replicate observed values nested in each block treatment combination. The btr observed data points for the experiment are sampled from this infinite population of values in the following way. Exactly b blocks are selected at random from the infinite number of blocks in the population. For each block selected, all t of the treatment rows are then included in the sample. Finally, within the selected block treatment combinations, r replicate observations are randomly sampled from the infinite number of nested population replicates.
Since the b blocks are selected at random from an infinite population of possible blocks, assume that the block variables Bi for i = 1, ..., b are independent.

Figure 4.1.2 Finite Model Population Grid. [The figure shows the population grid: an infinite number of random blocks (columns), exactly t fixed treatment rows per block, and an infinite number of nested replicates in each row-column combination.]

If the b blocks have been sampled from the same single population of blocks, then the variables Bi are identically distributed. Furthermore, assume that across the entire population of blocks the average influence of Bi is zero, that is, E(Bi) = 0 for all i = 1, ..., b. If the random variables Bi are assumed to be normally distributed, then the assumptions above are satisfied when the b variables Bi ~ iid N1(0, σ²_B). Now consider the random variables BTij that represent the block by treatment interaction. Recall that the population contains exactly t treatment levels for each block. Therefore, in the ith block the population contains exactly t possible values for the random variable BTij. If the average influence of the block by treatment interaction is assumed to be zero for each block, then E[BTij] = 0 for each i. But for each i, E[BTij] = ∑_{j=1}^t BTij/t since the population contains exactly t values of BTij for each block. Therefore, ∑_{j=1}^t BTij = 0 for each i, implying that the variables BTi1, ..., BTit are dependent, because the value of any one of these variables is determined by the values of the other t - 1 variables. Although the dependence between the BTij variables occurs within each block, the dependence does not occur across blocks. Therefore, assume that the b vectors (BT11, ..., BT1t)', ..., (BTb1, ..., BTbt)' are mutually independent. If the random variables BTij are assumed to be normally distributed, then the assumptions above are satisfied when the b(t - 1) × 1 random vector (Ib ⊗ P't)(BT11, ..., BT1t, ..., BTb1, ..., BTbt)' ~ N_{b(t-1)}[0, σ²_{BT} Ib ⊗ I_{t-1}] where P't is the (t - 1) × t lower portion of a t-dimensional Helmert matrix. Finally, consider the nested replicate variables R(BT)(ij)k. Within each block treatment combination, the r replicate observations are selected at random from the infinite population of nested replicates.
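A contrast matrix with the properties required of the Helmert-matrix portion P't can be sketched as follows; the construction shown is one standard choice, assumed here purely for illustration. What matters is that its t - 1 rows are orthonormal and each is orthogonal to 1t, so it maps the sum-to-zero interaction vector to an unconstrained (t - 1)-dimensional vector:

```python
import numpy as np

def helmert_lower(t):
    """Rows 2..t of a t-dimensional Helmert matrix: orthonormal rows,
    each orthogonal to the vector of ones (a standard construction)."""
    H = np.zeros((t - 1, t))
    for i in range(1, t):
        H[i - 1, :i] = 1.0          # i leading ones
        H[i - 1, i] = -i            # followed by -i
        H[i - 1] /= np.sqrt(i * (i + 1))
    return H

t = 4
Pt = helmert_lower(t)
assert np.allclose(Pt @ Pt.T, np.eye(t - 1))   # P't P't' = I_{t-1}
assert np.allclose(Pt @ np.ones(t), 0)         # rows orthogonal to 1_t
print("Helmert contrast matrix verified")
```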
If each block treatment combination of the population has the same distribution, then the random variables R(BT)(ij)k are independent, identically distributed random variables. Furthermore, within each block treatment combination, assume that the average influence of R(BT)(ij)k is zero, that is, E[R(BT)(ij)k] = 0 for each ij pair. If the random variables R(BT)(ij)k are also assumed to be normally distributed, then the assumptions above are satisfied when the btr random variables R(BT)(ij)k ~ iid N1(0, σ²_{R(BT)}). Furthermore, assume that the random variables Bi, the t × 1 random vectors (BTi1, ..., BTit)', and the random variables R(BT)(ij)k are mutually independent.

In the previous model formulation, the random variables BTij contain a finite population of possible values for each i. If the variables are assumed to have zero expectation, then the finite population induces restrictions and distributional dependencies. Note that variables representing interactions of random and fixed factors are the only types of variables that assume these restrictions. Furthermore, the dependencies occurred because of the assumed population structure of possible observed values. Kempthorne (1952) called such models finite models, because the fixed by random interaction components were restricted to a finite population of possible values. In the next section we discuss models where the population is assumed to have a structure where no variable dependencies occur.
4.2
MODELS THAT DO NOT ADMIT RESTRICTIONS (INFINITE MODELS)
Consider the same experiment discussed in Section 4.1. Use a model with the same variables
Yijk = μj + Bi + BTij + R(BT)(ij)k,

where all variables and constants represent the same effects as previously stated. In this model formulation, the population has an infinite number of random blocks. For each block, an infinite number of replicates of each of the t treatment levels exists. Each of these treatment level replicates contains an infinite number of experimental units (see Figure 4.2.1). The btr observed values for the experiment are sampled from the population by first choosing b blocks at random from the infinite number of blocks in the population. For each selected block, one replicate of each of the t treatment levels is selected. Finally, within the selected block treatment combinations, r replicate observations are randomly sampled. Since the blocks are randomly selected from one infinite population of blocks, assume the random variables Bi are independent, identically distributed. With a normality and zero expectation assumption, let the b block random variables Bi ~ iid N1(0, σ²_B). Since the t observed treatment levels are randomly chosen from an infinite population of treatment replicates, an infinite number of possible values are available for the random variables BTij. Assume that the average influence of BTij is zero for each block. But now E[BTij] = 0 does not imply ∑_{j=1}^t BTij = 0 for each i since the variables BTij have an infinite population. Therefore, a zero expectation does not imply dependence. With a normality assumption, let the bt random variables BTij ~ iid N1(0, σ²_{BT}). Finally,
Figure 4.2.1 Infinite Model Population Grid. [The figure shows the population grid for the infinite model: an infinite number of random blocks, each containing an infinite number of replicates of each of the t treatment levels, with an infinite number of experimental units nested in each treatment level replicate.]
within each block treatment combination, the nested replicates are assumed to be sampled from an infinite population. With a normality and zero expectation assumption, let the btr random variables R(BT)(ij)k ~ iid N1(0, σ²_{R(BT)}). Furthermore, assume that the random variables Bi, the random variables BTij, and the random variables R(BT)(ij)k are mutually independent. Hence, in models that do not admit restrictions, all variables on the right side of the model are assumed to be independent. Kempthorne (1952) called such models infinite models, because it is assumed that all of the random components are sampled from infinite populations.

In passing, we raise one additional topic. Consider the previous experiment, with one replicate unit within each block treatment combination (r = 1). Observing only one replicate unit within each block treatment combination does not change the fact that different experimental units are intrinsically different. The random variable R(BT)(ij)k represents this experimental unit difference. Hence, there is some motivation for leaving the random variable R(BT)(ij)k in the model. However, the variance σ²_{R(BT)} is not estimable when r = 1. Therefore, it also seems reasonable to drop R(BT)(ij)k from the model in this case. If the R(BT)(ij)k variable is dropped from the model, then the expected mean squares will not contain the σ²_{R(BT)} term. However, all F tests in the ANOVA are the same whether R(BT)(ij)k is dropped or not. In addition, the covariance matrix construction is slightly less complicated if variables with nonestimable variance parameters are dropped from the model. In summary, the overall analysis is unchanged in its essential characteristics when variables with nonestimable variance parameters are dropped from the model. Thus, variables with nonestimable variance parameters are generally not used in this text.
58
Linear Models
In the next section we establish the sum of squares and covariance matrix algorithm rules for models that admit restrictions. We then show how the covariance matrix algorithm can be easily modified to accommodate models that do not admit restrictions. Finally, for completeness, we discuss how the algorithms can be modified to accommodate models that contain variables with nonestimable variance parameters. Therefore, the algorithms can be applied to any complete, balanced experiment, using finite or infinite models, and using models with or without nonestimable variance parameters.
4.3
SUM OF SQUARES AND COVARIANCE MATRIX ALGORITHMS
In Section 4.1 a finite model was presented for an experiment with r replicate observations nested in bt block treatment combinations. This experiment is now used to establish the sum of squares and covariance matrix algorithm rules for finite models. Let the btr × 1 random vector Y = (Y₁₁₁, …, Y₁₁r, …, Y_bt1, …, Y_btr)′. The covariance matrix Σ = cov(Y) for the finite model is given by

Σ = σ²_B [I_b ⊗ J_t ⊗ J_r] + σ²_BT [I_b ⊗ (I_t − (1/t)J_t) ⊗ J_r] + σ²_R(BT) [I_b ⊗ I_t ⊗ I_r].

Rule Σ1 Construct a row for each variance parameter in the model. Construct two column headings where the first is the factor letter and the second is the number of levels, l, of the factor. Place brackets ([ ]) and Kronecker product symbols (⊗) in each row, as described in Example 4.3.1.

Example 4.3.1 Rule Σ1

Factor          B   T   R
Levels = l      b   t   r
σ²_B           [   ⊗   ⊗   ]
+ σ²_BT        [   ⊗   ⊗   ]
+ σ²_R(BT)     [   ⊗   ⊗   ].
4
Factorial Experiments
59
Rule Σ2 Compare the factor letters in the first column heading with each subscript letter on the variance (i.e., B from σ²_B, B and T from σ²_BT).

Rule Σ2.1 If the factor letter does not match a subscript letter on the variance, place a J_l in the Kronecker product with the same row as the variance and the same column as the factor letter, where l is the number of levels of the factor. In the examples below, the new matrix entries are added to the table at each step.
Example 4.3.1 (continued) Rule Σ2.1

Factor          B   T   R
Levels = l      b   t   r
σ²_B           [   ⊗ J_t ⊗ J_r ]
+ σ²_BT        [   ⊗     ⊗ J_r ]
+ σ²_R(BT)     [   ⊗     ⊗     ].
Rule Σ2.2 If a factor letter corresponding to a fixed non-nested factor matches a subscript letter on the variance, place I_l − (1/l)J_l in the Kronecker product.
Example 4.3.1 (continued) Rule Σ2.2

Factor          B   T   R
Levels = l      b   t   r
σ²_B           [   ⊗ J_t ⊗ J_r ]
+ σ²_BT        [   ⊗ (I_t − (1/t)J_t) ⊗ J_r ]
+ σ²_R(BT)     [   ⊗     ⊗     ].
Rule Σ2.3 Place I_l elsewhere.
Example 4.3.1 (continued) Rule Σ2.3

Factor          B   T   R
Levels = l      b   t   r
σ²_B           [ I_b ⊗ J_t ⊗ J_r ]
+ σ²_BT        [ I_b ⊗ (I_t − (1/t)J_t) ⊗ J_r ]
+ σ²_R(BT)     [ I_b ⊗ I_t ⊗ I_r ].
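The finished table can be checked numerically. The sketch below (the dimensions and variance components are arbitrary test values, not quantities from the text) builds Σ from the three Kronecker product terms and spot-checks its entries against the scalar covariances the finite model implies:

```python
import numpy as np

# Hypothetical dimensions and variance components for the check.
b, t, r = 3, 4, 2
s2B, s2BT, s2R = 1.5, 0.8, 0.3

I = np.eye
J = lambda n: np.ones((n, n))
kron3 = lambda A, B, C: np.kron(np.kron(A, B), C)

# Finite-model covariance matrix from Rules S1 through S2.3.
Sigma = (s2B * kron3(I(b), J(t), J(r))
         + s2BT * kron3(I(b), I(t) - J(t) / t, J(r))
         + s2R * kron3(I(b), I(t), I(r)))

# Spot-check entries against the scalar covariances they encode.
assert np.isclose(Sigma[0, 0], s2B + s2BT * (1 - 1 / t) + s2R)   # var(Y_ijk)
assert np.isclose(Sigma[0, 1], s2B + s2BT * (1 - 1 / t))         # same block and treatment
assert np.isclose(Sigma[0, r], s2B - s2BT / t)                   # same block, new treatment
assert np.isclose(Sigma[0, t * r], 0.0)                          # different blocks
```

The negative within-block, between-treatment covariance −σ²_BT/t is a signature of the finite (restricted) model.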
The covariance algorithm can be applied to complete, balanced finite models with any number of fixed and random main effects, interactions, or nested factors. For infinite models, the same covariance matrix algorithm can be used, except that Rule Σ2.2 is omitted. That is, for infinite models that do not contain restrictions on variables representing interactions of random and fixed factors, the covariance matrix is constructed following Rules Σ1, Σ2, Σ2.1, and Σ2.3. The next example illustrates the construction of covariance matrices in such models.
Example 4.3.2 Consider the experiment in Example 4.3.1, but now use an infinite model that does not admit restrictions. Using Rules Σ1, Σ2, Σ2.1, and Σ2.3, the covariance matrix is given by

Σ = σ²_B [I_b ⊗ J_t ⊗ J_r] + σ²_BT [I_b ⊗ I_t ⊗ J_r] + σ²_R(BT) [I_b ⊗ I_t ⊗ I_r].
Although finite and infinite models are motivated in different ways and produce different covariance structures, algebraically the two covariance structures are simply reparameterizations of each other. To illustrate this point, consider the previous experiment with b random blocks, t fixed treatments, and r random replicates per treatment. First, rewrite the covariance matrix for the finite model in the following equivalent form:

Σ = [σ²_B − (1/t)σ²_BT] [I_b ⊗ J_t ⊗ J_r] + σ²_BT [I_b ⊗ I_t ⊗ J_r] + σ²_R(BT) [I_b ⊗ I_t ⊗ I_r].

Now to distinguish the parameters in the finite and infinite models, rename the variance parameters σ²_B, σ²_BT, and σ²_R(BT) in the infinite model as σ²_B*, σ²_BT*, and σ²_R(BT)*, respectively. Then the covariance matrix for the infinite model becomes

Σ = σ²_B* [I_b ⊗ J_t ⊗ J_r] + σ²_BT* [I_b ⊗ I_t ⊗ J_r] + σ²_R(BT)* [I_b ⊗ I_t ⊗ I_r].

Note that the finite and infinite model covariance matrices are equal with σ²_B* = σ²_B − (1/t)σ²_BT, σ²_BT* = σ²_BT, and σ²_R(BT)* = σ²_R(BT). Therefore, the two covariance structures are simply reparameterizations of each other. An algorithm is now given for constructing the matrices A_s in the sums of squares Y′A_sY. This sum of squares algorithm applies to finite and infinite models.
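The reparameterization can be verified numerically. In the sketch below the dimensions and variance components are hypothetical values chosen only for the check:

```python
import numpy as np

# Check that the finite- and infinite-model covariance matrices are
# reparameterizations of each other (arbitrary test values).
b, t, r = 3, 4, 2
s2B, s2BT, s2R = 1.5, 0.8, 0.3

I, J = np.eye, lambda n: np.ones((n, n))
k3 = lambda A, B, C: np.kron(np.kron(A, B), C)

finite = (s2B * k3(I(b), J(t), J(r))
          + s2BT * k3(I(b), I(t) - J(t) / t, J(r))
          + s2R * k3(I(b), I(t), I(r)))

# Infinite-model parameters implied by the reparameterization in the text.
s2B_star, s2BT_star, s2R_star = s2B - s2BT / t, s2BT, s2R
infinite = (s2B_star * k3(I(b), J(t), J(r))
            + s2BT_star * k3(I(b), I(t), J(r))
            + s2R_star * k3(I(b), I(t), I(r)))

assert np.allclose(finite, infinite)
```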
The subscripts s on the matrices A_s are numbered sequentially with A₁ being associated with the sum of squares for the overall mean μ₀, A₂ being associated with the sum of squares for the next factor, etc. In the example considered in this section, A₂ is associated with the blocks B, A₃ with the treatments T, A₄ with the interaction of B and T, and A₅ with the nested replicates R(BT). The sum of squares matrices A_s are constructed in tabular form. The first rule describes the table; the second rule describes the matrix terms in the table.

Rule A1 Construct two row headings where the first designates the letters of the factors and interactions, while the second designates the matrices A_s associated with those factors and interactions. Construct two column headings where the first is the factor letter and the second is the number of levels, l, of the factor. Place brackets ([ ]), Kronecker product symbols (⊗), and equal signs (=) in each row, as described in Example 4.3.3.
Example 4.3.3 Rule A1

Factor          B   T   R
Levels = l      b   t   r
μ₀      A₁ =   [   ⊗   ⊗   ]
B       A₂ =   [   ⊗   ⊗   ]
T       A₃ =   [   ⊗   ⊗   ]
BT      A₄ =   [   ⊗   ⊗   ]
R(BT)   A₅ =   [   ⊗   ⊗   ].
Rule A2 Compare the factor letters in the first row heading with the factor letters in the first column heading.

Rule A2.1 If a row factor letter does not match the column factor heading, place a (1/l)J_l in the Kronecker product. Note that μ₀ does not match any column factor heading; hence, the Kronecker product for A₁ will be comprised of terms of the form (1/l)J_l.
Example 4.3.3 (continued) Rule A2.1

Factor          B   T   R
Levels = l      b   t   r
μ₀      A₁ =   [ (1/b)J_b ⊗ (1/t)J_t ⊗ (1/r)J_r ]
B       A₂ =   [          ⊗ (1/t)J_t ⊗ (1/r)J_r ]
T       A₃ =   [ (1/b)J_b ⊗          ⊗ (1/r)J_r ]
BT      A₄ =   [          ⊗          ⊗ (1/r)J_r ]
R(BT)   A₅ =   [          ⊗          ⊗          ].
Rule A2.2 If a non-nested factor in the row heading matches the column factor heading, place an I_l − (1/l)J_l in the Kronecker product.
Example 4.3.3 (continued) Rule A2.2

Factor          B   T   R
Levels = l      b   t   r
μ₀      A₁ =   [ (1/b)J_b ⊗ (1/t)J_t ⊗ (1/r)J_r ]
B       A₂ =   [ (I_b − (1/b)J_b) ⊗ (1/t)J_t ⊗ (1/r)J_r ]
T       A₃ =   [ (1/b)J_b ⊗ (I_t − (1/t)J_t) ⊗ (1/r)J_r ]
BT      A₄ =   [ (I_b − (1/b)J_b) ⊗ (I_t − (1/t)J_t) ⊗ (1/r)J_r ]
R(BT)   A₅ =   [          ⊗          ⊗ (I_r − (1/r)J_r) ].

Rule A2.3 Place I_l elsewhere.
Example 4.3.3 (continued) Rule A2.3

Factor          B   T   R
Levels = l      b   t   r
μ₀      A₁ =   [ (1/b)J_b ⊗ (1/t)J_t ⊗ (1/r)J_r ]
B       A₂ =   [ (I_b − (1/b)J_b) ⊗ (1/t)J_t ⊗ (1/r)J_r ]
T       A₃ =   [ (1/b)J_b ⊗ (I_t − (1/t)J_t) ⊗ (1/r)J_r ]
BT      A₄ =   [ (I_b − (1/b)J_b) ⊗ (I_t − (1/t)J_t) ⊗ (1/r)J_r ]
R(BT)   A₅ =   [ I_b ⊗ I_t ⊗ (I_r − (1/r)J_r) ].

The sums of squares for each factor and interaction can be written as quadratic forms, Y′A_sY, using these Kronecker product matrices A_s. For example, the sum of squares for the factor B is Y′A₂Y = Y′[(I_b − (1/b)J_b) ⊗ (1/t)J_t ⊗ (1/r)J_r]Y. In Examples 4.3.1, 4.3.2, and 4.3.3 all the variance parameters in the model were estimable. In the next example we apply the algorithms to a model that contains a nonestimable variance parameter.
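The finished matrices A₁, …, A₅ can be checked numerically. In this sketch (b, t, and r are arbitrary test values), each A_s should be idempotent, the traces should reproduce the usual degrees of freedom, and the five matrices should sum to the identity:

```python
import numpy as np

# Numerical check of the Kronecker-product sum of squares matrices
# A1, ..., A5 from Example 4.3.3 (arbitrary b, t, r).
b, t, r = 3, 4, 2
I, J = np.eye, lambda n: np.ones((n, n))
k3 = lambda A, B, C: np.kron(np.kron(A, B), C)

A = [k3(J(b) / b, J(t) / t, J(r) / r),
     k3(I(b) - J(b) / b, J(t) / t, J(r) / r),
     k3(J(b) / b, I(t) - J(t) / t, J(r) / r),
     k3(I(b) - J(b) / b, I(t) - J(t) / t, J(r) / r),
     k3(I(b), I(t), I(r) - J(r) / r)]

# Each A_s is idempotent, the ranks (degrees of freedom) add to btr,
# and the A_s sum to the btr x btr identity matrix.
for As in A:
    assert np.allclose(As @ As, As)
ranks = [round(np.trace(As)) for As in A]
assert ranks == [1, b - 1, t - 1, (b - 1) * (t - 1), b * t * (r - 1)]
assert np.allclose(sum(A), np.eye(b * t * r))
```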
Example 4.3.4 Consider an experiment with b random blocks, t fixed treatments, and r = 1 replicate observations per block treatment combination. Assume the model

Y_ijk = μ_j + B_i + BT_ij + R(BT)_(ij)k

for i = 1, …, b, j = 1, …, t, and k = 1. Thus, σ²_R(BT) is nonestimable and R(BT)_(ij)k could be dropped from the model. However, if R(BT)_(ij)k is left in the model, then the sum of squares and covariance matrix algorithms are applied just as they were in Examples 4.3.1, 4.3.2, and 4.3.3 with r replaced by 1. Therefore, for the finite model, the covariance algorithm follows Rules Σ1, Σ2, Σ2.1, Σ2.2, and Σ2.3. With r = 1 the covariance matrix Σ is given by

Σ = σ²_B [I_b ⊗ J_t ⊗ 1] + σ²_BT [I_b ⊗ (I_t − (1/t)J_t) ⊗ 1] + σ²_R(BT) [I_b ⊗ I_t ⊗ 1]
  = σ²_B [I_b ⊗ J_t] + σ²_BT [I_b ⊗ (I_t − (1/t)J_t)] + σ²_R(BT) [I_b ⊗ I_t].

For the infinite model, the covariance algorithm follows Rules Σ1, Σ2, Σ2.1, and Σ2.3. With r = 1 the covariance matrix Σ is

Σ = σ²_B [I_b ⊗ J_t] + σ²_BT [I_b ⊗ I_t] + σ²_R(BT) [I_b ⊗ I_t].

For both finite and infinite models, the sum of squares algorithm follows Rules A1, A2, A2.1, A2.2, and A2.3. With r = 1 the sum of squares matrices A₁, …, A₅ are given by

A₁ = (1/b)J_b ⊗ (1/t)J_t
A₂ = (I_b − (1/b)J_b) ⊗ (1/t)J_t
A₃ = (1/b)J_b ⊗ (I_t − (1/t)J_t)
A₄ = (I_b − (1/b)J_b) ⊗ (I_t − (1/t)J_t)
A₅ = I_b ⊗ I_t ⊗ (I₁ − (1/1)J₁) = 0_{bt×bt}.

Since A₅ is a bt × bt matrix of zeros, the sum of squares due to effect R(BT) equals zero and no estimate of σ²_R(BT) exists.
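The degenerate A₅ is easy to confirm; b and t below are arbitrary test values:

```python
import numpy as np

# With r = 1, the replicate matrix A5 = Ib (x) It (x) (I1 - J1) is a
# matrix of zeros, so the R(BT) sum of squares vanishes (Example 4.3.4).
b, t, r = 4, 3, 1
I, J = np.eye, lambda n: np.ones((n, n))
A5 = np.kron(np.kron(I(b), I(t)), I(r) - J(r) / r)
assert A5.shape == (b * t, b * t)
assert np.allclose(A5, 0.0)
```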
The covariance structures described in this section apply to a broad class of finite and infinite models. However, these covariance structures are based on certain distributional assumptions on the random components in the model. If other distributional assumptions are made, then the covariance algorithms are not applicable. Exercise 12 in Chapter 5 provides an example of a covariance structure that does not conform to the distributional assumptions made for the finite or infinite models. We do not wish to confine our discussion solely to the finite and infinite covariance structures covered in this chapter. Therefore, a more general class of covariance structures is developed in Chapter 10. The finite and infinite covariance structures are a subset of this more general class. Finally, the sums of squares and covariance algorithms presented in this chapter can be applied directly to complete, balanced factorial models. However, in Chapters 7 and 8 we show that these algorithms can also be used to construct sums of squares and covariance matrices for unbalanced data sets and data sets with missing values. In the next section we introduce expected mean squares in complete, balanced experiments.
4.4
EXPECTED MEAN SQUARES
The mean square of an effect is the sum of squares of that effect divided by the corresponding degrees of freedom. For complete, balanced designs the mean
square is [1/tr(A_s)]Y′A_sY, since the degrees of freedom equal tr(A_s). The expected value of the mean square, usually called the expected mean square (EMS), is a function of the mean vector μ = E(Y) and of the variance parameters in Σ = cov(Y). The expected mean square indicates how the mean squares can be used to obtain unbiased estimates of functions of the variance parameters. The expected mean square in complete, balanced designs is defined in the following theorem. The proof of the theorem is a direct result of Theorem 1.3.2.

Theorem 4.4.1 Let Y be an n × 1 random vector associated with the observations of a complete, balanced factorial experiment with an n × 1 mean vector μ = E(Y) and n × n covariance matrix Σ = cov(Y). The expected mean square associated with the sum of squares Y′AY is

E{[1/tr(A)] Y′AY} = [tr(AΣ) + μ′Aμ]/tr(A)

where tr(A) equals the degrees of freedom associated with Y′AY.
Example 4.4.1 Consider the experiment described in Examples 4.3.1 and 4.3.3 in which a finite model was assumed. The sums of squares due to the random effect B and the fixed effect T are Y′A₂Y and Y′A₃Y, respectively, where

A₂ = (I_b − (1/b)J_b) ⊗ (1/t)J_t ⊗ (1/r)J_r and A₃ = (1/b)J_b ⊗ (I_t − (1/t)J_t) ⊗ (1/r)J_r,

and the btr × 1 random vector Y = (Y₁₁₁, …, Y₁₁r, …, Y_bt1, …, Y_btr)′. The mean vector μ = E(Y) = E(Y₁₁₁, …, Y₁₁r, …, Y_bt1, …, Y_btr)′ = 1_b ⊗ (μ₁, …, μ_t)′ ⊗ 1_r. Note that A₂Σ = [trσ²_B + σ²_R(BT)]A₂ and A₃Σ = [rσ²_BT + σ²_R(BT)]A₃. Therefore, by Theorem 4.4.1, the expected mean square of the random effect B equals

E[Y′A₂Y/tr(A₂)] = [tr(A₂Σ) + μ′A₂μ]/tr(A₂)
= [trσ²_B + σ²_R(BT)] + {[1_b ⊗ (μ₁, …, μ_t)′ ⊗ 1_r]′ [(I_b − (1/b)J_b) ⊗ (1/t)J_t ⊗ (1/r)J_r] [1_b ⊗ (μ₁, …, μ_t)′ ⊗ 1_r]}/(b − 1)
= trσ²_B + σ²_R(BT)

and the expected mean square of the fixed factor T equals

E[Y′A₃Y/tr(A₃)] = [tr(A₃Σ) + μ′A₃μ]/tr(A₃)
= [rσ²_BT + σ²_R(BT)] + {[1_b ⊗ (μ₁, …, μ_t)′ ⊗ 1_r]′ [(1/b)J_b ⊗ (I_t − (1/t)J_t) ⊗ (1/r)J_r] [1_b ⊗ (μ₁, …, μ_t)′ ⊗ 1_r]}/(t − 1)
= [rσ²_BT + σ²_R(BT)] + br Σ_{j=1}^{t} (μ_j − μ̄.)²/(t − 1).

Thus, the expected mean square of the random effect B provides an unbiased estimate of trσ²_B + σ²_R(BT). Likewise, the expected mean square of the fixed factor T provides an unbiased estimate of rσ²_BT + σ²_R(BT) + br Σ_{j=1}^{t} (μ_j − μ̄.)²/(t − 1). The EMSs of the other effects can be calculated in a similar manner and are left to the reader.
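Both EMS formulas can be checked numerically against Theorem 4.4.1; the dimensions, variance components, and treatment means below are hypothetical values chosen only for the check:

```python
import numpy as np

# Numerical check of the two EMS formulas in Example 4.4.1.
b, t, r = 3, 4, 2
s2B, s2BT, s2R = 1.5, 0.8, 0.3
mu_t = np.array([1.0, 2.0, 4.0, 7.0])          # treatment means (hypothetical)

I, J = np.eye, lambda n: np.ones((n, n))
k3 = lambda A, B, C: np.kron(np.kron(A, B), C)

Sigma = (s2B * k3(I(b), J(t), J(r))
         + s2BT * k3(I(b), I(t) - J(t) / t, J(r))
         + s2R * k3(I(b), I(t), I(r)))
mu = np.kron(np.kron(np.ones(b), mu_t), np.ones(r))   # 1_b (x) mu (x) 1_r

A2 = k3(I(b) - J(b) / b, J(t) / t, J(r) / r)
A3 = k3(J(b) / b, I(t) - J(t) / t, J(r) / r)

def ems(A):
    # Theorem 4.4.1: E[Y'AY/tr(A)] = [tr(A Sigma) + mu'A mu] / tr(A)
    return (np.trace(A @ Sigma) + mu @ A @ mu) / np.trace(A)

assert np.isclose(ems(A2), t * r * s2B + s2R)
assert np.isclose(ems(A3),
                  r * s2BT + s2R
                  + b * r * np.sum((mu_t - mu_t.mean()) ** 2) / (t - 1))
```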
4.5
ALGORITHM APPLICATIONS
The sum of squares and covariance matrix algorithms are now applied to a series of complete, balanced factorial experiments. As mentioned in Section 4.3, finite and infinite model covariance structures are reparameterizations of each other. Therefore, the choice between the finite and infinite model is somewhat arbitrary. For the remainder of this text, the finite model is assumed. Therefore, unless specifically stated, subsequent covariance matrices for complete, balanced factorial experiments will always be constructed with Rules Σ1, Σ2, Σ2.1, Σ2.2, and Σ2.3.

Example 4.5.1 Consider a two-way cross classification where the first two factors S and T are fixed with i = 1, …, s and j = 1, …, t levels and the third factor is a set of k = 1, …, r random replicates nested in the st combinations of the first two factors. Let Y_ijk be a random variable representing the kth replicate observation in the ijth combination of the two fixed factors. Let the str × 1 random vector Y = (Y₁₁₁, …, Y₁₁r, …, Y_st1, …, Y_str)′. The model is

Y_ijk = μ_ij + R(ST)_(ij)k

where μ_ij are constants representing the mean effect of the ijth combination of the two fixed factors and R(ST)_(ij)k are str random variables representing the effect of the nested replicates. Assume that the str random variables R(ST)_(ij)k ~ iid N₁(0, σ²_R(ST)). Therefore, the str × 1 random vector Y ~ N_str(μ, Σ) where the str × 1 mean vector

μ = E(Y₁₁₁, …, Y₁₁r, …, Y_st1, …, Y_str)′ = (μ₁₁, …, μ_st)′ ⊗ 1_r

and the str × str covariance matrix

Σ = σ²_R(ST) [I_s ⊗ I_t ⊗ I_r].

In addition, by the sum of squares algorithm in Section 4.3, Y′A₁Y, …, Y′A₅Y are the sums of squares due to the overall mean, the fixed factor S, the fixed factor T, the fixed interaction of S and T, and the nested replicates, respectively, where

A₁ = (1/s)J_s ⊗ (1/t)J_t ⊗ (1/r)J_r
A₂ = (I_s − (1/s)J_s) ⊗ (1/t)J_t ⊗ (1/r)J_r
A₃ = (1/s)J_s ⊗ (I_t − (1/t)J_t) ⊗ (1/r)J_r
A₄ = (I_s − (1/s)J_s) ⊗ (I_t − (1/t)J_t) ⊗ (1/r)J_r
A₅ = I_s ⊗ I_t ⊗ (I_r − (1/r)J_r).

Furthermore, A_mΣ = σ²_R(ST) A_m for m = 1, …, 5 where each A_m is idempotent with rank(A₁) = 1, rank(A₂) = s − 1, rank(A₃) = t − 1, rank(A₄) = (s − 1)(t − 1), and rank(A₅) = st(r − 1). Thus, by Corollary 3.1.2(a), Y′A_mY ~ σ²_R(ST) χ²_rank(A_m)(λ_m) for m = 1, …, 5 where

λ₁ = [(μ₁₁, …, μ_st)′ ⊗ 1_r]′ A₁ [(μ₁₁, …, μ_st)′ ⊗ 1_r]/(2σ²_R(ST)) = str μ̄²../(2σ²_R(ST))
λ₂ = [(μ₁₁, …, μ_st)′ ⊗ 1_r]′ A₂ [(μ₁₁, …, μ_st)′ ⊗ 1_r]/(2σ²_R(ST)) = rt Σ_{i=1}^{s} (μ̄_i. − μ̄..)²/(2σ²_R(ST))
λ₃ = [(μ₁₁, …, μ_st)′ ⊗ 1_r]′ A₃ [(μ₁₁, …, μ_st)′ ⊗ 1_r]/(2σ²_R(ST)) = rs Σ_{j=1}^{t} (μ̄.j − μ̄..)²/(2σ²_R(ST))
λ₄ = [(μ₁₁, …, μ_st)′ ⊗ 1_r]′ A₄ [(μ₁₁, …, μ_st)′ ⊗ 1_r]/(2σ²_R(ST)) = r Σ_{i=1}^{s} Σ_{j=1}^{t} (μ_ij − μ̄_i. − μ̄.j + μ̄..)²/(2σ²_R(ST))
λ₅ = [(μ₁₁, …, μ_st)′ ⊗ 1_r]′ A₅ [(μ₁₁, …, μ_st)′ ⊗ 1_r]/(2σ²_R(ST)) = 0.

By Theorem 3.2.1, the random variables Y′A_mY are mutually independent since A_mΣA_n = σ²_R(ST) A_mA_n = 0_{str×str} for m, n = 1, …, 5 and m ≠ n. The EMSs associated with each sum of squares are calculated using Theorem 4.4.1:

EMS (mean) = [tr(A₁Σ) + μ′A₁μ]/tr(A₁) = σ²_R(ST) + str μ̄²..
EMS (fixed factor S) = [tr(A₂Σ) + μ′A₂μ]/tr(A₂) = σ²_R(ST) + rt Σ_{i=1}^{s} (μ̄_i. − μ̄..)²/(s − 1)
EMS (fixed factor T) = [tr(A₃Σ) + μ′A₃μ]/tr(A₃) = σ²_R(ST) + rs Σ_{j=1}^{t} (μ̄.j − μ̄..)²/(t − 1)
EMS (fixed inter ST) = [tr(A₄Σ) + μ′A₄μ]/tr(A₄) = σ²_R(ST) + r Σ_{i=1}^{s} Σ_{j=1}^{t} (μ_ij − μ̄_i. − μ̄.j + μ̄..)²/[(s − 1)(t − 1)]
EMS (random R(ST)) = [tr(A₅Σ) + μ′A₅μ]/tr(A₅) = σ²_R(ST).

Bhat's lemma 3.4.1 also applies since Σ_{m=1}^{5} rank(A_m) = 1 + (s − 1) + (t − 1) + (s − 1)(t − 1) + st(r − 1) = str, I_s ⊗ I_t ⊗ I_r = Σ_{m=1}^{5} A_m, and Σ = (Σ_{m=1}^{5} A_m)Σ = Σ_{m=1}^{5} (A_mΣ) = Σ_{m=1}^{5} σ²_R(ST) A_m. Therefore, (i) Y′A_mY ~ σ²_R(ST) χ²_rank(A_m)(λ_m) and (ii) Y′A_mY are mutually independent for m = 1, …, 5.
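The quadratic forms μ′A_mμ above reduce to familiar mean-comparison sums, which can be checked numerically; s, t, r and the cell means below are arbitrary test values:

```python
import numpy as np

# Check that mu'A mu reduces to the mean-comparison sums quoted for
# Example 4.5.1 (arbitrary dimensions and cell means).
s, t, r = 3, 4, 2
rng = np.random.default_rng(0)
mu_st = rng.normal(size=(s, t))                # cell means mu_ij (hypothetical)
mu = np.kron(mu_st.ravel(), np.ones(r))        # (mu_11,...,mu_st)' (x) 1_r

I, J = np.eye, lambda n: np.ones((n, n))
k3 = lambda A, B, C: np.kron(np.kron(A, B), C)

A2 = k3(I(s) - J(s) / s, J(t) / t, J(r) / r)   # fixed factor S
A3 = k3(J(s) / s, I(t) - J(t) / t, J(r) / r)   # fixed factor T

mbar = mu_st.mean()
rowbar, colbar = mu_st.mean(axis=1), mu_st.mean(axis=0)
assert np.isclose(mu @ A2 @ mu, r * t * np.sum((rowbar - mbar) ** 2))
assert np.isclose(mu @ A3 @ mu, r * s * np.sum((colbar - mbar) ** 2))
```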
Example 4.5.2 Consider the same two-way layout as in Example 4.5.1 except now let S and T both be random factors. The model is

Y_ijk = α + S_i + T_j + ST_ij + R(ST)_(ij)k

where α is a constant representing the overall mean effect; S_i are random variables representing the effect of the first random factor; T_j are the random variables representing the second random factor; ST_ij are random variables representing the interaction between S and T; and R(ST)_(ij)k are random variables defined as in Example 4.5.1. Assume the s random variables S_i ~ iid N₁(0, σ²_S); the t random variables T_j ~ iid N₁(0, σ²_T); the st random variables ST_ij ~ iid N₁(0, σ²_ST); and the str random variables R(ST)_(ij)k ~ iid N₁(0, σ²_R(ST)). Furthermore, assume that {S_i, i = 1, …, s}, {T_j, j = 1, …, t}, {ST_ij, i = 1, …, s, j = 1, …, t}, and {R(ST)_(ij)k, i = 1, …, s, j = 1, …, t, k = 1, …, r} are mutually independent sets of random variables. Therefore, the str × 1 random vector Y ~ N_str(μ, Σ) where the str × 1 mean vector

μ = E(Y₁₁₁, …, Y₁₁r, …, Y_st1, …, Y_str)′ = α 1_s ⊗ 1_t ⊗ 1_r

and, by the covariance algorithm, the str × str covariance matrix

Σ = σ²_S [I_s ⊗ J_t ⊗ J_r] + σ²_T [J_s ⊗ I_t ⊗ J_r] + σ²_ST [I_s ⊗ I_t ⊗ J_r] + σ²_R(ST) [I_s ⊗ I_t ⊗ I_r].

The sum of squares matrices are not dependent on whether the factors are fixed or random. Therefore, the sums of squares Y′A_mY for m = 1, …, 5 are the same as those given in Example 4.5.1. Furthermore, A_mΣ = c_mA_m for m = 1, …, 5 where

c₁ = trσ²_S + srσ²_T + rσ²_ST + σ²_R(ST)
c₂ = trσ²_S + rσ²_ST + σ²_R(ST)
c₃ = srσ²_T + rσ²_ST + σ²_R(ST)
c₄ = rσ²_ST + σ²_R(ST)
c₅ = σ²_R(ST).

It is left to the reader to show λ₁ = strα²/(2c₁) and λ₂ = λ₃ = λ₄ = λ₅ = 0. By Bhat's lemma, or by Theorem 3.2.1 and Corollary 3.1.2(a), Y′A_mY ~ c_m χ²_rank(A_m)(λ_m) where the rank(A_m) are given in Example 4.5.1. The EMS associated with each sum of squares can be calculated using Theorem 4.4.1 and are given by EMS(mean) = c₁ + strα², EMS(random factor S) = c₂, EMS(random factor T) = c₃, EMS(random inter ST) = c₄, and EMS(random reps R(ST)) = c₅.
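The coefficients c₁, …, c₅ can be verified by checking A_mΣ = c_mA_m directly; the dimensions and variance components below are arbitrary test values:

```python
import numpy as np

# Verify A_m Sigma = c_m A_m for the two-way random model of Example 4.5.2.
s, t, r = 3, 4, 2
s2S, s2T, s2ST, s2R = 1.2, 0.9, 0.5, 0.3

I, J = np.eye, lambda n: np.ones((n, n))
k3 = lambda A, B, C: np.kron(np.kron(A, B), C)

Sigma = (s2S * k3(I(s), J(t), J(r)) + s2T * k3(J(s), I(t), J(r))
         + s2ST * k3(I(s), I(t), J(r)) + s2R * k3(I(s), I(t), I(r)))

A = [k3(J(s) / s, J(t) / t, J(r) / r),
     k3(I(s) - J(s) / s, J(t) / t, J(r) / r),
     k3(J(s) / s, I(t) - J(t) / t, J(r) / r),
     k3(I(s) - J(s) / s, I(t) - J(t) / t, J(r) / r),
     k3(I(s), I(t), I(r) - J(r) / r)]

c = [t * r * s2S + s * r * s2T + r * s2ST + s2R,
     t * r * s2S + r * s2ST + s2R,
     s * r * s2T + r * s2ST + s2R,
     r * s2ST + s2R,
     s2R]

for Am, cm in zip(A, c):
    assert np.allclose(Am @ Sigma, cm * Am)
```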
Examples 4.4.1, 4.5.1, and 4.5.2 provide the EMSs for three factorial experiments. Snedecor and Cochran (1978, p. 367) provide EMSs for the same three experiments in their Table 12.11.1. Snedecor and Cochran's EMSs are the same as the EMSs presented in this text, although they use different notation. Snedecor and Cochran's A, B, AB, error, σ², σ²_A, σ²_B, σ²_AB, K²_A, K²_B, K²_AB, a, b, and n are equivalent to our T, S, ST, R(ST), σ²_R(ST), σ²_T, σ²_S, σ²_ST, Σ_{j=1}^{t} (μ̄.j − μ̄..)²/(t − 1), Σ_{i=1}^{s} (μ̄_i. − μ̄..)²/(s − 1), Σ_{i=1}^{s} Σ_{j=1}^{t} (μ_ij − μ̄_i. − μ̄.j + μ̄..)²/[(s − 1)(t − 1)], t, s, and r, respectively.
Example 4.5.3 Consider the split plot experiment discussed by Kempthorne (1952, sixth printing 1967, pp. 374–375). The experiment has a random replicate factor, R, with i = 1, …, r levels; a set of fixed whole plot treatments, T, with j = 1, …, t levels; and a set of fixed split plot treatments, S, with k = 1, …, s levels. In Kempthorne's Table 19.1 he lists the following sources of variation: replicates R, whole plot treatments T, replicate by whole plot interaction RT, split plot treatments S, split by whole plot interaction ST, and remainder. The remainder is equal to the interaction RS plus the interaction RST, or equivalently, the replicate by split plot treatment interaction nested in whole plot treatments RS(T). The fixed portions of the experiment are designated by T, S, and ST with subscripts j and k. The random portions of the experiment are designated by R, RT, and RS(T). A model that identifies these sources of variation is

Y_ijk = μ_jk + R_i + RT_ij + RS(T)_i(j)k

where the r random variables R_i ~ N₁(0, σ²_R); the r(t − 1) × 1 random vector (I_r ⊗ P′_t)(RT₁₁, …, RT_rt)′ ~ N_r(t−1)(0, σ²_RT I_r ⊗ I_t−1); and the rt(s − 1) × 1 random vector (I_r ⊗ I_t ⊗ P′_s)(RS(T)₁(1)1, …, RS(T)_r(t)s)′ ~ N_rt(s−1)(0, σ²_RS(T) I_r ⊗ I_t ⊗ I_s−1) where P′_t and P′_s are the (t − 1) × t and (s − 1) × s lower portions of t- and s-dimensional Helmert matrices, respectively. Furthermore, these three sets of variables are mutually independent. Thus, the rts × 1 random vector Y = (Y₁₁₁, …, Y₁₁s, …, Y_rt1, …, Y_rts)′ ~ N_rts(μ, Σ) where the rts × 1 mean vector

μ = E(Y₁₁₁, …, Y₁₁s, …, Y_rt1, …, Y_rts)′ = 1_r ⊗ (μ₁₁, …, μ₁s, …, μ_t1, …, μ_ts)′

and, by the covariance matrix algorithm, the rts × rts covariance matrix is

Σ = σ²_R [I_r ⊗ J_t ⊗ J_s] + σ²_RT [I_r ⊗ (I_t − (1/t)J_t) ⊗ J_s] + σ²_RS(T) [I_r ⊗ I_t ⊗ (I_s − (1/s)J_s)].

By the sum of squares algorithm, the matrices A₁, …, A₇ are

Overall mean: A₁ = (1/r)J_r ⊗ (1/t)J_t ⊗ (1/s)J_s
Replicates R: A₂ = (I_r − (1/r)J_r) ⊗ (1/t)J_t ⊗ (1/s)J_s
Whole plot treatments T: A₃ = (1/r)J_r ⊗ (I_t − (1/t)J_t) ⊗ (1/s)J_s
RT: A₄ = (I_r − (1/r)J_r) ⊗ (I_t − (1/t)J_t) ⊗ (1/s)J_s
Split plot treatments S: A₅ = (1/r)J_r ⊗ (1/t)J_t ⊗ (I_s − (1/s)J_s)
ST: A₆ = (1/r)J_r ⊗ (I_t − (1/t)J_t) ⊗ (I_s − (1/s)J_s)
RS(T): A₇ = (I_r − (1/r)J_r) ⊗ I_t ⊗ (I_s − (1/s)J_s).

Furthermore, A_mΣ = c_mA_m for m = 1, …, 7 where c₁ = c₂ = stσ²_R, c₃ = c₄ = sσ²_RT, and c₅ = c₆ = c₇ = σ²_RS(T); I_r ⊗ I_t ⊗ I_s = Σ_{m=1}^{7} A_m; Σ_{m=1}^{7} rank(A_m) = 1 + (r − 1) + (t − 1) + (r − 1)(t − 1) + (s − 1) + (t − 1)(s − 1) + (r − 1)t(s − 1) = rts; and Σ = (Σ_{m=1}^{7} A_m)Σ = Σ_{m=1}^{7} (A_mΣ) = Σ_{m=1}^{7} c_mA_m. Therefore, by Bhat's Lemma 3.4.1, (i) Y′A_mY ~ c_m χ²_rank(A_m)(λ_m) and (ii) Y′A_mY are mutually independent for m = 1, …, 7 where

λ₁ = [1_r ⊗ (μ₁₁, …, μ_ts)′]′ A₁ [1_r ⊗ (μ₁₁, …, μ_ts)′]/(2stσ²_R) = r μ̄²../(2σ²_R)
λ₂ = 0
λ₃ = [1_r ⊗ (μ₁₁, …, μ_ts)′]′ A₃ [1_r ⊗ (μ₁₁, …, μ_ts)′]/(2sσ²_RT) = r Σ_{j=1}^{t} (μ̄_j. − μ̄..)²/(2σ²_RT)
λ₄ = 0
λ₅ = [1_r ⊗ (μ₁₁, …, μ_ts)′]′ A₅ [1_r ⊗ (μ₁₁, …, μ_ts)′]/(2σ²_RS(T)) = rt Σ_{k=1}^{s} (μ̄.k − μ̄..)²/(2σ²_RS(T))
λ₆ = [1_r ⊗ (μ₁₁, …, μ_ts)′]′ A₆ [1_r ⊗ (μ₁₁, …, μ_ts)′]/(2σ²_RS(T)) = r Σ_{j=1}^{t} Σ_{k=1}^{s} (μ_jk − μ̄_j. − μ̄.k + μ̄..)²/(2σ²_RS(T))
λ₇ = 0.

By Theorem 4.4.1 the expected mean squares are

EMS (overall mean) = stσ²_R + rst μ̄²..
EMS (replicate R) = stσ²_R
EMS (whole plot T) = sσ²_RT + rs Σ_{j=1}^{t} (μ̄_j. − μ̄..)²/(t − 1)
EMS (RT) = sσ²_RT
EMS (split plot S) = σ²_RS(T) + rt Σ_{k=1}^{s} (μ̄.k − μ̄..)²/(s − 1)
EMS (ST) = σ²_RS(T) + r Σ_{j=1}^{t} Σ_{k=1}^{s} (μ_jk − μ̄_j. − μ̄.k + μ̄..)²/[(t − 1)(s − 1)]
EMS (RS(T)) = σ²_RS(T).

Kempthorne (1952) provides the EMSs for T, RT, S, ST, and RS(T) in his Table 19.2. Kempthorne's EMSs are the same as the EMSs given here, although Kempthorne uses different notation. Kempthorne's whole plot error variance, t_j, split plot error variance, s_k, and (ts)_jk are equivalent to our sσ²_RT, (μ̄_j. − μ̄..), σ²_RS(T), (μ̄.k − μ̄..), and (μ_jk − μ̄_j. − μ̄.k + μ̄..), respectively.
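The split plot coefficients c₁, …, c₇ can be checked the same way as in the earlier examples; r, t, s and the variance components below are arbitrary test values:

```python
import numpy as np

# Check A_m Sigma = c_m A_m for the split plot covariance of Example 4.5.3.
r, t, s = 3, 2, 4
s2R, s2RT, s2RS = 1.1, 0.7, 0.4

I, J = np.eye, lambda n: np.ones((n, n))
k3 = lambda A, B, C: np.kron(np.kron(A, B), C)
Q = lambda n: I(n) - J(n) / n       # centering projection
P = lambda n: J(n) / n              # averaging projection

Sigma = (s2R * k3(I(r), J(t), J(s)) + s2RT * k3(I(r), Q(t), J(s))
         + s2RS * k3(I(r), I(t), Q(s)))

A = [k3(P(r), P(t), P(s)), k3(Q(r), P(t), P(s)), k3(P(r), Q(t), P(s)),
     k3(Q(r), Q(t), P(s)), k3(P(r), P(t), Q(s)), k3(P(r), Q(t), Q(s)),
     k3(Q(r), I(t), Q(s))]
c = [s * t * s2R, s * t * s2R, s * s2RT, s * s2RT, s2RS, s2RS, s2RS]

for Am, cm in zip(A, c):
    assert np.allclose(Am @ Sigma, cm * Am)
assert np.allclose(sum(A), np.eye(r * t * s))
```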
Example 4.5.4 Consider an experiment where bst experimental units are divided into b homogeneous sets of st units. Within each of the b sets (or, equivalently, within each of the b random blocks B) the st units are randomly assigned to the st combinations of the two fixed factors S and T where factor S has s levels and T has t levels. This is a classic random block design where the fixed treatments are identified by two fixed factors. Let Y_ijk be the random variable representing the observation in the kth level of factor T, the jth level of factor S, and the ith block for i = 1, …, b, j = 1, …, s, and k = 1, …, t. This experiment is characterized by the model

Y_ijk = μ_jk + B_i + R_ijk

where μ_jk are st constants representing the mean effect of the jkth combination of factors S and T; the B_i are random variables representing the random effect of blocks; and the R_ijk are random variables representing the random residual or remainder. It is our intention to write a covariance matrix that contains two variance components, one associated with the variance of the random variables B_i and one associated with the variance of the random variables R_ijk. However, to use the covariance algorithm, we must first rewrite the variables R_ijk in terms of the factor letters B, S, and T. Note that R_ijk can be equivalently written as R_ijk = BS_ij + BT_ik + BST_ijk where BS_ij, BT_ik, and BST_ijk are random variables representing the interaction of B with S, B with T, and B with ST, respectively. We could proceed from here by putting the last two equations together to produce a model

Y_ijk = μ_jk + B_i + BS_ij + BT_ik + BST_ijk.

However, the last model has four sets of random variables and would thus require a definition with four, not two, random components. The solution to the problem is to view the st combinations of fixed factors S and T as one fixed factor, say, V, with st levels. If we let v = 1, …, st designate the levels of factor V then the residual R_ijk can be rewritten as R_ijk = BS_ij + BT_ik + BST_ijk = BV_iv. Now assume the random variables B_i ~ N₁(0, σ²_B) and the b(st − 1) × 1 random vector (I_b ⊗ P′_st)(BV₁₁, …, BV_b,st)′ ~ N_b(st−1)(0, σ²_BV I_b ⊗ I_st−1) where P′_st is the (st − 1) × st lower portion of an st-dimensional Helmert matrix. Thus, the bst × 1 random vector Y = (Y₁₁₁, …, Y₁₁t, …, Y_bs1, …, Y_bst)′ ~ N_bst(μ, Σ) where the bst × 1 mean vector

μ = E(Y₁₁₁, …, Y₁₁t, …, Y_bs1, …, Y_bst)′ = 1_b ⊗ (μ₁₁, …, μ₁t, …, μ_s1, …, μ_st)′.

The covariance matrix algorithm can be applied using the two factor letters B and V where the number of levels is b and st, respectively. The bst × bst covariance matrix Σ is constructed here:

Factor Letter   B    V
Levels          b    st
σ²_B           [ I_b ⊗ J_st ]
+ σ²_BV        [ I_b ⊗ (I_st − (1/(st))J_st) ].

That is,

Σ = σ²_B [I_b ⊗ J_s ⊗ J_t] + σ²_BV [I_b ⊗ I_s ⊗ I_t] − σ²_BV [(1/(st)) I_b ⊗ J_s ⊗ J_t].

The sum of squares algorithm is now used to calculate matrices A₁, …, A₆. Recall that the sum of squares remainder equals the sum of the sums of squares due to the interactions BS, BT, and BST. Thus, A₆ equals the sum of the matrices associated with the sums of squares due to BS, BT, and BST.

Overall mean: A₁ = (1/b)J_b ⊗ (1/s)J_s ⊗ (1/t)J_t
Blocks B: A₂ = (I_b − (1/b)J_b) ⊗ (1/s)J_s ⊗ (1/t)J_t
Factor S: A₃ = (1/b)J_b ⊗ (I_s − (1/s)J_s) ⊗ (1/t)J_t
Factor T: A₄ = (1/b)J_b ⊗ (1/s)J_s ⊗ (I_t − (1/t)J_t)
ST: A₅ = (1/b)J_b ⊗ (I_s − (1/s)J_s) ⊗ (I_t − (1/t)J_t)
Remainder: A₆ = (I_b − (1/b)J_b) ⊗ [(I_s − (1/s)J_s) ⊗ (1/t)J_t + (1/s)J_s ⊗ (I_t − (1/t)J_t) + (I_s − (1/s)J_s) ⊗ (I_t − (1/t)J_t)].

Again A_mΣ = c_mA_m for m = 1, …, 6 where c₁ = c₂ = stσ²_B and c₃ = c₄ = c₅ = c₆ = σ²_BV; I_b ⊗ I_s ⊗ I_t = Σ_{m=1}^{6} A_m; Σ_{m=1}^{6} rank(A_m) = 1 + (b − 1) + (s − 1) + (t − 1) + (s − 1)(t − 1) + (b − 1)[(s − 1) + (t − 1) + (s − 1)(t − 1)] = bst; and Σ = (Σ_{m=1}^{6} A_m)Σ = Σ_{m=1}^{6} (A_mΣ) = Σ_{m=1}^{6} c_mA_m. Therefore, by Bhat's lemma 3.4.1, (i) Y′A_mY ~ c_m χ²_rank(A_m)(λ_m) and (ii) Y′A_mY are mutually independent for m = 1, …, 6 where

λ₁ = [1_b ⊗ (μ₁₁, …, μ_st)′]′ A₁ [1_b ⊗ (μ₁₁, …, μ_st)′]/(2stσ²_B) = b μ̄²../(2σ²_B)
λ₂ = 0
λ₃ = [1_b ⊗ (μ₁₁, …, μ_st)′]′ A₃ [1_b ⊗ (μ₁₁, …, μ_st)′]/(2σ²_BV) = bt Σ_{j=1}^{s} (μ̄_j. − μ̄..)²/(2σ²_BV)
λ₄ = [1_b ⊗ (μ₁₁, …, μ_st)′]′ A₄ [1_b ⊗ (μ₁₁, …, μ_st)′]/(2σ²_BV) = bs Σ_{k=1}^{t} (μ̄.k − μ̄..)²/(2σ²_BV)
λ₅ = [1_b ⊗ (μ₁₁, …, μ_st)′]′ A₅ [1_b ⊗ (μ₁₁, …, μ_st)′]/(2σ²_BV) = b Σ_{j=1}^{s} Σ_{k=1}^{t} (μ_jk − μ̄_j. − μ̄.k + μ̄..)²/(2σ²_BV)
λ₆ = 0.

By Theorem 4.4.1, the expected mean squares are

EMS (overall mean) = stσ²_B + bst μ̄²..
EMS (random block B) = stσ²_B
EMS (fixed factor S) = σ²_BV + bt Σ_{j=1}^{s} (μ̄_j. − μ̄..)²/(s − 1)
EMS (fixed factor T) = σ²_BV + bs Σ_{k=1}^{t} (μ̄.k − μ̄..)²/(t − 1)
EMS (ST) = σ²_BV + b Σ_{j=1}^{s} Σ_{k=1}^{t} (μ_jk − μ̄_j. − μ̄.k + μ̄..)²/[(s − 1)(t − 1)]
EMS (remainder R) = σ²_BV.
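The remainder construction can be checked numerically: the sum of the BS, BT, and BST matrices inside A₆ collapses to the single BV-style projection I_st − (1/(st))J_st. The b, s, t below are arbitrary test values:

```python
import numpy as np

# Check that the bracketed sum in A6 (Example 4.5.4) equals
# I_st - J_st/(st), so A6 = Qb (x) (I_st - J_st/(st)).
b, s, t = 3, 2, 4
I, J = np.eye, lambda n: np.ones((n, n))
Q = lambda n: I(n) - J(n) / n
P = lambda n: J(n) / n

A6 = np.kron(Q(b), (np.kron(Q(s), P(t)) + np.kron(P(s), Q(t))
                    + np.kron(Q(s), Q(t))))
assert np.allclose(A6, np.kron(Q(b),
                               np.eye(s * t) - np.ones((s * t, s * t)) / (s * t)))

# A6 is idempotent with trace (b-1)(st-1), the remainder degrees of freedom.
assert np.allclose(A6 @ A6, A6)
assert round(np.trace(A6)) == (b - 1) * (s * t - 1)
```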
EXERCISES

1. Consider an experiment where r replicate units are nested in each of the s levels of a fixed factor S. The t levels of a second fixed factor T are then applied to each of the sr experimental units. A model for this experiment is

Y_ijk = μ_ik + R(S)_(i)j + RT(S)_(i)jk

for i = 1, …, s, j = 1, …, r, and k = 1, …, t, where R represents the replicate units nested in the s levels of factor S. Assume the srt × 1 random vector Y = (Y₁₁₁, …, Y₁₁t, …, Y₁r1, …, Y₁rt, …, Y_s11, …, Y_s1t, …, Y_sr1, …, Y_srt)′ ~ N_srt(μ, Σ).

(a) Derive μ and Σ. (b) Write out the ANOVA table and define the matrices A_m used in the sums of squares Y′A_mY for m = 1, …, 7 where A₇ is the matrix corresponding to the sum of squares total. (c) Derive the distributions of Y′A_mY for m = 1, …, 6. (d) Calculate all the expected mean squares. (e) Construct all "appropriate" F statistics and explicitly define the hypothesis being tested in each case. Prove that all the statistics constructed above have F distributions.

2. Consider a factorial experiment with three fixed factors S, T, and U where the factors have s, t, and u levels, respectively. Within the stu combinations of the three main factors, r random replicate observations are observed. Let R represent the random nested replicate factor. Assume the stur × 1 random vector Y = (Y₁₁₁₁, …, Y₁₁₁r, …, Y_stu1, …, Y_stur)′ ~ N_stur(μ, Σ).

(a) Write a model for this experiment and derive the mean vector μ and the covariance matrix Σ. (b) Write out the ANOVA table and define the matrices A_m used in the sums of squares Y′A_mY for m = 1, …, 10 where A₁₀ is the matrix corresponding to the sum of squares total. (c) Derive the distributions of Y′A_mY for m = 1, …, 9.
(d) Calculate all the expected mean squares. (e) Construct all "appropriate" F statistics and explicitly define the hypothesis being tested in each case. Prove that all the statistics constructed above have F distributions.

3. Redo Exercise 2 above when S is a fixed factor and T and U are random factors.

4. Redo Exercise 2 when S, T, and U are all random factors.

5. Consider the paired t-test problem introduced in Exercise 15 of Chapter 3. However, now view the problem as an experiment where two levels of a fixed factor T are applied to each of the n levels of a random factor B. Thus, the n levels of B are the n experimental units. The model for this problem is

Y_ij = μ_j + B_i + BT_ij

where μ_j are constants representing the average effect of the jth level of fixed factor T, B_i are random variables representing the effect of the ith experimental unit, and BT_ij are random variables representing the interaction effect of factors B and T. Assume the 2n × 1 random vector Y = (Y₁₁, Y₁₂, Y₂₁, Y₂₂, …, Y_n1, Y_n2)′ ~ N_2n(μ, Σ). (a) Derive μ and use the covariance algorithm to define the covariance matrix in terms of σ²_B and σ²_BT. (b) Write out the ANOVA table and define the matrices A_m used in the sums of squares Y′A_mY for m = 1, …, 5 where A₅ is the matrix corresponding to the sum of squares total. (c) Derive the distributions of Y′A_mY for m = 1, …, 4. (d) Calculate all the expected mean squares. (e) Construct a statistic to test H₀: μ₁ = μ₂ versus H₁: μ₁ ≠ μ₂. Does the statistic you constructed have an F distribution? Prove your answer. (f) Prove that the mean square due to factor T divided by the mean square due to the BT interaction is equal to T² where T is defined in Exercise 15b from Chapter 3.

6. An animal scientist was interested in evaluating the effect of five different types of feed on the weight gain of cattle. An experiment was run where 60 cattle were randomly assigned to 15 pens with 4 cattle in each pen. Each pen then received a certain type of feed. The assignment of feeds to pens was done randomly with 3 pens in each of the five feed types. Weight observations on each head of cattle were then taken at three different times. Let Y_ijkl be a random variable representing the weight gain observed at the lth time period, on the kth head of cattle, in the jth pen, and being fed the ith feed type for
i = 1, 2, 3, 4, 5, j = 1, 2, 3, k = 1, 2, 3, 4, and l = 1, 2, 3. Assume the 180 × 1 random vector Y = (Y1111, …, Y1113, …, Y5341, …, Y5343)' ∼ N_180(μ, Σ). A model for this problem is

Yijkl = μil + P(F)(i)j + C(FP)(ij)k + PT(F)(i)jl + CT(FP)(ij)kl

where μil are constants representing the average effect of the ith feed type at the lth time and P(F)(i)j, C(FP)(ij)k, PT(F)(i)jl, and CT(FP)(ij)kl are random variables representing the effects of pens nested in feed types, cattle nested in feed pens, the interaction of time by pens nested in feeds, and the interaction of time and cattle nested in feed pens, respectively.
(a) Derive μ and Σ.
(b) Write out the ANOVA table and define the matrices Am used in the sums of squares Y'AmY for m = 1, …, 9 where A9 is the matrix corresponding to the sum of squares total.
(c) Derive the distributions of Y'AmY for m = 1, …, 8.
(d) Calculate all the expected mean squares.
(e) Construct all "appropriate" F statistics and explicitly define the hypothesis being tested in each case. Prove that all the statistics constructed above have F distributions.
7. A pump manufacturer wanted to evaluate how well his assembled pumps performed. He ran the following experiment. He randomly selected 10 people to assemble his pumps, randomly dividing them into two groups of 5. He then trained both groups to assemble pumps, but one group received more rigorous instruction. The two groups were therefore identified to be of two skill levels. Each person then assembled two pumps, one pump by one method of assembly and a second pump by a second method of assembly. Each assembled pump then pumped water for a fixed amount of time and then repeated the operation later for the same length of time. The amount of water pumped in each time period was recorded. The order of the operation (first time period or second) was also recorded. Let Yijkl be a random variable representing the amount of water pumped during the lth time period or order, on a pump assembled by the kth method and the jth person in the ith skill level for i = 1, 2, j = 1, 2, 3, 4, 5, k = 1, 2, and l = 1, 2. Assume the 40 × 1 random vector Y = (Y1111, Y1112, Y1121, Y1122, …, Y2511, Y2512, Y2521, Y2522)' ∼ N_40(μ, Σ). A model for this problem is

Yijkl = μikl + P(S)(i)j + PM(S)(i)jk + PO(SM)(i)j(k)l

where μikl are constants representing the average effect of the lth order in the kth method with the ith skill level, and P(S)(i)j, PM(S)(i)jk, and PO(SM)(i)j(k)l are
random variables representing the effects of people nested in skill levels, the interaction of methods and people nested in skill levels, and the interaction of order and people nested in skill levels and methods.
(a) Derive μ and Σ.
(b) Write out the ANOVA table and define the matrices Am used in the sums of squares Y'AmY for m = 1, …, 12 where Y'A12Y is the sum of squares total.
(c) Derive the distributions of Y'AmY for m = 1, …, 11.
(d) Calculate all the expected mean squares.
(e) Construct all "appropriate" F statistics and explicitly define the hypothesis being tested in each case. Prove that all the statistics constructed above have F distributions.
8. Consider Exercise 7.
(a) Calculate the standard error of Ȳ1... − Ȳ2... where Ȳi... = ∑_{j=1}^5 ∑_{k=1}^2 ∑_{l=1}^2 Yijkl/20 for i = 1, 2.
(b) Find an unbiased estimator of μ̄1.12 − μ̄2.12 where μ̄i.kl = ∑_{j=1}^5 μijkl/5, and calculate the standard error of the estimator.
9. Prove that the necessary and sufficient conditions of Bhat's Lemma 3.4.1 are satisfied in the following situation. Consider any complete, balanced factorial experiment with n observations where the n × 1 random vector Y ∼ Nn(μ, Σ). The covariance matrix algorithm rules E1, E2, E2.1, E2.2, and E2.3 are used to derive Σ. The sum of squares algorithm rules A1, A2, A2.1, A2.2, and A2.3 are used to derive the k sum of squares matrices A1, …, Ak with ∑_{i=1}^k Ai = In.
5
Least-Squares Regression
In this chapter the least-squares estimation procedure is examined. The topic is introduced through a regression example. Later in the chapter the regression model format is applied to a broad class of problems, including factorial experiments.
5.1
ORDINARY LEAST-SQUARES ESTIMATION
We begin with a simple example. An engineer wants to relate the fuel consumption of a new type of automobile to the speed of the vehicle and the grade of the road traveled. He has a fleet of n vehicles. Each vehicle is assigned to operate at a constant speed (in miles per hour) on a specific grade (in percent grade) and the fuel consumption (in ml/sec) is recorded. The engineer believes that the expected fuel consumption is a linear function of the speed of the vehicle and the speed of the vehicle times the grade of the road. Let Yi be a random variable that represents the observed fuel consumption of the ith vehicle, operating at a fixed speed, on a road with a constant grade. Let xi1 represent the speed of the ith vehicle and let xi2 represent the speed times the grade of the ith vehicle. The expected fuel
consumption of the ith vehicle can be represented by

E(Yi) = β0 + β1 xi1 + β2 xi2

where β0, β1, and β2 are unknown parameters. Due to qualities intrinsic to each vehicle, the observed fuel consumptions differ somewhat from the expected fuel consumptions. Therefore, the observed fuel consumption of the ith vehicle is represented by

Yi = E(Yi) + Ei   or   Yi = β0 + β1 xi1 + β2 xi2 + Ei

where Ei is a random variable representing the difference between the observed fuel consumption and the expected fuel consumption of the ith vehicle. An example data set for this fuel, speed, grade experiment is provided in Table 5.1.1.

In a more general setting consider a problem where the expected value of a random variable Yi is assumed to be a linear combination of p − 1 different variables xi1, xi2, …, xi,p−1. That is,

E(Yi) = β0 + β1 xi1 + ⋯ + βp−1 xi,p−1.

Adding a component of error, Ei, to represent the difference between the observed value of Yi and the expected value of Yi, we obtain

Yi = β0 + β1 xi1 + ⋯ + βp−1 xi,p−1 + Ei.

By taking expectations on the right and left sides of the preceding two equations, we obtain E(Ei) = 0 for all i = 1, …, n.

Table 5.1.1
Fuel, Speed, Grade Data Set

 i    Fuel Yi    Speed xi1    Grade    Speed × Grade xi2
 1      1.7         20          0             0
 2      2.0         20          0             0
 3      1.9         20          0             0
 4      1.6         20          0             0
 5      3.2         20          6           120
 6      2.0         50          0             0
 7      2.5         50          0             0
 8      5.4         50          6           300
 9      5.7         50          6           300
10      5.1         50          6           300
The model just discussed can be expressed in matrix form by noting that

Y1 = β0 + β1 x11 + ⋯ + βp−1 x1,p−1 + E1
Y2 = β0 + β1 x21 + ⋯ + βp−1 x2,p−1 + E2
 ⋮
Yn = β0 + β1 xn1 + ⋯ + βp−1 xn,p−1 + En

or

Y = Xβ + E

where the n × 1 random vector Y = (Y1, …, Yn)', the p × 1 vector β = (β0, β1, …, βp−1)', the n × 1 random vector E = (E1, …, En)', and the n × p matrix

X = [ 1   x11   ⋯   x1,p−1
      1   x21   ⋯   x2,p−1
      ⋮    ⋮          ⋮
      1   xn1   ⋯   xn,p−1 ].

Furthermore, E(Ei) = 0 for all i = 1, …, n implies E(E) = 0n×1. Therefore E(Y) = Xβ. For the present, assume that the Ei's are independent, identically distributed random variables where var(Ei) = σ² for all i = 1, …, n. Since the Ei's are independent, cov(Ei, Ej) = 0 for all i ≠ j. Therefore, the covariance matrix of E is given by Σ = cov(E) = σ²In. In later sections of this chapter more complicated error structures are considered. Note that Σ has been used to represent the covariance matrix of the n × 1 random error vector E. However, Σ is also the covariance matrix of the n × 1 random vector Y since

cov(Y) = E[(Y − Xβ)(Y − Xβ)'] = E[EE'] = Σ.

Since the xij values are known for i = 1, …, n and j = 1, …, p − 1, x̄.j = ∑_{i=1}^n xij/n can be calculated for any j. Therefore, the preceding model can be equivalently written as

Yi = β0* + β1(xi1 − x̄.1) + ⋯ + βp−1(xi,p−1 − x̄.p−1) + Ei

where β0 = β0* − β1 x̄.1 − ⋯ − βp−1 x̄.p−1. In matrix form

Y = X*β* + E

where β* = (β0*, β1, …, βp−1)',

X* = [ 1   x11 − x̄.1   ⋯   x1,p−1 − x̄.p−1
       1   x21 − x̄.1   ⋯   x2,p−1 − x̄.p−1
       ⋮        ⋮                 ⋮
       1   xn1 − x̄.1   ⋯   xn,p−1 − x̄.p−1 ]  =  [1n | Xc]

and Xc is an n × (p − 1) matrix such that 1n'Xc = 0_{1×(p−1)}. This latter model form is called a centered model. Without loss of generality, a model can always be assumed to be centered since any model Y = Xβ + E can be written as Y = X*β* + E. The asterisks on the centered model are subsequently dropped since X can always be considered a centered matrix if necessary. In the next example, the 10 × 3 centered matrix X is derived for the example data set.
Example 5.1.1 For the example data given in Table 5.1.1, the average speed is x̄.1 = [5(20) + 5(50)]/10 = 35 and the average value of speed × grade is x̄.2 = [6(0) + (1)120 + 3(300)]/10 = 102. Therefore, the 10 × 3 centered matrix X = (1_10 | Xc) where

Xc = [ 1_4 ⊗ (−15, −102)
       1_1 ⊗ (−15,   18)
       1_2 ⊗ ( 15, −102)
       1_3 ⊗ ( 15,  198) ].
The main objective of this section is to develop a procedure to estimate the p unknown parameters β0, β1, …, βp−1. One method that provides such estimators is called the ordinary least-squares procedure. The ordinary least-squares estimators of β0, β1, …, βp−1 are obtained by minimizing the quadratic form Q with respect to the p × 1 vector β where

Q = (Y − Xβ)'(Y − Xβ) = Y'Y − 2β'X'Y + β'X'Xβ.

To derive the estimators, take the derivative of Q with respect to the vector β, set the resulting expression equal to zero, and solve for β. That is,

∂Q/∂β = −2X'Y + 2X'Xβ = 0p×1

or X'Xβ = X'Y. If X'X is nonsingular [i.e., rank(X'X) = p] then the least-squares estimator of β is

β̂ = (X'X)⁻¹X'Y.
Thus, the ordinary least-squares estimator of the p × 1 vector β is a set of linear transformations of the random vector Y where (X'X)⁻¹X' is the p × n transformation matrix. If E(Y) = Xβ, β̂ is an unbiased estimator of β since

E(β̂) = (X'X)⁻¹X'E(Y) = (X'X)⁻¹X'Xβ = β.

Furthermore, the p × p covariance matrix of β̂ is given by

cov(β̂) = (X'X)⁻¹X'(σ²In)X(X'X)⁻¹ = σ²(X'X)⁻¹

when Σ = cov(Y) = σ²In. It is also generally of interest to estimate the unknown parameter σ². The quadratic form

σ̂² = Y'[In − X(X'X)⁻¹X']Y/(n − p)

provides an unbiased estimator of σ² when Σ = σ²In since

E(σ̂²) = E{Y'[In − X(X'X)⁻¹X']Y/(n − p)}
      = {tr[(In − X(X'X)⁻¹X')(σ²In)] + β'X'(In − X(X'X)⁻¹X')Xβ}/(n − p)
      = {σ²(n − p) + (β'X'Xβ − β'X'X(X'X)⁻¹X'Xβ)}/(n − p)
      = σ².

In the next example the ordinary least-squares estimates of β and σ² are calculated for the example data set. The IML procedure in SAS has been used to generate all the example calculations in this chapter. The PROC IML programs and outputs for this chapter are presented in Appendix 1.

Example 5.1.2 For the example data given in Table 5.1.1, the least-squares estimate of β is given by

β̂ = (X'X)⁻¹X'Y = (3.11, 0.01348, 0.01061)'.

Therefore, the prediction equation is

ŷ = 3.11 + 0.01348(x1 − 35) + 0.01061(x2 − 102)

or

ŷ = 1.556 + 0.01348 x1 + 0.01061 x2.
The ordinary least-squares estimator of σ² is given by

σ̂² = Y'[In − X(X'X)⁻¹X']Y/7 = 0.05988.
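The book generated these results with SAS PROC IML (Appendix 1). They can also be reproduced with a short script; NumPy is assumed here purely for illustration. The sketch builds the centered matrix X = (1_10 | Xc) of Example 5.1.1 and computes β̂ = (X'X)⁻¹X'Y and σ̂².

```python
import numpy as np

# Table 5.1.1: fuel (Y), speed (x1), speed x grade (x2)
y  = np.array([1.7, 2.0, 1.9, 1.6, 3.2, 2.0, 2.5, 5.4, 5.7, 5.1])
x1 = np.array([20.0] * 5 + [50.0] * 5)
x2 = np.array([0, 0, 0, 0, 120, 0, 0, 300, 300, 300], dtype=float)

n = len(y)
# centered model: X = (1_n | Xc) with 1_n' Xc = 0
X = np.column_stack([np.ones(n), x1 - x1.mean(), x2 - x2.mean()])

# ordinary least-squares estimator: beta_hat = (X'X)^{-1} X'Y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# unbiased estimator of sigma^2: Y'[In - X(X'X)^{-1}X']Y / (n - p)
p = X.shape[1]
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - p)

print(beta_hat)     # approx (3.11, 0.01348, 0.01061), as in Example 5.1.2
print(sigma2_hat)   # approx 0.0599
```

Because the model is centered, the first element of β̂ is ȳ = 3.11, and the uncentered intercept 1.556 is recovered as 3.11 − 0.01348(35) − 0.01061(102).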
5.2
BEST LINEAR UNBIASED ESTIMATORS
In many problems it is of interest to estimate linear combinations of β0, …, βp−1, say, t'β, where t is any nonzero p × 1 vector of known constants. In the next definition the "best" linear unbiased estimator of t'β is identified.
Definition 5.2.1 Best Linear Unbiased Estimator (BLUE) of t'β: The best linear unbiased estimator of t'β is

(i) a linear function of the observed vector Y, that is, a function of the form a'Y + a0 where a is an n × 1 vector of constants and a0 is a scalar, and

(ii) the unbiased estimator of t'β with the smallest variance.

In the next important theorem t'β̂ = t'(X'X)⁻¹X'Y is shown to be the BLUE of t'β when E(E) = 0 and cov(E) = σ²In. The theorem is called the Gauss-Markov theorem.
Theorem 5.2.1 Let Y = Xβ + E where E(E) = 0 and cov(E) = σ²In. Then the least-squares estimator of t'β is given by t'β̂ = t'(X'X)⁻¹X'Y and t'β̂ is the BLUE of t'β.

Proof: First, the least-squares estimator of t'β is shown to be t'β̂. Let T be a p × p nonsingular matrix such that T = (t | T0) where t is a p × 1 vector and T0 is a p × (p − 1) matrix. If R = T'⁻¹ then

Y = Xβ + E = XRT'β + E = Uω + E

where U = XR and

ω = T'β = [ t'β ; T0'β ].

The least-squares estimate of ω is given by

ω̂ = (U'U)⁻¹U'Y
  = (R'X'XR)⁻¹R'X'Y
  = R⁻¹(X'X)⁻¹R'⁻¹R'X'Y
  = T'(X'X)⁻¹X'Y
  = T'β̂
  = [ t'β̂ ; T0'β̂ ].

Therefore, t'β̂ is the least-squares estimator of t'β. Next, t'β̂ is shown to be the BLUE of t'β. Linear estimators of t'β take the form a'Y + a0. Since t'(X'X)⁻¹X' is known, without loss of generality, let a' = t'(X'X)⁻¹X' + b'. Then linear unbiased estimators of t'β satisfy the relationship

t'β = E(a'Y + a0)
    = E(t'(X'X)⁻¹X'Y + b'Y + a0)
    = t'(X'X)⁻¹X'Xβ + b'Xβ + a0
    = t'β + b'Xβ + a0.

Therefore, in the class of linear unbiased estimators b'Xβ + a0 = 0 for all β. But for this expression to hold for all β, b'X = 0_{1×p} and a0 = 0. Now calculate and minimize the variance of the estimator a'Y + a0 within the class of unbiased estimators of t'β (i.e., when b'X = 0_{1×p} and a0 = 0):

var(a'Y + a0) = var(a'Y) = var(t'(X'X)⁻¹X'Y + b'Y) = σ²(t'(X'X)⁻¹t + b'b).

But σ² and t'(X'X)⁻¹t are constants. Therefore, var(a'Y + a0) is minimized when b'b = 0, or when b = 0n×1. Therefore, the BLUE of t'β has variance σ²t'(X'X)⁻¹t. But t'β̂ is a linear unbiased estimator of t'β with variance σ²t'(X'X)⁻¹t. Therefore, t'β̂ is the BLUE of t'β. ∎

Example 5.2.1 Consider the example data set given in Table 5.1.1. By the Gauss-Markov theorem, the best linear unbiased estimate of β1 − β2 is

t'β̂ = (0, 1, −1)(3.11, 0.01348, 0.01061)' = 0.00287.
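A numerical companion to Example 5.2.1 (a Python sketch under the same assumptions as before; the data are from Table 5.1.1): with t = (0, 1, −1)', the BLUE of β1 − β2 is t'β̂, and by the theorem its variance is σ²t'(X'X)⁻¹t, estimated here by substituting σ̂².

```python
import numpy as np

y  = np.array([1.7, 2.0, 1.9, 1.6, 3.2, 2.0, 2.5, 5.4, 5.7, 5.1])
x1 = np.array([20.0] * 5 + [50.0] * 5)
x2 = np.array([0, 0, 0, 0, 120, 0, 0, 300, 300, 300], dtype=float)
X  = np.column_stack([np.ones(10), x1 - x1.mean(), x2 - x2.mean()])

XtX_inv  = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid    = y - X @ beta_hat
sigma2_hat = resid @ resid / (10 - 3)

t = np.array([0.0, 1.0, -1.0])
blue = t @ beta_hat                        # BLUE of beta_1 - beta_2
var_hat = sigma2_hat * (t @ XtX_inv @ t)   # estimated variance sigma^2 t'(X'X)^{-1} t

print(round(blue, 5))   # approx 0.00287, as in Example 5.2.1
```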
5.3
ANOVA TABLE FOR THE ORDINARY LEAST-SQUARES REGRESSION FUNCTION
An ANOVA table can be constructed that partitions the total sum of squares into the sum of squares due to the overall mean, the sum of squares due to β1, …, βp−1, and the sum of squares due to the residual. The ANOVA table for this model is given in Table 5.3.1. The sums of squares under the column "SS" can be applied to any form of the n × p matrix X. The sums of squares under the column "SS Centered" can be applied to centered matrices X = [1n | Xc] where 1n'Xc = 0_{1×(p−1)}.
Table 5.3.1 Ordinary Least-Squares ANOVA Table

Source                      df      SS                           SS Centered
Overall mean                1       Y'(1/n)Jn Y                  Y'(1/n)Jn Y
Regression (β1, …, βp−1)    p − 1   Y'[X(X'X)⁻¹X' − (1/n)Jn]Y    Y'Xc(Xc'Xc)⁻¹Xc'Y
Residual                    n − p   Y'[In − X(X'X)⁻¹X']Y         Y'[In − (1/n)Jn − Xc(Xc'Xc)⁻¹Xc']Y
Total                       n       Y'Y                          Y'Y
The expected mean squares for each effect are calculated using Theorem 1.3.2:

EMS (overall mean) = E(Y'(1/n)Jn Y)
 = tr[(σ²/n)Jn] + β'X'(1/n)Jn Xβ
 = σ² + β'[1n | Xc]'(1/n)Jn [1n | Xc]β
 = σ² + β' [ n²  0 ; 0  0 ] β/n
 = σ² + nβ0²

EMS (regression) = E[Y'Xc(Xc'Xc)⁻¹Xc'Y/(p − 1)]
 = {tr[σ²Xc(Xc'Xc)⁻¹Xc'] + β'X'Xc(Xc'Xc)⁻¹Xc'Xβ}/(p − 1)
 = {σ²(p − 1) + β'[1n | Xc]'Xc(Xc'Xc)⁻¹Xc'[1n | Xc]β}/(p − 1)
 = σ² + (β1, …, βp−1)Xc'Xc(β1, …, βp−1)'/(p − 1)

and EMS (residual) = E(σ̂²) = σ² as derived in Section 5.1. The ANOVA table for the fuel, speed, grade data set is provided in the next example.
Example 5.3.1 The ANOVA table for the data in Table 5.1.1 is given here:

Source                 df     SS
Overall mean            1     96.721
Regression (β1, β2)     2     24.070
Residual                7      0.419
Total                  10    121.210
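The SS column of this table can be verified directly from the quadratic forms in Table 5.3.1; a minimal sketch (Python assumed, as in the earlier examples):

```python
import numpy as np

y  = np.array([1.7, 2.0, 1.9, 1.6, 3.2, 2.0, 2.5, 5.4, 5.7, 5.1])
x1 = np.array([20.0] * 5 + [50.0] * 5)
x2 = np.array([0, 0, 0, 0, 120, 0, 0, 300, 300, 300], dtype=float)
n  = len(y)
Xc = np.column_stack([x1 - x1.mean(), x2 - x2.mean()])   # centered regressors

ss_total = y @ y                       # Y'Y
ss_mean  = n * y.mean() ** 2           # Y'(1/n)Jn Y
yhat_c   = Xc @ np.linalg.solve(Xc.T @ Xc, Xc.T @ y)
ss_reg   = y @ yhat_c                  # Y'Xc(Xc'Xc)^{-1}Xc'Y
ss_resid = ss_total - ss_mean - ss_reg

print(ss_mean, ss_reg, ss_resid)   # approx 96.721, 24.070, 0.419
```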
5.4

WEIGHTED LEAST-SQUARES REGRESSION
In the first three sections of this chapter the model was confined to Y = Xβ + E where E(E) = 0 and cov(E) = σ²In. In this section, the model Y = Xβ + E is considered when E(E) = 0, cov(E) = σ²V, and V is an n × n symmetric, positive definite matrix of known constants. Because V is positive definite, there exists an n × n nonsingular matrix T such that V = TT'. Premultiplying both sides of the model Y = Xβ + E by T⁻¹ we obtain

T⁻¹Y = T⁻¹Xβ + T⁻¹E
Yw = Xwβ + Ew

where Yw = T⁻¹Y, Xw = T⁻¹X, and Ew = T⁻¹E. Therefore, E(Ew) = T⁻¹E(E) = 0n×1 and cov(Ew) = cov(T⁻¹E) = T⁻¹(σ²V)T⁻¹' = σ²In. The weighted least-squares estimators of β and σ² are derived using the ordinary least-squares estimator formulas with the model Yw = Xwβ + Ew. That is, the weighted least-squares estimators of β and σ² are given by

β̂w = (Xw'Xw)⁻¹Xw'Yw = (X'T⁻¹'T⁻¹X)⁻¹X'T⁻¹'T⁻¹Y = (X'V⁻¹X)⁻¹X'V⁻¹Y

and

σ̂w² = (Yw − Xwβ̂w)'(Yw − Xwβ̂w)/(n − p) = [Y'(V⁻¹ − V⁻¹X(X'V⁻¹X)⁻¹X'V⁻¹)Y]/(n − p).

The Gauss-Markov theorem can also be generalized for the model Y = Xβ + E where E(E) = 0 and cov(E) = σ²V. For this model, the weighted least-squares estimator of t'β is given by t'β̂w = t'(X'V⁻¹X)⁻¹X'V⁻¹Y and t'β̂w is the BLUE of t'β. The proof is left to the reader.
Table 5.4.1 Weighted Least-Squares ANOVA Table

Source                      df      SS
Overall mean                1       Y'V⁻¹1n(1n'V⁻¹1n)⁻¹1n'V⁻¹Y
Regression (β1, …, βp−1)    p − 1   Y'V⁻¹[X(X'V⁻¹X)⁻¹X' − 1n(1n'V⁻¹1n)⁻¹1n']V⁻¹Y
Residual                    n − p   Y'[V⁻¹ − V⁻¹X(X'V⁻¹X)⁻¹X'V⁻¹]Y
Total                       n       Y'V⁻¹Y
The ANOVA table for weighted least-squares regression functions can be constructed using Table 5.3.1 and substituting T⁻¹X for X and T⁻¹Y for Y. The weighted least-squares ANOVA table is provided in Table 5.4.1. In the next example a weighted least-squares ANOVA table is derived for the fuel, speed, grade data set.
Example 5.4.1 Consider the data in Table 5.1.1. Suppose the fuel consumption observations are independent but the variance of the observations at speed 50 mph is twice the variance of observations at speed 20 mph. Then cov(E) = σ²V where the 10 × 10 matrix

V = [ I5    0
       0   2I5 ].

The weighted least-squares estimates of β and σ² are

β̂w = (X'V⁻¹X)⁻¹X'V⁻¹Y = (3.11, 0.013, 0.0107)'.

Therefore, the prediction equation is

ŷ = 3.11 + 0.013(x1 − 35) + 0.01072(x2 − 102)

or

ŷ = 1.563 + 0.013 x1 + 0.01072 x2.

The weighted least-squares estimator of σ² is given by

σ̂w² = [Y'(V⁻¹ − V⁻¹X(X'V⁻¹X)⁻¹X'V⁻¹)Y]/7 = 0.0379.
The weighted least-squares ANOVA table is

Source                 df     SS
Overall mean            1     57.4083
Regression (β1, β2)     2     14.5813
Residual                7      0.2654
Total                  10     72.2550
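Both routes to the weighted estimates — the closed form (X'V⁻¹X)⁻¹X'V⁻¹Y and ordinary least squares on the transformed model Yw = T⁻¹Y, Xw = T⁻¹X — can be checked numerically. A sketch (Python assumed; T = V^{1/2} is one valid choice of T since V is diagonal here):

```python
import numpy as np

y  = np.array([1.7, 2.0, 1.9, 1.6, 3.2, 2.0, 2.5, 5.4, 5.7, 5.1])
x1 = np.array([20.0] * 5 + [50.0] * 5)
x2 = np.array([0, 0, 0, 0, 120, 0, 0, 300, 300, 300], dtype=float)
X  = np.column_stack([np.ones(10), x1 - x1.mean(), x2 - x2.mean()])

# V = diag(I5, 2*I5): the speed-50 observations have twice the variance
V = np.diag([1.0] * 5 + [2.0] * 5)
Vinv = np.linalg.inv(V)

# closed form: (X'V^{-1}X)^{-1} X'V^{-1} Y
beta_w = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)

# equivalent route: OLS on the transformed model with T = V^{1/2}
Tinv = np.diag(1 / np.sqrt(np.diag(V)))
Yw, Xw = Tinv @ y, Tinv @ X
beta_w2 = np.linalg.solve(Xw.T @ Xw, Xw.T @ Yw)

resid_w = Yw - Xw @ beta_w
sigma2_w = resid_w @ resid_w / (10 - 3)

print(beta_w)     # approx (3.110, 0.0130, 0.0107), as in Example 5.4.1
print(sigma2_w)   # approx 0.0379
```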
Thus far in Chapter 5 no assumptions have been made about the functional form of the distribution of the n × 1 random vector E (and hence about the distribution of Y). However, if we want to test hypotheses or construct confidence bands on model parameters, then we need to make an assumption about the functional form of the distribution of E. It is common to assume that the n × 1 random vector E has a multivariate normal distribution where E(E) = 0 and cov(E) = Σ. In the simplest case the Ei's are independent, identically distributed normal random variables such that E ∼ Nn(0, σ²In). In more complicated problems, it is assumed that E ∼ Nn(0, Σ) where Σ is an n × n matrix whose elements are functions of a series of unknown variance components. In Section 5.5 model adequacy is discussed when E ∼ Nn(0, σ²In). In Section 5.7 least-squares regression is developed for complete, balanced factorial experiments where the n × n covariance matrix is a function of a series of unknown variance components. Later in Chapter 6 a general discussion on confidence bands and hypothesis testing is provided.
5.5
LACK OF FIT TEST
In this section assume that the n × 1 random error vector E ∼ Nn(0, σ²In). It is of interest to check whether the proposed model adequately fits the data. This lack of fit test requires replicate observations at one or more of the combinations of the x1, x2, …, xp−1 values. Since the elements of the n × 1 random vector Y = (Y1, …, Yn)' can be listed in any order, we adopt the convention that sets of Yi values that share the same x1, …, xp−1 values are listed next to each other in the Y vector. For example, in the data set from Table 5.1.1, the 10 × 1 vector Y = (1.7, 2.0, 1.9, 1.6, 3.2, 2.0, 2.5, 5.4, 5.7, 5.1)' with Y1-Y4 sharing a speed equal to 20 and a speed × grade equal to 0, Y5 having a speed equal to 20 and a speed × grade equal to 120, Y6-Y7 sharing a speed equal to 50 and a speed × grade equal to 0, and Y8-Y10 sharing a speed equal to 50 and a speed × grade equal to 300. When replicate observations exist within combinations of the x1, …, xp−1 values, the residual sum of squares can be partitioned into a sum of squares due to pure error plus a sum of squares due to lack of fit. The pure error component is a
measure of the variation between Yi observations that share the same x1, …, xp−1 values. In the example data set from Table 5.1.1, the pure error sum of squares is the sum of squares of Y1-Y4 around their mean, plus the sum of squares of Y5 around its mean (zero in this case), plus the sum of squares of Y6-Y7 around their mean, plus the sum of squares of Y8-Y10 around their mean. In general the sum of squares pure error is given by

SS (pure error) = Y'ApeY

where Ape is an n × n block diagonal matrix with the jth block equal to I_rj − (1/rj)J_rj for j = 1, …, k where k is the number of combinations of x1, …, xp−1 values that contain at least one observation and rj is the number of Yi values in the jth combination with n = ∑_{j=1}^k rj. Note that Ape is an idempotent matrix of rank n − k and JnApe = 0n×n. Furthermore, the first r1 rows of the matrix X are the same, the next r2 rows of X are the same, etc. Therefore, ApeX = 0n×p. In balanced data structures r1 = r2 = ⋯ = rk = r, n = rk, and the n × n pure error matrix Ape can be expressed as the Kronecker product Ik ⊗ (Ir − (1/r)Jr). For the fuel, speed, grade data set, the 10 × 10 pure error sum of squares matrix Ape is derived in the next example.

Example 5.5.1 From the Table 5.1.1 data set, four groups of Yi's share the same speed and grade values. Therefore, k = 4, r1 = 4, r2 = 1, r3 = 2, r4 = 3, and Ape is given by

Ape = [ I4 − (1/4)J4    0         0              0
             0          0         0              0
             0          0    I2 − (1/2)J2        0
             0          0         0         I3 − (1/3)J3 ].

In this example, r2 = 1 so I_r2 − (1/r2)J_r2 = 1 − 1 = 0. Thus, the fifth diagonal element of Ape equals the scalar 0, indicating that observation Y5 does not contribute to the pure error sum of squares. The rank(Ape) = 10 − 4 = 6.

The sum of squares lack of fit is calculated by subtraction. Therefore,

SS (lack of fit) = SS (residual) − SS (pure error) = Y'[In − X(X'X)⁻¹X' − Ape]Y.

The sums of squares due to the overall mean, regression, lack of fit, pure error, and total are provided in Table 5.5.1. Note that [In − X(X'X)⁻¹X' − Ape]σ²In = σ²[In − X(X'X)⁻¹X' − Ape] where In − X(X'X)⁻¹X' − Ape is an idempotent matrix of rank k − p. Likewise, [Ape]σ²In = σ²Ape where Ape is an idempotent matrix of rank n − k. Therefore, by
Table 5.5.1 ANOVA Table with Pure Error and Lack of Fit

Source                      df      SS                            SS Centered
Overall mean                1       Y'(1/n)Jn Y                   Y'(1/n)Jn Y
Regression (β1, …, βp−1)    p − 1   Y'[X(X'X)⁻¹X' − (1/n)Jn]Y     Y'Xc(Xc'Xc)⁻¹Xc'Y
Lack of fit                 k − p   Y'[In − X(X'X)⁻¹X' − Ape]Y    Y'[In − (1/n)Jn − Xc(Xc'Xc)⁻¹Xc' − Ape]Y
Pure error                  n − k   Y'ApeY                        Y'ApeY
Total                       n       Y'Y                           Y'Y
Corollary 3.1.2(a), the sum of squares lack of fit

Y'[In − X(X'X)⁻¹X' − Ape]Y ∼ σ²χ²_{k−p}(λlof)

and the sum of squares pure error

Y'ApeY ∼ σ²χ²_{n−k}(λpe)

where

λlof = [E(Y)]'[In − X(X'X)⁻¹X' − Ape][E(Y)]/(2σ²) ≥ 0

and

λpe = [E(Y)]'Ape[E(Y)]/(2σ²) = β'X'ApeXβ/(2σ²) = 0.

Furthermore, by Theorem 3.2.1, the lack of fit sum of squares and the pure error sum of squares are independent since

[In − X(X'X)⁻¹X' − Ape](σ²In)(Ape) = σ²[Ape − X(X'X)⁻¹X'Ape − Ape] = σ²[Ape − Ape] = 0n×n.

Therefore, the statistic

F* = {Y'[In − X(X'X)⁻¹X' − Ape]Y/(k − p)} / {Y'ApeY/(n − k)} ∼ F_{k−p, n−k}(λlof).

Note that if E(Y) = Xβ, then

λlof = [E(Y)]'[In − X(X'X)⁻¹X' − Ape][E(Y)]/(2σ²)
     = β'X'[In − X(X'X)⁻¹X' − Ape]Xβ/(2σ²)
     = [β'X'Xβ − β'X'X(X'X)⁻¹X'Xβ − β'X'ApeXβ]/(2σ²)
     = 0.

If E(Y) ≠ Xβ then λlof > 0. Therefore, the hypothesis H0: λlof = 0 versus H1: λlof > 0 is equivalent to H0: E(Y) = Xβ versus H1: E(Y) ≠ Xβ. The statement E(Y) = Xβ implies that the model being used in the estimation provides a good fit and therefore may be appropriate. Thus, a γ level rejection region for the hypothesis H0 versus H1 is as follows: Reject H0 if F* > F^γ_{k−p, n−k}. In the next example, the pure error and lack of fit ANOVA table is provided for the fuel, speed, grade data set.
Example 5.5.2 The ANOVA table with lack of fit and pure error calculations is provided below for the Table 5.1.1 data set.

Source                 df     SS       MS        F*
Overall mean            1     96.721   96.721
Regression (β1, β2)     2     24.070   12.035
Lack of fit             1      0.014    0.014    0.21
Pure error              6      0.405    0.0675
Total                  10

The lack of fit test statistic F* = 0.21 < F^0.05_{1,6} = 5.99, indicating that the equation ŷ = 1.556 + 0.01348 x1 + 0.01061 x2 provides a good fit of the data.
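A numerical version of the lack of fit test (Python assumed, as before) assembles the block diagonal Ape of Example 5.5.1 and reproduces F*:

```python
import numpy as np

y  = np.array([1.7, 2.0, 1.9, 1.6, 3.2, 2.0, 2.5, 5.4, 5.7, 5.1])
x1 = np.array([20.0] * 5 + [50.0] * 5)
x2 = np.array([0, 0, 0, 0, 120, 0, 0, 300, 300, 300], dtype=float)
X  = np.column_stack([np.ones(10), x1, x2])
n, p = X.shape

# replicate groups sharing the same (x1, x2) combination: r = (4, 1, 2, 3), k = 4
reps = [4, 1, 2, 3]
k = len(reps)
Ape = np.zeros((n, n))
i = 0
for r in reps:                                    # block-diagonal assembly of Ape
    Ape[i:i+r, i:i+r] = np.eye(r) - np.ones((r, r)) / r   # I_r - (1/r)J_r
    i += r

H = X @ np.linalg.solve(X.T @ X, X.T)             # X(X'X)^{-1}X'
ss_pe  = y @ Ape @ y                              # pure error, n - k = 6 df
ss_lof = y @ (np.eye(n) - H - Ape) @ y            # lack of fit, k - p = 1 df
F_star = (ss_lof / (k - p)) / (ss_pe / (n - k))

print(round(F_star, 2))   # approx 0.21 < F(0.05; 1, 6) = 5.99: no lack of fit
```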
5.6
PARTITIONING THE SUM OF SQUARES REGRESSION
In Table 5.3.1 the sum of squares regression was expressed with p − 1 degrees of freedom. This sum of squares represented the total influence of the variables x1, …, xp−1 in the ordinary least-squares regression. It is often of interest to check the contribution of a particular variable (or variables) given that other variables are already in the model. Such contributions can be calculated by partitioning the n × p matrix X as

X = (X1 | X2 | ⋯ | Xm)

where Xj is an n × pj matrix for j = 1, …, m, p = ∑_{j=1}^m pj, and X1 = 1n. If R1 = X1, R2 = (X1 | X2), …, Rm−1 = (X1 | X2 | ⋯ | Xm−1), and Rm = X, then the sum of squares due to the pj variables in Xj given that X1, X2, …, Xj−1 are already in the model is given by

SS (Xj | X1, X2, …, Xj−1) = Y'[Rj(Rj'Rj)⁻¹Rj' − Rj−1(R'j−1Rj−1)⁻¹R'j−1]Y.

Such conditional sums of squares are often called Type I sums of squares. The entire ANOVA table with the Type I sums of squares is presented in Table 5.6.1.
Table 5.6.1 Type I Sum of Squares ANOVA Table

Source                   df       Type I SS
Overall mean X1          p1 = 1   Y'R1(R1'R1)⁻¹R1'Y = Y'(1/n)Jn Y
X2 | X1                  p2       Y'[R2(R2'R2)⁻¹R2' − R1(R1'R1)⁻¹R1']Y
X3 | X1, X2              p3       Y'[R3(R3'R3)⁻¹R3' − R2(R2'R2)⁻¹R2']Y
 ⋮                        ⋮
Xm | X1, X2, …, Xm−1     pm       Y'[X(X'X)⁻¹X' − Rm−1(R'm−1Rm−1)⁻¹R'm−1]Y
Residual                 n − p    Y'[In − X(X'X)⁻¹X']Y
Total                    n        Y'Y
Note that the sums of squares due to all sources of variation still add up to the total sum of squares Y'Y. The Type I sums of squares for the fuel, speed, grade data set are provided below with the output provided in Appendix 1.
Example 5.6.1 Using the example data set from Table 5.1.1, the Type I sums of squares are provided for the overall mean, for the speed variable x1 given the overall mean, and for the speed × grade variable x2 given the overall mean and x1.

Source                    df     SS
Overall mean               1     96.721
x1 | overall mean          1     10.609
x2 | overall mean, x1      1     13.461
Residual                   7      0.419
Total                     10    121.210
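The Type I sums of squares in Example 5.6.1 follow from the nested projections above; a sketch (Python assumed, as in the earlier examples):

```python
import numpy as np

y  = np.array([1.7, 2.0, 1.9, 1.6, 3.2, 2.0, 2.5, 5.4, 5.7, 5.1])
x1 = np.array([20.0] * 5 + [50.0] * 5)
x2 = np.array([0, 0, 0, 0, 120, 0, 0, 300, 300, 300], dtype=float)
n  = len(y)

def hat(R):
    """Projection matrix R(R'R)^{-1}R'."""
    return R @ np.linalg.solve(R.T @ R, R.T)

# nested matrices R1 = 1n, R2 = (1n | x1), R3 = X
R1 = np.ones((n, 1))
R2 = np.column_stack([R1, x1])
R3 = np.column_stack([R2, x2])

ss_mean  = y @ hat(R1) @ y                 # overall mean
ss_x1    = y @ (hat(R2) - hat(R1)) @ y     # x1 | overall mean
ss_x2    = y @ (hat(R3) - hat(R2)) @ y     # x2 | overall mean, x1
ss_resid = y @ (np.eye(n) - hat(R3)) @ y

print(ss_mean, ss_x1, ss_x2, ss_resid)  # approx 96.721, 10.609, 13.461, 0.419
```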
Some useful relationships are developed next. Note that Rj'Rj(Rj'Rj)⁻¹Rj' = Rj' for any j = 1, …, m. That is,

[ X1' ; X2' ; ⋮ ; Xj' ] Rj(Rj'Rj)⁻¹Rj' = [ X1' ; X2' ; ⋮ ; Xj' ].

Therefore, for any 1 ≤ i ≤ j ≤ m,

Xi'Rj(Rj'Rj)⁻¹Rj' = Xi'   and   Ri'Rj(Rj'Rj)⁻¹Rj' = Ri'.

It is left to the reader to show that the above relationships imply that the matrices Rj(Rj'Rj)⁻¹Rj' − Rj−1(R'j−1Rj−1)⁻¹R'j−1 are idempotent of rank pj for any j = 2, …, m and to show that any two matrices Ri(Ri'Ri)⁻¹Ri' − Ri−1(R'i−1Ri−1)⁻¹R'i−1 and Rj(Rj'Rj)⁻¹Rj' − Rj−1(R'j−1Rj−1)⁻¹R'j−1 are orthogonal for any i ≠ j.

In certain problems the n × pj matrix Xj is orthogonal to the n × pi matrix Xi for all i ≠ j, i, j = 1, …, m. In such cases the Type I sums of squares due to Xj | X1, …, Xj−1 reduce to much simpler forms for all j = 1, …, m. That is, if Xi'Xj = 0_{pi×pj} for all i ≠ j, then

Xj'Rj−1 = Xj'(X1 | X2 | ⋯ | Xj−1) = (Xj'X1 | Xj'X2 | ⋯ | Xj'Xj−1) = 0.

Therefore, the Type I sum of squares due to Xj | X1, …, Xj−1 for any j = 1, …, m is given by

SS (Xj | X1, …, Xj−1)
 = Y'[Rj(Rj'Rj)⁻¹Rj' − Rj−1(R'j−1Rj−1)⁻¹R'j−1]Y
 = Y'{(Rj−1 | Xj)[(Rj−1 | Xj)'(Rj−1 | Xj)]⁻¹(Rj−1 | Xj)' − Rj−1(R'j−1Rj−1)⁻¹R'j−1}Y
 = Y'{(Rj−1 | Xj) [ R'j−1Rj−1   R'j−1Xj ; Xj'Rj−1   Xj'Xj ]⁻¹ (Rj−1 | Xj)' − Rj−1(R'j−1Rj−1)⁻¹R'j−1}Y
 = Y'{Rj−1(R'j−1Rj−1)⁻¹R'j−1 + Xj(Xj'Xj)⁻¹Xj' − Rj−1(R'j−1Rj−1)⁻¹R'j−1}Y
 = Y'Xj(Xj'Xj)⁻¹Xj'Y

since Xj'Rj−1 = 0 makes the partitioned matrix block diagonal.

In the previous sections, the model Y = Xβ + E was introduced within the context of a regression analysis. However, Y = Xβ + E is a very general model form that can be used to describe a very broad class of experiments, including complete, balanced factorial experiments. In the next section the general linear model, Y = Xβ + E, is adapted to complete, balanced factorial experiments where the n × n covariance matrix, cov(E) = Σ, may be a complicated function of many
variance components. In Chapters 6 and 7 more general applications of the model Y = Xβ + E are introduced.
5.7
THE MODEL Y = Xβ + E IN COMPLETE, BALANCED FACTORIALS
The experiment presented in Section 4.1 has b random blocks, t fixed treatments, and r random replicates nested in each block treatment combination. The btr × 1 random vector Y = (Y111, …, Y11r, …, Ybt1, …, Ybtr)' ∼ N_btr(μ, Σ) where the btr × 1 mean vector and the btr × btr covariance matrix are given by

μ = 1b ⊗ (μ1, …, μt)' ⊗ 1r

and

Σ = σ²_B [Ib ⊗ Jt ⊗ Jr] + σ²_BT [Ib ⊗ (It − (1/t)Jt) ⊗ Jr] + σ²_{R(BT)} [Ib ⊗ It ⊗ Ir].
This experiment can be characterized by the general linear model Y = Xβ + E. First, cov(E) equals the btr × btr covariance matrix Σ. Next, the btr × 1 vector μ must be reconciled with the btr × 1 mean vector E(Y) = Xβ from the general linear model. Note that the btr × 1 mean vector μ is a function of the t unknown parameters μ1, …, μt. Therefore, the general linear model mean vector Xβ must also be written as a function of μ1, …, μt. One simple approach is to let the t × 1 vector β = (μ1, …, μt)' and let the btr × t matrix X = 1b ⊗ It ⊗ 1r. Then the btr × 1 mean vector of the general linear model is

Xβ = (1b ⊗ It ⊗ 1r)(μ1, …, μt)'
   = (1b ⊗ It ⊗ 1r)[1 ⊗ (μ1, …, μt)' ⊗ 1]
   = 1b ⊗ (μ1, …, μt)' ⊗ 1r
   = μ.

The preceding example suggests a general approach for writing the mean vector μ as Xβ for complete, balanced factorial experiments. First, if μ is a function of p unknown parameters, then let β be a p × 1 vector whose elements are the p unknown parameters in μ. In general these elements will be subscripted, such as μijk. The elements of β should be ordered so the last subscript changes first, the second to the last subscript changes next, etc. The corresponding X matrix can then be constructed using a simple algorithm. The previous experiment is used to develop the algorithm rules.

Rule X1 Construct column headings where the first heading designates the main factor letters and the second heading designates the number of levels of the factor, e. Place Kronecker product symbols ⊗ as described in Example 5.7.1.
Rule X2 Place 1e in the Kronecker product under the random factor columns.

Rule X3 Place Ie elsewhere.
Example 5.7.1 Rules X1, X2, and X3 for the example model:

Factor      B    T    R
Levels e    b    t    r

X = 1b ⊗ It ⊗ 1r

where Rule X2 supplies 1b and 1r under the random factor columns B and R, and Rule X3 supplies It under the fixed factor column T.

This formulation of the X matrix and its associated β vector is not unique. Another X matrix and β vector can be generated for the same experiment. This second formulation of X and β is motivated by the sum of squares matrices Am from Section 4.2. In the example experiment, the sum of squares matrices for the mean, blocks, treatments, block by treatment interaction, and the nested replicates are given by
$$A_1 = \frac{1}{b}J_b \otimes \frac{1}{t}J_t \otimes \frac{1}{r}J_r$$
$$A_2 = \left(I_b - \frac{1}{b}J_b\right) \otimes \frac{1}{t}J_t \otimes \frac{1}{r}J_r$$
$$A_3 = \frac{1}{b}J_b \otimes \left(I_t - \frac{1}{t}J_t\right) \otimes \frac{1}{r}J_r$$
$$A_4 = \left(I_b - \frac{1}{b}J_b\right) \otimes \left(I_t - \frac{1}{t}J_t\right) \otimes \frac{1}{r}J_r$$
$$A_5 = I_b \otimes I_t \otimes \left(I_r - \frac{1}{r}J_r\right),$$
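These Kronecker-product forms are easy to check numerically. The following sketch (an illustration, not from the text, using NumPy with assumed small design sizes $b = 2$, $t = 3$, $r = 2$) builds $A_1$ through $A_5$ and verifies that each is idempotent and that the five matrices sum to the $btr \times btr$ identity:

```python
import numpy as np

b, t, r = 2, 3, 2  # assumed small design sizes for illustration

def J(n):  # n x n matrix of ones
    return np.ones((n, n))

def kron3(a, m, c):  # three-factor Kronecker product
    return np.kron(np.kron(a, m), c)

I = np.eye
# Sum of squares matrices for the mean, blocks, treatments,
# block-by-treatment interaction, and nested replicates.
A1 = kron3(J(b) / b, J(t) / t, J(r) / r)
A2 = kron3(I(b) - J(b) / b, J(t) / t, J(r) / r)
A3 = kron3(J(b) / b, I(t) - J(t) / t, J(r) / r)
A4 = kron3(I(b) - J(b) / b, I(t) - J(t) / t, J(r) / r)
A5 = kron3(I(b), I(t), I(r) - J(r) / r)

As = [A1, A2, A3, A4, A5]
# Each A_i is idempotent and the five matrices sum to the identity.
assert all(np.allclose(A @ A, A) for A in As)
assert np.allclose(sum(As), np.eye(b * t * r))
# Traces give the ranks: 1, b-1, t-1, (b-1)(t-1), bt(r-1).
print([int(round(np.trace(A))) for A in As])
```

The traces recover the degrees of freedom of each sum of squares, which is one quick consistency check on a proposed decomposition.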
respectively. Matrices $A_1$ through $A_5$ can be rewritten as $A_1 = X_1X_1'$, $A_2 = Z_1Z_1'$, $A_3 = X_2X_2'$, $A_4 = Z_2Z_2'$, and $A_5 = Z_3Z_3'$ where

$$X_1 = (1/\sqrt{b})\mathbf{1}_b \otimes (1/\sqrt{t})\mathbf{1}_t \otimes (1/\sqrt{r})\mathbf{1}_r \qquad Z_1 = P_b \otimes (1/\sqrt{t})\mathbf{1}_t \otimes (1/\sqrt{r})\mathbf{1}_r$$
$$X_2 = (1/\sqrt{b})\mathbf{1}_b \otimes P_t \otimes (1/\sqrt{r})\mathbf{1}_r \qquad Z_2 = P_b \otimes P_t \otimes (1/\sqrt{r})\mathbf{1}_r$$
$$Z_3 = I_b \otimes I_t \otimes P_r$$

where the $(\ell - 1) \times \ell$ matrix $P_\ell'$ is the lower portion of an $\ell$-dimensional Helmert matrix. Note that $X_1$ and $X_2$ are associated with the fixed factor matrices, and
5
Least-Squares Regression
99
$Z_1$, $Z_2$, and $Z_3$ are associated with the random factor matrices. In this form $X_1'X_1 = 1$, $Z_1'Z_1 = I_{b-1}$, $X_2'X_2 = I_{t-1}$, $Z_2'Z_2 = I_{(b-1)(t-1)}$, and $Z_3'Z_3 = I_{bt(r-1)}$. Now let the $btr \times t$ matrix $X = (X_1|X_2)$ where $X_1$ and $X_2$ are the $btr \times 1$ and $btr \times (t-1)$ matrices defined earlier. Note that $X_1'X_2 = 0_{1 \times (t-1)}$. Then define the $t \times 1$ vector $\beta$ such that $X\beta = \mu$. Premultiplying this expression by $(X'X)^{-1}X'$ we obtain

$$\beta = (X'X)^{-1}X'\mu = \left[(X_1|X_2)'(X_1|X_2)\right]^{-1}(X_1|X_2)'\mu = \begin{bmatrix} (X_1'X_1)^{-1}X_1'\mu \\ (X_2'X_2)^{-1}X_2'\mu \end{bmatrix}$$

or

$$\beta_1 = \sqrt{br/t}\,\sum_{j=1}^{t}\mu_j$$
$$\beta_2 = (\mu_1 - \mu_2)\sqrt{br/2}$$
$$\beta_3 = (\mu_1 + \mu_2 - 2\mu_3)\sqrt{br/6}$$
$$\vdots$$
$$\beta_t = \left[\mu_1 + \mu_2 + \cdots + \mu_{t-1} - (t-1)\mu_t\right]\sqrt{br/[t(t-1)]}.$$
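These orthonormality relations and Helmert-type contrasts can be verified directly. Below is a sketch (illustrative, not from the text) in NumPy; `helmert_lower` is an assumed helper returning the $(\ell-1) \times \ell$ lower portion of the Helmert matrix, whose transpose plays the role of $P_\ell$ above:

```python
import numpy as np

def helmert_lower(n):
    """(n-1) x n lower portion of the n-dimensional Helmert matrix.
    Row j (1-based) is (1, ..., 1, -j, 0, ..., 0) / sqrt(j*(j+1))."""
    P = np.zeros((n - 1, n))
    for j in range(1, n):
        P[j - 1, :j] = 1.0
        P[j - 1, j] = -j
        P[j - 1] /= np.sqrt(j * (j + 1))
    return P

b, t, r = 2, 3, 2  # assumed small design sizes
one = lambda n: np.ones((n, 1))
Pt = helmert_lower(t).T  # t x (t-1), orthonormal columns

X1 = np.kron(np.kron(one(b) / np.sqrt(b), one(t) / np.sqrt(t)), one(r) / np.sqrt(r))
X2 = np.kron(np.kron(one(b) / np.sqrt(b), Pt), one(r) / np.sqrt(r))
X = np.hstack([X1, X2])
# X has orthonormal columns: X1'X1 = 1, X2'X2 = I, X1'X2 = 0.
assert np.allclose(X.T @ X, np.eye(t))

# Solve X beta = mu for a chosen mean vector and check the contrast forms.
mu = np.array([1.0, 2.0, 4.0])
mu_full = np.kron(np.kron(one(b), mu.reshape(-1, 1)), one(r))
beta = np.linalg.solve(X.T @ X, X.T @ mu_full)
assert np.isclose(beta[1, 0], (mu[0] - mu[1]) * np.sqrt(b * r / 2))
```

The check on `beta[1, 0]` confirms that the second coordinate is the scaled pairwise contrast $(\mu_1 - \mu_2)\sqrt{br/2}$ given above.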
A third formulation of the matrix $X$ can be constructed by writing $A_1 = X_1(X_1'X_1)^{-1}X_1'$, $A_2 = Z_1(Z_1'Z_1)^{-1}Z_1'$, $A_3 = X_2(X_2'X_2)^{-1}X_2'$, $A_4 = Z_2(Z_2'Z_2)^{-1}Z_2'$, and $A_5 = Z_3(Z_3'Z_3)^{-1}Z_3'$ where

$$X_1 = \mathbf{1}_b \otimes \mathbf{1}_t \otimes \mathbf{1}_r \qquad Z_1 = Q_b \otimes \mathbf{1}_t \otimes \mathbf{1}_r$$
$$X_2 = \mathbf{1}_b \otimes Q_t \otimes \mathbf{1}_r \qquad Z_2 = Q_b \otimes Q_t \otimes \mathbf{1}_r$$
$$Z_3 = I_b \otimes I_t \otimes Q_r$$

and where the $\ell \times (\ell - 1)$ matrix $Q_\ell$ is given by
$$Q_\ell = \begin{pmatrix}
1 & 1 & 1 & \cdots & 1 \\
-1 & 1 & 1 & \cdots & 1 \\
0 & -2 & 1 & \cdots & 1 \\
0 & 0 & -3 & \cdots & 1 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & -(\ell - 1)
\end{pmatrix}.$$
Note that the columns of $Q_\ell$ equal $\sqrt{j(j+1)}$ times the columns of $P_\ell$, where $j$ is the column number for $j = 1, \ldots, \ell - 1$. Now let the $btr \times t$ matrix $X = (X_1|X_2)$ where $X_1$ and $X_2$ are the $btr \times 1$ and $btr \times (t-1)$ matrices defined above. Note $X_1'X_2 = 0_{1 \times (t-1)}$.

It is apparent that a number of different forms of the matrices $X, X_1, X_2, \ldots, Z_1, Z_2, \ldots$ can be constructed in complete, balanced factorial designs. Furthermore, in any particular problem, one form of the matrix $X$ can be defined while another form of the $X_1, X_2, \ldots, Z_1, Z_2, \ldots$ matrices can be used to construct the sum of squares matrices. For example, in the previous experiment, the $btr \times t$ matrix $X$ can be defined as $X = \mathbf{1}_b \otimes I_t \otimes \mathbf{1}_r$. Then with $X_1 = \mathbf{1}_b \otimes \mathbf{1}_t \otimes \mathbf{1}_r$, $Z_1 = Q_b \otimes \mathbf{1}_t \otimes \mathbf{1}_r$, $X_2 = \mathbf{1}_b \otimes Q_t \otimes \mathbf{1}_r$, $Z_2 = Q_b \otimes Q_t \otimes \mathbf{1}_r$, and $Z_3 = I_b \otimes I_t \otimes Q_r$, the sum of squares matrices can be constructed as $A_1 = X_1(X_1'X_1)^{-1}X_1'$, $A_2 = Z_1(Z_1'Z_1)^{-1}Z_1'$, $A_3 = X_2(X_2'X_2)^{-1}X_2'$, $A_4 = Z_2(Z_2'Z_2)^{-1}Z_2'$, and $A_5 = Z_3(Z_3'Z_3)^{-1}Z_3'$. In general, any acceptable form of the $X$ matrix can be used with any acceptable form of the matrices $X_1, X_2, \ldots, Z_1, Z_2, \ldots$, where the latter set of matrices is used to construct the sums of squares matrices.
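The relationship between $Q_\ell$ and the Helmert columns can also be checked directly. This sketch (hypothetical, using NumPy) constructs $Q_\ell$, verifies that its $j$th column is $\sqrt{j(j+1)}$ times the corresponding orthonormal Helmert column, and confirms that $Q_\ell(Q_\ell'Q_\ell)^{-1}Q_\ell' = I_\ell - \frac{1}{\ell}J_\ell$:

```python
import numpy as np

def Q_matrix(n):
    """n x (n-1) matrix: column j has j ones, then -j, then zeros."""
    Q = np.zeros((n, n - 1))
    for j in range(1, n):
        Q[:j, j - 1] = 1.0
        Q[j, j - 1] = -j
    return Q

def helmert_lower(n):  # (n-1) x n lower portion of the Helmert matrix
    return np.array([np.concatenate([np.ones(j), [-j], np.zeros(n - j - 1)])
                     / np.sqrt(j * (j + 1)) for j in range(1, n)])

n = 4
Q = Q_matrix(n)
P = helmert_lower(n).T  # n x (n-1) with orthonormal columns

# Column j of Q equals sqrt(j*(j+1)) times column j of P.
scale = np.sqrt([j * (j + 1) for j in range(1, n)])
assert np.allclose(Q, P * scale)

# Q(Q'Q)^{-1}Q' is the centering projection I - J/n.
proj = Q @ np.linalg.solve(Q.T @ Q, Q.T)
assert np.allclose(proj, np.eye(n) - np.ones((n, n)) / n)
```

Because $Q_\ell$ and the Helmert columns span the same contrast space, either form produces the same sum of squares matrices.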
EXERCISES

1. From Table 5.3.1, let $B_1$, $B_2$, and $B_3$ represent the matrices for the sums of squares due to the overall mean, regression, and residual, respectively. Prove $B_r^2 = B_r$ for $r = 1, 2, 3$ and $B_rB_s = 0$ for $r \neq s$.

2. Let $Y = X\beta + E$ where $X$ is an $n \times p$ matrix and $E \sim N_n(0, \sigma^2 V)$ for any $n \times n$ symmetric, positive definite matrix $V$.
(a) Is $\hat\beta = (X'X)^{-1}X'Y$ an unbiased estimator of $\beta$?
(b) Prove that if there exists a $p \times p$ nonsingular matrix $F$ such that $VX = XF$, then $\hat\beta = \hat\beta_W$, where $\hat\beta_W$ is the weighted least-squares estimator of $\beta$.

3. Let $Y = X\beta + E$ where $X$ is an $n \times p$ matrix and $E \sim N_n(0, \sigma^2 V)$ for any $n \times n$ symmetric, positive definite matrix of known constants $V$. Prove that $t'(X'V^{-1}X)^{-1}X'V^{-1}Y$ is the BLUE of $t'\beta$.

4. From Table 5.4.1, let $B_1$, $B_2$, and $B_3$ represent the matrices for the sums of squares due to the overall mean, regression, and residual, respectively. Prove $B_rV$ is an idempotent matrix for $r = 1, 2, 3$ and $B_rVB_s = 0$ for $r \neq s$.

5. Let $R_j$ be defined as in Section 5.6. Let $B_j = R_j(R_j'R_j)^{-1}R_j' - R_{j-1}(R_{j-1}'R_{j-1})^{-1}R_{j-1}'$.
(a) Prove $B_j$ is an idempotent matrix of rank $p_j$ for $j = 2, \ldots, m$.
(b) Prove $B_iB_j = 0$ for $i \neq j$; $i, j = 2, \ldots, m$.

6. Assume the model $Y = X\beta + E$ where $X$ is defined as in Section 5.6 and $E \sim N_n(0, \sigma^2 I_n)$. Find the distributions of all the Type I sums of squares in Table 5.6.1.

7. Let $Y = X\beta + E$ where the $n \times (a+b)$ matrix $X = [X_1|X_2]$ with $n \times a$ and $n \times b$ matrices $X_1$ and $X_2$, the $(a+b) \times 1$ vector $\beta = (\beta_1, \ldots, \beta_a \,|\, \beta_{a+1}, \ldots, \beta_{a+b})'$, and $E \sim N_n(0, \sigma^2 I_n)$. Let $Y'X_1(X_1'X_1)^{-1}X_1'Y$, $Y'[X(X'X)^{-1}X' - X_1(X_1'X_1)^{-1}X_1']Y$, and $Y'[I_n - X(X'X)^{-1}X']Y$ be the sums of squares due to $\beta_1, \ldots, \beta_a$; due to $\beta_{a+1}, \ldots, \beta_{a+b}$ given $\beta_1, \ldots, \beta_a$; and due to the residual, respectively.
(a) Prove $X_i'X(X'X)^{-1}X' = X_i'$ for $i = 1, 2$. [Hint: Show $X'X(X'X)^{-1}X' = X'$.]
(b) Find the distributions of the sums of squares due to $\beta_{a+1}, \ldots, \beta_{a+b}$ given $\beta_1, \ldots, \beta_a$ and due to the residual, and show they are independent.
(c) Construct a statistic to test the hypothesis $H_0: \beta_{a+1} = \cdots = \beta_{a+b} = 0$ versus $H_1$: at least one of $\beta_{a+1}, \ldots, \beta_{a+b} \neq 0$. What is the distribution of the statistic?

8. Consider the experiment from Exercise 7 in Chapter 4. Write the model as $Y = X\beta + E$, defining all terms and the appropriate distributions explicitly.

9. Let $Y_{ij} = a + b_ix_j + E_{ij}$ for $i = 1, 2$ and $j = 1, \ldots, n$. Assume $\mathrm{E}(E_{ij}) = 0$, $\mathrm{var}(E_{ij}) = \sigma^2$, the $E_{ij}$'s are uncorrelated, and $\sum_{j=1}^n x_j = 0$.
(a) Find the BLUE of $b_1 - b_2$. Write your answer in terms of the $Y_{ij}$'s and $x_j$'s.
(b) Find the variance of the BLUE of $b_1 - b_2$.

10. Let $Y_1 = \mu_1 + E_1$, $Y_2 = \mu_2 + E_2$, and $Y_3 = \mu_1 + \mu_2 + E_3$ where $\mathrm{E}(E_i) = 0$, $\mathrm{E}(E_i^2) = 2$, and $\mathrm{E}(E_iE_j) = 1$ for $i \neq j$; $i, j = 1, 2, 3$.
(a) Find the BLUEs of $\mu_1$ and $\mu_2$.
(b) Find the covariance between the BLUEs of $\mu_1$ and $\mu_2$.
(c) Find the BLUE and the variance of the BLUE of $2\mu_1 + 3\mu_2$.

11. Let $Y_i = x_i\beta + U_i$ for $i = 1, \ldots, n$ and $0 < x_1 < x_2 < \cdots < x_n$, where $U_i = E_1 + E_2 + \cdots + E_i$ and the $E_i$'s are uncorrelated with $\mathrm{E}(E_i) = 0$, $\mathrm{var}(E_i) = \sigma^2(x_i - x_{i-1})$ for $i > 1$, and $\mathrm{var}(E_1) = \sigma^2 x_1$.
(a) Find the BLUE of $\beta$ and show it depends only on $(Y_n, x_n)$.
(b) Find the variance of the BLUE of $\beta$. (Hint: Transform the $Y_i$ values into $Y_i^*$ values such that $Y_i^* = Y_i - Y_{i-1}$ for $i > 1$ and $Y_1^* = Y_1$.)

12. Consider the following design layout:

                          Factor A (fixed) (i)
                        1         2      ...      a
    Replicates (j)   $Y_{11}$  $Y_{21}$  ...  $Y_{a1}$
                     $Y_{12}$  $Y_{22}$  ...  $Y_{a2}$
                       ...       ...     ...     ...
                     $Y_{1n}$  $Y_{2n}$  ...  $Y_{an}$

Let $Y = \beta_0\mathbf{1}_a \otimes \mathbf{1}_n + X\beta + E$ where $Y = (Y_{11}, \ldots, Y_{1n}, \ldots, Y_{a1}, \ldots, Y_{an})'$, $\beta_0$ is an unknown scalar parameter, $\beta = (\beta_1, \ldots, \beta_p)'$ is a $p \times 1$ vector of unknown parameters, $X = X^* \otimes \mathbf{1}_n$ with $X^*$ an $a \times p$ matrix of known values, $p \le a - 1$, $\mathbf{1}_a'X^* = 0_{1 \times p}$, and the $an \times 1$ vector $E \sim N_{an}(0, \Sigma)$ with $\Sigma = I_a \otimes (\sigma^2 I_n + \sigma_1^2 J_n)$.
(a) Let $Y'A_1Y$, $Y'A_2Y$, $Y'A_3Y$, and $Y'A_4Y$ be the sums of squares due to $\beta_0$, due to $\beta$, due to lack of fit, and due to pure error, respectively. Find $A_1$, $A_2$, $A_3$, and $A_4$.
(b) Find the distributions of $Y'A_1Y$, $Y'A_2Y$, $Y'A_3Y$, and $Y'A_4Y$.
(c) Are $Y'A_1Y$, $Y'A_2Y$, $Y'A_3Y$, and $Y'A_4Y$ mutually independent?
(d) Assume no significant lack of fit in the model. Construct a test for the hypothesis $H_0: \beta = 0$ versus $H_1: \beta \neq 0$.

13. Consider the following factorial layout:

                                   Factor A (fixed) (i)
                                     1                      2
    Factor B (fixed) (j)   1   $Y_{111}$, $Y_{112}$   $Y_{211}$, $Y_{212}$
                           2   $Y_{121}$, $Y_{122}$   $Y_{221}$, $Y_{222}$

Let the $8 \times 1$ vector $Y = (Y_{111}, Y_{112}, Y_{121}, Y_{122}, Y_{211}, Y_{212}, Y_{221}, Y_{222})'$. Assume the model $Y = X\beta + E$ where $X = [X_1|X_2|X_3|X_4]$, $X_1 = \mathbf{1}_2 \otimes \mathbf{1}_2 \otimes \mathbf{1}_2$, $X_2 = Q_2 \otimes \mathbf{1}_2 \otimes \mathbf{1}_2$, $X_3 = \mathbf{1}_2 \otimes Q_2 \otimes \mathbf{1}_2$, $X_4 = Q_2 \otimes Q_2 \otimes \mathbf{1}_2$, $Q_2 = (1, -1)'$, $\beta = (\beta_1, \beta_2, \beta_3, \beta_4)'$, and $E \sim N_8(0, \sigma^2 I_8)$. Note that the parameter $\beta_1$ corresponds to the overall mean, $\beta_2$ to the fixed factor A, $\beta_3$ to the fixed factor B, and $\beta_4$ to the interaction of A and B.
(a) Define the sums of squares due to $\beta_1$, $\beta_2$, $\beta_3$, and $\beta_4$.
(b) Find $\hat\beta$, the least-squares estimator of $\beta$.
(c) Calculate the standard error of $\hat\beta$.

14. Let $Y_{ij} = \mu + B_i + R(B)_{(i)j}$ where $\mu$ is an unknown constant, and the $B_i$'s and $R(B)_{(i)j}$'s are uncorrelated random variables with zero means and variances $\sigma_B^2$ and $\sigma_{R(B)}^2$, respectively, for $i = 1, \ldots, b$ and $j = 1, \ldots, r$.
(a) Write the model as $Y = X\beta + E$ where $\mathrm{E}(E) = 0$ and $\mathrm{cov}(E) = \Sigma$. Identify all terms explicitly.
(b) Construct the matrices for the sums of squares for the usual ANOVA table for this model and derive the corresponding expected mean squares.
(c) Assume the $Y_{ij}$'s are normally distributed. Find the distributions of the sums of squares defined in part (b).
6
Maximum Likelihood Estimation and Related Topics
This chapter deals with maximum likelihood estimation of the parameters of the general linear model $Y = X\beta + E$ when $E \sim N_n(0, \Sigma)$. The maximum likelihood estimators of $\beta$ and $\Sigma$ are the parameter values that maximize the likelihood function of the random vector $Y$. In the first section of the chapter, the discussion is confined to the cases where $\Sigma = \sigma^2 I_n$ and $\Sigma = \sigma^2 V$ when $V$ is known. In the second section, the concepts of invariance, completeness, sufficiency, and minimum variance unbiased estimation are discussed. In the third section, maximum likelihood estimation is developed for more general forms of $\Sigma$. Finally, the likelihood ratio test and related confidence bands on linear combinations of the $p \times 1$ vector $\beta$ are examined.

6.1 MAXIMUM LIKELIHOOD ESTIMATORS OF $\beta$ AND $\sigma^2$
For the present, assume the model $Y = X\beta + E$ where $E \sim N_n(0, \sigma^2 I_n)$. Therefore, the likelihood function is given by

$$\ell(\beta, \sigma^2, Y) = (2\pi\sigma^2)^{-n/2}e^{-(Y - X\beta)'(Y - X\beta)/(2\sigma^2)}.$$

The logarithm of the likelihood function is

$$\log[\ell(\beta, \sigma^2, Y)] = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) - (Y - X\beta)'(Y - X\beta)/(2\sigma^2).$$

The objective is to find the values of $\beta$ and $\sigma^2$ that maximize the function $\log[\ell(\beta, \sigma^2, Y)]$. Take derivatives of $\log[\ell(\beta, \sigma^2, Y)]$ with respect to the $p \times 1$ vector $\beta$ and with respect to $\sigma^2$, set the resulting expressions equal to zero, and solve for $\beta$ and $\sigma^2$. That is,

$$\partial \log[\ell(\beta, \sigma^2, Y)]/\partial\beta = [-2X'Y + 2X'X\beta]/(2\sigma^2) = 0$$

and

$$\partial \log[\ell(\beta, \sigma^2, Y)]/\partial\sigma^2 = -n/(2\sigma^2) + (Y - X\beta)'(Y - X\beta)/(2\sigma^4) = 0.$$

Solving the first equation for $\beta$ produces $X'X\beta = X'Y$. If the $n \times p$ matrix $X$ has full rank (i.e., if $X'X$ is nonsingular), then the maximum likelihood estimator (MLE) of $\beta$ is given by

$$\hat\beta = (X'X)^{-1}X'Y.$$

Solve the second equation for $\sigma^2$ with $\beta$ replaced by $\hat\beta$. The resulting maximum likelihood estimator of $\sigma^2$ is

$$\hat\sigma^2 = (Y - X\hat\beta)'(Y - X\hat\beta)/n = Y'[I_n - X(X'X)^{-1}X']Y/n.$$

The maximum likelihood estimator of $\beta$ is a set of $p$ linear transformations of the random vector $Y$. Since $Y \sim N_n(X\beta, \sigma^2 I_n)$, by Theorem 2.1.2, the MLE is

$$\hat\beta \sim N_p\left[(X'X)^{-1}X'X\beta,\; (X'X)^{-1}X'(\sigma^2 I_n)X(X'X)^{-1}\right] = N_p\left[\beta,\; \sigma^2(X'X)^{-1}\right].$$

Therefore, $\hat\beta$ is an unbiased estimator of $\beta$. Furthermore, $[I_n - X(X'X)^{-1}X'](\sigma^2 I_n) = \sigma^2[I_n - X(X'X)^{-1}X']$ is a multiple of an idempotent matrix of rank $n - p$. Therefore, by Corollary 3.1.2(a),

$$n\hat\sigma^2 = Y'[I_n - X(X'X)^{-1}X']Y \sim \sigma^2\chi^2_{n-p}(\lambda)$$

where

$$\lambda = (X\beta)'[I_n - X(X'X)^{-1}X']X\beta/(2\sigma^2) = \left[\beta'X'X\beta - \beta'X'X(X'X)^{-1}X'X\beta\right]/(2\sigma^2) = 0.$$

The MLE $\hat\sigma^2$ is not an unbiased estimator of $\sigma^2$ since $\mathrm{E}(\hat\sigma^2) = \mathrm{E}[\sigma^2\chi^2_{n-p}(0)/n] = (n-p)\sigma^2/n$. Finally, by Theorem 3.2.2, $\hat\beta$ and $\hat\sigma^2$ are independent since

$$(X'X)^{-1}X'(\sigma^2 I_n)[I_n - X(X'X)^{-1}X'] = \sigma^2\left[(X'X)^{-1}X' - (X'X)^{-1}X'X(X'X)^{-1}X'\right] = 0_{p \times n}.$$
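These closed-form estimators are easy to check numerically. A minimal sketch (illustrative simulated data, using NumPy) computes $\hat\beta$ and $\hat\sigma^2$ and confirms that rescaling $\hat\sigma^2$ by $n/(n-p)$ removes the bias factor $(n-p)/n$ derived above:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
beta_true = np.array([1.0, 2.0, -0.5])
Y = X @ beta_true + rng.standard_normal(n)

# MLE of beta: solve the normal equations X'X beta = X'Y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

resid = Y - X @ beta_hat
sigma2_mle = resid @ resid / n          # biased: E = (n - p) sigma^2 / n
sigma2_unb = resid @ resid / (n - p)    # bias-corrected (unbiased) version

# The two estimators differ exactly by the factor n/(n - p).
assert np.isclose(sigma2_unb, n * sigma2_mle / (n - p))
```

This is why, in practice, the residual sum of squares is usually divided by $n - p$ rather than $n$ when an unbiased variance estimate is wanted.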
In Chapter 5, $\tilde\beta$ and $\tilde\sigma^2$ denoted the ordinary least-squares estimators (OLSEs) of $\beta$ and $\sigma^2$, respectively. Note that the OLSE $\tilde\beta$ equals the MLE $\hat\beta$, and the OLSE $\tilde\sigma^2$ is a multiple of the MLE $\hat\sigma^2$ for the model $Y = X\beta + E$ when $E \sim N_n(0, \sigma^2 I_n)$ [i.e., $\tilde\beta = \hat\beta$ and $\tilde\sigma^2 = n\hat\sigma^2/(n - p)$]. Furthermore, the OLSE $\tilde\sigma^2$ is an unbiased estimator of $\sigma^2$ for this model. Therefore, the OLSE $\tilde\sigma^2$ is often used to estimate $\sigma^2$ instead of the MLE $\hat\sigma^2$. In the following example the MLEs of $\beta$ and $\sigma^2$ are derived for a one-way balanced factorial experiment.
Example 6.1.1 Consider the one-way classification described in Examples 1.2.10 and 2.1.4. Rewrite the model $Y_{ij} = \mu_i + R(T)_{(i)j}$ as $Y = X\beta + E$ where $Y = (Y_{11}, \ldots, Y_{1r}, \ldots, Y_{t1}, \ldots, Y_{tr})'$, $X = I_t \otimes \mathbf{1}_r$, $\beta = (\beta_1, \ldots, \beta_t)' = (\mu_1, \ldots, \mu_t)'$, and $\mathrm{cov}(E) = \sigma^2 I_t \otimes I_r$. Therefore, the MLE of $\beta$ is given by

$$\hat\beta = (X'X)^{-1}X'Y = \left[(I_t \otimes \mathbf{1}_r)'(I_t \otimes \mathbf{1}_r)\right]^{-1}(I_t \otimes \mathbf{1}_r)'Y = \left[I_t \otimes (1/r)\mathbf{1}_r'\right]Y = (\bar{Y}_{1.}, \ldots, \bar{Y}_{t.})'$$

where $\bar{Y}_{i.} = \sum_{j=1}^{r} Y_{ij}/r$. That is, the MLEs of $\beta_1, \ldots, \beta_t$ (or $\mu_1, \ldots, \mu_t$) are the observed treatment means. Furthermore, the MLE of $\sigma^2$ is

$$\hat\sigma^2 = Y'\left[I_t \otimes I_r - X(X'X)^{-1}X'\right]Y/(rt) = Y'\left\{I_t \otimes I_r - (I_t \otimes \mathbf{1}_r)\left[(I_t \otimes \mathbf{1}_r)'(I_t \otimes \mathbf{1}_r)\right]^{-1}(I_t \otimes \mathbf{1}_r)'\right\}Y/(rt) = Y'\left[I_t \otimes \left(I_r - \frac{1}{r}J_r\right)\right]Y/(rt).$$

Thus, the MLE of $\sigma^2$ equals the sum of squares due to the nested replicates divided by $tr$.

We conclude this section by briefly examining likelihood estimation for the model $Y = X\beta + E$ when $E \sim N_n(0, \sigma^2 V)$ and $V$ is an $n \times n$ positive definite matrix of known constants. Since $V$ is an $n \times n$ positive definite matrix, there exists an $n \times n$ nonsingular matrix $T$ such that $V = TT'$. Premultiplying the original model by $T^{-1}$ we obtain

$$T^{-1}Y = T^{-1}X\beta + T^{-1}E$$
$$Y^* = X^*\beta + E^*$$

where $Y^* = T^{-1}Y$, $X^* = T^{-1}X$, and $E^* = T^{-1}E$ with $E^* \sim N_n(0, \sigma^2 I_n)$. Therefore, the maximum likelihood estimator of $\beta$ is

$$\hat\beta = (X^{*\prime}X^*)^{-1}X^{*\prime}Y^* = (X'V^{-1}X)^{-1}X'V^{-1}Y.$$

Likewise, the MLE of $\sigma^2$ is given by

$$\hat\sigma^2 = (Y^* - X^*\hat\beta)'(Y^* - X^*\hat\beta)/n = Y'\left[V^{-1} - V^{-1}X(X'V^{-1}X)^{-1}X'V^{-1}\right]Y/n.$$

It is left to the reader to find the distributions of $\hat\beta$ and $\hat\sigma^2$ when $E \sim N_n(0, \sigma^2 V)$.
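The whitening argument can be sketched numerically (assumed data; NumPy). Factor $V = TT'$ with a Cholesky decomposition, fit ordinary least squares to the transformed model, and confirm the result matches the direct generalized least-squares formula $(X'V^{-1}X)^{-1}X'V^{-1}Y$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 20, 2
X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
A = rng.standard_normal((n, n))
V = A @ A.T + n * np.eye(n)   # an assumed positive definite "known" V
Y = X @ np.array([0.5, 1.5]) + rng.standard_normal(n)

# Whiten: V = TT' with T lower triangular, then OLS on T^{-1}Y vs T^{-1}X.
T = np.linalg.cholesky(V)
Ystar = np.linalg.solve(T, Y)
Xstar = np.linalg.solve(T, X)
beta_white = np.linalg.solve(Xstar.T @ Xstar, Xstar.T @ Ystar)

# Direct GLS formula.
Vinv = np.linalg.inv(V)
beta_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ Y)

assert np.allclose(beta_white, beta_gls)
```

Any factorization $V = TT'$ works here; Cholesky is simply a convenient choice for a positive definite $V$.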
6.2 INVARIANCE PROPERTY, SUFFICIENCY, AND COMPLETENESS

In this section the invariance property, sufficiency, completeness, and minimum variance unbiased estimators are discussed.

Definition 6.2.1 Invariance Property: Let the $k \times 1$ vector $\hat\theta = (\hat\theta_1, \ldots, \hat\theta_k)'$ be the MLE of the $k \times 1$ vector $\theta$. If $g(\theta)$ is a function of $\theta$ then $g(\hat\theta)$ is the MLE of $g(\theta)$.

Example 6.2.1 Consider the one-way classification in Example 6.1.1. By the invariance property, the MLE of $g(\beta, \sigma^2) = \sum_{i=1}^{t}\mu_i/(\sqrt{t}\,\sigma)$ is

$$g(\hat\beta, \hat\sigma^2) = \frac{(1/\sqrt{t})\,\mathbf{1}_t'(\bar{Y}_{1.}, \ldots, \bar{Y}_{t.})'}{\left\{Y'\left[I_t \otimes \left(I_r - \frac{1}{r}J_r\right)\right]Y/(rt)\right\}^{1/2}}.$$

Sufficiency involves the reduction of data to a concise set of statistics without loss of information about the unknown parameters of the distribution. Thus, if the parameters of the distribution are of interest, attention can be focused on the joint distribution of the "reduced" set of statistics. In this sense, the reduced set of statistics provides sufficient information about the unknown parameters. The topic of sufficiency is addressed in the next theorem.

Theorem 6.2.1 Factorization Theorem: Let the $n \times 1$ random vector $Y = (Y_1, \ldots, Y_n)'$ have joint probability distribution function $f_Y(y_1, \ldots, y_n, \theta)$ where $\theta$ is a $k \times 1$ vector of unknown parameters. Let $S = (S_1, \ldots, S_r)'$ be a set of $r$ statistics for $r \ge k$. The statistics $S_1, \ldots, S_r$ are jointly sufficient for $\theta$ if and only if

$$f_Y(y_1, \ldots, y_n, \theta) = g(S, \theta)\,h(y_1, \ldots, y_n)$$

where $g(S, \theta)$ does not depend on $y_1, \ldots, y_n$ except through $S$ and $h(y_1, \ldots, y_n)$ does not involve $\theta$.

Example 6.2.2 Let the $n \times 1$ random vector $Y = (Y_1, \ldots, Y_n)' \sim N_n(\alpha\mathbf{1}_n, \sigma^2 I_n)$. The statistics $S_1 = \mathbf{1}_n'Y$ and $S_2 = Y'Y$ are jointly sufficient for $\theta = (\alpha, \sigma^2)'$ since

$$f_Y(y_1, \ldots, y_n, \alpha, \sigma^2) = (2\pi\sigma^2)^{-n/2}e^{-(y - \alpha\mathbf{1}_n)'(y - \alpha\mathbf{1}_n)/(2\sigma^2)} = (2\pi\sigma^2)^{-n/2}e^{-[y'y - 2\alpha(\mathbf{1}_n'y) + n\alpha^2]/(2\sigma^2)} = (2\pi\sigma^2)^{-n/2}e^{-[S_2 - 2\alpha S_1 + n\alpha^2]/(2\sigma^2)}.$$

The next theorem and example link the ideas of sufficiency and maximum likelihood estimation.
Theorem 6.2.2 If $S = (S_1, \ldots, S_r)'$ are jointly sufficient for the vector $\theta$ and if $\hat\theta$ is a unique MLE of $\theta$, then $\hat\theta$ is a function of $S$.

Proof: By the factorization theorem

$$f_Y(y_1, \ldots, y_n, \theta) = g(S, \theta)\,h(y_1, \ldots, y_n),$$

which means that the value of $\theta$ that maximizes $f_Y(\cdot)$ depends on $S$. If the MLE is unique, the MLE of $\theta$ must be a function of $S$. $\blacksquare$

Example 6.2.3 Consider the problem from Example 6.2.2. Rewrite the model as $Y = X\alpha + E$ where the $n \times 1$ matrix $X = \mathbf{1}_n$ and the $n \times 1$ random vector $E \sim N_n(0, \sigma^2 I_n)$. Therefore, the MLE of $\alpha$ is given by

$$\hat\alpha = (X'X)^{-1}X'Y = (\mathbf{1}_n'\mathbf{1}_n)^{-1}\mathbf{1}_n'Y = \bar{Y}_.$$

and the MLE of $\sigma^2$ is

$$\hat\sigma^2 = Y'\left[I_n - X(X'X)^{-1}X'\right]Y/n = Y'\left[I_n - \mathbf{1}_n(\mathbf{1}_n'\mathbf{1}_n)^{-1}\mathbf{1}_n'\right]Y/n = Y'\left[I_n - \frac{1}{n}J_n\right]Y/n.$$

The MLEs $\hat\alpha = S_1/n$ and $\hat\sigma^2 = (S_2 - S_1^2/n)/n$ are functions of the jointly sufficient statistics $S_1 = \mathbf{1}_n'Y$ and $S_2 = Y'Y$ for $\alpha$ and $\sigma^2$.

This section concludes with a discussion of completeness and its relation to minimum variance unbiased estimators.

Definition 6.2.2 Completeness: A family of probability distribution functions $\{f_T(t, \theta), \theta \in \Theta\}$ is called complete if $\mathrm{E}[u(T)] = 0$ for all $\theta \in \Theta$ implies $u(T) = 0$ with probability 1 for all $\theta \in \Theta$.

Completeness is a characterization of the joint probability distribution of the statistics $T$. However, the term complete is often linked to the statistics themselves. Therefore, a sufficient statistic whose probability distribution is complete is referred to as a complete sufficient statistic. Completeness implies that two different functions of the statistics $T$ cannot have the same expectation. To understand this interpretation, let $u_1(T)$ and $u_2(T)$ be two functions of $T$ such that $\mathrm{E}[u_1(T)] = \tau(\theta)$ and $\mathrm{E}[u_2(T)] = \tau(\theta)$. Therefore, $\mathrm{E}[u_1(T) - u_2(T)] = 0$. If the distribution of $T$ is complete, then $u_1(T) - u_2(T) = 0$, or $u_1(T) = u_2(T)$, with probability 1. That is, an unbiased estimator of any function of $\theta$ is unique if the distribution of $T$ is complete.

Suppose it is of interest to develop an unbiased estimator of $\tau(\theta)$. Let $T$ be sufficient for $\theta$. Therefore, when searching for an unbiased estimator of $\tau(\theta)$, we confine ourselves to functions of $T$. If $T$ is a set of complete sufficient statistics, then there is at most one unbiased estimator of $\tau(\theta)$ based on $T$. Since this estimator is the only unbiased estimator of $\tau(\theta)$, trivially, it must be the unbiased estimator with the smallest variance.

From the preceding discussion, the class of unbiased estimators of $\tau(\theta)$ based on complete sufficient statistics has at most one member. Therefore, to call this estimator "best" in its class is in some sense misleading, since there are no competing estimators in the class. With no competition, these best estimators could perform very poorly. Surprisingly, minimum variance unbiased estimators based on complete sufficient statistics do in many cases have relatively small variances and, in that sense, do turn out to be good estimators. The next theorem identifies the complete sufficient statistics of $\beta$ and $\sigma^2$ when $Y = X\beta + E$ and $E \sim N_n(0, \sigma^2 I_n)$.

Theorem 6.2.3 Let $Y = X\beta + E$ where $Y$ is an $n \times 1$ random vector, $X$ is an $n \times p$ matrix of constants, $\beta$ is a $p \times 1$ vector of unknown parameters, and the $n \times 1$ random vector $E \sim N_n(0, \sigma^2 I_n)$. The MLEs $\hat\beta = (X'X)^{-1}X'Y$ and $\hat\sigma^2 = Y'[I_n - X(X'X)^{-1}X']Y/n$ are complete sufficient statistics for $\beta$ and $\sigma^2$. Furthermore, any two linearly independent combinations of $\hat\beta$ and $\hat\sigma^2$ are also complete sufficient statistics for $\beta$ and $\sigma^2$.

Example 6.2.4 Consider the problem from Example 6.2.3. By Theorem 6.2.3, $\bar{Y}_.$ and $Y'[I_n - \frac{1}{n}J_n]Y$ are complete sufficient statistics for $\alpha$ and $\sigma^2$. Furthermore, $\bar{Y}_.$ and $Y'[I_n - \frac{1}{n}J_n]Y$ are independent random variables where $\bar{Y}_. \sim N_1(\alpha, \sigma^2/n)$ and $Y'[I_n - \frac{1}{n}J_n]Y \sim \sigma^2\chi^2_{n-1}(\lambda = 0)$. Therefore, $\mathrm{E}(\bar{Y}_.) = \alpha$ and $\mathrm{E}\{(n-3)[Y'(I_n - \frac{1}{n}J_n)Y]^{-1}\} = \sigma^{-2}$. Thus, the minimum variance unbiased estimator of $\alpha/\sigma^2$ is given by $(n-3)\,\bar{Y}_.\,[Y'(I_n - \frac{1}{n}J_n)Y]^{-1}$.
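The point of Theorem 6.2.2 in this setting can be illustrated numerically: the MLEs computed from the raw data in Example 6.2.3 agree with the same quantities computed from the sufficient statistics $S_1 = \mathbf{1}_n'Y$ and $S_2 = Y'Y$ alone. A sketch with illustrative simulated data (NumPy):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
Y = 3.0 + 1.5 * rng.standard_normal(n)  # assumed alpha = 3, sigma = 1.5

S1, S2 = Y.sum(), Y @ Y                 # jointly sufficient statistics

alpha_hat = Y.mean()                    # MLE computed from the data
sigma2_hat = ((Y - Y.mean()) ** 2).mean()

# The same MLEs expressed as functions of (S1, S2) only.
assert np.isclose(alpha_hat, S1 / n)
assert np.isclose(sigma2_hat, (S2 - S1**2 / n) / n)
```

Once $(S_1, S_2)$ are recorded, the individual observations carry no further information about $(\alpha, \sigma^2)$, which is the practical content of sufficiency.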
6.3 ANOVA METHODS FOR FINDING MAXIMUM LIKELIHOOD ESTIMATORS
Example 6.3.1
Consider a two-way balanced factorial experiment with b random blocks and t fixed treatment levels. Let the bt observations be represented by random variables Yij for i = 1 . . . . . b and j = 1. . . . . t. The model for this experiment is Yi j = I~ j -Jr-B i q- B Ti j w h e r e # j is a constant representing the mean effect of the jth t r e a t m e n t level, Bi is a random variable representing the effect of the i th random block, and
BTij is a random variable representing the interaction of the ith block and the jth treatment. Assume Bi ~ i id N1 (0, a 2) and (|b @ P't) (B T11 . . . . . B Tbt) "~ Nb(t-1)(0, 0-2TIb| Furthermore, assume (B1 . . . . . Bb) and [BTll . . . . . BTbt] are mutually independent. Rewrite the model as Y = X/~ + E where the bt x 1 random vector Y = (Yll . . . . . Ylt . . . . . Ybl . . . . . Ybt)', the bt x t matrix X = lb @ It, the t x 1 vector/3 = (ill . . . . . fit) ' = (lzl . . . . . /zt)', and the bt x 1 error vector E = ( E l l . . . . . Elt . . . . . Ebl . . . . . Ebt)' ~ Nbt(0, ]g) with E=
0-2[Ib|
r [Ib|
( I t - - t1 j t ) ]
.
The covariance matrix E is derived using the covariance matrix algorithm in Chapter 4. The problem is to derive the maximum likelihood estimators of fll . . . . . /~t, a 2 and a2T . First, let the bt x bt matrix
p = [ibOl ] Ib | P't where P't is the (t - 1) x t lower portion of the t-dimensional Helmert matrix.
112
Linear Models
Recall PtP1t
: It
- -i1Jr,
P t,P t
-" It-l,
and l'tPt
= 0Ix(t-I).
By Theorem 2.1.2,
lib | 1'/] Y'~'Nbt(P'Xfl, P'EP) I~ | v't
l~y =
where
P'Xfl =
[ lb
| l't/~ 1
and
lb | P'tfl
P'EP =
[ t2~ 0
0 tr~rlb
|
] I/-1
The distribution of the b x 1 vector (Ib | I'/)Y is a function of the two unknown parameters l'tfl and cr2. The distribution of the b(t - 1) x 1 vector (Ib | Pt)Y is a function of the t unknown parameters P'tfl and tr2r. Furthermore, by Theorem 2.1.4, the two vectors are independent. Therefore, the MLEs of l'tfl and cr2 can be calculated separately from the MLEs of P'tfl and cr~r. First, model the b x 1 vector (Ib | I'/)Y as (Ib | l't)Y = lb | l't/3 -f- E1 Y1 = K101 + E1 where the b x 1 vector Y1 -- (Ib | l't)Y, the b x 1 matrix K1 -- lb, the unknown scalar 01 -- l'ti~ = ~-~=1 ]~J and the b x 1 random vector E1 ~ Nb(0, tEcrElb). Therefore, the MLE of 01 is
t 01 ~- (K'IK1)--1K1Y1 = ( l ~ l b ) - l ( l b | 1)'(Ib | l't)Y
b t i=1 j=l The MLE of t 2 ~ is given by t2t~ = Y'l[Ib - KI(KIK1)-IK'I]Y1/b. Therefore, the MLE of tr~ is -- lb(l'blb)-ll'bl(Ib | l't)Y/(tEb)
#2 = Y'(Ib |
1
~Jt]Y/(bt)
Note that the MLE of o'~ equals the sum of squares for blocks divided by bt. Now model the b(t - 1) x 1 vector (Ib | P't)Y as (Ib | P't)Y = Ib | ~ fl + E2
Y2 =K202+E2
6
Maximum
Likelihood Estimation
113
where the b(t- 1) x 1 vector Y2 - (Ib | P't)Y, the b(t- 1) • ( t - 1)matrix K2 - lb @ It-l, the (t - l) x 1 vector of unknown parameters 02 = P'tfl, and the b(t - 1) • 1 random vector E2 ~ Nb(t-1)(0, tr2rIb | I t - i ) . Therefore, the MLE of the ( t - 1) x 1 vector 02 is 02
--
(K~K2)-lK~Y2
= [(lb | l/-1)'(lb | I/-1)]-l(lb | l/-i)'(Ib | P'/)Y = [(1/b)l~, | P'/]Y. The MLE of cr2r is given by
6~r -- Y~[Ib |
Ke(K'eK2)-IK'2IYe/[b(t -
It-1 --
= Y'(Ib | P~t)'{lb | It-1 -- (lb | It-1)[(lb | (lb | It-1)'}(lb |
P't)Y/[b(t -
= Y'(Ib QPt)[(lb @ I t - l ) = Y'(Ib | Pt)
[(l)
Ib -- ~Jb
1)]
It-l)'(lb |
It-l)] -1
1)]
( ~ J b | It_l)
]
| It-1 (Ib |
=Y'
(Ib - ~Jb) | PtP't]Y/[b(t -1)l
=Y'
(Ib- ljb) | (It - ljt)]
(Ib
|
P't)Y/[b(t -
1)]
Note that the MLE of tr2r is the sum of squares for the block by treatment interaction divided by b(t - 1). Now the maximum likelihood estimators of (01, 0~) and the invariance property are used to derive the MLEs of the original parameters = (ill . . . . . j~t)t. Note that
l,tl
a't
~1
o~
"
Premultiplying by the t x t matrix (71It IPt) we obtain
1 ( 1tJt
( ltlPt)[o ~ 1 -4- PtP tI) ~ = tlltOl_.t_Pt02
[~Jt+ ( I t - ~ J t ) ] ~ - t
1
ltOl+Pt02
114
Linear Models
or 1 t~ --- - ltO1 -Jr-P t 0 2 . t
Therefore, by the invariance property, the MLE of 13 is given by 1 ~ -- t l t O 1 - + - e t 0 2
1
= - l t [ ( 1 / b ) l ' b | I't]Y + P t [ ( 1 / b ) l ' b | P't]Y
t
=
, 1 ' 1 (1/b)lb | -.It + (1/b)lb | PtP't Y
-- [ ( 1 / b ) l b, |
J
t
I 1- J t + lt - -,It 1 ) t
t
Y
= [(1/b)l; | ItlY = [(lb | lt)'(lb | | t ) ] - l ( l b | lt)'Y = (X'X)-Ix'y. The model from Example 6.3.1 belongs to a class of linear models where the MLE of ~ equals the ordinary least-squares estimator of ~ and the MLEs of the variance parameters in Z are linear combinations of the ANOVA mean squares for the random effects. The next theorem provides formulas for the MLEs Of a broad class of models, including the model from Example 6.2.1. 6.3.1 Let Y -- X ~ + E where Y is an n x 1 random vector, X is an n • p matrix o f rank p, ~ is a p • 1 vector o f unknown paramters, and the n • 1 random v e c t o r E ~ Nn(0, Z). Fori -- 1. . . . . m, l e t Y ' B i Y a n d Y ' C i Y be sums oj' squares corresponding to the various fixed and random effects, respectively, such m that In = ~-~i=1(Bi -t- Ci) w h e r e rank(Bi) -- Pi > 0, r a n k (Ci) - ri > 0, p -1 Pi a n d n ~-~im=l(Pi -4- ri). I f there exist unique constants ai > 0 s u c h thai' - ~-~im=l ai (Bi -4- Ci) then Theorem
i)
the maximum likelihood estimator o f ~ is given by ~ - ( X ' X ) - I x ' y if and only i f ~-~im=l B i X -- X, and
ii)
under the conditions that induce i), the maximum likelihood estimator o f ai is given by 5i - Y ' C i Y / ( p i q- ri).
Proof: By Theorem 1.1.7, $B_i$ and $C_i$ are idempotent matrices for $i = 1, \ldots, m$; $B_iC_j = 0_{n \times n}$ for any $i, j = 1, \ldots, m$; $B_iB_j = 0_{n \times n}$ for any $i \neq j$; and therefore $B_i + C_i$ is an idempotent matrix of rank $p_i + r_i$. These conditions imply $\Sigma^{-1} = \sum_{i=1}^{m}a_i^{-1}(B_i + C_i)$.

i) Assume $\sum_{i=1}^{m}B_iX = X$. Then $B_iC_j = 0_{n \times n}$ for all $i, j$ implies $C_jX = \sum_{i=1}^{m}C_jB_iX = 0_{n \times p}$ for all $j$. Now let the $n \times p$ matrix $Q = [Q_1|Q_2|\cdots|Q_m]$ where $Q_i$ is an $n \times p_i$ matrix of rank $p_i$ such that $B_i = Q_iQ_i'$ for each $i = 1, \ldots, m$. Thus, $QQ' = \sum_{i=1}^{m}B_i$ and $\sum_{i=1}^{m}a_i^{-1}B_i = QA^{-1}Q'$ where $A$ is a $p \times p$ nonsingular block diagonal matrix with $a_iI_{p_i}$ on the diagonal for $i = 1, \ldots, m$. Therefore, $X = \sum_{i=1}^{m}B_iX = QQ'X$, which implies $Q = X(Q'X)^{-1}$. The maximum likelihood estimator of $\beta$ is given by

$$\hat\beta = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}Y$$
$$= \left\{X'\left[\sum_{i=1}^{m}a_i^{-1}(B_i + C_i)\right]X\right\}^{-1}X'\left[\sum_{i=1}^{m}a_i^{-1}(B_i + C_i)\right]Y$$
$$= \left[X'\left(\sum_{i=1}^{m}a_i^{-1}B_i\right)X\right]^{-1}X'\left(\sum_{i=1}^{m}a_i^{-1}B_i\right)Y$$
$$= (X'QA^{-1}Q'X)^{-1}X'QA^{-1}Q'Y$$
$$= (Q'X)^{-1}A(X'Q)^{-1}X'QA^{-1}Q'Y$$
$$= (Q'X)^{-1}Q'Y$$
$$= \left\{[(Q'X)^{-1}]'X'X\right\}^{-1}[(Q'X)^{-1}]'X'Y$$
$$= (X'X)^{-1}X'Y.$$

The "only if" portion of the proof of i) is omitted, but can be found in Moser and McCann (1995).

ii) Before deriving the MLE of $a_i$, we need a particular relationship between the matrices $X$ and $B_i$. From the proof of i), $Q = X(Q'X)^{-1}$. Therefore, $X(X'X)^{-1}X'Q_i = Q_i$ for $i = 1, \ldots, m$. Postmultiplying by $Q_i'$ produces $X(X'X)^{-1}X'B_i = B_i$. Now, to find the MLE of $a_i$, write the likelihood function with $\beta$ replaced by $\hat\beta$ and $\Sigma^{-1} = \sum_{i=1}^{m}a_i^{-1}(B_i + C_i)$. Note that the $a_i$ are the eigenvalues of $\Sigma$ with multiplicity $p_i + r_i$ for $i = 1, \ldots, m$. Therefore,

$$|\Sigma| = \prod_{i=1}^{m}a_i^{(p_i + r_i)}$$

and

$$f(a_1, \ldots, a_m, \hat\beta, Y) = (2\pi)^{-n/2}|\Sigma|^{-1/2}e^{-(Y - X\hat\beta)'\Sigma^{-1}(Y - X\hat\beta)/2}$$
$$= (2\pi)^{-n/2}\left[\prod_{i=1}^{m}a_i^{-(p_i + r_i)/2}\right]e^{-(Y - X\hat\beta)'\left[\sum_{i=1}^{m}(B_i + C_i)/a_i\right](Y - X\hat\beta)/2}.$$

Take derivatives of $\log f(a_1, \ldots, a_m, \hat\beta, Y)$ with respect to $a_j$ for $j = 1, \ldots, m$, set the derivatives equal to zero, and solve for $a_j$:

$$\log f(a_1, \ldots, a_m, \hat\beta, Y) = -(n/2)\log(2\pi) + \sum_{i=1}^{m}\left[-(p_i + r_i)/2\right]\log(a_i) - (Y - X\hat\beta)'\left[\sum_{i=1}^{m}(1/a_i)(B_i + C_i)\right](Y - X\hat\beta)/2$$

and

$$\partial \log f(a_1, \ldots, a_m, \hat\beta, Y)/\partial a_j = -(p_j + r_j)/(2a_j) + (Y - X\hat\beta)'(B_j + C_j)(Y - X\hat\beta)/(2a_j^2) = 0.$$

Therefore,

$$\hat{a}_j = (Y - X\hat\beta)'(B_j + C_j)(Y - X\hat\beta)/(p_j + r_j)$$
$$= Y'\left[I_n - X(X'X)^{-1}X'\right](B_j + C_j)\left[I_n - X(X'X)^{-1}X'\right]Y/(p_j + r_j)$$
$$= Y'\left[(B_j + C_j) - X(X'X)^{-1}X'B_j - B_jX(X'X)^{-1}X' + X(X'X)^{-1}X'B_jX(X'X)^{-1}X'\right]Y/(p_j + r_j)$$
$$= Y'C_jY/(p_j + r_j). \qquad \blacksquare$$

In Theorem 6.3.1 the lower limit on $p_i$ is zero. Therefore, the theorem allows $B_i$ to equal $0_{n \times n}$ and thus admits situations where the number of sums of squares for fixed effects differs from the number of random effects sums of squares. Furthermore, note that the lower limit on $r_i$ is strictly positive, implying that the number of matrices $C_i$ and the number of unknown variance parameters both equal $m$. This restriction is imposed so that the MLE estimators are defined, rather than being nonestimable. Example 6.3.1 is now reworked using Theorem 6.3.1.
Example 6.3.2
Consider the two-way balanced factorial experiment given in Example 6.3.1. The model can be written as Y - X/3 + E where X - lb | It, ~ -(31 . . . . fit)' and
+
(it
6
Maximum Likelihood Estimation
117
Let the sums of squares due to the overall mean, the random blocks, the fixed treatments, and the random interaction of blocks and treatments be represented by Y'B1Y, Y'C1Y, Y'BEY, and Y'CzY, respectively, where Pl - 1, rl = b - 1, P2 t - 1, r2 = ( b - 1)(t - 1) and
1 1 B1 - ~Jb | tJt Cl=
(ol) Ib--~-Jb
Bz=~-Jb|
It-t
|
t
Jt
Jr
C2= ( I b - b J b ) | Note that Ib | It -- B1 W C1 W B2 + C2. Furthermore, B1E = tcr2B1, C1E -tcrEC1,B2E = crErB2, and CEE = o'B2rCe. Therefore, ~-~-~=1ai(Bi + Ci) = tcrE[Ib | 1Jt] + crEr[Ib | (it -- 1Jt)] - E where al = to"2 and ae - trOT. Furthermore, 2
/__~IBiX = [ ( ~ J b |
(~Jb|
(It-{Jt))][lb|
= lb| =X. Therefore, by Theorem 6.3.1, the MLE of/3 is given by /~ = (X'X)-1 X'Y = [(lb | It)'(lb | It)]-l(lb | It)'Y " ~ (~z. 1 . . . . .
}z.t)t
and the MLEs of a l and a2 are al = t~B2 = Y'C1Y/b and
a2 = 62BT = Y'C2Y/[b(t - 1)]. Therefore, the MLEs of aB2 and a 2 r are
1
~jtlY/(bt)
118
Linear Models
and
1
(It
~Jt)]Y/
(t
1 .
These are the same MLEs derived in Example 6.3.1. Although this chapter deals mainly with maximum likelihood estimators of multivariate normal models, Theorem 6.3.1 also motivates a further generalization of the Gauss-Markov theorem. The Gauss-Markov theorem was introduced in Section 5.2 for the model Y = X/3 + E when the n x 1 error vector E had a distribution with E(E) = 0 and cov(E) = o'2In. In Section 5.4 the Gauss-Markov theorem was extended to include the model Y = X/3 + E with E(E) = 0 and cov(E) = cr2V where V is an n x n positive definite matrix of known constants. In the next theorem, the Gauss-Markov theorem is again extended to include an even broader class of covariance matrices.
Theorem 6.3.2 Let Y = Xfl + E where Y is an n x 1 random vector, X is an n x p matrix of rank p, ~ is a p • 1 vector of unknown parameters, and E is an n x 1 random vector with E(E) = 0 and cov(E) = E. For i = 1 . . . . . m, let Y'Bi Y and Y'Ci Y be the sums of squares corresponding to the various fixed and random effects, respectively, such that In = ~"~i%1(Bi + Ci) where r a n k ( B i ) - Pi >__O, rank (Ci) : ri > O, p = ~-~i%1Pi and n = ~-'~im=l(Pi + ri). If there exist unique m constants ai > 0 such that E = ~~i=1 ai(Bi + Ci) then the BLUE of t ' ~ is given by t' (X'X)-1X'Y if and only if ~ i % 1 B i X - - X . Proof:
(Sufficiency) Assume E i % l BiX = X. The BLUE of t'/3 is given by
t'/3 = t ' [ X ' Z - 1 x ] - l x ' x - l Y = t'
aFI(Bi +
X' i=1
1/
Ci) X
-1
a~- 1(Bi + Ci) Y
X' i=l
= t' (X'X)- 1X, y . (Necessity) The necessity proof is given in Moser (1995).
II
Theorem 6.3.2 is applied in the next example.
Example 6.3.3 Consider the model in Example 6.3.2. Since E = ~i%1 ai (Bi + Ci) and ~]iL1 BiX = X, by Theorem 6.3.2, the BLUE of s'/3 is s'(XtX)-Ix'y
-- s'(Y.1 . . . . .
where the t • 1 vector s = (sl . . . . . st)'.
Y.t) ' =
t ~ siY.i i=1
6
Maximum Likelihood Estimation
119
In the next section the likelihood ratio test in derived for hypotheses of the form H/3 = h where H is a q x p matrix of constants and h is a q x 1 vector of constants.
6.4
THE LIKELIHOOD RATIO TEST FOR = h
Tests of the hypothesis Hβ = h are developed using the likelihood ratio statistic. For the model Y = Xβ + E where E ~ N_n(0, σ²I_n), the likelihood function is given by

ℓ(β, σ², Y) = (2πσ²)^(−n/2) exp[−(Y − Xβ)'(Y − Xβ)/(2σ²)].

The likelihood ratio statistic is a function of two values:

1. the maximum value of ℓ(β, σ², Y) maximized over all possible values of β and σ², that is, over all 0 < σ² < ∞ and −∞ < β_i < ∞;

2. the maximum value of ℓ(β, σ², Y) maximized over the parameter space defined by Hβ = h.

The likelihood ratio statistic L is the ratio of these two values:

L = [max over β, σ² with Hβ = h of ℓ(β, σ², Y)] / [max over β, σ² of ℓ(β, σ², Y)].

The denominator of L is maximized when the maximum likelihood estimators of β and σ² are used in ℓ(β, σ², Y). That is,

max over β, σ² of ℓ(β, σ², Y) = (2πσ̂²_D)^(−n/2) exp[−(Y − Xβ̂_D)'(Y − Xβ̂_D)/(2σ̂²_D)] = (2π)^(−n/2)(σ̂²_D)^(−n/2)e^(−n/2)

where β̂_D = (X'X)⁻¹X'Y and σ̂²_D = Y'(I_n − X(X'X)⁻¹X')Y/n. For the numerator of L, the likelihood function is maximized with respect to β and σ² under the restriction Hβ = h. To this end, let G be a (p − q)×p matrix of rank p − q chosen such that the p×p matrix [H' | G']' has full rank with

HG' = O_(q×(p−q)).

Let [H' | G']'⁻¹ = [R | S] where R is a p×q matrix and S is a p×(p − q) matrix. Therefore, HR = I_q, GS = I_(p−q), HS = O_(q×(p−q)), and GR = O_((p−q)×q). Note that

[R | S] = [H'(HH')⁻¹ | G'(GG')⁻¹]
120
Linear Models
so R = H'(HH')⁻¹ and S = G'(GG')⁻¹. Also,

[H'(HH')⁻¹ | G'(GG')⁻¹][H' | G']' = I_p

or

H'(HH')⁻¹H + G'(GG')⁻¹G = I_p.

Rewrite Y = Xβ + E under Hβ = h as

Y = X[R | S][H' | G']'β + E
Y = XRHβ + XSGβ + E
Y = XRh + XSGβ + E.

Since XRh is known, rewrite the above model as

Z = Kθ + E

where the n×1 random vector Z = Y − XRh, the n×(p − q) matrix K = XS, the (p − q)×1 vector θ = Gβ, and the n×1 random vector E ~ N_n(0, σ²I_n). Therefore, the likelihood in the numerator is maximized under Hβ = h when the maximum likelihood estimators of θ and σ² for the model Z = Kθ + E are used in the likelihood function. That is,

max over β, σ² with Hβ = h of ℓ(β, σ², Y) = max over θ, σ² of ℓ(θ, σ², Z) = (2π)^(−n/2)(σ̂²_N)^(−n/2)e^(−n/2)

where θ̂_N = (K'K)⁻¹K'Z and σ̂²_N = Z'(I_n − K(K'K)⁻¹K')Z/n. Therefore, the likelihood ratio statistic is given by

L = (σ̂²_N/σ̂²_D)^(−n/2).
Instead of using L as the test statistic, use the following monotonic function of L:

V = (L^(−2/n) − 1)(n − p)/q = (σ̂²_N/σ̂²_D − 1)(n − p)/q

  = [(σ̂²_N − σ̂²_D)/q] / [σ̂²_D/(n − p)]

  = {[Z'(I_n − K(K'K)⁻¹K')Z − Y'(I_n − X(X'X)⁻¹X')Y]/q} / {Y'(I_n − X(X'X)⁻¹X')Y/(n − p)}.
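The construction of R, S, and K above can be illustrated numerically. The following sketch is my own (numpy and the simulated regression data are assumptions, not from the text): it builds R = H'(HH')⁻¹ and S = G'(GG')⁻¹ for a simple hypothesis, verifies the partitioned-inverse identities, and computes the restricted estimate of σ² through the reduced model Z = Kθ + E.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 20, 3, 1
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
Y = X @ np.array([1.0, 2.0, 0.5]) + rng.normal(size=n)
H = np.array([[0.0, 0.0, 1.0]])                  # hypothesis H beta = h
h = np.array([0.0])
G = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]) # rows chosen so HG' = 0
R = H.T @ np.linalg.inv(H @ H.T)
S = G.T @ np.linalg.inv(G @ G.T)
assert np.allclose(H @ R, np.eye(q))
assert np.allclose(G @ S, np.eye(p - q))
assert np.allclose(H @ S, 0) and np.allclose(G @ R, 0)
assert np.allclose(R @ H + S @ G, np.eye(p))     # H'(HH')^-1 H + G'(GG')^-1 G = I_p
proj = lambda A: A @ np.linalg.solve(A.T @ A, A.T)
Z = Y - X @ R @ h
K = X @ S
sig2_N = Z @ (np.eye(n) - proj(K)) @ Z / n       # restricted MLE of sigma^2
sig2_D = Y @ (np.eye(n) - proj(X)) @ Y / n       # unrestricted MLE of sigma^2
assert sig2_N >= sig2_D - 1e-12                  # restricting the fit cannot lower SSE
```

Any G whose rows complete H to a nonsingular p×p matrix with HG' = 0 works; the resulting V does not depend on the particular choice.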
A second form of V can be generated by noting that

[I_n − X(X'X)⁻¹X']Z = [I_n − X(X'X)⁻¹X'][Y − XH'(HH')⁻¹h]
  = Y − X(X'X)⁻¹X'Y − XH'(HH')⁻¹h + X(X'X)⁻¹X'XH'(HH')⁻¹h
  = [I_n − X(X'X)⁻¹X']Y

or

Z'[I_n − X(X'X)⁻¹X']Z = Y'[I_n − X(X'X)⁻¹X']Y.

Therefore, V can be rewritten in a second form as

V = {Z'[X(X'X)⁻¹X' − K(K'K)⁻¹K']Z/q} / {Y'(I_n − X(X'X)⁻¹X')Y/(n − p)}.

Another form of the likelihood ratio statistic V can be generated using Lagrange multipliers. This third form of V equals

V = {(Hβ̂ − h)'[H(X'X)⁻¹H']⁻¹(Hβ̂ − h)/q} / {Y'(I_n − X(X'X)⁻¹X')Y/(n − p)}

where β̂ = (X'X)⁻¹X'Y. The denominator of the third form of L is the same as the other two forms. That is, the denominator equals

max over β, σ² of ℓ(β, σ², Y) = (2π)^(−n/2)(σ̂²_D)^(−n/2)e^(−n/2)
where σ̂²_D was defined earlier. Lagrange multipliers are used to derive the numerator of the third form of L. By the Lagrange multiplier technique, the numerator of L is given by

max over β, σ² with Hβ = h of ℓ(β, σ², Y) = max over β, σ², λ of [ℓ(β, σ², Y) − λ'(Hβ − h)]

where λ is a q×1 vector. To maximize the right side of this expression, take derivatives with respect to β, σ², and λ, set the resulting expressions equal to zero, solve for β, σ², and λ, and substitute these solutions back into the original expression. Let ℓ*(β, σ², λ, Y) = ℓ(β, σ², Y) − λ'(Hβ − h). Then the derivatives are

∂ℓ*(β, σ², λ, Y)/∂β = (−2σ²)⁻¹(2X'Xβ − 2X'Y)ℓ(β, σ², Y) − H'λ    (1)

∂ℓ*(β, σ², λ, Y)/∂σ² = [−n/(2σ²) + (1/(2σ⁴))(Y − Xβ)'(Y − Xβ)]ℓ(β, σ², Y)    (2)

∂ℓ*(β, σ², λ, Y)/∂λ = −(Hβ − h).    (3)
Set equations (1), (2), and (3) equal to zero and solve for β, σ², and λ. Let the solutions for β, σ², and λ be designated as β̂_N, σ̂²_N, and λ̂_N. From equations (1), (2), and (3):

−X'Xβ̂_N + X'Y = H'λ̂*_N  where  λ̂*_N = σ̂²_N λ̂_N / ℓ(β̂_N, σ̂²_N, Y)    (4)

σ̂²_N = (Y − Xβ̂_N)'(Y − Xβ̂_N)/n    (5)

Hβ̂_N = h.    (6)

From (4):

β̂_N = (X'X)⁻¹X'Y − (X'X)⁻¹H'λ̂*_N.    (7)

Premultiplying each side of (7) by H and noting from (6) that Hβ̂_N = h, we obtain

Hβ̂_N = H(X'X)⁻¹X'Y − H(X'X)⁻¹H'λ̂*_N = h.    (8)

Solving the right-hand equality in (8) for λ̂*_N, we have

λ̂*_N = [H(X'X)⁻¹H']⁻¹[H(X'X)⁻¹X'Y − h].    (9)

Substituting λ̂*_N into (4) and solving for β̂_N, we obtain

−(X'X)β̂_N + X'Y = H'[H(X'X)⁻¹H']⁻¹[H(X'X)⁻¹X'Y − h]

β̂_N = (X'X)⁻¹X'Y − (X'X)⁻¹H'[H(X'X)⁻¹H']⁻¹[H(X'X)⁻¹X'Y − h]

or

β̂_N = β̂_D − (X'X)⁻¹H'[H(X'X)⁻¹H']⁻¹[Hβ̂_D − h].    (10)

Substituting (10) into (5) and rearranging terms, we have

σ̂²_N = {(Y − Xβ̂_D) − X(X'X)⁻¹H'[H(X'X)⁻¹H']⁻¹[Hβ̂_D − h]}'
       {(Y − Xβ̂_D) − X(X'X)⁻¹H'[H(X'X)⁻¹H']⁻¹[Hβ̂_D − h]}/n.    (11)

Since

(Y − Xβ̂_D)'X(X'X)⁻¹H'[H(X'X)⁻¹H']⁻¹[Hβ̂_D − h] = 0

and

{X(X'X)⁻¹H'[H(X'X)⁻¹H']⁻¹[Hβ̂_D − h]}'{X(X'X)⁻¹H'[H(X'X)⁻¹H']⁻¹[Hβ̂_D − h]} = [Hβ̂_D − h]'[H(X'X)⁻¹H']⁻¹[Hβ̂_D − h],
equation (11) reduces to

σ̂²_N = {(Y − Xβ̂_D)'(Y − Xβ̂_D) + (Hβ̂_D − h)'[H(X'X)⁻¹H']⁻¹(Hβ̂_D − h)}/n    (12)
     = σ̂²_D + (1/n)(Hβ̂_D − h)'[H(X'X)⁻¹H']⁻¹(Hβ̂_D − h).

Therefore,

max over β, σ², λ of [ℓ(β, σ², Y) − λ'(Hβ − h)] = ℓ(β̂_N, σ̂²_N, Y) = (2π)^(−n/2)(σ̂²_N)^(−n/2)e^(−n/2).
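Identity (12) is an exact algebraic statement and can be verified numerically. The sketch below is my own check (numpy and the simulated data are assumptions): it computes the restricted MLE through equation (10) and confirms that the restricted residual variance decomposes as in (12).

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
Y = X @ np.array([1.0, -1.0, 2.0]) + rng.normal(size=n)
H = np.array([[0.0, 1.0, 1.0]])                  # hypothesis: beta_1 + beta_2 = 0
h = np.array([0.0])
XtX_inv = np.linalg.inv(X.T @ X)
bD = XtX_inv @ X.T @ Y                           # unrestricted MLE (equation for beta_D)
sig2_D = Y @ (np.eye(n) - X @ XtX_inv @ X.T) @ Y / n
# restricted MLE via equation (10)
bN = bD - XtX_inv @ H.T @ np.linalg.solve(H @ XtX_inv @ H.T, H @ bD - h)
sig2_N = (Y - X @ bN) @ (Y - X @ bN) / n
# right side of identity (12)
gap = (H @ bD - h) @ np.linalg.solve(H @ XtX_inv @ H.T, H @ bD - h) / n
assert np.allclose(sig2_N, sig2_D + gap)
assert np.allclose(H @ bN, h)                    # restriction is satisfied exactly
```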
The likelihood ratio statistic L therefore equals

L = (σ̂²_N/σ̂²_D)^(−n/2).

Instead of using L as the test statistic, use a monotonic function of L:

V = (L^(−2/n) − 1)(n − p)/q = [(σ̂²_N − σ̂²_D)/q]/[σ̂²_D/(n − p)]

  = {(Hβ̂_D − h)'[H(X'X)⁻¹H']⁻¹(Hβ̂_D − h)/q} / {Y'(I_n − X(X'X)⁻¹X')Y/(n − p)}.

The MLE β̂_D will subsequently be referred to as the least-squares estimator β̂ since β̂_D = β̂. The derivations of all three forms of V essentially follow the derivations presented by Graybill (1976).

The distribution of V is needed before the statistic V can be used to test the hypothesis Hβ = h. Note that Hβ̂ = H(X'X)⁻¹X'Y; therefore, by Theorem 2.1.2, Hβ̂ ~ N_q(Hβ, σ²H(X'X)⁻¹H'). Under H₀: Hβ = h, Hβ̂ − h ~ N_q(0, σ²H(X'X)⁻¹H'). By Corollary 3.1.2(b), (Hβ̂ − h)'[H(X'X)⁻¹H']⁻¹(Hβ̂ − h) ~ σ²χ²_q(λ) where λ = 0 if Hβ = h and λ > 0 if Hβ ≠ h. Furthermore, by Theorem 3.2.1, (Hβ̂)'[H(X'X)⁻¹H']⁻¹Hβ̂ is independent of Y'[I_n − X(X'X)⁻¹X']Y since

(Hβ̂)'[H(X'X)⁻¹H']⁻¹Hβ̂ = Y'X(X'X)⁻¹H'[H(X'X)⁻¹H']⁻¹H(X'X)⁻¹X'Y

and

[I_n − X(X'X)⁻¹X'](σ²I_n)[X(X'X)⁻¹H'[H(X'X)⁻¹H']⁻¹H(X'X)⁻¹X'] = O_(n×n).

Since Hβ̂ − h has the same covariance matrix as Hβ̂ and both vectors are normally distributed, (Hβ̂ − h)'[H(X'X)⁻¹H']⁻¹(Hβ̂ − h) is also independent of Y'[I_n − X(X'X)⁻¹X']Y. Therefore, by Definition 3.3.3, V ~ F_(q,n−p)(λ). Note that all three forms of the statistic V are equal and therefore they all have the same distribution. The various forms of the likelihood ratio statistic are now applied to a number of example problems.
Example 6.4.1 Consider the model Y = Xβ + E where Y is an n×1 random vector, X = (X₁ | X₂) is an n×p matrix where X₁ is an n×(p − q) matrix and X₂ is an n×q matrix, β = (β₀, …, β_(p−q−1) | β_(p−q), …, β_(p−1))' is a p×1 vector, and the n×1 error vector E ~ N_n(0, σ²I_n). The objective is to construct a likelihood ratio test for the hypothesis H₀: β_(p−q) = ⋯ = β_(p−1) = 0 versus H₁: not all β_i = 0 for i = p − q, …, p − 1. The hypothesis H₀ is equivalent to the hypothesis Hβ = h where the q×p matrix H = [O_(q×(p−q)) | I_q] and the q×1 vector h = 0. Since h = 0, Z = Y − XH'(HH')⁻¹h = Y. Furthermore, let the (p − q)×p matrix G = [I_(p−q) | O_((p−q)×q)] so that [H' | G']' has rank p and HG' = O_(q×(p−q)). Therefore,

K = XG'(GG')⁻¹ = X₁

since G' stacks I_(p−q) over O_(q×(p−q)) and GG' = I_(p−q), and the first form of the likelihood ratio statistic V equals

V = {[Y'(I_n − X₁(X₁'X₁)⁻¹X₁')Y − Y'(I_n − X(X'X)⁻¹X')Y]/q} / {Y'(I_n − X(X'X)⁻¹X')Y/(n − p)}.

Label Y = Xβ + E as the full model with p unknown parameters in β. Label Y = X₁β⁽¹⁾ + E₁ as the reduced model with p − q unknown parameters in β⁽¹⁾ = (β₀, …, β_(p−q−1))'. The first form of the likelihood ratio statistic equals

V = [(SSE_R − SSE_F)/q]/[SSE_F/(n − p)] ~ F_(q,n−p)(λ)

where SSE_R is the sum of squares residual for the reduced model and SSE_F is the sum of squares residual for the full model. A γ level test of H₀ versus H₁ is to reject H₀ if V > F^γ_(q,n−p). Note that

Y'(I_n − (1/n)J_n)Y = SSReg_R + SSE_R = SSReg_F + SSE_F,

which implies SSE_R − SSE_F = SSReg_F − SSReg_R where SSReg_F and SSReg_R are the sums of squares regression for the full and reduced models, respectively. Therefore, the likelihood ratio statistic for this problem also takes the form

V = [(SSReg_F − SSReg_R)/q]/[SSE_F/(n − p)] ~ F_(q,n−p)(λ).
Example 6.4.2
Consider a special case of the problem posed in Example 6.4.1. From Example 6.4.1 let the n×p matrix X = (X₁ | X₂) where X₁ equals the n×1 vector 1_n and X₂ equals the n×(p − 1) matrix X_c such that 1_n'X_c = 0_(1×(p−1)), and partition the p×1 vector β as (β₀ | β₁, …, β_(p−1))'. The objective is to test the hypothesis H₀: β₁ = ⋯ = β_(p−1) = 0 versus H₁: not all β_i = 0 for i = 1, …, p − 1. From Example 6.4.1, the likelihood ratio statistic equals

V = {[Y'(I_n − 1_n(1_n'1_n)⁻¹1_n')Y − Y'(I_n − X(X'X)⁻¹X')Y]/(p − 1)} / {Y'(I_n − X(X'X)⁻¹X')Y/(n − p)}

  = {Y'[X(X'X)⁻¹X' − (1/n)J_n]Y/(p − 1)} / {Y'(I_n − X(X'X)⁻¹X')Y/(n − p)}

  = {Y'[X_c(X_c'X_c)⁻¹X_c']Y/(p − 1)} / {Y'(I_n − X(X'X)⁻¹X')Y/(n − p)} ~ F_(p−1,n−p)(λ).

Therefore, V equals the mean square regression due to β₁, …, β_(p−1) divided by the mean square residual, as depicted in Table 5.3.1.
Example 6.4.3 Consider the one-way classification from Example 2.1.4. The experiment can be modeled as Y = Xβ + E where the tr×1 vector Y = (Y₁₁, …, Y₁ᵣ, …, Yₜ₁, …, Yₜᵣ)', the tr×t matrix X = I_t ⊗ 1_r, the t×1 vector β = (β₁, …, βₜ)' = (μ₁, …, μₜ)' where μ₁, …, μₜ are defined in Example 2.1.4, and the tr×1 random vector E ~ N_tr(0, σ²_(R(T)) I_t ⊗ I_r). The objective is to construct the likelihood ratio test for the hypothesis H₀: β₁ = ⋯ = βₜ versus H₁: not all β_i equal. The hypothesis H₀ is equivalent to the hypothesis Hβ = h with the (t − 1)×t matrix H = P_t' and the (t − 1)×1 vector h = 0 where P_t' is the (t − 1)×t lower portion of a t-dimensional Helmert matrix with

P_t'P_t = I_(t−1),  P_tP_t' = I_t − (1/t)J_t,  and  1_t'P_t = 0_(1×(t−1)).

Note that

β̂ = (X'X)⁻¹X'Y = [(I_t ⊗ 1_r)'(I_t ⊗ 1_r)]⁻¹(I_t ⊗ 1_r)'Y = [I_t ⊗ (1/r)1_r']Y = (Ȳ₁., …, Ȳₜ.)'

Hβ̂ − h = P_t'β̂ = [P_t' ⊗ (1/r)1_r']Y

[H(X'X)⁻¹H']⁻¹ = {P_t'[(I_t ⊗ 1_r)'(I_t ⊗ 1_r)]⁻¹P_t}⁻¹ = (P_t'P_t)⁻¹ · r = I_(t−1) · r

I_n − X(X'X)⁻¹X' = I_t ⊗ I_r − (I_t ⊗ 1_r)[(I_t ⊗ 1_r)'(I_t ⊗ 1_r)]⁻¹(I_t ⊗ 1_r)'
                 = I_t ⊗ I_r − I_t ⊗ (1/r)J_r
                 = I_t ⊗ (I_r − (1/r)J_r).

Therefore, the third form of the likelihood ratio statistic is given by

V = {Y'[P_t' ⊗ (1/r)1_r']'(I_(t−1) · r)[P_t' ⊗ (1/r)1_r']Y/(t − 1)} / {Y'[I_t ⊗ (I_r − (1/r)J_r)]Y/[t(r − 1)]}

  = {Y'[(I_t − (1/t)J_t) ⊗ (1/r)J_r]Y/(t − 1)} / {Y'[I_t ⊗ (I_r − (1/r)J_r)]Y/[t(r − 1)]} ~ F_(t−1,t(r−1))(λ).

Therefore, V equals the mean square for the treatments divided by the mean square for the nested replicates, which is the usual ANOVA test for equality of the treatment means.
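The equality between the third form of V and the usual ANOVA ratio MST/MSE can be confirmed numerically. The sketch below is my own (numpy, the simulated data, and the particular orthonormal Helmert contrast rows are assumptions): it computes V from Hβ̂ and [H(X'X)⁻¹H']⁻¹ and compares it with the treatment mean square over the replicate mean square.

```python
import numpy as np

rng = np.random.default_rng(2)
t, r = 3, 5
n = t * r
X = np.kron(np.eye(t), np.ones((r, 1)))       # one-way design, X = I_t (x) 1_r
Y = X @ np.array([1.0, 1.5, 0.7]) + rng.normal(size=n)
# lower (t-1) x t portion of an orthonormal Helmert matrix: rows orthogonal to 1_t
H = np.array([[1.0, -1.0, 0.0], [1.0, 1.0, -2.0]])
H /= np.linalg.norm(H, axis=1, keepdims=True)
XtX_inv = np.linalg.inv(X.T @ X)
bhat = XtX_inv @ X.T @ Y                      # vector of cell means
num = (H @ bhat) @ np.linalg.solve(H @ XtX_inv @ H.T, H @ bhat) / (t - 1)
mse = Y @ (np.eye(n) - X @ XtX_inv @ X.T) @ Y / (t * (r - 1))
V = num / mse
ybar = Y.reshape(t, r).mean(axis=1)
mst = r * ((ybar - ybar.mean()) ** 2).sum() / (t - 1)
assert np.allclose(V, mst / mse)              # V is exactly MST / MSE
```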
6.5
CONFIDENCE BANDS ON LINEAR COMBINATIONS OF β
The likelihood ratio statistic is now used to construct confidence bands on individual linear combinations of β. Assume the model Y = Xβ + E where E ~ N_n(0, σ²I_n). Let the q×p matrix H and the q×1 vector h from Section 6.4 equal a 1×p row vector g' and a scalar g₀, respectively. The hypothesis H₀: Hβ = h versus H₁: Hβ ≠ h becomes H₀: g'β = g₀ versus H₁: g'β ≠ g₀. The third form of the likelihood ratio statistic equals

V = (g'β̂ − g₀)² / {[Y'(I_n − X(X'X)⁻¹X')Y/(n − p)][g'(X'X)⁻¹g]} ~ F_(1,n−p)(λ)

or

√V = (g'β̂ − g₀) / {[Y'(I_n − X(X'X)⁻¹X')Y/(n − p)][g'(X'X)⁻¹g]}^(1/2) ~ t_(n−p)(λ).

Therefore,

P(−t^(γ/2)_(n−p) ≤ √V ≤ t^(γ/2)_(n−p)) = 1 − γ

where t^(γ/2)_(n−p) is the 100(1 − γ/2)th percentile point of a central t distribution with n − p degrees of freedom. Substitute for √V and solve for g'β in the preceding probability statement. The resulting 100(1 − γ)% confidence band on g'β is

g'β̂ ± t^(γ/2)_(n−p){[Y'(I_n − X(X'X)⁻¹X')Y/(n − p)][g'(X'X)⁻¹g]}^(1/2).

In the next example, confidence bands are placed on certain linear combinations of the treatment means in a one-way ANOVA problem.
Example 6.5.1 Consider the one-way ANOVA problem from Example 6.4.3. The objective is to construct individual confidence bands on β₁ − β₂ and on β̄. = Σ_{i=1}^t β_i/t = (1/t)1_t'β. For the one-way ANOVA problem, n = tr, p = t,

β̂ = (X'X)⁻¹X'Y = [(I_t ⊗ 1_r)'(I_t ⊗ 1_r)]⁻¹(I_t ⊗ 1_r)'Y = (Ȳ₁., …, Ȳₜ.)'

and

Y'[I_n − X(X'X)⁻¹X']Y = Y'{I_t ⊗ I_r − (I_t ⊗ 1_r)[(I_t ⊗ 1_r)'(I_t ⊗ 1_r)]⁻¹(I_t ⊗ 1_r)'}Y = Y'[I_t ⊗ (I_r − (1/r)J_r)]Y.

For β₁ − β₂, the t×1 vector g = (1, −1, 0, …, 0)', g'β̂ = Ȳ₁. − Ȳ₂. and

g'(X'X)⁻¹g = (1, −1, 0, …, 0)[(I_t ⊗ 1_r)'(I_t ⊗ 1_r)]⁻¹(1, −1, 0, …, 0)' = 2/r.

For β̄. = (1/t)1_t'β, the t×1 vector g = (1/t)1_t, g'β̂ = Ȳ.. and

g'(X'X)⁻¹g = [(1/t)1_t]'[(I_t ⊗ 1_r)'(I_t ⊗ 1_r)]⁻¹[(1/t)1_t] = (rt)⁻¹.

Therefore, 100(1 − γ)% confidence bands on β₁ − β₂ are

Ȳ₁. − Ȳ₂. ± t^(γ/2)_(t(r−1)){2Y'[I_t ⊗ (I_r − (1/r)J_r)]Y/[tr(r − 1)]}^(1/2)

and 100(1 − γ)% confidence bands on β̄. are

Ȳ.. ± t^(γ/2)_(t(r−1)){Y'[I_t ⊗ (I_r − (1/r)J_r)]Y/[t²r(r − 1)]}^(1/2).
EXERCISES

1. Let Y_i = β₀ + β₁x_i + E_i for i = 1, …, n where E_i ~ iid N₁(0, σ²). Construct individual confidence bands on β₀ and β₁ and write the answer in terms of Σx_i, ΣY_i, Σx_i², ΣY_i², Σx_iY_i, and n.
2. Consider the one-way classification from Example 6.4.3. Find the minimum variance unbiased estimator of Σ_{i=1}^t μ_i/σ²_(R(T)).

3. Let Y₁, Y₂, …, Y_n be a random sample of size n from a N₁(μ, σ²) distribution where −∞ < μ < ∞ and σ² > 0. Consider the γ level test H₀: μ = μ₀ versus H₁: μ ≠ μ₀ where μ₀ is given. Show that the likelihood ratio test is equivalent to the two-sided test, reject H₀ if |T| > t^(γ/2)_(n−1), where T = (Ȳ − μ₀)/[s/√n], Ȳ = (1/n)Σ_{i=1}^n Y_i and s² = Σ_{i=1}^n (Y_i − Ȳ)²/(n − 1).

4. Let Y = Xβ + E where E ~ N_n(0, σ²I_n). A 100(1 − γ)% confidence interval on h'β is h'β̂ ± t^(γ/2)_(n−p)[std(h'β̂)] where std(h'β̂) is the estimated standard deviation of h'β̂.
(a) Find the distribution of L² where L is the length of the interval.
(b) If p = 3, n = 20, β = (β₁, β₂, β₃)', Y'Y = 418, X'X = I₃, and X'Y = (5, 10, 15)', construct individual 95% confidence bands on β₁, β₁ + β₂, and β₁ − β₃.

5. Let Y₁ = μ₁ + ε₁, Y₂ = μ₁ + μ₂ + ε₁ + ε₂, Y₃ = μ₁ + ε₃, and Y₄ = μ₁ + μ₂ + ε₃ + ε₄ where μ₁ and μ₂ are unknown parameters and ε = (ε₁, ε₂, ε₃, ε₄)' ~ N₄(0, σ²I₄).
(a) Write the model as Y = Xβ + E where E = (E₁, E₂, E₃, E₄)'. Define each term and distribution explicitly.
(b) Find the MLE of β = (μ₁, μ₂)'.
(c) Find the distribution of the MLE of β.

6. Let Y_ij = μ_i + E_ij where the E_ij's are distributed as independent N₁(0, jσ²) random variables. Let Y = (Y₁₁, Y₁₂, Y₂₁, Y₂₂)'. Find the likelihood ratio statistic for testing the hypothesis H₀: μ₁ = μ₂ versus H₁: μ₁ ≠ μ₂.

7. Consider the paired t-test problem of Exercise 5 in Chapter 4.
(a) Calculate the MLEs of μ₁, μ₂, σ²_B, and σ²_E.
(b) Construct the likelihood ratio statistic for testing the hypothesis H₀: μ₁ = μ₂ versus H₁: μ₁ ≠ μ₂.

8. Consider the experiment from Exercise 6 in Chapter 4. Calculate the MLEs of μ₁₁, …, μ₅₃, σ²_(P(F)), σ²_(C(FP)), σ²_(T(F)), and σ²_(CT(FP)).

9. Consider the experiment from Exercise 7 in Chapter 4.
(a) Calculate the MLEs of μ₁₁₁, …, μ₂₂₂, σ²_(P(S)), σ²_(M(S)), and σ²_(O(MS)).
(b) Write the model as Y = Xβ + E. Define the vector ρ in Kronecker product form such that ρ'β = μ̄₁.. − μ̄₂.. = (μ₁₁₁ + μ₁₁₂ + μ₁₂₁ + μ₁₂₂ − μ₂₁₁ − μ₂₁₂ − μ₂₂₁ − μ₂₂₂)/4.
(c) Find the MLE of ρ'β from part (b).
(d) Find the standard error of the MLE of ρ'β.

10. Consider the experiment from Example 4.5.4.
(a) Find Σ⁻¹ in terms of σ²_B and σ²_(BT).
(b) Write the model as Y = Xβ + E and find the MLEs of β, σ²_B, and σ²_(BT).

11. Under the conditions of Theorem 6.3.1, prove that the eigenvalues of Σ are a₁, …, a_m with multiplicities p₁ + r₁, …, p_m + r_m, respectively.
7
Unbalanced Designs and Missing Data
In complete, balanced factorial experiments, the same number of replicate values is observed within each combination of the factors. Kronecker products may be used in such experiments to construct covariance and sum of squares matrices. However, in other types of experiments, the number of replicates per combination of the factors varies, or certain factor combinations may have no observations at all. Kronecker products are often not useful for constructing covariance and sums of squares matrices in such unbalanced and missing data experiments. To accommodate these unbalanced and missing data situations, replication and pattern matrices are introduced.
7.1
REPLICATION MATRICES
This section begins with a reexamination of the data in Table 5.1.1. This data set contains four distinct combinations of speed and grade: {(speed, grade)} = {(20, 0), (20, 6), (50, 0), (50, 6)}. Four replicates (Y₁, …, Y₄) were observed in the (20, 0) speed × grade combination, one replicate (Y₅) in the (20, 6) combination, two replicates (Y₆, Y₇) in the (50, 0) combination, and three replicates (Y₈, Y₉, Y₁₀) in
the (50, 6) combination. Thus, the observations Y_i appear in k = 4 distinct groups with r₁ = 4, r₂ = 1, r₃ = 2, and r₄ = 3 observations per group.

Assume the model Y = Xβ + E where the 10×1 random vector Y = (Y₁, …, Y₁₀)', the 3×1 vector β = (β₁, β₂, β₃)', the 10×1 error vector E ~ N₁₀(0, σ²I₁₀), and the 10×3 matrix X is given by

X = [ 1₄ ⊗ (1, 20,   0)
      1₁ ⊗ (1, 20, 120)
      1₂ ⊗ (1, 50,   0)
      1₃ ⊗ (1, 50, 300) ]

where the three columns of X correspond to an intercept, a speed variable, and a speed × grade variable. Note that the rows of X appear in four distinct groups with r_j rows per group for j = 1, …, 4. Therefore, the 10×3 matrix X can be rewritten as X = RX_d where the 10×4 replication matrix R is given by

R = [ 1₄ 0  0  0
      0  1₁ 0  0
      0  0  1₂ 0
      0  0  0  1₃ ]

and the 4×3 matrix X_d defines the distinct speed and speed × grade combinations where

X_d = [ 1 20   0
        1 20 120
        1 50   0
        1 50 300 ].
In general, the model Y = Xβ + E can always be rewritten as Y = RX_dβ + E where r_j ≥ 1 is the number of replicate observations for the jth distinct set of x₁, …, x_(p−1) values for j = 1, …, k ≥ p; the replication matrix R is an n×k block diagonal matrix with 1_(r_j) on the diagonal; X_d is a k×p matrix of distinct x₁, …, x_(p−1) values with 1_k in the first column; and n = Σ_{j=1}^k r_j. The ordinary least-squares estimate of β is

β̂ = (X'X)⁻¹X'Y = (X_d'DX_d)⁻¹X_d'{DD⁻¹}R'Y = (X_d'DX_d)⁻¹X_d'DȲ
where the k×k diagonal matrix

D = R'R = diag(r₁, r₂, …, r_k)

and the k×1 vector Ȳ = D⁻¹R'Y = (Ȳ₁, …, Ȳ_k)'. The random variable Ȳ_j is the average of the observed Y values in the jth group identified by distinct x₁, …, x_(p−1) values. The least-squares estimate of σ² is

σ̂² = Y'[I_n − RX_d(X_d'R'RX_d)⁻¹X_d'R']Y/(n − p) = Y'[I_n − RX_d(X_d'DX_d)⁻¹X_d'R']Y/(n − p).

The least-squares estimator of σ² is the mean square residual and this estimator is unbiased for σ² provided E(Y) = Xβ = RX_dβ.

In Table 5.5.1 the sums of squares for the mean, regression, lack of fit, and pure error were presented for the model Y = Xβ + E. These sums of squares are now reconstructed for the equivalent model Y = RX_dβ + E. First, substitute RX_d for X in the regression sum of squares Y'[X(X'X)⁻¹X' − (1/n)J_n]Y and into the residual sum of squares Y'[I_n − X(X'X)⁻¹X']Y to obtain

SSReg = Y'[RX_d(X_d'DX_d)⁻¹X_d'R' − (1/n)J_n]Y

SSRes = Y'[I_n − RX_d(X_d'DX_d)⁻¹X_d'R']Y.

Observe that the pure error sum of squares can be rewritten as

Y'A_pe Y = Y' blockdiag(I_(r₁) − (1/r₁)J_(r₁), …, I_(r_k) − (1/r_k)J_(r_k)) Y = Y'[I_n − RD⁻¹R']Y

since RD⁻¹R' = blockdiag((1/r₁)J_(r₁), …, (1/r_k)J_(r_k))
where A_pe is the pure error sum of squares matrix defined in Section 5.5. Therefore, by subtraction, the lack of fit sum of squares is

SSLack of Fit = SSRes − SSPure Error
             = Y'{[I_n − RX_d(X_d'DX_d)⁻¹X_d'R'] − [I_n − RD⁻¹R']}Y
             = Y'{R[D⁻¹ − X_d(X_d'DX_d)⁻¹X_d']R'}Y.

Table 7.1.1 gives the ANOVA table for the model Y = RX_dβ + E.

Table 7.1.1 ANOVA Table with Pure Error and Lack of Fit

Source                       df      SS
Overall mean                 1       Y'(1/n)J_nY
Regression (β₁, …, β_(p−1))  p − 1   Y'[RX_d(X_d'DX_d)⁻¹X_d'R' − (1/n)J_n]Y
Lack of fit                  k − p   Y'{R[D⁻¹ − X_d(X_d'DX_d)⁻¹X_d']R'}Y
Pure error                   n − k   Y'[I_n − RD⁻¹R']Y
Total                        n       Y'Y

In the next example problem, a replication matrix is used in a one-way classification problem when the number of observations in each level is unequal.
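The decomposition in Table 7.1.1 can be sketched for the speed × grade data structure above. The code below is my own illustration (numpy, scipy, and the simulated responses are assumptions; the replicate counts and X_d follow the text): it builds R, D, and X_d, recovers the least-squares estimate, and checks that lack of fit plus pure error equals the residual sum of squares.

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(6)
reps = [4, 1, 2, 3]                               # r_j for the k = 4 distinct groups
Xd = np.array([[1, 20, 0], [1, 20, 120], [1, 50, 0], [1, 50, 300]], float)
R = block_diag(*[np.ones((r, 1)) for r in reps])  # 10 x 4 replication matrix
n, k = R.shape
X = R @ Xd                                        # X = R X_d
D = R.T @ R
assert np.allclose(D, np.diag(reps))              # D = R'R is diagonal with the r_j
Y = X @ np.array([1.0, 0.1, 0.01]) + rng.normal(size=n)
bhat = np.linalg.solve(Xd.T @ D @ Xd, Xd.T @ R.T @ Y)
assert np.allclose(bhat, np.linalg.solve(X.T @ X, X.T @ Y))
ss_pure = Y @ (np.eye(n) - R @ np.linalg.inv(D) @ R.T) @ Y
Mlof = np.linalg.inv(D) - Xd @ np.linalg.solve(Xd.T @ D @ Xd, Xd.T)
ss_lack = Y @ (R @ Mlof @ R.T) @ Y
ss_res = Y @ (np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)) @ Y
assert np.allclose(ss_lack + ss_pure, ss_res)     # lack of fit + pure error = SSRes
```

Here k − p = 1, so the lack of fit sum of squares carries a single degree of freedom.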
Example 7.1.1 Consider a one-way classification problem with three fixed treatment levels and r₁ = 3, r₂ = 4, and r₃ = 2 observations in each level, respectively. Let Y_ij represent the jth numbered observation in the ith fixed treatment level for i = 1, 2, 3 and j = 1, …, r_i. The 9×1 vector of observations is Y = (Y₁₁, Y₁₂, Y₁₃, Y₂₁, Y₂₂, Y₂₃, Y₂₄, Y₃₁, Y₃₂)'. Assume the model Y = RX_dβ + E where the 9×1 random error vector E ~ N₉(0, σ²_(R(T))I₉), the 9×3 replication matrix

R = [ 1₃ 0  0
      0  1₄ 0
      0  0  1₂ ],

the 3×3 matrix X_d = I₃, and the 3×1 vector β = (β₁, β₂, β₃)'. For this model, k = p = 3 and n = 9. It is not possible to compute a lack of fit sum of squares since k − p = 0. Furthermore, β_i represents the mean effect of the ith treatment level for i = 1, 2, 3. The value k = 3 indicates that there are three distinct treatment levels and each treatment level is defined by a row of the X_d matrix. A 1 in the first row of the X_d matrix indicates treatment level one, a 1 in the second row indicates treatment level two, and a 1 in the third row indicates treatment level three.

For the example, the sum of squares due to the mean is Y'(1/9)J₉Y with 1 degree of freedom. The sum of squares for regression is equivalent to the sum of squares for treatments with p − 1 = 3 − 1 = 2 degrees of freedom. The matrix for the sum of squares for lack of fit has zero rank. That is, R[D⁻¹ − X_d(X_d'DX_d)⁻¹X_d']R' = 0, or RD⁻¹R' = RX_d(X_d'DX_d)⁻¹X_d'R'. Therefore, the sum of squares for treatments equals

Y'[RX_d(X_d'DX_d)⁻¹X_d'R' − (1/9)J₉]Y = Y'[RD⁻¹R' − (1/9)J₉]Y
  = Y'[blockdiag((1/3)J₃, (1/4)J₄, (1/2)J₂) − (1/9)J₉]Y.

Finally, the sum of squares for pure error equals

Y'[I₉ − RD⁻¹R']Y = Y' blockdiag(I₃ − (1/3)J₃, I₄ − (1/4)J₄, I₂ − (1/2)J₂) Y.
X = RXd = R[XldlX2dl""" IXmd] where R is the n x k block diagonal replication matrix with lrj on the diagonal for j - 1. . . . . k and Xd -- [XldlX2dl''" IXmd] is a k x p matrix. Each k • Ps matrix Xsd contains the k distinct values of Ps x variables. Furtherm more, p - ~-~s=l Ps. Let Xld -- lk, S1 = RXld, 82 = R[XldlX20] . . . . . Sm-1 = R[XldlX2dl"" IXm-l,d], and Sm = RXd. The Type I sums of squares using R, S1 . . . . . Sm and Y are given in Table 7.1.2. In the next example, a replication matrix is used to generate Type I sums of squares when the model has a covariance matrix with several variance components. Consider a two-way cross classification with b random blocks, t fixed treatment levels, and rij >__ 1 replicate observations per block treatment combination for i - 1 . . . . . b and j = 1. . . . . t. The total number of observations n = E L 1 ~tj= 1 rij. Use the model Y = RXd/~ + E where the n x 1 random vector Y = (Ylll . . . . . Yllr~l . . . . . Ybtl . . . . . Ybtrbt) t, the n x bt replication matrix R is a block diagonal matrix with lr,j on the diagonal; the bt x t matrix Xd = E x a m p l e 7.1.2
Table 7.1.2 Type I Sums of Squares with Replication

Source                df       Type I SS
Overall mean x₁       p₁ = 1   Y'(1/n)J_nY
x₂|x₁                 p₂       Y'[S₂(S₂'S₂)⁻¹S₂' − (1/n)J_n]Y
x₃|x₁, x₂             p₃       Y'[S₃(S₃'S₃)⁻¹S₃' − S₂(S₂'S₂)⁻¹S₂']Y
⋮
x_m|x₁, …, x_(m−1)    p_m      Y'[S_m(S_m'S_m)⁻¹S_m' − S_(m−1)(S_(m−1)'S_(m−1))⁻¹S_(m−1)']Y
Lack of fit           k − p    Y'{R[D⁻¹ − X_d(X_d'DX_d)⁻¹X_d']R'}Y
Pure error            n − k    Y'[I_n − RD⁻¹R']Y
Total                 n        Y'Y
(X_(1d) | X_(2d)) = (1_b ⊗ 1_t | 1_b ⊗ P_t) where P_t' is the (t − 1)×t lower portion of a t-dimensional Helmert matrix; the t×1 vector β = (β₁ | β₂, …, β_t)'; and the n×1 random vector E = (E₁₁₁, …, E₁₁r₁₁, …, E_bt1, …, E_btr_bt)' ~ N_n(0, Σ). To construct the n×n covariance matrix Σ, redefine the random variable E_ijv as

E_ijv = B_i + BT_ij + R(BT)_(ij)v

for v = 1, …, r_ij where the random variables B_i represent the random block effect such that B_i ~ iid N₁(0, σ²_B); the random variables BT_ij represent the random block treatment interaction such that the b(t − 1)×1 vector (I_b ⊗ P_t')(BT₁₁, …, BT_bt)' ~ N_(b(t−1))(0, σ²_(BT) I_b ⊗ I_(t−1)); and the random variables R(BT)_(ij)v represent the random nested replicates such that R(BT)_(ij)v ~ iid N₁(0, σ²_(R(BT))). Furthermore, assume that B_i, (BT₁₁, …, BT_bt)', and R(BT)_(ij)v are uncorrelated.

Next, construct the covariance matrix when there is one replicate observation per block treatment combination. If r_ij = 1 for all i, j then the bt×bt covariance matrix is given by

Σ_d = σ²_B[I_b ⊗ J_t] + σ²_(BT)[I_b ⊗ (I_t − (1/t)J_t)]

where the subscript d on Σ_d denotes a covariance matrix for one replicate observation in each of the bt distinct block treatment combinations. Note that the variance of R(BT)_(ij)v is nonestimable when r_ij = 1 for all i, j. Therefore, σ²_(R(BT)) does not appear in Σ_d. Now expand the covariance structure Σ_d to include all n = Σ_{i=1}^b Σ_{j=1}^t r_ij observations by premultiplying Σ_d by R, postmultiplying Σ_d by R', and adding a variance component that will account for the estimable variance of the R(BT)_(ij)v variables when r_ij > 1. Therefore, the n×n covariance
Table 7.1.3 Type I Sums of Squares for Example 7.1.2

Source               df               Type I SS
Overall mean μ       1                Y'S₁(S₁'S₁)⁻¹S₁'Y = Y'(1/n)J_nY
Block (B)|μ          b − 1            Y'[T₁(T₁'T₁)⁻¹T₁' − (1/n)J_n]Y
Treatment (T)|μ, B   t − 1            Y'[S₂(S₂'S₂)⁻¹S₂' − T₁(T₁'T₁)⁻¹T₁']Y
BT|μ, B, T           (b − 1)(t − 1)   Y'[T₂(T₂'T₂)⁻¹T₂' − S₂(S₂'S₂)⁻¹S₂']Y
Rep (BT)             n − bt           Y'[I_n − RD⁻¹R']Y
Total                n                Y'Y
matrix Σ is given by

Σ = RΣ_dR' + σ²_(R(BT))I_n.
The ANOVA table with Type I sums of squares can also be constructed for this example. First, consider what the sums of squares would be if there were one replicate observation per block treatment combination. If r_ij = 1 for all i, j, the matrices for the sums of squares due to the overall mean, blocks, treatments, and the block by treatment interaction are given by

B₁ = (1/b)J_b ⊗ (1/t)J_t = X_(1d)(X_(1d)'X_(1d))⁻¹X_(1d)'

B₂ = (I_b − (1/b)J_b) ⊗ (1/t)J_t = Z_(1d)(Z_(1d)'Z_(1d))⁻¹Z_(1d)'

B₃ = (1/b)J_b ⊗ (I_t − (1/t)J_t) = X_(2d)(X_(2d)'X_(2d))⁻¹X_(2d)'

B₄ = (I_b − (1/b)J_b) ⊗ (I_t − (1/t)J_t) = Z_(2d)(Z_(2d)'Z_(2d))⁻¹Z_(2d)',
respectively, where X_(1d) = 1_b ⊗ 1_t, Z_(1d) = P_b ⊗ 1_t, X_(2d) = 1_b ⊗ P_t, Z_(2d) = P_b ⊗ P_t, and P_e' is the (e − 1)×e lower portion of an e-dimensional Helmert matrix. Let S₁ = RX_(1d) = 1_n, T₁ = R(X_(1d) | Z_(1d)), S₂ = R(X_(1d) | Z_(1d) | X_(2d)), T₂ = R(X_(1d) | Z_(1d) | X_(2d) | Z_(2d)), and D = R'R. Matrices S₁, T₁, S₂, T₂, and R are used to construct the Type I sums of squares in Table 7.1.3. In the next section pattern matrices are used in data structures with crossed factors and missing data.
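The covariance construction Σ = RΣ_dR' + σ²_(R(BT))I_n can be sketched for a small layout. The code below is my own illustration (numpy, scipy, the chosen block/treatment sizes, the unequal replicate pattern, and the variance component values are all assumptions): it assembles Σ and checks that it is a valid covariance matrix.

```python
import numpy as np
from scipy.linalg import block_diag

b, t = 2, 3
# hypothetical unequal replicate counts r_ij per block-treatment cell
r = {(i, j): 1 + (i + j) % 2 for i in range(b) for j in range(t)}
R = block_diag(*[np.ones((r[(i, j)], 1)) for i in range(b) for j in range(t)])
n = R.shape[0]
sig_B, sig_BT, sig_rep = 2.0, 1.5, 0.5           # assumed variance components
Jt = np.full((t, t), 1.0)
Sigma_d = (sig_B * np.kron(np.eye(b), Jt)
           + sig_BT * np.kron(np.eye(b), np.eye(t) - Jt / t))
Sigma = R @ Sigma_d @ R.T + sig_rep * np.eye(n)  # expand to all n observations
assert np.allclose(Sigma, Sigma.T)               # symmetric
assert np.all(np.linalg.eigvalsh(Sigma) > 0)     # positive definite
```

Replicates within the same cell share both the block and the interaction effect, which is exactly what premultiplying and postmultiplying Σ_d by R and R' encodes.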
Figure 7.2.1 Missing Data Example. Random blocks 1-3 crossed with fixed treatment levels 1-3; the observed cells contain Y₁₂, Y₁₃, Y₂₁, Y₂₃, Y₃₁, and Y₃₂.
7.2
PATTERN MATRICES AND MISSING DATA
In some data sets, certain data points accidentally end up missing. In other data sets, data points are intentionally not observed in certain cells. For example, in fractional factorials or incomplete block designs, data are not observed in certain cells. In either case, the overall structure of such experiments follows the complete, balanced factorial form except that the actual observed data set contains "holes" where no data are observed. These holes are located in patterns in fractional factorial experiments and incomplete block designs. However, in other experiments, the holes appear irregularly. Such experiments with missing data can be examined using pattern matrices. The following example introduces the topic.

Consider the two-way cross classification described in Figure 7.2.1. The experiment contains three random blocks and three fixed treatment levels. However, the observed data set contains no observations in the (1, 1), (2, 2), and (3, 3) block treatment combinations and one observation in each of the other six block treatment combinations. This may have arisen from a balanced incomplete block design. We begin our discussion by first examining the experiment when one observation is present in all nine block treatment combinations. In this complete, balanced design, let the 9×1 random vector Y* = (Y₁₁, Y₁₂, Y₁₃, Y₂₁, Y₂₂, Y₂₃, Y₃₁, Y₃₂, Y₃₃)'. Write the model Y* = X*β + E* where the 9×3 matrix X* = 1₃ ⊗ I₃, the 3×1 vector β = (β₁, β₂, β₃)', the 9×1 error vector E* = (E₁₁, E₁₂, E₁₃, E₂₁, E₂₂, E₂₃, E₃₁, E₃₂, E₃₃)' ~ N₉(0, Σ*), and

Σ* = σ²_B[I₃ ⊗ J₃] + σ²_(BT)[I₃ ⊗ (I₃ − (1/3)J₃)].

The 9×9 covariance matrix Σ* is built by setting E_ij = B_i + (BT)_ij and applying the covariance matrix algorithm from Chapter 4. For this complete, balanced data set, the sums of squares matrices for the mean, blocks, treatments, and the block by treatment interaction are given by

B₁ = (1/3)J₃ ⊗ (1/3)J₃ = X₁(X₁'X₁)⁻¹X₁'

B₂ = (I₃ − (1/3)J₃) ⊗ (1/3)J₃ = Z₁(Z₁'Z₁)⁻¹Z₁'

B₃ = (1/3)J₃ ⊗ (I₃ − (1/3)J₃) = X₂(X₂'X₂)⁻¹X₂'

B₄ = (I₃ − (1/3)J₃) ⊗ (I₃ − (1/3)J₃) = Z₂(Z₂'Z₂)⁻¹Z₂',

respectively, where X₁ = 1₃ ⊗ 1₃, Z₁ = Q₃ ⊗ 1₃, X₂ = 1₃ ⊗ Q₃, Z₂ = Q₃ ⊗ Q₃, and

Q₃ = [  1  1
       −1  1
        0 −2 ].
M
m
-0
1
0
0
0
0
0
0
0
0 0 0 0
0 0 0 0
1
0
0
0 0 0
1
0
0 0
0 0
0 0
0
0 0 0 1
0 0 0 0
0 0 0 0
0
0
0
0
0
0
0
1
0
1
Furthermore, note MM' -- I6 and the 9 x 9 matrix 0 1
0
MtM 1 1 0 The vector of actual observations Y contains the second, third, fourth, sixth, seventh, and eighth elements of the complete data vector Y*. Therefore, the second, third, fourth, sixth, seventh, and eighth diagonal elements of M'M are
140
Linear M o d e l s
ones and all other elements of M'M are zero. Furthermore, M is a 6 • 9 matrix of zeros and ones, with a one placed in the second, third, fourth, sixth, seventh, and eighth columns of rows one through six, respectively. Since the 9 x 1 complete vector Y* ~ N9(X*/~, I2"), the 6 x 1 vector of actual observations Y = MY* ~ N6(X~, •) where 0
1 0 0 0 0 1 0 0 0 0 1 0
0 0 0
0 0 0
0 0 0
0 0 0
i
0 0
0 0
0 0
0 0
1 0 0 1
0
0
0
0
0
0
0
0
0
1
0
X/3 = MX*/3 =
0
IiilIi!l
-" (f12, f13, ill, f13, ~1, f12) t
and E = ME*M' = M {tr2[I3 | J31 + crB2r [I3|
=o2113 |
+tr2T [13|
(I3 - ~ 1j 3)] } M'
(12--~J2)l .
The Type I sums of squares for this problem are presented in Table 7.2.1 using matrices Sl = MX1, T1 -" M[X1 [Zl], and S2 = M[X11Zl IX2]. The sum of squares matrices A1 . . . . . A4 is Table 7.2.1 were calculated numerically using PROC IML in SAS. The PROC IML output for this section is presented in Section A2.1 of Appendix 2. The resulting idempotent matrices are as follows: 1
1
A1 = ~J3 | ~J2 A2 =
(1 3 - ~J3 ,) 1
1
| ~J2
I ll( 1
A3--~
-1
1 A4 = ~
-1
2
1
1
2
1 1
1
1 2 - ~J2
|
-1
@
)
1 1 2 - ~J2
9
1
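The book obtains A1, ..., A4 from PROC IML; readers without SAS can reproduce them from the projection definitions in Table 7.2.1. A NumPy sketch (the `proj` helper and the particular contrast matrix Q3 are illustrative choices, not from the text):

```python
import numpy as np

def proj(A):
    """Orthogonal projection onto the column space of A."""
    return A @ np.linalg.pinv(A.T @ A) @ A.T

one3, I3, J3 = np.ones((3, 1)), np.eye(3), np.ones((3, 3))
I2, J2 = np.eye(2), np.ones((2, 2))
Q3 = np.array([[1., 1.], [-1., 1.], [0., -2.]])   # columns orthogonal to 1_3

# Pattern matrix picking elements 2, 3, 4, 6, 7, 8 of Y* (1-based)
M = np.zeros((6, 9))
for s, c in enumerate([1, 2, 3, 5, 6, 7]):
    M[s, c] = 1.0

X1 = np.kron(one3, one3)   # overall mean
Z1 = np.kron(Q3, one3)     # block contrasts
X2 = np.kron(one3, Q3)     # treatment contrasts
S1 = M @ X1
T1 = M @ np.hstack([X1, Z1])
S2 = M @ np.hstack([X1, Z1, X2])

A1 = proj(S1)
A2 = proj(T1) - proj(S1)
A3 = proj(S2) - proj(T1)
A4 = np.eye(6) - proj(S2)

C = np.array([[2., 1., -1.], [1., 2., 1.], [-1., 1., 2.]])
D = np.array([[1., -1., 1.], [-1., 1., -1.], [1., -1., 1.]])
print(np.allclose(A1, np.kron(J3 / 3, J2 / 2)))
print(np.allclose(A2, np.kron(I3 - J3 / 3, J2 / 2)))
print(np.allclose(A3, np.kron(C / 3, I2 - J2 / 2)))
print(np.allclose(A4, np.kron(D / 3, I2 - J2 / 2)))
print([int(np.round(np.trace(A))) for A in (A1, A2, A3, A4)])  # df 1, 2, 2, 1
```

The trace of each idempotent matrix equals its rank, recovering the degrees of freedom in Table 7.2.1.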
Depending on the pattern of the missing data, some Type I sum of squares matrices may have zero rank. In the example data set, the Type I sums of squares
Table 7.2.1 Type I Sums of Squares for the Missing Data Example

Source              df   Type I SS
Overall mean μ      1    Y'S1(S1'S1)^{-1}S1'Y = Y'A1Y
Block (B)|μ         2    Y'[T1(T1'T1)^{-1}T1' − S1(S1'S1)^{-1}S1']Y = Y'A2Y
Treatment (T)|μ,B   2    Y'[S2(S2'S2)^{-1}S2' − T1(T1'T1)^{-1}T1']Y = Y'A3Y
BT|μ,B,T            1    Y'[I_6 − S2(S2'S2)^{-1}S2']Y = Y'A4Y
Total               6    Y'Y
matrices for all factors have ranks greater than zero. However, because there are only n = 6 observations, there is only one degree of freedom for the BT|μ, B, T effect. Furthermore, the sum of squares for BT|μ, B, T is calculated as the residual effect after μ, B, and T have been removed. Therefore, the sum of squares matrix for BT|μ, B, T is A4 = I_6 − S2(S2'S2)^{-1}S2'. Since the matrices A1, ..., A4, M, and Σ have been determined for the example problem, the distributions of Y'A_sY for s = 1, ..., 4 can be derived. From the PROC IML output note that A1Σ = (2σ_B² + (1/3)σ_BT²)A1, A2Σ = (2σ_B² + (1/3)σ_BT²)A2, A3Σ = σ_BT²A3, and A4Σ = σ_BT²A4. Therefore, by Corollary 3.1.2(a), Y'A1Y ~ (2σ_B² + (1/3)σ_BT²)χ²_1(λ1), Y'A2Y ~ (2σ_B² + (1/3)σ_BT²)χ²_2(λ2), Y'A3Y ~ σ_BT²χ²_2(λ3), and Y'A4Y ~ σ_BT²χ²_1(λ4), where

λ1 = (Xβ)'A1(Xβ)/[2(2σ_B² + (1/3)σ_BT²)]
   = (β1 + β2 + β3)²/(6σ_B² + σ_BT²)

λ2 = (Xβ)'A2(Xβ)/[2(2σ_B² + (1/3)σ_BT²)]
   = (β1² + β2² + β3² − β1β2 − β1β3 − β2β3)/[2(6σ_B² + σ_BT²)]

λ3 = (Xβ)'A3(Xβ)/(2σ_BT²)
   = (β1² + β2² + β3² − β1β2 − β1β3 − β2β3)/(2σ_BT²)

λ4 = (Xβ)'A4(Xβ)/(2σ_BT²) = 0.
Furthermore, A_sΣA_t = 0_{6×6} for all s ≠ t, s, t = 1, ..., 4. By Theorem 3.2.1, the four sums of squares Y'A1Y, ..., Y'A4Y are mutually independent. Therefore,

F* = (Y'A3Y/2)/(Y'A4Y/1) ~ F_{2,1}(λ3)

where λ3 = 0 under the hypothesis H0: β1 = β2 = β3. A γ level rejection region for the hypothesis H0: β1 = β2 = β3 versus H1: not all β's equal is to reject H0 if F* > F^γ_{2,1} where F^γ_{2,1} is the 100(1 − γ) percentile point of a central F distribution with 2 and 1 degrees of freedom. Note that H0 is equivalent to the hypothesis that there is no treatment effect.

The Type I sums of squares can also be used to provide unbiased estimators of the variance components σ_B² and σ_BT². The mean square for BT|μ, B, T provides an unbiased estimator of σ_BT² since λ4 = 0 and E(Y'A4Y/1) = E(σ_BT²χ²_1(0)) = σ_BT². Constructing an unbiased estimator of σ_B² involves a little more work. In complete, balanced designs the sum of squares for blocks can be used to find an unbiased estimator of σ_B². However, in this balanced, incomplete design problem the block effect is confounded with the treatment effect. Therefore, the Type I sum of squares for Block (B)|μ has a noncentrality parameter λ2 > 0 and cannot be used directly with Y'A4Y to form an unbiased estimator of σ_B². One solution to this problem is to calculate the sum of squares due to blocks after the overall mean and the treatment effect have been removed. After doing so, the block effect does not contain any treatment effects. As a result, the Type I sum of squares due to Block (B)|μ, T has a zero noncentrality parameter and can be used with Y'A4Y to construct an unbiased estimator of σ_B². The Type I sum of squares due to Block (B)|μ, T is given by

SS Block (B)|μ, T = Y'A3*Y

where A3* = T1*(T1*'T1*)^{-1}T1*' − S2*(S2*'S2*)^{-1}S2*', with S2* = M[X1|X2] and T1* = M[X1|X2|Z1]. Note that the matrices S2* and T1* now order the overall mean matrix X1 first, the treatment matrix X2 second, and the block matrix Z1 third. From the PROC IML output, the 6 × 6 matrix A3* for the example data set equals
A3* = (1/6) [  2   1   1  −1  −1  −2 ]
            [  1   2  −1  −2   1  −1 ]
            [  1  −1   2   1  −2  −1 ]
            [ −1  −2   1   2  −1   1 ]
            [ −1   1  −2  −1   2   1 ]
            [ −2  −1  −1   1   1   2 ].
Furthermore, tr(A3*Σ) = 3σ_B² + σ_BT² and

(Xβ)'A3*(Xβ) = (β2, β3, β1, β3, β1, β2)A3*(β2, β3, β1, β3, β1, β2)' = 0.
Therefore, an unbiased estimator of σ_B² is provided by the quadratic form (1/3)Y'[A3* − A4]Y since

E{(1/3)Y'[A3* − A4]Y} = (1/3){(3σ_B² + σ_BT²) − σ_BT²} = σ_B².
The procedure just described is now generalized. Let the n* × 1 vector Y* represent the observations from a complete, balanced factorial experiment with model Y* = X*β + E*, where X* is an n* × p matrix of constants, β is a p × 1 vector of unknown parameters, and the n* × 1 random vector E* ~ N_{n*}(0, Σ*) where Σ* can be expressed as a function of one or more unknown parameters. Suppose the n × 1 vector Y represents the actual observed data with n < n*, where n* is the number of observations in the complete data set, n is the number of actual observations, and n* − n is the number of missing observations. The n × 1 random vector Y = MY* ~ N_n(Xβ, Σ) where M is an n × n* pattern matrix of zeros and ones, X = MX*, and Σ = MΣ*M'. Each of the n rows of M has a single value of one and (n* − 1) zeros. The ijth element of M is a 1 when the ith element in the actual data vector Y matches the jth element in the complete data vector Y* for i = 1, ..., n and j = 1, ..., n*. Furthermore, the n × n matrix MM' = I_n and the n* × n* matrix M'M is an idempotent, diagonal matrix of rank n with n ones and n* − n zeros on the diagonal. The ones on the diagonal of M'M correspond to the ordered locations of the actual data points in the complete data vector Y* and the zeros on the diagonal of M'M correspond to the ordered locations of the missing data in the complete data vector Y*. Finally, let X_s(X_s'X_s)^{-1}X_s' and Z_s(Z_s'Z_s)^{-1}Z_s' be the sums of squares matrices for the fixed and random effects in the complete data set for s = 1, ..., m, where rank(X_s) = p_s ≥ 0, rank(Z_s) = q_s ≥ 0, and X_1 = 1_{n*}. Let S_s = M[X1|Z1|X2| ... |Z_{s−1}|X_s] and T_s = M[X1|Z1|X2| ... |X_s|Z_s] for s = 1, ..., m. The Type I sum of squares for the mean is Y'S1(S1'S1)^{-1}S1'Y = Y'(1/n)J_nY. The Type I sums of squares for the intermediate fixed effects take the form

Y'S_s(S_s'S_s)^{-1}S_s'Y − Y'T_{s−1}(T_{s−1}'T_{s−1})^{-1}T_{s−1}'Y

and the Type I sums of squares for the intermediate random effects take the form

Y'T_s(T_s'T_s)^{-1}T_s'Y − Y'S_s(S_s'S_s)^{-1}S_s'Y

for s = 2, ..., m. However, the missing data may cause some of these Type I sums of squares matrices to have zero rank. Furthermore, it may be necessary to calculate Type I sums of squares in various orders to obtain unbiased estimators of the variance components. The estimation of variance components with Type I sums of squares is discussed in detail in Chapter 10.
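The general recipe above — premultiply each design matrix by the pattern matrix M and difference the successive projections — can be packaged in a few lines. The following NumPy sketch is an illustration of the procedure, not the book's SAS code; the function names and the contrast matrix Q3 are my own choices:

```python
import numpy as np

def proj(A):
    """Orthogonal projection onto the column space of A."""
    return A @ np.linalg.pinv(A.T @ A) @ A.T

def pattern_matrix(observed, n_star):
    """n x n* pattern matrix: row s has a single 1 in column observed[s] (0-based)."""
    M = np.zeros((len(observed), n_star))
    for s, c in enumerate(observed):
        M[s, c] = 1.0
    return M

def type1_ss_matrices(M, design_blocks):
    """Sequential (Type I) sum-of-squares matrices for an ordered list of
    complete-data design matrices, each premultiplied by M; a residual
    matrix is appended at the end."""
    n = M.shape[0]
    mats, prev = [], np.zeros((n, n))
    cols = np.empty((M.shape[1], 0))
    for X in design_blocks:
        cols = np.hstack([cols, X])
        cur = proj(M @ cols)
        mats.append(cur - prev)
        prev = cur
    mats.append(np.eye(n) - prev)
    return mats

# Check on the Section 7.2 example: mean, blocks, treatments, residual
one3 = np.ones((3, 1))
Q3 = np.array([[1., 1.], [-1., 1.], [0., -2.]])   # columns orthogonal to 1_3
M = pattern_matrix([1, 2, 3, 5, 6, 7], 9)
A = type1_ss_matrices(M, [np.kron(one3, one3),
                          np.kron(Q3, one3),
                          np.kron(one3, Q3)])
print(np.allclose(sum(A), np.eye(6)))               # partition of I_n
print([int(round(np.trace(a))) for a in A])         # df: 1, 2, 2, 1
```

Reordering `design_blocks` (for example, treatments before blocks) yields the alternative Type I decompositions needed for variance component estimation.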
7.3
USING REPLICATION AND PATTERN MATRICES TOGETHER
Replication and pattern matrices can be used together in factorial experiments where certain combinations of the factors are missing and other combinations of the factors contain an unequal number of replicate observations. For example, consider the two-way cross classification described in Figure 7.3.1. The experiment contains three random blocks and three fixed treatment levels. The data set contains no observations in the (1, 1), (2, 2), and (3, 3) block treatment combinations and either one or two observations in the other six combinations. As in Section 7.2, begin by examining an experiment with exactly one observation in each of the nine block treatment combinations. In this complete, balanced design the 9 × 1 random vector Y* = (Y111, Y121, Y131, Y211, Y221, Y231, Y311, Y321, Y331)'. Use the model Y* = X*β + E* where E* ~ N_9(0, Σ*). The matrices X*, β, E*, Σ*, X1, Z1, X2, Z2, and M are defined as in Section 7.2. For the data set in Figure 7.3.1, let the 9 × 6 replication matrix R identify the nine replicate observations in the six block treatment combinations that contain data. The replication matrix R is the block diagonal matrix

R = diag(1, 1_2, 1_2, 1, 1, 1_2),

whose columns correspond to the six nonempty block treatment combinations ordered (1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2). Finally, let the 9 × 1 random vector of actual observations Y = (Y121, Y131, Y132, Y211, Y212, Y231, Y311, Y321, Y322)'. Therefore, Y ~ N_9(Xβ, Σ) where

Xβ = RMX*β = R(β2, β3, β1, β3, β1, β2)' = (β2, β3, β3, β1, β1, β3, β1, β2, β2)'

                 Random blocks (i)
                  1            2            3
Fixed        1                 Y211 Y212    Y311
treatments   2    Y121                      Y321 Y322
(j)          3    Y131 Y132    Y231

Figure 7.3.1 Missing Data Example with Unequal Replication.
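The replication matrix simply repeats each distinct cell mean once per replicate, and D = R'R is the diagonal matrix of replicate counts. A small NumPy sketch (the numeric values of β are arbitrary illustrations, not data from the text):

```python
import numpy as np

# Replicate counts for the six nonempty cells, ordered
# (1,2), (1,3), (2,1), (2,3), (3,1), (3,2)
reps = [1, 2, 2, 1, 1, 2]
R = np.zeros((sum(reps), len(reps)))
row = 0
for col, r in enumerate(reps):
    R[row:row + r, col] = 1.0
    row += r

# Fixed part for the six distinct cells: treatment means beta_j
beta = np.array([10., 20., 30.])
cell_means = np.array([beta[1], beta[2], beta[0], beta[2], beta[0], beta[1]])

expanded = [beta[1], beta[2], beta[2], beta[0], beta[0], beta[2],
            beta[0], beta[1], beta[1]]
print(np.allclose(R @ cell_means, expanded))   # replication expands each cell
print(np.allclose(R.T @ R, np.diag(reps)))     # D = R'R is diagonal
```

Both checks print True, matching the expansion Xβ = RMX*β displayed above.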
Table 7.3.1 Type I Sums of Squares for the Missing Data Example with Unequal Replication

Source              df   Type I SS
Overall mean μ      1    Y'S1(S1'S1)^{-1}S1'Y = Y'A1Y
Block (B)|μ         2    Y'[T1(T1'T1)^{-1}T1' − S1(S1'S1)^{-1}S1']Y = Y'A2Y
Treatment (T)|μ,B   2    Y'[S2(S2'S2)^{-1}S2' − T1(T1'T1)^{-1}T1']Y = Y'A3Y
BT|μ,B,T            1    Y'[RD^{-1}R' − S2(S2'S2)^{-1}S2']Y = Y'A4Y
Pure error          3    Y'[I_9 − RD^{-1}R']Y = Y'A5Y
Total               9    Y'Y
and

Σ = RMΣ*M'R' + σ_{R(BT)}²I_9
  = R{σ_B²[I_3 ⊗ J_2] + σ_BT²[I_3 ⊗ (I_2 − (1/3)J_2)]}R' + σ_{R(BT)}²I_9.

The Type I sums of squares for this problem are presented in Table 7.3.1 using matrices R, D, S1 = RMX1, T1 = RM[X1|Z1], and S2 = RM[X1|Z1|X2]. The sums of squares matrices A1, ..., A5 in Table 7.3.1, the matrices A1ΣA1, ..., A5ΣA5, A3*, A3*ΣA3*, and the noncentrality parameters λ1, ..., λ5, λ3* were calculated numerically using PROC IML in SAS. The PROC IML output for this section is presented in Section A2.2 of Appendix 2. From the PROC IML output note that

A1ΣA1 = (3σ_B² + (2/3)σ_BT² + σ_{R(BT)}²)A1
A2ΣA2 = (3σ_B² + (2/3)σ_BT² + σ_{R(BT)}²)A2
A3ΣA3 = ((4/3)σ_BT² + σ_{R(BT)}²)A3
A4ΣA4 = ((4/3)σ_BT² + σ_{R(BT)}²)A4
A5ΣA5 = σ_{R(BT)}²A5.
Therefore, by Corollary 3.1.2(a),

Y'A1Y ~ (3σ_B² + (2/3)σ_BT² + σ_{R(BT)}²)χ²_1(λ1)
Y'A2Y ~ (3σ_B² + (2/3)σ_BT² + σ_{R(BT)}²)χ²_2(λ2)
Y'A3Y ~ ((4/3)σ_BT² + σ_{R(BT)}²)χ²_2(λ3)
Y'A4Y ~ ((4/3)σ_BT² + σ_{R(BT)}²)χ²_1(λ4)
Y'A5Y ~ σ_{R(BT)}²χ²_3(λ5)

where

λ1 = (Xβ)'A1(Xβ)/[2(3σ_B² + (2/3)σ_BT² + σ_{R(BT)}²)]
   = (β1 + β2 + β3)²/[2(3σ_B² + (2/3)σ_BT² + σ_{R(BT)}²)]

λ2 = (Xβ)'A2(Xβ)/[2(3σ_B² + (2/3)σ_BT² + σ_{R(BT)}²)]
   = (β1² + β2² + β3² − β1β2 − β1β3 − β2β3)/[3(3σ_B² + (2/3)σ_BT² + σ_{R(BT)}²)]

λ3 = (Xβ)'A3(Xβ)/[2((4/3)σ_BT² + σ_{R(BT)}²)]
   = 2(β1² + β2² + β3² − β1β2 − β1β3 − β2β3)/(4σ_BT² + 3σ_{R(BT)}²)

λ4 = (Xβ)'A4(Xβ)/[2((4/3)σ_BT² + σ_{R(BT)}²)] = 0

λ5 = (Xβ)'A5(Xβ)/(2σ_{R(BT)}²) = 0.
The quadratic forms (3/4)[Y'A4Y − (Y'A5Y/3)] and Y'A5Y/3 are unbiased estimators of σ_BT² and σ_{R(BT)}², respectively, since

E{(3/4)[Y'A4Y − (Y'A5Y/3)]} = (3/4)E{((4/3)σ_BT² + σ_{R(BT)}²)χ²_1(0) − σ_{R(BT)}²(χ²_3(0)/3)} = σ_BT²
and

E[Y'A5Y/3] = E[σ_{R(BT)}²χ²_3(0)/3] = σ_{R(BT)}².

Furthermore, A_sΣA_t = 0_{9×9} for all s ≠ t, s, t = 1, ..., 5. By Theorem 3.2.1, the five sums of squares Y'A1Y, ..., Y'A5Y are mutually independent. Therefore,

F* = (Y'A3Y/2)/(Y'A4Y/1) ~ F_{2,1}(λ3)

where λ3 = 0 under the hypothesis H0: β1 = β2 = β3. A γ level rejection region for the hypothesis H0: β1 = β2 = β3 versus H1: not all β's equal is to reject H0 if F* > F^γ_{2,1} where F^γ_{2,1} is the 100(1 − γ) percentile point of a central F distribution with 2 and 1 degrees of freedom.

The Type I sum of squares due to Block (B)|μ, T is given by SS Block (B)|μ, T = Y'A3*Y where A3* = T1*(T1*'T1*)^{-1}T1*' − S2*(S2*'S2*)^{-1}S2*', with S2* = RM[X1|X2] and T1* = RM[X1|X2|Z1]. Furthermore, β'X'A3*Xβ = 0, tr(A3*Σ) = 4σ_B² + (4/3)σ_BT² + 2σ_{R(BT)}², and E[Y'A3*Y] = 4σ_B² + (4/3)σ_BT² + 2σ_{R(BT)}². Therefore, an unbiased estimator of σ_B² is provided by (1/4){[Y'A3*Y] − [Y'A4Y + (Y'A5Y/3)]} since

(1/4)E{[Y'A3*Y] − [Y'A4Y + (Y'A5Y/3)]}
  = (1/4){(4σ_B² + (4/3)σ_BT² + 2σ_{R(BT)}²) − ((4/3)σ_BT² + σ_{R(BT)}²) − σ_{R(BT)}²}
  = σ_B².
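The trace identities behind this estimator can be confirmed numerically. Below is a NumPy sketch for the Figure 7.3.1 design (again, an illustration rather than the book's PROC IML program; the variance component values are arbitrary):

```python
import numpy as np

def proj(A):
    return A @ np.linalg.pinv(A.T @ A) @ A.T

one3, I3, J3 = np.ones((3, 1)), np.eye(3), np.ones((3, 3))
Q3 = np.array([[1., 1.], [-1., 1.], [0., -2.]])   # contrast columns, Q3'1 = 0
X1, Z1, X2 = np.kron(one3, one3), np.kron(Q3, one3), np.kron(one3, Q3)

M = np.zeros((6, 9))                  # pattern matrix of Section 7.2
for s, c in enumerate([1, 2, 3, 5, 6, 7]):
    M[s, c] = 1.0
reps = [1, 2, 2, 1, 1, 2]             # replicates per nonempty cell
R = np.zeros((9, 6))
row = 0
for col, rr in enumerate(reps):
    R[row:row + rr, col] = 1.0
    row += rr

sB2, sBT2, sR2 = 1.3, 0.8, 0.4        # arbitrary variance components
Sigma = R @ M @ (sB2 * np.kron(I3, J3)
                 + sBT2 * np.kron(I3, I3 - J3 / 3)) @ M.T @ R.T \
        + sR2 * np.eye(9)

P_cell = R @ np.linalg.inv(R.T @ R) @ R.T                  # R D^{-1} R'
A4 = P_cell - proj(R @ M @ np.hstack([X1, Z1, X2]))        # BT | mu, B, T
A5 = np.eye(9) - P_cell                                    # pure error
A3s = proj(R @ M @ np.hstack([X1, X2, Z1])) \
      - proj(R @ M @ np.hstack([X1, X2]))                  # Block (B) | mu, T

beta = np.array([4., 9., 2.])
Xbeta = R @ np.array([beta[1], beta[2], beta[0], beta[2], beta[0], beta[1]])
print(np.allclose(A3s @ Xbeta, 0))                               # zero noncentrality
print(np.isclose(np.trace(A3s @ Sigma), 4*sB2 + 4/3*sBT2 + 2*sR2))
print(np.isclose(np.trace(A4 @ Sigma), 4/3*sBT2 + sR2))
print(np.isclose(np.trace(A5 @ Sigma) / 3, sR2))
```

All four checks print True, reproducing the expected values used to build the unbiased estimators of σ_BT², σ_{R(BT)}², and σ_B².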
EXERCISES

1. If B1, B2, B3, and B4 are the n × n sums of squares matrices in Table 7.1.1, prove B_r² = B_r for r = 1, ..., 4 and B_rB_s = 0_{n×n} for r ≠ s.

2. In Example 7.1.1 prove that the sums of squares due to the mean, regression, and pure error are distributed as multiples of chi-square random variables. Find the three noncentrality parameters in terms of β1, β2, and β3.

3. In Example 7.1.2 let b = t = 3, r11 = r22 = r33 = 2, r12 = r23 = r31 = 1, r13 = r21 = r32 = 3, and thus n = 18. Let B1, B2, B3, B4, B5 be the Type I sums of squares matrices for the overall mean μ, B|μ, T|μ, B, BT|μ, B, T, and R(BT)|μ, B, T, BT, respectively. Construct B1, B2, B3, B4, and B5.

4. From Exercise 3, construct the n × n covariance matrix Σ.

5. From Exercises 3 and 4, calculate tr(B_rΣ) and β'X_d'R'B_rRX_dβ for r = 1, ..., 5.

6. From Exercise 3, find the distributions of Y'B_rY for r = 1, ..., 5. Are these five variables mutually independent?
8
Balanced Incomplete Block Designs
The analysis of any balanced incomplete block design (BIBD) is developed in this chapter.
8.1
GENERAL BALANCED INCOMPLETE BLOCK DESIGN
In Section 7.2 a special case of a balanced incomplete block design was discussed. As shown in Figure 7.2.1, the special case has three random blocks, three fixed treatments, two observations in every block, and two replicate observations per treatment, with six of the nine block treatment combinations containing data. We adopt Yates's and Kempthorne's notation to characterize the general class of BIBDs. Let

b = the number of blocks
t = the number of treatments
k = the number of observations per block
r = the number of replicate observations per treatment.

The general BIBD has b random blocks, t fixed treatments, k observations per block, and r replicate observations per treatment, with bk (or tr) of the bt block treatment combinations containing data. Furthermore, the number of times any two treatments occur together in a block is λ = r(k − 1)/(t − 1). In the Figure 7.2.1 example, b = 3, t = 3, k = 2, and r = 2. The total number of block treatment combinations containing data is bk or tr, establishing the relationship bk = tr. To obtain a design where each block contains k treatments, the number of blocks equals all combinations of t treatments taken k at a time, or b = t!/[k!(t − k)!]. A second example of a BIBD is depicted in Figure 8.1.1. This design has b = 6 random blocks, t = 4 fixed treatments, k = 2 observations in every block, and r = 3 replicate observations per treatment, with bk = 12 of the bt = 24 block treatment combinations containing data.

                  Random blocks (i)
                1     2     3     4     5     6
Fixed       1   Y11         Y31         Y51
treatments  2   Y12               Y42         Y62
(j)         3         Y23   Y33               Y63
            4         Y24         Y44   Y54

Figure 8.1.1 Balanced Incomplete Block Example.

Next we seek a model for the general balanced incomplete block design. To this end, begin with a model for the complete balanced design with b blocks, t treatments, and one observation per block treatment combination. The model is

Y* = X*τ + E*

where the bt × 1 vector Y* = (Y11, ..., Y1t, ..., Yb1, ..., Ybt)', the bt × t matrix X* = 1_b ⊗ I_t, the t × 1 vector of unknown treatment means τ = (τ1, ..., τt)', and the bt × 1 random vector E* = (E11, ..., E1t, ..., Eb1, ..., Ebt)' ~ N_bt(0, Σ*) where

Σ* = σ_B²[I_b ⊗ J_t] + σ_BT²[I_b ⊗ (I_t − (1/t)J_t)].

Let the bk × 1 vector Y represent the actual observed data for the balanced incomplete block design. Note that Y can be represented by Y = MY*, where M
is a bk × bt pattern matrix.
The matrix M takes the form

M = [ M1  0   ...  0  ]
    [ 0   M2  ...  0  ]
    [ ...          ...]
    [ 0   0   ...  Mb ]

where M_i is a k × t matrix for i = 1, ..., b. Each of the k rows of M_i has t − 1 zeros and a single value of one, where the one indicates the treatment level of the jth observation in the ith block for j = 1, ..., k. Furthermore, MM' = I_bk and

M(I_b ⊗ J_t)M' = [ M1J_tM1'  ...  0          ]
                 [ ...            ...        ]
                 [ 0         ...  M_bJ_tM_b' ]
               = I_b ⊗ J_k.

Therefore, the model used for the balanced incomplete block design is

Y = Xτ + E

where the bk × t matrix X = MX*, E ~ N_bk(0, Σ), and

Σ = MΣ*M'
  = M{σ_B²[I_b ⊗ J_t] + σ_BT²[I_b ⊗ (I_t − (1/t)J_t)]}M'
  = σ_B²M[I_b ⊗ J_t]M' + σ_BT²[M(I_b ⊗ I_t)M' − (1/t)M(I_b ⊗ J_t)M']
  = σ_B²[I_b ⊗ J_k] + σ_BT²[I_b ⊗ (I_k − (1/t)J_k)].
(or 2 -
~cr2r)[Ib|
+ cr2r[Ib |
or
E=
(kcr2+
t -t k c r Z r ) [ l b |
1jk ]
+ trZr [ib | (ik _ ~jk) ]
1 This last form of E can be used to find E -1. Note [lb | ~Jk] and [Ib | (lk l ~ j k ) ] are idempotent matrices where [Ib | (l~jk)] [Ib | (Ik -- ~J~l)] = 0. Therefore,
t -k t
)-1
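Because Σ is a combination of two orthogonal idempotent matrices that sum to the identity, its inverse is obtained by simply inverting the two scalar coefficients. A quick NumPy check of this fact for the Figure 7.2.1 dimensions (variance values arbitrary, for illustration only):

```python
import numpy as np

b, t, k = 3, 3, 2                 # the Figure 7.2.1 BIBD
sB2, sBT2 = 1.5, 0.7              # arbitrary positive variance components
Ib, Ik, Jk = np.eye(b), np.eye(k), np.ones((k, k))

a = k * sB2 + (t - k) / t * sBT2  # coefficient of the block-mean idempotent
Sigma = a * np.kron(Ib, Jk / k) + sBT2 * np.kron(Ib, Ik - Jk / k)
Sigma_inv = (1 / a) * np.kron(Ib, Jk / k) + (1 / sBT2) * np.kron(Ib, Ik - Jk / k)

print(np.allclose(Sigma @ Sigma_inv, np.eye(b * k)))           # inverse formula
print(np.allclose(Sigma, sB2 * np.kron(Ib, Jk)
                  + sBT2 * np.kron(Ib, Ik - Jk / t)))          # original form
```

Both checks print True: the spectral form and the original form of Σ agree, and inverting the scalar weights inverts Σ.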
Table 8.2.1 First ANOVA Table with Type I Sums of Squares for BIBD

Source               df              Type I SS
Overall mean μ       1               Y'A1Y = Y'[(1/b)J_b ⊗ (1/k)J_k]Y
Block (B)|μ          b − 1           Y'A2Y = Y'[(I_b − (1/b)J_b) ⊗ (1/k)J_k]Y
Treatment (T)|μ, B   t − 1           Y'A3Y
BT|μ, B, T           bk − b − t + 1  Y'A4Y
Total                bk              Y'Y

8.2
ANALYSIS OF THE GENERAL CASE
The treatment differences can be examined and the variance parameters σ_B² and σ_BT² can be estimated by calculating the Type I sums of squares from two ANOVA tables. Some notation is necessary to describe these calculations. Let the bt × 1 matrix X1 = 1_b ⊗ 1_t, the bt × (b − 1) matrix Z1 = Q_b ⊗ 1_t, the bt × (t − 1) matrix X2 = I_b ⊗ Q_t, and the bt × (b − 1)(t − 1) matrix Z2 = Q_b ⊗ Q_t, where Q_b and Q_t are b × (b − 1) and t × (t − 1) matrices defined as in Section 5.7. Furthermore, let S1 = MX1, T1 = M[X1|Z1], S2 = M[X1|Z1|X2], and T2 = M[X1|Z1|X2|Z2]. The first ANOVA table is presented in Table 8.2.1. The Type I sums of squares due to the overall mean μ, due to Blocks (B)|μ, due to Treatments (T)|μ, B, and due to BT|μ, B, T are represented by Y'A1Y, Y'A2Y, Y'A3Y, and Y'A4Y, respectively, where

A1 = S1(S1'S1)^{-1}S1' = (1/b)J_b ⊗ (1/k)J_k
A2 = T1(T1'T1)^{-1}T1' − S1(S1'S1)^{-1}S1' = (I_b − (1/b)J_b) ⊗ (1/k)J_k
A3 = S2(S2'S2)^{-1}S2' − T1(T1'T1)^{-1}T1'
A4 = (I_b ⊗ I_k) − A1 − A2 − A3.

Note A1 + A2 = I_b ⊗ (1/k)J_k, Σ_{u=1}^4 A_u = I_b ⊗ I_k, and Σ_{u=1}^4 rank(A_u) = bk. By Theorem 1.1.7, A_u² = A_u for u = 1, ..., 4 and A_uA_v = 0 for u ≠ v. Also A_u(A1 + A2) = A_u(I_b ⊗ (1/k)J_k) = 0, or A_u(I_b ⊗ J_k) = 0, for u = 3, 4. Therefore,

A_uΣ = σ_B²A_u[I_b ⊗ J_k] + σ_BT²A_u[I_b ⊗ (I_k − (1/t)J_k)] = σ_BT²A_u for u = 3, 4.

Furthermore, A_uΣ = (kσ_B² + ((t − k)/t)σ_BT²)A_u for u = 1, 2. Therefore, by Corollary 3.1.2(a), Y'A1Y ~ a1χ²_1(λ1), Y'A2Y ~ a2χ²_{b−1}(λ2), Y'A3Y ~
Table 8.2.2 Second ANOVA Table with Type I Sums of Squares for BIBD

Source               df              Type I SS
Overall mean μ       1               Y'A1Y = Y'[(1/b)J_b ⊗ (1/k)J_k]Y
Treatments (T)|μ     t − 1           Y'A2*Y
Block (B)|μ, T       b − 1           Y'A3*Y
BT|μ, B, T           bk − b − t + 1  Y'A4Y
Total                bk              Y'Y
a3χ²_{t−1}(λ3), and Y'A4Y ~ a4χ²_{bk−t−b+1}(λ4), where a1 = a2 = kσ_B² + ((t − k)/t)σ_BT², a3 = a4 = σ_BT², λ_u = (Xτ)'A_u(Xτ)/(2a_u) for u = 1, 2, 3, and λ4 = 0. Finally, by Theorem 3.2.1, Y'A1Y, Y'A2Y, Y'A3Y, and Y'A4Y are mutually independent.

A test on the treatment means can be constructed by noting that τ1 = ... = τt implies λ3 = 0. Therefore, a γ level rejection region for the hypothesis H0: τ1 = ... = τt versus H1: not all τj's equal is to reject H0 if F* > F^γ_{t−1, bk−t−b+1} where

F* = [Y'A3Y/(t − 1)]/[Y'A4Y/(bk − t − b + 1)].

Unbiased estimators of σ_B² and σ_BT² can also be constructed. An unbiased estimator of σ_BT² is provided by

σ̂_BT² = Y'A4Y/(bk − t − b + 1)

since Y'A4Y ~ σ_BT²χ²_{bk−t−b+1}(λ4 = 0). An unbiased estimator of σ_B² can be developed using a second ANOVA table. Let the Type I sums of squares due to the overall mean μ and due to BT|μ, B, T be represented by Y'A1Y and Y'A4Y, as before. Let the Type I sums of squares due to Treatments (T)|μ and due to Blocks (B)|μ, T be represented by Y'A2*Y and Y'A3*Y, respectively. The matrices A1 and A4 are defined in Table 8.2.1. Matrices A2* and A3* can be constructed by setting S1* = M[X1|X2] and T1* = M[X1|X2|Z1]. Then

A2* = S1*(S1*'S1*)^{-1}S1*' − A1
A3* = T1*(T1*'T1*)^{-1}T1*' − S1*(S1*'S1*)^{-1}S1*'.

The second ANOVA table is presented in Table 8.2.2. The expected mean square for Blocks (B)|μ, T is now derived and then used to generate an unbiased estimator of σ_B². First, let G be the t × bk matrix such that GY = (Ȳ.1, ..., Ȳ.t)'. The t × t covariance matrix GΣG' has diagonal elements equal to (σ_B² + ((t − 1)/t)σ_BT²)/r and off-diagonal elements equal to (k − 1)(σ_B² − (1/t)σ_BT²)/[r(t − 1)].
The sum of squares due to Treatments (T)|μ can be reexpressed as rY'G'(I_t − (1/t)J_t)GY. Therefore, A2* = rG'(I_t − (1/t)J_t)G and

tr[A2*Σ] = r tr[G'(I_t − (1/t)J_t)GΣ]
         = r tr[(I_t − (1/t)J_t)GΣG']
         = r(t − 1)[(σ_B² + ((t − 1)/t)σ_BT²)/r − (k − 1)(σ_B² − (1/t)σ_BT²)/(r(t − 1))]
         = (t − k)σ_B² + (1/t)[(t − 1)² + (k − 1)]σ_BT².

But A2 + A3 = A2* + A3*; therefore,

tr[A3*Σ] = tr[(A2 + A3)Σ] − tr[A2*Σ]
         = (b − 1)[kσ_B² + ((t − k)/t)σ_BT²] + (t − 1)σ_BT²
           − {(t − k)σ_B² + (1/t)[(t − 1)² + (k − 1)]σ_BT²}
         = (bk − t)σ_B² + [b(t − k)/t]σ_BT².

Furthermore,

E[Y'A3*Y/(b − 1)] = tr[A3*Σ]/(b − 1)
                  = ((bk − t)/(b − 1))σ_B² + (b(t − k)/(t(b − 1)))σ_BT²

since τ'X'A3*Xτ = 0. Therefore, an unbiased estimator of σ_B² is provided by

σ̂_B² = (1/(bk − t))[Y'A3*Y − (b(t − k)/(t(bk − b − t + 1)))Y'A4Y]

because

(1/(bk − t))E{Y'A3*Y − (b(t − k)/(t(bk − b − t + 1)))Y'A4Y}
  = (1/(bk − t)){(bk − t)σ_B² + (b(t − k)/t)σ_BT² − (b(t − k)/t)σ_BT²} = σ_B².
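The two trace formulas that drive σ̂_B² can be verified numerically for the Figure 8.1.1 design. A NumPy sketch (block ordering and variance values are again illustrative assumptions):

```python
import numpy as np

def proj(A):
    return A @ np.linalg.pinv(A.T @ A) @ A.T

blocks = [(1, 2), (3, 4), (1, 3), (2, 4), (1, 4), (2, 3)]   # Figure 8.1.1
b, t, k, r = 6, 4, 2, 3
sB2, sBT2 = 2.0, 0.9                                        # arbitrary
obs = [(i, j) for i, blk in enumerate(blocks, start=1) for j in blk]
Bind = np.array([[1.0 if o[0] == i else 0.0 for i in range(1, b + 1)] for o in obs])
Tind = np.array([[1.0 if o[1] == j else 0.0 for j in range(1, t + 1)] for o in obs])

Sigma = sB2 * np.kron(np.eye(b), np.ones((k, k))) \
        + sBT2 * np.kron(np.eye(b), np.eye(k) - np.ones((k, k)) / t)

A2s = proj(Tind) - proj(np.ones((b * k, 1)))            # Treatments (T)|mu
A3s = proj(np.hstack([Tind, Bind])) - proj(Tind)        # Block (B)|mu, T

lhs2 = np.trace(A2s @ Sigma)
rhs2 = (t - k) * sB2 + ((t - 1) ** 2 + (k - 1)) / t * sBT2
lhs3 = np.trace(A3s @ Sigma)
rhs3 = (b * k - t) * sB2 + b * (t - k) / t * sBT2
print(np.isclose(lhs2, rhs2), np.isclose(lhs3, rhs3))   # True True
```

Both comparisons print True, confirming tr[A2*Σ] = (t − k)σ_B² + (1/t)[(t − 1)² + (k − 1)]σ_BT² and tr[A3*Σ] = (bk − t)σ_B² + [b(t − k)/t]σ_BT² for this design.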
Treatment comparisons are of particular interest in balanced incomplete block designs. Treatment comparisons can be described as h'τ where h is a t × 1 vector of constants. By the Gauss-Markov theorem, the best linear unbiased estimator of h'τ is given by h'[X'Σ^{-1}X]^{-1}X'Σ^{-1}Y where Σ^{-1} is given in Section 8.1. Therefore, the BLUE of h'τ is a function of the unknown parameters σ_B² and σ_BT². However, an estimated covariance matrix Σ̂ can be constructed by using Σ with σ_B² and σ_BT² replaced by σ̂_B² and σ̂_BT², respectively. An estimator of h'τ is then provided by h'[X'Σ̂^{-1}X]^{-1}X'Σ̂^{-1}Y.
8.3
MATRIX DERIVATIONS OF KEMPTHORNE'S INTERBLOCK AND INTRABLOCK TREATMENT DIFFERENCE ESTIMATORS
In Section 26.4 of Kempthorne's (1952) Design and Analysis of Experiments text, he develops two types of treatment comparison estimators for balanced incomplete block designs. The first estimator is derived from intrablock information within the blocks. The second estimator is derived from interblock information between the blocks. The purpose of this section is to develop Kempthorne's estimators in matrix form and to relate these estimators to the discussion presented in Sections 8.1 and 8.2 of this text. Within this section we adopt Kempthorne's notation, namely:

V_j = the total of all observations on treatment j
T_j = the total of all observations in blocks containing treatment j
Q_j = V_j − T_j/k.
1)[Tj - T j , ] / [ r ( t - k)].
1)]
(1) (2)
The statistics θ̂1 and θ̂2 are unbiased estimators of τ_j − τ_j', where θ̂1 is derived from intrablock information and θ̂2 is derived from interblock information. The following summary provides matrix procedures for calculating estimators (1) and (2).
Estimator 1

Let A1 and A3 be the bk × bk idempotent matrices constructed in Section 8.2, where rank(A1) = 1 and rank(A3) = t − 1. Let A1 = R1R1' and A3 = R3R3', where R1 = (1/√(bk))1_b ⊗ 1_k and R3 is the bk × (t − 1) matrix whose columns are the eigenvectors of A3 that correspond to the t − 1 eigenvalues equal to 1. The estimator θ̂1 is given by
θ̂1 = g'(R'X)^{-1}R'Y

where the bk × t matrix X and the bk × 1 vector Y are defined in Section 8.1, the bk × t matrix R = [R1|R3], and g is a t × 1 vector with a one in row j, a minus one in row j', and zeros elsewhere.

A second form of the estimator θ̂1 can be developed by a different procedure. First, construct a t × b matrix N. The element of N in row j, column i equals 1 if block i of the BIBD contains an observation on treatment j, and equals 0 if that block treatment combination is empty, for i = 1, ..., b and j = 1, ..., t. For example, the 3 × 3 matrix N corresponding to the data in Figure 7.2.1 is given by

N = [ 0 1 1 ]
    [ 1 0 1 ]
    [ 1 1 0 ]

and the 4 × 6 matrix N corresponding to the data in Figure 8.1.1 is given by

N = [ 1 0 1 0 1 0 ]
    [ 1 0 0 1 0 1 ]
    [ 0 1 1 0 0 1 ]
    [ 0 1 0 1 1 0 ].
The matrix N is sometimes called the incidence matrix. T h e estimator t)l is given by ~1 -- { k ( t -
1)/[tr(k-
[ ( ')]
1)]}g' N []
Ik -- ~Jk
Y
where N is the t x b matrix constructed above and N U(I~ - ~Jk) is a t • bk BIB product matrix. For example, the 3 x 6 matrix [N 0 (Ik -- ~1Jk)] corresponding to the data in Figure 7.2.1 is given by
E / N[]
I2-
=~
,[0 0,,, 1
-1
-1
1
0
-1
0
-1
1
0
,] 1
0
corresponding to the data in Figure 8.1.1 and the 4 x 12 matrix [N n(l~ - ~J~)] 1
8
Balanced Incomplete Block Designs
157
is given by 1-1 0 0 1-1 0 0 1-1 0 0 [F ~/' ~1j2"~] )] -1 1 0 0 0 0 1 - 1 0 0 1 - 1 N[] I2 -- 1/2 0 0 1 -1 -1 1 0 0 0 0 -1 1 " 0 0-1 1 0 0-1 1-1 1 0 0 The following relationships are used to derive the variance of 01. g'g
= 2
g'Jtg
= 0
IN ff] (Ik -- ~Jk) ] [Ib~ lk] --"Ot• [N [Z](Ik -- ~Jk) l [Ib@Jk] -- Otxbk =tr(k-1)[It 1)
1 ~J/~)l [N ff] ( I k - ~Jk)l
IN D (Ik -
k(t-
1jt] -t
"
Therefore, var(01) =
{k(t-1)/[tr(k-1)]}2g'[ND(Ik-~Jk) •174174
_
1
tJk) } [ N D ( I k - - ~ J k ) l g
=o'~r{k(t-1)/[tr(k-1)]}2g'[ND(Ik-~Jk)J x [ND(Ik-~Jk)]'g
=aZr,k(t-1)/[tr(k-1)]}g'(lt-~Jt)g -- 2k(t- 1)azr/[tr(k- 1)]. The matrix IN D (Ik- 1Jk)] is used to create the estimator 01. This estimator is constructed from the treatment effect after the effects due to the overall mean and blocks have been removed. Similarly, Y'A3Y is the sum of squares due to treatments after the overall mean and blocks have been removed. Therefore, [N ffl (Ik -- ~Jk)] 1 and A3 are related and can be shown to satisfy the relationship A3 =
tr(kk(t-1)_1) [ND (Ik - ~ l J k ) ] '
IN [2 ( I k - ~ J k ) ]
lab
Linear Models
There is a one-to-one relationship between the t • b incidence matrix N and the bk x bt pattern matrix M. In Appendix 3 a SAS computer program generates the bk x bt pattern matrix M for a balanced incomplete design, when the dimensions
b, t, k, and the t x b incidence matrix N are supplied as inputs. A second SAS program generates the incidence matrix N, when the dimensions b, t, k, and the pattern matrix M are supplied as inputs.
Estimator
2
The estimator 02 is given by 0 2 ~-
(t - 1)g'[N | l ' k l Y / [ r ( t - k)]
where g, N, and Y are defined as in estimator 1. The following relationships are used to derive the variance of 02" NN'
k) It + ~ J t (t - 1) (t - k)
and
r (t
g'NN'g = 2r(t - k ) / ( t - 1).
Therefore,
var(02) = {(t •
1)2/[r2(t
k)2]}gt[N | 1~]
-
{
[ ( ')} ( ) 1
cr2[Ib | Jk] + cr2r Ib |
Ik -- t J k
= { ( t - 1 ) 2 k / [ r 2 ( t - k)2]} kcr2 + _
2k(t-
r(t -k)
1)
[
+
t
cr2r
[N' | lk]g g'NN'g
.
t
Finally, Kempthorne suggests constructing the best linear unbiased estimator of the treatment differences by combining 01 and 02, weighting inversely as their variances. That is, the BLUE of rj - r j, is given by
03 -- 01/var(01) -~- 02/var(02) . [ 1/ var(01) + 1/ var(02)] Note that 03 is a function of cr2 and a z r since var(01) and var(02) are functions of cr2 and a z r . However, as discussed in Section 8.2, a 2 and crZr can be estimated by 8 2 and 6 2 r , respectively. Therefore, 03 can be estimated by
0~ --
01/var(01) -+- 02/~(02) [1/~'r(01) @ 1/ffa'r(02)]
8
Balanced Incomplete Block Designs
159
where f"~(01) and ~ ( 0 2 ) equal var(01) and var(02), respectively, with o'2 and o'2 r replaced by ~2 and ~2 r. It should be noted that the estimator 0~ given above equals the estimator h'[X'E-1X]-IX'~,-1Y given in Section 8.2 when h = g. ^
EXERCISES Use the example design given in Figure 8.1.1 to answer Exercises 1-11. 1. Define the bk x bt pattern matrix M, identifying the t x t matrices M1 . . . . . Mb explicitly. 2. Construct the matrices A1, A2, A3, A4, A~, and A~. 3. Verify that ArE = (ko'~ + tt-Zko'~r)Ar for r = 1, 2 and ArE = o'~rAr for r = 3,4. 4. Verify that
~,4 =
0.
5. Verify that
~.3 =
0
under H0
" Z'l - - - - .
-- zt.
6. Construct the t x bk matrix G such that GY = (~'.1. . . . . ~'.t)'. 7. Verify that G E G ' is a t x t matrix with (o'2 + t~tlo'Er)/r on the diagonal and 7 o ' B r ) / [ r ( t -- 1)] on the off-diagonal.
8. Verify that A~ = rG' (It - 71jt)G.
1 - 1 )2 + (k - 1)]o'2r. 9. Verify that tr(A~E) = (t - k)o" 2 + -i[(t 10. Verify that tr(A~E) = (bk - t)o" 2 + br
o'2 r.
11. Verify that 7-'X'A~XT" = 0. Use the following data set of answer Exercises 12-16. Random blocks
1 Fixed treatments
2
3
7
12
8 15
16 20
12. Calculate the Type I sum of squares for Tables 8.2.1 and 8.2.2. 13. Compute unbiased estimates of o-2 and o'27..
160 14. Test the hypothesis H0 : y = 0.05.
Linear Models Z'I
-"
Z'2
=
Z'3 versus H1 : n o t all r j ' s are equal. Llse
15. C o m p u t e Kempthorne's estimates of 01 and 02 for the differences rl - r2, r~t r3, and r2 - r3. 16. C o m p u t e Kempthorne's estimates of 03 for the differences rl - r2, rl - r3, and r2 - r3 and then verify that these three estimates equal h ' ( X ' E - 1X)- 1X, ~:-~Y when h = (1, - 1, 0)', (1, 0, - 1)' and (0, 1, - 1)', respectively.
9
Less Than Full Rank Models
In Chapter 7 models of the form Y = RXdfl + E were discussed when the k • p matrix Xd had full column rank p. In this chapter, models of the same form are developed when the matrix Xd does not have full column rank.
9.1
MODEL
ASSUMPTIONS
AND EXAMPLES
Consider the model Y = RXd/~ + E where Y is an n • 1 random vector of observations, R is an n • k replication matrix, Xd is a k • p known matrix of rank k < p, and E is an n • 1 vector of random errors. For the present assume E(E) = 0 and coy(E) = a2In . The assumption E ,~ Nn (0, a2In) will be added in Section 9.3 when hypothesis testing and confidence intervals are discussed. In Section 9.5 the analysis is described when E ~ Nn (0, cr2V) and V is an n • n positive definite matrix. The least squares estimators o f / 3 are obtained by minimizing the quadratic form (Y - RXd/3)'(Y - RXd/3) with respect to/3. The solution/9, satisfies the
161
162
Linear Models
normal equations X~OXa/3 = X ~ R ' Y where the nonsingular k x k matrix D = R'R. Since the p x p matrix X~DXa has rank k < p, X~DXa is singular and the usual least-squares solution/3 = ( X ~ D X a ) - l X ~ R ' Y does not exist. Therefore, the analysis approach described in Section 7.1 is not appropriate and an alternative solution must be found. In Section 9.2 the mean model solution is developed. Before we proceed with the mean model, a few examples of less than full rank models are presented.
Example 9.1.1
Searle (1971, p.165) discussed an experiment introduced by Federer (1955). In the experiment, a fixed treatment A with three levels has rl = 3, r2 = 2, and r3 = 1 replicate observations per level. Let Yij represent the jth observation in the ith treatment level for i = 1, 2, 3 and j - 1. . . . ri with the 6 x 1 random vector Y - (Yll, Y12, Y13, Y21, Y22, Y31)'. The experimental layout is presented in Figure 9.1.1. Searle (1971) employs the less than full rank model Yij = ot + oli + Eij
where ct is the overall mean, O~i is the effect of the ith treatment level, and Eij is a random error term particular to observation Yij. The model can be rewritten in matrix form as Y = RXa/3 = E where the 6 • 3 replication matrix R, the 3 x 4 matrix Xd, the 4 • 1 vector/3 and the 6 • 1 error vector E are given by
R =
113 01 I11001 12
0
'
Xd - -
1
1
0
1
0
1
0
0
1
'
/3 =
c~1 t3/2
ct3
and E = (Ell, El2, El3, E21, E22, E31 /. The 3 x 4 matrix Xd has rank 3 and therefore X~DXd is singular. In this case n - 6, p - 4, and k = 3.
Fixed factor A(i) 1 2 3 Yll
Y21
Y12
Y22
Y31
Replicates (j) Y13
Figure 9.1.1
Searle's (1971) Less than Full Rank Example.
9
Less Than Full Rank Models
163
The main difficulty with the model in Example 9.1.1 is that p -- 4 fixed parameters (ct, C~l, c~2, and ct3) are being used to depict k -- 3 distinct fixed treatment levels. Therefore k < p and the less than full rank model overparameterizes the problem. Less than full rank models can also originate in experiments with missing data. In Chapter 7 pattern matrices proved very useful when analyzing missing data experiments. However, as shown in the next example, pattern matrices do not in general solve the difficulties associated with the less than full rank model.
Example 9.1.2 Consider the two-way layout described in Figure 9.1.2. Fixed factor A has three levels, fixed factor B has two levels, and there are r_11 = r_32 = 2 and r_12 = r_21 = r_31 = 1 replicate observations per A, B combination. Note there are no observations in the (i, j) = (2, 2) A, B combination. As in Section 7.3, we develop the model for this experiment using a pattern matrix. Let Y* = X*β + E* describe an experiment with one observation in each of the six distinct combinations of factors A and B where the 6 × 1 random vector Y* = (Y_111, Y_121, Y_211, Y_221, Y_311, Y_321)', the 6 × 6 matrix X* = [X_1 | X_2 | X_3 | X_4], the 6 × 1 vector β = (β_1, ..., β_6)', and the 6 × 1 error vector E* = (E_111, E_121, E_211, E_221, E_311, E_321)' with X_1 = 1_3 ⊗ 1_2, X_2 = Q_3 ⊗ 1_2, X_3 = 1_3 ⊗ Q_2, X_4 = Q_3 ⊗ Q_2, Q_2 = (1, −1)', and

Q_3 = [ 1  1
       −1  1
        0 −2].

Let the 7 × 1 random vector of actual observations Y = (Y_111, Y_112, Y_121, Y_211, Y_311, Y_321, Y_322)'. Therefore, the model for the actual data set is

Y = R M X*β + E

where the 7 × 5 replication matrix R, the 5 × 6 pattern matrix M, and the 7 × 1
Figure 9.1.2 Less than Full Rank Example Using a Pattern Matrix: fixed factor A (i = 1, 2, 3) crossed with fixed factor B (j = 1, 2); cell (1, 1) holds Y_111, Y_112; cell (1, 2) holds Y_121; cell (2, 1) holds Y_211; cell (2, 2) is empty; cell (3, 1) holds Y_311; and cell (3, 2) holds Y_321, Y_322.
vector E are given by

R = [1_2  0  0  0  0          M = [1 0 0 0 0 0
     0    1  0  0  0               0 1 0 0 0 0
     0    0  1  0  0               0 0 1 0 0 0
     0    0  0  1  0               0 0 0 0 1 0
     0    0  0  0  1_2],           0 0 0 0 0 1],

and E = (E_111, E_112, E_121, E_211, E_311, E_321, E_322)'. The preceding model can be rewritten as Y = R X_d β + E where the 5 × 6 matrix X_d = M X*. In this problem, n = 7, p = 6, and k = 5, with X_d'DX_d a 6 × 6 singular matrix of rank 5. The main difficulty with Example 9.1.2 is that p = 6 fixed parameters are used to depict the k = 5 distinct fixed treatment combinations that contain data. Note that the use of a pattern matrix did not solve the overparameterization problem. In the next section the mean model is introduced to solve this overparameterization problem.
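The rank deficiency can be checked directly. The sketch below (assumptions: numpy, and the Kronecker construction of X* given above) builds X*, M, and R for Example 9.1.2 and confirms that X_d = MX* still leaves X_d'DX_d singular:

```python
import numpy as np

# Example 9.1.2: the pattern matrix removes the empty cell but not the rank deficiency.
Q2 = np.array([[1.0], [-1.0]])                       # Q2 = (1, -1)'
Q3 = np.array([[1.0, 1.0], [-1.0, 1.0], [0.0, -2.0]])
one3, one2 = np.ones((3, 1)), np.ones((2, 1))

# X* = [X1 | X2 | X3 | X4], a nonsingular 6 x 6 matrix
Xstar = np.hstack([np.kron(one3, one2), np.kron(Q3, one2),
                   np.kron(one3, Q2), np.kron(Q3, Q2)])

# 5 x 6 pattern matrix M deletes the empty (i, j) = (2, 2) cell (row 4 of X*)
M = np.delete(np.eye(6), 3, axis=0)

# 7 x 5 replication matrix R: observations fall in cells (1,1),(1,1),(1,2),(2,1),(3,1),(3,2),(3,2)
R = np.zeros((7, 5))
for row, cell in enumerate([0, 0, 1, 2, 3, 4, 4]):
    R[row, cell] = 1

Xd = M @ Xstar
D = R.T @ R
print(np.linalg.matrix_rank(Xd), np.linalg.matrix_rank(Xd.T @ D @ Xd))  # both 5 < 6
```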
9.2
THE MEAN MODEL SOLUTION
In less than full rank models, the number of fixed parameters is greater than the number of distinct fixed treatment combinations that contain data. As a consequence, the least-squares estimator of β does not exist and the analysis cannot be carried out as before. One solution to the problem is to use a mean model where the number of fixed parameters equals the number of distinct fixed treatment combinations that contain data. Examples 9.1.1 and 9.1.2 are now reintroduced to illustrate how the mean model is formulated.
Example 9.2.1 Reconsider the experiment described in Example 9.1.1. Let E(Y_ij) = μ_i represent the expected value of the jth observation in the ith fixed treatment level. Use the mean model

Y_ij = μ_i + E_ij.

In matrix form the mean model is given by

Y = Rμ + E

where the 3 × 1 vector μ = (μ_1, μ_2, μ_3)' and where Y, R, and E are defined as in Example 9.1.1. Note the 6 × 3 replication matrix R has full column rank k = 3.
Example 9.2.2 Reconsider the experiment described in Example 9.1.2. Let E(Y_ijk) = μ_ij represent the expected value of the kth observation in the ijth combination of fixed factors A and B. Use the mean model

Y = Rμ + E

where the 5 × 1 vector μ = (μ_11, μ_12, μ_21, μ_31, μ_32)' and where Y, R, and E are defined as in Example 9.1.2. Note the 7 × 5 replication matrix R has full column rank k = 5.

In general, the less than full rank model is given by

Y = R X_d β + E

where the k × p matrix X_d has rank k < p. The equivalent mean model is

Y = Rμ + E

where the n × k replication matrix R has full column rank k and the elements of the k × 1 mean vector μ are the expected values of the observations in the k fixed factor combinations that contain data. Since the two models are equivalent, Rμ = R X_d β. Premultiplying each side of this relationship by (R'R)⁻¹R' produces

μ = X_d β.

This equation defines the relationship between the vector μ from the mean model and the vector β from the overparameterized model.
9.3 MEAN MODEL ANALYSIS WHEN COV(E) = σ²I_n
The analysis of the mean model follows the same analysis sequence provided in Chapter 5. Since the n × k replication matrix R has full column rank, the ordinary least-squares estimator of the k × 1 vector μ is given by

μ̂ = (R'R)⁻¹R'Y = D⁻¹R'Y = Ȳ

where Ȳ = D⁻¹R'Y is the k × 1 vector whose elements are the averages of the observations in the k distinct fixed factor combinations that contain data. The least-squares estimator μ̂ is an unbiased estimator of μ since

E(μ̂) = E[(R'R)⁻¹R'Y] = (R'R)⁻¹R'Rμ = μ.
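The estimator reduces to a vector of cell averages, which can be seen numerically. A small sketch (assuming numpy, and borrowing Federer's (1955) data that the text analyzes later in Example 9.4.2):

```python
import numpy as np

# Mean-model least squares on the Example 9.1.1 layout: muhat = (R'R)^{-1} R'Y
R = np.zeros((6, 3))
R[0:3, 0] = R[3:5, 1] = R[5, 2] = 1           # 3, 2, 1 replicates per treatment
Y = np.array([101.0, 105, 94, 84, 88, 32])    # Federer's data (Example 9.4.2)

D = R.T @ R                                   # diag(3, 2, 1)
muhat = np.linalg.solve(D, R.T @ Y)           # D^{-1} R'Y = treatment averages
print(muhat)                                  # [100.  86.  32.]
```

Because R has full column rank, D⁻¹R'Y simply averages the observations within each treatment, exactly as the formula above states.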
Table 9.3.1 Mean Model ANOVA Table

Source                  df      SS
Overall mean            1       Y'(1/n)J_n Y = Y'A_1 Y
Treatment combinations  k − 1   Y'[R D⁻¹ R' − (1/n)J_n]Y = Y'A_2 Y
Residual                n − k   Y'[I_n − R D⁻¹ R']Y = Y'A_pe Y
Total                   n       Y'Y
The k × k covariance matrix of μ̂ is given by

cov(μ̂) = (R'R)⁻¹R'(σ²I_n)R(R'R)⁻¹ = σ²D⁻¹

and the least-squares estimator of σ² is

S² = Y'[I_n − R(R'R)⁻¹R']Y/(n − k)
   = Y'[I_n − R D⁻¹ R']Y/(n − k)
   = Y'A_pe Y/(n − k)

where A_pe is the n × n pure error sum of squares matrix originally defined in Section 5.5. The quadratic form S² provides an unbiased estimator of σ² since

E(S²) = E[Y'A_pe Y/(n − k)]
      = {tr[A_pe(σ²I_n)] + μ'R'A_pe Rμ}/(n − k)
      = σ²

where tr(A_pe) = n − k and A_pe R = 0_{n×k}. Furthermore, by Theorem 5.2.1, the least-squares estimator t'μ̂ = t'Ȳ is the BLUE of t'μ for any k × 1 nonzero vector t. An ANOVA table that partitions the total sum of squares for the mean model is presented in Table 9.3.1. The expected mean squares are calculated below using Theorem 1.3.2 with the k × 1 mean vector μ = (μ_1, μ_2, ..., μ_k)'.
EMS (overall mean) = E[Y'(1/n)J_n Y]
                   = tr[(1/n)σ²J_n] + μ'R'(1/n)J_n Rμ
                   = σ² + (1/n) Σ_{u=1}^k Σ_{v=1}^k r_u r_v μ_u μ_v
EMS (treatment combinations) = E[Y'(R D⁻¹ R' − (1/n)J_n)Y]/(k − 1)
= {tr[σ²(R D⁻¹ R' − (1/n)J_n)] + μ'R'R D⁻¹ R'Rμ − (1/n)μ'R'J_n Rμ}/(k − 1)
= σ² + [Σ_{u=1}^k r_u μ_u² − (1/n)(Σ_{u=1}^k r_u μ_u)²]/(k − 1)
= σ² + [(1/n) Σ_{u=1}^k r_u(n − r_u)μ_u² − (2/n) Σ_{u<v} r_u r_v μ_u μ_v]/(k − 1)
and the EMS (residual) = E[Y'A_pe Y]/(n − k) = σ², as derived earlier.

Hypothesis tests and confidence bands can be constructed if it is assumed that the n × 1 random vector E ~ N_n(0, σ²I_n). In Section 5.5 it was shown that Y'A_pe Y ~ σ²χ²_{n−k}(0). Furthermore,

A_2² = [R D⁻¹ R' − (1/n)J_n]²
     = R D⁻¹ R'R D⁻¹ R' − (1/n)R D⁻¹ R'J_n − (1/n)J_n R D⁻¹ R' + (1/n²)J_n J_n
     = R D⁻¹ R' − (1/n)J_n − (1/n)J_n + (1/n)J_n
     = R D⁻¹ R' − (1/n)J_n
     = A_2

and

A_2(σ²I_n)A_pe = σ²[R D⁻¹ R' − (1/n)J_n][I_n − R D⁻¹ R']
= σ²[R D⁻¹ R' − R D⁻¹ R'R D⁻¹ R' − (1/n)J_n + (1/n)J_n R D⁻¹ R']
= σ²[R D⁻¹ R' − R D⁻¹ R' − (1/n)J_n + (1/n)J_n]
= 0_{n×n}

since J_n R D⁻¹ R' = J_n. Therefore, by Corollary 3.1.2(a),

Y'A_2 Y ~ σ²χ²_{k−1}(λ_2)

where

λ_2 = μ'R'A_2 Rμ/(2σ²)
    = μ'[R'R D⁻¹ R'R − (1/n)R'J_n R]μ/(2σ²)
    = μ'R'[I_n − (1/n)J_n]Rμ/(2σ²).

By Theorem 3.2.1, Y'A_2 Y and Y'A_pe Y are independent and therefore

F* = [Y'A_2 Y/(k − 1)] / [Y'A_pe Y/(n − k)] ~ F_{k−1, n−k}(λ_2).

Furthermore, if the k elements of μ are equal then λ_2 = 0. Therefore, a γ level test of H_0: μ = α1_k versus H_1: μ ≠ α1_k is to reject H_0 if F* > F^γ_{k−1, n−k} where α is the overall mean.

Confidence bands can be constructed on the linear combinations t'μ where t is a k × 1 nonzero vector of constants. Under the normality assumption t'μ̂ ~ N_1(t'μ, σ²t'D⁻¹t) since

E[t'μ̂] = E[t'D⁻¹R'Y] = t'D⁻¹R'Rμ = t'μ

and

cov[t'μ̂] = t'D⁻¹R'(σ²I_n)R D⁻¹t = σ²t'D⁻¹t.

By Theorem 3.2.2, t'μ̂ and Y'A_pe Y are independent since

t'D⁻¹R'(σ²I_n)[I_n − R D⁻¹ R'] = σ²t'[D⁻¹R' − D⁻¹R'R D⁻¹ R'] = 0_{1×n}.

Therefore,

t* = (t'μ̂ − t'μ)/{[Y'A_pe Y/(n − k)][t'D⁻¹t]}^{1/2} ~ t_{n−k}(0).

A 100(1 − γ)% confidence band on t'μ is given by

t'μ̂ ± t^{γ/2}_{n−k} {[Y'A_pe Y/(n − k)][t'D⁻¹t]}^{1/2}.

9.4 ESTIMABLE FUNCTIONS
In Section 9.1 the less than full rank model Y = R X_d β + E was introduced. The mean model Y = Rμ + E was developed in Sections 9.2 and 9.3 to solve the difficulties caused by the less than full rank model. Arguably, there is no need to develop the less than full rank model since the mean model solved the overparameterization problem. However, less than full rank models are used (in
SAS, for example), so it seems worthwhile to explore them and their relationship to the mean model. For the less than full rank model, the least-squares estimator β̂ satisfies the system of normal equations

X_d'D X_d β̂ = X_d'R'Y.

However, since X_d'D X_d is singular, no unique solution for β̂ exists. In fact, there are an infinite number of vectors β̂ that satisfy the normal equations. All of these solutions are linear combinations of the vector Y, but none of them is an unbiased estimator of β. As Graybill (1961, p. 227) points out, no linear combination of the vector Y exists that produces an unbiased estimator of β. So how should we think of the term β̂? As Searle (1971) states, for a less than full rank model, β̂ provides "a solution" to the normal equations "and nothing more." Therefore, β̂ should be thought of as a nonunique solution to the system of p normal equations, rather than as an estimator of β. Although no linear combination of the vector Y produces an unbiased estimator of β, unbiased estimators of g'β do exist for certain p × 1 nonzero vectors g. Unfortunately, unbiased estimators of g'β do not exist for all g. For example, let the p × 1 vector g = (1, 0, 0, ..., 0)'. The parameter g'β is not estimable in this case since β is not estimable and therefore no element of β is estimable. The term "estimable" has been introduced. Before continuing, we formally define estimability.
Definition 9.4.1 Estimable: A parameter (or function of parameters) is estimable if there exists an unbiased estimate of the parameter (or function of parameters).

Definition 9.4.2 Linearly Estimable: A parameter (or function of parameters) is linearly estimable if there exists a linear combination of the observations whose expected value is equal to the parameter (or function of parameters).

For the remainder of this chapter we confine our attention to linearly estimable functions. Therefore, the term estimable will subsequently imply linearly estimable. The next example demonstrates that all linear combinations of the vector μ from the mean model are estimable.
Example 9.4.1 For the mean model Y = Rμ + E,

E[t'μ̂] = E[t'D⁻¹R'Y] = t'D⁻¹R'Rμ = t'μ

where t is any k × 1 nonzero vector.

For the less than full rank model the question still remains: When is g'β estimable? The following theorem addresses this question. The answer lies in the relationship that links μ and β, namely, μ = X_d β.
Theorem 9.4.1 The linear combination g'β is estimable if and only if there exists a k × 1 vector t such that g = X_d't.

Proof: By definition g'β is estimable if and only if there exists an n × 1 vector b such that E[b'Y] = g'β. First assume that g'β is estimable. Therefore, there exists an n × 1 vector b such that g'β = E[b'Y] = b'R X_d β for all β, which implies g' = b'R X_d or g = X_d'R'b. Let t = R'b and there exists a k × 1 vector t such that g = X_d't. Now assume there exists a k × 1 vector t such that g = X_d't. Then E[t'μ̂] = E[t'D⁻¹R'Y] = t'μ = t'X_d β = g'β. Therefore, g'β is estimable. ∎

As mentioned earlier, the normal equations have an infinite number of solutions. If β̂_0 represents any one of the solutions then the next theorem shows that g'β̂_0 is invariant to the choice of β̂_0 when g'β is estimable.

Theorem 9.4.2 If g'β is estimable then g'β̂_0 = t'μ̂ provides a unique, unbiased estimate of g'β where β̂_0 is any solution to the normal equations and t is defined in Theorem 9.4.1.

Proof: Solve the normal equations for X_d β̂_0. Therefore,

X_d'D X_d β̂_0 = X_d'R'Y,

or

X_d β̂_0 = D⁻¹R'Y = μ̂.

Since g'β is estimable, g'β̂_0 = t'X_d β̂_0 = t'μ̂ where t'μ̂ = t'Ȳ is a unique estimate. Furthermore, g'β̂_0 is an unbiased estimate of g'β since E[g'β̂_0] = E[t'μ̂] = t'μ = t'X_d β = g'β. ∎

The Gauss-Markov theorem is applied to find the BLUE of g'β.

Theorem 9.4.3 If g'β is estimable then the BLUE of g'β is g'β̂_0 = t'μ̂.

Proof: By Theorem 9.4.2, t'μ̂ = g'β̂_0. By Theorem 9.4.1, t'μ = t'X_d β = g'β. By the Gauss-Markov theorem, t'μ̂ is the BLUE of t'μ = g'β. ∎

The heart of the three previous proofs lies in the relationship μ = X_d β. Since t'μ is always estimable by t'μ̂, t'X_d β = t'μ is also estimable by t'μ̂. Therefore, g'β is estimable provided g' can be written as t'X_d for some k × 1 nonzero vector t. One could argue that the whole topic of estimable functions is viable only in so far as β is related to μ through the relationship μ = X_d β. Stated more strongly, estimable functions have little meaning without the mean model and, because of the mean model, estimable functions are at best redundant.
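Theorem 9.4.1 also gives a mechanical estimability check: g'β is estimable exactly when g lies in the row space of X_d, i.e., when X_d't = g has a solution t. A sketch (assuming numpy; the helper name `is_estimable` is ours, not the book's):

```python
import numpy as np

# Estimability check from Theorem 9.4.1: does X_d' t = g have a solution t?
Xd = np.array([[1.0, 1, 0, 0],
               [1, 0, 1, 0],
               [1, 0, 0, 1]])

def is_estimable(g, Xd):
    t, *_ = np.linalg.lstsq(Xd.T, g, rcond=None)   # least-squares solution
    return bool(np.allclose(Xd.T @ t, g))          # exact solution => estimable

print(is_estimable(np.array([0.0, 1, -1, 0]), Xd))  # True:  alpha_1 - alpha_2
print(is_estimable(np.array([1.0, 0, 0, 0]), Xd))   # False: alpha by itself
```

The treatment contrast α_1 − α_2 passes the check while the single parameter α fails it, matching the discussion above.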
9
Less I han l~'ull Rank Models
171
This section concludes with a SAS PROC GLM program that analyzes Federer's (1955) data from Example 9.1.1. Both the mean model and Searle's less than full rank model are run. The two models are then used to generate the same estimable functions.
Example 9.4.2 The data set is given in Figure 9.4.1. The SAS program and output are presented in Appendix 4. The SAS output provides the following parameter estimates for the model Y_ij = α + α_i + E_ij (or equivalently for the model Y = R X_d β + E):

β̂_0 = (α̂, α̂_1, α̂_2, α̂_3)' = (32, 68, 54, 0)'.

Note that although the notation α̂, α̂_1, α̂_2, α̂_3 is used, β̂_0 should not be viewed as an estimate of β = (α, α_1, α_2, α_3)'. Rather, β̂_0 is one of the normal equation solutions for β. The SAS output also provides the following estimates for the mean model Y_ij = μ_i + E_ij (or equivalently for the model Y = Rμ + E):

μ̂ = (μ̂_1, μ̂_2, μ̂_3)' = (Ȳ_1., Ȳ_2., Ȳ_3.)' = (100, 86, 32)'.
Suppose it is of interest to estimate α_1 − α_2 = g'β for g = (0, 1, −1, 0)'. By Theorem 9.4.1, g'β is estimable since there exists a 3 × 1 vector t = (1, −1, 0)' such that

g' = (0, 1, −1, 0) = (1, −1, 0) [1 1 0 0
                                1 0 1 0
                                1 0 0 1] = t'X_d.

Therefore, by Theorems 9.4.2 and 9.4.3, the unique BLUE of g'β = α_1 − α_2 is provided by g'β̂_0 = t'μ̂ where
g'β̂_0 = (0, 1, −1, 0)(32, 68, 54, 0)' = 68 − 54 = 14 = (1, −1, 0)(100, 86, 32)' = t'μ̂.
Figure 9.4.1 Federer's (1955) Data Set: replicates by level of fixed factor A — level 1: 101, 105, 94; level 2: 84, 88; level 3: 32.
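The invariance claimed by Theorem 9.4.2 can be checked numerically on Federer's data: two different solutions of the singular normal equations agree on the estimable function. A sketch (assuming numpy; the Moore-Penrose solution is one convenient choice of β̂_0, and the second solution is built from a null vector of X_d):

```python
import numpy as np

# Federer's data and the Example 9.1.1 design matrices
Y = np.array([101.0, 105, 94, 84, 88, 32])
R = np.zeros((6, 3)); R[0:3, 0] = R[3:5, 1] = R[5, 2] = 1
Xd = np.array([[1.0, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 1]])
X = R @ Xd                                 # 6 x 4 design of rank 3
g = np.array([0.0, 1, -1, 0])              # g'beta = alpha_1 - alpha_2

b1 = np.linalg.pinv(X.T @ X) @ X.T @ Y     # Moore-Penrose normal-equation solution
null = np.array([1.0, -1, -1, -1])         # Xd @ null = 0, so X @ null = 0
b2 = b1 + 2.5 * null                       # a second normal-equation solution
print(g @ b1, g @ b2)                      # both equal 14, the BLUE of alpha_1 - alpha_2
```

Both solutions give g'β̂_0 = 14, matching 68 − 54 = 14 above, even though β̂_0 itself is not unique.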
9.5 MEAN MODEL ANALYSIS WHEN COV(E) = σ²V
Previous sections of Chapter 9 have been limited to models where cov(E) = σ²I_n. Such models occur when all the main factors in the design are fixed with random replicates nested in the combinations of the fixed factors. In these cases, the replication matrix R identifies the replicate observations in the k combinations of the fixed factors. Therefore, R is an n × k matrix, μ is a k × 1 vector, and Y = Rμ + E is a viable model. We are now interested in extending our attention to models where cov(E) = σ²V and V is an n × n positive definite matrix. Such models can occur when some of the factors in the design are fixed and some are random, with random replicates nested in the combinations of the fixed and random factors. In these cases, the replication matrix identifies the replicate observations nested in the k combinations of the fixed and random factors. Therefore, R is an n × k matrix. However, μ is a v × 1 vector where v < k is the number of fixed treatment combinations that contain data. In this case, Rμ is not a viable structure since the number of columns of R (= k) is greater than the number of rows of μ (= v). A solution to this problem is to use the more general mean model

Y = R C μ + E

where C is a k × v matrix of zeros and ones. The matrix C identifies which elements of the v × 1 vector μ correspond to the k combinations of the fixed and random factors. The next example illustrates how to construct the matrix C.
Example 9.5.1 Consider the unbalanced experimental layout given in Figure 9.5.1 where the three blocks are random, the two treatments are fixed, and the nested replicates are random. In this experiment the number of observations is n = 8, the number of combinations of the fixed and random factors that contain data is k = 5, the number of fixed treatment combinations is v = 2, and the number of replicates in the ijth combination of blocks and treatments is r_ij where
Figure 9.5.1 Unbalanced Experimental Layout for Example 9.5.1: random blocks (i = 1, 2, 3) crossed with fixed treatments (j = 1, 2); cell (1, 1) holds Y_111; cell (2, 1) holds Y_211, Y_212; cell (3, 1) holds Y_311; cell (1, 2) holds Y_121; cell (2, 2) is empty; and cell (3, 2) holds Y_321, Y_322, Y_323.
r_11 = r_12 = r_31 = 1, r_21 = 2, and r_32 = 3. The 8 × 5 replication matrix R, the 5 × 2 matrix C, and the 2 × 1 mean vector μ are given by

R = [1  0    0  0  0          C = [1 0          μ = [μ_1
     0  1    0  0  0               0 1               μ_2].
     0  0  1_2  0  0               1 0
     0  0    0  1  0               1 0
     0  0    0  0  1_3],           0 1],
Note that the k × v matrix C can also be constructed using the relationship C = M X* where the matrices M and X* are defined according to the methods in Section 7.2. For Example 9.5.1, the 5 × 6 pattern matrix M and the 6 × 2 matrix X* are given by

M = [1 0 0 0 0 0
     0 1 0 0 0 0
     0 0 1 0 0 0
     0 0 0 0 1 0
     0 0 0 0 0 1]    and    X* = 1_3 ⊗ I_2,

and the 5 × 2 matrix C = M X*. The general mean model Y = R C μ + E also applies to the examples in Sections 9.1 through 9.4 where all the main factors were fixed and the nested replicates were random. In such cases, v = k and the k × v (i.e., the k × k) matrix C = I_k.

The analysis approach presented in Section 9.3 is now summarized for the general mean model Y = R C μ + E when E ~ N_n(0, σ²V) and V is a known n × n positive definite matrix. The weighted least squares estimator of μ is given by

μ̂_w = (C'R'V⁻¹R C)⁻¹C'R'V⁻¹Y.

The weighted estimator μ̂_w is unbiased for μ since

E(μ̂_w) = E[(C'R'V⁻¹R C)⁻¹C'R'V⁻¹Y] = (C'R'V⁻¹R C)⁻¹C'R'V⁻¹R C μ = μ.

The covariance matrix of μ̂_w is given by

cov(μ̂_w) = (C'R'V⁻¹R C)⁻¹C'R'V⁻¹(σ²V)V⁻¹R C(C'R'V⁻¹R C)⁻¹ = σ²(C'R'V⁻¹R C)⁻¹

and the weighted least squares estimator of σ² is

σ̂²_w = Y'[V⁻¹ − V⁻¹R C(C'R'V⁻¹R C)⁻¹C'R'V⁻¹]Y/(n − k).

The ANOVA table for the weighted analysis is provided in Table 9.5.1. In the weighted case, the 100(1 − γ)% confidence bands on t'μ are given by

t'μ̂_w ± t^{γ/2}_{n−k} {[Y'A_3w Y/(n − k)][t'(C'R'V⁻¹R C)⁻¹t]}^{1/2}
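The construction of C in Example 9.5.1 can be verified directly. A sketch (assuming numpy) builds R and C elementwise and confirms the relationship C = MX* and the full column rank of RC:

```python
import numpy as np

# Example 9.5.1: cells in order (1,1), (1,2), (2,1), (3,1), (3,2)
R = np.zeros((8, 5))
for row, cell in enumerate([0, 1, 2, 2, 3, 4, 4, 4]):
    R[row, cell] = 1                           # 1, 1, 2, 1, 3 replicates per cell

C = np.array([[1.0, 0], [0, 1], [1, 0], [1, 0], [0, 1]])

M = np.delete(np.eye(6), 3, axis=0)            # 5 x 6 pattern matrix, empty (2,2) cell
Xstar = np.kron(np.ones((3, 1)), np.eye(2))    # X* = 1_3 (kron) I_2
print(np.array_equal(C, M @ Xstar))            # True: C = M X*
print(np.linalg.matrix_rank(R @ C))            # 2 = v, so RC has full column rank
```

Because RC has full column rank v, the weighted estimator μ̂_w above exists and is unique, unlike the less than full rank situation.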
Table 9.5.1 Mean Model Weighted ANOVA Table

Source                  df      SS
Overall mean            1       Y'V⁻¹1_n(1_n'V⁻¹1_n)⁻¹1_n'V⁻¹Y
Treatment combinations  k − 1   Y'V⁻¹[R C(C'R'V⁻¹R C)⁻¹C'R' − 1_n(1_n'V⁻¹1_n)⁻¹1_n']V⁻¹Y
Residual                n − k   Y'[V⁻¹ − V⁻¹R C(C'R'V⁻¹R C)⁻¹C'R'V⁻¹]Y
Total                   n       Y'V⁻¹Y

where A_3w is the residual sum of squares matrix from Table 9.5.1. A γ level test of H_0: μ = c1_k versus H_1: μ ≠ c1_k is to reject H_0 if F*_w > F^γ_{k−1, n−k} where

F*_w = [Y'A_2w Y/(k − 1)] / [Y'A_3w Y/(n − k)]

and A_2w is the treatment combination sum of squares matrix from Table 9.5.1. The derivations of the confidence band and the test statistic are left to the reader. Models of the form Y = R C μ + E with E ~ N_n(0, σ²V) are also encountered when V is an n × n positive definite matrix whose elements are functions of m unknown variance parameters. The analysis for this type of model is discussed in Chapter 10.
EXERCISES

Use the data in Table 5.1.1 to solve Exercises 1-5.

1. A researcher wants to fit the model

Y_i = β_0 + β_1 x_i1 + β_2 x_i2 + β_3 x_i3 + β_4 x_i4 + E_i

where i = 1, ..., 10, x_i1 is the speed of the ith vehicle, x_i2 is the speed × grade of the ith vehicle, x_i3 is the speed × speed of the ith vehicle, and x_i4 is the speed × speed × grade of the ith vehicle.

(a) Write the above model in matrix form Y = R X_d β + E where the 5 × 1 vector β = (β_0, β_1, β_2, β_3, β_4)'. Define all terms and distributions explicitly.
(b) Is β estimable? Explain.
(c) Write the mean model Y = Rμ + E for this problem. Define all terms and distributions explicitly.
Less I han 14ull Rank Models
175
(d) What is the rank and dimension of X d ? 2. Find the ordinary least-squares estimates of/~, cr2, and cov(/2) for the mean model Y = R/~ + E. 3. Use the mean model to do the following: (a) Construct the ANOVA table. (b) At the y = 0.05 level, test the hypothesis H0 : /z = ctl4 versus H1 :/z 7~ ctl4 where/~ = (/s # 2 , / z 3 , # 4 ) ' . 4
(c) Place 99% confidence bands on Y]k=l/zk. 4. Show that/32 + 50/34 is estimable by defining vectors g and t such that g - X~t. The parameters/30,/31, and/32 are defined in Exercise 1. 5. For the mean model, assume cov(E) = o'2V where V=
[101 0
2
|
(a) Find the weighted least-squares estimates of ~, cr2, and cov(~). (b) Construct the weighted ANOVA table. (c) At the y = 0.05 level, test the hypothesis H 0 : ~ 1 = ~ 2 = /s = /s v e r s u s H1 : at least one of the #k's is not equal to the others where ~ = (#1, #2, #3, ~4)'. 4
(d) Place 99% confidence bands on Y']~k=l /zk. 6. Consider the experiment described in Figure 7.3.1. (a) Write the general mean model Y = RC/~ + E defining all terms and distributions explicitly. (b) In Section 7.3, the model for the experiment was written V = RMX*/3 + E where E ~ N9(0, E). Define relationships between the terms R, C, #, and E from the mean model in part a and the terms R, M, X*,/3, and E from the model in Section 7.3. In particular, define the matrix Xd such that /, = Xd/~. What is the rank and dimension of Xd? (C) Is the model Y = RMX*/~ + E from Section 7.3 a less than full rank model? (d) If/2 and/9 are the ordinary least-squares estimators of/z and/3, respectively, find the relationship between 12 and/9.
176
Linear Models
7. Consider the balanced incomplete block experiment described in Figure 8.1.1. (a) Write the general mean model Y = R C ~ + E defining all terms and distributions explicitly. (b) In Section 8.1, the model for the design was Y = MX*~ + E where E ~ Nbk(0, E). Relate the terms R, C, ~, and E from the general mean model in part a to the terms M, X*,/3, and E from the model in Section 8.1. (c) Is the model Y = MX*/~ + E from Section 8.1 a less than full rank model?
10
The General Mixed Model
Any model that includes some fixed factors and some random factors is called a mixed model. Numerous examples of mixed models have been presented throughout the text. For example, the model for the experiment described in Figure 4.1.1 is an example of a balanced mixed model. The general class of mixed models applies to both balanced and unbalanced data structures. Furthermore, the general mixed model covariance structure contains a broad class of matrix patterns including those discussed in Chapter 4. In this chapter the analysis of the general mixed model is presented. Balanced mixed model examples from previous chapters are reviewed and new unbalanced examples are presented to illustrate the general mixed model approach.
10.1
THE MIXED MODEL STRUCTURE AND A S S U M P T I O N S
The mixed model is applicable whenever an experiment contains fixed and random factors. Consider the experiment presented in Table 4.1.1. The experiment has three factors where B and R are random factors and T is a fixed factor. The
177
178
Linear Models
model is
Yijs = lzj + Bi + BTij + R(BT)(ij)s for i = 1 . . . . . b, j = 1 . . . . . t, and s = 1 . . . . . r where # j represents the fixed portion of the model and Bi + BTij + R(BT)(ij)s represents the random portion. Assume a finite model error structure where Bi ~" iid NI(0, cr~),(Ib | fit) (BT11 . . BTbt) . ~ .Nb(t-1)(0, . . cr2Tlb| 1) and R(BT)(ij), ~" iid N1 (0, O.R(nr~) . 2 Furthermore, assume the three sets of random variables are mutually independent. In matrix form the mixed model is Y = RC/z + Ulal + Uza2 + U3a3 where the btr x 1 vector Y = (Ylll . . . . . Yllr . . . . . Ybtl . . . . . Ybtr) t. The fixed portion of the model is given by RC/~ where the btr x bt replication matrix R = Ib | It | lr, the bt x t matrix C = lb @ It, and the t x 1 mean vector /z = (/Zl . . . . . #t)'. The random portion ofthe model is Ulal + U z a : +U3a3 where theb x 1 random vector al ~ Nb(0, cr~Ib), thebtr • matrix U1 = I b | the b ( t - 1) x 1 random vector a: -~ Nb(t-1)(0, Cr2TIb| thebtr • 1)matrix U: = Ib | Pt | lr, the (t - 1) x t matrix P't is the lower portion of a t-dimensional Helmert matrix, the btr • 1 random vector a3 ~' Nbtr(O, cr~(Br)Ib | It @ Ir), and the btr x btr matrix U3 = Ib @ It @ Ir. Furthermore, vectors al, a2, and a3 are mutually independent. Therefore, the btr x 1 random vector Y ~ Nbtr( R C # , •) where --- o 2 U 1 U 1
t
+ 02TU2U;
2 t --I- O'R(BT)U3U 3
=cr2[Ib|174
[ (1) Ib|
I,--J,
|
]
t
+ cr2(Br)[Ib | It N Irl. Note that E matches the finite model covariance structure given in Section 4.2. If an infinite model is assumed, the same mixed model can be used with U2 replaced by Ib | It | lr. The mixed model can be generalized as m
V -- RC/~ + Z U f a
f
f=l
where Y is an n x 1 random vector of observations. The fixed portion of the model is given by RC/~ where R is an n x k replication matrix of rank k, C is a k x v matrix of rank v, and # is a v x 1 mean vector. The random portion of the model m is ~-'~f=l U f a f where Uf is an n x qf matrix of rank qf, af is a qf x 1 random vector distributed Nql (0, cr}lql), and al . . . . . am are mutually independent. The
10
General Mixed Model
179
general mixed model can also be written in the form
Y = RC# + E where the n x 1 random vector E ~ Nn(0, E) and E = E f = l o'~UfUf. The n x v matrix RC has full column rank v. Therefore, the elements of # are the expected values of the random, variables in the v distinct fixed factor combinations that contain data. Consequently, this form of the mixed model is a full rank model and could logically be called the mixed mean model. Similarly, the random portion of the model ET_ 1U f a f is sometimes called the variance components portion of the model since the distributions of the m random vectors al . . . . . am are functions of the m variance components of trl2. . . . . cr2. Although the general mixed model was motivated using a complete, balanced design with three factors, the model applies to a broad class of balanced and unbalanced designs with a wide variety of covariance structures. In fact, every experimental layout and covariance structure covered in this text can be modeled in the general mixed model format. However, because the general mixed model format applies to such a broad class of experiments, no one analysis approach is "best" for all cases. A number of different analysis approaches are available. Two approaches for analyzing the random portion of the model are presented in Sections 10.2 and 10.3. A numerical example of the random portion analysis is provided in Section 10.4. An analysis of the fixed portion of the model is presented in Section 10.5. m
10.2
!
RANDOM PORTION ANALYSIS: TYPE I SUM OF SQUARES METHOD
In some mixed models, estimates of the variance components can be derived by calculating the Type I sums of squares for the fixed portion first and then calculating m Type I sums of squares for the subsequent random factors. The expectations of the Type I sums of squares for the subsequent m random factors do not involve since they are calculated after the fixed portion has been removed. Setting these m expectations equal to the corresponding Type I sums of squares produces m equations and the m unknowns tr 2 . . . . . Crm 2 . Estimates of the m unknown variance parameters can then be derived provided the m equations are linearly independent. Balanced and unbalanced examples of the method are given next.
Example 10.2.1
Consider the complete, balanced design described in Section 10.1 where B and R are random factors and T is a fixed factor. Since the design is complete and balanced, the Type I sums of squares equal the sums of squares defined by the algorithm in Section 4.3. Therefore, the Type I sums of squares due
180
Linear Models
to the three random effects are given by SS (BI overall mean, T) -- Y'
[(1) 1 Ib -- ~Jb
@ tJt •
= Y'A3Y, SS (BTI overall mean, T, B) -- Y'
[(l)
lb -- ~Jb
@
Jr Y
(1)
It -- t J t
| ~ J r ] Y = Y'A4Y
[
SS (R(BT)I overall mean, T, B, BT) = Y' Ib | It |
( ')l L - --Jr r
Y
= Y'AsY. The expectations of these three Type I sums of squares are 2 E[Y'A3YI = ( b - 1 ) [ r t ~ + ~R(sr)l,
E[Y'A4Y] = ( b - 1 ) ( t - 1)[ro'~r +
2 ~R(sr)l
E[Y'AsY] = bt (r - 1)o'S(st ). Setting the three Type I sums of squares equal to their expectations and solving for o~, O'er, and rrR(Br ) 2 produces the estimators
62 -- { Y ' A 3 Y / ( b - 1 ) - Y ' A s Y / [ b t ( r - 1)]}/(rt) 82r = {Y'A4Y/[(b- 1 ) ( t - 1 ) ] - Y ' A s Y / [ b t ( r - 1)]}/r O.R(BT ),,2
-- Y ' A s Y / [ b t ( r - 1)]
In general, if any of the variance estimates is less than zero, set it equal to zero.
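The expectations quoted above can be verified without simulation, since E[Y'A_f Y] = tr(A_f Σ) for each of these quadratic forms (each A_f annihilates the fixed portion RCμ). A sketch with assumed values b = 4, t = 3, r = 2 and illustrative variance components:

```python
import numpy as np

# Check E[Y'A_f Y] = tr(A_f Sigma) against the closed-form expectations
b, t, r = 4, 3, 2
sB, sBT, sR = 2.0, 1.0, 0.5                     # assumed variance components
I = np.eye
Jb = lambda m: np.ones((m, m)) / m              # (1/m) J_m

A3 = np.kron(I(b) - Jb(b), np.kron(Jb(t), Jb(r)))
A4 = np.kron(I(b) - Jb(b), np.kron(I(t) - Jb(t), Jb(r)))
A5 = np.kron(I(b), np.kron(I(t), I(r) - Jb(r)))

Sigma = (sB * np.kron(I(b), np.ones((t * r, t * r)))
         + sBT * np.kron(I(b), np.kron(I(t) - Jb(t), np.ones((r, r))))
         + sR * I(b * t * r))

print(np.trace(A3 @ Sigma), (b - 1) * (r * t * sB + sR))          # equal
print(np.trace(A4 @ Sigma), (b - 1) * (t - 1) * (r * sBT + sR))   # equal
print(np.trace(A5 @ Sigma), b * t * (r - 1) * sR)                 # equal
```

The σ²_BT term drops out of E[Y'A_3 Y] because (1/t)J_t(I_t − (1/t)J_t) = 0, which is why the block sum of squares carries no interaction component.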
Example 10.2.2 Consider the unbalanced experimental layout given in Figure 9.5.1. Let the 8 × 1 random vector Y = (Y_111, Y_121, Y_211, Y_212, Y_311, Y_321, Y_322, Y_323)'. The mean model for this experiment is

Y = R C μ + E

where the 2 × 1 mean vector μ = (μ_1, μ_2)' and the 8 × 1 random error vector E = (E_111, E_121, E_211, E_212, E_311, E_321, E_322, E_323)' ~ N_8(0, Σ). The 8 × 5 replication matrix R and the 5 × 2 matrix C are given in Example 9.5.1. Let Y'A_3 Y, Y'A_4 Y, and Y'A_5 Y equal the SS (B | overall mean, T), SS (BT | overall mean, T, B), and SS (R(BT) | overall mean, T, B, BT), respectively. The matrices A_3, A_4, A_5, and Σ were generated using the SAS PROC IML program listed in Section A5.1 of Appendix 5. The program output and derivations of the various
results are also given in Section A5.1. From Appendix 5 the expected values of the three Type I sums of squares are

E[Y'A_3 Y] = 4σ²_B + 0.8σ²_BT + 2σ²_R(BT),
E[Y'A_4 Y] = 1.2σ²_BT + σ²_R(BT),
E[Y'A_5 Y] = 3σ²_R(BT).

Setting these three Type I sums of squares equal to their expectations and solving for σ²_B, σ²_BT, and σ²_R(BT) produces the estimators

σ̂²_B = {Y'A_3 Y − (2Y'A_4 Y/3) − (4Y'A_5 Y/9)}/4,
σ̂²_BT = {Y'A_4 Y − (Y'A_5 Y/3)}/1.2,
σ̂²_R(BT) = Y'A_5 Y/3.
We now demonstrate that the covariance structure for the unbalanced experiment in Example 10.2.2 follows the mixed model covariance format. Following the methods described in Section 7.3, the covariance matrix of E from Example 10.2.2 is given by

Σ = R M Σ* M'R' + σ²_R(BT) I_8

where the 8 × 5 replication matrix R is provided in Example 9.5.1. The 6 × 6 matrix Σ* and the 5 × 6 pattern matrix M are given by

Σ* = σ²_B I_3 ⊗ J_2 + σ²_BT I_3 ⊗ (I_2 − (1/2)J_2)

and

M = [1 0 0 0 0 0
     0 1 0 0 0 0
     0 0 1 0 0 0
     0 0 0 0 1 0
     0 0 0 0 0 1].

Therefore,

Σ = σ²_B R M (I_3 ⊗ J_2) M'R' + σ²_BT R M [I_3 ⊗ (I_2 − (1/2)J_2)] M'R' + σ²_R(BT) I_8
  = σ²_B R M (I_3 ⊗ 1_2)(I_3 ⊗ 1_2)'M'R' + σ²_BT R M [I_3 ⊗ P_2][I_3 ⊗ P_2]'M'R' + σ²_R(BT) I_8
  = σ²_1 U_1 U_1' + σ²_2 U_2 U_2' + σ²_3 U_3 U_3'
where σ²_1 = σ²_B, σ²_2 = σ²_BT, σ²_3 = σ²_R(BT), the 8 × 3 matrix U_1 = R M (I_3 ⊗ 1_2), the 8 × 3 matrix U_2 = R M (I_3 ⊗ P_2), the 8 × 8 matrix U_3 = I_8, and the 1 × 2 matrix P_2' is the lower portion of a two-dimensional Helmert matrix with P_2 P_2' = I_2 − (1/2)J_2.

10.3 RANDOM PORTION ANALYSIS: RESTRICTED MAXIMUM LIKELIHOOD METHOD
The maximum likelihood estimators (MLEs) of the variance components are the values of σ²_1, ..., σ²_m that maximize the function

ℓ(μ, σ²_1, ..., σ²_m; Y) = (2π)^{−n/2} |Σ|^{−1/2} exp{−(Y − R C μ)'Σ⁻¹(Y − R C μ)/2}

where the maximization is performed simultaneously with respect to the k + m terms μ, σ²_1, ..., σ²_m. By Theorem 6.3.1, for complete, balanced designs, the MLE of μ equals the ordinary least-squares estimator (C'DC)⁻¹C'R'Y and the MLE of σ²_f for f = 1, ..., m equals a linear combination of the sums of squares for the m random effects and interactions. However, in general, for unbalanced designs, derivation of the MLEs of μ, σ²_1, ..., σ²_m can be tedious and usually involves numerical techniques such as the Newton-Raphson method.

Russell and Bradley (1958), Anderson and Bancroft (1952), W. A. Thompson (1962), and later Patterson and R. Thompson (1971, 1974) suggested what is called a restricted maximum likelihood (REML) approach. The REML estimators of σ²_1, ..., σ²_m are derived by first expressing the likelihood function in two parts, one involving the fixed parameters μ, and the second free of these fixed parameters. To construct the REMLs, let G be any n × (n − k) matrix of rank n − k such that G'R C = 0_{(n−k)×v}. For example, G can be defined such that GG' = I_n − R C(C'DC)⁻¹C'R' and G'G = I_{n−k} where D = R'R. Next, transform the n × 1 random vector Y by the n × n nonsingular matrix [RC, G]' where

[C'R'     ~ N_n( [C'DCμ     [C'R'ΣRC   C'R'ΣG
 G'  ] Y          0    ],    G'ΣRC     G'ΣG  ] ).
The distribution of G'Y is free of the fixed parameters μ. The REML estimators of the variance components are the values of σ²_1, ..., σ²_m that maximize the marginal likelihood of G'Y:

ℓ(σ²_1, ..., σ²_m; Y) = (2π)^{−(n−k)/2} |G'ΣG|^{−1/2} exp{−Y'G(G'ΣG)⁻¹G'Y/2}.

As with the maximum likelihood approach, numerical techniques are often necessary to determine the REML estimates of σ²_1, ..., σ²_m. Furthermore, the maximization is performed under the restriction that the estimators of σ²_1, ..., σ²_m are all positive.
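One concrete way to obtain a G matrix with the stated properties is to take an orthonormal basis of the null space of the fixed-effect projection. A sketch on the Example 9.5.1 design (assumptions: numpy, and an eigendecomposition of I − RC(C'DC)⁻¹C'R', whose unit-eigenvalue eigenvectors form G):

```python
import numpy as np

# Build RC for the Example 9.5.1 design
R = np.zeros((8, 5))
for row, cell in enumerate([0, 1, 2, 2, 3, 4, 4, 4]):
    R[row, cell] = 1
C = np.array([[1.0, 0], [0, 1], [1, 0], [1, 0], [0, 1]])
RC = R @ C
n = RC.shape[0]

P = np.eye(n) - RC @ np.linalg.inv(RC.T @ RC) @ RC.T   # I - RC(C'DC)^{-1}C'R'
vals, vecs = np.linalg.eigh(P)                         # eigenvalues are 0 or 1
G = vecs[:, vals > 0.5]                                # orthonormal basis, one column per unit eigenvalue
print(np.allclose(G.T @ RC, 0))                        # True: G'Y is free of mu
print(np.allclose(G.T @ G, np.eye(G.shape[1])))        # True: G'G = I
print(np.allclose(G @ G.T, P))                         # True: GG' = I - RC(C'DC)^{-1}C'R'
```

For this design RC has full column rank 2, so G has 8 − 2 = 6 columns; any orthonormal basis of that null space works equally well, since the REML likelihood is invariant to the choice of G.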
The SAS PROC VARCOMP routine has a Type I sum of squares and a restricted maximum likelihood option. In the next section a numerical example is analyzed using PROC VARCOMP.
10.4
RANDOM PORTION ANALYSIS: A NUMERICAL EXAMPLE
Suppose that the observations in Table 9.5.1 take the values Y = (Y₁₁₁, Y₁₂₁, Y₂₁₁, Y₂₁₂, Y₃₁₁, Y₃₂₁, Y₃₂₂, Y₃₂₃)′ = (237, 178, 249, 268, 186, 183, 182, 165)′. All of the numerical calculations are performed using the PROC VARCOMP output listed in Section A5.2 of Appendix 5. The Type I sums of squares option of the PROC VARCOMP program provides SS(B | overall mean, T) = Y′A₃Y = 2770.8, SS(BT | overall mean, T, B) = Y′A₄Y = 740.03, and SS(R(BT) | overall mean, T, B, BT) = Y′A₅Y = 385.17. These values are now substituted into the variance parameter relationships derived in Example 10.2.2. Therefore, the finite model estimates of σ_B², σ_BT², and σ²_R(BT) are

    σ̂_B² = [2770.8 − 2(740.03/3) − 4(385.17/9)]/4 = 526.56
    σ̂_BT² = [740.03 − (385.17/3)]/1.2 = 509.70
    σ̂²_R(BT) = 385.17/3 = 128.39.
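The arithmetic can be reproduced directly; this small Python check (values copied from the text) simply verifies the substitutions:

```python
# Finite-model Type I estimates from the three sums of squares
# (values as reported by the PROC VARCOMP output in Section A5.2).
ss_b, ss_bt, ss_rbt = 2770.8, 740.03, 385.17

s2_rbt = ss_rbt / 3                                       # about 128.39
s2_bt = (ss_bt - ss_rbt / 3) / 1.2                        # about 509.70
s2_b = (ss_b - 2 * (ss_bt / 3) - 4 * (ss_rbt / 9)) / 4    # about 526.56
print(s2_b, s2_bt, s2_rbt)
```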
The PROC VARCOMP program will actually provide variance parameter estimates automatically for both the Type I sum of squares and the REML options. However, when calculating the variance parameter estimates, PROC VARCOMP assumes an infinite model for both options. Conversion between the infinite and finite model estimates can be accomplished quite easily, however, since one model is simply a reparameterization of the other. For example, let the variance parameters for the finite model remain σ_B², σ_BT², and σ²_R(BT). Let σ_B*², σ_BT*², and σ²_R(BT)* be the corresponding variance parameters for the infinite model. Then

    σ_B² = σ_B*² + (1/t)σ_BT*²,  σ_BT² = σ_BT*²,  and  σ²_R(BT) = σ²_R(BT)*,

where t is the number of fixed treatment levels. Therefore, using the preceding relationships, infinite model variance parameter estimates provided by PROC VARCOMP can be converted directly to finite model estimates. From the PROC VARCOMP output in Section A5.2, the Type I sum of squares estimates of the infinite model variance parameters are σ̂_B*² = 271.71, σ̂_BT*² = 509.70, and σ̂²_R(BT)* = 128.39. Therefore, the corresponding finite model variance parameter estimates are

    σ̂_B² = σ̂_B*² + (1/2)σ̂_BT*² = 271.71 + (1/2)(509.7) = 526.56
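A minimal Python illustration of this conversion for the example at hand (t = 2 fixed treatment levels):

```python
# Infinite-model Type I estimates as reported by PROC VARCOMP (Section A5.2).
t = 2                                            # number of fixed treatment levels
s2_b_star, s2_bt_star, s2_rbt_star = 271.71, 509.70, 128.39

# Finite-model equivalents via the reparameterization above.
s2_b = s2_b_star + s2_bt_star / t                # 271.71 + 254.85 = 526.56
s2_bt = s2_bt_star
s2_rbt = s2_rbt_star
print(s2_b, s2_bt, s2_rbt)
```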
184 Linear Models
    σ̂_BT² = σ̂_BT*² = 509.7
    σ̂²_R(BT) = σ̂²_R(BT)* = 128.39.

These finite model estimates agree exactly with the estimates derived earlier using the Example 10.2.2 variance parameter relationships. The expected mean squares (EMSs) for the infinite model are also automatically given by the Type I sum of squares option of the PROC VARCOMP program. The three EMSs are

    E[Y′A₃Y/2] = 2σ_B*² + 1.4σ_BT*² + σ²_R(BT)*
    E[Y′A₄Y/1] = 1.2σ_BT*² + σ²_R(BT)*
    E[Y′A₅Y/3] = σ²_R(BT)*.
Using the relationships that link the finite and infinite model parameters, the expected mean squares for the finite model are

    E[Y′A₃Y/2] = 2(σ_B² − (1/2)σ_BT²) + 1.4σ_BT² + σ²_R(BT) = 2σ_B² + 0.4σ_BT² + σ²_R(BT)
    E[Y′A₄Y/1] = 1.2σ_BT² + σ²_R(BT)
    E[Y′A₅Y/3] = σ²_R(BT).

These finite model EMSs are equivalent to the expectations derived in Example 10.2.2. The REML option of PROC VARCOMP was also run on the data set. Assuming the infinite model, the program provides the REML variance parameter estimates σ̂_B*² = 88.60, σ̂_BT*² = 726.62, and σ̂²_R(BT)* = 129.15. The corresponding REML variance parameter estimates for the finite model are therefore
    σ̂_B² = σ̂_B*² + (1/2)σ̂_BT*² = 88.6 + (1/2)(726.62) = 451.91
    σ̂_BT² = σ̂_BT*² = 726.62
    σ̂²_R(BT) = σ̂²_R(BT)* = 129.15.

10.5

FIXED PORTION ANALYSIS
In Section 10.4 REML and Type I sum of squares estimates of the variance parameters σ₁², ..., σ_m² were derived for both the finite and infinite models. Therefore,
the covariance matrix Σ can be estimated by

    Σ̂ = σ̂₁²U₁U₁′ + σ̂₂²U₂U₂′ + ... + σ̂_m²U_mU_m′

where σ̂₁², ..., σ̂_m² are a set of variance parameter estimates. The weighted least-squares estimator of μ can therefore be estimated by

    μ̂_w = (C′R′Σ̂⁻¹RC)⁻¹C′R′Σ̂⁻¹Y.

The BLUE of t′μ can be estimated by t′μ̂_w where t is any nonzero v × 1 vector of constants. The covariance matrix of the weighted least-squares estimator of μ can be estimated by

    côv(μ̂_w) = (C′R′Σ̂⁻¹RC)⁻¹

and the variance of the BLUE of t′μ is estimated by

    vâr(t′μ̂_w) = t′(C′R′Σ̂⁻¹RC)⁻¹t.
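These estimators are straightforward to compute once Σ̂ is in hand. The following Python sketch uses a small hypothetical data set and an assumed diagonal Σ̂ (both invented for illustration; they are not from the text):

```python
import numpy as np

# Hypothetical example: 5 observations on 2 treatment means mu1, mu2,
# with an assumed (already estimated) covariance matrix Sigma_hat.
RC = np.array([[1, 0],
               [0, 1],
               [1, 0],
               [1, 0],
               [0, 1]], dtype=float)             # maps observations to the means
y = np.array([10.0, 7.0, 11.0, 9.0, 8.0])
Sigma_hat = np.diag([1.0, 1.0, 2.0, 2.0, 1.0])   # stand-in for sum_f s2_f U_f U_f'

Si = np.linalg.inv(Sigma_hat)
M = np.linalg.inv(RC.T @ Si @ RC)        # estimated cov(mu_w) = (C'R'S^-1 RC)^-1
mu_w = M @ RC.T @ Si @ y                 # weighted least-squares estimate of mu

t = np.array([1.0, -1.0])                # contrast mu1 - mu2
est = t @ mu_w                           # estimated BLUE of t'mu
var_est = t @ M @ t                      # estimated variance of the BLUE
ci = (est - 1.96 * var_est**0.5, est + 1.96 * var_est**0.5)
print(mu_w, est, ci)
```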
For large samples, (t′μ̂_w − t′μ)/√vâr(t′μ̂_w) ~ N₁(0, 1). Therefore, a 100(1 − γ)% confidence band on t′μ is

    t′μ̂_w ± Z_{γ/2} √vâr(t′μ̂_w).

Hypothesis tests on various fixed effects can be constructed using a Satterthwaite approximation. Suppose the expected mean square of a certain fixed effect is

    k₁σ₁² + k₂σ₂² + ... + k_mσ_m² + Φ(F)

where Φ(F) is the fixed portion of the expected mean square and k₁, ..., k_m are constants. An unbiased estimator of the first m terms of the expected mean square can be constructed using a linear combination of random effect mean squares. That is,

    W = c₁MS₁ + c₂MS₂ + ... + c_mMS_m

where E(W) = k₁σ₁² + k₂σ₂² + ... + k_mσ_m² and where MS_f is the mean square of a particular random effect for f = 1, ..., m. If MS_F is the mean square for a specific fixed effect with ν_F degrees of freedom, then the statistic MS_F/W is approximately distributed as an F random variable with ν_F and ν̂ degrees of freedom where

    ν̂ = W² / [ (c₁MS₁)²/ν₁ + ... + (c_mMS_m)²/ν_m ]

and ν₁, ..., ν_m are the degrees of freedom associated with MS₁, ..., MS_m. Therefore, a γ level test on the equality of the fixed treatment means is to reject if MS_F/W > F^γ_{ν_F, ν̂}.
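The W statistic and its approximate degrees of freedom are easy to compute; the sketch below uses the coefficients and Type I mean squares of the Section 10.4 example:

```python
# Satterthwaite pieces for the Section 10.4 data: the three Type I mean
# squares, their degrees of freedom, and the coefficients c1, c2, c3.
ms = [2770.8 / 2, 740.03 / 1, 385.17 / 3]
df = [2, 1, 3]
c = [1 / 2, 1.3 / 1.2, -0.7 / 1.2]

W = sum(ci * mi for ci, mi in zip(c, ms))
nu_hat = W ** 2 / sum((ci * mi) ** 2 / di for ci, mi, di in zip(c, ms, df))
print(W, nu_hat)   # roughly 1419.5 and 2.28, matching the next section
```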
All of these fixed portion calculations are performed for an example problem in the next section.
10.6
FIXED PORTION ANALYSIS: A NUMERICAL EXAMPLE
An estimate of μ, a confidence band on t′μ, and a hypothesis test on the fixed treatment effect are now calculated using the data in Section 10.4. The numerical calculations are performed using the SAS PROC IML procedure detailed in Section A5.3 of Appendix 5. All calculations are made assuming a finite model with Type I sums of squares estimates for σ_B², σ_BT², and σ²_R(BT). From Section A5.3 the estimate of μ = (μ₁, μ₂)′ is given by

    μ̂_w = (μ̂₁, μ̂₂)′ = (C′R′Σ̂⁻¹RC)⁻¹C′R′Σ̂⁻¹Y = (227.94, 182.62)′.

The estimated covariance matrix of the weighted least-squares estimator of μ is

    côv(μ̂_w) = (C′R′Σ̂⁻¹RC)⁻¹ = [ 295.78   88.33 ]
                                [  88.33  418.14 ].

An estimate of the BLUE of μ₁ − μ₂ = t′μ for t′ = (1, −1) is given by

    t′μ̂_w = (1, −1)(227.94, 182.62)′ = 45.32.

The variance of the BLUE of μ₁ − μ₂ is estimated by

    vâr(t′μ̂_w) = t′(C′R′Σ̂⁻¹RC)⁻¹t = 295.78 − 2(88.33) + 418.14 = 537.26.

Therefore, 95% confidence bands on μ₁ − μ₂ are

    t′μ̂_w ± Z₀.₀₂₅ √vâr(t′μ̂_w) = 45.32 ± 1.96√537.26 = (−0.11, 90.75).

A Satterthwaite test on the treatment means can be constructed. Note that

    E[Y′A₂Y/1] = σ_B² + 1.5σ_BT² + σ²_R(BT) + Φ(T)

where Y′A₂Y is the Type I sum of squares due to treatments given the overall mean and Φ(T) is the fixed portion of the treatment EMS. Furthermore,

    σ_B² + 1.5σ_BT² + σ²_R(BT) = c₁E[Y′A₃Y/2] + c₂E[Y′A₄Y/1] + c₃E[Y′A₅Y/3]

where c₁ = 1/2, c₂ = 1.3/1.2, and c₃ = −0.7/1.2. Therefore, the Satterthwaite
statistic to test a significant treatment effect is

    MS_F/W = (Y′A₂Y/1) / [c₁(Y′A₃Y/2) + c₂(Y′A₄Y/1) + c₃(Y′A₅Y/3)]
           = 6728/1419.51
           = 4.74.

The degrees of freedom for the denominator of the Satterthwaite statistic is

    ν̂ = W² / [ (c₁Y′A₃Y/2)²/2 + (c₂Y′A₄Y/1)²/1 + (c₃Y′A₅Y/3)²/3 ]
      = (1419.51)² / { (2770.8/4)²/2 + [(1.3/1.2)(740.03)]²/1 + [(−0.7/1.2)(385.17/3)]²/3 }
      = 2.28.

Since MS_F/W = 4.74 < 18.5 = F^0.05_{1,2}, do not reject the hypothesis that there is no treatment effect.

EXERCISES

For Exercises 1-6, assume that E has a multivariate normal distribution with mean vector 0 and covariance matrix Σ = Σ_{f=1}^m σ_f²U_fU_f′.

1. Write the general mixed model Y = RCμ + E for the two-way cross classification described in Example 4.5.2. Define all terms and distributions explicitly.
2. Write the general mixed model Y = RCμ + E for the split plot design described in Example 4.5.3. Define all terms and distributions explicitly.
3. Write the general mixed model Y = RCμ + E for the experiment described in Example 4.5.4. Define all terms and distributions explicitly.
4. Write the general mixed model Y = RCμ + E for the experiment described in Figure 7.3.1. Define all terms and distributions explicitly.
5. Write the general mixed model Y = RCμ + E for the experiment described in Exercise 12 in Chapter 5. Define all terms and distributions explicitly.
6. Write the general mixed model Y = RCμ + E for the experiment described in Exercise 7 in Chapter 4. Define all terms and distributions explicitly.
7. Let the 8 × 1 vector of observations Y = (Y₁₁₁, Y₁₁₂, Y₁₂₁, Y₁₂₂, Y₂₁₁, Y₂₁₂, Y₂₂₁, Y₂₂₂)′ = (2, 5, 10, 15, 17, 14, 39, 41)′ represent the data for the experiment described in Example 10.2.1 with b = t = r = 2.
[Figure 10.6.1 Split Plot Data Set for Exercise 9: a table of split plot observations arranged by replicate (i = 1, 2, 3), whole plot (j), and split plot (k). The recoverable values are 23.7, 28.9, 17.8, 37.2, 21.4, 27.9, 60.6, 21.9, 40.5, 32.6, 21.2, 45.1, and 41.6.]
(a) Find the Type I sum of squares estimates of σ_B², σ_BT², and σ²_R(BT) assuming a finite model.
(b) Find the REML estimates of σ_B², σ_BT², and σ²_R(BT) assuming a finite model. Are the Type I sum of squares and REML estimates of the three variance parameters equal?

8. The observations in Table E10.1 represent the data from the split plot experiment described in Example 4.5.3 with r = s = 3, t = 2, and with some of the observations missing.
(a) Write the general mixed model Y = RCμ + E for this experiment where E ~ N₁₃(0, Σ) and Σ = Σ_{f=1}^m σ_f²U_fU_f′. Define all terms and distributions explicitly.
(b) Find the Type I sum of squares and REML estimates of the variance parameters defined in part (a) assuming a finite model.
(c) Find the Type I sum of squares and REML estimates of the variance parameters defined in part (a) assuming an infinite model.
(d) Calculate the estimates μ̂_w and côv(μ̂_w).
(e) Construct a 95% confidence band on the difference between the two whole plot treatment means.
(f) Construct a Satterthwaite test for the hypothesis that there is no significant whole plot treatment effect.
(g) Construct a Satterthwaite test for the hypothesis that there is no significant split plot treatment effect.
APPENDIX 1: COMPUTER OUTPUT FOR CHAPTER 5
Table A1.1 provides a summary of the notation used in the Chapter 5 text and the corresponding names used in the SAS PROC IML computer program.

Table A1.1 Chapter 5 Notation and Corresponding PROC IML Program Names
Chapter 5 Notation                                  PROC IML Program Name
Y, X1, Xc, X                                        Y, X1, XC, X
β̂, σ̂²                                              BHAT, SIG2HAT
Y′(1/n)J_n Y                                        SSMEAN
Y′Xc(Xc′Xc)⁻¹Xc′Y                                   SSREG
Y′[I_n − (1/n)J_n − Xc(Xc′Xc)⁻¹Xc′]Y                SSRES
Y′Y                                                 SSTOT
β̂_w, σ̂_w²                                          WBETAHAT, WSIG2HAT
Y′V⁻¹[1_n(1_n′V⁻¹1_n)⁻¹1_n′]V⁻¹Y                    SSWMEAN
Y′V⁻¹[X(X′V⁻¹X)⁻¹X′ − 1_n(1_n′V⁻¹1_n)⁻¹1_n′]V⁻¹Y    SSWREG
Y′[V⁻¹ − V⁻¹X(X′V⁻¹X)⁻¹X′V⁻¹]Y                      SSWRES
Y′V⁻¹Y                                              SSWTOT
Y′A_pe Y                                            SSPE
Y′[I_n − X(X′X)⁻¹X′ − A_pe]Y                        SSLOF
SS(x₁ | overall mean)                               SSX1
SS(x₂ | overall mean, x₁)                           SSX2
The SAS PROC IML program and output follow:
PROC IML;
Y={1.7, 2.0, 1.9, 1.6, 3.2, 2.0, 2.5, 5.4, 5.7, 5.1};
X1 = J(10, 1, 1);
XC = {-15 -102, -15 -102, -15 -102, -15 -102, -15 18,
      15 -102, 15 -102, 15 198, 15 198, 15 198};
X = X1||XC;
BHAT = INV(T(X)*X)*T(X)*Y;
SIG2HAT = (1/7)#T(Y)*(I(10) - X*INV(T(X)*X)*T(X))*Y;
A1 = X1*INV(T(X1)*X1)*T(X1);
A2 = XC*INV(T(XC)*XC)*T(XC);
A3 = I(10) - A1 - A2;
SSMEAN = T(Y)*A1*Y;
SSREG = T(Y)*A2*Y;
SSRES = T(Y)*A3*Y;
SSTOT = T(Y)*Y;
V = BLOCK(1, 2) @ I(5);
WBETAHAT = INV(T(X)*INV(V)*X)*T(X)*INV(V)*Y;
WSIG2HAT = (1/7)#T(Y)*(INV(V) - INV(V)*X*INV(T(X)*INV(V)*X)*T(X)*INV(V))*Y;
AV1 = INV(V)*X1*INV(T(X1)*INV(V)*X1)*T(X1)*INV(V);
AV2 = INV(V)*X*INV(T(X)*INV(V)*X)*T(X)*INV(V) - AV1;
AV3 = INV(V) - AV1 - AV2;
SSWMEAN = T(Y)*AV1*Y;
SSWREG = T(Y)*AV2*Y;
SSWRES = T(Y)*AV3*Y;
SSWTOT = T(Y)*INV(V)*Y;
B1 = I(4) - J(4,4,1/4);
B2 = 0;
B3 = I(2) - J(2,2,1/2);
B4 = I(3) - J(3,3,1/3);
APE = BLOCK(B1,B2,B3,B4);
ALOF = A3 - APE;
SSLOF = T(Y)*ALOF*Y;
SSPE = T(Y)*APE*Y;
R1 = X[1:10,1];
R2 = X[1:10,1:2];
T1 = R1*INV(T(R1)*R1)*T(R1);
T2 = R2*INV(T(R2)*R2)*T(R2) - T1;
T3 = X*INV(T(X)*X)*T(X) - R2*INV(T(R2)*R2)*T(R2);
SSX1 = T(Y)*T2*Y;
SSX2 = T(Y)*T3*Y;
PRINT BHAT SIG2HAT SSMEAN SSREG SSRES SSTOT WBETAHAT WSIG2HAT SSWMEAN SSWREG SSWRES SSWTOT SSLOF SSPE SSX1 SSX2;
BHAT
3.11
0.0134819
0.0106124

SIG2HAT     SSMEAN     SSREG      SSRES      SSTOT
0.0598812   96.721     24.069831  0.4191687  121.21

WBETAHAT
3.11
0.013
0.0107051

WSIG2HAT    SSWMEAN    SSWREG     SSWRES     SSWTOT
0.0379176   57.408333  14.581244  0.2654231  72.255

SSLOF       SSPE       SSX1       SSX2
0.0141687   0.405      10.609     13.460831

QUIT; RUN;
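As a cross-check of the printed output, a few of the same sums of squares can be re-derived with numpy (this reproduces the partition SSTOT = SSMEAN + SSREG + SSRES; it is not a line-by-line translation of the IML code):

```python
import numpy as np

# The same Y and centered regressor matrix XC as in the IML program above.
y = np.array([1.7, 2.0, 1.9, 1.6, 3.2, 2.0, 2.5, 5.4, 5.7, 5.1])
XC = np.array([[-15, -102], [-15, -102], [-15, -102], [-15, -102], [-15, 18],
               [15, -102], [15, -102], [15, 198], [15, 198], [15, 198]], float)
n = len(y)

A1 = np.full((n, n), 1 / n)                  # projection onto the mean
A2 = XC @ np.linalg.inv(XC.T @ XC) @ XC.T    # projection onto C(XC)
A3 = np.eye(n) - A1 - A2                     # residual projection

ssmean, ssreg, ssres = y @ A1 @ y, y @ A2 @ y, y @ A3 @ y
print(ssmean, ssreg, ssres, y @ y)           # about 96.721, 24.0698, 0.4192, 121.21
```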
APPENDIX 2: COMPUTER OUTPUT FOR CHAPTER 7
A2.1
COMPUTER OUTPUT FOR SECTION 7.2
Table A2.1 provides a summary of the notation used in Section 7.2 of the text and the corresponding names used in the SAS PROC IML computer program.

Table A2.1 Section 7.2 Notation and Corresponding PROC IML Program Names

Section 7.2 Notation                                   PROC IML Program Name
X*                                                     XSTAR
I₃ ⊗ J₃                                                SIG1
I₃ ⊗ (I₃ − (1/3)J₃)                                    SIG2
X1, Q3, Z1, X2, Z2                                     X1, Q3, Z1, X2, Z2
M, X = MX*                                             M, X
M[I₃ ⊗ J₃]M′                                           SIGM1
M[I₃ ⊗ (I₃ − (1/3)J₃)]M′                               SIGM2
S1, T1                                                 S1, T1
A1, ..., A4                                            A1, ..., A4
A1M[I₃ ⊗ J₃]M′, ..., A4M[I₃ ⊗ J₃]M′                    A1SIGM1, ..., A4SIGM1
A1M[I₃ ⊗ (I₃ − (1/3)J₃)]M′, ..., A4M[I₃ ⊗ (I₃ − (1/3)J₃)]M′   A1SIGM2, ..., A4SIGM2
X′A1X, ..., X′A4X                                      XA1X, ..., XA4X
S2*, T1*                                               S2STAR, T1STAR
A2*, A3*                                               A2STAR, A3STAR
X′A3*X                                                 XA3STARX
tr[A3*M(I₃ ⊗ J₃)M′]                                    TRA3ST1
tr{A3*M[I₃ ⊗ (I₃ − (1/3)J₃)]M′}                        TRA3ST2
The SAS PROC IML program for Section 7.2 and the output follow:
PROC IML;
XSTAR=J(3, 1, 1)@I(3);
SIG1=I(3)@J(3, 3, 1);
SIG2=I(3)@(I(3)-(1/3)#J(3, 3, 1));
X1=J(9, 1, 1);
Q3={1 1, -1 1, 0 -2};
Z1=Q3@J(3, 1, 1);
X2=J(3, 1, 1)@Q3;
Z2=Q3@Q3;
M={0 1 0 0 0 0 0 0 0,
   0 0 1 0 0 0 0 0 0,
   0 0 0 1 0 0 0 0 0,
   0 0 0 0 0 1 0 0 0,
   0 0 0 0 0 0 1 0 0,
   0 0 0 0 0 0 0 1 0};
X=M*XSTAR;
SIGM1=M*SIG1*T(M);
SIGM2=M*SIG2*T(M);
S1=M*X1;
T1=M*(X1||Z1);
S2=M*(X1||Z1||X2);
A1=S1*INV(T(S1)*S1)*T(S1);
A2=T1*INV(T(T1)*T1)*T(T1)-A1;
A3=S2*INV(T(S2)*S2)*T(S2)-A1-A2;
A4=I(3)@I(2)-A1-A2-A3;
A1SIGM1=A1*SIGM1; A1SIGM2=A1*SIGM2;
A2SIGM1=A2*SIGM1; A2SIGM2=A2*SIGM2;
A3SIGM1=A3*SIGM1; A3SIGM2=A3*SIGM2;
A4SIGM1=A4*SIGM1; A4SIGM2=A4*SIGM2;
XA1X=T(X)*A1*X;
XA2X=T(X)*A2*X;
XA3X=T(X)*A3*X;
XA4X=T(X)*A4*X;
S2STAR=M*(X1||X2);
T1STAR=M*(X1||X2||Z1);
A2STAR=S2STAR*INV(T(S2STAR)*S2STAR)*T(S2STAR)-A1;
A3STAR=T1STAR*INV(T(T1STAR)*T1STAR)*T(T1STAR)-A2STAR-A1;
XA3STARX=T(X)*A3STAR*X;
TRA3ST1=TRACE(A3STAR*SIGM1);
TRA3ST2=TRACE(A3STAR*SIGM2);
PRINT SIG1 SIG2 SIGM1 SIGM2 A1 A2 A3 A4 A1SIGM1 A1SIGM2 A2SIGM1 A2SIGM2 A3SIGM1 A3SIGM2 A4SIGM1 A4SIGM2 XA1X XA2X XA3X XA4X A3STAR XA3STARX TRA3ST1 TRA3ST2;
SIG1 1 1 1 0 0 0 0 0 0
1 1 1 0 0 0 0 0 0
1 1 1 0 0 0 0 0 0
0 0 0 1 1 1 0 0 0
0 0 0 1 1 1 0 0 0
0 0 0 1 1 1 0 0 0
0 0 0 0 0 0 1 1 1
0 0 0 0 0 0 1 1 1
0 0 0 0 0 0 1 1 1
SIG2 0.6667 -0.333 -0.333 0 0 0 0 0 0 -0.333 0.6667 -0.333 0 0 0 0 0 0 -0.333 -0.333 0.6667 0 0 0 0 0 0 0 0 0 0.6667 -0.333 -0.333 0 0 0 0 0 0 -0.333 0.6667 -0.333 0 0 0 0 0 0 -0.333 -0.333 0.6667 0 0 0 0 0 0 0 0 0 0.6667 -0.333 -0.333 0 0 0 0 0 0 -0.333 0.6667 -0.333 0 0 0 0 0 0 -0.333 -0.333 0.6667 SIGM1 1 1 0 0 0 0
1 1 0 0 0 0
0 0 1 1 0 0
0 0 1 1 0 0
0 0 0 0 1 1
0 0 0 0 1 1
SIGM2 0.6667 -0.333 0 0 0 0
-0.333 0.6667 0 0 0 0
0 0 0.6667 -0.333 0 0
0 0 -0.333 0.6667 0 0
0 0 0 0 0.6667 -0.333
0 0 0 0 -0.333 0.6667
A1 0.166667 0.166667 0.166667 0.166667 0.166667 0.166667
0.166667 0.166667 0.166667 0.166667 0.166667 0.166667
0.166667 0.166667 0.166667 0.166667 0.166667 0.166667
0.166667 0.166667 0.166667 0.166667 0.166667 0.166667
0.166667 0.166667 0.166667 0.166667 0.166667 0.166667
0.166667 0.166667 0.166667 0.166667 0.166667 0.166667
0.333333 0.333333 -0.16667 -0.16667 -0.16667 -0.16667
-0.16667 -0.16667 0.333333 0.333333 -0.16667 -0.16667
-0.16667 -0.16667 0.333333 0.333333 -0.16667 -0.16667
-0.16667 -0.16667 -0.16667 -0.16667 0.333333 0.333333
-0.16667 -0.16667 -0.16667 -0.16667 0.333333 0.333333
-0.33333 0.333333 -0.16667 0.166667 0.166667 -0.16667
0.166667 -0.16667 0.333333 -0.33333 0.166667 -0.16667
-0.16667 0.166667 -0.33333 0.333333 -0.16667 0.166667
-0.16667 0.166667 0.166667 -0.16667 0.333333 -0.33333
0.166667 -0.16667 -0.16667 0.166667 -0.33333 0.333333
-0.16667 0.166667 0.166667 -0.16667 -0.16667 0.166667
-0.16667 0.166667 0.166667 -0.16667 -0.16667 0.166667
0.166667 -0.16667 -0.16667 0.166667 0.166667 -0.16667
0.166667 -0.16667 -0.16667 0.166667 0.166667 -0.16667
-0.16667 0.166667 0.166667 -0.16667 -0.16667 0.166667
A2 0.333333 0.333333 -0.16667 -0.16667 -0.16667 -0.16667 A3 0.333333 -0.33333 0.166667 -0.16667 -0.16667 0.166667 A4 0.166667 -0.16667 -0.16667 0.166667 0.166667 -0.16667 A1SIGM1 0.333333 0.333333 0.333333 0.333333 0.333333 0.333333
0.333333 0.333333 0.333333 0.333333 0.333333 0.333333
0.333333 0.333333 0.333333 0.333333 0.333333 0.333333
0.333333 0.333333 0.333333 0.333333 0.333333 0.333333
0.333333 0.333333 0.333333 0.333333 0.333333 0.333333
0.333333 0.333333 0.333333 0.333333 0.333333 0.333333
A1SIGM2 0.055556 0.055556 0.055556 0.055556 0.055556 0.055556
0.055556 0.055556 0.055556 0.055556 0.055556 0.055556
0.055556 0.055556 0.055556 0.055556 0.055556 0.055556
0.055556 0.055556 0.055556 0.055556 0.055556 0.055556
0.055556 0.055556 0.055556 0.055556 0.055556 0.055556
0.055556 0.055556 0.055556 0.055556 0.055556 0.055556
A2SIGM1 0.666667 0.666667 -0.33333 -0.33333 -0.33333 -0.33333
0.666667 0.666667 -0.33333 -0.33333 -0.33333 -0.33333
-0.33333 -0.33333 0.666667 0.666667 -0.33333 -0.33333
-0.33333 -0.33333 0.666667 0.666667 -0.33333 -0.33333
-0.33333 -0.33333 -0.33333 -0.33333 0.666667 0.666667
-0.33333 -0.33333 -0.33333 -0.33333 0.666667 0.666667
0.111111 0.111111 -0.05556 -0.05556 -0.05556 -0.05556
-0.05556 -0.05556 0.111111 0.111111 -0.05556 -0.05556
-0.05556 -0.05556 0.111111 0.111111 -0.05556 -0.05556
-0.05556 -0.05556 -0.05556 -0.05556 0.111111 0.111111
-0.05556 -0.05556 -0.05556 -0.05556 0.111111 0.111111
A2SIGM2 0.111111 0.111111 -0.05556 -0.05556 -0.05556 -0.05556 A3SIGM1 1.1E-16 1.1E-165.51E-175.51E-17 0 0 1.65E - 16 1.65E - 16 5.51E - 17 5.51E - 17 0 0 5.51E-175.51E-17 1.1E-16 1.1E-16 0 0 5.51E - 17 5.51E - 17 1.65E - 16 1.65E - 16 0 0 0 0 0 0 - 1 . 1 E - 16 - 1 . 1 E - 16 0 0 0 0 - 1 . 1 E - 16 - 1 . 1 E - 16 A3SIGM2 0.333333 -0.33333 0.166667 -0.16667 -0.16667 0.166667 -0.33333 0.333333 -0.16667 0.166667 0.166667 -0.16667 0.166667 -0.16667 0.333333 -0.33333 0.166667 -0.16667 -0.16667 0.166667 -0.33333 0.333333 -0.16667 0.166667 -0.16667 0.166667 0.166667 -0.16667 0.333333 -0.33333 0.166667 -0.16667 -0.16667 0.166667 -0.33333 0.333333
A4SIGM1 1 . 1 E - 16 1 . 1 E - 16 0 0 0 0 5 . 5 1 E - 17 5 . 5 1 E - 17 0 0 0 0 0 0 1.1E-16 1.1E-16 0 0 0 0 5 . 5 1 E - 17 5 . 5 1 E - 17 0 0 0 0 0 0 1 . 1 E - 16 1 . 1 E - 16 0 0 0 0 1 . 1 E - 16 1 . 1 E - 16 A4SIGM2 0.166667 -0.16667 -0.16667 0.166667 0.166667 -0.16667 -0.16667 0.166667 0.166667 -0.16667 -0.16667 0.166667 -0.16667 0.166667 0.166667 -0.16667 -0.16667 0.166667 0.166667 -0.16667 -0.16667 0.166667 0.166667 -0.16667 0.166667 -0.16667 -0.16667 0.166667 0.166667 -0.16667 -0.16667 0.166667 0.166667 -0.16667 -0.16667 0.166667 XA 1X
XA2X
0.666667 0.666667 0.666667 0.666667 0.666667 0.666667 0.666667 0.666667 0.666667
0.333333 -0.16667 -0.16667 -0.16667 0.333333 -0.16667 -0.16667 -0.16667 0.333333
XA3X
XA4X
1 -0.5 -0.5 -0.5 1 -0.5 -0.5 -0.5 1
1.65E-16 2.76E-17 2.76E-17 2.76E-17 1.65E-16 2.76E-17 -2.8E-17 -2.7E-17 1.65E-16
A3STAR 0.333333 0.166667 0.166667 -0.16667 -0.16667 -0.33333 0.166667 0.333333 -0.16667 -0.33333 0.166667 -0.16667 0.166667 -0.16667 0.333333 0.166667 -0.33333 -0.16667 -0.16667 -0.33333 0.166667 0.333333 -0.16667 0.166667 -0.16667 0.166667 -0.33333 -0.16667 0.333333 0.166667 -0.33333 -0.16667 -0.16667 0.166667 0.166667 0.333333 XA3STARX 2.76E-16 1.65E-16 0 QUIT; RUN;
TRA3ST1 TRA3ST2 1.65E-16 2.76E-16 0
0 0 -2.2E - 16
3
1
In the following discussion, the computer output is translated into the results that appear in Section 7.2. 13 | J3 - SIG1 13|
113-=J3
=SIG2
SO
tr2[I3 |
+ cr2T [I3| ( I 3 - - ~ J 3 ) ] - tr2 SIG1 + tr2T SIG2 M[I3 | J3]M' = 13 | J2 = SIGM1
M[13|
1
_1~J3) M'= [13| (12 _ ~ j2)] = SIGM2
so
M{cr2[13 |
+t72T [13| (13--~J3)] }M '-Cr2 SIGMI +O'2T SIGM2 1
1
~J3 | ~J2 = A1
( 13- 3J3 ) | ~J2 1 = A2
1121 _l I 1
-
1
3 -1
2
= A3
2
1
1] 1 1 | ~J2 = A4
1 1
-1
3 1
-1
1
1 |
1
1
A1M[I3 | J3 ]M' = A 1SIGM 1 = 2A 1 1
A1M [13 | (13 - ~J3)l M' = A1SIGM2 = -A1 3 SO
AIM{or2[13 | J3] + tr2r [13 | (13 - ~J3)] } M' = o.2 A1SIGM1 + cr2r A 1SIGM2
A2M[13 | J3]M' = A2SIGM1 = 2A2 A2M[13 | (I3
_
1j
1
SO
A2M{cr2[13 |
+trOT [13| (13--~J3)] }M' = o.2A2SIGM 1 + cr~rA2SIGM2 1
A3M[I3 | J3]M' -- A3SIGM1 - -
06x 6
[ (13 - ~J3')M'- A3SIGM2 = A3
A3M 13 | SO
A3M{tr2[13 |
+cr2r [13@ (13 - ~ 1j 3)]} M' = tr2A3SIGM1 + o2r A3SIGM2 = cr~r A3 A4M[13 | J3]M' = A4SIGM1 = 06x6 1
A4M [13 | (13 - ~J3) ] M' - A4SIGM2 -- A4 SO
A4M{cr~[13 | J3] + cr~r [13 | (I3 - ~J3) ] }M' - o-2A4SIGM1 + cr~rA4SIGM2 - cr~rA4 2 X'A1X - XA1X- :J3 3 SO
2 /3'X'A1X/3 =/3'~J3/3 - ~(/31 + f12 -4- f13) 2
X'A2X : XA2X = ~1 13- ~
3
SO
~'X'AzX/~ - / ~ ' ~
1j) 1 13 - ~ 3 /~ = ~(/~12+ / ~ + r - ~1/~2 -/~/~3 - r
2
X'A3X=XA3X= ( I 3 - ~ J 3 ) SO
/~'X'A3XC] : / ~ ' (I3
--
~J3),8 = (,62 +/~2 + ,823 --
X ' A 4 X ---
XA4X-
/~1] ~2 -- /~1r
-- /~21~3)2
03x3
so
fl'X'A4Xfl = 0 !
,
X A3X :
XA3STARX - -
03x3
SO
/~'X'A~Xfl -- 0 tr[A~o'seM(I3 | J3)M' + tr{A~o~rM [13 | (13 - ~J3)]M'} = cr~TRA3ST1 + cr2TTRA3ST2 : 34 +
A2.2
COMPUTER OUTPUT FOR SECTION 7.3
Table A2.2 provides a summary of the notation used in Section 7.3 of the text and the corresponding names used in the SAS PROC IML computer program. The program names XSTAR, SIG1, SIG2, X1, Q3, Z1, X2, Z2, and M are defined as in Section A2.1 and therefore are not redefined in Table A2.2.
PROC IML Program Name
R,D
R,D SIGRM1 SIGRM2 SIGRM3 X S1, T1, $2 A1 . . . . . A5 A1SIGRM1 . . . . . A5SIGRM1 A1SIGRM2 . . . . . A5SIGRM2
RM[13 | J3]M'R' RM[13 | (13 - ~J3)]M 1 ,R , RMI9M~R ' X = RMX*
S1, T1,S2 A1 . . . . . A5 A1RM[I3 | J3]M'R'A1 . . . . . AsRM[I3 | Ja]M'R'A5 A1RM[I3 | (I3 1Ja)]M'R'A1 . . . . . A5RM[I3 | (I3 - 89 AII9A1 . . . . . A519A5 X/A1X . . . . . X'AsX -
-
s~, TT A~,A; A~RM[13 | Ja]M'R'A2, A;RM[I3 | Ja]M'R'A; 1 )]M'R'A~ A~RM[I3 | (I3 - ~J3 A~RM[I3 | (I3 - ~ ~J3)]M'R'A~ AEI9A2, AaI9A 3 * X'A~X X ' A2X, tr{A~RM[I3 | J3]M'R'} tr{A~RM[I3 | (13 - ~J3)]M 1 ,R , } tr{A~I9}
A 1SIGRM3 . . . . . A5 SIGRM3 XA1X . . . . . XA5X S2STAR, T1 STAR A2STAR, A3STAR A2STARM 1, A3STARM 1 A2STARM2 A3STARM2 A2STARM3, A3STARM3 XA2STARX, XA3STARX TRA3ST1 TRA3ST2 TRA3ST3
The SAS PROC IML program for Section 7.3 follows:
PROC IML;
XSTAR=J(3, 1, 1)@1(3); SIG1=I(3)@J(3, 3, 1); SIG2=I(3)@(I(3)-(1/3)#J(3, 3, 1)); Xl=J(9, 1, 1); Q3={ 1 1, -1 1, 0 -2}; ZI=Q3@J(3, 1, 1); X2=J(3, 1, 1)@Q3; Z2=Q3@Q3; M={O 1 0 001 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 00 0 0 1 0 0 0 0 1 0 } ;
O, O, O, O, O,
R=BLOCK(1,J(2, 1, 1), J(2, 1, 1), 1, 1, J(2, 1, 1)); D=T(R)*R; X=R*M*XSTAR; SIGRM 1=R*M*SIG 1*T(M)*T(R); SIGRM2=R*M*SIG2*T(M)*T(R); SIGRM3=I(9); $1 =R*M*X1; TI=R*M*(XlIIZl); S2=R*M*(Xl IlZl I IX2); A1 =$1 *INV(T(S 1)*$1 )*T(S 1); A2=TI*INV(T(T1)*T1)*T(T1)-A1; A3=S2*INV(T(S2)*S2)*T(S2)-A1 - A2; A4=R*INV(D)*T(R) - A1 - A 2 -A3; A5=1(9)- R*INV(D)*T(R); A1SIGRM 1=A1 *SIGRM 1*A1; A 1SIG RM2=A 1*SIG RM2*A 1 ; A 1SIG RM3=A 1*SIG RM3*A 1 ; A2SIG RM 1=A2*SIG RM 1*A2; A2SIGRM2=A2*SIGRM2*A2; A2SIGRM3=A2*SIGRM3*A2; A3SIG RM 1=A3*SIG RM 1*A3; A3SIGRM2=A3*SIGRM2*A3; A3SIG RM3=A3*SIG RM3*A3; A4SIG RM 1=A4*SIG RM 1*A4; A4SIG RM2=A4*SIGRM2*A4; A4SIG RM3=A4*SIGRM3*A4; A5SIG aM 1=A5*SIG aM 1*A5; A5SIGRM2=A5*SIGRM2*A5; A5SIGRM3=A5*SIGRM3*A5; XAlX = T(X)*AI*X; XA2X = T(X)*A2*X; XA3X = T(X)*A3*X; XA4X = T(X)*A4*X; XA5X = T(X)*A5*X; S2STAR=R*M*(Xl IIX2); T1STAR=R*M*(Xl IIX211Z1); A2STAR=S2STAR*INV(T(S2STAR)*S2STAR)*T(S2STAR)-A1 ; A3STAR=T1 STAR* IN V(T(T1STAR)*T 1STAR)*T(T1STAR)-A2STAR-A1 ; A2STARM 1=A2STA R* SIGRM 1*A2STAR; A2STARM2=A2STAR*SIGRM2*A2STAR; A2STARM3=A2STAR*SIGRM3*A2STAR; A3STA RM 1=A3STAR*SIG RM 1*A3STAR;
A3STARM2=A3STAR*SIGRM2*A3STAR; A3STARM3=A3STAR*SIG RM3*A3STAR; XA2STARX = T(X)*A2STAR*X; XA3STARX = T(X)*A3STAR*X; TRA3ST1 =TRACE(A3STAR*SIGRM 1); TRA3ST2=TRACE(A3STAR*SIG RM2); TRA3ST3=TRACE(A3STAR*SIG RM3); PRINT A1 A2 A3 A4 A5 A1SIGRM1 A1SIGRM2 A1SIGRM3 A2SIGRM1 A2SIGRM2 A2SIGRM3 A3SIGRM1 A3SIGRM2 A3SIGRM3 A4SIGRM1 A4SIGRM2 A4SIGRM3 A5SIGRM1 A5SIGRM2 A5SIGRM3 XAlX XA2X XA3X XA4X XA5X XA3STARX TRA3ST1 TRA3ST2 TRA3ST3; QUIT; RUN; Since the program output is lengthy, it is omitted. In the following discussion, the computer output is translated into the results that appear in Section 7.3.
A1RM[I3 | J3]M'R'A1 = AISIGRMI = 3A1 AI RM
13 i~)
13 -- ~ J3
A1 - - m I SIGRM2 = 3 A I I 9 A 1 ---= A I S I G R M 3
so
AIEA1--
1
= A1
( 3cr~ + ~CrST 22 +C~2(8T)) A1
A2RM[I3 | J3]M'R'A2 = A2SIGRMI = 3A2
[ (
AERM 13|
13-g
3 M'R'A2=A2SIGRM2=~ 1,)1 A219A2 = A2SIGRM3 = A2
so
A2EA2-
2 2 )A2 3cr~ + ~crEr + crg(sr)
A3RM[I3 | J3]M'R'A3 = A3SIGRM1
[ (
A3RM 13 |
= 09•
1)1 M'R'A3 = A3SIGRM2 = ~A3 4
13 - ~J3
A319A3 = A3SIGRM3 = A3
SO
A4RM[I3 | J3]M'R'A4 = A4SIGRM1
= 09•
4 A4RM [13 | (13 - ~J3) ] M'R'A4 - A4SIGRM2 - ~A4 A419A4 - A4SIGRM3 - A4 SO A4EA4 -
4 2 -+-CrR(BT 2 ))A4 -~rBT
AsRM[13 | J3]M'R'A5 - A5SIGRM1 - 09x9 AsRM [I3 | (13 - ~J3) ] M'R'A5 - A5SIGRM2 - 09x9 AsI9A5 - A5SIGRM3 = As so 2 ) As AsEA5 - oR(sr
X'A1X = XA1X - J3 so /3'X'A1X,O : ,0'J3/3- (131 -+- f12 -4- f13)2
(l)
X ' A 2 X - XA2X = 2 I3 -
5
J3
so
,,X,A2X,-,,2(
I3 - -
X'A3X = X A 3 X -
2
~J3 /3 - 5(/~12 + / ~ + / ~
- ~1/~2 - / ~ 1 / ~ 3 - / ~ 2 / ~ 3 ) 2
4( l)
~ 13 - ~J3
so
/3,XtA3X~_~,4~ ( 13 -
1 ) /3 - ~(/~12 4 ~J3 + f122 + / ~ -/~1r
X'A4X - XA4X - 03x 3
-/~1/~3 - r
2
so /3'X'A4X/~
-
0
X'AsX = XA5X
-- 03•
so
/TX'AsX/3 - 0
X'A~X - XA3STARX
-- 03•
SO !
!
,
/3 X AgX/3 = 0
tr[A~cr2RM(13 | J3)M'R'] + tr A3CrBTRM 13 | 9
13 - ~J3
2
+ tr{AaCrR(BT)I9} = ~2 TRA3ST1 + cr2r TRA3ST2 2 TRA3ST3 + aR(BT) =
4ff 2
2 ). + ~02T + 2tTR(BT
) M'R'}
APPENDIX 3: COMPUTER OUTPUT FOR CHAPTER 8
Table A3.1 provides a summary of the notation used in Section 8.3 of the text and the corresponding names used in the SAS PROC IML computer programs and outputs.

Table A3.1 Section 8.3 Notation and Corresponding PROC IML Program Names

Section 8.3 Notation        PROC IML Program Names
b, t, k                     B, T, K
N, M                        N, M
The SAS PROC IML programs and the outputs for Section 8.3 follow:
PROC IML;
/* THIS PROGRAM USES THE DIMENSIONS B, T, AND K AND THE T x B INCIDENCE
   MATRIX N TO CREATE THE BK x BT PATTERN MATRIX M FOR A BALANCED
   INCOMPLETE BLOCK DESIGN. THE PROGRAM IS RUN FOR THE EXAMPLE IN
   FIGURE 7.2.1 WITH B=T=3, K=R=2, LAMBDA=1. THE PROGRAM CAN BE
   GENERALIZED TO PRODUCE THE BK x BT PATTERN MATRIX M FOR ANY BALANCED
   INCOMPLETE BLOCK DESIGN BY SIMPLY INPUTTING THE APPROPRIATE
   DIMENSIONS B, T, K AND THE APPROPRIATE INCIDENCE MATRIX N. */
B=3; T=3; K=2;
N={0 1 1,
   1 0 1,
   1 1 0};
BK=B#K;
BT=B#T;
M=J(BK, BT, 0);
ROW=1;
COL=0;
DO I=1 TO B;
  DO J=1 TO T;
    COL=COL+1;
    IF N[J, I]=1 THEN M[ROW, COL]=1;
    IF N[J, I]=1 THEN ROW=ROW+1;
  END;
END;
PRINT M;
M
0 1 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0
0 0 0 0 0 1 0 0 0
0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 1 0

QUIT; RUN;

PROC IML;
/* THIS PROGRAM USES THE DIMENSIONS B, T, AND K AND THE BK x BT PATTERN
   MATRIX M TO CREATE THE T x B INCIDENCE MATRIX N FOR A BALANCED
   INCOMPLETE BLOCK DESIGN. THE PROGRAM IS RUN FOR THE EXAMPLE IN
   FIGURE 7.2.1 WITH B=T=3, K=R=2, LAMBDA=1. THE PROGRAM CAN BE
   GENERALIZED TO PRODUCE THE T x B INCIDENCE MATRIX N FOR ANY BALANCED
   INCOMPLETE BLOCK DESIGN BY SIMPLY INPUTTING THE APPROPRIATE
   DIMENSIONS B, T, K AND THE APPROPRIATE PATTERN MATRIX M. */
B=3; T=3; K=2;
M={0 1 0 0 0 0 0 0 0,
   0 0 1 0 0 0 0 0 0,
   0 0 0 1 0 0 0 0 0,
   0 0 0 0 0 1 0 0 0,
   0 0 0 0 0 0 1 0 0,
   0 0 0 0 0 0 0 1 0};
BK=B#K;
N=J(T, B, 0);
ROW=1;
COL=0;
DO I=1 TO B;
  DO J=1 TO T;
    COL=COL+1;
    IF M[ROW, COL]=1 THEN N[J, I]=1;
    IF M[ROW, COL]=1 & ROW < BK THEN ROW=ROW+1;
  END;
END;
PRINT N;

N
0 1 1
1 0 1
1 1 0

QUIT; RUN;
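The two IML routines above can be mirrored in Python; the sketch below builds M from N and recovers N from M for the Figure 7.2.1 design (the function names are mine, not the book's):

```python
import numpy as np

# Python analogue of the Appendix 3 routines: build the bk x bt pattern matrix
# M from the t x b incidence matrix N of a balanced incomplete block design,
# then recover N from M.
def pattern_from_incidence(N, b, t, k):
    M = np.zeros((b * k, b * t), dtype=int)
    row = 0
    for i in range(b):                  # block i
        for j in range(t):              # treatment j
            if N[j, i] == 1:            # treatment j appears in block i
                M[row, i * t + j] = 1
                row += 1
    return M

def incidence_from_pattern(M, b, t):
    N = np.zeros((t, b), dtype=int)
    for i in range(b):
        for j in range(t):
            if M[:, i * t + j].any():   # some observation carries block i, trt j
                N[j, i] = 1
    return N

N = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]])               # Figure 7.2.1 design: b = t = 3, k = 2
M = pattern_from_incidence(N, 3, 3, 2)
assert (incidence_from_pattern(M, 3, 3) == N).all()
print(M)
```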
APPENDIX 4: COMPUTER OUTPUT FOR CHAPTER 9
Table A4.1 provides a summary of the notation used in Section 9.4 of the text and the corresponding names used in the SAS PROC GLM computer program and output.
Table A4.1 Section 9.4 Notation and Corresponding PROC GLM Program Names

Section 9.4 Notation        PROC GLM Program Names
Levels of factor A          A
Replication number          R
Observed Y                  Y
The SAS PROC GLM program and the output for Section 9.4 follow:
DATA A;
INPUT A R Y;
CARDS;
1 1 101
1 2 105
1 3 94
2 1 84
2 2 88
3 1 32
PROC GLM DATA=A;
CLASSES A;
MODEL Y=A/P SOLUTION;
PROC GLM DATA=A;
CLASSES A;
MODEL Y=A/P NOINT SOLUTION;
QUIT;
RUN;
General Linear Models Sums of Squares and Estimates for the Less Than Full Rank Model Dependent Variable: Y Source Model Error Cor. Total Parameter
DF
Sum of Squares
Mean Square
2 3 5
3480.00 70.00 3550.00
1740.00 23.33
Estimate
INTERCEPT A
32.00 68.00 54.00 0.00
B B B B
F Value
Pr>F
74.57
0.0028
TforH0: Parameter=0
Pr>ITI
StdErrorof Estimate
6.62 12.19 9.13 .
0.0070 0.0012 0.0028 .
4.83045892 5.57773351 5.91607978 .
NOTE" The X'X matrix has been found to be singular and a generalized inverse was used to solve the normal equations. Estimates followed by the letter 'B' are biased, and are not unique estimators of the parameters. General Linear Models Estimates for the Full Rank Mean Model Parameter A
1 2 3
TforH0: Parameter=0
Pr>ITI
Estimate
StdErrorof Estimate
100.00 86.00 32.00
35.86 25.18 6.62
0.0001 0.0001 0.0070
2.78886676 3.41565026 4.83045892
In the following discussion, the preceding computer output is translated into the results that appear in Section 9.4. The less than full rank parameter estimates in the output are titled INTERCEPT, A 1, 2, 3. These four estimates are SAS's solution for β̂⁰ = (32, 68, 54, 0)′. The mean model estimates in the output are titled A 1, 2, 3. These three estimates equal μ̂ = (100, 86, 32)′. Note that the ANOVA table given in the SAS output is generated by the PROC GLM statement for the less than full rank model. However, the sums of squares presented are equivalent for the mean model and the less than full rank model. Therefore, the sum of squares labeled MODEL in the output is the sum of squares due to the treatment combinations for the data in Figure 9.4.1. Likewise, the sums of squares labeled ERROR and COR. TOTAL in the output are the sums of squares due to the residual and the sum of squares total minus the sum of squares due to the overall mean for the data in Figure 9.4.1, respectively. As a final comment, for a problem with more than one fixed factor, the CLASSES statement should list the names of all fixed factors and the MODEL statement should list the highest order fixed factor interaction between the '=' sign and the '/'. For example, if there are three fixed factors A, B, and C, then the CLASSES statement is
CLASSES A B C;

and the MODEL statement is

MODEL Y=A*B*C/P SOLUTION; (for the less than full rank model)
MODEL Y=A*B*C/P NOINT SOLUTION; (for the mean model)
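The relationship between the two parameterizations can be illustrated in Python: fitting the mean model gives the cell means directly, while a generalized-inverse solution of the less than full rank model differs in the parameter vector but gives identical fitted values (the pseudoinverse solution below is one of many; SAS's generalized-inverse solution (32, 68, 54, 0)′ is another):

```python
import numpy as np

# The six observations of Figure 9.4.1, factor A at three levels.
y = np.array([101., 105., 94., 84., 88., 32.])
level = np.repeat([0, 1, 2], [3, 2, 1])       # 0-indexed factor levels

W = np.zeros((6, 3))
W[np.arange(6), level] = 1                    # full rank mean-model design
mu_hat = np.linalg.lstsq(W, y, rcond=None)[0]
print(mu_hat)                                 # cell means (100, 86, 32)

X = np.hstack([np.ones((6, 1)), W])           # less than full rank design
beta0 = np.linalg.pinv(X) @ y                 # one generalized-inverse solution
print(np.allclose(X @ beta0, W @ mu_hat))     # identical fitted values
```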
APPENDIX 5: COMPUTER OUTPUT FOR CHAPTER 10
A5.1
COMPUTER OUTPUT FOR SECTION 10.2
Table A5.1 provides a summary of the notation used in Section 10.2 of the text and the corresponding names used in the SAS PROC IML computer program and output.

Table A5.1 Section 10.2 Notation and Corresponding PROC IML Program Names

Section 10.2 Notation                                           PROC IML Program Names
I₃ ⊗ J₂, I₃ ⊗ (I₂ − (1/2)J₂)                                    SIG1, SIG2
X1, Q2, Q3, X2, Z1, Z2, M, R, D, C                              X1, Q2, Q3, X2, Z1, Z2, M, R, D, C
RM(I₃ ⊗ J₂)M′R′, RM[I₃ ⊗ (I₂ − (1/2)J₂)]M′R′, I₈               SIGRM1, SIGRM2, SIGRM3
T1, A1, A2, A3, A4, A5                                          T1, A1, A2, A3, A4, A5
tr[A1RM(I₃ ⊗ J₂)M′R′], ..., tr[A5RM(I₃ ⊗ J₂)M′R′]              TRA1SIG1, ..., TRA5SIG1
tr{A1RM[I₃ ⊗ (I₂ − (1/2)J₂)]M′R′}, ..., tr{A5RM[I₃ ⊗ (I₂ − (1/2)J₂)]M′R′}   TRA1SIG2, ..., TRA5SIG2
tr(A1I₈), ..., tr(A5I₈)                                         TRA1SIG3, ..., TRA5SIG3
C′R′A1RC, ..., C′R′A5RC                                         RCA1RC, ..., RCA5RC
The SAS PROC IML program and output for Section 10.2 follow:
PROC IML;
SIG1=I(3)@J(2, 2, 1);
SIG2=I(3)@(I(2)-(1/2)#J(2, 2, 1));
X1=J(6, 1, 1);
Q2={1, -1};
Q3={ 1  1,
    -1  1,
     0 -2};
X2=J(3, 1, 1)@Q2;
Z1=Q3@J(2, 1, 1);
Z2=Q3@Q2;
M={1 0 0 0 0 0,
   0 1 0 0 0 0,
   0 0 1 0 0 0,
   0 0 0 0 1 0,
   0 0 0 0 0 1};
R=BLOCK(1, 1, J(2, 1, 1), 1, J(3, 1, 1));
D=T(R)*R;
C={1 0,
   0 1,
   1 0,
   1 0,
   0 1};
SIGRM1=R*M*SIG1*T(M)*T(R);
SIGRM2=R*M*SIG2*T(M)*T(R);
SIGRM3=I(8);
S1=R*M*X1;
S2=R*M*(X1||X2);
T1=R*M*(X1||X2||Z1);
A1=S1*INV(T(S1)*S1)*T(S1);
A2=S2*INV(T(S2)*S2)*T(S2) - A1;
A3=T1*INV(T(T1)*T1)*T(T1) - A1 - A2;
A4=R*INV(D)*T(R) - A1 - A2 - A3;
A5=I(8) - R*INV(D)*T(R);
TRA1SIG1=TRACE(A1*SIGRM1);
TRA1SIG2=TRACE(A1*SIGRM2);
TRA1SIG3=TRACE(A1*SIGRM3);
TRA2SIG1=TRACE(A2*SIGRM1);
TRA2SIG2=TRACE(A2*SIGRM2);
TRA2SIG3=TRACE(A2*SIGRM3);
TRA3SIG1=TRACE(A3*SIGRM1);
TRA3SIG2=TRACE(A3*SIGRM2);
TRA3SIG3=TRACE(A3*SIGRM3);
TRA4SIG1=TRACE(A4*SIGRM1);
TRA4SIG2=TRACE(A4*SIGRM2);
TRA4SIG3=TRACE(A4*SIGRM3);
TRA5SIG1=TRACE(A5*SIGRM1);
TRA5SIG2=TRACE(A5*SIGRM2);
TRA5SIG3=TRACE(A5*SIGRM3);
RCA1RC=T(C)*T(R)*A1*R*C;
RCA2RC=T(C)*T(R)*A2*R*C;
RCA3RC=T(C)*T(R)*A3*R*C;
RCA4RC=T(C)*T(R)*A4*R*C;
RCA5RC=T(C)*T(R)*A5*R*C;
PRINT TRA3SIG1 TRA3SIG2 TRA3SIG3 TRA4SIG1 TRA4SIG2 TRA4SIG3
      TRA5SIG1 TRA5SIG2 TRA5SIG3 RCA1RC RCA2RC RCA3RC RCA4RC RCA5RC;
TRA3SIG1   TRA3SIG2   TRA3SIG3   TRA4SIG1   TRA4SIG2   TRA4SIG3
       4        0.8          2          0        1.2          1

TRA5SIG1   TRA5SIG2   TRA5SIG3
       0          0          3

RCA1RC
 2   2
 2   2

RCA2RC
 2  -2
-2   2

RCA3RC
 0   0
 0   0

RCA4RC
 0   0
 0   0

RCA5RC
 0   0
 0   0

QUIT;
RUN;
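As a cross-check outside the text, the same matrices and traces can be reproduced in Python with NumPy. The construction below mirrors the IML program (names are kept the same; the pseudoinverse-based projector replaces the explicit INV calls) and reproduces the printed traces:

```python
import numpy as np

I, J = np.eye, np.ones

# Covariance pieces: I3 (x) J2 and I3 (x) (I2 - (1/2)J2), as in the IML code.
SIG1 = np.kron(I(3), J((2, 2)))
SIG2 = np.kron(I(3), I(2) - 0.5 * J((2, 2)))

X1 = J((6, 1))
Q2 = np.array([[1.], [-1.]])
Q3 = np.array([[1., 1.], [-1., 1.], [0., -2.]])
X2 = np.kron(J((3, 1)), Q2)
Z1 = np.kron(Q3, J((2, 1)))

# M deletes the empty (2,2) cell; R replicates the 5 cells 1, 1, 2, 1, 3 times.
M = I(6)[[0, 1, 2, 4, 5], :]
cell = [0, 1, 2, 2, 3, 4, 4, 4]          # cell index for each of the 8 observations
R = np.zeros((8, 5))
R[np.arange(8), cell] = 1.0
C = np.array([[1., 0.], [0., 1.], [1., 0.], [1., 0.], [0., 1.]])

def proj(X):
    """Orthogonal projector onto the column space of X."""
    return X @ np.linalg.pinv(X)

RM = R @ M
A1 = proj(RM @ X1)
A2 = proj(RM @ np.hstack([X1, X2])) - A1
A3 = proj(RM @ np.hstack([X1, X2, Z1])) - A1 - A2
A4 = proj(R) - A1 - A2 - A3
A5 = I(8) - proj(R)

SIGRM1 = RM @ SIG1 @ RM.T
SIGRM2 = RM @ SIG2 @ RM.T
RC = R @ C

# Coefficients of the variance components in E[Y'A3Y]:
print(np.trace(A3 @ SIGRM1), np.trace(A3 @ SIGRM2), np.trace(A3))  # ≈ 4, 0.8, 2
print(RC.T @ A1 @ RC)                                              # ≈ matrix of 2's
```

The traces agree with the printed TRA3SIG1, TRA3SIG2, and TRA3SIG3, and RC.T @ Ai @ RC reproduces the RCAiRC matrices.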
In the following discussion, the computer output is translated into the results that appear in Section 10.2.

E[Y'A3Y] = σ²_B tr[A3RM(I3 ⊗ J2)M'R'] + σ²_TB tr{A3RM[I3 ⊗ (I2 - (1/2)J2)]M'R'} + σ²_R(BT) tr(A3I8)
         = σ²_B TRA3SIG1 + σ²_TB TRA3SIG2 + σ²_R(BT) TRA3SIG3
         = 4σ²_B + 0.8σ²_TB + 2σ²_R(BT)

E[Y'A4Y] = σ²_B tr[A4RM(I3 ⊗ J2)M'R'] + σ²_TB tr{A4RM[I3 ⊗ (I2 - (1/2)J2)]M'R'} + σ²_R(BT) tr(A4I8)
         = σ²_B TRA4SIG1 + σ²_TB TRA4SIG2 + σ²_R(BT) TRA4SIG3
         = 1.2σ²_TB + σ²_R(BT)

E[Y'A5Y] = σ²_B tr[A5RM(I3 ⊗ J2)M'R'] + σ²_TB tr{A5RM[I3 ⊗ (I2 - (1/2)J2)]M'R'} + σ²_R(BT) tr(A5I8)
         = σ²_B TRA5SIG1 + σ²_TB TRA5SIG2 + σ²_R(BT) TRA5SIG3
         = 3σ²_R(BT)

Note that C'R'A3RC = RCA3RC = 0_2×2, C'R'A4RC = RCA4RC = 0_2×2, and C'R'A5RC = RCA5RC = 0_2×2, which implies μ'C'R'A3RCμ = μ'C'R'A4RCμ = μ'C'R'A5RCμ = 0. Therefore, E[Y'A3Y], E[Y'A4Y], and E[Y'A5Y] do not involve μ.
A5.2 COMPUTER OUTPUT FOR SECTION 10.4
Table A5.2 provides a summary of the notation used in Section 10.4 of the text and the corresponding names used in the SAS PROC VARCOMP computer program and output.

Table A5.2 Section 10.4 Notation and Corresponding PROC VARCOMP Program Names

Section 10.4 Notation                         PROC VARCOMP Program/Output Names
B, T, R, Y, R(BT)                             B, T, R, Y, Error
σ²_B*, σ²_TB*, σ²_R(BT)*, q(τ)                Var(B), Var(T*B), Var(Error), Q(T)
The SAS program and output for Section 10.4 follow:
DATA A;
INPUT B T R Y;
CARDS;
1 1 1 237
1 2 1 178
2 1 1 249
2 1 2 268
3 1 1 186
3 2 1 183
3 2 2 182
3 2 3 165
;
PROC VARCOMP METHOD=TYPE1;
CLASS T B;
MODEL Y=T|B / FIXED=1;
RUN;
PROC VARCOMP METHOD=REML;
CLASS T B;
MODEL Y=T|B / FIXED=1;
QUIT;
RUN;

Variance Components Estimation Procedure
Class Level Information

Class    Levels    Values
T        2         1 2
B        3         1 2 3

Number of observations in data set = 8
Dependent Variable: Y
TYPE 1 SS Variance Component Estimation Procedure

Source             DF    Type I SS     Type I MS
T                   1    6728.0000     6728.0000
B                   2    2770.8000     1385.4000
T*B                 1     740.0333      740.0333
Error               3     385.1667      128.3889
Corrected Total     7   10624.0000

Source    Expected Mean Square
T         Var(Error) + 2 Var(T*B) + Var(B) + Q(T)
B         Var(Error) + 1.4 Var(T*B) + 2 Var(B)
T*B       Var(Error) + 1.2 Var(T*B)
Error     Var(Error)

Variance Component    Estimate
Var(B)                271.7129
Var(T*B)              509.7037
Var(Error)            128.3889
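The Type I estimates in the table above come from equating each mean square to its expected mean square and solving the resulting triangular system from the bottom up. A quick arithmetic check, outside the text, sketched in Python:

```python
# Solve the expected mean square equations from the Type I table, bottom up.
# The mean squares are copied from the output above.
ms_b, ms_tb, ms_error = 1385.4000, 740.0333, 128.3889

var_error = ms_error                             # E[MS(Error)] = Var(Error)
var_tb = (ms_tb - var_error) / 1.2               # E[MS(T*B)] = Var(Error) + 1.2 Var(T*B)
var_b = (ms_b - var_error - 1.4 * var_tb) / 2.0  # E[MS(B)] = Var(Error) + 1.4 Var(T*B) + 2 Var(B)

print(var_b, var_tb, var_error)  # ≈ 271.7129, 509.7037, 128.3889
```

The three solved values match the printed Var(B), Var(T*B), and Var(Error) estimates to rounding.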
RESTRICTED MAXIMUM LIKELIHOOD Variance Components Estimation Procedure

Iteration    Objective     Var(B)    Var(T*B)    Var(Error)
0            35.946297     30.899    616.604     158.454
1            35.875323     51.780    663.951     144.162
2            35.852268     66.005    693.601     136.830
3            35.845827     81.878    701.484     133.406
4            35.843705     86.686    711.185     131.422
5            35.843069     87.877    718.188     130.334
6            35.842890     88.250    722.237     129.757
7            35.842842     88.421    724.385     129.458
8            35.842830     88.509    725.494     129.305
9            35.842827     88.553    726.060     129.227
10           35.842826     88.576    726.348     129.187
11           35.842825     88.587    726.493     129.167
12           35.842825     88.593    726.567     129.157
13           35.842825     88.596    726.605     129.152
14           35.842825     88.598    726.624     129.149

Convergence criteria met

In the following discussion, the computer output is translated into the results that appear in Section 10.4.
Y'A2Y = Type I SS due to T = 6728.0
Y'A3Y = Type I SS due to B = 2770.8
Y'A4Y = Type I SS due to T*B = 740.03
Y'A5Y = Type I SS due to Error = 385.17
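Type I sums of squares are sequential: each term's SS is the increase in model sum of squares when its columns are added to the design in the stated order. As an out-of-text check in Python with NumPy, the table can be reproduced directly from the data by sequential projections:

```python
import numpy as np

# Data from the PROC VARCOMP example above: factors B, T and response Y.
B = np.array([1, 1, 2, 2, 3, 3, 3, 3])
T = np.array([1, 2, 1, 1, 1, 2, 2, 2])
y = np.array([237., 178., 249., 268., 186., 183., 182., 165.])

def dummies(f):
    # One indicator column per distinct level of factor f.
    return (f[:, None] == np.unique(f)[None, :]).astype(float)

def ss(X):
    # Sum of squares explained by the column space of X (projector via pinv).
    P = X @ np.linalg.pinv(X)
    return float(y @ P @ y)

one = np.ones((8, 1))
XT = np.hstack([one, dummies(T)])
XTB = np.hstack([XT, dummies(B)])
Xfull = np.hstack([XTB, dummies(10 * T + B)])   # add the T*B cells

# Sequential (Type I) sums of squares in the order T, B, T*B:
ss_T = ss(XT) - ss(one)
ss_B = ss(XTB) - ss(XT)
ss_TB = ss(Xfull) - ss(XTB)
ss_error = float(y @ y) - ss(Xfull)
print(ss_T, ss_B, ss_TB, ss_error)  # ≈ 6728.0, 2770.8, 740.0333, 385.1667
```

The four values agree with the Type I SS column of the PROC VARCOMP output, and they sum to the corrected total 10624.0.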
E[Y'A3Y/2] = 2 Var(B) + 1.4 Var(T*B) + Var(Error) = 2σ²_B* + 1.4σ²_TB* + σ²_R(BT)*

E[Y'A4Y/1] = 1.2 Var(T*B) + Var(Error) = 1.2σ²_TB* + σ²_R(BT)*

E[Y'A5Y/3] = Var(Error) = σ²_R(BT)*

A5.3 COMPUTER OUTPUT FOR SECTION 10.6
Table A5.3 provides a summary of the notation used in Section 10.6 of the text and the corresponding names used in the SAS PROC IML computer program and output.

Table A5.3 Section 10.6 Notation and Corresponding PROC IML Program Names

Section 10.6 Notation                                     PROC IML Program Names
I3 ⊗ J2, I3 ⊗ (I2 - (1/2)J2), X1, Q2, Q3                  SIG1, SIG2, X1, Q2, Q3
X2, Z1, Z2, M, R, D, C                                    X2, Z1, Z2, M, R, D, C
RM(I3 ⊗ J2)M'R', RM[I3 ⊗ (I2 - (1/2)J2)]M'R', I8          SIGRM1, SIGRM2, SIGRM3
Σ̂, μ̂w, côv(μ̂w)                                            COVHAT, MUHAT, COVMUHAT
t, t'μ̂w, v̂ar(t'μ̂w)                                        T, TMU, VARTMU
t'μ̂w ± z0.025 [v̂ar(t'μ̂w)]^(1/2)                           LOWERLIM, UPPERLIM
S1, S2, T1, A1, A2, A3, A4, A5                            S1, S2, T1, A1, A2, A3, A4, A5
W, f, Y'A2Y/W                                             W, F, TEST
The SAS PROC IML program and output for Section 10.6 follow:
PROC IML;
Y={237, 178, 249, 268, 186, 183, 182, 165};
SIG1=I(3)@J(2, 2, 1);
SIG2=I(3)@(I(2)-(1/2)#J(2, 2, 1));
X1=J(6, 1, 1);
Q2={1, -1};
Q3={ 1  1,
    -1  1,
     0 -2};
X2=J(3, 1, 1)@Q2;
Z1=Q3@J(2, 1, 1);
Z2=Q3@Q2;
M={1 0 0 0 0 0,
   0 1 0 0 0 0,
   0 0 1 0 0 0,
   0 0 0 0 1 0,
   0 0 0 0 0 1};
R=BLOCK(1, 1, J(2, 1, 1), 1, J(3, 1, 1));
D=T(R)*R;
C={1 0,
   0 1,
   1 0,
   1 0,
   0 1};
SIGRM1=R*M*SIG1*T(M)*T(R);
SIGRM2=R*M*SIG2*T(M)*T(R);
SIGRM3=I(8);
COVHAT=526.56#SIGRM1 + 509.70#SIGRM2 + 128.39#SIGRM3;
MUHAT=INV(T(C)*T(R)*INV(COVHAT)*R*C)*T(C)*T(R)*INV(COVHAT)*Y;
COVMUHAT=INV(T(C)*T(R)*INV(COVHAT)*R*C);
T={1, -1};
TMU=T(T)*MUHAT;
VARTMU=T(T)*COVMUHAT*T;
LOWERLIM=TMU - 1.96#(VARTMU**.5);
UPPERLIM=TMU + 1.96#(VARTMU**.5);
S1=R*M*X1;
S2=R*M*(X1||X2);
T1=R*M*(X1||X2||Z1);
A1=S1*INV(T(S1)*S1)*T(S1);
A2=S2*INV(T(S2)*S2)*T(S2) - A1;
A3=T1*INV(T(T1)*T1)*T(T1) - A1 - A2;
A4=R*INV(D)*T(R) - A1 - A2 - A3;
A5=I(8) - R*INV(D)*T(R);
W=.25#T(Y)*A3*Y + (1.3/1.2)#T(Y)*A4*Y - (0.7/3.6)#T(Y)*A5*Y;
F=W**2/((((.25#T(Y)*A3*Y)**2)/2) + (((1.3/1.2)#T(Y)*A4*Y)**2) + (((0.7#T(Y)*A5*Y/3.6)**2)/3));
TEST=T(Y)*A2*Y/W;
PRINT MUHAT COVMUHAT TMU VARTMU LOWERLIM UPPERLIM W F TEST;

MUHAT
227.94
182.62153

COVMUHAT
295.7818     88.33466
88.33466     418.14449

TMU
45.318467

VARTMU
537.25697

LOWERLIM
-0.111990

UPPERLIM
90.748923

W
1419.5086

F
2.2780974

TEST
4.739663

QUIT;
RUN;
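The quantities W and F in the program are built in the spirit of Satterthwaite's approximation: a linear combination of independent mean squares is treated as approximately a scaled chi-square with degrees of freedom chosen to match its first two moments. A generic sketch, outside the text and with invented numbers (not the text's data):

```python
def satterthwaite(coefs, mean_squares, dfs):
    """Satterthwaite approximation: W = sum(c_i * MS_i) is treated as
    approximately (W/f) * chi-square(f), where
    f = W**2 / sum((c_i * MS_i)**2 / df_i)."""
    terms = [c * ms for c, ms in zip(coefs, mean_squares)]
    W = sum(terms)
    f = W ** 2 / sum(t ** 2 / df for t, df in zip(terms, dfs))
    return W, f

# Hypothetical example: combine two mean squares with 2 df each.
W, f = satterthwaite([1.0, 1.0], [10.0, 4.0], [2, 2])
print(W, f)  # 14.0, 196/58 ≈ 3.379
```

In the IML program, W is such a combination of the quadratic forms Y'A3Y, Y'A4Y, and Y'A5Y (with their 2, 1, and 3 degrees of freedom), F is the corresponding approximate degrees of freedom, and TEST = Y'A2Y/W is referred to an F distribution with 1 and F degrees of freedom.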
References and Related Literature
Anderson, R. L., and Bancroft, T. A. (1952), Statistical Theory in Research, McGraw-Hill Book Co., New York.
Anderson, T. W. (1958), An Introduction to Multivariate Statistical Analysis, John Wiley & Sons, New York.
Anderson, V. L., and McLean, R. A. (1974), Design of Experiments, A Realistic Approach, Marcel Dekker, New York.
Arnold, S. F. (1981), The Theory of Linear Models and Multivariate Analysis, John Wiley & Sons, New York.
Bhat, B. R. (1962), "On the distribution of certain quadratic forms in normal variates," Journal of the Royal Statistical Society, Ser. B 24, 148-151.
Box, G. E. P., Hunter, W. G., and Hunter, J. S. (1978), Statistics for Experimenters, John Wiley & Sons, New York.
Chew, V. (1970), "Covariance matrix estimation in linear models," Journal of the American Statistical Association 65, 173-181.
Cochran, W. G., and Cox, G. M. (1957), Experimental Designs, John Wiley & Sons, New York.
Davenport, J. M. (1971), "A comparison of some approximate F-tests," Technometrics 15, 779-790.
Draper, N. R., and Smith, H. (1966), Applied Regression Analysis, John Wiley & Sons, New York.
Federer, W. T. (1955), Experimental Design, Macmillan, New York.
Graybill, F. A. (1954), "On quadratic estimates of variance components," Annals of Mathematical Statistics 25, No. 2, 367-372.
Graybill, F. A. (1961), An Introduction to Linear Statistical Models, Volume 1, McGraw-Hill Book Co., New York.
Graybill, F. A. (1969), An Introduction to Matrices with Applications in Statistics, Wadsworth Publishing Co., Belmont, CA.
Graybill, F. A. (1976), Theory and Application of the Linear Model, Wadsworth & Brooks/Cole, Pacific Grove, CA.
Guttman, I. (1982), The Analysis of Linear Models, John Wiley & Sons, New York.
Harville, D. A. (1977), "Maximum likelihood approaches to variance component estimation and to related problems," Journal of the American Statistical Association 72, No. 358, 320-338.
Hocking, R. R. (1985), The Analysis of Linear Models, Brooks/Cole Publishing Co., Belmont, CA.
Hogg, R. V., and Craig, A. T. (1958), "On the decomposition of certain χ² variables," Annals of Mathematical Statistics 29, 608-610.
John, P. W. M. (1971), Statistical Design and Analysis of Experiments, Macmillan, New York.
Kempthorne, O. (1952, 1967), The Design and Analysis of Experiments, John Wiley & Sons, New York.
Kshirsagar, A. M. (1983), A Course in Linear Models, Marcel Dekker, New York.
Milliken, G. A., and Johnson, D. E. (1992), Analysis of Messy Data, Volume 1: Designed Experiments, Van Nostrand Reinhold, New York.
Morrison, D. (1967), Multivariate Statistical Methods, McGraw-Hill Book Co., New York.
Moser, B. K. (1987), "Generalized F variates in the general linear model," Communications in Statistics: Theory and Methods 16, 1867-1884.
Moser, B. K., and McCann, M. H. (1996), "Maximum likelihood and restricted maximum likelihood estimators as functions of ordinary least squares and analysis of variance estimators," Communications in Statistics: Theory and Methods 25, No. 3.
Moser, B. K., and Lin, Y. (1992), "Equivalence of the corrected F test and the weighted least squares procedure," The American Statistician 46, No. 2, 122-124.
Moser, B. K., and Marco, V. R. (1988), "Bayesian outlier testing using the predictive distribution for a linear model of constant intraclass form," Communications in Statistics: Theory and Methods 17, 849-860.
Moser, B. K., Stevens, G. R., and Watts, C. L. (1989), "The two sample T test versus Satterthwaite's approximate F test," Communications in Statistics: Theory and Methods 18, 3963-3976.
Myers, R. H., and Milton, J. S. (1991), A First Course in the Theory of Linear Statistical Models, PWS-Kent Publishing Co., Boston.
Nel, D. G., van der Merwe, C. A., and Moser, B. K. (1990), "The exact distributions of the univariate and multivariate Behrens-Fisher statistic with a comparison of several solutions in the univariate case," Communications in Statistics: Theory and Methods 19, 279-298.
Neter, J., Wasserman, W., and Kutner, M. H. (1983), Applied Linear Regression Models, Richard D. Irwin, Homewood, IL.
Patterson, H. D., and Thompson, R. (1971), "Recovery of interblock information when block sizes are unequal," Biometrika 58, 545-554.
Patterson, H. D., and Thompson, R. (1974), "Maximum likelihood estimation of components of variance," Proceedings of the 8th International Biometric Conference, pp. 197-207.
Pavur, R. J., and Lewis, T. O. (1983), "Unbiased F tests for factorial experiments for correlated data," Communications in Statistics: Theory and Methods 13, 3155-3172.
Puntanen, S., and Styan, G. P. (1989), "The equality of the ordinary least squares estimator and the best linear unbiased estimator," The American Statistician 43, No. 3, 153-161.
Rao, C. R. (1965), Linear Statistical Inference and Its Applications, John Wiley & Sons, New York.
Russell, T. S., and Bradley, R. A. (1958), "One-way variances in a two-way classification," Biometrika 45, 111-129.
Satterthwaite, F. E. (1946), "An approximate distribution of estimates of variance components," Biometrics Bulletin 2, 110-114.
Scariano, S. M., and Davenport, J. M. (1984), "Corrected F-tests in the general linear model," Communications in Statistics: Theory and Methods 13, 3155-3172.
Scariano, S. M., Neill, J. W., and Davenport, J. M. (1984), "Testing regression function adequacy with correlation and without replication," Communications in Statistics: Theory and Methods 13, 1227-1237.
Scheffé, H. (1959), The Analysis of Variance, John Wiley & Sons, New York.
Searle, S. R. (1971), Linear Models, John Wiley & Sons, New York.
Searle, S. R. (1982), Matrix Algebra Useful for Statistics, John Wiley & Sons, New York.
Searle, S. R. (1987), Linear Models for Unbalanced Data, John Wiley & Sons, New York.
Seber, G. A. F. (1977), Linear Regression Analysis, John Wiley & Sons, New York.
Smith, J. H., and Lewis, T. O. (1980), "Determining the effects of intraclass correlation on factorial experiments," Communications in Statistics: Theory and Methods 9, 1353-1364.
Smith, J. H., and Lewis, T. O. (1982), "Effects of intraclass correlation on covariance analysis," Communications in Statistics: Theory and Methods 11, 71-80.
Snedecor, W. G., and Cochran, W. G. (1978), Statistical Methods, The Iowa State University Press, Ames, IA.
Steel, R. G., and Torrie, J. H. (1980), Principles and Procedures of Statistics, McGraw-Hill Book Co., New York.
Thompson, W. A. (1962), "The problem of negative estimates of variance components," Annals of Mathematical Statistics 33, 273-289.
Weeks, D. L., and Trail, S. M. (1973), "Extended complete block designs generated by BIBD," Biometrics 29, No. 3, 565-578.
Weeks, D. L., and Urquhart, N. S. (1978), "Linear models in messy data: Some problems and alternatives," Biometrics 34, No. 4, 696-705.
Weeks, D. L., and Williams, D. R. (1964), "A note on the determination of correctness in an N-way cross classification," Technometrics 6, 319-324.
Zyskind, G. (1967), "On canonical forms, non-negative covariance matrices and best and simple least squares linear estimators in linear models," Annals of Mathematical Statistics 38, 1092-1109.
Zyskind, G., and Martin, F. B. (1969), "On best linear estimation and a general Gauss-Markov theorem in linear models with arbitrary non-negative covariance structure," SIAM Journal of Applied Mathematics 17, 1190-1202.
Subject Index
ANOVA, 10, 13-15, 34-36, 49-50, 57, 77-80, 87-88, 90, 103, 114, 127, 134, 137, 152-153, 166, 172-174, 210
Assumptions, 54-58, 91, 161-162
Balanced incomplete block design, 138, 149-159
BIB product, see Matrix
BLUE, 86-87, 89, 100-101, 118, 155, 158, 166, 170-171, 185-186
Chi-square distribution, 19-20, 33, 35, 38, 41-50
Complete, balanced factorial, 11, 13, 15, 29, 53-77, 91, 97-100
Completeness, 108, 110
Confidence bands, 126-127, 168, 173-175, 185-186, 188
Determinant of matrix, see Matrix
Eigenvalues and vectors, 6-11, 13, 21, 36, 41-42, 44, 115, 156
Estimable function, 168-171
Expectation
  conditional, 29-32, 39
  of a quadratic form, 17
  of a vector, 17
Expected mean square, 57, 64-66, 68-70, 72-73, 76-77, 88, 103, 153, 166, 184-185
F distribution, 47-49, 77-80, 142, 147
Finite model, see Model
Gauss-Markov, 86-87, 89, 118, 155, 170
General linear model, 96-97, see also Model
  ANOVA for, 87
  interval estimation, 126-127
  point estimation, 86-87, 105-108
  replication, 131-137, 144-147
  tests for, 119-126
Helmert, see Matrix
Hypothesis tests
  for factorial experiments, 48, 78, 80, 102, 125-126, 142, 147, 153, 160, 174
  for regression model, 91, 93-94, 101, 125
Idempotent matrix, see Matrix
Identity, see Matrix
Incidence, see Matrix
Incomplete block design, see Balanced incomplete block design
Independence
  of quadratic forms, 45
  of random vector and quadratic form, 46
  of random vectors, 26
Infinite model, see Model
Interval estimation, see Confidence bands
Invariance property, 108, 113-114
Inverse of a matrix, see Matrix
Kronecker product, 12-16, 52, 59, 61-63, 98, 131
Lack of fit, see Least squares regression
Lagrange multipliers, 121
Least squares regression
  lack of fit, 91-94, 102, 133-134
  ordinary, 85-86, 94, 107, 111, 114, 132-133, 161
  partitioned SS, 87, 91, 94-97, 135
  pure error, 91-94, 102, 133-134, 147, 166
  weighted, 89-91, 173
Likelihood function, 105, 115, 119-120, 182
Likelihood ratio test, 119-126
Marginal distribution, 25, 30, 39
Matrix
  BIB product, 15-16, 156-157
  covariance, see also Random vector, 11, 17, 24, 53, 58-77, 111, 118, 123, 135-136, 138, 148, 153-155
  determinant of, 3, 6-7
  diagonal, 2, 7, 9, 16, 21, 132
  Helmert, 5, 10-11, 25, 33, 35, 98, 111, 125, 136-137, 178, 182
  idempotent, 9-11, 13, 21, 35-36, 41-50, 67, 92, 96, 100, 106, 114, 140, 143, 151, 155
  identity, 2, 9
  incidence, 156
  inverse, 2, 12, 21
  multiplicity, 7-8, 10, 13
  of ones, 2
  orthogonal, 4, 6-7, 9, 11, 44, 51
  partitioning, 8, 25, 94, 124, 135
  pattern, 138-147, 150-152, 158, 163-164, 181
  positive definite, 8-9
  positive semidefinite, 8, 10
  rank of, 3, 9-11, 13, 16
  replication, 131-137, 144-147, 161-165, 178, 180-181
  singularity, 2, 12, 21
  symmetric, 3-4, 7, 9-11, 13, 17
  trace of, 2, 7, 10
  transpose of, 1
Maximum likelihood estimator, 105-109, 111-119, 182
Mean model, see Model
Minimum variance unbiased estimate, see UMVUE
Mixed model, see Model
Model
  finite, 53-56, 58-60, 64, 178, 183-184, 186
  general linear, 97-100, 105
  infinite, 56-58, 60, 64, 178, 183-184
  less than full rank, 161-164
  mean, 28, 164-174, 180
  mixed, 177-187
Moment generating function, 18-20, 24, 26, 42
Multivariate normal distribution
  conditional, 29-32
  independence, 26-28
  marginal distribution, 25
  MGF of, 24
  of transformation BY + b, 24
  of Y, 24
One-way classification, 13, 27, 47-48, 107-108, 125, 134
Overparameterization, 163-165, 168
Pattern, see Matrix
Point estimation, see also BLUE and UMVUE
  for balanced incomplete block design, 138-143, 152-159
  for general linear model, 105-107
  least squares, 84-85
  variance parameters, 85, 106, 142-143, 152-153, 179-182
Probability distribution, see also Random vector
  chi-square, 19-20, 33-35, 41-50
  conditional, 29-32
  multivariate normal, 23-36
  noncentral corrected F, 48
  noncentral F, 47
  noncentral t, 47
PROC GLM, 171, 209
PROC IML, 85, 140-141, 145, 180, 186, 189, 193, 201-202, 207, 213, 218
PROC VARCOMP, 183-184, 216
Pure error, see Least squares regression
Quadratic forms, 3, 9, 11, 13, 17, 32-36, 41-50, 63, 85, 143, 146, 161, 166
  independence of, 45-46
Random vector
  covariance matrix of, 17, 21, 28-29, 58-74, 96, 111, 135, 138, 140, 145, 150-151, 155, 181, 185-186
  expectation of, 17
  linear transformation of, 17, 24
  MGF of, 18-20, 24-26, 42
  probability distribution, 17-18
Rank of matrix, see Matrix
Regression, see Least squares regression
Rejection region, 49, 94, 124, 128, 142, 147, 153, 168, 174, 185, 187
REML, 182-184, 188
Reparameterization, 60, 66, 183
Replication, see also Matrix, 47-48, 53-58, 60-61, 63, 66-67, 70, 77, 91, 97-98, 107, 126, 150
Satterthwaite test, 185-188
Sufficiency, 108-111
Sum of squares, 4, 14, 20, 33-35, 47, 49, 60-69, 71, 74, 77-80, 87, 91-96, 98, 100-103, 107, 112, 124, 131, 133-135, 140-143, 145, 147-148, 166, 174
  type I, 94-96, 135, 137, 140-143, 145, 147, 152-153, 159, 179-182, 186
t distribution, 47-48, 126, 168
Tests
  F test, 49, 57, 94, 124, 142, 153, 168, 174, 185
  for lack of fit, 91-94
  for noncentrality parameters, 48
  for regression, 124
  for variance parameter, 49
  likelihood ratio, 124
  Satterthwaite, 185
Trace of matrix, see Matrix
Transformation, 17, 24
Transpose, see Matrix
Two-way cross classification, 28, 34, 45, 49-50, 53, 66, 69, 135
Type I SS, see Sum of squares
UMVUE, 110-111, 128
Variance components, 135-136, 142-143, 179
Vector, see Random vector