Advanced Quantitative Techniques in the Social Sciences

VOLUMES IN THE SERIES

1. HIERARCHICAL LINEAR MODELS: Applications and Data Analysis Methods, 2nd Edition
   Stephen W. Raudenbush and Antony S. Bryk
2. MULTIVARIATE ANALYSIS OF CATEGORICAL DATA: Theory
   John P. van de Geer
3. MULTIVARIATE ANALYSIS OF CATEGORICAL DATA: Applications
   John P. van de Geer
4. STATISTICAL MODELS FOR ORDINAL VARIABLES
   Clifford C. Clogg and Edward S. Shihadeh
5. FACET THEORY: Form and Content
   Ingwer Borg and Samuel Shye
6. LATENT CLASS AND DISCRETE LATENT TRAIT MODELS: Similarities and Differences
   Ton Heinen
7. REGRESSION MODELS FOR CATEGORICAL AND LIMITED DEPENDENT VARIABLES
   J. Scott Long
8. LOG-LINEAR MODELS FOR EVENT HISTORIES
   Jeroen K. Vermunt
9. MULTIVARIATE TAXOMETRIC PROCEDURES: Distinguishing Types From Continua
   Niels G. Waller and Paul E. Meehl
10. STRUCTURAL EQUATION MODELING: Foundations and Extensions, 2nd Edition
    David Kaplan
Copyright © 2009 by SAGE Publications, Inc.

All rights reserved. No part of this book may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system, without permission in writing from the publisher.

Portions of Chapter 7 will also appear in the forthcoming SAGE Handbook of Quantitative Methods in Psychology edited by Roger E. Millsap and Albert Maydeu-Olivares. Portions of Chapter 9 were first published in the following articles by David Kaplan and are reprinted here with permission: Finite mixture dynamic regression modeling of panel data with implications for dynamic response analysis, Journal of Educational and Behavioral Statistics; An overview of Markov chain methods for the study of stage-sequential developmental processes, Developmental Psychology (Copyright © 2008 by the American Psychological Association); Methodological advances in the analysis of individual growth with relevance to educational policy, Peabody Journal of Education.

For information:

SAGE Publications, Inc.
2455 Teller Road
Thousand Oaks, California 91320
E-mail: [email protected]

SAGE Publications India Pvt. Ltd.
B 1/I 1 Mohan Cooperative Industrial Area
Mathura Road, New Delhi 110 044
India

SAGE Publications Ltd.
1 Oliver's Yard
55 City Road
London EC1Y 1SP
United Kingdom

SAGE Publications Asia-Pacific Pte. Ltd.
33 Pekin Street #02-01
Far East Square
Singapore 048763
Printed in the United States of America

Library of Congress Cataloging-in-Publication Data

Kaplan, David, 1955–
  Structural equation modeling: foundations and extensions / David Kaplan.—2nd ed.
    p. cm.—(Advanced quantitative techniques in the social sciences; 10)
  Includes bibliographical references and index.
  ISBN 978-1-4129-1624-0 (cloth)
  1. Social sciences—Mathematical models. 2. Social sciences—Statistical methods. I. Title.
H61.25.K365 2009
300.1′5118—dc22    2008017670

Printed on acid-free paper

08 09 10 11 12  10 9 8 7 6 5 4 3 2 1

Acquiring Editor:    Vicki Knight
Associate Editor:    Sean Connelly
Editorial Assistant: Lauren Habib
Production Editor:   Cassandra Margaret Seibel
Copy Editor:         QuADS Prepress (P) Ltd.
Typesetter:          C&M Digitals (P) Ltd.
Proofreader:         Wendy Jo Dymond
Indexer:             Jean Casalegno
Marketing Manager:   Stephanie Adams
Contents

Series Editors' Introduction to Structural Equation Modeling (Jan de Leeuw and Richard Berk)  vi
Preface to the Second Edition  ix
1. Historical Foundations of Structural Equation Modeling for Continuous and Categorical Latent Variables  1
2. Path Analysis: Modeling Systems of Structural Equations Among Observed Variables  13
3. Factor Analysis  39
4. Structural Equation Models in Single and Multiple Groups  61
5. Statistical Assumptions Underlying Structural Equation Modeling  85
6. Evaluating and Modifying Structural Equation Models  109
7. Multilevel Structural Equation Modeling  133
8. Latent Growth Curve Modeling  155
9. Structural Models for Categorical and Continuous Latent Variables  181
10. Epilogue: Toward a New Approach to the Practice of Structural Equation Modeling  207
References  232
Index  245
About the Author  255
Series Editors’ Introduction to Structural Equation Modeling
Over the last 35 years, Structural Equation Modeling (SEM) has become one of the most important data analysis techniques in the social sciences. In fact, it has become much more than that. It has become a language to formulate social science theories, and a language to talk about the relationship between variables. For about 10 years, the AQT series of books on advanced techniques in the social sciences has had David Kaplan's excellent text. We are now pleased to have a completely revised and updated second edition.

SEM is not without its critics, and most researchers active in the area will admit that it can easily be misused and in fact has frequently been misused. Its routine application as a tool for theory formation and causal analysis has been criticized by well-known statisticians such as Freedman, Wermuth, Rogosa, Speed, Rubin, and Cox. One obvious problem is that SEM is a complicated technique, whose statistical properties are far from simple, and many of its users do not have enough statistical expertise to understand the various complications. Another problem is that using SEM allows one to search over an enormously large space of possible models, with a complicated combinatorial structure, and the task of choosing an appropriate model, let alone the "best model," is horrendously difficult. Unless one has very strong prior knowledge, which sets strict limitations on the choice of the model, it is easy to search until one has an acceptable fit. That fit will often convey more about the tenacity and good fortune of the investigator than about the world the model is supposed to characterize. Finally, at a deeper level, there is considerable disagreement about precisely what one can learn about cause and effect in the absence of experiments in which causal variables are actually manipulated. For the critics, SEM can never be a substitute for real experiments. The second edition pays a great deal of attention to causal inference.

It is somewhat unfortunate that most of the books discussing SEM concentrate on the practical aspects of the technique, and are often ill-disguised
extended manuals of specific SEM software packages. Kaplan's second edition provides a general overview of theoretical aspects of SEM, and it does include many of these more recent developments. He mentions multilevel SEM, nonnormal SEM, missing data, latent class analysis, mixed discrete and continuous models, and latent growth curves. Although the author uses Muthén's Mplus software for the main analyses, he also uses R for supplementary statistics. But the book is not an Mplus manual, and with little additional effort, the reader can adapt the analyses to her favorite software package.

Let us try to place the Kaplan book somewhat more precisely in its historical context. To some extent this is already done in the historical introduction in Chapter 1. There is very little factual distortion if we attribute the creation of modern SEM to Karl Jöreskog's work undertaken in the early seventies. Because of the competitive structure of the field, due in part to the emphasis on commercial software packages, there have been many attempts to minimize his contributions. But it is clear from the record that Jöreskog managed, almost single-handedly, to integrate into a single framework the simultaneous equations theory from econometrics, the path analysis theory from genetics and sociology, and the factor analysis theory from psychometrics. And he implemented this synthesis in the LISREL program, which was so influential in its first twenty years of existence that many people simply referred to SEM as "LISREL modeling" or "LISREL analysis."

Kaplan's book takes the LISREL approach as its starting point, but it integrates the many subsequent fundamental contributions of Bentler, Browne, Satorra, McDonald, and Muthén. In this second edition there is a great deal of emphasis on "second-generation SEM," which combines latent class analysis with continuous first-generation SEM. The basic approach is not new, because it was already suggested a long time ago by Lazarsfeld, Guttman, and McDonald, but there now is a convenient implementation in the Mplus machinery.

In a sense, SEM is an extension of simultaneous equation modeling, a class of techniques that has been around in econometrics since the late thirties. The major contributor around that time was Tinbergen. At the same time, SEM is in an active stage of development in quite a few different sciences, and it has a long history in each. Consequently, there inevitably are differences in the way scientists from different disciplines talk and think about the technique. Kaplan started in the psychometric tradition, but has been incorporating more and more of the work in econometrics. One of the most original contributions of the first edition is the discussion of Spanos's work on econometric modeling and the consequent alternative approach to SEM. This nonstandard discussion of SEM remains part of the second edition. Thus, the main contributions of the book are a solid discussion of likelihood-based inference for SEM, with many of the modern extensions, and a new methodological perspective on the model-fitting cycle based on Spanos's work.
In the second edition, many more modern developments in longitudinal and growth curve analysis, and in second-generation SEM, have been added. We think this combination works, and it makes the book eminently suitable as a textbook on SEM for graduate-level courses in social science methodology or social statistics programs.

Jan de Leeuw
Richard Berk
Series Editors
Preface to the Second Edition
Almost a decade has passed since the publication of the first edition of Structural Equation Modeling: Foundations and Extensions, and considerable methodological advances in the area of structural equation modeling have taken place. The vast majority of these advances have been in the analysis of longitudinal data, but advances in the analysis of categorical latent variables as well as general models that combine both categorical and continuous latent variables have also made their way into applied work in the social and behavioral sciences during this time. In addition, there have been advances in estimation methods, techniques of model evaluation, and modern conceptualizations of the modeling process—including recent thinking on the use of structural equation modeling for causal inference.

In light of these advances, I have undertaken a substantial revision of the book from the original format that was adopted in the first edition. This new edition maintains and updates so-called "first-generation" structural equation modeling but now brings in developments in so-called "second-generation" structural equation modeling—methods that combine continuous latent variables (factors) with categorical latent variables (latent classes) in cross-sectional and longitudinal contexts. As a result, the term structural equation modeling is being used here in a much more expansive sense, covering models for continuous and categorical latent variables.

The present edition is now organized as follows. Chapter 1 retains the original historical overview but now adds an historical overview of latent class models. Chapter 2 remains relatively intact from the first edition. For completeness, Chapter 3 now contains material on nonstatistical estimation in the unrestricted model—including a discussion of principal components analysis and the common factor model. Chapter 4 remains mostly intact from the first edition. Chapter 5 provides more detail regarding mean- and variance-adjusted maximum likelihood and weighted least squares estimators along with a discussion of the extant evidence regarding their performance. Additional material regarding developments in the analysis of missing data in the structural
equation modeling framework is also provided. Chapter 6 updates the section on statistical power in the structural equation modeling framework. Chapter 7 of the first edition outlined multilevel structural equation modeling. In the past few years, however, advances have been made in the estimation of multilevel structural models as well as in applying structural equation modeling to data arising from complex sampling designs. Therefore, Chapter 7 now includes a completely updated overview of multilevel structural equation modeling and a review of the application of structural equation modeling for complex sampling designs. Chapter 8 provides updated material on conventional latent growth curve modeling but is now expanded to include methods for nonlinear curve fitting, autoregressive latent trajectory models, cohort sequential designs, and flexible modeling with alternative time metrics. Chapter 9 now contains a review of very recent developments in structural equation modeling that combine models for continuous latent variables with categorical latent variables and can best be described as second-generation structural equation modeling. To begin, Chapter 9 reviews latent class analysis, which addresses the problem of categorical latent variables. Latent class analysis is quite well known in its own right, but I include it in this edition because it is now a part of a general extension of structural equation modeling. I also include extensions of latent class analysis to longitudinal designs, describing Markov chain–based models. Chapter 9 then provides an overview of structural equation modeling with finite mixtures, which formally combines categorical and continuous latent variables into a comprehensive analytic framework. This chapter reviews mixture factor analysis, mixture structural equation modeling, growth mixture modeling, and mixture latent transition analysis.

Chapter 10 of this new edition retains much of the discussion of Spanos's (1986) probabilistic reduction approach to statistical modeling found in the first edition of the book. I believe that it is still important to include this alternative to the conventional practice of structural equation modeling because the conventional practice still dominates the social and behavioral science literature. However, during the interim between the first edition and this one, the issue of causal inference in the social and behavioral sciences became, once again, a hotly debated topic, and structural equation modeling has become one of the battlegrounds in this debate. Therefore, this edition now incorporates a discussion of causal inference in structural equation modeling motivated by recent philosophical and methodological work on the counterfactual theory of causation and its extensions.

In response to valuable feedback from colleagues and students who have used the first edition of this book in their research and classes, I have decided to make one additional change. In the interest of coherence, I have decided to use Mplus Version 4.2 (L. Muthén & Muthén, 2006) as the standard structural equation modeling software program throughout this edition. However, as
with the first edition, this is not a book on how to use Mplus. For any supplementary analyses, I have decided to use the open source software program R. The R programming language is best considered a “dialect” of the S programming language (Chambers, 1998). In most cases, S code can be exported to the R environment without difficulty. In some cases, it is necessary to invoke the S environment, and this is best accomplished through the commercially available version, S-Plus.
Acknowledgments

Sage Publications appreciates the constructive comments and suggestions provided by the following reviewers:

Roger J. Calantone, Michigan State University
George Farkas, Pennsylvania State University
Scott M. Lynch, Princeton University
Keith A. Markus, John Jay College of Criminal Justice
Sandy Marquart-Pyatt, Utah State University
Victor L. Willson, Texas A&M University
To Allison, Rebekah, and Hannah.
1 Historical Foundations of Structural Equation Modeling for Continuous and Categorical Latent Variables
Structural equation modeling can perhaps best be defined as a class of methodologies that seeks to represent hypotheses about summary statistics derived from empirical measurements in terms of a smaller number of "structural" parameters defined by a hypothesized underlying model. This definition covers a large number of special cases, as we will see throughout this book. Traditionally, structural equation modeling for continuous latent variables concerned hypotheses about the means, variances, and covariances of observed data. In this book, we will also include summary statistics in the form of response frequencies among observed categorical variables, thus allowing us to admit latent class models as another special case of structural equation models.

We begin our treatment of structural equation modeling by attempting to define the methodology and then outlining its history. In developing the history of structural equation modeling, we can best illustrate the substantive problems that the methodology is trying to solve.
1.1 Psychometric Origins of Structural Equation Modeling for Continuous Latent Variables

Structural equation modeling for continuous latent variables represents a hybrid of two separate statistical traditions. The first tradition is factor analysis, developed in the disciplines of psychology and psychometrics. The
second tradition is simultaneous equation modeling, developed mainly in econometrics but having an early history in the field of genetics.

The origins of factor analysis can be traced to the work of Galton (1889) and Pearson (Pearson & Lee, 1903) on the problem of inheritance of genetic traits. However, it is the work of Spearman (1904) on the underlying structure of mental abilities that can be credited with the development of the common factor model. Spearman's theoretical position was that the intercorrelations between tests of mental ability could be accounted for by a general ability factor common to all of the tests and specific ability factors associated with each of the tests. This view led to a structural equation of the form

ρij = λiλj,
[1.1]
where ρij is the population correlation between scores on test i and test j, and λi and λj are weights (loadings) that relate test i and test j to the general factor. Consistent with our general definition of structural equation modeling, Equation [1.1] expresses the correlations in terms of a set of structural parameters. Spearman used the newly developed product-moment correlation coefficient to correlate scores on a variety of tests taken by a small sample of boys. Spearman reported findings that were consistent with the structural equation in Equation [1.1].

The work of Spearman and others (e.g., Thomson, 1956; Vernon, 1961) formed the so-called British school of factor analysis. However, in the 1930s, attention shifted to the work of L. L. Thurstone and his colleagues at the University of Chicago. Thurstone argued that there was not one underlying general factor of ability accompanied by specific ability factors as postulated by Spearman. Rather, Thurstone argued that there existed major group factors referred to as primary mental abilities (Thurstone, 1935). According to Mulaik (1972), Thurstone's search for group factors was motivated by a parsimony principle that suggested that each factor should account for as much covariation as possible in nonoverlapping sets of observed measures. Factors displaying this property were said to exhibit simple structure. To achieve simple structure, however, Thurstone (1947) had to allow for the possibility that the factors themselves were correlated. Proponents of the British school, as noted by Mulaik (1972), found this correlation to validate their claim of a general unitary ability factor. In the context of Thurstone's (1947) multiple factor model, the general ability factor exists at a higher level of the ability hierarchy and is postulated to account for the intercorrelations between the lower order primary factors.

By the 1950s and 1960s, factor analysis gained tremendous popularity, owing much to the development and refinement of statistical computing capacity. Indeed, Mulaik (1972) characterized this era as a time of agnostic and blind factor analysis. However, during this era, developments in statistical factor analysis were also occurring, allowing for the explicit testing of hypotheses regarding the number of factors. Specifically, work by researchers such as Jöreskog (1967), Jöreskog and Lawley (1968), Lawley (1940, 1941), and Lawley and Maxwell (1971) led to the development of a maximum likelihood–based approach to factor analysis. The maximum likelihood approach allowed a researcher to test the hypothesis that a specified number of factors were present to account for the intercorrelations between the variables. Minimization of the maximum likelihood fitting function led directly to the likelihood ratio chi-square test of the hypothesis that a proposed model fits the data. A generalized least squares approach was later developed by Jöreskog and Goldberger (1972). Developments by researchers such as Anderson and Rubin (1956) and later by Jöreskog (1969) led to the methodology of confirmatory factor analysis, which allowed for testing hypotheses regarding the number of factors and the pattern of loadings. From a historical perspective, these developments lent a rigorous statistical approach to Thurstone's simple structure ideas. In particular, a researcher could now specify a model in which certain factors accounted for the correlations of only a specific subset of the observed variables. Again, using the method of maximum likelihood, the hypothesis of simple structure could be tested.

Exploratory and confirmatory factor analysis remain to this day very popular methodologies in quantitative social science research. In the context of structural equation modeling, however, factor analysis constitutes a part of the overall framework. Indeed, structural equation modeling represents a method that, among other things, allows for the assessment of complex relationships among factors. These complex relationships are often represented as systems of simultaneous equations. The historical development of simultaneous equation methodology is traced next.
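Before turning to that history, a rough numerical illustration may help fix ideas. The following R sketch (R is the program used for supplementary analyses later in the book) simulates scores from a hypothetical one-factor structure in which the implied correlations follow Equation [1.1], and then fits a one-factor model by maximum likelihood with factanal(); the loadings and sample size are invented for illustration and are not analyses from the text.

```r
set.seed(123)

# Hypothetical loadings relating five tests to a single general factor
lambda <- c(0.8, 0.7, 0.6, 0.5, 0.4)

# Implied correlation matrix: off-diagonal elements are lambda_i * lambda_j
# (Equation [1.1]); diagonal elements are set to 1
P <- tcrossprod(lambda)
diag(P) <- 1

# Simulate scores for 500 hypothetical examinees from this structure
library(MASS)
scores <- mvrnorm(n = 500, mu = rep(0, 5), Sigma = P)
colnames(scores) <- paste0("test", 1:5)

# Maximum likelihood factor analysis; the reported chi-square statistic is
# the likelihood ratio test that one common factor is sufficient
fit <- factanal(scores, factors = 1)
print(fit)
```

The chi-square statistic reported by factanal() is the likelihood ratio test, described above, of the hypothesis that a single common factor accounts for the intercorrelations among the simulated tests.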
1.2 Biometric and Econometric Origins of Structural Equation Modeling

Structural equation modeling represents a melding of factor analysis and path analysis into one comprehensive statistical methodology. The path analytic origins of structural equation modeling had their beginnings in the biometric work of Sewell Wright (1918, 1921, 1934, 1960). Wright's major contribution was in showing how the correlations among variables could be related to the parameters of a model as represented by a path diagram—a pictorial device that Wright was credited with inventing. Wright also showed how the model equations could be used to estimate direct effects, indirect effects, and total effects.

Wright (1918) first applied path analysis to the problem of estimating size components of the measurements of bones. Interestingly, this first application
of path analysis was statistically equivalent to factor analysis and was developed apparently without knowledge of the work of Spearman (1904; see also Bollen, 1989). Wright also applied path analysis to problems of estimating supply and demand equations and also treated the problem of model identification. These issues formed the core of later econometric contributions to structural equation modeling (Goldberger & Duncan, 1972). A second line of development occurred in the field of econometrics. Mathematical models of economic phenomena have had a long history, beginning with Petty (1676; as cited in Spanos, 1986). However, the form of econometric modeling of relevance to structural equation modeling must be credited to the work of Haavelmo (1943). Haavelmo was interested in modeling the interdependence between economic variables using the form for systems of simultaneous equations written as y = By + Γx + ζ ,
[1.2]
where y is a vector of endogenous variables that the model is specified to explain, x is a vector of exogenous variables that are purported to explain y but whose behavior is not explained, ζ is a vector of disturbance terms, and B and Γ are coefficient matrices. The model in Equation [1.2] was a major innovation in econometric modeling. The development and refinement of the simultaneous equations model was the agenda of the Cowles Commission for Research in Economics, a conglomerate of statisticians and econometricians that met at the University of Chicago in 1945 and subsequently moved to Yale (see Berndt, 1991).1 This group wedded the newly developed simultaneous equations model with the method of maximum likelihood estimation and associated hypothesis testing methodologies (see Hood & Koopmans, 1953; Koopmans, 1950). For the next 25 years, the thrust of econometric research was devoted to the refinement of the simultaneous equations approach. Particularly notable during this period was the work of Franklin Fisher (1966) on model identification.

It is important to point out that although the simultaneous equations model of Equation [1.2] enjoyed a long history of development and application, it was not without its critics. Critics asserted that a serious problem with large macroeconomic simultaneous equations models was that they could not compete with the theory-free methods of the Box-Jenkins time series models when it came to accurate predictions (e.g., Cooper, 1972).2 The underlying problem was related to the classic distinction between theory-based but static models versus dynamic time series models (e.g., Spanos, 1986). This problem led to a serious reconsideration of the entire "conventional" approach to econometric modeling that now occupies considerable discussion in the econometric literature.
1.3 Simultaneous Equation Modeling Among Continuous Latent Variables The above discussion briefly sketched the history of factor analysis and the history of simultaneous equation modeling. The subject of this book is the combination of the two, namely simultaneous equation modeling among latent variables. The combination of these methodologies into a coherent analytic framework was based on the work of Jöreskog (1973), Keesling (1972), and Wiley (1973). The general structural equation model as outlined by Jöreskog (1973) consists of two parts: (1) the measurement part, which links observed variables to latent variables via a confirmatory factor model, and (2) the structural part linking latent variables to each other via systems of simultaneous equations. The estimation of the model parameters uses maximum likelihood estimation. In the case where it is assumed that there is no measurement error in the observed variables, the general model reduces to the simultaneous equations model developed in econometrics (e.g., Hood & Koopmans, 1953). Issues of model identification developed in econometrics (e.g., Fisher, 1966) were brought into the general model with latent variables by Wiley (1973). A history of software development then took place culminating in the popular LISREL program (Jöreskog & Sörbom, 2000).
1.4 Structural Models for Categorical Latent Variables

What has been considered thus far is the history of structural equation modeling with a focus on the introduction of continuous latent variables into the simultaneous equation framework. In many applications, however, it may be useful to hypothesize the existence of categorical latent variables. Such categorical latent variables are presumed to explain response frequencies among dichotomous or ordered categorical variables. The use of categorical latent variables underlies the methodology of latent structure analysis.

Latent structure analysis was originally proposed by Lazarsfeld (1950) as a means of modeling latent attitudes derived from dichotomous survey items, with the origins of latent structure analysis arising from studies of military personnel during World War II.3 The problem facing researchers at that time concerned the development of reliable and valid instruments measuring the attitudes soldiers had toward the army. The results of research conducted on World War II soldiers were published between 1949 and 1959 in a four-volume set titled The American Soldier: Studies in Social Psychology in WWII (Stouffer, Suchman, Devinney, Star, & Williams, 1949).
Volume 4 of this study was devoted to the problems of measurement and scaling with major contributions made by Louis Guttman and Paul Lazarsfeld. As with the earlier work of Spearman and Thurstone on the measurement of intelligence, the goal here was to uncover underlying or “latent” structure describing the attitudes of army personnel. However, unlike Spearman and Thurstone, the observed data were discrete categorical responses and, in particular, dichotomous yes/no, agree/disagree responses. The summary empirical data were in the form of frequencies of agreement to a set of questions administered to the sample of personnel. In an example using four dichotomous items, Lazarsfeld summarized the counts of individuals who agreed with all four statements, agreed with the first, but not the remaining three, and so on. An inspection of the response frequencies led Lazarsfeld to postulate that soldiers belonged to one of two possible “latent classes”: the first being soldiers who are generally favorable to the army versus those who are generally unfavorable. Moreover, Lazarsfeld noted that if one were to have administered the items to only one of the two classes, there would be no correlations among the items. This phenomenon was termed local independence by Lazarsfeld and it implies that holding the latent class constant, there is no correlation among the manifest item responses. Missing during the early days in the development of latent structure analysis was explicit testing of the latent class model. The standard issues of goodness-of-fit, parameter estimation, standard errors, and other concepts familiar to mathematical statistics at the time were not discussed in any meaningful way within the emerging literature on latent structure analysis. It wasn’t until much later with the work of Lazarsfeld and Henry (1968) that the conventional concepts of mathematical statistics were brought into the domain of latent structure analysis. Full integration of latent structure analysis with mathematical statistics came with the publication of Leo Goodman’s (1968) paper on loglinear modeling approaches to latent structure analysis and the publication of Discrete Multivariate Analysis by Yvonne Bishop, Stephen Fienberg, and Paul Holland (1975).
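Local independence is easy to see in a small simulation. The R sketch below is a hypothetical illustration (the items, class sizes, and endorsement probabilities are invented, not taken from The American Soldier data): dichotomous responses are generated for a "favorable" and an "unfavorable" class, and the item correlations are computed in the pooled sample and within each class.

```r
set.seed(42)
n_per_class <- 5000

# Hypothetical endorsement probabilities for four dichotomous items
p_favorable   <- c(0.90, 0.80, 0.85, 0.75)   # soldiers favorable to the army
p_unfavorable <- c(0.20, 0.30, 0.25, 0.15)   # soldiers unfavorable to the army

gen_class <- function(n, probs) {
  sapply(probs, function(p) rbinom(n, size = 1, prob = p))
}

favorable   <- gen_class(n_per_class, p_favorable)
unfavorable <- gen_class(n_per_class, p_unfavorable)
pooled      <- rbind(favorable, unfavorable)

# Items are correlated when the two classes are mixed together...
round(cor(pooled), 2)

# ...but are approximately uncorrelated holding the latent class constant
round(cor(favorable), 2)
round(cor(unfavorable), 2)
```

In the pooled sample the items are substantially correlated because the mixture of classes joins high and low responders, but within either class the correlations are near zero, which is exactly what Lazarsfeld's notion of local independence asserts.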
1.5 Modern Developments Structural equation modeling is, without question, one of the most popular statistical methodologies available to quantitative social scientists. The popularity of structural equation modeling can be attested to by the creation of a scholarly journal devoted specifically to structural equation modeling4 as well as the existence of SEMNET, a very popular and active electronic discussion list that focuses on structural equation modeling and related issues.5 Structural equation modeling also continues to be an active area of theoretical and applied statistical research. Indeed, the past 40 years have seen
remarkable developments in the statistical theory underlying structural equation modeling, culminating in software developments that allow flexible and sophisticated modeling under nonstandard conditions of the data. Moreover, recent developments have allowed traditionally different approaches to statistical modeling to be specified as special cases of structural equation modeling. As we will see later, the advantages of the structural equation modeling perspective are its tremendous flexibility as well as the incorporation of explicit measurement models into more general statistical models. It would be impossible to highlight all of the modern developments in structural equation modeling in this section. However, among the more important modern developments has been the extension of new estimation methods to handle nonnormal distributions. This topic is considered in depth in Chapter 5. Suffice it to say that owing to the seminal work of Browne (1984), B. Muthén (1978, 1984), and others, it is now possible to estimate the parameters of complex structural equation models when the data are nonnormal—including mixtures of dichotomous, ordered-categorical, and continuous variables. In addition to estimation with nonnormal variables, a long history of methodological developments in structural equation modeling now allows researchers to estimate models in the presence of other data-related problems. For example, B. Muthén, Kaplan, and Hollis (1987), Allison (1987), and Arbuckle (1996), building on the work of Little and Rubin (1987), have shown how one can use standard structural modeling software to estimate the parameters of structural equation models when missing data are not missing completely at random. Other developments in structural equation modeling have resulted from specifying the general model in a way that allows a “structural modeling” approach to other types of modeling. The most recent example of this development is the use of structural equation modeling to estimate multilevel data— including longitudinal data for the estimation of growth curve parameters (B. Muthén & Satorra, 1989; Willett & Sayer, 1994). These topics are taken up in more detail in Chapters 7 and 8. Finally, the merging of categorical latent variable modeling and models for continuous latent variables in cross-sectional and longitudinal contexts constitutes the current “state of the art” in structural equation modeling (B. Muthén, 2004). These topics are considered in Chapter 9.
1.6 The “Conventional” Practice of Structural Equation Modeling The previous sections provided only a taste of the foundations and extensions of structural equation modeling. Each chapter of this book provides more detail to both. These developments come about primarily through the interaction of statisticians with substantive researchers motivated by a need to solve
specific substantive problems. Interesting substantive problems lead to cutting-edge methodological developments, which, in principle, should lead to greater insights into substantive problems. Yet, although it is possible to obtain better and more precise estimates of substantive relationships, these new developments are embedded in a "conventional" practice of structural equation modeling that I would argue limits substantive understanding as well as new methodological developments.

The conventional approach to structural equation modeling as generally practiced in the social and behavioral sciences can be characterized as shown in Figure 1.1. The broad details of Figure 1.1 are as follows. First, when available, a theory is presented. The structural equations, as represented in a path diagram, are seen as a one-to-one representation of the theory. Next, a sample is selected and measures are obtained on the sample. This is followed by the estimation of the parameters of the model. At this stage, the measurement model can be estimated first, followed by the structural model, or the full model can be estimated at once. This is followed by an assessment of the goodness-of-fit of the model followed by model modification if necessary. Typically, this stage is cyclical, with the model continually being modified and evaluated in terms of goodness-of-fit until a decision is made that the model meets some standard of adequate fit. Often, any number of conceptually different fit indices is brought to bear to aid in this decision. These indices are described in Chapter 6. Once the model is deemed to fit, a discussion of the findings follows. Rarely, if ever, are the results of the modeling exercise used for prediction studies wherein policy/clinically relevant variables are manipulated and their effects on outcome variables observed.

[Figure 1.1. Diagram of the Conventional Approach to Structural Equation Modeling: Theory → Model Specification → Sample and Measures → Estimation → Assessment of Fit → Modification → Discussion.]

A clear feature of the conventional approach is the connection it makes between the theory and the specification of the equations of the model as represented by the path diagram. Indeed, the conventional approach seems to suggest that the specification of the model differs from the theory only by the existence of a white noise error term. Within the conventional approach, the goal of obtaining better fit by modifying models is driven by the view that better fit suggests closer alignment with the theory. This is not to argue that improving the fit of the model is a meaningless endeavor. However, when the focus of a modeling exercise is post hoc fitting to data, such a strategy is bound to lead to disappointment because it ignores the data-generating process (DGP), that is, the actual process that generates the observed data—the so-called "actual DGP" (see Spanos, 1986)—and the distance between the DGP and the statistical model. Better fit may suggest closer alignment with the data, but not necessarily with the theory.

In addition to the gap between the DGP and the statistical model as it pertains to the theory, there are scarce examples of using the results of the statistical model to validate theoretical predictions. One could argue that this is because there are few examples of theories in the social and behavioral sciences that are articulated well enough to lead to theoretical models (mathematical formulations of theories) that could generate predictions. However, as is discussed in more detail in Chapter 10, the restrictions placed on structural equation models for estimation and testing purposes imply the existence of a theoretical model even if one was not explicitly articulated. Thus, I argue that a problem with the conventional approach to structural equation modeling as practiced in the social and behavioral sciences is that theory, theoretical models, and statistical models are viewed as one and the same apart from an error term—with the actual data playing little to no role at all. Moreover, structural equation models are rarely, if ever, used to validate theoretical predictions.

Interestingly, this problem in social and behavioral science applications of structural equation modeling parallels similar problems observed in econometrics discussed above. An alternative approach to this problem from the
econometric perspective has been articulated by Spanos (1986, 1990, 1995), who has offered a different formulation of econometric modeling that I believe is worth exploring within the structural equation modeling community. Although the primary aim of this book is to provide a detailed discussion of the statistical foundations and extensions of structural equation modeling, one central goal of this book is to open the debate about the practice of structural equation modeling by formulating an alternative approach based on a modified version of Spanos’s work. In addition to an explication of Spanos’s approach to structural equation modeling, another extremely important issue concerns the use of structural equation models for testing causal claims. Indeed, the problem of causal inference has led to an ongoing and vigorous debate between those advocating a structural econometric approach to causal inference and those whose approach rests on the tenets of randomized experimental designs. In addition to explicating Spanos’s notions of matching the statistical model with the actual DGP, I also offer a discussion of the problem of causal inference as it bears on applications of structural equation modeling.
1.7 A Note on the Substantive Examples

The substantive examples in this book draw from current issues in the field of education that are at the forefront of the national debate on school effects. My choice in using examples from the field of education stems mainly from the fact that this is the substantive area with which I am most familiar. In addition, many of the topics covered in this book can be convincingly demonstrated on problems in the field of education. However, many of the new extensions in structural equation modeling that constitute a part of this book can be quite clearly demonstrated on problems arising from fields other than education.

Many of the examples used throughout this book are guided by a theoretical framework. The theoretical framework used throughout this book is referred to as the input-process-output theory of education (Bidwell & Kasarda, 1975). A number of diagrammatic formulations have been offered to describe the input-process-output theory of the U.S. educational system. Figure 1.2 shows one such diagram offered by Shavelson, McDonnell, and Oakes (1989) and often referred to as the RAND Corporation Indicators Model.

There are numerous aspects of this figure that are worth pointing out. First, and of relevance to the subject of this book, is the implied complexity of the educational system. To take an example, schooling inputs such as fiscal resources are theorized to have their effects on outputs mostly via other schooling variables as well as teacher and classroom process variables. The teacher/classroom process variables, in turn, exhibit their own structural complexity.
[Figure 1.2. The Input-Process-Output Model of the U.S. Educational System. The diagram arranges boxes for fiscal resources, school quality, teacher quality, curriculum quality, teaching quality, instructional quality, and student background under the headings Inputs and Processes, leading to the outputs achievement, participation/dropout, and attitudes/aspirations. SOURCE: From Shavelson, McDonnell, and Oakes (1989).]
The statistical methodology most suited to capturing the complexity of relations among the inputs, processes, and outputs of the educational system is structural equation modeling. The input-process-output model suggests a large number of questions that can be posed relating to the structural complexity of schooling. Chapters 2 and 4 consider these questions as a means of motivating the use of path analysis and structural equation modeling, respectively.

Second, the terms displayed in the boxes represent loosely defined theoretical constructs and not specific observable data. Moreover, the figure does not specify exactly which instantiations of the constructs should be selected for measurement. For example, the construct of "student attitudes" is shown in the model to be an important output of the educational system and one that should be measured. But which attitudes—and how should they be measured? The issues of measurement, including the reliability of measures and their construct validity, are essential for the construction of structural equation models. The measurement of some of the constructs implied by the input-process-output theory constitutes our discussion of factor analysis in Chapter 3.

Third, the educational system is viewed as multilevel in form. That is, outputs at the student level are hypothesized to be a function of processes mostly
occurring at the teacher/classroom level, which, in turn, are a function of inputs mostly at the school level. Methods for the analysis of multilevel data have been discussed, for example, by Raudenbush and Bryk (2002). Extensions of multilevel methods to structural equation models are discussed and demonstrated in Chapter 7. Fourth, what is not obvious from an inspection of Figure 1.2 is that the measurements taken of the inputs, processes, and outputs of the educational system constitute a snapshot of an ongoing dynamic process. In other words, although Figure 1.2 implies an educational system in static equilibrium, the reality of educational systems may be quite the opposite, especially when the inputs constitute potential instruments of policy and the outputs (such as achievement) are expected to change over time—perhaps in response to changes in policy-relevant input variables. One solution might be to collect longitudinal data—and such national longitudinal panel data are readily available for analysis. Chapter 8 considers the problem of measuring growth in continuous variables whereas Chapter 9 considers stage sequential changes in latent categorical variables— both with applications to important educational outcomes. Finally, as the motivating example is based on issues that are important for educational reform, it is necessary to examine how models derived from the theoretical framework can be used to improve upon conventional statistical practice and to test causal claims. In this regard, Chapter 10 provides a review of modern ideas of causal inference as they pertain specifically to the practice of structural equation modeling.
Notes

1. As discussed in Berndt (1991), the Cowles Commission was founded by Alfred Cowles, III, who, among other things, provided the resources necessary to create the Econometric Society.

2. Berndt (1991) points out that in the first issue of Econometrica, Cowles (1933) stated that the best records of stock market forecasters were "little, if any, better than what might be expected from pure chance. There is some evidence, on the other hand, to indicate that the least successful records are worse than what could reasonably be attributed to chance" (p. 324).

3. The term latent structure analysis appears to have been reserved for the study of structure of dichotomous response variables.

4. Structural Equation Modeling: A Multidisciplinary Journal published by Taylor & Francis.

5. SEMNET is composed of approximately 1500 individuals from over 75 countries. To subscribe, send an email message to [email protected]. In the body of the message, type SUB SEMNET first-name last-name.
2
Path Analysis
Modeling Systems of Structural Equations Among Observed Variables
We noted in the introductory chapter that structural equation modeling seeks to describe the means, variances, and covariances of a set of variables in terms of a smaller number of "structural parameters." In this chapter, we begin by focusing on structural parameters that represent hypothesized relationships between a set of continuous observed variables modeled in terms of systems of equations. Modeling systems of structural relationships between a set of observed variables is often referred to as path analysis but is also referred to as simultaneous equation modeling in the field of econometrics. For the purposes of this chapter, we use the term path analysis.

We begin with a discussion of the substantive example used throughout this chapter as a means of introducing the problem of model specification. In the course of this discussion, we introduce the distinction between recursive and nonrecursive models. The discussion of model specification is followed by an outline of the problem of model identification. Here, the necessary and sufficient conditions for model identification are provided. Model identification is followed by a discussion of parameter estimation—including the development of maximum likelihood and generalized least squares approaches. A discussion of model and parameter testing follows, along with a detailed discussion of the interpretation of the elements of a path analysis. In particular, we focus on the decomposition and interpretation of direct, indirect, and total effects. The chapter concludes with a discussion of the problem of measurement error.
2.1 A Substantive Example: Specification of Path Models

For the purposes of this chapter, we consider a model of student science achievement. In doing so, we recognize that we are ignoring certain organizational features of the educational system as displayed in Figure 1.2. For example, we are not considering the fact that students are nested in schools. We will return to the issue of nesting when we consider multilevel structural equation modeling in Chapter 7.

The data for this example come from the first follow-up of the National Educational Longitudinal Study (NELS) of 1988 (National Center for Education Statistics [NCES], 1988). The NELS survey was designed to provide relevant trend data on important transitions experienced by students as they move through elementary school and progress to high school and beyond. The subset of students used in this example was obtained from the first follow-up (10th grade) wave of the survey. Only those public school students whose science teachers filled out the teacher survey were included in this analysis. A set of student-level and teacher-level variables suggested by the education indicators literature and by the input-process-output model in Figure 1.2 were included. After listwise deletion of missing data and multiple responses, the sample size for this example was 7,361.

In the initial stages of model specification, it is often useful to represent the set of structural equations in the form of a path diagram. Figure 2.1 shows the path diagram for the student-level model of science achievement implied by the theoretical model shown in Figure 1.2 of Chapter 1. Path diagrams are especially useful pictorial devices because, if drawn accurately, there is a one-to-one relationship between the diagram and the set of structural equations.1

To fix notation, let p be the number of endogenous variables and q be the number of exogenous variables. The system of structural equations representing the model in Figure 2.1 can be compactly written as

y = α + By + Γx + ζ,
[2.1]
where y is a p × 1 vector of observed endogenous variables, x is a q × 1 vector of observed exogenous variables, α is a p × 1 vector of structural intercepts, B is a p × p coefficient matrix that relates the endogenous variables to each other, Γ is a p × q coefficient matrix that relates the endogenous variables to the exogenous variables, and ζ is a p × 1 vector of disturbance terms, where Var(ζ) = Ψ is the p × p covariance matrix of the disturbance terms, and where Var(·) is the variance operator. Finally, let Var(x) = Φ be the q × q covariance matrix for the exogenous variables.2

[Figure 2.1. Education Indicators Model: Initial Specification. The path diagram relates the exogenous and endogenous variables SES, SCIGRA6, SCIGRA10, CERTSCI, UNDERSTD, CHALLG, and SCIACH.]

The elements in B and Γ represent the structural relationships between the variables. The patterns of zero and nonzero elements in these matrices, in turn, are imposed by the underlying, substantive theory. For example, in the model shown in Figure 2.1, an element of B would be the path relating SCIGRA10 to CHALLG. An element in Γ would be the path relating SCIGRA10 to SES.

We can distinguish between three types of parameters in B, Γ, and Ψ. The first set of parameters is the one that is to be estimated. These are often referred to as free parameters. Thus, in the model in Figure 2.1, the observable paths are the free parameters. The values of these free parameters will be estimated with the methods described below. The second set of parameters are given a priori values that are held constant during estimation. These parameters are often referred to as fixed parameters. Most often, the fixed parameters are set equal to zero to represent the absence of a relationship. However, it is possible to fix an element to a nonzero value if the theory is strong enough to suggest what that value should be. Again, considering the model in Figure 2.1, we theorize that there is no direct relationship between SCIACH and SES. Thus, this path is fixed to zero. Finally, it is possible to constrain certain parameters to be equal to other parameters in the model. These elements are referred to as constrained parameters. An example of constrained parameters would be requiring that the effect of SCIGRA6 on SCIGRA10 be the same as the effect of SES on SCIGRA10.
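To make the distinction among free, fixed, and constrained parameters concrete, the sketch below shows how such a specification might be written in the R package lavaan. This is only an illustration: the book's analyses use Mplus, the particular paths are hypothetical rather than the exact specification of Figure 2.1, and nels is a hypothetical data frame containing the NELS variables.

```r
library(lavaan)

model <- '
  # Free parameters: estimated from the data
  SCIACH   ~ SCIGRA10 + UNDERSTD + CHALLG

  # Constrained parameters: the shared label "b1" forces the effects of
  # SCIGRA6 and SES on SCIGRA10 to be equal
  SCIGRA10 ~ b1*SCIGRA6 + b1*SES

  # Fixed parameter: the direct path from SES to SCIACH is fixed to zero
  # simply by omitting it (equivalently, SCIACH ~ 0*SES)
'

# "nels" is a hypothetical data frame holding the observed variables
fit <- sem(model, data = nels)
summary(fit)
```

Any path not written in the model syntax corresponds to a fixed (zero) element of B or Γ, a path written with no label is a free parameter, and repeated labels impose equality constraints during estimation.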
2.1.1 RECURSIVE AND NONRECURSIVE MODELS

The specification of the elements of the matrix B allows us to distinguish between two general classifications of path analytic models: (1) recursive and (2) nonrecursive or simultaneous. Consider first the recursive case. A prototype recursive model is shown in Figure 2.2.3 A characteristic feature of recursive systems is that the elements of B that represent the relationships between the endogenous variables in this model are contained in the lower triangular portion of B. In addition, note that Figure 2.2 does not contain covariances between the disturbance terms in the model. In other words, for a recursive model, Ψ is a diagonal matrix whose elements are the variances of the disturbances.

[Figure 2.2. A Prototype Recursive Path Model, relating exogenous variables x1, x2, and x3 to endogenous variables y1 and y2.]

A second type of model is referred to as a nonrecursive model. A prototype nonrecursive model is shown in Figure 2.3. Nonrecursive models are also referred to as simultaneous equation models and have been widely used in economics to study problems such as supply and demand for certain commodities. In the nonrecursive model, a feedback loop between two endogenous variables is specified. Specification of the feedback loop is determined by freeing the appropriate elements in the upper triangular part of B. Moreover, it is typically the case that a covariance term is specified between the disturbances among endogenous variables in the feedback loop. In other words, for nonrecursive models, Ψ is specified to be a symmetric matrix with nonzero off-diagonal elements. The nonzero covariance between the disturbance terms arises because y1 affects y2 and ζ1 affects y1; thus ζ1 affects y1, which in turn affects y2, resulting in a nonzero covariance between ζ1 and y2. The process works similarly for the effect of y2 on y1.

[Figure 2.3. A Prototype Nonrecursive Path Model, relating exogenous variables x1, x2, and x3 to endogenous variables y1 and y2, which are connected by a feedback loop.]

The presence of feedback loops also implies an underlying dynamic specification to the structural model insofar as some period of time is required for the feedback to take place. The problem alluded to here concerns the extent to which the process specified by the feedback loop will stabilize or explode as a
result of a change to the system imparted by the exogenous variables. The problem of stability and equilibrium has been discussed by Sobel (1990) and Kaplan, Harik, and Hotchkiss (2000). It is beyond the scope of this chapter to discuss dynamic features of nonrecursive models. Suffice it to say that this issue is extremely important when we consider that most social systems are dynamic and that cross-sectional models are static views of an ongoing dynamic system (see, e.g., Tuma & Hannan, 1984).

2.1.2 REDUCED FORM AND COVARIANCE STRUCTURE SPECIFICATIONS

The system described in Equation [2.1] is referred to as the structural form of the model. It is convenient to rewrite the structural form of the model so that the endogenous variables are on one side of the equation and the exogenous variables on the other side. Thus, Equation [2.1] can be rewritten as

(I − B)y = α + Γx + ζ.
[2.2]
Assuming that (I − B) is nonsingular so that its inverse exists, Equation [2.2] can be written as y = ðI − BÞ−1 α + ðI − BÞ−1 Γx + ðI − BÞ−1 ζ, = Π0 + Π1 x + ζ :
[2.3]
This specification is referred to as the reduced form of the model, where Π0 is the vector of reduced form intercepts, Π1 is the vector of reduced form slopes, and ζ∗ is the vector of reduced form disturbances with Var(ζ∗) = Ψ∗.
02-Kaplan-45677:02-Kaplan-45677.qxp
6/24/2008
8:19 PM
Page 18
18—STRUCTURAL EQUATION MODELING
Note that Equation [2.3] is a straightforward multivariate regression of y on x. We return to the reduced form specification of the model when considering the issue of identification later in this chapter and the issue of exogeneity in Chapter 5. Although the importance of the reduced form specification is recognized in econometric and other social science treatments of structural equation modeling, its role is usually relegated to issues of identification and estimation. As an aside, it should be pointed out that the system of equations described in Equation [2.3] can be represented in terms of modeling means, variances, and covariances—as per our definition of structural equation modeling in Chapter 1. To see this, let Ω be a vector that contains the structural parameters of the model—so in this case Ω = ðB, Γ, Ψ, ΦÞ .4 Furthermore, let E(x) = μx be the vector of means for x, Var(x) = E(x′x) = Φ, and E(ζ) = 0, where E(·) is the expectation operator. Then, using rules of expectation algebra EðyÞ = ðI − BÞ−1 α + ðI − BÞ−1 EðxÞ,
[2.4]
= ðI − BÞ−1 α + ðI − BÞ−1 μx ,
and Eðy, xÞ = Σyx Eðyy0 Þ Eðyx0 Þ = Eðx0 yÞ Eðx0 xÞ " ðI − BÞ−1 ðΓΦΓ0 + ΨÞðI − BÞ0−1 = ΦΓ0 ðI − BÞ0 −1
[2.5] ðI − BÞ−1 ΓΦ Φ
# :
Equations [2.4] and [2.5] show that structural equation modeling represents a structure on the mean vector and covariance matrix. The structure is in terms of the parameters of the model.
2.2 Identification of Path Models A prerequisite for the estimation of the parameters of the path model is to establish whether the parameters are identified. Identification refers to whether the parameters of the model can be uniquely determined by the sample data. If the parameters of the model are not identified, estimation of the parameters is not possible. Although the problem of identification is present in almost all parametric statistical models, the role of identification is perhaps clearest in
02-Kaplan-45677:02-Kaplan-45677.qxp
6/24/2008
8:19 PM
Page 19
Path Analysis—19
structural equation models. In this section, we will define the problem of identification from the covariance structure perspective. Later, we introduce the problem of identification from the reduced form perspective when considering some simple rules for establishing identification. 2.2.1 DEFINITION OF IDENTIFICATION We begin with a definition of identification from the perspective of covariance structure modeling. First, arrange the unknown parameters of the model in the vector Ω. Consider next a population covariance matrix Σ whose elements are the population variances and covariances. We assume that there exists an underlying, substantive model that is purported to explain the variances and covariances in Σ. So, for our discussion, we assume that the model in Equation [1.2] describes the population variances and covariances. We know that the variances and covariances in Σ can be estimated by their sample counterparts in the sample covariance matrix S using straightforward formulae for the calculation of sample variances and covariances. Thus, the parameters in Σ are uniquely identified from the data— where here the data are the elements of the sample covariance matrix. Having established that the elements in Σ are identified from their sample counterparts, what we need to establish to permit estimation of the model parameters is whether the model parameters are identified. We say that the elements in Ω are identified if they can be expressed uniquely in terms of the elements of the covariance matrix Σ. If all elements in Ω are identified, we say that the model is identified.
2.2.2 SOME COMMON IDENTIFICATION RULES Let us now consider the identification of the parameters of the path analysis model in Equation [2.1]. To begin, it is important to note that there exists an initial set of restrictions that must be imposed even for simple regression models. The first restriction, referred to as normalization, requires that we set the diagonal elements of B to zero, such that an endogenous variable cannot have a direct effect on itself. The second requirement concerns the vector of disturbance terms ζ. Note that the disturbances for each equation are unobserved and hence have no inherent metric. The most common way to set the metric of ζ, and the one used in simple regression modeling, is to fix the coefficient relating the endogenous variables to the disturbance terms to 1.0. An inspection of Equation [2.2] reveals that ζ is actually being multiplied by the scaling factor 1.0. Thus, the disturbance terms are in the same scale as their relevant endogenous variables. With the normalization rule in place and the metric of ζ fixed, we can now discuss some common rules for the identification of path model parameters.
02-Kaplan-45677:02-Kaplan-45677.qxp
6/24/2008
8:19 PM
Page 20
20—STRUCTURAL EQUATION MODELING
Recall again that we wish to know whether the variances and covariances of the exogenous variables (contained in Φ), the variances and covariances of the disturbance terms (contained in Ψ), and the regression coefficients (contained in B and Γ) can be solved in terms of the variances and covariances contained in Σ. Two classical approaches to identification can be distinguished in terms of whether identification is evaluated on the model as a whole or whether identification is evaluated on each equation composing the system of equations. The former approach is generally associated with social science applications of structural equation modeling, whereas the latter approach appears to be favored in econometrics. Nevertheless, they both provide a consistent picture of identification in that if any equation is not identified, the model as a whole is not identified. The first, and perhaps simplest, method for ascertaining the identification of the model parameters is referred to as the counting rule (see, e.g., Bollen, 1989). Let s = p + q, be the total number of p endogenous and q exogenous variables. Then the number of nonredundant elements in Σ is equal to –12 s(s + 1). Let t be the total number of parameters in the model that are to be estimated (i .e., the free parameters). The counting rule states that a necessary condition for identification is that t ≤ –12 s(s + 1). If the equality holds, then we say that the model may be just identified. If t is strictly less than –12 s(s + 1), then we say that the model may be overidentified. If t is greater than –12 s(s + 1), then the model may be not identified. As an example of the counting rule, consider the model of science achievement given in Figure 2.1. The total number of variables, s, in this model is 7. Thus, we obtain 28 elements in Σ. There are 10 variances and covariances of exogenous variables (including disturbances), and 8 path coefficients. Using the counting rule, we obtain t = 28 − 18 = 10. Because t is strictly less than the number of elements in Σ, we say that the model is overidentified. The 10 overidentifying elements come from the 10 restrictions placed on the model. Clearly, the advantage to the counting rule is its simplicity. It is also a necessary but not sufficient rule for identification. We can, however, provide rules for identification that are sufficient, but that pertain only to recursive models, or to special cases of recursive models. Specifically, a sufficient condition for identification is that B is triangular and that Ψ is a diagonal matrix. However, this is the same as saying that recursive models are identified. Indeed, this is the case, and Bollen (1989) refers to this rule as the recursive rule of identification. In combination with the counting rule above, recursive models can be either just identified or overidentified. A special case of the recursive rule concerns the situation where B = 0 and Ψ again a diagonal matrix. Under this condition, the model in Equation [2.1] reduces to y = α + Γx + ζ,
[2.6]
02-Kaplan-45677:02-Kaplan-45677.qxp
6/24/2008
8:19 PM
Page 21
Path Analysis—21
which we recognize as a multivariate linear regression model. Here, too, we can use the counting rule to show that regression models are also just identified. Note that recursive models place restrictions on the form of B and Ψ and that the identification conditions stated above are directly related to these types of restrictions. Nonrecursive models, however, do not restrict B and Ψ in the same way. Thus, we need to consider identification rules that are relevant to nonrecursive models. 2.2.3 IDENTIFICATION OF NONRECURSIVE MODELS As noted above, the approach to identification arising out of econometrics (see Fisher, 1966), considers one equation at a time. The concern is whether a true structural equation can be distinguished from a false one formed by a linear combination of the other equations in the model (see, e.g., Land, 1973). In complex systems of equations, trying to determine linear combinations of equations is a tedious process. One approach would be to evaluate the rank of a given matrix because if a given matrix is not of full rank, then it means that there exist columns (or rows) that are linear combinations of each other. This leads to developing a rank condition for identification. To motivate the rank and order conditions consider the prototype nonrecursive model in Figure 2.3. As before, let p be the number of endogenous variables and let q be the number of exogenous variables. We can write this model as
y1 0 = y2 b21
b12 0
y1 g + 11 0 y2
g12 0
2 3 x z 0 4 15 x2 + 1 : g23 z2 x3
[2.7]
In this example, p = 2 and q = 3. As a useful device for assessing the rank and order condition, we can arrange the structural coefficients in a partitioned matrix A of dimension p × s as A = ½ðI − BÞjΓ 1 −b12 = −b21 1
−g11
−g12
0
0
0
−g23
,
[2.8]
where s = p + q. Note that the zeros placed in Equation [2.8] represent paths that have been excluded (restricted) from the model based on a priori model specification. We can represent the restrictions in the first equation of A, say
02-Kaplan-45677:02-Kaplan-45677.qxp
6/24/2008
8:19 PM
Page 22
22—STRUCTURAL EQUATION MODELING
A1, as A1φ1 = 0, where φ1 is a column vector whose hth element (h = 1, . . . , s) is unity and the remaining elements are zero. Thus, φ1 selects the particular element of A1 for restriction. A similar equality can be formed for A2, the second equation in the system. The rank condition states that a necessary and sufficient condition for the identifiability of the first equation is that the rank of Aφ1 must be at least equal to p − 1. A similar result holds for the second equation. The proof of the rank condition is given in Fisher (1966). If the rank is less than p −1, then the parameters of the equation are not identified. If the rank is exactly equal to p −1, then the parameters of the equation in question are just identified. If the rank is greater than p −1, then the parameters of the equation are overidentified. The rank condition can be easily implemented as follows. First, delete the columns containing nonzero elements in the row corresponding to the equation of interest. Next, check the rank of the resulting submatrix. If the rank is p −1, then the equation is identified. To take the above example, consider the identification status of the first equation. Recall that for this example, p −1 = 1. According to the procedure just described, the resulting submatrix is
0 : −g23
With the first row 0, the rank of this matrix is 1, and hence, the first equation is identified. Considering the second equation, the resulting submatrix is
−g11 0
−g12 : 0
Again, because of the 0s in the second row, the rank of this submatrix is 1, and we conclude that the second equation is identified. A corollary of the rank condition is referred to as the order condition. The order condition states that the number of variables (exogenous and endogenous) excluded (restricted) from any of the equations in the model must be at least p −1 (Fisher, 1966). Despite the simplicity of the order condition, it is only a necessary condition for the identification of an equation of the model. Thus, the order condition guarantees that there is a solution to the equation, but it does not guarantee that the solution is unique. A unique solution is guaranteed by the rank condition. As an example of the order condition, we observe that the first equation has one restriction and the second equation as two restrictions as required by the condition that the number of restrictions must be as least p − 1 (here, equal
02-Kaplan-45677:02-Kaplan-45677.qxp
6/24/2008
8:19 PM
Page 23
Path Analysis—23
to 1). It may be of interest to modify the model slightly to demonstrate how the first equation of the model would not be identified according to the order condition. Referring to Figure 2.3, imagine a path from x3 to y1. Then the 0 in the first row of A would be replaced by −γ13. Using the simple approach for determining the order condition, we find that there are no restrictions in the first equation and therefore the first equation is not identified. Similarly, the first equation fails the rank condition of identification.
2.3 Estimation of Model Parameters Assuming that the parameters of the model are identified, we now move on to describe procedures for the estimation of the parameters of the model. The parameters of the model are (a) the variances and covariances of exogenous variables contained in Φ, (b) the variances and covariances of disturbance terms contained in Ψ, and (c) the regression coefficients contained in B and Γ. Once again, it is convenient to consider collecting these parameters together in a parameter vector denoted as Ω. The goal of estimation is to ^ , that obtain estimates of the parameter vector Ω, which we will write as Ω ^ is the covariance ^ = ΣðΩÞ ^ , where Σ minimize a discrepancy function FðS,ΣÞ matrix based on the estimates of the model—the so-called fitted covariance matrix. ^ is a scalar that measures the discrepancy (distance) The function FðS,ΣÞ between the sample covariance matrix S (the data) and the fitted covariance ^ based on model estimates. A correct discrepancy function is characmatrix Σ terized by the following properties (see Browne, 1984): ^ ≥ 0, ðiÞ FðS,ΣÞ ^ = 0, if and only if Σ ^ = S, ðiiÞ FðS,ΣÞ ^ is a continuous fraction in S and Σ: ^ ðiiiÞ FðS,ΣÞ
The first property (i) requires that the discrepancy function must be a positive real number. The second property (ii) indicates that the discrepancy function is zero only if the model estimates reproduce the sample covariance matrix perfectly. The third property (iii) simply states that the function is continuous. For the purposes of this chapter, we consider maximum likelihood (ML) and generalized least squares (GLS).5 We consider the distributional assumptions underlying these methods, but we postpone the discussion of assumption violations until Chapter 5 where we consider alternative methods of estimation under more relaxed distributional assumptions.
02-Kaplan-45677:02-Kaplan-45677.qxp
6/24/2008
8:19 PM
Page 24
24—STRUCTURAL EQUATION MODELING
2.3.1 MAXIMUM LIKELIHOOD An important breakthrough in the estimation of the parameters of path models came from the application of maximum likelihood (ML). Maximum likelihood was originally proposed as a method of estimation for econometric simultaneous equation models by (Koopmans, Rubin, & Leipnik, 1950) under the name full-information maximum likelihood. Later, Jöreskog (1973) discussed ML estimation for general structural equation models. To begin, let the set of observed responses x and y be denoted as z. Furthermore, let the observed responses be based on a sample of n = N − 1 observations with corresponding unbiased sample covariance matrix S that estimates a population covariance matrix Σ, and that is assumed to follow the path model in Equation [2.1]. Central to the development of the ML estimator is the assumption that the observations are derived from a population that follows a multivariate normal distribution. The multivariate normal density function of z can be written as 1 0 −1 [2.9] −ðp + qÞ=2 1=2 f ðzÞ = ð2pÞ jΣj exp − z Σ z : 2 The multivariate normal density function in Equation [2.9] describes the distribution for each observation in the sample. Under the assumption that the N observations are independent of one another, the joint density function can be written as the product of the individual densities, f ðz1 ,z2 , . . . ,zN Þ = f ðz1 Þf ðz2 Þ f ðzN Þ:
[2.10]
If Equation [2.9] represents the multivariate normal density function for a single sample member, then the product given in Equation [2.10] can be written as LðΩÞ = ð2pÞ
−N ðp + qÞ=2
jΣðΩÞj
−N =2
" exp
1 2
N X
# z0 i Σ−1 ðΩÞzi ,
[2.11]
i=1
where L(Ω) is defined to be the likelihood of the sample. To simplify the derivation, and with no loss of generality, it is convenient to take the log of Equation [2.11] yielding the log-likelihood logLðΩÞ =
−N ðp + qÞ N logð2pÞ − logjΣðΩÞj 2 2 N tr½TΣ−1 ðΩÞ: − 2
[2.12]
02-Kaplan-45677:02-Kaplan-45677.qxp
6/24/2008
8:19 PM
Page 25
Path Analysis—25
The last term on the right-hand side of Equation [2.12] arises from the fact that the term in the brackets of Equation [2.11] is a scalar, and the trace of a scalar is a scalar. Thus, referring to the last term on the right-hand side of Equation [2.11] we have X X 1 N 0 −1 1 N − z i Σ ðΩÞzi = − tr½z0 i ΣðΩÞzi : 2 i=1 2 i=1
[2.13]
Multiplying and dividing by N and using the trace rule that tr(ABC) = tr(CAB) yields −
X X 1 N N N −1 0 −1 tr½z0i ΣðΩÞzi = − tr N zi zi Σ ðΩÞ 2 i=1 2 i=1 N =− tr TΣ − 1 ðΩÞ , 2
[2.14]
where T is the sample covariance matrix based on N rather than on n = N − 1. The next step is to maximize Equation [2.12] with respect to the parameters of the model. Maximizing the log-likelihood in Equation [2.12] requires obtaining the derivatives with respect to the parameters of the model, setting the derivatives equal to zero, and solving. The rules of matrix differential calculus used for this task can be found in Magnus and Neudecker (1988). Continuing, first note that Equation [2.12] contains the constant term ½ − N ðp + qÞ=2logð2πÞ , which does not contain model parameters. Thus, this term will not enter into the derivatives and can therefore be ignored. Second, we note that the difference between T (based on N) and the usual unbiased estimate S (based on n = N − 1) is negligible in large samples. We can therefore rewrite Equation [2.12] as logLðΩÞ = −
N logjΣðΩÞj + tr½SΣ − 1 ðΩÞ : 2
[2.15]
A problem with Equation [2.15] is that it does not possess the properties of a correct discrepancy function as described above. To see this, note that if S = Σ, then the second term on the right-hand side of Equation [2.15] will be an identity matrix of order p + q and the trace will equal p + q. However, the difference between the first term and second term will not equal zero as required if Equation [2.15] is to be a proper discrepancy function, as in property (ii) discussed above. To render Equation [2.15] a proper discrepancy function, we need to add terms that do not depend on model parameters and
02-Kaplan-45677:02-Kaplan-45677.qxp
6/24/2008
8:19 PM
Page 26
26—STRUCTURAL EQUATION MODELING
therefore are not involved in the differentiation. To begin, we can remove the term − –N2 , in which case we are minimizing the function rather than maximizing it. Then, we can add terms that do not depend on model parameters and thus are of no consequence to differentiation. This gives FML = logjΣðΩÞj + tr½SΣ − 1 ðΩÞ − logjSj − t,
[2.16]
where t is the total number of variables in z, that is, t = p + q. It can be seen that if the model fits perfectly, the first and third terms sum to zero and the second and fourth terms sum to zero and therefore Equation [2.16] is now a proper fitting function as defined properties (i) to (iii) above. In addition to obtaining the estimates of the model parameters, we can also ^ be the r × 1 vector of estimated obtain the covariance matrix of the estimates. Let Ω ^ can be written as model parameters. Then, the asymptotic covariance matrix of Ω
2 −1 ∂ logLðΩÞ ^ ACOVðΩÞ = −E , ∂Ω∂Ω0
[2.17]
where the expression in the brackets is referred to as the Fisher information matrix denoted as I(Ω). From here, one can obtain standard errors from the square roots of the diagonal elements of the asymptotic covariance matrix of the estimates. Maximum Likelihood Estimation of the Science Achievement Model Returning to the science achievement example, the model in Figure 2.1 specifies that student background as measured by SES (socioeconomic status), prior science grades, and teacher science certification predicts student science achievement through measures of instructional quality as perceived by the students. This model omits teaching quality and curriculum quality insofar as these measures are unavailable in NELS. Table 2.1 gives the variable names and their measurement scales and descriptive statistics for the variables chosen for our science achievement model. It can be seen that certain variables exhibit moderate levels of skewness and kurtosis. The problem of how nonnormality influences the results of structural equation modeling as well as alternative estimators for addressing this problem are taken up in Chapter 5. For now, we will assume multivariate normality of the data.
02-Kaplan-45677:02-Kaplan-45677.qxp
6/24/2008
8:19 PM
Page 27
Path Analysis—27 Table 2.1
Variable Names and Descriptive Statistics
Namesa
Min
Max
Mean
Std.
Skew
Kurtosis
SCIGRA6
1.000
5.000
3.950
0.970
−0.688
−0.039
CERTSCI
0.000
1.000
0.390
0.490
0.465
−1.784
−2.250
2.010
−0.007
0.745
0.036
−0.373
UNDERSTD
2.000
6.000
4.380
1.360
−0.370
−1.097
CHALLG
2.000
6.000
4.950
1.200
−1.061
0.168
SCIGRA10
2.000
9.000
6.700
1.840
−0.542
−0.505
10.130
34.680
22.044
5.866
0.125
−0.860
SES
SCIACH
Multivariate kurtosis = −2.216 (z = −8.469, p < .001) b
a. SCIGRA6, self-reported science grades from grade 6 to present; CERTSCI, Is teacher certified to teach science in state? (1 = yes); SES, socioeconomic status composite; UNDERSTD, How often is student asked to show understanding of science concepts?; CHALLG, How often does student feel challenged in science class?; SCIGRA10, self-reported science grades from grade 10; SCIACH, item response theory estimated number right on science achievement test. b. Mardia’s coefficient of multivariate kurtosis.
The software program Mplus (L. Muthén & Muthén, 2006) was used for this analysis. ML estimates of the model parameters and tests of significance are given in the upper panel Table 2.2. The unstandardized estimates are the direct effects and the covariances of the exogenous variables in the model. It can be seen that with few exceptions, each direct effect is statistically significant. 2.3.2 GENERALIZED LEAST SQUARES AND UNWEIGHTED LEAST SQUARES ESTIMATION The GLS estimator was developed by Aitken (1935) and applied to the path analysis setting by Jöreskog and Goldberger (1972; see also Anderson, 1973). As in the case of standard linear regression, the basic idea behind the GLS estimator is to correct for heteroscedastic disturbances. The GLS estimator is actually a member of the family of weighted least squares (WLS) estimators that can be written generally as FWLS = ½S − ΣðΩÞ0 W−1 ½S − ΣðΩÞ,
[2.18]
where W−1 is a weight matrix that weights the deviations S − ΣðΩÞ in terms of their variances and covariances with other elements. Notice that this is a
02-Kaplan-45677:02-Kaplan-45677.qxp
6/24/2008
8:19 PM
Page 28
28—STRUCTURAL EQUATION MODELING Table 2.2
Maximum Likelihood Estimates Direct Effects for the Initial Science Achievement Model Estimates
SE
Est./SE
Std
StdYX
1.228
0.034
35.678
1.228
0.384
−0.033
0.017
−1.961
−0.033
−0.022
IRTSCI ON SCIGRA10 SCIGRA10 ON CHALLG SCIGRA6
0.781
0.020
38.625
0.781
0.413
SES
0.239
0.026
9.103
0.239
0.097
−0.040
0.039
−1.039
−0.040
−0.011
0.168
0.015
11.315
0.168
0.125
0.318
0.010
33.225
0.318
0.361
−0.030
0.033
−0.929
−0.030
−0.011
UNDERSTD
1.858
0.031
60.667
1.858
1.000
CHALLG
1.250
0.021
60.667
1.250
0.870
SCIGRA10
2.637
0.043
60.667
2.637
0.786
29.291
0.483
60.667
29.291
0.853
CERTSCI UNDERSTD CHALLG ON UNDERSTD UNDERSTD ON CERTSCI Residual variances
IRTSCI 2
Observed variable R UNDERSTD
0.000
CHALLG
0.130
SCIGRA10
0.214
IRTSCI
0.147
proper discrepancy function insofar as if the model fits the data perfectly, the first and last terms on the right-hand side of Equation [2.18] will yield a null matrix. A critical consideration of WLS estimators is the choice of a weight matrix −1 W . One choice could be W−1 = I, the identity matrix. With the identity matrix as the choice for the weight matrix, WLS reduces to unweighted least squares (ULS). Unweighted least squares is identical to ordinary least squares in the standard regression setting in that it assumes homoscedastic disturbances. Moreover, although ULS is known to yield unbiased estimates of model
02-Kaplan-45677:02-Kaplan-45677.qxp
6/24/2008
8:19 PM
Page 29
Path Analysis—29
parameters, it is not the most efficient choice of estimators with respect to yielding estimates with minimum sampling variability. To address the potential problem of heteroskedastic disturbances, one can choose W−1 = S−1. Indeed, this is the most common choice for W−1. Choosing W−1 = S−1 defines the GLS estimator and can be rewritten as 1 FGLS = tr½S−1 ðS − ΣÞ2 2 1 = trðI − S−1 ΣÞ2 : 2
[2.19]
Under the assumption of multivariate normality, the GLS fitting function has identical asymptotic properties to ML—namely, the GLS estimator is asymptotically normal and asymptotically efficient, thus improving on ULS. In Chapter 5, we will consider alternative choices for W−1 that address the problem of nonnormal data. 2.3.3 A NOTE ON SCALE INVARIANCE AND SCALE FREENESS An important consideration in the choice of estimation methods is the properties of scale invariance and scale freeness. Scale invariance refers to the property that the value of the fit function is the same regardless of the change of scale of the measurements. For example, if the value of the fit function is the same when transforming a covariance matrix to a correlation matrix, then the estimator is scale invariant. A similar concept is that of scale freeness. This concept concerns the relationship between parameter estimates based on untransformed variables and those based on linearly transformed variables. More specifically, if scaling factors can be determined that allow one to obtain transformed estimates from untransformed estimates (and vice versa), then the estimator is scale free. Of the estimators discussed in Section 2.3, ML and GLS are both scale invariant and scale free under general conditions. If parameters are constrained to nonzero constants, or if there a specific types of cross-group equality constraints, then ML and GLS may lose their scale-free properties. Unweighted least squares, by contrast, is neither scale invariant nor scale free.
2.4 Model and Parameter Testing A feature of ML and GLS estimation of the path model is that one can explicitly test the hypothesis that the model fits the data. Consider again Equation [2.15]. This is the log-likelihood under the null hypothesis that the specified model
02-Kaplan-45677:02-Kaplan-45677.qxp
6/24/2008
8:19 PM
Page 30
30—STRUCTURAL EQUATION MODELING
holds in the population. The corresponding alternative hypothesis is that Σ is any symmetric positive definite matrix. Under the alternative hypothesis, the log-likelihood attains its maximum with S as the estimator of Σ. Thus, the loglikelihood under the alternative hypothesis, say log La, can be written as n logLa = − logjSj + trðSS−1 Þ 2 n = − logjSj + trðIÞ 2 n = − logjSj + t: 2
[2.20]
The statistic for testing the null hypothesis that the model fits in the population is referred to as the likelihood ratio (LR) test and is expressed as −2log
L0 = −2logL0 + 2logLa La = n logjΣj + trðΣ−1 SÞ − nðlogjSj + tÞ = n logjΣj + trðΣ−1 SÞ − logjSj − t :
[2.21]
Notice from the last equality in Equation [2.21] that the log-likelihood ratio is simply n × FML. The large sample distribution of the LR test is chi-square with degrees of freedom (df) given by the difference in the number nonredundant elements in Σ and the number of free parameters in the model. The LR chi-square test is used to test the null hypothesis that the population covariance matrix possesses the structure implied by the model against the alternative hypothesis that Σ is an arbitrary symmetric positive definite matrix. In the context of our science achievement example, the LR chi-square statistic indicates that the model does not fit the data (χ2 = 1321.13, df = 10, p < .000). Numerous explanations for the lack of fit are possible including nonnormality, missing data, sample size sensitivity, and incorrect model specification. These are taken up in detail in Chapter 5. In addition to a global test of whether the model fits perfectly in the population, one can also test hypotheses regarding the individual fixed and freed parameters in the model. We can consider three alternative ways to evaluate the fixed and freed elements of the model vis-à-vis overall fit. The first method rests on the difference between the LR chi-square statistics comparing a given model against a less restrictive model. A less restrictive model can be formed by freeing one of the currently restricted paths. Recall from Section 2.3.1 that the LR test of the null hypothesis is given as n ∗ FML. This initial null hypothesis, say H01, is tested against the alternative
02-Kaplan-45677:02-Kaplan-45677.qxp
6/24/2008
8:19 PM
Page 31
Path Analysis—31
hypothesis Ha that Σ is a symmetric positive definite matrix. Consider a second hypothesis, say H02 that differs from H01 in that a single restriction ωj = 0 is relaxed. For example, in Figure 2.1 we may relax the restriction that SES does not affect SCIACH. Note that the alternative hypothesis is the same in both cases and thus will cancel in the algebra. Therefore, the change in the chi-square value can be written as w2 = nðFML1 − FML2 Þ,
[2.22]
where the distribution of Δχ2 is distributed as chi-square with degrees-offreedom equaling the difference in degrees of freedom between the model under H01 and the less restrictive model under H02. In the case of a single restriction described here, the Δχ2 test is evaluated with one degree of freedom. The second method of evaluating the components of the model concerns whether the restrictions placed on the model hold in the population. Denote sðΩÞ =
∂logLðΩÞ ∂Ω
[2.23]
as the score vector representing the change in the log-likelihood for a change in Ω. For the estimated parameters in the model, the elements of s(Ω) will be zero, because at the maximum of the likelihood, the vector of partial derivative is zero. However, for the restricted elements in Ω, say Ωr, the partial derivatives will only be zero if the restrictions hold exactly. If the restrictions don’t hold exactly, which would almost always be the case in practice, then the maximum of the likelihood would not be reached and the derivatives would not be zero. Thus, a test can be formed, referred to as the Lagrange multiplier (LM) test, which assesses the validity of the restrictions in the model (Silvey, 1959). The LM test can be written as ^ r Þ0 IðΩ ^ r Þ − 1 ½sðΩ ^ r Þ, LM = ½sðΩ
[2.24]
where I(Ωr) was earlier defined as the information matrix. The LM test is asymptotically distributed as chi-square with degrees of freedom equaling the difference between the degrees-of-freedom of the more restrictive model and the less restrictive model. Again, if one restriction is being evaluated, then the LM test is evaluated with one degree of freedom. The LM test in Equation [2.24] is also referred to as the modification index (Sörbom, 1989). This test is most commonly used for model modification and we will defer that discussion until Chapter 6. Finally, we can consider evaluating the impact of placing restrictions on the unrestricted model. Let r(Ω) represent a set of restrictions placed on a model. In our science achievement example, r(Ω) represents the paths fixed to
02-Kaplan-45677:02-Kaplan-45677.qxp
6/24/2008
8:19 PM
Page 32
32—STRUCTURAL EQUATION MODELING
zero. The estimates r(Ωr) are zero by virtue of the specification. The question is whether the restrictive model holds for the set of unrestricted estimates, say r(Ωu). In other words, if a small (and perhaps nonsignificant) path coefficient was restricted to be zero (removed from the model), would that restriction hold in the population? If the restrictive model holds, then r(Ωu) should not differ significantly from zero. However, if the restrictive model does not hold, one would expect the elements of r(Ωu) to differ significantly from zero. The test for the validity of restricting parameters is given by the Wald test (W) written as (" # " #)−1 ^ uÞ ^ uÞ ∂rðΩ ∂rð Ω −1 ^ uÞ ^ uÞ : ^ uÞ W = rðΩ rðΩ IðΩ ^u ^u ∂Ω ∂Ω
0
[2.25]
The Wald test is asymptotically distributed as chi-square with degrees of freedom equaling the number of imposed restrictions. When interest focuses on evaluating one restriction, that is, ωj = 0, the W test in Equation [2.25] reduces to Wj =
^ 2j o Varð^ oj Þ
,
[2.26]
where Var(ωj) is the jth diagonal element of the asymptotic covariance matrix of the estimates. In large samples, Wj is distributed as chi-square with one degree of freedom. Note that the square root of Equation [2.26] gives z=
^j o , seð^ oj Þ
[2.27]
which has an asymptotic normal distribution with mean 0 and variance 1. This statistic can also be used to test the null hypothesis that ωj = 0. The LR difference test, the LM test, and the Wald test are known to be asymptotically equivalent (see Buse, 1982; Engle, 1984). Assessing the Impact of Restrictions in the Science Achievement Model Table 2.2 presents the z-tests (denoted as EST/S.E. in Mplus) for each estimated path in the science achievement model. It can be seen that with the exception of the regressions of UNDERSTD on CETSCI, SCIGRA10 on
02-Kaplan-45677:02-Kaplan-45677.qxp
6/24/2008
8:19 PM
Page 33
Path Analysis—33
CERTSCI, SES on CERTSCI, and the correlation of CETSCI with SCIGRA6, all paths are statistically significant. As noted above, we can evaluate the effect of restricting one of these paths on the overall LR chi-square test by simply squaring the specific z-value of interest. So, for example, restricting the regression of SCIGRA10 on UNDERSTD (z = 11.315) to zero, we would expect the LR chisquare test to increase by z2 = 128.029. This would indicate a significant decrement in the overall fit of the model. Similarly, if we wished to restrict the path from UNDERSTD to CERTSCI (z = −0.929), the resulting change in the LR chi-square test would be z2 = 0.0841, which is not a significant decrement to model fit.
2.5 Interpretation of Model Parameters An important, though somewhat neglected practice in structural equation modeling is the interpretation of the structural coefficients. Indeed, an earlier review of substantive studies using structural equation modeling shows that once goodness-of-fit is established, rarely are the structural parameters interpreted (Elliott, 1994) with respect to their substantive meaning. However, if the goal of the model is to move beyond explanation and toward using the model to address specific substantive questions, then interpretation of the parameters is crucial.6 2.5.1 EFFECT DECOMPOSITION To begin, we need a vocabulary for interpreting the coefficients of the model.7 In the terminology of path analysis (e.g., Duncan, 1975), the elements of B and Γ represent the direct effects of the model. As such, they can be interpreted as any other type of regression coefficient. For example, an element of B gives the increase in the endogenous variable, for a unit increase in the value of another endogenous variable. Similarly, an element of Γ gives the increase in the endogenous variable for a unit increase in the exogenous variable. In either case, the direct effect of one variable on another is not mediated by another variable. If the metrics of the exogenous and endogenous variables of interest are substantively meaningful, then the meaning of these increases is straightforward. To take an example from our science achievement model, the regression coefficient relating SCIGRA10 to SCIGRA6 is a direct effect in the model and is contained in the Γ matrix. Similarly, the regression coefficient relating SCIACH to SCIGRA10 is also direct effect but is contained in the B matrix. In addition to interpreting the direct effects, we can make use of Equations [2.27] or [2.28] to assess the statistical significance of the direct effects.
02-Kaplan-45677:02-Kaplan-45677.qxp
6/24/2008
8:19 PM
Page 34
34—STRUCTURAL EQUATION MODELING
In addition to the direct effects of the model, path analysis allows for the further decomposition of the total and indirect effects. Indeed, the decomposition of effects represents a classic approach to the development of path analysis (see, e.g., Duncan, 1975). It is perhaps pedagogically easier to first consider the decomposition of the total effect. The total effect is the sum of the direct effect and all indirect effects of an exogenous variable on an endogenous variable of interest. From the standpoint of the equations of the model, it is useful to consider the reduced form specification shown earlier in Equation [2.3]. In the context of the reduced form of the model, the coefficient matrix Π1 ≡ ðI − BÞ − 1 Γ is the matrix of total effects. In many respects, an analysis of the total effects and their substantive and/or statistical significance provides the information necessary to further use the model for prediction purposes. That is, often an investigator can isolate a particular endogenous outcome as the ultimate outcome of interest. The exogenous variables, on the other hand, may have clinical or policy relevance to the investigator.8 The mediating variables, then, represent the theorized processes of how changes in exogenous variables lead to changes in endogenous variables. However, the process may be less important in some contexts than the simple matter of the overall effect. An analysis of the total effects can provide this information. If, in a given context, it is important to understand meditating processes, then one needs to consider the indirect effects. An indirect effect is one in which an exogenous variable influences an endogenous variable through the mediation of at least one other variable. For example, it may be of interest to determine if students with higher science grades and higher science achievement scores are associated with teachers who are certified in science by virtue of higher instructional quality. To obtain an expression for the indirect effects recall that the total effect is the sum of the direct and all indirect effects. This then leads to an expression for the indirect effects of exogenous variables on endogenous variables. Specifically, if Γ is the matrix of direct effects of exogenous variables on endogenous variables, and ðI − BÞ − 1 Γ is the matrix of total effects, then it follows that the matrix containing total indirect effects is ðI − BÞ − 1 Γ − Γ. Table 2.3 provides a selected set of effect decompositions for the science achievement model. The total indirect effect of sixth grade reported science grades on science achievement is statistically significant as is the total indirect effect of SES on science achievement. The specific indirect effects of certification to teach science on science achievement are all nonsignificant. The specific indirect effects from UNDERST to science achievement are each statistically significant.
02-Kaplan-45677:02-Kaplan-45677.qxp
6/24/2008
8:19 PM
Page 35
Path Analysis—35 Table 2.3
Selected Total Indirect and Specific Indirect Estimates for the Science Achievement Model Estimates
SE
Est./SE
Std
StdYX
0.959
0.037
26.208
0.959
0.158
0.294
0.033
8.820
0.294
0.037
Effects from SCIGRA6 to IRTSCI Total indirect Effects from SES to IRTSCI Total indirect
Effects from CERTSCI to IRTSCI Total indirect
−0.055
0.048
−1.151
−0.055
−0.005
Specific indirect IRTSCI SCIGRA10 CERTSCI
−0.050
0.048
−1.038
−0.050
−0.004
IRTSCI SCIGRA10 UNDERSTD CERTSCI
−0.006
0.007
−0.926
−0.006
−0.001
IRTSCI SCIGRA10 CHALLG UNDERSTD CERTSCI
0.000
0.000
0.839
0.000
0.000
0.021
−1.958
−0.041
−0.008
Effects from CHALLG to IRTSCI Total indirect
−0.041
Effects from UNDERSTD to IRTSCI Total indirect
0.194
0.018
10.834
0.194
0.045
Specific indirect IRTSCI SCIGRA10 UNDERSTD
0.207
0.019
10.786
0.207
0.048
−0.013
0.007
−1.955
−0.013
−0.003
IRTSCI SCIGRA10 CHALLG UNDERSTD
2.5.2 STANDARDIZED SOLUTIONS When observed variables have different or arbitrary scales, it is often necessary to use standardized coefficients to aid interpretation. Considering the
02-Kaplan-45677:02-Kaplan-45677.qxp
6/24/2008
8:19 PM
Page 36
36—STRUCTURAL EQUATION MODELING
science achievement model, such variables as SES and SCIACH have understandable and usable metrics. In cases where the metrics are not readily interpretable, or perhaps arbitrary, it is useful to adopt a new metric for the variables so as to yield substantively interesting interpretations. One such approach to the problem is to standardize the structural parameters of the model. In the context of path analysis, consider an unstandardized path coefficient for an element of Γ, say γpq. Then, the standardized element is obtained as
^gpq
! ^ yp s g^ , = ^ xq pq s
[2.28]
^ xq are the variances of yp and xq, respectively. Note that s ^ yp and s ^ yp where s is obtained from a specific estimated diagonal element of the upper left-hand ^ ΦΓ ^ 0 + ΨÞðI ^ − BÞ ^ − 1 ðΓ ^ 0 − 1 , whereas partition of Equation [2.5], namely ðI − BÞ ^ in Equation [2.5]. Similarly, for ^ xq is obtained from a diagonal element of s an element of B, the standardized coefficient is given as
^ y0 ^ ^ 0 = s b 0: b pp ^ y pp s
[2.29]
Standardized coefficients can also be obtained for indirect, and total effects of the model. In the context of our science achievement example, an inspection of the standardized solutions in Table 2.2 indicate that holding constant student SES and teacher certification in science, the strongest direct effect in the model is the relationship between previous science grades and current science grades. In terms of the relations these variable have to science achievement, it appears that the indirect effect of previous science grades through current science grades is the strongest predictor of science achievement (see Table 2.3). Direct and indirect effects of student reported instructional quality on science achievement are moderate to weak. It is important to note that this interpretation is based on the initial specification of the model, which we pointed out does not fit the data as evidenced by the LR chi-square. As we will see in Chapter 5, evidence of poor fit may result in biased parameter estimates so our interpretations must be taken with caution. At this point in our discussion, we have not modified the model on the basis of statistical and/or substantive considerations. Model modification constitutes an important component of the conventional approach to structural equation modeling. In Section 2.4 we discussed the use of the Lagrange multiplier
02-Kaplan-45677:02-Kaplan-45677.qxp
6/24/2008
8:19 PM
Page 37
Path Analysis—37
as a method that allows us to assess the validity of restrictions in the model. In Chapter 5, we examine the Lagrange multiplier in conjunction with other information as a means of modifying the initial model.
2.6 Conclusion This chapter introduced the basics of path analysis. We covered issues of identification, estimation, and testing as well as provided a substantive example. Although additional topics addressing assumptions and model testing are taken up in later chapters, the steps used in this example are characteristic of the conventional approach to structural equation modeling. Namely, a model was postulated that represents causal relationships implied by the inputprocess-output theory. Next, data were collected and variables were chosen that represent the theoretical constructs of interest. Finally, model parameters were estimated and tested as was the overall fit of the model. Throughout this chapter, an underlying assumption was that the variables were measured without error. This, of course, is a heroic assumption in most situations. Moreover, consequences of violating this assumption are fairly well known—namely measurement error in our exogenous and endogenous variables can attenuate regression coefficients and induce biased standard errors, respectively (Duncan, 1975; Jöreskog & Sörbom, 2000). Ideally, the goal would be to conduct path analysis on error-free measures. One approach to obtaining errorfree measures of theoretical variables is to develop multiple measures of the underlying construct and eventually employ the construct directly into the path analysis. In the next chapter, we address the issue of validating measures of our theoretical variables via the method of factor analysis. This discussion then leads into Chapter 4, which combines factor analysis and path analysis into a comprehensive structural equation methodology.
Notes 1. Path diagrams do not represent the theory or even the theoretical model, assuming there is one. Rather, the path diagram is a pictorial representation of a statistical model of the data. 2. In this chapter, I will use standard econometric terminology for describing path models. Thus, the terms endogenous variables and exogenous variables are terms derived from econometrics. Other related terms are dependent variables and independent variables or criterion variables and predictor variables, respectively. Moreover, I will use notation similar to that used in LISREL (Jöreskog & Sörbom, 2000).
02-Kaplan-45677:02-Kaplan-45677.qxp
6/24/2008
8:19 PM
Page 38
38—STRUCTURAL EQUATION MODELING 3. This and other path diagrams were drawn using AMOS 4.0. 4. Perhaps more accurately, the parameter vector Φ could be omitted from this list. The parameter vector Φ contains the variances and covariances of the exogenous variables and is not structured in terms of model parameters. In fact, estimates in Φ will be identical to the corresponding elements in the sample covariance matrix S. 5. The focus of attention on these estimators does not result in a loss of generality. Indeed, the maximum likelihood estimator that is discussed is referred to as FIML (full information maximum likelihood) in the econometrics literature. Moreover, for recursive models, two-stage least squares and unweighted least squares are identical. 6. This is not to suggest that goodness-of-fit is unimportant. Indeed, serious lack of fit may be due to specification errors that would, in turn, lead to biased structural coefficients. However, the evidence suggests that goodness-of-fit dominates the modeling effort with little regard to whether the model estimates are sensible or informative. 7. In this section, we will focus on terminology typically encountered in social and behavioral science applications of structural equation modeling. The literature on causal inference in structural equation models makes clear distinctions between statistical parameters and causal parameters. We defer that discussion to Chapter 11. 8. Implicit here is the idea that the exogenous variables of interest are truly exogenous. The issue of exogeneity is taken up in Chapter 5.
03-Kaplan-45677:03-Kaplan-45677.qxp
6/24/2008
8:20 PM
Page 39
3 Factor Analysis
I
n Chapter 2, we outlined the method of path analysis as an approach to understanding relationships between a set of observable variables described by systems of equations. We noted in Chapter 2, as well as in the introductory chapter, that theory often dictates what to measure but not exactly how to measure it. Yet, the development of scales that map theoretical variables into number systems constitutes arguably the most important step in the modeling process. In this chapter, we will consider the measurement of underlying constructs via the method of factor analysis. We consider first the unrestricted factor model, including principal components analysis (PCA). The discussion of statistical hypothesis testing of the number of factors in the unrestricted model will then lead into a discussion of the restricted model as a method of statistically testing the factor structure underlying a set of measurements. We consider the specification of the restricted model as well as issues of identification, estimation, and testing.
3.1 Model Specification and Assumptions The example used throughout this chapter explores the factor structure of student perceptions of school climate. This problem has important implications for the input-process-output model not only because student perceptions are important education indicators in their own right but they may also be predictive of achievement. The data for this example come from the responses of a sample of public school 10th grade students to survey items in the National Educational Longitudinal Study (NCES, 1988).1 Table 3.1 defines the items in the questionnaire. After mean imputation of missing data, the 10th grade sample of students was 12,669.2 39
03-Kaplan-45677:03-Kaplan-45677.qxp
6/24/2008
8:20 PM
Page 40
40—STRUCTURAL EQUATION MODELING Table 3.1
Variables Used to Measure Students Perceptions of School Climate
Label
Variable
GETALONG
Students get along well with teachers
SPIRIT
There is real school spirit
STRICT
Rules for behavior are strict
FAIR
Discipline is fair
RACEFRND
Students make friends with students of other racial and ethnic groups
DISRUPT
Other students often disrupt class
TCHGOOD
The teaching is good
TCHINT
Teachers are interested in students
TCHPRAIS
When I work hard on schoolwork, my teachers praise my effort
TCHDOWN
In class I often feel “put down” by my teachers
STUDOWN
In school I often feel “put down” by other students
LISTEN
Most of my teachers really listen to what I have to say
FEELSAFE
I don’t feel safe at this school
IMPEDE
Disruptions by other students get in the way of my learning
MISBEHAV
Misbehaving students often get away with it
SOURCE: From the National Educational Longitudinal Study (NCES, 1988). NOTE: Survey items are measured on a 4-point scale ranging from strongly agree to strongly disagree.
Although it may be the case that a researcher has a particular model relating student perceptions of school climate to achievement in mind, of priority is the measurement of the constructs that are incorporated into the model. The researcher may postulate that there are several important dimensions to student perceptions. The question is whether a set of measurements that asks students to rate their agreement to statements about the climate of the school correlate in such a way as to suggest the existence of the factors in question. The model used to relate observed measures to factors is the linear factor analysis model and can be written as x = Λx ξ + δ,
[3.1]
where x is q × 1 vector of observed responses on q questions that are assumed to measure student perceptions of school climate for N students, Λx is a q × k matrix of factor regression weights (loadings), ξ is a k × 1 vector of common
03-Kaplan-45677:03-Kaplan-45677.qxp
6/24/2008
8:20 PM
Page 41
Factor Analysis—41
factors that are mathematical instantiations of the underlying dimensions of student perceptions, and δ is a q × 1 vector of unique variables that contain both measurement error and specific error to be described below. Equation [3.1] expresses the observed variables in terms of a weighted set of common factors and a vector of unique variables. It is convenient to invoke assumptions that are common to linear models— namely that EðξÞ = 0, EðδÞ = 0,
and Covðξ,δÞ = 0:
Under these assumptions, the covariance matrix of the observed data can be written in the form of the fundamental factor analytic equation, Σ = Covðxx0 Þ = Λx Eðξξ0 ÞΛ0x + Eðδδ0 Þ = Λx ΦΛ0x + Θδ ,
[3.2]
where Σ is a q × q population covariance matrix, Φ is a k × k matrix of factor variances and covariances, and Θδ is a q × q diagonal matrix of unique variances. Note that Equation [3.2] can be considered a special case of the general model given in Equation [1.2] in Chapter 1 where the parameter vector Ω contains the parameters of the factor analysis model.
3.2 The Nature of Unique Variables Before moving on to the problem of identification and estimation of the parameters of the model in Equation [3.2], it would be useful to consider the nature of the unique variables contained in the vector δ of Equation [3.1]. The unique variables that constitute the elements of δ do not contain only measurement error. To see this, consider the model in Equation [3.1] for a vector of true scores t rather than observed scores x. According to classical true score theory (e.g., Lord & Novick, 1968), the vector of true scores are defined as t = x − e,
[3.3]
where e represents pure measurement error. It is reasonable to assume that the factor model that holds for the observed scores also holds for the true scores.
03-Kaplan-45677:03-Kaplan-45677.qxp
6/24/2008
8:20 PM
Page 42
42—STRUCTURAL EQUATION MODELING
However, the factor model for the true scores will not contain measurement error but will contain specific error due to the particular selection of variables in the model. The factor analytic model for the true scores can be written as t = Λx ξ + s:
[3.4]
The vector s contains specific variances, defined as the variances in the true scores that are due to the particular selection of variables (see Harman, 1976). Inserting Equation [3.4] into Equation [3.3], we see that the uniqueness term is δ = s + e. Despite the fact that unique variance is composed of specific variance and error variance, we typically assume that specific variances are small relative to measurement error variance.
3.3 Identification in the Unrestricted Factor Model Prior to presenting estimation of the unrestricted model, it is necessary to discuss parameter identification. The issue of identification was raised in Chapter 2 and concerns whether a unique solution to the parameters of a model exist. In the case of factor analysis, let us consider Equation [3.2] with the subscripts removed for simplicity. The basic problem of identification in the context of factor analysis concerns rotational indeterminacy—namely the fact that we can rotate the solution to a factor analysis in an infinite number of ways and obtain the same solution. Following Lawley and Maxwell (1971) define a k × k nonsingular orthogonal transformation matrix T. The properties of T are such that TT0 = TT − 1 = I, a k × k identity matrix. It can be shown that if Λ = ΛT, Φ = T−1 ΦT−1 ,
and Θ = Θ,
then 0
Σ = Λ Φ Λ + Θ = ΛTT−1 ΦT−1 T0 Λ0 + Θ = ΛΦΛ0 + Θ = Σ.
03-Kaplan-45677:03-Kaplan-45677.qxp
6/24/2008
8:20 PM
Page 43
Factor Analysis—43
This shows that any orthogonal transformation of the system will give rise to the same covariance matrix. When this is the case, we say that the model is not identified and that there are k × k = k2 indeterminancies that must be removed in order for there to be a unique solution to the factor model. The k2 elements correspond to the dimension of the transformation matrix T. To see how the identification problem is handled, first consider the case of orthogonal factors—that is, factors that are not correlated. For the orthogonal factor case, Φ = I. When there is only one factor, that is, k = 1, setting Φ = I (or φ = 1) removes the k2 indeterminancies completely. No orthogonal transformation of the system is possible, and the parameters are uniquely identified. This is the reason that we cannot rotate one factor. When k = 2 then k2 = 4 and setting Φ = I removes k(k + 1)/2 = 3 indeterminancies, leaving one remaining indeterminacy. Finally, we can consider the general case of k ≥ 2. Again, with Φ = I, we have removed k(k + 1)/2 indeterminancies leaving k2 − k(k + 1)/2 = k(k − 1)/2 indeterminancies to be removed. We can see from the above discussion that simply setting Φ = I does not remove all the indeterminancies in the model except in the case when there is only one factor (k = 1). The remaining k(k − 1)/2 must be placed in the elements of Λ. The manner in which these remaining restrictions are imposed depends on the method of estimation that is used. For the most part, the differences among estimation methods in the manner in which these restrictions are imposed are arbitrary. However, given that the restrictions are imposed in an arbitrary fashion to simply fix the reference factors, this arbitrariness can be exploited for purposes of factor rotation. This topic is further elaborated on below.
3.4 Nonstatistical Estimation in the Unrestricted Model: Principal Components Analysis and the Common Factor Model

What has been covered so far concerns issues of identification of the parameters irrespective of the method of factor extraction. We can now turn to the problem of factor extraction directly. We consider principal components analysis and the common factor model, both of which use the method of principal axis factoring for factor extraction. We then briefly consider two statistical methods of factor analysis that rest on the common factor model, namely generalized least squares and maximum likelihood estimation. These estimation methods were discussed in greater detail in Chapter 2. Each method is applied to the problem of estimating the factors of student perception of school climate.
3.4.1 PRINCIPAL AXIS FACTORING

There are many ways in which factors can be extracted from a covariance matrix (see, e.g., Mulaik, 1972). The method of principal axis factoring is perhaps the most common. Principal axis factoring seeks to transform the original set of variables into a new set of orthogonal variables that retains the total amount of variance in the observed variables. Principal axis factoring does not assume a measurement model for the data per se but simply constitutes a mathematical transformation of the original variables. To fix ideas, consider a population covariance matrix Σ for a set of q observed variables. Following Tatsuoka (1988), a variance-maximizing transformation of the original variables can be accomplished through the solution of the eigenvector/eigenvalue equation

(Σ − λI)u = 0,   [3.5]

where λ are the characteristic roots, or eigenvalues, of Σ, and the vector u corresponds to the eigenvectors or principal axes of Σ. For there to be a nontrivial solution to Equation [3.5] (i.e., u ≠ 0), values for λ must be found that satisfy the determinantal equation

|Σ − λI| = 0.   [3.6]

The solution of Equation [3.6] is a determinantal polynomial of order q. Given that Σ is symmetric and assumed to be of full rank, all of the eigenvalues are real and positive (Harman, 1976). The matrix u is orthogonal, implying that u′u = uu′ = I and u⁻¹ = u′. Thus, from Equation [3.5] it can be shown that

Σ = uDu′ = uD^{1/2}D^{1/2}u′ = u*u*′,   [3.7]

where D is a diagonal matrix containing the q eigenvalues of Σ and u* = uD^{1/2}.

3.4.2 PRINCIPAL COMPONENTS ANALYSIS

It is often the case in an exploratory study that a researcher will use PCA as an initial approach to data reduction. PCA is not, technically, within the class of unrestricted factor models. However, PCA will provide results that are often quite similar to factor analysis and is included in many statistical packages as the default. Thus, we are including PCA within our discussion of the unrestricted factor models.
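The eigendecomposition in Equations [3.5] through [3.7] can be illustrated with a minimal R sketch. The data matrix X of q observed variables and the object names below are illustrative assumptions, not part of the published analysis.

    S   <- cov(X)                      # sample covariance matrix of the q observed variables
    eig <- eigen(S, symmetric = TRUE)  # roots and vectors of (S - lambda*I)u = 0
    u   <- eig$vectors                 # orthogonal principal axes: t(u) %*% u = I
    D   <- diag(eig$values)            # eigenvalues on the diagonal
    all.equal(S, u %*% D %*% t(u))     # reproduces S, as in Equation [3.7]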
A major difference between PCA and other forms of unrestricted factor analysis lies in the assumptions made about the existence of measurement error. Specifically, PCA does not assume that the variables are measured with error. Rather, PCA simply transforms the original set of measurements into orthogonal components retaining the original amount of variance in the data. Factor analysis, by contrast, specifically models measurement error, and extracts factors that account for maximum common variance in the observed variables. Often, PCA is used as a factor analysis model and a decision is made to retain fewer principal components for future rotation and interpretation.3 This distinction is made more explicit below. Consider now the construction of the new set of q × 1 principal components, formed by the linear combination of the original variables x weighted by the eigenvectors u. That is,

z = u′x.   [3.8]

When PCA is used as a factor analytic method, z is treated as analogous to ξ in Equation [3.1]. From here, we can obtain the variance of z as

V(z) = V(u′x) = u′V(x)u = u′Σu.   [3.9]

But from Equation [3.7], Σ = uDu′. Making use of the fact that u is an orthogonal matrix, Equation [3.9] can be written as

V(z) = u′uDu′u = IDI = D.   [3.10]

From Equation [3.10], we can see that the principal components are orthogonal to each other and that the variances of the principal components are the eigenvalues of Σ. In the process of extracting the principal components of Σ, the principal components are ordered in terms of decreasing size of their eigenvalues. With all components retained, the total variance of the principal components is equal to the total variance of the original variables. That is,

∑_{i=1}^{q} σzi = ∑_{i=1}^{q} λi = tr(Σ) = ∑_{i=1}^{q} σyi,
where σzi is the variance of the ith principal component, σyi is the variance of the ith original variable, and “tr” is the trace operator. In the typical application of PCA as a factor analysis method, usually m < q principal components are retained because they account for a substantively important amount of the total variance. This is discerned using the fact that

(λ1 + λ2 + ⋯ + λm)/tr(Σ)

provides a measure of the amount of variance explained by the first m principal components. Thus, in practice, the investigator might choose m principal components such that the relative variance is large, and then will work with the new m × 1 vector of principal components. It may also be the case in practice that an investigator will use standardized principal components, defined as z* = D^{−1/2}z. Using Equation [3.10], it can be shown that V(z*) = I, as expected from standardized orthogonal variables.

An Example: Principal Components Analysis

Initially, it might be useful to obtain information about the percent of variance accounted for by each component. As noted above, this is given by the ratio of the eigenvalue to the total trace variance. In the case of the correlation matrix, the trace equals the total number of variables. Table 3.2 presents the decomposition of the total variance in terms of the principal components of the data.4 It can be seen that approximately two thirds of the total variance in the variables is accounted for by the first two principal components, and these two components are associated with eigenvalues greater than 1.0. The scree plot in Figure 3.1, which plots the eigenvalues against the number of components, also suggests two factors. These results point to two factors underlying the data, as might be suspected from the wording of the items.
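A table of this form and the scree plot can be produced with a short R sketch. Footnote 4 notes that prcomp was used for the published eigenvalues; the sketch below obtains equivalent quantities directly from the item correlation matrix. The data matrix X is an assumed, illustrative object.

    R.mat <- cor(X)                              # item correlation matrix; tr(R.mat) = number of items
    evals <- eigen(R.mat, symmetric = TRUE)$values
    pct   <- 100 * evals / sum(evals)            # percent of variance for each component
    round(cbind(total = evals, pct.variance = pct,
                cumulative = cumsum(pct)), 3)    # cf. Table 3.2
    plot(evals, type = "b", xlab = "Component Number",
         ylab = "Eigenvalues")                   # scree plot, cf. Figure 3.1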
Table 3.2   Eigenvalue Decomposition of Student Perception Data

Component      Total     % of Variance     Cumulative %
     1         8.408        56.055            56.055
     2         1.422         9.483            65.538
     3         0.645         4.299            69.837
     4         0.534         3.557            73.394
     5         0.482         3.214            76.609
     6         0.450         3.002            79.610
     7         0.424         2.825            82.436
     8         0.409         2.724            85.160
     9         0.389         2.592            87.752
    10         0.371         2.475            90.228
    11         0.339         2.260            92.488
    12         0.322         2.148            94.636
    13         0.295         1.964            96.599
    14         0.259         1.724            98.323
    15         0.252         1.667           100.000

NOTE: Extraction method: principal component analysis.
3.4.3 THE COMMON FACTOR MODEL

As noted above, the principal axis method and specifically PCA do not assume a measurement model for the original data. In other words, in PCA, it is assumed that the variables are measured without error. Clearly, this is an unrealistic assumption in most applications, and the issue is how to extract factors that explicitly take into account measurement error. To begin, let us again consider the model in Equation [3.1]. We can consider the model for the vector of observed responses as arising from two parts: a common part that relates variables to each other and is assumed to have a set of common factors underlying their relationships, and a unique part that corresponds mostly to measurement error. In extracting factors, the goal is to consider the common part. However, on close examination we run into the problem that the number of unobserved variables (the k common factors plus the q unique variables) is larger than the q observed variables. Thus, the common factor model is indeterminate.
Figure 3.1   Scree Plot From Principal Component Analysis of Student Perceptions of School Climate (eigenvalues plotted against component number)
The issue of indeterminacy was considered by Thurstone (1947). Thurstone’s resolution to the problem was to insist that the number of common factors extracted from a correlation matrix be determined by the rank of the correlation matrix after appropriate estimates of common variance were inserted in the diagonal of the matrix. The common variances are referred to as communalities. As noted by Mulaik (1972, p. 136), the communality problem can be considered from the standpoint of principal axis factoring. Specifically, it is known that a symmetric matrix with a dominant principal diagonal, such as a correlation matrix, is positive semidefinite—also referred to as Gramian. The rank of a Gramian matrix is equal to the number of positive eigenvalues. In the common factor model, Thurstone required that the number of common factors be determined by the rank of the correlation matrix after replacing the diagonal elements with communality estimates—with the proviso that the resulting correlation matrix retain the Gramian property. The problem is that
when the diagonal elements are altered, it is not necessarily the case that the Gramian property will hold. The question for Thurstone was how to estimate communalities that resulted in an altered correlation matrix that remained Gramian. If the communalities were underestimated, then it would be possible that the resulting rank would be too high. Conversely, if the communalities were overestimated, then the factors would account for more than simply the off-diagonal correlations and the rank would not be reduced (Mulaik, 1972, p. 136). Therefore, the goal was to choose estimates of communalities that resulted in minimal rank under the required Gramian conditions. With regard to the choice of communality estimates, the major work on this problem can be traced to Guttman (1954, 1956). Guttman provided three different estimates of communality that would result in lower bounds for minimum rank. In the interest of space, and given the fact that more recent statistical approaches have rendered the problem of communality estimation somewhat moot, we consider the most typical form of communality estimation—namely, the use of the squared multiple correlation (SMC). The SMC of the qth variable with the other q − 1 variables was shown by Guttman to be the best estimate of the lower bound for communality. The idea was to compute the SMCs and insert them into the diagonal of the sample correlation matrix R and then subject the correlation matrix to principal axis factoring. Before factoring, however, it is necessary to slightly adjust off-diagonal elements to preserve the required Gramian property. An early method, suggested by Wrigley (1956), involved updating the analysis after each factor extraction. That is, for a given a priori number of factors, and with SMCs in the diagonal, the correlation matrix is subjected to principal axis factoring. Next, the sums of the squared estimated factor loadings from the first factoring, say the diagonal elements of Λ̂1Λ̂1′, are used as “updated” communality estimates. These updated estimates are now inserted into the diagonal of the correlation matrix. This process continues until the difference between the current communality estimates and the previous communality estimates is less than an arbitrary constant. Once convergence is obtained, the final iterated solution is presented for interpretation. What we have described here is the method of iterated principal axis factoring, which is found in most commonly used statistical software packages.
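A minimal sketch of this iterated procedure in R is given below. It is illustrative only and does not reproduce the exact implementation of any particular package; R denotes a sample correlation matrix and k the number of factors to retain.

    ipaf <- function(R, k, tol = 1e-6, max.iter = 100) {
      h2 <- 1 - 1 / diag(solve(R))            # SMC starting communalities
      for (i in seq_len(max.iter)) {
        Rh <- R
        diag(Rh) <- h2                        # reduced correlation matrix
        e  <- eigen(Rh, symmetric = TRUE)
        L  <- e$vectors[, 1:k, drop = FALSE] %*%
              diag(sqrt(pmax(e$values[1:k], 0)), k)
        h2.new <- rowSums(L^2)                # updated communality estimates
        if (max(abs(h2.new - h2)) < tol) break
        h2 <- h2.new
      }
      list(loadings = L, communalities = h2.new, iterations = i)
    }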
3.5 Rotation in the Unrestricted Model

As discussed in Section 3.3, the manner in which the indeterminacies are removed for identification purposes is designed to provide an initial set of estimates. In other words, without dealing with the identification problem as
described, no solution would be possible. However, because the initial set of estimates is completely arbitrary, another set of k² restrictions can be chosen by rotating the initially unrotated solution. To see this, consider once again the orthogonal transformation matrix T. For the k = 2 case, this matrix is of the form

T = |  cos θ   sin θ |
    | −sin θ   cos θ |.   [3.11]
Note that this matrix has one parameter—namely, the angle of rotation θ. With the angle of rotation chosen, the last remaining indeterminacy is removed. In the general case, setting Φ = I removes k(k + 1)/2 indeterminacies, leaving k(k − 1)/2. These remaining indeterminacies can be removed by choosing k(k − 1)/2 angles of rotation. The decision to rotate the solution usually rests on a desire to achieve a simple structure representation of the factors (Thurstone, 1935, 1947). Simple structure criteria are designed to reduce the complexity of the variables—that is, the number of factors that the variable is loaded on. Ideally, under simple structure, each variable should have a factor complexity of one—meaning that a variable should load on one, and only one, factor. Methods of factor rotation aid in achieving this goal. As we will see, only with the advent of the restricted factor analysis model do we have a rigorous approach to testing simple structure hypotheses.

3.5.1 ORTHOGONAL ROTATION: THE VARIMAX CRITERION

In the interest of space, I will not review all of the methods of orthogonal factor rotation (see, e.g., Mulaik [1972] for a complete discussion). Suffice it to say that the most popular method of orthogonal rotation is based on the varimax criterion (Kaiser, 1958). The basic idea behind the varimax criterion is that after rotation, the resultant loadings on a factor should be either large or small relative to the original loadings. Kaiser’s (1958) original method was based on an iterated approach that rotated pairs of factors at a time. Horst (1965) offered a “simultaneous varimax” solution that rotated all factors simultaneously. Following Lawley and Maxwell (1971), let Λ* = ΛT and

dj = ∑_{i=1}^{q} (λ*ij)²,   j = 1, . . . , k,
where dj is the sum of the squared loadings in the jth column of Λ*. Then, varimax maximizes

V = ∑_{i=1}^{q} ∑_{j=1}^{k} [(λ*ij)² − dj/q]².   [3.12]
Essentially, Equation [3.12] shows that the varimax criterion maximizes the sum of squared deviations of the squared loadings from the corresponding column mean. As shown in Lawley and Maxwell (1971, p. 73), this amounts to maximization with respect to the elements of the transformation matrix T. Once maximized, T contains elements whose angles of rotation satisfy the varimax criterion.

3.5.2 OBLIQUE ROTATION: THE PROMAX CRITERION

We now consider the problem of oblique factors. In this case, we begin by setting diag(Φ) = I. This removes k indeterminacies, leaving k² − k = k(k − 1) left to remove. One approach to the problem would be to first orthogonally rotate to a set of loadings, say Λ(1), using, for example, varimax. Then, find a new set of loadings, say Λ(2), corresponding to a new Φ, say Φ(2), such that Φ(2) has unit diagonal elements and nonzero off-diagonal elements. The result is an oblique solution yielding correlations among the factors contained in Φ(2). The method just described was developed by Hendrickson and White (1964) and is referred to as promax. Again, in the interest of space, I will not review the numerous methods of oblique factor rotation. See Mulaik (1972) for a complete discussion.

3.5.3 AN EXAMPLE: PRINCIPAL AXIS FACTORING WITH PROMAX ROTATION

Table 3.3 presents the promax rotated factor loadings based on principal axis factoring with the extraction of two factors. An inspection of the loadings in combination with the meaning of the items suggests two interpretable factors. Factor 1 can be labeled POSITIVE SCHOOL CLIMATE, and factor 2 can be labeled NEGATIVE SCHOOL CLIMATE. An inspection of the promax rotated factor correlation matrix in Table 3.3 shows that the positive and negative school climate factors are highly correlated. The fact that the correlation is positive reflects the scaling of the items. An inspection of Table 3.4 shows the results of maximum likelihood estimation and promax rotation of the unrestricted student perceptions data. It can be seen that the substantive interpretation of the results is basically the same as the principal axis factoring results.
Table 3.3   Promax Rotated Principal Axis Factor Loadings and Factor Correlations

              Positive Climate    Negative Climate
GETALONG            0.797               0.077
SPIRIT              0.673               0.128
STRICT              0.425               0.326
FAIR                0.665               0.104
RACEFRND            0.589               0.219
DISRUPT             0.155               0.651
TCHGOOD             0.863              −0.025
TCHINT              0.923              −0.104
TCHPRAIS            0.823              −0.028
PUTDOWN            −0.102               0.854
STUDOWN             0.082               0.678
LISTEN              0.870              −0.054
FEELSAFE           −0.027               0.748
IMPEDE              0.138               0.619
MISBEHAV            0.024               0.748

Promax Factor Correlations
              1           2
1           1.000
2           0.713       1.000
Moreover, note that the chi-square test for maximum likelihood leads to the conclusion that the two-factor model does not fit the data. This could be affected by sample size and nonnormality. However, the root mean square error of approximation, along with its 90% confidence interval, suggests approximately good fit of the model. Issues of goodness-of-fit will be taken up in Chapter 4.
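For readers who wish to carry out an analysis of this kind, a hypothetical R sketch using the psych package is shown below. The data frame climate and the package choice are illustrative assumptions; the exact estimates will depend on the sample and implementation details.

    library(psych)
    # principal axis factoring with promax rotation (cf. Table 3.3)
    pa.fit <- fa(climate, nfactors = 2, fm = "pa", rotate = "promax")
    print(pa.fit$loadings, cutoff = 0)   # rotated pattern loadings
    pa.fit$Phi                           # promax factor correlations
    # maximum likelihood extraction with promax rotation (cf. Table 3.4)
    ml.fit <- fa(climate, nfactors = 2, fm = "ml", rotate = "promax")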
3.6 Statistical Estimation in the Unrestricted Model: Maximum Likelihood and Generalized Least Squares Methods

Up to now, the focus of attention has been on methods of extraction that did not require assumptions regarding the distribution of the data. These were essentially
Table 3.4   Promax Rotated Maximum Likelihood Factor Loadings and Factor Correlations

              Positive Climate    Negative Climate
GETALONG            0.745               0.150
SPIRIT              0.628               0.174
STRICT              0.392               0.367
FAIR                0.624               0.160
RACEFRND            0.545               0.272
DISRUPT             0.144               0.659
TCHGOOD             0.821               0.034
TCHINT              0.889              −0.044
TCHPRAIS            0.786               0.024
PUTDOWN            −0.110               0.856
STUDOWN             0.081               0.679
LISTEN              0.831               0.001
FEELSAFE           −0.029               0.746
IMPEDE              0.134               0.621
MISBEHAV            0.028               0.741

Promax Factor Correlations
              1           2
1           1.000
2           0.674       1.000

χ²(36 df) = 2639.403, p < .05
RMSEA = 0.052, p < .059 (90% CI = 0.050, 0.053)
nonstatistical methods of extraction. Perhaps the most important breakthrough in the statistical estimation of factor analysis was the use of maximum likelihood estimation proposed by Lawley (1940, 1941; see also Lawley & Maxwell, 1971). A feature of maximum likelihood estimation of the common factor model is that one can explicitly test the hypothesis that there are k common factors that underlie the data. Statistically, the method follows that considered in Chapter 2.
As discussed in Chapter 2, the large sample distribution of the likelihood ratio test is chi-square with degrees of freedom given by the difference between the number of nonredundant elements in Σ and the number of free parameters in the common factor model that need to be estimated. In the context of factor analysis, first note that to solve the identification problem discussed in Section 3.3, maximum likelihood estimation requires that Λ′Θ⁻¹Λ be diagonal. This has the effect of imposing k(k − 1)/2 constraints on the model. The number of free parameters to be estimated is then given as qk + q − k(k − 1)/2. Thus, the degrees of freedom are given by

df = ½q(q + 1) − [qk + q − ½k(k − 1)] = ½[(q − k)² − (q + k)].   [3.22]

Once the parameters are estimated, the maximum likelihood solution can be further rotated to attain greater interpretability. The problem of factor rotation is described in Section 3.5. In addition to maximum likelihood, the method of generalized least squares can also be used to estimate the parameters of the factor model. The generalized least squares estimator was originated by Aitken (1935) but was applied to the factor analysis setting by Jöreskog and Goldberger (1972).
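As a sketch of how the maximum likelihood factor model and its likelihood ratio test can be obtained in base R with factanal, consider the following; the correlation matrix R and the sample size n are assumed objects, and the degrees of freedom are computed from Equation [3.22].

    q  <- 15; k <- 2
    df <- 0.5 * ((q - k)^2 - (q + k))                   # degrees of freedom, Equation [3.22]
    ml.fa <- factanal(covmat = R, n.obs = n, factors = k, rotation = "promax")
    ml.fa$STATISTIC                                      # likelihood ratio chi-square
    ml.fa$dof                                            # degrees of freedom reported by factanal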
3.7 The Restricted Factor Model: Confirmatory Factor Analysis

Up to this point, the discussion has centered on the unrestricted model—commonly referred to as exploratory factor analysis. The unrestricted model is contrasted in this section with the restricted model, commonly referred to as confirmatory factor analysis. The term restricted factor analysis reflects a difference in the number and position of the restrictions imposed on the factor space (Jöreskog, 1969). Specifically, in the unrestricted solution we saw that identification is achieved by imposing k² restrictions on the model. Because those restrictions are arbitrary, the solution can also be rotated to achieve simple structure. Yet, regardless of the rotation, the factor model will yield the same fit to the observed covariance matrix. In a restricted solution, by contrast, usually more than k² restrictions are imposed. These restrictions are imposed on the elements of Λ in a manner that reflects an a priori hypothesis of simple structure. As a result, it is not possible to rotate the restricted model because doing so would destroy the positioning of the restrictions and hence the hypothesis under study.
3.7.1 IDENTIFICATION IN THE RESTRICTED MODEL

The specification as well as the assumptions of the restricted factor model are the same as shown in Equations [3.1] and [3.2]. In addition to the k² restrictions imposed on the model as described in previous sections, it is also necessary to fix the metric of the latent variables. This can be accomplished in two ways. First, one can standardize the variances of the latent variables to one. This is how the metric is determined in the unrestricted model. When the metric is set in this fashion, the factor has a mean of 0 and a variance of 1. The second approach is to set each factor’s scale to the scale of one of its indicators. To take the example used in this chapter, we may set the metric of the factor “perception of teacher quality” by fixing its loading on the variable “Teachers are interested in students” to be 1.0. This variable (as well as the others) is measured on a 4-point scale, and thus the factor is in the metric of a 4-point scale. The issue of how the metric of the factor influences interpretation is discussed in the context of the full structural equation model in Chapter 4. In any event, one or the other method of setting the scale of the latent variable must be chosen. Once the scale of the latent variables is determined, the next step is to decide on the pattern of fixed and freed loadings in the model. The fixed loadings are those that are usually fixed to zero (except, as noted before, when the need is to set the scale). The fixed loadings represent a priori hypotheses regarding the simple structure underlying the model. So, to take the perceptions of school climate example, results from the maximum likelihood estimation of the unrestricted model (Table 3.4) suggest that the variable “When I work hard on schoolwork, my teachers praise my effort” has very small loadings on the “perceptions of negative school climate” and “perceptions of misbehavior and disruptions” factors. Thus, in the restricted model, we may wish to fix this loading to zero. Doing so implies that we are hypothesizing that the small loading is exactly zero in the population. After the pattern of fixed loadings in the model has been specified, the remaining loadings are “free” to be estimated. The determination of the fixed loadings, in combination with the number of factors, implies additional identification issues. To take an example, consider the case of two indicators loading on one factor. In this case, the number of distinct elements is given as ½q(q + 1) = 3. After fixing the metric, say by fixing one of the loadings to 1.0, the number of parameters to be estimated is 4 (one loading, two error variances, and one factor variance). Thus, this model has −1 degrees of freedom, and hence, the model is not identified. With three indicators of one common factor, the number of distinct elements in the covariance matrix is 6, whereas the number of free parameters to be estimated (after setting the metric) is also 6 (two loadings, three error variances, and one factor variance). Thus, the three-indicator/one-factor model is just-identified.
Table 3.5   Maximum Likelihood Estimation of the Restricted Model of Student Perceptions of School Climate

                      Estimates     S.E.    Est./S.E.      Std     StdYX
POSITIVE CLIMATE
  GETALONG             1.000a      0.000       0.000      0.866    0.859
  SPIRIT               0.999       0.010     104.674      0.865    0.762
  STRICT               0.899       0.010      87.260      0.778    0.674
  FAIR                 0.989       0.010     100.728      0.856    0.744
  RACEFRND             1.013       0.010     102.665      0.876    0.753
  TCHGOOD              1.082       0.009     122.594      0.937    0.838
  TCHINT               1.107       0.009     123.147      0.958    0.840
  TCHPRAIS             1.026       0.009     112.010      0.888    0.795
  LISTEN               1.064       0.009     117.955      0.921    0.820
NEGATIVE CLIMATE
  DISRUPT              1.000a      0.000       0.000      0.903    0.776
  PUTDOWN              0.832       0.010      85.759      0.752    0.741
  STUDOWN              0.873       0.010      85.167      0.789    0.737
  FEELSAFE             0.797       0.010      82.342      0.720    0.716
  IMPEDE               0.958       0.011      85.337      0.865    0.738
  MISBEHAV             0.993       0.011      89.840      0.897    0.772
NEGATIVE CLIMATE with POSITIVE CLIMATE
                       0.598       0.011      56.355      0.765    0.765

χ²(89 df) = 4827.399, p < .05
RMSEA = 0.065, p < .050 (90% CI = 0.063, 0.066)
CFI = 0.962
TLI = 0.956
a. Fixed to 1.0 to set the metric of the factor.
Finally, consider a two-indicator/two-factor model. Here, the number of distinct elements in the covariance matrix is 10, whereas the number of parameters to be estimated is 9 (two loadings, four error variances, two factor variances, and one factor covariance). Thus, this model is overidentified with one degree of freedom. Note that in this case, if the factor correlation is 0, then the individual two-indicator/one-factor models will not be identified. These identification concerns are nicely discussed in Bollen (1989).
3.7.2 TESTING IN THE RESTRICTED MODEL

As in the unrestricted case, one can choose many methods of statistical estimation for the parameters of the restricted model. For example, one can use either the method of maximum likelihood or the method of generalized least squares, both discussed in Section 3.6. In either case, the goals of the estimation procedure are the same—as are the underlying assumptions. The difference between the unrestricted model and the restricted model lies with issues of model testing. Specifically, consider the method of maximum likelihood. Maximum likelihood estimation of the parameters in the restricted model proceeds in much the same way as estimation in the unrestricted case. Unlike the unrestricted case, however, wherein we were testing the null hypothesis that there exist k common factors underlying the data, here the likelihood ratio chi-square is used to test the null hypothesis that the specified pattern of fixed and free loadings holds in the population. This hypothesis implies not only that there are k common factors but also that a particular simple structure describes the relationship between the variables and the factors. Thus, the additional restrictions beyond the k² restrictions necessary to obtain a unique solution will result in greater degrees of freedom compared with the unrestricted model. The degrees of freedom are obtained by subtracting the total number of estimated parameters, say t, from the ½q(q + 1) distinct elements in Σ. In addition to the likelihood ratio chi-square statistic, a large number of alternative methods of model fit have been developed. These alternative methods of model fit were developed in part due to the sensitivity of the likelihood ratio chi-square test to sample size. These tests are described in more detail in Chapter 6. In addition to a global test of whether the restricted model holds in the population, one can also test hypotheses regarding the individual fixed and freed parameters (loadings, error variances, and factor variances and covariances) in the model. The methods of parameter testing described in Section 2.4 of Chapter 2 can be applied here as well.

Restricted Factor Analysis of Student Perceptions of Classroom Climate

Figure 3.2 displays the path diagram of the restricted factor model. This model was estimated using maximum likelihood, and the results are displayed in Table 3.5. The results are mostly consistent with the findings from the exploratory factor analysis. Not surprisingly, the larger number of restrictions placed on the model (as indicated by the increased degrees of freedom relative to the exploratory factor analysis results) leads to a much larger
likelihood ratio chi-square value. As will be discussed in Chapter 4, the remaining goodness-of-fit indices give somewhat contradictory information. Finally, tests of the individual parameters of the model are also displayed in Table 3.5. These tests indicate that each estimate in the model is statistically significant.
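A restricted model of this form can be specified in a variety of programs. Below is a hypothetical sketch in lavaan syntax for R, using the item names shown in Table 3.5; the data frame climate is an assumed object, and lavaan is not the program used for the published results. By default, lavaan fixes the first loading of each factor to 1.0 to set the metric, as in Table 3.5.

    library(lavaan)
    cfa.model <- '
      positive =~ GETALONG + SPIRIT + STRICT + FAIR + RACEFRND +
                  TCHGOOD + TCHINT + TCHPRAIS + LISTEN
      negative =~ DISRUPT + PUTDOWN + STUDOWN + FEELSAFE + IMPEDE + MISBEHAV
    '
    cfa.fit <- cfa(cfa.model, data = climate)
    summary(cfa.fit, fit.measures = TRUE, standardized = TRUE)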
3.8 Conclusion

In this chapter, we considered the estimation of latent variables that serve to represent theoretical constructs. This activity can serve as an end in its own right—insofar as estimating the latent constructs underlying data provides information regarding the construct validity of our measures. In the next chapter, we consider the merging of factor analysis and path analysis into a comprehensive structural equation methodology and extend the discussion to cover modeling in multiple groups. In that context, the latent constructs serve to disattenuate path coefficients and their standard errors from the effects of measurement error. However, as we will see, adding a measurement model to a path model can introduce additional complications to the methodology.
Figure 3.2   Stylized Path Diagram of Restricted Factor Model of Student Perceptions (indicators GETALNG through LISTEN loading on POS CLIMATE; indicators DISRUPT through MISBEHAV loading on NEG CLIMATE)
Notes

1. The sample of students in the NELS survey does not represent a random sample of U.S. students. Rather, the NELS survey sampling scheme provides a proportionally representative sample of schools. Within the schools, classroom teachers are sampled for purposes of course coverage. This is followed by a sample of students within those classrooms.

2. More accurately, imputation based on the mean of nearby points was used. The argument is that because students are nested in schools, it is important to attempt to maintain values that reflect the nesting of students within schools. For this analysis, means based on five nearby points were chosen. Chapter 8 takes up the problem of factor analysis in multilevel settings.

3. This decision may rest on an inspection of a “scree” plot that plots the sizes of the eigenvalues. Other criteria may include the number of eigenvalues exceeding 1.0 or the percent of variance accounted for by the factor.

4. Extraction of the eigenvalues uses the R function prcomp.
4 Structural Equation Models in Single and Multiple Groups
This chapter focuses on linking path analysis with factor analysis into a comprehensive methodology typically referred to as structural equation modeling.1 The rationale for linking these two methodologies into a comprehensive framework is that by doing so, we mitigate the problems associated with measurement error thereby obtaining improved parameter estimates both in terms of bias and sampling variability. The improvement resulting from combining path analysis with factor analysis comes with a price, however—namely, adding a measurement model to a path model will often dramatically increase the total number of degrees of freedom available for testing model fit. This is because, as we saw in Chapter 3, the restricted factor model will typically be associated with a large number of restrictions reflecting a simple structure hypothesis underlying the measurement instrument. These added restrictions make it all the more likely that a reasonably well fitting structural part of the model will be rejected due to problems within the measurement model. Moreover, the potential for misspecification in the measurement part of the model owing to these restrictions can, in some circumstances, propagate into the structural part of the model (Kaplan, 1988; Kaplan & Wenger, 1993). We take up these issues in more detail in Chapter 6 when we discuss modeling strategies. Despite these difficulties, structural equation modeling represents an extremely important advancement in statistical modeling when the goal is accurate estimation and inference within complex systems. The organization of this chapter is as follows. First, the basic model specification is presented. This is followed by a discussion of the problem of identification that pertains specifically to structural equation models. Next, we discuss the method of multiple group structural equation modeling as a means of addressing group differences while taking into account
problems of measurement error. This section begins by considering the general form of the multiple group structural equation model focusing on multiple group measurement models as a special case.2 Next, this section addresses the special problem of identification in the multiple group case. This is followed by a brief discussion of estimation in the multiple group case. The problem of testing is addressed next where the discussion will focus on a variety of strategies for assessing group differences in the latent variable context. From here, we discuss alternative methods for assessing group differences. In particular, we focus attention on the multiple indicators/multiple causes (MIMIC) approach to modeling group differences in latent variables. Finally, the chapter closes with a discussion of the problem of drawing causal inferences in studies of group differences in latent variable models. This chapter does not cover issues of estimation and testing in structural equation models because these issues are essentially the same as the estimation and testing issues covered in Chapters 2 and 3. An exception includes the issue of mean structure estimation that is discussed in Section 4.5.
4.1 Specification of the General Structural Equation Model

We consider in this section the specification of the general structural equation model for continuous latent variables—linking the measurement model as described in Chapter 3 with the path analytic model described in Chapter 2. To fix notation, define the full structural model as follows:

η = Bη + Γξ + ζ,   [4.1]

where η is an m × 1 vector of endogenous latent variables, ξ is a k × 1 vector of exogenous latent variables, B is an m × m matrix of regression coefficients relating the latent endogenous variables to each other, Γ is an m × k matrix of regression coefficients relating endogenous variables to exogenous variables, and ζ is an m × 1 vector of disturbance terms. The latent variables are linked to observable variables via measurement equations for the endogenous variables and exogenous variables. These equations are defined as

y = Λyη + ε   [4.2]

and

x = Λxξ + δ,   [4.3]
where Λy and Λx are p × m and q × k matrices of factor loadings, respectively, and ε and δ are p × 1 and q × 1 vectors of uniquenesses, respectively. As noted elsewhere, structural equation modeling is a special case of a more general covariance structure model. Substituting Equation [4.1] into Equation [4.2], the covariance matrix for y and x can be written in terms of the parameters of the full model (Jöreskog, 1977) as

Σ = | Σyy   sym. |
    | Σxy   Σxx  |

  = | Λy(I − B)⁻¹(ΓΦΓ′ + Ψ)(I − B)′⁻¹Λy′ + Θε     Λy(I − B)⁻¹ΓΦΛx′ |
    | ΛxΦΓ′(I − B)′⁻¹Λy′                          ΛxΦΛx′ + Θδ     |,   [4.4]

where Φ is the k × k covariance matrix of the exogenous latent variables, Ψ is the m × m covariance matrix of the disturbance terms, and Θε and Θδ are the covariance matrices of the uniquenesses ε and δ, respectively. In terms of the parameter vector Ω, we have Ω = (Λy, Λx, Θε, Θδ, Φ, B, Γ, Ψ). Note that the lower diagonal element of Equation [4.4], ΛxΦΛx′ + Θδ, is the covariance structure for the factor analytic model discussed in Chapter 3.
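Equation [4.4] can be made concrete with a short function that assembles the model-implied covariance matrix from the parameter matrices. The R sketch below is illustrative only; it assumes the matrices have already been populated with numerical values of conformable dimensions.

    implied.sigma <- function(Ly, Lx, B, Gamma, Phi, Psi, Theta.eps, Theta.delta) {
      IBinv <- solve(diag(nrow(B)) - B)                         # (I - B)^{-1}
      Syy <- Ly %*% IBinv %*% (Gamma %*% Phi %*% t(Gamma) + Psi) %*%
             t(IBinv) %*% t(Ly) + Theta.eps                     # covariance structure for y
      Sxy <- Lx %*% Phi %*% t(Gamma) %*% t(IBinv) %*% t(Ly)     # covariances of x with y
      Sxx <- Lx %*% Phi %*% t(Lx) + Theta.delta                 # factor analytic structure for x
      rbind(cbind(Syy, t(Sxy)), cbind(Sxy, Sxx))                # full symmetric matrix
    }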
4.2 Identification of Structural Equation Models

The problem of identification of path models and factor models was discussed in Chapters 2 and 3, respectively. Here we discuss identification as it pertains to the full model in Equation [4.1]. The general problem of identification remains the same—namely, whether unique estimates of the parameters of the full model can be determined from the elements of the covariance matrix of the observable variables. When combining the measurement and structural models together into a single analytic framework, a set of new identification conditions can be added to those that have already been presented. To begin, we note that by adding the measurement model to the path model, the identification conditions of the measurement model, specifically those of restricted factor analysis, are required as part of overall identification. In particular, it is essential to set the metric of the latent variables, as discussed in Section 3.7.1 of Chapter 3. In the typical case, the metric of the exogenous latent variables is set by either fixing one loading in each column of Λx to 1.0 or by fixing the diagonal elements of Φ to 1.0. The metric of the endogenous latent variables is typically set by fixing a loading in each column of Λy to 1.0. With the metric of the latent variables determined, we can now consider a set of rules provided by Bollen (1989) that can be used to evaluate the identification status of structural equation models. The first rule is the counting rule
that was discussed in Chapters 2 and 3. To reiterate, let s = p + q be the total number of endogenous and exogenous observed variables. Then, as we showed, the number of nonredundant elements in Σ is equal to ½s(s + 1). If t is the total number of parameters in the model that are to be estimated (i.e., the free parameters), then the counting rule states that a necessary condition for identification is that t ≤ ½s(s + 1). If the equality holds, then we say that the model is just identified. If t is strictly less than ½s(s + 1), then we say that the model is overidentified. If t is greater than ½s(s + 1), then the model is not identified. In addition to the counting rule, another simple rule for establishing the identification of structural equation models is based on treating structural equation models as restricted factor analysis models. This method is referred to as the two-step rule (Bollen, 1989). The basic idea can be motivated by considering the latent variable model shown in Figure 4.1. The first step of the two-step rule is to reparameterize the structural equation model as a restricted factor analysis model, recognizing that the elements in Φ can be translated to elements in Γ and B. If the restricted factor model is identified, then the second step focuses on the structural model as though it were among observed variables. If the observed variable model satisfies the identification conditions for path analysis discussed in Chapter 2, then the model as a whole is identified.
Figure 4.1   Expanded Science Achievement Model (observed variables SCIGRA6, SES, CERTSCI, SCIGRA10, and SCIACH; the latent variable INVOLVE is measured by MAKEMETH, OWNEXP, and CHOICE, and the latent variable CHALLENGE is measured by CHALLG, UNDERST, and WORKHARD)
The two rules just described are either necessary or sufficient, but not both. Indeed, there appear to be no necessary and sufficient conditions for identification of the full model. If sufficient conditions for identification are not met then it may be necessary to directly attempt to solve the structural form equations in terms of the reduced form coefficients. Generally, however, except for extremely complex models, the counting rule will work most often in practice.
4.3 Testing and Interpretation in Structural Equation Models

As noted earlier, it is not the intention of this chapter to discuss parameter estimation. Parameter estimation was discussed in Chapters 2 and 3, and the issues discussed in those chapters generalize to the estimation of the parameters of the full model. Issues of estimation under violations of underlying assumptions are taken up in Chapter 5. Nevertheless, the combination of measurement and path models into a comprehensive framework does offer interesting problems for testing that should briefly be addressed. Specifically, it is important to consider that the test of the fit of the model based on, say, the likelihood ratio chi-square statistic is now going to be based on many more degrees of freedom than usual. In general, it is possible to partition the total degrees of freedom into those based on restrictions in the measurement part of the model and those based on restrictions in the structural part of the model. Usually, the degrees of freedom from the measurement part of the model are greater than those from the structural part of the model. However, it is the structural part of the model that is typically the focus of substantive inquiry, with the measurement part serving to provide unbiased estimates of structural model parameters. Thus, it is possible to reject a relatively well-fitting structural model because of a poorly developed measurement model. It is clear, then, that the effort put forth in building a well-defined measurement model will benefit model fit as a whole. In Chapter 6 we consider the problem of model building strategies with respect to the debate around building measurement and structural models. We also consider this issue in Chapter 10.

4.3.1 STANDARDIZED SOLUTIONS IN THE FULL MODEL

In the full model, two types of standardizations can be employed. The first type standardizes only the structural part of the model and allows for comparisons between only the structural coefficients. The second type standardizes both observed and latent variables so that all parameters in the model can be compared.
Following Jöreskog and Sörbom (2000), define a set of scaling factors as

Dη = {diag[(I − B̂)⁻¹(Γ̂Φ̂Γ̂′ + Ψ̂)(I − B̂)′⁻¹]}^{1/2},   [4.5]

Dξ = (diag Φ̂)^{1/2},   [4.6]

Dy = {diag[Λ̂y(I − B̂)⁻¹(Γ̂Φ̂Γ̂′ + Ψ̂)(I − B̂)′⁻¹Λ̂y′ + Θ̂ε]}^{1/2},   [4.7]

and

Dx = {diag[Λ̂xΦ̂Λ̂x′ + Θ̂δ]}^{1/2},   [4.8]
where diag[·] refers to the diagonal elements of the matrix it is referencing. These scaling factors are applied in specific ways to yield standardized and completely standardized solutions (Jöreskog & Sörbom, 2000). For the standardized solutions, the latent variables are scaled to have variances equal to one while the observed variables remain in their original metric. For the completely standardized solution, both the observed and latent variables are scaled to have variances equal to one.

An Example of a Structural Equation Model of Science Achievement

In Chapter 2, we considered a model of science achievement. In that model, we considered two variables as representing teacher processes within the input-process-output theory of education. Those variables included student perceptions of the extent to which the teacher emphasizes understanding of science concepts (UNDERSTD) and student perceptions of whether they feel challenged in the classroom (CHALLG). In this example, we take a closer look at student perceptions of classroom context, postulate two latent variables of relevance to perceived classroom processes, and embed these latent variables into the path model discussed in Chapter 2. An inspection of the NELS:88 data set reveals numerous items that assess student perceptions of classroom climate and teacher behavior. These items fall roughly into those that assess student perceptions of how much the teacher emphasizes further study of science, interest in science, the importance of science to everyday life, and learning science facts and rules. Another set of items measure the extent of computer use in the science classroom—in terms
of science models, writing reports, making calculations, and so on. A third set of items assess the extent to which students feel challenged in the science classroom, work hard in science class, and are challenged to show understanding. A fourth set of items examine the extent to which students engage in hands-on science learning. Finally, a fifth set of items assess the extent to which students perceive themselves to be engaged in passive science learning—such as copying down teachers’ notes, listening to lectures, and so on. An exploratory factor analysis with oblique rotation (not shown) revealed a very clear five-factor structure along the lines of the sets of items just described. Of the five factors extracted, it was decided to use two factors that could be argued to be most relevant for science achievement. These were items that measured perceptions of hands-on involvement in science learning (INVOLVE) and those that measured the extent to which the students felt challenged in the classroom (CHALLG). In Chapter 2, the variables UNDERST and CHALLG were considered separate though highly correlated variables. On the basis of a more detailed exploratory factor analysis, we find that these two variables actually serve as indicators of a single CHALLG factor. It is argued that the extent to which students perceive themselves to be challenged in the classroom can be predicted in part by the extent of active learning. This hypothesis is consistent with the general input-process-output paradigm discussed in Chapter 1. The path diagram of the expanded path model incorporating the latent variables is shown in Figure 4.1. The initial model is similar to that discussed in Chapter 2. Namely, it is hypothesized that the background student characteristics of previous science grades (SCIGRA6) and socioeconomic status (SES) influence science achievement indirectly through 10th grade science grades. The role of teacher certification in science is hypothesized to predict the extent of hands-on science involvement. This in turn is hypothesized to predict student perceptions of a challenging classroom environment, which in turn should predict science achievement through science grades. The analysis of the initial model was based on a sample of public school 10th grade students from the NELS:88 data set. After listwise deletion, the sample was 6,677. The analysis used the software program Mplus with maximum likelihood estimation. Table 4.1 presents the maximum likelihood estimates of the expanded science achievement model. An inspection of Table 4.1 reveals a moderately large and significant effect of perceptions of hands-on involvement on perceptions of being challenged. However, the direct effect of perceptions of hands-on involvement is not a significant or very large predictor of 10th grade science grades. The results in Table 4.1 are supplemented with a breakdown of the total and indirect effects displayed in Table 4.2. Here, it can be seen that although
Table 4.1   Maximum Likelihood Estimates of Expanded Science Achievement Model

                       Estimates      SE     Est./SE      Std     StdYX
Measurement model
INVOLV BY
  MAKEMETH               1.000      0.000      0.000     0.606    0.738
  OWNEXP                 0.724      0.027     26.469     0.439    0.605
  CHOICE                 0.755      0.029     26.036     0.458    0.507
CHALL BY
  CHALLG                 1.000      0.000      0.000     0.917    0.748
  UNDERST                0.757      0.024     31.043     0.694    0.503
  WORKHARD               0.867      0.026     32.921     0.795    0.723
Structural model
CHALL ON
  INVOLV                 0.251      0.027      9.282     0.166    0.166
INVOLV ON
  CERTSCI                0.012      0.031      0.399     0.021    0.006
SCIGRA10 ON
  CHALL                  0.264      0.026     10.319     0.242    0.133
SCIACH ON
  SCIGRA10               1.217      0.036     33.687     1.217    0.381
SCIGRA10 ON
  SCIGRA6                0.788      0.021     37.147     0.788    0.416
  SES                    0.240      0.028      8.734     0.240    0.098
  CERTSCI                0.032      0.068      0.466     0.032    0.005
the direct effect of perceptions of hands-on involvement is a strong predictor of perceptions of a challenging classroom, its total and indirect effect on ultimate science achievement is quite modest—particularly when compared with prior grades in science. From a substantive standpoint, these results suggest that previous science grades and the perception of a challenging classroom environment are important predictors of grades in 10th grade science. The direct effect of perceptions of hands-on involvement is not as strong as its indirect effect through perceptions of a challenging classroom environment. That is, it appears that hands-on involvement
Table 4.2   Standardized Total and Indirect Effects for the Expanded Science Achievement Model

                                      Estimates      SE     Est./SE      Std     StdYX
Effects from SCIGRA6 to SCIACH
  Sum of indirect effects               0.959      0.038     24.954     0.959    0.159
  Specific indirect effects
    SCIACH ← SCIGRA10 ← SCIGRA6         0.959      0.038     24.954     0.959    0.159
Effects from SES to SCIACH
  Sum of indirect effects               0.293      0.035      8.455     0.293    0.037
  Specific indirect effects
    SCIACH ← SCIGRA10 ← SES             0.293      0.035      8.455     0.293    0.037
Effects from CERTSCI to SCIACH
  Sum of indirect effects               0.040      0.083      0.478     0.040    0.002
  Specific indirect effects
    SCIACH ← SCIGRA10 ← CERTSCI         0.039      0.083      0.466     0.039    0.002
    SCIACH ← SCIGRA10 ← CHALL ←
      INVOLV ← CERTSCI                  0.001      0.003      0.399     0.001    0.000
is relevant only to the extent that it leads to perceptions of being challenged. Moreover, to the extent that grades in 10th grade science are predictive of overall science achievement, the role of perceptions of hands-on involvement is not as important as the perception of a challenging classroom environment. Interestingly, these results suggest that the role of teacher background (as measured by whether the teacher is certified in science) is not an important predictor of perceptions of hands-on involvement. Indeed, the results of this analysis suggest that teacher certification in science is a poor predictor of science grades or ultimate science achievement—especially when compared with SES, prior grades, and the perceptions of a challenging classroom.
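As an illustration of how a model of this form can be specified, a hypothetical lavaan sketch is given below using the variable names in Figure 4.1 and Table 4.1. The data frame nels is an assumed object; this is not the Mplus setup used for the reported results, and the estimates obtained will depend on the data and estimator.

    library(lavaan)
    sci.model <- '
      # measurement model
      INVOLV =~ MAKEMETH + OWNEXP + CHOICE
      CHALL  =~ CHALLG + UNDERST + WORKHARD
      # structural model
      INVOLV   ~ CERTSCI
      CHALL    ~ INVOLV
      SCIGRA10 ~ CHALL + SCIGRA6 + SES + CERTSCI
      SCIACH   ~ SCIGRA10
    '
    sci.fit <- sem(sci.model, data = nels, estimator = "ML")
    summary(sci.fit, standardized = TRUE)
    standardizedSolution(sci.fit)    # completely standardized estimates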
4.4 Multiple Group Modeling: No Mean Structure

This section considers the extension of the single group model discussed in this chapter and in Chapters 2 and 3 to multiple group situations. To motivate this section, consider the substantive problem of student perceptions of school climate as discussed in Chapter 3. Suppose that an investigator wishes to understand the differences between public and private schools in terms of student perceptions of school climate. Using, for example, the National Educational Longitudinal Study (NCES, 1988), it is possible to identify whether students belong to public or private (e.g., Catholic, other religious, and other nonreligious) schools. An investigator may have a program of research designed to study school-type differences in student perceptions of school climate, eventually relating these differences to important educational outcomes such as academic achievement and student dropout. First, the investigator might wish to determine whether the measurement structure of the student perception items operates the same way across school types. Second, the investigator may be interested in knowing if the means of the factors of school climate are different between public and private school students.

4.4.1 MULTIPLE GROUP SPECIFICATION AND TESTING

We begin by considering the problem of assessing the comparability of factor structures across groups.3 Jöreskog (1971) suggested a strategy for assessing the comparability of factor structures between groups based on tests of increasingly restricted hypotheses. Let us reconsider the factor model subscripted with a group index g = 1, 2, . . . , G, denoted as

xg = Λxgξg + δg,   [4.9]
where xg is a vector of observed measures, Λxg is a matrix of factor loadings, ξg is a vector of common factors, and δg is a vector of unique variables. The identification conditions discussed in Section 3.7.1 hold here as well, but now they must be in place for each group. Under the assumption that the samples are independent of one another, and also assuming that the values of the variables are realizations from a multivariate normal population, the log-likelihood function can be written for group g as

log L0(Ω)g = −(ng/2)[log|Σg| + tr(SgΣg⁻¹)],   [4.10]

and can be summed over groups, yielding
log L0(Ω) = ∑_{g=1}^{G} log L0(Ω)g.   [4.11]

Minimizing the function in Equation [4.10] yields the maximum likelihood fitting function FML, written as

FML = log|Σ| + tr(SΣ⁻¹) − log|S| − q.   [4.12]
Given the model specification and the requisite assumptions, the first test that may be of interest is the equality of covariance matrices across groups. Note that for this first step, no structure is imposed. Rather, the goal is simply to determine if the covariance matrices differ. Borrowing from Jöreskog’s (1971) notation, the null hypothesis for this first step can be written as

H0Σ: Σ1 = Σ2 = ⋯ = ΣG.   [4.13]
This hypothesis is tested using Box’s M test (Timm, 1975) and can be written as

M = n log|S| − ∑_{g=1}^{G} ng log|Sg|,   [4.14]
which is asymptotically distributed as chi-square with degrees of freedom

d = ½(G − 1)q(q + 1).   [4.15]
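A minimal R sketch of the M statistic in Equations [4.14] and [4.15] is given below. The list S.list of group covariance matrices and the vector n.g of group sample sizes are assumed objects, and the pooled matrix is taken here as the sample-size-weighted average of the group matrices.

    box.m <- function(S.list, n.g) {
      n  <- sum(n.g)
      S  <- Reduce(`+`, Map(`*`, S.list, n.g)) / n             # pooled covariance matrix
      M  <- n * log(det(S)) - sum(n.g * sapply(S.list, function(Sg) log(det(Sg))))
      q  <- nrow(S)
      df <- (length(S.list) - 1) * q * (q + 1) / 2             # Equation [4.15]
      c(M = M, df = df, p.value = pchisq(M, df, lower.tail = FALSE))
    }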
In standard multivariate procedures, such as multivariate analysis of variance, we typically wish to retain the null hypothesis of equal covariance matrices. In the testing strategy proposed by Jöreskog (1971), retaining H0Σ would suggest proceeding with an analysis using the pooled covariance matrix and a discontinuation of examining group differences. If the hypothesis of equality of covariances is rejected, then the next step in the sequence is to test the equality of the number of factors, without regard to the specific pattern of fixed and free loadings. This hypothesis can be represented as

H0k: k1 = k2 = ⋯ = kG,   [4.16]
where k is a specified number of factors. Essentially, this test is conducted as separate unrestricted factor analysis models. Each is tested using chi-square with degrees of freedom dk = ½[(q − k)² − (q + k)]. Because the chi-square statistics are independent, they can be summed to obtain an overall chi-square test of the equality of the number of factors, with degrees of freedom

dk = ½[(q − k)² − (q + k)].   [4.17]
If this hypothesis is rejected, then testing stops and analyses can take place within groups. If the hypothesis of equality of the number of factors is not rejected, the next step in the testing strategy is a test of equality of factor loadings. This test is classically known as the test of factorial invariance and can be represented as

H0Λ: Λ1 = Λ2 = ⋯ = ΛG.   [4.18]
The test of factorial invariance is obtained by setting equality constraints across groups for common elements in the factor loading matrix Λ and allowing the remaining parameters to be free across groups. The result is a chi-square statistic that can be assessed with

dΛ = ½Gq(q + 1) − qk + q − ½qk(k + 1) − Gq   [4.19]
degrees of freedom. If factorial invariance is rejected, then testing stops at this point. If factorial invariance is tenable, the next step is to assess the equality of factor loadings and unique variances. Again, in Jöreskog’s (1971) notation, this can be represented as

H0ΛΘ: Λ1 = Λ2 = ⋯ = ΛG,  Θ1 = Θ2 = ⋯ = ΘG.   [4.20]
The test of the hypothesis in Equation [4.20] is obtained by setting equality constraints across groups on the common elements of the factor loading matrix Λ and the covariance matrix of the uniquenesses Θ. A chi-square test is obtained, which can be assessed with degrees of freedom

dΛΘ = (1/2)gq(q + 1) − qk + q − (1/2)gk(k + 1) − q.    [4.21]
Again, if this test is not rejected, then further sequential testing is allowed. Finally, one can test for complete invariance of all parameters across groups by adding the constraint that the factor covariance matrices Φg be equal across groups. The hypothesis can be represented as
H0ΛΘΦ: Λ1 = Λ2 = ··· = ΛG,  Θ1 = Θ2 = ··· = ΘG,  Φ1 = Φ2 = ··· = ΦG.    [4.22]
Jöreskog (1971) makes the important point that the hypothesis in Equation [4.22] is stronger than the hypothesis of equality of covariance matrices in Equation [4.13] because Equation [4.13] includes cases where Σ does not necessarily follow a common factor model. The test of the hypothesis in Equation [4.22] uses the pooled sample covariance matrix. The resulting chi-square statistic is assessed with degrees of freedom

dΛΘΦ = (1/2)q(q + 1) − qk + q − (1/2)gk(k + 1) − q.    [4.23]
It is important to point out that the testing sequence outlined by Jöreskog (1971) may lead to inflated significance levels. Similar to the problem of multiple post hoc nonorthogonal comparisons in analysis of variance, the sequence of tests outlined by Jöreskog is likely not orthogonal, and thus the true significance levels are unknown. This was discussed by Jöreskog (1971), who recommended that the choice of retaining or rejecting hypotheses be based on the substantive importance of the hypothesis under investigation along with an understanding of the assumptions underlying the use of the likelihood ratio chi-square test. These assumptions are discussed in Chapter 5. It should also be noted that the modeling strategies just described can handle cases of partial invariance (Byrne, Shavelson, & Muthén, 1989). That is, given a lack of total cross-group invariance in the factor loadings, say, one can test the partial invariance of specific elements in the loading matrix. Finally, this modeling strategy can be extended to the general structural equation model. In this case, one would add constraints pertaining to the equality of the structural coefficients in B, Γ, and Ψ.

An Example of Multiple Group Modeling

Recall that the example in Chapter 3 concerned student perceptions of school climate, with a focus on only public schools. In this section, we examine whether there are differences in the factor structure of student perceptions of school climate across students in public and private schools. The data for this example are the same as those used in Chapter 3 except that students in private schools were also extracted. Catholic schools and other religious private schools were omitted from the analysis. The total sample size for the private school students (after listwise deletion) was 346. To bring the sample sizes for
public school students in line with the sample size for private school students, a random sample of 500 public school students was drawn from the total sample of 11,000. An exploratory factor analysis of both groups separately (not shown) revealed that the same two factors held for both groups. Therefore, we proceed with the sequence of testing proposed by Jöreskog (1971), beginning with the test of factorial invariance. Table 4.3 displays the sequential testing of invariance restrictions. Preliminary analyses (not shown) revealed that the hypothesis of equality of covariance matrices was rejected, suggesting that we can begin to explore increasingly restrictive hypotheses of school-type differences in student perceptions of school climate. The next hypothesis under Model 1 is that of invariance of factor loadings, corresponding to Equation [4.18]. The analysis suggests that the hypothesis of invariance of factor loadings is rejected. Strictly speaking, this finding suggests that while the number of factors is the same for both groups, the relationship between the variables and their corresponding factors is not. According to the testing strategy proposed by Jöreskog (1971), hypothesis testing stops at this point. For completeness, however, we provide the remaining tests as well as chi-square difference tests that compare increasing restrictions.
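The chi-square difference tests reported in Table 4.3 (below) follow from the fact that the models are nested: the difference between the chi-square statistics of two nested models is itself asymptotically chi-square, with degrees of freedom equal to the difference in model degrees of freedom. The following is a minimal sketch of that computation using the values reported in Table 4.3; the function name is illustrative only, and under nonnormality a scaled difference test would ordinarily be preferred.

from scipy.stats import chi2

def chisq_difference(chi2_restricted, df_restricted, chi2_free, df_free):
    # Likelihood ratio (chi-square) difference test for two nested models.
    d_chi2 = chi2_restricted - chi2_free
    d_df = df_restricted - df_free
    return d_chi2, d_df, chi2.sf(d_chi2, d_df)

# Values from Table 4.3
comparisons = {
    "Model 2 vs. Model 1": (4711.32, 206, 4555.33, 191),
    "Model 3 vs. Model 2": (4724.97, 209, 4711.32, 206),
}
for label, (c_r, df_r, c_f, df_f) in comparisons.items():
    d, ddf, p = chisq_difference(c_r, df_r, c_f, df_f)
    print(f"{label}: delta chi2 = {d:.2f}, delta df = {ddf}, p = {p:.4f}")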
Table 4.3   Sequential Chi-Square Tests of Invariance Constraints for Analysis of Public and Private School Student Perceptions of School Climate

Modelᵃ      χ²         df     p-Value     Δχ²       Δdf
Model 1     4555.33    191    0.000
Model 2     4711.32    206    0.000       155.99    15
Model 3     4724.97    209    0.000       13.65     3

a. Model 1: Invariance of factor loadings. Model 2: Invariance of factor loadings and measurement error variances. Model 3: Invariance of factor loadings, measurement error variances, and factor variances and covariances.

4.5 Multiple Group Specification: Bringing in the Means

Following the empirical example given in the section An Example of Multiple Group Modeling, the next step in a program of research looking at school-type differences in student perceptions of school climate might be to consider whether there are mean differences across school types on the latent variables. The problem of estimating factor mean differences was considered by Sörbom (1974). This section outlines mean structure analysis, presents the identification and estimation
issues involved in mean structure analysis, and presents an empirical example. An excellent overview of the methods described in this section can be found in Hancock (2004).

4.5.1 MEAN STRUCTURE SPECIFICATION AND TESTING

To estimate the differences between groups on latent variable means, it is necessary to expand the factor model to incorporate intercepts. That is, we consider expanding the model in Equation [4.9] as

xg = τ + Λg ξg + δg,    [4.24]
where τ is a q-dimensional vector of intercepts. The remaining terms were defined earlier. In the mean structure case, we add the assumption that

E(xg) = τ + Λg E(ξg) = τ + Λg κg,    [4.25]
where κg is a k-dimensional vector of factor means for group g.

4.5.2 IDENTIFICATION AND ESTIMATION OF THE MEAN STRUCTURE MODEL

In addition to the standard forms of identification in the restricted factor model, there are issues of identification that are specific to the mean structure case. To begin, recall that the goal of mean structure analysis is to assess the differences between groups on factor means. Thus, it will typically be the case that the battery of measurements is the same for both groups, and it is therefore reasonable to assume that factorial invariance holds.4 Under the assumption of factorial invariance, the model in Equation [4.25] is not identified. As shown by Sörbom (1974), we can add a k-dimensional vector, say d, to κg and subtract Λd from τ, yielding

E(xg) = τ − Λd + Λ(κg + d) = τ + Λκg.    [4.26]
Because d is a k-dimensional vector, the model in Equation [4.25] is identified only if we add k restrictions. One way to accomplish this is to set, say, κg = 0 for one group, which provides the needed k restrictions. From there, the remaining factor mean estimates are interpreted as differences from group g.
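The indeterminacy in Equation [4.26], and the way the reference-group restriction resolves it, can be verified numerically: shifting the factor means by d while subtracting Λd from the intercepts leaves the implied observed means unchanged. The following numpy sketch uses purely hypothetical loadings, intercepts, and factor means.

import numpy as np

# Hypothetical invariant loadings (q = 4 indicators, k = 1 factor) and intercepts
Lam = np.array([[1.0], [0.9], [1.1], [0.8]])
tau = np.array([2.5, 2.4, 2.6, 2.3])

kappa_g = np.array([0.3])      # factor mean for group g
d = np.array([5.0])            # arbitrary shift

mu_original = tau + Lam @ kappa_g
mu_shifted = (tau - Lam @ d) + Lam @ (kappa_g + d)

# The two parameterizations imply identical observed means,
# so the factor means are not identified without k restrictions.
print(np.allclose(mu_original, mu_shifted))    # True

# Fixing the reference group's factor mean to zero identifies the model;
# the other group's factor mean is then interpreted as a difference
# (e.g., a difference of the kind reported in Table 4.4).
kappa_ref, kappa_other = np.array([0.0]), np.array([-0.279])
print(tau + Lam @ kappa_other - (tau + Lam @ kappa_ref))   # equals Lam * (-0.279)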
An Example of Mean Structure Modeling

Continuing with the example of private and public school student perceptions of school climate, we proceed to estimate the school-type mean differences on these two latent variables. For the purpose of this example, the public schools are the reference group and are coded zero. Also, we estimate the factor mean differences under the assumption of measurement intercept and factor loading invariance. As noted above, under factorial invariance, designating the factor means for a particular group to be zero provides the needed k restrictions (here, two) and allows for the identification of the factor means. The results are shown in Table 4.4. Note that the items are on a scale from 1 to 4, with 4 representing disagreement with the statements that serve as indicators of their respective factors. The findings indicate that public school students, on average, tend to disagree with statements tapping into positive school climate compared to private school students. Moreover, private school students tend to disagree with statements reflecting negative school climate, relative to the public school students. To the extent that public schools have a greater student-teacher ratio, more behavior problems on average, and so on, it appears that students in each of these school types tend to describe the school climate accurately. An important caveat in these findings is that they do not account for variation within the public school or private school sectors separately. This issue is related to the clustered sampling scheme that generated the data and is discussed in Chapter 7 when we take up the topic of multilevel factor analysis.
4.6 An Alternative Model for Estimating Group Differences

This section covers a special case of the general structural equation model considered above for the estimation of group differences on latent variables.
Table 4.4   Maximum Likelihood Estimates of Factor Mean Differences in Student Perceptions of School Climate

Factor              Estimateᵃ    S.E.     Est./S.E.    p
Positive climate    −0.279       0.067    −4.131       0.000
Negative climate    0.283        0.071    4.014        0.000

a. Estimate reflects factor mean difference between private school students (=1) and public school students (=0) under the assumption of factorial invariance. Note that scales are coded 1 = strongly agree, 4 = strongly disagree.
This model is referred to as the MIMIC model, standing for the multiple indicators and multiple causes model, and was proposed by Jöreskog and Goldberger (1975). Consider once again the goal of estimating school-type differences on student perceptions of school climate. Denote by x a vector that contains dummy codes representing group membership (e.g., 1 = private school; 0 = public school). Then, the MIMIC model can be written as

y = Λy η + ε,
η = Γx + ζ,    [4.27]
x ≡ ξ.
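To make the specification in Equation [4.27] concrete, the sketch below computes the moment structure implied by a single-factor MIMIC model with one dummy-coded x. All parameter values are hypothetical; the sketch is a numerical illustration, not the estimation performed in the empirical example.

import numpy as np

# Hypothetical MIMIC model: one factor, four indicators, one dummy covariate x
Lam_y = np.array([[1.0], [0.9], [1.1], [0.8]])   # factor loadings
Theta = np.diag([0.5, 0.6, 0.4, 0.7])            # unique variances of y
gamma = np.array([[-0.28]])                      # regression of eta on x
psi = np.array([[0.70]])                         # disturbance variance of eta
p_x = 0.5                                        # proportion coded x = 1
var_x = p_x * (1 - p_x)                          # variance of the dummy covariate

# Implied covariance structure of the indicators:
#   Cov(y) = Lam_y (gamma Var(x) gamma' + psi) Lam_y' + Theta
Sigma_yy = Lam_y @ ((gamma * var_x) @ gamma.T + psi) @ Lam_y.T + Theta

# Implied mean difference between the x = 1 and x = 0 groups: Lam_y * gamma
mean_diff = (Lam_y @ gamma).ravel()

print(np.round(Sigma_yy, 3))
print(np.round(mean_diff, 3))

Because the implied difference in indicator means between the two groups is Λy γ, the MIMIC coefficient γ plays the same role as the factor mean difference estimated in the mean structure analysis of Section 4.5, which is why the two analyses yield virtually identical results in the example below.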
The identity between x and ξ is obtained by fixing Λx = I, a q × q identity matrix, and Θδ = 0, a null matrix. Note that the covariance matrix of x is absorbed into the structural covariance matrix Φ. There are no special rules of identification associated with the MIMIC model apart from those that are required by the general structural equation model. Moreover, estimation of the parameters of the MIMIC model proceeds in the same way as estimation of the parameters in the general structural equation model.

4.6.1 AN EXAMPLE OF THE MIMIC MODEL

The analysis of school-type differences on student perceptions of school climate is used here as an example. A path diagram of the MIMIC model is given in Figure 4.2 and the results are displayed in Table 4.5. The results of this analysis are virtually identical with the mean structure analysis reported in the section An Example of Mean Structure Modeling. Specifically, the regression of the latent school climate perception variables on school type reflects the results found in the mean structure case. Again, we find that public school students perceive poor teacher quality, more negative school climate, and more problems with misbehavior and disruptions than their private school counterparts.

4.6.2 EXTENSIONS OF THE MIMIC MODEL

The MIMIC model is perhaps one of the most flexible special cases of the general structural equation model for addressing substantive problems in the social and behavioral sciences. To show this, consider again the vector of exogenous variables x. In the example given in Section 4.6.1, the vector x was a dummy variable representing school type. However, the MIMIC model can incorporate any type of exogenous variable—from continuous to categorical.
Figure 4.2   Stylized MIMIC Model of Student Perception of School Climate. [Path diagram: the latent variables POS CLIMATE (indicators GETALNG through LISTEN) and NEG CLIMATE (indicators DISRUPT through MISBEHAV) are each regressed on SCHOOL TYPE.]
Indeed, x can be coded to represent orthogonal analysis-of-variance design vectors (e.g., Kirk, 1995), thus integrating experimental design notions into a latent variable framework. Perhaps a more interesting specification of the MIMIC model comes from the work of B. Muthén (1989) on estimating parameters in heterogeneous populations. Among other things, Muthén extended the specification of the MIMIC model to allow the regression of the indicators as well as the factor on the exogenous variables. The advantage of this extension is that it allows one to examine the extent to which there are group differences on specific items over and above the factor. This extended specification can be obtained as follows. First, consider again the full structural equation model in Equations [4.1] to [4.3], reproduced here as

η = Bη + Γξ + ζ,
y = Λy η + ε,    [4.28]
x = Λx ξ + δ.
Table 4.5   MIMIC Model Results of School-Type Differences in Perceptions of School Climate

                       Estimates    S.E.     Est./S.E.
Measurement part
POS BY
  GETALNG              1.000        0.000    0.000
  SPIRIT               0.945        0.033    28.438
  STRICT               0.880        0.036    24.315
  FAIR                 0.971        0.032    29.993
  RACEFRND             1.001        0.033    30.000
  TCHGOOD              1.087        0.031    35.201
  TCHINT               1.072        0.029    37.514
  TCHPRAIS             1.029        0.031    33.702
  LISTEN               1.023        0.033    31.121
NEG BY
  DISRUPT              1.000        0.000    0.000
  PUTDOWN              0.825        0.034    24.162
  STUDOWN              0.867        0.036    23.778
  FEELSAFE             0.790        0.034    23.155
  IMPEDE               0.963        0.040    23.975
  MISBEHAV             0.960        0.038    25.324
Structural part
  POS ON SCHTYPE       −0.279       0.067    −4.161
  NEG ON SCHTYPE       0.284        0.070    4.041
  NEG WITH POS         0.694        0.045    15.337
As in the regular MIMIC model, let Λx = I and Θδ = 0. In the extended model proposed by Muthén, let Λy = I and Θε = 0. This specification forces the loadings to reside as elements in B. As in the basic MIMIC model, the metric of the latent variable must be determined—typically by fixing a loading to 1.0. In this extended parameterization, the loadings reside in the B
matrix, and hence, one of the elements of B must be fixed to 1.0. Also, in this parameterization, the matrix Γ contains the regressions of the factor as well as its indicators on the exogenous variables. Moreover, in the case of a single factor, the vector ζ contains p + 1 elements. The first p elements are the uniquenesses associated with the elements of ε, and the last element is the disturbance term ζ.

An Example of the Extended MIMIC Model

As discussed earlier, this model will provide exactly the same goodness-of-fit, estimates, and standard errors as the basic model in Figure 4.2. The advantage of the extended model is in determining whether there are school-type differences in specific items, over and above differences in the latent variables. On the basis of the Lagrange multiplier diagnostics (not shown), the largest effect was the relation between school type and the question that asked students to indicate the extent to which they agree with the statement “I don’t feel safe at this school.” On freeing that path, the relationship was negative, indicating that private school students significantly disagree with this statement relative to their public school counterparts, over and above the differences in the general negative school climate factor.
4.7 Issues of Selection Bias in Multiple Group Models

A natural question that arises in the context of the educational tracking example is whether we can unequivocally ascribe mean differences in latent self-concept to the effect of educational tracking. When there is random assignment of observations to conditions, we are in a stronger position to argue for causal effects of treatments. However, in this example, random assignment was not utilized. Indeed, numerous manifest and latent variables are in play that select children into educational tracks. This section considers the problem of nonrandom selection into treatment conditions as it pertains to issues of factorial invariance. We also consider methods that attempt to account for nonrandom selection mechanisms in latent variable models of group differences.
population of children, still holds within educational tracks, where the tracks are not usually formed by random assignment. To motivate the problem of factorial invariance, consider again the factor analytic model discussed in Chapter 3. As in Equation [3.2], the covariance matrix of the q variables can be written as

Σ = ΛΦΛ′ + Θ,    [4.29]
where Σ is a q × q population covariance matrix, Λ is the matrix of factor loadings, Φ = E(ξξ′) is a k × k factor covariance matrix, and Θ is a q × q diagonal matrix of unique variances. Moreover, letting E(ξ) = κ, the q × 1 observed mean vector μ can be modeled as

μ = τx + Λκ,    [4.30]
where τx is a q × 1 vector of measurement intercepts and κ is a k × 1 vector of factor means. Consider also a selection variable z and a selection function f(z) that determines the selection of a subpopulation from the parent population. For the tracking example, z may contain a vector of variables such as prior academic achievement, SES, and so on. At this point, however, we consider z and f(z) as unknown. Meredith (1993) distinguished between two types of factorial invariance, namely, strong factorial invariance and strict factorial invariance. For either type of invariance, certain assumptions must hold. First, it is assumed that a factor model holds in the parent population. Second, it is assumed that the conditional covariances of the factors and the uniquenesses given f(z) are zero. Under these two assumptions, strong factorial invariance implies that for every subpopulation, denoted as s,

μs = τx + Λκs,    [4.31]

and

Σs = ΛΦs Λ′ + Θs.    [4.32]
Equations [4.31] and [4.32] mean that under strong factorial invariance the structural intercepts and the factor loadings are invariant across the groups, but the factor means, factor covariance matrix, and covariance matrix of the uniquenesses may differ. In contrast, strict factorial invariance retains Equation [4.31] but now requires that the matrix of unique variances be constant across subpopulations—namely, that

Σs = ΛΦs Λ′ + Θ.    [4.33]
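A small numerical sketch may help fix the distinction between the two forms of invariance implied by Equations [4.31] through [4.33]: the loadings and intercepts are held constant across subpopulations in both cases, while the uniquenesses are allowed to vary only under strong (not strict) invariance. All parameter values below are hypothetical.

import numpy as np

# Invariant measurement parameters (q = 4 indicators, k = 1 factor)
Lam = np.array([[1.0], [0.9], [1.1], [0.8]])
tau_x = np.array([2.5, 2.4, 2.6, 2.3])

def implied_moments(kappa_s, Phi_s, Theta_s):
    # Subpopulation-implied mean vector and covariance matrix (Equations [4.31]-[4.32])
    mu_s = tau_x + Lam @ kappa_s
    Sigma_s = Lam @ Phi_s @ Lam.T + Theta_s
    return mu_s, Sigma_s

# Strong invariance: factor means, factor variances, and uniquenesses may differ by subpopulation
mu_1, Sigma_1 = implied_moments(np.array([0.0]), np.array([[0.8]]),
                                np.diag([0.5, 0.6, 0.4, 0.7]))
mu_2, Sigma_2 = implied_moments(np.array([0.3]), np.array([[1.1]]),
                                np.diag([0.6, 0.5, 0.5, 0.6]))

# Strict invariance: the SAME Theta is imposed in every subpopulation (Equation [4.33])
Theta = np.diag([0.5, 0.6, 0.4, 0.7])
mu_2s, Sigma_2s = implied_moments(np.array([0.3]), np.array([[1.1]]), Theta)

print(np.round(mu_1, 3), np.round(mu_2, 3))
print(np.round(Sigma_2, 3))
print(np.round(Sigma_2s, 3))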
The practical implications of factorial invariance as it applies to multiple group modeling concern the potential for explicitly testing whether selection mechanisms are present that militate against arguing for the causal effect of treatments. As is shown below, such testing is possible in the MIMIC framework.

4.7.2 LATENT VARIABLE ANALYSIS OF COVARIANCE APPROACHES TO MODELING SELECTION

An extension of the analysis of covariance (ANCOVA) to the latent variable context was proposed by Sörbom (1978). This approach allows for the incorporation of any number of covariates and, through a latent variable representation of the covariate, accounts for the problems of measurement error. The specification of the latent variable ANCOVA model requires a multiple group model with mean structures (Sörbom, 1974). The groups represent the treatment conditions of interest, say, the experimental group and the control group. Specifying a mean structure analysis allows for testing the experimental hypothesis of interest. Within each group, a structural model is specified that regresses the latent outcomes of interest on the latent covariate. Setting equality constraints across groups on the slopes relating the outcome to the covariate allows for a latent variable extension of the homogeneity of regression test. The model proposed by Sörbom (1978) allows one to test factor mean differences under the assumption of homogeneity of regression. One limitation of Sörbom's (1978) procedure is that the covariates should follow a factor analytic representation to be relevant to the structural equation modeling framework. That is, although there may be as many covariates as necessary, the benefit of Sörbom's approach lies precisely in the latent variable specification of the covariate. However, there may be cases in which such a latent variable specification is not feasible. Again, considering the tracking example, the covariates of interest might include demographic characteristics and course-taking patterns for which a factor analytic representation may be inappropriate.
4.7.3 A PROPENSITY SCORE APPROACH TO MODELING SELECTION

An alternative to ANCOVA is based on the use of the propensity score. The propensity score was proposed by Rosenbaum and Rubin (1983) as a means of balancing treatment and control groups with respect to observed covariates in nonrandomized studies. In a typical application of this approach, a model is specified that predicts membership into groups. The predictors in this model are referred to
as covariates. The propensity score is defined as the conditional probability of assignment to a treatment group given a set of observed covariates and is obtained via a logistic or probit regression. Each observation is associated with a propensity to be assigned to the treatment group. The distribution of propensity scores is then divided into strata, and analyses of group differences are conducted within strata. Comparisons of group differences within and across strata provide evidence for whether or not the bias due to nonrandom selection has been accounted for by the propensity score adjustment. The propensity score methodology was integrated with multiple group MIMIC modeling by Kaplan (1999) as a means of addressing selection bias in latent variable models. Specifically, a MIMIC model of group differences on latent variables is specified for each stratum formed by the propensity score. Equality constraints across strata of the type discussed in Section 4.4.1 are imposed on the regression coefficient relating the latent variable to the grouping variable (elements of Γ). A number of hypotheses can be tested under this multisample MIMIC model. Of considerable interest to the problem of assessing selection bias is the hypothesis of equality of group differences across strata. The coefficients in Γ represent the factor mean difference between the groups. An analysis of the coefficients in Γ can yield three types of interpretations. First, if there is no statistical and/or substantive factor mean difference between the groups and if this lack of an effect is found to be invariant across strata, it suggests that the selection characteristics are accounting for the differences between groups on the latent variables. Second, if there is a statistical and/or substantively significant effect that is invariant across strata, it suggests that there is a factor mean difference between groups that is not due solely to selection effects. Third, if there is a significant improvement in model fit when allowing the elements of Γ to vary freely across strata (hereafter referred to as the Γ-free model) compared with the Γ-invariance model, it suggests that there is an interaction between the selection characteristics and the group variable. In this case, further study of group differences within each stratum separately on the covariates may reveal the sources of the interaction. The method proposed by Kaplan (1999) rests on two fundamental assumptions. First, it is assumed that treatment assignment is “strongly ignorable” (Rosenbaum & Rubin, 1983). Second, it is assumed that at least strong factorial invariance holds. Multiple group mean and covariance structure modeling (Jöreskog, 1971; Sörbom, 1974) can be used to assess these assumptions. If either form of invariance is found not to hold, interpretation of the test for a selection by treatment interaction must proceed with great caution. However, lack of invariance across groups does not necessarily invalidate comparisons between groups and within groups.
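The propensity score machinery itself is simple to sketch. The example below uses scikit-learn's logistic regression to estimate the conditional probability of treatment given observed covariates and then forms strata from the estimated scores; the covariates, sample sizes, and selection mechanism are hypothetical, and in Kaplan's (1999) approach a multiple group MIMIC model would subsequently be fit within the resulting strata.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000

# Hypothetical observed covariates (e.g., prior achievement, SES) and a
# nonrandom selection mechanism into the "treatment" group
X = rng.normal(size=(n, 2))
p_treat = 1.0 / (1.0 + np.exp(-(0.8 * X[:, 0] + 0.5 * X[:, 1])))
treat = rng.binomial(1, p_treat)

# Step 1: estimate the propensity score via logistic regression
e_hat = LogisticRegression().fit(X, treat).predict_proba(X)[:, 1]

# Step 2: divide the propensity score distribution into strata (here, quintiles)
edges = np.quantile(e_hat, [0.2, 0.4, 0.6, 0.8])
stratum = np.digitize(e_hat, edges)

# Step 3: check covariate balance within strata; group comparisons (e.g., the
# MIMIC regressions collected in Gamma) would then be carried out stratum by stratum
for s in range(5):
    in_s = stratum == s
    treated, control = X[in_s & (treat == 1)], X[in_s & (treat == 0)]
    if len(treated) and len(control):
        diff = treated.mean(axis=0) - control.mean(axis=0)
        print(f"stratum {s}: n = {in_s.sum():4d}, covariate mean differences = {np.round(diff, 3)}")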
4.8 Conclusion

This chapter covered the merging of path analysis and factor analysis into a comprehensive structural equation modeling methodology. We extended the basic model to cover modeling in multiple groups. In addition, we discussed new developments in modeling the nonrandom selection that can occur in quasi-experimental applications of structural equation modeling. The general model discussed in this chapter, as well as the special cases discussed so far, rests on a certain set of statistical assumptions. It is crucial for these assumptions to be met to have confidence in the inferences drawn from applications of structural equation modeling. The next chapter discusses, in detail, the assumptions underlying structural equation modeling.
Notes

1. Duncan (1975) has used the term structural equation modeling to refer to what we are calling path analysis. Although the terminology is somewhat arbitrary, I have decided to maintain the common parlance used in this field.
2. The focus on multiple group measurement models is due to the fact that these are the most common models studied across groups. Presenting the multiple group measurement model does not result in a loss of generality.
3. It is possible to assess group differences in the unrestricted model, using a variety of factor comparability measures. See, for example, Harman (1976).
4. However, this is only reasonable if there is random selection of observations and random assignment to groups.
5 Statistical Assumptions Underlying Structural Equation Modeling
As with all statistical methodologies, structural equation modeling requires that certain underlying assumptions be satisfied to ensure accurate inferences. These assumptions pertain to the intersection of the data and the estimation method. This chapter considers the major assumptions associated with structural equation modeling. These include multivariate normality, completely random missing data, sufficiently large sample size, and correct model specification. In addition to these major assumptions, this chapter also discusses one additional assumption—namely, the assumption of exogeneity. Assumptions regarding the sampling mechanism are deferred to Chapter 7 when we take up the issue of multilevel structural equation modeling.
5.1 Nonnormality

A basic assumption underlying the standard use of structural equation modeling is that the observations are drawn from a continuous and multivariate normal population. This assumption is particularly important for maximum likelihood (ML) estimation because, as we saw in Chapter 2, the ML estimator is derived directly from the expression for the multivariate normal distribution. As noted in Chapter 2, if the data follow a continuous and multivariate normal distribution, then ML attains optimal asymptotic properties, namely, that the estimates are normally distributed, unbiased, and efficient. This section focuses on the effects of nonnormality on normal theory–based estimation as well as alternative estimators that have been proposed to address nonnormality. In the interest of space, we will not consider alternative remedies for handling nonnormal variables, such as transforming the original variables or creating item parcels. The interested reader should consult the excellent review by West, Finch, and Curran (1995).
5.1.1 EFFECTS OF NONNORMALITY ON NORMAL THEORY–BASED ESTIMATION

The effects of nonnormality on estimates, standard errors, and tests of model fit are well known. The extant literature accumulated from the mid-1980s through the 1990s based on statistical simulation studies suggests that nonnormality does not affect parameter estimates. In contrast, standard errors appear to be underestimated relative to the empirical standard deviation of the estimates. With regard to goodness-of-fit, the extant literature indicates that nonnormality can lead to substantial overestimation of the likelihood ratio chi-square statistic, and this overestimation appears to be related to the number of degrees of freedom of the model (see, e.g., Boomsma, 1983; B. Muthén & Kaplan, 1985, 1992; Olsson, 1979).

5.1.2 ESTIMATORS FOR CONTINUOUS NONNORMAL DATA

The mid-1980s also witnessed tremendous developments in alternative estimation methods in the presence of nonnormal data. Most notable is the work of Browne (1982, 1984) for continuous manifest variables and B. Muthén (1978, 1984) for categorical manifest variables. In both the continuous and categorical cases, the approach to estimation under nonnormality uses a class of discrepancy functions based on weighted least squares (WLS). The WLS discrepancy function can be written as

FWLS = (s − σ)′W⁻¹(s − σ),    [5.1]
where s = vech(S) and σ = vech[Σ(Ω)] are vectorized elements of S and Σ(Ω), respectively.1 For the function in Equation [5.1] to be a proper discrepancy function,2 W must be positive definite. The key characteristic of WLS estimation is the choice of an appropriate weight matrix W. Following Browne (1982), consider an element of the vector s denoted as sij, the covariance of variable i and variable j. The expected value of the sample covariance element can be written as

E(sij) = σij.    [5.2]
As with any other sample statistic, one can obtain its variance and covariance with other sample statistics. The general form of the asymptotic covariance matrix of the covariances in s can be written as

(N − 1) acov(sij, sgh) = sig sjh + sih sjg + [(N − 1)/N] κijgh,    [5.3]
where κijgh is a fourth-order cumulant—a component of the distribution related to the multivariate excess kurtosis. When the weight matrix in Equation [5.1] contains the elements shown in Equation [5.3], this is referred to as the asymptotic distribution free (ADF) estimator proposed by Browne (1982). Note that under the assumption of multivariate normality there is no excess kurtosis, and therefore the term κijgh is zero, so that Equation [5.3] reduces to

(N − 1) acov(sij, sgh) = sig sjh + sih sjg.    [5.4]

With Equation [5.4] as the weight matrix, FWLS reduces to the generalized least squares estimator discussed in Chapter 2.
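The elements that enter the ADF weight matrix can be estimated directly from raw data using fourth-order sample moments. The sketch below computes a distribution-free estimate of the asymptotic covariance matrix of the nonduplicated elements of S (the quantity in Equation [5.3], expressed in terms of central moments rather than cumulants) and, for comparison, its normal-theory counterpart from Equation [5.4]. It is a minimal, unoptimized illustration, not the implementation used by any particular software program.

import numpy as np

def adf_weight_matrix(X):
    # Distribution-free estimate of N * acov(s_ij, s_gh) for the nonduplicated
    # elements of the sample covariance matrix: w_{ij,gh} = m_ijgh - s_ij * s_gh,
    # where m_ijgh is the fourth-order central sample moment.
    N, p = X.shape
    Z = X - X.mean(axis=0)
    S = (Z.T @ Z) / N
    idx = [(i, j) for i in range(p) for j in range(i, p)]   # vech ordering
    u = len(idx)
    W = np.empty((u, u))
    for a, (i, j) in enumerate(idx):
        for b, (g, h) in enumerate(idx):
            m_ijgh = np.mean(Z[:, i] * Z[:, j] * Z[:, g] * Z[:, h])
            W[a, b] = m_ijgh - S[i, j] * S[g, h]
    return W, S, idx

def normal_theory_weight_matrix(S, idx):
    # Normal-theory counterpart (Equation [5.4]): s_ig s_jh + s_ih s_jg
    u = len(idx)
    W0 = np.empty((u, u))
    for a, (i, j) in enumerate(idx):
        for b, (g, h) in enumerate(idx):
            W0[a, b] = S[i, g] * S[j, h] + S[i, h] * S[j, g]
    return W0

rng = np.random.default_rng(0)
X = rng.chisquare(df=3, size=(1000, 4))     # skewed, nonnormal data
W_adf, S, idx = adf_weight_matrix(X)
W_norm = normal_theory_weight_matrix(S, idx)
print(W_adf.shape)                          # (10, 10): u = p(p + 1)/2 = 10
print(round(W_adf[0, 0], 3), round(W_norm[0, 0], 3))   # the two differ under nonnormality

Even in this tiny example, the u × u weight matrix grows quickly with p, which is the computational burden discussed next.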
The robustness properties of the ADF estimator have been the subject of numerous simulation studies. The results of early studies of the ADF estimator were somewhat mixed. For example, Browne (1982) found ADF estimates to be biased. Muthén and Kaplan (1985, 1992), on the other hand, found very little bias in ADF estimates. In all cases, the ADF chi-square was smaller than the ML chi-square when applied to continuous nonnormal data. However, when ADF was applied to categorical data, B. Muthén and Kaplan (1992) found that the ADF chi-square was markedly sensitive and that this sensitivity increased as the size of the model increased. Moreover, ADF standard errors were noticeably downward biased, becoming worse as the model size increased. A troubling feature of the ADF estimator concerns the computational difficulties encountered for models of moderate size. Specifically, with p variables there are u = (1/2)p(p + 1) elements in the sample covariance matrix S. The weight matrix W is of order u × u. Therefore, the size of the weight matrix grows rapidly with the number of variables. So, if a model were estimated with 20 variables, the weight matrix would contain 22,155 distinct elements. Moreover, ADF estimation requires that the sample size (for each group if relevant) exceed p + (1/2)p(p + 1) to ensure that the weight matrix is nonsingular. These constraints have limited the utility of the ADF estimator in applied settings. Below, we discuss some new estimators that appear to work well for small samples. More recently, three expectation-maximization (EM) based ML estimators have been developed for the structural equation modeling framework which provide robust chi-square tests and correct standard errors under nonnormality. These estimators are distinguished by the approach they take for the calculation of standard errors. The first method uses a first-order approximation of the asymptotic covariance matrix of the estimates to obtain the standard errors and is referred to as the MLF estimator. The second method is the conventional ML estimator that uses the second-order derivatives of the observed log-likelihood. The third method is based on a sandwich estimator derived from the information matrices of ML and MLF and produces the
correct asymptotic covariance matrix of the estimates that is not dependent on the assumption of normality, and which also yields a robust chi-square test of model fit. This estimator is referred to as MLR. The MLR is a robust full information ML estimator. All three estimators are available in the Mplus software program.

5.1.3 ESTIMATORS FOR CATEGORICAL VARIABLES

It is rarely the case in social science applications of structural equation modeling that analysts have ratio-scaled continuous measures. More often, researchers measure social and behavioral attributes using ordered categories but typically treat them as though they are interval scaled. Arguably, the most common scale is the Likert-type scale, but often dichotomous scales are encountered. Clearly, such data are, by definition, not normally distributed. Again, the concern is whether continuous normal theory–based estimators such as ML and generalized least squares can recover the parameters of models estimated on such data, and whether standard errors and test statistics are unduly affected by nonnormality induced by categorization and skewness. An important development in the estimation of structural model parameters for categorical variables is based on the work of B. Muthén (1978, 1983, 1984). Muthén's approach can be outlined as follows. First, Muthén assumes that for each element of the observed categorical vector y there is a corresponding latent response variable denoted as y*. For a given measure yi,

yi = Ci − 1   if νi,Ci−1 < yi*
     Ci − 2   if νi,Ci−2 < yi* ≤ νi,Ci−1
     ⋮
     1        if νi,1 < yi* ≤ νi,2
     0        if yi* ≤ νi,1                [5.5]
where the νi's are threshold parameters to be estimated. Muthén's approach assumes that y* follows a linear factor model

y* = τy + Λη + ε,    [5.6]
and a system of structural equations

η = α + Bη + Γx + ζ.    [5.7]
Muthén’s approach differs from the standard model described in Chapter 4 in that it specifies the structural equation model with observed exogenous
variables x that do not follow a linear factor model. This specification allows for two types of cases of Muthén's general model. In Case A, there is no x vector. This case allows Muthén's framework to capture all of the models considered by Jöreskog and Sörbom (1993). For Case B, the vector x is included, allowing one to capture models in which x is a nonstochastic vector of variables (such as dummy variables). An example of such a model would be the MIMIC model discussed in Chapter 4, where x may represent treatment group conditions. If it is reasonable to assume that continuous and normally distributed y* variables underlie the categorical y variables, then from classic psychometric theory a variety of latent correlations can be specified. Table 5.1 summarizes the Pearson and latent correlations. The first step in Muthén's approach is to estimate the thresholds for the categorical variables using ML. In the second step, the latent correlations (e.g., tetrachoric correlations) are estimated. Finally, in the third step, a consistent estimator of the asymptotic covariance matrix of the latent correlations is obtained and implemented in a WLS estimator. Observe that Muthén's approach is quite flexible insofar as any combination of categorical and continuous observed variables can be present in the data. The only requirement is the assumption that the categorical variables are associated with continuous normally distributed latent response variables. A trivariate normality test was offered by Muthén and Hofacker (1988). If the assumption of trivariate normality holds, then bivariate normality could be assumed for each pair. The first simulation study examining the performance of the categorical variable methodology (CVM) estimator compared with estimators for continuous variables was Muthén and Kaplan (1985). In this article, Muthén and Kaplan examined the ability of CVM to recover the parameters of the y* model when the variables were split into 25%/75% dichotomies. The results showed that CVM yielded a slight underestimation of chi-square but that the parameter estimates and sampling variability were well in line with expected values. Similar findings were observed for multiple group mean structure models (Kaplan, 1991b).
Table 5.1   Observed and Latent Correlations

x-Variable Scale    y-Variable Scale    Observed Correlation    Latent Correlationᵃ
Continuous          Continuous          Pearson                 Pearson
Continuous          Categorical         Pearson                 Polyserial
Continuous          Dichotomous         Point-Biserial          Biserial
Categorical         Categorical         Pearson                 Polychoric
Dichotomous         Dichotomous         Phi                     Tetrachoric

a. Assumption of an underlying continuous variable for categorical or dichotomous variables.
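The first step of Muthén's approach—estimating the thresholds in Equation [5.5]—has a simple closed form in the univariate case: the ML threshold estimates are the standard normal quantiles of the cumulative category proportions. The following sketch illustrates this with hypothetical Likert-type data; the function name and the simulated cut points are assumptions made for the example.

import numpy as np
from scipy.stats import norm

def estimate_thresholds(y, n_categories):
    # Univariate ML threshold estimates for an ordered categorical item,
    # assuming a standard normal latent response variable y* (Equation [5.5])
    counts = np.bincount(y, minlength=n_categories)
    cum_props = np.cumsum(counts)[:-1] / counts.sum()   # cumulative proportions below each threshold
    return norm.ppf(cum_props)                           # nu_1, ..., nu_{C-1}

# Hypothetical 4-category item generated by categorizing a normal y* at known cut points
rng = np.random.default_rng(0)
true_nu = np.array([-1.0, 0.0, 0.8])
y_star = rng.normal(size=5000)
y = np.digitize(y_star, true_nu)          # observed categories 0, 1, 2, 3

print(np.round(estimate_thresholds(y, 4), 3))   # close to (-1.0, 0.0, 0.8)

The latent (e.g., polychoric or tetrachoric) correlations of the second step are then estimated for pairs of items conditional on these thresholds.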
5.1.4 RECENT DEVELOPMENTS IN ESTIMATION UNDER NONNORMALITY

As noted above, a problem with estimation methods that explicitly address nonnormality is the reliance on very large sample sizes or unrealistically small models. This is because within the class of WLS methods, a weight matrix inversion is required, as can be seen from Equation [5.1]. Accounting for nonnormality increases the size of the weight matrix dramatically, making estimation difficult unless the sample sizes are very large or the models are unrealistically small. Indeed, for small samples, the weight matrix may be singular. Recent developments in the WLS-based estimation of structural model parameters under nonnormality no longer require such large sample sizes. Based on work by Satorra (1992), Muthén and his colleagues (see also B. Muthén, 1993; B. Muthén, du Toit, & Spisic, 1997) developed a mean-adjusted WLS estimator (WLSM) and a mean- and variance-adjusted WLS estimator (WLSMV). Both estimation methods are available in Mplus (L. Muthén & Muthén, 2006). The basic idea behind WLSM and WLSMV is as follows. Note from above that the weight matrix W is chosen to be an estimate of the asymptotic covariance matrix of the sample statistics, denoted in Muthén et al. (1997) as Γ. This matrix requires inversion. A Taylor expansion of the asymptotic covariance matrix of the estimates yields a distinction between W and Γ, with only W requiring inversion—not Γ. Therefore, from a computational standpoint, W can be chosen as any matrix that is easy to invert (such as the identity matrix). As such, a robust estimator yielding robust standard errors is developed that does not require extensive computations and does not require enormously large sample sizes. In addition to robust estimation, a robust mean-adjusted and mean- and variance-adjusted chi-square can be given (Satorra, 1992) that again does not rest on potentially unstable matrix calculations. A Monte Carlo study conducted by B. Muthén (1993) demonstrated that the robust WLS estimator produced much better sampling variability behavior and considerably better chi-square performance compared with the conventional WLS approaches for handling nonnormality and categorical variables described above. Below, we apply these methods to the science achievement model, explicitly accounting for the mixture of categorical and continuous variables, and compare the results with the traditional approaches.

Alternative Estimation of the Science Achievement Model

In this section, we reanalyze the science achievement model in Chapter 2 under a variety of different estimation algorithms that account for the mixture of continuous and categorical observed variables. This approach is appropriate given the nature of the scales used in this model. We consider estimation under
WLS (B. Muthén, 1984) and compare the results with the new robust estimation methods for categorical data (B. Muthén et al., in press). Normal theory ML results are reproduced for comparative purposes. All analyses used Mplus (L. Muthén & Muthén, 2006). Table 5.2 presents the results of the estimators under study. A comparison of the WLS estimators against ML reveals some noticeable differences. In particular, the standard errors under the WLS-based estimators are generally smaller than those under normal theory ML. This is expected given theoretical studies of these estimators. The changes in the estimates do not reveal any particular pattern. Finally, and perhaps most noticeably, the chi-square test of model fit is smaller for the WLS-based estimators compared to ML. Indeed, the robust WLS estimators result in progressively smaller goodness-of-fit tests compared to WLS without such a correction.
Table 5.2   Categorical Variable Estimation of the Science Achievement Model

                                         Estimates (S.E.)
Effects               ML                WLS               WLSM              WLSMV
IRTSCI ON
  SCIGRA10            1.228 (0.034)     1.873 (0.052)     1.876 (0.051)     1.876 (0.051)
SCIGRA10 ON
  CHALLG              −0.033 (0.017)    −0.035 (0.012)    −0.038 (0.012)    −0.038 (0.012)
  SCIGRA6             0.781 (0.020)     0.500 (0.012)     0.544 (0.013)     0.544 (0.013)
  SES                 0.239 (0.026)     0.207 (0.017)     0.288 (0.017)     0.288 (0.017)
  CERTSCI             −0.040 (0.039)    −0.037 (0.025)    −0.049 (0.025)    −0.049 (0.025)
  UNDERSTD            0.168 (0.015)     0.144 (0.013)     0.153 (0.013)     0.153 (0.013)
CHALLG ON
  UNDERSTD            0.318 (0.010)     0.409 (0.011)     0.408 (0.011)     0.408 (0.011)
UNDERSTD ON
  CERTSCI             −0.031 (0.033)    −0.019 (0.025)    −0.017 (0.026)    −0.017 (0.026)
Residual variances
  UNDERSTD            1.858 (0.031)
  CHALLG              1.250 (0.021)
  SCIGRA10            2.637 (0.043)
  IRTSCI              29.290 (0.483)    19.452 (0.491)    22.759 (0.501)    22.759 (0.501)
Chi-square            1321.31           1190.19           1001.68           901.51
df                    10                10                10                9
It should be pointed out that the results of this application, though generally consistent with theoretical expectations, may not occur in practice. Indeed, the application of estimators that account for nonnormality may reveal other problems that could actually inflate chi-square and standard errors.
5.2 Missing Data

In addition to the problem of nonnormal data, another problem that commonly occurs in social and behavioral science research is that of missing data. Generally, statistical procedures such as structural equation modeling assume that each unit of analysis has complete data. However, for numerous reasons, units may be missing values on one or more of the variables under investigation. The question addressed in this section concerns the extent to which inferences about the parameters and test statistics of structural equation models are influenced by the presence of incomplete data.

5.2.1 A NOMENCLATURE FOR MISSING DATA

To begin, let us consider a standard nomenclature for missing data problems. To motivate this section, consider first the case of missing values on one variable. For example, we may find that some responses to a question regarding teacher salary are missing for some subset of teachers. The effect of missing values in the univariate case is to reduce the sample from size n to size m, say. Statistical summaries of the data, such as the sample mean and variance, are based on m units who responded to the question. If the m observed units are a random sample from the n total sample, then the missing data are said to be missing at random (MAR) and the missing data mechanism is ignorable. However, if teachers with higher incomes tend not to report their income, then the probability of observing a value for teacher salary depends on the salary, and hence the missing data are not MAR (NMAR) and the mechanism generating the missing data is nonignorable. Consider next the case of two variables with missing data occurring for only one variable. This is an example of a monotone missing data pattern. To place this problem in a substantive context, consider examining the age and income of a sample of teachers, with missing values again occurring for income. Let X and Y represent measures of age and income for a sample of teachers, respectively. Following Rubin's (1976) terminology, three cases are possible. First, if missing values for Y are independent of X and Y, then the missing data are MAR and the observed data are observed at random (OAR), and thus, the missing data are missing completely at random (MCAR). Second, if missing values on Y are dependent on values of X—that is, if income values are missing because of the age of the teachers, then we say that the missing values on Y are
MAR. This is because although the observed values on Y are not a random sample from the original sample, they are a random sample within subgroups defined on X (Little & Rubin, 2002). Finally, if the probability of responding to Y depends on Y even after controlling for X, then the data are neither MAR nor OAR. Here again, the missing data mechanism is nonignorable.

5.2.2 AVAILABLE CASE APPROACHES TO MISSING DATA

In line with Little and Rubin (2002), we can consider the following approaches to handling missing data.

Methods Based on Complete Data for All Units. This approach bases the analysis on the units with complete data. This can be accomplished in two ways. The first is based on listwise available data for all cases on any variable in the analysis. This is referred to as the listwise present approach (LPA). The second is based on pairwise available data, where the focus of attention is on statistics calculated on pairs of observations (e.g., correlations). Here, observations are deleted if there are any missing values on any pair of variables under consideration. This is referred to as the pairwise present approach (PPA). The advantage of these approaches is their obvious simplicity. However, there are numerous serious problems with these approaches.

Problems With LPA and PPA. The main disadvantage of LPA is the loss of information. Also, LPA assumes that the remaining data are a random sample of the total sample. Thus, LPA implicitly assumes that the missing data are MCAR. Simple descriptive or inferential statistics may be used to assess whether this is a valid assumption. However, they can only assess MCAR and not MAR. A major problem with the pairwise present approach is that when p ≥ 3, PPA can yield nonpositive definite correlation and covariance matrices. If covariance or correlation matrices are not positive definite, then this may cause discrepancy function values to become negative and thus would violate the requirement that discrepancy functions be bounded below by zero (Browne, 1982). A second problem with pairwise correlation matrices is that they do not maximize any proper likelihood function (B. Muthén, Kaplan, & Hollis, 1987). To see this, recall that under the assumption of multivariate normality, the asymptotic covariance matrix of the sample covariance matrix S takes on the form shown in Equation [5.4], rewritten here as

acov(sij, sgh) = (1/n)(sig sjh + sih sjg),    [5.8]
where n = N − 1. As shown by Heiberger (1977), however, the asymptotic covariance matrix of the pairwise covariance matrix can be written as

acov(sij, sgh) = [nijgh / (nij ngh)] (sig sjh + sih sjg),    [5.9]
where nijgh is the number of complete observations on variables i, j, g, and h. The difficulty with pairwise covariance matrices lies in the multiplier nijgh/(nij ngh), which results in a violation of the Wishart distribution assumptions. With respect to structural equation modeling, this violation will likely affect the chi-square goodness-of-fit test. Finally, even in the case where pairwise deletion does not result in nonpositive definite matrices, the assumption of MCAR is assumed to hold for the subset of observations that remain. Thus, PPA is not recommended for use in structural equation modeling unless the amount of missing data is quite small and MCAR can be assumed to hold.

5.2.3 MODEL-BASED APPROACHES TO MISSING DATA

One can consider treating missing data by explicitly modeling the mechanism that generates the missing data. This approach requires defining a model for both the observed data and the missing data and maximizing the likelihood under the full model. As noted by Little and Rubin (2002), model-based approaches avoid ad hoc procedures, are generally quite flexible, and allow the calculation of standard errors that take into account missing data. A major breakthrough in model-based approaches to missing data in the structural equation modeling context was made simultaneously by Allison (1987) and B. Muthén et al. (1987). We consider the framework of Muthén et al. To begin, Muthén et al. (1987) consider the factor analysis model respecified here as

y* = τ + Λη + ε,    [5.10]
where y* is a vector of potentially observable variables and the remaining parameters were defined earlier in Chapter 3. In addition to the factor model in Equation [5.10], Muthén et al. (1987) consider a vector of latent selection variables s* associated with each y* that follow the specification

s* = Γη η + Γy y* + δ,    [5.11]
where Γη and Γy are coefficient matrices that allow the selection to be a function of η, y∗, or both, and δ is a vector of disturbances. In essence, the vector s∗
represents the propensity that each y* will be selected to be observed as y. That is, we can define a threshold variable νj such that for the ith observation

sij = 1 if s*ij > νj ;  sij = 0 otherwise.    [5.12]
In other words, when sij = 1, the corresponding y*j is selected to be observed. Otherwise, it is missing. The strength of the selectivity is determined by the size of the elements in Γη and Γy, and the amount of missing data is determined by the thresholds ν. From here, Muthén et al. (1987) derive the likelihood function of the recorded observations on y* by considering two vectors of parameters—a vector of parameters for the factor analysis model in Equation [5.10] and a vector of parameters for the selection model in Equation [5.11]. Let the stacked vector of factor analytic parameters be denoted as

θ = (τ, Λ, Ψ, Θε),    [5.13]
where Ψ is the covariance matrix of η and Θε is the diagonal matrix of unique variances. Further, let the stacked vector of selection parameters be denoted as

φ = (Γ, Θδ, Θδε, ν),    [5.14]
where the matrix Θδε allows for the possibility that ε and δ are correlated. The approach taken by Muthén et al. (1987) is to arrange the data into G distinct missing data patterns. Let Σg and Sg be the population and sample covariance matrices for the gth missing data pattern, respectively. Then, following Little (1982), B. Muthén et al. (1987), and Rubin (1976), the log-likelihood can be written as

log L(θ, φ | y) = ∑_{g=1}^{G} log hg(θ | y) + log f(θ, φ | y),    [5.15]
where

log hg(θ | y) = const. − (Ng/2) log|Σg| − (Ng/2) tr{ Σg⁻¹ [ Sg + (ȳg − μg)(ȳg − μg)′ ] }.    [5.16]
It can be seen that Equation [5.15] is composed of two parts. From B. Muthén et al. (1987), Equation [5.15] is referred to as the “true” likelihood
because it contains the parameters of the substantive model of interest as well as the parameters of the model that generates the missing data. If we consider just the first term on the right-hand side of Equation [5.15], this is referred to by B. Muthén et al. (1987) as the “quasi-likelihood” because it ignores the mechanism that generates the missing data. The second term on the right-hand side of Equation [5.15] represents the mechanism that generates the missing data. The estimator that maximizes the quasi-likelihood is referred to by Muthén et al. as the full quasi-likelihood (FQL) estimator. At this point, attention is focused on the extent to which the mechanism that generates the missing data—that is, log f(θ, φ | y)—is ignorable or nonignorable. A highly restrictive missing data process that is ignorable is MCAR. In the context of Equation [5.11], MCAR implies that s* = δ, that is, Γη = 0, Γy = 0, and Θδε = 0.3 Under MCAR, maximization of Equation [5.15] will yield correct ML estimates. Although in the context of model-based approaches MCAR is the easiest to understand, it is also the most restrictive and most unrealistic. However, the model-based approach suggested by B. Muthén et al. (1987) can be used to model the more realistic assumption of MAR. Following Muthén et al., consider first the case where missing data are predicted by the latent y* variables. From the standpoint of Equation [5.11], this can be represented as

s* = Γy* + δ.    [5.17]
To provide a substantive context to the problem, consider measuring the construct of student academic self-concept across two time periods, where we have complete data at Time 1 and missing data due to attrition at Time 2. Here we are considering the case where a respondent is missing data on the indicators of self-concept at Time 2 due to, say, an increase in any one of the respondent's Time 1 measures. In other words, consider expanding Equation [5.17] as

s* = Γ1 y1* + Γ2 y2* + δ.    [5.18]
B. Muthén et al. (1987) show that when Γ2 = 0 and Θδε = 0, the second term on the right-hand side of Equation [5.18] does not enter into the differentiation with respect to the model parameters in the first term on the right-hand side of Equation [5.18]. These conditions satisfy the definition of MAR. Next, consider the case where respondents omit data at Time 2 due to true academic self-concept rather than to any one or more of the unreliable measures of academic self-concept. In this case, the selection model can be written as

s* = Γη η + δ.    [5.19]
B. Muthén et al. (1987) show that in this case, the likelihood involves both the model parameters θ and the missing data parameters φ. Hence, the assumption of MAR is not satisfied in this case and the missing data mechanism is not ignorable.

Some Studies Using the FQL Estimator. The FQL estimator was compared with LPA and PPA in an extensive simulation study by B. Muthén et al. (1987). The general findings were that the FQL estimator was superior to the more traditional approaches to handling missing data even under conditions where it was not appropriate to ignore the mechanism that generated the missing data. In another simulation study, Kaplan (1995b) demonstrated the superiority of the FQL estimator over the PPA approach for data that were missing completely at random by design.

Problems With the FQL Estimator. The major problem associated with the FQL estimator is that it is restricted to modeling under a relatively small number of distinct missing data patterns. Specifically, for the covariance matrices for each distinct group to be positive definite, the number of respondents in any given group must be one more than the total number of variables in the model.4 With the exception of cases where missing data are missing by design,5 small numbers of distinct missing data patterns are rare in social and behavioral science applications.

5.2.4 MAR-BASED APPROACHES FOR MODELING MISSING DATA

Recently, it has been possible to incorporate MAR-based approaches (Little & Rubin, 2002) to modeling missing data within standard structural equation modeling software. Specifically, Arbuckle (1996) suggested an approach to ML estimation of structural model parameters under incomplete data assuming MAR. The method appears to have been originally suggested by Finkbeiner (1979) for the analysis of confirmatory factor models under incomplete data. The general idea is as follows. Let xi be the vector of observed data for case i. The length of xi is equal to the number of variables with complete data for that case. Following Arbuckle's (1996) example of three variables, some cases will have complete data on all three variables, other cases will have complete data on two of the three variables, whereas still other cases will have complete data on only one variable. In the next step, mean vectors and covariance matrices are formed for cases that have the same pattern of observed data. For example, if Cases 1, 2, and 5 have complete data on all three variables, then a mean vector and covariance matrix would be formed for those three. Mean vectors and covariance matrices for the remaining cases are similarly formed.
Once the mean vectors and covariance matrices have been formed, the full information ML approach of Arbuckle (1996) uses the fact that for the ith case, the log-likelihood function can be written as

\log L_i = C_i - \frac{1}{2}\log|\Sigma_i| - \frac{1}{2}(x_i - \mu_i)'\Sigma_i^{-1}(x_i - \mu_i),
[5.20]
and the log-likelihood of the entire sample is the sum of the individual log-likelihoods. As usual, the likelihood is maximized in terms of the parameters of the model.

A comparable development in estimation under MAR was proposed by B. Muthén and incorporated in Mplus (L. Muthén & Muthén, 2006). In the first step, the means, variances, and covariances are estimated via ML using the EM algorithm suggested by Little and Rubin (2002) with no restrictions. This is referred to as the H1 model. Then, the model of interest (H0) is estimated conditional on the exogenous variables.6 If there are missing values on the exogenous variables, they are estimated via ML using EM and held fixed at those values when the H0 model is estimated. A fitting function similar to Equation [5.20] is used, yielding a large sample chi-square test of model fit.

Simulation studies of ML under MAR have been undertaken. For example, Arbuckle (1996) compared his full-information ML approach with PPA and LPA under missing data conditions of MCAR and MAR. His results suggested that the ML approach performed about the same as PPA and LPA with respect to parameter estimate bias but outperformed these ad hoc methods with respect to sampling variability. More recently, a simulation study by Enders and Bandalos (2001) demonstrated the effectiveness of full-information ML relative to PPA and LPA under MCAR and MAR.
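A minimal sketch of the casewise log-likelihood in Equation [5.20], assuming hypothetical model-implied means and covariances and ignoring the constant C_i. Only the coordinates observed for each case contribute to that case's term; this illustrates the computation rather than the estimation routine of any particular program.

import numpy as np

def casewise_loglik(x, mu, sigma):
    # Sum of the casewise contributions in the spirit of Equation [5.20],
    # dropping the constant C_i. Rows of x may contain np.nan.
    total = 0.0
    for row in x:
        obs = ~np.isnan(row)                      # coordinates observed for this case
        mu_i = mu[obs]
        sigma_i = sigma[np.ix_(obs, obs)]
        diff = row[obs] - mu_i
        sign, logdet = np.linalg.slogdet(sigma_i)
        total += -0.5 * logdet - 0.5 * diff @ np.linalg.solve(sigma_i, diff)
    return total

# Illustrative (invented) model-implied mean vector and covariance matrix.
mu = np.array([0.0, 0.0, 0.0])
sigma = np.array([[1.0, 0.5, 0.3],
                  [0.5, 1.0, 0.4],
                  [0.3, 0.4, 1.0]])
x = np.array([[0.2, np.nan, 1.1],
              [-0.4, 0.3, np.nan],
              [0.1, 0.2, 0.5]])
print(casewise_loglik(x, mu, sigma))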
5.3 Specification Error

In addition to normality and ignorable missing data, structural equation models assume no specification errors. In the context of this book, specification error is defined to be the omission of relevant variables in any equation of the system of equations defined by the structural equation model.7 This includes the measurement model equations as well as the structural model equations. As with the problems of nonnormality and missing data, the question of concern in this section is the extent to which omitted variables affect inferences. However, for a complete understanding of this problem, it is necessary to link specification error to issues of sample size and statistical power.
5.3.1 THE BASIC PROBLEM OF SPECIFICATION ERROR

The basic problem of specification error is well known. From the standpoint of simple linear regression, specification errors in the form of omitted variables induce a correlation between the errors and exogenous variables in the model. As such, the ordinary least squares estimators will no longer be unbiased, and the bias will depend directly on the size of the correlation between the errors and exogenous variables. The extension of the ordinary least squares problem to systems of structural equations is relatively similar but depends, in part, on the chosen estimator. In particular, estimators such as two-stage least squares and limited information ML, which focus on estimating the parameters of one equation at a time, limit the effects of specification errors to the particular equation where the error occurred. By contrast, full information estimators, such as full information ML and three-stage least squares, tend to propagate errors throughout the system of equations (see, e.g., Intriligator, Bodkin, & Hsiao, 1996; White, 1982). We take up the problem of specification error propagation below.

5.3.2 STUDIES ON THE PROBLEM OF SPECIFICATION ERROR

The mid-1980s saw a proliferation of studies on the problem of specification error in structural equation models. The general finding is that specification errors in the form of omitted variables can result in substantial parameter estimate bias (Kaplan, 1988). With respect to sampling variability, standard errors have been found to be relatively robust to small specification errors (Kaplan, 1989c). However, the z-test associated with free parameters in the model is affected in such a way that specification error in one parameter can propagate to affect the power of the z-test for another parameter in the model. Sample size, as expected theoretically, interacts with the size and type of the specification error.

A consistent finding of studies on specification error is that specification error in one part of a model can propagate to other parts of the model. In the context of structural equation modeling, this error propagation was first noticed by Kaplan (1988) and studied more closely in a paper by Kaplan and Wenger (1993). The nature of specification error propagation depends on the concepts of asymptotic independence and separable hypotheses in restricted maximum likelihood estimation introduced by Aitchison (1962). Following Kaplan and Wenger (1993), consider the case where an investigator wishes to restrict two parameters simultaneously based on the Wald test given in Equation [2.26] in Chapter 2. Let W_{21} be the Wald test of the joint hypothesis that \hat{\omega}_1 and \hat{\omega}_2 are zero. Then, from Equation [2.26], the Wald test can be written as
W_{21} = n\,[\hat{\omega}_1 \;\; \hat{\omega}_2]
\begin{bmatrix} \mathrm{Var}(\hat{\omega}_1) & \mathrm{Cov}(\hat{\omega}_1, \hat{\omega}_2) \\ \mathrm{sym.} & \mathrm{Var}(\hat{\omega}_2) \end{bmatrix}^{-1}
\begin{bmatrix} \hat{\omega}_1 \\ \hat{\omega}_2 \end{bmatrix},
[5.21]
where \mathrm{Var}(\hat{\omega}_1) and \mathrm{Var}(\hat{\omega}_2) are the variances of \hat{\omega}_1 and \hat{\omega}_2, and \mathrm{Cov}(\hat{\omega}_1, \hat{\omega}_2) is the covariance of \hat{\omega}_1 and \hat{\omega}_2, which is not assumed to be zero. In this case, the determinant of the middle term in Equation [5.21] can be written as D = V_1 V_2 - C_{21}^2, where V_1 = \mathrm{Var}(\hat{\omega}_1), V_2 = \mathrm{Var}(\hat{\omega}_2), and C_{21} = \mathrm{Cov}(\hat{\omega}_1, \hat{\omega}_2). Thus, Equation [5.21] can be reexpressed as
W_{21} = \frac{n}{D}\left[\hat{\omega}_1^2 V_2 - \hat{\omega}_2\hat{\omega}_1 C_{21} - \hat{\omega}_1\hat{\omega}_2 C_{21} + \hat{\omega}_2^2 V_1\right].
[5.22]
From Equation [5.22], it can be seen that the Wald test involves the covariance of the parameters, C_{21}. If C_{21} = 0, then the simultaneous Wald test in Equation [5.22] decomposes into the sum of two univariate Wald tests. When C_{21} = 0, we say that the test statistics are asymptotically independent and the hypotheses associated with these tests are separable (Aitchison, 1962; see also Satorra, 1989).

The example just given for the case of two hypotheses can be expanded to sets of multivariate hypotheses (see Kaplan & Wenger, 1993). Of particular relevance to the problem of specification error propagation is the case of three hypotheses. Specifically, Kaplan and Wenger (1993) considered the problem of restricting two parameter estimates, say \hat{\omega}_1 and \hat{\omega}_3, that have zero covariance, that is, C_{31} = 0. However, \hat{\omega}_1 has a nonzero covariance with \hat{\omega}_2, and \hat{\omega}_3 has a nonzero covariance with \hat{\omega}_2. Then, in this case, \hat{\omega}_1 and \hat{\omega}_3 are asymptotically independent, but the joint hypothesis that \hat{\omega}_1 and \hat{\omega}_3 are zero will not decompose into the sum of the individual hypotheses because of their "shared" covariance with \hat{\omega}_2. In other words, these particular hypotheses are not separable. Kaplan and Wenger (1993) referred to hypotheses of this sort as transitive. For the hypotheses associated with these parameters to be separable, all three parameters must be mutually asymptotically independent.

The above discussion suggests that the pattern of zero and nonzero values in the covariance matrix of the estimates is the underlying mechanism that governs the impact of testing parameters, either singly or in sets. In the context of specification error, if a parameter (or set of parameters) has a zero covariance with another parameter, then the restriction of one (say, on the basis of the Wald test) will not affect the estimate or standard error of the other parameter. In addition, the concept of transitive hypotheses suggests that if two parameters are asymptotically independent but not mutually asymptotically independent with respect to a third parameter, then the restriction of one will affect the other due to its shared covariance with that third parameter (Kaplan & Wenger, 1993).
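A small numerical sketch may help fix the idea of asymptotic independence. The estimates and their covariance matrix below are hypothetical, and the n multiplier of Equation [5.21] is assumed to be absorbed into that matrix; the joint Wald statistic equals the sum of the two univariate statistics only when C_{21} = 0.

import numpy as np

w = np.array([0.12, 0.08])   # hypothetical parameter estimates

def wald(w, V):
    # Joint Wald statistic w' V^{-1} w for the estimates in w.
    return float(w @ np.linalg.solve(V, w))

# Case 1: C21 = 0, so the joint test decomposes into the univariate tests.
V0 = np.array([[0.0016, 0.0], [0.0, 0.0009]])
print(wald(w, V0), w[0]**2 / V0[0, 0] + w[1]**2 / V0[1, 1])

# Case 2: C21 != 0, so the joint test no longer decomposes; the two
# hypotheses are not separable.
V1 = np.array([[0.0016, 0.0006], [0.0006, 0.0009]])
print(wald(w, V1), w[0]**2 / V1[0, 0] + w[1]**2 / V1[1, 1])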
5.3.3 IMPLICATIONS OF SPECIFICATION ERROR PROPAGATION FOR THE PRACTICE OF STRUCTURAL EQUATION MODELING

A key point of the discussion on specification error propagation is that the form of the covariance matrix of the estimates is determined by the initial specification of the model. After that, each addition (or deletion) of parameters results in a change in the form of the covariance matrix of the estimates and hence in the ways that specification errors will manifest themselves through the model. By and large, it is rather difficult to predict how specification error will be propagated throughout a model. This essential difficulty has implications for the parameter testing discussed in Chapter 2. Specifically, any change in the model results in a change in other parameters of the model as dictated by the form of the covariance matrix of the estimates. In addition, the result of a series of additions (or deletions) of parameters on the basis of the Lagrange multiplier or Wald test will not sum to the effect of adding (or deleting) the parameters all at once, unless mutual asymptotic independence holds. Thus, multivariate approaches to parameter testing, such as the multivariate Lagrange multiplier or multivariate Wald test advocated by Bentler (1995), must proceed with great caution. Indeed, Kaplan and Wenger (1993) argued for a more cautious univariate testing approach, trading statistical power for closer monitoring of model changes.
5.4 An Additional Assumption: Exogeneity

The assumptions of multivariate normality, random or completely random missing data, and no specification error constitute the standard assumptions underlying structural equation modeling. Perhaps a more primary assumption, and one that touches on specification issues, relates to the "exogeneity" of exogenous variables. We will see that simply designating a variable as "exogenous" does not render it as such. Nor is the standard requirement of orthogonality of a variable and a disturbance term sufficient for a variable to be exogenous. A detailed discussion of the issue of exogeneity can be found in Kaplan (2004). Before beginning, it is useful to point out that there are three forms of exogeneity considered in the econometrics literature. The first is weak exogeneity. The second is strong exogeneity (which includes the feature of Granger noncausality). The third is super exogeneity, which includes the notion of parameter invariance. Pearl (2000) makes the important distinction that weak exogeneity and strong exogeneity are statistical concerns while super exogeneity concerns the issue of causality. In this section, we focus on the statistical problem of weak exogeneity and briefly touch on super exogeneity in Chapter 10.
To motivate the problem of exogeneity from classic linear regression, consider a matrix of variables denoted as z of order N × r, where N is the sample size and r is the number of variables. It is typically the case that we decide on a partitioning of z into endogenous variables constituting the outcomes to be modeled and exogenous variables that are assumed to account for the variation in the endogenous variables. Denote by y the N × p matrix of endogenous variables, and by x an N × q matrix of exogenous variables, where r = p + q. Of interest is the joint distribution of y and x, denoted as f(z|θ):

f(z_1, z_2, \ldots, z_N \mid \theta) = \prod_{i=1}^{N} f(z_i \mid \theta),

[5.23]
where θ is a t-dimensional vector of parameters of the joint distribution of z. The parameter space of θ is denoted by Θ. We can rewrite Equation [5.23] in terms of the conditional distribution of y given x and the marginal distribution of x. That is, Equation [5.23] can be expressed as

f(y, x \mid \theta) = f(y \mid x, \omega_1)\, f(x, \omega_2),
[5.24]
where ω1 are the parameters associated with the conditional distribution of y given x and ω2 are the parameters associated with the marginal distribution of x. The parameter spaces of ω1 and ω2 are denoted as Ω1 and Ω2, respectively. Factoring the joint distribution in Equation [5.23] into the product of the conditional distribution and marginal distribution in Equation [5.24] represents no loss of information. However, if attention is focused on the conditional distribution only, then this assumes that the marginal distribution can be taken as given (Ericsson & Irons, 1994). The issue of exogeneity concerns the implications of this assumption.

5.4.1 PARAMETERS OF INTEREST

Generally speaking, interest usually focuses on modeling the conditional distribution in Equation [5.24]. If this is the goal, then as noted above, it implies that the parameters of the marginal distribution of x carry no relevant information for the parameters of interest. In the case of simple linear regression, the parameters of interest can be defined as θ = (β, σ_u^2). Therefore, more formally, the parameters of interest are a function of ω1, that is, θ = g(ω1).
5.4.2 VARIATION FREE

Another important concept as it relates to exogeneity is that of variation free. Specifically, variation free means that for any value of ω2 in Ω2, ω1 can take on any value in Ω1, and vice versa (Spanos, 1986). To take an example, consider again the simple regression model. The parameters of interest of the conditional distribution are θ_1 ≡ (β, σ_u^2) and the parameters of the marginal distribution are θ_2 ≡ (μ_x, σ_x^2). Furthermore, note that β = σ_{xy}/σ_x^2, where σ_{xy} denotes the covariance of x and y. Following Ericsson (1994), if σ_{xy} varies proportionally with σ_x^2, then σ_x^2, which is in ω2, carries no information relevant for the estimation of β = σ_{xy}/σ_x^2, which is in ω1. Therefore, ω1 and ω2 are variation free.

5.4.3 A DEFINITION OF WEAK EXOGENEITY

The above concepts of factorization, parameters of interest, and variation free lead to a definition of weak exogeneity. Specifically, following Richard (1982; see also Ericsson & Irons, 1994; Spanos, 1986), a variable x is weakly exogenous for the parameters of interest, say ψ, if and only if there exists a reparameterization of θ as ω with ω = (ω1, ω2) such that (i) ψ = g(ω1); and (ii) ω1 and ω2 are variation free.
5.4.4 SOME EXAMPLES OF THE EXOGENEITY PROBLEM

It may be of interest to consider the standard situation for which weak exogeneity holds. For simplicity, consider the simple linear regression model. It is known that within the class of elliptically symmetric multivariate distributions, the multivariate normal distribution possesses a conditional variance (scedasticity) that can be shown not to depend on the exogenous variables (Spanos, 1986). To see this, consider the bivariate normal distribution for two variables x and y. The conditional and marginal densities of the bivariate normal distribution can be written respectively as

(y \mid x) \sim N[(\beta_0 + \beta_1 x), \sigma^2], \qquad x \sim N[\mu_2, \sigma^2_{22}],
[5.25]
where the top expression states that the conditional distribution of y given x is normally distributed with mean equal to β_0 + β_1 x and variance equal to σ^2. The bottom expression states that the marginal distribution of x is normally distributed with mean μ_2 and variance σ^2_{22}.
Following our notation from above, arrange the conditional and marginal parameters accordingly:

\theta = (\mu_1, \mu_2, \sigma^2_{11}, \sigma^2_{22}, \sigma_{12}),
\omega_1 = (\beta_0, \beta_1, \sigma^2),
\omega_2 = (\mu_2, \sigma^2_{22}).

[5.26]
Note that for the bivariate normal (and by extension the multivariate normal), x is weakly exogenous for the estimation of the parameters in ω1 because the parameters of the marginal distribution ω2 do not appear in the set of parameters for the conditional distribution ω1. In other words, the choices of values of the parameters in ω2 do not restrict in any way the range of values that the parameters in ω1 can take.

The multivariate normal distribution, as noted above, belongs to the class of elliptically symmetric distributions. Other distributions in this class include Student's t, the logistic, and the Pearson type III distributions. Consider the case where the joint distribution can be characterized by a bivariate Student's t distribution (symmetric but leptokurtic). The conditional and marginal densities under the bivariate Student's t can be written as (Spanos, 1999)

(y \mid x) \sim St\!\left[(\beta_0 + \beta_1 x),\; \frac{\nu\sigma^2}{\nu - 1}\left(1 + \frac{[x - \mu_2]^2}{\nu\sigma^2_{22}}\right);\; \nu + 1\right],
\qquad x \sim St[\mu_2, \sigma^2_{22}; \nu],

[5.27]
where ν denotes the degrees of freedom. Let

\theta = (\mu_1, \mu_2, \sigma^2_{11}, \sigma^2_{22}, \sigma_{12}),
\omega_1 = (\beta_0, \beta_1, \mu_2, \sigma^2_{22}, \sigma^2),
\omega_2 = (\mu_2, \sigma^2_{22}).

[5.28]
Here, it can be seen that the parameters of the marginal distribution ω2 appear among the parameters of the conditional distribution ω1. Thus, x is not weakly exogenous for the estimation of the parameters in ω1. One simple indirect test of exogeneity is therefore to assess the assumption of joint normality of y and x using, say, Mardia's coefficients of multivariate skewness and kurtosis. If the joint distribution is something other than the normal, then parameter estimation must occur under the correct distributional form, and inference may require specification of the parameters of the marginal distribution.
5.4.5 WEAK EXOGENEITY IN STRUCTURAL EQUATION MODELS

A perusal of the extant textbook and substantive literature utilizing structural equation modeling suggests that the exogeneity of the predictor variables, as defined above, is not addressed. Indeed, the extant literature reveals that only theoretical considerations are given when delimiting a variable as "exogenous." An approach to remedying this problem is to focus on the reduced form of the model as a means of initial exogeneity testing. The reduced form specification of a structural model was given in Equation [2.3] of Chapter 2 and is derived from manipulating the structural form so that the endogenous variables are on one side of the equation and the exogenous variables are on the other side. An inspection of Equation [2.3] reveals that the reduced form is nothing more than the multivariate general linear model. From here, Equation [2.3] can be used to assess weak exogeneity. Specifically, in the context of the reduced form, the parameters of interest are θ_1 ≡ (Π_1, Ψ).

5.4.6 AN INDIRECT TEST OF WEAK EXOGENEITY IN STRUCTURAL EQUATION MODELING

We may wish to consider the standard situation for which weak exogeneity holds. It is known that under the assumption of multivariate normality of y and x, the parameters of the conditional distribution and the marginal distribution are variation free. In particular, only under the assumption of normally and independently distributed observations does weak exogeneity hold. By contrast, if the joint distribution can be characterized by a multivariate Student's t distribution, then the parameters of the marginal and conditional distributions will not be variation free, and hence x will not satisfy the assumption of weak exogeneity. Thus, one indirect test of exogeneity is to test the assumption of joint multivariate normality of y and x via, say, Mardia's test of multivariate skewness and kurtosis.

Note also that the assumption of normal, independent, and identically distributed observations also implies the assumption of homoscedastic errors. Thus, any departure from the normal iid (independent and identically distributed) assumption, including that of homoscedasticity, calls into question the assumption of weak exogeneity of x because it suggests a relationship between the parameters of the marginal distribution and the conditional distribution. In the context of structural equation modeling, if attention turns to the reduced form of the model as described in Equation [2.3], then standard methods for assessing the normal iid assumption, including homoscedasticity assessments, would be relatively easy to implement. In any case, users of structural equation modeling should be encouraged to study plots and obtain the necessary diagnostics to assess weak exogeneity.
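As a rough sketch of the indirect check just described, the following Python code computes Mardia's multivariate skewness and kurtosis for the joint set of y and x variables, along with their usual large-sample reference distributions. The function name, the simulated data, and the covariance values are illustrative assumptions; in practice, the rows would be the observed cases.

import numpy as np
from scipy import stats

def mardia(data):
    # Mardia's multivariate skewness (b1) and kurtosis (b2) with approximate
    # p values; rows are cases, columns are the y and x variables jointly.
    n, p = data.shape
    centered = data - data.mean(axis=0)
    s_inv = np.linalg.inv(np.cov(data, rowvar=False, bias=True))
    d = centered @ s_inv @ centered.T          # Mahalanobis cross-products
    b1 = (d ** 3).sum() / n**2                 # multivariate skewness
    b2 = (np.diag(d) ** 2).mean()              # multivariate kurtosis
    skew_stat = n * b1 / 6.0
    skew_df = p * (p + 1) * (p + 2) / 6.0
    kurt_stat = (b2 - p * (p + 2)) / np.sqrt(8.0 * p * (p + 2) / n)
    return {"skewness": b1,
            "skew_p": stats.chi2.sf(skew_stat, skew_df),
            "kurtosis": b2,
            "kurt_p": 2 * stats.norm.sf(abs(kurt_stat))}

# Illustrative draw from a multivariate normal: both p values should be
# large, which is consistent with (but not proof of) weak exogeneity.
rng = np.random.default_rng(1)
z = rng.multivariate_normal([0, 0, 0],
                            [[1, .5, .3], [.5, 1, .4], [.3, .4, 1]],
                            size=500)
print(mardia(z))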
5.4.7 WEAK EXOGENEITY AND THE PRACTICE OF STRUCTURAL EQUATION MODELING

The assumption of weak exogeneity has two rather important ramifications for the practice of structural modeling. In the first place, most structural modeling software packages use some form of conditional estimation, conditional on the exogenous variables. As we have seen, conditional estimation is only valid if the exogenous variables are weakly exogenous. If exogeneity does not hold, it calls into serious question the use of conditional estimation. What follows is that software programs should allow for the characterization of alternative distributional forms for the joint distribution of the data. Second, from a predictive standpoint, lack of weak exogeneity calls into question the use of the variable for predictive studies. Clearly, it does not make sense to manipulate values of an exogenous variable for purposes of prediction when the parameters of that variable are a function of the parameters used for the prediction. Thus, as we argued above, exogeneity is perhaps the most crucial specification assumption, and exogeneity testing deserves serious attention.
5.5 Conclusion

This chapter examined the major statistical assumptions underlying structural equation modeling, including normality, ignorable missing data, and no specification error. We also discussed the issue of the sampling mechanism as an important assumption underlying the use of structural equation modeling. In addition, we introduced a new assumption concerning the exogeneity of predictor variables. We argued that the reduced form specification of the model could be used to assess exogeneity. Assessing the assumptions of joint normality and homoscedasticity offers a simple approach to assessing exogeneity. We will return to a general discussion of exogeneity with respect to problems of causal inference in structural equation modeling in Chapter 10.

Setting aside issues of statistical assumptions, the practice of structural equation modeling possesses a set of strategies for model evaluation and modification. The next chapter considers the variety of strategies available for model evaluation and modification.
Notes

1. The vech(·) operator takes the ½s(s + 1) nonredundant elements of the matrix and strings them into a vector of dimension [½s(s + 1)] × 1.
2. See Section 2.4 in Chapter 2 for the properties of a discrepancy function.
3. The condition that Θδε = 0 relates to Rubin's (1976) notion of no parameter space restrictions or ties between the model parameters and the missing data model parameters.
4. From a practical standpoint, this problem relates to the fact that software implementation of the FQL estimator requires the specification of multiple-group models, with each group representing a distinct missing data pattern.
5. An example of data missing by design arises in the context of balanced incomplete spiraling item assessments. This form of missing completely at random by design was used to study the properties of the FQL estimator in Kaplan (1995b).
6. We consider the validity of such conditional estimation when we take up the issue of exogeneity in Chapter 9.
7. In econometric treatments of simultaneous equation modeling, specification error often refers to incorrect functional form of the model. For example, specification error may refer to the use of a linear model specification when a nonlinear model is correct.
6 Evaluating and Modifying Structural Equation Models
The previous chapter considered the assumptions underlying structural equation models, the consequences of violating the assumptions, and current remedies available to address these violations. It was argued that assessing assumptions was crucial insofar as violation of one or more of the assumptions can profoundly affect estimates, standard errors, and tests of goodness of fit. It was further argued in the previous chapter that it is essential to rule out or otherwise control for assumption violations in order to place confidence in the results of a structural equation modeling exercise. In terms of the conventional practice of structural equation modeling, it is often the case that content area researchers will attempt to evaluate the fit of the model and in some cases interpret parameter estimates regardless of whether the assumptions have been assessed and/or controlled.

The problem of evaluating and interpreting structural equation models has dominated the methodological literature for many years. This chapter considers first the development and use of alternative fit indices for assessing the fit of a structural equation model. No attempt is made to feature every alternative fit index. Rather, we rely on a broad classification of the indices, discuss their use in practice, and summarize the statistical literature pertaining to their advantages and limitations. The section on alternative fit indices will then be followed by a discussion of model modification and statistical power. Here, the focus is on strategies of model modification using the Lagrange multiplier (LM) and Wald tests described in Chapter 2. The closely related issue of statistical power is also considered in this section.
6.1 Evaluating Overall Model Fit: Alternative Fit Indices

For almost 30 years, attention has focused on the development of alternative indices that provide relatively different perspectives on the fit of structural equation models. The development of these indices has been motivated, in part, by the known sensitivity of the likelihood ratio chi-square statistic to large sample sizes, as discussed in Chapter 5. Other classes of indices have been motivated by a need to rethink the notion of testing exact fit in the population, an idea that is deemed by some to be unrealistic in most practical situations. Finally, another class of alternative indices has been developed that focuses on the cross-validation adequacy of the model. In this section, we divide our discussion of alternative fit indices into three categories: (1) measures based on comparative fit to a baseline model, including those that add a penalty function for model complexity; (2) measures based on population errors of approximation; and (3) model selection measures based on the notion of cross-validation adequacy. Chapter 2 covered the likelihood ratio chi-square, and it will not be discussed here.

6.1.1 MEASURES BASED ON COMPARATIVE FIT TO A BASELINE MODEL

Arguably, the most active work in the area of alternative fit indices has been the development of what can be broadly referred to as measures of comparative fit. The basic idea behind these indices is that the fit of the model is compared with the fit of some baseline model that usually specifies complete independence among the observed variables. The baseline model of complete independence is the most restrictive model possible, and hence the chi-square for the baseline model will usually be quite large. The issue is whether one's model of interest is an improvement relative to the baseline model. A subset of these types of indices is designed to take into account the degree of misspecification in the model. In some cases, these fit indices are augmented with a penalty function for each parameter estimated. These indices are typically scaled to lie between 0 and 1, with 1 representing perfect fit relative to the baseline model. The usual rule of thumb for these indices is that 0.95 is indicative of good fit relative to the baseline model.

The sheer number of comparative fit indices precludes a detailed discussion of each one. We will consider a subset of indices here that serve to illustrate the basic ideas. Detailed discussions of these indices can be found in, for example, Hu and Bentler (1995). The quintessential example of a comparative fit index is the normed fit index (NFI) proposed by Bentler and Bonett (1980). This index can be written as
\mathrm{NFI} = \frac{\chi^2_b - \chi^2_t}{\chi^2_b},
[6.1]
where \chi^2_b is the chi-square for the model of complete independence (the so-called baseline model) and \chi^2_t is the chi-square for the target model of interest. The model of complete independence will typically be associated with a very large chi-square value because the null hypothesis tested by \chi^2_b states that there are no covariances among the variables in the population. Therefore, values close to 0 suggest that the target model is not much better than a model of complete independence among the variables. Values close to 1 suggest that the target model is an improvement over the baseline model. As noted above, a value of 0.95 is considered evidence that the target model is a good fit to the data relative to the baseline model.

An index that is similar to the NFI but which takes into account the expected value of the chi-square statistic of the target model is the Tucker-Lewis index (TLI; Tucker & Lewis, 1973), also referred to as the nonnormed fit index (NNFI).1 The TLI can be written as

\mathrm{TLI} = \frac{\chi^2_b/df_b - \chi^2_t/df_t}{\chi^2_b/df_b - 1},
[6.2]
where dfb denotes the degrees of freedom for the model of complete independence and dft denotes the degrees of freedom for the target model of interest. This index may yield values that lie outside the 0 to 1 range. The NFI and TLI assume a true null hypothesis and therefore a central chi-square distribution for the test statistic. However, an argument could be made that the null hypothesis is never exactly true and that the distribution of the test statistic can be better approximated by a noncentral chi-square with noncentrality parameter λ. An estimate of the noncentrality parameter can be obtained as the difference between the statistic and its associated degrees of freedom. Thus, for models that are not extremely misspecified, an index developed by McDonald and Marsh (1990) and referred to as the relative noncentrality index (RNI) can be defined as
\mathrm{RNI} = \frac{(\chi^2_b - df_b) - (\chi^2_t - df_t)}{\chi^2_b - df_b}.
[6.3]
The RNI can lie outside the 0 to 1 range. To remedy this, Bentler (1990) adjusted the RNI so that it would lie in the range of 0 to 1. This adjusted version is referred to as the comparative fit index (CFI).2
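As a rough illustration of Equations [6.1] and [6.2] and of the noncentrality logic behind the CFI, the following Python sketch computes the NFI, TLI, and CFI from hypothetical target- and baseline-model chi-square values. The max-based CFI formula used here is the commonly implemented form and is an assumption of this sketch, not a quotation from the text.

def comparative_fit(chi2_t, df_t, chi2_b, df_b):
    # NFI and TLI follow Equations [6.1] and [6.2]; the CFI rescales the
    # estimated noncentralities so the index lies between 0 and 1.
    nfi = (chi2_b - chi2_t) / chi2_b
    tli = (chi2_b / df_b - chi2_t / df_t) / (chi2_b / df_b - 1.0)
    d_t = max(chi2_t - df_t, 0.0)              # estimated noncentrality, target model
    d_b = max(chi2_b - df_b, 0.0)              # estimated noncentrality, baseline model
    cfi = 1.0 - d_t / max(d_t, d_b, 1e-12)
    return nfi, tli, cfi

# Hypothetical chi-square values for a target model and an independence model.
print(comparative_fit(chi2_t=210.4, df_t=48, chi2_b=3450.0, df_b=66))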
Finally, there are classes of comparative fit indices that adjust existing fit indices for the number of parameters that are estimated. These are so-called parsimony-based indices. The rationale behind these indices is that a model can be made to fit the data by simply estimating more and more parameters. Indeed, a model that is just-identified fits the data perfectly. Therefore, an appeal to parsimony would require that these indices be adjusted for the number of parameters that are estimated. One such parsimony-based index proposed by Mulaik et al. (1989) is the parsimony-NFI (PNFI), defined as

\mathrm{PNFI} = \frac{df_t}{df_b}\,\mathrm{NFI}.
[6.4]
The rationale behind the PNFI is as follows. Note that the baseline model of complete independence restricts all off-diagonal elements of the covariance matrix to zero. Thus, the degrees of freedom for the baseline model are df_b = p(p − 1)/2, where p is the total number of observed variables; df_b represents the degrees of freedom for the most restricted model possible. The more parameters estimated in the target model, the less restricted the model becomes relative to the baseline model and the greater the penalty attached to the NFI.

As noted above, considerable attention has been paid to the development and application of comparative fit indices. The extant literature is replete with studies on the behavior of these, and many other, indices. The major questions concern the extent to which these indices are sensitive to sample size, method of estimation, and distributional violations (Marsh, Balla, & McDonald, 1988). A detailed account of the extant studies on the behavior of these indices is beyond the scope of this chapter. An excellent review can be found in Hu and Bentler (1995). Suffice it to say, the use of comparative indices has not been without controversy. In particular, Sobel and Bohrnstedt (1985) argued early on that these indices are designed to compare one's hypothesized model against a scientifically questionable baseline hypothesis. That is, the baseline hypothesis states that the observed variables are completely uncorrelated with each other. Yet, as Sobel and Bohrnstedt have pointed out, one would never seriously entertain such a hypothesis, and perhaps these indices should be compared with a different baseline hypothesis. Unfortunately, the conventional practice of structural equation modeling as represented in scholarly journals does not suggest that these indices have ever been compared to anything other than the baseline model of complete independence (see also Tanaka, 1993).

An Example of Comparative Fit Indices Applied to the Science Achievement Model

In Chapter 4, it was noted that the science achievement model did not fit the data as evidenced by the likelihood ratio chi-square statistic. In this section,
we provide the alternative fit indices described above and interpret the fit of the model with respect to those indices. We will revisit our conclusions regarding the fit of the model in Section 6.2 after the model has been modified. Table 6.1 provides the TLI and CFI as described in Section 6.1. If we were to evaluate the fit of the model on the basis of indices that compare the specified model against a baseline model of independence, we would conclude here as well that the model does not fit the data well. That is, the TLI does not reach or exceed the criterion of 0.95 for acceptable fit. Also, the CFI, which does not rest on the assumption of a true population model but takes into account population noncentrality, does not suggest good fit.

Table 6.1   Selected Alternative Measures of Model Fit for the Initial Education Indicators Model

Fit Measure                 Value          p
χ² (df = 39)                1730.524       0.000
Tucker-Lewis index          0.792
Comparative fit index^a     0.844
RMSEA                       0.081          0.000
RMSEA lower bound           0.077
RMSEA upper bound           0.084
AIC                         209277.638
BIC                         209420.573

a. Same as relative noncentrality index except scaled to lie in the interval 0 to 1.

6.1.2 MEASURES BASED ON ERRORS OF APPROXIMATION

It was noted earlier that the likelihood ratio chi-square test assesses an exact null hypothesis that the model fits perfectly in the population. If the model in question is overidentified, however, then it is quite unlikely that the model will fit the data perfectly even if the entire population were measured. Not only is it unreasonable to expect a model to hold even if we had access to the population, but small errors can also have detrimental effects on the likelihood ratio test when applied to large samples. It seems, therefore, that a more sensible approach is to assess whether the model fits approximately well in the population. The difficulty arises when trying to quantify what is meant by "approximately."

To motivate this work, it is useful to differentiate among different types of fit measures (Cudeck & Henly, 1991; see also Linhart & Zucchini, 1986). To fix ideas, let Σ0 be the population covariance matrix. In general, a given model will
not fit this covariance matrix perfectly. Further, let \tilde{\Sigma}_0 = \Sigma(\Omega_0) be the best fit of the model to the population covariance matrix. Finally, let \hat{\Sigma} = \Sigma(\hat{\Omega}) be the best fit of the model to the sample covariance matrix S. From these various fit measures, Browne and Cudeck (1993) defined three types of discrepancies that are of relevance to our discussion. First, there is F(\Sigma_0, \tilde{\Sigma}_0), referred to as the discrepancy due to approximation, which measures the lack of fit of the model to the population covariance matrix. Second, there is the discrepancy due to estimation, defined as F(\tilde{\Sigma}_0, \hat{\Sigma}), which measures the difference between the model fit to the sample covariance matrix and the model fit to the population covariance matrix. The discrepancy due to estimation is unobserved but may be approximated by
E[F(\tilde{\Sigma}_0, \hat{\Sigma})] = n^{-1} q,
[6.5]
where q is the number of unknown parameters of the model (Browne & Cudeck, 1993). Finally, there is the discrepancy due to overall error, defined as F(\Sigma_0, \hat{\Sigma}), which measures the difference between the elements of the population covariance matrix and the model fit to the sample covariance matrix. Here too, this quantity is unobserved but may be approximated by

E[F(\Sigma_0, \hat{\Sigma})] = F(\Sigma_0, \tilde{\Sigma}_0) + n^{-1} q,
[6.6]
which is the sum of the discrepancy due to approximation and the discrepancy due to error of estimation (Browne & Cudeck, 1993). For completeness of our discussion, we may wish to include the usual sample discrepancy function \hat{F} = F(S, \hat{\Sigma}) defined in Chapter 2.

Measures of approximate fit are concerned with the discrepancy due to approximation. Based on the work of Steiger and Lind (1980; see also Browne & Cudeck, 1989), it is possible to assess approximate fit of a model in the population. The method of Steiger and Lind (1980) for measuring approximate fit can be sketched as follows. First, it should be recalled that in line with statistical distribution theory, if the null hypothesis is true, then the likelihood ratio possesses a central chi-square distribution with d degrees of freedom. If the null hypothesis is false, which will almost surely be the case in most realistic situations, then the likelihood ratio statistic has a noncentral chi-square distribution with d = ½p(p + 1) − q degrees of freedom and noncentrality parameter λ. The noncentrality parameter serves to shift the central chi-square distribution to the right.

Continuing, let F_0 be the population discrepancy value that would be obtained if the model were fit to the population covariance matrix. Generally, F_0 > 0 unless the model fits the data perfectly, in which case F_0 = 0. Further, let
\hat{F} be the corresponding sample discrepancy function value obtained when the model is fit to the sample covariance matrix. When F_0 = 0, n\hat{F} is central chi-square distributed, where n = N − 1. However, when F_0 > 0, n\hat{F} is noncentral chi-square distributed with noncentrality parameter λ = nF_0.3 Browne and Cudeck (1993) point out that when n\hat{F} has a noncentral chi-square distribution, \hat{F} is a biased estimator of F_0. Specifically,

E(\hat{F}) = F_0 + \frac{d}{n},

[6.7]
with the bias being d/n. The bias in \hat{F} can be reduced by forming the estimator

\hat{F}_0 = \hat{F} - \frac{d}{n}.
[6.8]
Because Equation [6.8] can yield negative values, we use

\hat{F}_0 = \max\{\hat{F} - n^{-1}d,\, 0\}
[6.9]
as the estimate of the error due to approximation (Browne & Cudeck, 1993). From Equation [6.8], we now have a population discrepancy function F_0 and its estimator \hat{F}_0. However, an inspection of Equation [6.8] shows that \hat{F}_0 decreases with increasing degrees of freedom. Thus, to control for model complexity, Steiger (1990; see also Steiger & Lind, 1980) defines a root mean square error of approximation (RMSEA) as

\varepsilon_a = \sqrt{\frac{F_0}{d}},
[6.10]
with point estimate

\hat{\varepsilon}_a = \sqrt{\frac{\hat{F}_0}{d}} = \sqrt{\max\{[\hat{F} - n^{-1}d]/d,\, 0\}}.
[6.11]
In using the RMSEA for assessing approximate fit, a formal hypothesis testing framework is employed. On the basis of prior empirical examples, Steiger (1989) and Browne and Mels (1990) defined as "close fit" an RMSEA value less than or equal to 0.05. Thus, the formal null hypothesis to be tested is

H_0: \varepsilon \le 0.05.
[6.12]
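As a rough illustration of how the point estimate in Equation [6.11] and the close-fit test in Equation [6.12] can be computed from a model's chi-square, degrees of freedom, and sample size, the following Python sketch uses the noncentral chi-square distribution; the numerical inputs are hypothetical and not taken from the book's example.

import numpy as np
from scipy import stats

def rmsea_close_fit(chi2, df, N, eps0=0.05):
    # RMSEA point estimate (Equation [6.11], using F = chi2 / n) and the
    # p value for H0: epsilon <= eps0, based on the noncentral chi-square.
    n = N - 1
    rmsea = np.sqrt(max(chi2 - df, 0.0) / (df * n))
    lam0 = n * df * eps0**2                    # noncentrality implied by eps0
    p_close = stats.ncx2.sf(chi2, df, lam0)    # P(statistic >= observed | close fit)
    return rmsea, p_close

print(rmsea_close_fit(chi2=210.4, df=48, N=600))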
In addition, a 90% confidence interval around ε can be formed, enabling an assessment of the precision of the estimate (Browne & Cudeck, 1993; Steiger & Lind, 1980). Practical guidelines recommended by Browne and Cudeck (1993) suggest that values of the estimated ε between 0.05 and 0.08 are indicative of fair fit, whereas values between 0.08 and 0.10 are indicative of mediocre fit.

An Example of the RMSEA Applied to the Science Achievement Model

Recall that the NFI and TLI (as well as others) use the likelihood ratio chi-square and assume that the model fits perfectly in the population. We argued that this could be considered too restrictive and that we may wish to evaluate the approximate fit of the model. The RMSEA is designed to evaluate approximate fit of the model. From Table 6.1 we see that the RMSEA and its associated probability value applied to the science achievement model indicate that the null hypothesis of approximate fit must be rejected. Moreover, an inspection of the 90% confidence interval also suggests poor approximate fit.

6.1.3 MODEL SELECTION CRITERIA

Another important consideration in evaluating a structural model is its performance relative to other models. In some cases, substantively different specifications of a model relying on the same sample and variables may be compared, reflecting competing theoretical frameworks. More often, however, competing models are simply nested or nonnested modifications of an initial specification. This is the problem of model selection, and one criterion for selecting a model is whether the model is capable of cross-validating well in a future sample of the same size, from the same population, and sampled in the same fashion.4

In the context of cross-validation, an investigator may have a sufficiently large sample to allow it to be randomly split in half, with the model estimated and modified on the calibration sample and then cross-validated on the validation sample. When this is possible, the final fitted model from the calibration sample is applied to the validation sample covariance matrix with parameter estimates fixed to the estimated values obtained from the calibration sample. In other instances, investigators may not be in a position to work with samples large enough to allow separation into calibration and validation samples. Yet the cross-validation adequacy of the model remains a desirable piece of information in selecting a model. In this section, we consider three indices that have been used for model selection in the structural equation modeling context: the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), and the expected cross-validation index (ECVI).5
Akaike Information Criterion

In recent years, important contributions to statistical theory in general, and structural equation modeling in particular, allow one to gauge the extent to which a model will cross-validate in a future sample based on the use of a single sample. These developments are based on the seminal work of Akaike (1973, 1987) in the area of information theory. The mathematical underpinnings of Akaike's work are beyond the scope and focus of this chapter. However, the broad outlines of Akaike's work can be sketched as follows.

To begin, the approach taken in this line of inquiry requires adopting the viewpoint that the goal of statistics is the realization of appropriate predictions. By adopting this predictive viewpoint, attention shifts from the estimation of parameters to the estimation of distributions of future observations. The question now turns to the mechanism by which one can estimate a distribution. Here, Akaike (1985) outlines how work in physics and, in particular, the concept of entropy could be related to notions of statistical information.6 Indeed, Akaike shows that Fisher's information matrix, described in Chapter 2, is a function of the entropy. The goal now is to link these concepts with the method of maximum likelihood. Akaike (1985) notes that a limitation of the method of maximum likelihood is that when selecting from a set of k competing models, maximum likelihood will always prefer the saturated model. Thus, it becomes important to have a sensible procedure whereby several parametric (and over-identified) models can be compared, with one model being selected as "best" from a predictive point of view.

The predictive viewpoint and the concept of entropy generalize the problem of estimation to one of estimating distributions of future observations and not just parameters. Estimating a distribution of future observations is referred to as a predictive distribution. The question is how one measures the "goodness" of this predictive distribution. One choice, according to Akaike, is to measure the deviation of the predictive distribution from some "true" distribution. A true distribution is a conceptual construct that forms the basis by which one designs an estimation procedure.7 It happens that this deviation can be linked back to the expected entropy. From here, Akaike derives the result that regardless of the form of the true distribution, the log-likelihood based on present data is an unbiased estimate of the expected log-likelihood of some future set of data. This observation leads to a measure of the "badness-of-fit" of the model, referred to as the AIC and written as

\mathrm{AIC} = (-2)\,\text{log-likelihood} + 2\,(\text{number of parameters}).
[6.13]
In the context of structural equation modeling, the AIC can be sketched as follows. Following Akaike (1987), let q_0 be the number of unknown parameters under the null hypothesis \Sigma = \Sigma(\Omega) and let q_a = ½p(p + 1) be the number of unknown parameters under the alternative hypothesis that Σ is an arbitrary symmetric positive definite matrix. Akaike shows that for a particular model

\mathrm{AIC}(H_0) = (-2)\max\ln L(H_0) + 2q_0.
[6.14]
The first term on the right-hand side of Equation [6.14] is a measure of the fit of the model, and the second term is a penalty function. If q_0 is low, it suggests that the model is parsimonious, whereas if q_0 is high, the model is relatively less parsimonious. Similarly, for the alternative hypothesis,

\mathrm{AIC}(H_a) = (-2)\max\ln L(H_a) + 2q_a.
[6.15]
Recall that the likelihood ratio (LR) discussed in Chapter 2 can be written here as

\mathrm{LR} = (-2)\max\ln L(H_0) - (-2)\max\ln L(H_a),
[6.16]
which we noted was distributed as chi-square with df = q_a − q_0 degrees of freedom. Therefore, if we subtract AIC(H_a) from AIC(H_0), we obtain

\mathrm{AIC}(H_0) - \mathrm{AIC}(H_a) = (-2)\max\ln L(H_0) + 2q_0 - [(-2)\max\ln L(H_a) + 2q_a]
= \chi^2 - 2(df).

[6.17]
When the goal is to use the AIC for model comparison and selection, AIC(H_a) is common to all computations based on the same data and cancels out of the comparisons. Therefore, AIC(H_0) can be simply defined as

\mathrm{AIC}(H_0) = \chi^2 - 2(df).
[6.18]
The use of the AIC requires fitting several competing models. As noted above, the model with the lowest value of the AIC among the competing models is deemed to fit the data best from a predictive point of view. The smallest value of the AIC is referred to as the minimum AIC (MAIC). A particularly important feature of the AIC is that it can be used for comparing nonnested models. Nonnested models are quite different in terms of their structural specification. However, as discussed below, the AIC can be used to select from a series of nested models that are formed on the basis of relaxing constraints in the model.
Bayesian Information Criterion

A statistic that is similar to the AIC but rests on Bayesian model selection theory is the Bayesian Information Criterion (BIC) (Schwarz, 1978). To motivate the BIC, consider two, not necessarily nested, models, M1 and M2. Following Raftery (1993; see also Kass & Raftery, 1995), we can write the posterior odds of M1 relative to M2 as

\frac{p(M_1 \mid Y)}{p(M_2 \mid Y)} = \frac{p(Y \mid M_1)}{p(Y \mid M_2)}\,\frac{p(M_1)}{p(M_2)}
[6.19]
= B_{12}\,\pi_{12},
where B_{12} is called the Bayes factor and \pi_{12} is the prior odds of M1 relative to M2. Note that in the case of neutral prior odds, the Bayes factor is the ratio of the marginal likelihood of M1 to the marginal likelihood of M2. In the case where the prior odds are not neutral, the Bayes factor is the ratio of the posterior odds to the prior odds. It is possible to avoid using the prior probabilities in Equation [6.19] and still obtain a rough approximation to the Bayes factor. Specifically, we define the BIC as

\mathrm{BIC} = -2\ln L + q_0 \ln(n),
[6.20]
where q_0 is the number of parameters under the null hypothesis and n is the sample size. Following Raftery (1993), it can be shown that

B_{12} = e^{-\frac{1}{2}\mathrm{BIC}_{12}}.
[6.21]
Studies have shown that the BIC tends to penalize models with too many parameters more harshly than the AIC.
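A short sketch of Equations [6.13] and [6.20] applied to two hypothetical competing models fitted to the same data; the log-likelihoods, parameter counts, and sample size are invented for illustration only.

import numpy as np

def aic_bic(loglik, q, n):
    # AIC (Equation [6.13]) and BIC (Equation [6.20]) from the maximized
    # log-likelihood, the number of free parameters q, and the sample size n.
    aic = -2.0 * loglik + 2.0 * q
    bic = -2.0 * loglik + q * np.log(n)
    return aic, bic

models = {"Model A": (-104530.2, 28), "Model B": (-104512.7, 33)}  # (loglik, q)
n = 600
for name, (ll, q) in models.items():
    print(name, aic_bic(ll, q, n))
# The model with the smallest AIC (or BIC) is preferred; the log(n) penalty
# in the BIC is harsher than the AIC's 2q penalty when n is large.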
Expected Cross-Validation Index

Another method for model selection that is also based on assessing the cross-validation adequacy of structural equation models draws on the work of Browne and Cudeck (1989, 1993) and uses aspects of the AIC and the different notions of discrepancies described in Section 6.1.2. To begin, a cross-validation index (CVI) can be formed as

\mathrm{CVI} = F(S_V, \hat{\Sigma}_C),
[6.22]
where S_V is the sample covariance matrix from the validation sample and \hat{\Sigma}_C is the fitted covariance matrix from the calibration sample. This index measures the extent to which the model fitted to the calibration sample also fits the validation sample. If we consider the expected value of the CVI over validation samples given the calibration sample, we have (Browne & Cudeck, 1993)

E(\mathrm{CVI}) = E[F(S_V, \hat{\Sigma}_C) \mid \hat{\Sigma}_C] = F(\Sigma_0, \hat{\Sigma}_C) + n_v^{-1} p^*,
[6.23]
where n_v is the size of the validation sample and p^* = ½p(p + 1) is the number of nonredundant elements in Σ. It can be seen that the CVI is a biased estimate of the overall discrepancy, with the bias being n_v^{-1} p^*. Note that one cannot remove the bias by simply subtracting n_v^{-1} p^* from F(S_V, \hat{\Sigma}_C) because in some cases the resulting value of the fit function would take on an inadmissible negative value. And, in any case, n_v^{-1} p^* is the same for all competing models that are fit to the calibration sample and would not change the rank ordering of the competing models (Browne & Cudeck, 1993).

The above discussion assumes that one can split the sample to form a calibration sample and a validation sample. Clearly, this is a disadvantage when one is working with small samples. The problem arises from the fact that the overall error (defined above) is larger in small samples. Thus, it is desirable not to split the sample but to develop a measure of cross-validation based only on the calibration sample. For the purposes of developing a single-sample cross-validation index, we must assume that the sample sizes for the calibration and the validation samples are the same. Then, it can be shown (Browne & Cudeck, 1993) that the ECVI is approximately

\mathrm{ECVI} = E[F(S_V, \hat{\Sigma}_C)] = F(\Sigma_0, \tilde{\Sigma}_0) + n^{-1}(p^* + q),
[6.24]
where q represents the number of free model parameters to be estimated. If we let S = S_V and \hat{\Sigma} = \hat{\Sigma}_V, Browne and Cudeck (1989) show that the expected value of the index
[6.25]
is approximately the ECVI. It should be noted that when maximum likelihood is used to obtain the discrepancy function values, the index c is related to the AIC and will result in the same rank ordering for competing models. Indeed, the ECVI is to be used in the same way as the AIC for selecting among competing models. That is, the model with the smallest ECVI is selected as the model that will cross-validate best.
06-Kaplan-45677:06-Kaplan-45677.qxp
6/24/2008
8:32 PM
Page 121
Evaluating and Modifying Structural Equation Models—121
Selected Fit Indices From the Science Achievement Model

Table 6.1 presents selected indices from the initial science achievement model estimated using the Mplus software program.8 Taken as a whole, the fit indices do not provide evidence of adequate model fit. However, as noted above, the values of the AIC and BIC do not stand alone as meaningful. Rather, we are required to compare these values with values from competing models. In Section 6.2, we compare the AIC and BIC values from the modified version of this model to the initial AIC and BIC values in Table 6.1.

6.1.4 SUMMARY OF ALTERNATIVE FIT INDICES FOR THE SCIENCE ACHIEVEMENT MODEL

It is clear from a variety of perspectives that the initial specification of the science achievement model does not fit the data. There is no evidence of exact or even approximate fit, nor is there evidence that the model fits well compared with a baseline model of complete independence. On the basis of this evidence, as well as our discussion in Chapter 5 of parameter estimate bias under misspecification, any substantive conclusions drawn from this model must be viewed cautiously. At this point, the usual practice is to modify the model to bring it in closer line with the data. The next section covers model modification and the associated issue of statistical power.
6.2 Model Modification and Statistical Power

In practice, it is often the case that structural equation models are rejected on the basis of the likelihood ratio chi-square and/or one or more of the alternative indices of fit described in Section 6.1. The reasons for model rejection are, of course, many. The most obvious reasons include (a) violations of underlying assumptions, such as normality and completely random missing data; (b) incorrect restrictions placed on the model; and (c) sample size sensitivity. The problem of violating assumptions was discussed in Chapter 5. There we noted that in some cases violations of assumptions could be addressed, but we argued that an explicit presentation of the assumptions was crucial when evaluating the quality of the model. In addition, in Chapter 5, we noted that specification errors in the form of incorrect restrictions were a pernicious problem and intimately related to the issue of sample size sensitivity. In this section, we consider the problem of modifying models to bring them closer in line with the data. We consider model modification in light of statistical power, thereby more formally integrating the problem of specification error and sample size sensitivity.

When a model, such as our science achievement model, is rejected on the basis of the LR chi-square test, attempts are usually made to modify the model
to bring it in line with the data. Assuming one has ruled out assumption violations such as nonnormality, missing data, and nonindependent observations (Kaplan, 1990a), methods of model modification usually involve relaxing restrictions in the model by freeing parameters that were fixed in the initial specification. The decision to free such parameters is often guided by the size of the LM statistic, which, as discussed in Chapter 2, possesses a one degree-of-freedom chi-square distribution and gives the decrease in the overall LR chi-square test when the parameter in question is freed. The LM test is also referred to as the modification index (MI) (Sörbom, 1989). For each restricted but potentially identified parameter, there exists an LM test. Software programs generally list each of the LM test values and in some cases will provide the largest LM value. The temptation, of course, is to relax the fixed parameter associated with the maximum LM value. The difficulty here is that the parameter associated with the largest LM value may not be one that makes any substantive sense whatsoever (see, e.g., Kaplan, 1988). Regardless of whether the parameter with the largest associated LM value is relaxed first, typically more than one, and often many, modifications to the model are made.

Two problems exist when engaging in numerous model modifications. First, extant simulation studies have shown that searching for specification errors via the LM test does not always result in locating the specification errors imposed on a "true" model, that is, the model that generated the covariance matrix (Kaplan, 1988, 1989c; Luijben, Boomsma, & Molenaar, 1987; MacCallum, 1986). A second problem associated with unrestricted model modifications on the basis of the LM test is the increase in the probability of Type II errors resulting from the general goal of not rejecting the null hypothesis that the model fits the data (Kaplan, 1989b; MacCallum, Roznowski, & Necowitz, 1992). In one sense, the way to mitigate the problem of Type II errors is to free paths that have maximum power.

A general method for calculating the empirical power of the LR chi-square test was given by Satorra and Saris (1985). Their approach can be outlined as follows. First, estimate the model under the null hypothesis H0 and obtain the LR chi-square statistic. Second, estimate a new model, call it H1 (not to be confused with the unrestricted alternative Ha), which consists of the model under H0 with parameters fixed at their maximum likelihood estimates obtained from the first step but with the restriction of interest dropped and replaced with an alternative "true" parameter value to be tested. Note that estimating the model under H0 with parameters fixed at their maximum likelihood values will yield a chi-square test value of 0 with degrees of freedom equal to the number of degrees of freedom of the model. When estimating the H1 model under the H0 specification, the resulting chi-square will no longer be 0. Indeed, the chi-square statistic resulting from this test is distributed as a noncentral chi-square
with noncentrality parameter λ. With the noncentral chi-square statistic and the degrees of freedom in hand, one can determine the power of the test using tables such as those provided in Saris and Stronkhorst (1984). An immediate problem with the procedure outlined above is that it requires estimating the model twice for every parameter of interest and for every alternative value of interest. Moreover, it will often be difficult in practice for researchers to specify a "true" value for the parameter to be tested. This problem was remedied by Satorra (1989), who recognized that the LM test could be used to approximate the noncentrality parameter for each restriction in the model. Because there exists an LM test associated with each fixed parameter in the model, one can obtain a one degree-of-freedom assessment of the power of the test. In practical terms, this means that for each restriction in the model, one can assess whether the test is powerful enough to reject the null hypothesis that the parameter in question is zero.
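As a rough illustration of this idea, the power of the one degree-of-freedom test of a given restriction can be approximated by treating the modification index as the noncentrality parameter of a noncentral chi-square distribution. The sketch below is a hypothetical illustration in Python with SciPy; the function name and the example MI value are assumptions for illustration, not taken from the text.

    from scipy.stats import chi2, ncx2

    def lm_power(mi, alpha=0.05, df=1):
        """Approximate power of the test of a single restriction.

        Following the idea attributed to Satorra (1989), the modification
        index (MI) for a fixed parameter is used as the noncentrality
        parameter of a noncentral chi-square distribution with df = 1.
        """
        crit = chi2.ppf(1 - alpha, df)   # critical value under the null
        return ncx2.sf(crit, df, mi)     # P(chi-square(df, MI) >= crit)

    # Hypothetical example: a restriction with MI = 10 tested at alpha = .05
    print(round(lm_power(10.0), 3))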
6.2.1 ESTIMATION OF OVERALL POWER

A question that often arises in the application of structural equation modeling concerns the sample size necessary to estimate the model. This is also an important question when seeking research funding, insofar as funding agencies want to be confident that proposed studies have sufficient power. In other statistical methodologies, such as analysis of variance, it is possible to determine an overall sample size necessary to detect a specific effect size—sometimes quantified in terms of percentage of variance accounted for. The question here concerns whether it is possible to determine the sample size necessary to detect an effect in a structural equation model. The issue therefore centers on the type of effect one is interested in detecting. Based on the development of the RMSEA discussed earlier, MacCallum, Browne, and Sugawara (1996) developed an approach to estimate the sample size necessary to detect whether a model closely fits the data, against the alternative that the model is a mediocre fit to the data. The problem rests on first developing power calculations for the RMSEA. Drawing on Equations [6.7] through [6.10], MacCallum et al. (1996) show that the noncentrality parameter can be expressed in terms of the RMSEA as

λ = ndε²,

[6.26]

where n = N − 1. Note that from the standpoint of the RMSEA, perfect fit implies that ε = 0, which in turn implies that the distribution of the test statistic is a central chi-square distribution because, under the central chi-square distribution, the noncentrality parameter λ = 0. However, using
Equation [6.26], MacCallum et al. (1996) showed that the hypothesis of close fit, ε ≤ 0.05, can be tested with a noncentral chi-square with noncentrality parameter λ given in Equation [6.26]. But now suppose that the true value of ε is 0.08 (considered mediocre fit), and we test the hypothesis of close fit, ε ≤ 0.05. What is the power of this test? To be specific, let ε0 represent the null hypothesis of close fit and let εa represent the alternative hypothesis of mediocre fit. The distribution used to test the null hypothesis in this case is the noncentral chi-square distribution χ²(d, λ0), and the distribution used to test the alternative hypothesis is the noncentral chi-square distribution χ²(d, λa), where d is the degrees of freedom and λ0 and λa are the respective noncentrality parameters. From here, the power of the test of close fit is given as
π = Pr[χ²(d, λa) ≥ χ²c],
[6.27]
where χ²c is the critical value of chi-square under the null hypothesis for a given Type I error probability α (MacCallum et al., 1996). Given that the power of the test can be determined from values of N, d, ε0, εa, and α, we can turn to the question of determining the sample size necessary to achieve a desired level of power. MacCallum et al. (1996) use an indirect approach of interval halving to solve the problem of assessing the sample size necessary to achieve a desired level of power for testing close fit. The details of this procedure can be found in their article. Suffice it to say here that the minimum N necessary to achieve a desired level of power of the test of close fit against the alternative of mediocre fit is a function of the degrees of freedom of the model, where models with large degrees of freedom require smaller sample sizes. MacCallum et al. (1996) are careful to point out that their procedure must be used with caution because, for example, models associated with a very large number of degrees of freedom may yield required minimum sample sizes that are smaller than the number of variables one has in hand. They also correctly point out that their procedure is designed for omnibus testing of close fit and that the sample size suggested for adequate power for the overall test of close fit may not be adequate for testing parameter estimates. This concern is in line with Kaplan (1989c), who found that power can be different in different parts of the model even for the same size specification error. Thus, sample size effects are not uniform throughout a model.
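The following sketch illustrates these calculations under stated assumptions. It is a hypothetical illustration in Python with SciPy, not the authors' code; the function names and example values are invented, and a simple doubling-and-bisection search stands in for the interval-halving procedure described by MacCallum et al. (1996).

    from scipy.stats import ncx2

    def rmsea_power(N, d, eps0=0.05, epsa=0.08, alpha=0.05):
        """Power of the test of close fit (epsilon <= eps0) when the true
        RMSEA equals epsa, following Equations [6.26] and [6.27]."""
        n = N - 1
        lam0 = n * d * eps0 ** 2             # noncentrality under the null
        lama = n * d * epsa ** 2             # noncentrality under the alternative
        crit = ncx2.ppf(1 - alpha, d, lam0)  # critical value from the null distribution
        return ncx2.sf(crit, d, lama)        # P(chi-square(d, lama) >= crit)

    def min_n_for_power(d, target=0.80, **kwargs):
        """Smallest N giving at least the target power (a simple search
        standing in for the interval-halving approach)."""
        lo, hi = 10, 10
        while rmsea_power(hi, d, **kwargs) < target:
            hi *= 2                          # expand until the target power is reached
        while lo < hi:                       # then bisect on the sample size
            mid = (lo + hi) // 2
            if rmsea_power(mid, d, **kwargs) >= target:
                hi = mid
            else:
                lo = mid + 1
        return lo

    # Hypothetical example: a model with 38 degrees of freedom
    print(round(rmsea_power(N=500, d=38), 3))
    print(min_n_for_power(d=38))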
6.2.2 SAMPLE SIZE, POWER, AND EXPECTED PARAMETER CHANGE

Returning to the problem of assessing power for each effect in the model, the issue of sample size sensitivity is only relevant when the null hypothesis is false. To see this, recall again that the log-likelihood ratio test can be written as n × FML. Clearly, if the model fits perfectly, FML will be 0 and the sample size will have no effect. Sample size comes into play in its interaction with model misfit—where FML will then take on a value greater than zero. Thus, there is a need to gauge the relative effect of sample size against the degree of model misfit. A method of gauging the influence of sample size and model misfit in the context of power analysis is through the use of the expected parameter change (EPC) statistic. The EPC was developed by Saris, Satorra, and Sörbom (1987) as a means of gauging the size of a fixed parameter if that parameter were freed. To motivate this statistic, let ωi be a parameter that takes on the value ω0 (usually zero) under the null hypothesis. Let dωi = ∂ln L(Ω)/∂ωi evaluated at ω̂i, where ln L(Ω) is the log-likelihood function. Then, Saris et al. (1987) defined the EPC as

EPC = ω̂i − ω0 = MI/dωi,
[6.28]
where MI is the modification index. In essence, the EPC is a point estimate of the alternative hypothesis for the parameter in question. With the EPC in hand, Saris et al. (1987) discuss four possible outcomes that can arise. First, one can obtain a large EPC and an associated large LM (or MI). In this case, one might be advised to free this parameter insofar as it is theoretically justified to do so. Second, one can obtain a large LM associated with a small EPC. Here, one might not be tempted to free the parameter despite the large predicted drop in the overall test statistic because the value of the parameter might be trivial. Indeed, this case might suggest sample size sensitivity given that other factors have been ruled out. Third, one could obtain a small MI associated with a large EPC. In this case, the situation is ambiguous. The problem could be one of sampling variability in the estimate or that the test is not powerful enough to detect this parameter. A more detailed power analysis, perhaps using the methods of Satorra and Saris (1985), might be necessary. Finally, one could obtain a small LM associated with a small EPC. In this case, it makes no substantive sense to free this parameter. Methodological studies of the EPC have shown that it outperforms the LM test with respect to locating known specification errors (Luijben et al., 1987). Moreover, when the EPC statistic and LM test are used in combination as a strategy for model modification, they can provide useful information about sample size sensitivity and power (Kaplan, 1990a, 1990b). A problem with the EPC is that it is dependent on the metrics of the observed variables. Therefore, it may be desirable to standardize the EPC so as to allow relative comparisons of size. A standardized version of the EPC was developed by Kaplan (1989a). For example, if we let θγ^EPC represent the expected parameter change statistic associated with a fixed element of the parameter matrix Γ (the
matrix of coefficients relating endogenous variables to exogenous variables), then a standardized version of θγ^EPC would be calculated as

θγ^SEPC = θγ^EPC [Var(ξ̂)/Var(η̂)]^(1/2),
[6.29]
where Var(ξ̂) is an appropriate diagonal element of Φ̂ and Var(η̂) is expressed in terms of other parameters of the model. It was noted by Chou and Bentler (1990) that the SEPC given in Equation [6.29] was relevant for observed variable models. However, if a full latent variable model was specified, then Equation [6.29] would be sensitive to the scaling of the constructs. Therefore, they proposed a completely standardized EPC. An empirical study of the completely standardized EPC by Chou and Bentler (1990) demonstrated the utility of their expanded statistics.
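A minimal sketch of the combined MI and EPC strategy is given below. It is a hypothetical Python illustration; the function name and the cutoff values are assumptions, not from the text, and in practice "large" and "small" must be judged substantively and against the metric of the parameter.

    def mi_epc_advice(mi, sepc, mi_cut=3.84, sepc_cut=0.10):
        """Classify a fixed parameter using the four outcomes discussed by
        Saris et al. (1987), based on the MI and the standardized EPC."""
        large_mi = mi >= mi_cut            # e.g., the .05 critical value for 1 df
        large_epc = abs(sepc) >= sepc_cut  # hypothetical threshold for a "large" SEPC
        if large_mi and large_epc:
            return "Consider freeing the parameter if theoretically justified."
        if large_mi and not large_epc:
            return "Large predicted drop in chi-square but a trivial parameter; possible sample size sensitivity."
        if not large_mi and large_epc:
            return "Ambiguous: sampling variability or low power; a detailed power analysis may be needed."
        return "No substantive reason to free the parameter."

    # Hypothetical example
    print(mi_epc_advice(mi=20.0, sepc=0.30))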
6.2.3 MODEL MODIFICATION AND MODEL SELECTION

As we discussed above, the AIC, ECVI, and BIC can be used to select among competing nonnested models as well as competing nested models. In the latter case, such nested models are typically the result of model modifications. There is an interesting relationship between model modification and cross-validation. Kaplan (1991a) showed that the AIC is a function of the MI. Specifically, recall from Equation [6.18] that the AIC for any given model can be written as χ² − 2(df). Consider two models M1 and M2, where M2 is formed from M1 by relaxing a constraint based on information provided by the MI. For M1 and M2, there are corresponding AICs, denoted as AIC1 and AIC2. Let the chi-squares and degrees of freedom for M1 and M2 be denoted as χ²1, χ²2, df1, and df2, respectively. Then, the difference in AICs can be expressed as

AIC1 − AIC2 = (χ²1 − χ²2) − 2(df1 − df2) = Δχ² − 2(Δdf),

[6.30]
expressing the change in AICs as a change in chi-square and a change in degrees of freedom. Note that from Chapter 2, the first term on the right-hand side of Equation [6.30] is asymptotically equivalent to the LM (MI) test. If we consider the relaxation of one constraint, then Δdf = 1, and we have

AIC1 − AIC2 = MI − 2.
[6.31]
Thus we see that the relaxation of a constraint improves the predictive validity of the model provided that the MI exceeds the constant 2. Kaplan (1991a) argued that this finding supports the caveat against an automatic approach to model modification. But, as we noted above, the use of the expected change statistic in conjunction with the MI is a better approach to model modification than the use of the MI alone.
Modification of the Science Achievement Model

Table 6.2 presents the modification indices, expected parameter changes, and standardized expected parameter changes associated with the science achievement model estimated in Chapter 4. An inspection of the MI and SEPC suggests that the most substantive change in the model in terms of improvement of fit and expected parameter estimate would arise from freeing the path from SES to SCIACH. From a substantive point of view, such a modification is plausible. However, when inspecting the input-process-output model in Figure 1.2, it is clear that the student inputs are hypothesized to be related to achievement outputs only through curriculum and instructional mediating variables. Thus, this modification calls into question the presumption of full mediation suggested by the input-process-output theory.

Table 6.2   Model Modification Indices and Expected Parameter Change Statistics for Full Science Achievement Model

Fixed Path                      MI        EPC      Std EPC    StdYX EPC
INVOLV   ON SCIGRA10          0.205     0.002      0.004       0.007
INVOLV   ON SCIACH           72.157    −0.013     −0.022      −0.127
INVOLV   ON SCIGRA6           1.424    −0.011     −0.018      −0.018
INVOLV   ON SES              11.464    −0.041     −0.068      −0.050
CHALL    ON SCIGRA10         53.596     0.111      0.121       0.221
CHALL    ON SCIACH            0.076     0.001      0.001       0.004
CHALL    ON SCIGRA6          68.436     0.112      0.122       0.118
CHALL    ON SES               5.210     0.040      0.044       0.032
CHALL    ON CERTSCI           0.004     0.003      0.003       0.001
SCIGRA10 ON SCIACH          576.194    −0.190     −0.190      −0.605
SCIACH   ON SCIGRA6         408.406     1.540      1.540       0.255
SCIACH   ON SES             732.887     2.455      2.455       0.313
SCIACH   ON CERTSCI          14.399    −0.865     −0.865      −0.043

Table 6.3 presents the estimates, standard errors, and goodness-of-fit statistics for the modified model with the SES to SCIACH path added. Two results of this modification are worth noting.
First, as expected from the modification index, the addition of the path from SES to SCIACH resulted in a significant drop in the value of the likelihood ratio chi-square. Moreover, the selected set of indices presented in Table 6.3 shows improvement, although the RMSEA remains statistically significant. Second, the addition of this path does not result in a substantial change in the other paths in the model. That is, the estimated effects in the initial model remain about the same.
Table 6.3   Maximum Likelihood Estimates of Expanded Science Achievement Model (Modification of SCIACH Regressed on SES Added)

                        Estimates     S.E.    Est./S.E.      Std    StdYX
Measurement model
INVOLV BY
  MAKEMETH                 1.000     0.000       0.000     0.606    0.738
  OWNEXP                   0.724     0.027      26.469     0.439    0.605
  CHOICE                   0.755     0.029      26.036     0.458    0.507
CHALL BY
  CHALLG                   1.000     0.000       0.000     0.917    0.748
  UNDERST                  0.757     0.024      31.043     0.694    0.503
  WORKHARD                 0.867     0.026      32.921     0.795    0.723
Structural model
CHALL ON
  INVOLV                   0.251     0.027       9.282     0.166    0.166
INVOLV ON
  CERTSCI                  0.012     0.031       0.399     0.021    0.006
SCIGRA10 ON
  CHALL                    0.264     0.026      10.319     0.242    0.133
SCIACH ON
  SCIGRA10                 1.013     0.035      29.120     1.013    0.317
  SES                      2.456     0.086      28.710     2.456    0.313
SCIGRA10 ON
  SCIGRA6                  0.788     0.021      37.146     0.788    0.416
  SES                      0.240     0.028       8.734     0.240    0.098
  CERTSCI                  0.032     0.068       0.466     0.032    0.005

Selected goodness-of-fit indices: χ² (df = 38) = 953.726, p < .000; TLI = 0.884; CFI = 0.915; RMSEA = 0.060, p < .05; BIC = 208652.581
An inspection of the modification indices and expected change statistics after the addition of the path from SES to SCIACH (not shown) revealed that the next largest MI associated with the largest EPC (and SEPC) was the path from SCIGRA6 to SCIACH. Again, from the point of view of the input-process-output framework presented in Figure 1.2, the suggestion of this important effect calls into question the notion that background student characteristics affect achievement only through instructional/curricular experiences. In other words, prior science grades are an important direct predictor of science achievement over and above any instructional/curricular effects. The trajectory of instructional experiences leading to high science grades by grade 6 is not part of this model but does suggest the importance of a longitudinal perspective. Table 6.4 presents the estimates, standard errors, and goodness-of-fit statistics for the modified model with the SCIGRA6 to SCIACH path added. Here, too, it can be seen that the addition of this path resulted in a substantial improvement in fit on all measures except the PNFI. Moreover, the addition of this path did not substantially change any of the other paths in the model.

Finally, it may be interesting to consider these modifications in line with the question of cross-validation. An inspection of Tables 6.1, 6.3, and 6.4 shows a substantial drop in the values of the AIC and BIC with the addition of these modifications. In line with the recommendations for the use of the AIC and BIC, we would want to choose the final model in Table 6.4 on the basis of its ability to cross-validate in a future sample of the same size. From a substantive point of view, the results speak to the importance of prior background, as measured by SES and prior grades, on current measures of science achievement. These effects also appear to have substantial indirect effects on current science grades. The role of instructional and teacher characteristics (as measured in this analysis) on science achievement appears weak or almost nonexistent. However, the role of instructional characteristics on science grades is significant. This result may not be too surprising insofar as instructional characteristics could be argued to influence achievement measures that are more closely aligned with instruction—such as classroom tests and quizzes. Indeed, the standardized indirect effect (not shown) of UNDERST on SCIACH is about equal to the standardized indirect effect of SES on SCIACH. Both of these indirect effects pass through SCIGRA10. Thus, instructional style, as measured by the extent to which teachers press students to show understanding of science concepts, has an important role to play in the prediction of science achievement, but only to the extent that it influences classroom achievement.

6.2.4 FACTORS INFLUENCING MODEL MODIFICATION AND POWER

We discussed in Chapter 5 the problem of how specification errors propagate through models. We noted that the mechanism for such propagation was
Table 6.4   Maximum Likelihood Estimates of Expanded Science Achievement Model (Modification of SCIACH Regressed on SCIGRA6 Added)

                        Estimates     S.E.    Est./S.E.      Std    StdYX
Measurement model
INVOLV BY
  MAKEMETH                 1.000     0.000       0.000     0.606    0.738
  OWNEXP                   0.724     0.027      26.469     0.439    0.605
  CHOICE                   0.755     0.029      26.036     0.458    0.507
CHALL BY
  CHALLG                   1.000     0.000       0.000     0.917    0.748
  UNDERST                  0.757     0.024      31.043     0.694    0.503
  WORKHARD                 0.867     0.026      32.921     0.795    0.723
Structural model
CHALL ON
  INVOLV                   0.251     0.027       9.282     0.166    0.166
INVOLV ON
  CERTSCI                  0.012     0.031       0.399     0.021    0.006
SCIGRA10 ON
  CHALL                    0.264     0.026      10.319     0.242    0.133
SCIACH ON
  SCIGRA10                 0.748     0.037      20.019     0.748    0.235
  SES                      2.191     0.085      25.695     2.191    0.279
  SCIGRA6                  1.214     0.072      16.944     1.214    0.201
SCIGRA10 ON
  SCIGRA6                  0.788     0.021      37.146     0.788    0.416
  SES                      0.240     0.028       8.734     0.240    0.098
  CERTSCI                  0.032     0.068       0.466     0.032    0.005

Selected goodness-of-fit indices: χ² (df = 37) = 675.121, p < .000; TLI = 0.917; CFI = 0.941; RMSEA = 0.051, p = 0.336; BIC = 208382.783
the extent to which model parameters could be characterized as asymptotically independent. If a set of parameters is asymptotically independent, then their tests are referred to as separable. These concepts have implications for model modification and power analysis in the structural equation modeling context. As shown by Kaplan
(1989c) and discussed in Chapter 5, specification errors can result in biased test statistics and hence give rise to incorrect power probabilities. Here, too, it was suggested that this problem was due to the pattern of zero and nonzero covariances among parameter estimates. On the basis of the results of Kaplan and Wenger (1993) as summarized in Chapter 5, it can be argued that this mechanism explains why power has been found to differ in different parts of a model for the same size specification error (Saris et al., 1987). Specifically, even though the magnitude of the specification errors is the same, their locations in different parts of a model imply different covariances with the remaining free parameters. It was this reason among others that led Saris et al. (1987) to develop the expected change statistic to supplement the modification index as a means of carrying out model modification (Kaplan & Wenger, 1993). For a review of the problem of statistical power in structural equation models, see Kaplan (1995a).
6.3 Conclusion

The purpose of this chapter was to provide an overview of common methods for evaluating the fit and adequacy of a model as well as to discuss issues of power and model modification. In the next chapter, we discuss recent modeling methods that take into account more subtle features of data than those assumed by simple random sampling. In particular, we consider structural equation models for data derived from complex multistage sampling. As discussed in Chapter 5, ignoring the nested structure of data can have ramifications for issues of model fit and evaluation.
Notes

1. The expected value of a central chi-square is equal to its degrees of freedom.
2. The Mplus program provides the TLI and CFI only.
3. An important assumption is that the degree of misfit in the sample is about the same as the degree of misfit in the population. See Steiger, Shapiro, and Browne (1985).
4. The requirement that the sample be drawn in the same way is often ignored in this literature but must be emphasized because of the differences in the quality of estimation if the sampling scheme is ignored. For example, one can take a simple random sample of students that ignores their nesting within schools, or one can take a multistage clustered sample. In both cases, a different estimator would be used, and the cross-validation indices would be affected by the differences. See also Chapter 8.
5. Mplus only presents the AIC and BIC.
6. Thermodynamic entropy is a measure of unavailable energy in a closed system. The physicist L. Boltzmann showed that entropy could be defined in terms of the
probability distribution of molecules and showed that entropy was equal to the log-probability of a statistical distribution.
7. A true distribution can be realized through Monte Carlo simulations.
8. Note that other software programs for structural equation modeling such as AMOS (Arbuckle, 1999), LISREL (Jöreskog & Sörbom, 2000), and EQS (Bentler, 1995) may provide more or fewer fit indices than Mplus.
7 Multilevel Structural Equation Modeling
The models discussed so far have assumed that observations constitute simple random samples from a population. There are many instances, however, where observations are not simple random samples from the population. For example, organizations such as schools are hierarchically structured, and the data generated from these types of organizations are typically obtained through some form of multistage sampling. Until relatively recently, the common approach to the analysis of hierarchically organized social science data would have been to either disaggregate data to the individual (e.g., student) level or aggregate data to the organizational (e.g., school) level. The difficulty with disaggregation or aggregation is that they are not optimal approaches for a proper analysis of the actual structure of the data. Using students in schools as an example, the problem with disaggregation is that students will have the same values on observed and unobserved school-level variables. As such, the usual regression assumption of independence of errors is violated, possibly leading to biased regression coefficients. In the case of data aggregation, the result could be a loss of variation such that measures of association among variables aggregated to the school level may be overestimated. To overcome the limitations associated with these problematic approaches to the analysis of hierarchical data, methodologists and statisticians have made important theoretical advances that allow for appropriate modeling of organizational systems such as schools. These methods have been referred to as multilevel linear models, mixed-effects and random-effects models, random coefficient models, and covariance components models. The differences in these terms reflect, in some respects, the fact that they have been used in many different research settings such as sociology, biometrics, econometrics, and statistics, respectively (see Kreft & de Leeuw, 1998, for an overview of the history
of these methods). In addition to statistical developments, software advances have allowed for relatively straightforward estimation of multilevel models. The purpose of this chapter is to describe recent methodological advances that have extended multilevel modeling to the structural equation modeling perspective. The organization of this chapter is as follows. Section 7.1 provides a discussion of multilevel structural equation modeling. The general problem of parameter estimation for multilevel structural models is discussed first. This is followed by a discussion of multilevel factor analysis in Section 7.2. An example of multilevel factor analysis is provided that reexamines student perceptions of school climate discussed in Chapter 3. Following this, multilevel structural equation modeling is described in the simple case of multilevel path analysis, wherein within-organization level parameters are allowed to vary across organizations and, in turn, are modeled as a function of between-organization variables following their own path model. A multilevel path analysis of student science achievement is provided as an example. In addition to covering multilevel structural equation modeling, a related problem concerns structural equation modeling applied to complex sampling designs. Such designs are not uncommon in the social and behavioral sciences. For example, in the field of education, large-scale surveys are developed where simple random sampling is not feasible or desirable. To reflect the organization of schooling as well as to obtain sufficiently large samples of underrepresented groups, a form of multistage sampling, with specific forms of oversampling, is often employed. To ensure proper inferences to the relevant population, sampling weights are typically employed. Recent developments in structural equation modeling now allow for the incorporation of sampling weights, and this issue is discussed in Section 7.4.
7.1 Basic Ideas in Multilevel Structural Equation Modeling

When we carefully consider the problem of analyzing data arising from hierarchically nested systems, such as schools, it is clear that neither standard structural equation modeling nor standard multilevel modeling alone can give a complete picture of the problem under investigation. Indeed, use of either methodology separately could result in different but perhaps equally serious specification errors. Specifically, using conventional structural equation modeling assuming simple random samples alone would ignore the sampling schemes that are often used to generate educational data and would result in biased structural regression coefficients (B. Muthén & Satorra, 1989). The use of multilevel modeling alone would preclude the analyst from studying complex indirect and simultaneous effects within and across levels of the system. What is required, therefore, is a method that combines the best of both methodologies.
One of the earliest attempts to develop multilevel latent variable modeling was by Schmidt (1969), who derived a maximum likelihood (ML) estimator for a general multilevel covariance structure model but did not attempt to introduce group-level variables into the model. In a later paper, Longford and Muthén (1992) provided computational results for multilevel factor analysis models. B. Muthén and Satorra (1989) were the first to show the variety of possible special cases of multilevel covariance structure modeling, and B. Muthén (1989) suggested, among other things, how such models could be estimated with existing software. Later, Kaplan and Elliott (1997a), building on the work of B. Muthén (1989), derived the reduced form specification of a multilevel path model. This model was argued to be applicable to the problem of developing policy simulation models for validating education indicators (Kaplan & Elliott, 1997b; Kaplan & Kreisman, 2000). These earlier studies by Kaplan and Elliott (1997b), Kaplan and Kreisman (2000), and others made use of a limited information ML estimator referred to as MUML (Muthén's ML estimator). A number of important studies investigating the properties of the MUML estimator can be found in Yuan and Hayashi (2005) and Yuan and Bentler (2005).

7.1.1 FULL INFORMATION ML-BASED ESTIMATION FOR MLLVMS

The MUML estimator provides for estimation and testing of full structural equation models but only for random intercept type analyses. Moreover, the MUML estimator relies on the computation of two covariance matrices—the pooled within-groups covariance matrix and the between-groups covariance matrix. Since the development of the MUML estimator, a number of new estimation methods have appeared that provide for full information ML estimation of the parameters of multilevel latent variable models (MLLVMs), that do not specifically require the estimation of two separate covariance matrices, and that allow for random slopes as well as random intercepts in full structural equation models. These new estimators rely on the expectation-maximization (EM) algorithm (Dempster, Laird, & Rubin, 1977) for estimation under a general notion of missing data. In the context of hierarchical linear modeling (Raudenbush & Bryk, 2002), the EM algorithm was used to treat random coefficients as missing data. In the context of MLLVMs, the EM algorithm was used by Lee and Poon (1998) and Bentler and Liang (2003) for two-level structural equation models, where the between-level part of a variable is viewed as missing. More recently, Asparouhov and Muthén (2003) developed three EM algorithm–based ML estimators that combine both approaches. The three EM algorithm–based ML estimators are distinguished by the approach they take for the calculation of standard errors. The first method uses a first-order approximation of the asymptotic covariance matrix of the estimates to obtain the standard errors and is referred to as the MLF estimator.
The second method is the conventional ML estimator, which uses the second-order derivatives of the observed log-likelihood. The third method is based on a sandwich estimator derived from the information matrices of ML and MLF and produces the correct asymptotic covariance matrix of the estimates, which is not dependent on the assumption of normality and also yields a robust chi-square test of model fit. This estimator is referred to as MLR. The MLR is a robust full information ML estimator for MLLVMs. A small simulation study reported in Asparouhov and Muthén (2003) compared the ML and MLR estimators to a mean-adjusted and a mean- and variance-adjusted ML estimator. The results demonstrated better performance of the MLR estimator for nonnormal variables than that obtained from the maximum likelihood estimator with mean and variance adjustment.

7.1.2 WEIGHTED LEAST SQUARES ESTIMATION FOR MLLVMS

As with single-level latent variable models, the ML estimator assumes continuous manifest variables. The MLR estimator assumes continuous manifest variables as well but allows relaxation of the normality assumption. In practice, however, it is often the case that manifest variables are categorical in nature, and substantive applications may very well contain manifest variables of different scale types—including binary, ordered categorical, and continuous. Recently, Asparouhov and Muthén (2007), building on the single-level work of B. Muthén (1984), developed a weighted least squares estimator for MLLVMs that provides computationally efficient estimates and correct chi-square tests of model fit in the presence of categorical manifest variables. This is referred to as the weighted least squares mean-adjusted estimator (WLSM). A small simulation study by Asparouhov and Muthén (2007) demonstrated that the WLSM estimator performed better than MLR when the manifest variables were categorical and virtually the same as MLR when the data were continuous and normally distributed. The ML and weighted least squares estimators are available in the Mplus software program (L. Muthén & Muthén, 2007). Throughout this chapter, we use the MLR estimator.
7.2 Multilevel Factor Analysis

In this section, we outline multilevel latent variable modeling for continuous latent variables. Examples will use the MLR estimator described above, and all analyses will use the Mplus software program (L. Muthén & Muthén, 2007). In line with common applications of single-level structural equation modeling, we begin with a discussion of the measurement problem by focusing on multilevel factor analysis. It should be noted that recent work by Fox and Glas
(2001) has extended multilevel modeling to the item response theory context. However, a full discussion of their work is beyond the scope of this chapter. To begin, consider a model that decomposes a p-dimensional response vector yig for student i in school g into the sum of a grand mean μ, a between-groups part νg, and a within-groups part uig. That is,

yig = μ + νg + uig.
[7.1]
The total sample covariance matrix for the response vector yig can be written as ΣT = Σb + Σw ,
[7.2]
where ΣT is the population total covariance matrix, Σb is the population between-groups covariance matrix, and Σw is the population within-groups covariance matrix. Sample quantities can be defined as

ȳ.g = (1/ng) Σ_{i=1}^{ng} yig,   [7.3]

ȳ = (1/N) Σ_{g=1}^{G} Σ_{i=1}^{ng} yig,   [7.4]

Sw = [1/(N − G)] Σ_{g=1}^{G} Σ_{i=1}^{ng} (yig − ȳ.g)(yig − ȳ.g)′,   [7.5]

Sb = [1/(G − 1)] Σ_{g=1}^{G} ng (ȳ.g − ȳ)(ȳ.g − ȳ)′,   [7.6]

where ȳ.g is the sample mean for group g, ȳ is the grand mean, Sw is the sample pooled within-groups covariance matrix, and Sb is the between-groups covariance matrix.
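To make the decomposition concrete, the following sketch computes the pooled within-groups and between-groups sample covariance matrices of Equations [7.5] and [7.6] for a small fabricated data set. It is a hypothetical illustration in Python with NumPy; the function and variable names are assumptions, not from the text.

    import numpy as np

    def within_between_cov(y, groups):
        """Compute S_w and S_b as in Equations [7.5] and [7.6].

        y      : (N, p) array of responses
        groups : length-N array of group labels
        """
        y = np.asarray(y, dtype=float)
        groups = np.asarray(groups)
        labels = np.unique(groups)
        N, p = y.shape
        G = len(labels)
        grand_mean = y.mean(axis=0)

        s_w = np.zeros((p, p))
        s_b = np.zeros((p, p))
        for g in labels:
            y_g = y[groups == g]
            n_g = len(y_g)
            mean_g = y_g.mean(axis=0)
            dev_w = y_g - mean_g                    # within-group deviations
            s_w += dev_w.T @ dev_w
            dev_b = (mean_g - grand_mean).reshape(-1, 1)
            s_b += n_g * (dev_b @ dev_b.T)          # group-size weighted between-group deviations
        return s_w / (N - G), s_b / (G - 1)

    # Hypothetical example: 6 students in 2 schools, 2 observed variables
    y = [[1.0, 2.0], [2.0, 1.0], [1.5, 1.5],
         [3.0, 4.0], [4.0, 3.0], [3.5, 3.5]]
    schools = [1, 1, 1, 2, 2, 2]
    S_w, S_b = within_between_cov(y, schools)
    print(S_w)
    print(S_b)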
As with the standard application of linear regression to data arising from multistage sampling, the application of factor analysis should also account for nested effects. For example, a battery of attitude items assessing student perceptions of school climate administered to students will most likely exhibit between-school variability. Ignoring the between-school variability in the scores of students within schools will result in predictable biases in the parameters of the factor analysis model. Therefore, it is desirable to extend multilevel methodology to the factor analysis framework. To start, let the vector of student responses be expressed in terms of the multilevel linear factor model as

yig = ν + Λw ηwig + Λb ηbg + εwig + εbg,

[7.7]
where yig was defined earlier, ν is the grand mean, Λw is the factor loading matrix for the within-group variables, ηwig is a factor that varies randomly across units within groups, Λb is the between-groups factor loading matrix, ηbg is a factor that varies randomly across groups, and εwig and εbg are within- and between-group uniquenesses. Under the standard assumptions of linear factor analysis, here extended to the multilevel case, the total sample covariance matrix defined in Equation [7.2] can be expressed in terms of factor model parameters as

ΣT = ΛwΦwΛw′ + Θw + ΛbΦbΛb′ + Θb,
[7.8]
where Φw and Φb are the factor covariance matrices for the within-group and between-group parts and Θw and Θb are diagonal matrices of unique variances for the within-group and between-group parts. Generally speaking, it is usually straightforward to specify a factor structure for the within-school variables. It is also straightforward to allow for within-school variables to vary between schools. Conceptual difficulties often arise in warranting a factor structure to explain variation between groups. In an example given in Kaplan and Kreisman (2000) examining student perceptions of school climate, two clear factors were extracted for the within-school part, but the between-school part appeared to suggest one factor. The fact that it is sometimes difficult to conceptualize a factor structure for the between-groups covariance matrix does not diminish the importance of taking the between-group variability into account when conducting a factor analysis on multilevel structured data.

An Example of Multilevel Confirmatory Factor Analysis

In this section, we provide examples of multilevel exploratory factor analysis and multilevel confirmatory factor analysis using data from the PISA 2003 database. The PISA is sponsored by the Organisation for Economic Co-operation and Development (OECD, 2004) and represents arguably the largest and most sophisticated international assessment of student academic competencies. Data are collected on 15-year-old students from the participating countries. We concentrate on the PISA 2003 cycle, which focused on mathematics and which contains information on over a quarter of a million students from 41 countries. It includes not only information on their performance in the major content domains but also their responses to the student questionnaires that they complete as part of the assessment. The student questionnaires cover a large variety of topics, including attitudes to the subject matter being assessed as well as considerable background information. In what follows, we analyze the data from the South Korean sample.
In this analysis, we estimate a single-level and multilevel confirmatory factor analysis (CFA) with and without the addition of gender as a covariate. On the basis of initial exploratory factor analyses, we specified two within-school factors and one between-school factor. The first within-school factor can be labeled CALCULATING MATHEMATICS IN LIFE and the second within-school factor can be labeled SOLVING EQUATIONS. The single between-school factor can be interpreted as representing perhaps an overall school-level emphasis on mathematics instruction and can be labeled GENERAL MATHEMATICS EMPHASIS. The results of the single-level and multilevel CFAs without predictors are displayed in Table 7.1. Comparison of the single-level and multilevel results without predictors suggests that accounting for clustering slightly worsened model fit as evidenced by the larger likelihood ratio chi-square, comparative fit index, and root mean square error of approximation. The estimates are also negligibly different with the exception that the standard errors for the multilevel solution are uniformly larger. It should be noted that taking into account clustering is known to improve fit in simulation studies. In the context of real data, however, accounting for clustering is still appropriate but can also reveal other problems that can lead to poorer fit. As an additional analysis, we added gender as a predictor of the latent variables with males coded 0 and females coded 1. Adding a predictor to a CFA model yields the specification of a multiple indicator multiple cause (MIMIC) structural equation model (Jöreskog & Goldberger, 1975). A path diagram of this model is displayed in Figure 7.1, and unstandardized results and model fit statistics for the single-level and multilevel CFAs with gender as the predictor are displayed in Table 7.1. The results are shown in Table 7.1 under the columns titled "With Predictors." Again, the multilevel results show a slight worsening of fit. However, conclusions regarding the gender effect remain the same—namely, we find significant gender differences on CALCULATING MATHEMATICS IN LIFE and SOLVING EQUATIONS for both the single-level and multilevel solutions.
7.3 Multilevel Path Analysis

As noted above, multilevel regression models may not be suited for capturing the structural complexity within and between organizational levels. For example, it may be of interest to determine if school-level variation in student science achievement can be accounted for by school-level variables. Moreover, one might hypothesize and wish to test direct and indirect effects of school-level exogenous variables on that portion of student-level achievement that
Table 7.1   Results of Confirmatory Factor Analysis of PISA 2003 Mathematics Assessment

                                        Single-Level CFA                    Multilevel CFA
                                  Without            With             Without            With
                                  Predictors         Predictors       Predictors         Predictors
Within-School Model
Calculating Mathematics
  Train timetable                 1.000 (0.000)    1.000 (0.000)    1.000 (0.000)    1.000 (0.000)
  Discount %                      1.187* (0.022)   1.190* (0.022)   1.136* (0.026)   1.135* (0.025)
  Size (m²) of a floor            1.140* (0.023)   1.140* (0.023)   1.125* (0.027)   1.124* (0.027)
  Graphs in newspaper             0.909* (0.021)   0.908* (0.021)   0.876* (0.026)   0.875* (0.026)
  Distance on a map               1.184* (0.028)   1.185* (0.028)   1.113* (0.031)   1.109* (0.033)
  Petrol consumption rate         0.881* (0.022)   0.883* (0.022)   0.905* (0.027)   0.904* (0.027)
Solving Equations
  3x + 5 = 17                     1.000 (0.000)    1.000 (0.000)    1.000 (0.000)    1.000 (0.000)
  2(x + 3) = (x + 3)(x − 3)       1.060* (0.015)   1.059* (0.015)   1.039* (0.020)   1.036* (0.021)
Calculating Mathematics on MALE        –          −0.133* (0.016)        –          −0.113* (0.024)
Solving Equations on MALE              –          −0.039 (0.023)         –           0.001 (0.038)
Factor Covariances
  Calculating Mathematics with
  Solving Equations               0.286* (0.009)   0.284* (0.009)   0.197* (0.008)   0.197* (0.008)

Between-School Model
General Mathematics Emphasis
  Train timetable                      –                –           1.000 (0.000)    1.000 (0.000)
  Discount %                           –                –           1.373* (0.067)   1.379* (0.068)
  Size (m²) of a floor                 –                –           1.192* (0.062)   1.195* (0.060)
  Graphs in newspaper                  –                –           1.047* (0.063)   1.053* (0.064)
  Distance on a map                    –                –           1.460* (0.102)   1.474* (0.105)
  Petrol consumption rate              –                –           0.752* (0.072)   0.764* (0.074)
  3x + 5 = 17                          –                –           1.808* (0.132)   1.814* (0.143)
  2(x + 3) = (x + 3)(x − 3)            –                –           1.987* (0.136)   1.994* (0.145)
General Mathematics Emphasis
  on MALE                              –                –                –          −0.057 (0.051)

Model Fit Indices
  χ²                              456.250 (19 df)  526.500 (25 df)  641.253 (39 df)  670.784 (52 df)
  AIC                             86593.8          95130.6          85173.1          89797.8
  BIC                             86758.6          95308.9          85443.3          90088.2

NOTE: Unstandardized estimates are displayed with standard errors in parentheses. AIC = Akaike information criterion; BIC = Bayesian information criterion. *Values are statistically significant at p < .05.
Figure 7.1   Multilevel Factor Analysis With a Covariate

[Path diagram: in the within-school part, Gender predicts the factors Calculating Mathematics in Life (indicators: train timetable, discount %, size of a floor, graphs in newspaper, distance on a map, petrol consumption rate) and Solving Equations (indicators: 3x + 5 = 17 and 2(x + 3) = (x + 3)(x − 3)); in the between-school part, Gender predicts the single factor General Mathematics Emphasis, measured by all eight items.]
varies over schools. We argue that these questions are important for a fuller understanding of organizational systems, and such questions can be addressed via multilevel structural equation modeling. For ease of notation and development of concepts, we focus our discussion on multilevel path analysis. By focusing on this model, we are assuming that reliable and valid measures of the variables are available. We recognize that this assumption may be unreasonable for most social and behavioral science research, but as shown in the previous section, multilevel measurement models exist that allow one to examine heterogeneity in measurement structure. Indeed, as a matter of modeling strategy, it may be very informative to examine heterogeneity in measurement structure prior to forming scales to be used in multilevel path analysis. However, it is possible to combine multilevel path models and measurement models into a comprehensive multilevel structural equation model. The model that we will consider allows for varying intercepts and varying structural regression coefficients. Earlier work on multilevel path analysis by Kaplan and Elliott (1997a), building on the work of B. Muthén (1989), specified a structural model for varying intercepts only. This "intercepts as outcomes" model was applied to a specific educational problem in Kaplan and Elliott (1997b) and Kaplan and Kreisman (2000). In what follows, we write the within-school (Level-1) full structural equation model as

yig = αg + Bgyig + Γgxig + rig,
g = 1, 2, . . . , G,
[7.9]
where yig is a p-dimensional vector of endogenous variables for student i in school g, xig is a q-dimensional vector of within-school exogenous variables, αg is a vector of structural intercepts that can vary across schools, Bg and Γg are structural coefficients that are allowed to vary across schools, and rig is the within-school disturbance term assumed to be normally distributed with mean zero and constant within-school variance σ²r. From here, we can model the structural intercepts and slopes as a function of between-school endogenous variables zg and between-school exogenous variables wg. Specifically, we write the Level-2 model as

αg = α00 + α01zg + α02wg + εg,
[7.10]
Bg = B00 + B01 zg + B02 wg + ζg ,
[7.11]
Γg = Γ00 + Γ01zg + Γ02wg + θg.
[7.12]
Note how Equations [7.10] to [7.12] allow for randomly varying intercepts and two types of randomly varying slopes—namely, Bg are randomly
varying slopes relating endogenous variables to each other and Γg are randomly varying slopes relating endogenous variables to exogenous variables. These randomly varying structural coefficients are modeled as functions of a set of between-school predictors zg and wg. These between-school predictors appear in Equations [7.10] to [7.12], but their respective regression coefficients are parameterized to reflect a priori structural relationships. Of particular importance for substantive research is the fact that the full multilevel path model allows for a set of structural relationships among between-school endogenous and exogenous variables, which we can write as

zg = τ + Δzg + Ωwg + δg,
[7.13]
where τ, Δ, and Ω are the fixed structural effects. Finally, ε, ζ, θ, and δ are disturbance terms that are assumed to be normally distributed with mean zero and covariance matrix T with elements

T = | σ²ε                    |
    | σζε   σ²ζ              |
    | σθε   σθζ   σ²θ        |
    | σδε   σδζ   σδθ   σ²δ  |.

[7.14]
After a series of substitutions, we can obtain the reduced form of the Level-1 and Level-2 models and express yig as a function of a grand mean, the main effects of within-school variables, the main effects of between-school variables, and the cross-level moderator effects of between- and within-school variables. These reduced form effects contain the structural relations specified in Equations [7.9] through [7.13]. The importance of this model is that if w consists of variables that could, in principle, be manipulated in the context of a hypothetical experiment, then this model could be used to test cross-level causal hypotheses taking into account the structural relationships between and within levels.1 Although this discussion has focused on multilevel structural equation modeling with manifest variables, it is relatively straightforward to specify a multilevel structural equation model among latent variables. A review of the extant literature has not uncovered an application of the full model described here using latent variables, except in the context of the analysis of longitudinal data, which is described next.
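Before turning to the example, the reduced form can be illustrated in a minimal case. The sketch below uses one within-school predictor x and one between-school predictor z; the simplified symbols are assumptions made for illustration and correspond only loosely to Equations [7.9] through [7.12]. Suppose

yig = αg + γg xig + rig,   αg = α00 + α01 zg + εg,   γg = γ00 + γ01 zg + θg.

Substituting the Level-2 equations into the Level-1 equation gives

yig = α00 + α01 zg + γ00 xig + γ01 zg xig + (εg + θg xig + rig),

in which α00 plays the role of a grand intercept, γ00 is the main effect of the within-school variable, α01 is the main effect of the between-school variable, γ01 zg xig is the cross-level moderator effect, and the term in parentheses is the composite disturbance.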
7.3.1 AN EXAMPLE OF MULTILEVEL PATH ANALYSIS

A multilevel path analysis was employed to study within- and between-school predictors of mathematics achievement, again using data from the PISA
2003 survey (OECD, 2004). The final outcome variable at the student level was a measure of mathematics achievement (MATHSCOR).2 Mediating predictors of mathematics achievement consisted of whether students enjoyed mathematics (ENJOY) and whether students felt mathematics was important in life (IMPORTNT). Student exogenous background variables included students' perception of teacher qualities (PERTEACH), as well as both parents' educational levels (MOMEDUC and DADEDUC). At the school level, a model was specified to predict the extent to which students are encouraged to achieve their full potential (ENCOURAG). A measure of teachers' enthusiasm for their work (ENTHUSIA) was viewed as an important mediator variable between background variables and encouragement to make students achieve their full potential. The variables used to predict encouragement via teachers' enthusiasm consisted of math teachers' use of new methodology (NEWMETHO), consensus between math teachers with regard to school expectations and teaching goals as they pertain directly to mathematics instruction (CNSENSUS), and the teaching conditions of the school (CNDITION). The teaching condition variable was computed from the shortage of the school's equipment, so higher values on this variable reflect worse conditions. A diagram of the multilevel path model is shown in Figure 7.2. The diagram is drawn to convey the fact that the intercepts of the endogenous variables ENJOY, IMPORTNT, and MATHSCOR and the slope of MATHSCOR on ENJOY are regressed on the endogenous and exogenous school-level variables. The results of the multilevel path analysis are displayed in Table 7.2. First, we estimated the intraclass correlations to determine the amount of variation in the student-level variables that can be accounted for by differences between schools. We found intraclass correlations (not shown) ranging from a low of 0.02 for the importance of math in one's life to a high of 0.259 for mathematics achievement. Under the heading "Within School," we find that MOMEDUC, DADEDUC, ENJOY, and IMPORTNT are significant and positive predictors of MATHSCOR. We also observe that ENJOY is significantly and positively predicted by PERTEACH. Finally, MOMEDUC, PERTEACH, and ENJOY are positive and significant predictors of IMPORTNT. Of importance to this chapter are the results under the heading "Between School." Here, we find that the resource conditions of the school (CNDITION) and the extent to which the school encourages students to use their full potential (ENCOURAG) are both significant predictors of math achievement. Enjoyment of mathematics is significantly related to whether there is consensus among mathematics teachers with regard to expectations and teaching goals. Importance of mathematics is related to the resource conditions of the school. Teacher enthusiasm for their work significantly predicts the extent to which they encourage students to use their full potential. Enthusiasm is predicted by use of new methods for teaching math and the extent of consensus
Table 7.2   Results of Multilevel Path Analysis

Within-School Model               Estimate       SE
MATHSCOR on
  MOMEDUC                          4.011*       1.042
  DADEDUC                          4.813*       0.929
  PERTEACH                         6.273*       2.765
  IMPORTNT                        15.873*       2.334
ENJOY on
  PERTEACH                         0.457*       0.026
IMPORTNT on
  MOMEDUC                          0.026*       0.006
  PERTEACH                         0.245*       0.021
  ENJOY                            0.534*       0.015

Between-School Model              Estimate       SE
RANDOM SLOPE on
  NEWMETHO                        −4.632        2.652
  ENTHUSIA                        10.101        3.838
  CNSENSUS                        −3.629        3.224
  CNDITION                        −8.181*       2.532
  ENCOURAG                        −1.668        2.863
MATHSCOR on
  NEWMETHO                         6.806*       6.550
  ENTHUSIA                       −14.081        8.881
  CNSENSUS                         2.407        7.898
  CNDITION                         3.366        6.683
  ENCOURAG                        14.594        7.299
ENJOY on
  NEWMETHO                         0.008        0.025
  ENTHUSIA                         0.016        0.038
  CNSENSUS                         0.109*       0.036
  CNDITION                         0.019        0.025
  ENCOURAG                        −0.035        0.024
IMPORTNT on
  NEWMETHO                        −0.027        0.019
  ENTHUSIA                         0.028        0.031
  CNSENSUS                         0.057        0.030
  CNDITION                         0.044*       0.020
  ENCOURAG                         0.002        0.020
ENCOURAG on
  ENTHUSIA                         0.579*       0.086
ENTHUSIA on
  NEWMETHO                         0.164*       0.044
  CNSENSUS                         0.323*       0.067
  CNDITION                        −0.042        0.040

NOTE: Unstandardized estimates are displayed. SE = standard error. *Values are statistically significant at p < .05.
Figure 7.2   Multilevel Path Model of Mathematics Achievement With Structural Model at the Within-School and Between-School Levels

[Path diagram: the within-school part relates MOMEDUC, DADEDUC, and PERTEACH to ENJOY, IMPORTNT, and MATHSCOR; the between-school part relates NEWMETHO, CNSENSUS, and CNDITION to ENTHUSIA and ENCOURAG, which in turn predict the intercepts of ENJOY, IMPORTNT, and MATHSCOR and the RANDOM SLOPE of MATHSCOR on ENJOY.]
around school expectations and teaching goals pertaining to mathematics instruction. The results for the random slope relating ENJOY to MATHSCOR reveal that teacher enthusiasm moderates the relationship between enjoyment of math and math achievement—with higher levels of teacher-reported enthusiasm associated with a stronger positive relationship between enjoyment of math and math achievement. Finally, the condition of the school also demonstrates a significant moderating effect on the relationship between enjoyment
of math and math achievement, where poorer conditions of the school lower the relationship between enjoyment of math and math achievement.
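The intraclass correlations referred to above express the share of total variance in a student-level variable that lies between schools. The sketch below shows one rough way such a quantity can be computed from raw data. It is a hypothetical Python illustration using a simple ANOVA-type estimator with fabricated data; the chapter's own intraclass correlations come from the fitted multilevel model, not from this formula.

    import numpy as np

    def icc_anova(y, groups):
        """ANOVA-based estimate of the intraclass correlation: the proportion
        of total variance attributable to differences between groups
        (balanced-design approximation for group size)."""
        y = np.asarray(y, dtype=float)
        groups = np.asarray(groups)
        labels, counts = np.unique(groups, return_counts=True)
        N, G = len(y), len(labels)
        grand = y.mean()
        group_means = np.array([y[groups == g].mean() for g in labels])
        ss_b = np.sum(counts * (group_means - grand) ** 2)
        ss_w = sum(np.sum((y[groups == g] - y[groups == g].mean()) ** 2)
                   for g in labels)
        ms_b = ss_b / (G - 1)
        ms_w = ss_w / (N - G)
        n_bar = N / G                        # average group size
        var_b = max((ms_b - ms_w) / n_bar, 0.0)
        return var_b / (var_b + ms_w)

    # Hypothetical example: a student-level score clustered in 3 schools
    scores = np.array([50, 52, 48, 60, 61, 59, 70, 72, 68], dtype=float)
    school = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3])
    print(round(icc_anova(scores, school), 3))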
7.4 Incorporating Sampling Weights in Latent Variable Models

The models described earlier in this chapter account for the multilevel nature of organizations such as schools. So far, though, it has been assumed that random samples from each level of analysis have been obtained. However, it is not often the case that random samples from each level of the organization are obtained; rather, it is more likely that samples are taken from each level of the system with unequal probabilities, often due to the necessity of oversampling underrepresented units of analysis. Such complex sampling is common with large-scale national and international assessments. For example, in the Early Childhood Longitudinal Study (ECLS-K; NCES, 2001), a three-stage sampling design is employed. The first stage is primary sampling units of single counties or groups of counties. The second stage is schools within counties. The third stage of sampling is students within schools. A process of stratification as well as disproportionate sampling was employed in the first two stages of sampling. Disproportionate sampling was also employed at the third stage of sampling. Clearly, therefore, to obtain unbiased estimates of population parameters, some type of weighting scheme to reflect these design features must be used. The problem of using sampling weights in complex sample designs has been widely studied in the sampling literature (Kish, 1965; Kish & Frankel, 1974; Tryfos, 1996) and will not be discussed in detail in this chapter. Of concern to us, however, is that the issue of sampling weights applied in complex sample surveys has recently been discussed in the literature on structural equation modeling (Asparouhov, 2005; Kaplan & Ferguson, 1999; Stapleton, 2002). This section overviews the recent literature on the incorporation of sampling weights in the structural equation modeling framework and distills the important findings and recommendations that are relevant for the application of structural equation modeling to complex sample surveys. The first systematic study of sampling weights in the structural equation modeling framework was conducted by Kaplan and Ferguson (1999). In their study, Kaplan and Ferguson considered the case of single-sample factor analysis applied to a simple random sample taken from a population with a mixture of strata of different sizes. In their design, Kaplan and Ferguson assumed that the size of the population and the size of the strata were known to the investigator, but due to the unequal strata sizes, it is necessary to apply sample weights.
To motivate the central ideas in the Kaplan and Ferguson (1999) study, note first that a weight wi is simply the inverse of the probability of sample selection—namely,

wi = 1/pi,
[7.15]
where pi is the probability of sample selection. With pi = n/N, where n is the size of the strata and N is the population size, the weights sum to the population sample size N. The weighting scheme studied by Kaplan and Ferguson (1999) was based on the Horvitz-Thompson estimator (Horvitz & Thompson, 1952). The central idea of the Horvitz-Thompson estimator is that when the inclusion probabilities are known, raw sampling weights can be computed and applied to analyses so that unbiased estimates of population parameters can be obtained. A well-known disadvantage to the use of raw sampling weights is that they sum to the population sample size. Clearly, this would have profound effects on the size of standard errors, but in the latent variable context, there would also be profound inflation of goodness-of-fit indices based on the likelihood ratio chi-square. Therefore, it may be preferable to use normalized sampling weights that sum to the actual sample size. In their analysis, Kaplan and Ferguson (1999) compared raw sampling weights to normalized sampling weights using a factor analysis model. Specifically, they employed the PRELIS software program (Jöreskog & Sörbom, 2000) to compute weighted variances and covariances that followed a factor analysis model. The weighted variances and covariances are calculated as Varw =
P
i
wi ðxi − xi Þ2 P i Wi
[7.16]
wi ðxi − xi Þðyi − yi Þ P , i wi
[7.17]
and Covw ðx, yÞ =
P
i
respectively. Normalized weights can also be calculated. A bootstrap design was used by Kaplan and Ferguson to examine the impact of sampling weights in a single group factor analysis framework. The results of the Kaplan and Ferguson study showed that ignoring sampling
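To make these formulas concrete, the following minimal Python sketch (an editorial illustration rather than the PRELIS implementation used by Kaplan and Ferguson; the inclusion probabilities and data are hypothetical) computes raw and normalized weights and a weighted covariance:

```python
import numpy as np

def weighted_cov(x, y, w):
    """Weighted covariance of x and y following Equation [7.17];
    with x == y this reduces to the weighted variance of Equation [7.16]."""
    xbar = np.sum(w * x) / np.sum(w)
    ybar = np.sum(w * y) / np.sum(w)
    return np.sum(w * (x - xbar) * (y - ybar)) / np.sum(w)

# Hypothetical inclusion probabilities for a small sample of n = 5 observations
p = np.array([0.01, 0.01, 0.05, 0.05, 0.10])
raw_w = 1.0 / p                              # raw weights (Equation [7.15]); sum to the population scale
norm_w = raw_w * len(raw_w) / raw_w.sum()    # normalized weights; sum to the sample size n

x = np.array([1.2, 0.7, -0.3, 0.1, 0.9])
y = np.array([0.8, 0.5, -0.1, 0.3, 1.1])

print(weighted_cov(x, y, raw_w), weighted_cov(x, y, norm_w))
```

Note that the weighted covariance itself is unchanged by normalization; what changes is the implied sample size that enters standard errors and the likelihood ratio chi-square.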
A bootstrap design was used by Kaplan and Ferguson to examine the impact of sampling weights in a single-group factor analysis framework. The results of the Kaplan and Ferguson study showed that ignoring sampling weights led to much greater bias in the population parameter values compared with when either raw or normalized sampling weights were employed. Standard errors appeared to be somewhat underestimated when sampling weights were employed. Interestingly, Kaplan and Ferguson found that although the normalized weighting procedure yielded likelihood ratio chi-square values close to the true value, the remaining goodness-of-fit indices showed no discernible pattern regardless of weighting. Kaplan and Ferguson concluded that in terms of applications of latent variable modeling, incorporating sampling weights should be routine practice when the weights are provided or otherwise known. Weighting is crucial for accurate inferences in latent variable models, with normalized sampling weights showing promise with regard to standard errors.

Although the Kaplan and Ferguson (1999) study may have been the first to systematically examine the use of sampling weights in the latent variable modeling situation, their approach did not consider sampling weights in multilevel structural equation models of the sort discussed in Sections 7.2 and 7.3. Rather, more recent studies have extended that work to multilevel structural equation modeling. A recent study by Stapleton (2002) provided a systematic examination of sampling weights employed in multilevel structural equation models. Stapleton focused on three forms of weighting based on work by Potthoff, Woodbury, and Manton (1992). The first form of weighting employs standard raw weights and produces a sampling variance based on the population size. The second form of weighting employs relative weights that sum to the actual sample size but have been shown to yield downward bias in estimates of sampling variance (Potthoff et al., 1992). The third weighting scheme produces weights that sum to the effective sample size and has been shown by Potthoff et al. to yield unbiased estimates of the sampling variance of the mean. Stapleton argued that the use of the effective sample size weights may address a conjecture by Kaplan and Ferguson that the underestimation of standard errors using relative weights was due to not adjusting the standard errors in the process of ML estimation. In a detailed simulation study using a prototype multilevel structural equation model, Stapleton (2002) corroborated the findings of Kaplan and Ferguson (1999) regarding raw and relative sampling weights but found that effective sampling weights yielded relatively robust estimates without the need to adjust standard errors.
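The three weighting schemes differ only in how the raw weights are rescaled. A minimal sketch of the scalings commonly associated with this literature is given below; the exact formulas (relative weights scaled to sum to n, effective sample size weights scaled to sum to (Σw)²/Σw²) are stated here as an assumption rather than quoted from Potthoff et al. (1992):

```python
import numpy as np

def scale_weights(raw_w, method="effective"):
    """Rescale raw sampling weights.
    'raw'       -> weights left on the population scale
    'relative'  -> weights rescaled to sum to the actual sample size n
    'effective' -> weights rescaled to sum to the effective sample size (sum w)^2 / sum w^2
    """
    raw_w = np.asarray(raw_w, dtype=float)
    if method == "raw":
        return raw_w
    if method == "relative":
        return raw_w * len(raw_w) / raw_w.sum()
    if method == "effective":
        return raw_w * raw_w.sum() / np.sum(raw_w ** 2)
    raise ValueError(method)

raw_w = 1.0 / np.array([0.01, 0.01, 0.05, 0.05, 0.10])   # hypothetical raw weights
for m in ("raw", "relative", "effective"):
    print(m, scale_weights(raw_w, m).sum())
```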
A more recent study by Stapleton (2006) examined five approaches for obtaining robust estimates and standard errors when structural equation modeling is applied to data from complex sampling designs. The approaches included (a) robust maximum likelihood estimation ignoring stratification or clustering, (b) robust maximum likelihood with manual adjustment of standard errors using the square root of the average design effect for each variable in the analysis, (c) calculation of a weighted sample covariance matrix using design-effect adjusted weights, (d) Taylor linearization with pseudo-maximum likelihood estimation but with only the cluster-level identifier included in the analysis, and (e) linearization and pseudo-maximum likelihood with cluster and strata identifiers included in the analysis.

The aforementioned five methods were examined under six different sampling designs that correspond to the types of large-scale designs found in education, sociology, and the health sciences. The first design was simple random sampling. The second design was stratified random sampling with equal probability of selection within each stratum. The third design was stratified random sampling with unequal selection probabilities. The fourth design was two-stage simple random sampling. The fifth design was two-stage complex sampling, which included unequal probabilities of selection at each stage. Finally, the sixth design was three-stage complex sampling with disproportionate selection and stratification at each level.

Within the context of a comprehensive Monte Carlo study, Stapleton (2006) concluded that, in general, the use of normalized sample weights with complex sampling designs resulted in unbiased estimates of population parameters but negatively biased standard errors. Manual adjustment of the standard errors tended to result in over-inflation. Stapleton also concluded that design-effect adjusted weights were not recommended for structural equation modeling applications. With this approach, measurement and structural parameters were found to be unbiased, but residual and disturbance variances were found to be negatively biased. Standard errors were also biased, and chi-square tests were underestimated, leading to acceptance of the null hypothesis of model fit too often. Stapleton found that the linearization methods provided robust estimates and standard errors and recommended them when structural equation modeling is applied to complex sampling designs. Fortunately, these methods are available in Mplus and also in LISREL.

7.4.1 AN EXAMPLE USING SAMPLING WEIGHTS

This example reanalyzes the single-level CFA given earlier but incorporates student-level sampling weights. The results are given in Table 7.3. We find that there are noticeable differences in the parameter estimates but minor differences in the standard errors. Goodness-of-fit statistics show slightly better fit compared with the unweighted results. Nevertheless, in this example, the differences are not extreme, perhaps reflecting the nature of the sampling design for the South Korean sample.
Table 7.3  Results of Confirmatory Factor Analysis (CFA) of PISA 2003 Mathematics Assessment Using Student Sampling Weights

Entries are estimates with standard errors in parentheses; the first two columns are the single-level CFA without weights, and the last two columns are the single-level CFA with weights.

| Parameter | Without weights: no predictors | Without weights: with predictors | With weights: no predictors | With weights: with predictors |
|---|---|---|---|---|
| Within-School Model: Calculating Mathematics | | | | |
| Train timetable | 1.000 (0.000) | 1.000 (0.000) | 1.000 (0.000) | 1.000 (0.000) |
| Discount % | 1.187* (0.022) | 1.190* (0.022) | 1.179* (0.023) | 1.182* (0.023) |
| Size (m2) of a floor | 1.140* (0.023) | 1.140* (0.023) | 1.141* (0.022) | 1.140* (0.022) |
| Graphs in newspaper | 0.909* (0.021) | 0.908* (0.021) | 0.904* (0.021) | 0.903* (0.021) |
| Distance on a map | 1.184* (0.028) | 1.185* (0.028) | 1.185* (0.028) | 1.189* (0.028) |
| Petrol consumption rate | 0.881* (0.022) | 0.883* (0.022) | 0.880* (0.022) | 0.884* (0.022) |
| Solving Equations | | | | |
| 3x + 5 = 17 | 1.000 (0.000) | 1.000 (0.000) | 1.000 (0.000) | 1.000 (0.000) |
| 2(x + 3) = (x + 3)(x − 3) | 1.060* (0.015) | 1.059* (0.015) | 1.060* (0.016) | 1.060* (0.015) |
| Calculating Mathematics on MALE | — | −0.133* (0.016) | — | −0.138* (0.016) |
| Solving Equations on MALE | — | −0.039 (0.023) | — | −0.046 (0.023) |
| Factor Covariances | | | | |
| Calculating Mathematics with Solving Equations | 0.286* (0.009) | 0.284* (0.009) | 0.284* (0.009) | 0.282* (0.009) |
| Model Fit Indices | | | | |
| χ2 | 456.250 (19 df) | 526.500 (25 df) | 432.962 (19 df) | 500.190 (25 df) |
| AIC | 86593.8 | 95130.6 | 87553.9 | 95164.1 |
| BIC | 86758.6 | 95308.9 | 87718.9 | 95342.3 |

NOTE: SE = standard error; AIC = Akaike information criterion; BIC = Bayesian information criterion. *Values are statistically significant at p < .05.
7.5 Conclusion

This chapter provided a review of past studies and recent developments in multilevel latent variable modeling for continuous and categorical latent variables. In the interest of space, some topics were not addressed that are
nevertheless important. These include multilevel latent variable models for complex sampling designs and multilevel mixture models.

To an important extent, multilevel latent variable models address one aspect of a complex sampling design—namely, the sampling that results in nested data. The national and international databases that were used in this chapter possess a nested data structure, where the sampling design is developed to reflect the natural organizational structure of the system under investigation—such as schools. However, these databases also possess additional complexities related to the sampling design. Specifically, in many cases, there is oversampling of under-represented units in the population. For example, the Early Childhood Longitudinal Study (NCES, 2001) uses a multistage sampling design with a very complex weighting scheme that addresses, among other things, nonresponse, oversampling of Asian and Pacific Islander students, and children who move from one school to another. As a consequence of this complexity, sampling weights must be employed in order to properly address the unequal probabilities of selection into the sample. For a review of these issues in the context of structural equation modeling, see Kaplan and Ferguson (1999), Stapleton (2002), and Asparouhov (2005). Mplus can implement sampling weights, and it is clear from the extant research that the complex sampling design must be taken into account when estimating the parameters of a multilevel structural equation model.

This chapter focused on the estimation framework embedded in the Mplus software program. In addition to this framework, Rabe-Hesketh, Skrondal, and Pickles (2004) developed a generalized linear latent and mixed models (GLLAMM) framework for multilevel latent variable modeling that extends generalized linear mixed models (GLMMs) to the latent variable case. The GLLAMM approach is not based on structuring the within- and between-groups covariance matrices. Rather, GLLAMM adopts a univariate approach that specifies a response model, a structural model for the latent variables, and the distribution of the latent variables. The GLLAMM framework, which is implemented in the software program Stata, can handle (a) an arbitrary number of levels, (b) missing data under missing-at-random and not-missing-at-random assumptions, (c) unbalanced designs, (d) random coefficients with unbalanced covariates, (e) flexibility of factor structures including free factor loadings, (f) regressions among latent variables that vary at different levels, and (g) a large variety of response functions, including ordered categorical responses, counts, and mixed response types.

Recent developments have extended multilevel structural equation modeling to cases with categorical latent variables, including multilevel latent class analysis and multilevel Markov chain models. These extensions rest on merging multilevel modeling ideas with finite mixture models. A demonstration of these methods using data from the Early Childhood Longitudinal Study was
recently given in Kaplan, Kim, and Kim (in press). However, the linkage of multilevel latent variable models with finite mixture modeling is richer than that considered in the Kaplan et al. chapter—allowing for models of, say, students nested within schools, but where there might exist unobserved heterogeneity among schools that can be captured by finite mixture modeling. In the final analysis, multilevel latent variable modeling and its special cases provide a natural framework for cross-sectional studies. In the next chapter, we consider latent growth curve modeling for longitudinal data.
Notes 1. This point is related to a specific counterfactual model of causality based on manipulability theory (see Woodward, 2003). 2. The math achievement variable is calculated using plausible value methodology, in which five plausible values are obtained from a posterior distribution of latent math ability. We used the first plausible value for this analysis.
8 Latent Growth Curve Modeling
Thus far, the examples used to motivate the utility of structural equation modeling have been based on cross-sectional data. Specifically, it has been assumed that the data have been obtained from a sample of individuals measured at one point in time. Although it may be argued that most applications of structural equation modeling are applied to cross-sectional data, it can also be argued that most social and behavioral processes under investigation are dynamic, that is, changing over time. In this case, cross-sectional data constitute only a snapshot of an ongoing dynamic process, and interest might naturally center on the study of this process. Increasingly, social scientists have access to longitudinal data that can provide insights into how outcomes of interest change over time. Indeed, many important data sets now exist that are derived from panel studies (e.g., NCES, 1988; NELS:88; the National Longitudinal Study; the Longitudinal Study of American Youth; and the Early Childhood Longitudinal Study, to name a few). Access to longitudinal data allows researchers to address an important class of substantive questions—namely, the growth and development of social and behavioral outcomes over time. For example, interest may center on the development of mathematical competencies in young children (Jordan, Hanich, & Kaplan, 2003a, 2003b; Jordan, Kaplan, & Hanich, 2002). Or, interest may center on growth in science proficiency over the middle school years. Moreover, in both cases, interest may focus on predictors of individual growth that are assumed to be invariant across time (e.g., gender) or that vary across time (e.g., a student's absenteeism rate during a school year).

This chapter considers the methodology of growth curve modeling—a procedure that has been advocated for many years by researchers such as Raudenbush and Bryk (2002); Rogosa, Brandt, and Zimowski (1982); and Willett (1988) for the study of intraindividual differences in change (see also Willett & Sayer, 1994). The chapter is organized as follows. First, we consider
growth curve modeling as a general multilevel problem. This is followed by the specification of a growth curve model as a latent variable structural equation model. In this section, we consider the problem of how time is measured and incorporated into the model. The next section considers the addition of predictors into the latent growth curve model, as well as using the growth parameters as predictors of proximal and distal outcomes. This is followed by a discussion of growth curve modeling extensions that accommodate multivariate outcomes, nonlinear curve fitting, and autoregressive structures. This chapter will not consider other important issues in the application of structural equation modeling to dynamic data. In particular, we will not consider the stationarity of factors in longitudinal factor analysis (e.g., Tisak & Meredith, 1990), nor will we consider recent developments in the merging of time-series models and structural equation models (e.g., Hershberger, Molenaar, & Corneal, 1996). For a detailed account of growth curve modeling, see Bollen and Curran (2006).
8.1 Growth Curve Modeling: A Motivating Example and Basic Ideas

To motivate the development of growth curve modeling, let us revisit the input-process-output model in Figure 1.2. A criticism of the input-process-output model, as diagrammed in Figure 1.2, is that it suggests a static educational system rather than a system that is inherently dynamic. For example, the outcomes of achievement and attitudes are, arguably, constructs that develop and change over time. Therefore, it may be of interest to adopt a dynamic perspective and ask how outcomes change over time and how those changes are influenced by time-invariant and time-varying features of the educational system. In addition to examining the change in any one of these outcomes over time, it may be of interest to examine how two or more outcomes change together over time. For the purposes of the example that will be used throughout this chapter, we study change in science achievement and science attitudes separately and together.

To set the framework for this application, Figure 8.1 shows the empirical trajectories for 50 randomly chosen students on the science achievement assessment over the five waves of LSAY. The figure shows considerable variability in both level and trend in science achievement over the waves of LSAY. Figure 8.2 shows the general trend in science attitudes over the five grade levels. Unlike achievement in science, attitudes toward science show a general linear decline over time. The advantage of growth curve modeling is that we can obtain an estimate of the initial level of science
achievement and the rate of change over time and link these parameters of growth to time-varying and time-invariant variables. In this example, such predictors will include student gender as well as teacher and parental push variables. However, in addition to applying univariate growth curve models, we also examine how these outcomes vary together in a multivariate growth curve application.

Figure 8.1  Fifty Random Science Achievement Observed Trajectories (IRT science achievement scores plotted against the LSAY waves)

Figure 8.2  Fifty Random Science Attitude Observed Trajectories (science attitude scores plotted against the LSAY waves)
8.2 Growth Curve Modeling From the Multilevel Modeling Perspective

The specification of growth models can be viewed as falling within the class of multilevel linear models (Raudenbush & Bryk, 2002), where Level 1 represents intraindividual differences in initial status and growth, and Level 2 models individual initial status and growth parameters as a function of interindividual differences. To fix ideas, consider a growth model for a continuous variable such as science achievement. We can write a Level-1 equation expressing outcomes over time within an individual as

y_{ip} = \pi_{0p} + \pi_{1p} t_i + \varepsilon_{ip},    [8.1]

where y_{ip} is the achievement score for person p at time i, \pi_{0p} represents the initial status at time t = 0, \pi_{1p} represents the growth trajectory, t_i represents a temporal dimension that here is assumed to be the same for all individuals—such as grade level—and \varepsilon_{ip} is the disturbance term. Later in this chapter, we consider more flexible alternatives to specifying time metrics. Quadratic growth can also be incorporated into the model by extending the specification as

y_{ip} = \pi_{0p} + \pi_{1p} t_i + \pi_{2p} t_i^2 + \varepsilon_{ip},    [8.2]

where \pi_{2p} captures the curvilinearity of the growth trajectory. Higher-order terms can also be incorporated. In Section 8.4.2, we explore an alternative to the quadratic growth model in Equation [8.2] by allowing for general nonlinear curve fitting. The specification of Equations [8.1] and [8.2] can be further extended to handle predictors of individual differences in the initial status and growth trajectory parameters. In the terminology of multilevel modeling, individuals would be considered Level-2 units of analysis. In this case, two models are specified, one for the initial status parameter and one for the growth trajectory parameter. Consider, for example, a single time-invariant predictor of initial status and growth for person p, denoted as x_p. An example of such a predictor might be the socioeconomic status of the student. Then, the Level-2 model can be written as

\pi_{0p} = \mu_{\pi_0} + \gamma_{\pi_0} x_p + \zeta_{0p}    [8.3]

and

\pi_{1p} = \mu_{\pi_1} + \gamma_{\pi_1} x_p + \zeta_{1p},    [8.4]
where \mu_{\pi_0} and \mu_{\pi_1} are intercept parameters representing population true initial status and population growth when x_p is zero, and \gamma_{\pi_0} and \gamma_{\pi_1} are slopes relating x_p to initial status and growth, respectively. The model specified above can be further extended to allow individuals to be nested in groups such as classrooms. In this case, classrooms become a Level-3 unit of analysis. Finally, the model can incorporate time-varying predictors of change. In the science achievement example, such a time-varying predictor might be changes in parental push or changes in attitudes toward science over time. Thus, this model can be used to study such issues as the influence of classroom-level characteristics and student-level time-invariant and time-varying characteristics on initial status and growth in science achievement over time.
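As an editorial illustration of Equations [8.1], [8.3], and [8.4], the following Python sketch generates data from a two-level growth model; the parameter values, the sample size, and the use of a binary person-level predictor are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population values for Equations [8.3] and [8.4]
mu_pi0, mu_pi1 = 50.0, 2.0          # population initial status and growth
gamma_pi0, gamma_pi1 = 0.7, -0.1    # effects of a person-level predictor x_p
n_persons, times = 500, np.arange(5)    # five equally spaced waves, time centered at t = 0

x = rng.binomial(1, 0.5, n_persons)                              # e.g., a binary predictor such as gender
pi0 = mu_pi0 + gamma_pi0 * x + rng.normal(0, 3.0, n_persons)     # Level-2 model, Eq. [8.3]
pi1 = mu_pi1 + gamma_pi1 * x + rng.normal(0, 0.5, n_persons)     # Level-2 model, Eq. [8.4]

# Level-1 growth records, Eq. [8.1]: y_ip = pi0p + pi1p * t_i + e_ip
y = pi0[:, None] + pi1[:, None] * times[None, :] + rng.normal(0, 2.0, (n_persons, len(times)))

print(y.mean(axis=0))   # average trajectory across persons
```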
8.3 Growth Curve Modeling From the Structural Modeling Perspective

Research by B. Muthén (1991) and Willett and Sayer (1994) has shown how the general growth model described in the previous section can also be incorporated into a structural equation modeling framework. In what follows, the specification proposed by Willett and Sayer (1994) is described. The broad details of the specification are provided; however, the reader is referred to Willett and Sayer's (1994) article for more detail. The Level-1 individual growth model can be written in the form of the factor analysis measurement model in Equation [4.24] of Chapter 4 as

y = \tau_y + \Lambda_y \eta + \varepsilon,    [8.5]
where y is a vector representing the empirical growth record for person p. For example, y could be science achievement scores for person p at the 7th, 8th, 9th, 10th, and 11th grades. In this specification, τy is an intercept vector with elements fixed to zero and Λy is a fixed matrix containing a column of ones and a column of constant time values. Assuming that time is centered at the seventh grade,1 the time constants would be 0, 1, 2, 3, and 4. The matrix η contains the initial status and growth rate parameters denoted as π0p and π1p, and the vector ε contains measurement errors, where it is assumed that Cov(ε) is a diagonal matrix of constant measurement error variances. Because this specification results in the initial status and growth parameters being absorbed into the latent variable vector η, which vary randomly over individuals, this model is sometimes referred to as a latent variable growth model (B. Muthén, 1991). The growth factors, as in the multilevel specification, are random variables.
Next, it is possible to use the standard structural model specification discussed in Chapter 4 to handle the Level-2 components of the growth model corresponding to Equations [8.3] and [8.4]. Considering the Level-2 model without the vector of predictor variables x, the model can be written as

\eta = \alpha + B\eta + \zeta,    [8.6]
where \eta is specified as before, \alpha contains the population initial status and growth parameters \mu_{\pi_0} and \mu_{\pi_1}, B is a null matrix, and \zeta is a vector of deviations of the parameters from their respective population means. Again, this specification has the effect of parameterizing the true population initial status parameter and growth parameter into the structural intercept vector \alpha. Finally, the covariance matrix of \zeta, denoted as \Psi, contains the variances and covariances of true initial status and growth. The Level-2 model given in Equation [8.6] does not contain predictor variables. The latent variable growth model can, however, be extended to include exogenous predictors of initial status and growth. Incorporating exogenous predictors requires using the x-measurement model of the sort described in Chapter 4. Specifically, the model is written as

x = \tau_x + \Lambda_x \xi + \delta,    [8.7]
where x is a vector of exogenous predictors, \tau_x contains the mean vector, \Lambda_x is an identity matrix, \xi contains the exogenous predictors deviated from their means, and \delta is a null vector. This specification has the effect of placing the centered exogenous variables in \xi (Willett & Sayer, 1994, p. 374). Finally, the full specification of the structural equation model given in Equation [4.1] can be used to model the predictors of true initial status and true growth, where, due to the centering of the exogenous predictors, the structural intercept vector \alpha retains its interpretation as the population mean vector of the individual initial status and growth parameters (Willett & Sayer, 1994, p. 375). An important feature of the structural equation modeling approach to growth curve modeling is its flexibility in handling structured errors. That is, the assumption of independent and homoscedastic errors can be relaxed, allowing for heteroscedasticity and autocorrelation. In the former case, heteroscedasticity can be incorporated by relaxing the equality constraints among the error variances in the diagonal of \Theta_\varepsilon. Autocorrelation can be incorporated into growth curve models by allowing free off-diagonal elements in \Theta_\varepsilon.
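The mean and covariance structure implied by this specification can be written down directly. The following short sketch (with hypothetical parameter values, five waves, and time centered at the first wave) computes the model-implied moments from Equations [8.5] and [8.6]:

```python
import numpy as np

# Lambda_y for five waves centered at the first wave: a column of ones (initial
# status) and the time scores 0-4 (growth rate), as in Equation [8.5].
Lam = np.column_stack([np.ones(5), np.arange(5)])

# Hypothetical structural parameters (Equation [8.6] with B a null matrix):
alpha = np.array([50.0, 2.0])                 # mean initial status and growth
Psi = np.array([[70.0, -2.5],
                [-2.5,  2.0]])                # variances/covariance of the growth factors
Theta = np.diag(np.full(5, 4.0))              # measurement error variances

mu_implied = Lam @ alpha                      # model-implied means
Sigma_implied = Lam @ Psi @ Lam.T + Theta     # model-implied covariance matrix

print(mu_implied)
print(Sigma_implied)
```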
8.3.1 AN EXAMPLE OF UNIVARIATE GROWTH CURVE MODELING

The data for this example come from the Longitudinal Study of American Youth (LSAY; Miller, Hoffer, Sucher, Brown, & Nelson, 1992).2 LSAY includes two sets of schools, a national probability sample of approximately 60 high schools (Cohort 1) and approximately 60 middle schools (Cohort 2). An average of 60 10th graders (Cohort 1) and 60 7th graders (Cohort 2) from each of the 60 high schools and middle schools have been followed since 1987, gathering information on students' family and school background, attitudes, and achievement. In addition to general background information, achievement and attitude measures were obtained. Achievement tests in science and mathematics were given to the students each year. The items for the mathematics and science achievement tests were drawn from the item pool of the 1986 National Assessment of Educational Progress (NAEP) tests (NAEP, 1986).

The measure of student attitudes toward science is based on a composite consisting of an equally weighted average of four attitudinal subscales, namely, interest, utility, ability, and anxiety. There are nine variables in this composite, for example, "I enjoy science"; "I enjoy my science class"; and so on. Variables were recoded so that high values indicate a positive attitude toward science. The composite is measured on a 0 to 20 metric. For the purposes of this example, we concentrate on the younger cohort, measured at grades 7, 8, 9, 10, and 11. In addition to science achievement test scores, we also include gender (male = 1) as a time-invariant predictor. Time-varying predictors include a measure of parent academic push (PAP) and student's science teacher push (STP). PAP is an equally weighted average of eight variables. Both student and parent responses are used in this composite. Questions asked of the students are related to parental encouragement for making good grades, doing homework, and interest in school activities. Questions asked of the parents were related to their knowledge of their child's performance, homework, and school projects. This composite is measured on a 0 to 10 metric. Although a composite measure of parent science push was available for Cohort 2, it was not used because the items composing this composite were not measured at all the time points. Science teacher push is a composite based on five student response variables referring to teacher encouragement of science. Response values for this composite range from 0 to 5. The sample size for this study was 3,116. Analyses used Mplus (L. Muthén & Muthén, 2006) under the assumption of multivariate normality of the data.
Missing data were handled by full information maximum likelihood imputation as discussed in Chapter 5. The analysis proceeds by assessing growth in science achievement and science attitudes separately, then together in a multivariate growth curve model.

Growth in Science Achievement. Column 1 of Table 8.1 presents the results of the linear growth curve model without predictors. A path diagram of this model is shown in Figure 8.3. This model is estimated allowing for heteroscedastic but non-autocorrelated disturbances. The initial status is set at seventh grade.

Table 8.1  Selected Results of Growth Curve Model of Science Achievement (maximum likelihood estimates)

| Effect | Model 1 (a) | Model 2 (b) | Model 3 (c) |
|---|---|---|---|
| Intercept | 50.507* | 50.632* | 47.042* |
| Slope | 2.207* | 1.813* | 1.810* |
| Var(intercept) | 71.665* | 68.935* | 67.755* |
| Var(slope) | 2.409* | 1.563* | 1.602* |
| r(intercept and slope) | −0.392* | −0.352* | −0.365 |
| Intercept on gender | | 0.667* | 0.736* |
| Slope on gender | | −0.078 | −0.083 |
| SCIACH1 on PAP1 | | | 0.344* |
| SCIACH2 on PAP2 | | | 0.340* |
| SCIACH3 on PAP3 | | | 0.483* |
| SCIACH4 on PAP4 | | | 0.377* |
| SCIACH5 on PAP5 | | | 0.278* |
| SCIACH1 on STP1 | | | 0.109 |
| SCIACH2 on STP2 | | | 0.161 |
| SCIACH3 on STP3 | | | 0.556* |
| SCIACH4 on STP4 | | | 0.495* |
| SCIACH5 on STP5 | | | 0.293* |
| BIC | 111359.030 | 116079.778 | 227152.878 |

a. Linear growth curve model—no covariates. b. Linear growth curve model—gender as time-invariant covariate. c. Linear growth curve model—gender as time-invariant covariate; parent academic push (PAP) and science teacher push (STP) as time-varying covariates. *p < .05.
Figure 8.3  Initial Growth Curve Model of Science Achievement (path diagram: SCIACH1–SCIACH5 loading on an initial status factor with loadings fixed to 1 and on a growth rate factor with loadings fixed to 0, 1, 2, 3, and 4)
The results indicate that the average seventh grade science achievement score is 50.51 and increases an average of 2.21 points a year. The correlation between the initial status and rate of change is negative, suggesting the possibility of a ceiling effect. Figure 8.1 presents a random sample of 50 model-estimated science achievement trajectories. Column 2 of Table 8.1 presents the results of the linear growth curve model with gender as a time-invariant predictor of initial status and growth rate. A path diagram of this model is shown in Figure 8.4. The results indicate a significant difference in favor of boys for seventh grade science achievement, but no significant difference between boys and girls in the rate of change over the five grades. Column 3 of Table 8.1 presents the results of the linear growth curve model with the time-varying covariates of PAP and STP included. The results for gender remain the same. A path diagram of this model is shown in Figure 8.5. The results for the time-varying covariates suggest that early PAP is a stronger predictor of early science achievement compared with STP. However, the effects of both time-varying covariates balance out at the later grades.
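As a small worked illustration, the Model 1 estimates imply an average trajectory of 50.507 + 2.207t for t = 0, ..., 4 (grades 7–11). The short sketch below computes these model-implied means, which reappear later as the Model 1 predicted means in Table 8.7:

```python
# Model-implied average trajectory from Model 1 of Table 8.1
# (intercept 50.507, slope 2.207, time coded 0-4 for grades 7-11).
intercept, slope = 50.507, 2.207
for grade, t in zip(range(7, 12), range(5)):
    print(grade, round(intercept + slope * t, 3))
```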
Figure 8.4  Growth Curve Model of Science Achievement With Time-Invariant Predictors (path diagram: the model of Figure 8.3 with gender predicting the initial status and growth rate factors)
Growth in Attitudes Toward Science. Column 1 of Table 8.2 shows the results of the simple linear growth curve model applied to the science attitude data. Path diagrams for this and the remaining models are not shown. The results show a seventh grade average attitude score of 14.25 points (on a scale of 1 to 20) and a small but significant decline over time. Moreover, a strong negative correlation can be observed between initial science attitudes and the change over time. This suggests, as with achievement, that higher initial attitudes are associated with slower change in attitudes over time.
Figure 8.5  Growth Curve Model of Science Achievement With Time-Invariant and Time-Varying Predictors (path diagram: the model of Figure 8.4 with PAP1–PAP5 and STP1–STP5 as time-varying predictors of SCIACH1–SCIACH5)
Column 2 of Table 8.2 examines sex differences in initial seventh grade science attitudes and sex differences in the rate of decline over time. As with science achievement, we observe initial differences in attitudes at seventh grade, with boys exhibiting significantly higher positive attitudes compared with girls. However, there appear to be no sex differences in the rate of attitude change over time.
Table 8.2  Selected Results of Growth Curve Model of Attitudes Toward Science (maximum likelihood estimates)

| Effect | Model 1 (a) | Model 2 (b) | Model 3 (c) |
|---|---|---|---|
| Intercept | 14.251* | 14.058* | 12.018* |
| Slope | −0.095* | −0.094* | 0.043 |
| Var(intercept) | 3.422* | 3.388* | 2.889* |
| Var(slope) | 0.121* | 0.121* | 0.107* |
| r(intercept and slope) | −0.578* | −0.578* | −0.564* |
| Intercept on gender | | 0.369* | 0.413* |
| Slope on gender | | −0.003 | −0.009 |
| ATTITUDE1 on PAP1 | | | 0.148* |
| ATTITUDE2 on PAP2 | | | 0.113* |
| ATTITUDE3 on PAP3 | | | 0.086* |
| ATTITUDE4 on PAP4 | | | 0.077* |
| ATTITUDE5 on PAP5 | | | 0.091* |
| ATTITUDE1 on STP1 | | | 0.284* |
| ATTITUDE2 on STP2 | | | 0.330* |
| ATTITUDE3 on STP3 | | | 0.332* |
| ATTITUDE4 on STP4 | | | 0.312* |
| ATTITUDE5 on STP5 | | | 0.218* |
| BIC | 69089.797 | 73588.288 | 184562.117 |

a. Linear growth curve model—no covariates. b. Linear growth curve model—gender as time-invariant covariate. c. Linear growth curve model—gender as time-invariant covariate; parent academic push (PAP) and science teacher push (STP) as time-varying covariates. *p < .05.
Column 3 of Table 8.2 adds the time-varying covariates to the model in Column 2. The results here are somewhat different from those found for achievement. Specifically, we observe that PAP is a relatively weak predictor of science attitudes compared with STP. Moreover, an inspection of the correlations between sex and each of the time-varying predictors indicates whether sex differences are present in PAP and STP. The results indicate small and mostly nonsignificant sex differences in these time-varying covariates.
8.4 Extensions of the Basic Growth Curve Model

An important feature of growth curve modeling within the structural equation modeling perspective is its tremendous flexibility in handling a variety of different kinds of questions involving growth. In this section, we consider four important extensions of growth curve modeling. First, we consider multivariate growth curve modeling, including models for parallel and sequential processes. Second, we consider model extensions for nonlinear curve fitting. Third, we consider an extension that incorporates an autoregressive component into the model. Finally, we briefly consider some flexible alternatives for addressing the time metric. It should be noted that these extensions do not exhaust the range of analytical possibilities with growth curve modeling. For a more comprehensive treatment of the extensions of growth curve modeling, see Bollen and Curran (2006).

8.4.1 MULTIVARIATE GROWTH CURVE MODELING

Consider the case where an investigator wishes to assess the relationship between growth in mathematics and reading proficiency. It can be argued that these achievement domains are highly related. Indeed, one may argue that because measures of mathematics proficiency require reading proficiency, reading achievement might be a causal factor for growth in mathematics proficiency. For now, however, we are only interested in assessing how these domains change together. A relatively straightforward extension of the growth curve specification given in Equations [8.5] to [8.7] allows for the incorporation of multiple outcome measures (Willett & Sayer, 1996). Important information about growth in multiple domains arises from an inspection of the covariance matrix of η, denoted above as Ψ. Recall that in the case of univariate growth curve modeling, the matrix Ψ contains the covariance (or correlation) between the initial status parameter π0 and the growth parameter π1. In the multivariate extension, Ψ contains the measures of association among the initial status and growth rate parameters of each outcome. Thus, for example, we can assess the degree to which initial levels of reading proficiency are correlated with initial proficiency levels in mathematics and also the extent to which initial reading proficiency is correlated with the rates of growth in mathematics. We may also ask whether rates of growth in reading are correlated with rates of growth in mathematics. As in the univariate case, the multivariate case can be easily extended to include time-invariant and time-varying predictors of all the growth curve parameters. If both mathematics and reading proficiency are measured across the same time intervals, then we label this a parallel growth process. However, an
interesting additional extension of multivariate growth curve modeling allows the developmental process of one domain to predict the developmental process of a later occurring outcome (see, e.g., B. Muthén & Curran, 1997). For example, one might argue that development in reading proficiency in first, second, and third grades predicts the development of science achievement in fourth, fifth, and sixth grades. For this extension, the decision where to center the level of the process is crucial. One could choose to center initial reading proficiency at first grade and initial science proficiency at fourth grade. However, it may be the case that reading proficiency at first grade shows little variation and thus may not be a useful predictor of initial science proficiency. Perhaps a more sensible strategy would be to center initial reading proficiency at the third grade and center initial science proficiency at fourth grade. One might expect more variation in reading proficiency at the third grade, and this variation might be more predictive of science proficiency at the fourth grade. As in the univariate case, the issue of centering will most often be based on substantive considerations.

An Example of Multivariate Growth Curve Modeling

An inspection of Figures 8.1 and 8.2 suggests the need to study changes in science achievement and science attitudes together. In the interest of space, we fit the full time-invariant and time-varying model to the achievement and attitude data in one analysis. A path diagram for this model is not shown. The results are shown in Table 8.3. The results generally replicate those of the univariate analyses, and in the interest of space, the time-varying covariate results are not shown. However, it is important to focus on the correlations between the growth parameters for achievement and attitudes. The results indicate a positive correlation between seventh grade science achievement and seventh grade science attitudes (r = 0.458). Moreover, we observe that higher rates of growth in science achievement are associated with higher rates of growth in attitudes toward science (r = 0.381). An apparent contradiction arises when considering the negative correlation between initial science achievement and rate of change in science attitudes. Again, an explanation might be a ceiling effect, insofar as higher achievement scores are associated with higher attitudes and therefore attitudes toward science cannot change much more.
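For reference, the growth-factor correlations reported in Table 8.3 below can be collected into the correlation metric of the joint Ψ matrix for the two parallel processes; the following short sketch simply assembles that matrix (the variable names are editorial):

```python
import numpy as np

# Correlations among the four growth factors as reported in Table 8.3
# (order: achievement intercept, achievement slope, attitude intercept, attitude slope).
factors = ["ach_int", "ach_slope", "att_int", "att_slope"]
R = np.array([
    [ 1.000, -0.394,  0.458, -0.231],
    [-0.394,  1.000, -0.178,  0.381],
    [ 0.458, -0.178,  1.000, -0.616],
    [-0.231,  0.381, -0.616,  1.000],
])

# e.g., initial achievement with growth in attitudes:
print(R[factors.index("ach_int"), factors.index("att_slope")])   # -0.231
```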
Table 8.3  Selected Results of Multivariate Growth Curve Model of Science Achievement and Attitudes Toward Science

| Effect | Maximum Likelihood Estimate |
|---|---|
| Ach. intercept | 46.749* |
| Ach. slope | 2.325* |
| Att. intercept | 12.066* |
| Att. slope | 0.064 |
| Var(ach. intercept) | 69.775* |
| Var(ach. slope) | 2.318* |
| Var(att. intercept) | 3.164* |
| Var(att. slope) | 0.167* |
| r(ach. intercept/att. intercept) | 0.458* |
| r(ach. intercept/att. slope) | −0.231* |
| r(ach. intercept/ach. slope) | −0.394* |
| r(att. intercept/att. slope) | −0.616* |
| r(att. intercept/ach. slope) | −0.178* |
| r(ach. slope/att. slope) | 0.381* |
| Ach. intercept on gender | 0.846* |
| Ach. slope on gender | −0.151 |
| Att. intercept on gender | 0.416* |
| Att. slope on gender | −0.012 |
| BIC | 295164.119 |

*p < .05.

8.4.2 NONLINEAR CURVE FITTING

In practical applications of growth curve modeling, it might be the case that a nonlinear curve better fits the data. An approach to nonlinear curve fitting, suggested by Meredith and Tisak (1990), entails freeing a set of the factor loadings associated with the slope. Specifically, considering the science achievement
model, the nonlinear curve fitting approach suggested by Meredith and Tisak would require that the first loading be fixed to zero to estimate the intercept and that the second loading be fixed to one to identify the metric of the slope factor, while the third through fifth loadings would be freely estimated. In this case, the time metrics are being empirically determined. When this type of model is estimated, it perhaps makes better sense to refer to the slope factor as a shape factor.

An Example of Nonlinear Curve Fitting

In this example, we estimate the science achievement growth model allowing estimation of a general shape factor. As suggested by Meredith and Tisak
(1990), we fix the first and second loadings as in the conventional growth curve modeling case and free the loadings associated with the third, fourth, and fifth waves of the study. The results are displayed in Table 8.4. It is clear from an inspection of Table 8.4 that the nonlinear curve fitting model results in a substantial improvement in model fit. Moreover, we find that there are significant sex differences with respect to the intercept in the nonlinear curve fitted model.
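Under the freed-loading specification, the model-implied mean at each wave is the mean intercept plus the mean of the shape factor multiplied by that wave's estimated loading. The sketch below illustrates this arithmetic using the Model 0 column of Table 8.4 (below); treat the particular pairing of estimates with Model 0 as an assumption of this illustration:

```python
# Model-implied mean trajectory under the freed-loading (shape factor) model,
# using the Model 0 estimates in Table 8.4: intercept mean 50.360, shape mean
# 1.693, and estimated shape loadings 0, 1, 3.351, 3.928, 5.089.
loadings = [0.000, 1.000, 3.351, 3.928, 5.089]
intercept_mean, shape_mean = 50.360, 1.693
means = [intercept_mean + shape_mean * lam for lam in loadings]
print([round(m, 2) for m in means])
```

The resulting values track the observed science achievement means reported in Table 8.7 quite closely, which is what one would expect when the time scores are estimated from the data.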
Table 8.4  Maximum Likelihood Estimates From Nonlinear Curve Fitting Models

| Parameter | Model 0 Estimates | Model 1 Estimates |
|---|---|---|
| Intercept by Ach1 | 1.000 | 1.000 |
| Intercept by Ach2 | 1.000 | 1.000 |
| Intercept by Ach3 | 1.000 | 1.000 |
| Intercept by Ach4 | 1.000 | 1.000 |
| Intercept by Ach5 | 1.000 | 1.000 |
| Shape by Ach1 | 0.000 | 0.000 |
| Shape by Ach2 | 1.000 | 1.000 |
| Shape by Ach3 | 3.351 | 3.299 |
| Shape by Ach4 | 3.928 | 3.869 |
| Shape by Ach5 | 5.089 | 5.004 |
| Ach. intercept | 50.360* | 49.966* |
| Ach. shape | 1.693* | 1.770* |
| r(shape, intercept) | −0.397 | −0.398* |
| Intercept on Male | | 0.737* |
| Shape on Male | | −0.091 |
| BIC | 111240.859 | 115769.718 |

*p < .05.

8.4.3 AUTOREGRESSIVE LATENT TRAJECTORY MODELS

Recently, Bollen and Curran (2004) and Curran and Bollen (2001) advocated the blending of an autoregressive structure into conventional growth curve modeling. They refer to this hybrid model as the autoregressive latent trajectory (ALT) model.
It is not difficult to make the case for specifying an ALT model for developmental research studies. Consider the example used throughout this chapter, where the focus is on modeling the development of science proficiency through the middle and high school years. We can imagine that interest centers on how change in science proficiency predicts later outcomes of educational relevance—such as majoring in science-related disciplines in college. It is not unreasonable, therefore, to assume that in addition to overall growth in science proficiency, prior science scores predict later science scores, thus suggesting an autoregressive structure. In the case of long periods between assessment waves, we might reasonably expect smaller autoregressive coefficients than with more closely spaced assessment waves. Nevertheless, if the ALT model represents the true data generating structure, then omission of the autoregressive part may lead to substantial parameter bias. A recent article by Sivo, Fan, and Witta (2005) found extensive bias for all parameters of the growth curve model as well as biases in measures of model fit when a true autoregressive component was omitted from the analysis.

For the purposes of this chapter, we focus on the baseline lag-1 ALT model with a time-invariant predictor. This will be referred to as the ALT(1) model. The ALT(1) specification indicates that the outcome at time t is predicted only by the outcome at time t − 1. It should be noted that lags greater than one can also be specified. As with conventional growth curve modeling, the ALT model can be extended to include more than one outcome, each having its own autoregressive structure, as well as extensions that include proximal or distal outcomes and time-varying and time-invariant predictors. To contextualize the specification, consider the example of an ALT model for the development of reading competencies in young children. The first model is a baseline lag-1 ALT model. This model can be written in structural equation modeling notation as

y = \alpha + \Lambda\eta + By + \delta,    [8.8]

\eta = \tau + \Gamma\eta + \zeta,    [8.9]
where y is a vector of repeated measures, \Lambda is a matrix of fixed coefficients that specify the growth parameters, \eta is a vector of growth parameters, B is a matrix of regression coefficients relating the repeated measures to each other, and \delta is a vector of residuals with covariance matrix Cov(\delta) = \Theta. A path diagram of the ALT(1) model is shown in Figure 8.6.
Figure 8.6  Autoregressive Latent Trajectory(1) [ALT(1)] Model of Science Achievement (path diagram: the growth model of Figure 8.3 with added lag-1 paths between adjacent science achievement scores)
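To convey the data-generating idea behind the ALT(1) specification, the following simplified simulation sketch (an editorial illustration with hypothetical parameter values, not the exact Bollen and Curran parameterization) combines latent intercept and slope factors with a lag-1 regression on the prior score:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simplified ALT(1)-style data generation: each score reflects the latent
# trajectory plus a lag-1 regression on the previous score. All values hypothetical.
n, T = 1000, 5
rho = 0.10                                   # lag-1 autoregressive coefficient
pi0 = rng.normal(50.0, 8.0, n)               # individual initial status
pi1 = rng.normal(2.0, 1.2, n)                # individual growth rate

y = np.empty((n, T))
y[:, 0] = pi0 + rng.normal(0, 2.0, n)        # first wave: trajectory only
for t in range(1, T):
    y[:, t] = pi0 + pi1 * t + rho * y[:, t - 1] + rng.normal(0, 2.0, n)

print(y.mean(axis=0))
```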
An Example of an ALT Model

For this example, we estimate an ALT(1) model for the science achievement scores from the LSAY example. Model 0 of Table 8.5 displays the results for the ALT(1) model without the addition of gender as a time-invariant predictor. It can be seen that the autocorrelation effects are small but statistically significant. Model 1 in Table 8.5 adds gender to the ALT model. The addition of gender in Model 1 appears to worsen the overall fit of the model, as evidenced by the increase in the BIC.
8.4.4 ALTERNATIVE METRICS OF TIME

Up to this point, we have assumed a highly restrictive structure for the data. Specifically, we have assumed that each wave of measurement is equidistant and that we have complete data on all units of analysis at each time point. In many cases, this assumption is too restrictive, and we need a way of handling more realistic time structures.
Table 8.5  Maximum Likelihood Estimates From Autoregressive Latent Trajectory Models

| Parameter | Model 0 Estimates | Model 1 Estimates |
|---|---|---|
| Ach5 ON Ach4 | 0.135* | 0.135* |
| Ach4 ON Ach3 | 0.103* | 0.103* |
| Ach3 ON Ach2 | 0.102* | 0.102* |
| Ach2 ON Ach1 | 0.031* | 0.031* |
| Ach. intercept | 50.335* | 49.911* |
| Ach. slope | 0.246* | 0.329* |
| r(slope, intercept) | −0.575* | −0.573* |
| Intercept on Male | | 0.814* |
| Slope on Male | | −0.156 |
| BIC | 111195.653 | 115723.216 |

*p < .05.
For example, in developmental research, the wave of assessment may not be nearly as important as the chronological age of the child. In this case, there might be quite a bit of variability in chronological ages at each wave of assessment. In other cases, the nature of the assessment design is such that each child has his or her own unique interval between testing. In this section, we introduce two approaches that demonstrate the flexibility in dealing with the time metric in longitudinal studies: the cohort sequential design and the individually varying metrics of time design.

The Cohort Sequential Design. In cohort sequential designs, we consider age cohorts within a particular time period (Bollen & Curran, 2006). Thus, at Wave 1 of the study, we may have children who vary in age from 5 years to 7 years. At Wave 2 of the study, we may have children varying in age from 7 years to 9 years, and so on. Notice that there is an overlap of ages at each wave. As Bollen and Curran (2006) point out, there are two ways that this type of data structure can be addressed. First, we can go back to treating wave as the metric of time and use age of respondent as a covariate in the study. The
second approach is to exploit the inherent missing data structure. In this case, we could arrange the data as shown in Table 8.6, patterned after Bollen and Curran (2006, p. 77). Notice that there are three cohorts and five time points. Any given child in this example can provide between one and four repeated measures. The pattern of missing data allows estimation using maximum likelihood imputation under the assumption of missing-at-random (Allison, 1987; Arbuckle, 1996; Muthén et al., 1987). Thus, the growth parameters spanning the entire time span can be estimated. As Bollen and Curran (2006) point out, however, this approach suffers from the potential of cohort effects. That is, children in Cohort 1 may have been 7 years old at the second wave of assessment, but children in Cohort 2 would have been 7 at the first wave of assessment.

Individually Varying Metrics of Time. Perhaps a more realistic situation arises when individuals have their own unique spacing of assessment waves. An example of this would be the situation where a researcher is collecting individual longitudinal assessments in schools. At the beginning of the semester, the researcher and his or her assistants begin data collection. Because it is probably not feasible that every child in every sampled school can be assessed on exactly the same day, the assessment times may spread over, say, a 2-week period. At the second wave of assessment, the first child assessed at Wave 1 is not necessarily the first child assessed at Wave 2. Indeed, in the worst case scenario, if the first child assessed at Wave 1 is the last child assessed at Wave 2, the length of time between assessments will be much greater than if the child is the last one assessed at Wave 1 and the first assessed at Wave 2. Although I have presented the extreme case, the consequences for a study of development, especially in young children, would be profound. A better approach is to mark the date of assessment for each child and use the time between assessments for each child as his or her own unique metric of time. Time can be measured in days, weeks, or months, with the decision based on developmental considerations.
Table 8.6  Cohort Sequential Data Structure (entries are age at each assessment)

| Cohort | Time 1 | Time 2 | Time 3 | Time 4 |
|---|---|---|---|---|
| 1 | 6 | 7 | 8 | 9 |
| 2 | 7 | 8 | 9 | 10 |
| 3 | 8 | 9 | 10 | 11 |
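Rearranged by age rather than by wave, the structure in Table 8.6 produces the missing-by-design pattern that maximum likelihood estimation exploits. The following short sketch simply makes that pattern explicit (ages and cohorts taken from Table 8.6):

```python
import numpy as np

# Age-based rearrangement of the cohort-sequential design in Table 8.6.
# Columns are ages 6-11; each cohort contributes observations only at the
# four ages at which it was assessed, and the rest are missing by design.
ages = np.arange(6, 12)
cohort_ages = {1: [6, 7, 8, 9], 2: [7, 8, 9, 10], 3: [8, 9, 10, 11]}

observed = np.array([[age in cohort_ages[c] for age in ages] for c in (1, 2, 3)])
print(ages)
print(observed.astype(int))   # 1 = assessed at that age, 0 = missing by design
```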
8.5 Evaluating Growth Curve Models Using Forecasting Statistics

It may be useful to consider whether there are aspects of model fit that are pertinent to the questions being addressed via the use of latent growth curve models. Clearly, we can apply traditional statistical and nonstatistical measures of fit, such as the likelihood ratio chi-square, RMSEA, NNFI, or the like. In many cases, the Bayesian information criterion is used to compare latent growth curve models as well. However, these measures of fit are capturing whether the restrictions that are placed on the data to provide estimates of the initial status and growth rate are supported by the data. In addition, these measures assess whether such assumptions as non-autocorrelated errors are supported by the data. The application of traditional statistical and nonstatistical measures of fit does provide useful information. However, because growth curve models provide estimates of rates of change, it may be useful to consider whether the model-predicted growth rate fits the empirical trajectory over time. So, for example, if we know how science achievement scores have changed over the five waves of LSAY, we may wish to know if our growth curve model accurately predicts the known growth rate. In the context of economic forecasting, this exercise is referred to as ex post simulation. The results of an ex post simulation exercise are particularly useful when the goal of modeling is to make forecasts of future values.

To evaluate the quality and utility of latent growth curve models, Kaplan and George (1998) studied the use of six different ex post (historical) simulation statistics originally proposed by Theil (1966) in the domain of econometric modeling. These statistics evaluate different aspects of the growth curve. The first of these statistics discussed by Kaplan and George was the root mean square simulation error (RMSSE), defined as

\mathrm{RMSSE} = \sqrt{\frac{1}{T}\sum_{t=1}^{T}\left(y_t^s - y_t^a\right)^2},    [8.10]

where T is the number of time periods, y_t^s is the simulated (i.e., predicted) value at time t, and y_t^a is the actual value at time t. The RMSSE provides a measure of the deviation of the simulated growth record from the actual growth record and is the measure most often used to evaluate simulation models (Pindyck & Rubinfeld, 1991). Another measure is the root mean square percent simulation error (RMSPE), which scales the RMSSE by the average size of the variable at time t.
The RMSPE is defined as

\mathrm{RMSPE} = \sqrt{\frac{1}{T}\sum_{t=1}^{T}\left(\frac{y_t^s - y_t^a}{y_t^a}\right)^2}.    [8.11]

A problem with the RMSPE is that its scale is arbitrary. Although the lower bound of the measure is zero, the upper bound is not constrained. Thus, it is of interest to scale the RMSSE to lie in the range of 0 to 1. A measure that lies between 0 and 1 is Theil's inequality coefficient, defined as

U = \frac{\sqrt{\frac{1}{T}\sum_{t=1}^{T}\left(y_t^s - y_t^a\right)^2}}{\sqrt{\frac{1}{T}\sum_{t=1}^{T}\left(y_t^s\right)^2} + \sqrt{\frac{1}{T}\sum_{t=1}^{T}\left(y_t^a\right)^2}}.    [8.12]

An inspection of Equation [8.12] shows that perfect fit of the simulated growth record to the actual growth record is indicated by a value of U = 0. However, if U = 1, the simulation adequacy is as poor as possible. An interesting feature of the inequality coefficient in Equation [8.12] is that it can be decomposed into components that provide different perspectives on the quality of simulation performance. The first component of Theil's U is the bias proportion, defined as

U^M = \frac{\left(\bar{y}^s - \bar{y}^a\right)^2}{\frac{1}{T}\sum_{t=1}^{T}\left(y_t^s - y_t^a\right)^2},    [8.13]

where \bar{y}^s and \bar{y}^a are the means of the simulated and actual growth record, respectively, calculated across the T time periods. The bias proportion provides a measure of systematic error because it considers deviations of average actual values from average simulated values (Pindyck & Rubinfeld, 1991). The ideal would be a value of U^M = 0. Values greater than 0.1 or 0.2 are considered problematic. Another component of Theil's U is the variance proportion, defined as

U^S = \frac{\left(\sigma_s - \sigma_a\right)^2}{\frac{1}{T}\sum_{t=1}^{T}\left(y_t^s - y_t^a\right)^2},    [8.14]
where \sigma_s and \sigma_a are the standard deviations of the respective growth records calculated across the T time periods. The variance proportion provides a measure of the extent to which the model tracks the variability in the growth record. If U^S is large, it suggests that the actual (or simulated) growth record varied a great deal while the simulated (or actual) growth record did not deviate by a comparable amount. A final measure based on the decomposition of the inequality coefficient is the covariance proportion, defined as

U^C = \frac{2\left(1 - \rho\right)\sigma_s \sigma_a}{\frac{1}{T}\sum_{t=1}^{T}\left(y_t^s - y_t^a\right)^2},    [8.15]

where \rho is the correlation coefficient between y_t^s and y_t^a. The covariance proportion U^C provides a measure of unsystematic error, that is, error that remains after having removed deviations from average values. The decomposition of U results in the relation

U^M + U^S + U^C = 1,    [8.16]

and an ideal result for a simulation model would be U^M = 0, U^S = 0, and U^C = 1. Values greater than zero for U^M and/or U^S are indicative of some problem with the model vis-à-vis tracking the empirical growth record.

8.5.1 COMPARISON OF STANDARD GOODNESS-OF-FIT TESTS AND FORECASTING STATISTICS FOR SCIENCE ACHIEVEMENT GROWTH MODEL

Table 8.7 displays the forecasting statistics described above for the science achievement model. It can be seen that the simple linear trend model (Model 1) demonstrates the best historical forecasting performance as measured by all six forecasting statistics. Model 2 incorporates the time-invariant predictor of gender. Here, it can be seen that historical forecasting performance worsens. When the time-varying predictors of teacher push and parent push are added to account for the variability in the growth curve, the historical forecasting performance improves, as measured by U^S, as expected. Figure 8.7 compares the observed growth in science achievement with the model-predicted growth curves.
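A compact Python sketch of the six statistics is given below; applied to the observed and Model 1 predicted means reported in Table 8.7, it reproduces the tabled values to within rounding (e.g., an RMSSE of 0.681):

```python
import numpy as np

def forecast_stats(simulated, actual):
    """Theil-type ex post simulation statistics (Equations [8.10]-[8.15])."""
    ys, ya = np.asarray(simulated, float), np.asarray(actual, float)
    mse = np.mean((ys - ya) ** 2)
    rmsse = np.sqrt(mse)
    rmspe = np.sqrt(np.mean(((ys - ya) / ya) ** 2))
    U = rmsse / (np.sqrt(np.mean(ys ** 2)) + np.sqrt(np.mean(ya ** 2)))
    s_s, s_a = ys.std(), ya.std()            # population (divide-by-T) standard deviations
    rho = np.corrcoef(ys, ya)[0, 1]
    UM = (ys.mean() - ya.mean()) ** 2 / mse  # bias proportion
    US = (s_s - s_a) ** 2 / mse              # variance proportion
    UC = 2 * (1 - rho) * s_s * s_a / mse     # covariance proportion
    return {"RMSSE": rmsse, "RMSPE": rmspe, "U": U, "UM": UM, "US": US, "UC": UC}

# Observed and Model 1 predicted science achievement means from Table 8.7
observed = [50.345, 52.037, 56.194, 56.840, 58.970]
model1   = [50.507, 52.714, 54.921, 57.128, 59.335]
print(forecast_stats(model1, observed))
```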
Table 8.7  Observed and Predicted Science Achievement Means and Forecasting Statistics for the Science Achievement Model

Observed and predicted means:
Grade   Observed Means   Model 1   Model 2   Model 3
7       50.345           50.507    50.591    49.953
8       52.037           52.714    52.404    51.748
9       56.194           54.921    54.217    55.722
10      56.840           57.128    56.030    56.511
11      58.970           59.335    57.843    57.021

Forecasting statistics:
          Model 1   Model 2   Model 3
RMSSE     0.681     1.098     0.935
RMSPSE    0.012     0.019     0.016
U         0.006     0.010     0.008
U^M       0.004     0.361     0.539
U^S       0.014     0.409     0.202
U^C       0.982     0.230     0.259
Figure 8.7  Observed Versus Model-Predicted Science Achievement Means (science achievement score plotted by grade, 7 through 11, for the observed means and Models 1–3)
8.6 Conclusion

This chapter focused on the extension of structural equation modeling to the study of growth and change. We outlined how growth can be considered a multilevel modeling problem, with intraindividual differences in students modeled at Level 1, individual differences modeled at Level 2, and individuals nested in groups modeled at Level 3. We discussed how this specification could be parameterized as a structural equation model and discussed how the general model could be applied to (a) the study of growth in multiple domains, (b) the study of binary outcomes, and (c) intervention studies. In addition to the basic specification, we also discussed approaches to the evaluation of growth curve models—focusing particularly on the potential of growth curve modeling for prediction and forecasting. We argued that growth curve modeling could be used to develop predictions of outcomes at future time points, and we discussed the use of econometric forecasting evaluation statistics as an alternative to more traditional forms of model evaluation.
Notes

1. Clearly, other choices of centering are possible. Centering will not affect the growth rate parameter but will affect the initial status parameter.

2. LSAY was a National Science Foundation funded national longitudinal study of middle and high school students. The goal of LSAY was to provide a description of students' attitudes toward science and mathematics focusing also on these areas as possible career choices (Miller et al., 1992, p. 1).
9 Structural Models for Categorical and Continuous Latent Variables
This chapter describes what can be reasonably considered the state of the art in structural equation modeling—namely, structural equation models that combine categorical and continuous latent variables for cross-sectional and longitudinal designs. The comprehensive modeling framework described in this chapter rests on the work of B. Muthén (2002, 2004), which builds on the foundations of finite mixture modeling (e.g., McLachlan & Peel, 2000) and conventional structural equation modeling for single and multiple groups as described in Chapter 4. It is beyond the scope of this chapter to describe every special case that can be accommodated by the general framework. Rather, this chapter touches on a few key methods that tie into many of the previous chapters.

The organization of this chapter is as follows. First, we set the stage for the applications of structural equation modeling for categorical and continuous latent variables with a brief review of finite mixture modeling and the expectation-maximization (EM) algorithm, following closely the discussion given in McLachlan and Peel (2000). This is followed by a discussion of applications of finite mixture modeling for categorical outcomes leading to latent class analysis and variants of Markov chain modeling. Next, we discuss applications of finite mixture modeling to the combination of continuous and categorical outcomes, leading to growth mixture modeling. We focus solely on growth mixture modeling because this methodology encompasses structural equation modeling, factor analysis, and growth curve modeling for continuous outcomes. The chapter closes with a brief overview of other extensions of the general framework that relate to previous chapters of this book.
9.1 A Brief Overview of Finite Mixture Modeling

The approach taken to specifying models that combine categorical and continuous latent variables is finite mixture modeling. Finite mixture modeling relaxes the assumption that a sample is drawn from a population characterized by a single set of parameters. Rather, finite mixture modeling assumes that the population is composed of a mixture of unobserved subpopulations characterized by their own unique set of parameters. To fix notation, let z = (z_1', z_2', \ldots, z_n')' denote the realized values of a p-dimensional random vector Z = (Z_1', Z_2', \ldots, Z_n')' based on a random sample of size n. An element Z_i of the vector Z has an associated probability density function f(z_i). Next, define the finite mixture density as

f(z_i) = \sum_{k=1}^{K} \pi_k f_k(z_i), \quad (i = 1, 2, \ldots, n; \; k = 1, 2, \ldots, K), \qquad [9.1]
where f_k(z_i) are component densities with mixing proportions \pi_k (0 \le \pi_k \le 1) and \sum_{k=1}^{K} \pi_k = 1. It may be instructive to consider how data are generated from a K-class finite mixture model.1 Following McLachlan and Peel (2000), consider a categorical random variable C_i, referred to here as a class label, which takes on values 1, 2, \ldots, K with associated probabilities \pi_1, \pi_2, \ldots, \pi_K. In this context, the conditional density of Z_i given that the class label C_i = k is f_k(z_i), and the marginal density of Z_i is f(z_i). We can arrange the class label indicators in a K-dimensional vector denoted as C = (C_1', C_2', \ldots, C_n')' with corresponding realizations c = (c_1, c_2, \ldots, c_n)'. Here, the elements of c_i are all zero except for one element whose value is unity, indicating that z_i belongs to the kth mixture class. It follows, then, that the K-dimensional random vector C_i possesses a multinomial distribution, namely,

C_i \sim \mathrm{Mult}_K(1, \pi), \qquad [9.2]

where the elements of \pi defined earlier arise from the fact that

\mathrm{pr}\{C_i = c_i\} = \pi_1^{c_{1i}} \pi_2^{c_{2i}} \cdots \pi_K^{c_{Ki}}. \qquad [9.3]
A practical way of conceptualizing the finite mixture problem is to imagine that the vector Z_i is drawn from population J consisting of K groups (J_1, J_2, \ldots, J_K) with proportions \pi_1, \pi_2, \ldots, \pi_K. Then, the density function of Z_i in group J_k given C_i = k is f_k(z_i) for k = 1, 2, \ldots, K. Note that the proportion
\pi_k can be thought of as the prior probability that individual i belongs to mixture class k. Thus, from Bayes's theorem, the posterior probability that individual i belongs to class k given the data z_i can be written as

\tau_k(z_i) = \frac{\pi_k f_k(z_i)}{f(z_i)}, \quad (i = 1, 2, \ldots, n; \; k = 1, 2, \ldots, K). \qquad [9.4]
Estimated posterior probabilities from Equation [9.4] provide one approach for assessing the adequacy of the finite mixture model, as will be demonstrated in the examples below. In the context of this chapter, it is necessary to provide a parametric form of the finite mixture model described in this section. The parametric form of the finite mixture model in Equation [9.1] can be written as

f(z_i; \Omega) = \sum_{k=1}^{K} \pi_k f_k(z_i; \theta_k), \quad (i = 1, 2, \ldots, n; \; k = 1, 2, \ldots, K), \qquad [9.5]
where \Omega is a parameter vector containing the unknown parameters of the mixture model, namely,

\Omega = (\pi_1, \pi_2, \ldots, \pi_{K-1}, \Theta), \qquad [9.6]

where \Theta contains the parameters \theta_1, \theta_2, \ldots, \theta_K, and where

\pi = (\pi_1, \pi_2, \ldots, \pi_K) \qquad [9.7]

is the vector of mixing proportions defined earlier. Because the probabilities in Equation [9.7] sum to unity, one of them is redundant, as represented in Equation [9.6]. As outlined below, the vector \Theta will contain the parameters of the various models under consideration—such as growth mixture models. For now, we consider \Theta to be any general parameter vector whose elements are distinct from \pi.
9.2 The Expectation-Maximization Algorithm

Standard estimation algorithms for structural equation models, such as maximum likelihood (ML) and the class of weighted least squares estimators, were covered in Chapter 2. The method of estimation typically employed for finite mixture models is ML using the EM algorithm. The EM algorithm was originally developed as a means of obtaining maximum likelihood estimates in the
context of incomplete data problems (Dempster et al., 1977; see also Little & Rubin, 2002). However, it was soon recognized that a wide array of statistical models, including the latent class model and the finite mixture model, could be conceptualized as incomplete data problems. Specifically, in the context of finite mixture models, the component label vector c is not observed. The EM algorithm proceeds by specifying the complete-data vector, denoted here as

z_{comp} = (z', c')'. \qquad [9.8]
The complete-data log-likelihood must account for the distribution of the class-label indicator vector as well as the distribution of the data. Thus, from Equation [9.5], the complete-data log-likelihood for \Omega can be written as

\log L_{comp}(\Omega) = \sum_{k=1}^{K} \sum_{i=1}^{n} c_{ik} \{\log \pi_k + \log f_k(z_i; \theta_k)\}, \qquad [9.9]
where c_{ik} is an element of c. The form of Equation [9.9] shows the role of c_{ik} as an indicator of whether individual i is a member of class k. The EM algorithm involves two steps. The E-step begins by taking the conditional expectation of Equation [9.9] given the observed data z using the current estimates of \Omega based on a set of starting values, say \Omega^{(0)}. Following McLachlan and Peel (2000), the conditional expectation is written as

Q(\Omega; \Omega^{(0)}) = E_{\Omega^{(0)}}\{\log L_{comp}(\Omega) \mid z\}. \qquad [9.10]
Let \Omega^{(m)} be the updated value of \Omega after the mth iteration of the EM algorithm. Then the E-step on the (m + 1)th iteration calculates Q(\Omega, \Omega^{(m)}). With regard to the class-label vector c, the E-step of the EM algorithm computes the conditional expectation of C_{ik} given z, where C_{ik} is an element of C. Specifically, on the (m + 1)th iteration, the E-step computes

E_{\Omega^{(m)}}(C_{ik} \mid z) = \mathrm{pr}_{\Omega^{(m)}}\{C_{ik} = 1 \mid z\} = \tau_k(z_i; \Omega^{(m)}), \qquad [9.11]

where \tau_k(z_i; \Omega^{(m)}) is the posterior probability of class membership defined in Equation [9.4]. The M-step of the EM algorithm maximizes Q(\Omega, \Omega^{(m)}) with respect to \Omega, providing the updated estimate \Omega^{(m+1)}. Note that the E-step replaces c_{ik} in Equation [9.9] with \tau_k(z_i; \Omega^{(m)}). Therefore, the updated estimate of the mixing proportion for class k, denoted as \pi_k^{(m+1)}, is
\pi_k^{(m+1)} = \frac{1}{n}\sum_{i=1}^{n} \tau_k(z_i; \Omega^{(m)}), \quad (k = 1, 2, \ldots, K). \qquad [9.12]
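The following sketch shows the E- and M-steps of Equations [9.10] through [9.12] for the simple case of a K-class mixture of univariate normal densities. The normal component form, the starting values, and the fixed number of iterations are illustrative assumptions; this is not the software or the models used for the applications later in the chapter.

```python
import numpy as np
from scipy.stats import norm

def em_normal_mixture(z, K, n_iter=200):
    """EM for a K-class mixture of univariate normals.
    E-step: posterior probabilities tau_k(z_i; Omega^(m)), Eq. [9.11].
    M-step: update pi_k via Eq. [9.12] and the class means and variances."""
    z = np.asarray(z, dtype=float)
    n = len(z)
    # Crude starting values Omega^(0): equal proportions, spread-out means
    pi = np.full(K, 1.0 / K)
    mu = np.quantile(z, np.linspace(0.1, 0.9, K))
    sigma = np.full(K, z.std())
    for _ in range(n_iter):
        # E-step: n x K matrix of posterior class probabilities
        dens = np.column_stack([pi[k] * norm.pdf(z, mu[k], sigma[k]) for k in range(K)])
        tau = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted updates of mixing proportions and component parameters
        nk = tau.sum(axis=0)
        pi = nk / n                                            # Eq. [9.12]
        mu = (tau * z[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((tau * (z[:, None] - mu) ** 2).sum(axis=0) / nk)
    return pi, mu, sigma

# Example: recover two classes from artificially generated data
rng = np.random.default_rng(2)
z = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(4.0, 1.0, 200)])
print(em_normal_mixture(z, K=2))
```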
9.3 Cross-Sectional Models for Categorical Latent Variables

In this section, we discuss models for categorical latent variables, with applications to cross-sectional and longitudinal designs. This section is drawn from Kaplan (in press). To motivate the use of categorical latent variables, consider the problem of measuring reading ability in young children. Typical studies of reading ability measure reading on a continuous scale. Using the methods of item response theory (see, e.g., Hambleton & Swaminathan, 1985), reading measures are administered to survey participants on multiple occasions, with scores equated in such a way as to allow for a meaningful notion of growth. However, in large longitudinal studies such as the Early Childhood Longitudinal Study (NCES, 2001), not only are continuous scale scores of total reading proficiency available for analyses but also mastery scores for subskills of reading. For example, a fundamental subskill of reading is letter recognition. A number of items constituting a cluster that measures letter recognition are administered, and, according to the ECLS-K scoring protocol, if the child receives 3 out of 4 items in the cluster correct, then the child is assumed to have mastered the skill, with mastery coded "1" and nonmastery coded as "0." Of course, there exist other, more difficult, subskills of reading, including beginning sounds, ending sounds, sight words, and words in context, with each subskill cluster coded for mastery. Assume for now that these subskills tap a general reading ability factor. In the context of factor analysis, a single continuous factor can be derived that would allow children to be placed somewhere along the factor. Another approach might be to derive a factor that serves to categorize children into mutually exclusive classes on the latent reading ability factor. Latent class analysis is designed to accomplish this categorization.
9.3.1 LATENT CLASS ANALYSIS

Latent class models were introduced by Lazarsfeld and Henry (1968) for the purposes of deriving latent attitude variables from responses to dichotomous survey items. In a traditional latent class analysis, it is assumed that an individual belongs to one and only one latent class, and that given the individual's latent class membership, the observed variables are independent of one another—the so-called local independence assumption (see Clogg, 1995). The
latent classes are, in essence, categorical factors arising from the pattern of response frequencies to categorical items, where the response frequencies play a role similar to that of the correlation matrix in factor analysis (Collins, Hyatt, and Graham, 2000). The analogues of factor loadings are probabilities associated with responses to the manifest indicators given membership in the latent class. Unlike continuous latent variables, categorical latent variables serve to partition the population into discrete groups based on response patterns derived from manifest categorical variables.

9.3.2 SPECIFICATION, IDENTIFICATION, AND TESTING OF LATENT CLASS MODELS

The latent class model can be written as follows. Let
P_{ijkl} = \sum_{a=1}^{A} \delta_a \rho_{i|a}\, \rho_{j|a}\, \rho_{k|a}\, \rho_{l|a}, \qquad [9.13]
where \delta_a is the proportion of individuals in latent class a. The parameters \rho_{i|a}, \rho_{j|a}, \rho_{k|a}, and \rho_{l|a} are the response probabilities for items i, j, k, and l, respectively, conditional on membership in latent class a. In the case of the ECLS-K reading example, there are five dichotomously scored reading subskill measures, which we will refer to here as A, B, C, D, and E. Denote the response options for each of the measures respectively by i, j, k, l, and m (i = 1, \ldots, I; j = 1, \ldots, J; k = 1, \ldots, K; l = 1, \ldots, L; m = 1, \ldots, M), and denote the categorical latent variable as \xi. Then, the latent class model can be written as

\pi_{ijklmc}^{ABCDE\xi} = \pi_c^{\xi}\, \pi_{ic}^{A|\xi}\, \pi_{jc}^{B|\xi}\, \pi_{kc}^{C|\xi}\, \pi_{lc}^{D|\xi}\, \pi_{mc}^{E|\xi}, \qquad [9.14]

where \pi_c^{\xi} is the probability that a randomly selected child will belong to latent class c (c = 1, 2, \ldots, C) of the categorical latent variable \xi, \pi_{ic}^{A|\xi} is the conditional probability of response i to variable A given membership in latent class c, and \pi_{jc}^{B|\xi}, \pi_{kc}^{C|\xi}, \pi_{lc}^{D|\xi}, and \pi_{mc}^{E|\xi} are likewise the conditional probabilities for items B, C, D, and E, respectively. For this example, the manifest variables are dichotomously scored, and so there are two response options for each item.2 Identification of a latent class model is typically achieved by imposing the constraint that the latent classes and the response probabilities that serve as indicators of the latent classes sum to 1.0—namely, that

\sum_c \pi_c^{\xi} = \sum_i \pi_{ic}^{A|\xi} = \sum_j \pi_{jc}^{B|\xi} = \sum_k \pi_{kc}^{C|\xi} = \sum_l \pi_{lc}^{D|\xi} = \sum_m \pi_{mc}^{E|\xi} = 1.0, \qquad [9.15]
where the first term on the left-hand side of Equation [9.15] indicates that the latent class proportions must sum to 1.0, and the remaining terms on the left-hand side of Equation [9.15] denote that the latent class indicator variables sum to 1.0 as well (McCutcheon, 2002).3

To continue with our reading example, suppose that we hypothesize that the latent class variable \xi is a measure of reading ability with three classes (1 = advanced reading ability, 2 = average reading ability, and 3 = beginning reading ability). Assume also that we have a random sample of first-semester kindergartners. Then, we might find that a large proportion of kindergartners in the sample who show mastery of letter recognition (items A and B, both coded 1/0) are located in the beginning reading ability class. A smaller proportion of kindergartners demonstrating mastery of ending sounds and sight words might be located in the average reading ability class, and still fewer might be located in the advanced reading class. Of course, at the end of kindergarten and hopefully by the end of first grade, we would expect to see the relative proportions shift.4

An Example of Latent Class Analysis

The following example comes from Kaplan and Walpole (2005) using data from the Early Childhood Longitudinal Study: Kindergarten Class of 1998–1999 (NCES, 2001). The ECLS-K database provides a unique opportunity to estimate the prospects of successful reading achievement (which Kaplan and Walpole define as the ability to comprehend text) by the end of first grade for children with different levels of entering skill and different potential barriers to success. The ECLS-K data available for their example include longitudinal measures of literacy achievement for a large and nationally representative sample—a sample unprecedented in previous investigations of early reading development.

Data used in the Kaplan and Walpole (2005) example consist of the kindergarten base year (Fall 1998/Spring 1999) and first grade follow-up (Fall 1999/Spring 2000) panels of ECLS-K. Only first-time public school kindergarten students who were promoted to and present at the end of first grade were chosen for this study. The sample size for their example was 3,575.5 The measures used in their example consisted of a series of reading assessments. Using an item response theory framework, the reading assessment yielded scale scores for (1) letter recognition, (2) beginning sounds, (3) ending sounds, (4) sight words, and (5) words in context. In addition to reading scale scores, ECLS-K provides transformations of these scores into probabilities of proficiency as well as dichotomous proficiency scores, the latter of which Kaplan and Walpole used in their study. The reading proficiencies were assumed to follow a Guttman simplex model, where mastery at a specific skill level implies mastery at all previous skill levels. Details regarding the construction of these proficiency scores can be found in Kaplan and Walpole (2005).
Table 9.1 presents the response probabilities measuring the latent classes for each wave of the study separately. The interpretation of this table is similar to the interpretation of a factor loading matrix. The pattern of response probabilities across the subsets of reading tests suggests the labels that have been given to the latent classes—namely, low alphabet knowledge (LAK), early word reading (EWR), and early reading comprehension (ERC). The extreme differences across time in the likelihood ratio chi-square tests are indicative of sparse cells, particularly occurring at spring kindergarten. For the purposes of this chapter, I proceed with the analysis without attempting to ameliorate the problem.
Table 9.1  Response Probabilities and Class Proportions for Separate Latent Class Models: Total Sample

                     Subtest Response Probabilities(a)
Latent Class     LR(b)   BS     ES     SW     WIC    Class Proportions   χ²_LR (29 df)

Fall K                                                                    3.41
  LAK(c)         0.47    0.02   0.01   0.00   0.00   0.67
  EWR            0.97    0.87   0.47   0.02   0.00   0.30
  ERC            1.00    0.99   0.98   0.97   0.45   0.03

Spring K                                                                  4831.89*
  LAK            0.56    0.06   0.00   0.00   0.00   0.24
  EWR            0.99    0.92   0.63   0.05   0.00   0.62
  ERC            0.00    0.99   0.99   0.96   0.38   0.14

Fall First                                                                11.94
  LAK            0.52    0.08   0.01   0.00   0.00   0.15
  EWR            1.00    0.92   0.71   0.05   0.03   0.59
  ERC            1.00    0.99   0.98   0.98   0.42   0.26

Spring First                                                              78.60*
  LAK            0.19    0.00   0.00   0.00   0.00   0.04
  EWR            0.98    0.90   0.79   0.35   0.00   0.18
  ERC            1.00    0.99   0.98   0.99   0.60   0.78

a. Response probabilities are for passed items. Response probabilities for failed items can be computed from 1 − prob(mastery).
b. LR = letter recognition, BS = beginning sounds, ES = ending letter sounds, SW = sight words, WIC = words in context.
c. LAK = low alphabet knowledge, EWR = early word reading, ERC = early reading comprehension.
*p < .05. Extreme value likely due to sparse cells.
The last column of Table 9.1 presents the latent class membership proportions across the four ECLS-K waves for the full sample. We see that in fall of kindergarten, approximately 67% of the cases fall into the LAK class, whereas only approximately 3% of the cases fall into the ERC class. This breakdown of proportions can be compared with the results for Spring of first grade; by that time, only 4% of the sample are in the LAK class, whereas approximately 78% of the sample is in the ERC class.
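To illustrate how estimates such as those in Table 9.1 can be used, the sketch below applies Bayes's theorem (Equation [9.4]) to the latent class model under local independence, computing the posterior probability of class membership for a given pattern of subskill masteries. It treats the published Fall kindergarten point estimates as fixed, which is an assumption made purely for illustration.

```python
import numpy as np

# Fall kindergarten estimates from Table 9.1 (rows: LAK, EWR, ERC;
# columns: LR, BS, ES, SW, WIC). Entries are P(mastery | class).
class_labels = ["LAK", "EWR", "ERC"]
class_props = np.array([0.67, 0.30, 0.03])
resp_probs = np.array([
    [0.47, 0.02, 0.01, 0.00, 0.00],
    [0.97, 0.87, 0.47, 0.02, 0.00],
    [1.00, 0.99, 0.98, 0.97, 0.45],
])

def posterior_class_probs(pattern):
    """Posterior class membership for a 0/1 mastery pattern under local independence."""
    pattern = np.asarray(pattern)
    # P(pattern | class) = product over items of p^y * (1 - p)^(1 - y)
    lik = np.prod(resp_probs ** pattern * (1 - resp_probs) ** (1 - pattern), axis=1)
    joint = class_props * lik          # prior class proportion times likelihood
    return joint / joint.sum()         # Bayes's theorem, Eq. [9.4]

# A child who has mastered letter recognition and beginning sounds only
print(dict(zip(class_labels, posterior_class_probs([1, 1, 0, 0, 0]).round(3))))
```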
9.4 Longitudinal Models for Categorical Latent Variables: Markov Chain Models

The example of latent class analysis given in the previous sections presented results over the waves of the ECLS-K but treated each wave cross-sectionally. Nevertheless, it could be seen from Table 9.1 that response probabilities did change over time, as did latent class membership proportions. Noting these changes, it is important to have a precise approach to characterizing change in latent class membership over time. In this section, we consider changes in latent class membership over time. We begin by describing a general approach to the study of change in qualitative status over time via Markov chain modeling, extended to the case of latent variables. This is followed by a discussion of latent transition analysis, a methodology well suited for the study of stage-sequential development.

9.4.1 IDENTIFICATION, ESTIMATION, AND TESTING OF MARKOV CHAIN MODELS

In this section, we briefly discuss the problem of parameter identification, estimation, and model testing in Markov chain models. As with the problem of identification in factor analysis and structural equation models, identification in Markov chain models is achieved by placing restrictions on the model. With regard to manifest Markov chains, identification is not an issue. All parameters can be obtained directly from manifest categorical responses. In the context of latent Markov chain models with a single indicator, the situation is somewhat more difficult. Specifically, identification is achieved by restricting the response probabilities to be invariant over time. As noted by Langeheine and Van de Pol (2002), this restriction simply means that measurement error is assumed to be equal over time. For four or more time points, it is only required that the first and last set of response probabilities be invariant. As with latent class analysis, parameters are estimated via ML using the EM algorithm, as discussed in Sections 9.1 and 9.2.
After obtaining estimates of model parameters, the next step is to assess whether the specified model fits the data. In the context of Markov chain models and latent class extensions, model fit is assessed by comparing the observed response proportions against the response proportions predicted by the model. Two statistical tests are available for assessing the fit of the model based on comparing observed versus predicted response proportions. The first is the classic Pearson chi-square statistic. As an example from the latent class framework, the Pearson chi-square test can be written as

\chi^2 = \sum_{ijkl} \frac{(F_{ijkl} - f_{ijkl})^2}{f_{ijkl}}, \qquad [9.16]
where F_{ijkl} are the observed frequencies of the IJKL contingency table and f_{ijkl} are the expected cell counts. The degrees of freedom are obtained by subtracting the number of parameters to be estimated from the total number of cells of the contingency table that are free to vary. In addition to the Pearson chi-square test, a likelihood ratio statistic can be obtained that is asymptotically distributed as chi-square, where the degrees of freedom are calculated as with the Pearson chi-square test. Finally, the Akaike information criterion (AIC) and Bayesian information criterion (BIC) discussed in Chapter 6 can be used to choose among competing models.

9.4.2 THE MANIFEST MARKOV MODEL

The manifest Markov model consists of a single chain, where predicting the current state of an individual only requires data from the previous occasion. In line with the reading example given earlier, consider measuring mastery of ending letter sounds at four discrete time points. The manifest Markov model can be written as

P_{ijkl} = \delta_i^{1}\, \tau_{j|i}^{21}\, \tau_{k|j}^{32}\, \tau_{l|k}^{43}, \qquad [9.17]
where P_{ijkl} is the model-based expected proportion of respondents in the defined population in cell (i, j, k, l). The subscripts i, j, k, and l are the manifest categories for times 1, 2, 3, and 4, respectively, with i = 1, \ldots, I; j = 1, \ldots, J; k = 1, \ldots, K; and l = 1, \ldots, L. In this study, there are two categorical responses for i, j, k, and l—namely, mastery or nonmastery of ending letter sounds. Thus, I = J = K = L = 2. The parameter \delta_i^{1} is the observed proportion of individuals at time 1 who have or have not mastered ending letter sounds and corresponds to the initial marginal distribution of the outcome. The parameters \tau_{j|i}^{21}, \tau_{k|j}^{32}, and \tau_{l|k}^{43} are the transition probabilities. Specifically, the parameter \tau_{j|i}^{21} represents the
transition probability from time 1 to time 2 for those in category j given they were in category i at the beginning of the study. The parameter \tau_{k|j}^{32} represents the transition probability from time 2 to time 3 for those in category k given they were in category j at the previous time point. Finally, the parameter \tau_{l|k}^{43} is the transition probability from time 3 to time 4 for those in category l given that they were in category k at the previous time point. The manifest Markov model can be specified to allow transition probabilities to be constant over time or to allow transition probabilities to differ over time. The former is referred to as a stationary Markov chain, while the latter is referred to as a nonstationary Markov chain.

Application of the Manifest Markov Model

Table 9.2 presents the results of the nonstationary manifest Markov model applied to the development of competency in ending sounds.6 It can be seen that over time, the probabilities associated with moving from nonmastery of ending sounds to mastery of ending sounds change. For example, from the beginning of kindergarten to the end of kindergarten and from the end of kindergarten to the beginning of first grade, the proportion of children who had not mastered ending sounds but who then go on to master ending sounds is relatively constant. However, the transition from nonmastery of ending sounds to mastery of ending sounds is much greater from the beginning of first grade to the end of first grade. Nevertheless, approximately 25% of the sample who did not master ending sounds at the beginning of first grade do not appear to have mastered ending sounds by the end of first grade.

9.4.3 THE LATENT MARKOV MODEL

A disadvantage of the manifest Markov model is that it assumes that the manifest categories are perfectly reliable measures of a true latent state. In the context of the ending sounds example, this would imply that the observed categorical responses measure the true mastery/nonmastery of ending sounds. Rather, it may be more reasonable to assume that the observed responses are fallible measures of an unobservable latent state, and it is the study of transitions across true latent states that is of interest. The latent Markov model was developed by Wiggins (1973) to address the problem of measurement error in observed categorical responses and, as a result, to obtain transition probabilities at the latent level. The latent Markov model can be written as
P_{ijkl} = \sum_{a=1}^{A} \sum_{b=1}^{B} \sum_{c=1}^{C} \sum_{d=1}^{D} \delta_a^{1}\, \rho_{i|a}^{1}\, \tau_{b|a}^{21}\, \rho_{j|b}^{2}\, \tau_{c|b}^{32}\, \rho_{k|c}^{3}\, \tau_{d|c}^{43}\, \rho_{l|d}^{4}, \qquad [9.18]
Table 9.2  Results of the Nonstationary Manifest Markov Chain Model Applied to Mastery of Ending Sounds

Ending Sounds Time 1 (Rows) by Ending Sounds Time 2 (Columns)(a)
        1       2
1       0.55    0.45
2       0.10    0.90

Ending Sounds Time 2 (Rows) by Ending Sounds Time 3 (Columns)
        1       2
1       0.57    0.43
2       0.10    0.90

Ending Sounds Time 3 (Rows) by Ending Sounds Time 4 (Columns)
        1       2
1       0.25    0.75
2       0.03    0.97

Goodness-of-fit tests(b): χ²_P (8 df) = 133.77, p < .05; χ²_LR (8 df) = 150.23, p < .05; BIC = 13363.49
a. 1 = nonmastery, 2 = mastery.
b. χ²_P refers to the Pearson chi-square test; χ²_LR refers to the likelihood ratio chi-square test.
where the parameters in Equation [9.18] take on slightly different meanings from those in Equation [9.17]. In particular, the parameter \delta_a^{1} represents a latent distribution having A latent states. The linkage of the latent states to manifest responses is accomplished by the response probabilities \rho. The response probabilities thus serve a role analogous to that of factor loadings in factor analysis. Accordingly, \rho_{i|a}^{1} refers to the response probability associated with category i given membership in latent state a. The parameter \rho_{j|b}^{2} is interpreted as the response probability associated with category j given membership in latent state b at time 2. Remaining response probabilities are similarly interpreted. As with the manifest Markov model, the transition from time 1 to time 2 in latent state membership is captured by \tau_{b|a}^{21}. At time 2, the latent state is measured by the response probabilities \rho_{j|b}^{2}. Remaining response and transition probabilities are analogously interpreted. Note that an examination of Equation [9.18] reveals that if the response probabilities were all 1.0 (indicating perfect measurement of the latent variable), then Equation [9.18] would essentially reduce to Equation [9.17]—the manifest Markov model.
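As a minimal numerical sketch of Equation [9.18], the code below computes the expected cell proportions P_{ijkl} for a two-state latent Markov chain measured by a single binary indicator at four time points. All parameter values are hypothetical; setting the response probabilities to an identity matrix reproduces the manifest Markov model of Equation [9.17].

```python
import numpy as np
from itertools import product

# Hypothetical two-state latent Markov parameters for four time points.
delta = np.array([0.6, 0.4])             # initial latent state distribution, delta^1
tau = np.array([[0.7, 0.3],              # common transition matrix: tau[a, b] = P(state b | state a)
                [0.1, 0.9]])
rho = np.array([[0.9, 0.1],              # response probabilities: rho[state, observed category]
                [0.2, 0.8]])

def cell_probability(i, j, k, l):
    """Expected proportion P_ijkl under the latent Markov model (Eq. 9.18)."""
    p = 0.0
    for a, b, c, d in product(range(2), repeat=4):   # sum over latent states at the four times
        p += (delta[a] * rho[a, i] * tau[a, b] * rho[b, j]
              * tau[b, c] * rho[c, k] * tau[c, d] * rho[d, l])
    return p

# The 16 cell probabilities sum to 1; with rho equal to the identity matrix,
# the expression collapses to the manifest Markov model of Eq. [9.17].
print(sum(cell_probability(*cell) for cell in product(range(2), repeat=4)))
```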
Application of the Latent Markov Model

Table 9.3 compares the transition probabilities for the manifest Markov model and the latent Markov model under the assumption of a stationary Markov chain. The results show small but noticeable differences in the transition probabilities when taking into account measurement error in the manifest categorical responses.

9.4.4 LATENT TRANSITION ANALYSIS

Although the application of Markov models for the analysis of psychological variables goes back to Anderson (1959; as cited in Collins & Wugalter, 1992), most applications focused on single manifest measures. However, as with the early work in the factor analysis of intelligence tests (e.g., Spearman, 1904), it was recognized that many important psychological variables are
Table 9.3  Comparison of Transition Probabilities for Manifest and Latent Markov Chain Model With Homogenous Transition Probabilities: Application to Ending Letter Sounds

                      Manifest Markov Chain      Latent Markov Chain

Ending Sounds Time 1 (Rows) by Ending Sounds Time 2 (Columns)(a)
        1       2                1       2
1       0.50    0.50             0.47    0.53
2       0.38    0.62             0.38    0.62

Ending Sounds Time 2 (Rows) by Ending Sounds Time 3 (Columns)
        1       2                1       2
1       0.50    0.50             0.47    0.53
2       0.38    0.62             0.38    0.62

Ending Sounds Time 3 (Rows) by Ending Sounds Time 4 (Columns)
        1       2                1       2
1       0.50    0.50             0.47    0.53
2       0.38    0.62             0.38    0.62

Goodness-of-fit tests
  Manifest: χ²_P (13 df) = 6946.62, p < .05; χ²_LR (13 df) = 6169.32, p < .05; BIC = 19341.68
  Latent:   χ²_P (13 df) = 7040.50, p < .05; χ²_LR (13 df) = 6299.62, p < .05; BIC = 19471.99
a. 1 = nonmastery, 2 = mastery.
latent—in the sense of not being directly observed but possibly measured by numerous manifest indicators. The advantages to measuring latent variables via multiple indicators are the known benefits with regard to reliability and validity. Therefore, it might be more realistic to specify multiple manifest categorical indicators of the categorical latent variable and combine them with Markov chain models. The combination of multiple-indicator latent class models and Markov chain models provides the foundation for the latent transition analysis of stage-sequential dynamic latent variables.

In line with Collins and Flaherty (2002), consider the current reading example where the data provide information on the mastery of five different skills. At any given point in time, a child has mastered or not mastered one or more of these skills. It is reasonable in this example to postulate a model that specifies that these reading skills are related in such a way that mastery of a later skill implies mastery of all preceding skills. At each time point, the child's latent class membership defines his or her latent status. The model specifies a particular type of change over time in latent status. This is defined by Collins and Flaherty (2002) as a "model of stage-sequential development, and the skill acquisition process is a stage-sequential dynamic latent variable" (p. 289).

It is important to point out that there is no fundamental difference between latent transition analysis and latent Markov chain modeling. The difference is practical, with latent transition analysis being perhaps better suited conceptually for the study of change in developmental status. The model form for latent transition analysis uses Equation [9.18], except that model estimation is undertaken with multiple indicators of the latent categorical variable. The appropriate measurement model for categorical latent variables is the latent class model.

Application of Latent Transition Analysis

Using all five of the subtests of the reading assessment in ECLS-K, this section demonstrates a latent transition analysis. It should be noted that a specific form of the latent transition model was estimated—namely, a model that assumes no forgetting or loss of previous skills. This type of model is referred to as a longitudinal Guttman process and was used in a detailed study of stage-sequential reading development by Kaplan and Walpole (2005). A close inspection of the changes over time in class proportions shown in Table 9.1 points to transition over time in the proportions who master more advanced reading skills. However, these separate latent class models do not provide simultaneous estimation of the transition probabilities, which are crucial for a study of stage-sequential development over time.
In Table 9.4, the results of the latent transition probabilities for the full latent transition model are provided. On the basis of the latent transition analysis, we see that for those in the LAK class at Fall kindergarten, 30% are predicted to remain in the LAK class, while 69% are predicted to move to the EWR class and 1% are predicted to transition to ERC in Spring kindergarten. Among those in the EWR class at Fall kindergarten, 66% are predicted to remain in that class, and 34% of the children are predicted to transition to the ERC class in Spring kindergarten. Among those children who are in the LAK class at Spring kindergarten, 59% are predicted to remain in that class at Fall of first grade, while 40% are predicted to transition to the EWR class, with 1% predicted to transition to the ERC class. Among those children who are in the EWR class at Spring kindergarten, 82% are predicted to stay in the EWR class while 18% are predicted to transition to the ERC class. Finally, among those children who are in the LAK class in Fall of first grade, 30% are predicted to remain in that class at Spring of first grade, while 48% are predicted to transition to the EWR class by Spring of first grade, with 22%
Table 9.4  Transition Probabilities From Fall Kindergarten to Spring First Grade

Fall K (Rows) by Spring K (Columns)
        LAK(a)   EWR     ERC
LAK     0.30     0.69    0.01
EWR     0.00     0.66    0.34
ERC     0.00     0.00    1.00

Spring K (Rows) by Fall First (Columns)
        LAK      EWR     ERC
LAK     0.59     0.40    0.01
EWR     0.00     0.82    0.18
ERC     0.00     0.00    1.00

Fall First (Rows) by Spring First (Columns)
        LAK      EWR     ERC
LAK     0.30     0.48    0.22
EWR     0.01     0.13    0.86
ERC     0.00     0.00    1.00

Goodness-of-fit tests: χ²_P (1048528 df) = 12384.21, p = 1.0; χ²_LR (1048528 df) = 6732.31, p = 1.0; BIC = 44590.80
a. LAK = low alphabet knowledge, EWR = early word reading, ERC = early reading comprehension.
transitioning to the ERC class. Among those children in the EWR class at Fall of first grade, 13% are predicted to remain in that class, with 86% transitioning to the ERC class by Spring of first grade.

9.4.5 MIXTURE LATENT MARKOV MODEL (THE MOVER-STAYER MODEL)

A limitation of the models described so far is that they assume that the sample of observations arises from a single population that can be characterized by a single Markov chain (latent or otherwise) and one set of parameters—albeit perhaps different for certain manifest groups such as those children living above or below poverty. It is possible, however, that the population is composed of a finite and unobserved mixture of subpopulations characterized by qualitatively different Markov chains. To the extent that the population consists of finite mixtures of subpopulations, a "one-size-fits-all" application of the Markov model can lead to biased estimates of the parameters of the model as well as incorrect substantive conclusions regarding the nature of the developmental process in question. A reasonable strategy for addressing this problem involves combining Markov chain–based models under the assumption of a mixture distribution (see, e.g., McLachlan & Peel, 2000, for an excellent overview of finite mixture modeling). This is referred to as the mixture latent Markov model.7

An important special case of the mixture latent Markov model is referred to as the mover-stayer model (Blumen, Kogan, & McCarthy, 1955). In the mover-stayer model, there exists a latent class of individuals who transition across stages over time (movers) and a latent class that does not transition across stages (stayers). In the context of reading development, the stayers are those who never move beyond, say, mastery of letter recognition. Variants of the mover-stayer model have been considered by Van de Pol and Langeheine (1989; see also Mooijaart, 1998). The mixture latent Markov model can be written as
P_{ijkl} = \sum_{s=1}^{S} \sum_{a=1}^{A} \sum_{b=1}^{B} \sum_{c=1}^{C} \sum_{d=1}^{D} \pi_s\, \delta_{a|s}^{1}\, \rho_{i|as}^{1}\, \tau_{b|as}^{21}\, \rho_{j|bs}^{2}\, \tau_{c|bs}^{32}\, \rho_{k|cs}^{3}\, \tau_{d|cs}^{43}\, \rho_{l|ds}^{4}, \qquad [9.19]
where \pi_s represents the proportion of observations in Markov chain s (s = 1, 2, \ldots, S), and the remaining parameters are interpreted as in Equation [9.18], with the exception that they are conditioned on membership in Markov chain s. The model in Equation [9.19] is the most general of those considered in this chapter, with the preceding models being derived as special cases. For example, with S = 1, Equation [9.19] reduces to the latent Markov model in Equation [9.18]. Also, with S = 1 and no transition probabilities, the model in Equation [9.19] reduces to the latent class model of Equation [9.13].
Application of the Mover-Stayer Model

For this example, we estimate the full latent transition analysis model with the addition of a latent class variable that is hypothesized to segment the sample into those who do transition over time in their development of more complex reading skills (movers) versus those who do not transition at all (stayers). The results of the mover-stayer latent transition analysis are given in Table 9.5. In this analysis, it is assumed that the stayer class has zero probability of moving. An alternative specification can allow the "stayers" to have a probability of moving that is not necessarily zero but different from that of the mover class.

From the upper panel of Table 9.5, it can be seen that 97% of the sample transition across stages, with 71% of the movers beginning their transitions to full literacy from the LAK status, 26% beginning in the EWR status, and 2% already in the ERC status. The stayers represent only 3% of the sample, corresponding to approximately 90 children. These children are in the low alphabet knowledge class and are not predicted to move. The lower panel of Table 9.5 gives the transition probabilities for the whole sample. In many cases, it would be necessary to compute the transition probabilities separately for the movers, but because all the stayers are in the LAK class, they do not contribute to the transition probabilities for the movers. The slight differences between the mover transition probabilities compared with the transition probabilities in Table 9.4 are due to the fact that 3% of the sample is in the stayer class.

Finally, it may be interesting to note that, based on a comparison of the BICs, the mover-stayer specification provides a better fit to the manifest response frequencies than the latent transition analysis model in Table 9.4. However, the discrepancy between the likelihood ratio chi-square and Pearson chi-square is, again, indicative of sparse cells and would need to be inspected closely.
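One way to see what the mover-stayer estimates imply is to chain the mover transition matrices reported in Table 9.5, starting from the movers' Time 1 class proportions, and then mix the result with the stayers according to their estimated shares. The sketch below does this, treating the published point estimates as fixed; it is an illustration of the arithmetic, not a re-estimation of the model.

```python
import numpy as np

# Mover transition matrices from Table 9.5 (rows: origin class; columns: LAK, EWR, ERC)
fallK_to_springK = np.array([[0.34, 0.65, 0.01],
                             [0.00, 0.62, 0.38],
                             [0.00, 0.00, 1.00]])
springK_to_fall1 = np.array([[0.61, 0.39, 0.00],
                             [0.00, 0.84, 0.16],
                             [0.00, 0.00, 1.00]])
fall1_to_spring1 = np.array([[0.22, 0.55, 0.23],
                             [0.01, 0.12, 0.87],
                             [0.00, 0.00, 1.00]])

movers_fallK = np.array([0.71, 0.26, 0.02])    # Time 1 class proportions for movers (Table 9.5)
stayers_fallK = np.array([1.00, 0.00, 0.00])   # stayers remain in their Time 1 (LAK) class

# Chain the transition matrices to obtain implied class proportions for movers at each wave
p = movers_fallK
for label, T in [("Spring K", fallK_to_springK),
                 ("Fall 1st", springK_to_fall1),
                 ("Spring 1st", fall1_to_spring1)]:
    p = p @ T
    print(label, p.round(3))

# Overall proportions mix the mover chain with the unchanged stayer distribution
overall_spring1 = 0.97 * p + 0.03 * stayers_fallK
print("Overall Spring 1st", overall_spring1.round(3))
```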
9.5 Models for Categorical and Continuous Latent Variables

Having introduced the topic of categorical latent variables, we can now move to models that combine categorical and continuous latent variables. The basic idea here, as before, is that a population might be composed of finite mixtures of subpopulations characterized by their own unique parameters, but where the parameters are those of models based on continuous latent variables—such as factor analysis and structural equation models. For this section, we focus on finite mixture modeling applied to growth curve modeling because growth curve modeling encompasses many special cases, including factor analysis, structural equation modeling, and MIMIC modeling.
Table 9.5  Transition Probabilities for the Mover-Stayer Model: Total Sample

Movers and Stayers (Rows) by Time 1 Classes (Columns)
          LAK(a)   EWR     ERC     Proportion of Total Sample
Movers    0.71     0.26    0.02    0.97
Stayers   1.00     0.00    0.00    0.03

Results for Movers

Fall K Classes (Rows) by Spring K Classes (Columns)
        LAK      EWR     ERC
LAK     0.34     0.65    0.01
EWR     0.00     0.62    0.38
ERC     0.00     0.00    1.00

Spring K Classes (Rows) by Fall First Classes (Columns)
        LAK      EWR     ERC
LAK     0.61     0.39    0.00
EWR     0.00     0.84    0.16
ERC     0.00     0.00    1.00

Fall First Classes (Rows) by Spring First Classes (Columns)
        LAK      EWR     ERC
LAK     0.22     0.55    0.23
EWR     0.01     0.12    0.87
ERC     0.00     0.00    1.00

Goodness-of-fit tests: χ²_P (1048517 df) = 10004.46, p = 1.0; χ²_LR (1048517 df) = 5522.87, p = 1.0; BIC = 43397.29
a. LAK = low alphabet knowledge, EWR = early word reading, ERC = early reading comprehension.
9.5.1 GENERAL GROWTH MIXTURE MODELING

Conventional growth curve modeling and its extensions were discussed in Chapter 8. The power of conventional growth curve modeling notwithstanding, a fundamental constraint of the method is that it assumes that the manifest growth trajectories are a sample from a single finite population of trajectories characterized by a single average level parameter and a single average growth rate. However, it may be the case that the sample is derived from a mixture of populations, each having its own unique growth trajectory. For
example, children may be sampled from populations exhibiting very different classes of math development—some children may have very rapid rates of growth in math that level off quickly, others may show normative rates of growth, while still others may show very slow or problematic rates of growth. An inspection of Figure 9.1 reveals heterogeneity in the shapes of the growth curves for a sample of 100 children who participated in the Early Childhood Longitudinal Study. If such distinct growth functions are actually present in the data, then conventional growth curve modeling applied to a mixture of populations will ignore this heterogeneity in growth functions and result in biased estimates of growth. Therefore, it may be preferable to relax the assumption of a single population of growth and allow for the possibility that the population is composed of mixtures of distinct growth trajectory shapes.

Growth mixture modeling begins by unifying conventional growth curve modeling with latent class analysis (e.g., Clogg, 1995) under the assumption that there exists a finite mixture of populations defined by unique trajectory classes. An extension of latent class analysis sets the foundation for growth mixture modeling. Specifically, latent class analysis can be applied to repeated measures at different time points. This is referred to as latent class growth analysis (see, e.g., B. Muthén, 2001; Nagin, 1999). As with latent class analysis, latent class growth analysis assumes homogenous growth within classes. Growth mixture modeling relaxes the assumption of homogeneous growth within classes and is capable of capturing two significant forms of heterogeneity. The first form of heterogeneity is captured by individual differences in growth through the specification of the conventional growth curve model. The second form of heterogeneity is more basic—representing heterogeneity in classes of growth trajectories.

Figure 9.1  Sample of 100 Empirical Growth Trajectories (ECLS scale scores plotted across the Fall K, Spring K, Fall First, Spring First, and Fall Third assessments)
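A minimal simulation sketch may help fix ideas: drawing individual intercepts and slopes around class-specific means and then generating repeated measures produces exactly the kind of trajectory-class heterogeneity visible in Figure 9.1. The two classes, their growth factor means, and all variance values below are hypothetical choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
times = np.arange(5)                         # five assessment occasions

# Hypothetical class-specific growth factor means (intercept, slope) and class proportions
class_means = {"steep": (20.0, 8.0), "flat": (35.0, 2.0)}
class_probs = [0.4, 0.6]

def simulate_trajectories(n=100, factor_sd=(3.0, 0.5), resid_sd=2.0):
    """Draw a trajectory class, then individual growth factors, then repeated measures."""
    labels = rng.choice(list(class_means), size=n, p=class_probs)
    y = np.empty((n, len(times)))
    for i, lab in enumerate(labels):
        icept = rng.normal(class_means[lab][0], factor_sd[0])   # individual intercept
        slope = rng.normal(class_means[lab][1], factor_sd[1])   # individual growth rate
        y[i] = icept + slope * times + rng.normal(0, resid_sd, len(times))
    return labels, y

labels, y = simulate_trajectories()
# Average final-occasion score within each trajectory class
print({lab: round(float(y[labels == lab, -1].mean()), 1) for lab in class_means})
```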
9.5.2 SPECIFICATION OF THE GROWTH MIXTURE MODEL

The growth mixture model is similar to that given for the conventional growth curve model. The difference lies in allowing there to be different growth trajectories for different classes. Thus, in line with Equations 8.5 and 8.6, we can represent the presence of trajectory classes as

y_i = \nu + \Lambda\eta_i + K x_i + \varepsilon_i \qquad [9.20]

and

\eta_i = \alpha_c + B_c \eta_i + \Gamma_c x_i + \zeta_i, \qquad [9.21]
where the subscript c represents trajectory class (c = 1, 2, \ldots, C). The advantage to using growth mixture modeling lies in the ability to characterize across-class differences in the shape of the growth trajectories. Assuming that the time scores are constant across the classes, the different trajectory shapes are captured in \alpha_c. Relationships among growth parameters contained in B_c are also allowed to be class-specific. The modeling framework is flexible enough to allow differences in measurement error variances (\Theta) and structural disturbance variances (\Psi = Var(\zeta)) across classes as well. Finally, of relevance to this chapter, the different classes can show different relationships to a set of covariates x. In the context of our example, Equation [9.21] allows one to test whether poverty level has a differential effect on growth depending on the shape of the growth trajectories. Again, one might hypothesize that there is a small difference between poverty levels for children with normative or above average rates of growth in math, but that poverty has a strong effect for those children who show below normal rates of growth in math.

Application of Growth Mixture Modeling

The results of the conventional growth curve modeling provide initial information for assessing whether there are substantively meaningful growth mixture classes. To begin, the conventional growth curve model can be considered a growth mixture model with only one mixture class. From here, we specified two, three, and four mixture classes. We used three criteria to judge the number of classes that we decided to retain. The first criterion was the proportion of ECLS-K children assigned to the mixture classes. The second criterion was BIC, which was used to assess whether the extraction of additional latent classes improved the fit of the model. The third criterion was the adequacy of classification using the average posterior probabilities of classification. On the basis of these three criteria, and noting that the specification of
the model did not include the covariate of poverty level, we settled on retaining three growth mixture classes. A plot of the three classes can be found in Figure 9.2.

From Table 9.6 and Figure 9.2, we label the first latent class, consisting of 35.5% of our sample, as "below average developers." Students in this class evidenced a spring kindergarten mean math achievement score of 23.201, a linear growth rate of 1.317, and a de-acceleration in growth of .005. We labeled the second latent class, comprising 58.3% of our sample, as "average developers." Students in this class evidenced a spring kindergarten mean math achievement score of 33.646, a linear growth rate of 1.890, and a de-acceleration of .006. Finally, we labeled the third latent class, consisting of 35.5% of our sample, as "above average developers." Students in this class evidenced a spring kindergarten mean math achievement score of 54.308, a linear growth rate of 1.988, and a de-acceleration of −.016.

When poverty level was added into the growth mixture model, three latent classes were again identified.8 The above average developer class started significantly above their peers and continued to grow at a rate higher than the rest of their peers. Interestingly, the above average achiever group was composed entirely of students living above the poverty line. The average achiever group
Figure 9.2  The Three-Class Growth Mixture Model (math IRT scores across the Fall K, Spring K, Fall 1st, Spring 1st, and Spring 3rd assessments for the average, above average, and below average developer classes)
Table 9.6  Results of Three-Class Growth Mixture Model

                         Class 1               Class 2               Class 3
Coefficient              Model 1   Model 2     Model 1   Model 2     Model 1   Model 2
Intercept (I)            23.201    24.968      33.646    34.943      54.308    56.081
Linear slope (S)         1.317     1.365       1.890     1.912       1.988     1.989
Quadratic (Q)            −0.005    −0.006      −0.006    −0.007      −0.016    −0.017
I on below poverty                 −4.513                −10.418               −24.376
S on below poverty                 −0.194                −0.434                −0.129*
Q on below poverty                 0.002                 0.002                 0.012

* Not statistically significant.
was composed of both students who lived above and below the poverty line. The below average achiever group was composed disproportionately of below poverty students but did contain some above poverty students. A plot of the three-class solution with poverty added to the model can be found in Figure 9.3.

Figure 9.3  The Three-Class Growth Mixture Model With Poverty Status Added (math IRT scores across the Fall K, Spring K, Fall 1st, Spring 1st, and Spring 3rd assessments, with trajectories shown separately by poverty status and developer class)
The posterior probabilities of classification without and with poverty added to the model can be found in Tables 9.7 and 9.8, respectively. We observe that students assigned to Class 1 had a .882 probability of being correctly classified by the model, students assigned to Class 2 had a .855 probability of being correctly classified, and students assigned to Class 3 had a .861 probability of being correctly classified. The posterior probabilities do not change dramatically with the addition of poverty to the model, as seen in Table 9.8.
Table 9.7  Average Posterior Probabilities for the Three-Class Solution for Baseline Model

           Class 1   Class 2   Class 3
Class 1    0.882     0.027     0.090
Class 2    0.138     0.855     0.007
Class 3    0.138     0.001     0.861

NOTE: Class 1 = average developing; Class 2 = above average; Class 3 = below average.
Table 9.8  Average Posterior Probabilities for the Three-Class Solution With Poverty Status Included

           Class 1   Class 2   Class 3
Class 1    0.858     0.041     0.101
Class 2    0.155     0.826     0.019
Class 3    0.191     0.008     0.801

NOTE: Class 1 = average developing; Class 2 = above average; Class 3 = below average.
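The entries of Tables 9.7 and 9.8 are average posterior probabilities computed within groups defined by each case's most likely (modal) class. The sketch below shows one way such a table might be computed from a matrix of individual posterior probabilities; the probability matrix here is randomly generated solely for illustration and does not reproduce the chapter's results.

```python
import numpy as np

def average_posterior_table(post):
    """Rows: most likely (assigned) class; columns: average posterior probability
    of membership in each class, as in Tables 9.7 and 9.8."""
    post = np.asarray(post, dtype=float)
    assigned = post.argmax(axis=1)                   # modal class assignment per case
    K = post.shape[1]
    return np.vstack([post[assigned == k].mean(axis=0) for k in range(K)])

# Illustrative posterior probabilities for 1,000 cases and three classes
rng = np.random.default_rng(3)
post = rng.dirichlet(alpha=[4.0, 1.0, 1.0], size=1000)
print(average_posterior_table(post).round(3))
```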
9.6 Conclusion

This chapter provided an overview of models for categorical latent variables and the combination of categorical and continuous latent variables. Methodologies that were reviewed in this section included latent class modeling, manifest and latent Markov modeling, latent transition analysis, and mixture latent transition analysis (the mover-stayer model). In the context of combining continuous and categorical latent variables, we focused on growth mixture modeling.
The general framework that underlies these methodologies recognizes the possibility of population heterogeneity arising from finite mixture distributions. In the case of the mover-stayer model, the heterogeneity manifests itself in a subpopulation of individuals who do not exhibit any stage transition over time. In the case of growth mixture modeling, the heterogeneity manifests itself as subpopulations exhibiting qualitatively different growth trajectories.

As we noted in the introduction to this chapter, the general framework developed by Muthén and his colleagues is quite flexible, and covering every conceivable special case of the general framework is simply not practical. Suffice it to say here that the general framework can be applied to all of the models discussed prior to this chapter—including mixture factor analysis, mixture structural equation modeling in single and multiple groups, mixture MIMIC modeling, and perhaps most interestingly, mixture multilevel structural equation modeling. This latter methodology allows for heterogeneity in the parameters of multilevel models. An application to education would allow models for students nested in schools to exhibit unobserved heterogeneity that might be explained by unique student and school characteristics.

Still another powerful application of the general framework focuses on estimating causal effects in experimental studies—the so-called complier average causal effects (CACE) method (see, e.g., Jo & Muthén, 2001). For example, in a field experiment of an educational intervention, not all individuals who receive the experimental intervention will comply with the protocol. Standard approaches analyze the treatment and control groups via an intent-to-treat analysis, essentially ignoring noncompliance. The result of such an approach can, in principle, bias the treatment effect downward. A viable alternative would be to compare the treatment compliers to those in the control group who would have complied had they received the treatment. However, this latter group is unobserved. The CACE approach under the general framework uses finite mixture modeling and information about treatment compliers to form a latent class of potential compliers, and forms the experimental comparison between these two groups.9

While certainly not exhaustive, it is hoped that this chapter provides the reader with a taste of the modeling possibilities that the general framework allows. The models in this chapter scratch only the surface of what has been described as "second-generation" structural equation modeling.
Notes

1. From here on, we will use the term "class" to refer to components of the mixture model. The term is not to be confused with latent classes (e.g., Clogg, 1995), although finite mixture modeling can be used to obtain latent classes (McLachlan & Peel, 2000).
2. Note that latent class models can handle polytomously scored items.

3. For dichotomous items, it is only necessary to present the value of one latent class indicator.

4. Methods for assessing latent class membership over time are discussed in Section 9.4.

5. The sampling design of ECLS-K included a 27% subsample of the total sample at Fall of first grade to reduce the cost burden of following the entire sample for four waves but to allow for the study of summer learning loss (NCES, 2001).

6. A nonstationary Markov model is one that allows heterogeneous transition probabilities over time. In contrast, stationary Markov models assume homogeneous transition probabilities over time.

7. It should be noted that finite mixture modeling has been applied to continuous growth curve models under the name general growth mixture models (B. Muthén, 2004). These models have been applied to problems in the development of reading competencies (Kaplan, 2002) and math competencies (Jordan, Kaplan, Nabors-Olah, & Locuniak, 2006).

8. It is sometimes the case that adding covariates can change the number of mixture classes. See Kaplan (2002) for an example of this problem in the context of reading achievement.

9. This is an admittedly simple explanation. The CACE approach makes very important assumptions—including random assignment and stable unit treatment value (Jo & Muthén, 2001).
10 Epilogue: Toward a New Approach to the Practice of Structural Equation Modeling
Methodology is a frustrating and rewarding area in which to work. Just as there is no best way to listen to a Tchaikovsky symphony, or to write a book, or to raise a child, there is no best way to investigate social reality. Yet methodology has a role to play in all of this. By showing that science is not the objective, rigorous, intellectual endeavor it was once thought to be, and by demonstrating that this need not lead to anarchy, that critical discourse still has a place, the hope is held out that a true picture of the strengths and limitations of scientific practice will emerge. And with luck, this insight may lead to a better, and certainly more honest, science. —Caldwell (1982), as cited in Spanos (1986)
The only immediate utility of all sciences is to teach us how to control and regulate future events by their causes. —Hume (1739)
As stated in the Preface, one goal of this book was to provide the reader with an understanding of the foundations of structural equation modeling and hopefully to stimulate the use of the methodology through examples that show how structural modeling can illuminate our understanding of social reality—with problems in the field of education serving as motivating examples. At this point, we revisit the question of whether structural equation
modeling can illuminate our understanding of social reality. I argue in this chapter that the answer to this question rests not so much on the specific statistical details of the method, but rather on the approach taken to the application of the method. However, as we will see, the approach taken to the application of the method is intimately connected to the statistical underpinnings of the method itself. Taking the position that the application of the method, and not the method itself, is linked to what we can learn about social reality, this chapter reconsiders the conventional approach to structural equation modeling as represented in most textbooks and substantive applications wherein structural modeling has been employed. The conventional approach to structural equation modeling is considered in light of recent work in the practice of econometric methodology—particularly simultaneous equation modeling. It is not the intention of this chapter to argue that the econometric approach is the "gold standard" of structural equation modeling practice in the social sciences. Rather, the purpose of this chapter is to examine an alternative formulation of modeling practice in econometrics and to argue that the current discourse on econometric practice may have value when considered in light of the conventional practice of structural equation modeling found in other social sciences. In doing so, one goal of this chapter is to remind the reader of the econometric history underlying structural equation modeling and to outline how that history might have influenced the history of the methodology in the other social sciences. In addition to outlining an alternative approach to the practice of structural equation modeling, I argue that developments in our understanding of causal inference in the social and behavioral sciences must be brought into current practice to exploit the utility of structural equation modeling. These developments include recent thinking on the counterfactual theory and related manipulationist theory of causation. The organization of this chapter is as follows. In the next section, we summarize the conventional practice of structural equation modeling to set the framework for the ensuing critique. This is followed by a sketch of the so-called "textbook" practice of simultaneous equation modeling in econometrics. Following this, we outline the history and components of an alternative methodology proposed by Spanos (1986, 1990, 1995) referred to as the probabilistic reduction approach. Following the outline of Spanos's methodology, we discuss the implications of the probabilistic reduction approach for the practice of structural equation modeling. The chapter then turns to the problem of causal inference. Here, we focus attention on philosophical and methodological work on the counterfactual and manipulationist theories of causal inference that has informed econometric practice and may be useful to the practice of structural equation modeling in the other social science disciplines. Finally, we close with a summary.
10.1 Revisiting the Conventional Approach to Structural Equation Modeling The conventional approach to structural equation modeling was represented in Chapter 1. Throughout this book, reference was made to how various statistical and nonstatistical techniques within structural equation modeling were used in conventional practice. The conventional approach can be reiterated as follows. First, the investigator postulates a theoretical framework to set the stage for the specification of the model. In some cases, attempts are made to relate the theoretical framework directly to the specification of the model as typically portrayed in a path diagram. It is common to find an implicitly articulated one-to-one relationship between the theory and the path diagram—implying that the theory and the diagram correspond to each other up to the inclusion of disturbance terms. Next, a set of measures are selected to be incorporated into the model. In cases where multiple measures of hypothesized underlying constructs are desired, investigators may digress into a study of the measurement properties of the data before incorporating the variables into a full latent variable model. It can be inferred from a reading of the extant literature that there is a very close relationship assumed between the theoretical variables and the empirical latent variables. In the next phase, the specified model as portrayed in the diagram is estimated. Rarely is the choice of the estimator based on an explicit assessment of its underlying assumptions. Even if such a thorough assessment of the assumptions were made, in many cases, analysts are limited in their choice of estimators due to such real constraints as sample size requirements. In other words, investigators may very well understand the limitations of, say, maximum likelihood estimation as applied to categorical and other nonnormal variables, but the sample size requirements for successful implementation of, say, weighted least squares estimators may be prohibitive.1 After the model parameters have been estimated, the fit of the model is almost always assessed. It is quite common to find the presentation of alternative fit indices alongside the standard likelihood ratio chi-square statistic. These indices are presented despite the fact that they are based on conceptually different notions of model fit. For example, displaying the likelihood ratio chi-square test of exact fit with the nonnormed fit index, which assesses fit against a baseline model of independence, is conceptually dubious insofar as the "alternative hypotheses" being evaluated are entirely different. As we noted in earlier chapters, it is often the case that a model is determined not to fit the data on a number of criteria. The lack of model fit could be the result of the violation of one or more of the assumptions underlying the chosen estimator. But regardless of the reasons for model misfit, the conventional approach to structural equation modeling takes the next step of model
modification. The modification of the model typically proceeds using the modification index in conjunction with the expected change statistic. By necessity, post hoc model modification is typically supplemented with post hoc justification of how the modification fits into the theoretical framework. In any case, at some point in the cycle, model modification stops. Once the model is deemed to fit the data, it is common to relate the findings back to the original substantive question being posed. However, the results of the model are often related back to the original question in a cursory manner. Seldom is it the case that specific parameter estimates are directly interpreted. Nor do we find a discussion of how the parameter estimates, their signs, and statistical significance support theoretical propositions. Rarer still do we find examples of comparisons of structural models representing different theoretical positions, with models being selected on the basis of, say, the Akaike information criterion statistic.2 Finally, it is rarely the case that models are used for policy or clinically relevant prediction studies. To summarize, the conventional approach to structural equation modeling in the social sciences can be described in five steps: (1) a model is specified and considered to be a relatively close instantiation of a theory, (2) measures are gathered, (3) the model is estimated, (4) the model is typically modified, and finally (5) the results are related back to the original question. Interestingly, the approach to structural equation modeling in the social sciences parallels the conventional approach to econometric modeling described by Pagan (1984), who wrote:

Four steps almost completely describe it: a model is postulated, data gathered, a regression run, some t-statistics or simulation performance provided and another empirical regularity was forged.
Next, we outline the history that led to the conventional approach to econometric practice characterized by Pagan to serve as a comparison with the conventional practice of structural equation modeling in the social sciences.
10.2 The Conventional Approach to Econometric Practice In his historical account of econometric practice, Spanos (1989) argues that the Harvard monograph by Haavelmo (1944) formally launched econometrics as a distinct discipline. Moreover, Spanos laments the fact that this monograph, although heavily cited, was rarely read and that there were many key aspects of the work that have been neglected in practice. Neglect of these key aspects of Haavelmo’s work may have contributed to the conventional practice of econometric modeling and the difficulties it generated.
A central aspect of Haavelmo’s approach was the notion of the joint distribution of the process underlying the available data as being of most importance to identification, estimation, and hypothesis testing. The joint distribution of the observed random variables over the time period of collection is referred to by Spanos (1989) as the Haavelmo distribution. We consider the Haavelmo distribution in more detail in Section 10.3. The second aspect of Haavelmo’s contribution, which was arguably ignored in the conventional practice of econometrics, concerned the notion of statistical adequacy. Statistical adequacy was a concept introduced by R. A. Fisher and brought to econometrics by Koopmans (1950) and is a property of a statistical model applied to the observed data when the underlying assumptions of the model are met. In cases where a statistical model is not statistically adequate, inferences drawn from the statistical model are suspect at best. Of central importance to the argument presented in this chapter is that statistical adequacy must be established before testing theoretical suppositions because the validity of these tests depends on the validity of the statistical model. A third aspect of Haavelmo’s approach concerns his view of data mining. Specifically, this issue concerns the distinction between the statistical model and the estimable econometric model used for testing specific theoretical questions of interest. The statistical model carries with it aspects of the underlying theory insofar as the theory dictates which variables to collect and, possibly, how to measure them. However, the statistical model is designed to capture the probabilistic structure of the data only and is, in an important sense, theory neutral. The relationship between the statistical model and the theoretical parameters of interest is handled by Haavelmo through the process of identification— which in Haavelmo’s methodology is intimately linked with the probabilistic structure of the observed data. The final element of Haavelmo’s methodology, which seems to have been neglected in the conventional practice of econometrics, concerns the error term. Specifically, in Haavelmo’s methodology, the statistical model is specified in consideration of the probabilistic structure of the observed random variables—not the error term. Spanos (1989) notes that this distinction separates the post-Haavelmo paradigm in econometric methodology from the preHaavelmo paradigm that rested on the Gaussian theory of errors.
10.2.1 COMPONENTS OF THE TEXTBOOK APPROACH TO ECONOMETRICS As Spanos (1989) noted, a lack of careful reading of Haavelmo resulted in what came to be called the “textbook” practice of econometrics (Spanos, 1986). The textbook practice was perhaps best exemplified by two important early
econometric textbooks: Goldberger (1964) and Johnston (1972). It is interesting to point out that Goldberger was influential in the application of structural equation modeling to social sciences other than economics. Indeed, Goldberger collaborated with the sociologist O. D. Duncan producing a classic edited volume on structural equation modeling in the social science (Goldberger & Duncan, 1972; Jöreskog, 1973). Goldberger also collaborated with Karl Jöreskog on important applications to structural equation modeling—including the MIMIC model discussed in Chapter 4 (Jöreskog & Goldberger, 1975).3 The textbook approach to econometrics as represented by Johnston’s and Goldberger’s texts incorporated aspects of Haavelmo’s probabilistic approach only through the assumed structure of the error term. Moreover, Haavelmo’s notions of obtaining a statistically adequate model did not influence the practice of simultaneous equation modeling because there was a prevailing view that the use of sample information without underlying theory was inappropriate (Spanos, 1989).4 Clearly, under this viewpoint, there is no incentive to consider the underlying probabilistic structure of the data. By default, data mining is also discouraged. The response to the textbook practice of econometrics was a series of sustained critiques from a variety of perspectives. A discussion of these critiques can be found in Spanos (1990). Suffice to say here that the critiques of the textbook practice of econometrics centered on the validity of employing experimental design reasoning to purely observational data and on the role of statistically adequate models. A specific critique offered by Spanos (1989, 1990) had its origins in the London School of Economics tradition (see, e.g., Hendry, 1983) and focused on the importance of the probabilistic structure of the data and is based on a rereading and adaptation of Haavelmo’s original contributions. This approach is described next.
10.3 The Probabilistic Reduction Approach In Chapter 1, we noted that econometric simultaneous equation modeling could not compete with Box-Jenkins time-series models in terms of predictive performance. One problem with simultaneous equation modeling centered on the distinction between dynamic and static models. However, regardless of the specific problem, econometricians were beginning to realize that simultaneous equation models were not producing the kind of reliable predictions of the behavior of the economy that the Cowles Commission had envisioned. The problem, it seemed, lay in a conventional practice of econometric modeling that deviated from what was originally intended by founders such as Haavelmo (Haavelmo, 1943, 1944; see also Spanos, 1989). The result was that from the mid-1970s to the present, there has been a sustained critique of the conventional approach to econometric modeling.
I argue that one response to this critique offered by Spanos (1986, 1990, 1995) may provide an alternative to the conventional practice of structural equation modeling in the social sciences. Spanos refers to this alternative approach as the probabilistic reduction approach. 10.3.1 THE HISTORICAL BACKGROUND OF THE PROBABILISTIC REDUCTION APPROACH In the development of the probabilistic reduction approach, Spanos (1995) traces the general problem of simultaneous equation modeling to two historical paradigms in statistics: (1) Fisher’s experimental design paradigm and (2) the Gaussian theory of errors paradigm. The conventional practice of simultaneous equation modeling in econometrics resulted from a combination of the influence of these paradigms and a lack of careful reading of Haavelmo’s (1943, 1944) original work. Fisher’s Experimental Design Paradigm. In the case of Fisher’s paradigm, the experimental design represents the theory and the statistical model is chosen before the data are collected. Indeed, the correspondence between the statistical model and the experimental design as representing the theory are nearly identical, with the statistical model differing from the design by the incorporation of an error term. The major contributions of Fisher’s paradigm notwithstanding,5 the conventional approach to simultaneous equation modeling borrowed certain features of the paradigm that are problematic in light of the reality of economic and social science phenomena. Specifically, as noted by Spanos (1995) the social theory under investigation (e.g., input-process-output theory in education) replaces the experimental designer. Moreover, the theory is required to lead to a theoretical model that does not differ in any substantial way from the statistical model. In other words, adapting the Fisher paradigm to economics and social science applications of structural modeling assumes that the theory and the designer are one and the same and that the statistical model and the theoretical model as derived from the theory differ only up to the inclusion of a white-noise disturbance term. The Theory of Errors Paradigm. The theory of errors paradigm had its roots in the mathematical theory of approximation and led to the method of least squares proposed by Legendre in 1805. A probabilistic foundation was given to the least squares approach by Gauss in 1809 and developed into a “theory of errors” by Laplace in 1812. The basic idea originally proposed by Legendre was that a certain function was optimally approximated by another function via the minimization of the sum of the squared deviations about the line. The probabilistic formulation
proposed by Gauss and later Laplace was that if the errors were the result of insignificant omitted factors, then the distribution of the sum of the errors would be normal as the number of errors increased. If it could be argued that the omitted variables were essentially unrelated to the systematic part of the model, then the phenomena under study could be treated as if it were a nearly isolated system (as cited in Spanos, 1995; Stigler, 1986). Arguably, the theory of errors paradigm had a more profound influence on econometric and social science modeling than the Fisher paradigm. Specifically, the theory of errors paradigm led to a tremendous focus on statistical estimation. Indeed, a perusal of most econometric textbooks shows that the dominant discussion is typically around the choice of an estimation method. The choice of an alternative estimator, whether it be two-stage least squares, limited-information maximum likelihood, instrumental variable estimation, or generalized least squares, is the result of viewing ordinary least squares as not living up to its optimal properties in the context of real data. A Comparison of the Two Approaches. The common denominator between the Fisher paradigm and the theory of errors paradigm is the assumptions made regarding the error term. In both cases, the assumptions made regarding the error term lead to the view that the phenomenon under study exists as a nearly isolated system. Where the two traditions differ however, is in their views of redesign and data mining (Spanos, 1995). Specifically, in the Fisher paradigm it is entirely possible that an experiment can be redesigned. Moreover, given that the design is the de facto reality under study, data mining could lead to “discovering a theory in the data.” In the context of the theory of errors paradigm, the data are nonexperimental in nature and thus data mining is nonproblematic. Spanos (1995) cites the example of Kepler. Spanos writes, “Kepler’s insight was initially suggested by looking at the data and not by a theory. Indeed, the theory came much later in the form of Newton’s theory of universal gravitation” (pp. 195–196). In addition, in nonexperimental research, such “experiments” cannot be redesigned. Regardless of the similarities and differences between the Fisher and theory of errors paradigms, the conventional approach to econometric modeling, and indeed statistical modeling in the social sciences generally, adopted aspects of both. In particular, econometric modeling historically took a dim view with respect to data mining, and social science applications of structural equation modeling have been somewhat silent on this issue. As noted above, this could be the result of confusion between the theory and the experimental design that arises from the Fisher paradigm. A close look at this bias in the context of nonexperimental data leads to the conclusion that the bias is somewhat irrational. Moreover, as we will see when we outline the probabilistic reduction approach, this negative view toward data mining disappears, and instead, the activity becomes positively encouraged.
10.4 Elements of the Probabilistic Reduction Approach The probabilistic reduction approach to structural equation modeling is presented in Figure 10.1. A key feature of Figure 10.1 is the separation of the theory from the actual data-generating process, or DGP. In this formulation, a theory is a conceptual construct that serves to provide an idealized description of the phenomena under study. For example, the input-process-output “model” discussed in Chapter 1, is actually a theory insofar as it describes, in entirely conceptual terms, the processes that leads to important educational outcomes.6 The constructs that make up the theory are not observable entities, nor are they latent variables derived from observable data. Yet, the theory should be articulated well enough to suggest what measures to obtain even if it does not directly suggest the scales on which they should be measured. Finally, the theory should be sufficiently detailed to allow for predictions based on a statistical model. That is, the statistical model, to be described below, should be capable of a reparameterization sufficiently detailed to allow tests of predictions suggested by the theory. 10.4.1 THE DATA-GENERATING PROCESS We next consider a very important component of the probabilistic reduction approach—namely the actual data-generating process or DGP. In the simplest terms, the DGP is the actual phenomenon that the theory is put forth to explain. In essence, the DGP corresponds to the reality that generated the observed data. It is the reference point for both the theory and the statistical model. In the former case, the theory is put forth to explain the reality under investigation—be it the cyclical behavior of the economy or the organizational structure of schooling that generates student achievement. In the latter case, the statistical model is designed to capture the systematic nature of the observed data as generated by the DGP. 10.4.2 THE THEORETICAL MODEL A theoretical model, according to Spanos, is a mathematical formulation of the theory. The theoretical model is not necessarily the statistical model with a white-noise term added. In social science applications of structural equation modeling, we tend not to see theoretical models as such. Instead, we view the statistical model with the restrictions added as somehow separate from a theoretical model. It is argued below that the restrictions placed on a statistical model, and indeed the issue of identification, implies an underlying theoretical model even if not directly referred to as such. 10.4.3 THE ESTIMABLE MODEL In some cases, the theoretical model may not be capable of being estimated. This is because the theoretical model is simply a mathematical formulation of
[Figure 10.1 appears here: Diagram of the Probabilistic Reduction Approach to Structural Equation Modeling. The diagram contains boxes for Theory, Theoretical Model, and Estimable Model on one side; DGP, Observed Data, and Statistical Model on the other; and Estimation, Misspecification, Reparameterization, and Model Selection leading to the Empirical Social Science Model. SOURCE: Adapted from Spanos (1986).]
a theory, and the latter does not always provide information regarding what can be observed or how it should be measured. One only need think of "school quality" as an important theoretical variable of the input-process-output theory to realize how many different ways such a theoretical variable can be measured. Therefore, a distinction needs to be made between the theoretical model and an estimable model, where the estimable model is specified with an eye toward the DGP (Spanos, 1990). As an example, let us assume the appropriateness of the input-process-output theory. If interest centers on the measurement of school quality via a
survey of school climate, this will have bearing on the form of the estimable model as well as the form of the statistical model (to be described next). If school quality actually referred to the distribution of resources to classrooms, then clearly the estimable model will differ from the theoretical model and auxiliary measurements might need to be added. It may be interesting to note that the theoretical model and estimable model coincide when data are generated from an experimental arrangement. However, we noted that such arrangements are rare in social science applications of structural equation modeling. 10.4.4 THE STATISTICAL MODEL The statistical model describes an internally consistent set of probabilistic assumptions made about the obtained data series. As Spanos (1990) notes, the statistical model should be an adequate and convenient summary of the observed data. The term "adequate" is used in the sense that it does not exclude systematic information in the data. The term "convenient" is used to suggest that the statistical model can be used to consider aspects of the theory. To be clear, the statistical model is not a one-to-one instantiation of the theory. Rather, within the probabilistic reduction approach, the statistical model is chosen to adequately represent the probabilistic information in the data (Spanos, 1990). However, the choice of a statistical model is partly guided by theory insofar as the statistical model must be capable of being used to answer theoretical questions of interest. It is in the context of our discussion of the statistical model that we may wish to revisit the issue of data mining. In the probabilistic reduction approach, the statistical model is specified to capture as much systematic probabilistic information in the data as possible. No theoretical specification is imposed. To take an example from educational research, the lack of independence among observations due to nesting of students in schools is unrelated to the number of plots or other exploratory methods used to detect it. As such, data mining in the form of plots and other methods of exploratory data analysis is not only valid but also strongly encouraged as a means of capturing the systematic information in the data. Because the notion of the statistical model is unique to the probabilistic reduction approach, it is required that we develop the concept more fully. To begin, consider the joint distribution of the data denoted as f(y, x | θ). Generally, statistical models such as regression involve a reduction of the joint distribution of the observed data. Such a reduction can be written as

f(y, x | θ) = f(y | x; θ1) f(x; θ2),
[10.1]
where the first term on the right-hand side of Equation [10.1] is the conditional distribution of the endogenous variables given the exogenous variables, and the second term on the right-hand side is the marginal distribution of the
exogenous variables. The parameter vectors θ1 and θ2 index the parameters of the conditional distribution and marginal distribution, respectively. To take an example from simple regression, the vector θ1 contains the intercept, slope, and disturbance variance parameters of the regression model, while the vector θ2 contains the mean and variance of the marginal distribution of x. Weak Exogeneity. The development of a statistically adequate model proceeds by focusing attention on the conditional distribution of y given x. However, this immediately raises the question of whether one can ignore the marginal distribution of x. This question concerns the problem of weak exogeneity (Ericsson & Irons, 1994; Richard, 1982) and represents the first and perhaps most important assumption that needs to be addressed. The problem of weak exogeneity was discussed in Chapter 5. Suffice to say that with regard to the choice of the variables in the model vis-à-vis the theory, the assumption of weak exogeneity requires serious attention. Continuing with our discussion, if we can assume weak exogeneity, then we can focus our attention on the conditional distribution f(y | x; θ1). In the context of structural equation modeling, we may write f(y | x; θ1) as

y = Πx + ζ,
[10.2]
which we note is the reduced form specification discussed in Chapter 2 and, in fact, is the multivariate general linear model. Within the probabilistic reduction approach applied to structural equation modeling, the reduced form specification constitutes the statistical model while the structural form constitutes the theoretical model. Prior to testing restrictions implied by the theory via the structural form, it is necessary to assess the statistical adequacy of the reduced form. A Note on Identification. It may be interesting to note that the probabilistic reduction approach yields two notions of identification (Spanos, 1990). First, in the context of the statistical model, identification concerns the adequacy of the sample information for estimating the parameters of the joint distribution of the data. It could be the case that there is insufficient information in the form of colinearity that limits the estimation of the statistical model. Colinearity was not explicitly discussed in this book. For a discussion of colinearity in the context of structural equation modeling, see Kaplan (1994). Second, identification problems in the form of insufficient sample information can be distinguished from identification problems related to insufficient theoretical information—in essence whether structural parameters can be identified from reduced form parameters. However, it must be made clear that the probabilistic reduction approach does not view theoretical identification issues as separate from the statistically adequate model on which it rests.
The two forms of identification are related, but distinction is useful from the view point of the probabilistic reduction approach (see Spanos, 1990). 10.4.5 THE EMPIRICAL SOCIAL SCIENCE MODEL It is important to note that the statistical model discussed in Section 10.4.4 refers to a model that captures the systematic probabilistic information in the data. Once a convenient and adequate statistical model is formulated, the empirical social science model is reparameterized for purposes of description, explanation, or prediction. The reparameterization that would be easily recognized by practitioners of structural equation modeling is in the form of restricting parameters to zero.7 In other words, after a statistical model is chosen, the next step is to restrict the model in ways suggested by theory or as a means of testing competing theories. For example, after formulating an adequate representation of the reduced form of the science achievement model, one could test a set of theoretical propositions of the sort implied by the path diagram in Figure 2.1.8 The path diagram, therefore, represents the empirical model of interest—providing a pictorial representation of the restrictions to be placed on a statistically adequate reduced form model. 10.4.6 RECAP: MODELING STEPS USING THE PROBABILISTIC REDUCTION APPROACH It is important to be clear regarding the modeling steps that are suggested by the probabilistic reduction approach and to contrast them with the conventional approach described above. The probabilistic reduction approach assumes that there exists a theory (or theories) that the investigator wishes to test. It is assumed that the theory is sufficiently detailed insofar as it is able to suggest the type of measures to be obtained. The theory is assumed to describe some actual phenomenon—referred to as the DGP. In this regard, there is no philosophical difference between the probabilistic reduction approach and the conventional approach. Assuming that a set of data has been gathered, the next step is to specify a convenient and adequate statistical model of the observed data. Such a statistically adequate model should account for all the systematic probabilistic information in the data. That is, the statistical model is developed on the joint distribution of the data. All means necessary to model the probabilistic nature of the joint distribution should be used because the statistical parameters of the joint distribution have no theoretical interpretation at this point. Indeed, the probabilistic reduction approach advocated by Spanos calls for the free use of data plots and other forms of data summary in an effort to arrive at an adequate and convenient statistical model. Note that model assumptions relate to the conditional distribution of the
data, and it may be necessary to put forth numerous statistical models until one is finally chosen. These assumptions include exogeneity, normality, linearity, homogeneity, and independence. Weak exogeneity becomes a very serious assumption at this step because evidence against weak exogeneity implies that conditional estimation is inappropriate—that is, the conditional and marginal distributions must be both taken into consideration during estimation. In any case, a violation of one or more of these assumptions requires respecification and adjustment until a statistically adequate model is obtained. The next step in the probabilistic reduction approach is to begin testing theoretical propositions of interest via parameter restrictions placed on a statistically adequate model. Note that whereas the resulting statistical model may be based on considerable data mining, this does not present a problem because the parameters of the statistical model do not have a direct interpretation relative to the theoretical parameters. However, the process of parameter restriction of the statistical model is based on theoretical suppositions and should not be data specific. Indeed, as Spanos points out, the more restrictions placed on the model, the less data-specific the theoretical/estimable model becomes. From the point of view of structural equation modeling in the social sciences, this means that we tend to favor models with many degrees of freedom. In contrast to the probabilistic reduction approach, the conventional approach typically starts with an over-identified model wherein the more overidentifying restrictions the better from a theoretical point of view. However, the process of model modification that characterizes the conventional approach becomes problematic insofar as it does not rest on a statistically adequate and convenient summary of the probabilistic structure of the data.
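To make the contrast concrete, the following schematic sketch (not drawn from the text) illustrates the two-step logic in code. It assumes a small simulated data set, illustrative variable names, and a single normality diagnostic standing in for the fuller battery of misspecification checks; it is a sketch of the workflow, not a full implementation of the probabilistic reduction approach.

# A schematic sketch (not from the text) of the two modeling steps suggested by the
# probabilistic reduction approach: (1) estimate an unrestricted reduced form and probe
# its statistical adequacy, and only then (2) test theory-driven restrictions against it.
# Variable names, the diagnostics chosen, and the data are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # exogenous variables (with intercept)
Pi_true = np.array([[1.0, 0.5, 0.0],
                    [2.0, 0.3, 0.7]])                        # true reduced-form coefficients
Y = X @ Pi_true.T + rng.normal(scale=1.0, size=(n, 2))       # endogenous variables

# Step 1: the unrestricted reduced form (the statistical model), estimated equation by equation.
Pi_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
resid = Y - X @ Pi_hat

# Misspecification checks on the residuals (normality here; linearity, homogeneity,
# and independence checks would be added in the same spirit).
for j in range(Y.shape[1]):
    jb_stat, jb_p = stats.jarque_bera(resid[:, j])
    print(f"equation {j}: Jarque-Bera p = {jb_p:.3f}")

# Step 2: impose a theoretical restriction (here, that the second exogenous variable drops
# out of the first equation) and compare restricted and unrestricted fits.
X_r = X[:, :2]                                               # restricted regressor set for eq. 1
b_r, *_ = np.linalg.lstsq(X_r, Y[:, 0], rcond=None)
rss_r = np.sum((Y[:, 0] - X_r @ b_r) ** 2)
rss_u = np.sum(resid[:, 0] ** 2)
lr = n * np.log(rss_r / rss_u)                               # asymptotically chi-square(1)
print(f"LR test of the restriction: {lr:.2f}, p = {1 - stats.chi2.cdf(lr, df=1):.3f}")

The ordering is the point: the restriction tested in the second step is interpretable only if the diagnostics in the first step give no evidence of misspecification in the unrestricted reduced form.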
10.5 Structural Equation Modeling and Causal Inference In the previous section, a detailed account of an alternative to the conventional application of structural equation modeling was offered. This alternative approach to conventional structural equation modeling focuses almost entirely on the statistical features of the methodology and its common practice. Moreover, in our discussion, attention was paid to the use of the probabilistic reduction approach to improve prediction. Although prediction is critically important in the social and behavioral sciences, an equally important activity is the testing of causal propositions and developing explanations of substantive processes. It is important to contrast models used for prediction versus models used for causal inference and explanation. In the former case, it is sufficient to have used the probabilistic reduction approach to capture the covariance structure of the data. In the latter case, the logic of causal inference lies outside of the statistical analysis and requires that we examine variables with regard to their potential for manipulation and control.
Historically, developers and practitioners of structural equation modeling have been reluctant to consider it as a tool for assessing causal claims. However, in what is undeniably a classic study of the problem of causality, Pearl (2000) in his book Causality: Models, Reasoning and Inference deals directly with, among many other things, the reluctance of practitioners to use structural equation modeling for warranting causal claims. Pearl noted that many of those who have been instrumental in developing structural equation modeling and propagating its use have either explicitly warned against using causal language in regards to its practice (e.g., Muthén, 1987), or have simply not discussed causality at all. However, as Pearl pointed out, the founders of structural equation modeling (especially Haavelmo, 1943; Koopmans et al., 1950; Wright, 1921, 1934) have noted that it can be used to warrant causal claims as long as we understand that certain causal assumptions must be made first.9 Haavelmo, for instance, believed that structural equations were statements about hypothetical controlled experiments. Pearl sees the elimination of causal language in structural equation modeling as arising from two distinct sets of issues. First, from the econometric end, Pearl argues that the Lucas’s (1976) critique may have led economists to avoid causal language. The Lucas critique centers on the use of econometric models for policy analysis because such models contain information that changes as a function of changes in the phenomenon under study. The following quotation from Lucas (1976; as cited in Hendry, 1995) illustrates the problem. Given that the structure of an econometric model consists of optimal decision rules for economic agents, and that optimal decision rules vary systematically with changes in the structure of the series relevant to the decision maker, it follows that any change in policy will systematically alter the structure of econometric models. (Lucas, 1976, as cited in Hendry, 1995, p. 529)
As Hendry (1995) summarizes, "a model cannot be used for policy if implementing the policy would change the model on which that policy was based, since then the outcome of the policy would not be what the model had predicted" (p. 172). From the more modern structural equation modeling perspective, Pearl argues that the reluctance to use causal language may have been due to practitioners wanting to gain respect from the statistical community, who have traditionally eschewed invoking assumptions that they deemed untestable. Finally, Pearl lays some of the blame at the feet of the founders, who, he argues, developed an algebraic language for structural equation modeling that precluded making causal assumptions explicit. Despite these concerns, a great deal of philosophical and methodological research has developed that, I argue, provides a sensible foundation for testing causal claims within the structural equation modeling context. Specifically, that foundation rests on the counterfactual model of causation. Next, I provide a brief review of modern philosophical ideas and econometric theory related specifically to the counterfactual theory of causation.
10.6 The Counterfactual Theory of Causation My focus on the counterfactual theory of causation and the careful formulation of model-based counterfactual claims rests on the argument that properly developed measures that are closely aligned with the data-generating mechanism provide a system for testing counterfactual claims in the context of structural equation models. The probabilistic reduction approach described earlier is, in my view, a more statistically sophisticated approach to developing measures and models that are closely aligned with the DGP than the conventional approach. The counterfactual theory of causation provides a logical overlay to the probabilistic reduction approach and can lead to a sophisticated study of causation within structural equation modeling. It should be mentioned at the outset that this section of the chapter neither covers all aspects of a theory of causation that is of relevance to structural equation modeling nor does it overview existing debates between those holding a so-called structural view of causation (e.g., Heckman, 2005) versus those holding a treatment effects view of causation (e.g., Holland, 1986). A more comprehensive review of these issues in general can be found in Kaplan (in press). Instead, this section deals with specific theories of causation that arguably hold great promise in improving the practice of structural equation modeling for advancing the social and behavioral sciences. 10.6.1 MACKIE AND THE INUS CONDITION FOR CAUSATION A great deal has been written on the counterfactual theory of causation. For the purposes of this chapter, I will focus specifically on the work of Mackie (1980) in his seminal work The Cement of the Universe as well as Hoover's (1990, 2001) applications of Mackie's thinking within the econometric framework. A specific extension of the counterfactual theory by Woodward (2003), which advocates a manipulationist view of causation, is also discussed. I argue that these works on counterfactual propositions set the basis for a more nuanced approach to causal inference amenable to structural equation modeling. The seminal work on the counterfactual theory of causation can be found in Lewis (1973). An excellent recent discussion can be found in Morgan and Winship (2007). To begin, Mackie (1980) situates the issue of causation in the context of a modified form of a counterfactual conditional statement—namely, if X causes Y, then this means that X occurred and Y occurred, and Y would not have occurred if X had not. This strict counterfactual proposition is challenging because there are situations where we can conceive of Y occurring if X had not.10 Thus, Mackie suggests that a counterfactual statement must be augmented by considering the circumstances or conditions under which the causal event took place—or what Mackie refers to as a causal field. To quote Mackie (1980),
What is said to be caused, then, is not just an event, but an event-in-a-certain-field, and some 'conditions' can be set aside as not causing this-event-in-this-field simply because they are part of the chosen field, though if a different field were chosen, in other words if a different causal question were being asked, one of those conditions might well be said to cause this-event-in-that-other-field. (p. 35)
Contained in a causal field can be a host of factors that could qualify as causes of an event. Following Mackie (1980), let A, B, C, . . . , and so on, be a list of factors within a causal field that lead to some effect whenever some conjunction of the factors occurs. A conjunction of events may be ABC or DEF or JKL, and so on. This allows for the possibility that ABC might be a cause or DEF might be a cause, and so forth. For simplicity, assume the collection of factors is finite—namely ABC, DEF, and JKL. Each specific conjunction, such as ABC, is sufficient but not necessary for the effect. In fact, following Mackie, ABC is a "minimal sufficient" condition insofar as none of its constituent parts are redundant. That is, AB is not sufficient for the effect, and A itself is neither a necessary nor sufficient condition for the effect. However, Mackie states that the single factor, in this case, A, is related to the effect in an important fashion—namely, "[I]t is an insufficient but non-redundant part of an unnecessary but sufficient condition: it will be convenient to call this . . . an inus condition" (p. 62). It may be useful to briefly examine the importance of Mackie's work in the context of a substantive illustration. For example, in testing models that can be used to examine ways of improving reading proficiency in young children, Mackie would have us first specify the causal field or context under which the development of reading proficiency takes place. Clearly, this would be the home and schooling environments. We could envision a large number of factors that could qualify as causes of reading proficiency within this causal field. In Mackie's analysis, the important step would be to isolate the set of conjunctions, any one of which might be minimally sufficient for improved reading proficiency. A specific conjunction might be phonemic awareness, parental support and involvement, and teacher training in early literacy instruction. This set is the minimal sufficient condition for reading proficiency in that none of the constituent parts are redundant. Any two of these three factors are not sufficient for reading proficiency, and one alone—say, focusing on phonemic awareness—is neither necessary nor sufficient. However, phonemic awareness is an inus condition for reading proficiency. That is, the emphasis on phonemic awareness is insufficient as it stands, but it is also a nonredundant part of a set of unnecessary but (minimally) sufficient conditions. Mackie's analysis, therefore, provides a framework for considering the exogenous and mediating effects in a structural equation model. Specifically, when delineating the exogenous variables and mediating variables in a structural equation model, explicit attention should be paid to the causal field
under which the causal variables are assumed to operate. This view encourages the practitioner to provide a rationale for the choice of variables in a particular model and how they might work together as a field within which a select set of causal variables operates. This exercise in providing a deep description of the causal field and the inus conditions for causation should be guided by theory and, in turn, can be used to inform and test theory. 10.6.2 CAUSAL INFERENCE AND COUNTERFACTUALS IN ECONOMETRICS Because structural equation modeling has its roots in econometrics, it is useful to examine aspects of the problem of causal inference from that disciplinary perspective. Within econometrics, an important paper that synthesized much of Mackie’s (1980) notions of inus conditions for causation was Hoover (1990). Hoover’s essential point is that causal inference is a logical problem and not a problem whose solution is to be found within a statistical model per se.11 Moreover, Hoover argues that discussions of causal inference in econometrics are essential and that we should not eschew the discussion because of its seemingly metaphysical content. Rather, as with medicine, but perhaps without the same consequences, the success or failure of economic policy might very well hinge on a logical understanding of causation. A central thesis of the present chapter is that such a logical understanding of causation is equally essential to rigorous studies in the other social and behavioral sciences that use structural equation modeling. In line with Mackie’s analysis, Hoover suggests that the requirement that a cause be necessary and sufficient is too strong, but necessity is crucial in the sense that every consequence must have a cause (Holland, 1986). As such, Hoover views the inus condition as particularly attractive to economists because it focuses attention on some aspect of the causal problem without having to be concerned directly with knowing every minimally sufficient subset of the full cause of the event. In the context the social and behavioral sciences, these ideas should also be particularly attractive. As in the aforementioned example of reading proficiency, we know that it is not possible to enumerate the full cause of reading proficiency, but we may be able isolate an inus condition—say parental involvement in reading activities. Hoover next draws out the details of the inus condition particularly as it pertains to the econometric perspective. Specifically, in considering a particular substantive problem, such as the causes of reading proficiency, we may divide the universe into antecedents that are relevant to reading proficiency, C, and those that are irrelevant, non-C. Among the relevant antecedents are those that we can divide into their disjuncts Ci and then further restrict our attention
to the conjuncts of particular inus conditions. But what of the remaining relevant causes of reading proficiency in our example? According to Mackie, they are relegated to the causal field. Hoover views the causal field as the standing conditions of the problem that are known not to change, or perhaps to be extremely stable for the purposes at hand. In Hoover’s words, they represent the “boundary conditions” of the problem. However, the causal field is much more than simply the standing conditions of a particular problem. Indeed, from the standpoint linear statistical models generally, those variables that are relegated to the causal field are part of what is typically referred to as the error term. Introducing random error into the discussion allows Mackie’s notions to be possibly relevant to indeterministic problems such as those encountered in the social and behavioral sciences. However, according to Hoover, this is only possible if the random error terms are components of Mackie’s notion of a causal field. Hoover argues that the notion of a causal field has to be expanded for Mackie’s ideas to be relevant to indeterministic problems. In the first instance, certain parameters of a causal process may not, in fact, be constant. If parameters of a causal question were truly constant, then they can be relegated to the causal field. Parameters that are mostly stable over time can also be relegated to the causal field, but should they in fact change, the consequences for the problem at hand may be profound. In Hoover’s analysis, these parameters are part of the boundary conditions of the problem. Hoover argues that most interventions are defined within certain, presumably constant, boundary conditions— although this may be questionable outside of economics. In addition to parameters, there are also variables that are not of our immediate concern and thus part of the causal field. Random errors, in Hoover’s analysis, contain the variables omitted from the problem and are “impounded” in the causal field. “The causal field is a background of standing conditions and, within the boundaries of validity claimed for the causal relation, must be invariant to exercises of controlling the consequent by means of the particular causal relation (INUS condition) of interest” (Hoover, 2001, p. 222). Hoover points out that for the inus condition to be a sophisticated approach to the problem of causal inference, the antecedents must truly be antecedent. Frequently, this requirement is presumed to be met by appealing to temporal priority. But the assumption of temporal priority is often unsatisfactory. Hoover gives the example of laying one’s head on a pillow and the resulting indentation in the pillow as an example of the problem of simultaneity and temporal priority.12 Mackie, however, sees the issue somewhat more simply— namely the antecedent must be directly controllable. This focus on direct controllability is an important feature Woodward’s (2003) manipulability theory of causation described next.
10.7 A Manipulationist Account of Causation Within Structural Equation Modeling A very important discussion of the problem of manipulability was given by Woodward (2003) who directly dealt with causal interpretation in structural equation modeling. First, Woodward considers the difference between descriptive knowledge versus explanatory knowledge. While not demeaning the usefulness of description for purposes of classification and prediction, Woodward is clear that his focus is on causal explanation. For Woodward, a causal explanation is an explanation that provides information for purposes of manipulation and control. To quote Woodward, my idea is that one ought to be able to associate with any successful explanation a hypothetical or counterfactual experiment that shows us that and how manipulation of the factors mentioned in the explanation . . . would be a way of manipulating or altering the phenomenon explained . . . Put in still another way, an explanation ought to be such that it can be used to answer what I call the what-if-things-had-been-different question . . . (p. 11)
We clearly see the importance of the counterfactual proposition in the context of Woodward’s manipulability theory of causation. However, unlike Mackie’s analysis of the counterfactual theory, Woodward goes a step further by linking the counterfactuals to interventions. For Woodward, the types of counterfactual propositions that matter are those that suggest how one variable would change under an intervention that changes another variable. 10.7.1 INVARIANCE AND MODULARITY A key aspect of Woodward’s theory is the notion of invariance. Specifically, it is crucial to the idea of a causal generalization regarding the relationship between two variables (say X and Y) that the relationship remains invariant after an intervention on X. According to Woodward, a necessary and sufficient condition for a generalization to describe a causal relationship is that it be invariant under some appropriate set of interventions. This is central for Woodward insofar as invariance under interventions is what distinguishes causal explanations from accidental association. It should be briefly noted that a stronger version of invariance is super-exogeneity, which links the statistical concept of weak exogeneity to the problem of invariance (Ericsson & Irons, 1994; Kaplan, 2004). With regard to causal processes represented by systems of structural equations, another vital issue to the manipulability theory of causation is that of modularity (Hausman & Woodward, 1999, 2004). Quoting from Woodward (2003),
A system of equations is modular if (i) each equation is level invariant under some range of interventions and (ii) for each equation there is a possible intervention on the dependent variable that changes only that equation while the other equations in the system remain unchanged and level invariant. (p. 129)
In the above quote, level-invariance refers to invariance within equations, while modularity refers generally to invariance between equations, so-called equation invariance. In the context of structural equation modeling, level invariance and modularity require very careful consideration. The distinction between the two concepts expands the notion of how counterfactual propositions can be examined. Level-invariance concerns a type of local counterfactual proposition—local in the sense that it refers to invariance to interventions within a particular equation. In other words, the truth of the counterfactual proposition is localized to that particular equation. Modularity, on the other hand, concerns invariance in one equation given interventions occurring in other equations in the system. In the context of the social and behavioral sciences, modularity is, arguably, a more heroic and more serious assumption. For a general critique of modularity, see Cartwright (2007).
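Before turning to observational equivalence, the invariance criterion described at the beginning of this subsection can be made concrete with a small simulation. The sketch below is an illustration added here rather than an example from Woodward; the coefficient values and variable names are arbitrary. It generates one relationship in which y genuinely depends on x and one in which the association between x and y is due entirely to a common cause; the two are indistinguishable under passive observation, but only the first survives an intervention on x.

import numpy as np

rng = np.random.default_rng(1)
n = 200_000
beta = 0.5  # hypothetical causal coefficient (arbitrary value)

# Observational regime: x is driven in part by a common cause c.
c = rng.normal(size=n)
x = c + rng.normal(size=n)
y_causal = beta * x + rng.normal(size=n)   # y genuinely depends on x
y_spurious = c + rng.normal(size=n)        # y depends only on the common cause c

# Both regressions of y on x yield a slope of roughly 0.5.
print(np.polyfit(x, y_causal, 1)[0], np.polyfit(x, y_spurious, 1)[0])

# Interventional regime: x is set from outside, severing its link to c.
def mean_y(x_value, causal):
    c = rng.normal(size=n)
    x = np.full(n, float(x_value))
    y = beta * x + rng.normal(size=n) if causal else c + rng.normal(size=n)
    return y.mean()

print(mean_y(1, True) - mean_y(0, True))    # roughly 0.5: the generalization is invariant
print(mean_y(1, False) - mean_y(0, False))  # roughly 0.0: the association was accidental

In Woodward's terms, the second regression describes an accidental association: it predicts well under observation but is not invariant under intervention, and so it does not qualify as a causal generalization.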
10.7.2 OBSERVATIONALLY EQUIVALENT MODELS

A particularly troublesome issue related to level-invariance and modularity concerns the relationship between the reduced form and structural form of a structural model and the attendant issue of observationally equivalent models. We saw in Chapter 2 that the reduced form of a model (essentially equivalent to multivariate regression) can be used to obtain structural parameters provided the parameters are identified. Following Woodward (2003), consider the following structural model

y = βx + u,     [10.3]
z = γx + λy + v.     [10.4]

The reduced form of this model can be written as

y = βx + u,     [10.5]
z = πx + w,     [10.6]
where π = γ + βλ and w = λu + v. The problem is that for just-identified models, the reduced form solution provides exactly the same information about the pattern of covariances as the structural form solution.
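The relationship between the two forms follows directly by substituting Equation [10.3] into Equation [10.4]; the short derivation below is added only to make this step explicit.

\begin{aligned}
z &= \gamma x + \lambda y + v \\
  &= \gamma x + \lambda(\beta x + u) + v \\
  &= (\gamma + \beta\lambda)x + (\lambda u + v) = \pi x + w.
\end{aligned}

Assuming, as usual, that the disturbances are uncorrelated with x, both representations imply Cov(y, x) = βVar(x) and Cov(z, x) = πVar(x), so no pattern of observed covariances can distinguish the two forms when the model is just identified.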
As Woodward points out, although these two sets of equations yield observationally equivalent information, they are distinct causal representations. To see this, note that Equations [10.3] and [10.4] say that x is a direct cause of y and that x and y are direct causes of z. But Equations [10.5] and [10.6] say that x is a direct cause of y and of z and say nothing about y being a direct cause of z. If Equations [10.3] and [10.4] represent the true causal system and are assumed to be modular in Woodward's sense, then Equations [10.5] and [10.6] cannot be modular. For example, if y is fixed to a particular value by intervention, then this implies that β = 0 in the equation for y. Nevertheless, despite this intervention, Equation [10.4] will continue to hold. In contrast, given modularity of Equations [10.3] and [10.4], we see that Equation [10.6] will change because π is a function of β: under such an intervention, the coefficient on x must shift from γ + βλ to γ. We see, then, that the structural form and reduced form are distinct causal systems, and although they provide identical observational information as well as inform the problem of identification, they do not provide identical causal information. Moreover, given that numerous equivalent models can be formed, the criterion for choosing among them, according to Woodward, is that the model satisfies modularity, because that will be the model that fully represents the causal mechanism and set of relationships (Woodward, 2003, p. 332).

In what sense does the manipulability theory of causation inform modeling practice? For Woodward (2003), the problem is that the model possessing the property of modularity cannot be unambiguously determined from among competing observationally equivalent models. Only the facts about causal processes can determine this. For Woodward, therefore, the prescription for modeling practice is that researchers should theorize distinct causal mechanisms and hypothesize what would transpire under hypothetical interventions. This information is then mapped into a system of equations wherein each equation represents a clearly specified and distinct causal mechanism. The right-hand side of any given equation contains those variables on which interventions would change the variables on the left-hand side. And although different systems of equations may be mathematically equivalent, this is only a problem if we are postulating relatively simple associations. As Pearl (2000) points out, mathematically equivalent models are not syntactically equivalent when considered in light of hypothetical interventions. That is, each equation in a system of equations should "encode" the counterfactual information necessary for considering hypothetical interventions (Pearl, 2000; Woodward, 2003).

10.7.3 PEARL'S INTERVENTIONAL INTERPRETATION OF STRUCTURAL EQUATION MODELING

Although we have focused mainly on Woodward's treatment of structural equation modeling, it should also be pointed out that Pearl (2000), among other things, offered an interventionist interpretation of structural equation modeling. Briefly, Pearl notes that in practice researchers will often imbue
structural parameters with more meaning than they would covariances or other statistical parameters. For example, the purely mediating model

y = βz + u,     [10.7]
z = γx + v     [10.8]

is interpreted quite differently from the case where we also allow x to directly influence y—that is,

y = βz + λx + u,     [10.9]
z = γx + v.     [10.10]
In the purely mediating model given in Equations [10.7] and [10.8], the effect of intervening on x is to change y by βγ. In the model in Equations [10.9] and [10.10], the effect of intervening on x is to change y by βγ + λ. The difference between the interpretations of these two models is not trivial. They represent important causal information regarding what would obtain after an intervention on x. For Pearl (2000), structural equations are meant to define an equilibrium state, a state that would be violated when there is an outside intervention (p. 157). As such, structural equations encode not only information about the equilibrium state but also information about which equations must be perturbed to explain the new equilibrium state. For the two models just described, an intervention on x would lead to different equilibrium states.
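The two interventional effects just described can be checked with a brief simulation. The sketch below is illustrative only; the coefficient values are arbitrary and the code is not part of Pearl's or Woodward's exposition.

import numpy as np

rng = np.random.default_rng(0)
n = 200_000
beta, gamma, lam = 0.5, 0.8, 0.3  # arbitrary values for the coefficients in Equations [10.7]-[10.10]

def mean_y(x_value, direct_effect):
    # Generate z and y from the mediating model, optionally adding the direct path from x to y.
    x = np.full(n, float(x_value))
    z = gamma * x + rng.normal(size=n)
    y = beta * z + (lam * x if direct_effect else 0.0) + rng.normal(size=n)
    return y.mean()

# Effect on the expected value of y of setting x to 1 rather than 0.
print(mean_y(1, False) - mean_y(0, False))  # roughly beta*gamma = 0.40 (Equations [10.7] and [10.8])
print(mean_y(1, True) - mean_y(0, True))    # roughly beta*gamma + lam = 0.70 (Equations [10.9] and [10.10])

The simulated change in y recovers βγ for the purely mediating model and βγ + λ once the direct path is included, mirroring the two interpretations described above.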
Much more can be said regarding Woodward's (2003) manipulability theory of causation as well as Pearl's (2000) interventional interpretation of structural equation modeling, but a full account of their ideas is simply beyond the scope of this chapter. Suffice it to say that, in the context of structural equation modeling, Woodward's (2003) as well as Pearl's (2000) expansion of the counterfactual theory of causation to the problem of hypothetical interventions on exogenous variables provides a practical framework for using structural equation modeling to guide causal inference and is in line with how its founders (Haavelmo, 1943; Marschak, 1950; Simon, 1953) viewed the utility of the methodology.

10.8 Conclusion

Over the past 10 years, there have been important developments in the methodology of structural equation modeling—particularly in methods
such as multilevel structural equation modeling, growth curve modeling, and structural equation models that combine categorical and continuous latent variables. These developments indicate a promising future with respect to statistical and substantive applications. However, it is still the case that the conventional approach to structural equation modeling described earlier in this chapter dominates its applications to substantive problems, and it is also still the case that practitioners remain reluctant to fully exploit structural equation modeling for testing causal claims.

How might we reconcile statistical issues with causal issues and at the same time improve the practice of structural equation modeling? In this regard, Pearl (2000) offers a distinction between statistical and causal concepts that I argue is helpful as we attempt to advance the use of structural equation modeling in the social and behavioral sciences. Specifically, Pearl defines a statistical parameter as a quantity determined in terms of a joint probability distribution of observables without regard to any assumptions related to the existence of unobservables. Thus, E(y|x), the regression coefficient β, and so on are examples of statistical parameters. By contrast, a causal parameter would be defined from a causal model, such as path coefficients, the expected value of y under an intervention, and so on. Furthermore, a statistical assumption is any constraint on the joint distribution of the observables—for example, the assumption of multivariate normality. A causal assumption, by contrast, is any constraint on the causal model that is not based on statistical constraints. Causal assumptions may or may not have statistical implications. An example would be identification conditions, which are causal assumptions that can have statistical implications. Finally, in Pearl's view, statistical concepts include correlation, regression, conditional independence, association, likelihood, and so on. Causal concepts, by contrast, include randomization, influence, effect, exogeneity, ignorability, intervention, invariance, explanation, and so forth. Pearl argues that researchers should not necessarily ignore one set of concepts in favor of the other but rather treat each with the proper set of tools.
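As a small illustration of this distinction (the example and notation here are added for exposition and are not Pearl's), consider a single equation y = βx + u in which the disturbance u is allowed to be correlated with x. The regression coefficient of y on x (a statistical parameter, defined entirely by the joint distribution of the observables) is

\frac{\operatorname{Cov}(x, y)}{\operatorname{Var}(x)} = \beta + \frac{\operatorname{Cov}(x, u)}{\operatorname{Var}(x)},

whereas the causal parameter is β itself, the change in the expected value of y produced by an intervention that sets x one unit higher. The two coincide only under the causal assumption Cov(x, u) = 0, a constraint that involves the unobservable u and therefore cannot be read off the joint distribution of the observables.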
In the context of structural equation modeling, I argue that the probabilistic reduction approach provides an improved set of tools that focus on the statistical side of modeling, whereas the counterfactual and manipulationist views of causation articulated by, for example, Hoover (2001), Mackie (1980), and Woodward (2003) provide the set of tools and concepts for engaging in causal inference. I argue that keeping the distinction between statistical and causal activities clear, but boldly and critically engaging in both, should help us realize the full potential of structural equation modeling as a valuable tool in the array of methodologies for the social and behavioral sciences.
Notes

1. However, with the advent of new estimation methods, such as those discussed in Chapter 5, this may become less of a concern in the future.
2. Except perhaps indirectly when using the Akaike information criterion for nested comparisons.
3. It is beyond the scope of this chapter to conduct a detailed historical analysis, but it is worth speculating whether Goldberger's important influence in structural equation modeling may partially account for the conventional practice observed in the social sciences.
4. As noted in Spanos (1989), this view was based on the perceived outcome of a classic debate between Koopmans (1947) and Vining (1949).
5. Included are such important contributions as randomization, replication, and blocking.
6. A difficulty that arises in the context of this discussion is the confusion of terms such as theory, model, and statistical model. No attempt will be made to resolve this confusion in the context of this chapter, and thus it is assumed that the reader will understand the meaning of these terms in context.
7. Of course, nonzero restrictions and equality constraints are also possible.
8. Note that one can also use the multilevel reduced form discussed in Chapter 7 for this purpose as well.
9. Haavelmo, Wright, and Koopmans were referring to simultaneous equation modeling, but the point still holds for structural equation modeling as understood in this book.
10. An example might be a match being lit without it being struck—for example, if it were hit by lightning.
11. In this regard, there does not appear to be any inherent conflict between the probabilistic reduction approach described earlier and the counterfactual model of causal inference.
12. This example was originally put forth by Immanuel Kant in the context of an iron ball depressing a cushion.
References Aitchison, J. (1962). Large-sample restricted parametric tests. Journal of the Royal Statistical Society Series B, 24, 234–250. Aitken, A. C. (1935). On least squares and linear combinations of observations. Proceedings of the Royal Society, 55, 42–48. Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In B. N. Petrov & F. Csaki (Eds.), Second international symposium on information theory. Budapest: Akademiai Kiado. Akaike, H. (1985). Prediction and entropy. In A. C. Atkinson & S. E. Feinberg (Eds.), A celebration of statistics (pp. 1–24). New York: Springer-Verlag. Akaike, H. (1987). Factor analysis on AIC. Psychometrika, 52, 317–332. Allison, P. D. (1987). Estimation of linear models with incomplete data. In C. C. Clogg (Ed.), Sociological methodology (pp. 71–103). San Francisco: Jossey-Bass. Anderson, T. W. (1959). Some scaling models and estimation procedures in the latent class model. In U. Grenander (Ed.), Probability and statistics: The Harald Cramer volume (pp. 9–38). New York: Wiley. Anderson, T. W. (1973). Asymptotically efficient estimation of covariance matrices with linear structure. Annals of Statistics, 1, 135–141. Anderson, T. W., & Rubin, H. (1956). Statistical inference in factor analysis. In J. Neyman (Ed.), Proceedings of the third Berkeley symposium for mathematical statistics and probability (Vol. 5, pp. 111–150). Berkeley: University of California Press. Arbuckle, J. L. (1996). Full information estimation in the presence of incomplete data. In G. A. Marcoulides & R. E. Schumacker (Eds.), Advanced structural equation modeling: Issues and techniques (pp. 243–277). Mahwah, NJ: Lawrence Erlbaum. Arbuckle, J. L. (1999). AMOS 4.0 users’ guide. Chicago: Smallwaters. Asparouhov, T. (2005). Sampling weights in latent variable modeling. Structural Equation Modeling, 12, 411–434. Asparouhov, T., & Muthén, B. (2003). Full-information maximum-likelihood estimation of general two-level latent variable models with missing data (Mplus Working Paper). Asparouhov, T., & Muthén, B. (2007, August). Computationally efficient estimation of multilevel high-dimensional latent variable models. In Proceedings of the Joint Statistical Meeting in Salt Lake City. ASA Section on Biometrics. Bentler, P. M. (1990). Comparative fit indices in structural models. Psychological Bulletin, 107, 238–246. Bentler, P. M. (1995). EQS structural equations program manual. Encino, CA: Multivariate Software. 232
References—233 Bentler, P. M., & Bonett, D. G. (1980). Significance tests and goodness-of-fit in the analysis of covariance structures. Psychological Bulletin, 88, 588–606. Bentler, P. M., & Liang, J. (2003). Two-level mean and covariance structures: Maximum likelihood via an EM algorithm. In S. Reise & N. Duan (Eds.), Multilevel modeling: Methodological advances, issues, and applications (pp. 53–70). Mahwah, NJ: Lawrence Erlbaum. Berndt, E. R. (1991). The practice of econometrics: Classic and contemporary. New York: Addison-Wesley. Bidwell, C. E., & Kasarda, J. D. (1975). School district organization and student achievement. American Sociological Review, 40, 55–70. Bishop, Y. M. M., Fienberg, S. E., & Holland, P. W. (1975). Discrete multivariate analysis. Cambridge, MA: MIT Press. Blumen, I. M., Kogan, M., & McCarthy, P. J. (1955). The industrial mobility of labor as a probability process. Ithaca, NY: Cornell University Press. Bollen, K. A. (1989). Structural equations with latent variables. New York: Wiley. Bollen, K. A., & Curran, P. J. (2004). Autoregressive latent trajectory (ALT) models: A synthesis of two traditions. Sociological Methods and Research, 32, 336–383. Bollen, K. A., & Curran, P. J. (2006). Latent curve models: A structural equation perspective. New York: Wiley. Boomsma, A. (1983). On the robustness of LISREL (maximum likelihood estimation) against small sample size and non-normality. Unpublished dissertation, University of Groningen, Groningen, The Netherlands. Browne, M. W. (1982). Covariance structures. In D. M. Hawkins (Ed.), Topics in applied multivariate analysis (pp. 72–141). London: Cambridge University Press. Browne, M. W. (1984). Asymptotic distribution free methods in the analysis of covariance structures. British Journal of Mathematical and Statistical Psychology, 37, 62–83. Browne, M. W., & Cudeck, R. (1989). Single sample cross-validation indices for covariance structures. Multivariate Behavioral Research, 24, 445–455. Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 136–162). Newbury Park, CA: Sage. Browne, M. W., & Mels, G. (1990). RAMONA user’s guide. Columbus: Department of Psychology, Ohio State University. Buse, A. (1982). The likelihood ratio, Wald, and Lagrange multiplier tests: An expository note. The American Statistician, 36, 153–157. Byrne, B. M., Shavelson, R. J., & Muthén, B. (1989). Testing for the equivalence of factor covariance and mean structure. Psychological Bulletin, 105, 456–466. Caldwell, B. (1982). Beyond positivitism: Economic methodology in the twentieth century. London: George Allen & Unwin. Cartwright, N. (2007). Hunting causes and using them: Approaches in philosophy and economics. Cambridge: Cambridge University Press. Chambers, J. M. (1998). Programming with data: A guide to the S language. New York: Springer-Verlag. Chou, C.-P., & Bentler, P. M. (1990). Model modification in covariance structure modeling: A comparison among likelihood ratio, Lagrange multiplier, and Wald tests. Multivariate Behavioral Research, 25, 115–136.
234—STRUCTURAL EQUATION MODELING Clogg, C. C. (1995). Latent class models. In G. Arminger, C. C. Clogg, & M. E. Sobel (Eds.), Handbook of statistical modeling in the social and behavioral sciences (pp. 81–110). San Francisco: Jossey-Bass. Collins, L. M., & Flaherty, B. P. (2002). Latent class models for longitudinal data. In J. A. Hagenaars & A. L. McCutcheon (Eds.), Applied latent class analysis (pp. 287–303). Cambridge, UK: Cambridge University Press. Collins, L. M., Hyatt, S. L., & Graham, J. W. (2000). Latent transition analysis as a way of testing models of stage-sequential change in longitudinal data. In T. D. Little, K. U. Schnabel, & J. Baumert (Eds.), Modeling longitudinal and multilevel data: Practical issues, applied approaches, and specific examples (pp. 147–161). Mahwah, NJ: Lawrence Erlbaum. Collins, L. M., & Wugalter, S. E. (1992). Latent class models for stage-sequential dynamic latent variables. Multivariate Behavioral Research, 27, 131–157. Cooper, R. L. (1972). The predictive performance of quarterly econometric models of the United States. In B. G. Hickman (Ed.), Econometric models of cyclical behavior (pp. 813–974). New York: Columbia University Press. Cowles, A., III. (1933). Can stock market forecasters forecast? Econometrica, 1, 309–324. Cudeck, R., & Henly, S. J. (1991). Model selection in covariance structure analysis and the “problem” of sample size. Psychological Bulletin, 109, 512–519. Curran, P. J., & Bollen, K. A. (2001). The best of both worlds: Combining autoregressive and latent curve models. In L. M. Collins & S. A. G. (Eds.), New methods for the analysis of change. Washington, DC: American Psychological Association. Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society Series B, 39, 1–38. Duncan, O. D. (1975). Introduction to structural equation models. New York: Academic Press. Elliott, P. R. (1994). An overview of current practice in structural equation modeling. Paper presented at the annual meeting of the American Educational Research Association, New Orleans, LA. Enders, C. K., & Bandalos, D. L. (2001). The relative performance of full information maximum likelihood estimation for missing data in structural equation models. Structural Equation Modeling, 8, 430–457. Engle, R. F. (1984). Wald, likelihood ratio, and Lagrange Multiplier tests in econometrics. In Z. Griliches & M. D. Intriligator (Eds.), Handbook of Econometrics (pp. 776–826). Amsterdam: North-Holland. Ericcson, N. R. (1994). Testing exogeneity: An introduction. In N. R. Ericcson & J. S. Irons (Eds.), Testing exogeneity (pp. 3–38). Oxford, UK: Oxford University Press. Ericsson, N. R., & Irons, J. S. (Eds.). (1994). Testing exogeneity. Oxford, UK: Oxford University Press. Finkbeiner, C. (1979). Estimation for the multiple factor model when data are missing. Psychometrika, 44, 409–420. Fisher, F. (1966). The identification problem in econometrics. New York: McGraw-Hill. Fox, J. P., & Glas, C. A. W. (2001). Bayesian estimation of a multilevel IRT model using Gibbs sampling. Psychometrika, 66, 271–288. Galton, F. (1889). Natural inheritance. London: Macmillan. Goldberger, A. S. (1964). Econometric theory. New York: Wiley. Goldberger, A. S., & Duncan, O. D. (1972). Structural equation methods in the social sciences. New York: Seminar Press.
References—235 Goodman, L. (1968). The analysis of cross-classified data: Independence quasiindependence, and interactions in contingency tables with our without missing entries. Journal of the American Statistical Association, 63, 1091–1131. Guttman, L. (1954). Some necessary conditions for common-factor analysis. Psychometrika, 19, 149–161. Guttman, L. (1956). “Best possible” systematic estimates of communalities. Psychometrika, 21, 273–285. Haavelmo, T. (1943). The statistical implications of a system of simultaneous equations. Econometrica, 11, 1–12. Haavelmo, T. (1944). The probability approach in econometrics. Econometrica (Supplement), 12, 1–115. Hambleton, R. K. & Swaminathan, H. (1985). Item response theory: Principles and applications. Boston: Kluwer-Nijhoff. Harman, H. H. (1976). Modern factor analysis. Chicago: University of Chicago Press. Hausman, D., & Woodward, J. (1999). Independence, invariance, and the causal Markov condition. British Journal for the Philosophy of Science, 50, 521–583. Hausman, D., & Woodward, J. (2004). Modularity and the causal Markov condition: A restatement. British Journal for the Philosophy of Science, 55, 147–161. Heckman, J. J. (2005). The scientific model of causality. In R. M. Stolzenberg (Ed.), Sociological methodology (Vol. 35, pp. 1–97). Boston: Blackwell. Heiberger, R. M. (1977). Regression with pairwise-present covariance matrix. Proceedings of the Statistical Computing Section, 1977. Washington, DC: American Statistical Association. Hendrickson, A. E., & White, P. O. (1964). PROMAX: A quick method for rotation to oblique simple structure. British Journal of Mathematical and Statistical Psychology, 17, 65–70. Hendry, D. F. (1983). Econometric modelling: The consumption function in retrospect. Scottish Journal of Political Economy, 30, 193–220. Hendry, D. F. (1995). Dynamic Econometrics. Oxford: Oxford University Press. Hershberger, S. L., Molenaar, P. C. M., & Corneal, S. E. (1996). A hierarchy of univariate and multivariate structural time series models. In G. A. Marcoulides & R. E. Schumacker (Eds.), Advanced structural equation modeling. Mahwah, NJ: Lawrence Erlbaum. Holland, P. W. (1986). Statistics and causal inference. Journal of the American Statistical Association, 81, 945–960. Hood, W. C., & Koopmans (Eds.). (1953). Studies in econometric method (Vol. 14). New York: Wiley. Hoover, K. D. (1990). The logic of causal inference: Econometrics and the conditional analysis of causality. Economics and Philosophy, 6, 207–234. Hoover, K. D. (2001). Causality in macroeconomics. Cambridge, UK: Cambridge University Press. Horst, P. (1965). Factor analysis of data matrices. New York: Holt, Rinehart, and Winston. Horvitz, D. G., & Thompson, D. J. (1952). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47, 663–685. Hu, L., & Bentler, P. M. (1995). Evaluating model fit. In R. H. Hoyle (Ed.), Structural equation modeling: Concepts, issues, and applications. Thousand Oaks, CA: Sage.
236—STRUCTURAL EQUATION MODELING Hume, D. (1739). A treatise of human nature. Oxford, UK: Oxford University Press. Intriligator, M. D., Bodkin, R. G., & Hsiao, C. (1996). Econometric models, techniques, and applications. Upper Saddle River, NJ: Prentice Hall. Jo, B., & Muthén, B. O. (2001). Modeling of intervention effects with noncompliance: A latent variable modeling approach for randomized trials. In G. A. Marcoulides & R. E. Schumacker (Eds.), New developments and techniques in structural equation modeling (pp. 57–87). Mahwah, NJ: Lawrence Erlbaum. Johnston, J. (1972). Econometric methods (2nd ed.). New York: McGraw-Hill. Jordan, N. C., Hanich, L. B., & Kaplan, D. (2003a). Arithmetic fact mastery in young children: A longitudinal investigation. Journal of Experimental Child Psychology, 85, 103–119. Jordan, N. C., Hanich, L. B., & Kaplan, D. (2003b). A longitudinal study of mathematical competencies in children with specific mathematics difficulties versus children with co-morbid mathematics and reading difficulties. Child Development, 74, 834–850. Jordan, N. C., Kaplan, D., & Hanich, L. B. (2002). Achievement growth in children with learning difficulties in mathematics: Findings of a two-year longitudinal study. Journal of Educational Psychology, 94, 586–597. Jordan, N. C., Kaplan, D., Nabors-Oláh, L., Locuniak, M. N. (2006). Number sense growth in kindergarten: A longitudinal investigation of children at risk for mathematics difficulties. Child Development, 77, 153–175. Jöreskog, K. G. (1967). Some contributions to maximum likelihood factor analysis. Psychometrika, 32, 443–482. Jöreskog, K. G. (1969). A general approach to confirmatory maximum likelihood factor analysis. Psychometrika, 34, 183–202. Jöreskog, K. G. (1971). Simultaneous factor analysis in several populations. Psychometrika, 36, 409–426. Jöreskog, K. G. (1973). A general method for estimating a linear structural equation system. In A. S. Goldberger & O. D. Duncan (Eds.), Structural equation models in the social sciences (pp. 85–112). New York: Academic Press. Jöreskog, K. G. (1977). Structural equation models in the social sciences: Specification, estimation and testing. In P. R. Krishnaiah (Ed.), Applications of statistics (pp. 265–287). Amsterdam: North-Holland. Jöreskog, K. G., & Goldberger, A. (1972). Factor analysis by generalized least squares. Psychometrika, 37, 243–259. Jöreskog, K. G., & Goldberger, A. S. (1975). Estimation of a model with multiple indicators and multiple causes of a single latent variable. Journal of the American Statistical Association, 70, 631–639. Jöreskog, K. G., & Lawley, D. N. (1968). New methods in maximum likelihood factor analysis. British Journal of Mathematical and Statistical Psychology, 21, 85–96. Jöreskog, K. G., & Sörbom, D. (1993). LISREL 8.14. Chicago: Scientific Software International. Jöreskog, K. G., & Sörbom, D. (2000). LISREL 8.30 and PRELIS 2.30. Lincolnville, IL: Scientific Software International. Kaiser, H. F. (1958). The varimax criterion for analytic rotation in factor analysis. Psychometrika, 23, 187–200. Kaplan, D. (1988). The impact of specification error on the estimation, testing, and improvement of structural equation models. Multivariate Behavioral Research, 23, 69–86.
References—237 Kaplan, D. (1989a). Model modification in covariance structure analysis: Application of the expected parameter change statistic. Multivariate Behavioral Research, 24, 285–305. Kaplan, D. (1989b). Power of the likelihood ratio test in multiple group confirmatory factor analysis under partial measurement invariance. Educational and Psychological Measurement, 49, 579–586. Kaplan, D. (1989c). A study of the sampling variability and z-values of parameter estimates from misspecified structural equation models. Multivariate Behavioral Research, 2, 41–57. Kaplan, D. (1990a). Evaluation and modification of covariance structure models: A review and recommendation. Multivariate Behavioral Research, 25, 137–155. Kaplan, D. (1990b). Rejoinder on evaluating and modifying covariance structure models. Multivariate Behavioral Research,, 25, 197–204. Kaplan, D. (1991a). On the modification and predictive validity of covariance structure models. Quality and Quantity, 25, 307–314. Kaplan, D. (1991b). The behaviour of three weighted least squares estimators for structured means analysis with non-normal Likert variables. British Journal of Mathematical and Statistical Psychology, 4, 333–346. Kaplan, D. (1994). Estimator conditioning diagnostics for covariance structure models. Sociological Methods and Research, 23, 200–229. Kaplan, D. (1995a). Statistical power in structural equation modeling. In R. H. Hoyle (Ed.), Structural equation modeling: Concepts, issues, and applications (pp. 100–117). Thousand Oaks, CA: Sage. Kaplan, D. (1995b). The impact of BIB-spiralling induced missing data patterns on goodness-of-fit tests in factor analysis. Journal of Educational and Behavioral Statistics, 20, 69–82. Kaplan, D. (1999). On the extension of the propensity score adjustment method for the analysis of group differences in MIMIC models. Multivariate Behavioral Research, 34, 467–492. Kaplan, D. (2002). Methodological advances in the analysis of individual growth withrelevance to education policy. Peabody Journal of Education, 77, 189–215. Kaplan, D. (2004). On exogeneity. In D. Kaplan (Ed.), Sage handbook of quantitative methodology for the social sciences (pp. 409–423). Thousand Oaks: CA: Sage. Kaplan, D. (2005). Finite mixture dynamic regression modeling of panel data with implications for dynamic response analysis. Journal of Educational and Behavioral Statistics, 30(2), 169–187. Kaplan, D. (2008). An overview of Markov chain methods for the study of stage-sequential developmental processes. Developmental Psychology, 44, 457–467. Kaplan, D., & Elliott, P. R. (1997a). A didactic example of multilevel structural equation modeling applicable to the study of organizations. Structural Equation Modeling: A Multidisciplinary Quarterly, 4, 1–24. Kaplan, D., & Elliott, P. R. (1997b). A model-based approach to validating education indicators using multilevel structural equation modeling. Journal of Educational and Behavioral Statistics, 22, 323–348. Kaplan, D., & Ferguson, A. J. (1999). On the utilization of sample weights in latent variable models. Structural Equation Modeling, 6, 305–321. Kaplan, D., & George, R. (1998). Evaluating latent variable growth models through ex post simulation. Journal of Educational and Behavioral Statistics, 23, 216–235.
238—STRUCTURAL EQUATION MODELING Kaplan, D., Harik, P., & Hotchkiss, L. (2000). Cross-sectional estimation of dynamic structural equation models in disequilibrium. In R. Cudeck, S. H. C. du Toit, & D. Sorbom (Eds.), Structural equation modeling: Present and future. A festschrift in honor of Karl G. Jöreskog (pp. 315–339). Lincolnville, IL: Scientific Software International. Kaplan, D., Kim, J.-S., and Kim, S.-Y. (in press). Multilevel Latent Variable Modeling: Current research and recent developments. In R. Millsap & A. Maydeus-Olivaras (Eds.), Sage handbook of quantitative methods in psychology. Thousand Oaks, CA Sage. Kaplan, D., & Kreisman, M. B. (2000). On the validation of indicators of mathematics education using TIMSS: An application of multilevel covariance structure modeling. International Journal of Educational Policy, Research, and Practice, 1, 217–242. Kaplan, D., & Walpole, S. (2005). A stage-sequential model of literacy transitions: Evidence from the Early Childhood Longitudinal Study. Journal of Educational Psychology, 97, 551–563. Kaplan, D., & Wenger, R. N. (1993). Asymptotic independence and separability in covariance structure models: Implications for specification error, power, and model modification. Multivariate Behavioral Research, 28, 483–498. Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Educational Research Association, 90, 773–795. Keesling, J. W. (1972). Maximum likelihood approaches to causal analysis. Unpublished doctoral dissertation, University of Chicago, Chicago. Kirk, R. E. (1995). Experimental design: Procedures for the behavioral sciences. Pacific Grove, CA: Brooks/Cole. Kish, L. (1965). Survey sampling. New York: Wiley. Kish, L., & Frankel, M. R. (1974). Inference from complex samples. Journal of the Royal Statistical Society Series B, 36, 1–37. Koopmans, T. C. (1947). Measurement without theory. Review of Economics and Statistics, 29, 161–172. Koopmans, T. C. (Ed.). (1950). Statistical inference in dynamic economic time series (Vol. 10). New York: Wiley. Koopmans, T. C., Rubin, H., & Leipnik, R. B. (1950). Measuring the equation systems of dynamic economics. In T. C. Koopmans (Ed.), Statistical inference in dynamic economic models. (pp. 53–237). New York: Wiley. Kreft, I., & de Leeuw, J. (1998). Introducing multilevel modeling. Thousand Oaks, CA: Sage. Land, K. C. (1973). Identification, parameter estimation, and hypothesis testing in recursive sociological models. In A. S. Goldberger & O. D. Duncan (Eds.), Structural equation models in the social sciences (pp. 19–49). New York: Seminar Press. Langeheine, R., & Van de Pol, F. (2002). Latent Markov chains. In J. A. Hagenaars & A. L. McCutcheon (Eds.), Applied latent class analysis (pp. 304–341). Cambridge, UK: Cambridge University Press. Lawley, D. N. (1940). The estimation of factor loadings by the method of maximum likelihood. Proceedings of the Royal Society of Edinburgh, 60, 64–82. Lawley, D. N. (1941). Further investigations in factor estimation. Proceedings of the Royal Society of Edinburgh, 61, 176–185. Lawley, D. N., & Maxwell, A. E. (1971). Factor analysis as a statistical method. London: Butterworth.
References—239 Lazarsfeld, P. (1950). The logical and mathematical foundations of latent structure analysis. In S. S. Stouffer (Ed.), Measurement and prediction (pp. 361–412). Princeton, NJ: Princeton University Press. Lazarsfeld, P. F., & Henry, N. W. (1968). Latent structure analysis. Boston: Houghton Mifflin. Lee, S.-Y., & Poon, W.-Y. (1998). Analysis of two-level structural equation models via EM algorithms. Statistica Sinica, 8, 749–766. Lewis, D. (1973). Counterfacturals. Malden, MA: Blackwell. Linhart, H., & Zucchini, W. (1986). Model selection. New York: Wiley. Little, R. J. A. (1982). Models for nonresponse in sample surveys. Journal of the American Statistical Association, 77, 237–250. Little, R. J. A., & Rubin, D. B. (1987). Statistical analysis with missing data. New York: Wiley. Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data (2nd ed.). New York: Wiley. Longford, N. T., & Muthén, B. (1992). Factor analysis of clustered observations. Psychometrika, 57, 581–597. Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley. Lucas, R. E. (1976). Econometric policy evaluation: A critique. In K. Brunner & A. H. Meltzer (Eds.), Carnegie-Rochester conference series on public policy: The Phillips curve and labor markets (Vol. 1, pp. 19–46). Amsterdam: North-Holland. Luijben, T., Boomsma, A., & Molenaar, I. W. (1987). Modification of factor analysis models in covariance structure analysis. Heymans Bulletins Psychologische Instituten, University of Groningen. MacCallum, R. (1986). Specification searches in covariance structure modeling. Psychological Bulletin, 100, 107–120. MacCallum, R. C., Browne, M. W., & Sugawara, H. M. (1996). Power analysis and determination of sample size for covariance structure modeling. Psychological Methods, 1, 130–149. MacCallum, R. C., Roznowski, M., & Necowitz, L. B. (1992). Model modifications in covariance structure analysis: The problem of capitalization on chance. Psychological Bulletin, 111, 490–504. Mackie, J. L. (1980). The cement of the universe: A study of causation. Oxford, UK: Oxford University Press. Magnus, J. R., & Neudecker, H. (1988). Matrix differential calculus with applications in statistics and econometrics. New York: Wiley. Marschak, J. (1950). Statistical inference in economics. In T. Koopmans (Ed.), Statistical inference in dynamic economic models (Cowles Commission Monograph No. 10, pp. 1–50). New York: Wiley. Marsh, H. W., Balla, J. R., & McDonald, R. P. (1988). Goodness-of-fit indexes in confirmatory factor analysis: The effect of sample size. Psychological Bulletin, 103, 391–411. McCutcheon, A. L. (2002). Basic concepts and procedures in single- and multiple-group latent class analysis. In J. A. Hagenaars & A. L. McCutcheon (Eds.), Applied latent class analysis. Cambridge, UK: Cambridge University Press. McDonald, R. P., & Marsh, H. W. (1990). Choosing a multivariate model: Noncentrality and goodness-of-fit. Psychological Bulletin, 107, 247–255. McLachlan, G., & Peel, D. (2000). Finite mixture models. New York: Wiley.
240—STRUCTURAL EQUATION MODELING Meredith, W. (1993). Measurement invariance, factor analysis and factorial invariance. Psychometrika, 58, 525–543. Meredith, W., & Tisak, J. (1990). Latent curve analysis. Psychometrika, 55, 107–122. Miller, J. D., Hoffer, T., Sucher, R. W., Brown, K. G., & Nelson, C. (1992). LSAY condebook: Student, parent, and teacher data for 1992 chohort two for longitudinal years one through four (1987–1991). DeKalb: Northern Illinois University. Mooijaart, A. (1998). Log-linear and Markov modelling of categorical longitudinal data. In. C. C. J. H. Bijleveld & L. J. Th. van der Kamp (Eds.), Longitudinal data analysis: Designs, models, and methods. Thousand Oaks, CA: Sage. Morgan, S. L., & Winship, C. (2007). Counterfactuals and causal inference: Methods and principles for social research. Cambridge, UK: Cambridge University Press. Mulaik, S. (1972). The foundations of factor analysis. New York: McGraw-Hill. Mulaik, S. A., James, L. R., Van Alstine, J., Bennett, N., Lind, S., & Stillwell, C. D. (1989). An evaluation of goodness-of-fit indices for structural equation models. Psychological Bulletin, 105, 430–445. Muthén, B. (1978). Contributions to factor analysis of dichotomous variables. Psychometrika, 43, 551–560. Muthén, B. (1983). Latent variable structural equation modeling with categorical data. Journal of Econometrics, 22, 43–65. Muthén, B. (1984). A general structural equation model with dichotomous, ordered categorical, and continuous latent variable indicators. Psychometrika, 49, 115–132. Muthén, B. (1987). Response to Freedman’s critique of path analysis: Improve credibility by better methodological training. Journal of Educational Statistics, 12, 178–184. Muthén, B. (1989). Latent variable modeling in heterogeneous populations. Psychometrika, 54, 557–585. Muthén, B. (1991). Analysis of longitudinal data using latent variable models with varying parameters. In L. Collins & J. Horn (Eds.), Best methods for the analysis of change: Recent advances, unanswered questions, future directions (pp. 1–17). Washington, DC: American Psychological Association. Muthén, B. (1993). Goodness of fit with categorical and other non-normal variables. In K. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 205–234). Newbury Park, CA: Sage. Muthén, B. (2002). Beyond SEM: General latent variable modeling. Behaviormetrika, 29, 81–117. Muthén, B. (2004). Latent variable analysis: Growth mixture modeling and related techniques for longitudinal data. In D. Kaplan (Ed.), Sage handbook of quantitative methodology for the social sciences (pp. 345–368). Thousand Oaks, CA: Sage. Muthén, B., & Hofacker, C. (1988). Testing the assumptions underlying tetrachoric correlations. Psychometrika, 53, 563–578. Muthén, B., & Kaplan, D. (1985). A comparison of some methodologies for the factor analysis of non-normal Likert variables. British Journal of Mathematical and Statistical Psychology, 38, 171–189. Muthén, B., & Kaplan, D. (1992). A comparison of some methodologies for the factor analysis of non-normal Likert variables: A note on the size of the model. British Journal of Mathematical and Statistical Psychology, 45, 19–30. Muthén, B., Kaplan, D., & Hollis, M. (1987). On structural equation modeling with data that are not missing completely at random. Psychometrika, 51, 431–462.
References—241 Muthén, B., & Satorra, A. (1989). Multilevel aspects of varying parameters in structural models. In R. D. Bock (Ed.), Multilevel analysis of educational data. San Diego, CA: Academic Press. Muthén, B. O. (2001). Latent variable mixture modeling. In G. A. Marcoulides & R. E. Schumacker (Eds.), New developments and techniques in structural equation modeling. Mahaw, NJ: Lawrence Erlbaum. Muthén, B. O., & Curran, P. J. (1997). General longitudinal modeling of individual differences in experimental designs: A latent variable framework for analysis and power estimation. Psychological Methods, 2, 371–402. Muthén, B. O., du Toit, S. H. C., & Spisic, D. (1997). Robust inference using weighted least squares and quadratic estimating equations in latent variable modeling with categorical outcomes. Unpublished manuscript, University of California, Los Angeles. Muthén, L. K., & Muthén, B. O. (2006). Mplus: Statistical analysis with latent variables. Los Angeles: Muthén & Muthén. Muthén, L. K., & Muthén, B. (1998–2007). Mplus user’s guide (5th ed.). Los Angeles: Muthén & Muthén. Nagin, D. S. (1999). Analyzing developmental trajectories: A semi-parametric, groupbased approach. Psychological Methods, 4, 139–157. National Assessment of Educational Progress (NAEP). (1986). The NAEP 1986 Technical Report. Princeton, NJ. Educational Testing Service National Center for Education Statistics. (1988). National educational longitudinal study of 1988. Washington, DC: U.S. Department of Education. National Center for Education Statistics. (2001). Early childhood longitudinal study: Kindergarten class of 1998–99: Base year public-use data files user’s manual (No. NCES 2001–029). Washington, DC: Government Printing Office. Olsson, U. (1979). On the robustness of factor analysis against crude classification of the observations. Multivariate Behavioral Research, 14, 485–500. Organisation for Economic Co-operation and Development. (2004). The PISA 2003 assessment framework: Mathematics, reading, science, and problem solving knowledge and skills. Paris: Author. Pagan, A. R. (1984). Model evaluation by variable addition. In D. F. Hendry & K. F. Wallis (Eds.), Econometrics and quantitative economics (pp. 275–314). Oxford, UK: Basil Blackwell. Pearl, J. (2000). Causality: Models, reasoning, and inference. Cambridge, UK: Cambridge University Press. Pearson, K., & Lee, A. (1903). On the laws of inheritance in man. Biometrika, 2, 357–462. Pindyck, R. S., & Rubinfeld, D. L. (1991). Econometric models & economic forecasts. New York: McGraw-Hill. Potthoff, R. F., Woodbury, M. A., & Manton, K. G. (1992). “Equivalent sample size” and “equivalent degrees of freedom” refinements using survey weights under superpopulation models. Journal of the American Statistical Association, 87, 383–396. Rabe-Hesketh, S., Skrondal, A., & Pickles, A. (2004). Generalized multilevel structural equation modeling. Psychometrika, 69, 167–190. Raftery, A. E. (1993). Bayesian model selection in structural equation models. In K. A. Bollen, & J. S. Long (Eds.), Testing Structural Equation Models (pp. 163–180). Newbury Park, CA: Sage.
242—STRUCTURAL EQUATION MODELING Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed.). Thousands Oaks, CA: Sage. Richard, J.-F. (1982). Exogeneity, causality, and structural invariance in econometric modeling. In G. C. Chow & P. Corsi (Eds.), Evaluating the reliability of macroeconomic models (pp. 105–118). New York: Wiley. Rogosa, D. R., Brandt, D., & Zimowski, M. (1982). A growth curve approach to the measurement of change. Psychological Bulletin, 90, 726–748. Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70, 41–55. Rubin, D. (1976). Inference and missing data. Biometrika, 63, 581–592. Saris, W. E., & Stronkhorst, H. (1984). Causal modeling in nonexperimental research. Amsterdam: Sociometric Research Foundation. Saris, W. E., Satorra, A., & Sörbom, D. (1987). The detection and correction of specification errors in structural equation models. In C. C. Clogg (Ed.), Sociological methodology 1987 (pp. 105–129). San Francisco: Jossey-Bass. Satorra, A. (1989). Alternative test criteria in covariance structure analysis: A unified approach. Psychometrika, 54, 131–151. Satorra, A. (1992). Asymptotic robust inference in the analysis of mean and covariance structures. In P. V. Marsden (Ed.), Sociological methodology 1992 (pp. 249–278). Oxford, UK: Blackwell. Satorra, A., & Saris, W. E. (1985). Power of the likelihood ratio test in covariance structure analysis. Psychometrika, 50, 83–90. Schmidt, W. H. (1969). Covariance structure analysis of the multivariate random effects model. Unpublished doctoral dissertation, University of Chicago. Schwartz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6, 461–464. Shavelson, R. J., McDonnell, L. M., & Oakes, J. (Eds.). (1989). Indicators for monitoring mathematics and science: A sourcebook. Santa Monica, CA: Rand Corporation. Silvey, S. D. (1959). The Lagrangian multipler test. Annals of Mathematical Statistics, 30, 389–407. Simon, H. A. (1953). Causal ordering and identifiability. In W. C. Hood & T. C. Koopmans (Eds.), Studies in econometric method (pp. 49–74). New York: Wiley. Sivo, S. A., Fan, X., & Witta, E. L. (2005). The biasing effects of unmodeled ARMA time series processes on latent growth curve model estimates. Structural Equation Modeling, 12, 215–232. Sobel, M. E. (1990). Effect analysis and causation in linear structural equation models. Psychometrika, 55, 495–515. Sobel, M. E., & Bohrnstedt, G. W. (1985). Use of null models in evaluating the fit of covariance structure models. In N. B. Tuma (Ed.), Sociological methodology 1985 (pp. 152–178). San Francisco: Jossey-Bass. Sörbom, D. (1974). A general method for studying differences in factor means and factor structure between groups. British Journal of Mathematical and Statistical Psychology, 27, 229–239. Sörbom, D. (1978). An alternative to the methodology of analysis of covariance. Psychometrika, 43, 381–396. Sörbom, D. (1989). Model modification. Psychometrika, 54, 371–384. Spanos, A. (1986). Statistical foundations of econometric modeling. Cambridge, UK: Cambridge University Press.
References—243 Spanos, A. (1989). On rereading Haavelmo: A retrospective view of econometric modeling. Econometric Theory, 5, 405–429. Spanos, A. (1990). Towards a unifying methodological framework for econometric modeling. In C. W. J. Granger (Ed.), Modelling economic series. Oxford, UK: Oxford University Press. Spanos, A. (1995). On theory testing in econometrics: Modeling with non-experimental data. Journal of Econometrics, 67, 189–226. Spanos, A. (1999). Probability theory and statistical inference. Cambridge, UK: Cambridge University Press. Spearman, C. (1904). General intelligence, objectively determined and measured. American Journal of Psychology, 15, 201–293. Stapleton, L. M. (2002). The incorporation of sample weights into multilevel structural equation models. Structural Equation Modeling, 9, 475–502. Stapleton, L. M. (2006). An assessment of practical solutions for structural equation modeling with complex sample data. Structural Equation Modeling, 13, 28–58. Steiger, J. H. (1989). Causal modeling : A supplementary module for SYSTAT and SYGRAPH. Evanston, IL: SYSTAT. Steiger, J. H. (1990). Structural model evaluation and modification: An interval estimation approach. Multivariate Behavioral Research, 25, 173–180. Steiger, J. H., & Lind, J. M. (1980). Statistically based tests for the number of common factors. Paper presented at the Psychometric Society, Iowa City, IA. Steiger, J. H., Shapiro, A., & Browne, M. W. (1985). On the multivariate asymptotic distribution of sequential chi-square tests. Psychometrika, 50, 253–264. Stigler, S. M. (1986). The history of statistics: The measurement of uncertainty before 1900. Cambridge, MA: Harvard University Press. Stouffer, S. S., Suchman, E. A., Devinney, L. C., Star, S. A., & Williams, R. M., Jr. (1949). The American Soldier (Vol. 1). Princeton, NJ: Princeton University Press. Tanaka, J. S. (1993). Multifaceted conceptions of fit in structural equation models. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 10–39). Newbury Park, CA: Sage. Tatsuoka, M. M. (1988). Multivariate analysis: Techniques for educational and psychological research (2nd ed.). New York: Macmillan. Theil, H. (1966). Applied economic forecasting. Amsterdam: North-Holland. Thomson, G. H. (1956). The factorial analysis of human ability. Boston: Houghton Mifflin. Thurstone, L. L. (1935). The vectors of the mind. Chicago: University of Chicago Press. Thurstone, L. L. (1947). Multiple-factor analysis. Chicago: University of Chicago Press. Timm, N. H. (1975). Multivariate analysis with applications in education and psychology. Monterey, CA: Brooks/Cole. Tisak, J., & Meredith, W. (1990). Longitudinal factor analysis. In A. von Eye (Ed.), Statistical methods in longitudinal research (Vol. 1, pp. 125–149). New York: Academic Press. Tryfos, P. (1996). Sampling methods for applied research. New York: Wiley. Tucker, L. R., & Lewis, C. (1973). A reliability coefficient for maximum likelihood factor analysis. Psychometrika, 38, 1–10. Tuma, N. B., & Hannan, M. T. (1984). Social dynamics: Models and methods. New York: Academic Press. Van de Pol, F., & Langeheine, R. (1989). Mover-stayer models, mixed Markov models and the EM algorithm; with an application to labour market data from the
244—STRUCTURAL EQUATION MODELING Netherlands Socio-Economic Panel. In R. Coppi & S. Bolasco (Eds.), Multiway data analysis (pp. 485–495). Amsterdam: North-Holland. Vernon, P. E. (1961). The structure of human abilities (2nd ed.). London: Methuen. Vining, R. (1949). Koopmans on the choice of variables to be studied and of methods of measurement. Review of Economics and Statistics, 31, 77–94. West, S. G., Finch, J. F., & Curran, P. J. (1995). Structural equation models with nonnormal variables: Problems and remedies. In R. Hoyle (Ed.), Structural equation modeling: Concepts, issues, and applications. Thousand Oaks, CA: Sage. White, H. (Ed.). (1982). Model specification [Special issue]. Amsterdam: North-Holland. Wiggins, L. M. (1973). Panel analysis. Amsterdam: Elsevier. Wiley, D. E. (1973). The identification problem in structural equation models with unmeasured variables. In A. S. Goldberger & O. D. Duncan (Eds.), Structural equation models in the social sciences (pp. 69–83). New York: Academic Press. Willett, J. B. (1988). Questions and answers in the measurement of change. In E. Z. Rothkopf (Ed.), Review of Research in Education (Vol. 15, pp. 345–422). Washington, DC: American Educational Research Association. Willett, J. B., & Sayer, A. G. (1994). Using covariance structure analysis to detect correlates and predictors of individual change over time. Psychological Bulletin, 116, 363–381. Willett, J. B., & Sayer, A. S. (1996). Cross-domain analyses of change over time: Combining growth modeling and covariance structure analysis. In G. A. Marcoulides & R. E. Schumacker (Eds.), Advanced structural equation modeling. Mahwah, NJ: Lawrence Erlbaum. Woodward, J. (2003). Making things happen: A theory of causal explanation. Oxford, UK: Oxford University Press. Wright, S. (1918). On the nature of size factors. Genetics, 3, 367–374. Wright, S. (1921). Correlation and causation. Journal of Agriculture Research, 20, 557–585. Wright, S. (1934). The method of path coefficients. Annals of Mathematical Statistics, 5, 161–215. Wright, S. (1960). Path coefficients and path regressions: Alternative or complementary concepts? Biometrics, 16, 189–202. Wrigley, C. (1956). An empirical comparison of various methods for the estimation of communalities (Contract Rep. No. 1). Berkeley: University of California Yuan, K.-H., & Bentler, P. M. (2005). Asymptotic robustness of the normal theory likelihood ratio statistic for two-level covariance structure models. Journal of Multivariate Analysis, 94, 328–343. Yuan, K.-H., & Hayashi, K. (2005). On Muthén’s maximum likelihood for two-level covariance structure models. Psychometrika, 70, 147–167.
Index Akaike information criterion (AIC) model evaluation, 113t, 116, 117–118, 121, 131n6, 132n7 model modification, 126–127, 128t, 129, 130t American Soldier, The (Stouffer, Suchman, Devinney, Star and Williams), 5–6 Approximation errors, 113–116 Asymptotically independent, 100 Asymptotic distribution free (ADF) estimation, 87 Autoregressive latent trajectory (ALT) model, 167, 170–172, 173t Bayes factor, 119 Bayesian information criterion (BIC) model evaluation, 113t, 116, 119, 121 model modification, 126–127, 128t, 129, 130t Bias proportion, 176 Biometrics, 3–4 Bishop, Y., 6 Calibration sample, 116, 119–120 Categorical latent variables finite mixture modeling, 185–203 historical development, 5–6, 12n3 Categorical variable methodology (CVM), 88–89, 91t Causal field, 222–223 Causal inference counterfactual theory of causation, 222–225 econometrics, 224–225
interventionist interpretation, 228–229 invariance, 226–227 manipulability theory of causation, 226–229 modularity, 226–227 observationally equivalent models, 227–228 structural equation modeling, 220–229 Causality (Pearl), 221 Cement of the Universe, The (Mackie), 222 Characteristics roots, 44, 45–46, 47t, 59n3 Class label, 182, 204n1 Common factor model, 43, 47–49 Communality, 48–49 Comparative fit index (CFI), 111–112, 113t, 131n2 Complier average causal effects (CACE), 204, 205n9 Confirmatory factor analysis historical development, 3 multilevel, 138–139, 140t, 141f, 150, 151t restricted factor model, 54–58 Constrained parameters, 15 Continuous latent variables finite mixture modeling, 197–203 historical development, 1–5 Counterfactual theory of causation causal field, 222–223 econometrics, 224–225, 231n11 inus condition, 222–224, 225 structural equation modeling, 222–225 Counting rule path analysis, 20 structural equation model, 63–64, 65 Covariance proportion, 177 245
246—STRUCTURAL EQUATION MODELING Covariance structure specification path analysis, 17–18, 19–20, 38n4 structural equation model, 63 Cowles Commission for Research in Economics, 4, 12n1 Cross-validation index (CVI), 119–120 Data-generating process (DGP), 8, 215 Data mining, 211, 212 Devinney, L. C., 5–6 Direct effect path analysis, 33–34, 36 structural equation model, 67–69 Discrepancy due to approximation, 114–115 Discrepancy due to estimation, 114–115 Discrete Multivariate Analysis (Bishop, Fienberg, and Holland), 6 Early Childhood Longitudinal Study (ECLS) (2001), 147, 152–153, 155, 185, 187, 199, 205n5 Econometrics causal inference, 224–225 conventional approach, 210–212 counterfactual theory of causation, 224–225, 231n11 data mining, 211, 212 error term, 211, 212 Haavelmo distribution, 211 historical development, 210–212, 231n3 path analysis, 13, 16, 18, 20, 21, 24 simultaneous equation modeling, 5, 13, 16, 24, 212 statistical adequacy, 211 structural equation modeling, 4, 9–10, 210–212, 224–225 textbook approach, 211–212 Educational system, 10–12 See also student achievement examples Educational tracking, 80–83 Effect decomposition path analysis, 33–34, 35t, 36, 38n7 structural equation model, 67–69 Empirical social science model, 219, 231n7 Endogenous variables path analysis, 14, 16, 17, 33–34, 37n2 structural equation model, 62–63, 64
Error term, 211, 212 Estimable model, 215–217 Exogeneity parameters of interest, 102 strong exogeneity, 101 structural equation modeling, 101–106 super exogeneity, 101, 226 variation free, 103 weak exogeneity, 101, 103–106, 218, 226 Exogenous variables path analysis, 14, 16–17, 20, 27, 33–34, 37n2, 38n8 structural equation model, 62–63, 64 Expectation-maximization (EM) algorithm finite mixture modeling, 183–185 model assumptions, 87–88, 98 multilevel modeling, 135–136 Expected cross-validation index (ECVI) model evaluation, 116, 119–120, 121, 131n5 model modification, 126–127 Expected parameter change (EPC), 124–126, 127–129 Exploratory factor analysis historical development, 3 structural equation model, 67 unrestricted factor model, 42–54 Factor analysis characteristics roots, 44, 45–46, 47t, 59n3 common factor model, 43, 47–49 communality, 48–49 confirmatory factor analysis, 54–58 effect decomposition, 46, 47t, 59n4 exploratory factor analysis, 42–54 fitted covariance matrix, 41 fixed parameters, 55–56 free parameters, 55–56 generalized least squares (GLS) estimation, 54, 57 Gramian matrix, 48–49 historical development, 1–3 indeterminancy, 42–43, 47–48, 49–50 iterated principal axis factoring, 49 linear factor model, 40–41
maximum likelihood (ML) estimation, 51–54, 55–56, 57 model assumptions, 41 model specification, 39–41 model testing, 57–58 multilevel modeling, 136–139, 140t, 141f, 150, 151t not identified, 43 oblique rotation, 51–52, 53t orthogonal rotation, 42–43, 44, 45, 50–51 parameter estimation, 43–49 parameter identification, 42–43, 49–50, 55–56 path diagram, 57–58 principal axis factoring, 44, 49, 51–52 principal components analysis (PCA), 43, 44–47, 48f promax criterion, 51–52, 53t rotation, 49–52 school climate perceptions, 39–41, 46, 47t, 48f, 51–52, 55–56, 57–58, 59n2 score vector, 41 simple structure, 2, 3 squared multiple correlation (SMC), 49 true score theory, 41–42 unique variables, 41–42 varimax criterion, 50–51 See also Structural equation model Factorial invariance, 72, 74, 80–82 Fienberg, S., 6 Finite mixture modeling categorical-continuous latent variables, 197–203 categorical latent variables, 185–203 class label, 182, 204n1 complier average causal effects (CACE), 204, 205n9 cross-sectional models, 185–189 expectation-maximization (EM) algorithm, 183–185 growth mixture modeling, 198–203, 205n8 latent class models, 185–189, 205n2 latent Markov model, 191–193 latent status, 194 latent transition analysis, 193–197, 198t
local independence assumption, 185–186 longitudinal models, 189–197 manifest Markov model, 190–191, 193t Markov chain models, 189–197 maximum likelihood (ML) estimation, 183–185 mixture latent Markov model, 196–197, 205n7 mover-stayer model, 196–197 nonstationary manifest Markov model, 191, 192t, 205n6 overview, 182–183 stage-sequential dynamic latent variables, 194 student reading achievement, 185, 187–189, 191, 192t, 193t, 194–196, 197, 198t, 200–203 Fisher information matrix, 26 Fisher’s experimental design, 213, 214, 231n5 Fitted covariance matrix factor analysis, 41 path analysis, 23–26, 29 Fixed parameters factor analysis, 55–56 multiple group modeling, 71–72, 79–80 path analysis, 15, 30–31 Forecasting statistics, 175–178 Free model, 83 Free parameters factor analysis, 55–56 multiple group modeling, 71–72 path analysis, 15, 30–31 structural equation model, 64 Full-information maximum likelihood (FIML) estimation, 24, 38n5 Full quasi-likelihood (FQL) estimation, 96–97, 107n4 Generalized least squares (GLS) estimation factor analysis, 54, 57 historical development, 3 model assumptions, 87, 88 path analysis, 23, 27–29, 38n5 Generalized linear latent and mixed model (GLLAMM), 152
Generalized linear mixed model (GLMM), 152 Genetics, 1–2 Gramian matrix, 48–49 Growth curve modeling alternative time metrics, 167, 172–174 autoregressive latent trajectory (ALT) model, 167, 170–172, 173t basic ideas, 156–157 bias proportion, 176 covariance proportion, 177 forecasting statistics, 175–178 inequality coefficient, 176 maximum likelihood (ML) estimation, 162, 166t, 173t model extensions, 167–174 multilevel modeling, 158–159 multivariate modeling, 167–168, 169t nonlinear curve fitting, 167, 168–170 parallel growth process, 167–168 path diagram, 162, 163f root mean square percent simulation error (RMSPSE), 175–176 root mean square simulation error (RMSSE), 175–176 structural equation modeling, 159–166 student science achievement, 156–157, 161–166, 168–170, 169t, 172, 173t, 177–178 student science attitudes, 156–157, 161–166, 168–170, 169t, 172, 173t, 177–178 univariate modeling, 161–167 variance proportion, 176–177 Growth mixture modeling, 198–203, 205n8
Inus condition, 222–224, 225 Invariance causal inference, 226–227 multiple group modeling, 72, 74, 80–82 path analysis, 29 strict factorial, 81 strong factorial, 81 Invariance model, 83 Iterated principal axis factoring, 49 Just identified path analysis, 20, 22 structural equation model, 64
Haavelmo distribution, 211 Holland, P., 6
Lagrange multiplier (LM) test model assumptions, 101 model modification, 122, 125, 126–127 path analysis, 31, 37 Latent class models, 185–189, 205n2 Latent Markov model, 191–193 Latent status, 194 Latent structure analysis, 5–6, 12n3 Latent variable analysis, 82 Likelihood ratio (LR) test model modification, 122, 127–128 path analysis, 30–31, 36 structural equation model, 65 Linear factor model factor analysis, 40–41 model assumptions, 88–89 LISREL software program, 5 Listwise present approach (LPA), 93–94, 97, 98 Local independence, 6, 185–186 London School of Economics, 212 Longitudinal Study of American Youth (LSAY), 155, 161, 179n2
Ignorable mechanism, 92–93, 96–97 Indeterminancy, 42–43, 47–48, 49–50 Indirect effect path analysis, 34, 35t, 36 structural equation model, 67–69 Inequality coefficient, 176 Input-process-output theory of education, 10–12
Mackie, J. L., 222 Manifest Markov model, 190–191, 193t Manipulability theory of causation, 226–229 Markov chain models, 189–197 Maximum likelihood first-order (MLF) estimation, 135–136
Maximum likelihood (ML) estimation factor analysis, 51–54, 55–56, 57 finite mixture modeling, 183–185 growth curve modeling, 162, 166t, 173t historical development, 3, 4 model assumptions, 85, 87–88, 91t, 94–97, 98, 99 model evaluation, 117–118, 120 model modification, 128t, 130t multilevel modeling, 135–136 multiple group modeling, 71, 76t path analysis, 23, 24–27, 28t, 29, 38n5 structural equation model, 67, 68t Maximum likelihood robust (MLR) estimation, 136–138 Mean structure estimation, 74–76 Mean structure specification, 75 Mean structure testing, 75 Minimum Akaike information criterion (MAIC), 118 Missing at random (MAR), 92–93, 96–98 Missing completely at random (MCAR), 92–94, 96–97, 98, 107n5 Missing data case approaches, 93–94 full quasi-likelihood (FQL) estimation, 96–97, 107n4 ignorable mechanism, 92–93, 96–97 listwise present approach (LPA), 93–94, 97, 98 MAR-based approaches, 97–98 missing at random (MAR), 92–93, 96–98 missing completely at random (MCAR), 92–94, 96–97, 98, 107n5 model-based approaches, 94–97 nomenclature, 92–93 nonignorable mechanism, 92–93, 96–97 not missing at random (NMAR), 92–93 observed at random (OAR), 92–93 pairwise present approach (PPA), 93–94, 97, 98 structural equation modeling, 92–98
Mixture latent Markov model, 196–197, 205n7 Model assumptions asymptotically independent, 100 asymptotic distribution free (ADF) estimation, 87 exogeneity, 101–106 expectation-maximization (EM) algorithm, 87–88, 98 factor analysis, 41 generalized least squares (GLS) estimation, 87, 88 Lagrange multiplier (LM) test, 101 linear factor model, 88–89 maximum likelihood (ML) estimation, 85, 87–88, 91t, 94–97, 98, 99 missing data, 92–98 mutually asymptotically independent, 100 nonnormality estimation, 85–92 separable hypotheses, 100 specification error, 98–101, 107n7 student science achievement, 90–92 transitive hypotheses, 100 Wald test, 99–100, 101 weighted least squares (WLS) estimation, 86–87, 89, 90–92 Model evaluation Akaike information criterion (AIC), 113t, 116, 117–118, 121, 131n6, 132n7 alternative fit indices, 110–121 approximation errors, 113–116 Bayes factor, 119 Bayesian information criterion (BIC), 113t, 116, 119, 121 calibration sample, 116, 119–120 comparative fit index (CFI), 111–112, 113t, 131n2 comparative fit measures, 110–113 cross-validation index (CVI), 119–120 discrepancy due to approximation, 114–115 discrepancy due to estimation, 114–115 expected cross-validation index (ECVI), 116, 119–120, 121, 131n5
maximum likelihood (ML) estimation, 117–118, 120 minimum Akaike information criterion (MAIC), 118 model selection criteria, 116–121, 131n4 nonnormed fit index (NNFI), 111, 113t, 131n1 normed-fit index (NFI), 110–111, 116 parsimony-based indices, 112 parsimony normed-fit index (PNFI), 112 predictive distribution, 117 relative noncentrality index (RNI), 111 root mean square error of approximation (RMSEA), 113t, 115–116 student science achievement, 112–113, 116 Tucker-Lewis index (TLI), 111, 113t, 116, 131n1 validation sample, 116, 119–120 Model modification Akaike information criterion (AIC), 126–127, 128t, 129, 130t Bayesian information criterion (BIC), 126–127, 128t, 129, 130t expected cross-validation index (ECVI), 126–127 expected parameter change (EPC), 124–126, 127–129 Lagrange multiplier (LM) test, 122, 125, 126–127 likelihood ratio (LR) test, 122, 127–128 maximum likelihood (ML) estimation, 128t, 130t model selection criteria, 126–129, 130t modification index (MI), 122, 125, 126–127 power estimation, 123–126 power influences, 129–131 root mean square error of approximation (RMSEA), 123–124, 128 sample size, 123–126
standardized expected parameter change (SEPC), 125–126, 127–129 statistical power, 121–131 student science achievement, 121–122, 127–129, 130t Model specification factor analysis, 39–41 multiple group modeling, 70–76, 84n2 path analysis, 14–18 structural equation model, 62–63 Model testing factor analysis, 57–58 multiple group modeling, 70–74 path analysis, 29–33 structural equation model, 65–69 Modification index (MI) model modification, 122, 125, 126–127 path analysis, 31 Modularity, 226–227 Mover-stayer model, 196–197 Mplus software program, 27, 67, 88, 90, 91, 98, 121, 132n8, 136, 152, 161 M test, 71 Multilevel latent variable model (MLLVM), 135–136 Multilevel modeling basic ideas, 134–136 expectation-maximization (EM) algorithm, 135–136 factor analysis, 136–139, 140t, 141f, 150, 151t generalized linear latent and mixed model (GLLAMM), 152 generalized linear mixed model (GLMM), 152 growth curve modeling, 158–159 maximum likelihood first-order (MLF) estimation, 135–136 maximum likelihood (ML) estimation, 135–136 maximum likelihood robust (MLR) estimation, 136–138 multilevel latent variable model (MLLVM), 135–136
multiple indicators/multiple causes (MIMIC) model, 139 Muthén’s maximum likelihood (MUML) estimation, 135–136 path analysis, 139, 142–147 path diagram, 139, 141f, 144, 146f sampling weights, 147–151 student mathematics achievement, 138–139, 140t, 141f, 143–147, 150, 151t, 153n2 weighted least squares mean-adjusted (WLSM) estimation, 136 weighted least squares (WLS) estimation, 136 Multiple group modeling educational tracking, 80–83 exploratory factor analysis, 74 factorial invariance, 72, 74, 80–82 fixed parameters, 71–72, 79–80 free model, 83 free parameters, 71–72 invariance model, 83 latent variable analysis, 82 maximum likelihood (ML) estimation, 71, 76t mean structure estimation, 74–76 mean structure specification, 75 mean structure testing, 75 model specification, 70–76, 84n2 model testing, 70–74 M test, 71 multiple indicators/multiple causes (MIMIC) model, 62, 76–80 no mean structure, 70–74 parameter estimation, 75–76 parameter identification, 75–76 propensity score, 82–83 school climate perceptions, 70, 73–74, 76–80 selection bias, 80–83 strict factorial invariance, 81 strong factorial invariance, 81 unique variables, 70, 72, 80 Multiple indicators/multiple causes (MIMIC) model, 62, 76–80, 139 Multivariate growth curve modeling, 167–168, 169t
Muthén’s maximum likelihood (MUML) estimation, 135–136 Mutually asymptotically independent, 100 National Educational Longitudinal Study (NELS) (1988), 14, 39–41, 59n1, 70, 155 Nonignorable mechanism, 92–93, 96–97 Nonlinear curve fitting, 167, 168–170 Nonnormality estimation categorical variable methodology (CVM), 88–89, 91t continuous nonnormal data, 86–88 normal theory-based estimation, 85, 86 recent developments, 7, 90–92 structural equation modeling, 85–92 Nonnormed fit index (NNFI), 111, 113t, 131n1 Nonrecursive path model identification, 21–23 path analysis, 16–17, 21–23 Nonstationary manifest Markov model, 191, 192t, 205n6 Normalization rule, 19 Normed-fit index (NFI), 110–111, 116 Not identified factor analysis, 43 path analysis, 20, 22, 23 structural equation model, 64 Not missing at random (NMAR), 92–93 Oblique rotation factor analysis, 51–52, 53t structural equation model, 67 Observed at random (OAR), 92–93 Order condition, 22–23 Organization for Economic Cooperation and Development (OECD), 138 Orthogonal rotation, 42–43, 44, 45, 50–51 Overidentified path analysis, 20, 22 structural equation model, 64
Pairwise present approach (PPA), 93–94, 97, 98 Parallel growth process, 167–168 Parameters of interest, 102 Parsimony normed-fit index (PNFI), 112 Path analysis constrained parameters, 15 counting rule, 20 covariance structure specification, 17–18, 19–20, 38n4 direct effect, 33–34, 36 econometrics, 13, 16, 18, 20, 21, 24 effect decomposition, 33–34, 35t, 36, 38n7 endogenous variables, 14, 16, 17, 33–34, 37n2 exogenous variables, 14, 16–17, 20, 27, 33–34, 37n2, 38n8 Fisher information matrix, 26 fitted covariance matrix, 23–26, 29 fixed parameters, 15, 30–31 free parameters, 15, 30–31 full-information maximum likelihood (FIML) estimation, 24, 38n5 generalized least squares (GLS) estimation, 23, 27–29, 38n5 historical development, 3–4 indirect effect, 34, 35t, 36 just identified, 20, 22 Lagrange multiplier (LM) test, 31, 37 likelihood ratio (LR) test, 30–31, 36 maximum likelihood (ML) estimation, 23, 24–27, 28t, 29, 38n5 model specification, 14–18 model testing, 29–33 modification index (MI), 31 multilevel modeling, 139, 142–147 nonrecursive path model, 16–17 nonrecursive path model identification, 21–23 normalization rule, 19 not identified, 20, 22, 23 order condition, 22–23 overidentified, 20, 22 parameter estimation, 23–29 parameter identification, 18–23 parameter identification defined, 19 parameter identification rules, 19–21
parameter interpretation, 33–37, 38n6 parameter testing, 29–33 path diagram, 14–15, 37n1 rank condition, 21–22 recursive path model, 16–17 recursive rule, 20–21 reduced form specification, 17–18 scale freeness, 29 scale invariance, 29 score vector, 31 standardized solutions, 35–37 student science achievement, 14–15, 20, 26–27, 28t, 30, 31–33, 34, 35t, 36 total effect, 34, 35t, 36 unweighted least squares (ULS) estimation, 28–29 Wald test, 32 weighted least squares (WLS) estimation, 27–29 See also Structural equation model Path diagram factor analysis, 57–58 growth curve modeling, 162, 163f multilevel modeling, 139, 141f, 144, 146f path analysis, 14–15, 37n1 Pearl, J., 221 Predictive distribution, 117 Principal axis factoring, 44, 49, 51–52 Principal components analysis (PCA), 43, 44–47, 48f Probabilistic reduction approach data-generating process (DGP), 215 diagram, 216f elements of, 215–220 empirical social science model, 219, 231n7 estimable model, 215–217 Fisher’s experimental design, 213, 214, 231n5 historical development, 213–214 identification, 218–219 modeling steps, 219–220 statistical model, 217–219 theoretical model, 215 theory, 215, 231n6 theory of errors, 211, 213–214 weak exogeneity, 218
Promax criterion, 51–52, 53t Propensity score, 82–83 Psychometrics, 1–3 RAND Corporation Indicators Model, 10, 11f Rank condition, 21–22 Recursive path model, 16–17 Recursive rule, 20–21 Reduced form specification, 17–18 Relative noncentrality index (RNI), 111 Root mean square error of approximation (RMSEA), 113t, 115–116 Root mean square percent simulation error (RMSPSE), 175–176 Root mean square simulation error (RMSSE), 175–176 Rotation factor analysis, 49–52 oblique rotation, 51–52, 53t, 67 orthogonal rotation, 42–43, 44, 45, 50–51 Scale freeness, 29 Scale invariance, 29 School climate perceptions factor analysis, 39–41, 46, 47t, 48f, 51–52, 55–56, 57–58, 59n2 multiple group modeling, 70, 73–74, 76–80 Score vector factor analysis, 41 path analysis, 31 SEMNET, 6, 12n5 Simple structure, 2, 3 Simultaneous equation modeling econometrics, 5, 13, 16, 24, 212 historical development, 4, 5 See also Econometrics; Path analysis Specification error basic problem, 99 problem research, 99–100 specification error propagation, 101 structural equation modeling, 98–101, 107n7 Squared multiple correlation (SMC), 49 Stage-sequential dynamic latent variables, 194
Standardized expected parameter change (SEPC), 125–126, 127–129 Standardized solutions path analysis, 35–37 structural equation model, 65–69 Star, S. A., 5–6 Statistical adequacy, 211 Statistical assumptions. See Model assumptions Statistical model, 217–219 Stouffer, S. S., 5–6 Strict factorial invariance, 81 Strong exogeneity, 101 Strong factorial invariance, 81 Structural equation model counting rule, 63–64, 65 covariance structure specification, 63 direct effect, 67–69 effect decomposition, 67–69 endogenous variables, 62–63, 64 exogenous variables, 62–63, 64 exploratory factor analysis, 67 free parameters, 64 indirect effect, 67–69 just identified, 64 likelihood ratio (LR) test, 65 maximum likelihood (ML) estimation, 67, 68t model specification, 62–63 model testing, 65–69 not identified, 64 oblique rotation, 67 overidentified, 64 parameter identification, 63–65 parameter interpretation, 65–69 standardized solutions, 65–69 student science achievement, 64f, 66–69 total effect, 67–69 two-step rule, 64–65 See also Model assumptions; Model evaluation; Model modification; Model specification; Model testing Structural equation modeling biometrics, 3–4 causal inference, 220–229 comprehensive methodology, 61, 84n1
conventional approach, 7–10, 209–210 counterfactual theory of causation, 222–225 econometrics, 4, 9–10, 210–212, 224–225 educational system, 10–12 growth curve modeling, 159–166 historical development, 1–10 manipulability theory of causation, 226–229 probabilistic reduction approach, 212–220 psychometrics, 1–3 recent developments, 6–7 See also Factor analysis; Finite mixture modeling; Growth curve modeling; Multilevel modeling; Multiple group modeling; Path analysis Student mathematics achievement, 138–139, 140t, 141f, 143–147, 150, 151t, 153n2 Student reading achievement, 185, 187–189, 191, 192t, 193t, 194–196, 197, 198t, 200–203 Student science achievement growth curve modeling, 156–157, 161–166, 168–170, 169t, 172, 173t, 177–178 model assumptions, 90–92 model evaluation, 112–113, 116 model modification, 121–122, 127–129, 130t path analysis, 14–15, 20, 26–27, 28t, 30, 31–33, 34, 35t, 36 structural equation model, 64f, 66–69 Student science attitudes, 156–157, 161–166, 168–170, 169t, 172, 173t, 177–178 Suchman, E. A., 5–6 Super exogeneity, 101, 226
Theoretical model, 215 Theory, 215, 231n6 Theory of errors, 211, 213–214 Total effect path analysis, 34, 35t, 36 structural equation model, 67–69 Transitive hypotheses, 100 True score theory, 41–42 Tucker-Lewis index (TLI), 111, 113t, 116, 131n1 Two-step rule, 64–65 Unique variables factor analysis, 41–42 multiple group modeling, 70, 72, 80 Univariate growth curve modeling, 161–167 Unweighted least squares (ULS) estimation, 28–29 Validation sample, 116, 119–120 Variance proportion, 176–177 Variation free, 103 Varimax criterion, 50–51 Wald test model assumptions, 99–100, 101 path analysis, 32 Weak exogeneity, 101, 103–106, 218, 226 Weighted least squares mean-adjusted (WLSM) estimation model assumptions, 90–92 multilevel modeling, 136 Weighted least squares mean/varianceadjusted (WLSMV) estimation, 90–92 Weighted least squares (WLS) estimation model assumptions, 86–87, 89, 90–92 multilevel modeling, 136 path analysis, 27–29 Williams, R. M., Jr., 5–6
About the Author

David Kaplan received his PhD in Education from UCLA in 1987, after which he joined the faculty of the University of Delaware, where he remained until 2006. He is currently Professor of Quantitative Methods in the Department of Educational Psychology at the University of Wisconsin–Madison. His current research focuses on the problem of causal inference in nonexperimental settings within a “structuralist” perspective. He also maintains a strong and active interest in the development and testing of statistical models for social and behavioral processes that are not necessarily directly observed—including latent variable models, growth curve models, mixture models, and Markov models. His collaborative work involves applications of advanced statistical methods to problems in education and human development. His Web site can be found at http://www.education.wisc.edu/edpsych/facstaff/kaplan/kaplan.htm.