Causality and Psychopathology
american psychopathological association
Volumes in the Series:
Causality and Psychopathology: Finding the Determinants of Disorders and Their Cures (Shrout, Keyes, and Ornstein, eds.)
Mental Health in Public Health (Cottler, ed.)
Trauma, Psychopathology, and Violence: Causes, Correlates, or Consequences (Widom)
Causality and Psychopathology FINDING THE DETERMINANTS OF DISORDERS AND THEIR CURES
EDITED BY
patrick e. shrout, ph.d. Professor of Psychology Department of Psychology New York University New York, NY
katherine m. keyes, ph.d., mph Columbia University Epidemiology Merit Fellow, Department of Epidemiology Columbia University New York, NY
katherine ornstein, mph Department of Epidemiology Mount Sinai School of Medicine New York, NY
2011
Oxford University Press
Oxford University Press, Inc., publishes works that further Oxford University's objective of excellence in research, scholarship, and education.
Oxford New York
Auckland Cape Town Dar es Salaam Hong Kong Karachi Kuala Lumpur Madrid Melbourne Mexico City Nairobi New Delhi Shanghai Taipei Toronto
With offices in Argentina Austria Brazil Chile Czech Republic France Greece Guatemala Hungary Italy Japan Poland Portugal Singapore South Korea Switzerland Thailand Turkey Ukraine Vietnam
Copyright 2011 by Oxford University Press, Inc.
Published by Oxford University Press, Inc., 198 Madison Avenue, New York, New York 10016
www.oup.com
Oxford is a registered trademark of Oxford University Press
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior permission of Oxford University Press.
Library of Congress Cataloging-in-Publication Data
American Psychopathological Association. Meeting (98th : 2008 : New York, N.Y.)
Causality and psychopathology / edited by Patrick E. Shrout, Katherine M. Keyes, Katherine Ornstein.
p. ; cm. Includes bibliographical references and index.
ISBN 978-0-19-975464-9 (alk. paper)
1. Psychology, Pathological—Etiology—Congresses. 2. Psychiatry—Research—Methodology—Congresses. I. Shrout, Patrick E. II. Keyes, Katherine M. III. Ornstein, Katherine. IV. Title.
[DNLM: 1. Mental Disorders—epidemiology—United States—Congresses. 2. Mental Disorders—etiology—United States—Congresses. 3. Psychopathology—methods—United States—Congresses. WM 140]
RC454.A4193 2008 616.89—dc22 2010043586
ISBN-13: 978-0-19-975464-9
Printed in USA on acid-free paper
Preface
Research in psychopathology can reveal basic insights into human biology, psychology, and social structure; and it can also lead to important interventions to relieve human suffering. Although there is sometimes tension between basic and applied science, the two are tied together by a fascination with causal explanations. Basic causal stories such as how neurotransmitters change brain function "downstream" are always newsworthy, even if the story is about a mouse or rat brain. However, applications of causal understanding to create efficacious prevention or intervention programs are also exciting. Although good causal stories make the headlines, many psychopathology researchers collect data that are descriptive or correlational in nature. However, for decades epidemiologists have worked with such nonexperimental data to formulate causal explanations about the etiology and course of disorders. Even as these explanations have been reported in textbooks and have been used in courts of law to settle claims of responsibility for adverse conditions, they have also been criticized for going too far. Indeed, many scientists shy away from using explicit causal language when reporting observational data, to avoid being criticized for lack of rigor. Nonetheless, the subtext in the reports of associations and developments always implies causal mechanisms.

Because of the widespread interest in causal explanation, along with concerns about what kinds of causal claims can be made from survey data, longitudinal studies, studies of genetic relationships, clinical observations, and imperfect clinical trials, the American Psychopathological Association decided to organize its 2008 scientific meeting around the topic of causality and psychopathology research. Those who were invited to speak at the 2.5-day conference included authors of influential works on causation, statisticians whose new methods are informative about causal processes, and experts in psychopathology. This volume contains revised and refined versions of the papers presented by the majority of the invited speakers at that unique meeting. Not all of the authors have done work in psychopathology research, and not all have previously written explicitly about causal inference. Indeed, the goal of the meeting and this volume is to promote new creative thinking about how causal inference can be promoted in psychopathology research in the years to come. Moreover, the collection is likely to be of interest to scientists working in other areas of medicine, psychology, and social science, especially those who combine experimental and nonexperimental data in building their scientific literature.

The volume is divided into three sections. The first section, "Causal Theory and Scientific Inference," contains contributions that address crosscutting issues of causal inference. The first two chapters introduce conceptual and methodological issues that thread through the rest of the volume, while the third chapter provides a formal framework for making and examining causal claims. The fourth chapter introduces genetic analysis as a kind of prototype of causal thinking in psychopathology, in that we know that the variation in the genotype can lead to variation in the phenotype but not vice versa. The author also argues for the practical counterfactual thinking implied by the "interventionist" approach to causal inference developed by J. Woodward and colleagues. The final chapter in this section provides a stimulating illustration of the dramatically different inferences one can reach from the observational studies and clinical trials of the Women's Health Initiative. The focus of this chapter is the effect of hormone-replacement therapy on coronary heart disease, as well as on the risk of several forms of cancer. Because this example did not have to face the difficulties of diagnosis and nosology that confront so much psychopathology research, the authors were able to focus on important issues such as selection bias and heterogeneity of effects in reconciling data from trials and observational studies.

The second section, "Innovations in Methods," presents new tools and perspectives for exploring and supporting causal theories in epidemiology. Although the substantive focus is psychopathology research, the methods are generally applicable for other areas of medicine. The first chapter in this section (Chapter 6) proposes a novel but formal analysis of causal claims that can be made about direct and indirect causal paths using graphical methods. Chapter 7 describes a statistical method called "growth mixture modeling," which can examine a variety of hypotheses about longitudinal data that arise out of causal theories. Chapter 8 describes new ways to improve the efficiency of clinical trials by providing causally relevant information. The last two chapters (Chapters 9 and 10) provide insights into how naturally occurring genetic variation can be leveraged to strengthen inferences made about both genetic and environmental causal paths in psychopathology.

The final section, "Causal Thinking in Psychiatry," features critical analyses of causal claims within psychiatry by some of the best known psychopathology researchers. These chapters examine claims in developmental psychopathology (Chapter 11), posttraumatic stress disorder (Chapter 12), research on therapeutics (Chapter 13), and nosology (Chapter 14).

The convergence of this diverse and talented group to one meeting and one volume was facilitated by active involvement of the officers and Council of the American Psychopathological Association (APPA) during 2008. We particularly thank Ezra Susser, the secretary of APPA, who was especially generative in planning the meeting and, therefore, the volume. We also acknowledge the valuable suggestions made by the other officers and councilors of APPA: James J. Hudziak, Darrel A. Regier, Linda B. Cottler, Michael Lyons, Gary Heiman, John E. Helzer, Catina O'Leary, Lauren B. Alloy, John N. Constantino, and Charles F. Zorumski. The meeting itself benefited enormously from the special efforts of Gary Heiman and Catina O'Leary, and it was supported by the National Institute of Mental Health through grant R13 MH082613.

This volume is dedicated to Lee Nelkin Robins, a former president of APPA who attended her last meeting in 2008. She died on September 25, 2009. Trained in sociology, Lee Robins made essential contributions to the understanding of the development and distribution of mental disorders, particularly antisocial and deviant behavior as a precursor of later problems. Her rigorous causal thinking was informed by epidemiological data, and she was instrumental in improving the quality and quantity of such data over the course of her productive life.
Contents
Contributors

part i. causal theory and scientific inference
1. Integrating Causal Analysis into Psychopathology Research (patrick e. shrout, phd)
2. What Would Have Been Is Not What Would Be: Counterfactuals of the Past and Potential Outcomes of the Future (sharon schwartz, phd, nicolle m. gatto, phd, and ulka b. campbell, phd)
3. The Mathematics of Causal Relations (judea pearl, phd)
4. Causal Thinking in Psychiatry: A Genetic and Manipulationist Perspective (kenneth s. kendler, md)
5. Understanding the Effects of Menopausal Hormone Therapy: Using the Women's Health Initiative Randomized Trials and Observational Study to Improve Inference (garnet l. anderson, phd, and ross l. prentice, phd)

part ii. innovations in methods
6. Alternative Graphical Causal Models and the Identification of Direct Effects (james m. robins, md, and thomas s. richardson, phd)
7. General Approaches to Analysis of Course: Applying Growth Mixture Modeling to Randomized Trials of Depression Medication (bengt muthén, phd, hendricks c. brown, phd, aimee m. hunter, phd, ian a. cook, md, and andrew f. leuchter, md)
8. Statistical Methodology for a SMART Design in the Development of Adaptive Treatment Strategies (alena i. oetting, ms, janet a. levy, phd, roger d. weiss, md, and susan a. murphy, phd)
9. Obtaining Robust Causal Evidence From Observational Studies: Can Genetic Epidemiology Help? (george davey smith, md, dsc)
10. Rare Variant Approaches to Understanding the Causes of Complex Neuropsychiatric Disorders (matthew w. state, md, phd)

part iii. causal thinking in psychiatry
11. Causal Thinking in Developmental Disorders (e. jane costello, phd, and adrian angold, mrcpsych)
12. Causes of Posttraumatic Stress Disorder (naomi breslau, phd)
13. Causal Thinking for Objective Psychiatric Diagnostic Criteria: A Programmatic Approach in Therapeutic Context (donald f. klein, md, dsc)
14. The Need for Dimensional Approaches in Discerning the Origins of Psychopathology (robert f. krueger, phd, and daniel goldman)

Index
Contributors
garnet l. anderson, ph.d., WHI Clinical Coordinating Center, Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA
adrian angold, m.d., Center for Developmental Epidemiology, Department of Psychiatry and Behavioral Sciences, Duke University Medical Center, Durham, NC
naomi breslau, ph.d., Department of Epidemiology, Michigan State University College of Human Medicine, East Lansing, MI
hendricks c. brown, ph.d., University of Miami, Miami, FL
ulka b. campbell, ph.d., Associate Director, Epidemiology, Safety and Risk Management, Medical Division, Pfizer, Inc., New York, NY
ian a. cook, m.d., Associate Professor, Semel Institute for Neuroscience and Human Behavior, University of California, Los Angeles, Los Angeles, CA
e. jane costello, ph.d., Center for Developmental Epidemiology, Department of Psychiatry and Behavioral Sciences, Duke University Medical Center, Durham, NC
george davey smith, MRC Centre for Causal Analyses in Translational Epidemiology, Department of Social Medicine, University of Bristol, Bristol, UK
nicolle m. gatto, ph.d., Director, TA Group Head, Epidemiology, Safety and Risk Management, Medical Division, Pfizer, Inc., New York, NY
daniel goldman, University of Minnesota, Minneapolis, MN
aimee m. hunter, ph.d., Research Psychologist, Laboratory of Brain, Behavior, and Pharmacology, University of California, Los Angeles, Los Angeles, CA
kenneth s. kendler, m.d., Virginia Institute for Psychiatric and Behavioral Genetics, Departments of Psychiatry and Human and Molecular Genetics, Medical College of Virginia, Virginia Commonwealth University, Richmond, VA
donald f. klein, m.d., d.sc., Research Professor, Phyllis Green and Randolph Cowen Institute for Pediatric Neuroscience, NYU Child Study Center, NYU Medical Center; Professor Emeritus, Department of Psychiatry, College of Physicians & Surgeons, Columbia University, New York, NY
robert f. krueger, Washington University in St. Louis, St. Louis, MO
andrew f. leuchter, m.d., Professor, Department of Psychiatry and Biobehavioral Science, University of California, Los Angeles, Los Angeles, CA
janet a. levy, Center for Clinical Trials Network, National Institute on Drug Abuse, Bethesda, MD
susan a. murphy, ph.d., Professor, Psychiatry, University of Michigan, Institute for Social Research, Ann Arbor, MI
bengt muthén, ph.d., Professor Emeritus, Graduate School of Education & Information Studies, University of California, Los Angeles, Los Angeles, CA
alena i. oetting, University of Michigan, Institute for Social Research, Ann Arbor, MI
judea pearl, ph.d., Cognitive Systems Laboratory, Computer Science Department, University of California, Los Angeles, CA
ross l. prentice, WHI Clinical Coordinating Center, Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA
sharon schwartz, ph.d., Professor of Clinical Epidemiology, Columbia University, New York, NY
matthew w. state, m.d., ph.d., Donald J. Cohen Associate Professor, Co-Director, Program on Neurogenetics, Yale Child Study Center, Department of Genetics and Psychiatry, Yale University School of Medicine, New Haven, CT
roger d. weiss, m.d., Harvard Medical School, McLean Hospital, Belmont, MA
part i Causal Theory and Scientific Inference
1 Integrating Causal Analysis into Psychopathology Research
patrick e. shrout
Both in psychopathology research and in clinical practice, causal thinking is natural and productive. In recent decades, important progress has been made in the treatment of disorders ranging from attention-deficit/hyperactivity disorder (e.g., Connor, Glatt, Lopez, Jackson, & Melloni, 2002) to depression (e.g., Dobson, 1989; Hansen, Gartlehner, Lohr, Gaynes, & Carey, 2005) to schizophrenia (Hegarty, Baldessarini, Tohen, & Waternaux, 1994). The treatments for these disorders include pharmacological agents as well as behavioral interventions, which have been subjected to clinical trials and other empirical evaluations. Often, the treatments focus on the reduction or elimination of symptoms, but in other cases the interventions are designed to prevent the disorder itself (Brotman et al., 2008). In both instances, the interventions illustrate the best use of causal thinking to advance both scientific theory and clinical practice.

When clinicians understand the causal nature of treatments, they can have confidence that their actions will lead to positive outcomes. Moreover, being able to communicate this confidence tends to increase a patient's comfort and compliance (Becker & Maiman, 1975). Indeed, there seems to be a basic inclination for humans to engage in causal explanation, and such explanations affect both basic thinking, such as identification of categories (Rehder & Kim, 2006), and emotional functioning (Hareli & Hess, 2008). This inclination may lead some to ascribe causal explanations to mere correlations or coincidences, and many scientific texts warn researchers to be cautious about making causal claims (e.g., Maxwell & Delaney, 2004). These warnings have been taken to heart by editors, reviewers, and scientists themselves; and there is often reluctance regarding the use of causal language in the psychopathology literature. As a result, many articles simply report patterns of association and refer to mechanisms with euphemisms that imply causal thinking without addressing causal issues head-on.
Over 35 years ago Rubin (1974) began to talk about strong causal inferences that could be made from experimental and nonexperimental studies using the so-called potential outcomes approach. This approach clarified the nature of the effects of causes A vs. B by asking us to consider what would happen to a given subject under these two conditions. Forget for a moment that a subject cannot experience both conditions at a single instant: Rubin provided a formal way to think about how we could compare potential rather than actual outcomes. The contrast of the potential outcomes was argued to provide a measure of an individual causal effect, and Rubin and his colleagues showed that the average of these causal effects across many individuals could be estimated under certain conditions. Although approaches to causal analysis have also been developed by philosophers and engineers (see Pearl, 2009), the formal approaches of Rubin and his colleagues (e.g., Holland, 1986; Frangakis & Rubin, 2002) and statistical epidemiologists (Greenland, Pearl, & Robins, 1999a, 1999b; Robins, 1986; Robins, Hernán, & Brumback, 2000) have prompted researchers to have new appreciation for the strengths and limitations of both experimental and nonexperimental designs.

This volume is designed to promote conversations among those concerned with causal inference in the abstract and those interested in causal explanation of psychopathology more specifically. Authors include prominent contributors from both types of literature. Some of the chapters from experts in causal analysis are rather technical, but all engage important and cutting-edge issues in the field. The psychopathology experts raise challenging issues that are likely to be the subject of discussion for years to come.

In this introductory chapter, I give an overview of some of the themes that will be discussed in the subsequent chapters. These themes have to do with the assessment of causal effects, the sources of bias in clinical trials and nonexperimental designs, and the potential of innovative designs and perspectives. In addition to the themes that are developed later in the volume, I discuss two topics that are not fully discussed elsewhere in the volume. One topic focuses on the role of time when considering the effects of causes in psychopathology research. The other topic is mediation analysis, which is a statistical method that was developed in psychology to describe the intervening processes between an intervention and an outcome of that intervention.
Themes in Causal Analysis

In the pioneering work of Rubin (1978), randomized experiments had a special status in making causal claims based on the causal effect. As mentioned, the causal effect is defined as the difference between the outcome (say, Y) that would be observed if person U were administered treatment A vs. what would have been observed if that person received treatment B. The outcome under treatment A is called Y_A(U) and the outcome under treatment B is called Y_B(U). Because only one treatment can be administered for a given measurement of Y(U), the definition of the causal effect depends on a counterfactual consideration, namely, what the outcome of someone with treatment A would have been had he or she received treatment B or what the outcome of someone assigned to treatment B would have been had he or she received treatment A. Our inability to observe both outcomes is what Holland (1986) called "the fundamental problem of causal inference."
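In symbols, the definitions just given can be summarized as follows (a notational restatement of the text above, not additional material from the chapter, with δ(u) standing for the unobservable individual causal effect):

```latex
% Individual causal effect for a particular unit u (never fully observable):
\delta(u) = Y_A(u) - Y_B(u)

% Average causal effect, the estimand targeted by a randomized experiment:
\mathrm{ACE} = E\left[ Y_A(U) - Y_B(U) \right] = E\left[ Y_A(U) \right] - E\left[ Y_B(U) \right]
```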
Average Causal Effects from Experiments

Although the individual causal effect cannot be known, the average causal effect can be estimated if subjects are randomly assigned to treatment A or B and several other conditions are met. In this case, between-person information can be used to estimate the average of within-person counterfactual causal effects. The magnitude of the causal effect is taken seriously, and efforts are made to estimate this quantity without statistical bias. This can be done in a relatively straightforward way in engineering experiments and in randomized studies with animal models since the two groups being compared are known to be probabilistically equivalent. Moreover, in basic science applications the assumption of a stable unit treatment value (SUTVA) (Rubin, 1980, 1990) is plausible. As Schwartz, Gatto, and Campbell discuss (Chapter 2), the SUTVA assumption essentially means that subjects are exchangeable and one subject's outcome is unaffected by another subject's treatment assignment. This assumption will typically hold in carefully executed randomized experiments using genetically pure laboratory animals. For example, O'Mahony and colleagues (2009) randomly selected male Sprague-Dawley rat pups for a postnatal stress experience, which involved removing them daily from their mother for 3 hours on days 2–12. The randomly equivalent control pups were left with their mothers. At the end of the study, the investigators found differences in the two groups with respect to biological indicators of stress and immune function. The biologically equivalent subjects in this example are plausibly exchangeable, consistent with SUTVA; but we must also assume that the subjects did not affect each other's responses.

In the causal literature, it is common to represent the causal effects as shown in Figure 1.1A. In this figure, the treatment is represented as variable T, and it would take two values, one for treatment A and one for treatment B. The outcome is represented by Y, which would have been the value of one of the biological measurements in the O'Mahony et al. (2009) example. In addition, there is variable E, which represents the other factors that can influence the value of Y, such as genetic mutations or measurement error in the biological assays. When random assignment is used to define T, then E and T are unrelated; and this is represented in Figure 1.1 by a lack of any explicit link between these two variables.

Figure 1.1 Schematic representation of treatment condition (T) on outcome (Y). Boxes represent observed values and circles represent latent variables. In the panel on the left (Panel A) the treatment is the only systematic influence on Y, but in the panel on the right (Panel B) there is a confounding variable (C) that influences both the treatment and the outcome.
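To make the estimation logic concrete, the following simulation sketch (hypothetical data and invented variable names, not values from the O'Mahony et al. study) generates both potential outcomes for every unit, randomizes treatment, and compares the difference in observed group means with the true average causal effect:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Potential outcomes under treatment A and treatment B for every unit.
# In a real experiment only one of the two is observed per unit.
y_a = rng.normal(loc=10.0, scale=2.0, size=n)
y_b = y_a - rng.normal(loc=1.5, scale=1.0, size=n)

true_ace = np.mean(y_a - y_b)          # average of the individual causal effects

# Random assignment makes the two groups probabilistically equivalent.
gets_a = rng.random(n) < 0.5
y_obs = np.where(gets_a, y_a, y_b)

# Between-person contrast estimates the average within-person effect.
estimate = y_obs[gets_a].mean() - y_obs[~gets_a].mean()

print(f"true ACE:  {true_ace:.3f}")
print(f"estimate:  {estimate:.3f}")
```

Because assignment is random and the units are assumed not to interfere with one another (SUTVA), the between-group contrast recovers the average of the within-person effects that can never be observed individually.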
Clinical Trials

One might suppose that this formal representation works for experiments involving humans as subjects. However, things get complicated quickly in this situation, as is well documented in the literature on clinical trials (e.g., Fleiss, 1986; Everitt & Pickles, 2004; Piantadosi, 2005). It is easy enough to assign people randomly to either group A or group B and to verify that the two groups are statistically equivalent in various characteristics, but human subjects are agents who can undo the careful experimental design. Individuals in one group might not like the treatment to which they are assigned and may take various actions, such as failing to adhere to the treatment, switching treatments, selectively adding additional treatments, or withdrawing from the study entirely. This issue of nonadherence introduces bias into the estimate of the average causal effect (see Efron, 1998; Peduzzi, Wittes, Detre, & Holford, 1993, for detailed discussion). For example, if a drug assigned to group A has good long-term efficacy but temporary negative side effects, such as dry mouth or drowsiness, then persons who are most distressed by the side effects might refuse to take the medication or cut back on the dose. Persons in group B may not feel the need to change their assigned treatment, and thus, the two groups become nonequivalent in adherence. One would expect that the comparison of the outcomes in the two groups would underestimate the efficacy of the treatment.

A different source of bias will be introduced if persons in one group are more likely to withhold data or to be lost to follow-up compared to the other group. This issue of missing data is another threat to clear causal inference in clinical trials. Mortality and morbidity are common reasons for follow-up data being missing, but sometimes data are missing because subjects have become so high-functioning that they do not have time to give to follow-up measurement. If observations that are missing come from a distribution other than the observations that were completed and if this discrepancy is different in groups A and B, then there is potential for the estimate of the causal effect to become biased (see Little & Rubin, 2002).

For many clinical studies, the bias in the causal effect created by differential nonadherence and missing data patterns is set aside rather than confronted directly. Instead, the analysis of the trials typically emphasizes intent to treat (ITT). This requires that subjects be analyzed within the groups originally randomized, regardless of whether they were known to have switched treatment or failed to provide follow-up data. Missing data in this case must be imputed, using either formal imputation methods (Little & Rubin, 2002) or informal methods such as carrying the last observed measurement forward. ITT shifts the emphasis of the trial toward effectiveness of the treatment protocol, rather than efficacy of the treatment itself (see Piantadosi, 2005, p. 324). For example, if treatment A is a new pharmacologic agent, then the effectiveness question is how a prescription of this drug is likely to change outcome compared to no prescription. The answer to this question is often quite different from whether the new agent is efficacious when administered in tightly controlled settings since effectiveness is affected by side effects, cost of treatment, and social factors such as stigma associated with taking the treatment. Indeed, as clinical researchers reach out to afflicted persons who are not selected on the basis of treatment-seeking or volunteer motives, nonadherence and incomplete data are likely to be increasingly more common and challenging in effectiveness evaluation. Although these challenges are real, there are important reasons to examine the effectiveness of treatments in representative samples of persons outside of academic medical centers.

Whereas ITT and ad hoc methods of filling in missing data can provide rigorous answers to effectiveness questions, causal theorists are drawn to questions of efficacy. Given that we find that a treatment plan has no clear effectiveness, do we then conclude that the treatment would never be efficacious? Or suppose that overall effectiveness is demonstrated: Can we look more carefully at the data to determine if the treatment caused preventable side effects? Learning more about the specific causal paths in the development and/or treatment of psychopathology is what stimulates new ideas about future interventions. It also helps to clarify how definitive results are from clinical trials or social experiments (e.g., Barnard, Frangakis, Hill, & Rubin, 2003). Toh and Hernán (2008) contrast findings based on an ITT approach to findings based on causally informative analyses.
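The distinction between the ITT (effectiveness) contrast and the efficacy of the treatment actually taken can be illustrated with a small sketch; the adherence mechanism and all of the numbers below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

assigned_a = rng.random(n) < 0.5        # randomized assignment to drug A vs. B
frailty = rng.normal(size=n)            # unmeasured prognostic factor

# Patients assigned to A who are more frail are more likely to stop taking it
# (e.g., because of side effects); assignment to B is simply followed.
p_adhere = 1.0 / (1.0 + np.exp(frailty))
took_a = assigned_a & (rng.random(n) < p_adhere)

efficacy = 2.0                          # benefit of A when it is actually taken
y = 10.0 - frailty + efficacy * took_a + rng.normal(size=n)

itt = y[assigned_a].mean() - y[~assigned_a].mean()
as_treated = y[took_a].mean() - y[~took_a].mean()

print(f"efficacy (truth):    {efficacy:.2f}")
print(f"ITT estimate:        {itt:.2f}")         # diluted toward zero by nonadherence
print(f"as-treated estimate: {as_treated:.2f}")  # distorted by selective adherence
```

The ITT contrast answers the effectiveness question about the prescription itself, while the naive as-treated contrast mixes the drug's effect with the selective adherence process.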
Nonexperimental Observational Studies

Just as nonadherence and selective missing data can undermine the randomized equivalence of treatment groups A and B, selection effects make it especially difficult to compare groups whose exposure to different agents is simply observed in nature. Epidemiologists, economists, and other social scientists have invested considerable effort into the development of methods that allow for adjustment of confounding due to selection biases. Many of these methods are reviewed or further developed in this volume (see Chapters 6 and 11). In fact, the problems that "break" randomized experiments with humans (Barnard et al., 2003) have formal similarity to selection, measurement, and attrition effects in nonexperimental studies. A simple version of this formal representation is shown in Figure 1.1B. In this version, some confounding variable, C, is shown to be related to the treatment, T, and the outcome, Y. If variable C is ignored (either because it is not measured or because it is left out of the analysis for other reasons), then the estimated causal effect of T on Y will be biased. There can be multiple types of confounding effects, and missing data processes may be construed to be examples of these.

Often, the confounding is more complex than illustrated in Figure 1.1. For example, Breslau in this volume (Chapter 12) considers examples where T is experience of trauma (vs. no trauma) and Y represents symptoms of avoidance/numbing that are consistent with posttraumatic stress syndrome. Although the causal association between T and Y is often assumed, Breslau considers the existence of other variables, such as personality factors, that might be related to either exposure to T or appraisal of the trauma and the likelihood of experiencing avoidance or numbing. If the confounding variables are not identified as causal alternatives and if data that are informative of the alternate causal paths are not obtained, then the alleged causal effect of the trauma will be overstated.
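A brief simulation sketch (hypothetical variable names and effect sizes) shows how the structure in Figure 1.1B distorts a naive comparison of exposed and unexposed groups: a selection factor C raises both the probability of exposure T and the level of the outcome Y, and only a comparison within levels of C approaches the true effect.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000

c = rng.normal(size=n)                        # confounder (a selection factor)
p_exposed = 1.0 / (1.0 + np.exp(-1.5 * c))    # C raises the chance of exposure
t = rng.random(n) < p_exposed

true_effect = 1.0
y = true_effect * t + 2.0 * c + rng.normal(size=n)   # C also raises Y

naive = y[t].mean() - y[~t].mean()

# Crude adjustment: compare exposed and unexposed within coarse strata of C,
# then average the within-stratum differences.
edges = np.quantile(c, [0.2, 0.4, 0.6, 0.8])
strata = np.digitize(c, edges)
within = [y[t & (strata == s)].mean() - y[~t & (strata == s)].mean() for s in range(5)]
adjusted = np.mean(within)

print(f"true effect: {true_effect:.2f}")
print(f"naive:       {naive:.2f}")      # badly inflated by confounding
print(f"stratified:  {adjusted:.2f}")   # much closer to the truth
```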
Innovative Designs and Analyses for Improving Causal Inferences

When studying the effects of purported causes such as environmental disasters, acts of war or terror, bereavement, or illness/injury, psychopathology researchers often note that random assignment is not possible but that a hypothetical random experiment would provide the gold standard for clear causal inference. This hypothetical ideal can be useful in choosing quasi-experimental designs that find situations in nature that seem to mimic random assignment. There are several classes of quasi-experimental design that create informative contrasts in the data by clever definition of treatment groups rather than random assignment (Shadish, Cook, & Campbell, 2002). For example, Costello and colleagues describe a situation in which a subset of
rural families in the Great Smoky Mountain developmental study (see Chapter 11; Costello, Compton, Keeler, & Angold, 2003) were provided with new financial resources by virtue of being members of the Cherokee Indian tribe at the time the tribe opened a new casino. Tribe members were provided payments from the profits of the casino, while their nontribe neighbors were not. Costello and colleagues describe how this event, along with a developmental model, allowed strong claims to be made about the protective impact of family income on drug use and abuse.

Modern genetics provides new and exciting tools for creating groups that appear to be equivalent in every respect except exposure. Kendler in this volume (Chapter 4) describes how twin studies can create informative quasi-experimental designs. Suppose that we are interested in an environmental exposure that is observed in nature and that the probability of exposure is known to be related to psychological characteristics such as cognitive ability, risk-taking, and neuroticism, which are known to have genetic loadings. If we can find monozygotic and dizygotic twin pairs with individual twins who differ in exposure, then we have a strong match for selection. Modern genetic analyses are useful in isolating the risk of exposure (a selection factor) from the causal effect of the exposure on subsequent psychological functioning.

Twin studies are not necessary to take advantage of genetic thinking to overcome selection effects. Davey Smith (Chapter 9) writes that researchers are learning about genetic variants that determine how various environmental agents (e.g., alcohol, cannabis) are metabolized and that these variants are nearly randomly distributed in certain populations. Under the so-called Mendelian randomization (Davey Smith & Ebrahim, 2003) approach, a causal theory that involves known biochemical pathways can be tested by comparing outcomes following exposure in persons who differ in the targeted genetic location. Mendelian randomization strategies and co-twin designs make use of genetics to provide insightful causal analyses of environmental exposures.

Random genetic variation can also be tapped to examine the nature of genetic associations themselves. State (Chapter 10) describes how rare genetic variants, such as nucleotide substitutions or repeats or copy number variations, can be informative about the genetic mechanisms in complex diseases. He illustrates this causal approach with findings on autism. Because the rare variants seem to be random, the selection issues that concern most observational studies are less threatening.
Analytic Approaches to Confounding

Understanding the nature of the effects of confounding by nonadherence and missing values in clinical trials and by selection effects in nonexperimental
comparative studies has been aided by formal representations of the nature of the causal effects in the work of Pearl (2000, see Chapter 3). Pearl has promoted the use of directed acyclic graphs (DAGs), which are explicit statements of assumed causal paths. These graphical representations can be used to recognize sources of confounding as well as to identify sufficient sets of variables to adjust for confounding. The graphs can also be used to gain new insights into the meaning of direct and indirect effects. The interpretation of these graphs is enhanced by the use of the "do" operator of Pearl, which states explicitly that a variable, T_i, can be forced to take one fixed value, do(T_i = t_i), or an alternate. For example, if T is an indicator of treatment A or B, then this operator explicitly asks what would happen if individual i were forced to have one treatment or the other. The formal analysis of causation requires consideration, whether empirical or hypothetical, of what would happen to the causal "descendants" if a variable is changed from one fixed value to a different fixed value. A particularly useful feature of the formal exercise is the consideration of competing causal paths that can bias the interpretation of a given set of data, as well as the consideration that the causal model might differ across individuals.

Once these are articulated, investigators often are able to obtain measures of possible confounding variables. A question of great interest among modern causal analysts is how to use these measures to eliminate bias. Traditionally, the confounders are simply added as "control" variables in linear regression or structural equation models (Morgan & Winship, 2007; Bollen, 1989). If (1) C is known to be linearly related to T, (2) C is known to be linearly related to Y, (3) the relation of T to Y is known to be the same for all levels of C, (4) C is measured without error, and (5) the set of variables in C is known to represent all aspects of selection bias, then the regression model approach will yield an unbiased estimate of the causal effect of T on Y. The adjusted effect is often interpreted with phrases such as "holding constant C" and "the effect of T on Y is X," which can be interpreted as an average causal effect. Causal analysts often talk about the fact that assumptions that are needed to make an adjustment valid are untestable. An investigator might argue for the plausibility of the linear model assumptions by testing whether nonlinear terms improve the fit of the linear models and testing for interactions between C and T in the prediction of Y. However, these empirical tests will leave the skeptic unconvinced if the study sample is small and the statistical power of the assumption tests is limited.

Another approach to adjustment relies on the computation of propensity scores (e.g., Rosenbaum & Rubin, 1983), which are numerical indicators of how similar individuals in condition A are to individuals in condition B. These scores are computed as summaries of multivariate representations of the similarity of the individuals in the two groups. The propensity scores themselves are created using methods such as logistic regression and nonlinear classification algorithms with predictor variables that are conceptually prior to the causal action of T on Y. One important advantage of this approach is that the analyst is forced to study the distributions of the propensity scores in the two groups to be compared. Often, one discovers that there are some persons in one group who have no match in the other group and vice versa. These unique individuals are not simply included as extrapolations, as they are in traditional linear model adjustments, but are instead set aside for the estimation of the causal effect. The computation of the adjusted group difference is based on either matching of propensity scores or forming propensity score strata. This approach is used to make the groups comparable in a way that is an approximation to random assignment given the correct estimation of the propensity score (see Gelman & Hill, 2007). Propensity score adjustment neither assumes a simple linear relation between the confounder variables and the treatment nor leads to a unique result. Different methods for computing the propensity score can yield different estimates of the average causal effect. The ways that propensity scores might be used to improve causal inference continue to be developed. For example, based on work by Robins (1993), Toh and Hernán (2008) describe a method called inverse probability weighting for adjustment of adherence and retention in clinical trials. This method uses propensity score information to give high weight to individuals who are comparable across groups and low weight to individuals who are unique to one group.
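As a sketch of the stratification idea (not any particular published analysis; the covariates and coefficients are invented), the propensity score below is estimated by logistic regression from two observed selection variables and then used to form strata within which exposed and unexposed cases are compared:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 50_000

# Two observed covariates drive both selection into "treatment" and the outcome.
x = rng.normal(size=(n, 2))
p_treat = 1.0 / (1.0 + np.exp(-(0.8 * x[:, 0] + 0.8 * x[:, 1])))
t = rng.random(n) < p_treat
y = 1.0 * t + x[:, 0] + x[:, 1] + rng.normal(size=n)   # true effect = 1.0

# Estimated propensity score: probability of treatment given the covariates.
ps = LogisticRegression().fit(x, t).predict_proba(x)[:, 1]

# Form quintile strata of the propensity score and average the
# within-stratum treated-minus-control differences.
edges = np.quantile(ps, [0.2, 0.4, 0.6, 0.8])
strata = np.digitize(ps, edges)
diffs = [y[t & (strata == s)].mean() - y[~t & (strata == s)].mean() for s in range(5)]

print(f"naive difference:        {y[t].mean() - y[~t].mean():.2f}")
print(f"propensity-score strata: {np.mean(diffs):.2f}")   # substantially closer to the true 1.0
```

With only five strata some residual confounding remains; finer strata, matching, or weighting would tighten the adjustment, which is one reason different propensity score methods can yield somewhat different estimates.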
Whereas direct adjustment and calculation of propensity scores make use of measured variables that describe the possible imbalance of the groups indexed by T, the method of instrumental variables attempts to adjust for confounding by using knowledge about the relation of a set of variables I to the outcome Y. If I can affect Y only through the variable T, then it is possible to isolate spurious correlation between the treatment (T) and the outcome (Y). Figure 1.2 shows a representation of this statement. The instrumental variable I is said to cause a change in T and, through this variable, to affect Y.

Figure 1.2 Schematic representation of how an instrumental variable (I) can isolate the causal effect from the correlation between the treatment variable (T) and the error term (E).

There may be other reasons why T is related to Y (as indicated by correlation between T and E), but if the instrumental variable model is correct, the causal effect can be isolated. The best example of this is when I is an indicator of random assignment, T is a treatment condition, and Y is the outcome. On the average, random assignment is related to Y only through the treatment regime T. Economists and others have shown that instrumental variables allow for confounding to be eliminated even if the nature of the confounding process is not measured. In nonexperimental studies, the challenge is to find valid instrumental variables. The arguments are often made on the basis of scientific theories of the causal process. For example, in the Costello et al. (2003) Great Smoky Mountain Study, if tribal membership has never been related to substance use by adolescents in a rural community but it becomes related after it is associated with casino profit payments, then a plausible case can be made for tribal membership being an instrumental variable. However, as Hernán and Robins (2006) discuss, careful reexamination of instrumental variable assumptions can raise questions about essentially untestable assumptions about the causal process.
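The instrumental-variable logic can be written down in a few lines. In this sketch (hypothetical values), z plays the role of the instrument I in Figure 1.2, an unmeasured variable u produces the correlation between T and the error term, and the Wald-type ratio cov(z, Y)/cov(z, T) recovers the causal effect even though the ordinary regression of Y on T does not:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000

z = (rng.random(n) < 0.5).astype(float)   # instrument, e.g., randomized encouragement
u = rng.normal(size=n)                    # unmeasured confounder behind the T-E correlation

# The instrument shifts T; the confounder shifts both T and Y directly.
t = 0.4 * z + 0.8 * u + rng.normal(size=n)
true_effect = 1.5
y = true_effect * t + 1.2 * u + rng.normal(size=n)

naive = np.polyfit(t, y, 1)[0]                   # ordinary regression slope of Y on T
wald = np.cov(z, y)[0, 1] / np.cov(z, t)[0, 1]   # instrumental-variable (Wald) estimate

print(f"true effect: {true_effect:.2f}")
print(f"naive OLS:   {naive:.2f}")   # biased because T and the error share u
print(f"IV estimate: {wald:.2f}")    # consistent if z affects Y only through T
```

Mendelian randomization, discussed earlier, uses a genetic variant in the role of z under the same exclusion assumption.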
The analytic approaches to confounding can provide important insights into the effects of adherence and retention in clinical trials and the impact of alternate explanations of causal effects by selection processes in nonexperimental studies. As briefly indicated, the different approaches make different assumptions and these assumptions can lead to different estimates of causal effects. Researchers who strive for closure from a specific study find such a lack of clarity to be unsatisfying. Indeed, one of the advantages of the ITT analysis of randomized clinical trials is that it can give a single clear answer to the question of treatment effectiveness, especially when the analyses follow rigorous guidelines for a priori specification of primary outcomes and are based on data with adequate statistical power.

Temporal Patterns of Causal Processes

As helpful as the DAG representations of cause can be, they tend to emphasize causal relations as if they occur all at once. These models may be perfectly appropriate in engineering applications where a state change in T is quickly followed by a response change in Y. In psychopathology research, on the other hand, both processes that are hypothesized to be causes and processes that are hypothesized to be effects tend to unfold over time. For example, in clinical trials of fluoxetine, the treatment is administered for 4–6 weeks before it is expected to show effectiveness (Quitkin et al., 2003). When the treatment is ended, the risk of relapse is typically expected to increase
with time off the medication. There are lags to both the initial effect of the treatment and the risk of relapse. Figure 1.3A shows one representation of this effect over time, where the vertical arrows represent a pattern of treatments.

Another pattern is expected in preventive programs aimed at reducing externalizing problems in high-risk children through the improvement of parenting skills of single mothers. The Incredible Years intervention of Webster-Stratton and her colleagues (e.g., Gross et al., 2003) takes 12 weeks to unfold and involves both parent and teacher sessions, but the impact of the program is expected to continue well beyond the treatment period. The emphasis on positive parenting, warm but structured interactions, and reduction of harsh interactions is expected to affect the mother–child relationships in ways that promote health, growth, and reduction of conduct problems. Figure 1.3B shows how this pattern might look over time, with an initial lag of treatment and a subsequent shift.

For some environmental shocks or chemical agents with pharmacokinetics of rapid absorption, metabolism, and excretion, the temporal patterns might be similar to those found in engineering applications. These are characterized by rapid change following the treatment and fairly rapid return to baseline after the treatment is ended. Figure 1.3C illustrates this pattern, which might be typical for heart rate change following a mild threat such as a fall or for headache relief following the ingestion of a dose of analgesic.

Figure 1.3 Examples of time trends relating treatments (indicated by vertical arrow) and response on Y. Panel A shows an effect that takes time to be seen and additional time to diminish when the treatment is removed. Panel B shows an effect that takes time to be seen, but then is lasting.

As Costello and Angold discuss (Chapter 11), the consideration of these patterns of change is complicated by the fact that the outcome being studied might not be stable. Psychological/biological processes related to symptoms might be developing due to maturation or oscillating due to circadian rhythms, or they might be affected by other processes related to the treatment itself. In randomized studies, the control group can give a picture of the trajectory of the naturally occurring process, so long as adequate numbers of assessments are taken over time. However, the comparison of the treatment and control group may no longer give a single outcome but, rather, a series of estimated causal effects at different end points, both because of the hypothesized time patterns illustrated in Figure 1.3 and because of the natural course of the processes under study. Although one might expect that effects that are observed at adjacent times are due to the same causal mechanism, there is no guarantee that the responses are from the same people. One group of persons might have a short-lived response at one time and another group might have a response at the next measured time point.

Muthén and colleagues' parametric growth mixture models (Chapter 7) shift the attention to the individual over time, rather than specific (and perhaps arbitrarily chosen) end points. These models allow the expected
trajectory in group A to be compared with that in group B. This class of models also considers various patterns of individual differences in the trajectories, with an assumption that some persons in treatment group A might have trajectories just like those in placebo group B. Although the parametric assumptions about the nature of the trajectories can provide interesting insights and possibly increased statistical power, causal analysts can have strikingly different opinions about the wisdom of making strong untestable assumptions.

Scientists working on problems in psychopathology often have a general idea of the nature of the trajectory, and this is reflected in the timing of the measurements. However, unless repeated measurements are taken at fairly short intervals, it is impossible to document the nature of the alternative patterns as shown in Figure 1.3. Such basic data are needed to choose among the possible parametric models that can be fit in growth mixture models, and they are also necessary to implement the ideas of Klein (Chapter 13), which involve starting and stopping treatment repeatedly to determine who is a true responder and who is not.

Note that Klein's proposal is related to classic crossover designs in which one group is assigned treatment sequence (A, B) and another group is assigned (B, A). This design assumes a temporal pattern like that in Figure 1.3C, and it requires a "washout" period during which the effect of the first treatment is assumed to have dissipated. The literature on these designs is extensive (e.g., Fleiss, 1986; Piantadosi, 2005), and it seems to provide an intuitive solution to Holland's (1986) fundamental problem of causal inference. If one cannot observe both potential outcomes, Y_A(U) and Y_B(U), at the same instant, then why not fix the person U (say U = u) and estimate Y_A(U) and Y_B(U) at different times? Holland called this a scientific approach to the fundamental problem, but he asserted that the causal estimate based on this design depends on an untestable homogeneity assumption, namely, that person u at time 1 is exactly the same as person u at time 2, except for the treatment. Although the test of that assumption cannot be definitive, an accumulated understanding of temporal patterns of effects will make the assumption more or less plausible.
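For the two-period crossover just described, the usual estimator contrasts the within-person period differences in the two sequence groups, which removes stable person effects and any shared period effect, provided the washout assumption holds. A minimal sketch with invented numbers:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200                                   # persons per sequence group

subj_ab = rng.normal(size=n)              # stable person effects, sequence (A, B)
subj_ba = rng.normal(size=n)              # sequence (B, A)

treat_effect = 1.0                        # benefit of A relative to B
period_effect = 0.5                       # everyone drifts between period 1 and 2

# Sequence (A, B): A in period 1, B in period 2 (after washout).
y_ab_1 = subj_ab + treat_effect + rng.normal(scale=0.5, size=n)
y_ab_2 = subj_ab + period_effect + rng.normal(scale=0.5, size=n)
# Sequence (B, A): B in period 1, A in period 2.
y_ba_1 = subj_ba + rng.normal(scale=0.5, size=n)
y_ba_2 = subj_ba + period_effect + treat_effect + rng.normal(scale=0.5, size=n)

# Within-person period-1 minus period-2 differences; contrasting the two
# sequences cancels both the person effects and the shared period effect.
d_ab = y_ab_1 - y_ab_2
d_ba = y_ba_1 - y_ba_2
estimate = (d_ab.mean() - d_ba.mean()) / 2.0

print(f"true treatment effect: {treat_effect:.2f}")
print(f"crossover estimate:    {estimate:.2f}")
```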
Mediation and Moderation of Causal Effects

Just as psychopathology researchers are willing to consider scientific approaches to the fundamental problem of causal inference using crossover designs, they may also be inclined to develop intuitive statistical models of causal process. For example, Freedland et al. (2009) found that assignment to a cognitive behavior therapy (CBT) condition was related to reduced
depression 3 months after treatment completion among depressed patients who had experienced coronary artery bypass surgery. A researcher might ask if the improvement was due to mastery of one or another component of CBT, namely, (1) control of challenging distressing automatic thoughts or (2) changing and controlling dysfunctional attitudes. Suppose the researcher had included assessments of these two cognitive skills at the 2-month assessment (1 month before the end point assessment). The question could be asked, Can the effect of treatment (T = CBT) on depression (Y) be explained by an intervening cognitive skill (M)?

Kenny and colleagues (Judd & Kenny, 1981; Baron & Kenny, 1986) formalized the mediation analysis approach to this question using a set of linear models that are represented in Figure 1.4. Panel A shows a causal relation between T and Y (represented as c) and panel B shows how that effect might be explained by mediator variable M. There are four steps in the Baron and Kenny tradition: (1) show that T is related to Y (effect c in panel A); (2) show that T is related to M (effect a in panel B); (3) show that, after adjusting for T, M is related to Y (effect b in panel B); and (4) determine whether the direct effect of T on Y, after adjusting for M, remains non-zero (effect c′ in panel B). If the direct effect can be considered to be zero, then Baron and Kenny described the result as complete mediation; otherwise, it was partial mediation. In addition to these steps, the mediation tradition suggests estimating the indirect effect of T on Y through M as the product of estimates of a and b in Figure 1.4B and testing the null hypothesis that the product is equal to zero (see MacKinnon, 2008).

It is difficult to overestimate the impact of this approach on informal causal analysis in psychopathology research. The Baron and Kenny report alone has been cited more than 17,000 times, and thousands of these citations are by psychopathology researchers. Often, the mediation approach is used in the context of experiments such as those already described, but other times it is used to explain associations observed in cross-sectional surveys. These have special problems.
Figure 1.4 Traditional formulation of Baron and Kenny (1986) mediation model, with Panel A showing total effect (c) and Panel B showing indirect (a*b) and direct (c′) effect decomposition.
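A minimal sketch of these estimation steps (simulated data; the path values are invented and simply echo the notation of Figure 1.4) fits each regression by ordinary least squares and forms the product-of-coefficients estimate of the indirect effect:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5_000

t = (rng.random(n) < 0.5).astype(float)       # randomized treatment (e.g., CBT vs. control)
a, b, c_prime = 0.7, 0.4, 0.28                # population paths in the spirit of Figure 1.4B
m = a * t + rng.normal(size=n)                # mediator (e.g., a cognitive skill)
y = c_prime * t + b * m + rng.normal(size=n)  # outcome (e.g., a depression score)

def ols(design, outcome):
    """Least-squares coefficients for an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(outcome))] + list(design))
    return np.linalg.lstsq(X, outcome, rcond=None)[0]

c_total = ols([t], y)[1]                      # step 1: total effect of T on Y
a_hat = ols([t], m)[1]                        # step 2: effect of T on M
b_hat, c_prime_hat = ols([t, m], y)[[2, 1]]   # steps 3-4: Y regressed on T and M

print(f"total effect c:      {c_total:.2f}")        # about c' + a*b = 0.56
print(f"indirect effect a*b: {a_hat * b_hat:.2f}")  # about 0.28
print(f"direct effect c':    {c_prime_hat:.2f}")    # about 0.28
```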
Although Kenny and his colleagues have explicitly warned that the analysis is appropriate only when the ordering of variables is unambiguous, many published studies have not established this order rigorously. Even if an experimental design guarantees that the mediating and outcome processes (M, Y) follow the intervention (T), M and Y themselves are often measured at the same point in time and the association between M and Y is estimated as a correlation rather than a manipulated causal relation. This leaves open the possibility of important bias in the estimated indirect effect of T on Y through M. Figure 1.5A is an elaboration of Figure 1.4B that represents the possibility of other influences besides T on the association between M and Y. This is shown as correlated residual terms, eM and eY.

Figure 1.5 Formulation of mediation model to show correlated errors (Panel A) and an extended model (Panel B) that includes baseline measures of the mediating variable (M0) and the outcome measure (Y0).

For example, if we were trying to explain the effect of CBT (T) on depression (Y) through changes in control of dysfunctional attitudes (M), we could surmise that there is a correlation of degree of dysfunctional attitudes and depression symptoms that would be observed even in the control group. Baseline intuitions, insight, or self-help guides in the lay media might have led to covariation in the degree of dysfunctional attitudes and depression. In fact, part of this covariation could be reverse pathways such that less depressed persons more actively read self-help strategies and then change their attitudes as a function of
the reading. If these sources of covariation are ignored, then the estimate of the b effect will be biased, as will be the product, a * b. In most cases, the bias will overestimate the amount of effect of T that goes through M. Hafeman (2008) has provided an analysis of this source of bias from an epidemiologic and causal analysis framework.

Although Figure 1.5A represents a situation where b will be biased when the usual Baron and Kenny (1986) analysis is carried out, the model shown in Figure 1.5A cannot itself be used to respecify the analysis to eliminate the bias. This is because the model is not empirically identified. This means that we cannot estimate the size of the correlation between eM and eY while also estimating a, b, and c′. However, investigators often have information that is ignored that can be used to resolve this problem. Figure 1.5B shows a model with adjustments for baseline (prerandomization) measures of the outcome (Y0) and mediating process (M0). When these baseline measures are included, it is possible both to account for baseline association between Y and M and to estimate a residual correlation between Y and M. The residual correlation can be estimated if it is reasonable to consider the baseline M0 as an instrumental variable that has an impact on the outcome Y only through its connection with the postrandomized measure of the mediating process, M.¹

¹ There can be further refinements to the model shown in Figure 1.5B. One might consider a model where Y0 is related to the mediating process M. For example, if less depressed persons in the study were inclined to seek self-help information and M represented new cognitive skills that are available in the media, then the path between Y0 and M could be non-zero and negative.

How important can this adjustment be? Consider a hypothetical numerical example in which a = 0.7, b = 0.4, and c′ = 0.28. Assuming that the effects are population values, these values indicate a partial mediation model. The total effect of T on Y (c in Figure 1.4A) is the sum of the direct and indirect effects, 0.56 = 0.28 + (0.70)(0.40), and exactly half the effect goes through M. The stability of the mediation process from baseline to postintervention is represented by g1 and the comparable stability of the outcome variable is g2. Finally, the degree of correlation between M0 and Y0 is r_MY. Figure 1.6 shows results from an analysis of the bias using the Figure 1.4B model to represent mediation for different levels of correlation between M0 and Y0. The results differ depending on how stable the mediating and outcome processes are in the control group. (For simplicity, the figure assumes that they are the same, i.e., g1 = g2.)

Figure 1.6 Chart showing the expected values of the indirect effect estimated from the model in Panel B of Figure 1.4 when the actual model was Panel B of Figure 1.5 with values a = .7, b = .4, and c′ = .28. Different lines show values associated with different stabilities of the M and Y processes (g1, g2 in Figure 1.5B) as a function of the baseline correlation between M and Y.

Focusing on the estimate of the indirect effect, a * b, one can see that there is no bias if M and Y have no stability: The estimate is the expected 0.28 for all values of r_MY when g1 = g2 = 0. However, when stability in M and Y is observed, the correlation between M0 and Y0 is substantial.
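A small simulation sketch (hypothetical parameter values chosen to echo the example above; this is not the computation behind Figure 1.6) shows the phenomenon directly: when an unmeasured baseline factor contributes to both M and Y, the usual product-of-coefficients estimate exceeds the true indirect effect.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50_000

t = (rng.random(n) < 0.5).astype(float)
a, b, c_prime = 0.7, 0.4, 0.28

u = rng.normal(size=n)                          # shared baseline influence on M and Y
m = a * t + 0.6 * u + rng.normal(size=n)
y = c_prime * t + b * m + 0.6 * u + rng.normal(size=n)

def ols(design, outcome):
    X = np.column_stack([np.ones(len(outcome))] + list(design))
    return np.linalg.lstsq(X, outcome, rcond=None)[0]

a_hat = ols([t], m)[1]
b_hat = ols([t, m], y)[2]          # biased upward because M and Y share the factor u

print(f"true indirect effect:      {a * b:.2f}")
print(f"estimated indirect effect: {a_hat * b_hat:.2f}")   # overstates a*b
```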
1 Integrating Causal Analysis into Psychopathology Research
19
studies, we must conclude that important bias in estimates of the indirect effect is likely to be the rule rather than the exception. When investigators compute mediation analyses without taking into account the correlation of M and Y at baseline, they run the risk of concluding that an experimental result is completely mediated by an intervening process, when in fact there may be direct effects or other intervening processes also involved. The use of baseline measures is not the only way to make adjustments for spurious correlations between M and Y. In social psychology, Spencer, Zanna, and Fong (2005) argued that the causal involvement of the mediating path would be most convincing if researchers developed supplemental studies that manipulated M directly through randomized experiments. For example, in a long-term study of the CBT effects on depression, an investigator might randomly assign persons in the treatment group to experience (or not) a ‘‘booster’’ session that reminds the patient of the cognitive skills that were taught previously in the CBT sessions. One of the more challenging assumptions of this approach is that the nature of the M change in a direct intervention is identical to the nature of the M change that follows manipulation of T. It is possible, for example, that the booster intervention on cognitive skills might have a different impact from the original intervention because the patient has become aware of barriers to the implementation of the skill set. As a result of that experience, the patient might attend to different Indirect Effect Bias 1.20
Figure 1.6 Chart (titled ''Indirect Effect Bias'') showing the expected values of the indirect effect (the product a*b) estimated from the model in Panel B of Figure 1.4 when the actual model was Panel B of Figure 1.5 with values a = .7, b = .4, and c′ = .28. Different lines show values associated with different stabilities of the M and Y processes (g1, g2 in Figure 1.5B; 0 to .8) as a function of the baseline correlation between M and Y (0 to .9).
This difference could affect the strength of the relation between M and T in the direct manipulation condition. Nonetheless, the new information provided by direct manipulation of M is likely to increase the confidence one has in the estimate of the indirect causal path.

Noting that it is the correlational nature of the link between M and Y that makes it challenging to obtain unbiased estimates of indirect (mediated) effects in randomized studies, it should not be surprising that the challenges are much greater in nonexperimental research. There are a number of studies published in peer-reviewed journals that attempt to partition assumed causal effects into direct and indirect components. For example, Mohr et al. (2003) reported that an association between traumatic stress and elevated health problems in 713 active police officers was fully mediated by subjective sleep problems in the past month. All the variables were measured in a cross-sectional survey. The path of stress to sleep to health problems is certainly plausible, but it is also possible that health problems raise the risk of both stress and sleep problems.

Even if there is no dispute about the causal order, there can be dispute about the meaning of the mediation analysis in cases such as this. Presumably, the underlying model unfolds on a daily basis: Stress today disrupts sleep tonight, and this increases the risk of health problems tomorrow. One might hope that cross-sectional summaries of stress and sleep patterns obtained for the past month would be informative about the mediating process. However, Maxwell and Cole (Cole & Maxwell, 2003; Maxwell & Cole, 2007) provided convincing evidence that there is no certain connection between a time-dependent causal model and a result based on cross-sectional aggregation of data. They studied the implications of a stationary model where causal effects were observed on a daily basis for a number of days or parts of days. In addition to the mediation effects represented in Figure 1.4B (a, b, c′), they represented the stability of the T, M, and Y processes from one time point to the next. They studied the inferences that would be made from a cross-sectional analysis of the variables under different assumptions about the mediation effects and the stability of the processes. The bias of the cross-sectional analysis was greatly influenced by the process stability, and the direction of the bias was not consistent. Sometimes the bias of the indirect effect estimate was positive and sometimes it was negative.

The Maxwell and Cole work prompts psychopathology researchers to think carefully about the temporal patterns in mediation and to take seriously the assumptions that were articulated by Judd and Kenny (1981). Others have called for modifications of the original positions taken by Kenny and his colleagues. An important alternate perspective has been advanced by MacArthur Network researchers (Kraemer, Kiernan, Essex, & Kupfer, 2008),
who call into question the Baron and Kenny (1986) distinction between mediation and moderation. As we have already reviewed in Figure 1.4, a third variable is said by Baron and Kenny (1986) to be a mediator if it both has a direct association with Y adjusting for T and can be represented as being related linearly with T. A moderator, according to Baron and Kenny (1986), is a third variable (W) that is involved in a statistical interaction with T when T and W are used to predict Y. The MacArthur researchers note that the Baron and Kenny distinction is problematic if various nonlinear transformations of Y are considered. Such transformations can produce interaction models, even if there is no evidence that the causal effect is moderated.

They propose to limit the concept of moderation to effect modifiers. If the third variable represents a status before the treatment is applied and if the size of the T-Y effect varies with the level of the status, then moderation is demonstrated from the MacArthur perspective. For randomized studies, the moderating variable would be expected to be uncorrelated with T. If psychopathology researchers embrace the MacArthur definition of moderation, considerable confusion in the literature will be avoided in the future.
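As an illustration of moderation in the MacArthur sense, the sketch below (illustrative only; the baseline status W, the effect sizes, and the data-generating model are invented assumptions, not taken from Kraemer et al.) simulates a randomized treatment T whose effect on Y differs by a pre-randomization status W:

```python
# Moderation as effect modification by a status measured before randomization.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
W = rng.integers(0, 2, size=n)     # pre-randomization status (e.g., baseline severity)
T = rng.integers(0, 2, size=n)     # randomized treatment, independent of W
Y = 0.2 * T + 0.5 * W + 0.6 * T * W + rng.normal(size=n)   # T effect depends on W

X = np.column_stack((np.ones(n), T, W, T * W))
beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(f"T effect when W = 0: {beta[1]:.2f}")            # about 0.2
print(f"T effect when W = 1: {beta[1] + beta[3]:.2f}")  # about 0.8
print(f"corr(T, W): {np.corrcoef(T, W)[0, 1]:.3f}")     # near 0 under randomization
```

Because W is measured before randomization, it is uncorrelated with T in expectation, which is what marks it as an effect modifier rather than an artifact of how Y is scaled.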
Conclusion

The time is ripe for psychopathology researchers to reconsider the conventions for making causal statements about mental health processes. On the one hand, conventions such as ITT analyses of clinical trials have led to conservative conclusions about the causal processes involved in the changes following interventions, and on the other hand, rote application of the Baron and Kenny (1986) steps for describing mediated paths has led to premature closure regarding which causal paths account for intervention effects. The old conventions are efficient for researchers in that they prescribe a small number of steps that must be followed in preparing manuscripts, but they limit the insights that are possible from a deeper consideration of causal mechanisms and pathways.

The new approaches to causal analysis will not lead to quick or even definitive statements about which factors are causal and which are spurious, but they will allow clinical and experimental data to be viewed from multiple perspectives to reveal new causal insights. In many cases, the new approaches are likely to suggest causal heterogeneity in a population. Because of genetic differences, social context, developmental stage, timing of measurements, and random environmental flux, the size of causal effects of intervention T will vary from person to person. The new methods will help us to appreciate how the alternate summaries of the population causal effect can be affected by these distributions.
It will often take more effort to use the modern tools of causal analysis, but the benefit of the effort is that researchers will be able to talk more explicitly about interesting causal theories and patterns rather than about associations that have been edited to remove any reference to ‘‘cause’’ or ‘‘effect.’’ In the long run the more sophisticated analyses will lead to more nuanced prevention and treatment interventions and a deeper understanding of the determinants of psychiatric problems and disorders. Many examples of these insights are provided in the chapters that follow.
References

Barnard, J., Frangakis, C. E., Hill, J. L., & Rubin, D. B. (2003). Principal stratification approach to broken randomized experiments: A case study of school choice vouchers in New York City. Journal of the American Statistical Association, 98, 299–311.
Baron, R. M., & Kenny, D. A. (1986). The moderator–mediator variable distinction in social psychological research: Conceptual, strategic and statistical considerations. Journal of Personality and Social Psychology, 51, 1173–1182.
Becker, M. H., & Maiman, L. A. (1975). Sociobehavioral determinants of compliance with health and medical care. Medical Care, 13(1), 10–24.
Bollen, K. A. (1989). Structural equations with latent variables. New York: Wiley.
Brotman, L. M., Gouley, K. K., Huang, K.-Y., Rosenfelt, A., O'Neal, C., Klein, R. G., et al. (2008). Preventive intervention for preschoolers at high risk for antisocial behavior: Long-term effects on child physical aggression and parenting practices. Journal of Clinical Child and Adolescent Psychology, 37, 386–396.
Cole, D. A., & Maxwell, S. E. (2003). Testing mediational models with longitudinal data: Questions and tips in the use of structural equation modeling. Journal of Abnormal Psychology, 112, 558–577.
Connor, D. F., Glatt, S. J., Lopez, I. D., Jackson, D., & Melloni, R. H., Jr. (2002). Psychopharmacology and aggression. I: A meta-analysis of stimulant effects on overt/covert aggression-related behaviors in ADHD. Journal of the American Academy of Child & Adolescent Psychiatry, 41(3), 253–261.
Costello, E. J., Compton, S. N., Keeler, G., & Angold, A. (2003). Relationships between poverty and psychopathology: A natural experiment. Journal of the American Medical Association, 290, 2023–2029.
Davey Smith, G., & Ebrahim, S. (2003). ''Mendelian randomization'': Can genetic epidemiology contribute to understanding environmental determinants of disease? International Journal of Epidemiology, 32, 1–22.
Dobson, K. S. (1989). A meta-analysis of the efficacy of cognitive therapy for depression. Journal of Consulting and Clinical Psychology, 57(3), 414–419.
Efron, B. (1998). Foreword to special issue on analyzing non-compliance in clinical trials. Statistics in Medicine, 17, 249–250.
Everitt, B. S., & Pickles, A. (2004). Statistical aspects of the design and analysis of clinical trials. London: Imperial College Press.
Fleiss, J. L. (1986). The design and analysis of clinical experiments. New York: Wiley.
Frangakis, C. E., & Rubin, D. B. (2002). Principal stratification in causal inference. Biometrics, 58, 21–29.
Freedland, K. E., Skala, J. A., Carney, R. M., Rubin, E. H., Lustman, P. J., Davila-Roman, V. G., et al. (2009). Treatment of depression after coronary artery bypass surgery: A randomized controlled trial. Archives of General Psychiatry, 66(4), 387–396.
Gelman, A., & Hill, J. (2007). Data analysis using regression and multilevel/hierarchical models. New York: Cambridge University Press.
Greenland, S., Pearl, J., & Robins, J. M. (1999a). Causal diagrams for epidemiologic research. Epidemiology, 10(1), 37–48.
Greenland, S., Pearl, J., & Robins, J. M. (1999b). Confounding and collapsibility in causal inference. Statistical Science, 14(1), 29–46.
Gross, D., Fogg, L., Webster-Stratton, C., Garvey, C., Julion, W., & Grady, J. (2003). Parent training of toddlers in day care in low-income urban communities. Journal of Consulting and Clinical Psychology, 71, 261–278.
Hafeman, D. M. (2008). A sufficient cause based approach to the assessment of mediation. European Journal of Epidemiology, 23, 711–721.
Hansen, R. A., Gartlehner, G., Lohr, K. N., Gaynes, B. N., & Carey, T. S. (2005). Efficacy and safety of second-generation antidepressants in the treatment of major depressive disorder. Annals of Internal Medicine, 143, 415–426.
Hareli, S., & Hess, U. (2008). The role of causal attribution in hurt feelings and related social emotions elicited in reaction to other's feedback about failure. Cognition & Emotion, 22(5), 862–880.
Hegarty, J. D., Baldessarini, R. J., Tohen, M., & Waternaux, C. (1994). One hundred years of schizophrenia: A meta-analysis of the outcome literature. American Journal of Psychiatry, 151(10), 1409–1416.
Hernán, M. A., & Robins, J. M. (2006). Instruments for causal inference: An epidemiologist's dream? Epidemiology, 17(4), 360–372.
Holland, P. (1986). Statistics and causal inference (with discussion). Journal of the American Statistical Association, 81, 945–970.
Judd, C. M., & Kenny, D. A. (1981). Process analysis: Estimating mediation in treatment evaluations. Evaluation Review, 5, 602–619.
Kraemer, H., Kiernan, M., Essex, M., & Kupfer, D. J. (2008). How and why criteria defining moderators and mediators differ between the Baron & Kenny and MacArthur approaches. Health Psychology, 27(Suppl. 2), S101–S108.
Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data (2nd ed.). New York: Wiley.
MacKinnon, D. P. (2008). Introduction to statistical mediation analysis. New York: Lawrence Erlbaum.
Maxwell, S. E., & Cole, D. A. (2007). Bias in cross-sectional analyses of longitudinal mediation. Psychological Methods, 12(1), 23–44.
Maxwell, S. E., & Delaney, H. D. (2004). Designing experiments and analyzing data: A model comparison perspective (2nd ed.). Mahwah, NJ: Lawrence Erlbaum.
Mohr, D., Vedantham, K., Neylan, T., Metzler, T. J., Best, S., & Marmar, C. R. (2003). The mediating effects of sleep in the relationship between traumatic stress and health symptoms in urban police officers. Psychosomatic Medicine, 65, 485–489.
Morgan, S. L., & Winship, C. (2007). Counterfactuals and causal inference. New York: Cambridge University Press.
O'Mahony, S. M., Marchesi, J. R., Scully, P., Codling, C., Ceolho, A., Quigley, E. M. M., et al. (2009). Early life stress alters behavior, immunity, and microbiota in rats: Implications for irritable bowel syndrome and psychiatric illnesses. Biological Psychiatry, 65(3), 263–267.
Pearl, J. (2001). Direct and indirect effects. In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence (pp. 411–420). San Francisco: Morgan Kaufmann.
Pearl, J. (2009). Causality: Models, reasoning and inference (2nd ed.). New York: Cambridge University Press.
Peduzzi, P., Wittes, J., Detre, K., & Holford, T. (1993). Analysis as-randomized and the problem of non-adherence: An example from the Veterans Affairs Randomized Trial of Coronary Artery Bypass Surgery. Statistics in Medicine, 12, 1185–1195.
Piantadosi, S. (2005). Clinical trials: A methodologic perspective (2nd ed.). New York: Wiley.
Quitkin, F. M., Petkova, E., McGrath, P. J., Taylor, B., Beasley, C., Stewart, J., et al. (2003). When should a trial of fluoxetine for major depression be declared failed? American Journal of Psychiatry, 160(4), 734–740.
Rehder, B., & Kim, S. (2006). How causal knowledge affects classification: A generative theory of categorization. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32(4), 659–683.
Robins, J. M. (1986). A new approach to causal inference in mortality studies with sustained exposure periods—applications to control of the healthy worker survivor effect. Mathematical Modeling, 7, 1393–1512.
Robins, J. M. (1993). Analytic methods for estimating HIV treatment and cofactor effects. In D. G. Ostrow & R. C. Kessler (Eds.), Methodological issues of AIDS mental health research (pp. 213–288). New York: Springer.
Robins, J. M., Hernán, M. A., & Brumback, B. (2000). Marginal structural models and causal inference in epidemiology. Epidemiology, 11(5), 550–560.
Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of propensity scores in observational studies for causal effects. Biometrika, 70, 41–55.
Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5), 688–701.
Rubin, D. B. (1978). Bayesian inference for causal effects. Annals of Statistics, 6, 34–58.
Rubin, D. B. (1980). Discussion of ''Randomization analysis of experimental data in the Fisher randomization test,'' by D. Basu. Journal of the American Statistical Association, 75, 591–593.
Rubin, D. B. (1990). Formal modes of statistical inference for causal effects. Journal of Statistical Planning and Inference, 25, 279–292.
Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston: Houghton-Mifflin.
Spencer, S. J., Zanna, M. P., & Fong, G. T. (2005). Establishing a causal chain: Why experiments are often more effective than mediational analyses in examining psychological processes. Journal of Personality and Social Psychology, 89(6), 845–851.
Toh, S., & Hernán, M. (2008). Causal inference from longitudinal studies with baseline randomization. International Journal of Biostatistics, 4(1), article 22. Retrieved from http://www.bepress.com/ijb/vol4/iss1/22
2

What Would Have Been Is Not What Would Be

Counterfactuals of the Past and Potential Outcomes of the Future

sharon schwartz, nicolle m. gatto, and ulka b. campbell
Introduction

Epidemiology is often described as the basic science of public health. A mainstay of epidemiologic research is to uncover the causes of disease that can serve as the basis for successful public-health interventions (e.g., Institute of Medicine, 1988; Milbank Memorial Fund Commission, 1976). A major obstacle to attaining this goal is that causes can never be seen but only inferred. For this reason, the inferences drawn from our studies must always be interpreted with caution.

Considerable progress has been made in the methods required for sound causal inference. Much of this progress is rooted in a full and rich articulation of the logic behind randomized controlled trials (Holland, 1986). From this work, epidemiologists have a much better understanding of barriers to causal inference in observational studies, such as confounding and selection bias, and their tools and concepts are much more refined. The models behind this progress are often referred to as ''counterfactual'' models. Although researchers may be unfamiliar with them, they are widely (although not universally) accepted in the field. Counterfactual models underlie the methodologies that we all use.

Within epidemiology, when people talk about a counterfactual model, they usually mean a potential outcomes model—also known as ''Rubin's causal model.'' As laid out by epidemiologists, the potential outcomes model is rooted in the experimental ideas of Cox and Fisher, for which Neyman provided the first mathematical expression. It was popularized by Rubin, who extended
it to observational studies, and expanded by Robins to exposures that vary over time (Maldonado & Greenland, 2002; Hernan, 2004; VanderWeele & Hernan, 2006). This rich tradition is responsible for much of the progress we have just noted.

Despite this progress in methods of causal inference, a common charge in the epidemiologic literature is that public-health interventions based on the causes we identify in our studies often fail. Even when they do not fail, the magnitudes of the effects of these interventions are often not what we expected. Levins (1996) provides a particularly gloomy assessment:

The promises of understanding and progress have not been kept, and the application of science to human affairs has often done great harm. Public health institutions were caught by surprise by the resurgence of old diseases and the appearance of new ones. . . . Pesticides increase pests, create new pest problems and contribute to the load of poison in our habitat. Antibiotics create new pathogens resistant to our drugs. (p. 1)

A less pessimistic assessment suggests that although public-health interventions may be narrowly successful, they may simultaneously lead to considerable harm. An example is the success of antismoking campaigns in reducing lung cancer rates in the United States, while simultaneously increasing smoking and thereby lung cancer rates in less developed countries. This unintended consequence resulted from the redirection of cigarette sales to these countries (e.g., Beaglehole & Bonita, 1997).

Ironically, researchers often attribute these public-health failures to a narrowness of vision imposed by the same models of causal inference that heralded modern advances in epidemiology and allied social and biological sciences. That is, counterfactual models improve causal inference in our studies but are held at least partly responsible for the failures of the interventions that follow those studies. Critics think that counterfactually based approaches in epidemiology not only do not provide a sound basis for public-health interventions but cannot (e.g., Shy, 1997; McMichael, 1999).

While there are many aspects of the potential outcomes model that warrant discussion, here we focus on one narrowly framed question: Is it possible, as the critics contend, that the same models that enhance the validity of our studies can mislead us when we try to intervene on the causes these studies uncover? We think the answer is a qualified ''yes.'' We will argue that the problem arises not because of some failure of the potential outcomes approach itself but, rather, because of unintended consequences of the metaphors and tools implied by the model. We think that the language, analogies, and conceptual frame that enhance the valid estimation of precise causal effects can encourage unrealistic expectations about the
relationship between the causal effects uncovered in our studies and results of interventions based on their removal.

More specifically, we will argue that the unrealistic expectations of the success of interventions arise in the potential outcomes frame because of a premature emphasis on the effects of causal manipulation (understanding what would happen if the exposure were altered) at the expense of two other tasks that must come first in epidemiologic research: (1) causal identification (identifying if an exposure did cause an outcome) and (2) causal explanation (understanding how the exposure caused the outcome). We will describe an alternative approach that specifies all three of these steps—causal identification, followed by causal explanation, and then the effects of causal manipulation. While this alternative approach will not solve the discrepancy between the results of our studies and the results of our interventions, it makes the sources of the discrepancy explicit.

The roles of causal identification and causal explanation in causal inference, which we build upon here, have been most fully elaborated by Shadish, Cook, and Campbell (2002), heirs to a prominent counterfactual tradition in psychology (Cook & Campbell, 1979). We think that a dialogue between these two counterfactual traditions (i.e., the potential outcomes tradition and the Cook and Campbell tradition as most recently articulated in Shadish et al.) can provide a more realistic assessment of what our studies can accomplish and, perhaps, a platform for a more successful translation of basic research findings into sound public-health interventions.

To make these arguments, we will (1) review the history and principles of the potential outcomes model, (2) describe the limitations of this model as the basis for interventions in the real world, and (3) propose an alternative based on an integration of the potential outcomes model with other counterfactual traditions. We wish to make clear at the outset that virtually all of the ideas in this chapter already appear in the causal inference literature (Morgan & Winship, 2007). This chapter simply presents the picture we see as we stand on the shoulders of the giants in causal inference.
The Potential Outcomes Model

In the epidemiologic literature, a counterfactual approach is generally equated with a potential outcomes model (e.g., Maldonado & Greenland, 2002; Hernan, 2004; VanderWeele & Hernan, 2006). In describing this model, we will use the term exposure to mean a variable we are considering as a possible cause. For ease of discourse, we will use binary exposures and
outcomes throughout. Thus, individuals will be either exposed or not and will either develop the disease or not.

The concept at the heart of the potential outcomes model is the causal effect of an exposure. A causal effect is defined as the difference between the potential outcomes that would arise for an individual under two different exposure conditions. In considering a disease outcome, each individual has a potential outcome for the disease under each exposure condition. Therefore, when comparing two exposure conditions (exposed and not exposed), there are four possible pairs of potential outcomes for each individual. An individual can develop the disease under both conditions, only under exposure, only under nonexposure, or under neither condition. Greenland and Robins (1986) used response types as a shorthand to describe these different pairs of potential outcomes. Individuals who would develop the disease under either condition (i.e., whether or not they were exposed) are called ''doomed''; those who would develop the disease only if they were exposed are called ''causal types''; those who would develop the disease only if they were not exposed are called ''preventive types''; and those who would not develop the disease under either exposure condition are called ''immune.''

Every individual is conceptualized as having a potential outcome under each exposure that is independent of the actual exposure. Potential outcomes are determined by the myriad of largely unknown genetic, in utero, childhood, adult, social, psychological, and biological causes to which the individuals have been exposed, other than the exposure under study. The effect of the exposure for each individual is the difference between the potential outcome under the two exposure conditions, exposed and not. For example, if an individual's potential outcomes were to develop the disease if exposed but not if unexposed, then the exposure is causal for that individual (i.e., he or she is a causal type).

Rubin uses the term treatment to refer to these types of exposures and describes a causal effect in language that implies an imaginary clinical trial. In Rubin's (1978) terms, ''The causal effect of one treatment relative to another for a particular experimental unit is the difference between the result if the unit had been exposed to the first treatment and the result if, instead, the unit had been exposed to the second treatment'' (p. 34). One of Rubin's contributions was the popularization of this definition of a causal effect in an experiment and the extension of the definition to observational studies (Hernan, 2004).
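The response types and the definition of an individual causal effect can be made concrete with a small sketch (purely illustrative; the four hypothetical individuals and their potential outcomes are invented, and in practice only one potential outcome per person is ever observed):

```python
# Potential outcomes, Greenland and Robins (1986) response types, and the
# average causal effect, with invented individuals for illustration.
from dataclasses import dataclass

@dataclass
class Individual:
    y_if_exposed: int    # potential outcome under exposure (1 = disease)
    y_if_unexposed: int  # potential outcome under nonexposure

def response_type(p: Individual) -> str:
    if p.y_if_exposed and p.y_if_unexposed:
        return "doomed"           # disease under either condition
    if p.y_if_exposed:
        return "causal type"      # disease only if exposed
    if p.y_if_unexposed:
        return "preventive type"  # disease only if unexposed
    return "immune"               # disease under neither condition

population = [Individual(1, 1), Individual(1, 0), Individual(0, 1), Individual(0, 0)]

# The individual causal effect is the difference between the two potential
# outcomes; the average causal effect is the mean of these differences.
effects = [p.y_if_exposed - p.y_if_unexposed for p in population]
for p, e in zip(population, effects):
    print(response_type(p), e)
print("average causal effect:", sum(effects) / len(effects))
```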
For example, the causal effect of smoking one pack of cigarettes a day for a year (i.e., the first treatment) relative to not smoking at all (the second treatment) is the difference between the disease outcome for an individual if he or she smokes a pack a day for a year compared with the disease outcome in that same individual if he or she does not smoke at all during this same time interval. One can think about the average causal effect in a population simply as the average of the causal effects for all of the individuals in the population. It is the difference between the disease experience of the individuals in a particular population if we were to expose them all to smoking a pack a day and the disease experience if we were to prevent them from smoking at all during this same time period.

A useful metaphor for this tradition is that of ''magic powder,'' where the magic powder can remove an exposure. Imagine we sprinkle an exposure on a population and observe the disease outcome. Imagine then that we use magic powder to remove that exposure and can go back in time to see the outcome in the same population. The problem of causal inference is twofold—we do not have magic powder and we cannot go back in time. We can never see the same people at the same time exposed and unexposed. That is, we can never see the same people both smoking a pack of cigarettes a day for a year and, simultaneously, not smoking cigarettes at all for a year.

From a potential outcomes perspective, this is conceptualized as a missing-data problem. For each individual, at least one of the exposure experiences is missing. In our studies, we provide substitutes for the missing data. Of course, our substitutes are never exactly the same as what we want. However, they can provide the correct answer if the potential outcomes of the substitute are the same as the potential outcomes of the target, the person or population you want information about.

The potential outcomes model is clearly a counterfactual model in the sense that the same person cannot simultaneously experience both exposure and nonexposure. The outcomes of at least one of the exposure conditions must represent a counterfactual, an outcome that would have, but did not, happen. Rubin (2005), however, objects to the use of the term counterfactual when applied to his model. Counterfactual implies there is a fact (e.g., the outcome that did occur in a group of exposed individuals) to which the counterfactual (e.g., the outcome that would have occurred had this group of individuals not been exposed) is compared. However, for Rubin, there is no fact to begin with. Rather, the comparison is between the potential outcomes of two hypothetical exposure conditions, neither of which necessarily reflects an occurrence. The causal effect for Rubin is between two hypotheticals. Thus, in the potential outcomes frame, when epidemiologists use the term counterfactual, they mean ''hypothetical'' (Morgan & Winship, 2007). This subtle distinction has important implications, as we shall see.

This notion of a causal effect as a comparison between two hypotheticals derives from the rootedness of the potential outcomes frame in experimental traditions. Holland (1986), an early colleague of Rubin and explicator of his
work, makes this experimental foundation clear in his summary of the three main tenets of the potential outcomes model. First, the potential outcomes model studies the effects of causes and not the causes of effects. Thus, the goal is to estimate the average causal effect of an exposure, not to identify the causes of an outcome. For a population, this is the average causal effect, defined as the average difference between two potential outcomes for the same individuals, the potential outcome under exposure A vs. the potential outcome under exposure B. The desired, but unobservable, true causal effect is the difference in outcome in one population under two hypothetical exposure conditions: if we were to expose the entire population to exposure A vs. if we were to expose them to exposure B. As in an experiment, the exposure is treated as if it were in the control of the experimenter; the goal is to estimate the effect that this manipulation would have on the outcome. Second, the effects of causes are always relative to particular comparisons. One cannot ask questions about the effect of a particular exposure without specifying the alternative exposure that provides the basis for the comparison. For example, smoking a pack of cigarettes a day can be preventive of lung cancer if the comparison was smoking four packs of cigarettes a day but is clearly causal if the comparison was with smoking zero packs a day. As in an experiment, the effect is the difference between two hypothetical exposure conditions. Third, potential outcomes models limit the types of factors that can be defined as causes. In particular, attributes of units (e.g., attributes of people such as gender) are not considered to be causes. This requirement clearly derives from the experimental, interventionist grounding of this model. To be a cause (or at least a cause of interest), the factor must be manipulable. In Holland (1986, p. 959) and Rubin’s terminology, ‘‘No causation without manipulation.’’1 The focus on the effect of causes, the precise definition of the two comparison groups, and the emphasis on manipulability clearly root the potential outcomes approach in experimental traditions. Strengths of this approach include the clarity of the definition of the causal effect being estimated and the articulation of the assumptions necessary for this effect to be valid. These assumptions are (1) that the two groups being compared (e.g., the exposed and the unexposed) are exchangeable (i.e., they have the same potential outcomes) and (2) that the stable unit treatment value assumption (SUTVA) holds. While exchangeability is well understood in epidemiology, the requirements of SUTVA may be less accessible.
1. Rubin (1986), in commenting on Holland’s 1986 article, is not as strict as Holland in demanding that causes be, by definition, manipulable. Nonetheless, he contends that one cannot calculate the causal effect of a nonmanipulable cause and coauthored the ‘‘no causation without manipulation’’ mantra.
Stable Unit Treatment Value Assumption

A valid estimate of this causal effect requires that the two groups being compared (e.g., the exposed and the unexposed) are exchangeable (i.e., there is no confounding) and that SUTVA is reasonable. SUTVA requires that (1) the effect of a treatment is the same, no matter how an individual came to be treated, and (2) the outcome in an individual is not influenced by the treatment that other individuals receive. In Rubin's (1986) language, SUTVA is simply

the a priori assumption that the value of Y [i.e., the outcome] for unit u [e.g., a particular person] when exposed to treatment t [e.g., a particular exposure or risk factor] will be the same no matter what mechanism is used to assign treatment t to unit u and no matter what treatments the other units receive . . . SUTVA is violated when, for example, there exist unrepresented versions of treatments (Ytu depends on which version of treatment t was received) or interference between units (Ytu depends on whether unit u′ received treatment t or t′). (p. 961)

Thus, if one were to study the effects of a particular form of psychotherapy, SUTVA would be violated if (1) there were different therapists with different levels of expertise or some individuals freely agreed to enter therapy while others agreed only at the behest of a relative and the mode of entry influenced the effectiveness of the treatment (producing unrepresented versions of treatments) (Little & Rubin, 2000) or (2) individuals in the treatment group shared insights they learned in therapy with others in the study (producing interference between units) (Little & Rubin, 2000).

The language in which SUTVA is described, the effects of treatment assignment and versions of treatments, is again indicative of the explicit connection between the potential outcomes model and randomized experiments. To make observational studies as close to experiments as possible, we must ensure that those exposed to the ''alternative treatments'' (i.e., different exposures) are exchangeable in the sense that the same outcomes would arise if the individuals in the different exposure groups were reversed. In addition, we must ensure that we control all factors that violate SUTVA. We do this by carefully defining exposures or risk factors in terms of narrowly defined treatments that can be manipulated, at least in theory.

To continue our smoking example, one could ask questions about the average causal effect of smoking a pack of cigarettes a day for a year (treatment A) compared with never having smoked at all (treatment B) in an observational study. Since we cannot observe the same people simultaneously under two different treatments, we compare the disease experience of two
groups of people: one with treatment A, the exposure of interest, and one with treatment B, the substitute for the potential outcomes of the same group under the second treatment option. In order for the substitution to yield an accurate effect estimate (i.e., for exchangeability to hold), we must ensure that the smokers and nonsmokers are as similar as possible on all causes of the outcome (other than smoking). This can be accomplished by random assignment in a randomized controlled trial. To meet SUTVA assumptions, we have to (1) be vigilant to define our exposure precisely so there is only one version of each treatment and be certain that how individuals entered the smoking and nonsmoking groups did not influence their outcome and (2) ensure the smoking habits of some individuals in our study did not influence the outcomes of other individuals. Barring other methodological problems, it would be assumed that if we did the intervention in real life, that is, if we prevented people from smoking a pack of cigarettes a day for a year, the average causal effect estimated from our study would approximate this intervention effect.

The potential outcomes model is an attempt to predict the average causal effect that would arise (or be prevented) from a particular manipulation under SUTVA. It is self-consciously interventionist. Indeed, causal questions are framed in terms of intervention consequences. To ensure the validity of the causal effects uncovered in epidemiologic studies, researchers are encouraged to frame the causal question in these terms. As a prototypical example, Glymour (2007), in a cogent methodologic critique of a study examining the effect of childhood socioeconomic position on adult health, restated the goal of the study in potential outcome terms: ''The primary causal question of interest is how adult health would differ if we intervened to change childhood socio-economic position'' (p. 566).

It is critical to note that even when we do not explicitly begin with this type of model, the interventionist focus of the potential outcomes frame implicitly influences our thinking through its influence on our methods. For example, this notion is embodied in our understanding of the attributable risk as the proportion of disease that would be prevented if we were to remove this exposure (Last, 2001). More generally, authors often end study reports with a statement about the implications of their findings for intervention or policy that reflects this way of thinking.
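To make the attributable-risk reading concrete, here is a small arithmetic sketch (the prevalence and risk figures are invented for illustration; the formula is the standard population attributable risk proportion). Its closing comment restates the interpretive leap the chapter goes on to question: that the effect estimated in a study would transfer unchanged to an intervention.

```python
# Invented numbers, for illustration only.
p_exposed = 0.30        # prevalence of the exposure in the population
risk_exposed = 0.12     # disease risk among the exposed
risk_unexposed = 0.04   # disease risk among the unexposed

risk_total = p_exposed * risk_exposed + (1 - p_exposed) * risk_unexposed
par = (risk_total - risk_unexposed) / risk_total
print(round(par, 3))    # 0.375, often read as "37.5% of cases would be prevented
                        # if the exposure were removed" -- a reading that presumes
                        # the study conditions carry over to the intervention
```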
Limitations of the Potential Outcomes Model for Interventions in the Real World

To ensure the internal validity of our inferences, we isolate the effects of our causes from the context in which they act. We do this by narrowly defining
our treatments, creating exchangeability between treated and untreated people, and considering social norms and the physical environment as part of a stable background in which causes act. In order for the causal effect of an exposure in a study to translate to the effect of its intervention, all of the controls and conditions imposed in the study must hold in the intervention and the targeted population (e.g., treatment definition, follow-up time frame, distribution of other causes). The problem is that, in most cases, interventions in the real world cannot replicate the conditions that gave rise to the average causal effect in a study. It is important to note that this is true for randomized controlled trials as well as observational studies. It is true for classic risk factors as well as for exposures in life course and social epidemiology. The artificial world that we appropriately create to identify causal effects—a narrow swath of temporal, geographic, and social reality in which exchangeability exists and SUTVA is not violated—captures a vital but limited part of the world in which we are interested. Thus, while the approach we use in studies aids in the valid estimation of a causal effect for the past, it provides a poor indicator of a causal effect for the future. For these reasons, the causal effects of our interventions in the real world are unlikely to be the same as the causal effects of our studies. This problem is well recognized in the literature on randomized controlled trials in terms of the difference between efficacy and effectiveness and in the epidemiologic literature as the difference between internal validity and external validity. However, this recognition is rarely reflected in research practice. We suspect this problem may be better understood by deeper examination of the causes of the discrepancy between the effects observed in studies and the effects of interventions. We group these causes into three interrelated categories: direct violations of SUTVA, unintended consequences of our interventions, and context dependencies.
Direct Violations of SUTVA

Stable Treatment Effect

In order to identify a causal effect, a necessary SUTVA condition is that there is only one version of the treatment. To meet this assumption, we need to define the exposures in our studies in an explicit and narrow way. For example, we would ask about the effects of a particular form of psychotherapy (e.g., interpersonal psychotherapy conducted by expert clinicians) rather than about psychotherapy in general. This is because the specific types of therapy encompassed within the broad category of ''psychotherapy'' are likely to have different effects on the outcome.
While this is necessary for the estimation of precise causal effects in our studies, it is not likely to reflect the meaning of the exposure or treatment in the real world. The removal of causes or the provision of treatments, no matter how well defined, is never surgical. Unlike the removal of causes by the magic powder in our thought experiments, interventions are often crude and messy. Public-health interventions are inherently broad. Even in a clinical context, treatment protocols are never followed precisely in real-world practice.

In public-health interventions, there are also different ways of getting into ''treatment,'' and these may well have different effects on the outcome. For instance, the effect of an intervention offering a service may be very different for those who use it only after it has become popular (the late adopters). Early adopters of a low-fat diet, for example, may increase their intake of fruits and vegetables to compensate for the caloric change. Late adopters may substitute low-fat cookies instead. A low-fat diet was adopted by both types of people, but the effect on an outcome (e.g., weight loss) would likely differ. There are always different versions of treatments, and the mechanisms through which individuals obtain the treatments will frequently impact the effect of the treatments on the outcome.

Interference Between Units

When considered in real-world applications over a long enough time frame, there will always be ''interference between units.'' Because people live in social contexts, their behavior influences norms and social expectations. Behavior is contagious. This can work in positive ways, increasing the effectiveness of an intervention, or lead to negative unintended consequences. An example of the former would be when the entrance of a friend into a weight-loss program encourages another friend to lose weight (Christakis & Fowler, 2007). Thus, the outcome for one individual is contingent on the exposure of another individual. Similarly, changes in individual eating behaviors spread. This influences not only individuals' behavior but, eventually, the products that stores carry, the price of food, and the political clout of like-minded individuals. It changes the threshold for the adoption of healthy eating habits. There is an effect not only of the weight-loss program itself but also of the proportion of people enrolled in weight-loss programs within the population.

Within the time frame of our studies, the extant norms caused by interactions among individuals and the effect of the proportion of exposure in the population are captured as part of the background within which causes act, are held constant, and are invisible. To identify the true effects these causes had, this approach is reasonable and necessary. The causes worked during
that time frame within that normative context. However, in a public-health intervention, these norms change over time due to the intervention. This problem is well recognized in infectious disease studies where the contagion occurs in a rapid time frame, making noninterference untenable even in the context of a short-term study. It is hard to imagine, though, any behavior which is not contagious over long enough time frames. The fact is that the causal background we must hold constant to estimate a causal effect is influenced by our interventions.
Unintended Consequences of Interventions

Unintended consequences of interventions are consequences of exposure removal not represented as part of the causal effect of the exposure on the outcome under study. The causes of these unintended consequences include natural confounding and narrowly defined outcomes.

Natural Confounding

Recall that the estimation of the true causal effect requires exchangeability of potential outcomes between the exposed and unexposed groups in our studies. Exchangeability is necessary to isolate the causal effect of interest. For example, in examining the effects of alcohol abuse on vehicular fatalities, we may control for the use of illicit drugs. We do so because those who abuse alcohol may be more likely to also abuse other drugs that are related to vehicular fatalities. If the association between alcohol abuse and illicit drug use is a form of ''natural confounding,'' that is, the association between alcohol and drug use arises in naturally occurring populations and is not an artifact of selection into the study, then this association is likely to have important influences in a real-world intervention. That is, the way in which individuals came to be exposed may influence the effect of the intervention, in violation of SUTVA.

For example, when two activities derive from a similar underlying factor (social, psychological, or biologic), the removal of one may influence the presence of the other over time; it may activate a feedback loop. Thus, the causal effect of alcohol abuse on car accidents may overestimate the effect of the removal of alcohol abuse from a population if the intervention on alcohol use inadvertently increases marijuana use. As this example illustrates, an intervention may influence not only the exposure of interest but also other causes of the outcome that are linked with the exposure in the real world. In our studies, we purposely break this link. We overcome the problem of the violation of SUTVA by imposing narrow limits on time and place so that SUTVA holds in the study. We control these
variables, precisely because they are also causes of the outcome under study. In the real world, however, their influence may make the interventions less effective than our effect estimates suggest. The control in the study was not incorrect as it was necessary to isolate the true effect that alcohol use did have on car accidents among these individuals given the extant conditions of the study. However, outside the context of the study, removal of the exposure of interest had unintended consequences over time through its link with other causes of the outcome.

Narrowly Defined Outcomes

Although we may frame our studies as identifying the ''effects of causes,'' they identify only the effects of causes on the specific outcomes we examine in our studies. In the real world, causes are likely to have many effects. Likewise, our interventions have effects on many outcomes, not only those we intend. Unless we consider the full range of outcomes, our interventions may be narrowly successful but broadly harmful. For example, successful treatments for AIDS have decreased the death rate but have also led people to reconceptualize AIDS from a lethal illness to a manageable chronic disease. This norm change can lead to a concomitant rise in risk-taking behaviors and an increase in disease incidence. More optimistically, our interventions may have beneficial effects that are greater than we assume if we consider unintended positive effects. For example, an intervention designed to increase high school graduation rates may also reduce alcoholism among teens.
Context Dependency

Most fundamentally, all causal effects are context-dependent, and therefore, all effects are local. It is unlikely that a public-health intervention will be applied only in the exact population in which the causal effects were studied. Public-health interventions often apply to people who do not volunteer for them, to a broader swath of the social fabric and over a different historical time frame. Therefore, even if our effect estimates were perfectly valid, we would expect effects to vary between our studies and our interventions. For example, psychiatric drugs are often tested on individuals who meet strict Diagnostic and Statistical Manual of Mental Disorders criteria, do not have comorbidities, and are placebo nonresponders. Once the drugs are marketed, however, they are used to treat individuals who represent a much wider population. It is unlikely that the effects of the drugs in real-world usage will be the same as in the studies.

For all these reasons, it seems unlikely that the causal effect of any intervention will reflect the causal effect found in our studies. These problems are
well known and much discussed in the social science literature (e.g., Merton, 1936, 1968; Lieberson, 1985) and the epidemiologic literature (e.g., Greenland, 2005). Nonetheless, when carrying out studies, epidemiologists often talk about trying to identify ‘‘the true causal effect of an exposure,’’ as if this was a quantification that has some inherent meaning. An attributable risk is interpreted as if this provided a quantification of the effect of the elimination of the exposure under study. Policy implications of etiologic work are discussed as if they flowed directly from our results. We think that this is an overly optimistic assessment of what our studies can show. We think that as a field we tend to estimate the effect exposures had in the past and assume that this will be the effect in the future. We do this by treating the counterfactual of the past as equivalent to the potential outcome of the future.
An Alternative Counterfactual Framework (An Integrated Counterfactual Approach)

An alternative framework, which we will refer to as an ''integrated counterfactual approach'' (ICA), distinguishes three sequential tasks in the relationship between etiologic studies and public-health interventions, the first two of which are not explicit goals in a potential outcomes frame: (1) causal identification, (2) causal explanation, and (3) the effects of causal manipulation.
Step 1: Causal Identification

In line with the Cook and Campbell tradition (Shadish et al., 2002; Cook & Campbell, 1979), this alternative causal approach uses the insights and methods of potential outcomes models but reframes the question that these models address as the identification of a cause rather than the result of a manipulation. Whereas the potential outcomes model is rooted in experiments, the ICA is rooted in philosophic discussions of counterfactual definitions of a cause, particularly the work of Mackie (1965, 1974). It begins with Mackie's definition of a cause rather than a definition of a causal effect. For Mackie, X is a cause of Y if, within a causal field, with all held constant, Y would not have occurred if X had not, at least not when and how it did. Mackie's formulation begins with a particular outcome and attempts to identify some of the factors that caused it. Thus, the causal contrast for Mackie is between what actually happened and what would have happened had everything remained the same except that one of the exposures was absent. The contrast represents the difference between a fact and a counterfactual, rather than two potential outcomes.
Thus, for Mackie, something is a cause if the outcome under exposure is different from what the outcome would have been under nonexposure. By beginning with actual occurrences, Mackie gives prominence to the contingency of all causal identification. This approach explicitly recognizes that causes are always identified within a causal field of interest, where certain factors are assumed to be part of the background in which causes act, rather than factors available for consideration as causes. The decision to assign factors to the background may differ among researchers and time periods. Thus, there is a subjective element in deciding which, among the myriad of possible exposures, factor is hypothesized to be a cause of interest.

Rothman and Greenland (1998) provide a definition of a cause in the context of health that is consistent with Mackie's view: ''We can define a cause of a specific disease event as an antecedent event, condition, or characteristic that was necessary for the occurrence of the disease at the moment it occurred, given that other conditions are fixed'' (p. 8). As applied to a health context, both Mackie and Rothman and Greenland begin with the notion that, for most diseases, an individual can develop a disease from one of many possible causes, each of which consists of several components working together. In this model, although none of the components in any given constellation can cause disease by itself, each makes a nonredundant and necessary contribution to complete a causal mechanism. A constellation of components that is minimally sufficient to cause disease is termed a sufficient cause. Mackie referred to these component causes as ''insufficient but necessary components of unnecessary but sufficient'' (INUS) causes. Rothman's (1976) sufficient causes are typically depicted as ''causal pies.''

As an example, assume that the disease of interest is schizophrenia. There may be three sufficient causes of this disease (see Figure 2.1). An individual can develop schizophrenia from a genetic variant, a traumatic event, and poor nutrition; from stressful life events, childhood neglect, and exposure to an environmental toxin; or from prenatal viral exposures, childhood viral exposure, and a vitamin deficiency.

Figure 2.1 Potential causes of schizophrenia depicted as causal pies: Sufficient Cause 1 (gene, trauma, poor nutrition, U1); Sufficient Cause 2 (stressful event, neglect, toxin, U2); Sufficient Cause 3 (prenatal viral exposure, child virus, vitamin deficiency, U3). Adapted from Rothman and Greenland (1998).
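A short sketch of the sufficient-component (''causal pie'') logic, using the component labels from Figure 2.1 (the disease function and the example person are invented for illustration):

```python
# Each sufficient cause is a set of components; disease occurs for an individual
# when all components of at least one sufficient cause are present.
SUFFICIENT_CAUSES = [
    {"gene", "trauma", "poor nutrition", "U1"},
    {"stressful event", "neglect", "toxin", "U2"},
    {"prenatal viral exposure", "child virus", "vitamin deficiency", "U3"},
]

def develops_disease(components_present: set) -> bool:
    return any(pie <= components_present for pie in SUFFICIENT_CAUSES)

# "trauma" is an INUS cause here: insufficient on its own, but a necessary
# part of Sufficient Cause 1 -- removing it breaks that mechanism.
person = {"gene", "trauma", "poor nutrition", "U1"}
print(develops_disease(person))               # True
print(develops_disease(person - {"trauma"}))  # False
```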
We have added components U1, U2, and U3 to the sufficient causes to represent additional unknown factors. Each individual develops schizophrenia from one of these sufficient causes; in no instance does the disease occur from any one factor—rather, it occurs due to several factors working in tandem.

The ICA and potential outcomes model are quite consistent in many critical ways in this first step. Indeed, the potential outcomes model provides a logical, formal, statistical framework applicable to causal inference within the context of the ICA. Regardless of whether we intend to identify a cause or estimate a causal effect, the same isolation of the cause is required. Most essentially, this means that comparison groups must be exchangeable. However, each framework is intended to answer different questions (see Table 2.1). From a potential outcomes perspective, the goal is to estimate the causal effect of the exposure. From an ICA perspective, the goal is to identify whether an exposure was a cause. This distinction between the goals of identifying the effects of causes and the causes of effects is critical and has many consequences.
Table 2.1 Differences Between the Potential Outcomes Model and an Integrated Counterfactual Approach

Goal
  Potential Outcomes Model: estimation of the true causal effect (salient differences: estimate; quantitative; effects of causes)
  Integrated Counterfactual Approach: identification of true causes (salient differences: identify; qualitative; causes of effects)

Means
  Potential Outcomes Model: compare two potential outcomes (salient differences: entire population under two exposures; manipulable causes; SUTVA; mimic random assignment)
  Integrated Counterfactual Approach: compare a fact with a counterfactual (salient differences: exposed under two exposure conditions; any factor; construct validity; mimic assignment of exposed)

Interpretation
  Potential Outcomes Model: potential outcome of the future (salient difference: expect consistency)
  Integrated Counterfactual Approach: causal effect of the past (salient difference: expect inconsistency)

First, identifying the effects of causes is future-oriented. We start with a cause and estimate its effect. The causal contrast is between the potential
disease experiences of a group of individuals under two exposure conditions. Identifying the causes of effects, in contrast, implies that the identification is about what happened in the past. The causal contrast is between what did happen to a group of individuals under the condition of exposure, something explicitly grounded in and limited by a particular sociohistorical reality, and what would have happened had all conditions remained constant except that the exposure was absent. This approach identifies factors that actually were causes of the outcome. Whether or not they will be causes of the outcome depends on the constellation of the other factors held constant at that particular sociohistorical moment. The effect of this cause in the future is explicitly considered a separate question. Second, when we consider a potential outcomes model, the causal effect of interest is most often the causal effect for the entire population. That is, we conceptualize the causal contrast as the entire study population under two different treatments. We create exchangeability by mimicking random assignment. Neither exposure condition is ‘‘fact’’ or ‘‘counterfactual.’’ Rather, both treatment conditions are substitutes for the experience of the entire population under that treatment. In contrast, Mackie’s perspective implies that the counterfactual of interest is a counterfactual for the exposed. We take as a given what actually happened to people with the putative causal factor and imagine a counterfactual in reference to them. We create exchangeability by mimicking the predispositions of the exposed. This puts a different spin on the issue of confounding and nonexchangeability.2 The factors that differentiate exposed and unexposed people are more easily seen as grounded in characteristics of truly exposed people and their settings. It makes explicit that the causal effect for people who are actually exposed may not be the same as the effect that cause would have on other individuals. Thus, this type of confounding is seen not as a study artifact but as a form of true differences between exposed and unexposed people that can be and must be adjusted for in our study but must also be considered as an active element in any real-life intervention. Third, the focus on estimating the effects of causes in the potential outcomes model leads to the requirement of manipulability; any factor which is not manipulable is not fodder for causal inference. From an ICA perspective, any factor can be a cause (Shadish et al., 2002). To qualify, it has to be a factor that, were it absent and with all else the same, this outcome within this context would not have occurred. Even characteristics of individuals, such as gender, are grist for a counterfactual thought experiment. The world is
2. Technically, when the effect for the entire population is of interest, full exchangeability is required. When the effect for the exposed is of interest, only partial exchangeability is required (Greenland & Robins, 1986).
fixed as it is in this context, say, with a fairly rigid set of social expectations depending on identified sex at birth. We can ask a question about what an individual’s life would have been like had he been born male, rather than female, given this social context. Fourth, this perspective brings the issue of context dependency front and center. As Rothman’s (1976) and Mackie’s (1965) models make explicit, shifts in the component causes and their distributions, variations in the field of interest, and the sociohistorical context change the impact of the cause and, indeed, determine whether or not the factor is a cause in this circumstance. Thus, the impact of a cause is explicitly recognized as context-dependent; the size of an effect is not universal. A factor can be a cause for some individuals in some contexts but not in others. Thus, the goal is the ‘‘identification of causes in the universe,’’ rather than the estimation of universal causal effects. By ‘‘causes in the universe’’ we mean factors which at some moment in time have caused the outcome of interest and could theoretically (if all else were equal) happen again.
Step 2: Causal Explanation The focus on the causes of effects facilitates an important distinction that emerges from the Cook and Campbell (1979) tradition—that between causal identification and causal explanation. From their perspective, in the first step, we identify whether the exposure of interest did cause the outcome in some people in our study. We label this ''causal identification.''3 If we want to understand the effect of altering a cause in the future, an additional step of causal explanation is required. Causal explanation comprises two components: construct validity, an understanding of the ''active ingredients'' of the exposure and how they work, and external validity, an identification of the characteristics of persons, places, and settings that facilitate its effect on the outcome. Construct Validity In causal identification, we examine the causal effects of our variables as measured. In causal explanation, we ask what it is about these variables that caused the outcome. Through mediational analyses, we examine both the active ingredients of the exposure (i.e., what aspects of the exposure are causal) and the pathways through which the exposure affects the outcome. Mediational analyses explicitly explore the potential SUTVA violation inherent 3. Shadish et al. (2002) call this step ''causal description.'' We think ''causal identification'' is a better fit for our purposes.
in different versions of treatments. Exploration of pathways can lead to a more parsimonious explanation for findings across different exposure measures. Based on the active ingredients of exposure (and their resultant pathways), we can test not only the specific exposure–disease relationship but also a more integrative theory regarding the underlying ‘‘general causal mechanisms’’ (Judd & Kenny, 1981). This theory allows us to make statements about an observed association that are less bounded by the specific circumstances of a given study and to generalize based on deep similarities (Judd & Kenny, 1981; Shadish et al., 2002). This generalization has two practical benefits. First, knowledge of mechanisms enhances our ability to compare study results across exposures and, thus, integrate present knowledge. Second, such an analysis can help to identify previously unknown exposures or treatments, due to the fact that they capture the same active ingredient (or work through the same mechanism) as the exposure or treatment under study (Hafeman, 2008). Let us continue the gender example. First, we test the hypothesis that female gender was a cause of depression for some people in our sample. By this we mean that there are some people who got depressed as women who would not have been depressed had they not been women (i.e., if they were male—or some other gender). Of course, causal inference is tentative as always. Assume that at this first step we identified something that is not just an association and we took care to rule out all noncausal alternative explanations to the best of our ability. Once that step is accomplished, we may ask how female gender causes disease. Gender is a multifaceted construct with many different aspects— genetic, hormonal, psychological, and social. Once we know that gender has a causal effect, probing the construct helps us to identify what it is about female gender that causes depression. This may help to verify gender’s causality in depression and to identify other exposures that do the same thing as gender (i.e., other constructs that have the same active ingredient). For example, some have suggested that the powerlessness of women’s social roles is an active ingredient in female gender as a cause of depression. This would suggest that other social roles related to powerlessness, such as low socioeconomic position, might also be causally related to depression. Probing the construct of the outcome plays a similar role in causal explanation. It helps to identify the specific aspects of the outcome that are influenced by the exposure and to refine the definition of the outcome. External Validity The other aspect of causal explanation requires an examination of the conditions under which exposures act (Shadish et al., 2002). The context
dependency of causal effects is therefore made explicit. Causal inference is strengthened through the theoretical consideration and testing of effect variation. From the perspective of the ICA, consistency of effects across settings, people, and time periods is not the expectation. Rather, variation is expected and requires examination. When we identify causes in our studies, we make decisions about the presumptive world that we hold constant, considering everything as it was when the exposure arose. Thus, the social effects and norms that may have been consequences of the exposure are frozen in the context. However, when we intervene on our causes, we must consider the new context. This aspect of causal explanation, the specification of the conditions under which exposures will and will not cause disease, is considered the separate task of external validity in the Cook and Campbell scheme.
Step 3: Causal Manipulation While this separation of causal identification and causal explanation has the benefit of placing contingency and context dependency center stage, it does not resolve the discrepancy between the effects observed in studies and the effects of interventions. It does not provide the tools necessary to uncover the feedback loops and unintended consequences of our interventions. It does not fully address the violation of the SUTVA of no interference between units. Even causal explanation is conducted within established methods of isolation, reductionism, and linearity. Prediction of the effects of causal manipulation may require a different approach, one rooted in complexity theories and systems analysis, as the critics contend (e.g., McMichael, 1999; Levins, 1997; Krieger, 1994). To understand an intervention, the complexity of the system and feedbacks depends, of course, on the question at hand. The critical issue, as Levins (1996) notes, is the ability to decide when simplification is constructive and when it is an obfuscation. The implementation of systems approaches within epidemiology requires considerable methodological and conceptual development but may be a required third step to link etiologic research to policy. The integrated causal approach does not provide a solution to the discrepancy between the results of etiologic studies and the results of public-health interventions. It does, however, provide a way of thinking in which causal identification is explicitly conceptualized as a first step rather than a last step for public-health intervention. It is a road map to a proposed peace treaty in the epidemiology wars between the counterfactual and dynamic model camps. It suggests that the models are useful for different types of questions. Counterfactual approaches, under SUTVA, are essential for identifying causes
of the past. Dynamic models allowing for violations of SUTVA are required to understand potential outcomes of the future.
Summary The rigor of causal inference, brought to light in the development of the potential outcomes model, is essential as the basis for any intervention. Rigor is demanded because interventions developed around noncausal associations are doomed to failure. However, reifying the results of our studies by treating causes as potential interventions is also problematic. We suspect that public health will benefit from interventions identified using an approach that integrates the potential outcomes tradition of Rubin and Robins in statistics and epidemiology with the counterfactual tradition of Shadish, Cook, and Campbell in psychology. This integrated approach clarifies that the identification of causes facilitated by isolation is only a first step in policy formation. A second step, causal explanation, aids in the generalizability of our findings. Here, however, instead of replication of our study in different contexts, we generalize on the basis of the deep similarities uncovered through causal explanations. The steps of identification and explanation may require a third step of prediction to understand intervention effects. The causes that we identify, together with their mediators and effect modifiers, may be considered nodes in more complex analyses that allow for the consideration of feedback loops and the unintended consequences that are inherent in any policy application. The methods for this final step have not yet been fully developed. The conceptual separation of these three questions, grounded in a distinction between counterfactuals of the past and potential outcomes of the future, may prepare the ground for such innovations. For as Kierkegaard (1843; cited in Hannay, 1996) noted, ''life is to be understood backwards, but it is lived forwards.'' At a minimum, we hope that a more modest assessment of what current epidemiologic methods can provide will help stem cynicism that inevitably arises when we promise more than we can possibly deliver.
References Beaglehole, R., & Bonita, R. (1997). Public health at the crossroads: Achievements and prospects. New York: Cambridge University Press. Christakis, N. A., & Fowler, J. H. (2007). The spread of obesity in a large social network over 32 years. New England Journal of Medicine, 357, 370–379. Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design and analysis issues for field settings. Chicago: Rand McNally.
Glymour, M. M. (2007). Selected samples and nebulous measures: Some methodological difficulties in life-course epidemiology. International Journal of Epidemiology, 36, 566–568. Greenland, S. (2005). Epidemiologic measures and policy formulation: Lessons from potential outcomes. Emerging Themes in Epidemiology, 2, 1–7. Greenland, S., & Robins, J. M. (1986). Identifiability, exchangeability and epidemiological confounding. International Journal of Epidemiology, 15, 413–419. Hafeman, D. (2008). Opening the black box: A re-assessment of mediation from a counterfactual perspective. Unpublished doctoral dissertation, Columbia University, New York. Hannay, A. (1996). Soren Kierkegaard S. (1843): Papers and journals. London: Penguin Books. Hernan, M. A. (2004). A definition of causal effect for epidemiological research. Journal of Epidemiology and Community Health, 58, 265–271. Holland, P. W. (1986). Statistics and causal inference. Journal of the American Statistical Association, 81, 945–960. Institute of Medicine (1988). The Future of Public Health. Washington, DC: National Academy Press. Judd, C. M., & Kenny, D. A. (1981). Process analysis: Estimating mediation in treatment evaluations. Evaluation Review, 5, 602–619. Krieger, N. (1994). Epidemiology and the web of causation: Has anyone seen the spider? Social Science and Medicine, 39, 887–903. Last, J. M. (2001). A dictionary of epidemiology. New York: Oxford University Press. Levins, R. (1996). Ten propositions on science and anti-science. Social Text, 46/47, 101–111. Levins, R. (1997). When science fails us. Forests, Trees and People Newsletter, 32/33, 1–18. Lieberson, S. (1985). Making it count: The improvement of social research and theory. Berkeley: University of California Press. Little, R. J., & Rubin, D. B. (2000). Causal effects in clinical and epidemiological studies via potential outcomes: Concepts and analytical approaches. Annual Review of Public Health, 21, 121–145. Mackie, J. L. (1965). Causes and conditions. American Philosophical Quarterly, 4, 245–264. Mackie, J. L. (1974). Cement of the universe: A study of causation. Oxford: Oxford University Press. Maldonado, G. & Greenland S. (2002). Estimating causal effects. International Journal of Epidemiology, 31, 422–429. McMichael, A. J. (1999). Prisoners of the proximate: Loosening the constraints on epidemiology in an age of change. American Journal of Epidemiology, 149, 887–897. Merton, R. K. (1936). The unanticipated consequences of purposive social action. American Sociological Review, 1, 894–904. Merton, R. K. (1968). Social theory and social structure. New York: Free Press. Milbank Memorial Fund Commission (1976). Higher education for public health: A report of the Milbank Memorial Fund Commission. New York: Prodist. Morgan, S. L., & Winship, C. (2007). Counterfactuals and causal inference: Methods and principles for social research. Cambridge: Cambridge University Press. Rothman, K. J. (1976). Causes. American Journal of Epidemiology, 104, 587–592. Rothman, K.J. & Greenland, S. (1998). Modern epidemiology. Philadelphia: LippincottRaven Publishers.
Rubin, D. B. (1978). Bayesian inference for causal effects: The role of randomization. Annals of Statistics, 6, 34–58. Rubin, D. B. (1986). Statistics and causal inference comment: Which ifs have causal answers. Journal of the American Statistical Association, 81, 961–962. Rubin, D. B. (2005). Causal inference using potential outcomes: Design, modeling decisions. Journal of the American Statistical Association, 100, 322–331. Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston: Houghton Mifflin. Shy, C. M. (1997). The failure of academic epidemiology: Witness for the prosecution. American Journal of Epidemiology, 145, 479–484. VanderWeele, T. J., & Hernan, M. A. (2006). From counterfactuals to sufficient component causes and vice versa. European Journal of Epidemiology, 21, 855–858.
3 The Mathematics of Causal Relations judea pearl
Introduction Almost two decades have passed since Paul Holland published his highly cited review paper on the Neyman-Rubin approach to causal inference (Holland, 1986). Our understanding of causal inference has since increased severalfold, due primarily to advances in three areas:

1. Nonparametric structural equations
2. Graphical models
3. Symbiosis between counterfactual and graphical methods

These advances are central to the empirical sciences because the research questions that motivate most studies in the health, social, and behavioral sciences are not statistical but causal in nature. For example, what is the efficacy of a given drug in a given population? Can data prove an employer guilty of hiring discrimination? What fraction of past crimes could have been avoided by a given policy? What was the cause of death of a given individual in a specific incident? Remarkably, although much of the conceptual framework and many of the algorithmic tools needed for tackling such problems are now well established, they are hardly known to researchers in the field who could put them into practical use. Why? Solving causal problems mathematically requires certain extensions in the standard mathematical language of statistics, and these extensions are not generally emphasized in the mainstream literature and education. As a result, large segments of the statistical research community find it hard to appreciate and benefit from the many results that causal analysis has produced in the past two decades.
This chapter aims at making these advances more accessible to the general research community by, first, contrasting causal analysis with standard statistical analysis and, second, comparing and unifying various approaches to causal analysis.
From Associational to Causal Analysis: Distinctions and Barriers The Basic Distinction: Coping with Change The aim of standard statistical analysis, typified by regression, estimation, and hypothesis-testing techniques, is to assess parameters of a distribution from samples drawn of that distribution. With the help of such parameters, one can infer associations among variables, estimate the likelihood of past and future events, as well as update the likelihood of events in light of new evidence or new measurements. These tasks are managed well by standard statistical analysis so long as experimental conditions remain the same. Causal analysis goes one step further; its aim is to infer not only the likelihood of events under static conditions but also the dynamics of events under changing conditions, for example, changes induced by treatments or external interventions. This distinction implies that causal and associational concepts do not mix. There is nothing in the joint distribution of symptoms and diseases to tell us that curing the former would or would not cure the latter. More generally, there is nothing in a distribution function to tell us how that distribution would differ if external conditions were to change—say, from observational to experimental setup—because the laws of probability theory do not dictate how one property of a distribution ought to change when another property is modified. This information must be provided by causal assumptions which identify relationships that remain invariant when external conditions change. These considerations imply that the slogan ‘‘correlation does not imply causation’’ can be translated into a useful principle: One cannot substantiate causal claims from associations alone, even at the population level—behind every causal conclusion there must lie some causal assumption that is not testable in observational studies.
Formulating the Basic Distinction A useful demarcation line that makes the distinction between associational and causal concepts crisp and easy to apply can be formulated as follows. An associational concept is any relationship that can be defined in terms of a joint
distribution of observed variables, and a causal concept is any relationship that cannot be defined from the distribution alone. Examples of associational concepts are correlation, regression, dependence, conditional independence, likelihood, collapsibility, risk ratio, odds ratio, propensity score, ''Granger causality,'' marginalization, conditionalization, and ''controlling for.'' Examples of causal concepts are randomization, influence, effect, confounding, ''holding constant,'' disturbance, spurious correlation, instrumental variables, ignorability, exogeneity, exchangeability, intervention, explanation, and attribution. The former can, while the latter cannot, be defined in terms of distribution functions. This demarcation line is extremely useful in causal analysis for it helps investigators to trace the assumptions that are needed for substantiating various types of scientific claims. Every claim invoking causal concepts must rely on some premises that invoke such concepts; it cannot be inferred from, or even defined in terms of, statistical notions alone.
Ramifications of the Basic Distinction This principle has far-reaching consequences that are not generally recognized in the standard statistical literature. Many researchers, for example, are still convinced that confounding is solidly founded in standard, frequentist statistics and that it can be given an associational definition, saying (roughly) ''U is a potential confounder for examining the effect of treatment X on outcome Y when both U and X and both U and Y are not independent'' (Pearl, 2009b, p. 338). That this definition and all of its many variants must fail is obvious from the demarcation line above; ''independence'' is an associational concept, while confounding is a tool used in establishing causal relations. The two do not mix; hence, the definition must be false. Therefore, to the bitter disappointment of generations of epidemiology researchers, confounding bias cannot be detected or corrected by statistical methods alone; one must make some judgmental assumptions regarding causal relationships in the problem before an adjustment (e.g., by stratification) can safely correct for confounding bias. Another ramification of the sharp distinction between associational and causal concepts is that any mathematical approach to causal analysis must acquire new notation for expressing causal relations—probability calculus is insufficient. To illustrate, the syntax of probability calculus does not permit us to express the simple fact that ''symptoms do not cause diseases,'' let alone to draw mathematical conclusions from such facts. All we can say is that two events are dependent—meaning that if we find one, we can expect to encounter the other; but we cannot distinguish statistical dependence, quantified by the conditional probability p(disease | symptom), from causal
dependence, for which we have no expression in standard probability calculus. Scientists seeking to express causal relationships must therefore supplement the language of probability with a vocabulary for causality, one in which the symbolic representation for the relation ‘‘symptoms cause disease’’ is distinct from the symbolic representation of ‘‘symptoms are associated with disease.’’
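To make the distinction concrete, here is a minimal simulation (an illustration constructed for this discussion, with arbitrary probabilities, not an example from the chapter) of a toy world in which disease causes symptoms but not the reverse. The associational quantity p(disease | symptom) is large because symptoms are informative, yet the interventional quantity p(disease | do(symptom)) equals the baseline disease rate, because no arrow points from symptom to disease:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
disease = rng.random(n) < 0.10                     # exogenous disease, rate 0.10
# symptom is caused by disease: P(symptom|disease)=0.9, P(symptom|no disease)=0.05
symptom = np.where(disease, rng.random(n) < 0.90, rng.random(n) < 0.05)

p_assoc = disease[symptom].mean()   # p(disease | symptom): observing the symptom
p_do = disease.mean()               # p(disease | do(symptom=1)): forcing the symptom
                                    # leaves the disease distribution unchanged
print(f"p(disease | symptom)       = {p_assoc:.2f}")
print(f"p(disease | do(symptom=1)) = {p_do:.2f}")
```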
Two Mental Barriers: Untested Assumptions and New Notation The preceding requirements—(1) to commence causal analysis with untested,1 theoretically or judgmentally based assumptions and (2) to extend the syntax of probability calculus—constitute the two main obstacles to the acceptance of causal analysis among statisticians and among professionals with traditional training in statistics. Associational assumptions, even untested, are testable in principle, given a sufficiently large sample and sufficiently fine measurements. Causal assumptions, in contrast, cannot be verified even in principle, unless one resorts to experimental control. This difference stands out in Bayesian analysis. Though the priors that Bayesians commonly assign to statistical parameters are untested quantities, the sensitivity to these priors tends to diminish with increasing sample size. In contrast, sensitivity to prior causal assumptions—say, that treatment does not change gender—remains substantial regardless of sample size. This makes it doubly important that the notation we use for expressing causal assumptions be meaningful and unambiguous so that one can clearly judge the plausibility or inevitability of the assumptions articulated. Statisticians can no longer ignore the mental representation in which scientists store experiential knowledge since it is this representation and the language used to access this representation that determine the reliability of the judgments upon which the analysis so crucially depends. How does one recognize causal expressions in the statistical literature? Those versed in the potential-outcome notation (Neyman, 1923; Rubin, 1974; Holland, 1986) can recognize such expressions through the subscripts that are attached to counterfactual events and variables, for example, Yx(u) or Zxy—some authors use parenthetical expressions, such as Y(x, u) or Z(x, y). The expression Yx(u), for example, stands for the value that outcome Y would take in individual u had treatment X been at level x. If u is chosen at random, Yx is a random variable and one can talk about the probability that Yx
1. By ‘‘untested’’ I mean untested using frequency data in nonexperimental studies.
would attain a value y in the population, written p(Yx = y). Alternatively, Pearl (1995) used expressions of the form p[Y = y | set(X = x)] or p[Y = y | do(X = x)] to denote the probability (or frequency) that event (Y = y) would occur if treatment condition (X = x) were enforced uniformly over the population.2 Still a third notation that distinguishes causal expressions is provided by graphical models, where the arrows convey causal directionality.3 However, few have taken seriously the textbook requirement that any introduction of new notation must entail a systematic definition of the syntax and semantics that govern the notation. Moreover, in the bulk of the statistical literature before 2000, causal claims rarely appear in the mathematics. They surface only in the verbal interpretation that investigators occasionally attach to certain associations and in the verbal description with which investigators justify assumptions. For example, the assumption that a covariate is not affected by a treatment, a necessary assumption for the control of confounding (Cox, 1958), is expressed in plain English, not in a mathematical expression. Remarkably, though the necessity of explicit causal notation is now recognized by most leaders in the field, the use of such notation has remained enigmatic to most rank-and-file researchers and its potentials still lay grossly underutilized in the statistics-based sciences. The reason for this, I am firmly convinced, can be traced to the way in which causal analysis has been presented to the research community, relying primarily on outdated paradigms of controlled randomized experiments and black-box ‘‘missing-data’’ models (Rubin, 1974; Holland, 1986). The next section provides a conceptualization that overcomes these mental barriers; it offers both a friendly mathematical machinery for cause–effect analysis and a formal foundation for counterfactual analysis.
The Language of Diagrams and Structural Equations Semantics: Causal Effects and Counterfactuals How can one express mathematically the common understanding that symptoms do not cause diseases? The earliest attempt to formulate such a relationship mathematically was made in the 1920s by the geneticist Sewall Wright 2. Clearly, P[Y = y|do(X = x)] is equivalent to P(Yx = y). This is what we normally assess in a controlled experiment, with X randomized, in which the distribution of Y is estimated for each level x of X. 3. These notational clues should be useful for detecting inadequate definitions of causal concepts; any definition of confounding, randomization, or instrumental variables that is cast in standard probability expressions, void of graphs, counterfactual subscripts, or do(*) operators, can safely be discarded as inadequate.
(1921), who used a combination of equations and graphs. For example, if X stands for a disease variable and Y stands for a certain symptom of the disease, Wright would write a linear equation

y = βx + u    (1)

where x stands for the level (or severity) of the disease, y stands for the level (or severity) of the symptom, and u stands for all factors, other than the disease in question, that could possibly affect Y.4 In interpreting this equation, one should think of a physical process whereby nature examines the values of X and U and, accordingly, assigns variable Y the value y = βx + u. To express the directionality inherent in this process, Wright augmented the equation with a diagram, later called a ''path diagram,'' in which arrows are drawn from (perceived) causes to their (perceived) effects and, more importantly, the absence of an arrow makes the empirical claim that the value nature assigns to one variable is not determined by the value taken by another.5 The variables V and U are called ''exogenous''; they represent observed or unobserved background factors that the modeler decides to keep unexplained, that is, factors that influence, but are not influenced by, the other variables (called ''endogenous'') in the model. If correlation is judged possible between two exogenous variables, U and V, it is customary to connect them by a dashed double arrow, as shown in Figure 3.1b. To summarize, path diagrams encode causal assumptions via missing arrows, representing claims of zero influence, and missing double arrows (e.g., between V and U), representing the (causal) assumption Cov(U, V) = 0.
Figure 3.1 A simple structural equation model, and its associated diagrams. Unobserved exogenous variables are connected by dashed arrows.
4. We use capital letters (e.g., X,Y,U) for variable names and lower case letters (e.g., x,y,u) for values taken by these variables. 5. A weaker class of causal diagrams, known as ‘‘causal Bayesian networks,’’ encodes interventional, rather than functional dependencies; it can be used to predict outcomes of randomized experiments but not probabilities of counterfactuals (for formal definition, see Pearl, 2000a, pp. 22–24).
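As a purely illustrative rendering of Wright's linear model (the value of β, the sample size, and the code itself are assumptions made for this sketch, not material from the chapter), the snippet below simulates the two diagrams of Figure 3.1. When Cov(U, V) = 0, the regression slope of Y on X recovers β; when U and V are correlated, as the dashed double arrow of Figure 3.1b allows, the same slope is biased, which is exactly the information that the presence or absence of the double arrow encodes:

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta = 100_000, 2.0                      # illustrative values

# Figure 3.1(a): independent exogenous variables V and U
v, u = rng.normal(size=n), rng.normal(size=n)
x = v                                       # structural equation x = v
y = beta * x + u                            # structural equation y = beta*x + u
slope_a = np.polyfit(x, y, 1)[0]            # OLS slope of Y on X

# Figure 3.1(b): correlated exogenous variables (dashed double arrow)
cov = [[1.0, 0.6], [0.6, 1.0]]
v2, u2 = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
x2, y2 = v2, beta * v2 + u2
slope_b = np.polyfit(x2, y2, 1)[0]

print(f"slope with Cov(U,V) = 0  : {slope_a:.2f}  (recovers beta = {beta})")
print(f"slope with Cov(U,V) = 0.6: {slope_b:.2f}  (biased away from beta)")
```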
Figure 3.2 (a) The diagram associated with the structural model of Eq. (2). (b) The diagram associated with the modified model of Eq. (3), representing the intervention do(X = x0).
The generalization to a nonlinear system of equations is straightforward. For example, the nonparametric interpretation of the diagram of Figure 3.2a corresponds to a set of three functions, each corresponding to one of the observed variables:

z = fZ(w)
x = fX(z, v)    (2)
y = fY(x, u)

where W, V, and U are here assumed to be jointly independent but, otherwise, arbitrarily distributed. Remarkably, unknown to most economists and philosophers, structural equation models provide a formal interpretation and symbolic machinery for analyzing counterfactual relationships of the type ''Y would be y had X been x in situation U = u,'' denoted Yx(u) = y. Here, U stands for the vector of all exogenous variables and represents all relevant features of an experimental unit (i.e., a patient or a subject). The key idea is to interpret the phrase ''had X been x0'' as an instruction to modify the original model M and replace the equation for X by a constant, x0, yielding a modified model, Mx0:

z = fZ(w)
x = x0    (3)
y = fY(x, u)

the graphical description of which is shown in Figure 3.2b. This replacement permits the constant x0 to differ from the actual value of X—namely, fX(z, v)—without rendering the system of equations inconsistent, thus yielding a formal definition of counterfactuals in multistage models, where the dependent variable in one equation may be an independent variable in another (Balke & Pearl, 1994a, 1994b; Pearl, 2000b). The general definition reads as follows:

Yx(u) ≜ YMx(u).    (4)
In words, the counterfactual Yx(u) in model M is defined as the solution for Y in the modified submodel Mx, in which the equation for X is replaced by X = x. For example, to compute the average causal effect of X on Y, that is, E(Yx0), we solve equation 3 for Y in terms of the exogenous variables, yielding Yx0 = fY(x0, u), and average over U and V. To answer more sophisticated questions, such as whether Y would be y1 if X were x1 given that in fact Y is y0 and X is x0, we need to compute the conditional probability, P(Yx1 = y1 | Y = y0, X = x0), which is well defined once we know the forms of the structural equations and the distribution of the exogenous variables in the model. This formalization of counterfactuals, cast as solutions to modified systems of equations, provides the conceptual and formal link between structural equation models used in economics and social science, the potential-outcome framework, to be discussed later under The Language of Potential Outcomes; Lewis' (1973) ''closest-world'' counterfactuals; Woodward's (2003) ''interventionalism'' approach; Mackie's (1965) ''insufficient but necessary components of unnecessary but sufficient'' (INUS) condition; and Rothman's (1976) ''sufficient component'' framework (see VanderWeele and Robins, 2007). The next section discusses two long-standing problems that have been completely resolved in purely graphical terms, without delving into algebraic techniques.
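The following sketch mimics this construction for a model with the structure of equation 2; the functional forms, distributions, and parameter choices are assumptions made for illustration and are not taken from the chapter. The counterfactual Yx0(u) is obtained by solving the submodel in which the equation for X is replaced by the constant x0 (equations 3 and 4), and the average causal effect is estimated by averaging over sampled exogenous values:

```python
import random

# Hypothetical structural functions (illustrative forms, not from the chapter)
def f_Z(w):    return w
def f_X(z, v): return 1 if z + v > 0 else 0
def f_Y(x, u): return x + u

M = {"Z": f_Z, "X": f_X, "Y": f_Y}

def solve(model, w, v, u):
    """Solve the recursive structural equations for one unit (w, v, u); return Y."""
    z = model["Z"](w)
    x = model["X"](z, v)
    return model["Y"](x, u)

def counterfactual_Y(x0, w, v, u):
    """Y_x0(u): solve the submodel M_x0 in which the equation for X is the constant x0."""
    M_x0 = dict(M, X=lambda z, v: x0)
    return solve(M_x0, w, v, u)

random.seed(0)
units = [(random.gauss(0, 1), random.gauss(0, 1), random.gauss(0, 1))
         for _ in range(50_000)]
ace = sum(counterfactual_Y(1, *t) - counterfactual_Y(0, *t) for t in units) / len(units)
print(f"E(Y_1) - E(Y_0) = {ace:.3f}  (equals 1 under these illustrative functions)")
```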
Confounding and Causal Effect Estimation The central target of most studies in the social and health sciences is the elucidation of cause–effect relationships among variables of interests, for example, treatments, policies, preconditions, and outcomes. While good statisticians have always known that the elucidation of causal relationships from observational studies must rest on assumptions about how the data were generated, the relative roles of assumptions and data and the ways of using those assumptions to eliminate confounding bias have been a subject of much controversy. The preceding structural framework puts these controversies to rest. Covariate Selection: The Back-Door Criterion Consider an observational study where we wish to find the effect of X on Y, for example, treatment on response, and assume that the factors deemed relevant to the problem are structured as in Figure 3.3; some are affecting the response, some are affecting the treatment, and some are affecting both treatment and response. Some of these factors may be unmeasurable, such as genetic trait or lifestyle, while others are measurable, such as gender, age,
Figure 3.3 Graphical model illustrating the back-door criterion. Error terms are not shown explicitly.
and salary level. Our problem is to select a subset of these factors for measurement and adjustment so that if we compare treated vs. untreated subjects having the same values of the selected factors, we get the correct treatment effect in that subpopulation of subjects. Such a set of factors is called a ''sufficient set,'' ''admissible'' or a set ''appropriate for adjustment.'' The problem of defining a sufficient set, let alone finding one, has baffled epidemiologists and social scientists for decades (for review, see Greenland, Pearl, & Robins, 1999; Pearl, 2000a, 2009a). The following criterion, named the ''back-door'' criterion (Pearl, 1993a), provides a graphical method of selecting such a set of factors for adjustment. It states that a set, S, is appropriate for adjustment if two conditions hold:

1. No element of S is a descendant of X.
2. The elements of S ''block'' all back-door paths from X to Y, that is, all paths that end with an arrow pointing to X.6

Based on this criterion we see, for example, that each of the sets {Z1, Z2, Z3}, {Z1, Z3}, and {W2, Z3} is sufficient for adjustment because each blocks all back-door paths between X and Y. The set {Z3}, however, is not sufficient for adjustment because it does not block the path X ← W1 ← Z1 → Z3 ← Z2 → W2 → Y. The implication of finding a sufficient set, S, is that stratifying on S is guaranteed to remove all confounding bias relative to the causal effect of X on Y. In other words, it renders the causal effect of X on Y identifiable, via

P(Y = y | do(X = x)) = Σs P(Y = y | X = x, S = s) P(S = s)    (5)

6. In this criterion, a set, S, of nodes is said to block a path, P, if either (1) P contains at least one arrow-emitting node that is in S or (2) P contains at least one collision node (e.g., → Z ←) that is outside S and has no descendant in S (see Pearl, 2009b, pp. 16–17, 335–337).
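As a rough illustration of how equation 5 is evaluated from data (the records, variable names, and values below are invented; this is not code from the chapter), the sketch stratifies on a sufficient set S and averages the stratum-specific frequencies. For the model of Figure 3.3 one could pass, for example, S = {Z1, Z3} or S = {W2, Z3}:

```python
from collections import Counter, defaultdict

def adjust(records, s_vars, x_val, y_val):
    """Adjustment formula of Eq. (5): sum_s P(Y=y | X=x, S=s) P(S=s),
    where S (s_vars) is a back-door sufficient set."""
    n = len(records)
    p_s = Counter(tuple(r[v] for v in s_vars) for r in records)
    n_x, n_xy = defaultdict(int), defaultdict(int)
    for r in records:
        s = tuple(r[v] for v in s_vars)
        if r["X"] == x_val:
            n_x[s] += 1
            if r["Y"] == y_val:
                n_xy[s] += 1
    total = 0.0
    for s, count_s in p_s.items():
        if n_x[s] == 0:      # no units with X = x in this stratum (positivity
            continue         # problem); silently skipped in this toy sketch
        total += (n_xy[s] / n_x[s]) * (count_s / n)
    return total

# Hypothetical observational records (binary variables, invented values)
records = [
    {"Z1": 0, "Z3": 0, "X": 0, "Y": 0}, {"Z1": 0, "Z3": 0, "X": 1, "Y": 1},
    {"Z1": 1, "Z3": 1, "X": 1, "Y": 1}, {"Z1": 1, "Z3": 1, "X": 0, "Y": 0},
    {"Z1": 0, "Z3": 1, "X": 1, "Y": 0}, {"Z1": 0, "Z3": 1, "X": 0, "Y": 1},
]
print(adjust(records, s_vars=["Z1", "Z3"], x_val=1, y_val=1))
```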
Since all factors on the right-hand side of the equation are estimable (e.g., by regression) from the preinterventional data, the causal effect can likewise be estimated from such data without bias. The back-door criterion allows us to write equation 5 directly, after selecting a sufficient set, S, from the diagram, without resorting to any algebraic manipulation. The selection criterion can be applied systematically to diagrams of any size and shape, thus freeing analysts from judging whether ''X is conditionally ignorable given S,'' a formidable mental task required in the potential-response framework (Rosenbaum & Rubin, 1983). The criterion also enables the analyst to search for an optimal set of covariates—namely, a set, S, that minimizes measurement cost or sampling variability (Tian, Paz, & Pearl, 1998).

General Control of Confounding Adjusting for covariates is only one of many methods that permit us to estimate causal effects in nonexperimental studies. A much more general identification criterion is provided by the following theorem:

Theorem 1 (Tian & Pearl, 2002) A sufficient condition for identifying the causal effect P[y|do(x)] is that every path between X and any of its children traces at least one arrow emanating from a measured variable.7

For example, if W3 is the only observed covariate in the model of Figure 3.3, then there exists no sufficient set for adjustment (because no set of observed covariates can block the paths from X to Y through Z3), yet P[y|do(x)] can nevertheless be estimated since every path from X to W3 (the only child of X) traces either the arrow X → W3, or the arrow W3 → Y, each emanating from a measured variable. In this example, the variable W3 acts as a ''mediating instrumental variable'' (Pearl, 1993b; Chalak & White, 2006) and yields the following estimand:

P(Y = y | do(X = x)) = Σw P(W3 = w | do(X = x)) P(Y = y | do(W3 = w))
                     = Σw P(w | x) Σx' P(y | w, x') P(x')    (6)
More recent results extend this theorem by (1) presenting a necessary and sufficient condition for identification (Shpitser & Pearl, 2006) and (2) extending the condition from causal effects to any counterfactual expression (Shpitser & Pearl, 2007). The corresponding unbiased estimands for these causal quantities are readable directly from the diagram.

7. Before applying this criterion, one may delete from the causal graph all nodes that are not ancestors of Y.
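Equation 6 can likewise be evaluated directly from observational frequencies. The sketch below (a toy example with invented records, not code from the chapter) plugs empirical probabilities into the right-hand side of equation 6 for binary X, W3, and Y:

```python
from itertools import product

def frontdoor(records, x_val, y_val):
    """Evaluate Eq. (6) from records with keys X, W3, Y:
    P(Y=y | do(X=x)) = sum_w P(w|x) * sum_x' P(y | w, x') P(x')."""
    n = len(records)

    def p(**kw):          # empirical probability of a joint event
        return sum(all(r[k] == v for k, v in kw.items()) for r in records) / n

    def pcond(target, given):   # empirical conditional probability
        return p(**target, **given) / p(**given)

    w_vals = {r["W3"] for r in records}
    x_vals = {r["X"] for r in records}
    total = 0.0
    for w in w_vals:
        inner = sum(pcond({"Y": y_val}, {"W3": w, "X": xp}) * p(X=xp)
                    for xp in x_vals)
        total += pcond({"W3": w}, {"X": x_val}) * inner
    return total

# Hypothetical records; W3 mediates between X and Y as in Figure 3.3
records = [{"X": x, "W3": w, "Y": y}
           for x, w, y in product((0, 1), repeat=3)]   # one record per combination
print(frontdoor(records, x_val=1, y_val=1))
```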
The Language of Potential Outcomes The elementary object of analysis in the potential-outcome framework is the unit-based response variable, denoted Yx(u)—read ''the value that Y would obtain in unit u had treatment X been x'' (Neyman, 1923; Rubin, 1974). These subscripted variables are treated as undefined quantities, useful for expressing the causal quantities we seek, but are not derived from other quantities in the model. In contrast, in the previous section counterfactual entities were derived from a set of meaningful physical processes, each represented by an equation, and unit was interpreted as a vector u of background factors that characterize an experimental unit. Each structural equation model thus provides a compact representation for a huge number of counterfactual claims, guaranteed to be consistent. In view of these features, the structural definition of Yx(u) (equation 4) can be regarded as the formal basis for the potential-outcome approach. It interprets the opaque English phrase ''the value that Y would obtain in unit u had X been x'' in terms of a scientifically-based mathematical model that allows such values to be computed unambiguously. Consequently, important concepts in potential-response analysis that researchers find ill-defined or esoteric often obtain meaningful and natural interpretation in the structural semantics. Examples are ''unit'' (''exogenous variables'' in structural semantics), ''principal stratification'' (''equivalence classes'' in structural semantics) (Balke & Pearl, 1994b; Pearl, 2000b), ''conditional ignorability'' (''back-door condition'' in Pearl, 1993a), and ''assignment mechanism'' [P(x|direct causes of X) in structural semantics]. The next two subsections examine how assumptions and inferences are handled in the potential-outcome approach vis-à-vis the graphical–structural approach.
Formulating Assumptions The distinct characteristic of the potential-outcome approach is that, although its primitive objects are undefined, hypothetical quantities, the analysis itself is conducted almost entirely within the axiomatic framework of probability theory. This is accomplished by postulating a ‘‘super’’ probability function on both hypothetical and real events, treating the former as ‘‘missing data.’’ In other words, if U is treated as a random variable, then the value of the counterfactual Yx(u) becomes a random variable as well, denoted as Yx.
The potential-outcome analysis proceeds by treating the observed distribution P(x1 . . . xn) as the marginal distribution of an augmented probability function (P*) defined over both observed and counterfactual variables. Queries about causal effects are phrased as queries about the probability distribution of the counterfactual variable of interest, written P*(Yx = y). The new hypothetical entities Yx are treated as ordinary random variables; for example, they are assumed to obey the axioms of probability calculus, the laws of conditioning, and the axioms of conditional independence. Moreover, these hypothetical entities are not entirely whimsy but are assumed to be connected to observed variables via consistency constraints (Robins, 1986) such as

X = x ⇒ Yx = Y,    (7)

which states that for every u, if the actual value of X turns out to be x, then the value that Y would take on if X were x is equal to the actual value of Y. For example, a person who chose treatment x and recovered would also have recovered if given treatment x by design. The main conceptual difference between the two approaches is that, whereas the structural approach views the subscript x as an operation that changes the distribution but keeps the variables the same, the potential-outcome approach views Yx to be a different variable, unobserved and loosely connected to Y through relations such as equation 7. Pearl (2000a, chap. 7) shows, using the structural interpretation of Yx(u), that it is indeed legitimate to treat counterfactuals as jointly distributed random variables in all respects, that consistency constraints like equation 7 are automatically satisfied in the structural interpretation, and, moreover, that investigators need not be concerned about any additional constraints except the following two:8

Yyz = y for all y and z    (8)
Xz = x ⇒ Yxz = Yz for all x and z    (9)

Equation 8 ensures that the intervention do(Y = y) results in the condition Y = y, regardless of concurrent interventions, say do(Z = z), that are applied to variables other than Y. Equation 9 generalizes equation 7 to cases where Z is held fixed at z. To communicate substantive causal knowledge, the potential-outcome analyst must express causal assumptions as constraints on P*, usually in the

8. This completeness result is due to Halpern (1998), who noted that an additional axiom {Yxz = y} & {Zxy = z} ⇒ Yx = y must hold in nonrecursive models. This fundamental axiom may come to haunt economists and social scientists who blindly apply Neyman-Rubin analysis in their fields.
form of conditional independence assertions involving counterfactual variables. In Figure 3.2(a), for instance, to communicate the understanding that a treatment assignment (Z) is randomized (hence independent of both U and V), the potential-outcome analyst needs to use the independence constraint Z ⊥⊥ {Xz, Yx}. To further formulate the understanding that Z does not affect Y directly, except through X, the analyst would write a so-called exclusion restriction: Yxz = Yx. Clearly, no mortal can judge the validity of such assumptions in any real-life problem without resorting to graphs.9
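A small numerical check (constructed for this discussion, with made-up structural functions that are not part of the chapter) of the earlier claim that consistency constraints such as equation 7 hold automatically once counterfactuals are generated by a structural model:

```python
import random

# A tiny structural model with illustrative functions
f_X = lambda v: 1 if v > 0 else 0        # X is determined by exogenous V
f_Y = lambda x, u: x + u                 # Y is determined by X and exogenous U

def Y_factual(v, u):
    return f_Y(f_X(v), u)                # Y as generated by the intact model

def Y_counterfactual(x0, u):
    return f_Y(x0, u)                    # Y_x0: submodel with X replaced by x0

random.seed(1)
units = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(10_000)]
# Eq. (7): whenever the actual X equals x, the counterfactual Y_x equals the actual Y
holds = all(Y_counterfactual(f_X(v), u) == Y_factual(v, u) for v, u in units)
print("consistency constraint (7) holds in every sampled unit:", holds)
```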
Performing Inferences A collection of assumptions of this type might sometimes be sufficient to permit a unique solution to the query of interest; in other cases, only bounds on the solution can be obtained. For example, if one can plausibly assume that a set, Z, of covariates satisfies the conditional independence

Yx ⊥⊥ X | Z    (10)

(an assumption that was termed ''conditional ignorability'' by Rosenbaum & Rubin, 1983), then the causal effect, P*(Yx = y), can readily be evaluated to yield

P*(Yx = y) = Σz P*(Yx = y | z) P(z)
           = Σz P*(Yx = y | x, z) P(z)    (using (10))
           = Σz P*(Y = y | x, z) P(z)    (using (7))
           = Σz P(y | x, z) P(z),    (11)
which is the usual covariate-adjustment formula, as in equation 5. Note that almost all mathematical operations in this derivation are conducted within the safe confines of probability calculus. Save for an occasional application of rule 9 or 7, the analyst may forget that Yx stands for a counterfactual quantity—it is treated as any other random variable, and the entire derivation follows the course of routine probability exercises. However, this mathematical illusion comes at the expense of conceptual clarity, especially at a stage where causal assumptions need to be formulated. The reader may appreciate this aspect by attempting to judge whether the assumption of conditional ignorability (equation 10), the key to the derivation

9. Even with the use of graphs the task is not easy; for example, the reader should try to verify whether {Z ⊥⊥ Xz | Y} holds in the simple model of Figure 3.2(a). The answer is given in Pearl (2000a, p. 214).
of equation 11, holds in any familiar situation—say, in the experimental setup of Figure 3.2(a). This assumption reads ‘‘the value that Y would obtain had X been x is independent of X, given Z’’ (see footnote 4). Such assumptions of conditional independence among counterfactual variables are not straightforward to comprehend or ascertain for they are cast in a language far removed from ordinary understanding of cause and effect. When counterfactual variables are not viewed as by-products of a deeper, processbased model, it is also hard to ascertain whether all relevant counterfactual independence judgments have been articulated, whether the judgments articulated are redundant, or whether those judgments are self-consistent. The need to express, defend, and manage formidable counterfactual relationships of this type explains the slow acceptance of causal analysis among epidemiologists and statisticians and why economists and social scientists continue to use structural equation models instead of the potential-outcome alternatives advocated in Holland (1988); Angrist, Imbens, and Rubin (1996); and Sobel (1998). On the other hand, the algebraic machinery offered by the potential-outcome notation, once a problem is properly formalized, can be powerful in refining assumptions (Angrist et al., 1996), deriving consistent estimands (Robins, 1986), analyzing mediation (Pearl, 2001), bounding probabilities of causation (Tian & Pearl, 2000), and combining data from experimental and nonexperimental studies (Pearl, 2000a, pp. 302–303).
Combining Graphs and Counterfactuals—The Mediation Formula Pearl (2000a, p. 232) presents a way of combining the best features of the two approaches. It is based on encoding causal assumptions in the language of diagrams, translating these assumptions into potential-outcome notation, performing the mathematics in the algebraic language of counterfactuals, and, finally, interpreting the result in plain causal language. Often, the answer desired can be obtained directly from the diagram, and no translation is necessary (as demonstrated earlier, Confounding and Causal Effect Estimation). One area that has benefited substantially from this symbiosis is the analysis of direct and indirect effects, also known as ‘‘mediation analysis’’ (Shrout & Bolger, 2002), which has resisted generalizations to discrete variables and nonlinear interactions for several decades (Robins & Greenland, 1992; Mackinnon, Lockwood, Brown, Wang, & Hoffman, 2007). The obstacles were definitional; the direct effect is sensitive to the level at which we condition the intermediate variable, while the indirect effect cannot be defined by conditioning on a third variable or taking the difference between the total and direct effects.
The structural definition of counterfactuals (equation 4) and the graphical analysis (see Confounding and Causal Effect Estimation) combined to produce formal definitions of, and graphical conditions under which, direct and indirect effects can be estimated from data (Pearl, 2001; Petersen, Sinisi, & van der Laan, 2006). In particular, under conditions of no unmeasured (or uncontrolled for) confounders, this symbiosis has produced the following Mediation Formulas for the expected direct (DE) and indirect (IE) effects of the transition from X = x to X = x' (with outcome Y and mediating set Z):

DE = Σz [E(Y | x', z) - E(Y | x, z)] P(z | x)    (12)

IE = Σz E(Y | x, z) [P(z | x') - P(z | x)]    (13)
These general formulas are applicable to any type of variables,10 any nonlinear interactions, and any distribution and, moreover, are readily estimable by regression. IE (respectively, DE) represents the average increase in the outcome Y that the transition from X = x to X = x’ is expected to produce absent any direct (respectively, indirect) effect of X on Y. When the outcome Y is binary (e.g., recovery or hiring), the ratio (1 – IE/TE) represents the fraction of responding individuals who owe their response to direct paths, while (1 – DE/TE) represents the fraction who owe their response to Z-mediated paths. TE stands for the total effect, TE = E(Y|x’) – E(Y|x), which, in nonlinear systems may or may not be the sum of the direct and indirect effects. Additional results spawned by the structural–graphical–counterfactual symbiosis include effect estimation under noncompliance (Balke & Pearl, 1997; Chickering & Pearl, 1997), mediating instrumental variables (Pearl, 1993b; Brito & Pearl, 2006), robustness analysis (Pearl, 2004), selecting predictors for propensity scores (Pearl, 2010a, 2010c), and estimating the effect of treatment on the treated (Shpitser & Pearl, 2009). Detailed descriptions of these results are given in the corresponding articles (available at http:// bayes.cs.ucla.edu/csl_papers.html).
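As a rough sketch of how equations 12 and 13 can be computed from data (the records below are invented, the variable names are assumptions of this sketch, and the required no-confounding conditions are simply taken for granted), the following code plugs empirical estimates of E(Y | x, z) and P(z | x) into the Mediation Formula for a discrete mediator Z:

```python
from collections import defaultdict

def mediation_effects(records, x, x_prime):
    """Natural direct/indirect effects via the Mediation Formula, Eqs. (12)-(13),
    using empirical E(Y | x, z) and P(z | x); assumes no unmeasured confounding."""
    by_xz, by_x = defaultdict(list), defaultdict(list)
    for r in records:
        by_xz[(r["X"], r["Z"])].append(r["Y"])
        by_x[r["X"]].append(r["Z"])

    def E_y(xv, z):       # empirical E(Y | X=xv, Z=z)
        ys = by_xz[(xv, z)]
        return sum(ys) / len(ys)

    def P_z(z, xv):       # empirical P(Z=z | X=xv)
        zs = by_x[xv]
        return zs.count(z) / len(zs)

    z_vals = {r["Z"] for r in records}
    DE = sum((E_y(x_prime, z) - E_y(x, z)) * P_z(z, x) for z in z_vals)
    IE = sum(E_y(x, z) * (P_z(z, x_prime) - P_z(z, x)) for z in z_vals)
    TE = (sum(r["Y"] for r in records if r["X"] == x_prime) / len(by_x[x_prime])
          - sum(r["Y"] for r in records if r["X"] == x) / len(by_x[x]))
    return DE, IE, TE

records = [  # hypothetical data: X exposure, Z mediator, Y outcome
    {"X": 0, "Z": 0, "Y": 0}, {"X": 0, "Z": 1, "Y": 1}, {"X": 0, "Z": 0, "Y": 1},
    {"X": 1, "Z": 1, "Y": 1}, {"X": 1, "Z": 0, "Y": 0}, {"X": 1, "Z": 1, "Y": 1},
]
print(mediation_effects(records, x=0, x_prime=1))
```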
Conclusions Statistics is strong in devising ways of describing data and inferring distributional parameters from a sample. Causal inference requires two additional
10. Integrals should replace summations when Z is continuous. Generalizations to cases involving observed or unobserved confounders are given in Pearl (2001) and exemplified in Pearl (2010a, 2010b). Conceptually, IE measures the average change in Y under the operation of setting X to x and, simultaneously, setting Z to whatever value it would have obtained under X = x’ (Robins & Greenland, 1992).
ingredients: a science-friendly language for articulating causal knowledge and a mathematical machinery for processing that knowledge, combining it with data, and drawing new causal conclusions about a phenomenon. This chapter introduces nonparametric structural equation models as a formal and meaningful language for formulating causal knowledge and for explicating causal concepts used in scientific discourse. These include randomization, intervention, direct and indirect effects, confounding, counterfactuals, and attribution. The algebraic component of the structural language coincides with the potential-outcome framework, and its graphical component embraces Wright’s method of path diagrams (in its nonparametric version). When unified and synthesized, the two components offer investigators a powerful methodology for empirical research (e.g., Morgan & Winship, 2007; Greenland et al., 1999; Glymour & Greenland, 2008; Chalak & White, 2006; Pearl, 2009a). Perhaps the most important message of the discussion and methods presented in this chapter would be a widespread awareness that (1) all studies concerning causal relations must begin with causal assumptions of some sort and (2) a friendly and formal language is currently available for articulating such assumptions. This means that scientific articles concerning questions of causation must contain a section in which causal assumptions are articulated using either graphs or subscripted formulas. Authors who wish their assumptions to be understood, scrutinized, and discussed by readers and colleagues would do well to use graphs. Authors who refrain from using graphs would be risking a suspicion of attempting to avoid transparency of their working assumptions. Another important implication is that every causal inquiry can be mathematized. In other words, mechanical procedures can now be invoked to determine what assumptions investigators must be willing to make in order for desired quantities to be estimable consistently from the data. This is not to say that the needed assumptions would be reasonable or that the resulting estimation method would be easy. It means that the needed causal assumptions can be made transparent and brought up for discussion and refinement and that, once consistency is assured, causal quantities can be estimated from data through ordinary statistical methods, free of the mystical aura that has shrouded causal analysis in the past.
References Angrist, J., Imbens, G., & Rubin, D. (1996). Identification of causal effects using instrumental variables (with comments). Journal of the American Statistical Association, 91(434), 444–472. Balke, A., & Pearl, J. (1994a). Counterfactual probabilities: Computational methods, bounds, and applications. In R. L. de Mantaras and D. Poole (Eds.), Proceedings of the
Tenth Conference on Uncertainty in Artificial Intelligence (pp. 46–54). San Mateo, CA: Morgan Kaufmann. Balke, A., & Pearl, J. (1994b). Probabilistic evaluation of counterfactual queries. In Proceedings of the Twelfth National Conference on Artificial Intelligence (Vol. I, pp. 230–237). Menlo Park, CA: MIT Press. Balke, A., & Pearl, J. (1997). Bounds on treatment effects from studies with imperfect compliance. Journal of the American Statistical Association, 92(439), 1172–1176. Brito, C., & Pearl, J. (2006). Graphical condition for identification in recursive SEM. In Proceedings of the Twenty-third Conference on Uncertainty in Artificial Intelligence (pp. 47–54). Corvallis, OR: AUAI Press. Chalak, K., & White, H. (2006). An extended class of instrumental variables for the estimation of causal effects (Tech. Rep. Discuss. Paper). San Diego: University of California, San Diego, Department of Economics. Chickering, D., & Pearl, J. (1997). A clinician’s tool for analyzing non-compliance. Computing Science and Statistics, 29(2):424–431. Cox, D. (1958). The Planning of Experiments. New York: John Wiley & Sons. Glymour, M., & Greenland, S. (2008). Causal diagrams. In K. Rothman, S. Greenland, and T. Lash (Eds.), Modern Epidemiology (3rd ed., pp. 183–209). Philadelphia: Lippincott Williams & Wilkins. Greenland, S., Pearl, J., & Robins, J. (1999). Causal diagrams for epidemiologic research. Epidemiology, 10(1):37–48. Halpern, J. (1998). Axiomatizing causal reasoning. In G. Cooper and S. Moral (Eds.), Uncertainty in Artificial Intelligence (pp. 202–210). San Francisco: Morgan Kaufmann. (Reprinted in Journal of Artificial Intelligence Research, 12, 17–37, 2000.) Holland, P. (1986). Statistics and causal inference. Journal of the American Statistical Association, 81(396), 945–960. Holland, P. (1988). Causal inference, path analysis, and recursive structural equations models. In C. Clogg (Ed.), Sociological Methodology (pp. 449–484). Washington, DC: American Sociological Association. Lewis, D. (1973). Counterfactuals. Cambridge, MA: Harvard University Press. Mackie, J. (1965). Causes and conditions. American Philosophical Quarterly, 2/4:261–264. (Reprinted in E. Sosa and M. Tooley [Eds.], Causation. Oxford: Oxford University Press, 1993.) MacKinnon, D., Lockwood, C., Brown, C., Wang, W., & Hoffman, J. (2007). The intermediate endpoint effect in logistic and probit regression. Clinical Trials, 4, 499–513. Morgan, S., & Winship, C. (2007). Counterfactuals and Causal Inference: Methods and Principles for Social Research (Analytical Methods for Social Research). New York: Cambridge University Press. Neyman, J. (1923). On the application of probability theory to agricultural experiments. Essay on principles. Statistical Science, 5(4), 465–480. Pearl, J. (1993a). Comment: Graphical models, causality, and intervention. Statistical Science, 8(3), 266–269. Pearl, J. (1993b). Mediating instrumental variables (Tech. Rep. No. TR-210). Los Angeles: University of California, Los Angeles, Department of Computer Science. http://ftp.cs.ucla.edu/pub/stat_ser/R210.pdf. Pearl, J. (1995). Causal diagrams for empirical research. Biometrika, 82(4), 669–710. Pearl, J. (2000a). Causality: Models, Reasoning, and Inference. New York: Cambridge University Press.
Pearl, J. (2000b). Comment on A. P. Dawid’s, Causal inference without counterfactuals. Journal of the American Statistical Association, 95(450), 428–431. Pearl, J. (2001). Direct and indirect effects. In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence (pp. 411–420). San Francisco: Morgan Kaufmann. Pearl, J. (2004). Robustness of causal claims. In M. Chickering and J. Halpern (Eds.), Proceedings of the Twentieth Conference Uncertainty in Artificial Intelligence (pp. 446– 453). Arlington, VA: AUAI Press. Pearl, J. (2009a). Causal inference in statistics: An overview. Statistics Surveys, 3, 96–146. Retrieved from http://ftp.cs.ucla.edu/pub/stat_ser/r350.pdf. Pearl, J. (2009b). Causality: Models, Reasoning, and Inference. New York: Cambridge University Press, 2nd edition. Pearl, J. (2009c). Remarks on the method of propensity scores. Statistics in Medicine, 28, 1415–1416. Retrieved from http://ftp.cs.ucla.edu/pub/stat_ser/r345-sim.pdf. Pearl, J. (2010a). The foundation of causal inference. (Tech. Rep. No. R-355). Los Angeles: University of California, Los Angeles. http://ftp.cs.ucla.edu/pub/stat_ser/ r355.pdf. Forthcoming, Sociological Methodology. Pearl, J. (2010b). The mediation formula: A guide to the assessment of causal pathways in non-linear models. (Tech. Rep. No. R-363). Los Angeles: University of California, Los Angeles, http://ftp.cs.ucla.edu/pub/stat_ser/r363.pdf. Pearl, J. (2010c). On a class of bias-amplifying variables that endanger effect estimates. (In P. Grunwald and P. Spirtes (Eds.), Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence (pp. 425–432). Corvallis, OR: AUAI. http:// ftp.cs.ucla.edu/pub/stat_ser/r356.pdf. Petersen, M., Sinisi, S., & van der Laan, M. (2006). Estimation of direct causal effects. Epidemiology, 17(3), 276–284. Robins, J. (1986). A new approach to causal inference in mortality studies with a sustained exposure period—applications to control of the healthy workers survivor effect. Mathematical Modeling, 7, 1393–1512. Robins, J., & Greenland, S. (1992). Identifiability and exchangeability for direct and indirect effects. Epidemiology, 3(2), 143–155. Rosenbaum, P., & Rubin, D. (1983). The central role of propensity score in observational studies for causal effects. Biometrika, 70, 41–55. Rothman, K. (1976). Causes. American Journal of Epidemiology, 104, 587–592. Rubin, D. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66, 688–701. Shpitser, I., & Pearl, J. (2006). Identification of joint interventional distributions in recursive semi-Markovian causal models. In Proceedings of the Twenty-First National Conference on Artificial Intelligence (pp. 1219–1226). Menlo Park, CA: AAAI Press. Shpitser, I., & Pearl, J. (2007). What counterfactuals can be tested. In Proceedings of the Twenty-Third Conference on Uncertainty in Artificial Intelligence (pp. 352–359). Vancouver, Canada: AUAI Press. (Reprinted in Journal of Machine Learning Research, 9, 1941–1979, 2008.) Shpitser, I., & Pearl, J. (2009). Effects of treatment on the treated: Identification and generalization. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence. Montreal: AUAI Press. Shrout, P., & Bolger, N. (2002). Mediation in experimental and nonexperimental studies: New procedures and recommendations. Psychological Methods, 7(4), 422–445.
Sobel, M. (1998). Causal inference in statistical models of the process of socioeconomic achievement. Sociological Methods & Research, 27(2), 318–348. Tian, J., Paz, A., & Pearl, J. (1998). Finding minimal separating sets (Tech. Rep. No. R-254). Los Angeles: University of California, Los Angeles. http://ftp.cs.ucla.edu/pub/ stat_ser/r254.pdf. Tian, J., & Pearl, J. (2000). Probabilities of causation: Bounds and identification. Annals of Mathematics and Artificial Intelligence, 28, 287–313. Tian, J., & Pearl, J. (2002). A general identification condition for causal effects. In Proceedings of the Eighteenth National Conference on Artificial Intelligence (pp. 567–573). Menlo Park, CA: AAAI Press/MIT Press. VanderWeele, T., & Robins, J. (2007). Four types of effect modification: A classification based on directed acyclic graphs. Epidemiology, 18(5), 561–568. Woodward, J. (2003). Making Things Happen. New York: Oxford University Press. Wright, S. (1921). Correlation and causation. Journal of Agricultural Research, 20, 557–585.
4 Causal Thinking in Psychiatry
A Genetic and Manipulationist Perspective
kenneth s. kendler
A large and daunting philosophical literature exists on the nature and meaning of causality. Add to that the extensive discussions in the statistical literature about what it means to claim that C causes E, and it can be overwhelming for the scientists, who, after all, are typically just seeking guidelines about how to conduct and analyze their research. Add to this mix the inherent problems in psychiatry—which examines an extraordinarily wide array of potential causal processes from molecules to minds and societies, some of which permit experimental manipulation but many of which do not—and you can readily see the sense of frustration and, indeed, futility with which this issue might be addressed. In the first section of this chapter, I reflect on two rather practical aspects of causal inference that I have confronted in my research career in psychiatric genetics. The first of these is what philosophers call a ‘‘brute fact’’ of our world—the unidirectional causal relationship between variation in genomic DNA and phenotype. The second is the co-twin–control method—a nice example of trying to use twins as a ‘‘natural experiment’’ to clarify causal processes when controlled trials are infeasible. In the second section, I briefly outline and advocate for a particular approach to causal inference developed by Jim Woodward (2003) that I term ‘‘interventionism.’’ I argue that this approach is especially well suited to the needs of our unusual field of psychiatry.
Two Practical Aspects of Causal Inference in Psychiatric Genetics Research
I often teach students that it is almost too easy in psychiatric research to show that putative risk factors correlate with outcomes. It is much harder to
determine if that relationship is a causal one. Indeed, assuming that for practical or ethical reasons a randomized trial of exposure to the risk factor is not feasible, one must rely on observational data. In these instances, it can be ‘‘damn near impossible’’ to confidently infer causation. However, in this mire of causal uncertainty, it is interesting to note that one relationship stands out in its causal clarity: the relationship between variation in germline DNA (gDNA) and phenotypes. It did not have to be this way. Indeed, folk wisdom has long considered the inheritance of acquired characteristics (which implies a phenotype → gDNA relationship, interpreted as phenotype causes gDNA) to be a plausible mechanism of heredity. In the eighteenth century, this concept (of the inheritance of acquired characteristics) was most closely associated with the name of Lamarck. In the twentieth century, due to an unfortunate admixture of bad science and repressive politics, this same process came to dominate Soviet biology through the efforts of Lysenko. However, acquired characteristics are not inherited through changes in our DNA sequence. Rather, the form of life of which we are a product evolved in such a manner as to render the sequence of gDNA relatively privileged and protected. Therefore, causal relationships between genes and phenotypes are, of necessity, unidirectional. To put this more crudely, genes can influence phenotypes but phenotypes cannot influence genes. I do not mean to imply that gDNA never varies. It is subject to a range of random features, from point mutations to insertions of transposons and slippage of replication machinery leading to deletions and duplications. However, I am unaware of any widely accepted claims that such changes in gDNA can occur in a systematic and directed fashion so as to establish a true phenotype → gDNA causal pathway. Let me be a bit more precise. Assume that our predictor variable is a measure of genetic variation (GV). This can be either latent—as it might be if we are studying twins or adoptees—or observed—if we are directly examining variation in genomic sequence (e.g., via single nucleotide polymorphisms). Assume our dependent measure is the liability toward having a particular psychiatric disease, which we will term ‘‘risk.’’ We can then assume that GV causes risk:

GV → Risk

Also, we can be certain that risk does not cause GV:

Risk ↛ GV

This claim is specific and limited. It does not apply to other features of our genetic machinery such as gene expression—the product of genes at the level
of either mRNA or protein—and epigenetic modifications of DNA. Our expression levels can be exquisitely sensitive to environmental effects. There is also evidence that environmental factors can alter methylation of DNA. Thus, it is not true that in all of genetics research causal relationships are unambiguous. If, however, we consider only the sequence of gDNA, it is, I argue, a little corner of causal clarity that we should cherish. The practical consequence of these unidirectional causal relationships does not end with the simple bivariate relationship noted between gDNA and phenotype. For example, using structural equation modeling, far more elaborate models can be developed that involve multiple phenotypes, developmental pathways, gene–environment interaction, differences in gene expression by sex or cohort, and gene–environment correlation. Under some situations (e.g., van den Oord & Snieder, 2002), including genetic effects can help to clarify other causal relationships. For example, if two disorders (A and B) are comorbid, the identification of genetic risk factors should help to determine whether this comorbidity results from shared risk factors like genes or a phenotypic pathway in which developing disorder A directly contributes to the risk of developing disorder B. While theoretically clear, it is unfortunately not possible to study such gene to phenotype pathways in the real world without introducing other assumptions. For example, if we are studying a population which contains two subpopulations that differ in frequency of both gene and disease, standard case–control association studies can artifactually produce significant findings in the absence of any true gDNA-to-phenotype relationship. The ability to infer the action of genetic risk factors in twin studies is based on the equal-environment assumption as well as assumptions about assortative mating and the relationship of additive to nonadditive genetic effects. The bottom-line message here is a simple one: The world in which we live is often causally ambiguous; this cannot be better demonstrated than in many areas of psychiatric research. Because of the way our life forms have evolved, gDNA is highly protected. Our bodies work very hard to ensure that nothing, including our own behavior or environmental experiences, messes with our gDNA. This quirk in our biology gives us causal purchase that we might not otherwise have. We should take this gift from nature, grasp it hard, and use it for all it is worth.
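To make the comorbidity example concrete, the following sketch (my own illustration, not an analysis from this chapter; the variable names and effect sizes are invented) contrasts two data-generating models in Python. In the first, a genetic factor G raises the risk of both disorders A and B with no direct A-to-B effect; in the second, A directly causes B. Both models produce an A–B association in the full sample, but only under the shared-gene model does the association disappear once G is held constant, which is the kind of leverage a securely unidirectional gene-to-phenotype arrow provides.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

def simulate(shared_gene=True):
    """Simulate two binary disorders under one of two causal structures.

    shared_gene=True : G -> A and G -> B, with no direct A -> B effect.
    shared_gene=False: G -> A only, plus a direct A -> B effect.
    All coefficients are arbitrary illustrative values.
    """
    g = rng.binomial(1, 0.3, n)                  # genetic risk factor
    a = rng.binomial(1, 0.05 + 0.15 * g)         # disorder A
    if shared_gene:
        b = rng.binomial(1, 0.05 + 0.15 * g)     # B depends on G, not on A
    else:
        b = rng.binomial(1, 0.05 + 0.15 * a)     # B depends on A, not on G
    return g, a, b

def odds_ratio(x, y):
    """Crude odds ratio between two binary arrays."""
    n11 = np.sum((x == 1) & (y == 1))
    n10 = np.sum((x == 1) & (y == 0))
    n01 = np.sum((x == 0) & (y == 1))
    n00 = np.sum((x == 0) & (y == 0))
    return n11 * n00 / (n10 * n01)

for shared in (True, False):
    g, a, b = simulate(shared)
    crude = odds_ratio(a, b)
    stratified = [odds_ratio(a[g == k], b[g == k]) for k in (0, 1)]
    print(f"shared_gene={shared}: crude A-B OR={crude:.2f}, "
          f"OR within G=0: {stratified[0]:.2f}, within G=1: {stratified[1]:.2f}")
```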
The Co-Twin–Control Method
One common approach to understanding the causal relationship between a putative risk factor and a disease is to match individuals on as many variables as possible except that one group has been exposed to the putative risk factor
and the other group has not. If the exposed group has a higher rate of disease, then we can argue on this basis that the risk factor truly causes the disease. While intuitively appealing, this common nonexperimental approach—like many in epidemiology—has a key point of vulnerability. While it may be that the risk factor causes the disease, it is also possible that a set of ‘‘third variables’’ predispose to both the risk factor and the disease. Such a case will produce a noncausal risk factor–disease association. This is a particular problem in psychiatric epidemiology because so many exposures of interest—stressful life events, social support, educational status, social class—are themselves complex and the result not only of the environment (with causal effects flowing from environment to person) but also of the actions of human beings themselves (with causal effects flowing from people to their environment) (Kendler & Prescott, 2006). As humans, we actively create our own environments, and this activity is substantially influenced by our genes (Kendler & Baker, 2007). Thus, for behavioral traits, our phenotypes quite literally extend far beyond our skin. Can we use genetically informative designs to get any purchase on these possible confounds? Sometimes. Let me describe one such method—the cotwin–control design. I will first illustrate its potential utility with one example and then describe a critical limitation. A full co-twin–control design involves the comparison of the association between a risk factor and an outcome in three samples: (1) an unselected population, (2) dizygotic (DZ) twin pairs discordant for exposure to the risk factor, and (3) monozygotic (MZ) pairs discordant for exposure to the risk factor (Kendler et al., 1993). Three possible different patterns of results are illustrated in Figure 4.1. The results on the left side of the figure show the
Figure 4.1 Interpretation of Results Obtained from Studies Using a Cotwin-Control Design. [Bar chart of odds ratios (OR) for all subjects, DZ pairs, and MZ pairs under three scenarios: all causal; partly noncausal (genetic); noncausal (all genetic).]
pattern that would be expected if the risk factor–outcome association was truly causal (note: this assumes that no other environmental confounding is present). Controlling for family background or genetic factors would, within the limits of statistical fluctuation, make no difference to the estimate of the association. The middle set of results in Figure 4.1 shows an example where part of the risk factor–outcome association is due to genetic factors that influence both the risk factor and the outcome. Here, the association is strongest in the entire sample (where genetic and causal effects are entirely confounded as we control for neither genetic nor shared environmental factors), intermediate among discordant DZ twins (where we control for shared environmental factors and partly for genetic background), and lowest among discordant MZ pairs (where we control entirely for both shared environmental and genetic backgrounds). The degree to which the association declines from the entire sample to discordant MZ pairs is a rough measure of the proportion of the association that is noncausal and the result of shared genetic influences on the risk factor and the outcome. The results on the right side of Figure 4.1 show the extreme case, where all of the risk factor–outcome association is due to shared genetic effects and the risk factor has no real causal effect on the outcome. Thus, within discordant MZ pairs there will be no association between the risk factor and the outcome (i.e., an odds ratio [OR] of 1.0), and the association within discordant DZ pairs would be expected to be roughly midway between 1 and the value for the entire sample. Let me illustrate this method with a striking real-world example taken from research conducted primarily with my long-term colleague, Dr. Carol Prescott (Kendler & Prescott, 2006; Prescott & Kendler, 1999). In general population samples, early age at drinking onset (the risk factor of interest) has been consistently associated with an increased risk for developing alcohol-use disorders (Grant & Dawson, 1997). The prevalence of alcohol-use disorders among individuals who first try alcohol before age 15 is as high as 50% in some studies. Several studies reporting this effect interpreted it to be a causal one—that early drinking directly produces an increased risk for later alcohol problems. On the basis of this interpretation, calls have been made to delay the age at first drink among early adolescents as a means of decreasing risk for adult alcohol problems (Pedersen & Skrondal, 1998). This risk factor–outcome relationship, however, need not be a causal one. For example, early drinking could be one manifestation of a broad liability to deviance which might be evident in a host of problem behaviors, such as use of illicit substances, antisocial behavior, and adult alcoholism (Jessor & Jessor, 1977). If this were the case, delaying the first exposure to alcohol use would
not alter the underlying liability to adolescent problem behavior or to adult alcoholism. Using data from the Virginia Adult Twin Study of Psychiatric and Substance Use Disorders, we tested these two hypotheses about why early drinking correlates with alcoholism. The results are depicted in Figure 4.2. As in prior studies, we found a strong association between lifetime prevalence of alcoholism and age at first drink among both males and females (Prescott & Kendler, 1999). As shown in Figure 4.2, males who began drinking before age 15 were twice as likely (OR = 2.0) to develop Diagnostic and Statistical Manual of Mental Disorders (fourth edition) alcohol dependence (AD) as those who did not drink early. The association for females was even more dramatic: Early drinkers were more than four times as likely to develop AD as other women. The data to test causality come from twin pairs who were discordant for early drinking. Under the causal hypothesis, we would expect that the twins with earlier drinking onset would have a higher risk for alcoholism than their later-drinking co-twins and that the same pattern would hold for MZ and DZ pairs. However, if early age at drinking is just an index of general deviance which influences (among other things) risk of developing alcoholism, we would expect that the prevalence would be similar for members of MZ discordant-onset pairs. The ‘‘unexposed’’ twins (with a later onset of drinking) would share their co-twins’ risk for behavioral deviance and, thus, have a higher risk for alcoholism than the pairs in which neither twin drank early. The pattern observed in MZ vs. DZ discordant-onset pairs tells us to what degree familial resemblance for behavioral deviance is due to shared environmental vs. genetic factors. If it is due to shared environmental
Figure 4.2 Odds Ratios from Cotwin-Control Analyses of the Association between Drinking Before Age 15 and Alcohol Dependence. [Bar chart of odds ratios (OR) for all subjects, DZ pairs, and MZ pairs, shown separately for males and females.]
factors, the risk for alcoholism among the unexposed twins from DZ discordant-onset pairs would be expected to be the same as that in the MZ pairs. However, if familial resemblance for deviance is due to genetic factors, the risk for alcoholism in an unexposed individual would be lower among DZ than MZ pairs. As shown in Figure 4.2, the twin pair resemblance was inconsistent with the causal hypothesis. Instead, the results suggested that early drinking and later alcoholism are both the result of a shared genetic liability. For example, among the 213 male and 69 female MZ pairs who were discordant for early drinking, there was only a slight difference in the prevalence of AD between the twins who drank early and the co-twins who did not. The ORs were 1.1 for both sexes and were not statistically different from the 1.0 value predicted by the noncausal model for MZ pairs. The ORs for the DZ pairs were midway between those of the MZ pairs and the general sample, indicating that the source of the familial liability is genetic rather than environmental. I am not claiming that these results are definitive, and they certainly require replication. It is frankly unlikely that early onset of alcohol consumption has no impact on subsequent risk for problem drinking. Surely, however, these results should give pause to those who want to stamp out alcohol problems by restricting the access of adolescents to alcohol and suggest that noncausal processes might explain at least some of the early drinking–later alcoholism relationship. For those interested in other psychiatric applications of the co-twin–control method, our group has applied it to clarify the association between smoking and major depression (Kendler et al., 1993), stressful life events and major depression (Kendler, Karkowski, & Prescott, 1999), and childhood sexual abuse and a range of psychiatric outcomes (Kendler et al., 2000). Lest you leap to the conclusion that this method is a panacea for our problems of causal inference, I have some bad news. The co-twin–control method is asymmetric with regard to the causal clarity of its results. Studies in which MZ twins discordant for risk-factor exposure have equal rates of the disease can, I think, permit the rather strong inference that the risk factor–disease association is not causal. However, if in MZ twins discordant for risk-factor exposure the exposed twin has a significantly higher risk of illness than the unexposed twin, it is not possible to infer with such confidence that the risk factor–disease association is causal. This is because in the typical design it is not possible to rule out the potential that some unique environmental event not shared with the co-twin produced both the risk factor and the disease. For example, imagine we are studying the relationship between early conduct disorder and later drug dependence. Assume further that we find many MZ twin pairs of the following type: the conduct-disordered twin
(twin A) develops drug dependence, while the nondisordered co-twin (twin B) does not. We might wish to argue that this strongly proves the causal path from conduct disorder to later drug dependence. Alas, it is not so simple. It is perfectly possible that twin A had some prior environmental trauma not shared with twin B (an obstetric complication or a fall off a bicycle with a resulting head injury) that predisposed to both conduct disorder and drug dependence. While the MZ co-twin–control design excludes the possibility that a common set of genes or a class of shared environmental experiences predisposes to both risk factor and outcome, it cannot exclude the possibility that a ‘‘third factor’’ environmental experience unique to each twin plays a confounding role. Genetic strategies can occasionally provide some traction on issues of causation in psychiatric epidemiology that might be otherwise difficult to establish. In addition to the co-twin–control method, genetic randomization is another potentially powerful natural experiment that relies on using genes as instrumental variables (again taking advantage of the causal asymmetry between genotype and phenotype) (Davey-Smith, 2006; also see Chapter 9). Neither of these methods can entirely substitute for controlled trials, but for many interesting questions in psychiatry such an approach is either impractical or unethical. These methods are far from panaceas, but they may be underused by some in our field who are prone to slip too easily from correlative to causal language.
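The instrumental-variable logic behind genetic randomization can be written down in a few lines. The sketch below is a toy illustration with invented variable names and effect sizes, not a reanalysis of any study cited here: because the genotype is unrelated to the unmeasured confounder but shifts the exposure, the ratio of the genotype–outcome difference to the genotype–exposure difference (the Wald estimator) recovers a causal effect that the naive regression overstates.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500_000

u = rng.normal(size=n)                       # unmeasured confounder (e.g., liability to deviance)
g = rng.binomial(1, 0.4, n)                  # genotype, independent of the confounder
x = 0.5 * g + 1.0 * u + rng.normal(size=n)   # exposure, affected by genotype and confounder
y = 0.3 * x + 1.0 * u + rng.normal(size=n)   # outcome; the true causal effect of x is 0.3

# Naive regression slope of y on x, biased upward by the shared confounder u.
naive = np.cov(x, y)[0, 1] / np.var(x)

# Wald (ratio) instrumental-variable estimate:
# (difference in y by genotype) / (difference in x by genotype).
wald = (y[g == 1].mean() - y[g == 0].mean()) / (x[g == 1].mean() - x[g == 0].mean())

print(f"naive estimate    : {naive:.2f}  (confounded)")
print(f"IV (Wald) estimate: {wald:.2f}  (close to the true 0.3)")
```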
Interventionism as an Approach to Causality Well-Suited for Psychiatry
I have been reading for some years in the philosophy of science (and a bit in metaphysics) about approaches to causation and explanation. For understandable reasons, this is an area often underutilized by psychiatric researchers. I am particularly interested in the question of what general approach to causality is most appropriate for the science of psychiatry, which itself is a hybrid of the biological, psychological, and sociological sciences. First, I would argue that the deductive-nomological approach emerging from the logical positivist movement is poorly suited to psychiatric research. This position—which sees true explanation as being deduced from general laws as applied to specific situations—may have its applications in physics. However, psychiatry lacks the broad and deep laws that are seen at the core of physics. Many, myself included, doubt that psychiatry will ever have laws of the wide applicability of general relativity or quantum mechanics. It is simply not, I suggest, in the nature of our discipline to have such powerful and simple explanations. A further critical limitation of this approach for
psychiatry, much discussed in the literature, is that it does a poor job at the critical discrimination between causation and correlation—which I consider a central problem for our field. The famous example that is most commonly used here is of flagpoles and shadows. Geometric laws can equally predict the length of a shadow from the height of a flagpole or the height of the flagpole from the length of the shadow. However, only one of these two relationships is causally sensible. Second, while a mechanistic approach to causation is initially intuitively appealing, it is also ill-suited as a general approach for our field. By a ‘‘mechanistic approach,’’ I mean the idea that causation is best understood as the result of some direct physical contact, a spatiotemporal process, which involves the transfer of some process or energy from one object to another. One might think of this as the billiard ball model of causality—that satisfying click we hear when our cue ball knocks against another ball, sending it, we hope, into the designated pocket. How might this idea apply to psychiatric phenomena? Consider the empirical observation that the rate of suicide declined dramatically in England in the weeks after September 11, 2001 (9/11) (Salib, 2003). How would a mechanistic model approach this causal process? It would search for the specific nature of the spatiotemporal processes that connected the events of 9/11 in the United States to people in England. For example, it would have to determine the extent to which information about the events of 9/11 was conveyed to the English population through radio, television, e-mail, word of mouth, and newspapers. Then, it would have to trace the physical processes whereby this news influenced the needed brain pathways, etc. I am not a Cartesian dualist, so do not misunderstand me here. I am not suggesting that in some ultimate way physical processes were not needed to explain why the suicide rate declined in England in September 2001. Instead, perhaps time spent figuring out the physical means by which news of 9/11 arrived in England is the wrong level on which to understand this process. Mechanistic models fail for psychiatry for the same reasons that hard reductionist models fail. Critical causal processes in psychiatric illnesses exist at multiple levels, only some of which are best understood at a physical–mechanical level. A third approach is the interventionist model (IM), which evolved out of the counterfactual approach to causation. The two perspectives share the fundamental idea that in thinking about causation we are ultimately asking questions about what would have happened if things had been different. While some counterfactual literature discusses issues around closest parallel worlds, the IM approach is a good deal more general and can be considered ‘‘down to earth.’’ What is the essence of the IM? Consider a simple, idealized case. Suppose we want to determine whether stress (S) increases the risk for
major depression (MD). The ‘‘ideal experiment’’ here would be the unethical one in which, in a given population, we randomly intervene on individuals, exposing them to a stressful experience such as severe public humiliation (H). This experience increases their level of S, and we heartlessly observe if they subsequently suffer from an increased incidence of MD. Our design is

H intervenes on S → MD

Thus, we are assuming that intervention on S will make a difference to risk for MD. For this to work, according to the IM, the intervention must meet several conditions (for more details, see Pearl, 2001; Woodward & Hitchcock, 2003). We will illustrate these with our thought experiment as follows:
1. In individuals who are and are not exposed to our intervention, H must be the only systematic cause of S that is unequally distributed among the exposed and the unexposed (so that all of the averaged differences in level of S in each cohort of our exposed and unexposed subjects result entirely from H).
2. H must not affect the risk for MD by any route that does not go through S (e.g., by causing individuals to stop taking antidepressant medication).
3. H is not itself influenced by any cause that affects MD via a route that does not go through S, as might occur if individuals prone to depression were more likely to be selected for H.
In sum, the IM says that questions about whether X causes Y are questions about what would happen to Y if there were an intervention on X. One great virtue of the IM is that it allows psychiatrists freedom to use whatever family of variables seems appropriate to the characterization of a particular problem. There is no assumption that the variables have to be capable of figuring in quite general laws of nature, as with the deductive-nomological approach, or that the variables have to relate to basic spatiotemporal processes, as with the mechanistic approach. The fact is that the current evidence points to causal roles for variables of many different types, and the interventionist approach allows us to make explicit just what those roles are. For all that, there is a sense in which the approach is completely rigorous. It is particularly unforgiving in assuring that causation is distinguished from correlation. Though our exposition here is highly informal, we are providing an intuitive introduction to ideas whose formal development has been
vigorously pursued by others (e.g., Spirtes, Glymour, & Scheines, 1993; Pearl, 2001; Woodward, 2003). If I were to try to put the essence of the IM of causality into a verbal description, it would be as follows: I know C to be a true cause of E if I can go into the world with its complex web of causal interrelationships, hold all these background relationships constant, and make a ‘‘surgical’’ intervention (or ‘‘twiddle’’) on C. If E changes, then I know C causes E. I see the nonreductive nature of the IM to be a critical strength for psychiatry. Unlike the mechanistic model, it makes no a priori judgment on the level of abstraction on which the causal processes can be best understood. The IM requires only that, at whatever level it is conceived, the cause makes a difference in the world. This is so important that it deserves repeating. The IM provides a single, clear empirical framework for the evaluation of all causal claims of relevance to psychiatry, from molecules to neural systems to psychological constructs to societies.
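The twiddle can also be written out directly. In the sketch below (a hypothetical illustration; the variable names and coefficients are invented), stress S and major depression MD share a common cause, so the observational slope greatly overstates the small true effect of S on MD. Setting S to fixed values by intervention severs the arrow from the common cause into S and recovers the causal effect.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000

def observe():
    """Let the system run on its own: a common cause C drives both S and MD."""
    c = rng.normal(size=n)                        # background adversity (common cause)
    s = 1.0 * c + rng.normal(size=n)              # stress arises naturally
    md = 0.2 * s + 1.0 * c + rng.normal(size=n)   # depression liability; true S effect = 0.2
    return s, md

def intervene(s_value):
    """'Surgically' set S to a fixed value, cutting the C -> S arrow."""
    c = rng.normal(size=n)
    s = np.full(n, float(s_value))
    md = 0.2 * s + 1.0 * c + rng.normal(size=n)
    return md

s, md = observe()
slope_obs = np.cov(s, md)[0, 1] / np.var(s)                 # what correlation alone suggests
effect_do = intervene(1.0).mean() - intervene(0.0).mean()   # what the twiddle reveals

print(f"observational slope of MD on S : {slope_obs:.2f}")
print(f"effect of setting S=1 vs S=0   : {effect_do:.2f}  (true causal effect is 0.2)")
```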
The IM and Mechanisms
Before closing, two points about the possible relationship between the IM and mechanistic causal models are in order. First, it is in the nature of science to want to move from findings of causality to a clarification of the mechanisms involved—whether they are social, psychological, or molecular. The IM can play a role in this process by helping scientists to focus on the level at which causal mechanisms are most likely to be operative. However, a word of caution is in order. Given the extraordinary complexity of most psychiatric disorders, causal effects (and the mechanisms that underlie them) may be occurring on several levels. For example, the fact that cognitive behavioral therapy works for MD, and that psychological mechanisms are surely the level at which this process can currently best be understood, does not mean that neurochemical interventions on MD (via pharmacology) cannot also work. On the other hand, although pharmacological tools can impact on symptoms of eating disorders, cultural models of female beauty, although operating at a very different level, can also impact on risk. Second, we should briefly ponder the following weighty question: Should the plausibility of a causal mechanism impact on our interpretation of IMs? Purists will say ‘‘No!’’ If we design the right study and the results are clear, then causal imputations follow. Pragmatists, whose position is well
represented by the influential criteria of Hill (1965), will disagree. The conversation would go something like this: Pragmatist: Surely you cannot be serious! Do you mean if you find evidence for the efficacy of astrology or ESP, my interpretation of these results should not be influenced by the fact that we have no bloody idea of how such processes could work in the world as we understand it? Purist: I am entirely serious. Your comments about astrology and ESP clearly illustrate the problem. You have said that you are quite willing to impose your preconceptions of how you think the world should work on your interpretation of the data. The whole point in science is to get away from our biases, not embrace them. This is extra important in psychiatry, where our biases are often strong and our knowledge of how the world really works is typically nonexistent or at best meager. Personally, I am a bit on the pragmatist’s side, but the purists have a point well worth remembering.
Summary of the IM
To summarize, the IM is attractive for psychiatry for four major reasons (Kendler & Campbell, 2009). First, the IM is anchored in the practical and reflects the fundamental goals of psychiatry, which are to intervene in the world to prevent and cure psychiatric disorders. Second, the IM provides a single, clear empirical framework for the evaluation of all causal claims in psychiatry. It provides a way by which different theoretical orientations within psychiatry can be judged by a common metric. Third, the framework provided by the IM can help us to find the optimal level for explanation and ultimately for intervention. Finally, the IM is explicitly agnostic to issues of the mind–body problem. Its application can help us replace the sterile metaphysical arguments about mind and brain, which have yielded little of practical benefit, with productive empirical research followed by rigorous conceptual and statistical analysis.
Acknowledgements
This work was supported in part by grant DA-011287 from the US National Institutes of Health. Much of my thinking in this area has been stimulated
by and developed in collaboration with John Campbell, a philosopher now at UC Berkeley (Kendler & Campbell, 2009).
References Davey-Smith, G. (2006). Randomized by (your) god: Robust inference from a nonexperimental study design. Journal of Epidemiology and Community Health, 60, 382–388. Grant, B. F., & Dawson, D. A. (1997). Age at onset of alcohol use and its association with DSM-IV alcohol abuse and dependence: results from the National Longitudinal Alcohol Epidemiologic Survey. Journal of Substance Abuse, 9, 103–110. Hill, A. B. (1965). The environment and disease: Association or causation? Proceedings of the Royal Society of Medicine, 58, 295–300. Jessor, R., & Jessor, S. L. (1977). Problem behavior and psychosocial development: A longitudinal study of youth. New York: Academic Press. Kendler, K. S., & Baker, J. H. (2007). Genetic influences on measures of the environment: A systematic review. Psychological Medicine, 37, 615–626. Kendler, K. S., Bulik, C. M., Silberg, J., Hettema, J. M., Myers, J., & Prescott, C. A. (2000). Childhood sexual abuse and adult psychiatric and substance use disorders in women: An epidemiological and cotwin control analysis. Archives of General Psychiatry, 57, 953–959. Kendler, K. S., & Campbell, J. (2009). Interventionist causal models in psychiatry: Repositioning the mind–body problem. Psychological Medicine, 39, 881–887. Kendler, K. S., Karkowski, L. M., & Prescott, C. A. (1999). Causal relationship between stressful life events and the onset of major depression. American Journal of Psychiatry, 156, 837–841. Kendler, K. S., Neale, M. C., MacLean, C. J., Heath, A. C., Eaves, L. J., & Kessler, R. C. (1993). Smoking and major depression. A causal analysis. Archives of General Psychiatry, 50, 36–43. Kendler, K. S., & Prescott, C. A. (2006). Genes, environment, and psychopathology: Understanding the causes of psychiatric and substance use disorders. New York: Guilford Press. Pearl, J. (2001). Causality models, reasoning, and inference. Cambridge: Cambridge University Press. Pedersen, W., & Skrondal, A. (1998). Alcohol consumption debut: Predictors and consequences. Journal of Studies on Alcohol, 59, 32–42. Prescott, C. A., & Kendler, K. S. (1999). Age at first drink and risk for alcoholism: A noncausal association. Alcoholism, Clinical and Experimental Research, 23, 101–107. Salib, E. (2003). Effect of 11 September 2001 on suicide and homicide in England and Wales. British Journal of Psychiatry, 183, 207–212. Spirtes, P., Glymour, C., & Scheines, R. (1993). Causation, prediction and search. New York: Springer-Verlag. van den Oord, E. J., & Snieder, H. (2002). Including measured genotypes in statistical models to study the interplay of multiple factors affecting complex traits. Behavior Genetics, 32, 1–22. Woodward, J. (2003). Making things happen. New York: Oxford University Press. Woodward, J., & Hitchcock, C. (2003). Explanatory generalizations, part I: A counterfactual account. Nous, 37, 1–24.
5 Understanding the Effects of Menopausal Hormone Therapy
Using the Women’s Health Initiative Randomized Trials and Observational Study to Improve Inference
garnet l. anderson and ross l. prentice
Introduction
Over the last decade, several large-scale randomized trials have reported results that disagreed substantially with the motivating observational studies on the value of various chronic disease–prevention strategies. One high-profile example of these discrepancies was related to postmenopausal hormone therapy (HT) use and its effects on cardiovascular disease and cancer. The Women’s Health Initiative (WHI), a National Heart, Lung, and Blood Institute–sponsored program, was designed to test three interventions for the prevention of chronic diseases in postmenopausal women, each of which was motivated by a decade or more of analytic epidemiology. Specifically, the trials were testing the potential for HT to prevent coronary heart disease (CHD), a low-fat eating pattern to reduce breast and colorectal cancer incidence, and calcium and vitamin D supplements to prevent hip fractures. Over 68,000 postmenopausal women were randomized to one, two, or all three randomized clinical trial (CT) components between 1993 and 1998 at 40 U.S. clinical centers (Anderson et al., 2003a). The HT component consisted of two parallel trials testing the effects of conjugated equine estrogens alone (E-alone) among women with prior hysterectomy and the effect of combined estrogen plus progestin therapy (E+P), in this case conjugated equine estrogens plus medroxyprogesterone acetate, among women with an intact uterus, on the incidence of CHD and overall health. In 2002, the randomized trial of E+P was stopped early, based on an assessment of risks exceeding benefits for chronic disease prevention, raising concerns among millions of menopausal women and their care providers about
their use of these medicines. The trial confirmed the benefit of HT for fracture-risk reduction, but the expected benefit for CHD, the primary study end point, was not observed. Rather, the trial results documented increased risks of CHD, stroke, venous thromboembolism (VTE), and breast cancer with combined hormones (Writing Group for the Women’s Health Initiative Investigators, 2002). Approximately 18 months later, the E-alone trial was also stopped, based on the finding of an adverse effect on stroke rates and the likelihood that the study would not confirm the CHD-prevention hypothesis. The results of this trial revealed a profile of risks and benefits that did not completely coincide with either the E+P trial results or previous findings from observational studies (Women’s Health Initiative Steering Committee, 2004). In conjunction with these trials, the WHI investigators conducted a parallel observational study (OS) of 93,676 women recruited from the same population sources with similar data-collection protocols and follow-up. OS enrollees were similar in many demographic and chronic disease risk factor characteristics but were ineligible for or unwilling to be randomized into the CT (Hays et al., 2003). Because a substantial fraction of women in the OS were current or former users of menopausal HT, joint analyses of the effects of HT use in the CT and OS provide an opportunity to examine design and analysis methods that serve to compare and contrast these two study designs, to identify some of the strengths and weaknesses of each, and to determine the extent to which detailed data analysis provisions could bring these results into agreement and thereby explain the discrepancies between these randomized trials and observational studies. This chapter reviews the motivation for the hormone trials and describes the major findings for chronic disease effects, with particular attention to the results that differed from what was hypothesized. Then, the series of joint analyses of the CT and corresponding OS data is presented. Finally, some discussion about the implications of these analyses for the design and analysis of future studies is provided.
Hormone Therapy Trial Background
Since the 1940s, women have been offered exogenous estrogens to relieve menopausal symptoms. The use of unopposed estrogens grew until evidence of an increased risk of endometrial cancer arose in the 1970s and tempered enthusiasm for these medicines, at least for the majority of women who had not had a hysterectomy. With the subsequent information that progestin effectively countered the carcinogenic effects of estrogen in the endometrium, HT prescriptions again climbed (Wysowski, Golden, & Burke, 1995). Observational studies found that use of HT was associated with lower risks of osteoporosis and fractures; subsequently, the U.S. Food and Drug
Administration approved HT for the treatment and prevention of osteoporosis, leading to still further increases in the prevalence and duration of hormone use (Hersh, Stefanick, & Stafford, 2004). The pervasiveness of this exposure permitted a large number of observational studies of both case–control and prospective cohort designs to examine the relationship between hormone use and a wide range of diseases among postmenopausal women. Most of the more than 30 observational studies available at the initiation of the WHI reported substantial reductions in CHD rates, the major cause of morbidity and mortality among postmenopausal women (Bush et al., 1987; Stampfer & Colditz, 1991; Grady et al., 1992). Support for the estrogen and heart disease hypothesis, which originated partly from the male–female differences in CHD rates and the marked increase in CHD rates after menopause, was further buttressed by mechanistic studies showing beneficial effects of HT on blood lipid profiles and vascular motility in animal models (Pick, Stamler, Robard, & Katz, 1952; Hough & Zilversmit, 1986; Adams et al., 1990; Clarkson, Anthony, & Klein, 1996). The estimated effects were substantial, ranging from 30%–70% reductions, prompting considerable public-health interest in HT as a preventive intervention for CHD in postmenopausal women. One barrier to more widespread use was the reports of adverse effects, most notably breast cancer. Numerous observational studies had reported a modest increase in breast-cancer risk with longer-term exposure to estrogen (Steinberg et al., 1991; Barrett-Conner & Grady, 1998). The effect of adding progestin, however, was not clear. Evidence of increased risks of VTE and biliary disease existed, but possible reductions in risk of colorectal cancer, strokes, mortality, dementia, and many other conditions associated with aging, in addition to menopausal symptom control, suggested that HT was overall beneficial for menopausal women. The overall effects, while still imprecisely estimated, suggested important benefits for prevention of chronic disease (Grady et al., 1992). Increasingly, postmenopausal women were encouraged to use HT to reduce their risks of osteoporosis, fractures, and CHD; in fact, prescriptions reached approximately 90 million per year in the United States alone (Hersh et al., 2004). The positive view of HT was so widely held that the initiation of a long-term, placebo-controlled trial in the WHI was considered highly controversial (Food and Nutrition Board and Board on Health Sciences Policy, 1993).
WHI Trial Design
In 1993 and in this environment of considerable optimism regarding an overall benefit to postmenopausal women, the WHI HT trials were launched.
The final design specified two parallel randomized, double-blind, placebo-controlled trials testing E-alone in women with prior hysterectomy and E+P in women with an intact uterus. The primary objective of each trial was to determine whether HT would reduce the incidence of CHD and provide overall health benefit with respect to chronic disease rates. Postmenopausal women aged 50–79 years were eligible if they were free of any condition with expected survival of less than 3 years and satisfied other criteria related to ability to adhere to a randomized assignment, safety, and competing risk considerations (Women’s Health Initiative Study Group, 1998). A total of 10,739 women were recruited into the trial of E-alone and 16,608 were accrued into the E+P trial. Although observational studies suggested that about a 45% reduction in CHD risk could be achieved with HT, the trial design assumed a 21% reduction, with 81% and 88% power for E-alone and E+P, respectively. The conservatism in the specified effect size was intended to account for the anticipated lack of adherence to study pills, lag time to full intervention effect, loss to follow-up in the trial, and potential anticonservatism in the motivating observational study results (Anderson et al., 2003a). Breast-cancer incidence was defined as the primary safety outcome of the HT trials. The power to detect a 22% increase in breast cancer during the planned duration of the trial was relatively low (46% for E-alone and 54% for E+P), so the protocol indicated that an additional 5 years of follow-up without further intervention would be required to assure 79% and 87% power for E-alone and E+P, respectively. Pooling the two trials was also an option, if the results were sufficiently similar. Additional design considerations have been published (Women’s Health Initiative Study Group, 1998; Anderson et al., 2003a).
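The power statements in this design follow from standard event-driven calculations for time-to-event outcomes. The sketch below uses the usual Schoenfeld approximation for a two-sided log-rank test with equal allocation; the event counts are illustrative round numbers of my own choosing, not the projections actually used in the WHI protocol.

```python
from math import log, sqrt
from scipy.stats import norm

def logrank_power(n_events, hazard_ratio, alpha=0.05):
    """Approximate power of a two-sided log-rank test with 1:1 allocation,
    based on the total number of events (Schoenfeld approximation)."""
    z_alpha = norm.ppf(1 - alpha / 2)
    return norm.cdf(sqrt(n_events) * abs(log(hazard_ratio)) / 2 - z_alpha)

# Illustrative event counts only; a 21% reduction corresponds to HR = 0.79.
for d in (400, 600, 800):
    print(f"{d} CHD events: power to detect HR=0.79 is {logrank_power(d, 0.79):.2f}")
```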
Trial Findings
The independent Data and Safety Monitoring Board terminated the E+P trial after a mean 5.2 years of follow-up, when the breast-cancer statistic exceeded the monitoring boundary defined for establishing this adverse effect; and this statistic was supported by an overall assessment of harms exceeding benefits for the designated outcomes. Reductions in hip-fracture and colorectal-cancer incidence rates were observed, but these were outweighed by increases in the risk of CHD, stroke, and VTE, particularly in the early follow-up period, in addition to the adverse effect on breast cancer. A prespecified global index, devised to assist in benefit versus risk monitoring and defined for each woman as time to the first event for any of the designated clinical events
(CHD, stroke, pulmonary embolism, breast cancer, colorectal cancer, endometrial cancer, hip fractures, or death from other causes), found a 15% increase in the risk of women having one or more of these events (Writing Group for the Women’s Health Initiative Investigators, 2002). The final ‘‘intention-to-treat’’ trial results (mean follow-up 5.6 years, Table 5.1) confirm the interim findings. The 24% increase in breast-cancer incidence (Chlebowski et al., 2003), the 31% increase in risk of stroke
Table 5.1 Hypothesized Effects of HT at the Time the WHI Began and the Final Results of the Two HT Trials

Outcome | Hypothesized effect | E+P HR (95% CI) | E+P AR | E-alone HR (95% CI) | E-alone AR
Coronary heart disease | ↓ | 1.24 (1.00–1.54)a | +6 | 0.95 (0.79–1.15)b | –3
Stroke | ↓ | 1.31 (1.02–1.68)c | +8 | 1.37 (1.09–1.73)d | +12
Pulmonary embolism | ↑ | 2.13 (1.45–3.11)e | +10 | 1.37 (0.90–2.07)f | +4
Venous thromboembolism | ↑ | 2.06 (1.57–2.70)e | +18 | 1.32 (0.99–1.75)f | +8
Breast cancer | ↑ | 1.24 (1.02–1.50)g | +8 | 0.80 (0.62–1.04)h | –6
Colorectal cancer | ↓ | 0.56 (0.38–0.81)i | –7 | 1.12 (0.77–1.55)j | +1
Endometrial cancer | ↔ | 0.81 (0.48–1.36)k | –1 | NA | NA
Hip fractures | ↓ | 0.67 (0.47–0.96)l | –5 | 0.65 (0.45–0.94)m | –7
Total fractures | ↓ | 0.76 (0.69–0.83)l | –47 | 0.71 (0.64–0.80)m | –53
Total mortality | ↓ | 0.98 (0.82–1.18)n | –1 | 1.04 (0.88–1.22)o | +3
Global indexp | ↓ | 1.15 (1.03–1.28)n | +19 | 1.01 (0.91–1.12)o | +2

HR, hazard ratio; CI, confidence interval; AR, attributable risk (additional events per 10,000 person-years).
a From Manson et al. (2003). b From Hsia et al. (2006). c From Wassertheil-Smoller et al. (2003). d From Hendrix et al. (2006). e From Cushman et al. (2004). f From Curb et al. (2006). g From Chlebowski et al. (2003). h From Stefanick et al. (2006). i From Chlebowski et al. (2004). j From Ritenbaugh et al. (2008). k From Anderson et al. (2003b). l From Cauley et al. (2003). m From Jackson et al. (2006). n From Writing Group for the Women’s Health Initiative Investigators (2002). o From Women’s Health Initiative Steering Committee (2004). p Global index defined as time to first event among coronary heart disease, stroke, pulmonary embolism, breast cancer, colorectal cancer, endometrial cancer (E+P only), hip fractures, and death from other causes.
(Wassertheil-Smoller et al., 2003), and the doubling of VTE rates (Cushman et al., 2004) in the E+P group represented attributable risks of 8, 8, and 18 per 10,000 person-years, respectively, in this population. Benefits of seven fewer colorectal cancers (44% reduction) (Chlebowski et al., 2004) and five fewer hip fractures (33% reduction) (Cauley et al., 2003) per 10,000 person-years were also reported. It was the observed 24% increase in CHD risk or six additional events per 10,000 person-years (Manson et al., 2003), however, that was the most surprising and perhaps the most difficult finding to accept. Neither the usual 95% confidence intervals nor the protocol-defined weighted log-rank statistic indicates that this is clearly statistically significant. Nevertheless, even the very conservative adjusted confidence intervals, which controlled for the multiple testing, ruled out the level of protection described by the previous observational studies as well as the conservative projection for CHD benefit used in the trial design (Anderson et al., 2007). The results of the E-alone trial, stopped by the National Institutes of Health approximately 18 months later, provided a different profile of risks and benefits (Women’s Health Initiative Steering Committee, 2004). The final results (Table 5.1), based on an average of 7.1 years of follow-up, reveal an increased risk of stroke with E-alone of similar magnitude to that observed with E+P (Hendrix et al., 2006) but no effect on CHD rates (Hsia et al., 2006). E-alone appeared to increase the risk of VTE events (Curb et al., 2006) but to a lesser extent than was observed with E+P. The E-alone hazard ratios for hip, vertebral, and other fractures were comparable to those for E+P (Jackson et al., 2006). Most surprising of the E-alone findings was the estimated 23% reduction in breast-cancer rates, which narrowly missed being statistically significant (Stefanick et al., 2006), in contrast to the increased risk seen in a large number of observational studies and the E+P trial. E-alone had no effect on colorectal-cancer risk (Ritenbaugh et al., 2008), another finding that differed from previous studies and the E+P trial. The hazard ratios for total mortality and the global index were close to one, indicating an overall balance in the number of women randomized to E-alone or to placebo who died or experienced one or more of these designated health outcomes (Women’s Health Initiative Steering Committee, 2004).
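The attributable risks quoted alongside these hazard ratios are simple arithmetic: the excess rate is roughly the comparison-group rate multiplied by (HR - 1). The sketch below assumes a round placebo-group CHD rate of 30 events per 10,000 person-years purely for illustration (it is not the observed WHI rate); with the reported HR of 1.24, this yields roughly 7 excess events per 10,000 person-years, the same order of magnitude as the attributable risks shown in Table 5.1.

```python
def excess_events_per_10k_py(placebo_rate_per_10k_py, hazard_ratio):
    """Approximate excess events per 10,000 person-years implied by a hazard
    ratio, treating it as a rate ratio over the follow-up period."""
    return placebo_rate_per_10k_py * (hazard_ratio - 1)

# Assumed (illustrative) placebo CHD rate of 30 per 10,000 person-years; HR 1.24 for E+P.
print(f"excess CHD events per 10,000 person-years: {excess_events_per_10k_py(30, 1.24):.1f}")
```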
Contrasting the WHI CT and OS
To better understand the divergent findings and, if possible, to bring the two types of studies into agreement, WHI investigators conducted a series of analyses examining cardiovascular outcomes in the CT and OS data jointly
(Prentice et al., 2005, 2006). The parallel recruitment and follow-up procedures in the OS and CT components of the WHI make this a particularly interesting exercise since differences in data sources and collection protocols are minimized. For both E+P and E-alone, the analogous user and nonuser groups from the OS were selected for both HT trials. Specifically, for the E+P analyses, OS women with a uterus who were using an estrogen plus progestin combination or were not using any HT at baseline were defined as the exposed (n = 17,503) and unexposed (n = 35,551) groups, respectively (Prentice et al., 2005). Similarly, for E-alone analyses, 21,902 estrogen users and 21,902 nonusers of HT in OS participants reported a prior hysterectomy at baseline (Prentice et al., 2006). Failure times were defined as time since study enrollment (OS) or randomization (CT). In the CT, follow-up was censored at the time each intervention was stopped. In the OS, censoring was applied at a time chosen to give a similar average follow-up time (5.5 years for OS/E+P and 7.1 years for OS/E-alone). For CT participants, HT exposure was defined by randomization and analyses were based on the intention-to-treat principle. In parallel, OS participants’ HT exposure was defined by HT use at the time of study enrollment. In OS women, the ratio of age-adjusted event rates in E+P users to that in nonusers was less than one for CHD (0.71) and stroke (0.77) and close to one for VTE (1.06), but each was 40%–50% lower than the corresponding statistics from the randomized trial (Table 5.2, upper panel) and therefore similar to the motivating observational studies. For E-alone, the corresponding ratios were all less than one (0.68 for CHD, 0.95 for stroke, and 0.78 for VTE) and 30%–40% lower than the CT estimates (Table 5.2, lower panel). The cardiovascular risk profile (race/ethnicity, education, income, body mass index [BMI], physical activity, current smoking status, history of cardiovascular disease, and quality of life) among E+P users in the OS was somewhat better than that for OS nonusers (examples of these shown in Figure 5.1). The distribution of these risk factors in the CT was balanced across treatment arms but resembled that of the OS nonuser population more than the corresponding HT user group. A similar pattern of healthy user bias was observed for E-alone among OS participants. Aspects of HT exposure also varied between the CT and OS. Among HT users in the OS, the prevalence of long-term use, defined here as the preenrollment exposure duration for the HT regimen reported at baseline, was considerably higher than in the CT (Figure 5.2); but few were recent initiators of HT in the OS. In the CT, most participants had never used HT before or had used it only briefly. In terms of both duration and recency of each regimen, the distributions in the CT more closely resembled those of the OS nonusers (Prentice et al., 2005, 2006).
Table 5.2 Hormone Therapy Hazard Ratios (95% Confidence Intervals) for CHD, Stroke, and VTE Estimated Separately in the WHI CT and OS and Jointly with a Ratio Measure of Agreement (OS/CT) Between the Two Study Components

Estrogen + progestina
 | CHD: CT | CHD: OS | CHD: OS/CT | Stroke: CT | Stroke: OS | Stroke: OS/CT | VTE: CT | VTE: OS | VTE: OS/CT
Age-adjusted | 1.21 | 0.71 | 0.59 | 1.33 | 0.77 | 0.58 | 2.10 | 1.06 | 0.50
Multivariate adjustedb | 1.27 | 0.87 | 0.69 | 1.21 | 0.86 | 0.71 | 2.13 | 1.31 | 0.62
By time since initiation:
<2 years | 1.68 | 1.12 | 0.67 | 1.15 | 2.10 | 1.83 | 3.10 | 2.37 | 0.76
2–5 years | 1.25 | 1.05 | 0.84 | 1.49 | 0.48 | 0.32 | 1.89 | 1.52 | 0.80
5+ years | 0.66 | 0.83 | 1.26 | 0.74 | 0.89 | 1.20 | 1.31 | 1.24 | 0.95
Combined OS and CT analyses:
 | CHD | Stroke | VTE
Combined OS/CT | 0.93 | 0.76 | 0.84
<2 years | 1.58 (1.12–2.24) | 1.41 (0.90–2.22) | 3.02 (1.94–4.69)
2–5 years | 1.19 (0.87–1.63) | 1.14 (0.82–1.59) | 1.85 (1.30–2.65)
5+ years | 0.86 (0.59–1.26) | 1.12 (0.73–1.72) | 1.47 (0.96–2.24)

Estrogen alonec
 | CHD: CT | CHD: OS | CHD: OS/CT | Stroke: CT | Stroke: OS | Stroke: OS/CT | VTE: CT | VTE: OS | VTE: OS/CT
Age-adjusted | 0.96 | 0.68 | 0.71 | 1.37 | 0.95 | 0.69 | 1.33 | 0.78 | 0.59
Multivariate adjusted | 0.97 | 0.74 | 0.77 | 1.35 | 1.00 | 0.74 | 1.39 | 0.88 | 0.63
By time since initiation:
<2 years | 1.07 | 1.20 | 1.12 | 1.69 | 0.37 | 0.22 | 2.36 | 1.48 | 0.63
2–5 years | 1.13 | 1.09 | 0.96 | 1.14 | 0.89 | 0.78 | 1.31 | 0.91 | 0.69
5+ years | 0.80 | 0.73 | 0.91 | 1.41 | 1.01 | 0.72 | 1.16 | 0.85 | 0.73
Combined OS and CT analyses:
 | CHD | Stroke | VTE
Combined OS/CT | 0.89 | 0.68 | 0.82
<2 years | 1.11 (0.73–1.69) | 1.48 (0.89–2.44) | 2.18 (1.15–4.13)
2–5 years | 1.17 (0.88–1.56) | 1.18 (0.83–1.67) | 1.22 (0.80–1.85)
5+ years | 0.81 (0.62–1.06) | 1.48 (1.06–2.06) | 1.06 (0.72–1.56)

CHD, coronary heart disease; VTE, venous thromboembolism; CT, clinical trial; OS, observational study.
a From Prentice et al. (2005). b Adjusted for age, race, body mass index, education, smoking status, age at menopause, and physical functioning. Hazard ratios accompanied by 95% confidence intervals in combined OS and CT analyses. c From Prentice et al. (2006).
In the trials, the HT tested was conjugated equine estrogens (0.625 mg/day) with or without medroxyprogesterone acetate (2.5 mg/day). OS women had access to a broader range of regimens, including different formulations, doses, and routes of administration; but the majority of HT use reported
[Figure 5.1 Distribution of selected cardiovascular risk factors (race/ethnicity, body mass index, education, and smoking status) by hysterectomy status, study component, and hormone use at baseline in the Observational Study (OS) or randomization assignment in the Clinical Trial (CT). E+P, estrogen plus progestin; E-alone, estrogen alone. Derived from Prentice et al., 2005, 2006.]
[Figure 5.2 Hormone therapy exposure history (duration and recency of prior E+P and E-alone use) by hysterectomy status, study component, and hormone use at baseline in the Observational Study (OS) or randomization assignment in the Clinical Trial (CT). E+P, estrogen plus progestin; E-alone, estrogen alone. Derived from Prentice et al., 2005, 2006.]
was of the same dose of the same oral estrogens and progestin as used in the CT, with another substantial fraction taking different doses of the same preparation (Prentice et al., 2005, 2006), suggesting that differences in the medications used are an unlikely source of the discrepant results.
Joint Analyses of the CT and OS

To determine the extent to which traditional confounders could explain the differences in HT effects on CHD, stroke, and VTE between each trial and the corresponding OS sample, a Cox regression model was fit for each HT regimen and each outcome. Because age is such a strong predictor of disease, the model incorporated separate baseline disease incidence rates for each 5-year age group as well as a linear predictor of age to control for any residual confounding. For similar reasons, BMI was modeled in four levels as well as a linear term. Age at menopause, education, current smoking status, and baseline physical functioning were also included. These multivariate adjusted results moved the CT and OS results somewhat closer together (Table 5.2), but the ratios of hazard ratios in the OS and CT were still significantly less than one (the OS/CT ratios for these three disease end points ranged from 0.62 to 0.71 for E+P and from 0.63 to 0.77 for E-alone).

These analyses rely on the common assumption of proportional hazards, that is, that the effect of HT is to multiply the nonuser event rate by a constant factor over time. Allowing the E+P hazard ratios to depend on time since initiation of the current HT episode (defined by the intervals <2 years, 2–5 years, and 5+ years since randomization in the CT and since the woman started her current HT regimen in the OS) improved the agreement between the OS and CT hazard ratios for all three cardiovascular disease outcomes, although a nearly statistically significant difference between the studies remained for stroke (Prentice et al., 2005). In these analyses, the pattern of early risk for CHD, stroke, and VTE that attenuates over time is found in the OS as well as the CT (Table 5.2). The relative lack of information on effects during early exposure in the OS, and a corresponding limitation in the CT on longer-term use, produce rather imprecise estimates in these intervals.

A combined analysis that allows HT effects to vary over time but assumes a proportional difference between HT effects in the two study components (i.e., the ratio of HT hazard ratios in the OS to that in the CT is constant across time intervals) capitalizes on the strengths of both studies. For this purpose, a model was specified to estimate a single HT hazard ratio for each specified time interval and a single OS/CT ratio to describe the level of agreement between the two studies.
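As a purely schematic illustration of the kind of combined model just described (interval-specific HT hazard ratios plus a single multiplicative OS/CT agreement factor), one might fit a Cox model to pooled CT and OS follow-up in counting-process form, for example with the lifelines package. The data set, column names, and the omission of the age-stratified baseline rates and other covariate adjustments are all simplifying assumptions on our part; this is not the published WHI analysis.

```python
# Hypothetical pooled CT+OS data set `long_df`, one row per woman per
# time-since-initiation interval, with columns: id, start, stop, event,
# ht (1 = HT user / randomized to HT), os (1 = OS, 0 = CT), and interval
# indicators int_lt2, int_2to5, int_5plus.
from lifelines import CoxTimeVaryingFitter

long_df["ht_lt2"]   = long_df["ht"] * long_df["int_lt2"]
long_df["ht_2to5"]  = long_df["ht"] * long_df["int_2to5"]
long_df["ht_5plus"] = long_df["ht"] * long_df["int_5plus"]
long_df["ht_os"]    = long_df["ht"] * long_df["os"]   # exp(coef) ~ OS/CT agreement ratio

ctv = CoxTimeVaryingFitter()
ctv.fit(
    long_df[["id", "start", "stop", "event", "os",
             "ht_lt2", "ht_2to5", "ht_5plus", "ht_os"]],
    id_col="id", event_col="event", start_col="start", stop_col="stop",
)
ctv.print_summary()   # exp(coef) for ht_lt2/ht_2to5/ht_5plus: interval-specific HRs
```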
In combined analyses for E+P (Table 5.2, upper panel), the estimates of the OS/CT ratios were not significantly different from one for CHD and VTE (0.93 and 0.84, respectively), suggesting that this model provided reasonable agreement between the two studies. For stroke there is evidence of residual bias (OS/CT = 0.76). These combined analyses describe a pattern of early risks for all three of these cardiovascular disease outcomes and continuing risk for VTE.

Adherence-adjusted versions of these analyses tended to yield hazard ratios with somewhat greater departures from unity than those shown in Tables 5.1 and 5.2 but had little effect on comparative hazard ratios between the CT and OS. Those analyses involved a particularly simple form of adherence adjustment, with follow-up time censored six months after a change in status from HT user to nonuser or vice versa. Inverse adherence-probability-weighted estimating procedures (e.g., Robbins & Finkelstein, 2000) can also be recommended for this problem.

For E-alone, the joint analysis produced excellent agreement for CHD (the ratio of E-alone hazard ratios in the OS to that in the CT was OS/CT = 0.89) and some improvement in the alignment for VTE (OS/CT = 0.82) (Table 5.2, lower panel). The effects of E-alone on stroke risk still appeared to differ between the OS and CT (OS/CT = 0.68). These OS/CT ratios were very similar to those for E+P, but the pattern of E-alone effects was somewhat different. These combined analyses provide evidence of an early increased risk for both stroke and VTE, with the adverse effect on stroke rates continuing beyond year 5, but no significant effects on CHD.

The sources of the discrepancies between the estimated hormone effects on breast cancer and colorectal cancer in the two trials and in the motivating literature can be understood through similar, recently published comparisons of the OS and CT cohorts. In the E+P trial, the elevated risk of breast cancer was generally expected, but the estimated effect size was considerably smaller than other investigators have reported from observational studies (e.g., Million Women Study Collaborators, 2003). The fact that the E-alone hazard ratio for breast cancer was less than one, though not statistically significant, presents another puzzle in that estrogen has long been considered to have a carcinogenic effect in the breast. In joint OS/CT analyses, the discrepancy between the trials and the OS regarding HT effects on breast-cancer risk could be accounted for only by modeling the effects of the time between menopause and first HT use as well as the time since initiation of the current HT episode, mammography use, and traditional confounding factors (Prentice et al., 2008a, 2008b). The contrasts for colorectal cancer are similarly interesting in that a protective effect was observed with E+P but not with E-alone, even though the literature had identified a potential beneficial role for estrogen.
Joint OS/CT analyses suggest no clear effect of either HT preparation (Prentice et al., 2009).
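The adherence adjustment mentioned above simply censored follow-up shortly after a change in HT status; the inverse-probability-of-censoring-weighted alternative cited there instead reweights the person-time that remains uncensored. A minimal sketch, with a hypothetical long-format data set and column names and a single logistic model for the censoring process, is given below; it is meant only to convey the idea, not to reproduce any WHI analysis.

```python
# Inverse-probability-of-censoring weights (in the spirit of the cited
# approach).  `pt` is a hypothetical long-format DataFrame with one row per
# woman per follow-up interval; `censored` is 1 if the row is artificially
# censored for non-adherence in that interval.
from sklearn.linear_model import LogisticRegression

covs = ["age", "bmi", "smoker", "prior_cvd"]          # assumed covariates
fit = LogisticRegression(max_iter=1000).fit(pt[covs], pt["censored"])
p_uncens = 1.0 - fit.predict_proba(pt[covs])[:, 1]    # P(remain uncensored | covariates)

pt["w"] = 1.0 / p_uncens
pt["ipcw"] = pt.groupby("id")["w"].cumprod()          # unstabilized cumulative weight
# A weighted Cox (or pooled logistic) model is then fit to the rows that
# remain uncensored, using `ipcw` as the weight.
```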
Extending the CT Results

The relative success in identifying the statistical adjustments that helped to reconcile the OS and CT results within the WHI allows one to consider additional analyses that would be impossible without a new trial or would be severely underpowered in the CT alone. For example, if biases in the OS are similar across all HTs, one could apply the models described above to data for other hormone regimens observed in the OS and thereby obtain more reliable estimates of their effects. To test this within the WHI, Prentice et al. (2006) used the CT and OS data for one HT regimen (e.g., E+P) to adjust the OS data for the alternate therapy (E-alone) and vice versa. The first adjustment simply applies the OS/CT ratio from one HT to the OS data for the other HT preparation. A second adjustment incorporates an additional proportional effect between the two HT preparations. In Figure 5.3, the simple age-adjusted results as well as the two levels of adjusted results from the OS are displayed alongside the parallel CT findings, showing that these adjustments generally tend to improve the OS estimates.

A second possible use of the joint analyses is to improve the precision of subgroup analyses. In this regard, interest has focused on the youngest women (aged 50–59) because they are more likely to seek treatment for vasomotor symptoms; hence, the observational studies that motivated the trials primarily involved women who initiated therapy during this time of life. Further, some argue that early intervention in atherosclerosis may be helpful, whereas later maneuvers could be detrimental. For E-alone, the CT data suggest that women aged 50–59 may experience some reduction in CHD risk and in the risk of any of the designated events in the global index (Women's Health Initiative Steering Committee, 2004; Hsia et al., 2006). There was no evidence of a reduction in CHD risk for the younger women in the E+P trial, however (Manson et al., 2003). A recent exploratory analysis that pooled the data from the two trials suggests that the adverse CHD effects may be a function of time since menopause, with women beginning HT soon after menopause having a reduced risk of CHD (Rossouw et al., 2007).

To examine hormone effects on CHD in the younger women with greater precision, Prentice et al. (2006) conducted similar joint OS/CT analyses for CHD among women aged 50–59 years at baseline. For E+P, these analyses reflect the same general pattern of an early increased risk with E+P that diminished over time. For E-alone, however, the combined
[Figure 5.3 HT hazard ratios in the Observational Study based on a simple multivariate model (OS), with adjustment for the OS/CT hazard ratio estimated from the alternate trial (Adjustment 1), and assuming proportional hazards for E-alone to E+P (Adjustment 2), compared to the corresponding Clinical Trial hazard ratios (CT). Panels show E+P and E-alone hazard ratios for CHD, stroke, and VTE by time since initiation (<2, 2–5, and >5 years). Derived from Prentice et al., 2005, 2006.]
OS/CT analyses suggest a reduced risk of CHD among these younger women with prior hysterectomy.
Discussion

The stark contrasts between the results from a large number of observational studies and the WHI randomized trials of menopausal HT provide impetus for reflection on the role of observational studies in evaluating therapies.
Despite the usual effort to control for potential confounders in most previous observational studies, the replication of findings of CHD benefit and breast-cancer risk with HT across different study populations and study designs, and support from mechanistic studies, clinically relevant aspects of the relationship between HT and risk for several chronic diseases were not appreciated until the WHI randomized trial results were published. The reliance on lower-level evidence may have exposed millions of women to small increases in risk of several serious adverse effects.

Randomized trials have their own limitations. In this example, the WHI HT trials tested two specific regimens in a population considered appropriate for CHD prevention. As many have claimed, the trial design did not fully reflect the way HT had been used in practice: prescribed near the time of menopause, with possible tailoring of the regimen to the individual. Also, while the WHI tested HT in the largest number of women in the 50–59 year age range ever studied, using the same agents and dosages used by the vast majority of U.S. women, estimates of HT effects within this subgroup remain imprecise because of the very low event rate.

This example raises many questions with regard to the public-health research enterprise. When is it reasonable to rely on second-tier evidence to test a hypothesis? Are there better methods to test these hypotheses? Can we learn more from our trials, and can we use this to make observational studies more reliable? There are insufficient resources to conduct full-scale randomized trials of the numerous hypotheses of interest in public health and clinical medicine. Observational studies will remain a mainstay of our research portfolio, but methods to increase the reliability of observational study results, through better designs and analytic tools, are clearly needed.

Nevertheless, when assessing an intervention of public-health significance, the WHI experience suggests that the evaluation needs to be anchored in a randomized trial. It seems highly unlikely that the importance of the time-dependent effect of HT on cardiovascular disease would have been recognized without the Heart Estrogen–Progestin Replacement Study (Hulley et al., 1998) and the WHI randomized trials. Neither observational studies conducted before WHI nor the WHI OS itself would have observed these early adverse cardiovascular disease effects without the direction from the trials to look for them.

The statistical alignment of the OS and CT results relied on several other factors. Detailed information on the history of HT use, an extensive database of potential confounders, and meticulous modeling of these factors were critical. For an exposure that is more complex, such as dietary intake or physical activity, the measurement problems are likely too great to permit such an approach. Less obvious but probably at least as important was the
uncommon feature of WHI in having both the randomized trials and an OS conducted in parallel, minimizing methodologic differences in outcome ascertainment, data collection, and some aspects of the study population. Such a study design has rarely been used but could be particularly advantageous if there are multiple related therapies already in use but the resources are available to test only one in a full-scale design.

The work of Prentice and colleagues (2005, 2006, 2008a, 2008b, 2009) provides important examples of methods to leverage the information from clinical trials in the presence of a parallel OS. The exercises in which adjustments derived from the joint OS and CT analysis of one HT were applied to OS results for a related therapy suggest that it may be possible to evaluate one intervention in a rigorous trial setting and expand the inference to similar interventions in OS data. The joint analyses of CT and OS data to strengthen subgroup analyses would have almost universal appeal. Additional effort to define the requirements and assumptions in these designs and analyses would be helpful.

In summary, the WHI provides an important example of the weakness of observational study data, some limitations of randomized trials, and an approach to combining the two to produce more reliable inference.
References Adams, M. R., Kaplan, J. R., Manuck, S. B., Koritinik, D. R., Parks, J. S., Wolfe, M. S., et al. (1990). Inhibition of coronary artery atherosclerosis by 17-b estradiol in ovariectomized monkeys: Lack of an effect of added progesterone. Arteriosclerosis, 10, 1051–1057. Anderson, G. L., Manson, J. E., Wallace, R., Lund, B., Hall, D., Davis, S., et al. (2003a). Implementation of the WHI design. Annals of Epidemiology, 13, S5–S17. Anderson, G. L., Judd, H. L., Kaunitz, A. M., Barad, D. H., Beresford, S. A. A., Pettinger, M., et al. (2003b). Effects of estrogen plus progestin on gynecologic cancers and associated diagnostic procedures: The Women’s Health Initiative randomized trial. Journal of the American Medical Association, 290(13), 1739–1748. Anderson, G. L., Kooperberg, C., Geller, N., Rossouw, J. E., Pettinger, M., & Prentice, R. L. (2007). Monitoring and reporting of the Women’s Health Initiative randomized hormone therapy trials. Clinical Trials, 4, 207–217. Barrett-Connor, E., & Grady, D. (1998). Hormone replacement therapy, heart disease and other considerations. Annual Review of Public Health, 19, 55–72. Bush, T. L., Barrett-Connor, E., Cowan, L. D., Criqui, M. H., Wallace, R. B., Suchindran, C. M., et al. (1987). Cardiovascular mortality and noncontraceptive use of estrogen in women: Results from the Lipid Research Clinics Program Follow-up Study. Circulation, 75, 1102–1109. Cauley, J. A., Robbins, J., Chen, Z., Cummings, S. R., Jackson, R. D., LaCroix, A. Z., et al. (2003). Effects of estrogen plus progestin on risk of fracture and bone mineral density: The Women’s Health Initiative randomized trial. Journal of the American Medical Association, 290, 1729–1738.
Chlebowski, R. T., Hendrix, S. L., Langer, R. D., Stefanick, M. L., Gass, M., Lane, D., et al. (2003). Influence of estrogen plus progestin on breast cancer and mammography in healthy postmenopausal women: The Women’s Health Initiative randomized trial. Journal of the American Medical Association, 289, 3243–3253. Chlebowski, R. T., Wactawski-Wende, J., Ritenbaugh, C., Hubbell, F. A., Ascensao, J., Rodabough, R. J., et al. (2004). Estrogen plus progestin and colorectal cancer in postmenopausal women. New England Journal of Medicine, 350, 991–1004. Clarkson, T. B., Anthony, M. S., & Klein, K. P. (1996). Hormone replacement therapy and coronary artery atherosclerosis: The monkey model. British Journal of Obstetrics and Gynaecology, 103(Suppl. 13), 53–58. Curb, J. D., Prentice, R. L., Bray, P. F., Langer, R. D., Van Horn, L., Barnabei, V. M., et al. (2006). Venous thrombosis and conjugated equine estrogen in women without a uterus. Archives of Internal Medicine, 166, 772–780. Cushman, M., Kuller, L. H., Prentice, R., Rodabough, R. J., Psaty, B. M., Stafford, R. S., et al. (2004). Estrogen plus progestin and risk of venous thrombosis. Journal of the American Medical Association, 292, 1573–1580. Food and Nutrition Board and Board on Health Sciences Policy (1993). An assessment of the NIH Women’s Health Initiative. S. Thaul and D. Hotra (Eds.). Washington, DC: National Academy Press. Grady, D., Rubin, S. M., Pettiti, D. B., Fox, C. S., Black, D, Ettinger, B., et al. (1992). Hormone therapy to prevent disease and prolong life in postmenopausal women. Annals of Internal Medicine, 117, 1016–1036. Hays, J., Hunt, J. R., Hubbell, F. A., Anderson, G. L., Limacher, M., Allen, C., et al. (2003). The Women’s Health Initiative recruitment methods and results. Annals of Epidemiology, 13, S18–S77. Hendrix, S. L., Wassertheil-Smoller, S., Johnson, K. C., Howard, B. V., Kooperberg, C., Rossouw, J. E., et al. (2006). Effects of conjugated equine estrogen on stroke in the Women’s Health Initiative. Circulation, 113, 2425–2434. Hersh, A. L., Stefanick, M., & Stafford, R. S. (2004). National use of postmenopausal hormone therapy. Journal of the American Medical Association, 291, 47–53. Hough, J. L., & Zilversmit, D. B. (1986). Effect of 17-b estradiol on aortic cholesterol content and metabolism in cholesterol-fed rabits. Arteriosclerosis, 6, 57–64. Hsia, J., Langer, R. D., Manson, J. E., Kuller, L., Johnson, K. C., Hendrix, S. L., et al. (2006). Conjugated equine estrogens and coronary heart disease: The Women’s Health Initiative. Archives of Internal Medicine, 166, 357–365. Hulley, S., Grady, D., Bush, T., Furberg, C., Herrington, D., Riggs, B., et al. (1998). Randomized trial of estrogen plus progestin for secondary prevention of coronary heart disease in postmenopausal women. Journal of the American Medical Association, 280, 605–613. Jackson, R. D., Wactawski-Wende, J., LaCroix, A. Z., Pettinger, M., Yood, R. A., Watts, N. B., et al. (2006). Effects of conjugated equine estrogen on risk of fractures and BMD in postmenopausal women with hysterectomy: Results from the Women’s Health Initiative randomized trial. Journal of Bone and Mineral Research, 21, 817–828. Manson, J. E., Hsia, J., Johnson, K. C., Rossouw, J. E., Assaf, A. R., Lasser, N. L., et al. (2003). Estrogen plus progestin and the risk of coronary heart disease. New England Journal of Medicine, 349, 523–534. Million Women Study Collaborators (2003). Breast cancer and hormone replacement therapy in the Million Women Study. Lancet, 362, 419–427.
Pick, R., Stamler, J., Robard, S., & Katz, L. N. (1952). The inhibition of coronary atherosclerosis by estrogens in cholesterol-fed chicks. Circulation, 6, 276–280. Prentice, R. L., Langer, R., Stefanick, M., Howard, B., Pettinger, M., Anderson, G., et al. (2005). Combined postmenopausal hormone therapy and cardiovascular disease: Toward resolving the discrepancy between Women’s Health Initiative clinical trial and observational study results. American Journal of Epidemiology, 162, 404–414. Prentice, R. L., Langer, R., Stefanick, M., Howard, B., Pettinger, M., Anderson, G., et al. (2006). Combined analysis of Women’s Health Initiative observational and clinical trial data on postmenopausal hormone treatment and cardiovascular disease. American Journal of Epidemiology, 163, 589–599. Prentice, R. L., Chlebowski, R. T., Stefanick, M. L., Manson, J. E., Pettinger, M., Hendrix, S. L., et al. (2008a). Estrogen plus progestin therapy and breast cancer in recently postmenopausal women. American Journal of Epidemiology, 167, 1207–1216. Prentice, R. L., Chlebowski, R. T., Stefanick, M. L., Manson, J. E., Langer, R. D., Pettinger, M., et al. (2008b). Conjugated equine estrogens and breast cancer risk in the Women’s Health Initiative clinical trial and observational study. American Journal of Epidemiology, 167, 1407–1415. Prentice, R. L., Pettinger, M., Beresford, S. A., Wactawski-Wende, J., Hubbell, F. A., Stefanick, M. L., et al. (2009). Colorectal cancer in relation to postmenopausal estrogen and estrogen plus progestin in the Women’s Health Initiative clinical trial and observational study. Cancer Epidemiology, Biomarkers and Prevention, 18, 1531–1537. Ritenbaugh, C., Stanford, J. L., Wu, L., Shikany, J. M., Schoen, R. E., Stefanick, M. L., et al. (2008). Conjugated equine estrogens and colorectal cancer incidence and survival: The Women’s Health Initiative randomized clinical trial. Cancer Epidemiology, Biomarkers and Prevention, 17, 2609–2618. Robbins, J., & Finklestein, D. (2000). Correcting for non-compliance and dependent censoring in an AIDS clinical trial with inverse probability of censoring weighted (IPCW) log-rank tests. Biometrics, 56, 779–788. Rossouw, J. E., Prentice, R. L., Manson, J. E., Wu, L., Barad, D., Barnabei, V. M., et al. (2007). Postmenopausal hormone therapy and risk of cardiovascular disease by age and years since menopause. Journal of the American Medical Association, 297, 1465–1477. Stampfer, M. J., & Colditz, G. A. (1991). Estrogen replacement therapy and coronary heart disease: A quantitative assessment of the epidemiologic evidence. Preventive Medicine, 20, 47–63. Stefanick, M. L., Anderson, G. L., Margolis, K. L., Hendrix, S. L., Rodabough, R. J., Paskett, E. D., et al. (2006). Effects of conjugated equine estrogens on breast cancer and mammography screening in postmenopausal women with hysterectomy. Journal of the American Medical Association, 295, 1647–1657. Steinberg, K. K., Thacker, S. B., Smith, S. J., Stroup, D. F., Zack, M. M., Flanders, W. D., et al. (1991). A meta-analysis of the effect of estrogen replacement therapy on the risk of breast cancer. Journal of the American Medical Association, 265, 1985–1990. Wassertheil-Smoller, S., Hendrix, S. L., Limacher, M., Heiss, G., Kooperberg, C., Baird, A., et al. (2003). Effect of estrogen plus progestin on stroke in postmenopausal women: The Women’s Health Initiative: a randomized trial. Journal of the American Medical Association, 289, 2673–2684.
Women’s Health Initiative Steering Committee (2004). Effects of conjugated equine estrogen in postmenopausal women with hysterectomy: The Women’s Health Initiative randomized controlled trial. Journal of the American Medical Association, 291, 1701–1712. Women’s Health Initiative Study Group (1998). Design of the Women’s Health Initiative clinical trial and observational study. Controlled Clinical Trials, 19, 61–109. Writing Group for the Women’s Health Initiative Investigators (2002). Risks and benefits of estrogen plus progestin in healthy postmenopausal women: Principal results from the Women’s Health Initiative randomized controlled trial. Journal of the American Medical Association, 288, 321–333. Wysowski, D. K., Golden, L., & Burke, L. (1995). Use of menopausal estrogens and medroxyprogesterone in the United States, 1982–1992. Obstetrics and Gynecology, 85, 6–10.
part ii Innovations in Methods
6 Alternative Graphical Causal Models and the Identification of Direct Effects james m. robins and thomas s. richardson
Introduction

The subject-specific data from either an observational or experimental study consist of a string of numbers. These numbers represent a series of empirical measurements. Calculations are performed on these strings and causal inferences are drawn. For example, an investigator might conclude that the analysis provides strong evidence for "both an indirect effect of cigarette smoking on coronary artery disease through its effect on blood pressure and a direct effect not mediated by blood pressure." The nature of the relationship between the sentence expressing these causal conclusions and the statistical computer calculations performed on the strings of numbers has been obscure. Since the computer algorithms are well-defined mathematical objects, it is crucial to provide formal causal models for the English sentences expressing the investigator's causal inferences. In this chapter we restrict ourselves to causal models that can be represented by a directed acyclic graph.

There are two common approaches to the construction of causal models. The first approach posits unobserved fixed 'potential' or 'counterfactual' outcomes for each unit under different possible joint treatments or exposures. The second approach relates the population distribution of outcomes under experimental interventions (with full compliance) to the set of (conditional) distributions that would be observed under passive observation (i.e., from observational data). We will refer to the former as 'counterfactual' causal models and to the latter as 'agnostic' causal models (Spirtes, Glymour, & Scheines, 1993), as the second approach is agnostic as to whether unit-specific counterfactual outcomes exist, be they fixed or stochastic.

The primary difference between the two approaches is ontological: The counterfactual approach assumes that counterfactual variables exist, while the
agnostic approach does not require this. In fact, the counterfactual theory logically subsumes the agnostic theory: the counterfactual approach is an extension of the agnostic approach. In particular, for a given graph the causal contrasts (i.e., parameters) that are well-defined under the agnostic approach are also well-defined under the counterfactual approach. This set of contrasts corresponds to the set of contrasts between treatment regimes (strategies) which could be implemented in an experiment with sequential treatment assignments (ideal interventions), wherein the treatment given at stage m is a (possibly random) function of past covariates on the graph. We refer to such contrasts or parameters as 'manipulable with respect to a given graph'. As discussed further in Section 1.8, the set of manipulable contrasts for a given graph is identified under the associated agnostic causal model from observational data with a positive joint distribution and no hidden (i.e., unmeasured) variables. A parameter is said to be identified if it can be expressed as a known function of the distribution of the observed data. A discrete joint distribution is positive if the probability of a joint event is nonzero whenever the marginal probability of each individual component of the event is nonzero.

Although the agnostic theory is contained within the counterfactual theory, the reverse does not hold. There are causal contrasts that are well-defined within the counterfactual approach that have no direct analog within the agnostic approach. An example that we shall discuss in detail is the pure direct effect (also known as a natural direct effect) introduced in Robins and Greenland (1992). The pure direct effect (PDE) of a binary treatment X on Y relative to an intermediate variable Z is the effect the treatment X would have had on Y had (contrary to fact) the effect of X on Z been blocked. The PDE is non-manipulable relative to X, Y and Z in the sense that, in the absence of additional assumptions, the PDE does not correspond to a contrast between treatment regimes of any randomized experiment performed via interventions on X, Y and Z.

In this chapter, we discuss three counterfactual models, all of which agree in two important respects: first, they agree on the set of well-defined causal contrasts; second, they make the consistency assumption that the effect of a (possibly joint) treatment on a given subject depends neither on whether the treatment was freely chosen by, versus forced on, the subject nor on the treatments received by other subjects. However, the counterfactual models do not agree as to the subset of these contrasts that can be identified from observational data with a positive joint distribution and no hidden variables. Identifiability of causal contrasts in counterfactual models is obtained by assuming that (possibly conditional on prior history) the treatment received at a given time is independent of some set of counterfactual outcomes. Different versions of this independence assumption are
possible: The stronger the assumption (i.e., the more counterfactuals assumed independent of treatment), the more causal contrasts that are identified. For a given graph, G, all the counterfactual models we shall consider identify the set of contrasts identified under the agnostic model for G. We refer to this set of contrasts as the manipulable contrasts relative to G. Among the counterfactual models we discuss, the model derived from the non-parametric structural equation model (NPSEM) of Pearl (2000) makes the strongest independence assumption; indeed, the assumption is sufficiently strong that the PDE may be identified (Pearl, 2001). In contrast, under the weaker independence assumption of the Finest Fully Randomized Causally Interpretable Structured Tree Graph (FFRCISTG) counterfactual model of Robins (1986) or the Minimal Counterfactual Model (MCM) introduced in this chapter, the PDE is not identified. The MCM is the weakest counterfactual model (i.e., contains the largest set of distributions over counterfactuals) that satisfies the consistency assumption and identifies the set of manipulable contrasts based on observational data with a positive joint distribution and no hidden variables. The MCM is equivalent to the FFRCISTG model when all variables are binary. Otherwise the MCM is obtained by a mild further weakening of the FFRCISTG independence assumption.

The identification of the non-manipulable PDE parameter under an NPSEM appears to violate the slogan "no causation without manipulation." Indeed, Pearl (2010) has recently advocated the alternative slogan "causation before manipulation" in arguing for the ontological primacy of causation relative to manipulation. Such an ontological primacy follows, for instance, from the philosophical position that all dependence between counterfactuals associated with different variables is due to the effects of common causes (that are to be included as variables in the model and on the associated graph, G), thus privileging the NPSEM over other counterfactual models. Pearl (2010) privileges the NPSEM over other models but presents different philosophical arguments for his position.

Pearl's view is anathema to those with a refutationist view of causality (e.g., Dawid (2000)) who argue that a theory that allows identification of non-manipulable parameters (relative to a graph G) is not a scientific theory because some of its predictions (e.g., that the PDE is a particular function of the distribution of the observed data) are not experimentally testable and, thus, are non-refutable. Indeed, in Appendix B, we give an example of a data-generating process that satisfies the assumptions of an FFRCISTG model but not those of an NPSEM such that (i) the NPSEM prediction for the PDE is false but (ii) the predictions made by all four causal models for the manipulable parameters relative to the associated graph G are correct. In this setting, anyone who assumed an NPSEM would falsely believe he or she was able to
consistently estimate the PDE parameter from observational data on the variables on G and no possible experimental intervention on these variables could refute either their belief in the correctness of the NPSEM or their belief in the validity (i.e., consistency) of their estimator of the PDE. In Appendix C, we derive sharp bounds for the PDE under the assumption that the FFRCISTG model associated with graph G holds. We find that these bounds may be quite informative, even though the PDE is not (point) identified. This strict refutationist view of causality relies on the belief that there is a sharp separation between the manipulable and non-manipulable causal contrasts (relative to graph G) because every prediction made concerning a manipulable contrast based on observational data can be checked by an experiment involving interventions on variables in G. However, this view ignores the facts that (i) such experiments may be infeasible or unethical; (ii) such empirical experimental tests will typically require an auxiliary assumption of exchangeability between the experimental and observational population and the ability to measure all the variables included in the causal model, neither of which may hold in practice; and (iii) such tests are themselves based upon the untestable assumption that experimental interventions are ideal. Thus, many philosophers of science do not agree with the strict refutationist’s sharp separation between manipulable and non-manipulable causal contrasts. However, Pearl does not rely on this argument in responding to the refutationist critique of the NPSEM that it can identify a contrast, the PDE, that is not subject to experimental test. Rather, he has responded by describing a scenario in which the PDE associated with a particular NPSEM is identifiable, scientifically meaningful, of substantive interest, and corresponds precisely to the intent-to-treat parameter of a certain randomized controlled experiment. Pearl’s account appears paradoxical in light of the results described above, since it suggests that the PDE may be identified via intervention. Resolving this apparent contradiction is the primary subject of this chapter. We will show that implicit within Pearl’s account is a causal model associated with an expanded graph (G0) containing more variables than Pearl’s original graph (G). Furthermore, although the PDE of the original NPSEM counterfactual model is not a manipulable parameter relative to G, it is manipulable relative to the expanded graph G0. Consequently, the PDE is identified by all four of the causal models (agnostic, MCM, FFRCISTG and NPSEM) associated with G0. The causal models associated with graph G0 formalize Pearl’s verbal, informal account and constitute the ‘‘additional assumptions’’ required to make the original NPSEM’s pure direct effect contrast equal to a contrast between treatments in a randomized
experiment—a randomized experiment whose treatments correspond to variables on the expanded graph G0 that are absent from the original graph G. However, the distribution of the variables of the expanded graph G0 is not positive. Furthermore, the available data are restricted to the variables of the original graph G. Hence, it is not at all obvious prima facie that the expanded causal model’s treatment contrasts will be identified. Remarkably, we prove that, under all four causal models associated with the larger graph G0, the manipulable causal contrast of the expanded causal model that equals the PDE of Pearl’s original NPSEM G is identified from observational data on the original variables. This identification crucially relies on certain deterministic relationships between variables in the expanded model. Our proof thus resolves the apparent contradiction; furthermore, it shows that the ontological primacy of manipulation reflected in the slogan ‘‘no causation without manipulation’’ can be maintained by interpreting the PDE parameter of a given counterfactual causal model as a manipulable causal parameter in an appropriate expanded model. Having said this, although in Pearl’s scenario the intervention associated with the expanded causal model was scientifically plausible, we discuss a modification of Pearl’s scenario in which the intervention required to interpret the PDE contrast of the original graph G as a manipulable contrast of an expanded graph G0 is more controversial. Our discussion reveals the scientific benefits that flow from the exercise of trying to provide an interventionist interpretation for a non-manipulable causal parameter identified under an NPSEM associated with a graph G. Specifically, the exercise often helps one devise explicit, and sometimes even practical, interventions, corresponding to manipulable causal parameters of an expanded graph G0. The exercise also helps one recognize when such interventions are quite a stretch. In this chapter, our goal is not to advocate for the primacy of manipulation or of causation. Rather, our goal is to contribute both to the philosophy and to the mathematics of causation by demonstrating that the apparent conflict between these paradigms is often not a real one. The reduction of an identified non-manipulable causal contrast of an NPSEM to a manipulable causal contrast of an expanded model that is then identified via deterministic relationships under the expanded agnostic model is achieved here for the PDE. A similar reduction for the effect of treatment on the treated (i.e. compliers) in a randomized trial with full compliance in the placebo arm was given by Robins, VanderWeele, and Richardson (2007); see also Geneletti and Dawid (2007) and Appendix A herein. This chapter extends and revises previous discussions by Robins and colleagues (Robins & Greenland, 1992; Robins, 2003; Robins, Rotnitzky,
& Vansteelandt, 2007) of direct and indirect effects. We restrict consideration to causal models, such as the agnostic, FFRCISTG, MCM, and NPSEM, that can be represented by a directed acyclic graph (DAG). See Robins, Richardson, and Spirtes (2009) for a discussion of alternative points of view due to Hafeman and VanderWeele (2010), Imai, Keele, and Yamamoto (2009) and Petersen, Sinisi, and Laan (2006) that are not based on DAGs. The chapter is organized as follows: Section 1 introduces the four types of causal model associated with a graph; Section 2 defines direct effects; Section 3 analyzes the conditions required for the PDE to be identified; Section 4 considers, via examples, the extent to which the PDE may be interpreted in terms of interventions; Section 5 relates our results to the work of Avin, Shpitser, and Pearl (2005) on path-specific causal effects; finally Section 6 concludes.
1 Graphical Causal Models

Define a DAG G to be a graph with nodes (vertices) representing the elements of a vector of random variables V = (V_1, . . . , V_M) with directed edges (arrows) and no directed cycles. To avoid technicalities, we assume all variables V_m are discrete. We let f(v) ≡ f_V(v) ≡ P(V = v) all denote the probability density of V, where, for simplicity, we assume v ∈ 𝒱 ≡ 𝒱̄_M, where 𝒱̄_m ≡ 𝒱_1 × ⋯ × 𝒱_m and 𝒱_m denotes the assumed known space of possible values v_m of V_m; and for any z_1, . . . , z_m, we define z̄_m = (z_1, . . . , z_m). By convention, for any z̄_m, we define z̄_0 ≡ z_0 ≡ 0. Note 𝒱̄_m ≡ 𝒱_1 × ⋯ × 𝒱_m is the product space of the 𝒱_j, j ≤ m. We do not necessarily assume that f(v) is strictly positive for all v ∈ 𝒱.

As a simple example, consider a randomized trial of smoking cessation, represented by the DAG G with node set V = (X, Z, Y) in Figure 6.1. Thus, M = 3, V_1 = X, V_2 = Z, V_3 = Y. Here, X is the randomization indicator, with X = 0 denoting smoking cessation and X = 1 active smoking; Z is an indicator variable for hypertensive status 1 month post randomization; Y is an indicator variable for having a myocardial infarction (MI) by the end of follow-up at 3 months.
[Figure 6.1 A simple DAG containing a treatment X, an intermediate Z and a response Y, with edges X → Z, X → Y, and Z → Y.]
For simplicity, assume complete compliance with the assigned treatment and assume no subject had an MI prior to 1 month. We refer to the variables V as factual variables, as they are variables that could potentially be recorded on the subjects participating in the study. Because in this chapter our focus is on identification, we assume the study population is sufficiently large that sampling variability can be ignored. Then, the density f(v) = f(x, z, y) of the factual variables can be taken to be the proportion of our study population with X = x, Z = z, Y = y. Our ultimate goal is to try to determine whether X has a direct effect on Y not through Z.

We use either PA_{V_m} or PA_m to denote the parents of V_m, that is, the set of nodes from which there is a direct arrow into V_m. For example, in Figure 6.1, PA_Y = {X, Z}. A variable V_j is a descendant of V_m if there is a sequence of nodes connected by edges between V_m and V_j such that, following the direction indicated by the arrows, one can reach V_j by starting at V_m, that is, V_m → ⋯ → V_j. Thus, in Figure 6.1, Z is a descendant of X but not of Y. We suppose that, as in Figure 6.1, the V = (V_1, . . . , V_M) are numbered so that V_j is not a descendant of V_m for m > j. Let R = (R_1, . . . , R_K) denote any subset of V and let r = (r_1, . . . , r_K) be a value of R. We write R_j = V_m, r_j = v_m if the jth variable in R corresponds to the mth variable in V.

The NPSEM, MCM and FFRCISTG models all assume the existence of the counterfactual random variable V_m(r) encoding the value the variable V_m would have if, possibly contrary to fact, R were set to r, r ∈ ℛ ≡ ℛ_1 × ⋯ × ℛ_K, where V_m(r) is assumed to be well-defined in the sense that there is reasonable agreement as to the hypothetical intervention (i.e., closest possible world) which sets R to r (Robins & Greenland, 2000). For example, in Figure 6.1, Z(x = 1) and Y(x = 1) are a subject's Z and Y had, possibly contrary to fact, the subject been a smoker. By assumption, if R_j ∈ R is the mth variable V_m, then V_m(r) equals the value r_j to which the variable V_m = R_j was set. For example, in Figure 6.1, the counterfactual X(x = 1) is equal to 1. Note we assume V_m(r) is well-defined even when the factual probability P(R = r) is zero. We recognize that under certain circumstances such an assumption might be 'metaphysically suspect' because the counterfactuals could be 'radically' ill-defined, since no one was observed to receive the treatment in question. However, in our opinion, in a number of the examples that we consider in this chapter these counterfactuals do not appear to be much less well-defined than those corresponding to treatments that have positive factual probability.

We often write the density f_{V(r)}(v) of V(r) as f_r^int(v), with 'int' being short for intervene, to emphasize the fact that f_{V(r)}(v) = f_r^int(v) represents the
density of V in the counterfactual world where we intervened and set each subject's R to r. We say that f_r^int(v) is the density of V had, contrary to fact, each subject followed the treatment regime r. In contrast, f(v) is the density of the factual variables V. With this background, we specify our four causal models.
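Before doing so, the graphical notions used above (parents PA_m and descendants) can be made concrete with a small sketch that builds the DAG of Figure 6.1 using the networkx package; this is purely our own illustration and not part of the original development.

```python
# Sketch of the DAG in Figure 6.1 (X -> Z, X -> Y, Z -> Y) with networkx,
# to make the "parent" and "descendant" relations concrete.
import networkx as nx

g = nx.DiGraph()
g.add_edges_from([("X", "Z"), ("X", "Y"), ("Z", "Y")])

parents_of_Y = set(g.predecessors("Y"))      # PA_Y = {'X', 'Z'}
descendants_of_X = nx.descendants(g, "X")    # {'Z', 'Y'}: both descend from X
print(parents_of_Y, descendants_of_X)
```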
1.1 FFRCISTG Causal Models

Given a DAG G with node set V, an FFRCISTG model associated with G makes four assumptions.

(i) All one-step-ahead counterfactuals V_m(v̄_{m−1}) exist for any setting v̄_{m−1} ∈ 𝒱̄_{m−1} of their predecessors. For example, in Figure 6.1, a subject's hypertensive status Z(x) = V_2(v_1) at smoking level x for x = 0 and for x = 1 exists, and a subject's MI status Y(x, z) = V_3(v̄_2) at each joint level of smoking and hypertension exists. Because V_1 = X has no predecessor, V_1 = X exists only as a factual variable.

(ii) V_m(v̄_{m−1}) ≡ V_m(pa_m) is a function of v̄_{m−1} only through the values pa_m of V_m's parents on G. For example, were the edge X → Y missing in Figure 6.1, this assumption would imply that Y(x, z) = Y(z) for every subject and every z. That is, the absence of the edge would imply that smoking X has no effect on Y other than through its effect on Z.

(iii) Both the factual variables V_m and the counterfactuals V_m(r) for any R ⊆ V are obtained recursively from the one-step-ahead counterfactuals V_j(v̄_{j−1}), for j ≤ m. For example, V_3 = V_3(V_1, V_2(V_1)) and V_3(v_1) = V_3(v_1, V_2(v_1)). Thus, in Figure 6.1, with the treatment R being smoking X, a subject's possibly counterfactual MI status Y(x = 1) = V_3(v_1 = 1) had he been forced to smoke is Y(x = 1, Z(x = 1)) and, thus, is completely determined by the one-step-ahead counterfactuals Z(x) and Y(x, z). That is, Y(x = 1) is obtained by evaluating the one-step-ahead counterfactual Y(x = 1, z) at z = Z(x = 1). Similarly, a subject's factual X and one-step-ahead counterfactuals determine the subject's factual hypertensive status Z and MI status Y as Z(X) and Y(X, Z(X)), where Z(X) is the counterfactual Z(x) evaluated at x = X and Y(X, Z(X)) is the counterfactual Y(x, z) evaluated at (x, z) = (X, Z(X)).
(iv) The following independence holds:

    V_{m+1}(v̄_m), . . . , V_M(v̄_{M−1}) ⊥⊥ V_m(v̄_{m−1}) | V̄_{m−1} = v̄_{m−1}        (6.1)
for all m and all v̄_{M−1} ∈ 𝒱̄_{M−1}, where for a fixed v̄_{M−1}, v̄_k = (v_1, . . . , v_k), k < M − 1, denotes the initial subvector of v̄_{M−1}.

Assumption (iv) is equivalent to the statement that for each m, conditional on the factual past V̄_{m−1} = v̄_{m−1}, the factual variable V_m is independent of any possible evolution from m + 1 of one-step-ahead counterfactuals (consistent with V̄_{m−1} = v̄_{m−1}), i.e., {V_{m+1}(v̄_m), . . . , V_M(v̄_{M−1})}, for some v̄_{M−1} of which v̄_{m−1} is a subvector. This follows since, by (iii), V_m ≡ V_m(V̄_{m−1}) = V_m(v̄_{m−1}) when V̄_{m−1} = v̄_{m−1}.

Note that by (iii) above, the counterfactual V_{m+1}(v̄_m) for a given subject, say subject i, depends on the treatment v̄_m received by the subject but does not depend on the treatment received by any other subject. Further, V_{m+1}(v̄_m) takes the same value whether the treatment v̄_m is counter to fact (i.e., V̄_m ≠ v̄_m) or factual (i.e., V̄_m = v̄_m, and thus V_{m+1}(v̄_m) = V_{m+1}). That is, the FFRCISTG model satisfies the consistency assumption described in the Introduction. Indeed, we shall henceforth refer to (iii) as the 'consistency assumption'. The following example will play a central role in the chapter.

Example 1 Consider the FFRCISTG model associated with the graph in Figure 6.1; then, for all z,

    Y(x = 1, z), Z(x = 1) ⊥⊥ X;        Y(x = 0, z), Z(x = 0) ⊥⊥ X        (6.2)

and

    Y(x = 1, z) ⊥⊥ Z(x = 1) | X = 1;        Y(x = 0, z) ⊥⊥ Z(x = 0) | X = 0        (6.3)
are true statements by assumption (iv). However, the model makes no claim as to whether Y(x = 1, z) ⊥⊥ Z(x = 0) | X = 0 and Y(x = 1, z) ⊥⊥ Z(x = 0) | X = 1 are true because, for example, the value of x in Y(x = 1, z) differs from the value x = 0 in Z(x = 0). We shall see that all four of the above independence statements are true by assumption under the NPSEM associated with the graph in Figure 6.1.
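To make assumptions (i)–(iii) concrete, the following simulation sketch (our own illustration, with arbitrary probabilities) generates one-step-ahead counterfactuals Z(x) and Y(x, z) for every subject and every setting and then constructs Y(x) and the factual data by recursive substitution. Because the counterfactuals are drawn independently of one another here, this particular generator satisfies not only (6.1) but also the stronger NPSEM-type independences discussed below.

```python
# Simulation sketch of one-step-ahead counterfactuals for Figure 6.1 and the
# recursive ("consistency") construction of Y(x) = Y(x, Z(x)) and the factuals.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# One-step-ahead counterfactuals for every subject and every setting
# (the probabilities are arbitrary, for illustration only):
Z_x = {x: rng.binomial(1, 0.3 + 0.4 * x, n) for x in (0, 1)}            # Z(x)
Y_xz = {(x, z): rng.binomial(1, 0.1 + 0.2 * x + 0.3 * z, n)
        for x in (0, 1) for z in (0, 1)}                                 # Y(x, z)

# Counterfactual Y(x) by recursive substitution: Y(x) = Y(x, Z(x)).
Y_x = {x: np.where(Z_x[x] == 1, Y_xz[(x, 1)], Y_xz[(x, 0)]) for x in (0, 1)}

# Factual data: X randomized, then Z = Z(X) and Y = Y(X, Z(X)).
X = rng.binomial(1, 0.5, n)
Z = np.where(X == 1, Z_x[1], Z_x[0])
Y = np.where(X == 1, Y_x[1], Y_x[0])

print(Y_x[1].mean() - Y_x[0].mean())   # total effect of X on Y in this population
```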
1.2 Minimal Counterfactual Models (MCMs)

An MCM differs from an FFRCISTG model only in that (iv) is replaced by:

(iv*) For all m and all v̄_{M−1} ∈ 𝒱̄_{M−1},

    f(V_{m+1}(v̄_m), . . . , V_M(v̄_{M−1}) | V̄_{m−1} = v̄_{m−1}, V_m = v_m)
        = f(V_{m+1}(v̄_m), . . . , V_M(v̄_{M−1}) | V̄_{m−1} = v̄_{m−1}).        (6.4)

Since (iv) can be written as the condition that, for all m, all v̄_{M−1} ∈ 𝒱̄_{M−1}, and all v*_m ∈ 𝒱_m,

    f(V_{m+1}(v̄_m), . . . , V_M(v̄_{M−1}) | V̄_{m−1} = v̄_{m−1}, V_m = v*_m)
        = f(V_{m+1}(v̄_m), . . . , V_M(v̄_{M−1}) | V̄_{m−1} = v̄_{m−1}),

condition (iv) for an FFRCISTG implies condition (iv*) for an MCM. However, the reverse does not hold. An MCM requires only that the last display holds for the unique value v_m of V_m that occurs in the given v̄_{M−1}. Thus, Equation (6.4) states that, conditional on the factual past V̄_{m−1} = v̄_{m−1} through m − 1, any possible evolution from m + 1 of one-step-ahead counterfactuals, {V_{m+1}(v̄_m), . . . , V_M(v̄_{M−1})}, consistent with the past V̄_m = v̄_m through m, is independent of the event V_m = v_m. In other words,

    V_{m+1}(v̄_m), . . . , V_M(v̄_{M−1}) ⊥⊥ I(V_m(v̄_{m−1}) = v_m) | V̄_{m−1} = v̄_{m−1},
        for all m and all v̄_{M−1} ∈ 𝒱̄_{M−1},        (6.5)
where I(V_m = v_m) is the Bernoulli indicator random variable. It follows that in the special case where all the V_m are binary, an MCM and an FFRCISTG model are equivalent because, for V_m binary, the random variables V_m and I(V_m = v_m) are the same (possibly up to a recoding). Physical randomization of X and/or Z implies counterfactual independencies beyond those of an FFRCISTG or MCM model; see Robins et al. (2009). However, these extra independencies fail to imply Z(0) ⊥⊥ Y(1, z) for the graph in Figure 6.1 and hence do not affect our results.
1.3 A Representation of MCMs and FFRCISTG Models that Does Not Condition on the Past

In this section we derive alternative characterizations of the counterfactual independence conditions for the FFRCISTG model and the MCM that will facilitate the comparison of these models with the NPSEM.
Theorem 1 Given an FFRCISTG model associated with a graph G:

(a) The set of independences in condition (6.1) is satisfied if and only if, for each v̄_{M−1} ∈ 𝒱̄_{M−1},

    the random variables V_{m+1}(v̄_m), m = 0, . . . , M − 1, are mutually independent.        (6.6)

(b) Furthermore, the set of independences (6.6) is the same for any ordering of the variables compatible with the descendant relationships in G.

Proof of (a): (⇒) Given v̄_{M−1} and m ∈ {1, . . . , M − 1}, we define ℐ_m = {ℐ_{m,m}, . . . , ℐ_{M,m}} to be a set of conditional independence statements:

i) ℐ_{m,m}: V_M(v̄_{M−1}), . . . , V_{m+1}(v̄_m) ⊥⊥ V_m(v̄_{m−1}); and

ii) for j = 1 to j = M − m,
    ℐ_{m+j,m}: V_M(v̄_{M−1}), . . . , V_{m+j+1}(v̄_{m+j}) ⊥⊥ V_{m+j}(v̄_{m+j−1}) | V_{m+j−1}(v̄_{m+j−2}) = v_{m+j−1}, . . . , V_m(v̄_{m−1}) = v_m.

First, note that the set of independences in condition (6.1) is precisely ℐ_1. Now, if the collection ℐ_m holds (for m < M − 2) then ℐ_{m+1} holds since (I) the set ℐ_{m+1} is precisely the set {ℐ_{m+1,m}, . . . , ℐ_{M,m}} except with V_m(v̄_{m−1}) removed from all conditioning events and (II) ℐ_{m,m} licenses such removal. Thus, beginning with ℐ_1, we recursively obtain that ℐ_m and thus ℐ_{m,m} holds for m = 1, . . . , M − 1. The latter immediately implies that the variables V_{m+1}(v̄_m), m = 0, . . . , M − 1, are mutually independent.

(⇐) The reverse implication is immediate upon noting that the conditioning event V̄_{m−1} = v̄_{m−1} in Equation (6.1) is the event V_0 = v_0, V_1(v_0) = v_1, . . . , V_{m−1}(v̄_{m−2}) = v_{m−1}.

Proof of (b): This follows immediately from the assumption that V_m(v̄_{m−1}) = V_m(pa_m). □

Theorem 2 Given an MCM associated with a graph G:

(a) The set of independences in condition (6.5) is satisfied if and only if, for each v̄_{M−1} ∈ 𝒱̄_{M−1} and each m ∈ {1, . . . , M − 1},

    V_{m+1}(v̄_m), . . . , V_M(v̄_{M−1}) ⊥⊥ I(V_m(v̄_{m−1}) = v_m).        (6.7)

(b) Furthermore, the set of independences (6.7) is the same for any ordering of the variables compatible with the descendant relationships in G.

An immediate corollary is the following.
Corollary 3 An MCM implies that for all v̄_{M−1} ∈ 𝒱̄_{M−1}, the random variables I(V_{m+1}(v̄_m) = v_{m+1}), m = 0, . . . , M − 1, are mutually independent.

Proof of Theorem 2(a): (⇒) Given v̄_{M−1}, the proof exactly follows that of the previous theorem when we redefine:

i) ℐ_{m,m}: V_M(v̄_{M−1}), . . . , V_{m+1}(v̄_m) ⊥⊥ I(V_m(v̄_{m−1}) = v_m); and

ii) for j = 1 to j = M − m,
    ℐ_{m+j,m}: V_M(v̄_{M−1}), . . . , V_{m+j+1}(v̄_{m+j}) ⊥⊥ I(V_{m+j}(v̄_{m+j−1}) = v_{m+j}) | V_{m+j−1}(v̄_{m+j−2}) = v_{m+j−1}, . . . , V_m(v̄_{m−1}) = v_m.

The reverse implication and (b) follow as in the proof of the previous theorem. □
1.4 Non-Parametric Structural Equation Models (NPSEMs)

Given a DAG G with node set V, an NPSEM associated with G assumes that there exist mutually independent random variables ε_m and deterministic unknown functions f_m such that the counterfactual V_m(v̄_{m−1}) ≡ V_m(pa_m) is given by f_m(pa_m, ε_m), and both the factual variables V_m and the counterfactuals V_m(x) for any X ⊆ V are obtained recursively from the V_m(v̄_{m−1}) as in (iii) in Section 1.1.

Under an NPSEM, both the FFRCISTG condition (6.1) and the MCM condition (6.5) hold. However, an FFRCISTG or MCM associated with G will not, in general, be an NPSEM for G. Indeed, an NPSEM implies

    V_{m+1}(v̄_m), . . . , V_M(v̄_{M−1}) ⊥⊥ V_m(v̄*_{m−1}) | V̄_{m−1} = v̄**_{m−1}        (6.8)
    for all m, all v̄_{M−1} ∈ 𝒱̄_{M−1}, and all v̄*_{m−1}, v̄**_{m−1} ∈ 𝒱̄_{m−1}.

That is, conditional on the factual past V̄_{m−1} = v̄**_{m−1}, the counterfactual V_m(v̄*_{m−1}) is statistically independent of all future one-step-ahead counterfactuals. This implies that all four statements in Example 1 are true under an NPSEM; see also Pearl (2000, Section 3.6.3).

Hence, in an MCM or FFRCISTG model, in contrast to an NPSEM, the defining independences are those for which the value of v̄_{m−1} in (a) the conditioning event, (b) the counterfactual V_m at m, and (c) the set of future one-step-ahead counterfactuals {V_{m+1}(v̄_m), . . . , V_M(v̄_{M−1})} are equal. Thus, an FFRCISTG assumes independence of {V_{m+1}(v̄_m), . . . , V_M(v̄_{M−1})} and
6 Alternative Graphical Causal Models
115
Vm ðv m1 Þ given Vm1 ¼ v m1 only when v m1 ¼ v m1 ¼ v m1 . As mentioned above, the MCM further weakens the independence by replacing Vm with I(Vm ¼ vm). In Appendix B we describe a data-generating process leading to a counterfactual model that is an MCM/FFRCISTG model associated with Figure 6.1, but not an NPSEM for this figure. Understanding the implications of these additional counterfactual independences assumed by an NPSEM compared to an MCM or FFRCISTG model is one of the central themes of this chapter.
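To make the distinction concrete, the following minimal simulation sketch (not part of the chapter; all functional forms, parameter values, and the use of Python are illustrative assumptions) generates one-step-ahead counterfactuals for the DAG of Figure 6.1 in two ways: from mutually independent errors, as an NPSEM requires, and from a construction that satisfies the one-world FFRCISTG independences while letting Y(1, z) share a latent cause with Z(0). Only the second violates the cross-world independence implied by (6.8).

```python
# A minimal simulation sketch (assumptions only, not from the chapter) contrasting an
# NPSEM with an FFRCISTG-type construction for the DAG X -> Z -> Y, X -> Y of Figure 6.1.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# NPSEM: one-step-ahead counterfactuals built from mutually independent errors.
eps_z, eps_y = rng.uniform(size=n), rng.uniform(size=n)
Z_npsem = {x: (eps_z < 0.3 + 0.4 * x).astype(int) for x in (0, 1)}           # Z(x)
Y_npsem = {(x, z): (eps_y < 0.2 + 0.3 * x + 0.3 * z).astype(int)
           for x in (0, 1) for z in (0, 1)}                                   # Y(x, z)

# FFRCISTG-but-not-NPSEM: same one-world margins, but Y(1, z) shares a latent
# cause with Z(0), so the cross-world independence Y(1, z) _||_ Z(0) fails while
# Y(x, z) _||_ Z(x) still holds within each world x (X is assumed randomized).
U, U2 = rng.uniform(size=n), rng.uniform(size=n)
Z_ff = {0: (U < 0.3).astype(int), 1: (U2 < 0.7).astype(int)}
Y_ff = {(1, z): (U < 0.5 + 0.3 * z).astype(int) for z in (0, 1)}
Y_ff.update({(0, z): (U2 < 0.2 + 0.3 * z).astype(int) for z in (0, 1)})

def cross_world_cov(Y, Z):
    """Empirical covariance of the cross-world pair Y(1, 0) and Z(0)."""
    return np.cov(Y[(1, 0)], Z[0])[0, 1]

print("NPSEM    cov(Y(1,0), Z(0)) ~", round(cross_world_cov(Y_npsem, Z_npsem), 4))
print("FFRCISTG cov(Y(1,0), Z(0)) ~", round(cross_world_cov(Y_ff, Z_ff), 4))
```

Under the assumed parameter values the first covariance is approximately zero while the second is approximately 0.15, illustrating that an FFRCISTG or MCM places no restriction on cross-world dependence.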
1.5 The g-Functional

Before defining our third causal model, the agnostic causal model, we need to define the g-functional density. The next Lemma shows that the assumptions of an MCM, and thus a fortiori those of the NPSEMs and FFRCISTG models, restrict the joint distribution of the factual variables (when there are missing edges in the DAG).

Lemma 4 In an MCM associated with DAG G, for all v such that f(v) > 0, the density $f(v) \equiv P(V = v)$ of the factuals V satisfies the Markov factorization
$$f(v) = \prod_{j=1}^{M} f(v_j \mid pa_j). \qquad (6.9)$$
Robins (1986) proved Lemma 4 for an FFRCISTG model; the proof applies equally to an MCM. Equation (6.9) is equivalent to the statement that each variable $V_m$ is conditionally independent of its non-descendants given its parents (Pearl, 1988).

Example 2 In Figure 6.1, $f(x, z, y) = f(y \mid x, z)\, f(z \mid x)\, f(x)$. If the arrow from X to Y were missing, we would then have $f(x, z, y) = f(y \mid z)\, f(z \mid x)\, f(x)$ since Z would be the only parent of Y.

Definition 5 Given a DAG G, a set of variables $R \subseteq V$, and a value r of R, define the g-functional density
$$f_r(v) \equiv P_r(V = v) \equiv \begin{cases} \prod_{j:\, V_j \notin R} f(v_j \mid pa_j) & \text{if } v = (u, r); \\ 0 & \text{if } v = (u, r^*) \text{ with } r^* \neq r. \end{cases}$$
In words, fr(v) is the density obtained by modifying the product on the right-hand side of Equation (6.9) by removing the term f (vj | paj) for every
$V_j \in R$, while for $V_j \notin R$, for each $R_m \in R$ in $PA_j$, set $R_m$ to the value $r_m$ in the term $f(v_j \mid pa_j)$. Note the probability that R equals r is 1 under the density $f_r(v)$; that is, $P_r(R = r) \equiv f_r(r) = 1$. The density $f_r(z)$ may not be a well-defined function of the density f(v) of the factual data V when the factual distribution of V is non-positive, because setting $R_m \in PA_j$ to the value $r_m$ in $f(v_j \mid pa_j)$ may result in conditioning on an event that has probability zero of occurring in the factual distribution.

Example 3 In Figure 6.1 with R = (X, Z), r = (x = 1, z = 0), $f_r(v) \equiv f_{x=1,z=0}(x^*, z^*, y) = f(y \mid x = 1, z = 0)$ if $(x^*, z^*) = (1, 0)$. On the other hand, $f_{x=1,z=0}(x^*, z^*, y) = 0$ if $(x^*, z^*) \neq (1, 0)$ since, under $f_{x=1,z=0}(x^*, z^*, y)$, X is always 1 and Z always 0. It follows that $f_{x=1,z=0}(y) = f(y \mid x = 1, z = 0)$. If the event (X, Z) = (1, 0) has probability zero under f(v), then $f_{x=1,z=0}(y)$ is not a function of f(v) and is thus not uniquely defined.

The following Lemma connects the g-functional density $f_r(v)$ to the intervention density $f_r^{int}(v)$.

Lemma 6 Given an MCM associated with a DAG G, sets of variables $R, Z \subseteq V$ and a treatment regime r, if the g-functional density $f_r(z)$ is a well-defined function of f(v), then $f_r(z) = f_r^{int}(z)$.

In words, whenever the g-functional density $f_r(z)$ is a well-defined function of f(v), it is equal to the intervention density for Z that would be observed had, contrary to fact, all subjects followed the regime r. This result can be extended from so-called static treatment regimes r to general treatment regimes, where treatment is a (possibly random) function of the observed history, as follows. Suppose we are given a set of variables R and for each $V_j = R_m \in R$ we are given a density $p_j(v_j \mid \bar{v}_{j-1})$. Then, we define $p_R$ to be the general treatment regime corresponding to an intervention in which, for each $V_j = R_m \in R$, a subject's treatment level $v_j$ is randomly assigned with randomization probabilities $p_j(v_j \mid \bar{v}_{j-1})$ that are a function of the values of the subset of the variables $\bar{V}_{j-1}$ that temporally precede $V_j$. We let $f_{p_R}^{int}(v)$ be the distribution of V that would be observed if, contrary to fact, all subjects had been randomly assigned treatment with probabilities $p_R$. Further, we define the g-functional density $f_{p_R}(v)$ to be the density
$$f_{p_R}(v) \equiv \prod_{j:\, V_j \notin R} f(v_j \mid pa_j) \prod_{j:\, V_j \in R} p_j(v_j \mid \bar{v}_{j-1})$$
and, for $Z \subseteq V$, $f_{p_R}(z) \equiv \sum_{v \setminus z} f_{p_R}(v)$. Thus the marginal $f_{p_R}(z)$ is obtained from $f_{p_R}(v)$ by summation in the usual way. Then, we have the following extension of Lemma 6.
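As a small numerical illustration of Definition 5 and Example 3 (a sketch under assumed numbers, not part of the chapter), the following Python fragment builds an arbitrary positive joint density for the complete DAG of Figure 6.1, forms the g-functional density for R = (X, Z) and r = (1, 0), and checks that its Y-margin equals f(y | X = 1, Z = 0).

```python
# Numerical sketch of the g-functional density of Definition 5 for Figure 6.1
# (all numbers are arbitrary assumptions; variables X, Z, Y are binary).
import numpy as np

rng = np.random.default_rng(1)
f = rng.dirichlet(np.ones(8)).reshape(2, 2, 2)        # positive joint f[x, z, y]

# Markov factorization (6.9) for the complete DAG: f(x, z, y) = f(x) f(z|x) f(y|x,z).
f_x = f.sum(axis=(1, 2))
f_z_given_x = f.sum(axis=2) / f_x[:, None]
f_y_given_xz = f / f.sum(axis=2, keepdims=True)
print(np.allclose(f, f_x[:, None, None] * f_z_given_x[:, :, None] * f_y_given_xz))

# g-functional f_{x=1,z=0}(x*, z*, y): drop the factors for X and Z, evaluate the
# remaining factor f(y | pa_y) at x = 1, z = 0, and put point mass on (x*, z*) = (1, 0).
f_r = np.zeros_like(f)
f_r[1, 0, :] = f_y_given_xz[1, 0, :]

# Its y-margin equals f(y | X = 1, Z = 0), as stated in Example 3.
print(np.allclose(f_r.sum(axis=(0, 1)), f_y_given_xz[1, 0, :]))
```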
Extended Lemma 6: Given an MCM associated with a DAG G, sets of variables R, Z V, and a treatment regime pR, if the g-functional density fpR(z) is a welldefined function of f (v), then fpR ðzÞ ¼ fpRint ðzÞ. In words, whenever the g-functional density fpR(z) is a well-defined function of f (v), it is equal to the intervention density for Z that would be observed had, contrary to fact, all subjects followed the general regime pR. Robins (1986) proved Extended Lemma 6 for an FFRCISTG model; the proof applies equally to an MCM. Extended Lemma 6 actually subsumes Lemma 6 as fr int ðzÞ is fpRint ðzÞ for pR such that for Vj ¼ Rm 2 R, pj ðvj j v j1 Þ ¼ 1 if vj ¼ rm and is zero if vj 6¼ rm. Corollary to Extended Lemma 6: Given an MCM associated with a DAG G, sets of variables R, Z V and a treatment regime pR, fpR ðzÞ ¼ fpint ðzÞ whenever pR R satisfies the following positivity condition: For all Vj 2 R, f ðV j1 Þ > 0 and pj ðvj jV j1 Þ > 0 imply f ðvj j V j1 Þ > 0 with probability one under f (v). This follows directly from Extended Lemma 6, as the positivity condition implies that fpR(z) is a well-defined function of f (v). In the literature, one often sees only the Corollary stated and proved. However, as noted by Gill and Robins (2001), these proofs use the ‘positivity condition’ only to show that fpR(z) is a well-defined (i.e., unique) function of f (v). Thus, these proofs actually establish the general version of Extended Lemma 6. In this chapter we study models in which fpR(z) is a well-defined function of f (v) even though the positivity assumption fails; as a consequence, we require the general version of the Lemma.
1.6 Agnostic Causal Models We are now ready to define the agnostic causal model (Spirtes et al., 1993): Given a DAG G with node set V, the agnostic causal model represented by G assumes that the joint distribution of the factual variables V factors as in Equation (6.9) and that the interventional density of Z V, again denoted by fpRint ðzÞ or fr int ðzÞ, under treatment regime pR or regime r is given by the g-functional density fpR(z) or fr(z), whenever fpR(z) or fr(z) is a well-defined function of f (v). Although this model assumes that density fpRint ðvÞ or fr int ðvÞ of V under these interventions exist, the model makes no reference to counterfactual variables and is agnostic as to their existence. Thus the agnostic causal model does not impose any version of a consistency assumption.
1.7 Interventions Restricted to a Subset of Variables

In this chapter we restrict consideration to graphical causal models in which we assume that interventions on every possible subset of the variables are possible and indeed well-defined. The constraint that only a subset V* of V can be intervened on may be incorporated into the agnostic causal model by marking the elements of V* and requiring that, for any intervention $p_R$, $R \subseteq V^*$. For the FFRCISTG model, Robins (1986, 1987) constructs a counterfactual model, the fully randomized causally interpreted structured tree graph (FRCISTG) model, that imposes the constraint and reduces to the FFRCISTG model when V* = V. We briefly review his approach and its extension to the MCM model in Appendix D. See also the decision-theoretic models of Dawid (2000) and Heckerman and Shachter (1995).
1.8 Manipulable Contrasts and Parameters

In the Introduction we defined the set of manipulable contrasts relative to a graph G to be the set of causal contrasts that are well-defined under the agnostic causal model, i.e., the set of contrasts that are functions of the causal effects $f^{int}_{p_R}(z)$. The set consists of all contrasts between treatment regimes in an experiment with sequential treatment assignments, wherein the treatment given at stage m is a function of past covariates on the graph.

Definition 7 We say a causal effect in a particular causal model associated with a DAG G with node set V is non-parametrically identified from data on V (or, equivalently, in the absence of hidden variables) if it is a function of the density f(v) of the factuals V.

Thus in all four causal models, the causal effects $f^{int}_{p_R}(z)$ for which the g-functional $f_{p_R}(z)$ is a well-defined function of f(v) are non-parametrically identified from data on V. It follows that the manipulable contrasts are non-parametrically identified under an agnostic causal model from observational data with a positive joint distribution and no hidden (i.e., unmeasured) variables. (Recall that a discrete joint distribution is positive if the probability of a joint event is nonzero whenever the marginal probability of each individual component of the event is nonzero.) In contrast, the effect of treatment on the treated
$$\mathrm{ETT}(x) \equiv E[Y(x) - Y(0) \mid X = x]$$
is not a manipulable parameter relative to the graph G: X → Y, since it is not well-defined under the corresponding agnostic causal model. However,
ETT(x) is identified under both MCMs and FFRCISTG models. Robins (2003) stated that an FFRCISTG model identified only ‘‘manipulable parameters.’’ However, in that work, unlike here, no explicit definition of manipulable was used; in particular it was not specified which class of interventions was being considered. In Appendix A we show that the MCMs and FFRCISTG models identify ETT(x), which is not a manipulable parameter relative to the graph X ! Y. However, ETT(x) is a manipulable parameter relative to an expanded graph G0 with deterministic relations; see also Robins, VanderWeele, and Richardson (2007), and Geneletti and Dawid (2007). For expositional simplicity, we will henceforth restrict our discussion to static deterministic regime effects frint ðzÞ except when non-static (i.e., dynamic and/or random) regimes pR are being explicitly discussed.
2 Direct Effects Consider the following query: Do cigarettes (X) have a causal effect on MI (Y) through a pathway that does not involve hypertension (Z)? This query is often rephrased as whether X has a direct causal effect on Y not through the intermediate variable Z. The concept of direct effect has been formalized in three different ways in the literature. For notational simplicity we always take X to be binary, except where noted in Appendix A.
2.1 Controlled Direct Effects (CDEs)

Consider a causal model associated with a DAG G with node set V containing (X, Y, Z). In a counterfactual causal model, the individual and average controlled direct effect (CDE) of X on Y when Z is set to z are, respectively, defined as $Y(x=1, z) - Y(x=0, z)$ and $\mathrm{CDE}(z) = E[Y(x=1, z) - Y(x=0, z)]$. In our previous notation, $E[Y(x=1, z) - Y(x=0, z)]$ is the difference in means $E^{int}_{x=1,z}[Y] - E^{int}_{x=0,z}[Y]$ of Y under the intervention distributions $f^{int}_{x=1,z}(v)$ and $f^{int}_{x=0,z}(v)$. Under the associated agnostic causal model, counterfactuals do not exist but the CDE(z) can still be defined as $E^{int}_{x=1,z}[Y] - E^{int}_{x=0,z}[Y]$. Under all four causal models, $E^{int}_{x=1,z}[Y] - E^{int}_{x=0,z}[Y]$ is identified from data on V by $E_{x=1,z}[Y] - E_{x=0,z}[Y]$ under the g-formula densities $f_{x=1,z}(v)$ and $f_{x=0,z}(v)$, if these are well-defined functions of f(v). In the case of Figure 6.1, $E_{x,z}[Y]$ is just the mean $E[Y \mid X = x, Z = z]$ of the factual Y given X = x and Z = z since, by the definition of the g-formula, $f_{x,z}(y) = f(y \mid X = x, Z = z)$. When Z is binary there exist two different controlled direct effects corresponding to z = 1 and z = 0. For example, CDE(1) is the average effect of X
on Y in the study population were, contrary to fact, all subjects to have Z set to 1. It is possible for CDE(1) to be zero and CDE(0) to be nonzero or vice versa. Whenever CDE(z) is nonzero for some level of Z, there will exist a directed path from X to Y not through Z on the causal graph G, regardless of the causal model.
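The following sketch (simulated data; the structural model, coefficients, and sample size are assumptions made only for illustration) estimates CDE(0) and CDE(1) for the DAG of Figure 6.1, where the g-formula reduces to $f_{x,z}(y) = f(y \mid X = x, Z = z)$.

```python
# Illustrative estimation of controlled direct effects under the DAG of Figure 6.1.
# All functional forms and parameters below are assumptions chosen for the example.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 100_000
x = rng.binomial(1, 0.5, n)                      # randomized exposure (smoking)
z = rng.binomial(1, 0.2 + 0.5 * x)               # intermediate (hypertension)
y = rng.binomial(1, 0.1 + 0.2 * x + 0.3 * z)     # outcome (MI)
df = pd.DataFrame({"x": x, "z": z, "y": y})

means = df.groupby(["x", "z"])["y"].mean()       # E[Y | X = x, Z = z]
for zz in (0, 1):
    cde = means[(1, zz)] - means[(0, zz)]
    print(f"CDE({zz}) ~ {cde:.3f}  (truth 0.2 under the assumed model)")
```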
2.2 Pure Direct Effects (PDEs)

In a counterfactual model, Robins and Greenland (1992) (hereafter R&G) defined the individual pure direct effect (PDE) of a (dichotomous) exposure X on Y relative to an intermediate variable Z to be $Y(x=1, Z(x=0)) - Y(x=0)$. That is, the individual PDE is the subject's value of Y under exposure to X had, possibly contrary to fact, X's effect on the intermediate Z been blocked (i.e., had Z remained at its value under non-exposure) minus the value of Y under non-exposure to X. The individual PDE can also be written as $Y(x=1, Z(x=0)) - Y(x=0, Z(x=0))$, since $Y(x=0) = Y(x=0, Z(x=0))$. Thus the PDE contrast measures the direct effect of X on Y when Z is set to its value Z(x = 0) under non-exposure to X. The average PDE is given by
$$\mathrm{PDE} = E[Y(x=1, Z(x=0))] - E[Y(x=0)] = E[Y(x=1, Z(x=0)) - Y(x=0, Z(x=0))]. \qquad (6.10)$$
Pearl (2001) adopted R&G's definition but changed nomenclature. He referred to the pure direct effect as a 'natural' direct effect. Since the intervention mean $E[Y(x=0)] = E^{int}_{x=0}[Y]$ is identified from data on V under any of the associated causal models, the PDE is identified if and only if E[Y(x = 1, Z(x = 0))] is identified. The data generating process given in Appendix B shows that E[Y(x = 1, Z(x = 0))] is not a manipulable effect relative to the graph in Figure 6.1. Further we show that E[Y(x = 1, Z(x = 0))] is not identified under an MCM or FFRCISTG model from data on V in the absence of further untestable assumptions. However, we shall see that E[Y(x = 1, Z(x = 0))] is identified under the NPSEM associated with the graph in Figure 6.1. Under the agnostic causal model, the concept of pure direct effect is not defined since the counterfactual Y(x = 1, Z(x = 0)) is not assumed to exist.
2.3 Principal Stratum Direct Effects (PSDEs) In contrast to the control direct effect and pure direct effect, the individual principal stratum direct effect (PSDE) is defined only for subjects for whom X has no causal effect on Z so that Z(x ¼ 1) ¼ Z(x ¼ 0). For a subject with
Z(x ¼ 1) ¼ Z(x ¼ 0) ¼ z, the individual principal stratum direct effect is defined to be Yðx ¼ 1; zÞ Yðx ¼ 0; zÞ (here, X is assumed to be binary). The average PSDE in principal stratum z is defined to be PSDEðzÞ E½Yð1;zÞ Yð0;zÞ j Zð1Þ ¼Zð0Þ ¼z: Robins (1986, Sec. 12.2) first proposed using PSDE(z) to define causal effects. In his article, Y ¼ 1 denoted the indicator of death from a cause of interest (subsequent to a time t), Z ¼ 0 denoted the indicator of survival until t from competing causes, and the contrast PSDE(z) was used to solve the problem of censoring by competing causes of death in defining the causal effect of the treatment X on the cause Y. Rubin (1998) and Frangakis and Rubin (1999, 2002) later used this same contrast to solve precisely the same problem of ‘‘censoring by death.’’ Finally, the analysis of Rubin (2004) was also based on this contrast, except that Z and Y were no longer assumed to be failure-time indicators. The argument given below in Sec. 4 to prove that E[Y(x ¼ 1, Z(x ¼ 0))] is not a manipulable effect relative to the graph in Figure 6.1 also proves that PSDE(z) is not a manipulable effect relative to this graph. Furthermore, the PSDE(z) represents a causal contrast on a non-identifiable subset of the study population — the subset with Z(1) ¼ Z(0) ¼ z. An even greater potential problem with the PSDE is that if X has an effect on every subject’s Z, then PSDE(z) is undefined for every possible z. If Z is continuous and/or multivariate, it would not be unusual for X to have an effect on every subject’s Z. Thus, Z is generally chosen to be univariate and discrete with few levels, often binary when PSDE(z) is the causal contrast. However, principal stratum direct effects have the potential advantage of remaining well-defined even when controlled direct effects or pure direct effects are ill-defined. Note that for a subject with Z(x ¼ 1) ¼ Z(x ¼ 0) ¼ z, we have Y(x ¼ 1, z) ¼ Y(x ¼ 1, Z(x ¼ 1)) Y(x ¼ 1) and Y(x ¼ 0, z) ¼ Y(0, Z(0)) Y(x ¼ 0), so the individual PSDE for this subject is Y(x ¼ 1) Y(x ¼ 0). The average PSDE is given by: PSDEðzÞ ¼ E ½Yðx ¼ 1Þ Yðx ¼ 0Þ j Zð1Þ ¼ Zð0Þ ¼ z: Thus, PSDE’s can be defined in terms of the counterfactuals Y(x) and Z(x). Now, in a trial where X is randomly assigned but the intermediate Z is not, there will generally be reasonable agreement as to the hypothetical intervention (i.e., closest possible world) which sets X to x so Y(x) and Z(x) are well defined; however, there may not be reasonable agreement
as to the hypothetical intervention which sets X to x and Z to z, in which case Y(x, z) will be ill-defined. In that event, controlled and pure direct effects are ill-defined, but one can still define PSDE(z) by the previous display. However, when Y(x, z) and thus CDEs and PDEs are ill-defined, and therefore use of the PSDE(z) is proposed, it is often the case that (i) the intermediate variable that is truly of scientific and policy relevance — say, Z* — is many leveled, even continuous and/or multivariate, so PSDE(z*) may not exist for any z*, and (ii) Z is a coarsening (i.e. a function) of Z*, chosen to ensure that PSDE(z) exists. In such settings, the counterfactual Y(x, z*) is frequently meaningful because the hypothetical intervention which sets X to x and Z* to z* (unlike the intervention that sets X to x and Z to z) is welldefined. Furthermore, the CDE(z*) and PDE based on Z*, in contrast to the PSDE(z), provide knowledge of the pathways or mechanisms by which X causes Y and represent the effects of interventions of public-health importance. In such a setting, the direct effect contrasts of primary interest are the CDE(z*) and the PDE based on Z* rather than the PSDE(z) based on a binary coarsening Z of Z*. See Robins, Rotnitzky, and Vansteelandt (2007); Robins et al. (2009) for an example and further discussion.
3 Identification of The Pure Direct Effect We have seen that the CDE(z), as a manipulable parameter relative to the graph in Figure 6.1, is generally identified from data on V under all four of the causal models associated with this graph. We next consider identification of E[Y(x ¼ 1, Z(x ¼ 0))] and, thus, identification of the PDE in three important examples. The first two illustrate that the PDE may be identified in the NPSEM associated with a DAG but not by the associated MCMs or FFRCISTG models. In the third example the PDE is not identified under any of the four causal models associated with the DAG. We will elaborate these examples in subsequent sections.
3.1 Identification of the PDE in the DAG in Figure 6.1

Pearl (2001) proved that under the NPSEM associated with the causal DAG in Figure 6.1, E[Y(x = 1, Z(x = 0))] is identified. To see why, note that if
$$Y(x=1, z) \perp\!\!\!\perp Z(x=0) \quad \text{for all } z, \qquad (6.11)$$
then
$$E[Y(x=1, Z(x=0))] = \sum_{z} E^{int}_{x=1,z}[Y]\, f^{int}_{x=0}(z), \qquad (6.12)$$
because
$$E[Y(x=1, Z(x=0))] = \sum_{z} E[Y(x=1, z) \mid Z(x=0) = z]\, P[Z(x=0) = z] = \sum_{z} E[Y(x=1, z)]\, P[Z(x=0) = z],$$
where the first equality is by the laws of probability and the second by (6.11). Now, the right side of Equation (6.12) is non-parametrically identified from f(v) under all four causal models since the intervention parameters $E^{int}_{x,z}[Y]$ and $f^{int}_{x}(z)$ are identified by the g-functional. In particular, with Figure 6.1 as the causal DAG,
$$\sum_{z} E^{int}_{x=1,z}[Y]\, f^{int}_{x=0}(z) = \sum_{z} E[Y \mid X = 1, Z = z]\, f(z \mid X = 0). \qquad (6.13)$$
Hence, it remains only to show that (6.11) holds for an NPSEM corresponding to the graph in Figure 6.1. Now, we noted in Example 1 that $Y(x=1, z) \perp\!\!\!\perp Z(x=0) \mid X = j$ held for j = 0 and j = 1 for the NPSEM (but not for the FFRCISTG) associated with the DAG in Figure 6.1. Further, for this NPSEM, $\{Y(x=1, z), Z(x=0)\} \perp\!\!\!\perp X$. Combining, we conclude that (6.11) holds.

In contrast, for an FFRCISTG model or MCM corresponding to Figure 6.1, E[Y(x = 1, Z(x = 0))] is not identified, because condition (6.11) need not hold. In Appendix C we derive sharp bounds for the PDE under the assumption that the FFRCISTG model or the MCM associated with graph G holds. We find that these bounds may be quite informative, even though the PDE is not (point) identified under this model.
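As a concrete illustration (a sketch on simulated data, not part of the original analysis), the following code evaluates the identifying formula (6.13), $\sum_z E[Y \mid X=1, Z=z]\, f(z \mid X=0)$, together with the implied PDE; the data-generating process is an assumed model compatible with the NPSEM for Figure 6.1.

```python
# Illustrative computation of formula (6.13) and the PDE from simulated data.
# The structural model and parameters are assumptions made only for this sketch.
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n = 200_000
x = rng.binomial(1, 0.5, n)
z = rng.binomial(1, 0.2 + 0.5 * x)
y = rng.binomial(1, 0.1 + 0.2 * x + 0.3 * z)
df = pd.DataFrame({"x": x, "z": z, "y": y})

e_y_x1z = df[df.x == 1].groupby("z")["y"].mean()           # E[Y | X=1, Z=z]
f_z_x0 = df[df.x == 0]["z"].value_counts(normalize=True)   # f(z | X=0)
pde_term = sum(e_y_x1z[zz] * f_z_x0[zz] for zz in (0, 1))  # formula (6.13)
pde = pde_term - df[df.x == 0]["y"].mean()                 # minus E[Y(x=0)] = E[Y | X=0]
print(f"E[Y(1, Z(0))] via (6.13) ~ {pde_term:.3f};  PDE ~ {pde:.3f}")
```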
3.2 The 'Natural Direct Effect' of Didelez, Dawid & Geneletti

Didelez, Dawid, and Geneletti (2006) (hereafter referred to as DDG) discuss an effect that they refer to as the 'natural direct effect' and prove it is identified under the agnostic causal model associated with the DAG in Figure 6.1, the difference between Equation (6.13) and E[Y | X = 0] being the identifying formula. Since the parameter we have referred to as the natural or pure direct effect is not even defined under the agnostic model, it is clear they are giving the same name to a different parameter. Thus, DDG's results have no relevance to the identification of the PDE. To clarify, we discuss DDG's results in greater detail.

To define DDG's parameter, let R = (X, Z) and consider a regime $p_R \equiv p_{(X=j,Z)}$ with $p(x) = 1$ if and only if $x = j$ and with a given $p(z \mid x) = p^*(z)$ that does not depend on X. Then, $f^{int}_{p_{(X=j,Z)}}(v) = f^{int}_{p_{(X=j,Z)}}(x, y, z)$ is the density in a hypothetical study where each subject receives X = j and then is randomly assigned Z based on the density $p^*(z)$. DDG define the natural direct effect to be $E^{int}_{p_{(X=1,Z)}}[Y] - E^{int}_{p_{(X=0,Z)}}[Y]$ with $p^*(z)$ equal to the density $f^{int}_{x=0}(z)$ of Z when X is set to 0, provided $E^{int}_{p_{(X=0,Z)}}[Y]$ is equal to $E^{int}_{x=0}[Y]$, the mean of Y when all subjects are untreated. When $E^{int}_{p_{(X=0,Z)}}[Y] \neq E^{int}_{x=0}[Y]$, they say their natural direct effect is undefined. Now, under the agnostic causal model associated with the DAG in Figure 6.1, it follows from Extended Lemma 6 that $E^{int}_{p_{(X=0,Z)}}[Y] = E^{int}_{x=0}[Y] = E[Y \mid X = 0]$ and $E^{int}_{p_{(X=1,Z)}}[Y]$ is given by the right side of Equation (6.13), confirming DDG's claim about their parameter $E^{int}_{p_{(X=1,Z)}}[Y] - E^{int}_{p_{(X=0,Z)}}[Y]$. In contrast, our PDE parameter is given by the difference between Equation (6.13) and E[Y | X = 0] only when E[Y(x = 1, Z(x = 0))] equals Equation (6.13), which cannot be the case under an agnostic causal DAG model as E[Y(x = 1, Z(x = 0))] is then undefined. Note E[Y(x = 1, Z(x = 0))] does equal Equation (6.13) under the NPSEM associated with Figure 6.1 but not under the MCM or FFRCISTG model associated with this Figure.
3.3 Identification of the PDE with a Measured Common Cause of Z and Y that Is Not Directly Affected by X

Consider the causal DAG in Figure 6.2(a) that differs from the DAG in Figure 6.1 in that it assumes (in the context of our smoking study example) there is a measured common cause L of hypertension Z and MI Y that is not caused by X. Suppose we assume an NPSEM with V = (X, L, Z, Y) and our goal remains estimation of E[Y(x = 1, Z(x = 0))]. Then E[Y(x = 1, Z(x = 0))] remains identified under the NPSEM associated with the DAG in Figure 6.2(a), with the identifying formula now
$$\sum_{z,l} E[Y \mid X = 1, Z = z, L = l]\, f(z \mid X = 0, L = l)\, f(l).$$
Figure 6.2 An elaboration of the DAG in Figure 6.1 in which L is a (measured) common cause of Z and Y.
This follows from the fact that under an NPSEM associated with the DAG in Figure 6.2(a),
$$Y(x=1, z) \perp\!\!\!\perp Z(x=0) \mid L \quad \text{for all } z, \qquad (6.14)$$
which in turn implies
$$E[Y(x=1, Z(x=0))] = \sum_{z,l} E^{int}_{x=1,z}[Y \mid L = l]\, f^{int}_{x=0}(z \mid L = l)\, f(l). \qquad (6.15)$$
The right side of (6.15) remains identified under all four causal models via
$$\sum_{z,l} E[Y \mid X = 1, Z = z, L = l]\, f(z \mid X = 0, L = l)\, f(l). \qquad (6.16)$$
In contrast, for an MCM or FFRCISTG associated with the graph in Figure 6.2(a), E[Y(x ¼ 1, Z(x ¼ 0))] is not identified because (6.14) need not hold.
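A parallel sketch for formula (6.16) (again with an assumed data-generating model chosen purely for illustration, not taken from the chapter) is given below.

```python
# Illustrative computation of formula (6.16) for the DAG of Figure 6.2(a), in which
# L is a measured common cause of Z and Y not affected by X. Assumed simulated data.
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
n = 300_000
l = rng.binomial(1, 0.4, n)                                  # baseline common cause
x = rng.binomial(1, 0.5, n)                                  # randomized exposure
z = rng.binomial(1, 0.1 + 0.4 * x + 0.3 * l)
y = rng.binomial(1, 0.05 + 0.2 * x + 0.3 * z + 0.2 * l)
df = pd.DataFrame({"l": l, "x": x, "z": z, "y": y})

e_y = df[df.x == 1].groupby(["z", "l"])["y"].mean()                 # E[Y|X=1,Z=z,L=l]
f_z = df[df.x == 0].groupby("l")["z"].value_counts(normalize=True)  # f(z|X=0,L=l)
f_l = df["l"].value_counts(normalize=True)                          # f(l)

estimate = sum(e_y[(zz, ll)] * f_z[(ll, zz)] * f_l[ll]
               for zz in (0, 1) for ll in (0, 1))
print(f"E[Y(1, Z(0))] via (6.16) ~ {estimate:.3f}")
```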
3.4 Failure of Identification of the PDE in an NPSEM with a Measured Common Cause of Z and Y that Is Directly Affected by X

Consider the causal DAG shown in Figure 6.2(b) that differs from that in Figure 6.2(a) only in that X now causes L, so that there exists an arrow from X to L. The right side of Equation (6.15) remains identified under all four causal models via
$$\sum_{z,l} E[Y \mid X = 1, Z = z, L = l]\, f(z \mid X = 0, L = l)\, f(l \mid X = 0).$$
Under an NPSEM, MCM, or FFRCISTG model associated with this causal DAG, Y(x = 1, Z(x = 0)) is by definition
$$Y(x=1, L(x=1), Z(x=0)) = Y(x=1, L(x=1), Z(x=0, L(x=0))). \qquad (6.17)$$
Avin et al. (2005) prove that Equation (6.14) does not hold for this NPSEM. Thus, even under an NPSEM, we cannot conclude that Equation (6.15) holds. In fact, Avin et al. (2005) prove that for this NPSEM E[Y(x = 1, Z(x = 0))] is not identified from data on V. This is because the expression on the right-hand side of Equation (6.17) involves both L(x = 1) and L(x = 0), and there is no way to eliminate either.
Additional Assumptions Identifying the PDE in the NPSEM Associated with the DAG in Figure 6.2(b)

However, if we were to consider a counterfactual model that imposes even more counterfactual independence assumptions than the NPSEM, then the PDE may still be identified, though by a different formula. For example, if, in addition to the usual NPSEM independence assumptions, we assume that
$$L(x=0) \perp\!\!\!\perp L(x=1), \qquad (6.18)$$
then we have
$$\begin{aligned}
E[Y(x=1, Z(x=0))] &= \sum_{l^*, l, z} E[Y(x=1, l, z) \mid L(x=1) = l, Z(x=0, l^*) = z, L(x=0) = l^*]\\
&\qquad \times f(L(x=1) = l, Z(x=0, l^*) = z, L(x=0) = l^*)\\
&= \sum_{l^*, l, z} E[Y(x=1, l, z)]\, f(L(x=0) = l^*, L(x=1) = l)\, f(Z(x=0, l^*) = z)\\
&= \sum_{l^*, l, z} E[Y(x=1, l, z)]\, f(L(x=0) = l^*)\, f(L(x=1) = l)\, f(Z(x=0, l^*) = z)\\
&= \sum_{l^*, l, z} E[Y \mid X=1, L=l, Z=z]\, f(L=l^* \mid X=0)\, f(L=l \mid X=1)\, f(Z=z \mid X=0, L=l^*). \qquad (6.19)
\end{aligned}$$
Here, the second and fourth equalities follow from the usual NPSEM independence restrictions but the third requires condition (6.18). One setting under which (6.18) holds is that in which the counterfactual variables L(0) and L(1) result from a restrictive 'minimal sufficient cause model' (Rothman, 1976) such as
$$L(x) = (1 - x)A_0 + xA_1, \qquad (6.20)$$
where $A_0$ and $A_1$ are independent both of one another and of all other counterfactuals. Note that (6.18) would not hold if the right-hand side of Equation (6.20) was $(1 - x)A_0 + xA_1 + A_2$, even if the $A_i$'s were again assumed to be independent (VanderWeele & Robins, 2007).

An alternative further assumption, sufficient to identify the PDE in the context of the NPSEM associated with Figure 6.2(b), is that L(1) is a deterministic function of L(0), i.e., $L(1) = g(L(0))$ for some function $g(\cdot)$. In this case, we have:
$$f(L(x=0) = l^*, L(x=1) = l) = f(L(x=0) = l^*)\, I(l = g(l^*)) = f(L = l^* \mid X = 0)\, I(l = g(l^*)),$$
where $I(\cdot)$ is the indicator function. Hence
$$E[Y(x=1, Z(x=0))] = \sum_{l^*, l, z} E[Y \mid X=1, L=l, Z=z]\, f(L=l^* \mid X=0)\, I(l = g(l^*))\, f(Z=z \mid X=0, L=l^*). \qquad (6.21)$$
For a scalar L taking values in a continuous state-space there will exist a function $g(\cdot)$ such that $L(1) = g(L(0))$ under the condition of rank preservation, that is, if
$$L_i(0) \leq L_j(0) \;\Rightarrow\; L_i(1) \leq L_j(1)$$
for all individuals i, j. In this case g is simply the quantile-quantile function:
$$g(l) \equiv F^{-1}_{L(1)}\big(F_{L(0)}(l)\big) = F^{-1}_{L \mid X=1}\big(F_{L \mid X=0}(l)\big), \qquad (6.22)$$
where $F(\cdot)$ and $F^{-1}(\cdot)$ indicate the cumulative distribution function (CDF) and its inverse; the equality follows from the NPSEM assumptions; this expression shows that $g(\cdot)$ is identified. (Since L is continuous, the sums over $l, l^*$ in Equation (6.21) are replaced by integrals.) A special case of this example is a linear structural equation system, where it was already known that the PDE is identified in the graph in Figure 6.2(b). Our analysis shows that identification of the PDE in this graph merely requires rank preservation and not linearity. Note that a linear structural equation model implies both rank preservation and linearity. We note that the identifying formula in Equation (6.21) differs from Equation (6.19). Since neither identifying assumption imposes any restriction on the distribution of the factual variables in the DAG in Figure 6.2(b), there is no empirical basis for deciding which, if either, of the assumptions is true. Consequently, we do not advocate blithely adopting such assumptions in order to preserve identification of the PDE in contexts such as the DAG in Figure 6.2(b).
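The quantile-quantile function in Equation (6.22) is straightforward to estimate empirically. The following sketch (simulated data; the monotone, rank-preserving effect of X on L is an assumption of the example, not a general fact) recovers g from the two observed arms.

```python
# Illustrative estimation of g(l) = F^{-1}_{L|X=1}(F_{L|X=0}(l)) of Equation (6.22)
# under an assumed rank-preserving data-generating process.
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
x = rng.binomial(1, 0.5, n)
l0 = rng.gamma(2.0, 1.0, n)              # L(0)
l1 = 0.5 + 1.3 * l0                      # L(1) = g_true(L(0)): rank preserving by assumption
l_obs = np.where(x == 1, l1, l0)         # factual L

def g_hat(l, l_x0, l_x1):
    """Empirical quantile-quantile map F^{-1}_{L|X=1}(F_{L|X=0}(l))."""
    rank = (l_x0 <= l).mean()            # F_{L|X=0}(l)
    return np.quantile(l_x1, rank)       # F^{-1}_{L|X=1}(rank)

l_x0, l_x1 = l_obs[x == 0], l_obs[x == 1]
for val in (1.0, 2.0, 4.0):
    print(f"g_hat({val}) ~ {g_hat(val, l_x0, l_x1):.2f}   (truth {0.5 + 1.3 * val:.2f})")
```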
4 Models in which the PDE Is Manipulable We now turn to the question of whether E[Y(x ¼ 1, Z(x ¼ 0))] can be identified by intervening on the variables V on G in Figure 6.1. Now, as noted by R&G (1992) we could observe E[Y(x ¼ 1, Z(x ¼ 0))] if we could intervene and set X to 0, observe Z(0), then ‘‘return each subject to their pre-intervention state,’’ intervene to set X to 1 and Z to Z(0), and finally observe Y(1, Z(0)). However, such an intervention strategy will usually not exist because such a return to a pre-intervention state is usually not possible in a real-world intervention (e.g., suppose the outcome Y were death). As a result, because we cannot observe the same subject under both X ¼ 1 and X ¼ 0, we are unable to directly observe the distribution of mixed counterfactuals such as Y(x ¼ 1, Z(x ¼ 0)). It follows that we cannot observe E[Y(x ¼ 1, Z(x ¼ 0))] by any intervention on the variables X and Z. (Pearl, 2001) argues similarly. That is, although we can verify through intervention the prediction made by all four causal models that the right-hand side of Equation (6.13) is equal to the expression on the right-hand side of Equation (6.12), we cannot verify, by intervention on X and Z, the NPSEM prediction that Equation (6.12) holds. Thus E[Y(x ¼ 1, Z(x ¼ 0))] is not manipulable with respect to the graph in Figure 6.1, and hence neither is the PDE with respect to this graph. Yet both of these parameters are identified in the NPSEM associated with this graph. This would be less problematic if these parameters were of little or no substantive interest. However, as shown in the next section, Pearl convincingly argues that such parameters can be of substantive importance.
4.1 Pearl’s Substantive Motivation for the PDE Pearl argues that the PDE and the associated quantity E[Y(x ¼ 1, Z(x ¼ 0))] are often causal contrasts of substantive and public-health importance by offering examples along the following lines. Suppose a new process can completely remove the nicotine from tobacco, allowing the production of a nicotine-free cigarette to begin next year. The substantive goal is to use already collected data on smoking status X, hypertensive status Z and MI status Y from a randomized smoking-cessation trial to estimate the incidence of MI in smokers were all smokers to change to nicotine-free cigarettes. Suppose it is (somehow?) known that the entire effect of nicotine on MI is through its effect on hypertensive status, while the non-nicotine toxins in cigarettes have no effect on hypertension. Then, under the further assumption that there do not exist unmeasured confounders for the effect of hypertension on MI, the causal DAG in Figure 6.1 can be used to represent
the study. Under these assumptions, the MI incidence in smokers of cigarettes free of nicotine would be E[Y(x = 1, Z(x = 0))] under all three counterfactual causal models, since the hypertensive status of smokers of nicotine-free cigarettes will equal their hypertensive status under non-exposure to cigarettes. Pearl then assumes an NPSEM and concludes that E[Y(x = 1, Z(x = 0))] equals $\sum_z E[Y \mid X = 1, Z = z]\, f(z \mid X = 0)$, and the latter quantity can be estimated from the already available data. What is interesting about Pearl's example is that to argue for the substantive importance of the non-manipulable parameter E[Y(x = 1, Z(x = 0))], he tells a story about the effect of a manipulation: a manipulation that makes no reference to Z at all. Rather, the manipulation is to intervene to eliminate the nicotine component of cigarettes. Indeed, the most direct representation of his story is provided by the extended DAG in Figure 6.3 with V = (X, N, O, Z, Y), where N is a binary variable representing nicotine exposure, O is a binary variable representing exposure to the non-nicotine components of a cigarette, and (X, Z, Y) are as defined previously. The bolded arrows from X to N and O indicate a deterministic relationship. Specifically, in the factual data, with probability one under f(v), either one smokes normal cigarettes, so X = N = O = 1, or one is a nonsmoker (i.e., ex-smoker) and X = N = O = 0. In this representation the parameter of interest is the mean $E^{int}_{n=0,o=1}[Y]$ of Y had, contrary to fact, all subjects only been exposed to the non-nicotine components. As $E^{int}_{n=0,o=1}[Y]$ is a function of $f^{int}_{n=0,o=1}(v)$, we conclude that $E^{int}_{n=0,o=1}[Y]$ is a manipulable causal effect relative to the DAG in Figure 6.3. Further, Pearl's story gives no reason to believe that there is any confounding for estimating this effect. In Appendix B we present a scenario that differs from Pearl's in which $E^{int}_{n=0,o=1}[Y]$ is confounded, and thus none of the four causal models associated with Figure 6.3 can be true (even though the FFRCISTG and agnostic causal models associated with Figure 6.1 are true). In contrast, under Pearl's scenario it is reasonable to take any of the four causal models, including the agnostic model, associated with Figure 6.3
Figure 6.3 An elaboration of the DAG in Figure 6.1; N and O are, respectively, the nicotine and non-nicotine components of tobacco; thicker edges indicate deterministic relations.
as true. Under such a supposition, $E^{int}_{n=0,o=1}[Y]$ is identified if $E_{n=0,o=1}[Y]$ is a well-defined function of f(v). Note, under f(v), data on (X, Z, Y) are equivalent to data on V = (X, N, O, Z, Y), since X completely determines O and N in the factual data. We now show that, with Figure 6.3 as the causal DAG and V = (X, N, O, Z, Y), under all four causal models, $E^{int}_{n=0,o=1}[Y]$ is identified simply by applying the g-formula density in standard fashion. This result may seem surprising at first since no subject in the actual study data followed the regime (n = 0, o = 1), so the standard positivity assumption P[N = 0, O = 1] > 0 usually needed to make the g-formula density $f_{n=0,o=1}(v)$ a function of f(v) (and thus identifiable) fails. However, as we now demonstrate, even without positivity, the conditional independences implied by the assumptions of no direct effect of N on Y and no effect of O on Z encoded in the missing arrows from N to Y and O to Z in Figure 6.3, along with the deterministic relationship between O, N, and X under f(v), allow one to obtain identification. Specifically, under the DAG in Figure 6.3,
$$f_{n=0,o=1}(y, z) = f(y \mid O = 1, z)\, f(z \mid N = 0) = f(y \mid O = 1, N = 1, z)\, f(z \mid N = 0, O = 0) = f(y \mid X = 1, z)\, f(z \mid X = 0),$$
where the first equality is by definition of the g-formula density $f_{n=0,o=1}(y, z)$, the second by the conditional independence relations encoded in the DAG in Figure 6.3, and the last by the deterministic relationships between O, N, and X under f(v) with V = (X, N, O, Z, Y). Thus
$$E_{n=0,o=1}[Y] \equiv \sum_{y,z} y\, f_{n=0,o=1}(y, z) = \sum_{y,z} y\, f(y \mid X = 1, z)\, f(z \mid X = 0) \equiv \sum_{z} E[Y \mid X = 1, Z = z]\, f(z \mid X = 0),$$
which is a function of f(v) with V = (X, N, O, Z, Y). Note that this argument goes through even if Z and/or Y are non-binary, continuous variables.

The Role of the Extended Causal Model in Figure 6.3

The identifying formula under all four causal models associated with the DAG in Figure 6.3 is the identifying formula Pearl obtained when representing the problem as the estimation of E[Y(x = 1, Z(x = 0))] under the NPSEM associated with the DAG in Figure 6.1.
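The following simulation sketch (structural equations and parameter values are assumptions introduced only for illustration) mirrors the derivation above: data are generated from the extended deterministic DAG of Figure 6.3, so that no subject follows the regime (n = 0, o = 1), yet the g-formula computed from the factual (X, Z, Y) data agrees with a directly simulated intervention that sets N = 0 and O = 1.

```python
# Illustrative check that the g-formula identifies E^int_{n=0,o=1}[Y] despite the
# failure of positivity, under an assumed model for the extended DAG of Figure 6.3:
# X determines N and O, Z depends only on N, and Y depends only on (O, Z).
import numpy as np

rng = np.random.default_rng(6)
n = 500_000
u_z, u_y = rng.uniform(size=n), rng.uniform(size=n)

def z_of(n_var):            # Z depends on nicotine only
    return (u_z < 0.2 + 0.5 * n_var).astype(int)

def y_of(o_var, z_var):     # Y depends on other components and Z only
    return (u_y < 0.1 + 0.2 * o_var + 0.3 * z_var).astype(int)

# Factual data: X = N = O with probability one.
x = rng.binomial(1, 0.5, n)
z, y = z_of(x), y_of(x, z)

# g-formula sum_z E[Y | X=1, Z=z] f(z | X=0) from the factual (X, Z, Y) data.
g_formula = sum(y[(x == 1) & (z == zz)].mean() * (z[x == 0] == zz).mean()
                for zz in (0, 1))

# Direct simulation of the intervention n = 0, o = 1 (same error terms).
z_int = z_of(np.zeros(n))
y_int = y_of(np.ones(n), z_int)
print(f"g-formula ~ {g_formula:.3f}   simulated E^int_(n=0,o=1)[Y] ~ {y_int.mean():.3f}")
```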
For Pearl, having at the outset assumed an NPSEM associated with the DAG in Figure 6.1, the story did not contribute to identification; rather, it served only to show that the non-manipulable parameter E[Y(x ¼ 1, Z(x ¼ 0))] of the NPSEM associated with the DAG in Figure 6.1 could, under the scenario of our story, encode a substantively important parameter — the manipulable causal effect of setting N to 0 and O to 1 on the extended causal model associated with the DAG in Figure 6.3. However, from the refutationist point of view, it is the story itself that make’s Pearl’s claim that E½Yðx ¼ 1;Zðx ¼ 0ÞÞ ¼z E½YjX ¼ 1;z f ðzjO ¼ 1Þ refutable and, thus, scientifically meaningful. Specifically, when nicotine-free cigarettes become available, Pearl’s claim can be tested by an intervention that forces a random sample of the population to smoke nicotine-free cigarettes. For someone willing to entertain only an agnostic causal model, the information necessary to identify the effect of nicotine-free cigarettes was contained in the story as the parameter E[Y(x ¼ 1, Z(x ¼ 0))] is undefined without the story. [Someone, such as Dawid (2000), opposed to counterfactuals and thus wedded to the agnostic causal model, might then reasonably and approint int int int priately choose to define En¼0;o¼1 ½Y En¼0;o¼0 ½Y ¼ En¼0;o¼1 ½Y Ex¼0 ½Y to be the natural or pure direct effect of X not through Z. This definition differs from, and in our view is preferable to, the definition of DDG (2006) discussed previously: The definition of DDG fails to correspond to the concept of the PDE as used in the literature since its introduction in Robins and Greenland (1992).] For an analyst who had assumed that the MCM, but not necessarily the NPSEM, associated with the DAG in Figure 6.1 was true, the information contained in the above story licenses the assumption that the MCM associated with Figure 6.3 holds. This latter assumption can be used in two alternative ways, both leading to the same identifying formula. First, it leads via Lemma 6 to the above g-functional analysis also used by the agnostic model advocate. Second, as we next show, it can be used to prove that : (6.11) holds, allowing identification to proceed a` la Pearl (2001). The Role of Determinism Consider an MCM associated with the DAG in Figure 6.3 with node set V ¼ (X, N, O, Z, Y). It follows from the fact that X ¼ N ¼ O with probability (w.p.) 1 that the condition that N(x) ¼ O(x) ¼ x w.p. 1 also holds. However, for pedagogic purposes, suppose for the moment that the condition N(x) ¼ O(x) ¼ x w.p. 1 does not hold. For expositional simplicity we assume all variables are binary so our model is also an FFRCISTG model. Then, V0 ¼ X, V1(v0) ¼ N(x), and V4 ðv 3 Þ ¼ V4 ðv2 ;v3 Þ. V2 ðv0 ;v1 Þ ¼ V2 ðv1 Þ ¼ OðxÞ;V3 ðv 2 Þ ¼ V3 ðv1 Þ ¼ ZðnÞ;
By Theorem 1, {Y(o, z), Z(n), O(x), N(x)} are mutually independent. However, because we are assuming an FFRCISTG model and not an NPSEM, we cannot conclude that O(x) ? ? N(x*) for x 6¼ x*. Consider the induced counterfactual models for the variables (X, Z, Y) obtained from our FFRCISTG model by marginalizing over (N, O). Because N and O each has only a single child on the graph in Figure 6.3, the counterfactual model over (X, Z, Y) is the FFRCISTG associated with the complete graph of Figure 6.1, where the one-step-ahead counterfactuals Z(1)(x), Y(1) (x, z) associated with Figure 6.1 are obtained from the counterfactuals {Y(o, z), Z(n), O(x), N(x)} associated with Figure 6.3 by Z(1)(x) ¼ Z(N(x)), Y(1)(x, z) ¼ Y(O(x), z). Here, we have used the superscript ‘(1)’ to emphasize the graph with respect to which Z(1)(x) and Y(1)(x, z) are one-step-ahead counterfactuals. We cannot conclude that Z(1)(0) ¼ Z(N(0)) and Y(1)(1, z) ¼ Y(O(1), z) are independent, even though Z(n) and Y(o, z) are independent because, as noted above, the FFRCISTG model associated with Figure 6.3 does not imply independence of O(1) and N(0). Suppose now we re-instate the deterministic constraint that N(x) ¼ O(x) ¼ x w.p. 1. Then, we conclude that O(x) is independent of N(x*), since both variables are constants. It then follows that Z(1)(0) and Y(1)(1, z) are independent and, thus, that (6.11) holds and E[Y(1)(1, Z(1)(0))] is identified. The Need for Conditioning on Events of Probability Zero In our argument that, under the deterministic constraint that N(x) ¼ O(x) ¼ x w.p. 1, the FFRCISTG associated with the DAG in Figure 6.3 implied condition (6.11), the crucial step was the following: By Theorem 1, the independences in condition (6.1) that define an FFRCISTG imply that Y(o, z) and Z(n) are independent for n ¼ 0 and o ¼ 1. In this section, we show that had we modified (6.1), and thus our definition of an FFRCISTG, by restricting to conditioning events V m1 ¼ v m1 that have a positive probability under f (v), then Theorem 1 would not hold for non-positive densities f (v). Specifically, if f (v) is not positive, the modified version of condition (6.1) does not imply condition (6.6); furthermore, the set of independences implied by a modified FFRCISTG associated with a graph G could differ for different orderings of the variables consistent with the descendant relationships on the graph. Specifically, we now show that for the modified FFRCISTG associated with Figure 6.3 and the ordering (X, N, O, Z, Y), we cannot conclude that Y(x, n, o, z) ¼ Y(o, z) and Z(x, n, o) ¼ Z(n) are independent for n ¼ 0 and o ¼ 1 and, thus, that condition (6.11) holds. However, the modified FFRCISTG with the alternative ordering (X, N, Z, O, Y) does imply Y(o, z) ? ? Z(n). First, consider the modified FFRCISTG associated with Figure 6.3 and ordering
(X, N, O, Z, Y) under the deterministic constraint N(x) = O(x) = x w.p. 1. The unmodified condition (6.1) implies the set of independences
$$Y(n, o, z) \perp\!\!\!\perp Z(n, o) \mid X = x, N(x) = n, O(x) = o, \quad \text{for } z, x, n, o \in \{0, 1\}.$$
The modified condition (6.1) implies only the subset corresponding to $\{x, z \in \{0, 1\};\ n = o = x\}$ since the event $\{N(x) = j, O(x) = 1 - j,\ j \in \{0, 1\}\}$ has probability 0. As a consequence, we can only conclude that $Y(n, o, z) = Y(o, z) \perp\!\!\!\perp Z(n)$ for $o = n$.

In contrast, for the modified FFRCISTG associated with Figure 6.3 and the ordering V = (X, N, Z, O, Y), the deterministic constraint N(x) = O(x) = x w.p. 1 implies $Y(o, z) \perp\!\!\!\perp Z(n)$ for n = 0 and o = 1, as follows: By Equation (6.1) and the fact that Y(x, n, z, o) = Y(o, z) and Z(x, n) = Z(n), we have, without having to condition on an event of probability 0, that
$$Y(o, z), Z(n) \perp\!\!\!\perp X \quad \text{for } z, o, n \in \{0, 1\}, \qquad (6.23)$$
$$Y(o, z) \perp\!\!\!\perp Z(n) \mid X = x, N(x) = n \quad \text{for } x, z, o \in \{0, 1\} \text{ and } n = x. \qquad (6.24)$$
However, (6.24) implies $Y(o, z) \perp\!\!\!\perp Z(n = x) \mid X = x$ for $x, z, o \in \{0, 1\}$, as X = x is the same event as X = N(x) = x. Thus, $Y(o, z) \perp\!\!\!\perp Z(n)$ for $n, z, o \in \{0, 1\}$ by (6.23). The heuristic reason that for the ordering V = (X, N, O, Z, Y) we must condition on events of probability zero in condition (6.1) in order to prove (6.11) is that such conditioning is needed to instantiate the assumption that O has no effect on Z; if we do not allow conditioning on events of probability zero, the FFRCISTG model with this ordering does not instantiate this assumption because O and N are equal with probability one and, thus, we can substitute O for N as the cause of Z. Under the ordering V = (X, N, Z, O, Y), in which O is subsequent to Z, it was not necessary to condition on events of probability zero in (6.1) to instantiate this assumption, as the model precludes later variables in the ordering from being causes of earlier variables; thus, O cannot be a cause of Z.

The above example demonstrates that the assumption that Equation (6.1) holds even when we condition on events of probability zero can place independence restrictions on the distribution of the counterfactuals over and above those implied by the assumption that Equation (6.1) holds when the conditioning events have positive probability. One might wonder how this could be so; it is usually thought that different choices for probabilities conditional on events of probability zero have no distributional implications. The following simple canonical example that makes no reference to causality
or counterfactuals clarifies how multiple distributional assumptions conditional on events of probability zero can place substantive restrictions on a distribution.

Example 4 Suppose we have random variables (X, Y, R) where R = 1 w.p. 1. Suppose we assume both that (i) f(x, y | R = 0) = f(x, y) and (ii) f(x, y | R = 0) = f(x | R = 0) f(y | R = 0). Then we can conclude that f(x, y) = f(x | R = 0) f(y | R = 0) and, thus, that X and Y are independent, since the joint density f(x, y) factors as a function of x times a function of y. The point is that although neither assumption (i) nor assumption (ii) alone restricts the joint distribution of (X, Y), nonetheless together they impose the restriction that X and Y are independent.

Inclusion of a Measured Common Cause of Z and Y

A similar elaboration may be given for the causal DAG in Figure 6.2(a). The extended causal DAG represented by our story would then be the DAG in Figure 6.4. Under any of our four causal models,
$$f_{n=0,o=1}(y, z, l) = f(y \mid O = 1, z, l)\, f(z \mid N = 0, l)\, f(l) = f(y \mid O = 1, N = 1, z, l)\, f(z \mid N = 0, O = 0, l)\, f(l) = f(y \mid X = 1, z, l)\, f(z \mid X = 0, l)\, f(l).$$
Hence,
$$E_{n=0,o=1}[Y] = \sum_{z,l} E[Y \mid X = 1, Z = z, L = l]\, f(z \mid X = 0, L = l)\, f(l),$$
which is the identifying formula Pearl obtained when representing the problem as the estimation of E[Y(x = 1, Z(x = 0))] under an NPSEM associated with the DAG in Figure 6.2(a).
Figure 6.4 The graph from Figure 6.3 with, in addition, a measured common cause (L) of the intermediate Z and the final response Y.
Summary

We believe in some generality that whenever a particular causal effect is (a) identified from data on V under an NPSEM associated with a DAG G with node set V (but is not identified under the associated MCM, FFRCISTG model, or agnostic causal model) and (b) can be expressed as the effect of an intervention on certain variables (which may not be elements of V) in an identifiable sub-population, then that causal effect is also identified under the agnostic causal DAG model based on a DAG G′ with node set V′, a superset of V. To find such an identifying causal DAG model G′, it is generally necessary to make the variables in V′∖V deterministic functions of the variables in V. The above examples based on extended DAGs in Figures 6.3 and 6.4 are cases in point; see Robins, VanderWeele, and Richardson (2007), Geneletti and Dawid (2007), and Appendix A for such a construction for the effect of treatment on the treated.
4.2 An Example in which an Interventional Interpretation of the PDE is more Controversial The following example shows that the construction of a scientifically plausible story under which the PDE can be regarded as a manipulable contrast relative to an expanded graph G0 may be more controversial than our previous example would suggest. After presenting the example, we briefly discuss its philosophical implications. Suppose nicotine X was the only chemical found in cigarettes that had an effect on MI but that nicotine produced its effects by two different mechanisms. First, it increased blood pressure Z by directly interacting with a membrane receptor on blood pressure control cells located in the carotid artery in the neck. Second, it directly caused atherosclerotic plaque formation and, thus, an MI by directly interacting with a membrane receptor of the same type located on the endothelial cells of the coronary arteries of the heart. Suppose the natural endogenous ligand produced by the body that binds to these receptors was nicotine itself. Finally, assume that exogenous nicotine from cigarettes had no causal effect on the levels of endogenous nicotine (say, because the time-scale under study is too short for homeostatic feedback mechanisms to kick in) and we had precisely measured levels of endogenous nicotine L before randomizing to smoking or not smoking (X). Suppose that, based on this story, an analyst posits that the NPSEM associated with the graph in Figure 6.2(a) with V ¼ (X, Z, Y, L) is true. As noted in Section 3.3, under this supposition E[Y(x ¼ 1, Z(x ¼ 0))] is identified via z;l E½YjX ¼ 1;Z ¼z;L ¼ l f ðzjX ¼ 0;L ¼ lÞf ðlÞ.
Can we express E[Y(x ¼ 1, Z(x ¼ 0))] as an effect of a scientifically plausible intervention? To do so, we must devise an intervention that (i) blocks the effect of exogenous nicotine on the receptors in the neck without blocking the effect of exogenous nicotine on the receptors in the heart but (ii) does not block the effect of endogenous nicotine on the receptors in either the neck or heart. To accomplish (i), one could leverage the physical separation of the heart and the neck to build a ‘‘nano-cage’’ around the blood pressure control cells in the neck that prevents exogenous nicotine from reaching the receptors on these cells. However, because endogenous and exogenous nicotine are chemically and physically identical, the cage would also block the effect of endogenous nicotine on receptors in the neck, in violation of (ii). Thus, a critic might conclude that E[Y(x ¼ 1, Z(x ¼ 0))] could not be expressed as the effect of an intervention. If the critic adhered to the slogan ‘‘no causation without manipulation’’ (i.e., causal contrasts are best thought of in terms of explicit interventions that, at least in principle, could be performed (Robins & Greenland, 2000)), he or she would then reject the PDE as a meaningful causal contrast in this context. In contrast, if the critic believed in the ontological primacy of causation, he or she would take the example as evidence for their slogan ‘‘causation before manipulation.’’ Alternatively, one can argue that the critic’s conclusion that E[Y(x ¼ 1, Z(x ¼ 0))] could not be expressed as the effect of an intervention indicates only a lack of imagination and an intervention satisfying (i) and (ii) may someday exist. Specifically, someday it may be possible to chemically attach a side group to the exogenous nicotine in cigarettes in such a way that (a) the effect of the (exogenous) chemically-modified nicotine and the effect of the unmodified nicotine on the receptors in the heart and neck are identical, while (b) allowing the placement of a ‘‘nano-cage’’ in the neck that successfully binds the side group attached to the exogenous nicotine, thereby preventing it from reaching the receptors in the neck. In that case, E[Y(x ¼ 1, Z(x ¼ 0))] equals a manipulable contrast of the extended deterministic causal DAG of Figure 6.5. In the Figure C ¼ 1 denotes that the ‘‘nano-cage’’ C
Figure 6.5 An example in which an interventional interpretation of the PDE is hard to conceive; thicker edges indicate deterministic relations.
is present. We allow X to take three values: as before, X = 0 indicates no cigarette exposure, X = 1 indicates exposure to cigarettes with unmodified nicotine, and X = 2 indicates exposure to cigarettes with modified nicotine. $R_n$ is the fraction of the receptors in the neck that are bound to a nicotine molecule (exogenous or endogenous) and $R_h$ is the fraction of the receptors in the heart that are bound to a nicotine molecule. M is a variable that is 1 if and only if X ≠ 0; D is a variable that takes the value 1 if and only if either X = 1 or (X = 2 and C = 0). Then, E[Y(x = 1, Z(x = 0))] is the parameter $E^{int}_{x=2,c=1}[Y]$ corresponding to the intervention described in (a) and (b). Under all four causal models associated with the graph in Figure 6.5,
$$f_{x=2,c=1}(y, z, l) \equiv \sum_{m,d,r_h,r_n} f(y \mid r_h, z)\, f(r_h \mid m, l)\, f(z \mid r_n)\, f(m \mid x = 2)\, f(r_n \mid d, l)\, f(d \mid c = 1, x = 2)\, f(l) = f(y \mid M = 1, l, z)\, f(z \mid D = 0, l)\, f(l) = f(y \mid X = 1, z, l)\, f(z \mid X = 0, l)\, f(l),$$
where the first equality uses the fact that D = 0 and M = 1 when x = 2 and c = 1, and the second uses the fact that, since in the observed data C = 0 w.p. 1, D = 0 if and only if X = 0, and M = 1 if and only if X = 1 (since X ≠ 2 w.p. 1). Thus,
$$E_{x=2,c=1}[Y] = \sum_{z,l} E[Y \mid X = 1, Z = z, L = l]\, f(z \mid X = 0, L = l)\, f(l),$$
which is the identifying formula Pearl obtained when representing the problem as the estimation of E[Y(x ¼ 1, Z(x ¼ 0))] under an NPSEM based on the DAG in Figure 6.2(a). As noted in the Introduction, the exercise of trying to construct a story to provide an interventionist interpretation for a non-manipulable causal parameter of an NPSEM often helps one devise explicit, and sometimes even practical, interventions which can then be represented as a manipulable causal effect relative to an extended deterministic causal DAG model such as Figure 6.3.
5 Path-Specific Effects In this section we extend our results to path-specific effects. We begin with a particular motivating example.
5.1 A Specific Example

Suppose our underlying causal DAG was the causal DAG of Figure 6.2(b) in which there is an arrow from X to L. We noted above that Pearl proved E[Y(x = 1, Z(x = 0))] was not identified from data (X, L, Z, Y) on the causal DAG in Figure 6.2(b) even under the associated NPSEM. There exist exactly three possible extensions of Pearl's original story that are consistent with the causal DAG in Figure 6.2(b), as shown in Figure 6.6: (a) nicotine N causes L but O does not; (b) O causes L but N does not; (c) both N and O cause L. We consider as before the causal effect $E^{int}_{n=0,o=1}[Y]$. Under all four causal models associated with the graph in Figure 6.6(a), $E^{int}_{n=0,o=1}[Y]$ is identified from factual data on V = (X, L, Z, Y). Specifically, on the DAG in Figure 6.6(a), we have
$$f_{n=0,o=1}(y, z, l) = f(y \mid O = 1, z, l)\, f(z \mid N = 0, l)\, f(l \mid N = 0) = f(y \mid O = 1, N = 1, z, l)\, f(z \mid N = 0, O = 0, l)\, f(l \mid N = 0, O = 0) = f(y \mid X = 1, z, l)\, f(z \mid X = 0, l)\, f(l \mid X = 0),$$
so
$$E^{int}_{n=0,o=1}[Y] = \sum_{l,z} E(Y \mid X = 1, z, l)\, f(z \mid X = 0, l)\, f(l \mid X = 0). \qquad (6.25)$$
Figure 6.6 Elaborations of the graph in Figure 6.2(b), with additional variables as described in the text; thicker edges indicate deterministic relations.
Similarly, under all four causal models associated with the graph in Figure 6.6(b), $E^{int}_{n=0,o=1}[Y]$ is identified from factual data on V = (X, L, Z, Y): On the DAG in Figure 6.6(b) we have
$$f_{n=0,o=1}(y, z, l) = f(y \mid O = 1, z, l)\, f(z \mid N = 0, l)\, f(l \mid O = 1) = f(y \mid X = 1, z, l)\, f(z \mid X = 0, l)\, f(l \mid X = 1),$$
so
$$E^{int}_{n=0,o=1}[Y] = \sum_{l,z} E(Y \mid X = 1, z, l)\, f(z \mid X = 0, l)\, f(l \mid X = 1). \qquad (6.26)$$
However, $E^{int}_{n=0,o=1}[Y]$ is not identified from factual data on V = (X, L, Z, Y) under any of the four causal models associated with the graph in Figure 6.6(c). In this graph $f_{n=0,o=1}(y, z, l) = f(y \mid O = 1, z, l)\, f(z \mid N = 0, l)\, f(l \mid O = 1, N = 0)$. However, $f(l \mid O = 1, N = 0)$ is not identified from the factual data since the event {O = 1, N = 0} has probability 0 under f(v). Note that the identifying formulae for $E^{int}_{n=0,o=1}[Y]$ for the graphs in Figure 6.6(a) and (b) are different.
Relation to Counterfactuals Associated with the DAG in Figure 6.2(b)

Let Y(x, l, z), Z(x, l), and L(x) denote the one-step-ahead counterfactuals associated with the graph in Figure 6.2(b). Then, it is clear from the assumed deterministic counterfactual relation N(x) = O(x) = x that the parameter
$$E^{int}_{n=0,o=1}[Y] = E[Y(o = 1, L(n = 0), Z(n = 0, L(n = 0)))]$$
associated with the graph in Figure 6.6(a) can be written in terms of the counterfactuals associated with the graph in Figure 6.2(b) as
$$E[Y(x = 1, L(x = 0), Z(x = 0))] = E[Y(x = 1, L(x = 0), Z(x = 0, L(x = 0)))].$$
Likewise, we have that the parameter
$$E^{int}_{n=0,o=1}[Y] = E[Y(o = 1, L(o = 1), Z(n = 0, L(o = 1)))]$$
associated with the graph in Figure 6.6(b) equals
$$E[Y(x = 1, L(x = 1), Z(x = 0, L(x = 1)))]$$
in terms of the counterfactuals associated with the graph in Figure 6.2(b). In contrast, $E^{int}_{n=0,o=1}[Y]$ associated with the graph in Figure 6.6(c) is not the mean of any counterfactual defined from Y(x, l, z), Z(x, l), and L(x) under the graph in Figure 6.2(b) since L, after intervening to set n = 0, o = 1, is neither L(x = 1) nor L(x = 0), as both imply a counterfactual for L under which n = o.
Furthermore, the parameter E ½Yðx ¼ 1; Zðx ¼ 0ÞÞ ¼ E ½Yðx ¼ 1; Lðx ¼ 1Þ; Zðx ¼ 0; Lðx ¼ 0ÞÞÞ associated with the graph in Figure 6.2(b) is not identified under any of the four causal models associated with any of the three graphs in Figure 6.6(a), (b) and (c); see Section 3.4. Thus, in summary, under an MCM or FFRCISTG model associated with the DAG in Figure 6.6(a), the extension of Pearl’s original story encoded in that DAG allows the identification of the causal effect E[Y{x ¼ 1, L(x ¼ 0), Z(x ¼ 0)}] associated with the DAG in Figure 6.2(b). Similarly, under an MCM or FFRCISTG model associated with the DAG in Figure 6.2(b) the extension of Pearl’s original story encoded in this graph allows the identification of the causal effect E[Y(x ¼ 1, L(x ¼ 1), Z(x ¼ 0, L(x ¼ 1))] associated with the DAG in Figure 6.2(b). Contrast with the NPSEM for the DAG in Figure 6.2(b) We now compare these results to those obtained under the assumption that the NPSEM associated with the DAG in Figure 6.2(b) held. Under this model Avin et al. (2005) proved, using their theory of path-specific effects, that while E[Y(x ¼ 1, Z(x ¼ 0))] is unidentified, both E ½Yðx ¼ 1; Lðx ¼ 0Þ; Zðx ¼ 0ÞÞ and E ½Yðx ¼ 1; Lðx ¼ 1Þ; Zðx ¼ 0; Lðx ¼ 1ÞÞÞ ð6:27Þ are identified (without requiring any additional story) by Equations (6.25) and (6.26) respectively. From the perspective of the FFRCISTG models associated with the graphs in Figure 6.6(a) and (b) if N and O represent, as we have been assuming, the substantive variables Nicotine and Other components of cigarettes (rather than merely formal mathematical constructions), these graphs will generally represent mutually exclusive causal hypotheses. As a consequence, at most one of the two FFRCISTG models will be true; thus, from this perspective, only one of the two parameters in (6.27) will be identified. Simultaneous Identification of both Parameters in (6.27) by an Expanded Graph We next describe an alternative scenario associated with the expanded graph in Figure 6.6(d) whose substantive assumptions imply (i) the FFRCISTG model associated with Figure 6.6(d) holds and (ii) the two parameters of (6.27) are manipulable parameters of that FFRCISTG which are identified by Equations (6.25) and (6.26), respectively. Thus, this alternative
6 Alternative Graphical Causal Models
141
scenario provides a (simultaneous) manipulative interpretation for the nonmanipulative (relative to (X, Z, Y)) parameters (6.27) that are simultaneously identified by the NPSEM associated with the DAG in Figure 6.2(b). Suppose it was (somehow?) known that, as encoded in the DAG in Figure 6.6(d), the Nicotine (N0) component of cigarettes was the only (cigaretterelated) direct cause of Z not through L, the Tar (T ) component was the only (cigarette-related) direct cause of L, the Other components (O0) contained all the (cigarette-related) direct causes of Y not through Z and L, and there are no further confounding variables so that the FFRCISTG model associated with Figure 6.6(d) can be assumed to be true. Then, the parameter En0 ¼0;t¼0;o0 ¼1 [Y] associated with Figure 6.6(d) equals both the parameter E[Y(x ¼ 1, L(x ¼ 0), Z(x ¼ 0))] associated with Figure 6.2(b) and the parameter En¼0,o¼1[Y] associated with Figure 6.6(a) (where n ¼ 0 is now defined to be the intervention that sets nicotine n0 ¼ 0 and tar t ¼ 0 while o ¼ 1 is the intervention o0 ¼ 1; N and O are redefined by 1 N ¼ (1 N0) (1 T ) and O ¼ O0). Furthermore, En0 ¼0;t¼0;o0 ¼1 [Y] is identified by Equation (6.25). Similarly, the parameter En0 ¼0;t¼1;o0 ¼1 [Y] associated with Figure 6.6(d) is equal to both the parameter E[Y(x ¼ 1, L(x ¼ 1), Z(x ¼ 0, L(x ¼ 1)))] associated with Figure 6.2(b) and the parameter En¼0,o¼1[Y] associated with Figure 6.6(b) (where n ¼ 0 is now the intervention that sets nicotine n0 ¼ 0 while o ¼ 1 denotes the intervention that sets tar t ¼ 1 and o0 ¼ 1; N and O are redefined by N ¼ N0 and O ¼ TO0). Furthermore, the parameter En0 ¼0;t¼1;o0 ¼1 [Y] is identified by Equation (6.26). Note that under this alternative scenario, and in contrast to our previous scenarios, the substantive meanings of the intervention that sets n ¼ 0 and o ¼ 1 and of the variables N and O for Figure 6.6(a) differ from the substantive meaning of this intervention and these variables for Figure 6.6(b), allowing the two parameters En¼0,o¼1[Y] to be identified simultaneously, each by a different formula, under the single FFRCISTG model associated with Figure 6.6(d).
Connection to Path-Specific Effects Avin et al. (2005) refer to E[Y(x ¼ 1, L(x ¼ 0), Z(x ¼ 0))] as the effect of X ¼ 1 on Y when the paths from X to L and from X to Z are both blocked (inactivated) and to E[Y(x ¼ 1, L(x ¼ 1), Z(x ¼ 0, L(x ¼ 1)))] as the effect of X ¼ 1 on Y when the paths from X to Z are blocked. They refer to E ½Yðx ¼ 1; Zðx ¼ 0ÞÞ ¼ E ½Yðx ¼ 1; Lðx ¼ 1Þ; Zðx ¼ 0; Lðx ¼ 0ÞÞÞ as the effect of X ¼ 1 on Y when both the path from X to Z and (X’s effect on) the path from L to Z are blocked.
Causality and Psychopathology
142
5.2 The General Case We now generalize the above results. Specifically, given any DAG G, with a variable X, construct a deterministic extended DAG Gex that differs from G only in that the only arrows out of X on Gex are deterministic arrows from X to new variables N and O and the origin of each arrow out of X on G is from either N or O (but never both) on Gex. Then, with V nX being the set of variables on G other than X, the marginal g-formula density fn¼0,o¼1(vnx) is identified from the distribution of the variables V on G whenever f (v) is a positive distribution by
fn¼0;o¼1 ðvnxÞ ¼
Y
f ðvj j paj Þ
f j:Vj is not a child of Y
X on Gg
f j:Vj is a child of Y
O on Gex g
f j:Vj is a child of
N on Gex g
f ðvj j paj nx; X ¼ 1Þ f ðvj j paj nx; X ¼ 0Þ:
Note that if X has p children on G, there exist 2p different graphs Gex. The identifying formula for fn¼0,o¼1(vnx) in terms of f (v) depends on the graph Gex. It follows that, under the assumption that a particular Gex is associated int with one of our four causal models, the intervention distribution fn¼0;o¼1 ðvnxÞ corresponding to that Gex is identified under any of the four associated models. We now discuss the relationship with path-specific effects. Avin et al. (2005) first define, for any counterfactual model associated with G, the path-specific effect on the density of V nX when various paths on graph G have been blocked. Avin et al. (2005) further determine which path-specific densities are identified under the assumption that the NPSEM associated with G is true and provide the identifying formulae. The results of Avin et al. (2005) imply that the path-specific effect corresponding to the set of blocked paths on G being the paths from X to the subset of its children who were the children of N on any given Gex is identified under the NPSEM assumption for G. Their identifying formula is precisely our fn¼0,o¼1(vnx) corresponding to this Gex. In fact, our derivation int implies that this path-specific effect on G is identified by fn¼0;o¼1 ðvnxÞ for this Gex under the assumption that any of our four causal models associated with this Gex holds, even without assuming that the NPSEM associated with the original graph G is true. Again, under the NPSEM assumption for G, all 2p int effects fn¼0;o¼1 ðvnxÞ as Gex varies are identified, each by the formula fn¼0,o¼1(vnx), specific to the graph Gex.
6 Alternative Graphical Causal Models
143
int A substantive scenario under which all 2p effects fn¼0;o¼1 ðvnxÞ are simultaneously identified by the Gex-specific formulae fn¼0,o¼1(vnx) is obtained by assuming an FFRCISTG model for an expanded graph on which N and O are 0 replaced by a set of parents Xj , j ¼ 1, . . . , p, one for each child of X, X is the 0 0 0 only parent of each Xj , each Xj has a single child, and X ¼Xj w.p. 1 in the 0 actual data. We consider the 2p interventions that set a subset of the Xj to 1 and the remainder to 0. The relationship of this analysis to the analysis based on the graphs Gex containing N and O mimics the relationship of the analysis based on Figure 6.6(d) under the alternative scenario of the last subsection to the analyses based on Figure 6.6(a) and Figure 6.6(b). In these latter analyses, X had precisely three children: Z, L and Y. Avin et al. (2005) also show that other path-specific effects are identified under the NPSEM assumption for G. However, their results imply that whenever, for a given set of blocked paths, the path-specific density of V nX is identified from data on V under an NPSEM associated with G, the identifying formula is equal to the g-formula density fn¼0,o¼1(vnx) corresponding to one of the 2p graphs Gex. Avin et al. (2005) provide an algorithm that can be used to find the appropriate Gex corresponding to a given set of blocked paths. As discussed in Section 3.4 even the path-specific densities that are not identified under an NPSEM become identified under yet further untestable counterfactual independence assumptions and/or rank preservation assumptions.
6 Conclusion The results presented here, which are summarized in Table 6.1, appear to present a clear trade-off between the agnostic causal DAG, MCM, and FFRCISTG model frameworks and that of the NPSEM. Table 6.1 Relations between causal models and estimands associated with the DAG shown in Figure 6.1; column ‘D’ indicates if the contrast is defined in the model; ‘I’ whether it is identified. Causal Model
Agnostic DAG MCM FFRCISTG NPSEM
Potential Outcome Indep. Ass. None (6.5) (6.1) (6.8)
Direct Effects CDE
ETT
PDE
PSDE
|X |=2
|X |>2
D
I
D
I
D
I
D
I
D
I
Y Y Y Y
Y Y Y Y
N Y Y Y
N N N Y
N Y Y Y
N N N Y
N Y Y Y
N Y Y Y
N Y Y Y
N N Y Y
144
Causality and Psychopathology
In the NPSEM approach the PDE is identified, even though the result cannot be verified by a randomized experiment without making further assumptions. In contrast, the PDE is not identified under an agnostic causal DAG model or under an MCM/FFRCISTG model. Further, in Appendix A we show that the ETT can be identified under an MCM/ FFRCISTG model even though the ETT cannot be verified by a randomized experiment without making further assumptions. Our analysis of Pearl’s motivation for the PDE suggests that these dichotomies may not be as stark as they may at first appear. We have shown that in certain cases where one is interested in a prima facie non-manipulable causal parameter then the very fact that it is of interest implies that there also exists an extended DAG in which the same parameter is manipulable and identifiable in all the causal frameworks. Inevitably, such cases will be interpreted differently by NPSEM ‘skeptics’ and ‘advocates.’ Advocates may argue that if our conjecture holds, then we can work with NPSEMs and have some reassurance that in important cases of scientific interest we will have the option to go back to an agnostic causal DAG. Conversely, skeptics may conclude that if we are correct then this shows that it is advisable to avoid the NPSEM framework: Agnostic causal DAGs are fully ‘‘testable’’ (with the usual caveats) and many non-manipulable NPSEM parameters that are of interest, but not identifiable within a non-NPSEM framework, can be identified in an augmented agnostic causal DAG. Undoubtedly, this debate is set to run and run . . .
Appendix A: The Effect of Treatment on the Treated: A Non-Manipulable Parameter The primary focus of this chapter has been various contrasts assessing the direct effect of X on Y relative to an intermediate Z. In this appendix we discuss another non-manipulable parameter, the effect of treatment on the treated, in order to further clarify the differences among the agnostic, the MCM and the FFRCISTG models. For our purposes, we shall only require the simplest possible causal model based on the DAG X ! Y, obtained by marginalizing over Z in the graph in Figure 6.1. Let Y(0) denote the counterfactual Y(x) evaluated at x ¼ 0. In a counterfactual causal model, the average effect of treatment on the treated is defined to be ETTðxÞ E ½YðxÞ Yð0Þ j X ¼ x Exint ½Y j X ¼ x E0int ½Y j X ¼ x :
6 Alternative Graphical Causal Models
145
Minimal Counterfactual Models (MCMs) In an MCM associated with DAG X ! Y, E[Y(x) | X ¼ x] ¼ E[Y | X ¼ x], by the consistency assumption (iii) in Section 1.1. Thus, ETTðxÞ ¼ E ½Y j X ¼ x E ½Yð0Þ j X ¼ x: Hence the ETT(x) is identified iff the second term on the right is identified. First, note that ETTð0Þ ¼ E ½Y j X ¼ 0 E ½Yð0Þ j X ¼ 0 ¼ 0: Now, by consistency condition (iii) in Section 1.1 and the MCM assumption, Equation (6.4), we have E ½Y j X ¼ 0 ¼ E ½Yð0Þ j X ¼ 0 ¼ E ½Yð0Þ: By the law of total probability, E ½Yð0Þ ¼
X
E ½Yð0Þ j X ¼ x PðX ¼ xÞ:
x
Hence, it follows that E ½Y j X ¼ 0PðX 6¼ 0Þ ¼
X
E ½Yð0Þ j X ¼ x PðX ¼ xÞ:
ð6:28Þ
x:x6¼0
In the special case where X is binary, so jX j, the right-hand side of Equation (6.28) reduces to a single term and, thus, we have E[Y(0) | X ¼ 1] ¼ E[Y | X ¼ 0]. It follows that for binary X, we have ETTð1Þ ¼ E½Y j X ¼ 1 E½Y j X ¼ 0 under the MCM (and hence any counterfactual causal model). See Pearl (2010, pp. 396–7) for a similar derivation, though he does not make explicit that consistency is required. In contrast, if X is not binary, then the right-hand side of Equation (6.28) contains more than one unknown so that ETT(x) for x 6¼ 0 is not identified under the MCM. However, under an FFRCISTG model, condition (6.1) implies that E½Yð0ÞjX ¼ x ¼E½YjX ¼ 0; so ETT(x) is identified in this model, regardless of X’s sample space. The parameter ETT(1) ¼ E[Y(1) Y(0) | X ¼ 1] is not manipulable, relative to {X, Y}, even when X is binary, since, without further assumptions, we cannot experimentally observe Y(0) in subjects with X ¼ 1.
146
Causality and Psychopathology
Note that even under the MCM with jX j > 2, the non-manipulable (relative to {X, Y}) contrast E[Y(0) | X 6¼ 0] E[Y | X 6¼ 0], the effect of receiving X ¼ 0 on those who did not receive X ¼ 0, is identified since E[Y(0) | X 6¼ 0] is identified by the left-hand side of Equation (6.28). We now turn to the agnostic causal model for the DAG X ! Y. Although Exint ½Y is identified by the g-functional as E[Y | X ¼ x], nonetheless, as expected for a non-manipulable causal contrast, the effect of treatment on the treated is not formally defined within the agnostic causal model, without further assumptions, even for binary X. Of course, the g-functional (see Definition 5) does define a joint distribution fx(x*, y) for (X, Y ) under which X takes the value x with probability 1. However, in spite of apparent notational similarities, the conditional density fx(y | x*) expresses a different concept from that occurring in the definition of Exint ½Y j X ¼x E½YðxÞjX ¼ x in the counterfactual theory. The former relates to the distribution over Y among those individuals who (after the intervention) have the value X ¼ x*, under an intervention which sets every unit’s value to x and thus fx(y | x*) ¼ f (y | x) if x* ¼ x and is undefined if x* 6¼ x ; the latter is based on the distribution of Y under an intervention fixing X ¼ x among those people who would have had the value X ¼ x* had we not intervened. The minimality of the MCM among all counterfactual models that both satisfy the consistency assumption (iii) in Section 1.1 and identify the interðzÞg can be seen as follows. For binary X, the above vention distributions f fpint R argument for identification of the non-manipulable contrast ETT(1) under an MCM as the difference E[Y | X ¼ 1] E[Y | X ¼ 0] follows directly, via the laws of probability, from the consistency assumption (iii) in Section 1.1 and the minimal independence assumption (6.5) required to identify the intervention ðzÞg. In contrast, the additional independence assumptions distributions f fpint R (6.8) used to identify the PDE under the NPSEM for the DAG in Figure 6.1 or the additional independence assumptions used to identify ETT(1) for nonbinary X under an FFRCISTG model for the DAG X ! Y are not needed to identify intervention distributions. Of course, as we have shown, it may be the case that the PDE is identified as an intervention contrast in an extended causal DAG containing additional variables; but identification in this extended causal DAG requires additional assumptions beyond those in the original DAG and hence does not follow merely from application of the laws of probability. Similarly, the ETT(1) for the causal DAG X ! Y can be re-interpreted as an intervention contrast in an extended causal DAG containing additional variables, regardless of the dimension of X’s state space. Specifically, Robins, VanderWeele, and Richardson (2007) showed that the ETT(x) parameter is defined and identified via the extended agnostic causal DAG in Figure 6.7
6 Alternative Graphical Causal Models X*
X
147
Y
Figure 6.7 An extended DAG, with a treatment X ¼ X * and response Y, leading to an interventional interpretation of the effect of treatment on the treated (Robins, VanderWeele, & Richardson, 2007; Geneletti & Dawid, 2007). The thicker edge indicates a deterministic relationship.
that adds to the DAG X ! Y a variable X * that is equal to X with probability 1 under f (v) ¼ f (x*, x, y). Then Exint ½Y j X ¼ x is identified by the g-formula as E[Y | X ¼ x], because X is the only parent of Y. Furthermore Exint ½Y j X ¼ x has an interpretation as the effect on the mean of Y of setting X to x on those observed to have X ¼ x* because X ¼ X* with probability 1. Thus, though ETT(x) is not a manipulable parameter relative to the graph X ! Y, it is manipulable relative to the variables {X *, X, Y} in the DAG in Figure 6.7. In the extended graph ETT(x) is identified by the same function of the observed data as E[Y(x) Y(0) | X ¼ x] in the original FFRCISTG model for non-binary X or in the original MCM or FFRCISTG model for binary X. The substantive fact that would license the extended DAG of Figure 6.7 is that a measurement, denoted by X *, could be taken just before X occurs such that (i) in the observed data X * predicts X perfectly (i.e. X ¼ X * w.p. 1) but (ii) an intervention exists that could, in principle, be administered in the small time interval between the X * measurement and the occurrence of X whose effect is to set X to a particular value x, say x ¼ 0. As an example, let X * denote the event that a particular pill has been swallowed, X denote the event that the pill’s contents enter the blood stream, and the intervention be the administration of an emetic that causes immediate regurgitation of the pill, but otherwise has no effect on the outcome Y; see Robins, VanderWeele, and Richardson (2007).
A Model that Is an MCM but Not an FFRCISTG In this section we describe a parametric counterfactual model for the effect of a ternary treatment X on a binary response Y that is an MCM associated with the graph in Figure 6.8 but is not an FFRCISTG. Let ¼ (0, 1, 2) be a (vector-valued) latent variable with three components such that in a given population Dirichlet (0, 1, 2) so that 0 + 1 + 2 ¼ 1 w.p. 1. The joint distribution of the factual and counterfactual data is determined by the unknown parameters (0, 1, 2). Specifically the treatment X is ternary with states 0, 1, 2, and P(X ¼ k | ) ¼ k, equivalently X j Multinomial ð1; Þ:
Causality and Psychopathology
148
Y (x=0)
π 0 π1 π2
X
Y
Y (x=1) Y (x=2)
X
(a)
Y
(b)
Figure 6.8 (a) A simple graph; (b) A graph describing a confounding structure that leads to a counterfactual model that corresponds to the MCM but not the FFRCISTG associated with the DAG (a); thicker red edges indicate deterministic relations.
Now suppose that the response Y is binary and that the counterfactual outcomes Y(x) are as follows: Yðx ¼ 0Þ j Bernoulli ð1 =ð1 þ 2 ÞÞ; Yðx ¼ 1Þ j Bernoulli ð2 =ð2 þ 3 ÞÞ; Yðx ¼ 2Þ j Bernoulli ð0 =ð0 þ 1 ÞÞ: Thus in this example, conditional on the potential outcome Y(x ¼ k) ‘happens’ to be a realization of a Bernoulli random variable with probability of success equal to the probability of receiving treatment X ¼ k + 1 mod 3 given that treatment X is not k. In what follows we will use [] to indicate that an expression is evaluated mod 3. Now since (0, 1, 2) follows a Dirichlet distribution, it follows that ½iþ1 =ð½iþ1 þ ½iþ2 Þ ? ? i for i ¼ 0, 1, 2. Hence, in this example, for i ¼ 0, 1, 2 we have Y (x ¼ i) ? ? i. Further, I (X ¼ i) ? ? Y (x ¼ i) | i; hence, the model obeys the MCM independence restriction (6.5): Yðx ¼ iÞ ? ? IðX ¼ iÞ for all i; but not the FFRCISTG independence restriction (6.1), since Yðx ¼ iÞ ? 6 ? IðX ¼ jÞ for i 6¼ j: We note that we have: PðX ¼ iÞ ¼ Eði Þ ¼ i =ð0 þ 1 þ 2 Þ;
ð6:29Þ
j X ¼ i Dirichlet ði þ 1; ½iþ1 ; ½iþ2 Þ;
ð6:30Þ
Yðx ¼ iÞ j X ¼ i Bernoulli ð½iþ1 =ð½iþ1 þ ½iþ2 ÞÞ;
ð6:31Þ
Y j X ¼ i Bernoulli ð½iþ1 =ð½iþ1 þ ½iþ2 ÞÞ:
ð6:32Þ
6 Alternative Graphical Causal Models
149
Equation (6.30) follows from standard Bayesian updating (since the Dirichlet distribution is conjugate to the multinomial). It follows that the vector of parameters (0, 1, 2) is identified only up to a scale factor since the likelihood for the observed variables f (x, y | ) ¼ f (x, y | ) for any > 0, by Equations (6.29) and (6.32). We note that since E(Y(x)) ¼ E(Y | X ¼ x), ACEX!Y ðxÞ EðYðxÞÞ EðYð0ÞÞ ¼ EðYjX ¼ xÞ EðYjX ¼ 0Þ and thus is identified. However since Yðx ¼ 0Þ j X ¼ 1 Bernoulli ðð1 þ 1Þ=ð1 þ 1 þ 2 ÞÞ; Yðx ¼ 0Þ j X ¼ 2 Bernoulli ð1 =ð1 þ 2 þ 1ÞÞ; and the probability of success in these distributions is not invariant under rescaling of the vector , we conclude that these distributions are not identified from data on f(x, y). Consequently ETT(x*) E[Y(x ¼ x*) Y(x ¼ 0) | X ¼ x*] is not identified under our parametric model.
Appendix B: A Data-Generating Process Leading to an FFRCISTG but not an NPSEM Robins (2003) stated that it is hard to construct realistic (as opposed to mathematical) scenarios in which one would accept that the FFRCISTG model associated with Figure 6.1 held, but the NPSEM did not, and thus that CDEs are identified but PDEs are not. In this Appendix we describe such a scenario. We leave it to the reader to judge its realism. Suppose that a substance U that is endogenously produced by the body could both (i) decrease blood pressure by reversibly binding to a membrane receptor on the blood pressure control cells in the carotid artery of the neck and (ii) directly increase atherosclerosis, and thus MI, by stimulating the endothelial cells of the coronary arteries of the heart via an interaction with a particular protein, and that this protein is expressed in endothelial cells of the coronary arteries only when induced by the chemicals in tobacco smoke other than nicotine e.g., tar. Further, suppose one mechanism by which nicotine increased blood pressure Z was by irreversibly binding to the membrane receptor for U on the blood pressure control cells in the carotid, the dose of nicotine in a smoker being sufficient to bind every available receptor. Then, under the assumption that there do not exist further unmeasured confounders for the effect of hypertension on MI, this scenario implies that it is reasonable to assume that any of the four causal models associated with the expanded DAG in Figure 6.9 is true. Here R measures the degree of binding of protein U to the membrane receptor in blood pressure control cells. Thus, R is zero in smokers of cigarettes containing nicotine. E measures the degree of stimulation of the endothelial cells of the carotid
Causality and Psychopathology
150
U
E ≡ OU
R ≡ (1−N)U
X
N
Z
O
Y
Figure 6.9 An example leading to the FFRCISTG associated with the DAG in Figure 6.1 but not an NPSEM; thicker edges denote deterministic relations.
artery by U. Thus, E is zero except in smokers (regardless of whether the cigarette contains nicotine). Before considering whether the NPSEM associated with Figure 6.1 holds, let us first study the expanded DAG of Figure 6.9. An application of the g-formula to the DAG in Figure 6.9 shows that the effect of not smoking int int int int ½Y ¼ Ex¼0 ½Y and the effect of smoking En¼1;o¼1 ½Y ¼ Ex¼1 ½Y are En¼0;o¼0 identified by E[Y | X ¼ 0] and E[Y | X ¼ 1] under all four causal models int ½Y of smoking nicoassociated with Figure 6.9. However, the effect En¼0;o¼1 tine-free cigarettes is not identified. Specifically, int ½Y En¼0;o¼1 X ¼ E ½Y j O ¼ 1; U ¼ u; Z ¼ z f ðz j U ¼ u; N ¼ 0Þ f ðuÞ z;u
¼
X
E ½Y j N ¼ 1; O ¼ 1; U ¼ u; Z ¼ z f ðz j U ¼ u; N ¼ 0; O ¼ 0Þ f ðuÞ
z;u
¼
X
E ½Y j X ¼ 1; U ¼ u; Z ¼ z f ðz j U ¼ u; X ¼ 0Þ f ðuÞ
z;u
where the first equality used the fact that E is a deterministic function of U and O and that R is a deterministic function of N and U. The second equality int ½Y is not a used d-separation and the third, determinism. Thus, En¼0;o¼1 function of the density of the observed data on (X, Z, Y ) because u occurs both in the term E[Y | X ¼ 1, U ¼ u, Z ¼ z] where we have conditioned on X ¼ 1, and in the term f (z | U ¼ u, X ¼ 0), where we have conditioned on X ¼ 0. As a consequence, we do not obtain a function of the density of the observed data when we marginalize over U.
6 Alternative Graphical Causal Models
151
Since under all three counterfactual models associated with the extended int ½Y is equal to the parameter E[Y(x ¼ 1, Z(x ¼ 0))] DAG of Figure 6.9 En¼0;o¼1 of Figure 6.1, we conclude that E[Y(x ¼ 1, Z(x ¼ 0))], and thus, the PDE is not identified. Hence, the induced counterfactual model for the DAG in Figure 6.1 cannot be an NPSEM (as that would imply that the PDE would be identified). int Furthermore, En¼0;o¼1 ½Y is a manipulable parameter with respect to the DAG in Figure 6.3, since this DAG is obtained from marginalizing over U in int ½Y is not the graph in Figure 6.9. However, as we showed above, En¼0;o¼1 identified from the law of the factuals X, Y, Z, N, O, which are the variables in Figure 6.3. From this we conclude that none of the four causal models associated with the graph in Figure 6.3 can be true. Note that prima facie one might have thought that if the agnostic causal DAG in Figure 6.1 is true, then this would always imply that the agnostic causal DAG in Figure 6.3 is also true. This example demonstrates that such a conclusion is fallacious. Similar remarks apply to the MCM and FFRCISTG models. Additionally, for z ¼ 0, 1, by applying the g-formula to the graph in int Figure 6.9, we obtain that the joint effect of smoking and z, En¼1;o¼1;z ½Y, int and the joint effect of not smoking and z, En¼0;o¼0;z ½Y, are identified by E½YjX ¼ 1;Z ¼ z and E½YjX ¼ 0;Z ¼ z, respectively, under all four causal int int ½Y and En¼1;o¼1;z ½Y are equal to the models for Figure 6.9. Since En¼0;o¼0;z int int parameters Ex¼0;z ½Y and Ex¼1;z ½Y under all four associated causal models associated with the graph in Figure 6.1 we conclude that CDE(z) is also identified under all four causal models associated with Figure 6.1. The results obtained in the last two paragraphs are consistent with the FFRCISTG model and the MCM associated with the graph in Figure 6.1 holding but not the NPSEM. In what follows we prove such is the case. Before doing so, we provide a simpler and more intuitive way to understand the above results by displaying in Figure 6.10 the subgraphs of Figure 6.9 corresponding to U, Z, Y when the variables N and O are set to each of their four possible joint values. We see that only when we set N ¼ 0 and O ¼ 1 is it the case that U is a common cause of both Z and Y (as setting N ¼ 0, O ¼ 1 makes R ¼ E ¼ U). Thus, we have int int ½Y ¼ En¼0;o¼0 ½YjZ ¼ z En¼0;o¼0;z
¼ E ½YjO ¼ 0; N ¼ 0; Z ¼ z ¼ E ½YjX ¼ 0; Z ¼ z; and int ½Y En¼1;o¼1;z
int ¼ En¼1;o¼1 ½YjZ ¼ z
¼ E ½YjO ¼ 1; N ¼ 1; Z ¼ z ¼ E ½YjX ¼ 1; Z ¼ z as O and N are unconfounded and Z is unconfounded when either we int ½Y 6¼ set O ¼ 1, N ¼ 1 or we set O ¼ 0, N ¼ 0. However, En¼0;o¼1;z int En¼0;o¼1 ½Yjz ¼ E½YjN ¼ 0;O ¼ 1;Z ¼ z as the effect of Z on Y is confounded
Causality and Psychopathology
152 (a)
U
(b)
U
(c)
U
(d)
Z
Z
Z
Z
Y
Y
Y
Y
U
Figure 6.10 An example leading to the FFRCISTG associated with the DAG in Figure 6.1 holding but not the NPSEM: Causal subgraphs on U, Z, Y implied by the graph in Figure 6.9 when we intervene and set (a) N ¼ 0, O ¼ 0; (b) N ¼ 1, O ¼ 0; (c) N ¼ 0, O ¼ 1; (d) N ¼ 1, O ¼ 1. int int when we set N ¼ 0, O ¼ 1. It is because En¼0;o¼1;z ½Y 6¼ En¼0;o¼1 ½Yjz that int En¼0;o¼1 ½Y is not identified. If, contrary to Figure 6.9, there was no confounding between Y and Z when N is set to 0 and O is set to 1, then we would have int int ½Y ¼En¼0;o¼1 ½Yjz. It would then follow that En¼0;o¼1;z int ½Y ¼ En¼0;o¼1
X
int int En¼0;o¼1 ½Yjz fn¼0;o¼1 ½z
z
¼
X
int int En¼0;o¼1;z ½Y fn¼0;o¼1 ½z
z
¼
X
int int En¼1;o¼1;z ½Y fn¼0;o¼0 ½z
z
¼
X
E ½YjX ¼ 1; Z ¼ z f ½zjX ¼ 0;
z
where the third equality is from the fact that we suppose N has no direct effect on Y not through Z and O has no effect on Z. We conclude by showing that the MCM and FFRCISTG models associated with Figure 6.1 are true, but the NPSEM is not, if any of the three counterfactual models associated with Figure 6.9 are true. Specifically, the DAG in Figure 6.11 represents the DAG of Figure 6.1 with the counterfactuals for Z(x) and Y(x, z), the variable U of Figure 6.9, and common causes U1 and U2 of the Z(x) and the Y(x, z) added to the graph. Note that U being a common cause of Z and Y in Figures 6.9 and 6.10 only when we set N ¼ 0 and O ¼ 1 implies that U is only a common cause of Z(0), Y(1, 0), and Y(1, 1) in Figure 6.11. One can check using d-separation that the counterfactual independences in Figure 6.11 satisfy those required of an MCM or FFRCISTG model, but not those of an NPSEM, as Z(0) and Y(1, z) are dependent. However, Figure 6.11 contains more independences than are required for the FFRCISTG condition (6.1) applied to the DAG in Figure 6.1. In particular, in Figure 6.11 Z(1) and Y(0, z) are independent, which implies that E[Y(0, Z(1))] is identified by z E½YjX ¼ 0;Z ¼ z f ðzjX ¼ 1Þ and, thus, the the so-called total direct effect E[Y(1, Z(1))] E[Y(0, Z(1))] is also identified.
6 Alternative Graphical Causal Models
153
Figure 6.11 An example leading to an FFRCISTG corresponding to the DAG in Figure 6.1 but not an NPSEM: potential outcome perspective. Counterfactuals for Y are indexed Y(x, z). U, U1, and U2 indicate hidden confounders. Thicker edges indicate deterministic relations.
Finally, we note that we could easily modify our example to eliminate the independence of Z(1) and Y(0, z).
Appendix C: Bounds on the PDE under an FFRCISTG Model In this Appendix we derive bounds on the PDE PDE ¼ E ½Yðx ¼ 1; Zðx ¼ 0ÞÞ E ½Y j X ¼ 0 under the assumption that the MCM or FFRCISTG model corresponding to the graph in Figure 6.1 holds and all variables are binary. Note E ½Yðx ¼ 1; Zðx ¼ 0ÞÞ ¼ E ½Yðx ¼ 1; z ¼ 0Þ j Zðx ¼ 0Þ ¼ 0PðZ ¼ 0 j X ¼ 0Þ þ E ½Yðx ¼ 1; z ¼ 1Þ j Zðx ¼ 0Þ ¼ 1PðZ ¼ 1 j X ¼ 0Þ: The two quantities E½Yðx ¼ 1;z ¼ 0Þ j Zðx ¼ 0Þ ¼ 0 and E½Yðx ¼ 1;z ¼ 1Þ j Zðx ¼ 0Þ ¼ 1 are constrained by the law for the observed data via E½Y j X ¼ 1; Z ¼ 0 ¼ E½Yðx ¼ 1; z ¼ 0Þ ¼ E½Yðx ¼ 1; z ¼ 0Þ j Zðx ¼ 0Þ ¼ 0PðZðx ¼ 0Þ ¼ 0Þ þ E½Yðx ¼ 1; z ¼ 0Þ j Zðx ¼ 0Þ ¼ 1PðZðx ¼ 0Þ ¼ 1Þ ¼ E½Yðx ¼ 1; z ¼ 0Þ j Zðx ¼ 0Þ ¼ 0PðZ ¼ 0 j X ¼ 0Þ þ E½Yðx ¼ 1; z ¼ 0Þ j Zðx ¼ 0Þ ¼ 1PðZ ¼ 1 j X ¼ 0Þ;
154
Causality and Psychopathology
E½Y j X ¼ 1; Z ¼ 1 ¼ E½Yðx ¼ 1; z ¼ 1Þ ¼ E½Yðx ¼ 1; z ¼ 1Þ j Zðx ¼ 0Þ ¼ 0PðZðx ¼ 0Þ ¼ 0Þ þ E½Yðx ¼ 1; z ¼ 1Þ j Zðx ¼ 0Þ ¼ 1PðZðx ¼ 0Þ ¼ 1Þ ¼ E½Yðx ¼ 1; z ¼ 1Þ j Zðx ¼ 0Þ ¼ 0PðZ ¼ 0 j X ¼ 0Þ þ E½Yðx ¼ 1; z ¼ 1Þ j Zðx ¼ 0Þ ¼ 1PðZ ¼ 1 j X ¼ 0Þ: It then follows from the analysis in Richardson and Robins (2010, Section 2.2) that the set of possible values for the pair ð0 ; 1 Þ ðE½Yðx ¼ 1; z ¼ 0Þ j Zðx ¼ 0Þ ¼ 0; E½Yðx ¼ 1; z ¼ 1Þ j Zðx ¼ 0Þ ¼ 1Þ compatible with the observed joint distribution f ðz;y j xÞ is given by ð0 ;1 Þ 2 ½l0 ;u0 ½l1 ;u1 where l0 ¼ maxf0; 1 þ ðE½Y j X ¼ 1; Z ¼ 0 1Þ=PðZ ¼ 0 j X ¼ 0Þg; u0 ¼ minfE½Y j X ¼ 1; Z ¼ 0=PðZ ¼ 0 j X ¼ 0Þ; 1g; l1 ¼ maxf0; 1 þ ðE½Y j X ¼ 1; Z ¼ 1 1Þ=PðZ ¼ 1 j X ¼ 0Þg; u1 ¼ minfE½Y j X ¼ 1; Z ¼ 1=PðZ ¼ 1 j X ¼ 0Þ; 1g: Hence, we have the following upper and lower bounds on the PDE: maxf0; PðZ ¼ 0 j X ¼ 0Þ þ E½Y j X ¼ 1; Z ¼ 0 1g þ maxf0; PðZ ¼ 1 j X ¼ 0Þ þ E½Y j X ¼ 1; Z ¼ 1 1g E ½Y j X ¼ 0 PDE minfPðZ ¼ 0 j X ¼ 0Þ; E½Y j X ¼ 1; Z ¼ 0g þ minfPðZ ¼ 1 j X ¼ 0Þ; E½Y j X ¼ 1; Z ¼ 1g E ½Y j X ¼ 0: Kaufman, Kaufman, and MacLehose (2009) obtain bounds on the PDE under assumption (6.2) but while allowing for confounding between Z and Y, i.e. not assuming that (6.3) holds, as we do. As we would expect the bounds that we obtain are strictly contained in those obtained by Kaufman et al. (2009, see Table 2, row {50}). Note that when P(Z ¼ z | X ¼ 0) ¼ 1, PDE ¼ CDEðzÞ ¼ E ½YjX ¼ 1; Z ¼ z E ½YjX ¼ 0; Z ¼ zÞ; thus, in this case our upper and lower bounds on the PDE coincide and the parameter is identified. In contrast, Kaufman et al.’s upper and lower bounds on the PDE do not coincide when P(Z ¼ z | X ¼ 0) ¼ 1. This follows because, under their assumptions, CDE(z) is not identified, but PDE ¼ CDE(z) when P(Z ¼ z | X ¼ 0) ¼ 1.
6 Alternative Graphical Causal Models
155
Appendix D: Interventions Restricted to a Subset: The FRCISTG Model To describe the FRCISTG model for V ¼ (V1, . . . , VM), we suppose that each Vm ¼ (Lm, Am) is actually a composite of variables Lm and Am, one of which can be the empty set. The causal effects of intervening on any of the Lm variables is not defined. However, we assume that for any subset R of A ¼ A M ¼ ðA1 ; . . . ;AM Þ, the counterfactuals Vm(r) are well-defined for any r 2 R. Specifically, we assume that the one-step-ahead counterfactuals Vm ða m1 Þ ¼(Lm ða m1 Þ;Am ða m1 Þ) exist for any setting of a m1 2 A m1 . Note that it is implicit in this definition that Lk precedes Ak for all k. Next, we make the consistency assumption that the factual variables Vm and the counterfactual variables Vm(r) are obtained recursively from the Vm ða m1 Þ. We do not provide a graphical characterization of parents. Rather, we say that the parents Pam of Vm consist of the smallest subset of A m1 such that, for all a m1 2 A m1 ,Vm ða m1 Þ ¼ Vm ðpam Þ where pam is the sub-vector of a m1 corresponding to Pam. One can then view the parents Pam of Vm as the direct causes of Vm relative to the variables prior to Vm on which we can perform interventions. Finally, an FRCISTG model imposes the following independences:
? Am ðam1 Þ j Lm ¼ lm ; Am1 ¼ am1 ; Vmþ1 ðam Þ; . . . ; VM ðaM1 Þ ? for all m; aM1 ; lm :
ð6:33Þ
Note that (6.33) can also be written
? Am ðam1 Þ j Lm ðam1 Þ ¼ lm ; Am1 ¼ am1 ; Vmþ1 ðam Þ; . . . ; VM ðaM1 Þ ? for all m; aM1 ; lm ;
where L m ða m1 Þ ¼(Lm ða m1 Þ;Lm1 ða m2 Þ; . . . ;L1 ). In the absence of inter-unit interference and non-compliance, data from a sequentially randomized experiment in which at each time m the treatment Am is randomly assigned, with the assignment probability at m possibly depending on the past ðL m ;A m1 Þ, will follow an FRCISTG model; see Robins (1986) for further discussion. The analogous minimal causal model (MCM) with interventions restricted to a subset is defined by replacing Am ða m1 Þ by IfAm ða m1 Þ ¼ am g in condition (6.33). It follows from (Robins, 1986) that our Extended Lemma 6 continues to hold when we substitute either ‘FRCISTG model’ or ‘MCM with restricted interventions’, for ‘MCM’ in the statement of the Lemma, provided we take R A.
156
Causality and Psychopathology
Likewise we may define an agnostic causal model with restricted interventions to be the causal model that simply assumes that the interventional density of ðzÞ, under treatment regime pR for any R A, is given Z V, denoted by fpint R by the g-functional density fpR(z) whenever fpR(z) is a well-defined function of f (v). In Theorem 1 we proved that the set of defining conditional independences in condition (6.1) of an FFRCISTG model can be re-expressed as a set of unconditional independences between counterfactuals. An analogous result does not hold for an FRCISTG. However, the following theorem shows that we can remove past treatment history from the conditioning set in the defining conditional independences of an FRCISTG model, provided that we continue to condition on the counterfactuals L m ða m1 Þ. Theorem 8 An FRCISTG model for V ¼ðV1 ; . . . ;VM Þ;Vm ¼ðLm ;Am Þ implies that for all m, a M1 ;l m , ? Am ðam1 Þ j Lm ðam1 Þ ¼ lm : Vmþ1 ðam Þ; . . . ; VM ðaM1 Þ ? Note that the theorem would not be true had we substituted the factual L m for L m ða m1 Þ.
References Avin, C., Shpitser, I., & Pearl, J. (2005). Identifiability of path-specific effects. In L. P. Kaelbling & A. Saffiotti (Eds.), IJCAI-05, Proceedings of the nineteenth international joint conference on artificial intelligence (pp. 357–363). Denver: Professional Book Center. Dawid, A. P. (2000). Causal inference without counterfactuals. Journal of the American Statistical Association, 95(450), 407–448. Didelez, V., Dawid, A., & Geneletti, S. (2006). Direct and indirect effects of sequential treatments. In R. Dechter & T. S. Richardson (Eds.), UAI-06, Proceedings of the 22nd annual conference on uncertainty in artificial intelligence (pp. 138–146). Arlington, VA: AUAI Press. Frangakis, C. E., & Rubin, D. B. (1999). Addressing complications of intention-to-treat analysis in the combined presence of all-or-none treatment-noncompliance and subsequent missing outcomes. Biometrika, 86(2), 365–379. Frangakis, C. E., & Rubin, D. B. (2002). Principal stratification in causal inference. Biometrics, 58(1), 21–29. Geneletti, S., & Dawid, A. P. (2007). Defining and identifying the effect of treatment on the treated (Tech. Rep. No. 3). Imperial College London, Department of Epidemiology and Public Health,. Gill, R. D., & Robins, J. M. (2001). Causal inference for complex longitudinal data: The continuous case. Annals of Statistics, 29(6), 1785–1811. Hafeman, D., & VanderWeele, T. (2010). Alternative assumptions for the identification of direct and indirect effects. Epidemiology. (Epub ahead of print)
6 Alternative Graphical Causal Models
157
Heckerman, D., & Shachter, R. D. (1995). A definition and graphical representation for causality. In P. Besnard & S. Hanks (Eds.), UAI-95: Proceedings of the eleventh annual conference on uncertainty in artificial intelligence (pp. 262–273). San Francisco: Morgan Kaufmann. Imai, K., Keele, L., & Yamamoto, T. (2009). Identification, inference, and sensitivity analysis for causal mediation effects (Tech. Rep.). Princeton University, Department of Politics. Kaufman, S., Kaufman, J. S., & MacLehose, R. F. (2009). Analytic bounds on causal risk differences in directed acyclic graphs involving three observed binary variables. Journal of Statistical Planning and Inference, 139(10), 3473–3487. Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems. San Mateo: Morgan Kaufmann. Pearl, J. (2000). Causality. Cambridge: Cambridge University Press. Pearl, J. (2001). Direct and indirect effects. In J. S. Breese & D. Koller (Eds.), UAI-01, Proceedings of the 17th annual conference on uncertainty in artificial intelligence (pp. 411–42). San Francisco: Morgan Kaufmann. Pearl, J. (2010). An introduction to causal inference. The International Journal of Biostatistics, 6(2). (DOI: 10.2202/1557-4679.1203) Petersen, M., Sinisi, S., & Laan, M. van der. (2006). Estimation of direct causal effects. Epidemiology, 17(3), 276–284. Richardson, T. S., & Robins, J. M. (2010). Analysis of the binary instrumental variable model. In R. Dechter, H. Geffner, & J. Halpern (Eds.), Heuristics, probability and causality: A tribute to Judea Pearl (pp. 415–444). London: College Publications. Robins, J. M. (1986). A new approach to causal inference in mortality studies with sustained exposure periods – applications to control of the healthy worker survivor effect. Mathematical Modeling, 7, 1393–1512. Robins, J. M. (1987). Addendum to ‘‘A new approach to causal inference in mortality studies with sustained exposure periods – applications to control of the healthy worker survivor effect’’. Computers and Mathematics with Applications, 14, 923–945. Robins, J. M. (2003). Semantics of causal DAG models and the identification of direct and indirect effects. In P. Green, N. Hjort, & S. Richardson (Eds.), Highly structured stochastic systems (pp. 70–81). Oxford: Oxford University Press. Robins, J. M., & Greenland, S. (1992). Identifiability and exchangeability for direct and indirect effects. Epidemiology, 3, 143–155. Robins, J. M., & Greenland, S. (2000). Comment on ‘‘Causal inference without counterfactuals’’. Journal of the American Statistical Association, 95(450), 431–435. Robins, J. M., Richardson, T. S., & Spirtes, P. (2009). Identification and inference for direct effects (Tech. Rep. No. 563). University of Washington, Department of Statistics. Robins, J. M., Rotnitzky, A., & Vansteelandt, S. (2007). Discussion of ‘‘Principal stratification designs to estimate input data missing due to death’’ by Frangakis, C.E., Rubin D.B., An, M., MacKenzie, E. Biometrics, 63(3), 650–653. Robins, J. M., VanderWeele, T. J., & Richardson, T. S. (2007). Discussion of ‘‘Causal effects in the presence of non compliance: a latent variable interpretation’’ by Forcina, A. Metron, LXIV(3), 288–298. Rothman, K. J. (1976). Causes. American Journal of Epidemiology, 104, 587–592. Rubin, D. B. (1998). More powerful randomization-based ‘‘p-values’’ with the p italicized in double-blind trials with non-compliance. Statistics in Medicine, 17, 371–385.
158
Causality and Psychopathology
Rubin, D. B. (2004). Direct and indirect causal effects via potential outcomes. Scandinavian Journal of Statistics, 31(2), 161–170. Spirtes, P., Glymour, C., & Scheines, R. (1993). Causation, Prediction and Search (No. 81). New York: Springer-Verlag. VanderWeele, T., & Robins, J. (2007). Directed acyclic graphs, sufficient causes, and the properties of conditioning on a common effect. American Journal of Epidemiology, 166(9), 1096–1104.
7 General Approaches to Analysis of Course Applying Growth Mixture Modeling to Randomized Trials of Depression Medication bengt muthe´n, hendricks c. brown, aimee m. hunter, ian a. cook, and andrew f. leuchter
Introduction This chapter discusses the assessment of treatment effects in longitudinal randomized trials using growth mixture modeling (GMM) (Muthe´n & Shedden, 1999; Muthe´n & Muthe´n, 2000; Muthe´n et al., 2002; Muthe´n & Asparouhov, 2009). GMM is a generalization of conventional repeated measurement mixed-effects (multilevel) modeling. It captures unobserved subject heterogeneity in trajectories not only by random effects but also by latent classes corresponding to qualitatively different types of trajectories. It can be seen as a combination of conventional mixed-effects modeling and cluster analysis, also allowing prediction of class membership and estimation of each individual’s most likely class membership. GMM has particularly strong potential for analyses of randomized trials because it responds to the need to investigate for whom a treatment is effective by allowing for different treatment effects in different trajectory classes. The chapter is motivated by a University of California, Los Angeles study of depression medication (Leuchter, Cook, Witte, Morgan, & Abrams, 2002). Data on 94 subjects are drawn from a combination of three studies carried out with the same design, using three different types of medications: fluoxetine (n = 14), venlafaxine IR (n = 17), and venlafaxine XR (n = 18). Subjects were measured at baseline and again after a 1-week placebo lead-in phase. In the subsequent double-blind phase of the study, the subjects were randomized into medication (n = 49) and placebo (n = 45) groups. After randomization, subjects were measured at nine occasions: at 48 hours and at weeks 1–8. The current analyses consider the Hamilton Depression Rating Scale. 159
160
Causality and Psychopathology
Several predictors of course of the Hamilton scale trajectory are available, including gender, treatment history, and a baseline measure of central cordance hypothesized to influence tendency to respond to treatment. The results of studies of this kind are often characterized in terms of an end point analysis where the outcome at the end of the study, here at 8 weeks, is considered for the placebo group and for the medication group. A subject may be classified as a responder by showing a week 8 depression score below 10 or when dropping below 50% of the initial score. The treatment effect may be assessed by comparing the medication and placebo groups with respect to the ratio of responders to nonresponders. As an alternative to end point analysis, conventional repeated measurement mixed-effects (multilevel) modeling can be used. Instead of focusing on only the last time point, this uses the outcome at all time points, the two pretreatment occasions and the nine posttreatment occasions. The trajectory shape over time is of key interest and is estimated by a model that draws on the information from all time points. The idea of considering trajectory shape in research on depression medication has been proposed by Quitkin et al. (1984), although not using a formal statistical growth model. Rates of response to treatment with antidepressant drugs are estimated to be 50%–60% in typical patient populations. Of particular interest in this chapter is how to assess treatment effects in the presence of a placebo response. A placebo response is an improvement in depression ratings seen in the placebo group that is unrelated to medication. The improvement is often seen as an early steep drop in depression, often followed by a later upswing. An example is seen in Figure 7.2. A placebo response confounds the estimation of the true effect of medication and is an important phenomenon given its high prevalence of 25%–60% (Quitkin, 1984). Because the placebo response is pervasive, the statistical modeling must take it into account when estimating medication effects. This can be done by acknowledging the qualitative heterogeneity in trajectory shapes for responders and nonresponders. It is important to distinguish among responder and nonresponder trajectory shapes in both the placebo and medication groups. Conventional repeated measures modeling may lead to distorted assessment of medication effects when individuals follow several different trajectory shapes. GMM avoids this problem while maintaining the repeated measures modeling advantages. The chapter begins by considering GMM with two classes, a nonresponder class and a responder class. The responder class is defined as those individuals who respond in the placebo group and who would have responded to placebo among those in the medication group. Responder class membership is observed for subjects in the placebo group but is unobserved in the medication group. Because of randomization, it can be assumed that this class of subjects is present in both the placebo and medication groups
7 General Approaches to Analysis of Course
161
and in equal numbers. GMM can identify the placebo responder class in the medication group. Having identified the placebo responder and placebo nonresponder classes in both the placebo and medication groups, medication effects can more clearly be identified. In one approach, the medication effect is formulated in terms of an effect of medication on the trajectory slopes after the treatment phase has begun. This medication effect is allowed to be different for the nonresponder and responder trajectory classes. Another approach formulates the medication effect as increasing the probability of membership in advantageous trajectory classes and decreasing the probability of membership in disadvantageous trajectory classes.
Growth Mixture Modeling This section gives a brief description of the GMM in the context of the current study. A two-piece, random effect GMM is applied to the Hamilton Depression Rating Scale outcomes at the 11 time points y1–y11. The first piece refers to the two time points y1 and y2 before randomization, and the second piece refers to the nine postrandomization time points y3–y11. Given only two time points, the first piece is by necessity taken as a linear model with a random intercept, defined at baseline, and a fixed effect slope. An exploration of each individual’s trajectory suggests a quadratic trajectory shape for the second piece. The growth model for the second piece is centered at week 8, defining the random intercept as the systematic variation at that time point. All random effect means are specified as varying across latent trajectory classes. The medication effect is captured by a regression of the linear and quadratic slopes in the second piece on a medication dummy variable. These medication effects are allowed to vary across the latent trajectory classes. The model is shown in diagrammatic form at the top of Figure 7.1.1 The statistical specification is as follows. Consider the depression outcome yit for individual i, let c denote the latent trajectory class variable, let g denote random effects, let at denote time, and let 2t denote residuals containing measurement error and time-specific variation. For the first, prerandomization piece, conditional on trajectory class k (k = 1, 2 . . . K),
1. In Figure 7.1 the observed outcomes are shown in boxes and the random effects in circles. Here, i, s, and q denote intercept, linear slope, and quadratic slope, respectively. In the following formulas, these random effects are referred to as g0, g1, and g2. The treatment dummy variable is denoted x.
Causality and Psychopathology
162
ybase ybpo1i
i1
s1
y48
y1
y2
i2
s2
q2
y48
y1
y2
i2
s2
q2
y3
y4
y5
y6
y7
y8
y3
y4
y5
y6
y7
y8
c
x
ybase ybpo1i
c
x
Figure 7.1 Two alternative GMM approaches. pre
pre
pre
pre
Yit ‰ci ¼k ¼ g0i ¼ g1i at þ 2it ;
(1)
with 1 = 0 to center at baseline, and random effects pre
g10i ‰ci ¼k ¼ 10k þ 10i ; pre
g11i ‰ci ¼k ¼ 11k þ 11i ;
(2) (3)
with only two prerandomization time points, the model is simplified to assume a nonrandom slope, V(11) = 0, for identification purposes. For the second, postrandomization piece, yit ‰ci ¼k ¼ g0i þ g1i at þ g2i at2 þ 2it ;
(4)
with 11 = 0, defining g0i as the week 8 depression status. The remaining t values are set according to the distance in timing of measurements. Assume for simplicity a single drug and denote the medication status for individual i by the dummy variable xi (x = 0 for the placebo group and x = 1 for the medication group).2 The random effects are allowed to be influenced by 2. In the application three dummy variables are used to represent the three different medications.
7 General Approaches to Analysis of Course
163
group and a covariate, w, their distributions varying as a function of trajectory class (k), g0i ‰ci ¼k ¼ 0k þ 01k xi þ 02k wi þ 0i ;
(5)
g1i ‰ci ¼k ¼ 1k þ 11k xi þ 12k wi þ 1i ;
(6)
g2i ‰ci ¼k ¼ 2k þ 21k xi þ 22k wi þ 2i ;
(7)
The residuals i in the first and second pieces have a 4 4 covariance matrix k, here taken to be constant across classes k. For both pieces the residuals 2it have a T T covariance matrix k, here taken to be constant across classes. For simplicity, k and k are assumed to not vary across treatment groups. As seen in equations 5–7, the placebo group (xi = 0) consists of subjects from the two different trajectory classes that vary in the means of the growth factors, which in the absence of covariate w are represented by 0k, 1k, and 2k. This gives the average depression development in the absence of medication. Because of randomization, the placebo and medication groups are assumed to be statistically equivalent at the first two time points. This implies that x is assumed to have no effect on g10i or g11i in the first piece of the development. Medication effects are described in the second piece by g01k, g11k, and g21k as a change in average growth rate that can be different for the classes. This model allows the assessment of medication effects in the presence of a placebo response. A key parameter is the medication-added mean of the intercept random effect centered at week 8. This is the g01k parameter of equation 5. This indicates how much lower or higher the average score is at week 8 for the medication group relative to the placebo group in the trajectory class considered. In this way, the medication effect is specific to classes of individuals who would or would not have responded to placebo. The modeling will be extended to allow for the three drugs of this study to have different g parameters in equations 5–7. Class membership can be influenced by baseline covariates as expressed by a logistic regression (e.g., with two classes), log½Pðc i ¼ 1jxi Þ=Pðci ¼ 2‰xi Þ ¼ c þ c wi ;
(8)
where c = 1 may refer to the nonresponder class and c = 2, the responder class. It may be noted that this model assumes that medication status does not influence class membership. Class membership is conceptualized as a quality characterizing an individual before entering the trial.
164
Causality and Psychopathology
A variation of the modeling will focus on postrandomization time points. Here, an alternative conceptualization of class membership is used. Class membership is thought of as being influenced by medication so that the class probabilities are different for the placebo group and the three medication groups. Here, the medication effect is quantified in terms of differences across groups in class probabilities. This model is shown in diagrammatic form at the bottom of Figure 7.1. It is seen that the GMM involves only the postrandomization outcomes, which is logical given that treatment influences the latent class variable, which in turn influences the posttreatment outcomes. In addition to the treatment variable, pretreatment outcomes may be used as predictors of latent class, as indicated in the figure. The treatment and pretreatment outcomes may interact in their influence on latent class membership.
Estimation and Model Choice The GMM can be fitted into the general latent variable framework of the Mplus program (Muthe´n & Muthe´n, 1998–2008). Estimation is carried out using maximum likelihood via an expectation-maximization (EM) algorithm. Missing data under the missing at random (MAR) assumption are allowed for the outcomes. Given an estimated model, estimated posterior probabilities for each individual and each class are produced. Individuals can be classified into the class with the highest probability. The classification quality is summarized in an entropy value with range 0–1, where 1 corresponds to the case where all individuals have probability 1 for one class and 0 for the others. For model fitting strategies, see Muthe´n et al. (2002), Muthe´n (2004), and Muthe´n and Asparouhov (2008). A common approach to decide on the number of classes is to use the Bayesian information criterion (BIC), which puts a premium on models with large log-likelihood values and a small number of parameters. The lower the BIC, the better the model. Analyses of depression trial data have an extra difficulty due to the typically small sample sizes. Little is known about the performance of BIC for samples as small as in the current study. Bootstrapped likelihood ratio testing can be performed in Mplus (Muthe´n & Asparouhov, 2008), but the power of such testing may not be sufficient at these sample sizes. Plots showing the agreement between the class-specific estimated means and the trajectories for individuals most likely belonging to a class can be useful in visually inspecting models but are of only limited value in choosing between models. A complication of maximum-likelihood GMM is the presence of local maxima. These are more prevalent with smaller samples such as the current ones for the placebo group, the medication group, as well as for the combined sample. To be confident that a global maximum has been found, many
7 General Approaches to Analysis of Course
165
random starting values need to be used and the best log-likelihood value needs to be replicated several times. In the present analyses, between 500 and 4,000 random starts were used depending on the complexity of the model.
Growth Mixture Analyses In this section the depression data are analyzed in three steps using GMM. First, the placebo group is analyzed alone. Second, the medication group is analyzed alone. Third, the placebo and medication groups are analyzed jointly according to the GMM just presented in order to assess the medication effects.
Analysis of the Placebo Group A two-class GMM analysis of the 45 subjects in the placebo group resulted in the model-estimated mean curves shown in Figure 7.2. As expected, a responder class (class 1) shows a postrandomization drop in the depression score with a low of 7.9 at week 5 and with an upswing to 10.8 at week 8. An estimated 32% of the subjects belong to the responder class. In contrast, the nonresponder class has a relatively stable level for weeks 1–8, ending with a depression score of 15.6 at week 8. The sample standard deviation at week 8 is 7.6. It may be noted that the baseline score is only slightly higher for the nonresponder class, 22.7 vs. 21.9. The standard deviation at baseline is 3.6.3 The observed trajectories of individuals classified into the two classes are plotted in Figure 7.3a and b as broken lines, whereas the solid curves show the model-estimated means. The figure indicates that the estimated mean curves represent the individual development rather well, although there is a good amount of individual variation around the mean curves. It should be noted that the classification of subjects based on the trajectory shape approach of GMM will not agree with that using end point analysis. As an example, the nonresponder class of Figure 7.3b shows two subjects with scores less than 5 at week 8. The individual with the lowest score at week 8, however, has a trajectory that agrees well with the nonresponder mean curve for most of the trial, deviating from it only during the last 2 weeks. The week 8 score has a higher standard deviation than at earlier time points, thereby weighting this time point somewhat less. Also, the data coverage due to
3. The maximum log-likelihood value for the two-class GMM of Figure 7.2 is 1,055.974, which is replicated across many random starts, with 28 parameters and a BIC value of 2,219. The classification based on the posterior class probabilities is not clear-cut in that the classification entropy value is only 0.66.
Causality and Psychopathology
24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Class 1, 32.4%
week 8
week 7
week 6
week 5
week 4
week 3
week 2
week 1
lead-in 48 hrs
Class 2, 67.6%
baseline
HamD
166
Time
Figure 7.2 Two-class GMM for placebo group.
missing observations is considerably lower for weeks 5–7 than other weeks, reducing the weight of these time points. The individual with the second lowest score at week 8 deviates from the mean curve for week 5 but has missing data for weeks 6 and 7. This person is also ambiguously classified in terms of his or her posterior probability of class membership. To further explore the data, a three-class GMM was also fitted to the 45 placebo subjects. Figure 7.4a shows the mean curves for this solution. This solution no longer shows a clear-cut responder class. Class 2 (49%) declines early, but the mean score does not go below 14. Class 1 (22%) ends with a mean score of 10.7 but does not show the expected responder trajectory shape of an early decline.4 A further analysis was made to investigate if the lack of a clear responder class in the three-class solution is due to the sample size of n = 45 being too small to support three classes. In this analysis, the n = 45 placebo group subjects were augmented by the medication group subjects but using only the two prerandomization time points from the medication group. Because of randomization, subjects are statistically equivalent before randomization, so this approach is valid. The first, prerandomization piece of the GMM has nine parameters, leaving only 25 parameters to be estimated in the second, postrandomization piece by 4. The log-likelihood value for the model in Figure 7.4a is 1,048,403, replicated across several random starts, with 34 parameters and a BIC value of 2,226. Although the BIC value is slightly worse than for the two-class solution, the classification is better, as shown by the entropy value of 0.85.
Figure 7.3 Individual trajectories for placebo subjects classified into (a) the responder class and (b) the non-responder class (observed HamD trajectories from baseline through week 8, with the model-estimated mean curves).
Figure 7.4b shows that a responder class (class 2) is now found, with 21% of the subjects estimated to be in this class. High (class 3) and low (class 1) nonresponder classes are found, with 18% and 60% estimated to be in these classes, respectively. Compared to Figure 7.3, the observed individual trajectories within class are somewhat less heterogeneous (trajectories not shown).5
5. The log-likelihood value for the model in Figure 7.4b is –1,270.030, replicated across several random starts, with 34 parameters and a BIC value of 2,695. The entropy value is 0.62. Because a different sample size is used, these values are not comparable to the earlier ones.
Figure 7.4 (a) Three-class GMM for placebo group. (b) Three-class GMM for placebo group and pre-randomization medication group individuals. (HamD mean curves from baseline through week 8.)
Analysis of the Medication Group

Two major types of GMMs were applied to the medication group. The first type analyzes all time points and either makes no distinction among the three drugs (fluoxetine, venlafaxine IR, venlafaxine XR) or allows drug differences for the class-specific random effect means of the second piece of the GMM. It would not make sense to also let class membership vary as a function of drug, since class membership is conceptualized as a quality characterizing an individual before entering the trial; class membership influences prerandomization outcomes, which cannot be influenced by drugs. To investigate class membership, the second type of GMM analyzes the nine postrandomization time points, both to focus on the period where the medications have an effect and to let the class membership correspond only to postrandomization variables. Here, not only are differences across the three drugs allowed for the random effect means for each of the classes, but the drug type is also allowed to influence class probabilities.
Analysis of All Time Points

A two-class GMM analysis of the 49 subjects in the medication group resulted in the model-estimated mean curves shown in Figure 7.5. As expected, one of the classes is a large responder class (class 1, 85%). The other class (class 2, 15%) improves initially but then worsens.6

A three-class GMM analysis of the 49 subjects in the medication group resulted in the model-estimated mean curves shown in Figure 7.6. The three mean curves show the expected responder class (class 3, 68%) and the class (class 2, 15%) found in the two-class solution showing an initial improvement but later worsening. In addition, a nonresponse class (class 1, 17%) emerges, which has no medication effect throughout.7

Allowing for drug differences for the class-specific random effect means of the second piece of the GMM did not give a trustworthy solution in that the best log-likelihood value was not replicated. This may be due to the fact that this model has more parameters than subjects (59 vs. 49).

Analysis of Postrandomization Time Points

As a first step, two- and three-class analyses of the nine postrandomization time points were performed, not allowing for differences across the three drugs. This gave solutions that were very similar to those of Figures 7.5 and 7.6. The similarity in mean trajectory shape held up also when allowing for class probabilities to vary as a function of drug. Figure 7.7 shows the estimated mean curves for this latter model. The estimated class probabilities for the three drugs show that in the responder class (class 2, 63%) 21% of the subjects are on fluoxetine, 29% are on venlafaxine IR, and 50% are on venlafaxine XR. For the nonresponder class that shows an initial improvement and a later worsening (class 3, 19%), 25% are on fluoxetine, 75% are on venlafaxine IR, and 0% are on venlafaxine XR. For the nonresponder class that shows no improvement at any point (class 1, 19%), 58% are on fluoxetine, 13% are on venlafaxine IR, and 29% are on venlafaxine XR. Judged across all three trajectory classes, this suggests that venlafaxine XR has the better outcome, followed by venlafaxine IR, with fluoxetine last. Note, however, that for these data subjects were not randomized to the different medications; therefore, comparisons among medications are confounded by subject differences.8

6. The log-likelihood value for the model in Figure 7.5 is –1,084.635, replicated across many random starts, with 28 parameters and a BIC value of 2,278. The entropy value is 0.90.

7. The log-likelihood value for the model in Figure 7.6 is –1,077.433, replicated across many random starts, with 34 parameters and a BIC value of 2,287. The BIC value is worse than for the two-class solution. The entropy value is 0.85.

8. The log-likelihood value for the model of Figure 7.7 is –873.831, replicated across many random starts, with 27 parameters and a BIC value of 1,853. The entropy value is 0.79.
Figure 7.5 Two-class GMM for medication group (model-estimated HamD mean curves from baseline through week 8; class 1, 84.7%; class 2, 15.3%).
Figure 7.6 Three-class GMM for medication group (model-estimated HamD mean curves from baseline through week 8; class 1, 16.9%; class 2, 14.9%; class 3, 68.2%).

Figure 7.7 Three-class GMM for medication group post randomization (model-estimated HamD mean curves from the 48-hour lead-in through week 8; class 1, 18.6%; class 2, 62.9%; class 3, 18.6%).
As a second step, a three-class GMM was fitted in which not only the class membership probabilities but also the class-varying random effect means were allowed to vary across the three drugs. This analysis showed no significant drug differences in class membership probabilities. As shown in Figure 7.8, however, the estimated classes are of essentially different character for the three drugs.9
Analysis of Medication Effects, Taking Placebo Response Into Account

The separate analyses of the 45 subjects in the placebo group and the 49 subjects in the medication group provide the basis for the joint analysis of all 94 subjects. Two types of GMMs will be applied. The first is directly in line with the model shown earlier under Growth Mixture Modeling, where medication effects are conceptualized as postrandomization changes in the slope means. The second type uses only the postrandomization time points, and class membership is thought of as being influenced by medication, in line with the Figure 7.7 model.

9. The log-likelihood value for the model of Figure 7.8 is –859.577, replicated in only a few random starts, with 45 parameters and a BIC value of 1,894. The entropy value is 0.81. It is difficult to choose between the model of Figure 7.7 and the model of Figure 7.8 based on statistical indices. The Figure 7.7 model has the better BIC value, but the improvement in the log-likelihood of the Figure 7.8 model is substantial.
Figure 7.8 Three-class GMM for (a) fluoxetine subjects, (b) venlafaxine IR subjects, and (c) venlafaxine XR subjects (model-estimated HamD mean curves from the 48-hour lead-in through week 8).
Here, the class probabilities are allowed to differ between the placebo group and the three medication groups, so that the medication effect is quantified in terms of differences in class probabilities across groups.
Analysis of All Time Points

For the analysis based on the earlier model (see Growth Mixture Modeling), a three-class GMM will be used, given that three classes were found to be interpretable for both the placebo and the medication groups. Figure 7.9 shows the estimated mean curves for the three-class solution for the placebo group, the fluoxetine group, the venlafaxine IR group, and the venlafaxine XR group. It is interesting to note that for the placebo group the Figure 7.9a mean curves are similar in shape to those of Figure 7.4b, although the responder class (class 3) is now estimated to be 34%. Note that for this model the class percentages are specified to be the same in the medication groups as in the placebo group. The estimated mean curves for the three medication groups shown in Figure 7.9b–d are similar in shape to those of the medication group analysis shown in Figure 7.8a–c. These agreements with the separate-group analyses strengthen the plausibility of the modeling.

This model allows the assessment of medication effects in the presence of a placebo response. A key parameter is the medication-added mean of the intercept random effect centered at week 8. This is the g01k parameter of equation 5. For a given trajectory class, this indicates how much lower or higher the average score is at week 8 for the medication group in question relative to the placebo group. In this way, the medication effect is specific to classes of individuals who would or would not have responded to placebo.

The g01k estimates of the Figure 7.9 model are as follows. The fluoxetine effect for the high nonresponder class 1 at week 8 as estimated by the GMM is significantly positive (higher depression score than for the placebo group), 7.4, indicating a failure of this medication for this class of subjects. In the low nonresponder class 2 the fluoxetine effect is small but positive, though insignificant. In the responder class, the fluoxetine effect is significantly negative (lower depression score than for the placebo group), –6.3. The venlafaxine IR effect is insignificant for all three classes. The venlafaxine XR effect is significantly negative, –11.7, for class 1, which after an initial slight worsening turns into a responder class for venlafaxine XR. For the nonresponder class 2 the venlafaxine XR effect is insignificant, while for the responder class it is significantly negative, –7.8. In line with the medication group analysis shown in Figure 7.7, the joint analysis of placebo and medication subjects indicates that venlafaxine XR has the most desirable outcome relative to the placebo group. None of the drugs is significantly effective for the low nonresponder class 2.10
10. The log-likelihood value for the model shown in Figure 7.9 is –2,142.423, replicated across a few random starts, with 61 parameters and a BIC value of 4,562. The entropy value is 0.76.
Figure 7.9 Three-class GMM of both groups: (a) placebo subjects, (b) fluoxetine subjects, (c) venlafaxine IR subjects, and (d) venlafaxine XR subjects (model-estimated HamD mean curves from baseline through week 8; in each group, class 1, 20.5%; class 2, 45.9%; class 3, 33.6%).
Analysis of Postrandomization Time Points

As a final analysis, the placebo and medication groups were analyzed together for the postrandomization time points. Figure 7.10 displays the estimated three-class solution, which again shows a responder class, a nonresponder class which initially improves but then worsens (similar to the placebo response class found in the placebo group), and a high nonresponder class.11

As a first step, it is of interest to compare the joint placebo–medication group analysis of Figure 7.10 to the separate placebo group analysis of Figure 7.4b and the separate medication group analysis of Figure 7.6. Comparing the joint analysis in Figure 7.10 to that of the placebo group analysis of Figure 7.4b indicates the improved outcome when medication group individuals are added to the analysis. In the placebo group analysis of Figure 7.4b, 78% are in the two highest, clearly nonresponding trajectory classes, whereas in the joint analysis of Figure 7.10 only 36% are in the highest, clearly nonresponding class. In this sense, medication seems to have a positive effect in reducing depression. Furthermore, in the placebo analysis, 21% are in the placebo-responding class which ultimately worsens, whereas in the joint analysis 21% are in this type of class and 43% are in a clearly responding class. Comparing the joint analysis in Figure 7.10 to that of the medication group analysis of Figure 7.6 indicates the worsened outcome when placebo group individuals are added to the analysis. In the medication group analysis of Figure 7.6 only 17% are in the nonresponding class, compared to 36% in the joint analysis of Figure 7.10. Figure 7.6 shows 15% in the initially improving but ultimately worsening class, compared to 21% in Figure 7.10. Figure 7.6 shows 68% in the responding class, compared to 43% in Figure 7.10. All three of these comparisons indicate that medication has a positive effect in reducing depression.

As a second step, it is of interest to study the medication effects for each medication separately. The joint analysis model allows this because the class probabilities differ between the placebo group and each of the three medication groups, as expressed by equation 8. The results are shown in Figure 7.11. For the placebo group, the responder class (class 3) is estimated to be 26%, the initially improving nonresponder class (class 1) to be 22%, and the high nonresponder class (class 2) to be 52%. In comparison, for the fluoxetine group the responder class is estimated to be 48% (better than placebo), the initially improving nonresponder class to be 0% (better than placebo), and the high nonresponder class to be 52% (same as placebo).
11. The log-likelihood value for the model shown in Figure 7.10 is –1,744.999, replicated across many random starts, with 29 parameters and a BIC value of 3,621. The entropy value is 0.69.
Figure 7.10 Three-class GMM analysis of both groups using post-randomization time points (model-estimated HamD mean curves from 48 hours through week 8; class 1, 21.0%; class 2, 35.8%; class 3, 43.1%).

Figure 7.11 Medication effects in each of 3 trajectory classes (estimated class percentages by group; R = responder class, IINR = initially improving non-responder class, HNR = high non-responder class).
For the venlafaxine IR group, the responder class is estimated to be 46% (better than placebo), the initially improving nonresponder class to be 47% (worse than placebo), and the high nonresponder class to be 7% (better than placebo). For the venlafaxine XR group, the responder class is estimated to be 90% (better than placebo), the initially improving nonresponder class to be 0% (better than placebo), and the high nonresponder class to be 10% (better than placebo).
Conclusions

The growth mixture analysis presented here demonstrates that, unlike with conventional repeated-measures analysis, it is possible to estimate medication effects in the presence of placebo effects. The analysis is flexible in that the medication effect is allowed to differ across trajectory classes. This approach should therefore have wide applicability in clinical trials. It was shown that medication effects could be expressed as causal effects. The analysis also produces a classification of individuals into trajectory classes.

Medication effects were expressed in two alternative ways, as changes in growth slopes and as changes in class probabilities. Related to the latter approach, a possible generalization of the model is to include two latent class variables, one before and one after randomization, and to let the medication influence the postrandomization latent class variable as well as transitions between the two latent class variables. Another generalization is proposed in Muthén and Brown (2009), considering four classes of subjects: (1) subjects who would respond to both placebo and medication, (2) subjects who would respond to placebo but not medication, (3) subjects who would respond to medication but not placebo, and (4) subjects who would respond to neither placebo nor medication. Class 3 is of particular interest from a pharmaceutical point of view.

Prediction of class membership can be incorporated as part of the model but was not explored here. Such analyses suggest interesting opportunities for the design of trials. If at baseline an individual is predicted to belong to a nonresponder class, a different treatment can be chosen.
References

Leuchter, A. F., Cook, I. A., Witte, E. A., Morgan, M., & Abrams, M. (2002). Changes in brain function of depressed subjects during treatment with placebo. American Journal of Psychiatry, 159, 122–129.

Muthén, B. (2004). Latent variable analysis: Growth mixture modeling and related techniques for longitudinal data. In D. Kaplan (Ed.), Handbook of quantitative methodology for the social sciences (pp. 345–368). Newbury Park, CA: Sage Publications.
Muthén, B., & Asparouhov, T. (2009). Growth mixture modeling: Analysis with non-Gaussian random effects. In G. Fitzmaurice, M. Davidian, G. Verbeke, & G. Molenberghs (Eds.), Longitudinal data analysis (pp. 143–165). Boca Raton, FL: Chapman & Hall/CRC Press.

Muthén, B., & Brown, H. (2009). Estimating drug effects in the presence of placebo response: Causal inference using growth mixture modeling. Statistics in Medicine, 28, 3363–3385.

Muthén, B., Brown, C. H., Masyn, K., Jo, B., Khoo, S. T., Yang, C. C., et al. (2002). General growth mixture modeling for randomized preventive interventions. Biostatistics, 3, 459–475.

Muthén, B., & Muthén, L. (2000). Integrating person-centered and variable-centered analysis: Growth mixture modeling with latent trajectory classes. Alcoholism: Clinical and Experimental Research, 24, 882–891.

Muthén, B., & Muthén, L. (1998–2008). Mplus user's guide (5th ed.). Los Angeles: Muthén & Muthén.

Muthén, B., & Shedden, K. (1999). Finite mixture modeling with mixture outcomes using the EM algorithm. Biometrics, 55, 463–469.

Quitkin, F. M., Rabkin, J. G., Ross, D., & Stewart, J. W. (1984). Identification of true drug response to antidepressants: Use of pattern analysis. Archives of General Psychiatry, 41, 782–786.
8 Statistical Methodology for a SMART Design in the Development of Adaptive Treatment Strategies

alena i. oetting, janet a. levy, roger d. weiss, and susan a. murphy
Introduction

The past two decades have brought new pharmacotherapies as well as behavioral therapies to the field of drug-addiction treatment (Carroll & Onken, 2005; Carroll, 2005; Ling & Smith, 2002; Fiellin, Kleber, Trumble-Hejduk, McLellan, & Kosten, 2004). Despite this progress, the treatment of addiction in clinical practice often remains a matter of trial and error. Some reasons for this difficulty are as follows. First, to date, no one treatment has been found that works well for most patients; that is, patients are heterogeneous in response to any specific treatment. Second, as many authors have pointed out (McLellan, 2002; McLellan, Lewis, O'Brien, & Kleber, 2000), addiction is often a chronic condition, with symptoms waxing and waning over time. Third, relapse is common. Therefore, the clinician is faced with, first, finding a sequence of treatments that works initially to stabilize the patient and, next, deciding which types of treatments will prevent relapse in the longer term.

To inform this sequential clinical decision making, adaptive treatment strategies, that is, treatment strategies shaped by individual patient characteristics or patient responses to prior treatments, have been proposed (Greenhouse, Stangl, Kupfer, & Prien, 1991; Murphy, 2003, 2005; Murphy, Lynch, Oslin, McKay, & Tenhave, 2006; Murphy, Oslin, Rush, & Zhu, 2007; Lavori & Dawson, 2000; Lavori, Dawson, & Rush, 2000; Dawson & Lavori, 2003). Here is an example of an adaptive treatment strategy for prescription opioid dependence, modeled with modifications after a trial currently in progress within the Clinical Trials Network of the National Institute on Drug Abuse (Weiss, Sharpe, & Ling, 2010).
Figure 8.1. An adaptive treatment strategy for prescription opioid dependence (all patients receive an initial 4-week treatment; those not abstinent during the initial 4 weeks step up to a 12-week second treatment, while those abstinent step down, with treatment continuing until 16 weeks have elapsed from the beginning of the initial treatment).
Example

First, provide all patients with a 4-week course of buprenorphine/naloxone (Bup/Nx) plus medical management (MM) plus individual drug counseling (IDC) (Fiellin, Pantalon, Schottenfeld, Gordon, & O'Connor, 1999), culminating in a taper of the Bup/Nx. If at any time during these 4 weeks the patient meets the criterion for nonresponse,1 a second, longer treatment with Bup/Nx (12 weeks) is provided, accompanied by MM and cognitive behavior therapy (CBT). However, if the patient remains abstinent2 from opioid use during those 4 weeks, that is, responds to initial treatment, provide 12 additional weeks of relapse prevention therapy (RPT). A patient whose treatment is consistent with this strategy experiences one of two sequences of two treatments, depicted in Figure 8.1. The two sequences are

1. Four-week Bup/Nx treatment plus MM plus IDC, then, if the criterion for nonresponse is met, a subsequent 12-week Bup/Nx treatment plus MM plus CBT.
2. Four-week Bup/Nx treatment plus MM plus IDC, then, if abstinence is achieved, a subsequent 12 weeks of RPT.
1. Response to initial treatment is abstinence from opioid use during these first 4 weeks. Nonresponse is defined as any opioid use during these first 4 weeks.
2. Abstinence might be operationalized using a criterion based on self-report of opioid use and urine screens.
This strategy might be intended to maximize the number of days the patient remains abstinent (as confirmed by a combination of urine screens and self-report) over the duration of treatment. Throughout, we use this hypothetical prescription opioid dependence example to make the ideas concrete. In the next section, several research questions useful in guiding the development of an adaptive treatment strategy are discussed. Next, we review the sequential multiple assignment randomized trial (SMART), an experimental design developed to answer these questions. We then present statistical methodology for analyzing data from a particular SMART design, followed by a comprehensive discussion and evaluation of these statistical considerations in the fourth and fifth sections. In the final section, we present a summary and conclusions and a discussion of suggested areas for future research.
Research Questions to Refine an Adaptive Treatment Strategy

Continuing with the prescription opioid dependence example, we might ask if we could begin with a less intensive behavioral therapy (Lavori et al., 2000). For example, standard MM, which is less burdensome than IDC and focuses primarily on medication adherence, might be sufficiently effective for a large majority of patients; that is, we might ask, In the context of the specified options for further treatment, does the addition of IDC to MM result in a better long-term outcome than the use of MM as the sole accompanying behavioral therapy? Alternatively, if we focus on the behavioral therapy accompanying the second, longer 12-week treatment, we might ask, Among subjects who did not respond to one of the initial treatments, which accompanying behavioral therapy is better for the secondary treatment: MM+IDC or MM+CBT?

On the other hand, instead of focusing on a particular treatment component within strategies, we may be interested in comparing entire adaptive treatment strategies. Consider the strategies in Table 8.1. Suppose we are interested in comparing two of these treatment strategies. If the strategies begin with the same initial treatment, then the comparison reduces to a comparison of the two secondary treatments; in our example, a comparison of strategy C with strategy D is obtained by comparing MM+IDC with MM+CBT among nonresponders to MM alone. We also might compare two strategies with different initial treatments. For example, in some settings, CBT may be the preferred behavioral therapy to use with longer treatments; thus, we might ask, if we are going to provide MM+CBT for nonresponders to the initial treatment and RPT to responders to the initial treatment, Which is the best initial behavioral treatment: MM+IDC or MM? This is a comparison of strategies A and C.
Table 8.1 Potential Strategies to Consider for the Treatment of Prescription Opioid Dependence

Strategy A: Begin with Bup/Nx+MM+IDC; if nonresponse, provide Bup/Nx+MM+CBT; if response, provide RPT.
  Initial treatment: 4-week Bup/Nx treatment + MM+IDC. If not abstinent, secondary treatment: 12-week Bup/Nx treatment + MM+CBT. If abstinent, secondary treatment: RPT.

Strategy B: Begin with Bup/Nx+MM+IDC; if nonresponse, provide Bup/Nx+MM+IDC; if response, provide RPT.
  Initial treatment: 4-week Bup/Nx treatment + MM+IDC. If not abstinent, secondary treatment: 12-week Bup/Nx treatment + MM+IDC. If abstinent, secondary treatment: RPT.

Strategy C: Begin with Bup/Nx+MM; if nonresponse, provide Bup/Nx+MM+CBT; if response, provide RPT.
  Initial treatment: 4-week Bup/Nx treatment + MM. If not abstinent, secondary treatment: 12-week Bup/Nx treatment + MM+CBT. If abstinent, secondary treatment: RPT.

Strategy D: Begin with Bup/Nx+MM; if nonresponse, provide Bup/Nx+MM+IDC; if response, provide RPT.
  Initial treatment: 4-week Bup/Nx treatment + MM. If not abstinent, secondary treatment: 12-week Bup/Nx treatment + MM+IDC. If abstinent, secondary treatment: RPT.
Alternately, we might wish to identify which of the four strategies results in the best long-term outcome (here, the highest number of days abstinent). Note that the behavioral therapies and pharmacotherapies are illustrative and were selected to enhance the concreteness of this example; of course, other selections are possible. These research questions can be classified into one of four general types, as summarized in Table 8.2. The SMART experimental design discussed in the next section is particularly suited to addressing these types of questions.
A SMART Experimental Design and the Development of Adaptive Treatment Strategies

Traditional experimental trials typically evaluate a single treatment with no manipulation or control of preceding or subsequent treatments. In contrast, the SMART design provides data that can be used both to assess the efficacy of each treatment within a sequence and to compare the effectiveness of strategies as a whole. A further rationale for the SMART design can be found in Murphy et al. (2006, 2007). We focus on SMART designs in which there are two initial treatment options, then two treatment options for initial nonresponders (alternately, initial responders) and one treatment option for initial treatment responders (alternately, initial nonresponders). In conversations with researchers across the mental-health field, we have found this design to be of the greatest interest; these designs are similar to those employed by the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) (Rush et al., 2003) and the Clinical Antipsychotic Trials of Intervention Effectiveness (CATIE) (Stroup et al., 2003); additionally, two SMART trials of this type are currently in the field (D. Oslin, personal communication, 2007; W. Pelham, personal communication, 2006).

Data from this experimental design can be used to address questions from each type in Table 8.2. Because SMART specifies sequences of treatments, it allows us to determine the effectiveness of one of the treatment components in the presence of either preceding or subsequent treatments; that is, it addresses questions of both types 1 and 2. Also, the use of randomization supports causal inferences about the relative effectiveness of different treatment strategies, as in questions of types 3 and 4.

Returning to the prescription opioid dependence example, a useful SMART design is provided in Figure 8.2. Consider a question of the first type from Table 8.2. An example is, In the context of the specified options for further treatment, does the addition of IDC to MM result in a better long-term outcome than the use of MM as the sole accompanying behavioral therapy? This question is answered by comparing the pooled outcomes of subgroups 1, 2, 3 with those of subgroups 4, 5, 6. This is the main effect of the initial behavioral treatment. Note that to estimate the main effect of the initial behavioral treatment, we require outcomes from not only initial nonresponders but also initial responders. Clinically, this makes sense, as a particular initial treatment may lead to a good response but this response may not be as durable as other initial treatments. Next, consider a question of the second type, such as, Among those who did not respond to one of the initial treatments, which is the better subsequent behavioral treatment: MM+IDC or MM+CBT? This question is addressed by pooling outcome data from subgroups 1 and 4 and comparing the resulting mean to the pooled outcome data of subgroups 2 and 5.
Table 8.2 Four General Types of Research Questions

Two questions that concern components of adaptive treatment strategies:
1. (Hypothesis test) Initial treatment effect: What is the effect of initial treatment on long-term outcome in the context of the specified secondary treatments? In other words, what is the main effect of initial treatment?
2. (Hypothesis test) Secondary treatment effect: Considering only those who did (or did not) respond to one of the initial treatments, what is the best secondary treatment? In other words, what is the main effect of secondary treatment for responders (or nonresponders)?

Two questions that concern whole adaptive treatment strategies:
3. (Hypothesis test) Comparing strategy effects: What is the difference in the long-term outcome between two treatment strategies that begin with a different initial treatment?
4. (Estimation) Choosing the overall best strategy: Which treatment strategy produces the best long-term outcome?
Figure 8.2 SMART study design to develop adaptive treatment strategies for prescription opioid dependence (initial randomization between two 4-week Bup/Nx treatments with different accompanying behavioral therapies; those who are abstinent [R = 1] receive relapse prevention, while those who are not abstinent are re-randomized between two 12-week Bup/Nx second treatments; days abstinent are measured over weeks 1–16, yielding subgroups 1–6).
This is the main effect of the secondary behavioral treatment among those not abstinent during the initial 4-week treatment. An example of the third type of question would be to test whether strategies A and C in Table 8.1 result in different outcomes; to form this test, we use appropriately weighted outcomes from subgroups 1 and 3 to form an average outcome for strategy A and appropriately weighted outcomes from subgroups 4 and 6 to form an average outcome for strategy C (an alternate example would concern strategies B and D; see the next section for formulae). Note that to compare strategies, we require outcomes from both initial responders and initial nonresponders (e.g., subgroup 3 in addition to subgroup 1 and subgroup 6 in addition to subgroup 4). The fourth type of question concerns the estimation of the best of the strategies. To choose the best strategy overall, we follow a similar ''weighting'' process to form the average outcome for each of the four strategies (A, B, C, D) and then designate as the best strategy the one that is associated with the highest average outcome.
Test Statistics and Sample Size Formulae

In this section, we provide the test statistics and sample size formulae for the four types of research questions summarized in Table 8.2. We assume that subjects are randomized equally to the two treatment options at each step. We use the following notation: A1 is the indicator for initial treatment, R denotes the response to the initial treatment (response = 1 and nonresponse = 0), A2 is the treatment indicator for secondary treatment, and Y is the outcome. In our prescription opioid dependence example, the values for these variables are as follows: A1 is 1 if the initial treatment uses MM+IDC and 0 otherwise, A2 is 1 if the secondary treatment for nonresponders uses MM+CBT and 0 otherwise, and Y is the number of days the subject remained abstinent over the 16-week study period.
Statistics for Addressing the Different Research Questions

The test statistics for questions 1–3 of Table 8.2 are presented in Table 8.3; the method for addressing question 4 is also given in Table 8.3. The test statistics for questions 1 and 2 are the standard test statistics for a two-group comparison with large samples (Hoel, 1984) and are not unique to the SMART design. The estimator of a strategy mean, used for both questions 3 and 4, as well as the test statistic for question 3, are given in Murphy (2005). In large samples, the three test statistics corresponding to questions 1–3 are normally distributed (with mean zero under the null hypothesis of no effect).
Table 8.3 Test Statistics for Each of the Possible Questions

Question 1:(a)
\[ Z = \frac{\bar{Y}_{A_1=1} - \bar{Y}_{A_1=0}}{\sqrt{\dfrac{S^2_{A_1=1}}{N_{A_1=1}} + \dfrac{S^2_{A_1=0}}{N_{A_1=0}}}} , \]
where N_{A1=i} denotes the number of subjects who received i as the initial treatment.

Question 2:(a)
\[ Z = \frac{\bar{Y}_{R=0,\,A_2=1} - \bar{Y}_{R=0,\,A_2=0}}{\sqrt{\dfrac{S^2_{R=0,\,A_2=1}}{N_{R=0,\,A_2=1}} + \dfrac{S^2_{R=0,\,A_2=0}}{N_{R=0,\,A_2=0}}}} , \]
where N_{R=0, A2=i} denotes the number of nonresponders who received i as the secondary treatment.

Question 3:(b)
\[ Z = \frac{\sqrt{N}\,\bigl(\hat{\mu}_{A_1=1,\,A_2=a_2} - \hat{\mu}_{A_1=0,\,A_2=b_2}\bigr)}{\sqrt{\hat{\sigma}^2_{A_1=1,\,A_2=a_2} + \hat{\sigma}^2_{A_1=0,\,A_2=b_2}}} , \]
where N is the total number of subjects and a2 and b2 are the secondary treatments in the two prespecified strategies being compared.

Question 4: Choose the largest of \(\hat{\mu}_{A_1=1,\,A_2=1}\), \(\hat{\mu}_{A_1=0,\,A_2=1}\), \(\hat{\mu}_{A_1=1,\,A_2=0}\), \(\hat{\mu}_{A_1=0,\,A_2=0}\).

(a) The subscripts on \(\bar{Y}\) and \(S^2\) denote groups of subjects. For example, \(\bar{Y}_{R=0,\,A_2=1}\) is the average outcome for subjects who do not respond initially (R = 0) and are assigned A2 = 1, and \(S^2_{R=0,\,A_2=1}\) is the sample variance of the outcome for those subjects. Similarly, the subscript on N denotes the group of subjects.
(b) \(\hat{\mu}\) is an estimator of the mean outcome and \(\hat{\sigma}^2\) is the associated variance estimator for a particular strategy. Here, the subscript denotes the strategy. The formulae for \(\hat{\mu}\) and \(\hat{\sigma}^2\) are in Table 8.4.
In Tables 8.3, 8.4, and 8.5, specific values of Ai are denoted by ai and bi, where i indicates the initial treatment (i = 1) or secondary treatment (i = 2); these specific values are either 1 or 0.
Sample Size Calculations

In the following, all sample size formulae assume a two-tailed z-test. Let α be the desired size of the hypothesis test, let 1 − β be the power of the test, and let z_{α/2} be the standard normal (1 − α/2) percentile. Approximate normality of the test statistic is assumed throughout.
Table 8.4 Estimators for Strategy Means and for the Variance of the Estimator of a Strategy Mean

For a strategy sequence (a1, a2), the estimator for the strategy mean and N times the estimator for the variance of that estimator are

\[ \hat{\mu}_{A_1=a_1,\,A_2=a_2} = \frac{\sum_{i=1}^{N} W_i(a_1,a_2)\,Y_i}{\sum_{i=1}^{N} W_i(a_1,a_2)}, \qquad \hat{\sigma}^2_{A_1=a_1,\,A_2=a_2} = \frac{1}{N}\sum_{i=1}^{N} W_i(a_1,a_2)^2\,\bigl(Y_i - \hat{\mu}_{A_1=a_1,\,A_2=a_2}\bigr)^2 , \]

with weights

\[ W_i(1,1) = \frac{A_{1i}}{.5}\left[\frac{A_{2i}}{.5}(1-R_i) + R_i\right], \qquad W_i(1,0) = \frac{A_{1i}}{.5}\left[\frac{1-A_{2i}}{.5}(1-R_i) + R_i\right], \]
\[ W_i(0,1) = \frac{1-A_{1i}}{.5}\left[\frac{A_{2i}}{.5}(1-R_i) + R_i\right], \qquad W_i(0,0) = \frac{1-A_{1i}}{.5}\left[\frac{1-A_{2i}}{.5}(1-R_i) + R_i\right]. \]

Data for subject i are of the form (A1i, Ri, A2i, Yi), where A1i, Ri, A2i, and Yi are defined as in the section Test Statistics and Sample Size Formulae and N is the total sample size.
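To make the weighting concrete, the following is a minimal Python sketch of the Table 8.4 estimators and the question 3 z-statistic from Table 8.3, assuming the equal (.5/.5) randomization probabilities used in the table. The function and variable names are hypothetical, and this is not the authors' analysis code (SAS code for the worked example is available at the URL given later in the chapter).

```python
import numpy as np

def weights(a1, a2, A1, R, A2):
    """Table 8.4 weights W_i(a1, a2) under 0.5/0.5 randomization at each stage.
    Responders consistent with the strategy get weight 1/0.5 = 2; nonresponders
    consistent with it get weight 1/(0.5 * 0.5) = 4; everyone else gets weight 0."""
    first_stage = (A1 == a1) / 0.5
    second_stage = np.where(R == 1, 1.0, (A2 == a2) / 0.5)
    return first_stage * second_stage

def strategy_mean_and_var(a1, a2, A1, R, A2, Y):
    """Weighted strategy mean (mu_hat) and N times the variance of that
    estimator (sigma2_hat), as in Table 8.4."""
    W = weights(a1, a2, A1, R, A2)
    mu = np.sum(W * Y) / np.sum(W)
    sigma2 = np.mean(W ** 2 * (Y - mu) ** 2)
    return mu, sigma2

def z_question3(strategy_a, strategy_b, A1, R, A2, Y):
    """Question 3 z-statistic comparing two strategies (Table 8.3)."""
    mu_a, s2_a = strategy_mean_and_var(*strategy_a, A1, R, A2, Y)
    mu_b, s2_b = strategy_mean_and_var(*strategy_b, A1, R, A2, Y)
    return np.sqrt(len(Y)) * (mu_a - mu_b) / np.sqrt(s2_a + s2_b)
```

The weights are simply the inverse probabilities of receiving the observed treatment sequence under the strategy, so responders (randomized once) contribute weight 2 and strategy-consistent nonresponders (randomized twice) contribute weight 4.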
In order to calculate the sample size, one must also input the desired detectable standardized effect size. We denote the standardized effect size by δ and use the definition found in Cohen (1988). The standardized effect sizes for the various research questions we are considering are summarized in Table 8.5. The sample size formulae for questions 1 and 2 are standard formulae (Jennison & Turnbull, 2000) and assume an equal number in each of the two groups being compared. Given desired levels of size, power, and standardized effect size, the total sample size required for question 1 is

\[ N_1 = 2 \cdot 2\,(z_{\alpha/2} + z_{\beta})^2\,(1/\delta)^2 . \]

The sample size formula for question 2 requires the user to postulate the initial response rate, which is used to provide the number of subjects who will be randomized to secondary treatments. The sample size formula uses the working assumption that the initial response rates are equal; that is, subjects respond to initial treatment at the same rate regardless of the particular initial treatment, p = Pr[R = 1 | A1 = 1] = Pr[R = 1 | A1 = 0]. This working assumption is used only to size the SMART and is not used to analyze the data from it, as can be seen from Table 8.3.
Table 8.5 Standardized Effect Sizes for Addressing the Four Questions in Table 8.2

Question 1:
\[ \delta = \frac{E[Y \mid A_1=1] - E[Y \mid A_1=0]}{\sqrt{\dfrac{\mathrm{Var}[Y \mid A_1=1] + \mathrm{Var}[Y \mid A_1=0]}{2}}} \]

Question 2:
\[ \delta = \frac{E[Y \mid R=0,\,A_2=1] - E[Y \mid R=0,\,A_2=0]}{\sqrt{\dfrac{\mathrm{Var}[Y \mid R=0,\,A_2=1] + \mathrm{Var}[Y \mid R=0,\,A_2=0]}{2}}} \]

Question 3:
\[ \delta = \frac{E[Y \mid A_1=1,\,A_2=a_2] - E[Y \mid A_1=0,\,A_2=b_2]}{\sqrt{\dfrac{\mathrm{Var}[Y \mid A_1=1,\,A_2=a_2] + \mathrm{Var}[Y \mid A_1=0,\,A_2=b_2]}{2}}} , \]
where a2 and b2 are the secondary treatment assignments of A2.

Question 4:
\[ \delta = \frac{E[Y \mid A_1=a_1,\,A_2=a_2] - E[Y \mid A_1=b_1,\,A_2=b_2]}{\sqrt{\dfrac{\mathrm{Var}[Y \mid A_1=a_1,\,A_2=a_2] + \mathrm{Var}[Y \mid A_1=b_1,\,A_2=b_2]}{2}}} , \]
where (a1, a2) is the strategy with the highest mean outcome, (b1, b2) is the strategy with the next highest mean outcome, and ai and bi indicate specific values of Ai, i = 1, 2.
The formula for the total required sample size for question 2 is

\[ N_2 = 2 \cdot 2\,(z_{\alpha/2} + z_{\beta})^2\,(1/\delta)^2 / (1 - p) . \]

When calculating the sample sizes to test question 3, two different sample size formulae can be used: one that inputs the postulated initial response rate and one that does not. The formula that uses a guess of the initial response rate makes two working assumptions. First, the response rates are equal for both initial treatments (denoted by p), and second, the variability of the outcome Y around the strategy mean (A1 = 1, A2 = a2), among either initial responders or nonresponders, is less than the variance of the strategy mean, and similarly for strategy (A1 = 0, A2 = b2). This formula is

\[ N_{3a} = 2\,(z_{\alpha/2} + z_{\beta})^2 \,\bigl(2\,(2(1-p) + 1 \cdot p)\bigr)\,(1/\delta)^2 . \]

The second formula does not require either of these two working assumptions; it specifies the sample size required if the response rates are both 0, a ''worst-case scenario.'' This conservative sample size formula for addressing question 3 is

\[ N_{3b} = 2\,(z_{\alpha/2} + z_{\beta})^2 \cdot 4 \cdot (1/\delta)^2 . \]
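A hedged Python sketch of these formulas is given below; the function names are my own, and question 4, which relies on the algorithm described next rather than a closed-form formula, is omitted. The results can be checked against Table 8.6.

```python
from scipy.stats import norm

def _zz(alpha, power):
    # (z_{alpha/2} + z_beta)^2 for a two-tailed test with power = 1 - beta
    return (norm.ppf(1 - alpha / 2) + norm.ppf(power)) ** 2

def n1(alpha, power, delta):
    """Total N for question 1 (main effect of initial treatment)."""
    return 2 * 2 * _zz(alpha, power) / delta ** 2

def n2(alpha, power, delta, p):
    """Total N for question 2; p is the postulated initial response rate."""
    return n1(alpha, power, delta) / (1 - p)

def n3a(alpha, power, delta, p):
    """Total N for question 3 using the postulated initial response rate p."""
    return 2 * _zz(alpha, power) * 2 * (2 * (1 - p) + p) / delta ** 2

def n3b(alpha, power, delta):
    """Conservative total N for question 3 (response rates taken to be 0)."""
    return 2 * _zz(alpha, power) * 4 / delta ** 2

# Example: n2(0.05, 0.80, 0.2, 0.10) -> about 872; Table 8.6 reports 871
# (the small difference reflects rounding of the normal percentiles).
```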
We will compare the performance of these two sample size formulae for addressing question 3 in the next section. See the Appendix for a derivation of these formulae. The method for finding the sample size for question 4 relies on an algorithm rather than a formula; we will refer to the resulting sample size as N4. Since question 4 is not a hypothesis test, instead of specifying power to detect a difference in two means, the sample size is based on the desired probability to detect the strategy that results in the highest mean outcome. The standardized effect size in this case involves the difference between the two highest strategy means. This algorithm makes the working assumption that
σ2 = Var[Y | A1 = a1, A2 = a2] is the same for all strategies. The algorithm uses an idea similar to the one used to derive the sample size formula for question 3 that is invariant to the response rate. Given a desired level of probability for selecting the correct treatment strategy with the highest mean and a desired treatment strategy effect, the algorithm for question 4 finds the sample sizes that correspond to the range of response probabilities and then chooses the largest sample size. Since it is based on a worst-case scenario, this algorithm will result in a conservative sample size formula. See the Appendix for a derivation of this algorithm. The online sample size calculator for question 4 can be found at http://methodologymedia.psu.edu/smart/samplesize.

Example sample sizes are given in Table 8.6. Note that as the response rate decreases, the required sample size for question 3 (e.g., comparing two strategies that have different initial treatments) increases. To see why this must be the case, consider two extreme cases, the first in which the response rate is 90% for both initial treatments and the second in which the nonresponse rate is 90%. In the former case, if n subjects are assigned to treatment 1 initially and 90% respond (i.e., 10% do not respond), then the resulting sample size for strategy (1, 1) is 0.9 * n + ½ * 0.1 * n = 0.95 * n. The ½ occurs due to the second randomization of nonresponders between the two secondary treatments. On the other hand, if only 10% respond (i.e., 90% do not respond), then the resulting sample size for strategy (1, 1) is 0.1 * n + ½ * 0.9 * n = 0.55 * n, which is less than 0.95 * n. Thus, the lower the expected response rate, the larger the initial sample size required for a given power to differentiate between two strategies. This result occurs because the number of treatment options (two options) for nonresponders is greater than the number of treatment options for responders (only one).

Consider the prescription opioid dependence example. Suppose we are particularly interested in investigating whether MM+CBT or MM+IDC is best for subjects who do not respond to their initial treatment. This is a question of type 2. Thus, in order to ascertain the sample size for the SMART design in Figure 8.2, we use formula N2. Suppose we decide to size the trial to detect a standardized effect size of 0.2 between the two secondary treatments, with the power and size of the (two-tailed) test at 0.80 and 0.05, respectively.
Table 8.6 Example Sample Sizes: All Entries Are for Total Sample Size
(α = desired size;(a) β, where 1 − β = desired power;(b) δ = standardized effect size; p = initial response rate;(c) Q1–Q4 refer to research questions 1–4 of Table 8.2)

α      β      δ      p      Q1      Q2      Q3 (varies by p)   Q3 (invariant to p)   Q4
0.10   0.20   0.20   0.5    620     1,240   930                1,240                 358
0.10   0.20   0.20   0.1    620     689     1,178              1,240                 358
0.10   0.20   0.50   0.5    99      198     149                198                   59
0.10   0.20   0.50   0.1    99      110     188                198                   59
0.10   0.10   0.20   0.5    864     1,728   1,297              1,729                 608
0.10   0.10   0.20   0.1    864     960     1,642              1,729                 608
0.10   0.10   0.50   0.5    138     277     207                277                   97
0.10   0.10   0.50   0.1    138     154     263                277                   97
0.05   0.20   0.20   0.5    784     1,568   1,176              1,568                 358
0.05   0.20   0.20   0.1    784     871     1,490              1,568                 358
0.05   0.20   0.50   0.5    125     251     188                251                   59
0.05   0.20   0.50   0.1    125     139     238                251                   59
0.05   0.10   0.20   0.5    1,056   2,112   1,584              2,112                 608
0.05   0.10   0.20   0.1    1,056   1,174   2,007              2,112                 608
0.05   0.10   0.50   0.5    169     338     254                338                   97
0.05   0.10   0.50   0.1    169     188     321                338                   97

(a) All entries assume that each statistical test is two-tailed; the sample size for question 4 does not vary by α since this is not a hypothesis test.
(b) In question 4, we choose the sample size so that the probability that the treatment strategy with the highest mean has the highest estimated mean is 1 − β.
(c) The sample size formulae assume that the response rates to initial treatments are equal: p = Pr[R = 1 | A1 = 1] = Pr[R = 1 | A1 = 0].
After surveying the literature and discussing the issue with colleagues, suppose we decide that the response rate for the two initial treatments will be approximately 0.10 (p = 0.10). The number of subjects required for this trial is then

\[ N_2 = 2 \cdot 2\,(z_{\alpha/2} + z_{\beta})^2\,(1/\delta)^2 / (1-p) = 4\,(z_{0.05/2} + z_{0.2})^2\,(1/0.2)^2 / 0.9 = 871 . \]

Furthermore, as secondary objectives, suppose we are interested in comparing strategy A—begin with MM+IDC; if nonresponse, provide MM+CBT; if response, provide RPT—with strategy D—begin with MM; if nonresponse, provide MM+IDC; if response, provide RPT—(corresponding to a specific example of question 3) and in choosing the best strategy overall (question 4). Using the same input values for the parameters and looking at Table 8.6, we see that the sample size required for question 3 is about twice as much as that required for question 2. Thus, unless we are willing and able to double our sample size, we realize that a comparison of strategies A and D will have low power. However, the sample size for question 4 is only 358 (using a desired probability of 0.80), so we will be able to answer the secondary objective of choosing the best strategy with 80% probability.

Suppose that we conduct the trial with 871 subjects. The hypothetical data set3 and SAS code for calculating the following values can be found at http://www.stat.lsa.umich.edu/~samurphy/papers/APPAPaper/. For question 2, the value of the z-statistic is

\[ Z = \frac{\bar{Y}_{R=0,\,A_2=1} - \bar{Y}_{R=0,\,A_2=0}}{\sqrt{\dfrac{S^2_{R=0,\,A_2=1}}{N_{R=0,\,A_2=1}} + \dfrac{S^2_{R=0,\,A_2=0}}{N_{R=0,\,A_2=0}}}} = \frac{5.8619 - 4.3135}{\sqrt{\dfrac{109.3975}{391} + \dfrac{98.5540}{396}}} = 2.1296 , \]

which has a two-sided p value of 0.0332. Using the formulae in Table 8.4, we get the following estimates for the strategy means:

\[ [\,\hat{\mu}_{(1,1)},\; \hat{\mu}_{(1,0)},\; \hat{\mu}_{(0,1)},\; \hat{\mu}_{(0,0)}\,] = [\,7.1246,\; 4.9994,\; 6.3285,\; 5.6364\,] . \]

3. We generated this hypothetical data so that the true underlying effect size for question 2 is 0.2, the true effect size for question 3 is 0.2, and the strategy with the highest mean in truth is (1, 1), with an effect size of 0.1. Furthermore, the true response rates for the initial treatments are 0.05 for A1 = 0 and 0.15 for A1 = 1. When we considered 1,000 similar data sets, we found that the analysis for question 2 led to significant results 78% of the time and the analysis for question 3 led to significant results 54% of the time. The latter result, and the fact that we did not detect an effect for question 3 in the analysis, is unsurprising, considering that we have half the sample size required to detect an effect size of 0.2. Furthermore, across the 1,000 similar simulated data sets the best strategy (1, 1) was detected 86% of the time.

The corresponding estimates for the variances of the estimates of the strategy means are

\[ [\,\hat{\sigma}^2_{(1,1)},\; \hat{\sigma}^2_{(1,0)},\; \hat{\sigma}^2_{(0,1)},\; \hat{\sigma}^2_{(0,0)}\,] = [\,396.4555,\; 352.8471,\; 456.5727,\; 441.0138\,] . \]

Using these estimates, we calculate the value of the corresponding z-statistic for question 3:

\[ Z = \frac{\sqrt{N}\,\bigl(\hat{\mu}_{A_1=1,\,A_2=1} - \hat{\mu}_{A_1=0,\,A_2=0}\bigr)}{\sqrt{\hat{\sigma}^2_{A_1=1,\,A_2=1} + \hat{\sigma}^2_{A_1=0,\,A_2=0}}} = \frac{\sqrt{871}\,(7.1246 - 5.6364)}{\sqrt{396.4555 + 441.0138}} = 1.5178 , \]

which has a two-sided p value of 0.1291, which leads us not to reject the null hypothesis that the two strategies are equal. For question 4, we choose (1, 1) as the best strategy, which corresponds to the strategy:

1. First, supplement the initial 4-week Bup/Nx treatment with MM+IDC.
2. For those who respond, provide RPT. For those who do not respond, continue the Bup/Nx treatment for 12 weeks but switch the accompanying behavioral treatment to MM+CBT.
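As a quick check, the two z-statistics and p values just reported can be reproduced from the quoted summary statistics alone (this does not require the hypothetical data set itself); a small Python sketch:

```python
import math
from scipy.stats import norm

# Question 2: two-sample z among nonresponders, from the quoted summary statistics
z2 = (5.8619 - 4.3135) / math.sqrt(109.3975 / 391 + 98.5540 / 396)
p2 = 2 * (1 - norm.cdf(abs(z2)))   # z2 ~ 2.13, two-sided p ~ 0.033

# Question 3: comparison of strategies (1, 1) and (0, 0) with N = 871
z3 = math.sqrt(871) * (7.1246 - 5.6364) / math.sqrt(396.4555 + 441.0138)
p3 = 2 * (1 - norm.cdf(abs(z3)))   # z3 ~ 1.52, two-sided p ~ 0.129

print(round(z2, 4), round(p2, 4), round(z3, 4), round(p3, 4))
```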
Evaluation of Sample Size Formulae Via Simulation

In this section, the sample size formulae presented in Sample Size Calculations are evaluated. We examine the robustness of the newly developed methods for calculating sample sizes for questions 3 and 4. In addition, a second assessment investigates the power for question 4 to detect the best strategy when the study is sized for one of the other research questions. The second assessment is provided because, due to the emphasis on strategies in SMART designs, question 4 is always likely to be of interest.
Simulation Designs

The sample sizes used for the simulations were chosen to give a power level of 0.90 and a Type I error of 0.05 when one of questions 1–3 is used to size the trial and a 0.90 probability of choosing the best strategy for question 4 when it is used to size the trial; these sample sizes are shown in Table 8.6. For questions 1–3, power is estimated by the proportion of times out of 1,000 simulations that the null hypothesis is correctly rejected; for question 4, the probability of choosing the best strategy is estimated by the proportion of times out of 1,000 simulations that the correct strategy with the highest mean is chosen.
We sized the studies to detect a prespecified standardized effect size of 0.2 or 0.5. We follow Cohen (1988) in labeling 0.2 as a ''small'' effect size and 0.5 as a ''medium'' effect size. The simulated data reflect the types of scenarios found in substance-abuse clinical trials (Gandhi et al., 2003; Fiellin et al., 2006; Ling et al., 2005). For example, the simulated data exhibit initial response rates (i.e., the proportion of simulated subjects with R = 1) of 0.5 and 0.1, and the mean outcome for the responders is higher than for nonresponders. For question 3 we need to specify the strategies of interest, and for the purposes of these simulations we will compare strategies (A1 = 1, A2 = 1) and (A1 = 0, A2 = 0); these are strategies A and D, respectively, from Table 8.1. For the simulations to evaluate the robustness of the sample size calculation for question 4, we choose strategy A to always have the highest mean outcome and generate the data according to two different ''patterns'': (1) the strategy means are all different and (2) the mean outcomes of the other three strategies besides strategy A are all equal. In the second pattern, it is more difficult to detect the ''best'' strategy because the highest mean must be distinguished from all the rest, which are all the ''next highest,'' instead of just one next highest mean.

In order to test the robustness of the sample size formulae, we calculate a sample size given by the relevant formula in Sample Size Calculations and then simulate data sets of this sample size. However, the simulated data will not satisfy the working assumptions in one of the following ways:
• the intermediate response rates to initial treatments are unequal, that is, Pr[R = 1 | A1 = 1] ≠ Pr[R = 1 | A1 = 0]
• the variances relevant to the question are unequal (for question 4 only)
• the distribution of the final outcome, Y, is right-skewed (thus, for a given sample size, the test statistic is more likely to have a nonnormal distribution).
We also assess the power of question 4 when it is not used in sizing the trial. For each of the types of research questions in Table 8.2, we generate a data set that follows the working assumptions for the sample size formula for that question (e.g., use N2 to size the study to test the effect of the second treatment on the mean outcome) and then perform question 4 on the data and estimate the probability of choosing the correct strategy with the highest mean outcome. The descriptions of the simulation designs for each of questions 1–4 as well as the parameters for all of the different generative models can be found at http://www.stat.lsa.umich.edu/~samurphy/papers/APPAPaper/.
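As an illustration of the kind of simulation used here, the following is a minimal Python sketch of estimating power for question 2 by Monte Carlo. The generative model (response rates, class means, and normal errors) is invented for illustration only and is not the authors' generative model, whose parameters are documented at the URL above.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def simulate_smart(n, p_resp=(0.5, 0.5), effect=0.2, sd=1.0):
    """Simulate one SMART data set (A1, R, A2, Y); all parameters are illustrative."""
    A1 = rng.integers(0, 2, n)                      # 0.5/0.5 initial randomization
    R = rng.binomial(1, np.where(A1 == 1, p_resp[1], p_resp[0]))
    A2 = rng.integers(0, 2, n)                      # secondary randomization (relevant only if R == 0)
    mu = 0.5 * R + (1 - R) * effect * A2            # responders do better; A2 = 1 adds `effect` SDs
    return A1, R, A2, mu + rng.normal(0, sd, n)

def power_question2(n, reps=1000, alpha=0.05, **kwargs):
    """Proportion of replications in which the question 2 null hypothesis is rejected."""
    z_crit = norm.ppf(1 - alpha / 2)
    hits = 0
    for _ in range(reps):
        A1, R, A2, Y = simulate_smart(n, **kwargs)
        y1, y0 = Y[(R == 0) & (A2 == 1)], Y[(R == 0) & (A2 == 0)]
        z = (y1.mean() - y0.mean()) / np.sqrt(y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0))
        hits += abs(z) > z_crit
    return hits / reps

# power_question2(871, p_resp=(0.1, 0.1), effect=0.2) should come out near 0.80.
```

With n = 871, a response rate of 0.10 in both arms, and a standardized effect of 0.2 among nonresponders, the estimated power should come out near the 0.80 for which N2 was sized.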
Robustness of the New Sample Size Formulae

As previously mentioned, since the sample size formulae for questions 1 and 2 are standard, we will focus on evaluating the newly developed sample size formulae for questions 3 and 4. Table 8.7a and b provides the results of the simulations designed to evaluate the sample size formulae for questions 3 and 4, respectively.

Considering Table 8.7a, we see that the question 3 sample size formula N3a performed extremely well when the expected standardized effect size was 0.20. Resulting power levels were uniformly near 0.90 regardless of either the true initial response rates or any of the three violations of the working assumptions. Power levels were less robust when the sample sizes were smaller (i.e., for the 0.50 effect size). For example, when the initial response rates are not equal, the resulting power is lower than 0.90 in the rows using an assumed response rate of 0.5. The more conservative sample size formula, N3b, performed well in all scenarios, regardless of response rate or the presence of any of the three violations of the underlying assumptions. As the response rate approaches 0, the sample sizes are less conservative, but the results for power remain within a 95% confidence interval of 0.90.

In Table 8.7b, the conservatism of the sample size calculation N4 (associated with question 4) is apparent. We can see that N4 is less conservative for the more difficult scenario where the strategy means besides the highest are all equal, but the probability of correctly identifying the strategy with the highest mean outcome is still about 0.90.

Table 8.7a Investigation of Sample Size Assumption Violations for Question 3, Comparing Strategies A and D
(Simulation parameters are the standardized effect size, the default initial response rate p, the sample size formula used, and the resulting total sample size; simulation results are the power under the default working assumptions, under unequal initial response rates, and under a non-normal outcome Y.)

Effect Size   p (default)   Formula   Total N   Default Assumptions   Unequal Response Rates   Non-Normal Y
0.2           0.5           N3a       1,584     0.893                 0.902                    0.882
0.2           0.1           N3a       2,007     0.882                 0.910                    0.877(a)
0.5           0.5           N3a       254       0.896                 0.864(a)                 0.851(a)
0.5           0.1           N3a       321       0.926(a)              0.886                    0.898
0.2           0.5           N3b       2,112     0.950(a)              0.958(a)                 0.974(a)
0.2           0.1           N3b       2,112     0.903                 0.934(a)                 0.898
0.5           0.5           N3b       338       0.973(a)              0.938(a)                 0.916
0.5           0.1           N3b       338       0.937(a)              0.890                    0.922(a)

The power to reject the null hypothesis for question 3 is shown when the sample size is calculated to reject the null hypothesis for question 3 with power of 0.90 and Type I error of 0.05 (two-tailed).
(a) The 95% confidence interval for this proportion does not contain 0.90.
Table 8.7b Investigation of Sample Size Violations for Question 4: Probability (a) to Detect the Correct "Best" Strategy When the Sample Size Is Calculated to Detect the Correct Maximum Strategy Mean 90% of the Time

Effect size | Initial response rate (default) | Pattern (b) | Sample size (c) | Probability: default working assumptions correct | Probability: unequal initial response rates | Probability: unequal variance | Probability: non-normal outcome Y
0.2 | 0.5 | 1 | 608 | 0.966d | 0.984d | 0.965d | 0.972d
0.2 | 0.1 | 1 | 608 | 0.962d | 0.969d | 0.964d | 0.962d
0.5 | 0.5 | 1 | 97 | 0.980d | 0.985d | 0.966d | 0.956d
0.5 | 0.1 | 1 | 97 | 0.960d | 0.919d | 0.976d | 0.947d
0.2 | 0.5 | 2 | 608 | 0.964d | 0.953d | 0.952d | 0.944d
0.2 | 0.1 | 2 | 608 | 0.905 | 0.929d | 0.922d | 0.923d
0.5 | 0.5 | 2 | 97 | 0.922d | 0.974d | 0.976d | 0.948d
0.5 | 0.1 | 2 | 97 | 0.893 | 0.917 | 0.927d | 0.885

a Probability calculated as the percentage of 1,000 simulations on which correct strategy mean was selected as the maximum.
b 1 refers to the pattern of strategy means such that all are different but that the mean for (A1 = 1, A2 = 1), that is, strategy A, is always the highest. 2 refers to the pattern of strategy means such that the mean for strategy A is higher than the other three and the other three are all equal.
c Calculated to detect the correct maximum strategy mean 90% of the time when the sample size assumptions hold.
d The 95% confidence interval for this proportion does not contain 0.90.
Overall, under different violations of the working assumptions, the sample size formulae for questions 3 and 4 still performed well in terms of power. As discussed, we also assess the power for question 4 when the trial was sized for a different research question. For each of the types of research questions in Table 8.2, we generate a data set that follows the working assumptions for the sample size formula for that question, then evaluate the power of question 4 to detect the optimal strategy. From Table 8.8a–c, we see that in almost all cases, regardless of the starting assumptions used to size the various research questions, we achieve a 0.9 probability or higher of correctly detecting the strategy with the highest mean outcome. The probability falls below 0.9 when the standardized effect size for question 4 falls below 0.1. These results are not surprising as from Table 8.6 we see that question 4 requires much smaller sample sizes than all the other research questions. Note that question 4 is more closely linked to question 3 than to question 1 or 2. Question 3 is potentially a subset of question 4; this relationship occurs when one of the strategies considered in question 3 is the strategy with the highest mean outcome. The probability of detecting the correct
Table 8.8a The Probability (a) of Choosing the Correct Strategy for Question 4 When Sample Size Is Calculated to Reject the Null Hypothesis for Question 1 (for a Two-Tailed Test With Power of 0.90 and Type I Error of 0.05)

Effect size for question 1 | Initial response rate | Sample size | Question 1 (power) | Question 4 (probability a) | Effect size for question 4
0.2 | 0.5 | 1,056 | 0.880 | 1.000 | 0.325
0.2 | 0.1 | 1,056 | 0.904 | 1.000 | 0.425
0.5 | 0.5 | 169 | 0.934 | 0.987 | 0.350
0.5 | 0.1 | 169 | 0.920 | 0.998 | 0.630

a Probability calculated as the percentage of 1,000 simulations on which correct strategy mean was selected as the maximum.
Table 8.8b The Probability (a) of Choosing the Correct Strategy for Question 4 When Sample Size Is Calculated to Reject the Null Hypothesis for Question 2 (for a Two-Tailed Test With Power of 0.90 and Type I Error of 0.05)

Effect size for question 2 | Initial response rate | Sample size | Question 2 (power) | Question 4 (probability a) | Effect size for question 4
0.2 | 0.5 | 2,112 | 0.906 | 0.999 | 0.133
0.2 | 0.1 | 1,174 | 0.895 | 0.716 | 0.054
0.5 | 0.5 | 338 | 0.895 | 0.997 | 0.372
0.5 | 0.1 | 188 | 0.901 | 0.978 | 0.420

a Probability calculated as the percentage of 1,000 simulations on which correct strategy mean was selected as the maximum.
strategy mean as the maximum when sizing for question 3 is generally very good, as can be seen from Table 8.8c. This is due to the fact that the sample sizes required to test the differences between two strategy means (each beginning with a different initial treatment) are much larger than those needed to detect the maximum of four strategy means with a specified degree of confidence. For a z-test of the difference between two strategy means with a two-tailed Type I error rate of 0.05, power of 0.90, and standardized effect size of 0.20, the sample size requirements range 1,584–2,112. The sample size required for a 0.90 probability of selecting the correct strategy mean as a maximum when the standardized effect size between it and the next highest strategy mean is 0.2 is 608. It is therefore not surprising that the selection rates for the correct strategy mean are generally high when
Table 8.8c The Probability (a) of Choosing the Correct Strategy for Question 4 When Sample Size Is Calculated to Reject the Null Hypothesis for Question 3 (for a Two-Tailed Test With Power of 0.90 and Type I Error of 0.05)

Effect size for question 3 | Initial response rate | Sample size formula | Sample size | Question 3 (power) | Question 4 (probability a) | Effect size for question 4
0.2 | 0.5 | N3a | 1,584 | 0.893 | 0.939 | 0.10
0.2 | 0.1 | N3a | 2,007 | 0.882 | 0.614 | 0.02
0.5 | 0.5 | N3a | 254 | 0.896 | 0.976 | 0.25
0.5 | 0.1 | N3a | 321 | 0.926 | 0.978 | 0.32
0.2 | 0.5 | N3b | 2,112 | 0.950 | 0.953 | 0.10
0.2 | 0.1 | N3b | 2,112 | 0.903 | 0.613 | 0.02
0.5 | 0.5 | N3b | 338 | 0.973 | 0.989 | 0.25
0.5 | 0.1 | N3b | 338 | 0.937 | 0.985 | 0.32

a Probability calculated as the percentage of 1,000 simulations on which correct strategy mean was selected as the maximum.
powered to detect differences between strategy means each beginning with a different initial treatment.
Summary

Overall, the sample size formulae perform well even when the working assumptions are violated. Additionally, the performance of the question 4 analysis is consistently good when the trial is sized for any of the other research questions; this is most likely because question 4 requires smaller sample sizes than the other research questions to achieve good results. When planning a SMART similar to the one considered here, if one is primarily concerned with testing differences between prespecified strategy means, we would recommend using the less conservative formula N3a if one has confidence in knowledge of the initial response rates. We recommend this in light of the considerable cost savings that can be accrued by using this approach, in comparison to the more conservative formula N3b. We comment further on this topic in the Discussion.
Discussion

In this chapter, we demonstrated how a SMART can be used to answer research questions about both individual components of an adaptive
treatment strategy and the treatment strategies as a whole. We presented statistical methodology to guide the design and analysis of a SMART. Two new methods for calculating the sample sizes for a SMART were presented. The first is for sizing a study when one is interested in testing the difference in two strategies that have different initial treatments; this formula incorporates knowledge about initial response rates. The second new sample size calculation is for sizing a study that has as its goal choosing the strategy that has the highest final outcome. We evaluated both of these methods and found that they performed well in simulations that covered a wide range of plausible scenarios. Several comments are in order regarding the violations of assumptions surrounding the values of the initial response rates when investigating sample size formula N3a for question 3. First, we examined violations of the assumption of the homogeneity of response rates across initial treatments such that they differed by 10% (initial response rates differing by more than 10% in addictions clinical trials are rare) and found that the sample size formula performed well. Future research is needed to examine the question regarding the extent to which initial response rates can be misspecified when utilizing this modified sample size formula. Clearly, for gross misspecifications, the trialist is probably better off with the more conservative sample size formula. However, the operationalization of ‘‘gross misspecification’’ needs further research. In the addictions and in many other areas of mental health, both clinical practice as well as trials are plagued with subject nonadherence to treatment. In these cases sophisticated causal inferential methods are often utilized when trials are ‘‘broken’’ in this manner. An alternative to the post hoc, statistical approach to dealing with nonadherence is to consider a proactive experimental design such as SMART. The SMART design provides the means for considering nonadherence as one dimension of nonresponse to treatment. That is, nonadherence is an indication that the treatment must be altered in some way (e.g., by adding a component that is designed to improve motivation to adhere, by switching the treatment). In particular, one might be interested in varying secondary treatments based on both adherence measures and measures of continued drug use. In this chapter we focused on the simple design in which there are two options for nonresponders and one option for responders. Clearly, these results hold for the mirror design (one option for nonresponders and two options for responders). An important step would be to generalize these results to other designs, such as designs in which there are equal numbers of options for responders and nonresponders or designs in which there are three randomizations. In substance abuse, the final outcome variable is often binary; sample size formulae are needed for this setting as well. Alternately,
the outcome may be time-varying, such as time-varying symptom levels; again, it is important to generalize the results to this setting.
Appendix

Sample Size Formulae for Question 3

Here, we present the derivation of the sample size formulae N3a and N3b for question 3 using results from Murphy (2005). Suppose we have data from a SMART design modeled after the one presented in Figure 8.2; that is, there are two options for the initial treatment, followed by two treatment options for nonresponders and one treatment option for responders. We use the same notation and assumptions listed in Test Statistics and Sample Size Formulae. Suppose that we are interested in comparing two strategies that have different initial treatments, strategies (a1, a2) and (b1, b2). Without loss of generality, let a1 = 1 and b1 = 0.

To derive the formulae N3a and N3b, we will make the following working assumption: the sample sizes will be large enough so that $\hat{\mu}_{(a_1,a_2)}$ is approximately normally distributed. We use three additional assumptions for formula N3a. The first is that the response rates for the initial treatments are equal; the second two assumptions are indicated by (*) and (**) below. The marginal variances relevant to the research question are $\sigma_0^2 = \operatorname{Var}[Y \mid A_1 = a_1, A_2 = a_2]$ and $\sigma_1^2 = \operatorname{Var}[Y \mid A_1 = b_1, A_2 = b_2]$. Denote the mean outcome for strategy $(A_1, A_2)$ by $\mu_{(A_1,A_2)}$. The null hypothesis we are interested in testing is
$$H_0:\ \mu_{(1,a_2)} - \mu_{(0,b_2)} = 0$$
and the alternative of interest is
$$H_1:\ \mu_{(1,a_2)} - \mu_{(0,b_2)} = \Delta = \delta\,\sqrt{\frac{\sigma_1^2 + \sigma_0^2}{2}}.$$
(Note that $\delta$ is the standardized effect size.) As presented in Statistics for Addressing the Different Research Questions, the test statistic for this hypothesis is
$$Z = \frac{\sqrt{N}\,\bigl(\hat{\mu}_{(1,a_2)} - \hat{\mu}_{(0,b_2)}\bigr)}{\sqrt{\hat{\sigma}^2_{(1,a_2)} + \hat{\sigma}^2_{(0,b_2)}}}$$
where $\hat{\mu}_{(a_1,a_2)}$ and $\hat{\sigma}^2_{(a_1,a_2)}$ are as defined in Table 8.5; in large samples, this test statistic has a standard normal distribution under the null hypothesis (Murphy, Van Der Laan, Robins, & Conduct Problems Prevention Group, 2001). Recall that N is the total sample size for the trial.

To find the required sample size N for a two-sided test with power $1 - \beta$ and size $\alpha$, we solve
$$\Pr\bigl[Z < -z_{\alpha/2} \text{ or } Z > z_{\alpha/2} \mid \mu_{(1,a_2)} - \mu_{(0,b_2)} = \Delta\bigr] = 1 - \beta$$
for N, where $z_{\alpha/2}$ is the standard normal $(1 - \alpha/2)$ percentile. Thus, we have
$$\Pr\bigl[Z < -z_{\alpha/2} \mid \mu_{(1,a_2)} - \mu_{(0,b_2)} = \Delta\bigr] + \Pr\bigl[Z > z_{\alpha/2} \mid \mu_{(1,a_2)} - \mu_{(0,b_2)} = \Delta\bigr] = 1 - \beta.$$
Without loss of generality, assume that $\Delta > 0$ so that
$$\Pr\bigl[Z < -z_{\alpha/2} \mid \mu_{(1,a_2)} - \mu_{(0,b_2)} = \Delta\bigr] \approx 0 \quad\text{and}\quad \Pr\bigl[Z > z_{\alpha/2} \mid \mu_{(1,a_2)} - \mu_{(0,b_2)} = \Delta\bigr] \approx 1 - \beta.$$

Define $\eta^2_{(a_1,a_2)} = \operatorname{Var}\bigl[\sqrt{N}\,\hat{\mu}_{(a_1,a_2)}\bigr]$. Note that
$$\frac{\sqrt{\hat{\sigma}^2_{(1,a_2)} + \hat{\sigma}^2_{(0,b_2)}}}{\sqrt{\eta^2_{(1,a_2)} + \eta^2_{(0,b_2)}}}$$
is close to 1 in large samples (Murphy, 2005). Now, $E\bigl[\hat{\mu}_{(1,a_2)} - \hat{\mu}_{(0,b_2)}\bigr] = \mu_{(1,a_2)} - \mu_{(0,b_2)}$, so we have
$$\Pr\!\left[\frac{\sqrt{N}\,\bigl(\hat{\mu}_{(1,a_2)} - \hat{\mu}_{(0,b_2)} - \Delta\bigr)}{\sqrt{\eta^2_{(1,a_2)} + \eta^2_{(0,b_2)}}} > z_{\alpha/2} - \frac{\sqrt{N}\,\Delta}{\sqrt{\eta^2_{(1,a_2)} + \eta^2_{(0,b_2)}}}\right] = 1 - \beta.$$
Note that the distribution of
$$\frac{\sqrt{N}\,\bigl(\hat{\mu}_{(1,a_2)} - \hat{\mu}_{(0,b_2)} - \Delta\bigr)}{\sqrt{\eta^2_{(1,a_2)} + \eta^2_{(0,b_2)}}}$$
follows a standard normal distribution in large samples (Murphy et al., 2001). Thus, we have
$$z_{\beta} \approx -z_{\alpha/2} + \frac{\sqrt{N}\,\Delta}{\sqrt{\eta^2_{(1,a_2)} + \eta^2_{(0,b_2)}}}. \qquad (1)$$

Now, using equation 10 in Murphy (2005) for k = 2 steps (initial and secondary) of treatment,
$$\eta^2_{(a_1,a_2)} = E_{a_1,a_2}\!\left[\frac{\bigl(Y - \mu_{(a_1,a_2)}\bigr)^2}{\Pr(a_1)\,\Pr(a_2 \mid R, a_1)}\right]
= E_{a_1,a_2}\!\left[\frac{\bigl(Y - \mu_{(a_1,a_2)}\bigr)^2}{\Pr(a_1)\,\Pr(a_2 \mid 1, a_1)} \,\middle|\, R = 1\right]\Pr\nolimits_{a_1}[R = 1]
+ E_{a_1,a_2}\!\left[\frac{\bigl(Y - \mu_{(a_1,a_2)}\bigr)^2}{\Pr(a_1)\,\Pr(a_2 \mid 0, a_1)} \,\middle|\, R = 0\right]\Pr\nolimits_{a_1}[R = 0]$$
for all values of a1, a2; the subscripts on E and Pr (namely, $E_{a_1,a_2}$ and $\Pr_{a_1}$) indicate expectations and probabilities calculated as if all subjects were assigned a1 as the initial treatment and then, if nonresponding, assigned treatment a2. If we are willing to make the assumption (*) that
$$E_{a_1,a_2}\bigl[(Y - \mu_{(a_1,a_2)})^2 \mid R\bigr] \le E_{a_1,a_2}\bigl[(Y - \mu_{(a_1,a_2)})^2\bigr]$$
for both R = 1 and R = 0 (i.e., the variability of the outcome around the strategy mean among either responders or nonresponders is no more than the marginal variance for the strategy), then
$$\eta^2_{(a_1,a_2)} \le \sigma^2_{(a_1,a_2)}\left(\frac{\Pr_{a_1}[R = 1]}{\Pr(a_1)\,\Pr(a_2 \mid 1, a_1)} + \frac{\Pr_{a_1}[R = 0]}{\Pr(a_1)\,\Pr(a_2 \mid 0, a_1)}\right), \qquad (2)$$
where $\sigma^2_{(a_1,a_2)}$ is the marginal variance of the strategy in question. Since (**) nonresponding subjects (R = 0) are randomized equally to their two treatment options and there is one treatment option for responders (R = 1), for a common initial response rate $p = \Pr[R = 1 \mid A_1 = 1] = \Pr[R = 1 \mid A_1 = 0]$,
$$\eta^2_{(a_1,a_2)} \le \sigma^2_{(a_1,a_2)}\,\bigl(2\,(2(1 - p) + p)\bigr).$$

Rearranging equation (1) gives us
$$N \approx \left(\frac{\sqrt{\eta^2_{(1,a_2)} + \eta^2_{(0,b_2)}}}{\Delta}\,\bigl(z_{\beta} + z_{\alpha/2}\bigr)\right)^{2}.$$
By the bound on $\eta^2$ above, the right-hand side is at most
$$\left(\frac{\sqrt{\bigl(\sigma_1^2 + \sigma_0^2\bigr)\,\bigl(2\,(2(1 - p) + p)\bigr)}}{\delta\,\sqrt{\bigl(\sigma_1^2 + \sigma_0^2\bigr)/2}}\,\bigl(z_{\beta} + z_{\alpha/2}\bigr)\right)^{2},$$
so it suffices to take N equal to this quantity. Simplifying, we have the formula
$$N_{3a} = 2\,(z_{\alpha/2} + z_{\beta})^{2}\,\bigl(2\,(2(1 - p) + p)\bigr)\,(1/\delta)^{2},$$
which is the sample size formula given in Sample Size Calculations that depends on the response rate p. Going through the arguments once again, we see that we do not need either of the two working assumptions (*) or (**) to obtain the conservative sample size formula
$$N_{3b} = 2 \cdot 4\,(1/\delta)^{2}\,(z_{\beta} + z_{\alpha/2})^{2}.$$
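The two formulae can be evaluated directly. The sketch below is a minimal implementation of N3a and N3b as just derived; rounding up to the next integer is our own convention, and the printed values are close to, though not identical to, the totals 1,584, 2,007, and 2,112 reported in Table 8.7a, presumably because of differing rounding conventions.

```python
from math import ceil
from scipy.stats import norm

def n3a(delta, p, alpha=0.05, power=0.90):
    """Sample size for comparing two strategies with different initial
    treatments, using the common initial response rate p (formula N3a)."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return ceil(2 * z**2 * 2 * (2 * (1 - p) + p) / delta**2)

def n3b(delta, alpha=0.05, power=0.90):
    """Conservative formula that does not use the response rate (N3b)."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return ceil(2 * 4 * z**2 / delta**2)

# Standardized effect size 0.2: roughly 1,580 (p = 0.5), 2,000 (p = 0.1), 2,100 (N3b).
print(n3a(0.2, 0.5), n3a(0.2, 0.1), n3b(0.2))
```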
Sample Size Calculation for Question 4

We now present the algorithm for calculating the sample size for question 4. As in the previous section, suppose we have data from a SMART design modeled after the one presented in Figure 8.2; we use the same notation and assumptions listed in Test Statistics and Sample Size Formulae. Suppose that we are interested in identifying the strategy that has the highest mean outcome. We will denote the mean outcome for strategy $(A_1, A_2)$ by $\mu_{(A_1,A_2)}$. We make the following assumptions:

• The marginal variances of the final outcome given the strategy are all equal, and we denote this variance by $\sigma^2$. This means that $\sigma^2 = \operatorname{Var}[Y \mid A_1 = a_1, A_2 = a_2]$ for all (a1, a2) in {(1,1), (1,0), (0,1), (0,0)}.
• The sample sizes will be large enough so that $\hat{\mu}_{(a_1,a_2)}$ is approximately normally distributed.
• The correlation between the estimated mean outcome for strategy (1, 1) and the estimated mean outcome for strategy (1, 0) is the same as the correlation between the estimated mean outcome for strategy (0, 1) and the estimated mean outcome for strategy (0, 0); we denote this identical correlation by $\rho$.
The correlation of the treatment strategies is directly related to the initial response rates. The final outcome under two different treatment strategies will be correlated to the extent that they share responders. For example, if the response rate for treatment A1 = 1 is 0, then everyone is a nonresponder and the means calculated for Y given strategy (1, 1) and for Y given strategy (1, 0) will not share any responders to treatment A1 = 1; thus, the correlation between the two strategies will be 0. On the other hand, if the response rate for treatment A1 = 1 is 1, then everyone is a responder to A1 = 1 and, therefore, the mean outcomes for strategy (1, 1) and strategy (1, 0) will be directly related (i.e., completely correlated). Two treatment strategies that each begin with a different initial treatment are not correlated since the strategies do not overlap (i.e., they do not share any subjects). For the algorithm, the user must specify the following quantities:
• the desired standardized effect size $\delta$;
• the desired probability that the strategy estimated to have the largest mean outcome does in fact have the largest mean.

We assume that three of the strategies have the same mean and the one remaining strategy produces the largest mean; this is an extreme scenario in which it is most difficult to detect the presence of an effect. Without loss of generality, we choose strategy (1, 1) to have the largest mean. Consider the following algorithm as a function of N:

1. For every value of $\rho$ in {0, 0.01, 0.02, . . . , 0.99, 1} perform the following simulation. Generate K = 20,000 samples of $[\hat{\mu}_{(1,1)}\ \hat{\mu}_{(1,0)}\ \hat{\mu}_{(0,1)}\ \hat{\mu}_{(0,0)}]^{T}$ from a multivariate normal with mean
$$M = \begin{bmatrix} \mu_{(1,1)} \\ \mu_{(1,0)} \\ \mu_{(0,1)} \\ \mu_{(0,0)} \end{bmatrix} = \begin{bmatrix} \delta/2 \\ 0 \\ 0 \\ 0 \end{bmatrix}$$
and covariance matrix
$$\Sigma = \frac{1}{N}\begin{bmatrix} 1 & \rho & 0 & 0 \\ \rho & 1 & 0 & 0 \\ 0 & 0 & 1 & \rho \\ 0 & 0 & \rho & 1 \end{bmatrix}.$$
This gives us 20,000 samples, $V_1, \ldots, V_k, \ldots, V_{20000}$, where each $V_k$ is a vector of four entries, one from each treatment strategy; for example, $V_k^{T} = [\hat{\mu}_{(1,1),k}\ \hat{\mu}_{(1,0),k}\ \hat{\mu}_{(0,1),k}\ \hat{\mu}_{(0,0),k}]$. Count how many times out of $V_1, \ldots, V_{20000}$ that $\hat{\mu}_{(1,1),k}$ is highest; divide this count by 20,000, and call this value $C_{\rho}(N)$. $C_{\rho}(N)$ is the estimate of the probability of correctly identifying the strategy with the highest mean.

2. At the end of step 1, we will have a value of $C_{\rho}(N)$ for each $\rho$ in {0, 0.01, 0.02, . . . , 0.99, 1}. Let $\pi_N = \min_{\rho} C_{\rho}(N)$; the value $\pi_N$ is the lowest probability of detecting the best strategy mean. Next, we perform a search over the space of possible values of N to find the value for which $\pi_N$ equals the desired probability; N4 is this value of N.

The online calculator for the sample size for question 4 can be found at http://methodologymedia.psu.edu/smart/samplesize.
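The following sketch is one possible implementation of this algorithm under our reading of it: the four strategy-mean estimates are drawn on a standardized scale with mean (δ/2, 0, 0, 0) and covariance (1/N) times the block correlation matrix above, and the search over N is done by doubling followed by bisection. The function names, number of Monte Carlo draws, and search strategy are simplifications for illustration; the online calculator above should be used for actual trial planning.

```python
import numpy as np

def prob_correct(n, delta, rho, k=20_000, rng=None):
    """Monte Carlo estimate of the probability that strategy (1, 1) has the
    largest estimated mean, for total sample size n and correlation rho."""
    rng = np.random.default_rng() if rng is None else rng
    mean = np.array([delta / 2, 0.0, 0.0, 0.0])
    corr = np.array([[1, rho, 0, 0],
                     [rho, 1, 0, 0],
                     [0, 0, 1, rho],
                     [0, 0, rho, 1]], dtype=float)
    draws = rng.multivariate_normal(mean, corr / n, size=k)
    return np.mean(draws.argmax(axis=1) == 0)

def n4(delta, target=0.90, rhos=np.linspace(0, 1, 101), k=20_000, seed=0):
    """Smallest n (found by doubling then bisection) whose worst-case
    probability over rho of selecting the correct strategy reaches target."""
    rng = np.random.default_rng(seed)
    worst = lambda n: min(prob_correct(n, delta, r, k, rng) for r in rhos)
    lo, hi = 1, 2
    while worst(hi) < target:      # find an upper bracket
        lo, hi = hi, hi * 2
    while hi - lo > 1:             # bisect to the smallest adequate n
        mid = (lo + hi) // 2
        if worst(mid) >= target:
            hi = mid
        else:
            lo = mid
    return hi

# print(n4(0.2, k=2_000))  # roughly 600 for delta = 0.2 (cf. 608 in Table 8.7b)
```

Because each evaluation is itself a Monte Carlo estimate, the returned value fluctuates slightly from run to run; a larger number of draws near the final bracket would stabilize it.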
References

Carroll, K. M. (2005). Recent advances in psychotherapy of addictive disorders. Current Psychiatry Reports, 7, 329–336.
Carroll, K. M., & Onken, L. S. (2005). Behavioral therapies for drug abuse. American Journal of Psychiatry, 162(8), 1452–1460.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
Dawson, R., & Lavori, P. W. (2003). Comparison of designs for adaptive treatment strategies: Baseline vs. adaptive randomization. Journal of Statistical Planning and Inference, 117, 365–385.
Fiellin, D. A., Kleber, H., Trumble-Hejduk, J. G., McLellan, A. T., & Kosten, T. R. (2004). Consensus statement on office based treatment of opioid dependence using buprenorphine. Journal of Substance Abuse Treatment, 27, 153–159.
Fiellin, D., Pantalon, M., Schottenfeld, R., Gordon, L., & O'Connor, P. (1999). Manual for standard medical management of opioid dependence with buprenorphine. New Haven, CT: Yale University School of Medicine, Primary Care Center and Substance Abuse Center, West Haven VA/CT Healthcare System.
Fiellin, D. A., Pantalon, M. V., Chawarski, M. C., Moore, B. A., Sullivan, L. E., O'Connor, P. G., et al. (2006). Counseling plus buprenorphine-naloxone maintenance therapy for opioid dependence. New England Journal of Medicine, 355(4), 365–374.
Gandhi, D. H., Jaffe, J. H., McNary, S., Kavanagh, G. J., Hayes, M., & Currens, M. (2003). Short-term outcomes after brief ambulatory opioid detoxification with buprenorphine in young heroin users. Addiction, 98, 453–462.
Greenhouse, J., Stangl, D., Kupfer, D., & Prien, R. (1991). Methodological issues in maintenance therapy clinical trials. Archives of General Psychiatry, 48(3), 313–318.
Hoel, P. (1984). Introduction to mathematical statistics (5th ed.). New York: John Wiley & Sons.
Jennison, C., & Turnbull, B. (2000). Group sequential methods with applications to clinical trials. Boca Raton, FL: Chapman & Hall.
Lavori, P. W., & Dawson, R. (2000). A design for testing clinical strategies: Biased adaptive within-subject randomization. Journal of the Royal Statistical Society, Series A, 163, 29–38.
Lavori, P. W., Dawson, R., & Rush, A. J. (2000). Flexible treatment strategies in chronic disease: Clinical and research implications. Biological Psychiatry, 48, 605–614.
Ling, W., Amass, L., Shoptow, S., Annon, J. J., Hillhouse, M., Babcock, D., et al. (2005). A multi-center randomized trial of buprenorphine-naloxone versus clonidine for opioid detoxification: Findings from the National Institute on Drug Abuse Clinical Trials Network. Addiction, 100, 1090–1100.
Ling, W., & Smith, D. (2002). Buprenorphine: Blending practice and research. Journal of Substance Abuse Treatment, 23, 87–92.
McLellan, A. T. (2002). Have we evaluated addiction treatment correctly? Implications from a chronic care perspective. Addiction, 97, 249–252.
McLellan, A. T., Lewis, D. C., O'Brien, C. P., & Kleber, H. D. (2000). Drug dependence, a chronic medical illness: Implications for treatment, insurance, and outcomes evaluation. Journal of the American Medical Association, 284(13), 1689–1695.
Murphy, S. A. (2003). Optimal dynamic treatment regimes. Journal of the Royal Statistical Society, 65, 331–366.
Murphy, S. A. (2005). An experimental design for the development of adaptive treatment strategies. Statistics in Medicine, 24, 1455–1481.
Murphy, S. A., Lynch, K. G., Oslin, D. A., McKay, J. R., & Tenhave, T. (2006). Developing adaptive treatment strategies in substance abuse research. Drug and Alcohol Dependence. doi:10.1016/j.drugalcdep.2006.09.008
Murphy, S. A., Oslin, D. W., Rush, A. J., & Zhu, J. (2007). Methodological challenges in constructing effective treatment sequences for chronic psychiatric disorders. Neuropsychopharmacology, 32, 257–262.
Murphy, S. A., Van Der Laan, M. J., Robins, J. M., & Conduct Problems Prevention Group (2001). Marginal mean models for dynamic regimes. Journal of the American Statistical Association, 96(456), 1410–1423.
Rush, A. J., Crismon, M. L., Kashner, T. M., Toprac, M. G., Carmody, T. J., Trivedi, M. H., et al. (2003). Texas medication algorithm project, phase 3 (TMAP-3): Rationale and study design. Journal of Clinical Psychiatry, 64(4), 357–369.
Stroup, T. S., McEvoy, J. P., Swartz, M. S., Byerly, M. J., Glick, I. D., Canive, J. M., et al. (2003). The National Institute of Mental Health Clinical Antipsychotic Trials of Intervention Effectiveness (CATIE) project: Schizophrenia trial design and protocol development. Schizophrenia Bulletin, 29(1), 15–31.
Weiss, R., Sharpe, J. P., & Ling, W. A. (2010). Two-phase randomized controlled clinical trial of buprenorphine/naloxone treatment plus individual drug counseling for opioid analgesic dependence. National Institute on Drug Abuse Clinical Trials Network. Retrieved June 14, 2020 from http://www.clinicaltrials.gov/ct/show/NCT00316277?order=1
9 Obtaining Robust Causal Evidence From Observational Studies: Can Genetic Epidemiology Help?
george davey smith
Introduction: The Limits of Observational Epidemiology

Observational epidemiological studies have clearly made important contributions to understanding the determinants of population health. However, there have been high-profile problems with this approach, highlighted by apparently contradictory findings emerging from observational studies and from randomized controlled trials (RCTs) of the same issue. These situations, of which the best known probably relates to the use of hormone-replacement therapy (HRT) in coronary heart disease (CHD) prevention, have been discussed elsewhere (Davey Smith & Ebrahim, 2002). The HRT controversy is covered elsewhere in this volume (see Chapter 5). Here, I will discuss two examples. First, consider the use of vitamin E supplements and CHD risk. Several observational studies have suggested that the use of vitamin E supplements is associated with a reduced risk of CHD, two of the most influential being the Health Professionals Follow-Up Study (Rimm et al., 1993) and the Nurses' Health Study (Stampfer et al., 1993), both published in the New England Journal of Medicine in 1993. Findings from one of these studies are presented in Figure 9.1, where it can be seen that even short-term use of vitamin E supplements was associated with reduced CHD risk, which persisted after adjustment for confounding factors. Figure 9.2 demonstrates that nearly half of U.S. adults are taking either vitamin E supplements or multivitamin/multimineral supplements that generally contain vitamin E (Radimer et al., 2004). Figure 9.3 presents data from three available time points, where there appears to have been a particular increase in vitamin E use following 1993 (Millen, Dodd, & Subar, 2004), possibly consequent upon the publication of the two observational studies already mentioned, which
Figure 9.1 Observed effect of duration of vitamin E use compared to no use on coronary heart disease events in the Health Professional Follow-Up Study. From ‘‘Vitamin E consumption and the risk of coronary heart disease in men,’’ by E. B. Rimm, M. J. Stampfer, A. Ascherio, E. Giovannucci, G. A. Colditz, & W. C. Willett, 1993, New England Journal of Medicine, 328, 1450–1456.
have received nearly 3,000 citations between them since publication. The apparently strong observational evidence with respect to vitamin E and reduced CHD risk, which may have influenced the very high current use of vitamin E supplements in developed countries, was unfortunately not realized in RCTs (Figure 9.4), in which no benefit from vitamin E supplementation use is seen. In this example it is important to note that the observational studies and the RCTs were testing precisely the same exposure—short-term
Figure 9.2 Use of vitamin supplements in the past month among U.S. adults, 1999– 2000. From ‘‘Dietary supplement use by US adults: Data from the National Health and Nutrition Examination Survey, 1999–2000,’’ by K. Radimer, B. Bindewald, J. Hughes, B. Ervin, C. Swanson, & M. F. Picciano, 2004, American Journal of Epidemiology, 160, 339–349.
Figure 9.3 Use of vitamin supplements in U.S. adults, 1987–2000. From ‘‘Use of vitamin, mineral, nonvitamin, and nonmineral supplements in the United States: The 1987, 1992, and 2000 National Health Interview Survey results.’’ by A. E. Millen, K. W. Dodd, & A. F. Subar, 2004, Journal of the American Dietetic Association, 104, 942–950.
vitamin E supplement use—and yet yielded very different findings with respect to the apparent influence on risk. In 2001 the Lancet published an observational study demonstrating an inverse association between circulating vitamin C levels and incident CHD (Khaw et al., 2001). The left-hand side of Figure 9.5 summarizes these data, presenting the relative risk for a 15.7 μmol/l higher plasma vitamin C level, assuming a log-linear association. As can be seen, adjustment for confounders had little impact on this association. However, a large-scale RCT, the Heart Protection Study, examined the effect of a supplement that increased average plasma vitamin C levels by 15.7 μmol/l. In this study randomization
Figure 9.4 Vitamin E supplement use and risk of coronary heart disease in two observational studies (Rimm et al., 1993; Stampfer et al., 1993) and in a meta-analysis of randomized controlled trials (Eidelman, Hollar, Hebert, Lamas, & Hennekens, 2004).
[Figure 9.5 data: relative risk (95% CI). Heart Protection Study 1.06 (0.95, 1.16); EPIC m 0.72 (0.61, 0.86); EPIC m* 0.70 (0.51, 0.95); EPIC w 0.63 (0.49, 0.84); EPIC w* 0.63 (0.45, 0.90).]
Figure 9.5 Estimates of the effects of an increase of 15.7 μmol/l plasma vitamin C on coronary heart disease 5-year mortality estimated from the observational epidemiological European Prospective Investigation Into Cancer and Nutrition (EPIC) (Khaw et al., 2001) and the randomized controlled Heart Protection Study (Heart Protection Study Collaborative Group, 2002). EPIC m, male, age-adjusted; EPIC m*, male, adjusted for systolic blood pressure, cholesterol, body mass index, smoking, diabetes, and vitamin supplement use; EPIC f, female, age-adjusted; EPIC f*, female, adjusted for systolic blood pressure, cholesterol, body mass index, smoking, diabetes, and vitamin supplement use.
to the supplement was associated with no decrement in CHD risk (Heart Protection Study Collaborative Group, 2002). What underlies the discrepancy between these findings? One possibility is that there is considerable confounding between vitamin C levels and other exposures that could increase the risk of CHD. In the British Women’s Heart and Health study (BWHHS), for example, women with higher plasma vitamin C levels were less likely to be in a manual social class, to have no car access, to be a smoker, or to be obese and more likely to exercise, to be on a low-fat diet, to have a daily alcoholic drink, and to be tall (Lawlor, Davey Smith, Kundu, Bruckdorfer, & Ebrahim, 2004). Furthermore, for women in their 60s and 70s, those with higher plasma vitamin C levels were less likely to have come from a home 50 years or more previously in which their father was in a manual job, there was no bathroom or hot water, or they had to share a bedroom. They were also less likely to have limited educational attainment. In short, a substantial amount of confounding by factors from across the life course that predict elevated risk of CHD was seen. Table 9.1 illustrates how four simple dichotomous variables from across the life course
Table 9.1 Cardiovascular Mortality According to Cumulative Risk Indicator (Father's Social Class, Adulthood Social Class, Smoking, Alcohol Use)

Risk indicator | n | CVD deaths | Relative risk (95% CI)
4 favorable (0 unfavorable) | 517 | 47 | 1
3 favorable (1 unfavorable) | 1,299 | 227 | 1.99 (1.45–2.73)
2 favorable (2 unfavorable) | 1,606 | 354 | 2.60 (1.92–3.52)
1 favorable (3 unfavorable) | 1,448 | 339 | 2.98 (2.20–4.05)
0 favorable (4 unfavorable) | 758 | 220 | 4.55 (3.32–6.24)

From Davey Smith & Hart (2002).
can generate large differences in cardiovascular disease mortality (Davey Smith & Hart, 2002). In the BWHHS a 15.7 μmol/l higher plasma vitamin C level was associated with a relative risk of incident CHD of 0.88 (95% confidence interval [CI] 0.80–0.97), in the same direction as the estimates seen in the observational study summarized in Figure 9.5. When adjusted for the same confounders as were adjusted for in the observational study reported in Figure 9.5, the estimate changed very little—to 0.90 (95% CI 0.82–0.99). When additional adjustment for confounders acting across the life course was made, considerable attenuation was seen, with a residual relative risk of 0.95 (95% CI 0.85–1.05) (Lawlor et al., 2005). It is obvious that given inevitable amounts of measurement imprecision in the confounders or a limited number of missing unmeasured confounders, the residual association is essentially null and close to the finding of the RCT. Most studies have more limited information on potential confounders than is available in the BWHHS, and in other fields we may be even more ignorant of the confounding factors we should measure. In these cases inferences drawn from observational epidemiological studies may be seriously misleading. As the major and compelling rationale for doing these observational studies is to underpin public-health prevention strategies, their repeated failures are a major concern for public-health policy makers, researchers, and funders. Other processes in addition to confounding can produce robust, but noncausal, associations in observational studies. Reverse causation—where the disease influences the apparent exposure, rather than vice versa—may generate strong and replicable associations. For example, many studies have found that people with low circulating cholesterol levels are at increased risk of several cancers, including colon cancer. If causal, this is an important association as it might mean that efforts to lower cholesterol levels would increase the risk of cancer. However, it is possible that the early stages of cancer may, many years before diagnosis or death, lead to a lowering in
cholesterol levels, rather than low cholesterol levels increasing the risk of cancer. Similarly, studies of inflammatory markers such as C-reactive protein and cardiovascular disease risk have shown that early stages of atherosclerosis—which is an inflammatory process—may lead to elevation in circulating inflammatory markers; and since people with atherosclerosis are more likely to experience cardiovascular events, a robust, but noncausal, association between levels of inflammatory markers and incident cardiovascular disease is generated. Reverse causation can also occur through behavioral processes—for example, people with early stages and symptoms of cardiovascular disease may reduce their consumption of alcohol, which would generate a situation in which alcohol intake appears to protect against cardiovascular disease. A form of reverse causation can also occur through reporting bias, with the presence of disease influencing reporting disposition. In case–control studies people with the disease under investigation may report on their prior exposure history in a different way from controls, perhaps because the former will think harder about potential reasons that account for why they have developed the disease.
Table 9.2a Means or proportions of blood pressure, pulse pressure, hypertension and potential confounders by quarters of C-reactive protein (CRP), N = 3,529 (from Davey Smith et al., 2005)

Variable | Quarter 1 (0.16–0.85 mg/L) | Quarter 2 (0.86–1.71 mg/L) | Quarter 3 (1.72–3.88 mg/L) | Quarter 4 (3.89–112.0 mg/L) | P trend across categories
Hypertension (%) | 45.8 | 49.7 | 57.5 | 60. | < 0.001
BMI (kg/m2) | 25.2 | 27.0 | 28.5 | 29.7 | < 0.001
HDLc (mmol/l) | 1.80 | 1.69 | 1. | 1.53 | < 0.001
Lifecourse socioeconomic position score | 4.08 | 4.37 |  | 4.75 | < 0.001
Doctor diagnosis of diabetes (%) | 3.5 | 2.8 | 4.1 | 8.4 | < 0.001
Current smoker (%) | 7.9 | 9.6 | 10.9 | 15.4 | < 0.001
Physically inactive (%) | 11.3 | 14.9 | 20.1 | 29.6 | < 0.001
Moderate alcohol consumption (%) | 22.2 | 19.6 | 18.8 | 14.0 | < 0.001
Table 9.2b Means or proportions of CRP, systolic blood pressure, hypertension and potential confounders by 1059G/C genotype (from Davey Smith et al., 2005)

Variable | GG | GC or CC | P
CRP (mg/L, log scale) (a) | 1.81 | 1.39 | < 0.001
Hypertension (%) | 53.3 | 53.1 | 0.95
BMI (kg/m2) | 27.5 | 27.8 | 0.29
HDLc (mmol/l) | 1.67 | 1.65 | 0.38
Lifecourse socioeconomic position score | 4.35 | 4.42 | 0.53
Doctor diagnosed diabetes (%) | 4.7 | 4.5 | 0.80
Current smoker (%) | 11.2 | 9.3 | 0.24
Physically inactive (%) | 18.9 | 18.9 | 1.0
Moderate alcohol consumption (%) | 18.6 | 19.8 | 0.56

a Geometric means and proportionate (%) change for a doubling of CRP. CRP: C-reactive protein; OR: odds ratio; FEV1: forced expiratory volume in one second; HDLc: high-density lipoprotein cholesterol; CVD: cardiovascular disease (stroke or coronary heart disease).
In observational studies, associations between an exposure and disease will generally be biased if there is selection according to an exposure–disease combination in case–control studies or according to an exposure–disease risk combination in prospective studies. Such selection may arise through differential participation in research studies, conducting studies in settings such as hospitals where cases and controls are not representative of the general population, or study of unusual populations (e.g., vegetarians). If, for example, those people experiencing an exposure but at low risk of disease for other reasons were differentially excluded from a study, the exposure would appear to be positively related to disease outcome, even if there were no such association in the underlying population. This is a form of ‘‘Berkson’s bias,’’ well known to epidemiologists (Berkson, 1946). A possible example of such associative selection bias relates to the finding in the large American Cancer Society volunteer cohort that high alcohol consumption was associated with a reduced risk of stroke (Thun et al., 1997). This is somewhat counterintuitive as the outcome category included hemorrhagic stroke (for which there is no obvious mechanism through which alcohol would reduce risk) and because alcohol is known to increase blood pressure, a major causal factor for stroke. Population-based studies have found that heavy alcohol consumption tends to increase stroke risk, particularly hemorrhagic stroke risk (Hart, Davey Smith, Hole, & Hawthorne, 1999; Reynolds et al., 2003). Heavy drinkers who volunteer for a study known to be about the health effects of their lifestyle are
likely to be very unrepresentative of all heavy drinkers in the population, in ways that render them to be at low risk of stroke. Moderate drinkers and nondrinkers who volunteer may be more representative of moderate drinkers and nondrinkers in the underlying population. Thus, the low risk of stroke in the heavy drinkers who volunteer for the study could erroneously make it appear that alcohol reduces the risk of stroke. These problems of confounding and bias relate to the production of associations in observational studies that are not reliable indicators of the true direction of causal associations. A separate issue is that the strength of associations between causal risk factors and disease in observational studies will generally be underestimated due to random measurement imprecision in indexing the exposure. A century ago, Charles Spearman demonstrated mathematically how such measurement imprecision would lead to what he termed the ‘‘attenuation by errors’’ of associations (Spearman, 1904; Davey Smith & Phillips, 1996). This has more latterly been renamed ‘‘regression dilution bias.’’ (MacMahon S, Peto R, Cutler J, Collins R, Sorlie P, et al 1990) Observational studies can and do produce findings that either spuriously enhance or downgrade estimates of causal associations between modifiable exposures and disease. This has serious consequences for the appropriateness of interventions that aim to reduce disease risk in populations. It is for these reasons that alternative approaches—including those within the Mendelian randomization framework—need to be applied.
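For reference, Spearman's attenuation result can be stated compactly for simple linear regression under classical measurement error (the notation here is ours, not the chapter's): if the measured exposure is $X^* = X + e$, with error e independent of the true exposure X and of the outcome, then
$$\beta_{\text{observed}} = \lambda\,\beta_{\text{true}}, \qquad \lambda = \frac{\operatorname{Var}(X)}{\operatorname{Var}(X) + \operatorname{Var}(e)} \le 1,$$
so the observed regression coefficient is shrunk toward zero by the reliability $\lambda$ of the exposure measurement.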
Mendelian Randomization

The basic principle utilized in the Mendelian randomization approach is that if genetic variants either alter the level or mirror the biological effects of a modifiable environmental exposure that itself alters disease risk, then these genetic variants should be related to disease risk to the extent predicted by their influence on exposure to the risk factor. Common genetic polymorphisms that have a well-characterized biological function (or are markers for such variants) can therefore be utilized to study the effect of a suspected environmental exposure on disease risk (Davey Smith & Ebrahim, 2003, 2004, 2005; Davey Smith, 2006; Lawlor, Harbord, Sterne, Timpson, & Davey Smith, 2008; Ebrahim & Davey Smith, 2008). The exploitation of situations in which genotypic differences produce effects similar to environmental factors (and vice versa) clearly resonates with the concepts of "phenocopy" and "genocopy" in developmental genetics (Box 9.1). It may seem illogical to study genetic variants as proxies for environmental exposures rather than to measure the exposures themselves. However, there are several crucial advantages of utilizing functional genetic variants (or their markers) in this manner that relate to the problems with
box 9.1 Phenocopy, Genocopy, and Mendelian Randomization The term phenocopy is attributed to Goldschmidt (1938) and is used to describe the situation where an environmental effect could produce the same effect as was produced by a genetic mutation. As Goldschmidt (1938) explicated, ‘‘different causes produce the same end effect, presumably by changing the same developmental processes in an identical way.’’ In human genetics the term has generally been applied to an environmentally produced disease state that is similar to a clear genetic syndrome. For example the niacin-deficiency disease pellagra is clinically similar to the autosomal recessive condition Hartnup disease (Baron, Dent, Harris, Hart, & Jepson, 1956), and pellagra has been referred to as a phenocopy of the genetic disorder (Snyder, 1959; Guy, 1993). Hartnup disease is due to reduced neutral amino acid absorption from the intestine and reabsorption from the kidney, leading to low levels of blood tryptophan, which in turn leads to a biochemical anomaly that is similar to that seen when the diet is deficient in niacin (Kraut & Sachs, 2005; Broer, Cavanaugh, & Rasko, 2004). Genocopy is a less utilized term, attributed to Schmalhausen (see Gause, 1942), but has generally been considered to be the reverse of phenocopy—that is, when genetic variation generates an outcome that could be produced by an environmental stimulus (JablonkaTavory, 1982). It is clear that, even when the term genocopy is used polemically (e.g., Rose, 1995), the two concepts are mirror images, reflecting differently motivated accounts of how both genetic and environmental factors influence physical state. For example, Hartnup disease can be called a genocopy of pellagra, while pellagra can be considered a phenocopy of Hartnup disease. Mendelian randomization can, therefore, be viewed as an appreciation of the phenocopy–genocopy nexus that allows causation to be separated from association. Phenocopies of major genetic disorders are generally rarely encountered in clinical medicine, but as Lenz (1973) comments, ‘‘they are, however, most important as models which might help to elucidate the pathways of gene action.’’ Mendelian randomization is generally concerned with less major (and, thus, common) disturbances and reverses the direction of phenocopy ! genocopy, to utilize genocopies of known genetic mechanism to inform us better about pathways through which the environment influences health. The scope of phenocopy–genocopy has been discussed by Zuckerkandl and Villet (1988), who advance mechanisms through which there can be (continued)
equivalence between environmental and genotypic influences. Indeed, they state that "no doubt all environmental effects can be mimicked by one or several mutations." The notion that genetic and environmental influences can be both equivalent and interchangeable has received considerable attention in developmental biology (e.g., West-Eberhard, 2003; Leimar, Hammerstein, & Van Dooren, 2006). Furthermore, population genetic analyses of correlations between different traits suggest there are common pathways of genetic and environmental influences, with Cheverud (1988) concluding that "most environmentally caused phenotypic variants should have genetic counterparts and vice versa."
observational studies already outlined. First, unlike environmental exposures, genetic variants are not generally associated with the wide range of behavioral, social, and physiological factors that, for example, confound the association between vitamin C and CHD. This means that if a genetic variant is used to proxy for an environmentally modifiable exposure, it is unlikely to be confounded in the way that direct measures of the exposure will be. Further, aside from the effects of population structure (see Palmer & Cardon, 2005, for a discussion of the likely impact of this), such variants will not be associated with other genetic variants, excepting those with which they are in linkage disequilibrium. Empirical investigation of the associations of genetic variants with potential confounding factors reveals that they do indeed tend to be not associated with such factors (Davey Smith et al., 2008). Second, we have seen how inferences drawn from observational studies may be subject to bias due to reverse causation. Disease processes may influence exposure levels such as alcohol intake or measures of intermediate phenotypes such as cholesterol levels and C-reactive protein. However, germline genetic variants associated with average alcohol intake or circulating levels of intermediate phenotypes will not be influenced by the onset of disease. This will be equally true with respect to reporting bias generated by knowledge of disease status in case–control studies or of differential reporting bias in any study design. Third, associative selection bias, in which selection into a study is related to both exposure level and disease risk and can generate spurious associations (as illustrated with respect to alcohol and stroke), is unlikely to occur with respect to genetic variants. For example, empirical evidence supports a lack of association between a wide range of genetic variants and participation rates in a series of cancer case–control studies (Bhatti et al., 2005).
Finally, a genetic variant will indicate long-term levels of exposure and if the variant is taken as a proxy for such exposure, it will not suffer from the measurement error inherent in phenotypes that have high levels of variability. For example, groups defined by cholesterol level–related genotype will, over a long period, experience the cholesterol difference seen between the groups. For individuals, blood cholesterol is variable over time, and the use of single measures of cholesterol will underestimate the true strength of association between cholesterol and, say, CHD. Indeed, use of the Mendelian randomization approach predicts a strength of association that is in line with RCT findings of the effects of cholesterol lowering when the increasing benefits seen over the relatively short trial period are projected to the expectation for differences over a lifetime (Davey Smith & Ebrahim, 2004), which will be discussed further.
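A standard way of making this prediction explicit, stated here for orientation rather than taken from this chapter, is the instrumental-variable (Wald) ratio: if a variant G shifts the exposure X by $\hat{\beta}_{GX}$ units and is associated with the outcome Y with coefficient $\hat{\beta}_{GY}$, then the implied effect of the exposure on the outcome is
$$\hat{\beta}_{XY} = \frac{\hat{\beta}_{GY}}{\hat{\beta}_{GX}},$$
so the genotype–disease association is expected to equal the genotype–exposure association multiplied by the causal exposure–disease effect.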
Categories of Mendelian Randomization

The term Mendelian randomization has now become widely used (see Box 9.2), with a variety of meanings. This partly reflects the fact that there are several categories of inference that can be drawn from studies utilizing the Mendelian randomization approach. In the most direct forms, genetic variants can be related to the probability or level of exposure ("exposure propensity") or to intermediate phenotypes believed to influence disease risk. Less direct evidence can come from genetic variant–disease associations that indicate that a particular biological pathway may be of importance, perhaps because the variants modify the effects of environmental exposures. Several examples of these categories have been given elsewhere (Davey Smith & Ebrahim, 2003, 2004; Davey Smith, 2006; Ebrahim & Davey Smith, 2008); here, a few illustrative cases are briefly outlined.
Exposure Propensity

Alcohol Intake and Health

The possible protective effect of moderate alcohol consumption on CHD risk remains controversial (Marmot, 2001; Bovet & Paccaud, 2001; Klatsky, 2001). Nondrinkers may be at a higher risk of CHD because health problems (perhaps induced by previous alcohol abuse) dissuade them from drinking (Shaper, 1993). As well as this form of reverse causation, confounding could play a role, with nondrinkers being more likely to display an adverse profile of socioeconomic or other behavioral risk factors for CHD (Hart et al., 1999). Alternatively, alcohol may have a direct biological effect that lessens
box 9.2 Why ‘‘Mendelian Randomization’’? Gregor Mendel (1822–1884) concluded from his hybridization studies with pea plants that ‘‘the behaviour of each pair of differentiating characteristics [such as the shape and color of seeds] in hybrid union is independent of the other differences between the two original plants’’ (Mendel, 1866). This formulation was actually the only regularity that Mendel referred to as a ‘‘law,’’ and in Carl Correns’ 1900 paper (one of a trio appearing that year that are considered to represent the rediscovery of Mendel) he refers to this as ‘‘Mendel’s law’’ (Correns, 1900; Olby, 1966). Morgan (1913) discusses independent assortment and refers to this process as being realized ‘‘whenever two pairs of characters freely Mendelize.’’ Morgan’s use of Mendel’s surname as a verb did not catch on, but Morgan later christened this principle ‘‘Mendel’s second law’’ (Morgan, 1919); it has been known as this or as ‘‘the law of independent assortment’’ since this time. The law suggests that inheritance of one trait is independent of—that is, randomized with respect to—the inheritance of other traits. The analogy with a randomized controlled trial will clearly be most applicable to parent–offspring designs investigating the frequency with which one of two alleles from a heterozygous parent is transmitted to offspring with a particular disease. However, at the population level, traits influenced by genetic variants are generally not associated with the social, behavioral, and environmental factors that confound relationships observed in conventional epidemiological studies. Thus, while the ‘‘randomization’’ is approximate and not absolute in genetic association studies, empirical observations suggest that it applies in most circumstances (Davey Smith, Harbord, Milton, Ebrahim, & Sterne, 2005a; Bhatti et al., 2005; Davey Smith et al., 2008). The term Mendelian randomization itself was introduced in a somewhat different context, in which the random assortment of genetic variants at conception is utilized to provide an unconfounded study design for estimating treatment effects for childhood malignancies (Gray & Wheatley, 1991; Wheatley & Gray, 2004). The term has recently become widely used with the meaning ascribed to it in this chapter. The notion that genetic variants can serve as an indicator of the action of environmentally modifiable exposures has been expressed in many contexts. For example, since the mid-1960s various investigators have pointed out that the autosomal dominant condition of lactase persistence is associated with milk drinking. Protective associations of lactase persistence (continued)
with osteoporosis, low bone mineral density, or fracture risk thus provide evidence that milk drinking reduces the risk of these conditions (Birge, Keutmann, Cuatrecasas, & Whedon, 1967; Newcomer, Hodgson, Douglas, & Thomas, 1978). In a related vein, it was proposed in 1979 that as N-acetyltransferase pathways are involved in the detoxification of arylamine, a potential bladder carcinogen, the observation of increased bladder-cancer risk among people with genetically determined slow-acetylator phenotype provided evidence that arylamines are involved in the etiology of the disease (Lower et al., 1979). Since these early studies various commentators have pointed out that the association of genetic variants of known function with disease outcomes provides evidence about etiological factors (McGrath, 1999; Ames, 1999; Rothman et al., 2001; Brennan, 2002; Kelada, Eaton, Wang, Rothman, & Khoury, 2003). However, these commentators have not emphasized the key strengths of Mendelian randomization: the avoidance of confounding, the avoidance of bias due to reverse causation and reporting tendency, and correction for the underestimation of risk associations due to variability in behaviors and phenotypes (Davey Smith & Ebrahim, 2004). These key concepts were present in Martijn Katan's 1986 Lancet letter, in which he suggested that genetic variants related to cholesterol level could be used to investigate whether the observed association between low cholesterol and increased cancer risk was real, and by Honkanen and colleagues' (1996) understanding of how lactase persistence could better characterize the difficult-to-measure environmental influence of calcium intake than could direct dietary reports. Since 2000 there have been several reports using the term Mendelian randomization in the way it is used here (Youngman et al., 2000; Fallon, Ben-Shlomo, & Davey Smith, 2001; Clayton & McKeigue, 2001; Keavney, 2002; Davey Smith & Ebrahim, 2003), and its use is becoming widespread.
the risk of CHD—for example, by increasing the levels of protective high-density lipoprotein (HDL) cholesterol (Rimm, 2001). It is, however, unlikely that an RCT of alcohol intake able to test whether there is a protective effect of alcohol on CHD events will ever be carried out. Alcohol is oxidized to acetaldehyde, which in turn is oxidized by aldehyde dehydrogenases (ALDHs) to acetate. Half of Japanese people are heterozygous or homozygous for a null variant of ALDH2, and peak blood acetaldehyde concentrations post–alcohol challenge are 18 times and five times higher, respectively, among homozygous null variant and heterozygous
individuals compared with homozygous wild-type individuals (Enomoto, Takase, Yasuhara, & Takada, 1991). This renders the consumption of alcohol unpleasant through inducing facial flushing, palpitations, drowsiness, and other symptoms. As Figure 9.6a shows, there are very considerable differences in alcohol consumption according to genotype (Takagi et al., 2002). The principles of Mendelian randomization are seen to apply—two factors that would be expected to be associated with alcohol consumption, age and cigarette smoking, which would confound conventional observational associations between alcohol and disease, are not related to genotype despite the strong association of genotype with alcohol consumption (Figure 9.6b). It would be expected that ALDH2 genotype influences diseases known to be related to alcohol consumption, and as proof of principle it has been shown that ALDH2 null variant homozygosity—associated with low alcohol consumption—is indeed related to a lower risk of liver cirrhosis (Chao et al., 1994). Considerable evidence, including data from RCTs, suggests that alcohol increases HDL cholesterol levels (Haskell et al., 1984; Burr, Fehily, Butland, Bolton, & Eastham, 1986) (which should protect against CHD). In line with this, ALDH2 genotype is strongly associated with HDL cholesterol in the expected direction (Figure 9.6c). With respect to blood pressure, observational evidence suggests that long-term alcohol intake produces an increased risk of hypertension and higher prevailing blood pressure levels. A meta-analysis of studies of ALDH2 genotype and blood pressure suggests that there is indeed a substantial effect in this direction, as demonstrated in Figures 9.7, 9.8, and 9.9 (Chen et al., 2008). Alcohol intake has also been postulated to increase the risk of esophageal cancer; however, some have questioned the importance of its role (Memik, 2003). Figure 9.9 presents findings from a meta-analysis of studies of ALDH2 genotype and esophageal-cancer risk (Lewis & Davey Smith, 2005), clearly showing that people who are homozygous for the null variant, who therefore consume considerably less alcohol, have a greatly reduced risk of esophageal cancer. Indeed, this reduction in risk is close to that predicted by the joint effect of genotype on alcohol consumption and the association of alcohol consumption on esophageal-cancer risk in a meta-analysis of such observational studies (Gutjahr, Gmel, & Rehm, 2001). When the heterozygotes are compared with the homozygous functional variant, an interesting picture emerges: The risk of esophageal cancer is higher in the heterozygotes who drink rather less alcohol than those with the homozygous functional variant. This suggests that it is not alcohol itself that is the causal factor but acetaldehyde and that the increased risk is apparent only in those who drink some alcohol but metabolize it inefficiently, leading to high levels of acetaldehyde.
Figure 9.6 (a) Relationship between alcohol intake (ml/day) and ALDH2 genotype. (b) Relationship between characteristics (age in years; percentage of smokers) and ALDH2 genotype. (c) Relationship between HDL cholesterol (mg/dl) and ALDH2 genotype. From ‘‘Aldehyde dehydrogenase 2 gene is a risk factor for myocardial infarction in Japanese men,’’ by S. Takagi, N. Iwai, R. Yamauchi, S. Kojima, S. Yasuno, T. Baba, et al., 2002, Hypertension Research, 25, 677–681.
Figure 9.7 Forest plot of studies of ALDH2 genotype and hypertension in men. Pooled odds ratios for hypertension: *1*2 vs. *2*2, 1.72 (95% CI 1.17–2.52); *1*1 vs. *2*2, 2.42 (95% CI 1.66–3.55). From L. Chen, G. Davey Smith, R. Harbord, & S. Lewis, 2008, PLoS Medicine, 5, e52.
Intermediate Phenotypes
Genetic variants can influence circulating biochemical factors such as cholesterol, homocysteine, and fibrinogen levels. This provides a method for assessing causality in associations between these measures (intermediate phenotypes) and disease and, thus, whether interventions to modify the intermediate phenotype could be expected to influence disease risk.
Cholesterol and CHD
Familial hypercholesterolemia is a dominantly inherited condition in which very many rare mutations of the low-density lipoprotein receptor gene (about 10 million people affected worldwide, a prevalence of around 0.2%) lead to high circulating cholesterol levels (Marks, Thorogood, Neil, & Humphries, 2003). The high risk of premature CHD in people with this condition was readily appreciated, with an early U.K. report demonstrating that by age 50 half of men and 12% of women had suffered from CHD (Slack, 1969). Compared with the population of England and Wales (mean total cholesterol 6.0 mmol/l), people with familial hypercholesterolemia (mean total cholesterol 9 mmol/l) suffered a 3.9-fold increased risk of CHD mortality, although very high relative risks among those aged less than 40 years have been observed (Scientific Steering Committee, 1991). These observations regarding
Figure 9.8 Forest plot of studies of ALDH2 genotype and blood pressure in men. Pooled mean differences: diastolic, *1*2 vs. *2*2 1.58 mmHg (95% CI 0.29–2.87) and *1*1 vs. *2*2 3.95 mmHg (95% CI 2.66–5.24); systolic, *1*2 vs. *2*2 4.24 mmHg (95% CI 2.18–6.31) and *1*1 vs. *2*2 7.44 mmHg (95% CI 5.39–9.49). From L. Chen, G. Davey Smith, R. Harbord, & S. Lewis, 2008, PLoS Medicine, 5, e52.
genetically determined variation in risk provided strong evidence that the associations between blood cholesterol and CHD seen in general populations reflected a causal relationship. The causal nature of the association between blood cholesterol levels and CHD has historically been controversial (Steinberg, 2004). As both Daniel Steinberg (2005) and Ole Færgeman (2003) discuss, many clinicians and public-health practitioners rejected the notion of a causal link for a range of reasons. However, from the late 1930s onward, the finding that people with genetically high levels of
Figure 9.9 Risk of esophageal cancer in individuals with the ALDH2 *2*2 vs. ALDH2 *1*1 genotype: study-specific odds ratios (Hori; Matsuo; Boonyphiphat; Itoga; Yokoyama, 2002) and overall odds ratio 0.36 (95% CI 0.16–0.80). From ‘‘Alcohol, ALDH2 and esophageal cancer: A meta-analysis which illustrates the potentials and limitations of a Mendelian randomization approach,’’ by S. Lewis & G. Davey Smith, 2005, Cancer Epidemiology, Biomarkers and Prevention, 14, 1967–1971.
cholesterol had high risk for CHD should have been powerful and convincing evidence of the causal nature of elevated blood cholesterol in the general population. With the advent of effective means of reducing blood cholesterol through statin treatment, there remains no serious doubt that the cholesterol–CHD relationship is causal. Among people without CHD, reducing total cholesterol levels with statin drugs by around 1–1.5 mmol/l reduces CHD mortality by around 25% over 5 years. Assuming a linear relationship between blood cholesterol and CHD risk and given the difference in cholesterol of 3.0 mmol/l between people with familial hypercholesterolemia and the general population, the RCT evidence on lowering total cholesterol and reducing CHD mortality would predict a relative risk for CHD of around 2, as opposed to 3.9, for people with familial hypercholesterolemia. However, the trials also demonstrate that the relative reduction in CHD mortality increases over time from randomization—and thus time with lowered cholesterol—as would be expected if elevated levels of cholesterol operate over decades to influence the development of atherosclerosis. People with familial hypercholesterolemia will have had high total cholesterol levels throughout their lives, and this would be expected to generate a greater risk than that predicted by the results of lowering cholesterol levels for only 5 years. Furthermore, ecological studies relating cholesterol levels to CHD demonstrate that the strength of
association increases as the lag period between cholesterol level assessment and CHD mortality increases (Rose, 1982), again suggesting that long-term differences in cholesterol level are the important etiological factor in CHD. As discussed, Mendelian randomization is one method for assessing the effects of long-term differences in exposures on disease risk, free from the diluting problems of both measurement error and having only short-term assessment of risk-factor levels. This reasoning provides an indication that cholesterol-lowering efforts should be lifelong rather than limited to the period for which RCT evidence with respect to CHD outcomes is available. Recently, several common genetic variants have been identified that are related to cholesterol level and CHD risk, and these have also demonstrated effects on CHD risk consistent with lifelong differences in cholesterol level (Davey Smith, Timpson, & Ebrahim, 2008; Kathiresan et al., 2008).
C-Reactive Protein and CHD
Strong associations of C-reactive protein (CRP), an acute-phase inflammatory marker, with hypertension, insulin resistance, and CHD have been repeatedly observed (Danesh et al., 2004; Wu, Dorn, Donahue, Sempos, & Trevisan, 2002; Pradhan, Manson, Rifai, Buring, & Ridker, 2001; Han et al., 2002; Sesso et al., 2003; Hirschfield & Pepys, 2003; Hu, Meigs, Li, Rifai, & Manson, 2004), with the obvious inference that CRP is a cause of these conditions (Ridker et al., 2005; Sjöholm & Nyström, 2005; Verma, Szmitko, & Ridker, 2005). A Mendelian randomization study has examined polymorphisms of the CRP gene and demonstrated that while serum CRP differences were highly predictive of blood pressure and hypertension, the CRP variants—which are related to sizeable serum CRP differences—were not associated with these same outcomes (Davey Smith et al., 2005b). It is likely that these divergent findings are explained by the extensive confounding between serum CRP and outcomes. Current evidence on this issue, though statistically underpowered, also suggests that CRP levels do not lead to elevated risk of insulin resistance (Timpson et al., 2005) or CHD (Casas et al., 2006). Again, confounding and reverse causation—where existing coronary disease or insulin resistance may influence CRP levels—could account for this discrepancy. Similar findings have been reported for serum fibrinogen, variants in the beta-fibrinogen gene, and CHD (Davey Smith et al., 2005a; Keavney et al., 2006). The CRP and fibrinogen examples demonstrate that Mendelian randomization can both increase evidence for a causal effect of an environmentally modifiable factor (as in the examples of milk, alcohol, and cholesterol levels) and provide evidence against causal effects, which can help direct efforts away from targets of no preventative or therapeutic relevance.
Maternal Genotype as an Indicator of Intrauterine Environment
Mendelian randomization studies can provide unique insights into the causal nature of intrauterine environment influences on later disease outcomes. In such studies, maternal genotype is taken to be a proxy for environmentally modifiable exposures mediated through the mother that influence the intrauterine environment. For example, it is now widely accepted that neural tube defects can in part be prevented by periconceptual maternal folate supplementation (Scholl & Johnson, 2000). RCTs of folate supplementation have provided the key evidence in this regard (MRC Vitamin Study Research Group, 1991; Czeizel & Dudás, 1992). However, could we have reached the same conclusion before the RCTs were carried out if we had access to evidence from genetic association studies? Studies have looked at the MTHFR 677C→T polymorphism (a genetic variant that is associated with methylenetetrahydrofolate reductase activity and circulating homocysteine levels, the TT genotype being associated with higher homocysteine levels) in newborns with neural tube defects compared to controls and have found an increased risk in TT vs. CC newborns, with a relative risk of 1.75 (95% CI 1.41–2.18) in a meta-analysis of all such studies (Botto & Yang, 2000). Studies have also looked at the association between this MTHFR variant in parents and the risk of neural tube defect in their offspring. Mothers who have the TT genotype have an increased risk of 2.04 (95% CI 1.49–2.81) of having an offspring with a neural tube defect compared to mothers who have the CC genotype (Roseboom et al., 2000). For TT fathers, the equivalent relative risk is 1.18 (95% CI 0.65–2.12) (Scholl & Johnson, 2000). This pattern of associations suggests that it is the intrauterine environment—influenced by maternal TT genotype—rather than the genotype of offspring that is related to disease risk (Figure 9.10). This is consistent with the hypothesis that maternal folate intake is the exposure of importance. In this case, the findings from observational studies, genetic association studies, and an RCT are closely similar. Had the technology been available, the genetic association studies, with the particular influence of maternal versus paternal genotype on neural tube defect risk, would have provided strong evidence of the beneficial effect of folate supplementation before the results of any RCT had been completed, although trials would still have been necessary to confirm that the effect was causal for folate supplementation. Certainly, the genetic association studies would have provided better evidence than that given by conventional epidemiological studies, which would have had to cope with the problems of accurately assessing diet and the considerable confounding of maternal folate intake with a wide variety of lifestyle and socioeconomic factors that may also influence neural tube defect risk.
Figure 9.10 Inheritance of the MTHFR polymorphism and neural tube defects. Mother TT (fetus exposed in utero): RR 2.04. Father TT (no way that this can affect in utero exposure of the fetus): RR 1.18. Fetus TT (inherits 50% of alleles from the mother and 50% from the father, hence intermediate risk): RR 1.75.
The association of genotype with neural tube defect risk does not suggest that genetic screening is indicated; rather, it demonstrates that an environmental intervention may benefit the whole population, independent of the genotype of individuals receiving the intervention.
Studies utilizing maternal genotype as a proxy for environmentally modifiable influences on the intrauterine environment can be analyzed in a variety of ways. First, the mothers of offspring with a particular outcome can be compared to a control group of mothers who have offspring without the outcome in a conventional case–control design, but with the mother as the exposed individual (or control) rather than the offspring with the particular health outcome (or the control offspring). Fathers could serve as a control group when autosomal genetic variants are being studied. If the exposure is mediated by the mother, maternal genotype, rather than offspring genotype, will be the appropriate exposure indicator. Clearly, maternal and offspring genotypes are associated, but, conditional on each other, it should be the maternal genotype that shows the association with the health outcome among the offspring. Indeed, in theory it would be possible to simply compare genotype distributions of mothers and offspring, with a higher prevalence among mothers providing evidence that maternal genotype, through an intrauterine pathway, is of importance. However, the statistical power of such an approach is low, and an external control group, whether fathers or women who have offspring without the health outcome, is generally preferable.
The influence of alcohol intake by pregnant women on the health and development of their offspring is well recognized for very high levels of intake, in the form of fetal alcohol syndrome (Burd, 2006). However, the influence outside of this extreme situation is less easy to assess, particularly as higher levels of alcohol intake will be related to a wide array of potential sociocultural, behavioral, and environmental confounding factors. Furthermore, there may be systematic bias in how mothers report alcohol intake during pregnancy, which could distort associations with health outcomes. Therefore, outside of the case of very high alcohol intake by mothers,
it is difficult to establish a causal link between maternal alcohol intake and offspring developmental characteristics. Some studies have approached this by investigating alcohol-metabolizing genotypes in mothers and offspring outcomes. Although sample sizes have been low and the analytical strategies not optimal, they provide some evidence to support the influence of maternal genotype (Gemma, Vichi, & Testai, 2007; Jacobson et al., 2006; Warren & Li, 2005). For example, in one study mental development at age 7.5 was delayed among offspring of mothers possessing a genetic variant associated with less rapid alcohol metabolism. Among these mothers there would presumably be less rapid clearance of alcohol and, thus, an increased influence of maternal alcohol on offspring during the intrauterine period (Jacobson et al., 2006). Offspring genotype was not independently related to these outcomes, indicating that the crucial exposure was related to maternal alcohol levels. As in the MTHFR examples, these studies are of relevance because they provide evidence of the influence of maternal alcohol levels on offspring development, rather than because they highlight a particular maternal genotype that is of importance. In the absence of alcohol drinking, the maternal genotype would presumably have no influence on offspring outcomes. The association of maternal genotype and offspring outcome suggests that the alcohol level in mothers, and therefore their alcohol consumption, has an influence on offspring development.
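A minimal sketch of the analytic strategy described above, using entirely hypothetical counts: the mother is treated as the exposed individual in a conventional case–control comparison, and fathers provide a parallel comparison that should sit near the null if the effect is mediated in utero. The counts and the odds_ratio helper are illustrative only, not data from any of the cited studies.

```python
import math

def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds ratio and 95% CI from a 2x2 table (Woolf's method)."""
    or_ = (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)
    se_log = math.sqrt(1/exposed_cases + 1/unexposed_cases + 1/exposed_controls + 1/unexposed_controls)
    return or_, or_ * math.exp(-1.96 * se_log), or_ * math.exp(1.96 * se_log)

# Hypothetical counts of variant carriers among parents of affected vs. unaffected children.
# Mothers of cases vs. mothers of controls (mother as the "exposed" individual):
print("maternal OR: %.2f (%.2f-%.2f)" % odds_ratio(60, 140, 35, 165))
# Fathers of cases vs. fathers of controls:
print("paternal OR: %.2f (%.2f-%.2f)" % odds_ratio(38, 162, 35, 165))
# A clearly elevated maternal OR alongside a near-null paternal OR points to an
# intrauterine (maternally mediated) exposure rather than to offspring genotype itself.
```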
Implications of Mendelian Randomization Study Findings
Establishing the causal influence of environmentally modifiable risk factors from Mendelian randomization designs informs policies for improving population health through population-level interventions. This does not imply that the appropriate strategy is genetic screening to identify those at high risk and application of selective exposure reduction policies. For example, the implication of studies on maternal MTHFR genotype and offspring neural tube defect risk is that the population risk for neural tube defects can be reduced through increased folate intake periconceptually and in early pregnancy. It does not suggest that women should be screened for MTHFR genotype; women without the TT genotype but with low folate intake are still exposed to preventable risk of having babies with neural tube defects. Similarly, establishing the association between genetic variants (such as familial defective ApoB) associated with elevated cholesterol level and CHD risk strengthens causal evidence that elevated cholesterol is a modifiable risk factor for CHD for the whole population. Thus, even though the population attributable risk
for CHD of this variant is small, it usefully informs public-health approaches to improving population health. It is this aspect of Mendelian randomization that illustrates its distinction from conventional risk identification and genetic screening purposes of genetic epidemiology.
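The contrast between the attributable risk of a variant and that of the exposure it implicates can be made concrete with Levin's attributable-risk formula. The sketch below uses the familial defective apo B and raised-cholesterol figures quoted later in this chapter (carrier frequency around 0.08% with a roughly sevenfold relative risk, versus 50% of the population with raised cholesterol and a roughly twofold relative risk); the function itself is a standard textbook formula, not a method specific to this chapter.

```python
def population_attributable_risk(prevalence, relative_risk):
    """Levin's formula: PAR = p(RR - 1) / (1 + p(RR - 1))."""
    excess = prevalence * (relative_risk - 1)
    return excess / (1 + excess)

# The rare variant itself: familial defective apo B (carrier frequency ~0.08%, RR ~7 for CHD).
print(f"variant PAR:  {population_attributable_risk(0.0008, 7):.1%}")   # about 0.5%

# The exposure it implicates: raised cholesterol (>6.0 mmol/l in ~50% of the population, RR ~2).
print(f"exposure PAR: {population_attributable_risk(0.5, 2):.1%}")      # about 33%
```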
Mendelian Randomization and RCTs
RCTs are clearly the definitive means of obtaining evidence on the effects of modifying disease risk processes. There are similarities in the logical structure of RCTs and Mendelian randomization, however. Figure 9.11 illustrates this, drawing attention to the unconfounded nature of exposures proxied for by genetic variants (analogous to the unconfounded nature of a randomized intervention), the lack of possibility of reverse causation as an influence on exposure–outcome associations in both Mendelian randomization and RCT settings, and the importance of intention-to-treat analyses—that is, analysis by group defined by genetic variant, irrespective of associations between the genetic variant and the proxied-for exposure within any particular individual.
The analogy with RCTs is also useful with respect to one objection that has been raised against Mendelian randomization studies. This is that the environmentally modifiable exposure proxied for by the genetic variants (such as alcohol intake or circulating CRP levels) is influenced by many other factors in addition to the genetic variants (Jousilahti & Salomaa, 2004). This is, of course, true. However, consider an RCT of blood pressure–lowering medication. Blood pressure is influenced mainly by factors other than taking blood pressure–lowering medication—obesity, alcohol intake, salt consumption and other dietary factors, smoking, exercise, physical fitness, genetic factors, and early-life developmental influences are all of importance. However, the randomization that occurs in trials ensures that these factors are balanced between the groups that receive the blood pressure–lowering medication and those that do not. Thus, the fact that many other factors are related to the modifiable exposure does not vitiate the power of RCTs; neither does it vitiate the strength of Mendelian randomization designs.
Figure 9.11 Mendelian randomization and randomized controlled trial designs compared. In Mendelian randomization, random segregation of alleles assigns individuals to an exposed group (one allele) or a control group (other allele); in an RCT, a randomization method assigns individuals to intervention or no intervention. In both designs, confounders are equal between groups and outcomes are compared between groups.
A related objection is that the genetic variants often explain only a trivial proportion of the variance in the environmentally modifiable risk factor that is being proxied for (Glynn, 2006). Again, consider an RCT of blood pressure–lowering medication where 50% of participants received the medication and 50% received a placebo. If the antihypertensive therapy reduced blood pressure (BP) by a quarter of a standard deviation (SD), which is approximately the situation for such pharmacotherapy, then within the whole study group treatment assignment (i.e., antihypertensive use vs. placebo) will explain less than 2% of the variance in blood pressure. In the example of CRP haplotypes used as instruments for CRP levels, these haplotypes explain 1.66% of the variance in CRP levels in the population (Lawlor et al., 2008). As can be seen, the quantitative association of genetic variants as instruments can be similar to that of randomized treatments with respect to the biological processes that such treatments modify. Both logic and quantification fail to support criticisms of the Mendelian randomization approach based on either the obvious fact that many factors influence most phenotypes of interest or the fact that particular genetic variants account for only a small proportion of variance in the phenotype.
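The variance-explained argument can be checked directly. For a binary assignment (treatment vs. placebo, or one genotype group vs. another) that shifts the mean of a standardized outcome by a given number of within-group standard deviations, the proportion of variance explained follows from the usual between/total variance decomposition. The short sketch below reproduces the "less than 2%" figure for a quarter-SD blood pressure difference; this is of the same order as the 1.66% of CRP variance explained by the CRP haplotypes.

```python
def variance_explained(effect_in_sd, allocation=0.5):
    """R^2 for a binary assignment that shifts the mean by `effect_in_sd` within-group SDs."""
    between = allocation * (1 - allocation) * effect_in_sd ** 2   # between-group variance
    return between / (1 + between)                                # within-group variance is 1 (SD units)

# Antihypertensive vs. placebo, mean difference of 0.25 SD, 50:50 allocation:
print(f"{variance_explained(0.25):.1%}")   # about 1.5%, i.e., "less than 2%"
```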
Mendelian Randomization and Instrumental Variable Approaches
As well as the analogy with RCTs, Mendelian randomization can also be likened to instrumental variable approaches, which have been heavily utilized in econometrics and social science, although rather less so in epidemiology. In an instrumental variable approach the instrument is a variable that is related to the outcome only through its association with the modifiable exposure of interest. The instrument is not related to confounding factors nor is its assessment biased in a manner that would generate a spurious association with the outcome. Furthermore, the instrument will not be influenced by the development of the outcome (i.e., there will be no reverse causation). Figure 9.12 presents this basic schema, where the dotted line between genotype and outcome provides an unconfounded and unbiased estimate of the causal association between the exposure that the genotype is proxying for and the outcome. The development of instrumental variable methods within
Figure 9.12 Mendelian randomization as an instrumental variables approach: genotype → exposure → outcome, with confounders, reverse causation, and bias affecting the exposure–outcome association but not the genotype.
econometrics, in particular, has led to a sophisticated suite of statistical methods for estimating causal effects, and these have now been applied within Mendelian randomization studies (e.g., Davey Smith et al., 2005a, 2005b; Timpson et al., 2005). The parallels between Mendelian randomization and instrumental variable approaches are discussed in more detail elsewhere (Thomas & Conti, 2004; Lawlor et al., 2008). The instrumental variable method allows for estimation of the causal effect size of the modifiable environmental exposure of interest on the outcome, together with estimates of the precision of the effect. Thus, in the example of alcohol intake (indexed by ALDH2 genotype) and blood pressure discussed earlier it is possible to utilize the joint associations of ALDH2 genotype with alcohol intake and with blood pressure to estimate the causal influence of alcohol intake on blood pressure. Figure 9.13 reports such an analysis, showing that for a 1 g/day increase in alcohol intake there are robust increases in diastolic and systolic blood pressure among men (Chen et al., 2008).
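A minimal simulation sketch of the ratio (Wald) instrumental variable estimator that this kind of analysis rests on. The data-generating values (genotype frequencies, effect sizes, the confounder) are invented for illustration and are not those of Chen et al. (2008); the point is only that the genotype-based estimate recovers the causal slope while the naive regression of blood pressure on alcohol does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Simulated data, illustrative only: a confounder that raises both drinking and blood pressure,
# and a genotype (e.g., number of functional ALDH2 alleles) that affects drinking but nothing else.
confounder = rng.normal(size=n)
genotype = rng.binomial(2, 0.7, size=n)             # 0, 1, or 2 functional alleles
alcohol = 10 * genotype + 5 * confounder + rng.normal(scale=10, size=n)
alcohol = np.clip(alcohol, 0, None)                 # g/day cannot be negative
true_effect = 0.2                                   # mmHg per g/day, chosen for the simulation
sbp = 120 + true_effect * alcohol + 4 * confounder + rng.normal(scale=8, size=n)

# Naive "observational" estimate: least-squares slope of SBP on alcohol (confounded).
naive = np.cov(alcohol, sbp)[0, 1] / np.var(alcohol, ddof=1)

# Instrumental variable (Wald / ratio) estimate: genotype-SBP association divided by the
# genotype-alcohol association. Because the genotype is independent of the confounder,
# the ratio recovers the causal slope.
iv = np.cov(genotype, sbp)[0, 1] / np.cov(genotype, alcohol)[0, 1]

print(f"naive slope:      {naive:.3f} mmHg per g/day (biased upward)")
print(f"IV (Wald) slope:  {iv:.3f} mmHg per g/day (close to the true {true_effect})")
```

The same ratio logic underlies more elaborate two-stage least squares analyses; the simulation simply makes the confounding, and its removal by the instrument, visible.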
Mendelian Randomization and Gene by Environment Interaction
Mendelian randomization is one way in which genetic epidemiology can inform our understanding about environmental determinants of disease. A more conventional approach has been to study interactions between environmental exposures and genotype (Perera, 1997; Mucci, Wedren, Tamimi, Trichopoulos, & Adami, 2001). From epidemiological and Mendelian randomization perspectives, several issues arise with gene–environment interactions. The most reliable findings in genetic association studies relate to the main effects of polymorphisms on disease risk (Clayton & McKeigue, 2001). The power to detect meaningful gene–environment interaction is low (Wright, Carothers, & Campbell, 2002), with the result being that there are a large number of reports of spurious gene–environment interactions in
Figure 9.13 Instrumental variable estimates of the difference in diastolic and systolic blood pressure produced by a 1 g/day higher alcohol intake in men. Pooled estimates: diastolic 0.16 mmHg per g/day (95% CI 0.11–0.21); systolic 0.24 mmHg per g/day (95% CI 0.16–0.32). From data in L. Chen, G. Davey Smith, R. Harbord, & S. Lewis, 2008, PLoS Medicine, 5, e52.
the medical literature (Colhoun, McKeigue, & Davey Smith, 2003). The presence or absence of statistical interactions depends upon the scale (e.g., linear or logarithmic with respect to the exposure–disease outcome), and the meaning of observed deviation from either an additive or a multiplicative model is not clear. Furthermore, the biological implications of interactions (however defined) are generally uncertain (Thompson, 1991). Mendelian randomization is most powerful when studying modifiable exposures that are difficult to measure and/or considerably confounded, such as dietary factors. Given measurement error—particularly if this is differential with respect to other factors influencing disease risk—interactions are both difficult to detect and often misleading when, apparently, they are found (Clayton & McKeigue, 2001). The situation is perhaps different with exposures that differ qualitatively rather than quantitatively between individuals. Consider the issue of the influence of smoking tobacco on bladder-cancer risk. Observational studies suggest an association, but clearly confounding and a variety of biases could generate such an association. The potential carcinogens in tobacco smoke of relevance to bladder-cancer risk include aromatic and heterocyclic amines, which are detoxified by N-acetyltransferase 2 (NAT2). Genetic variation in NAT2 enzyme levels leads to slower or faster acetylation states. If the carcinogens in tobacco smoke do increase the risk of bladder cancer, then it would be expected that slow acetylators, those who have a reduced rate of detoxification of these carcinogens, would be at an increased risk of bladder cancer if they were smokers, whereas if they were not exposed to these carcinogens
(and the major exposure route for those outside of particular industries is through tobacco smoke), then an association of genotype with bladder-cancer risk would not be anticipated. Table 9.3 tabulates findings from a large study reported in a way that allows analysis of this simple hypothesis (Gu, Liang, Wang, Lu, & Wu, 2005). As can be seen, the influence of the NAT2 slow-acetylation genotype is appreciable only among those also exposed to heavy smoking. Since the genotype will be unrelated to confounders, it is difficult to see why this situation should arise unless smoking is a causal factor with respect to bladder cancer. Thus, the presence of a sizable effect of genotype in the exposed group but not in the unexposed group provides evidence as to the causal nature of the environmentally modifiable risk factor—in this example, smoking. It must be recognized, however, that gene by environment interactions interpreted within the Mendelian randomization framework as evidence regarding the causal nature of environmentally modifiable exposures are not protected from confounding to the extent that main genetic effects are. In the NAT2/smoking/bladder cancer example any factor related to smoking—such as social class—will tend to show a greater association with bladder cancer within NAT2 slow acetylators than within NAT2 rapid acetylators. Because there is not a one-to-one association of social class with smoking, this will not produce the qualitative interaction of essentially no effect of the genotype in one exposure stratum and an effect in the other, as in the NAT2/smoking interaction, but rather a quantitative interaction of a greater effect of NAT2 in the poorer social classes (among whom smoking is more prevalent) and a smaller (but still evident) effect in the better-off social classes, among whom smoking is less prevalent. Thus, situations in which both the biological basis of an expected interaction is well understood and a qualitative (effect vs. no effect) interaction may be anticipated are the ones that are most amenable to interpretations related to the general causal nature of the environmentally modifiable risk factor.
Table 9.3 NAT2 (slow vs. fast acetylator) and bladder-cancer risk, stratified by smoking status
Overall: 1.35 (1.04–1.75). Never/light smokers: 1.10 (0.78–1.53). Heavy smokers: 2.11 (1.30–3.43).
From data in Gu et al. (2005).
Problems and Limitations of Mendelian Randomization
We consider Mendelian randomization to be one of the brightest current prospects for improving causal understanding within population-based studies. There are, however, several potential limitations to the application of this methodology (Davey Smith & Ebrahim, 2003; Little & Khoury, 2003).
Failure to Establish Reliable Genotype–Intermediate Phenotype or Genotype–Disease Associations
If the associations between genotype and a potential intermediate phenotype or between genotype and disease outcome are not reliably estimated, then interpreting these associations in terms of their implications for potential environmental causes of disease will clearly be inappropriate. This is not an issue peculiar to Mendelian randomization; rather, the nonreplicable nature of perhaps most apparent findings in genetic association studies is a serious limitation to the whole enterprise. This issue has been discussed elsewhere (Cardon & Bell, 2001; Colhoun et al., 2003) and will not be dealt with further here. Instead, problems with the Mendelian randomization approach even when reliable genotype–phenotype associations can be determined will be addressed.
Confounding of Genotype–Environmentally Modifiable Risk Factor–Disease Associations
The power of Mendelian randomization lies in its ability to avoid the often substantial confounding seen in conventional observational epidemiology. However, confounding can be reintroduced into Mendelian randomization studies; and when interpreting the results, whether this has arisen needs to be considered.
Linkage Disequilibrium
It is possible that the locus under study is in linkage disequilibrium (i.e., is associated) with another polymorphic locus, with the effect of the polymorphism under investigation being confounded by the influence of the other polymorphism. It may seem unlikely—given the relatively short distances over which linkage disequilibrium is seen in the human genome—that a polymorphism influencing, say, CHD risk would be associated with another polymorphism influencing CHD risk (and thus producing confounding). There are, nevertheless, cases of different genes influencing the same metabolic pathway being in physical proximity. For example, different polymorphisms influencing alcohol metabolism appear to be in linkage disequilibrium (Osier et al., 2002).
Pleiotropy and the Multifunction of Genes
Mendelian randomization is most useful when it can be used to relate a single intermediate phenotype to a disease outcome. However, polymorphisms may (and probably often will) influence more than one intermediate phenotype,
and this may mean they proxy for more than one environmentally modifiable risk factor. This can be the case through multiple effects mediated by their RNA expression or protein coding, through alternative splicing, where one polymorphic region contributes to alternative forms of more than one protein (Glebart, 1998), or through other mechanisms. The most robust interpretations will be possible when the functional polymorphism appears to directly influence the level of the intermediate phenotype of interest (as in the CRP example), but such examples are probably going to be less common in Mendelian randomization than cases where the polymorphism can influence several systems, with different potential interpretations of how the effect on outcome is generated.
How to Investigate Reintroduced Confounding Within Mendelian Randomization
Linkage disequilibrium and pleiotropy can reintroduce confounding and vitiate the power of the Mendelian randomization approach. Genomic knowledge may help in estimating the degree to which these are likely to be problems in any particular Mendelian randomization study, through, for instance, explication of genetic variants that may be in linkage disequilibrium with the variant under study or the function of a particular variant and its known pleiotropic effects. Furthermore, genetic variation can be related to measures of potential confounding factors in each study, and the magnitude of such confounding can be estimated. Empirical studies to date suggest that common genetic variants are largely unrelated to the behavioral and socioeconomic factors considered to be important confounders in conventional observational studies. However, relying on measurement of confounders does, of course, remove the central purpose of Mendelian randomization, which is to balance unmeasured as well as measured confounders (as randomization does in RCTs).
In some circumstances the genetic variant will be related to the environmentally modifiable exposure of interest in some populations but not in others. An example of this is provided by the ALDH2 genotype, alcohol intake, and blood pressure analyses discussed earlier. The results displayed relate to men because in the populations under study women drink very little whatever their genotype (Figure 9.14). If ALDH2 genetic variation influenced blood pressure for reasons other than its influence on alcohol intake, for example, if it was in linkage disequilibrium with another genetic variant that influenced blood pressure through another pathway or if there was a pleiotropic effect of the genetic variant on blood pressure, the same genotype–blood pressure association should be seen among men and women. If, however, the genetic variant influences blood pressure only through its effect on alcohol intake, an effect should be seen only in men. Figure 9.15 demonstrates that the genotype–blood pressure association is indeed seen
only in men, further strengthening evidence that the genotype–blood pressure association depends upon the genotype influencing alcohol intake and that the associations do indeed provide causal evidence of an influence of alcohol intake on blood pressure. In some cases it may be possible to identify two separate genetic variants that are not in linkage disequilibrium with each other but that serve as proxies for the environmentally modifiable risk factor of interest. If both variants are related to the outcome of interest and point to the same underlying association, then it becomes much less plausible that reintroduced confounding explains the association since it would have to be acting in the same way for these two unlinked variants. This can be likened to RCTs of different blood pressure–lowering agents, which work through different mechanisms and have different potential side effects but lower blood pressure to the same degree. If the different agents produce the same reductions in cardiovascular disease risk, then it is unlikely that this is through agent-specific effects of the drugs; rather, it points to blood pressure lowering as being key. The use of multiple genetic variants working through different pathways has not been applied in Mendelian randomization to date but represents an important potential development in the methodology.
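One way such a multiple-instrument analysis might proceed is sketched below: Wald-type estimates from two unlinked variants proxying the same exposure are compared for consistency and pooled by inverse-variance weighting. The numbers are hypothetical and the approach is a simple illustration under those assumptions, not an established protocol from the chapter.

```python
import math

# Hypothetical Wald estimates (causal effect per unit of exposure) from two unlinked variants
# that proxy the same modifiable exposure, with their standard errors. Illustrative values only.
estimates = [(0.18, 0.05), (0.22, 0.07)]   # (beta, SE) for variant 1 and variant 2

weights = [1 / se**2 for _, se in estimates]
pooled = sum(w * b for (b, _), w in zip(estimates, weights)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

# A simple consistency check: if the two variants acted through different confounded or
# pleiotropic pathways, their estimates would be unlikely to agree this closely.
z_diff = (estimates[0][0] - estimates[1][0]) / math.sqrt(estimates[0][1]**2 + estimates[1][1]**2)

print(f"pooled estimate: {pooled:.3f} (SE {pooled_se:.3f})")
print(f"difference between instruments: z = {z_diff:.2f}")
```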
Canalization and Developmental Stability
Perhaps a greater potential problem for Mendelian randomization than reintroduced confounding arises from the developmental compensation that may
Figure 9.14 ALDH2 genotype (*1*1, *1*2, *2*2) by alcohol consumption (g/day) in women and men: five studies, n = 6,815. From ‘‘Alcohol intake and blood pressure: A systematic review implementing Mendelian randomization approach,’’ by L. Chen, G. Davey Smith, R. Harbord, & S. Lewis, 2008, PLoS Medicine, 5, e52.
Figure 9.15 ALDH2 genotype and systolic blood pressure (*2*2 vs. *1*1): pooled mean differences of −7.44 mmHg (95% CI −9.49 to −5.39) in men and 0.39 mmHg (95% CI −2.15 to 2.93) in women. From ‘‘Alcohol intake and blood pressure: A systematic review implementing Mendelian randomization approach,’’ by L. Chen, G. Davey Smith, R. Harbord, & S. Lewis, 2008, PLoS Medicine, 5, e52.
occur through a polymorphic genotype being expressed during fetal or early postnatal development and, thus, influencing development in such a way as to buffer against the effect of the polymorphism. Such compensatory processes have been discussed since C. H. Waddington (1942) introduced the notion of canalization in the 1940s. Canalization refers to the buffering of the effects of either environmental or genetic forces attempting to perturb development, and Waddington’s ideas have been well developed both empirically and theoretically (Wilkins, 1997; Rutherford, 2000; Gibson & Wagner, 2000; Hartman, Garvik, & Hartwell, 2001; Debat & David, 2001; Kitami & Nadeau, 2002; Gu et al., 2003; Hornstein & Shomron, 2006). Such buffering can be achieved either through genetic redundancy (more than one gene having the same or similar function) or through alternative metabolic routes, where the complexity of metabolic pathways allows recruitment of different pathways to reach the same phenotypic end point. In effect, a functional polymorphism expressed during fetal development or postnatal growth may influence the expression of a wide range of other genes, leading to changes that may compensate for the influence of the polymorphism. Put crudely, if a person has developed and grown from the intrauterine period onward within an environment in which one factor is perturbed (e.g., there is elevated CRP due to genotype), then that person may be rendered resistant to the influence of lifelong elevated circulating CRP, through permanent changes in tissue structure and function that counterbalance its effects. In intervention trials—for example, RCTs of cholesterol-lowering drugs—the intervention is generally randomized to participants during middle age; similarly, in observational studies of this issue, cholesterol levels are ascertained during adulthood. In Mendelian randomization, on the other hand, randomization occurs before birth. This leads to important caveats when attempting to relate the findings of conventional observational epidemiological studies to the findings of studies carried out within the Mendelian randomization paradigm.
The most dramatic demonstrations of developmental compensation come from knockout studies, where a functioning gene is essentially removed from an organism. The overall phenotypic effects of such knockouts have often been much lower than knowledge of the function of the genes would predict, even in the absence of other genes carrying out the same function as the knockout gene (Morange, 2001; Shastry, 1998; Gerlai, 2001; Williams & Wagner, 2000). For example, pharmacological inhibition demonstrates that myoglobin is essential to maintain energy balance and contractile function in the myocardium of mice, yet disrupting the myoglobin gene resulted in mice devoid of myoglobin with no disruption of cardiac function (Garry et al., 1998). In the field of animal studies—such as knockout preparations or transgenic animals manipulated so as to overexpress foreign DNA—the interpretive problem created by developmental compensation is well recognized (Morange, 2001; Shastry, 1998; Gerlai, 2001; Williams & Wagner, 2000). Conditional preparations—in which the level of transgene expression can be induced or suppressed through the application of external agents—are now being utilized to investigate the influence of such altered gene expression after the developmental stages during which compensation can occur (Bolon & Galbreath, 2002). Thus, further evidence on the issue of genetic buffering should emerge to inform interpretations of both animal and human studies. Most examples of developmental compensation relate to dramatic genetic or environmental insults; thus, it is unclear whether the generally small phenotypic differences induced by common functional polymorphisms will be sufficient to induce compensatory responses. The fact that the large gene–environment interactions that have been observed often relate to novel exposures that have not been present during the evolution of a species (e.g., drug interactions) (Wright et al., 2002) may indicate that homogenization of response to exposures that are widely experienced—as would be the case with the products of functional polymorphisms or common mutations—has occurred; canalizing mechanisms could be particularly relevant in these cases. Further work on the basic mechanisms of developmental stability and how this relates to relatively small exposure differences during development will allow these considerations to be taken forward. Knowledge of the stage of development at which a genetic variant has functional effects will also allow the potential of developmental compensation to buffer the response to the variant to be assessed. In some Mendelian randomization designs developmental compensation is not an issue. For example, when maternal genotype is utilized as an indicator of the intrauterine environment, the response of the fetus will not differ whether the effect is induced by maternal genotype or by environmental perturbation and the effect on the fetus can be taken to indicate the
effect of environmental influences during the intrauterine period. Also, in cases where a variant influences an adulthood environmental exposure (e.g., ALDH2 variation and alcohol intake), developmental compensation to genotype will not be an issue. In many cases of gene by environment interaction interpreted with respect to causality of the environmental factor, the same applies. However, in some situations there remains the somewhat unsatisfactory position of Mendelian randomization facing a potential problem that cannot currently be adequately assessed. The parallels between Mendelian randomization in human studies and equivalent designs in animal studies are discussed in Box 9.3.
Complexity of Associations and Interpretations
The interpretation of findings from studies that appear to fall within the Mendelian randomization remit can often be complex, as has been previously discussed with respect to MTHFR and folate intake (Davey Smith & Ebrahim, 2003). As a second example, consider the association of extracellular superoxide dismutase (EC-SOD) and CHD. EC-SOD is an extracellular scavenger of superoxide anions, and thus, genetic variants associated with higher circulating EC-SOD levels might be considered to mimic higher levels of antioxidants. However, findings are dramatically opposite to this—bearers of such variants have an increased risk of CHD (Juul et al., 2004). The explanation of this apparent paradox may be that the higher circulating EC-SOD levels associated with the variant arise from movement of EC-SOD from arterial walls; thus, the in situ antioxidative properties of these arterial walls are lower in individuals with the variant associated with higher circulating EC-SOD. The complexity of these interpretations—together with their sometimes speculative nature—detracts from the transparency that otherwise makes Mendelian randomization attractive.
Lack of Suitable Genetic Variants to Proxy for Exposure of Interest
An obvious limitation of Mendelian randomization is that it can examine only areas for which there are functional polymorphisms (or genetic markers linked to such functional polymorphisms) that are relevant to the modifiable exposure of interest. In the context of genetic association studies more generally it has been pointed out that in many cases even if a locus is involved in a disease-related metabolic process there may be no suitable marker or functional polymorphism to allow study of this process (Weiss & Terwilliger, 2000). In an earlier work on Mendelian randomization (Davey Smith & Ebrahim, 2003) we discussed the example of vitamin C since one of our
Box 9.3 Meiotic Randomization in Animal Studies
The approach to causal inference underlying Mendelian randomization is also utilized in nonhuman animal studies. For instance, in investigations of the structural neuroanatomical factors underlying behavioral traits in rodents, there has been use of genetic crosses that lead to different on-average structural features (Roderic, Wimer, & Wimer, 1976; Weimer, 1973; Lipp et al., 1989). Lipp et al. (1989) refer to this as ‘‘meiotic randomization’’ and consider that the advantages of this method are that the brain-morphology differences that are due to genetic difference occur before any of the behavioral traits develop and, therefore, these differences cannot be a feedback function of behavior (which is equivalent to the avoidance of reverse causality in human Mendelian randomization studies) and that other differences between the animals are randomized with respect to the brain-morphology differences of interest (equivalent to the avoidance of confounding in human Mendelian randomization studies). Li and colleagues (2006) apply this method to the dissection of adiposity and body composition in mice and point out that ‘‘in experimental crosses meiosis serves as a randomization mechanism that distributes naturally occurring genetic variation in a combinatorial fashion among a set of cross progeny. Genetically randomized populations share the properties of statistically designed experiments that provide a basis for causal inference. This is consistent with the notion that causation flows from genes to phenotypes. We propose that the inference of causal direction can be extended to include relationships among phenotypes.’’ Mendelian randomization within epidemiology reflects similar thinking among transgenic animal researchers. Williams and Wagner (2000) consider that ‘‘A properly designed transgenic experiment can be a thing of exquisite beauty in that the results support absolutely unambiguous conclusions regarding the function of a given gene or protein within the authentic biological context of an intact animal. A transgenic experiment may provide the most rigorous test possible of a mechanistic hypothesis that was generated by previous observational studies. A successful transgenic experiment can cut through layers of uncertainty that cloud the interpretation of the results produced by other experimental designs.’’ (continued)
Box 9.3 (continued)
The problems of interpreting some aspects of transgenic animal studies may also apply to Mendelian randomization within genetic epidemiology, however; and linked progress across the fields of genomics, animal experimentation, and epidemiology will better define the scope of Mendelian randomization in the future.
examples of how observational epidemiology appeared to have got the wrong answer related to vitamin C. We considered whether the association between vitamin C and CHD could have been studied utilizing the principles of Mendelian randomization. We stated that polymorphisms exist that are related to lower circulating vitamin C levels—for example, the haptoglobin polymorphism (Langlois, Delanghe, De Buyzere, Bernard, & Ouyang, 1997; Delanghe, Langlois, Duprez, De Buyzere, & Clement, 1999)—but in this case the effect on vitamin C is at some distance from the polymorphic protein and, as in the apolipoprotein E example, the other phenotypic differences could have an influence on CHD risk that would distort examination of the influence of vitamin C levels through relating genotype to disease. SLC23A1—a gene encoding for the vitamin C transporter SVCT1 and vitamin C transport by intestinal cells—would be an attractive candidate for Mendelian randomization studies. However, by 2003 (the date of our earlier report) a search for variants had failed to find any common single-nucleotide polymorphism that could be used in such a way (Erichsen, Eck, Levine, & Chanock, 2001). We therefore used this as an example of a situation where suitable polymorphisms for studying the modifiable risk factor of interest—in this case, vitamin C—could not be located. However, since the earlier report, a functional variation in SLC23A1 has been identified that is related to circulating vitamin C levels (N. J. Timpson et al., 2010). We use this example not to suggest that the obstacle of locating relevant genetic variation for particular problems in observational epidemiology will always be overcome but to point out that rapidly developing knowledge of human genomics will identify more variants that can serve as instruments for Mendelian randomization studies.
Conclusions: Mendelian Randomization, What It Is and What It Is Not
Mendelian randomization is not predicated on the presumption that genetic variants are major determinants of health and disease within populations.
There are many cogent critiques of genetic reductionism and the overselling of ‘‘discoveries’’ in genetics that reiterate obvious truths so clearly (albeit somewhat repetitively) that there is no need to repeat them here (e.g., Berkowitz, 1996; Baird, 2000; Holtzman, 2001; Strohman, 1993; Rose, 1995). Mendelian randomization does not depend upon there being ‘‘genes for’’ particular traits and certainly not in the strict sense of a gene for a trait being one that is maintained by selection because of its causal association with that trait (Kaplan & Pigliucci, 2001). The association of genotype and the environmentally modifiable factor that it proxies for will be, like most genotype–phenotype associations, one that is contingent and cannot be reduced to individual-level prediction but within environmental limits will pertain at a group level (Wolf, 1995). This is analogous to an RCT of antihypertensive agents, where at the collective level the group randomized to active medication will have lower mean blood pressure than the group randomized to placebo but at the individual level many participants randomized to active treatment will have higher blood pressure than many individuals randomized to placebo. Indeed, in the phenocopy/genocopy example of pellagra and Hartnup disease discussed in Box 9.1, only a minority of the Hartnup gene carriers develop symptoms but at the group level they have a much greater tendency for such symptoms and a shift in amino acid levels that reflects this (Scriver, Mahon, & Levy, 1987; Scriver, 1988). These group-level differences are what create the analogy between Mendelian randomization and RCTs, outlined in Figure 9.11. Finally, the associations that Mendelian randomization depends upon do need to pertain to a definable group at a particular time but do not need to be immutable. Thus, ALDH2 variation will not be related to alcohol consumption in a society where alcohol is not consumed, and the association will vary by gender and by cultural group and may change over time (Higuchi et al., 1994; Hasin et al., 2002). Within the setting of a study of a well-defined group, however, the genotype will be associated with group-level differences in alcohol consumption and group assignment will not be associated with confounding variables.
Mendelian Randomization and Genetic Epidemiology
Critiques of contemporary genetic epidemiology often focus on two features of findings from genetic association studies: that the population attributable risk of the genetic variants is low and that in any case the influence of genetic factors is not reversible. Illustrating both of these criticisms, Terwilliger and Weiss (2003, p. 35) suggest as reasons for considering that many of the current claims regarding genetic epidemiology are hype (1) that alleles identified as increasing the risk of common diseases ‘‘tend to be
involved in only a small subset of all cases of such diseases’’ and (2) that in any case ‘‘while the concept of attributable risk is an important one for evaluating the impact of removable environmental factors, for non-removable genetic risk factors, it is a moot point.’’ These evaluations of the role of genetic epidemiology are not relevant when considering the potential contributions of Mendelian randomization. This approach is not concerned with the population attributable risk of any particular genetic variant but with the degree to which associations between the genetic variant and disease outcomes can demonstrate the importance of environmentally modifiable factors as causes of disease, for which the population attributable risk is of relevance to public-health prioritization. Consider, for example, the case of familial hypercholesterolemia or familial defective apo B. The genetic mutations associated with these conditions will account for only a trivial percentage of cases of CHD within the population (i.e., the population attributable risk will be low). For example, in a Danish population, the frequency of familial defective apo B is 0.08% and, despite the sevenfold increased risk of CHD that it confers, it will generate a population attributable risk of only 0.5% (Tybjaerg-Hansen, Steffensen, Meinertz, Schnohr, & Nordestgaard, 1998). However, by identifying blood cholesterol levels as a causal factor for CHD, the triangular association between genotype, blood cholesterol, and CHD risk identifies an environmentally modifiable factor with a very high population attributable risk—assuming that 50% of the population have raised blood cholesterol above 6.0 mmol/l and this is associated with a relative risk of twofold, a population attributable risk of 33% is obtained. The same logic applies to the other examples—the attributable risk of the genotype is low, but the population attributable risk of the modifiable environmental factor identified as causal through the genotype–disease associations is large. The same reasoning applies when considering the suggestion that since genotype cannot be modified, genotype–disease associations are not of public-health importance (Terwilliger & Weiss, 2003). The point of Mendelian randomization approaches is not to attempt to modify genotype but to utilize genotype–disease associations to strengthen inferences regarding modifiable environmental risks for disease and then reduce disease risk in the population through applying this knowledge. Mendelian randomization differs from other contemporary approaches to genetic epidemiology in that its central concern is not with the magnitude of genetic variant influences on disease but, rather, with what the genetic associations tell us about environmentally modifiable causes of disease. Many years ago, in his Nobel Prize acceptance speech, the pioneering geneticist Thomas Hunt Morgan contrasted his views with the then popular genetic approach to disease, eugenics. He thought that ‘‘through public hygiene and protective measures of various kinds we can more successfully cope with
some of the evils that human flesh is heir to. Medical science will here take the lead—but I hope that genetics can at times offer a helping hand'' (Morgan, 1935). More than seven decades later, it may now be time for genetic research to directly strengthen the knowledge base of public health.
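The attributable-risk arithmetic used above follows the standard attributable-fraction (Levin) formula; the formula itself is not stated in the chapter, so what follows is simply a worked restatement of the two calculations, with p denoting the population prevalence of the exposure and RR the associated relative risk:

\[
\mathrm{PAR} \;=\; \frac{p\,(RR-1)}{1 + p\,(RR-1)}
\]

For familial defective apo B ($p = 0.0008$, $RR = 7$), this gives $\mathrm{PAR} = (0.0008 \times 6)/(1 + 0.0008 \times 6) \approx 0.005$, or roughly 0.5%; for raised blood cholesterol ($p = 0.5$, $RR = 2$), it gives $\mathrm{PAR} = (0.5 \times 1)/(1 + 0.5 \times 1) = 1/3$, or about 33%.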
References
Ames, B. N. (1999). Cancer prevention and diet: Help from single nucleotide polymorphisms. Proceedings of the National Academy of Sciences USA, 96, 12216–12218.
Baird, P. (2000). Genetic technologies and achieving health for populations. International Journal of Health Services, 30, 407–424.
Baron, D. N., Dent, C. E., Harris, H., Hart, E. W., & Jepson, J. B. (1956). Hereditary pellagra-like skin rash with temporary cerebellar ataxia, constant renal aminoaciduria, and other bizarre biochemical features. Lancet, 268, 421–429.
Berkowitz, A. (1996). Our genes, ourselves? Bioscience, 46, 42–51.
Berkson, J. (1946). Limitations of the application of fourfold table analysis to hospital data. Biometrics Bulletin, 2, 47–53.
Bhatti, P., Sigurdson, A. J., Wang, S. S., Chen, J., Rothman, N., Hartge, P., et al. (2005). Genetic variation and willingness to participate in epidemiological research: Data from three studies. Cancer Epidemiology, Biomarkers and Prevention, 14, 2449–2453.
Birge, S. J., Keutmann, H. T., Cuatrecasas, P., & Whedon, G. D. (1967). Osteoporosis, intestinal lactase deficiency and low dietary calcium intake. New England Journal of Medicine, 276, 445–448.
Bolon, B., & Galbreath, E. (2002). Use of genetically engineered mice in drug discovery and development: Wielding Occam's razor to prune the product portfolio. International Journal of Toxicology, 21, 55–64.
Botto, L. D., & Yang, Q. (2000). 5,10-Methylenetetrahydrofolate reductase gene variants and congenital anomalies: A HuGE review. American Journal of Epidemiology, 151, 862–877.
Bovet, P., & Paccaud, F. (2001). Alcohol, coronary heart disease and public health: Which evidence-based policy? International Journal of Epidemiology, 30, 734–737.
Brennan, P. (2002). Gene environment interaction and aetiology of cancer: What does it mean and how can we measure it? Carcinogenesis, 23(3), 381–387.
Broer, S., Cavanaugh, J. A., & Rasko, J. E. J. (2004). Neutral amino acid transport in epithelial cells and its malfunction in Hartnup disorder. Transporters, 33, 233–236.
Burd, L. J. (2006). Interventions in FASD: We must do better. Child: Care, Health, and Development, 33, 398–400.
Burr, M. L., Fehily, A. M., Butland, B. K., Bolton, C. H., & Eastham, R. D. (1986). Alcohol and high-density-lipoprotein cholesterol: A randomized controlled trial. British Journal of Nutrition, 56, 81–86.
Cardon, L. R., & Bell, J. I. (2001). Association study designs for complex diseases. Nature Reviews: Genetics, 2, 91–99.
Casas, J. P., Shah, T., Cooper, J., Hawe, E., McMahon, A. D., Gaffney, D., et al. (2006). Insight into the nature of the CRP–coronary event association using Mendelian randomization. International Journal of Epidemiology, 35, 922–931.
Chao, Y.-C., Liou, S.-R., Chung, Y.-Y., Tang, H.-S., Hsu, C.-T., Li, T.-K., et al. (1994). Polymorphism of alcohol and aldehyde dehydrogenase genes and alcoholic cirrhosis in Chinese patients. Hepatology, 19, 360–366.
Chen, L., Davey Smith, G., Harbord, R., & Lewis, S. (2008). Alcohol intake and blood pressure: A systematic review implementing a Mendelian randomization approach. PLoS Medicine, 5, e52.
Cheverud, J. M. (1988). A comparison of genetic and phenotypic correlations. Evolution, 42, 958–968.
Clayton, D., & McKeigue, P. M. (2001). Epidemiological methods for studying genes and environmental factors in complex diseases. Lancet, 358, 1356–1360.
Colhoun, H., McKeigue, P. M., & Davey Smith, G. (2003). Problems of reporting genetic associations with complex outcomes. Lancet, 361, 865–872.
Correns, C. (1900). G. Mendel's Regel über das Verhalten der Nachkommenschaft der Bastarde. Berichte der Deutschen Botanischen Gesellschaft, 8, 158–168. (English translation, Correns, C. [1966]. G. Mendel's law concerning the behavior of progeny of varietal hybrids. In Stern and Sherwood [pp. 119–132]. New York: W. H. Freeman.)
Czeizel, A. E., & Dudás, I. (1992). Prevention of the first occurrence of neural-tube defects by periconceptional vitamin supplementation. New England Journal of Medicine, 327, 1832–1835.
Danesh, J., Wheeler, J. B., Hirschfield, G. M., Eda, S., Eriksdottir, G., Rumley, A., et al. (2004). C-reactive protein and other circulating markers of inflammation in the prediction of coronary heart disease. New England Journal of Medicine, 350, 1387–1397.
Davey Smith, G. (2006). Cochrane Lecture. Randomised by (your) god: Robust inference from an observational study design. Journal of Epidemiology and Community Health, 60, 382–388.
Davey Smith, G., & Ebrahim, S. (2002). Data dredging, bias, or confounding [Editorial]. British Medical Journal, 325, 1437–1438.
Davey Smith, G., & Ebrahim, S. (2003). ''Mendelian randomization'': Can genetic epidemiology contribute to understanding environmental determinants of disease? International Journal of Epidemiology, 32, 1–22.
Davey Smith, G., & Ebrahim, S. (2004). Mendelian randomization: Prospects, potentials, and limitations. International Journal of Epidemiology, 33, 30–42.
Davey Smith, G., & Ebrahim, S. (2005). What can Mendelian randomization tell us about modifiable behavioural and environmental exposures? British Medical Journal, 330, 1076–1079.
Davey Smith, G., Harbord, R., Milton, J., Ebrahim, S., & Sterne, J. A. C. (2005a). Does elevated plasma fibrinogen increase the risk of coronary heart disease? Evidence from a meta-analysis of genetic association studies. Arteriosclerosis, Thrombosis, and Vascular Biology, 25, 2228–2233.
Davey Smith, G., & Hart, C. (2002). Lifecourse socioeconomic and behavioural influences on cardiovascular disease mortality: The Collaborative study. American Journal of Public Health, 92, 1295–1298.
Davey Smith, G., Lawlor, D. A., Harbord, R., Timpson, N. J., Day, I., & Ebrahim, S. (2008). Clustered environments and randomized genes: A fundamental distinction between conventional and genetic epidemiology. PLoS Medicine, 4, 1985–1992.
Davey Smith, G., Lawlor, D., Harbord, R., Timpson, N., Rumley, A., Lowe, G., et al. (2005b). Association of C-reactive protein with blood pressure and hypertension: Lifecourse confounding and Mendelian randomization tests of causality. Arteriosclerosis, Thrombosis, and Vascular Biology, 25, 1051–1056.
Davey Smith, G., & Phillips, A. N. (1996). Inflation in epidemiology: ''The proof and measurement of association between two things'' revisited. British Medical Journal, 312, 1659–1661.
Davey Smith, G., Timpson, N., & Ebrahim, S. (2008). Strengthening causal inference in cardiovascular epidemiology through Mendelian randomization. Annals of Medicine, 40, 524–541.
Debat, V., & David, P. (2001). Mapping phenotypes: Canalization, plasticity and developmental stability. Trends in Ecology and Evolution, 16, 555–561.
Delanghe, J., Langlois, M., Duprez, D., De Buyzere, M., & Clement, D. (1999). Haptoglobin polymorphism and peripheral arterial occlusive disease. Atherosclerosis, 145, 287–292.
Ebrahim, S., & Davey Smith, G. (2008). Mendelian randomization: Can genetic epidemiology help redress the failures of observational epidemiology? Human Genetics, 123, 15–33.
Eidelman, R. S., Hollar, D., Hebert, P. R., Lamas, G. A., & Hennekens, C. H. (2004). Randomized trials of vitamin E in the treatment and prevention of cardiovascular disease. Archives of Internal Medicine, 164, 1552–1556.
Enomoto, N., Takase, S., Yasuhara, M., & Takada, A. (1991). Acetaldehyde metabolism in different aldehyde dehydrogenase-2 genotypes. Alcoholism, Clinical and Experimental Research, 15, 141–144.
Erichsen, H. C., Eck, P., Levine, M., & Chanock, S. (2001). Characterization of the genomic structure of the human vitamin C transporter SVCT1 (SLC23A2). Journal of Nutrition, 131, 2623–2627.
Færgeman, O. (2003). Coronary artery disease: Genes, drugs and the agricultural connection. Amsterdam: Elsevier.
Fallon, U. B., Ben-Shlomo, Y., & Davey Smith, G. (2001, March 14). Homocysteine and coronary heart disease. Heart. http://heart.bmjjournals.com/cgi/eletters/85/2/153
Garry, D. J., Ordway, G. A., Lorenz, J. N., Radford, E. R., Chin, R. W., Grange, R., et al. (1998). Mice without myoglobin. Nature, 395, 905–908.
Gause, G. F. (1942). The relation of adaptability to adaption. Quarterly Review of Biology, 17, 99–114.
Gemma, S., Vichi, S., & Testai, E. (2007). Metabolic and genetic factors contributing to alcohol induced effects and fetal alcohol syndrome. Neuroscience and Biobehavioral Reviews, 31, 221–229.
Gerlai, R. (2001). Gene targeting: Technical confounds and potential solutions in behavioural and brain research. Behavioural Brain Research, 125, 13–21.
Gibson, G., & Wagner, G. (2000). Canalization in evolutionary genetics: A stabilizing theory? BioEssays, 22, 372–380.
Glebart, W. M. (1998). Databases in genomic research. Science, 282, 659–661.
Glynn, R. K. (2006). Genes as instruments for evaluation of markers and causes [Commentary]. International Journal of Epidemiology, 35, 932–934.
Goldschmidt, R. B. (1938). Physiological genetics. New York: McGraw-Hill.
Gray, R., & Wheatley, K. (1991). How to avoid bias when comparing bone marrow transplantation with chemotherapy. Bone Marrow Transplantation, 7(Suppl. 3), 9–12.
Gu, J., Liang, D., Wang, Y., Lu, C., & Wu, X. (2005). Effects of N-acetyl transferase 1 and 2 polymorphisms on bladder cancer risk in Caucasians. Mutation Research, 581, 97–104.
Gu, Z., Steinmetz, L. M., Gu, X., Scharfe, C., Davis, R. W., & Li, W.-H. (2003). Role of duplicate genes in genetic robustness against null mutations. Nature, 421, 63–66.
Gutjahr, E., Gmel, G., & Rehm, J. (2001). Relation between average alcohol consumption and disease: An overview. European Addiction Research, 7, 117–127.
Guy, J. T. (1993). Oral manifestations of systemic disease. In C. W. Cummings, J. Frederick, L. Harker, C. Krause, & D. Schuller (Eds.), Otolaryngology—head and neck surgery (Vol. 2). St. Louis: Mosby Year Book.
Han, T. S., Sattar, N., Williams, K., Gonzalez-Villalpando, C., Lean, M. E., & Haffner, S. M. (2002). Prospective study of C-reactive protein in relation to the development of diabetes and metabolic syndrome in the Mexico City Diabetes Study. Diabetes Care, 25, 2016–2021.
Hart, C., Davey Smith, G., Hole, D., & Hawthorne, V. (1999). Alcohol consumption and mortality from all causes, coronary heart disease, and stroke: Results from a prospective cohort study of Scottish men with 21 years of follow up. British Medical Journal, 318, 1725–1729.
Hartman, J. L., Garvik, B., & Hartwell, L. (2001). Principles for the buffering of genetic variation. Science, 291, 1001–1004.
Hasin, D., Aharonovich, E., Liu, X., Mamman, Z., Matseoane, K., Carr, L., et al. (2002). Alcohol and ADH2 in Israel: Ashkenazis, Sephardics, and recent Russian immigrants. American Journal of Psychiatry, 159(8), 1432–1434.
Haskell, W. L., Camargo, C., Williams, P. T., Vranizan, K. M., Krauss, R. M., Lindgren, F. T., et al. (1984). The effect of cessation and resumption of moderate alcohol intake on serum high-density-lipoprotein subfractions. New England Journal of Medicine, 310, 805–810.
Heart Protection Study Collaborative Group. (2002). MRC/BHF Heart Protection Study of antioxidant vitamin supplementation in 20536 high-risk individuals: A randomised placebo-controlled trial. Lancet, 360, 23–33.
Higuchi, S., Matsushita, S., Imazeki, H., Kinoshita, T., Takagi, S., & Kono, H. (1994). Aldehyde dehydrogenase genotypes in Japanese alcoholics. Lancet, 343, 741–742.
Hirschfield, G. M., & Pepys, M. B. (2003). C-reactive protein and cardiovascular disease: New insights from an old molecule. Quarterly Journal of Medicine, 9, 793–807.
Holtzman, N. A. (2001). Putting the search for genes in perspective. International Journal of Health Services, 31, 445.
Honkanen, R., Pulkkinen, P., Järvinen, R., Kröger, H., Lindstedt, K., Tuppurainen, M., et al. (1996). Does lactose intolerance predispose to low bone density? A population-based study of perimenopausal Finnish women. Bone, 19, 23–28.
Hornstein, E., & Shomron, N. (2006). Canalization of development by microRNAs. Nature Genetics, 38, S20–S24.
Hu, F. B., Meigs, J. B., Li, T. Y., Rifai, N., & Manson, J. E. (2004). Inflammatory markers and risk of developing type 2 diabetes in women. Diabetes, 53, 693–700.
Jablonka-Tavory, E. (1982). Genocopies and the evolution of interdependence. Evolutionary Theory, 6, 167–170.
Jacobson, S. W., Carr, L. G., Croxford, J., Sokol, R. J., Li, T. K., & Jacobson, J. L. (2006). Protective effects of the alcohol dehydrogenase-ADH1B allele in children exposed to alcohol during pregnancy. Journal of Pediatrics, 148, 30–37.
Jousilahti, P., & Salomaa, V. (2004). Fibrinogen, social position, and Mendelian randomisation. Journal of Epidemiology and Community Health, 58, 883.
Juul, K., Tybjaerg-Hansen, A., Marklund, S., Heegaard, N. H. H., Steffensen, R., Sillesen, H., et al. (2004). Genetically reduced antioxidative protection and increased ischaemic heart disease risk: The Copenhagen City Heart Study. Circulation, 109, 59–65.
Kaplan, J. M., & Pigliucci, M. (2001). Genes ''for'' phenotypes: A modern history view. Biology and Philosophy, 16, 189–213.
Katan, M. B. (1986). Apolipoprotein E isoforms, serum cholesterol, and cancer. Lancet, I, 507–508 (reprinted International Journal of Epidemiology, 2004, 34, 9).
Kathiresan, S., Melander, O., Anevski, D., Guiducci, C., Burtt, N. P., Roos, C., et al. (2008). Polymorphisms associated with cholesterol and risk of cardiovascular events. New England Journal of Medicine, 358, 1240–1249.
Keavney, B. (2002). Genetic epidemiological studies of coronary heart disease. International Journal of Epidemiology, 31, 730–736.
Keavney, B., Danesh, J., Parish, S., Palmer, A., Clark, S., Youngman, L., et al.; International Studies of Infarct Survival (ISIS) Collaborators. (2006). Fibrinogen and coronary heart disease: Test of causality by ''Mendelian randomization.'' International Journal of Epidemiology, 35, 935–943.
Kelada, S. N., Eaton, D. L., Wang, S. S., Rothman, N. R., & Khoury, M. J. (2003). The role of genetic polymorphisms in environmental health. Environmental Health Perspectives, 111, 1055–1064.
Khaw, K.-T., Bingham, S., Welch, A., Luben, R., Wareham, N., Oakes, S., et al. (2001). Relation between plasma ascorbic acid and mortality in men and women in EPIC-Norfolk prospective study: A prospective population study. Lancet, 357, 657–663.
Kitami, T., & Nadeau, J. H. (2002). Biochemical networking contributes more to genetic buffering in human and mouse metabolic pathways than does gene duplication. Nature Genetics, 32, 191–194.
Klatsky, A. L. (2001). Could abstinence from alcohol be hazardous to your health [Commentary]? International Journal of Epidemiology, 30, 739–742.
Kraut, J. A., & Sachs, G. (2005). Hartnup disorder: Unravelling the mystery. Trends in Pharmacological Sciences, 26, 53–55.
Langlois, M. R., Delanghe, J. R., De Buyzere, M. L., Bernard, D. R., & Ouyang, J. (1997). Effect of haptoglobin on the metabolism of vitamin C. American Journal of Clinical Nutrition, 66, 606–610.
Lawlor, D. A., Davey Smith, G., Kundu, D., Bruckdorfer, K. R., & Ebrahim, S. (2004). Those confounded vitamins: What can we learn from the differences between observational versus randomised trial evidence? Lancet, 363, 1724–1727.
Lawlor, D. A., Ebrahim, S., Kundu, D., Bruckdorfer, K. R., Whincup, P. H., & Davey Smith, G. (2005). Vitamin C is not associated with coronary heart disease risk once life course socioeconomic position is taken into account: Prospective findings from the British Women's Heart and Health Study. Heart, 91, 1086–1087.
Lawlor, D. A., Harbord, R. M., Sterne, J. A. C., Timpson, N., & Davey Smith, G. (2008). Mendelian randomization: Using genes as instruments for making causal inferences in epidemiology. Statistics in Medicine, 27, 1133–1163.
Leimar, O., Hammerstein, P., & Van Dooren, T. J. M. (2006). A new perspective on developmental plasticity and the principles of adaptive morph determination. American Naturalist, 167, 367–376.
Lenz, W. (1973). Phenocopies. Journal of Medical Genetics, 10, 34–48.
Lewis, S., & Davey Smith, G. (2005). Alcohol, ALDH2 and esophageal cancer: A meta-analysis which illustrates the potentials and limitations of a Mendelian randomization approach. Cancer Epidemiology, Biomarkers and Prevention, 14, 1967–1971.
Li, R., Tsaih, S. W., Shockley, K., Stylianou, I. M., Wergedal, J., Paigen, B., et al. (2006). Structural model analysis of multiple quantitative traits. PLoS Genetics, 2, 1046–1057.
Lipp, H. P., Schwegler, H., Crusio, W. E., Wolfer, D. P., Leisinger-Trigona, M. C., Heimrich, B., et al. (1989). Using genetically-defined rodent strains for the identification of hippocampal traits relevant for two-way avoidance behaviour: A noninvasive approach. Experientia, 45, 845–859.
Little, J., & Khoury, M. J. (2003). Mendelian randomization: A new spin or real progress? Lancet, 362, 930–931.
Lower, G. M., Nilsson, T., Nelson, C. E., Wolf, H., Gamsky, T. E., & Bryan, G. T. (1979). N-Acetyltransferase phenotype and risk in urinary bladder cancer: Approaches in molecular epidemiology. Environmental Health Perspectives, 29, 71–79.
MacMahon, S., Peto, R., Collins, R., Godwin, J., Cutler, J., et al. (1990). Blood pressure, stroke, and coronary heart disease. Lancet, 335, 765–774.
Marks, D., Thorogood, M., Neil, H. A. W., & Humphries, S. E. (2003). A review on diagnosis, natural history and treatment of familial hypercholesterolaemia. Atherosclerosis, 168, 1–14.
Marmot, M. (2001). Reflections on alcohol and coronary heart disease. International Journal of Epidemiology, 30, 729–734.
McGrath, J. (1999). Hypothesis: Is low prenatal vitamin D a risk-modifying factor for schizophrenia? Schizophrenia Research, 40, 173–177.
Memik, F. (2003). Alcohol and esophageal cancer, is there an exaggerated accusation? Hepatogastroenterology, 54, 1953–1955.
Mendel, G. (1866). Experiments in plant hybridization. Retrieved from http://www.mendelweb.org/archive/Mendel.Experiments.txt
Millen, A. E., Dodd, K. W., & Subar, A. F. (2004). Use of vitamin, mineral, nonvitamin, and nonmineral supplements in the United States: The 1987, 1992, and 2000 National Health Interview Survey results. Journal of the American Dietetic Association, 104, 942–950.
Morange, M. (2001). The misunderstood gene. Cambridge, MA: Harvard University Press.
Morgan, T. H. (1913). Heredity and sex. New York: Columbia University Press.
Morgan, T. H. (1919). Physical basis of heredity. Philadelphia: J. B. Lippincott.
Morgan, T. H. (1935). The relation of genetics to physiology and medicine. Scientific Monthly, 41, 5–18.
MRC Vitamin Study Research Group. (1991). Prevention of neural tube defects: Results of the Medical Research Council vitamin study. Lancet, 338, 131–137.
Mucci, L. A., Wedren, S., Tamimi, R. M., Trichopoulos, D., & Adami, H. O. (2001). The role of gene–environment interaction in the aetiology of human cancer: Examples from cancers of the large bowel, lung and breast. Journal of Internal Medicine, 249, 477–493.
Newcomer, A. D., Hodgson, S. F., Douglas, M. D., & Thomas, P. J. (1978). Lactase deficiency: Prevalence in osteoporosis. Annals of Internal Medicine, 89, 218–220.
Olby, R. C. (1966). Origins of Mendelism. London: Constable.
Osier, M. V., Pakstis, A. J., Soodyall, H., Comas, D., Goldman, D., Odunsi, A., et al. (2002). A global perspective on genetic variation at the ADH genes reveals unusual patterns of linkage disequilibrium and diversity. American Journal of Human Genetics, 71, 84–99.
Palmer, L., & Cardon, L. (2005). Shaking the tree: Mapping complex disease genes with linkage disequilibrium. Lancet, 366, 1223–1234.
Perera, F. P. (1997). Environment and cancer: Who are susceptible? Science, 278, 1068–1073.
Pradhan, A. D., Manson, J. E., Rifai, N., Buring, J. E., & Ridker, P. M. (2001). C-reactive protein, interleukin 6, and risk of developing type 2 diabetes mellitus. Journal of the American Medical Association, 286, 327–334.
Radimer, K., Bindewald, B., Hughes, J., Ervin, B., Swanson, C., & Picciano, M. F. (2004). Dietary supplement use by US adults: Data from the National Health and Nutrition Examination Survey, 1999–2000. American Journal of Epidemiology, 160, 339–349.
Reynolds, K., Lewis, L. B., Nolen, J. D. L., Kinney, G. L., Sathya, B., & He, J. (2003). Alcohol consumption and risk of stroke: A meta-analysis. Journal of the American Medical Association, 289, 579–588.
Ridker, P. M., Cannon, C. P., Morrow, D., Rifai, N., Rose, L. M., McCabe, C. H., et al. (2005). C-reactive protein levels and outcomes after statin therapy. New England Journal of Medicine, 352, 20–28.
Rimm, E. (2001). Alcohol and coronary heart disease—laying the foundation for future work [Commentary]. International Journal of Epidemiology, 30, 738–739.
Rimm, E. B., Stampfer, M. J., Ascherio, A., Giovannucci, E., Colditz, G. A., & Willett, W. C. (1993). Vitamin E consumption and the risk of coronary heart disease in men. New England Journal of Medicine, 328, 1450–1456.
Roderic, T. H., Wimer, R. E., & Wimer, C. C. (1976). Genetic manipulation of neuroanatomical traits. In L. Petrinovich & J. L. McGaugh (Eds.), Knowing, thinking, and believing. New York: Plenum Press.
Rose, G. (1982). Incubation period of coronary heart disease. British Medical Journal, 284, 1600–1601.
Rose, S. (1995). The rise of neurogenetic determinism. Nature, 373, 380–382.
Roseboom, T. J., van der Meulen, J. H., Osmond, C., Barker, D. J. P., Ravelli, A. C. J., Schroeder-Tanka, J. M., et al. (2000). Coronary heart disease after prenatal exposure to the Dutch famine, 1944–45. Heart, 84, 595–598.
Rothman, N., Wacholder, S., Caporaso, N. E., Garcia-Closas, M., Buetow, K., & Fraumeni, J. F. (2001). The use of common genetic polymorphisms to enhance the epidemiologic study of environmental carcinogens. Biochimica et Biophysica Acta, 1471, C1–C10.
Rutherford, S. L. (2000). From genotype to phenotype: Buffering mechanisms and the storage of genetic information. BioEssays, 22, 1095–1105.
Scholl, T. O., & Johnson, W. G. (2000). Folic acid: Influence on the outcome of pregnancy. American Journal of Clinical Nutrition, 71(Suppl.), 1295S–1303S.
Scientific Steering Committee on Behalf of the Simon Broome Register Group. (1991). Risk of fatal coronary heart disease in familial hypercholesterolaemia. British Medical Journal, 303, 893–896.
Scriver, C. R. (1988). Nutrient–gene interactions: The gene is not the disease and vice versa. American Journal of Clinical Nutrition, 48, 1505–1509.
Scriver, C. R., Mahon, B., & Levy, H. L. (1987). The Hartnup phenotype: Mendelian transport disorder, multifactorial disease. American Journal of Human Genetics, 40, 401–412.
Sesso, D., Buring, J. E., Rifai, N., Blake, G. J., Gaziano, J. M., & Ridker, P. M. (2003). C-reactive protein and the risk of developing hypertension. Journal of the American Medical Association, 290, 2945–2951.
Shaper, A. G. (1993). Alcohol, the heart, and health [Editorial]. American Journal of Public Health, 83, 799–801.
Shastry, B. S. (1998). Gene disruption in mice: Models of development and disease. Molecular and Cellular Biochemistry, 181, 163–179.
Sjöholm, A., & Nyström, T. (2005). Endothelial inflammation in insulin resistance. Lancet, 365, 610–612.
Slack, J. (1969). Risks of ischaemic heart disease in familial hyperlipoproteinaemic states. Lancet, 2, 1380–1382.
Snyder, L. H. (1959). Fifty years of medical genetics. Science, 129, 7–13.
Spearman, C. (1904). The proof and measurement of association between two things. American Journal of Psychology, 15, 72–101.
Stampfer, M. J., Hennekens, C. H., Manson, J. E., Colditz, G. A., Rosner, B., & Willett, W. C. (1993). Vitamin E consumption and the risk of coronary disease in women. New England Journal of Medicine, 328, 1444–1449.
Steinberg, D. (2004). Thematic review series. The pathogenesis of atherosclerosis. An interpretive history of the cholesterol controversy: part 1. Journal of Lipid Research, 45, 1583–1593.
Steinberg, D. (2005). Thematic review series. The pathogenesis of atherosclerosis. An interpretive history of the cholesterol controversy: part II. The early evidence linking hypercholesterolemia to coronary disease in humans. Journal of Lipid Research, 46, 179–190.
Strohman, R. C. (1993). Ancient genomes, wise bodies, unhealthy people: The limits of a genetic paradigm in biology and medicine. Perspectives in Biology and Medicine, 37, 112–145.
Takagi, S., Iwai, N., Yamauchi, R., Kojima, S., Yasuno, S., Baba, T., et al. (2002). Aldehyde dehydrogenase 2 gene is a risk factor for myocardial infarction in Japanese men. Hypertension Research, 25, 677–681.
Terwilliger, J. D., & Weiss, K. M. (2003). Confounding, ascertainment bias, and the blind quest for a genetic ''fountain of youth.'' Annals of Medicine, 35, 532–544.
Thomas, D. C., & Conti, D. V. (2004). Commentary on the concept of ''Mendelian randomization.'' International Journal of Epidemiology, 33, 17–21.
Thompson, W. D. (1991). Effect modification and the limits of biological inference from epidemiological data. Journal of Clinical Epidemiology, 44, 221–232.
Thun, M. J., Peto, R., Lopez, A. D., Monaco, J. H., Henley, S. J., Heath, C. W., et al. (1997). Alcohol consumption and mortality among middle-aged and elderly U.S. adults. New England Journal of Medicine, 337, 1705–1714.
Timpson, N. J., Lawlor, D. A., Harbord, R. M., Gaunt, T. R., Day, I. N. M., Palmer, L. J., et al. (2005). C-reactive protein and its role in metabolic syndrome: Mendelian randomization study. Lancet, 366, 1954–1959.
Timpson, N. J., Forouhi, N. H., Brion, M.-J., et al. (2010). Genetic variation at the SLC23A1 locus is associated with circulating levels of L-ascorbic acid (vitamin C): Evidence from 5 independent studies with over 15,000 participants. American Journal of Clinical Nutrition, advance online publication, doi: 10.3945/ajcn.2010.29438.
Tybjaerg-Hansen, A., Steffensen, R., Meinertz, H., Schnohr, P., & Nordestgaard, B. G. (1998). Association of mutations in the apolipoprotein B gene with hypercholesterolemia and the risk of ischemic heart disease. New England Journal of Medicine, 338, 1577–1584.
Verma, S., Szmitko, P. E., & Ridker, P. M. (2005). C-reactive protein comes of age. Nature Clinical Practice, 2, 29–36.
Waddington, C. H. (1942). Canalization of development and the inheritance of acquired characteristics. Nature, 150, 563–565.
Warren, K. R., & Li, T. K. (2005). Genetic polymorphisms: Impact on the risk of fetal alcohol spectrum disorders. Birth Defects Research A: Clinical and Molecular Teratology, 73, 195–203.
Weimer, R. E. (1973). Dissociation of phenotypic correlation: Response to posttrial etherization and to temporal distribution of practice trials. Behavior Genetics, 3, 379–386.
Weiss, K., & Terwilliger, J. (2000). How many diseases does it take to map a gene with SNPs? Nature Genetics, 26, 151–157.
West-Eberhard, M. J. (2003). Developmental plasticity and evolution. New York: Oxford University Press.
Wheatley, K., & Gray, R. (2004). Mendelian randomization—an update on its use to evaluate allogeneic stem cell transplantation in leukaemia [Commentary]. International Journal of Epidemiology, 33, 15–17.
Wilkins, A. S. (1997). Canalization: A molecular genetic perspective. BioEssays, 19, 257–262.
Williams, R. S., & Wagner, P. D. (2000). Transgenic animals in integrative biology: Approaches and interpretations of outcome. Journal of Applied Physiology, 88, 1119–1126.
Wolf, U. (1995). The genetic contribution to the phenotype. Human Genetics, 95, 127–148.
Wright, A. F., Carothers, A. D., & Campbell, H. (2002). Gene–environment interactions—the BioBank UK study. Pharmacogenomics Journal, 2, 75–82.
Wu, T., Dorn, J. P., Donahue, R. P., Sempos, C. T., & Trevisan, M. (2002). Associations of serum C-reactive protein with fasting insulin, glucose, and glycosylated hemoglobin: The Third National Health and Nutrition Examination Survey, 1988–1994. American Journal of Epidemiology, 155, 65–71.
Youngman, L. D., Keavney, B. D., Palmer, A., Parish, S., Clark, S., Danesh, J., et al. (2000). Plasma fibrinogen and fibrinogen genotypes in 4685 cases of myocardial infarction and in 6002 controls: Test of causality by ''Mendelian randomization.'' Circulation, 102(Suppl. II), 31–32.
Zuckerkandl, E., & Villet, R. (1988). Concentration—affinity equivalence in gene regulation: Convergence and environmental effects. Proceedings of the National Academy of Sciences USA, 85, 4784–4788.
10 Rare Variant Approaches to Understanding the Causes of Complex Neuropsychiatric Disorders
matthew w. state
The distinction between genetic variation that is present in more than 5% of the population (defined as common) and genetic variation that does not meet this threshold (defined as rare) is often lost in the discussion of psychiatric genetics. As a general proposition, the field has come to equate the hunt for common variants (or alleles) with the search for genes causing or contributing to psychiatric illness. Indeed, the majority of studies on mood disorders, autism, schizophrenia, obsessive–compulsive disorder, attention-deficit/hyperactivity disorder, and Tourette syndrome have restricted their analyses to the potential contribution of common alleles. Studies focusing on rare genetic mutations have, until quite recently, been viewed as outside the mainstream of efforts aimed at elucidating the biological substrates of serious psychopathology. Both the implicit assumption that common alleles underlie the lion's share of risk for most common neuropsychiatric conditions and the notion that the most expeditious way to elucidate their biological bases will be to concentrate efforts on common alleles deserve careful scrutiny. Indeed, key findings across all of human genetics, including those within psychiatry, support the following alternative conclusions: (1) for disorders such as autism and schizophrenia, the study of rare variants already holds the most immediate promise for defining the molecular and cellular mechanisms of disease (McClellan, Susser, & King, 2007; O'Roak & State, 2008); (2) common variation will be found to carry much more modest risks than previously anticipated (Altshuler & Daly, 2007; Saxena et al., 2007); and (3) rare variation will account for substantial risk for common complex disorders, particularly for neuropsychiatric conditions with relatively early onset and chronic course. This chapter addresses the rare variant genetic approach specifically with respect to mental illness. It first introduces the distinction between the key
characteristics of common and rare genetic variation. It then briefly addresses the methodologies employed to demonstrate a causal or contributory role for genes in complex disease, focusing on how these approaches differ in terms of the ability to detect and confirm the role of rare variation. The chapter will then turn to a consideration of the genetics of autism-spectrum disorders as a case study of the manner in which rare variants may contribute to the understanding of psychiatric genetics, and finally, the discussion will conclude with a consideration of the implications of emerging genomic technologies for this process.
Genetic Variation
The search for ''disease genes'' is more precisely the search for disease-related genetic variation. Basic instructions are coded in DNA to create and sustain life; these instructions vary somewhat between individuals, creating a primary source of human diversity. Variation in these instructions is also thought to be largely responsible for differences in susceptibility to diseases influenced by genes. Concretely, when individuals differ at the level of DNA, it is often with regard to the sequence of its four constituent parts, called ''nucleotides'' or ''bases,'' which make up the DNA code: adenine (A), guanine (G), cytosine (C), and thymine (T). Indeed, within the human genome, variations at individual nucleotides appear quite frequently (approximately 1 in every 1,000 bases) (International Human Genome Sequencing Consortium, 2004; Lander et al., 2001; McPherson et al., 2001). The vast majority of this variation is related to an individual's ethnic origin and has no overt consequence for human disease. However, it is not known at present what proportion of the observed differences between individuals either within or outside of regions of the genome that specify the production of proteins (through the process of transcription and translation) might confer subtle alterations in function. At present, while elegant and inventive approaches are being employed to address the question, particularly with regard to ''noncoding'' DNA (Noonan, 2009; Prabhakar et al., 2008), the consequences of sequence variations identified in these regions remain difficult to interpret. Consequently, while only 2% of the genome is ultimately translated into protein, it is this subset that is most readily understood with regard to its impact on a phenotype of interest (International Human Genome Sequencing Consortium, 2004; Lander et al., 2001; McPherson et al., 2001). The terminology applied to genetic variation may be somewhat confusing due to a number of redundant or loosely defined terms. While a threshold of 5% is often used as the cutoff for rare variation, many authors also
distinguish between these and very rare (<1%) alleles. Common variations, regardless of their impact on gene function, are often referred to as polymorphisms or alleles, but both terms are also at times applied to any change in the genome, regardless of its frequency, that does not appear to be deleterious to the function of the RNA or protein that it encodes. In the ensuing discussion we use the term polymorphism to refer to common variants. Some authors refer to rare variants and mutations synonymously. In the current discussion, mutation will refer to the subcategory of rare variation that is thought to cause or carry risk for disease. Several other terms warrant definition here: Common variations at a single nucleotide are typically referred to as single-nucleotide polymorphisms (SNPs). For example, if the sequence in a specific region of DNA is ACTCTCCT in most individuals, but in more than 5% of individuals the same region reads as ACTCTACT on at least one of a pair of chromosomes carrying this sequence, this would represent an SNP with the major allele being C and the minor allele being A. Moreover, the frequency of the ‘‘A’’ would be referred to as the minor allele frequency. SNPs are thought to be most often the consequence of a single error in replication of the DNA at some point in human history that has subsequently spread through the population. A second form of variation often used in genetic studies involves variable numbers of DNA repeats: These are short, repetitive sequences of nucleotides that are prone to instability during DNA replication. This type of variation is known as a short tandem repeat (STR). In this case, the sequence abbreviated GTACACAGT found on one chromosome in an individual might be found to be GTACACACACACACT on a second chromosome or in another individual. Interestingly, these types of repeats are so frequently prone to change that there are often multiple forms within a population, but they are not so changeable as to be likely to undergo expansion among closely related individuals. These properties, along with the ease of assaying STRs, make them highly suitable for tracing DNA inheritance from generation to generation, as will be discussed later. A third, much more recently appreciated type of variation is known as a copy number variant (CNV) (Iafrate et al., 2004; Redon et al., 2006; Sebat et al., 2004). In this case, the structure of chromosomes varies among individuals. For example, a deletion or duplication of DNA might be present on one chromosome in an individual but not in another. CNVs specifically refer to these types of changes that fall below the resolution of the light microscope. Chromosomal variations that exceed this threshold are now referred to as ‘‘gross’’ cytogenetic deletions, duplications, or rearrangements. The lower size bound of a CNV and how it is distinguished from a small insertion or deletion of DNA sequence (called an in/del) vary from author to author, but a common cutoff is 1,000 base pairs of DNA.
Over the last few years it has become clear that CNVs are distributed throughout the genome, populating even those regions that contain genes coding for RNAs and proteins. Previously, the finding of a loss of a coding region in an affected individual was taken as prima facie evidence for a causal relationship between the variation and the observed phenotype. However, as microarrays ushered in an era of much higher resolution analysis of the genome, it has become clear that this conventional wisdom reflected an implicit, incorrect assumption regarding the ''intactness'' of the genome. In fact, it is now clear that widespread copy number variation is seen among control populations (Redon et al., 2006; Sharp, Cheng, & Eichler, 2006), requiring more rigorous approaches to demonstrating a relationship between a structural change in the genome and a clinical outcome. Finally, an important distinction regarding the distribution of disorder-related variation is often made: If multiple variations in a single gene lead to or contribute to an outcome of interest, this is referred to as allelic heterogeneity. If variations in many different genes may lead to a single disease or syndrome, this is referred to as locus heterogeneity (a locus is simply a given region of the genome). Both are widely observed in human disease and have been invoked across nearly all complex disorders to help explain the difficulties that have been encountered in clarifying the genetic bases of disease (Botstein & Risch, 2003). Irrespective of whether a variant is introduced into the sequence or structure of the DNA, once it is present in the human genome, natural selection plays a defining role. Our current understanding of this process is certainly more nuanced than when first proposed, but the basic notions continue to serve well to understand the dynamics of variation: Changes that do not impact reproductive fitness in a negative fashion may be readily passed from generation to generation and, over time, have the potential to become common. Alternatively, changes that result in impaired fitness are subject to negative or purifying selection, decreasing the frequency of that allele in the population. Of course, the impact on fitness is only one of several forces that dictate the population frequency of a specific genetic variant. A variant newly introduced into the population would most likely be rare, regardless of its functional consequences, given the large number of possible positions for this change and the lack of time for it to be distributed through the population. Moreover, the history of specific ethnic groups, including migration patterns and social norms, can significantly influence the dynamics of particular genetic variants over time. With regard to disease risk, it has nonetheless proven quite instructive to think about the distinction between advantageous or neutral and deleterious variants. Based on these classical notions, alleles contributing to early-onset disorders that reduce fertility or indirectly lead to decreased reproductive
fitness would be expected over time to be driven down in frequency in the population and become rare. Of course, given the caveats presented in the previous paragraph, it is important to recall that while deleterious variation is likely to be rare, rare variation is not uniformly deleterious. Alleles related to diseases of later life, those that do not have a negative impact on fitness or which carry small deleterious effects, those in which a positive effect of a given variation counterbalances a negative consequence (balancing selection), and any variation that is physically quite close to an allele that is advantageous (a ‘‘hitchhiking’’ allele) are some of the mechanisms that would be expected to lead to common genetic variants. The relative contribution of rare and common variation to psychiatric illness is a matter of considerable theoretical and practical significance.
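The frequency-based vocabulary introduced in this section can be made concrete with the short sketch below, which tallies the two alleles at a single biallelic site and applies the 5% and 1% cutoffs described above. It is purely illustrative: the genotype counts are invented, and the function names are not drawn from the chapter or from any published tool.

def minor_allele_frequency(n_AA, n_Aa, n_aa):
    """Return (minor_allele, frequency) for a biallelic site.

    n_AA, n_Aa, n_aa are counts of people carrying 0, 1, or 2 copies of
    the 'a' allele; each person contributes two chromosomes.
    """
    total_chromosomes = 2 * (n_AA + n_Aa + n_aa)
    count_a = n_Aa + 2 * n_aa              # copies of 'a' observed
    count_A = total_chromosomes - count_a  # copies of 'A' observed
    if count_a <= count_A:
        return "a", count_a / total_chromosomes
    return "A", count_A / total_chromosomes


def classify(maf):
    """Apply the frequency cutoffs used in this chapter (5% and 1%)."""
    if maf >= 0.05:
        return "common (polymorphism)"
    if maf >= 0.01:
        return "rare"
    return "very rare"


if __name__ == "__main__":
    # Invented counts: 840 A/A, 150 A/a, and 10 a/a individuals.
    allele, maf = minor_allele_frequency(840, 150, 10)
    print(f"minor allele = {allele}, MAF = {maf:.3f}, {classify(maf)}")

With these counts the minor allele frequency is 0.085, so the variant would fall on the common side of the 5% threshold.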
Demonstrating the Role of Common and Rare Variations in Human Disease Risk
The field of human genetics has a record of tremendous accomplishment in those disorders for which a single gene directly causes or dramatically increases the risk for a disease or syndrome (Botstein & Risch, 2003). As noted, the task of gene discovery in disorders in which many genes and many variations may potentially play a causal or contributory role has remained challenging. In the face of continued uncertainty, two alternative (but not mutually exclusive) paradigms have emerged: a common variant:common disease model (Chakravarti, 1999) and a rare variant:common disease hypothesis (Ji et al., 2008; Pritchard, 2001). Until quite recently, the former has been favored in the study of neuropsychiatric disorders. For instance, it has been widely held that syndromes such as schizophrenia, depression, bipolar disorder, and autism result from the combined effect of multiple common genetic variants, each carrying modest effects and interacting with environmental factors. This hypothesis has garnered favor for a variety of reasons. First, for these and other complex disorders, both modeling and early experimental evidence have essentially ruled out the idea of a single gene of major effect explaining a substantial portion of the disease/syndrome in the population (Risch, 1990). Second, in many of these conditions, extended family members can be found to show signs of subtle, subclinical or ''component'' phenotypes, suggesting they are nearing but have not reached a liability threshold (Constantino et al., 2006; Constantino & Todd, 2005). Third, most of the genetic variation carried within a population is in the form of common variants (International Human Genome Sequencing Consortium, 2004; Lander et al., 2001; McPherson et al., 2001), and it has been hypothesized that common
disorders will likely reflect this architecture. Finally, as many disorders of interest are both relatively common and found throughout the world, the notion that common polymorphisms shared across ethnic groups will be identified as risk factors is intuitively attractive. An alternative model, the rare variant:common disease hypothesis, has long been proposed and has gained particular traction over the past several years due to several convergent factors: (1) recent very strong rare variant findings from studies of CNVs in autism and schizophrenia, (2) an appraisal of a wave of successful common variant investigations across all of medicine, and (3) the emergence of new genomic technologies that are making large-scale genomewide rare variant studies feasible. The rare variant:common disease hypothesis posits that common disorders may be the result of multiple individually rare variants that contribute either alone or in combination to common phenotypes. This notion is particularly attractive for disorders with early onset and those that theoretically alter reproductive fitness. As noted for such conditions, in the absence of balancing selection, one would expect that, on average, alleles with appreciable effects would be driven down in frequency by natural selection. In addition, one would expect a significant contribution from individually rare mutations for disorders in which so-called sporadic or de novo variation plays an important role. Of course, the contributions of common and rare variants are not mutually exclusive; it is quite possible, and indeed likely, that both will be found to contribute to many common neuropsychiatric conditions (McClellan et al., 2007; Veenstra-Vanderweele, Christian, & Cook, 2004; Veenstra-VanderWeele & Cook, 2004). However, the distinction is of tremendous importance with respect to current genetic studies, in part because the approaches employed to identify a causal or contributory role for genes in neuropsychiatric disorders can differ, sometimes dramatically, in terms of their ability to detect rare versus common variation. This issue will be addressed specifically with regard to three major strategies for gene identification: linkage, association, and cytogenetic/CNV analysis.
Linkage Studies
In linkage analysis, one seeks to determine if the transmission of any chromosomal segment from one generation to another within a family or families coincides with the presence of the phenotype of interest. If every chromosome (or, in practice, every autosome) is evaluated simultaneously, the study is referred to as a genomewide linkage scan. This process of tracing inheritance relies fundamentally on genetic variation. If every chromosome were identical, it would be impossible to observe
the process of transfer from grandparent to parent to child. As one is not able feasibly (quite yet) to read the entire sequence of DNA, genetic variants such as SNPs or STRs are used to mark the difference between chromosomes. These genetic ‘‘markers’’ then are evaluated for intergenerational transmission. Broadly, linkage studies evaluate the probability that a given phenotype and a particular genetic marker, or series of markers, are transmitted together from one generation to another. To appreciate how this process might lead to the identification of a disease gene, one needs to refer back to basic genetic principles. When egg and sperm are formed, homologous chromosomes from each of the 22 pairs of autosomes exchange large blocks of information through a process known as homologous recombination and crossing over. As a consequence, each of the chromosomes in the haploid gamete is, on average, a mixture of the two parental chromosomes. During this process of gamete formation, the likelihood that any two points on a parental chromosome will have a crossover event between them is a function of how far apart they are physically on that chromosome: Loci at opposite ends will be likely to be separated by a crossover. Regions that are close to each other will be less likely to have a crossover between them and will tend to pass together through multiple generations. Thus, the closer a particular marker is to a mutation causing the disorder, the more likely the marker and the mutation will be to travel together in every generation. For a disorder in which a single gene or genetic variation is being transmitted within a family resulting in an identifiable clinical outcome, narrowing the region containing an offending genetic variant is relatively straightforward, given a sufficient number of families, markers, and transmissions. It is worthwhile to stress that this type of analysis is initially aimed at identifying a location (the piece of the ancestral chromosome) that carries both the marker and the mutation. The subsequent process of identifying the mutation in the involved gene within a linkage interval is typically referred to as fine mapping. Two basic approaches to linkage analysis predominate in the hunt for human disease genes. The first is known as parametric linkage: A hypothesis about the nature of the proposed genetic transmission is developed, related parameters are specified based on this hypothesis (e.g., whether the disorder is dominantly inherited, recessively inherited, or sex-linked), and one investigates the actual pattern of transmission among subjects. The odds of seeing the given pattern of transmission are determined based on the hypothesis that the disease and marker(s) are very close to each other, and this is compared to the odds of observing the same pattern of transmission in the absence of this linkage (the null hypothesis). The ratio of observed versus expected odds is most often then transformed into a log10 scale and expressed as the logarithm of the odds, or lod, score.
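Written out for the simplest, phase-known situation (this is the conventional definition of the statistic, which the chapter describes but does not give as a formula), the lod score for a recombination fraction $\theta$ when $k$ recombinants are observed among $n$ informative meioses is

\[
\mathrm{LOD}(\theta) \;=\; \log_{10}\!\left[\frac{\theta^{\,k}\,(1-\theta)^{\,n-k}}{(1/2)^{\,n}}\right],
\qquad 0 \le \theta \le \tfrac{1}{2},
\]

and the maximized value over $\theta$ is what is usually reported. As a quick check on the scale, 10 fully informative meioses with no recombinants give $\mathrm{LOD}(0) = 10\log_{10}2 \approx 3.0$, that is, odds of roughly 1,000:1 in favor of linkage, which is the order of magnitude of the genomewide thresholds discussed below.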
Parametric linkage analysis is a powerful approach to investigating rare Mendelian disorders, as demonstrated by the current accumulation of more than 1,200 genes identified for such conditions (http://www.ncbi.nlm.nih.gov/omim/). In those instances in which this approach to linkage has been used with respect to complex neuropsychiatric disorders, it is typically the result of the identification of a rare family or families demonstrating inheritance that is simpler than what is presumed to be the norm for the overall condition (Brzustowicz, Hodgkinson, Chow, Honer, & Bassett, 2000; Laumonnier et al., 2004; Strauss et al., 2006). The reasons for this include the fact that such analyses can be quite sensitive to misspecification of parameters and are limited in their ability to tolerate locus heterogeneity within and across families or bilineal inheritance (risks coming from both maternal and paternal lineages). Given a clear consensus that no common neuropsychiatric condition will be accounted for solely by a single rare genetic variation, many researchers, particularly those interested in common alleles, began to favor an alternative approach known as nonparametric linkage. This approach does not require the specification of a hypothesis regarding the mode or character of inheritance. Instead, one seeks to identify any region of the genome that is shared among affected related individuals (or not shared in affected or unaffected relative pairs) more often than would be expected by chance; a simple numerical sketch of this allele-sharing logic is given at the end of this section. This method does not require all of the identified families to carry the same causal or contributory genetic variation among them. Consequently, like parametric linkage, such studies are tolerant to allelic heterogeneity and theoretically able to identify both common and rare variants. However, they are not as well suited as association studies (discussed later, see Genetic Association) to accommodate a substantial amount of locus heterogeneity, as the sample sizes necessary to identify many loci simultaneously, particularly those with modest effects, can be impractically large (Risch & Merikangas, 1996). Like parametric linkage, nonparametric analyses often use the lod score to measure the statistical significance of the result. While there is ongoing debate over the precise criteria to declare a result significant, the most commonly used thresholds involve a genomewide lod score of 3.0 for parametric studies and 3.6 for nonparametric studies (Lander & Kruglyak, 1995). Of course, statistical significance may be strong evidence for a genotype–phenotype relationship, but it is not generally considered definitive. For both parametric and nonparametric approaches, either replication in an independent sample or, more importantly, the identification of the offending deleterious mutation(s) is taken as substantiating evidence. This has proven easier in practice in the case of parametric versus nonparametric analyses, likely owing both to its use in the study of simpler Mendelian disorders (in which the relationship between genotype and phenotype is highly reliable and often
nearly 1:1) as well as to the fact that fine mapping is more readily accomplished in parametric versus nonparametric studies. Biological studies of the implicated gene in vitro and in vivo, including modeling the identified human mutations, are another highly desirable avenue for developing convergent evidence to support a linkage finding. The practical reality is that in neuropsychiatric disorders the relevant tissue is most often not accessible for direct study in humans, rendering model systems particularly attractive. Nonetheless, it is important to recall that there are critical differences between the human and the rodent brain (or fly or worm) and that these differences may be particularly relevant to the domains of function that are of most interest in neuropsychiatric disorders. On the one hand, the demonstration of a ''neural'' phenotype in an animal carrying a human mutation or allele may be instructive and often the first step toward the identification of the relevant highly conserved molecular pathways across species. However, there have been many instances in which knockouts of genes recapitulating clearly causal Mendelian mutations in neurodevelopmental syndromes have not resulted in phenotypes resembling those found in humans, suggesting the need for some caution in the interpretation of model systems data.
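One way to see what ''shared more often than would be expected by chance'' means in the nonparametric analyses described above is the affected sib-pair means test sketched below. Under no linkage, an affected pair shares 0, 1, or 2 marker alleles identical by descent (IBD) with probabilities 1/4, 1/2, and 1/4, so the expected sharing proportion is 0.5, and excess sharing at a marker is evidence for linkage. The counts are invented, and the function is a schematic of this one allele-sharing statistic rather than a description of any particular analysis package.

from math import sqrt, erfc


def sib_pair_means_test(n_ibd0, n_ibd1, n_ibd2):
    """One-sided test of excess IBD sharing among affected sib pairs.

    n_ibd0, n_ibd1, n_ibd2 are the numbers of pairs sharing 0, 1, or 2
    alleles IBD at the marker.
    """
    n_pairs = n_ibd0 + n_ibd1 + n_ibd2
    mean_share = (0.5 * n_ibd1 + 1.0 * n_ibd2) / n_pairs
    # Under the null, per-pair sharing has mean 0.5 and variance 1/8.
    z = (mean_share - 0.5) / sqrt(0.125 / n_pairs)
    p_one_sided = 0.5 * erfc(z / sqrt(2.0))
    return mean_share, z, p_one_sided


if __name__ == "__main__":
    # Invented data: 100 affected sib pairs, of which 15 share 0 alleles,
    # 50 share 1, and 35 share 2 at the marker of interest.
    mean_share, z, p = sib_pair_means_test(15, 50, 35)
    print(f"mean IBD sharing = {mean_share:.2f}, z = {z:.2f}, one-sided p = {p:.4f}")

With the counts shown, mean sharing is 0.60 and the one-sided p-value is roughly 0.002; in a genomewide scan such evidence would still be summarized on the lod scale and judged against the nonparametric threshold noted above.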
Genetic Association
In contrast to linkage studies, association methodologies are ''cross-sectional'' as they investigate variation across populations as opposed to studying genetic transmissions within families. In essence, the methodology relies on a classic case–control design: Genetic variants are identified as the ''exposure,'' and the allele frequency is compared in affected and unaffected individuals. It is important to mention here that while case–control analysis has become the most widely used association strategy of late, there are variations on this theme, called transmission tests, that rely on a combination of linkage and association and evaluate parent–child trios. These approaches have also been quite popular, particularly with regard to pediatric disorders. Until recently, genetic association studies could feasibly investigate only one or a small number of known, common genetic polymorphisms in or near an identified gene(s) of interest. Relative to nonparametric linkage analyses, the approach is theoretically better able to detect small increments of risk; but, importantly, it is not able in practice to detect rare variants contributing at a particular locus. Given the popularity of common variant candidate gene association studies across all of medicine and particularly in psychiatric genetics, this distinction is quite important: It is not uncommon for either positive or negative results to be reported with respect to a gene suggesting that it is or is not associated with a disorder. In fact, the
methodology in practice tests only for the contribution of common variation at that locus and not the gene itself. In addition to the tolerance to locus heterogeneity, common variant association studies have been extremely popular for a variety of reasons, including their logistical ease. The evaluation of known polymorphisms is relatively inexpensive, and the ascertainment of unrelated cases and controls, or even family-based trios in childhood disorders, is far easier than identifying and recruiting either affected siblings or large, multiply affected families. These facts, coupled with the general conviction that common polymorphisms account for the majority of risk in the disorders of interest, understandably have made so-called candidate gene association studies historically the most frequently attempted type of human genetic investigation. Widespread experience with this approach has led to an appreciation of some of its drawbacks as well, beyond that related to its inability to detect the contribution of rare variants. Perhaps the most pressing is the observation that the vast majority of positive findings from these studies in the medical literature have not been replicated (Hirschhorn & Altshuler, 2002; Hirschhorn, Lohmueller, Byrne, & Hirschhorn, 2002). Among the various reasons for this are (1) the potential to misinterpret genetic variation related to ethnic differences in cases and controls as disorder-related polymorphisms, (2) sample sizes that have until quite recently been inadequate to reliably identify the small differences in risk attributable to alleles contributing to complex disorders, and (3) preferential publication of positive results among an ample group of underpowered studies. It is also likely that the requirement that investigators choose a small number of candidate genes and markers to evaluate in any given study has contributed to difficulties in identifying true-positive associations. A complete evaluation of these limitations is beyond the scope of this discussion; it is sufficient to note that common variant association studies are able to detect only the contribution of variants that are being tested directly, typically restricted to alleles of 5% population frequency or greater, and those that are nearby, again sharing this minimum frequency (Zondervan & Cardon, 2004). Until the early 2000s, a sufficient number of known SNPs was not available, and it was impractical to test simultaneously the number of markers that would be required to provide information regarding all genes or most genes in the genome (International HapMap Consortium, 2005; Daly, Rioux, Schaffner, Hudson, & Lander, 2001; Gabriel et al., 2002). However, recently, this calculation has been transformed: first, through the identification of millions of SNPs via the sequencing of the human genome and, second, through the development of microarray technologies that allow for hundreds of thousands to millions of SNPs to be evaluated in a single patient in a single low-cost experiment. As a result, genomewide association
studies (GWASs) have become the gold standard for common variant discovery in complex disorders (Hirschhorn & Daly, 2005). These investigations take advantage of SNPs spaced across the entire genome to conduct association studies without the requirement of an a priori hypothesis regarding a specific gene or genes. This technological advance, in conjunction with a now sufficiently large collection of patient samples, has led to a spate of studies that have begun to confirm the clear contribution of common alleles to common disease (Bilguvar et al., 2008; Hakonarson et al., 2007; Saxena et al., 2007; Scott et al., 2007; Zeggini & McCarthy, 2007). Several aspects of these recent findings deserve comment here. First, it is remarkable to begin to see concrete evidence for the common allele:common disease hypothesis after years of uncertainty. It is notable, however, that the scale of the effect of individual alleles identified in recent studies has been extraordinarily modest, explaining why very large sample sizes have been required to clarify contradictory results (Altshuler & Daly, 2007). In neuropsychiatric genetics specifically, a great deal of effort has been expended trying to understand inconsistent common variant association findings. These recent investigations suggest that the simplest answer may suffice: When the sample size is sufficiently large and genomewide association is employed, reproducible results will emerge if common variants play a role (Psychiatric GWAS Consortium Steering Committee, 2009; Ma et al., 2009; McMahon et al., 2010; Weiss, Arking, Daly, & Chakravarti, 2009). Similarly, the total amount of individual variation in disease risk accounted for by the identified common alleles has been surprisingly modest. This underscores the fact that the contribution of rare variation might help to explain a larger amount of risk in complex disorders than previously anticipated. Whether a candidate gene study or GWAS, typically the first evidence for a probabilistic relationship between a variation and a clinical phenomenon of interest involves surpassing a preordained statistical threshold. In candidate gene common variant association, there is not yet complete agreement on this issue, including regarding how to appropriately correct for multiple comparisons. The difficulties that have attended replication of studies using this approach have now led to the general expectation that some type of internal replication of association will be attempted prior to publication. Of course, replication in an independent laboratory in a separate sample remains the gold standard. In addition, as either common variant methodology may detect association of an allele that is near, but not directly contained within, the tested set of alleles, the identification of the ‘‘functional’’ variant within the association interval is generally considered strong supporting evidence (State, 2006). As noted, associated variants identified in regulatory and other noncoding sequences mapping very far from known coding regions can pose significant challenges in this regard. Finally, while statistical thresholds
for candidate gene studies remain a matter of debate, there has emerged something of a consensus regarding appropriate thresholds for GWAS analyses that are quite stringent and seem to contribute to the markedly improved reliability of this approach compared to prior generations of common variant association analysis. Both GWAS and most candidate gene studies assay known alleles with a preordained minor allele frequency, typically restricting the analysis to common variants. In contrast, a mutation burden approach applies association strategies to rare variants. This is critically important if one desires to test the hypothesis that rare variants may contribute broadly in the population to the occurrence of a common complex disorder or phenotype as opposed to explaining Mendelian inheritance within one or a small number of families. Establishing a population association of rare alleles may in practice be quite challenging. Taken individually, rare and especially very rare alleles at a given locus would require sample sizes that could not practically be reached to achieve a statistically significant result. An alternative method of addressing risk assesses the total amount of rare variation present within a gene or genes of interest in cases versus controls. The identification of such rare variations, apart from CNVs (which will be described later, see Cytogenetics and CNV Detection), requires that individual genes be comprehensively evaluated using either direct sequencing or a multistep mutation-screening process that identifies sequences containing possible variations followed by confirmation via sequencing (Abelson et al., 2005). While intuitively attractive, until quite recently, technological realities have placed significant limits on mutation burden approaches; detection of previously unknown rare variants, even via mutation screening, has been many times more expensive than genotyping of known variants. Consequently, in practice, the method has been applied only to candidate genes. Nonetheless, several notable investigations have highlighted the value of these investigations. For example, Helen Hobbs and colleagues have convincingly demonstrated that mutations in genes known to be involved in rare forms of hypocholesterolemia are present in the general population and contribute to the overall variation in high-density lipoprotein levels (Cohen et al., 2004). Similarly, recent work from Richard Lifton’s lab has shown a significant contribution of rare alleles in genes responsible for rare syndromic forms of hypotension to blood pressure variation in the general population (Ji et al., 2008). Moreover, technological advances are promising to vastly expand the application of these types of approaches. Within the past year, so-called next-generation sequencing has made the evaluation of all coding segments of the genome a practical reality (Choi et al., 2009; Ng et al., 2009); and within a relatively short time frame, whole-genome sequencing promises to become commonplace.
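Before turning to the practical challenges of burden analysis, it may help to make the basic statistical machinery of common variant association concrete. Whether in a candidate gene study or a GWAS, the elementary test at a single marker is a comparison of allele counts in cases and controls, judged against a threshold corrected for the number of tests performed. The sketch below uses invented allele counts and the commonly cited genomewide significance level of 5e-8 (roughly a Bonferroni correction for one million independent common variant tests); it illustrates the logic only and does not reproduce any study discussed in this chapter. Python with scipy is assumed.

```python
# Allelic case-control association at a single SNP (all counts are hypothetical).
from scipy.stats import chi2_contingency

# Allele counts per group, [minor, major]; 1,000 cases and 1,000 controls
# each contribute 2,000 chromosomes.
case_alleles = [620, 1380]      # minor allele frequency 0.31 in cases
control_alleles = [520, 1480]   # minor allele frequency 0.26 in controls

chi2, p_value, dof, expected = chi2_contingency(
    [case_alleles, control_alleles], correction=False)
odds_ratio = (case_alleles[0] * control_alleles[1]) / (case_alleles[1] * control_alleles[0])

GENOMEWIDE_ALPHA = 5e-8  # widely used genomewide significance threshold
print(f"OR = {odds_ratio:.2f}, p = {p_value:.1e}, "
      f"genomewide significant: {p_value < GENOMEWIDE_ALPHA}")
```

With these made-up counts the test returns an odds ratio of roughly 1.3 and a p value of roughly 5e-4: nominally impressive, yet orders of magnitude short of genomewide significance, which is one way of seeing why underpowered candidate gene studies generated so many unreplicated positive reports.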
Irrespective of the methods to detect rare variation, it is worth noting that mutation burden analysis poses its own challenges with regard to demonstrating risk. For example, it may be difficult to identify the type of variation that is truly of interest; namely, variations that result in a functional alteration of a gene. This is more easily identified among nonsynonymous or nonsense mutations (i.e., nucleotide changes that would be predicted to result in the substitution of one amino acid for another or the introduction of a stop codon into a protein). However, even among this limited set of variation, distinguishing functional from neutral mutations may be quite challenging. While there are several widely used software programs available to predict deleterious changes in coding sequence, in practice these have not proven to be highly reliable. Given the relatively small numerator expected in rare variant discovery studies, the introduction of even a small number of alleles that cannot be appropriately classified is a potentially critical confound (Ji et al., 2008). Of note, the approach so far successfully adopted by the Hobbs and Lifton groups has been to study genes that are already known to contribute to rare recessive syndromes and for which specific functional alleles have been definitively identified. These functional alleles may serve as a touchstone for the analysis of heterozygous mutations in the same genes. This approach seems to have mitigated some of the liability that has attended the selection of candidate genes in common variant studies. Certainly, at present, mutation burden studies will be most effectively applied when the function of the gene being examined is well known and biological assays exist to evaluate the consequences of the identified rare variants.
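Once a set of qualifying rare variants has been defined, the burden comparison itself is statistically simple: count how many cases and how many controls carry at least one qualifying allele in the gene of interest and compare the two proportions. The sketch below uses hypothetical carrier counts and ignores covariates, relatedness, and variant weighting, all of which matter in real analyses; it is meant only to show why the small numerator discussed above is so consequential.

```python
# Gene-level rare-variant burden comparison (hypothetical carrier counts).
from scipy.stats import fisher_exact

n_cases, n_controls = 1000, 1000
carriers_cases, carriers_controls = 18, 4   # carry >= 1 rare, putatively functional allele

table = [[carriers_cases, n_cases - carriers_cases],
         [carriers_controls, n_controls - carriers_controls]]
odds_ratio, p_value = fisher_exact(table)
print(f"carrier frequency: cases {carriers_cases / n_cases:.3f}, "
      f"controls {carriers_controls / n_controls:.3f}")
print(f"odds ratio = {odds_ratio:.1f}, Fisher exact p = {p_value:.4f}")

# Misclassifying even a few neutral alleles as functional (or vice versa) shifts
# these small counts enough to change the conclusion; 14 versus 8 carriers, for
# example, would no longer be convincing evidence of excess burden.
```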
Cytogenetics and CNV Detection For 40 years, geneticists have been leveraging the discovery of gross microscopic chromosomal abnormalities to identify disease genes. This approach led to some of the most important initial findings in the field. For example, in the late 1960s, abnormal constriction on the X chromosome was observed in boys with mental retardation (Lubs, 1969; Sutherland, 1977), leading to the discovery of fragile X syndrome and ultimately to cloning of the fragile X mental retardation protein (Fu et al., 1991; Verkerk et al., 1991). As the technology to examine chromosomes has advanced, so too has the power of these approaches. Molecular methods including fluorescence in situ hybridization now readily allow for the identification of the precise location of chromosomal disruption caused by a balanced translocation or inversion. Consequently, a strategy of cloning genes at breakpoints has been applied to the study of a variety of disorders including mental retardation, autism, Tourette syndrome, and schizophrenia (Abelson et al., 2005; Dave & Sanger, 2007;
Millar et al., 2000; Vorstman et al., 2006). Typically in this approach, the mapping of a translocation, chromosomal inversion, or deletion is used as a means of identifying a candidate gene(s), which is then further studied for rare variants in patients without known chromosomal abnormalities (Abelson et al., 2005; Jamain et al., 2003). The most recent advances in cytogenetics have been particularly fascinating. As previously noted, new technologies have recently led to the discovery that submicroscopic variations in chromosomal structure are widespread throughout the genomes of normal individuals. Of note, such CNVs can be detected using microarrays, including those designed for SNP genotyping, which currently have a resolution of as small as several hundred bases. One unexpected consequence of CNV detection has been to cast doubt on causal inferences associated with previous cytogenetic investigations. As noted, prior to the discovery of CNVs (particularly their presence in coding regions of the genome), it was largely presumed that a rearrangement or loss of genetic material disrupting gene structure was the likely cause of an observed phenotype. It is now clear that rearrangements may often physically disrupt genes without overtly negative consequences. Conversely, structural derangements that do not map to coding regions of the genome have been known for some time to have deleterious potential (Kleinjan & van Heyningen, 1998; State et al., 2003). A final important observation about copy number detection is that it was the first practical technique that was able to identify both common and rare changes in chromosomal structure at high resolution on a genomewide scale. The implications of this technological advance are discussed in more detail in the following section.
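The logic of array-based CNV detection can be conveyed with a toy calculation. Each probe yields a log2 ratio of sample to reference intensity that hovers near 0 for two copies, near -1 for a one-copy deletion, and near +0.58 for a three-copy duplication; a run of consecutive probes whose smoothed ratio departs from zero is called as a CNV. The sketch below uses simulated data and a crude moving-average rule; production callers rely on hidden Markov models or circular binary segmentation and also exploit SNP allele intensities, so this is an illustration of the principle rather than of any actual algorithm.

```python
# Toy CNV caller: flag runs of probes whose smoothed log2 ratio departs from zero.
import numpy as np

rng = np.random.default_rng(0)
n_probes = 500
log2_ratio = rng.normal(0.0, 0.15, n_probes)   # diploid baseline plus noise
log2_ratio[200:240] -= 1.0                     # simulated heterozygous deletion
log2_ratio[400:420] += 0.58                    # simulated duplication

window = 10
smoothed = np.convolve(log2_ratio, np.ones(window) / window, mode="same")
state = np.where(smoothed < -0.5, -1, np.where(smoothed > 0.3, 1, 0))

# Collapse consecutive probes in the same non-diploid state into segments.
calls, current, start = [], 0, 0
for i, s in enumerate(state):
    if s != current:
        if current != 0:
            calls.append(("deletion" if current == -1 else "duplication", start, i - 1))
        current, start = int(s), i
if current != 0:
    calls.append(("deletion" if current == -1 else "duplication", start, n_probes - 1))

print(calls)   # segment boundaries are approximate because of the smoothing window
```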
Case Study: Rare Variants and Autism-Spectrum Disorders Autism is a pervasive developmental disorder characterized by findings in three broad domains: delayed and deviant language development, fundamentally impaired social communication, and highly restricted interests or stereotypies. A range of related phenotypes including Asperger syndrome and pervasive developmental disorder (not otherwise specified) comprise a consensus category of autism-spectrum disorders (ASDs). In the Diagnostic and Statistical Manual, fourth edition (DSM-IV), autism is contained within the section on pervasive developmental disorders, which includes (in addition to the syndromes mentioned) Rett syndrome, a rare developmental disorder largely confined to girls, and childhood disintegrative disorder (also sometimes called ‘‘Heller syndrome’’), in which an extended period of normal
development is followed by significant regression in developmental milestones, leading to an autism phenotype. This syndrome serves well as a case study of the potential value of rare variant approaches to psychiatric genetics. Extensive genetic investigations have already been undertaken in this area, an effort that is perhaps second only to that given to schizophrenia in terms of the scale of the investment. Moreover, as will be argued, ASD is in some sense a paradigmatic developmental neuropsychiatric condition, particularly with respect to conceptualizing the role of rare variant studies. Several decades of study have led to some clear consensus regarding the genetics of autism: The concordance rate is far higher in monozygotic than dizygotic twins (Bailey et al., 1995), suggesting a high degree of heritability. Moreover, as with all common neuropsychiatric syndromes, twin and family studies demonstrate that neither a substantial portion of classically defined autism nor other syndromes in the autism spectrum can be explained by variation within a single gene transmitted in Mendelian fashion (Gupta & State, 2007; O’Roak & State, 2008). Despite this general consensus, the underlying allelic architecture of autism remains uncertain. The vast majority of studies over the past decade have focused on the potential contribution of common variants based on a general consensus that autism is likely to be accounted for by the common variant:common disease model. Despite this conviction, the significant contribution of specific common alleles has only recently been suggested with any degree of confidence (Alarcon et al., 2008; Arking et al., 2008; Campbell et al., 2006; Ma et al., 2009; Weiss et al., 2009); and similar to other medical conditions, the overall individual risk attributable to these alleles is exceedingly modest. Indeed, while recent studies have begun to confirm the contribution of common alleles to ASDs, there is no other neuropsychiatric disorder for which there is stronger evidence supporting the importance of rare variants. In practice, there have been two ways in which rare variant approaches have already made important contributions to the field. The first involves the use of a so-called outlier strategy in which unusual families or rare individuals lead to the identification of rare disease-related alleles that illuminate the pathophysiology of an ASD. The second involves recent evidence that a rare variant:common disease model may apply, suggesting that a substantial risk is carried in the population of affected individuals in the form of rare alleles.
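The inference from twin concordance to heritability can be made concrete with a liability-threshold calculation: assume a normally distributed liability with a threshold set by the population prevalence, find the twin-pair liability correlation that reproduces the observed probandwise concordance, and take roughly twice the difference between the monozygotic and dizygotic correlations as the additive genetic share (Falconer's approximation). The figures below are illustrative rather than the estimates of Bailey et al. (1995), and formal twin analyses use maximum-likelihood structural models rather than this back-of-the-envelope version.

```python
# Liability-threshold sketch: twin concordances -> rough heritability estimate.
from scipy.stats import norm, multivariate_normal
from scipy.optimize import brentq

K = 0.01                    # assumed population prevalence (illustrative)
t = norm.ppf(1 - K)         # liability threshold

def concordance(r):
    """Probandwise concordance implied by a twin liability correlation r."""
    p_both = 1 - 2 * norm.cdf(t) + multivariate_normal.cdf(
        [t, t], mean=[0, 0], cov=[[1, r], [r, 1]])
    return p_both / K

def correlation_from_concordance(c):
    return brentq(lambda r: concordance(r) - c, 1e-6, 0.999)

r_mz = correlation_from_concordance(0.60)   # hypothetical MZ concordance
r_dz = correlation_from_concordance(0.10)   # hypothetical DZ concordance
h2 = 2 * (r_mz - r_dz)                      # Falconer's approximation, liability scale
print(f"r_MZ = {r_mz:.2f}, r_DZ = {r_dz:.2f}, approximate h2 = {h2:.2f}")
```

With inputs of this kind the monozygotic liability correlation comes out far higher than the dizygotic one, which is the quantitative content of the statement that a large MZ-DZ concordance gap implies high heritability.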
Outlier Approaches In 2004, the identification of several cases of affected females with deletions on the X chromosome led Jamain and colleagues (2003) to evaluate genes in
the region of these deletions for rare variants among nearly 200 individuals with ASDs. The authors found a single clearly deleterious mutation in the gene Neuroligin 4 in one family with two affected males. In a second family, a rare variant was also found in Neuroligin 3, a closely related molecule on the X chromosome. This variant was not as unequivocally damaging to protein function but has subsequently been shown to influence synaptic activity in mice (Tabuchi et al., 2007). Shortly after the initial report regarding NLGNs and ASDs, a separate research group used parametric linkage in an extended family with mental retardation and ASD to identify the same X-chromosome interval for Neuroligin 4 (Laumonnier et al., 2004). Fine mapping of this region showed a unique, highly deleterious mutation in NLGN4 present in every affected family member, consistent with Mendelian expectations. These two findings represented the first identification of a functional mutation in cases of idiopathic autism (i.e., autism not accompanied by some other evidence of genetic syndrome) and the first convincing independent replication of a genetic finding in ASDs. Neuroligins are postsynaptic transmembrane neuronal adhesion molecules that interact with neurexins, which are present on the presynaptic terminal (Lise & El-Husseini, 2006). Subsequent studies have confirmed that the mutations identified in NLGN4 in humans with ASDs lead to abnormalities in the specification of excitatory glutamatergic synapses in vitro as well as to synaptic maturation defects in mice (Chih, Afridi, Clark, & Scheiffele, 2004; Chih, Engelman, & Scheiffele, 2005; Chih, Gollan, & Scheiffele, 2006; Varoqueaux et al., 2006), providing important convergent support for the initial finding. While additional mutation screenings of individuals with ASDs have not led to the characterization of further clearly functional variants in NLGN4 (Blasi et al., 2006), several recent studies have provided strong additional evidence for the importance of this finding through the identification of rare mutations among affected individuals in molecules that interact directly or indirectly with the NLGN4 protein. These include SHANK3 (Durand et al., 2007; Moessner et al., 2007) and Neurexin-1 (Kim et al., 2008; Marshall et al., 2008; Szatmari et al., 2007). Another notable rare variant finding reported in the New England Journal of Medicine (Strauss et al., 2006) used parametric linkage analysis to identify a rare homozygous mutation in the gene contactin-associated protein 2 (CNTNAP2) among the Old Order Amish population that led to intractable seizure, autism, and mental retardation. The study was notable both for the statistical power due to the inbred nature of this population and for the availability of pathological brain specimens due to epilepsy surgery performed on several of the probands. As with NLGN4, CNTNAP2 is a neuronal adhesion molecule (Poliak et al., 1999), and recent work has demonstrated that it too is present in the synaptic plasma membrane (Bakkaloglu et al., 2008).
Moreover, two common variant association studies and a rare variant mutation burden analysis have pointed to this molecule as carrying risk for ASDs (Alarcon et al., 2008; Arking et al., 2008; Bakkaloglu et al., 2008). These findings raise several important issues with regard to rare variants and autism. First, they demonstrate the utility of the outlier approach to provide clues to the pathophysiology of complex disorders. Prior to the identification of NLGN4, no specific data had implicated a molecular or cellular mechanism underlying any aspect of idiopathic autism. Subsequently, considerable effort has been aimed at delineating the relationship between synapse function and ASDs (Zoghbi, 2003). Similarly, the identification and characterization of CNTNAP2, coupled with the long-standing appreciation of increased rates of seizures in individuals with autism, has raised considerable interest in neuronal migration and its potential contribution to ASDs. These findings also point to some of the challenges of demonstrating causality when rare events are being investigated. In the initial identification of NLGN, the link between the observed mutation and the observed phenotype was not inferred due to statistical evidence but, rather, to the specifics of the gene itself and the nature of the observed mutation. In this case, the fact that the gene is located on the X chromosome (thus, only one copy is present in males) and the mutation is clearly deleterious to the formation of protein product led the authors to conclude that the rare variant and the autism phenotype must be related. While they were able to show in their small family that transmission of the mutation was consistent with Mendelian expectations, there were not sufficient observations to support this finding with statistical analysis, nor was there a sufficient number of rare variants identified to conduct a mutation burden study (Laumonnier et al., 2004). Nonetheless, the nature of the NLGN4X mutation, its recapitulation in vitro and in model systems, the independent replication using an alternative method (parametric linkage) in a separate family, and the identification of additional rare variants in a molecular pathway specified by NLGN4 together strongly support the relevance of this molecule for ASDs. The rarity of the clearly deleterious variants among affected individuals and the finding that mutations in NGLN4 do not always result in observable pathology (Macarov et al., 2007) are reminders that even highly penetrant mutations do not always lead to the phenotype of interest and that rare variant discovery may ultimately be extraordinarily valuable even if the initial observations remain restricted to one or an extremely small number of events.
Autism and the Rare Variant:Common Disease Hypothesis From a theoretical standpoint, there is ample reason to believe that rare variants may carry considerable risk for ASD among the general population
of affected individuals. As noted, the disorder has an early onset and affects the fundamental ability of individuals to make and keep social relationships; additionally, the monozygotic–dizygotic concordance difference is consistent with a considerable burden of new (and therefore rare) variation. Moreover, there is consistent evidence that autism incidence increases with paternal age (Cantor, Yoon, Furr, & Lajonchere, 2007; Reichenberg et al., 2006), as does the burden of de novo mutation (Crow, 2000). Perhaps most importantly, there is long-standing evidence that individuals with autism are many times more likely than normally developing controls to carry rare (now considered) gross microscopic chromosomal abnormalities, including de novo rearrangements (Bugge et al., 2000; Wassink, Piven, & Patil, 2001). In 2007, Jonathan Sebat and colleagues at Cold Spring Harbor provided dramatic confirmation of the importance of individually rare cytogenetic events in ASD when they evaluated patients with autism in search of de novo copy number changes and showed that in apparently sporadic cases there is a substantial increased burden of rare copy number variation compared both to familial cases of autism and to controls (Sebat et al., 2007). The detection of rare variation at a high resolution across the entire genome, the demonstration of its cumulative burden in the ASD phenotype, and the specific CNVs identified which may provide specific clues to the identity of genes with other rare variants contributing to autism all represent a very significant step forward in the search for multiple independent mutations contributing to ASD. These findings have subsequently been supported by additional studies demonstrating an increased burden of de novo CNVs in autism versus controls (Marshall et al., 2008), as well as studies demonstrating association of rare, recurrent CNVs with ASDs (Bucan et al., 2009; Glessner et al., 2009; Kumar et al., 2008; Szatmari et al., 2007; Weiss et al., 2008). These latter findings underscore how much the dogma regarding rare variation has begun to change, spurred by the advent of CNV detection. For example, the most notable finding with regard to specific copy number alterations and their involvement in ASD has been with respect to a region on the short arm of chromosome 16 (16p11.2) (Kumar et al., 2008; Szatmari et al., 2007; Weiss et al., 2008). The first studies to systematically address the role of this variation in ASD relied on standard case–control association methodology (Kumar et al., 2008; Weiss et al., 2008). They did not seek to demonstrate a one-to-one relationship between carrying the variation and having the phenotype within families, which would have been expected in the era of standard cytogenetics. Indeed, in these initial analyses, 16p11 was observed in unaffected individuals and, more importantly, within families de novo variations were found in one affected individual but not in a second affected sibling. This latter observation would previously have been taken as
strong evidence against the contribution of this variation within these pedigrees, based on Mendelian expectations for rare disorders. These findings highlight the shift from conceptualizing rare variation as being synonymous with Mendelian inheritance. Indeed, as the risks associated with common variation have been found to be much smaller than previously anticipated, prior notions about effect sizes that would come under negative selection and result in rare transmitted alleles must also be reconsidered. Moreover, as the consequences of rare variants may be more subtle than previously anticipated and their contribution more complex, a shift to association methodologies became a necessity. Fortunately, the field of psychiatric genetics has several decades of experience with these strategies, which point to the key requirements for the next generation of rare variant studies, including controlling for population stratification, accounting for multiple comparisons, and leveraging sufficiently large sample sizes to allow for the detection of alleles of comparatively modest individual effect.
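The requirement for very large samples can itself be quantified. Under a standard two-proportion (allelic) test, the number of subjects needed to detect an allele of modest effect at genomewide significance follows from the usual normal-approximation sample-size formula. The inputs below (a 20% control allele frequency, an odds ratio of 1.15, 80% power, and alpha of 5e-8) are illustrative choices, not parameters taken from any of the studies cited here.

```python
# Back-of-the-envelope sample size for an allelic GWAS test (illustrative inputs).
from math import sqrt
from scipy.stats import norm

p0, target_or = 0.20, 1.15        # control allele frequency and effect to detect
alpha, power = 5e-8, 0.80         # genomewide significance and desired power

odds1 = target_or * p0 / (1 - p0)
p1 = odds1 / (1 + odds1)          # implied case allele frequency
p_bar = (p0 + p1) / 2

z_alpha = norm.ppf(1 - alpha / 2)
z_beta = norm.ppf(power)

alleles_per_group = ((z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                      + z_beta * sqrt(p0 * (1 - p0) + p1 * (1 - p1))) ** 2
                     / (p1 - p0) ** 2)
people_per_group = alleles_per_group / 2   # each person contributes two alleles

print(f"implied case allele frequency = {p1:.3f}")
print(f"~{people_per_group:,.0f} cases and as many controls are required")
```

The answer runs to roughly ten thousand cases and as many controls, which is consistent with the observation above that reproducible common variant findings emerged only once consortium-scale samples were assembled.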
New Technology and Rare Variants Several recent technological advances promise to soon provide the opportunity to more easily identify rare variation contributing to autism and all complex disorders. First, the resolution of CNV detection is increasing at a tremendous pace owing to the ability to place an increasing number of probes on a microarray. In the past several years, there has been a rapid increase from approximately 10,000 to more than 2 million probes on a single assay. Coupled with this, the cost of conducting such experiments has been vastly decreasing, allowing for more and more comprehensive assessment of the contribution of CNVs to a variety of neuropsychiatric phenotypes. A more fundamental change is also on the horizon. As mentioned briefly, the means by which genomes are sequenced is undergoing a dramatic transformation. The throughput of platforms able to read sequence directly is increasing exponentially, while the cost per nucleotide is plummeting. This will have a profound impact on the identification of genetic variation. Indeed, the field is rapidly approaching an era in which the entire sequence of patient and control DNAs will be directly analyzed and the ability to detect both common and rare variations contributing to disease will be exhaustive. Moreover, these new sequencing platforms will also be able to detect gains and losses in DNA at a very high resolution, leading to unprecedented simultaneous detection of a significant proportion of all the variation that is thought to contribute to neuropsychiatric disorders. This development promises to set the stage for an era of discovery that will dwarf even the
astonishing recent pace resulting from the development of CNV technology and the implementation of whole-genome association technologies. It is also very likely that these studies will do more than dramatically expand our understanding of both the causal and contributory roles of DNA variation in neuropsychiatric disorders. They will reveal the limits of this avenue of investigation as well, focusing attention then on the need to understand epigenetics, environmental modifications, and posttranscriptional and translational processes and their contribution to complex mental conditions.
References Abelson, J. F., Kwan, K. Y., O’Roak, B. J., Baek, D. Y., Stillman, A. A., Morgan, T. M., et al. (2005). Sequence variants in SLITRK1 are associated with Tourette’s syndrome. Science, 310(5746), 317–320. Alarcon, M., Abrahams, B. S., Stone, J. L., Duvall, J. A., Perederiy, J. V., Bomar, J. M., et al. (2008). Linkage, association, and gene-expression analyses identify CNTNAP2 as an autism-susceptibility gene. American Journal of Human Genetics, 82(1), 150–159. Altshuler, D., & Daly, M. (2007). Guilt beyond a reasonable doubt. Nature Genetics, 39(7), 813–815. Arking, D. E., Cutler, D. J., Brune, C. W., Teslovich, T. M., West, K., Ikeda, M., et al. (2008). A common genetic variant in the neurexin superfamily member CNTNAP2 increases familial risk of autism. American Journal of Human Genetics, 82(1), 160–164. Bailey, A., Le Couteur, A., Gottesman, I., Bolton, P., Simonoff, E., Yuzda, E., et al. (1995). Autism as a strongly genetic disorder: Evidence from a British twin study. Psychological Medicine, 25(1), 63–77. Bakkaloglu, B., O’Roak, B. J., Louvi, A., Gupta, A. R., Abelson, J. F., Morgan, T. M., et al. (2008). Molecular cytogenetic analysis and resequencing of contactin associated protein-like 2 in autism spectrum disorders. American Journal of Human Genetics, 82(1), 165–173. Bilguvar, K., Yasuno, K., Niemela, M., Ruigrok, Y. M., von Und Zu Fraunberg, M., van Duijn, C. M., et al. (2008). Susceptibility loci for intracranial aneurysm in European and Japanese populations. Nature Genetics, 40(12), 1472–1477. Blasi, F., Bacchelli, E., Pesaresi, G., Carone, S., Bailey, A. J., & Maestrini, E. (2006). Absence of coding mutations in the X-linked genes neuroligin 3 and neuroligin 4 in individuals with autism from the IMGSAC collection. American Journal of Medical Genetics B Neuropsychiatric Genetics, 141B(3), 220–221. Botstein, D., & Risch, N. (2003). Discovering genotypes underlying human phenotypes: Past successes for Mendelian disease, future approaches for complex disease. Nature Genetics, 33(Suppl.), 228–237. Brzustowicz, L. M., Hodgkinson, K. A., Chow, E. W., Honer, W. G., & Bassett, A. S. (2000). Location of a major susceptibility locus for familial schizophrenia on chromosome 1q21-q22. Science, 288(5466), 678–682. Bucan, M., Abrahams, B. S., Wang, K., Glessner, J. T., Herman, E. I., Sonnenblick, L. I., et al. (2009). Genome-wide analyses of exonic copy number variants in a family-based study point to novel autism susceptibility genes. PLoS Genetics, 5(6), e1000536.
Bugge, M., Bruun-Petersen, G., Brondum-Nielsen, K., Friedrich, U., Hansen, J., Jensen, G., et al. (2000). Disease associated balanced chromosome rearrangements: A resource for large scale genotype–phenotype delineation in man. Journal of Medical Genetics, 37(11), 858–865. Campbell, D. B., Sutcliffe, J. S., Ebert, P. J., Militerni, R., Bravaccio, C., Trillo, S., et al. (2006). A genetic variant that disrupts MET transcription is associated with autism. Proceedings of the National Academy of Sciences USA, 103(45), 16834–16839. Cantor, R. M., Yoon, J. L., Furr, J., & Lajonchere, C. M. (2007). Paternal age and autism are associated in a family-based sample. Molecular Psychiatry, 12(5), 419–421. Chakravarti, A. (1999). Population genetics—making sense out of sequence. Nature Genetics, 21(1 Suppl.), 56–60. Chih, B., Afridi, S. K., Clark, L., & Scheiffele, P. (2004). Disorder-associated mutations lead to functional inactivation of neuroligins. Human Molecular Genetics, 13(14), 1471–1477. Chih, B., Engelman, H., & Scheiffele, P. (2005). Control of excitatory and inhibitory synapse formation by neuroligins. Science, 307(5713), 1324–1328. Chih, B., Gollan, L., & Scheiffele, P. (2006). Alternative splicing controls selective trans-synaptic interactions of the neuroligin–neurexin complex. Neuron, 51(2), 171–178. Choi, M., Scholl, U. I., Ji, W., Liu, T., Tikhonova, I. R., Zumbo, P., et al. (2009). Genetic diagnosis by whole exome capture and massively parallel DNA sequencing. Proceedings of the National Academy of Sciences USA, 106(45), 19096–19101. Cohen, J. C., Kiss, R. S., Pertsemlidis, A., Marcel, Y. L., McPherson, R., & Hobbs, H. H. (2004). Multiple rare alleles contribute to low plasma levels of HDL cholesterol. Science, 305(5685), 869–872. Constantino, J. N., Lajonchere, C., Lutz, M., Gray, T., Abbacchi, A., McKenna, K., et al. (2006). Autistic social impairment in the siblings of children with pervasive developmental disorders. American Journal of Psychiatry, 163(2), 294–296. Constantino, J. N., & Todd, R. D. (2005). Intergenerational transmission of subthreshold autistic traits in the general population. Biological Psychiatry, 57(6), 655–660. Crow, J. F. (2000). The origins, patterns and implications of human spontaneous mutation. Nature Reviews Genetics, 1(1), 40–47. Daly, M. J., Rioux, J. D., Schaffner, S. F., Hudson, T. J., & Lander, E. S. (2001). Highresolution haplotype structure in the human genome. Nature Genetics, 29(2), 229–232. Dave, B. J., & Sanger, W. G. (2007). Role of cytogenetics and molecular cytogenetics in the diagnosis of genetic imbalances. Seminars in Pediatric Neurology, 14(1), 2–6. Durand, C. M., Betancur, C., Boeckers, T. M., Bockmann, J., Chaste, P., Fauchereau, F., et al. (2007). Mutations in the gene encoding the synaptic scaffolding protein SHANK3 are associated with autism spectrum disorders. Nature Genetics, 39(1), 25–27. Fu, Y. H., Kuhl, D. P., Pizzuti, A., Pieretti, M., Sutcliffe, J. S., Richards, S., et al. (1991). Variation of the CGG repeat at the fragile X site results in genetic instability: Resolution of the Sherman paradox. Cell, 67(6), 1047–1058. Gabriel, S. B., Schaffner, S. F., Nguyen, H., Moore, J. M., Roy, J., Blumenstiel, B., et al. (2002). The structure of haplotype blocks in the human genome. Science, 296(5576), 2225–2229. Glessner, J. T., Wang, K., Cai, G., Korvatska, O., Kim, C. E., Wood, S., et al. (2009). Autism genome-wide copy number variation reveals ubiquitin and neuronal genes. Nature, 459(7246), 569–573.
Gupta, A. R., & State, M. W. (2007). Recent advances in the genetics of autism. Biological Psychiatry, 61(4), 429–437. Hakonarson, H., Grant, S. F., Bradfield, J. P., Marchand, L., Kim, C. E., Glessner, J. T., et al. (2007). A genome-wide association study identifies KIAA0350 as a type 1 diabetes gene. Nature, 448(7153), 591–594. Hirschhorn, J. N., & Altshuler, D. (2002). Once and again—issues surrounding replication in genetic association studies. Journal of Clinical Endocrinology and Metabolism, 87(10), 4438–4441. Hirschhorn, J. N., & Daly, M. J. (2005). Genome-wide association studies for common diseases and complex traits. Nature Reviews Genetics, 6(2), 95–108. Hirschhorn, J. N., Lohmueller, K., Byrne, E., & Hirschhorn, K. (2002). A comprehensive review of genetic association studies. Genetics in Medicine, 4(2), 45–61. Iafrate, A. J., Feuk, L., Rivera, M. N., Listewnik, M. L., Donahoe, P. K., Qi, Y., et al. (2004). Detection of large-scale variation in the human genome. Nature Genetics, 36(9), 949–951. International HapMap Consortium. (2005). A haplotype map of the human genome. Nature, 437(7063), 1299–1320. International Human Genome Sequencing Consortium. (2004). Finishing the euchromatic sequence of the human genome. Nature, 431(7011), 931–945. Jamain, S., Quach, H., Betancur, C., Rastam, M., Colineaux, C., Gillberg, I. C., et al. (2003). Mutations of the X-linked genes encoding neuroligins NLGN3 and NLGN4 are associated with autism. Nature Genetics, 34(1), 27–29. Ji, W., Foo, J. N., O’Roak, B. J., Zhao, H., Larson, M. G., Simon, D. B., et al. (2008). Rare independent mutations in renal salt handling genes contribute to blood pressure variation. Nature Genetics, 40(5), 592–599. Kim, H. G., Kishikawa, S., Higgins, A. W., Seong, I. S., Donovan, D. J., Shen, Y., et al. (2008). Disruption of neurexin 1 associated with autism spectrum disorder. American Journal of Human Genetics, 82(1), 199–207. Kleinjan, D. J., & van Heyningen, V. (1998). Position effect in human genetic disease. Human Molecular Genetics, 7(10), 1611–1618. Kumar, R. A., KaraMohamed, S., Sudi, J., Conrad, D. F., Brune, C., Badner, J. A., et al. (2008). Recurrent 16p11.2 microdeletions in autism. Human Molecular Genetics, 17(4), 628–638. Lander, E., & Kruglyak, L. (1995). Genetic dissection of complex traits: Guidelines for interpreting and reporting linkage results. Nature Genetics, 11(3), 241–247. Lander, E. S., Linton, L. M., Birren, B., Nusbaum, C., Zody, M. C., Baldwin, J., et al. (2001). Initial sequencing and analysis of the human genome. Nature, 409(6822), 860–921. Laumonnier, F., Bonnet-Brilhault, F., Gomot, M., Blanc, R., David, A., Moizard, M. P., et al. (2004). X-linked mental retardation and autism are associated with a mutation in the NLGN4 gene, a member of the neuroligin family. American Journal of Human Genetics, 74(3), 552–557. Lise, M. F., & El-Husseini, A. (2006). The neuroligin and neurexin families: From structure to function at the synapse. Cellular and Molecular Life Sciences, 63(16), 1833–1849. Lubs, H. A. (1969). A marker X chromosome. American Journal of Human Genetics, 21(3), 231–244. Ma, D., Salyakina, D., Jaworski, J. M., Konidari, I., Whitehead, P. L., Andersen, A. N., et al. (2009). A genome-wide association study of autism reveals a common novel risk locus at 5p14.1. Annals of Human Genetics, 73(Pt 3), 263–273.
Macarov, M., Zeigler, M., Newman, J. P., Strich, D., Sury, V., Tennenbaum, A., et al. (2007). Deletions of VCX-A and NLGN4: A variable phenotype including normal intellect. Journal of Intellectual Disability Research, 51(Pt 5), 329–333. Marshall, C. R., Noor, A., Vincent, J. B., Lionel, A. C., Feuk, L., Skaug, J., et al. (2008). Structural variation of chromosomes in autism spectrum disorder. American Journal of Human Genetics, 82(2), 477–488. McClellan, J. M., Susser, E., & King, M. C. (2007). Schizophrenia: A common disease caused by multiple rare alleles. British Journal of Psychiatry, 190, 194–199. McMahon, F. J., Akula, N., Schulze, T. G., Muglia, P., Tozzi, F., Detera-Wadleigh, S. D., et al. (2010). Meta-analysis of genome-wide association data identifies a risk locus for major mood disorders on 3p21.1. Nature Genetics, 42(2), 128–131. McPherson, J. D., Marra, M., Hillier, L., Waterston, R. H., Chinwalla, A., Wallis, J., et al. (2001). A physical map of the human genome. Nature, 409(6822), 934–941. Millar, J. K., Wilson-Annan, J. C., Anderson, S., Christie, S., Taylor, M. S., Semple, C. A., et al. (2000). Disruption of two novel genes by a translocation co-segregating with schizophrenia. Human Molecular Genetics, 9(9), 1415–1423. Moessner, R., Marshall, C. R., Sutcliffe, J. S., Skaug, J., Pinto, D., Vincent, J., et al. (2007). Contribution of SHANK3 mutations to autism spectrum disorder. American Journal of Human Genetics, 81(6), 1289–1297. Ng, S. B., Turner, E. H., Robertson, P. D., Flygare, S. D., Bigham, A. W., Lee, C., et al. (2009). Targeted capture and massively parallel sequencing of 12 human exomes. Nature, 461(7261), 272–276. Noonan, J. P. (2009). Regulatory DNAs and the evolution of human development. Current Opinion in Genetics and Development, 19(6), 557–564. O’Roak, B. J., & State, M. W. (2008). Autism genetics: Strategies, challenges, and opportunities. Autism Research, 1(1), 4–17. Poliak, S., Gollan, L., Martinez, R., Custer, A., Einheber, S., Salzer, J. L., et al. (1999). Caspr2, a new member of the neurexin superfamily, is localized at the juxtaparanodes of myelinated axons and associates with K+ channels. Neuron, 24(4), 1037–1047. Prabhakar, S., Visel, A., Akiyama, J. A., Shoukry, M., Lewis, K. D., Holt, A., et al. (2008). Human-specific gain of function in a developmental enhancer. Science, 321(5894), 1346–1350. Pritchard, J. K. (2001). Are rare variants responsible for susceptibility to complex diseases? American Journal of Human Genetics, 69(1), 124–137. Psychiatric GWAS Consortium Steering Committee. (2009). A framework for interpreting genome-wide association studies of psychiatric disorders. Molecular Psychiatry, 14(1), 10–17. Redon, R., Ishikawa, S., Fitch, K. R., Feuk, L., Perry, G. H., Andrews, T. D., et al. (2006). Global variation in copy number in the human genome. Nature, 444(7118), 444–454. Reichenberg, A., Gross, R., Weiser, M., Bresnahan, M., Silverman, J., Harlap, S., et al. (2006). Advancing paternal age and autism. Archives of General Psychiatry, 63(9), 1026–1032. Risch, N. (1990). Linkage strategies for genetically complex traits. I. Multilocus models. American Journal of Human Genetics, 46(2), 222–228. Risch, N., & Merikangas, K. (1996). The future of genetic studies of complex human diseases. Science, 273(5281), 1516–1517. Saxena, R., Voight, B. F., Lyssenko, V., Burtt, N. P., de Bakker, P. I., Chen, H., et al. (2007). Genome-wide association analysis identifies loci for type 2 diabetes and triglyceride levels. Science, 316(5829), 1331–1336.
Scott, L. J., Mohlke, K. L., Bonnycastle, L. L., Willer, C. J., Li, Y., Duren, W. L., et al. (2007). A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science, 316(5829), 1341–1345. Sebat, J., Lakshmi, B., Malhotra, D., Troge, J., Lese-Martin, C., Walsh, T., et al. (2007). Strong association of de novo copy number mutations with autism. Science, 316(5823), 445–449. Sebat, J., Lakshmi, B., Troge, J., Alexander, J., Young, J., Lundin, P., et al. (2004). Largescale copy number polymorphism in the human genome. Science, 305(5683), 525–528. Sharp, A. J., Cheng, Z., & Eichler, E. E. (2006). Structural variation of the human genome. Annual Review of Genomics and Human Genetics, 7, 407–442. State, M. W. (2006). A surprising METamorphosis: Autism genetics finds a common functional variant. Proceedings of the National Academy of Sciences USA, 103(45), 16621–16622. State, M. W., Greally, J. M., Cuker, A., Bowers, P. N., Henegariu, O., Morgan, T. M., et al. (2003). Epigenetic abnormalities associated with a chromosome 18(q21-q22) inversion and a Gilles de la Tourette syndrome phenotype. Proceedings of the National Academy of Sciences USA, 100(8), 4684–4689. Strauss, K. A., Puffenberger, E. G., Huentelman, M. J., Gottlieb, S., Dobrin, S. E., Parod, J. M., et al. (2006). Recessive symptomatic focal epilepsy and mutant contactin-associated protein-like 2. New England Journal of Medicine, 354(13), 1370–1377. Sutherland, G. R. (1977). Fragile sites on human chromosomes: Demonstration of their dependence on the type of tissue culture medium. Science, 197(4300), 265–266. Szatmari, P., Paterson, A. D., Zwaigenbaum, L., Roberts, W., Brian, J., Liu, X. Q., et al. (2007). Mapping autism risk loci using genetic linkage and chromosomal rearrangements. Nature Genetics, 39(3), 319–328. Tabuchi, K., Blundell, J., Etherton, M. R., Hammer, R. E., Liu, X., Powell, C. M., et al. (2007). A neuroligin-3 mutation implicated in autism increases inhibitory synaptic transmission in mice. Science, 318(5847), 71–76. Varoqueaux, F., Aramuni, G., Rawson, R. L., Mohrmann, R., Missler, M., Gottmann, K., et al. (2006). Neuroligins determine synapse maturation and function. Neuron, 51(6), 741–754. Veenstra-Vanderweele, J., Christian, S. L., & Cook, E. H., Jr. (2004). Autism as a paradigmatic complex genetic disorder. Annual Review of Genomics and Human Genetics, 5, 379–405. Veenstra-VanderWeele, J., & Cook, E. H., Jr. (2004). Molecular genetics of autism spectrum disorder. Molecular Psychiatry, 9(9), 819–832. Verkerk, A. J., Pieretti, M., Sutcliffe, J. S., Fu, Y. H., Kuhl, D. P., Pizzuti, A., et al. (1991). Identification of a gene (FMR-1) containing a CGG repeat coincident with a breakpoint cluster region exhibiting length variation in fragile X syndrome. Cell, 65(5), 905–914. Vorstman, J. A., Staal, W. G., van Daalen, E., van Engeland, H., Hochstenbach, P. F., & Franke, L. (2006). Identification of novel autism candidate regions through analysis of reported cytogenetic abnormalities associated with autism. Molecular Psychiatry, 11(1), 18–28. Wassink, T. H., Piven, J., & Patil, S. R. (2001). Chromosomal abnormalities in a clinic sample of individuals with autistic disorder. Psychiatric Genetics, 11(2), 57–63. Weiss, L. A., Arking, D. E., Daly, M. J., & Chakravarti, A. (2009). A genome-wide linkage and association scan reveals novel loci for autism. Nature, 461(7265), 802–808.
Weiss, L. A., Shen, Y., Korn, J. M., Arking, D. E., Miller, D. T., Fossdal, R., et al. (2008). Association between microdeletion and microduplication at 16p11.2 and autism. New England Journal of Medicine, 358(7), 667–675. Zeggini, E., & McCarthy, M. I. (2007). Identifying susceptibility variants for type 2 diabetes. Methods in Molecular Biology, 376, 235–250. Zoghbi, H. Y. (2003). Postnatal neurodevelopmental disorders: Meeting at the synapse? Science, 302(5646), 826–830. Zondervan, K. T., & Cardon, L. R. (2004). The complex interplay among factors that influence allelic association. Nature Reviews Genetics, 5(2), 89–100.
Part III: Causal Thinking in Psychiatry
11 Causal Thinking in Developmental Disorders

E. Jane Costello and Adrian Angold1
In this chapter we (1) lay out a definition of development as it relates to psychopathology; (2) make the case that nearly all psychiatric disorders are ‘‘developmental’’; and (3) examine, with some illustrations, methods from developmental research that can help to identify causal mechanisms leading to mental illness.
What Do We Mean by Development?

The philosopher Ernst Nagel (1957, p. 15) defined development in a way that links it to both benign and pathological outcomes:

The concept of development involves two essential components: the notion of a system possessing a definite structure and a definite set of pre-existing capacities; and the notion of a sequential set of changes in the system, yielding relatively permanent but novel increments not only in its structure but in its modes of operation.

As summarized by Leon Eisenberg (1977, p. 220), "the process of development is the crucial link between genetic determinants and environmental variables, between individual psychology and sociology." It is characteristic of such systems that they consist of feedback and feedforward loops of varying complexity. Organism and environment are mutually constraining, however, with the result that developmental pathways show relatively high levels of canalization (Angoff, 1988; Cairns, Gariépy, & Hood, 1990; Gottlieb & Willoughby, 2006; Greenough, 1991; McGue, 1989; Plomin, DeFries, & Loehlin, 1977; Scarr & McCartney, 1983).
1. Sections of this chapter are based in part on Costello (2008) and Costello & Angold (2006).
Like individual ‘‘normal’’ development, diseases have inherent developmental processes of their own—processes that obey certain laws and follow certain stages even as they destroy the individual in whom they develop (Hay & Angold, 1993). A developmental approach to disease asks what happens when developmental processes embodied in pathogenesis collide with the process of ‘‘normal’’ human development. The progression seen in chronic diseases (among which we categorize most psychiatric disorders) has much in common with this view of development. It is "structured" by the nature of the transformation of the organism that begins the process, and in general, it follows a reasonably regular course, although with wide variations in rate. Furthermore, there is hierarchical integration as a disease develops. Each stage in the progress of a given disease builds on the previous stages, and many of the manifestations of earlier stages are "integrated" into later symptomatology. For example, consider the well-established path to substance abuse (Kandel & Davies, 1982):

Beer or wine → Cigarettes or hard liquor → Marijuana → Other illicit drugs

It is characteristic of this pathway that the number of individuals at each level becomes smaller but that those at the higher levels continue to show behaviors characteristic of the earlier stages. Having described such a pathway, the task is to understand the process by which it is established and to invent preventive strategies appropriate to the various stages of the developmental pathway. Such strategies must be appropriate to the developmental stage of both the individual at risk and the disorder. This developmental process is probabilistic, not determined (Gottlieb & Willoughby, 2006); nevertheless, studying the interplay or interaction among risk factors, developmental processes, and disease processes can yield causal insights.
What Can Timing of Risk Exposure Tell Us About the Causes of Psychiatric Disorders?

The pathways leading to psychiatric disorders begin early in life. It is becoming increasingly clear that most adults with psychiatric disorders report onset in the first two decades of life (Insel & Fenton, 2005), even when they are using retrospective recall over several decades. However, the origins of a disorder may be even earlier. Twenty years ago, in their seminal work Statistical Methods in Cancer Research, Breslow and Day (1987, p. 2) pointed out that
Most chronic diseases are the results of a process extending over decades, and many of the events occurring in this period play a substantial role. In the study of physical growth, of mental and hormonal development, and in the process of aging, the essential feature is that changes over time are followed at the individual level.

The methods developed by cancer and cardiovascular epidemiologists to explore causal relationships in such diseases can, we believe, provide at least a useful starting place for thinking about psychiatric disorders. We can narrow down the range of causal links between risk exposure and disease by understanding more about different aspects of the timing of risk exposure in relation to developmental stage. Age at first exposure, time since first exposure, duration of exposure, and intensity of exposure are all interrelated aspects of timing that may have different implications for causality. Age at first exposure has been studied most intensively of all the aspects of risk over time in developmental psychopathology because of the theoretical importance attached to early experiences in Freudian and other psychodynamic models of development. For example, researchers investigating the role of attachment in children’s development have concentrated on the very early months and years of life as the crucial period during which the inability to form one or more relationships may have damaging effects that last into childhood and perhaps even into adulthood (Sroufe, 1988). The critical date of risk onset appears to occur after 6 months, but the duration of the risk period is not yet clear. Timing of exposure has received less attention in studies of developmental psychopathology. Brown and Harris (1978), in their work on the social origins of depression, argued that women who lost their mothers in the first decade of life were more vulnerable as adults to depressive episodes in the face of severe life events. The study supporting this hypothesis had several limitations, including retrospective design, and did not address the question of whether these women were also at greater risk of depressive episodes during later childhood and adolescence. It is not clear whether the crucial factor was the length of time since mother’s death, the age of the child at the time of exposure, or some combination of the two. An example of the importance of timing of exposure to a protective factor comes from the study of Alzheimer disease. Risk of Alzheimer disease is reduced in people who have been exposed to nonsteroidal anti-inflammatory drugs (NSAIDs) but only if that exposure occurred several years before the age at which Alzheimer disease begins to appear frequently in the population (Breitner, 2007). A causal inference is that whatever NSAIDs do to prevent Alzheimer disease, they work on some prodromal aspects of the disease.
Duration of exposure to poverty was examined by Offord and colleagues (1992) in their repeated surveys of a representative sample of children in Ontario. They showed that children whose families were living below the poverty level at two measurement points were at increased risk of behavioral disorders compared with children whose families were living below the poverty level on either one occasion only or never. Using prospective data from the Dunedin, New Zealand, longitudinal study, Moffitt (1990) found that children identified at age 13 as both delinquent and hyperactive had experienced significantly more family adversity (poverty, poor maternal education, and poor maternal mental health) consistently from the age of 7 than children who were only delinquent or only hyperactive at age 13. Data from the longitudinal Great Smoky Mountains study were used to study the role of duration of exposure to a protective factor. A representative population sample of 1,420 rural children aged 9–13 at intake were given repeated psychiatric assessments beginning in 1993. One-quarter of the sample were American Indian and the rest predominantly White. In 1996 a casino opened on an Indian reservation, giving every American Indian an income supplement that increased annually. The younger children in the sample were exposed to this increase in family income for longer than those in the oldest cohort, most of whom left home not long after the income supplement began. We examined the effect of this increase in family income on drug abuse at age 21. Drug use and abuse were significantly lower in those who had the longest exposure to the increased family income than in either of the two older cohorts. They were also significantly lower than in the youngest White cohort, who had not received any family income supplement. We concluded that 3 or 4 years of exposure to the protective effects of increased family income were needed for it to have a long-term effect at age 21. Intensity of exposure to lead provides an example of a definite dose–response relationship (Needleman & Bellinger, 1991). Another aspect of intensity is the number of different risk factors to which a child is exposed. Sameroff & Seifer (1995), Rutter (1988), and others have pointed out that children exposed to one risk factor are at increased risk of exposure to others (e.g., father not in the home and poverty) and that the dose–response relationship to an increasing number of different risk factors is not a simple linear one. Most children appear to be able to cope with a single adverse circumstance, but rates of psychopathology rise sharply in children exposed to several adverse circumstances or events (Seifer, Sameroff, Baldwin, & Baldwin, 1989). These examples of different relationships between time and etiology are far from exhaustive, but they indicate how we might use developmental studies in different ways to look at etiology.
What Can ‘‘Normal’’ Development Tell Us About the Causes of Psychiatric Disorders?

There are two main streams of psychological research: one into ‘‘norms’’ of human development and behavior and the other into differences among individuals and groups. Developmental psychopathology aims to integrate the two (Cicchetti, 1984). The more we understand about normal development in the general population, the more we can learn about the causes of pathology in the minority of the population who have disorders. Human development is marked by stages or turning points at which change in one or more systems occurs quite rapidly, inducing a qualitative difference in capacity (Pickles & Hill, 2006). Here, we consider what we can learn about the causes of psychiatric disorders from what developmental science has learned about two key developmental turning points: the period before and immediately after birth and the long process that leads to sexual maturation.

Prenatal and perinatal development can carry risk for psychopathology later in life. Several lines of research suggest that intrauterine growth retardation creates risk for a range of psychiatric outcomes at different developmental stages, depending on the timing of exposure in relation to time-specific vulnerabilities of the developing organism (Barker, 2004). Low birth weight has been implicated in risk for schizophrenia (Nilsson et al., 2005; Silverton, Mednick, Schulsinger, Parnas, & Harrington, 1988), attention-deficit/hyperactivity disorder (ADHD) (Botting, Powls, Cooke, & Marlow, 1997; Breslau et al., 1996; Breslau & Chilcoat, 2000; Pharoah, Stevenson, Cooke, & Stevenson, 1994; Szatmari, Saigal, Rosenbaum, Campbell, & King, 1990), and eating disorders (Favaro, Tenconi, & Santonastaso, 2006). Since 1990, studies have been published both supporting (Botting et al., 1997; Frost, Reinherz, Pakiz-Camras, Giaconia, & Lefkowitz, 1999; Gale & Martyn, 2004; Gardner et al., 2004; Patton, Coffey, Carlin, Olsson, & Morley, 2004; Pharoah, Stevenson, Cooke, & Stevenson, 1994; Weisglas-Kuperus, Koot, Baerts, Fetter, & Sauer, 1993) and disconfirming (Buka, Tsuang, & Lipsitt, 1993; Cooke, 2004; Jablensky, Morgan, Zubrick, Bower, & Yellachich, 2005; Osler, Nordentoft, & Nybo Andersen, 2005; Szatmari et al., 1990) the idea that low birth weight predicts depression. The problem is moving beyond correlation to causal explanations. Clearly, experimental assignment to a high-risk perinatal environment is not ethical for human research. In a longitudinal study, we tried to narrow down the range of possible causal explanations by testing two competing hypotheses to explain why the incidence of depression increases dramatically in girls, but not boys, when they are about age 13. A simple bivariate analysis showed that depression was much more common (38.1% vs. 8.4%) in girls who had weighed less than 2,500 g at birth than in other girls. One hypothesis states that low birth weight is one of a range of risk factors that could lead
to depression, including other perinatal risk factors, childhood factors, and adolescent factors. A second hypothesis, the fetal origins hypothesis (Barker, 2003), posits that low birth weight is a marker for poor intrauterine conditions for growth and development. These provoke adjustments in fetal physiological development, with long-term consequences for function and health (Bateson et al., 2004). According to this model, there is ''a mismatch between physiologic capacities established in early development and the environments in which they later must function'' (Worthman & Kuzara, 2005, p. 98). Adjustments by the fetus to suboptimal conditions may maximize chances for survival during gestation and early development but at a deferred cost if such adjustments leave the individual less prepared to deal with conditions encountered later in life. In this case, effects of low birth weight might be latent until the system encounters adversities that strain its capacity to adapt. The stress threshold may be lower than one that would trigger illness in individuals of normal birth weight.

Low birth weight was just one of a number of potential stress factors yet continued to predict adolescent depression independently when a wide range of potential confounders from the perinatal period, childhood, and adolescence were included in the model. If low birth weight were merely one of a cluster of generic risk factors for psychopathology, then it should predict other disorders as well as depression and should do so throughout childhood and adolescence in both boys and girls. In fact, it predicted only depression, only in adolescence, and only in girls. Additionally, it did not act like just one more risk factor. In the absence of other adversities, the rate of female adolescent depression was zero in both normal– and low–birth weight girls. However, 30% of low–birth weight girls exposed to a single adversity had an episode of adolescent depression compared with 5% of normal–birth weight girls, and the difference in girls with two adversities was even more marked (84% vs. 20%). Low birth weight acted more like a potentiator of other risk factors than a separate adversity. This argues for the fetal programming hypothesis. Low birth weight has been implicated in other psychiatric disorders at different developmental stages (e.g., ADHD in 6-year-old boys [Breslau et al., 1996]), but studies have not distinguished among competing causal hypotheses.

A review of animal models for causal thinking is beyond the scope of this chapter, but we should note that the perinatal period is one for which animal research has been particularly illuminating about the causes of later psychopathology. Two sets of data have been especially fruitful: the work of Meaney and colleagues (Champagne & Meaney, 2006) with rats and that of Suomi and colleagues (Champoux et al., 2002) with monkeys, on interactions between characteristics of the infant and of the mother in predicting developmental competence.
Puberty has emerged as another important developmental stage in relation to several psychiatric disorders. The term puberty encompasses changes in multiple indices of adolescent development, including increases in several gonadal and steroidal hormones, height, weight, body fat, body hair, breast and genitalia development, powers of abstract thinking, and family and peer expectations and behavior. It can also occur at the same time as major social changes such as moving to high school (Simmons & Blyth, 1992). We may be concerned with either linear effects (e.g., increased risk as development proceeds) or nonlinear effects of various kinds (e.g., the onset of menses for girls or crossing some threshold level of sex steroids [Angold, Costello, & Worthman, 1999; Tschann et al., 1994]). Of course, both the physiological and the social impacts of puberty may interact and may vary by gender. Thus, the relationship between puberty and psychopathology may vary widely depending on which aspect of puberty is causally significant for which disorder. We examined this by pitting hormonal, morphological, and social–psychological markers of puberty against one another as predictors of adolescent depression, anxiety disorders, conduct disorder, and alcohol use and abuse in a longitudinal study. In the case of depression, high levels of estrogen and testosterone predicted adolescent depression in girls. On the other hand, higher levels of testosterone were associated with nonaggressive antisocial behavior in boys in deviant peer groups but positive leadership roles in boys who did not associate with deviant peers. There is also growing evidence indicating that both girls and boys who go through puberty earlier than their peers are at increased risk for emotional and behavioral problems, especially if they have unsupportive backgrounds or engage in early sexual intercourse (Ge, Brody, Conger, & Murry, 2002; Ge, Conger, & Elder, 1996; Magnusson, Stattin, & Allen, 1985; Moffitt, Caspi, Belsky, & Silva, 1992; Kaltiala-Heino, Kosunen, & Rimpela, 2003; Kaltiala-Heino, Marttunen, Rantanen, & Rimpela, 2003). The emergent picture is that multiple components of puberty have a variety of sex-differentiated effects on different forms of psychopathology. Thus, an understanding of the norms of development can help us to get beyond description and move along the pathway toward a better understanding of the causes of psychiatric disorders. It is also worth emphasizing that studying early developmental processes is important for disorders that spread far into adulthood like depression and drug abuse.
Methods for Causal Research in Developmental Psychopathology

In this section we will not discuss formal experiments with randomized assignment, which can be enormously helpful for causal research but are
rarely feasible in longitudinal studies, especially those that require representative population samples or high-risk community samples. Instead, we will discuss some recent examples of the use of quasi-experimental methods that capitalize on the longitudinal strengths of developmental research.

There is much discussion of whether and to what extent epidemiological research can establish causes. The reason that this matters so much is, of course, the danger of confounding. Confounding due to the presence of one or more common causes of the risk factor and the outcome distorts the impact of a risk factor on the probability of disease. We discuss some methods used to reduce, if not to eliminate, this risk. Note that in formal experimental designs, there may be characteristics of setting or time that interfere with the simple logic of the experiment. For example, it is notoriously difficult to study blood pressure in laboratory settings because of the ''white coat'' phenomenon that plays havoc with ''resting'' blood pressure in the presence of nurses and doctors. Additionally, even when a causal hypothesis is supported in laboratory studies, its effect size needs to be estimated in the real world.

A second problem with causal research in the context of developmental psychopathology is that there is no such thing as ''a'' single cause. Developmental epidemiologists have used a range of terms to describe what they are looking for: Examples include component causes (Rothman, 1976), mechanisms (Rutter, 1994), and pathways (Pickles & Hill, 2006). This section will discuss some recent examples of the use of quasi-experimental methods that capitalize on the longitudinal strengths of developmental research.
Quasi-Experiments

What distinguishes quasi-experiments from randomized experiments is that in the former case we cannot be sure that group assignment is free of bias. In other respects (e.g., the selection of the intervention, the measures administered, the timing of measurement), the two designs may be close to identical. However, the difference—inability to use random assignment—can threaten the validity of causal conclusions based on the results (as discussed earlier). We describe three such strategies used to test whether and how traumatic events cause psychiatric disorders. (In the following diagrams, O = observation, X = event, T = time.)

Sample Compared Postevent to a Population Norm

If data have already been collected before the event or intervention, it may be possible to set up a pre- vs. post-, exposed vs. not exposed design that comes close to random assignment. However, it is unlikely that there will have been an opportunity to collect ''before'' measures on those to whom the (typically
unforeseen) event will occur, with the result that the most common form of quasi-experiment following an unexpected catastrophe is the following:

                      T1    T2    T3
  Sample                    X     O
  Population norm     O
For example, Hoven and colleagues (2005) compared children from New York during the September 11, 2001 (9/11), attack on the Twin Towers to a representative population sample from nearby Stamford, Connecticut, who had been assessed with the same instruments just before 9/11, as well as to other community samples. The New York children assessed 6 months after 9/11 had higher rates of most diagnoses. This design is critically dependent on the comparability of the postevent sample and the sample on which the measures were normed since otherwise any differences found might be the result of preexisting differences rather than the event. Therefore, although this design is often the only one available, it tends to be the weakest of the various quasi-experiments.

Dose–Response Measures of Exposure to an Event

Sometimes it is possible to use a dose–response strategy to test hypotheses about whether an exposure causes an outcome, rendering the following design:
              T1    T2
  Sample a    X     O
  Sample b    X     O
For example, the same researchers (Hoven et al., 2005) divided New York City into three areas at different geographical distances from the site of the World Trade Center and sampled children attending schools in each area, to test the hypothesis that physical distance from the event reduced the risk of psychiatric disorder; this finding would support a causal relationship between the event and the disorder. They found high rates of mental disorder throughout the study area but significantly lower rates in children who went to school in the area closest to the site of the attack. This took the researchers by surprise; their post hoc explanation was that the extent of social support and mental health care following 9/11 prevented the harm that the event might have caused. Next, they measured personal and family exposure to the attack and compared children who had family members involved in the attack to those who
were geographically close but had no personal involvement. They found, as predicted, that both personal and family involvement increased the risk of a mental disorder but that involvement of a family member was the stronger risk factor even when the children were physically distant from the site. These aspects of the design of this study carry more weight than the one described earlier because they incorporate stronger and more theory-based design characteristics. However, the designs lack a pretest; therefore, we cannot rule out the possibility that these groups were different before the event.

Different Groups Exposed and Not Exposed, Both Tested Before and After Exposure

This design has the potential to come closest to a randomized design because the same subjects are studied both before and after an event that occurred in one group but not the other:

              T1    T2    T3
  Sample a    O     X     O
  Sample b    O           O
However, if sample a and sample b were not randomly assigned from the same subject pool, the researcher must convince the reader that there were no differences between the two groups before the event that could potentially confound the causal relationship. For example, in a longitudinal study of development across the transition to adulthood, we interviewed a representative sample of young people every 1 or 2 years since 1993. Subjects were interviewed each year on a date as close as possible to their birthday. Thus, in 2001, when the participants were aged 19 and 21, about two-thirds of them had been interviewed when, on 9/11, the Twin Towers and the Pentagon were struck. We continued to interview the remaining subjects until the end of the year (Costello, Erkanli, Keeler, & Angold, 2004), but the world facing these young people was a very different one from that in which we had interviewed the first group of participants; for example, there was talk of a national draft, which would directly affect this age group. The strength of this design is critically dependent on the comparability of the groups interviewed before and after the event. In this case we had 8 years of interviews with the participants before 2001. We compared the before-9/11 and the after-9/11 groups on a wide range of factors and were able to demonstrate that each was a random subsample of the main sample. Thus, we had a quasi-experiment that was equivalent to randomly assigning subjects who
had experienced vs. those who had not experienced 9/11. We predicted that, even though the participants were living 500 miles away from where the events occurred, this ‘‘distant trauma’’ (Terr et al., 1999) would increase levels of anxiety and possibly, in this age group, alcohol and drug abuse. We also hypothesized that the potential for military conscription might further increase anxiety levels, especially in males. We were wrong on both counts. There was no increase in levels of anxiety. Women interviewed after 9/11 reported higher levels of drug use in general, and cannabis in particular, with rates of reported use approaching twice the pre-9/11 level. Conversely, men interviewed after 9/11 were less likely to report substance abuse, and use of all drugs was lower. The examples of quasi-experimental studies described here suggest that such designs can be quite effective at discounting previously held beliefs but that they are open to the risk of post hoc interpretations (as in the post-9/11 examples). Finally, as Shadish, Cook, and Campbell (2002) point out, ‘‘they can undermine the likelihood of doing even better studies.’’
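To make the analytic logic of such before/after comparisons concrete, the sketch below shows the two steps the design implies: checking that the groups interviewed before and after the event are comparable on pre-event characteristics, and then contrasting outcome rates across the two groups. This is a minimal illustration written in Python, not the analysis reported by Costello, Erkanli, Keeler, and Angold (2004); the data frame and every column name (after_event, sex, poverty, prior_anxiety, anxiety_now) are hypothetical.

import pandas as pd
from scipy import stats

def balance_check(df, group_col, covariates):
    # Chi-square test for each pre-event characteristic; large p-values are
    # consistent with the pre- and post-event groups being random subsamples.
    return {cov: stats.chi2_contingency(pd.crosstab(df[group_col], df[cov]))[1]
            for cov in covariates}

def outcome_contrast(df, group_col, outcome_col):
    # Compare outcome rates (e.g., any anxiety disorder) between the groups
    # interviewed before vs. after the event.
    rates = df.groupby(group_col)[outcome_col].mean()
    p = stats.chi2_contingency(pd.crosstab(df[group_col], df[outcome_col]))[1]
    return rates, p

# Hypothetical usage:
# balance_check(panel, "after_event", ["sex", "poverty", "prior_anxiety"])
# outcome_contrast(panel, "after_event", "anxiety_now")

Only if the balance checks show no meaningful pre-event differences can the outcome contrast be read as an (approximate) effect of the event itself.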
Natural Experiments

Natural experiments are gifts to the researcher; they are situations that could not have been planned or proposed but do what a randomized experiment does. That is, they assign participants to one exposure or another without bias and hold all other variables constant while manipulating the risk factor of interest. Sometimes the unbiased assignment is created by events, as when one group of families in our longitudinal study received an income supplement while others did not, where race (American Indian vs. Anglo) was the sole criterion (Costello, Compton, Keeler, & Angold, 2003). In this case, we had 4 years of assessments of children's psychiatric status before and after the introduction of the income supplement and, thus, could compare the children's behavior before and after the intervention in both groups. The years of measurement before the event enabled us to rule out the potential confounding of ethnicity with the children's emotional and behavioral symptoms.

A tremendously important possibility for natural experimentation occurs when genes and environment can be separated. Such naturally occurring situations provided the foundation for the rise of genetic epidemiology, which ''focuses on the familial, and in particular genetic, determinants of disease and the joint effects of genes and non-genetic determinants'' (Burton, Tobin, & Hopper, 2005, p. 941). Several researchers have made ingenious use of the fact that ''people take their genes with them when they move from one country to another, but often the migration entails a radical change in lifestyle'' (Rutter, Pickles, Murray, & Eaves, 2001, p. 310). If a comparison group
is available from both the old and the new countries, a natural experiment exists of the following form:
               T1    T2
  Sample a1    X     O
  Sample a2          O
  Sample b           O
Sample a1, who migrated, can be compared both with sample a2, who stayed home, and with sample b, who grew up in the new country. If a1 and a2 are more similar than a1 and b, this suggests that the pathology being measured is more strongly influenced by the genetic similarity of the two groups of the same race/ethnicity than it is by the environmental differences in which the two groups now live. Conversely, a greater similarity between a1 and b suggests a strong environmental effect. For example, Verhulst and colleagues compared Turkish adolescents in Holland with Turkish adolescents in Turkey and Dutch adolescents in Holland on a self-report measure of child psychopathology (Janssen et al., 2004). They found that the immigrant youth reported more anxious, depressed, and withdrawn symptoms than the Dutch youth but more delinquency, attentional problems, and somatic problems than the Turkish youth in Turkey. This suggests that Turkish adolescents are, in general, more prone to emotional symptoms than Dutch adolescents but that migration caused some behavioral problems not seen at home. Interestingly, a follow-up study when the two samples living in Holland were in their 20s showed that differences between the two groups shrank significantly, largely because the immigrants’ mental health improved more than did that of the native Dutch. There are, of course, important caveats to be considered before causal conclusions can be drawn from migrant designs: Why did people migrate? Are they representative of the nonmigrants at home? So long as these issues are carefully considered, however, migrant designs can be very helpful in pulling apart entangled component causes.
Prevention Trials as Natural Experiments

Trials of treatment or prevention programs also test causal hypotheses. Treatment trials, which tend to take place in academic medical settings with highly selected samples, can rarely be used as the basis of general causal inference; but population-based prevention trials may approximate experimental conditions. Unfortunately, community-based prevention trials are often too expensive or limited in scope to address developmental issues. One example of a theory-driven prevention trial with an etiological, developmental message is Fast Track (Bierman et al., 1992). This school-based
intervention with teachers, parents, and children tested the theory that ‘‘early starters’’ (i.e., children who show conduct problems early in childhood) tend to increase in aggressive behavior over time and to persist in antisocial behavior longer than other antisocial children (Moffitt, 1993). The intervention had positive effects 4 years later, and mediational analyses supported specific causal pathways. For example, improvements in parenting skills affected the child’s behavior at home but not at school, while improvements in social cognition about peers affected deviant peer associations. Additionally, children whose prosocial behavior in the classroom improved had improved ratings in classroom sociometric assessments (Bierman et al., 2002). It would benefit causal research greatly if prevention trials were, like Fast Track, specific about their causal theories and rigorous in testing them.
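As one illustration of how such mediational reasoning can be operationalized, the fragment below implements a crude product-of-coefficients check in Python. It is not the Fast Track analysis; the variable names (intervention, parenting_skills, child_behavior_home) are hypothetical, and in practice the indirect effect would need a bootstrap or comparable confidence interval.

import statsmodels.formula.api as smf

def simple_mediation(df):
    # Path a: does the intervention change the putative mediator?
    a = smf.ols("parenting_skills ~ intervention", data=df).fit().params["intervention"]
    # Path b and direct effect: does the mediator predict the outcome,
    # holding the intervention constant?
    model_b = smf.ols("child_behavior_home ~ parenting_skills + intervention",
                      data=df).fit()
    b = model_b.params["parenting_skills"]
    direct = model_b.params["intervention"]
    return {"indirect_effect": a * b, "direct_effect": direct}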
Approaches to Data Analysis for Testing Causal Models in Developmental Psychopathology

Traditionally, researchers have tried to deal with the problem of confounding by controlling for potential confounders while using regression models of various types. More recently, new methods for etiological inference in epidemiological research have been introduced for both cross-sectional and longitudinal data (e.g., Robins, Hernan, & Brumback, 2000; Rosenbaum & Rubin, 1985; Rothman & Greenland, 2005). The underlying principle is to use inverse probability weighting to create conditions that approximate a randomized experiment (i.e., those exposed to the risk factor of interest are interchangeable with those not exposed [Hernan & Robins, 2006]).

We compared two analytic approaches to examine the hypothesis that growing up in a single-parent household increases risk for conduct disorder against the alternative that other factors associated with a mother's being a single parent (e.g., being a teen parent, leaving school without graduating, psychiatric or drug problems, criminal record) confound the relationship between having a single parent and developing a conduct disorder. Using a traditional regression analysis, having a single parent remained a marginally significant predictor after controlling for the potential confounders. Using the alternative (g-estimation) approach, single parenting no longer exerted a causal effect on child conduct disorder.
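A minimal sketch of the weighting idea, written in Python, is given below. It estimates each child's probability of the exposure (here, a single-parent household) from measured confounders, reweights the sample by the inverse of that probability, and compares weighted risks of conduct disorder. This illustrates inverse probability weighting in general, not the g-estimation procedure used in the analysis just described, and the column names are hypothetical.

import numpy as np
import statsmodels.api as sm

def ipw_risk_difference(df, exposure, outcome, confounders):
    # Propensity model: probability of exposure given measured confounders.
    X = sm.add_constant(df[confounders])
    ps = sm.Logit(df[exposure], X).fit(disp=0).predict(X)
    # Inverse-probability weights make exposed and unexposed groups
    # comparable on the measured confounders (no unmeasured confounding assumed).
    w = np.where(df[exposure] == 1, 1.0 / ps, 1.0 / (1.0 - ps))
    exposed = df[exposure] == 1
    risk_exposed = np.average(df.loc[exposed, outcome], weights=w[exposed])
    risk_unexposed = np.average(df.loc[~exposed, outcome], weights=w[~exposed])
    return risk_exposed - risk_unexposed

# Hypothetical usage:
# ipw_risk_difference(children, "single_parent", "conduct_disorder",
#                     ["teen_mother", "no_diploma", "maternal_psychopathology"])

The contrast with ordinary covariate adjustment is that the weights target a marginal (population-averaged) effect and make the exchangeability assumption explicit.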
Conclusions

Observation, categorization, pattern recognition, hypothesis testing, causal thinking: These are the stages through which a science tends to progress
as it advances in knowledge and rigor (Feist, 2006). Developmental research starts from description and pattern recognition, but it can use those observations to test hypotheses and ask causal questions. Developmental psychopathology is helped enormously in this task by its access to 100 years of theory-driven research in normal development, a corpus of knowledge that psychiatry has yet to exploit in full (Cicchetti, 2006). As research into the causes of psychiatric disorders advances, the importance of a developmental approach to every type of disorder (not just those seen in early childhood) will become even more evident and the value of longitudinal data to answer causal questions will increase.
References Angoff, W. H. (1988). The nature–nurture debate, aptitudes, and group differences. American Psychologist, 43(9), 713–720. Angold, A., Costello, E. J., & Worthman, C. M. (1999). Pubertal changes in hormone levels and depression in girls. Psychological Medicine, 29(5), 1043–1053. Barker, D. (2003). The developmental origins of adult disease. European Journal of Epidemiology, 18(8), 733–736. Barker, D. (2004). The developmental origins of well-being. Royal Society, 359(1449), 1359–1366. Bateson, P., Barker, D., Clutton-Brock, T., Deb, D., D’Udine, B., Foley, R. A., et al. (2004). Developmental plasticity and human health. Nature, 430(6998), 419–421. Bierman, K. L., Coie, J. D., Dodge, K. A., Greenberg, M. T., Lochman, J. E., & McMahon, R. J. (1992). A developmental and clinical model for the prevention of conduct disorder: The FAST track program. Developmental Psychopathology, 4(4), 509–527. Bierman, K. L., Coie, J. D., Dodge, K. A., Greenberg, M. T., Lochman, J. E., McMahon, R. J., et al.; Conduct Problems Prevention Research Group. (2002). Using the Fast Track randomized prevention trial to test the early-starter model of the development of serious conduct problems. Development and Psychopathology, 14(4), 925–943. Botting, N., Powls, A., Cooke, R. W. I., & Marlow, N. (1997). Attention deficit hyperactivity disorders and other psychiatric outcomes in very low birthweight children at 12 years. Journal of Child Psychology and Psychiatry, 38(8), 931–941. Breitner, J. (2007). Prevention of Alzheimer’s disease: Principles and prospects. In M. Tsuang, W. S. Stone, & M. J. Lyons (Eds.), Recognition and prevention of major mental and substance use disorders (pp. 319–329). Arlington, VA: American Psychiatric Publishing. Breslau, N., Brown, G. G., DelDotto, J. E., Kumar, S., Ezhuthachan, S., Andreski, P., et al. (1996). Psychiatric sequelae of low birth weight at 6 years of age. Journal of Abnormal Child Psychology, 24(3), 385–400. Breslau, N., & Chilcoat, H. D. (2000). Psychiatric sequelae of low birth weight at 11 years of age. Biological Psychiatry, 47(11), 1005–1011. Breslow, N. E., & Day, N. E. (1987). Statistical methods in cancer research: Vol. II. The design and analysis of cohort studies (IARC Scientific Publication 82). Lyon: International Agency for Research on Cancer.
Brown, G. W., & Harris, T. O. (1978). The social origins of depression: A study of psychiatric disorder in women. New York: Free Press. Buka, S. L., Tsuang, M., & Lipsitt, L. (1993). Pregnancy/delivery complications and psychiatric diagnosis: A prospective study. Archives of General Psychiatry, 50(2), 151–156. Burton, P. R., Tobin, M. D., & Hopper, J. L. (2005). Key concepts in genetic epidemiology. Lancet, 366(9489), 941. Cairns, R. B., Garie´py, J. L., & Hood, K. E. (1990). Development, microevolution, and social behavior. Psychological Review, 97(1), 49–65. Champagne, F. A., & Meaney, M. J. (2006). Stress during gestation alters postpartum maternal care and the development of the offspring in a rodent model. Biological Psychiatry, 59(12), 1227–1235. Champoux, M., Bennett, A., Shannon, C., Higley, J. D., Lesch, K. P., & Suomi, S. J. (2002). Serotonin transporter gene polymorphism, differential early rearing, and behavior in rhesus monkey neonates. Molecular Psychiatry, 7(10), 1058–1063. Cicchetti, D. (1984). The emergence of developmental psychopathology. Child Development, 55(1), 1–7. Cicchetti, D. (2006). Development and psychopathology. In D. Cicchetti & D. J. Cohen (Eds.), Developmental psychopathology (2nd ed., Vol. 1, pp. 1–23). Hoboken, NJ: John Wiley & Sons. Cooke, R. W. (2004). Health, lifestyle, and quality of life for young adults born very preterm. Archives of Disease in Childhood, 89(3), 201–206. Costello, E. J. (2008). Using epidemiological and longitudinal approaches to study causal hypotheses. In M. Rutter (Ed.), Rutter’s child and adolescent psychiatry (pp. 58–70). Oxford: Blackwell Scientific. Costello, E. J., & Angold, A. (2006). Developmental epidemiology. In D. Cicchetti & D. Cohen (Eds.), Theory and method (2nd ed., Vol. 1, pp. 41–75). Hoboken, NJ: John Wiley & Sons. Costello, E. J., Compton, S. N., Keeler, G., & Angold, A. (2003). Relationships between poverty and psychopathology: A natural experiment. Journal of the American Medical Association, 290(15), 2023–2029. Costello, E. J., Erkanli, A., Keeler, G., & Angold, A. (2004). Distant trauma: A prospective study of the effects of 9/11 on rural youth. Applied Developmental Science, 8(4), 211–220. Eisenberg, L. (1977). Development as a unifying concept in psychiatry. British Journal of Psychiatry, 131, 225–237. Favaro, A., Tenconi, E., & Santonastaso, P. (2006). Perinatal factors and the risk of developing anorexia nervosa and bulimia nervosa. Archives of General Psychiatry, 63(1), 82–88. Feist, G. J. (2006). The psychology of science and the origins of the scientific mind. New Haven, CT: Yale University Press. Frost, A. K., Reinherz, H. Z., Pakiz-Camras, B., Giaconia, R. M., & Lefkowitz, E. S. (1999). Risk factors for depressive symptoms in late adolescence: A longitudinal community study. American Journal of Orthopsychiatry, 69(3), 370–381. Gale, C. R., & Martyn, C. N. (2004). Birth weight and later risk of depression in a national birth cohort. British Journal of Psychiatry, 184, 28–33. Gardner, F., Johnson, A., Yudkin, P., Bowler, U., Hockley, C., Mutch, L., et al. (2004). Behavioral and emotional adjustment of teenagers in mainstream school who were born before 29 weeks’ gestation. Pediatrics, 114(3), 676–682.
Ge, X., Brody, G., Conger, R., & Murry, V. (2002). Contextual amplification of pubertal transition effects on deviant peer affiliation and externalizing behavior among African American children. Developmental Psychology, 38(1), 42–54. Ge, X., Conger, R. D., & Elder, G. H. (1996). Coming of age too early: Pubertal influences on girls’ vulnerability to psychological distress. Child Development, 67(6), 3386–3400. Gottlieb, G., & Willoughby, M. (2006). Probabilistic epigenesis of psychopathology. In D. Cicchetti & D. Cohen (Eds.), Developmental psychopathology: Theory and method (2nd ed., Vol. 1, pp. 673–700). Hoboken, NJ: John Wiley & Sons. Greenough, W. T. (1991). Experience as a component of normal development: Evolutionary considerations. Developmental Psychopathology, 27(1), 14–17. Hay, D. F., & Angold, A. (1993). Introduction: Precursors and causes in development and pathogenesis. In D. F. Hay & A. Angold (Eds.), Precursors and causes in development and psychopathology (pp. 1–21). Chichester: John Wiley & Sons. Hernan, M. A., & Robins, J. M. (2006). Estimating causal effects from epidemiological data. Journal of Epidemiology and Community Health, 60(7), 578–586. Hoven, C. W., Duarte, C. S., Lucas, C. P., Wu, P., Mandell, D. J., Goodwin, R. D., et al. (2005). Psychopathology among New York City public school children 6 months after September 11. Archives of General Psychiatry, 62(5), 545–552. Insel, T. R., & Fenton, W. S. (2005). Psychiatric epidemiology: It’s not just about counting anymore. Archives of General Psychiatry, 62(6), 590–592. Jablensky, A., Morgan, V., Zubrick, S. R., Bower, C., & Yellachich, L.-A. (2005). Pregnancy, delivery, and neonatal complications in population cohort of women with schizophrenia and major affective disorders. American Journal of Psychiatry, 162(1), 79–91. Janssen, M. M., Verhulst, F., Bengi-Arslan, L., Erol, N., Salter, C., & Crijnen, A. M. (2004). Comparison of self-reported emotional and behavioral problems in Turkish immigrant, Dutch and Turkish adolescents. Social Psychiatry and Psychiatric Epidemiology, 39(2), 133–140. Kaltiala-Heino, R., Kosunen, E., & Rimpela, M. (2003). Pubertal timing, sexual behaviour and self-reported depression in middle adolescence. Journal of Adolescence, 26(5), 531–545. Kaltiala-Heino, R., Marttunen, M., Rantanen, P., & Rimpela, M. (2003). Early puberty is associated with mental health problems in middle adolescence. Social Science & Medicine, 57(6), 1055–1064. Kandel, D. B., & Davies, M. (1982). Epidemiology of depressive mood in adolescents: An empirical study. Archives of General Psychiatry, 39(10), 1205–1212. Magnusson, D., Stattin, H., & Allen, V. L. (1985). Differential maturation among girls and its relation to social adjustment: A longitudinal perspective. Stockholm: University of Stockholm. McGue, M. (1989). Nature–nurture and intelligence. Nature, 340, 507–508. Moffitt, T. E. (1990). Juvenile delinquency and attention deficit disorder: Boys’ developmental trajectories from age 3 to age 15. Child Development, 61(3), 893–910. Moffitt, T. E. (1993). Adolescence-limited and life-course-persistent antisocial behavior: A developmental taxonomy. Psychological Review, 100(4), 674–701. Moffitt, T. E., Caspi, A., Belsky, J., & Silva, P. A. (1992). Childhood experience and the onset of menarche: A test of a sociobiological model. Child Development, 63(1), 47–58. Nagel, E. (1957). Determinism and development. In D. B. Harris (Ed.), The concept of development (pp. 15–26). Minneapolis: University of Minnesota Press.
Needleman, H. L., & Bellinger, D. (1991). The health effects of low level exposure to lead. Annual Review of Public Health, 12, 111–140. Nilsson, E., Stalberg, G., Lichtenstein, P., Cnattingius, S., Olausson, P. O., & Hultman, C. M. (2005). Fetal growth restriction and schizophrenia: A Swedish twin study. Twin Research & Human Genetics, 8(4), 402–408. Offord, D. R., Boyle, M. H., Racine, Y. A., Fleming, J. E., Cadman, D. T., Blum, H. M., et al. (1992). Outcome, prognosis, and risk in a longitudinal follow-up study. Journal of the American Academy of Child and Adolescent Psychiatry, 31(5), 916–923. Osler, M., Nordentoft, M., & Nybo Andersen, A.-M. (2005). Birth dimensions and risk of depression in adulthood: Cohort study of Danish men born in 1953. British Journal of Psychiatry, 186, 400–403. Patton, G. C., Coffey, C., Carlin, J. B., Olsson, C. A., & Morley, R. (2004). Prematurity at birth and adolescent depressive disorder. British Journal of Psychiatry, 184, 446–447. Pharoah, P. O. D., Stevenson, C. J., Cooke, R. W. I., & Stevenson, R. C. (1994). Prevalence of behaviour disorders in low birthweight infants. Archives of Disease in Childhood, 70, 271–274. Pickles, A., & Hill, J. (2006). Developmental pathways. In D. Cicchetti & D. Cohen (Eds.), Developmental psychopathology: Theory and method (2nd ed., Vol. 1, pp. 211–243). Hoboken, NJ: John Wiley & Sons. Plomin, R., DeFries, J., & Loehlin, J. (1977). Genotype–environment interaction and correlation in the analysis of human behavior. Psychological Bulletin, 84(2), 309–322. Robins, J., Hernan, M., & Brumback, B. (2000). Marginal structural models and causal inference in epidemiology. Epidemiology, 11(5), 550–560. Rosenbaum, P. R., & Rubin, D. B. (1985). Constructing a control group using multivariate matched sampling methods that incorporate the propensity score. American Statistician, 39(1), 33–38. Rothman, K. J. (1976). Reviews and commentary: Causes. American Journal of Epidemiology, 104(6), 587–592. Rothman, K. J., & Greenland, S. (2005). Causation and causal inference in epidemiology. American Journal of Public Health, 95(Suppl. 1), S144–S150. Rutter, M. (1988). Studies of psychosocial risk: The power of longitudinal data. New York: Cambridge University Press. Rutter, M. (1994). Concepts of causation, tests of causal mechanisms, and implications for intervention. In A. C. Petersen & J. T. Mortimer (Eds.), Youth unemployment and society (Vol. 13, pp. 147–171). New York: Cambridge University Press. Rutter, M., Pickles, A., Murray, R., & Eaves, L. (2001). Testing hypotheses on specific environmental causal effects on bevavior. Psychological Bulletin, 127(3), 291–324. Sameroff, A., & Seifer, R. (1995). Accumulation of environmental risk and child mental health (Vol. 31). New York: Garland Publishing. Scarr, S., & McCartney, K. (1983). How people make their own environments: A theory of genotype–environment effects. Child Development, 54(2), 424–435. Seifer, R., Sameroff, A. J., Baldwin, C. P., & Balwin, A. (1989, April). Risk and protective factors between 4 and 13 years of age. Paper presented at the annual meeting of the Society for Research in Child Development, San Francisco, CA. Shadish, W., Cook, T., & Campbell, D. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston: Houghton Mifflin. Silverton, L., Mednick, S. A., Schulsinger, F., Parnas, J., & Harrington, M. E. (1988). Genetic risk for schizophrenia, birthweight, and cerebral ventricular enlargement. 
Journal of Abnormal Psychology, 97(4), 496–498.
Simmons, R. G., & Blyth, D. A. (1992). Moving into adolescence: The impact of pubertal change and school context. In P. H. Rossi, M. Useem, & J. D. Wright (Eds.), Social institutions and social change (pp. 366–403). New York: Aldine de Gruyter. Sroufe, L. A. (1988). The role of infant–caregiver attachment in development. In J. Belsky & T. Nezworski (Eds.), Clinical Implications of Attachment (pp. 18–38). Hillsdale, NJ: Lawrence Erlbaum Associates. Szatmari, P., Saigal, S., Rosenbaum, P., Campbell, D., & King, S. (1990). Psychiatric disorders at five years among children with birthweights <1000g: A regional perspective. Developmental Medicine and Child Neurology, 32(11), 954–962. Terr, L., Bloch, D., Michel, B., Shi, H., Reinhardt, J., & Metayer, S. (1999). Children’s symptoms in the wake of Challenger: A field study of distant-traumatic effects, an outline of related conditions. American Journal of Psychiatry, 156(10), 1536–1544. Tschann, J. M., Adler, N. E., Irwin, C. E., Jr., Millstein, S. G., Turner, R. A., & Kegeles, S. M. (1994). Initiation of substance use in early adolescence: The roles of pubertal timing and emotional distress. Health Psychology, 13(4), 326–333. Weisglas-Kuperus, N., Koot, H. M., Baerts, W., Fetter, W. P., & Sauer, P. J. (1993). Behaviour problems of very low birthweight children. Developmental Medicine and Child Neurology, 35(5), 406–416. Worthman, C. M., & Kuzara, J. (2005). Life history and the early origins of health differentials. American Journal of Human Biology, 17(1), 95–112.
12 Causes of Posttraumatic Stress Disorder

naomi breslau
The definition of posttraumatic stress disorder (PTSD) in the Diagnostic and Statistical Manual of Mental Disorders, 3rd edition (DSM-III), and in subsequent DSM editions is based on a conceptual model that brackets traumatic events from other stressful experiences and PTSD from other responses to stress and links the two causally. The connection between traumatic experiences and a specific mental disorder has become part of the general discourse. PTSD provides a cultural template of the human response to war, violence, disaster, or very bad personal experiences.

The DSM-III revolutionized American psychiatry. The manual's editors wanted a symptom-based, descriptive classification and generally rejected any reference to causal theories about mental processes. PTSD was an exception to the rule of creating a classification that is ''atheoretical with regard to etiology or pathophysiological process'' (American Psychiatric Association, 1980, p. 7), but the exception was not noted anywhere in the manual.1 Not only did the PTSD definition include an etiological event, but it incorporated a theory, an underlying process, that connects the syndrome's diagnostic features (McNally, 2003; Young, 1995).

1. The DSM-III did contain explicit exceptions for disorders with known etiology or pathophysiological process. For example, it stated that in organic mental disorders organic factors have been identified. It is hardly necessary to point out the difference between these mental disorders and PTSD with respect to causal assumptions.

In 1994, the American Psychiatric Association published the fourth edition of the DSM. The definition of PTSD, which had already undergone some revisions in DSM-IIIR, maintained the syndrome's description but changed materially the stressor criterion. The range of events was widened, and the emphasis shifted to the subjective experience of victims. The list of ''typical'' traumas in the DSM-IV left no doubt that the intent was to enlarge the variety of experiences that can be used to diagnose PTSD beyond the initial conception of directly experienced, life-threatening events such as combat,
natural disaster, rape, and other assault. Persons who learned about a threat to the physical integrity of another person or about a traumatic event experienced by a friend could be considered victims. A novel form of PTSD took shape following the 9/11 terrorist attacks, when the entire population of the United States was considered to have been affected by a ‘‘distant’’ trauma, produced chiefly by viewing television coverage. Weeks after the attacks, researchers conducted telephone surveys and detected a rise in the prevalence of PTSD and major depression as well as a ‘‘dose–response’’ relationship between television viewing time and symptoms. Furthermore, a rise of new 9/11-related PTSD cases was reported among those who viewed televised images on the 1-year anniversary of the events (Bernstein et al., 2007). Some commentators criticized the ‘‘conceptual bracket creep’’ on several counts, including the concern that it produces a heterogeneous and ‘‘diluted’’ population of cases, making it far more difficult to detect and characterize pathological alterations in PTSD (McNally, 2003). This chapter examines three themes in research on PTSD. They concern essential features of the disorder that inform, intersect, and complicate one another. The first concerns the internal logic of PTSD in the DSM that captures the way a traumatic experience is linked to the clinical syndrome through memory. The second concerns risk factors and diathesis. The third concerns comorbidity, both bivariate associations of trauma and PTSD with specific disorders and multivariate approaches to underlying liabilities across a wide range of psychiatric disorders.
The Inner Logic of PTSD

The definition of PTSD in the DSM is based on a conceptual model that brackets traumatic events from less severe stressors and links the traumatic events causally with a specific syndrome. The syndrome is defined by three clusters: (1) persistent reexperiencing of the trauma (e.g., intrusive recollections), (2) persistent avoidance of stimuli associated with the trauma and emotional numbing (e.g., diminished interest in significant activities), and (3) persistent symptoms of increased arousal (e.g., insomnia, concentration problems). The definition assumes an underlying psychological process. Its core elements are the recurrence in the present of the traumatic past (i.e., traumatic memory) and the co-occurrence (in alternating phases) of the characteristic clinical features. The trauma is reexperienced in intrusive and distressing recollections, experiences associated with increased arousal. The victim adapts by avoiding situations that stimulate painful reexperiencing.

The presence of an underlying process is implicit in the description of PTSD in DSM-III. Unless the event was encoded in memory, intrusive
recollections, flashback, and physiological reactivity to reminders could not occur. The majority of clinicians and researchers familiar with the literature on posttraumatic syndromes understood the classification’s subtext and accepted the presence of an underlying process, notwithstanding DSM-III’s editorial rejection of theories concerning causation. A second group also understood the subtext but interpreted it as a survival of psychoanalytic reasoning and opposed the inclusion of PTSD in DSM-III for this reason. A third group interpreted the PTSD symptom list as being consistent with the DSM-III editorial position. They believed that PTSD diagnosis is based on the presence of the indicated features and that presumptions about connections among symptoms are not justified. The centrality of traumatic memory implies a causal order: The memory of the event causes the onset of the syndrome. Diagnosing a case of PTSD requires a clinical inference regarding a causal connection between an identifiable past event and the subsequent onset of symptoms. However, this is not the only way that the event and the syndrome can be connected. A person with current unexplained symptoms may attribute a connection with a past event after gaining some new knowledge that allows him to reassess the original experience in a new light (Young, 2001). When the process begins with current symptoms, it proceeds from symptoms through a search for a past event that can qualify as a cause. A PTSD diagnosis that depends on distinguishing between these two causal possibilities would have been a problem. There are no tools for making such a distinction. There were none in 1980 and there are none today. It should be noted that the problem of distinguishing between these two causal directions is entirely different from the problem of finding objective information as to whether or not the reported event occurred. Objective information (e.g., from official records) that could corroborate the occurrence of a reported event is incapable of confirming that the event, when it occurred, created a distressing memory and that the memory caused the onset of symptoms. The DSM definition of PTSD requires only two elements, the typical clinical syndrome and an ‘‘identifiable stressor.’’ Once both are found, the correct link with traumatic memory, that is, that the memory of the event preceded and caused the onset of the syndrome, is assumed (Young, 2001). PTSD specialists have described how the clinician together with the patient ‘‘can translate current symptoms into disavowed traumatic memories’’ and how by doing that ‘‘both clinician and patient will gain compelling respect for the disorder’’ (Lindy, Green, & Grace, 1987) (p. 272). Although DSM-III circumvented the memory problem by not insisting on establishing its causal priority, it maintained a formal inner logic by linking criterion symptoms with the stressor. Specifically, recurrent nightmares and intrusive thoughts must mirror the event; avoidance must be of situations that recall the event and precipitate distress. In themselves, the
criterion symptoms are nonspecific; they characterize other anxiety disorders and depression and are used in the definition of these other disorders. It is their co-occurrence and their connection with the stressor that transform these diagnostically ambiguous symptoms into a distinct DSM disorder that is PTSD. The disorder's inner logic has important implications for how PTSD cases are ascertained. The use of a symptom list, with a specified cutoff score or as a continuum that represents varying degrees of PTSD severity, is a departure from the DSM construct.

The problem of whether symptoms followed or instead preceded the traumatic memory has not disappeared. Although not addressed in the DSM definition, it is a central point in litigations and compensation claims. How can one decide whether the traumatic event, as portrayed by an individual patient, is the cause of the patient's distress and not the reverse? Is there a conspicuous clinical feature that could distinguish between the two alternatives? The length of time that separates the onset of symptoms from the event might be such a feature: The longer the time interval, the greater the suspicion that the causal explanation is reversed, that is, from symptoms to a traumatic event. The intent in DSM-III was that ''there would be temporally close juxtaposition between the stressor and the development of symptoms'' (Andreasen, 2004, p. 1322). Research in military and civilian samples supports that expectation; in most cases, symptoms begin within days of the event (Andrews, Brewin, Philpott, & Stewart, 2007; Breslau, Davis, Andreski, & Peterson, 1991; Jones & Wessely, 2005).

A time lag from stressor to onset of symptoms was suspect from the start. In a volume entitled Psychiatric Diagnosis, Third Edition (1984), Goodwin and Guze comment on what was then a new condition in DSM-III called ''PTSD.'' They describe the political context in which PTSD was adopted and how it had been vigorously promoted by advocate groups on behalf of Vietnam veterans. They focused specifically on a subtype of PTSD called ''delayed onset,'' which entitled many veterans to compensation for service-connected injury, although symptoms first appeared many years after military discharge. Goodwin and Guze comment, ''Rarely before had so many claimants presented themselves to psychiatric examiners having read printed checklists describing the diagnostic feature of the disorder for which they sought compensation'' (p. 82).
Risk Factors and Diathesis

The Problem of Risk Factors in PTSD

The bracketing of extreme stressors in the DSM-III definition of PTSD implied that, unlike more ordinary stressful life events, the causal effects
of extreme stressors are independent of personal vulnerabilities. PTSD was said to be a normal response to an abnormal event. It was conceived as ''normal'' not merely because it was believed to be the norm in a statistical sense but also as produced directly (naturally) by the stressor. The role of stressors in PTSD was compared to ''the role of force in producing a broken leg'' (Andreasen, 1980). A 1985 comprehensive review of PTSD concluded that ''the nature and intensity of the stressor is the primary etiological factor in individual differences in response to stress'' (Green, Lindy, & Grace, 1985). Therapists and veterans' advocates rejected any suggestions that predispositions played a part. A leading psychiatrist-advocate wrote in 1985 that the ''predisposition theory'' has no standing at all among expert clinicians who treat war veterans or rape victims and survivors of civilian disasters and that ''the predisposition theory is an instance of blaming the victim'' (Blank, 1985).

Based on the available literature, Breslau and Davis (1987) suggested that emotional disturbance is not a direct consequence of ''extreme'' stressors and that, in regard to extreme stressors, social and individual factors may modify the response in the same way that they modify the response to ordinary stressful life events. Their argument was restated in 1995 by Yehuda and McFarlane in an article on the conflict between current knowledge about PTSD and its original conceptual basis. By the ''original conception'' Yehuda and McFarlane refer to the DSM-III PTSD concept of a ''natural response'' to extraordinary events that did not depend on individual vulnerability. By ''current knowledge'' they refer to the epidemiological evidence about the relative rarity of PTSD among trauma victims and the associations of PTSD with risk factors other than trauma intensity. They note that, ''even among those who are exposed to very severe and prolonged trauma [they refer to prisoners of war and Holocaust survivors], there is usually a substantial number of individuals who do not develop PTSD or other psychiatric illnesses'' (p. 1708). Far from being a normal response to an abnormal stressor, it is an abnormal response observed only in some people. A person's response to a stressor is determined not by the stressor but by ''interaction'' between the stressor and the victim's risk factors.

What is the epidemiological evidence that has accumulated since 1980? Studies of general population samples in the United States show a range of estimates of the probability of PTSD given exposure to trauma (Table 12.1). The heterogeneity across studies reflects differences in the way in which trauma-exposed persons were identified. Longer lists of stressors, which include events of lesser magnitude, added in DSM-IV, yield higher proportions of exposed persons but lower percentages of exposed persons meeting PTSD criteria. Estimates of PTSD associated with specific event types are more consistent across studies. Table 12.2 displays conditional probabilities from two epidemiological studies that used the DSM-IV definition.
Table 12.1  Lifetime Prevalence of Exposure and PTSD (Rate/100)

                             Exposure            PTSD
                             M        F          M       F
  Breslau et al. (1991)      43.0     36.7       6.0     11.3
  Norris (1992)              73.6     64.8       —       —
  Resnick et al. (1993)      —        69.0       —       12.3
  Kessler et al. (1995)      60.7     51.2       5.0     10.4
  Breslau et al. (1997)      —        40.0       —       13.8
  Stein et al. (1997)        81.3     74.2       —       —
  Breslau et al. (1998)      92.2     87.1       6.2     13.0
  Breslau et al. (2004b)     87.2     78.4       6.3     7.9
  Kessler et al. (2005)      —        —          3.6     9.7
Table 12.2  Conditional Probability of PTSD Across Specific Traumas: Estimates From Two Population Surveys

                                         Detroit Area (a)        Baltimore (b)
                                         n        % PTSD         n        % PTSD
  Assaultive violence                    286      20.9           304      15.1
    Rape                                 32       49.0           39       46.2
    Shot/stabbed                         21       15.4           64       9.4
    Sexual assault other than rape       27       23.7           38       29.0
    Mugged/threatened with weapon        138      8.0            123      4.1
    Badly beaten up                      53       31.9           30       13.3
  Other injury or shock                  633      6.1            287      6.6
    Serious car crash                    168      2.3            50       10.0
    Other serious accident               87       16.8           17       5.9
    Natural disaster                     109      3.8            20       0.0
    Witnessed killing/serious injury     183      7.3            149      5.4
    Discovered dead body                 45       0.2            19       5.3
  Learning about others                  564      2.2            238      2.9
  Sudden unexpected death                474      14.3           543      9.0
  Any trauma                             1,957    9.2            1,372    8.8

(a) Breslau et al. (1998). (b) Breslau et al. (2004a).
In both studies, stressors grouped under ''assaultive violence'' were associated with the highest probability of PTSD (15% and 21%) and learning about trauma experience by a close friend or relative was associated with the lowest probability (2.2% and 2.9%). The risk from any qualifying trauma was <10% (8.8% and 9.2%). Clearly, even trauma types that have the highest PTSD risk leave the majority of victims unaffected by the disorder.
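The conditional probabilities in Tables 12.1 and 12.2 are simply the number of PTSD cases divided by the number of persons exposed to a given event type, and the uncertainty around them can be expressed with a binomial confidence interval. The fragment below illustrates the calculation in Python with rounded counts loosely based on the rape row of Table 12.2 for the Detroit area (32 exposed, about 49% with PTSD); the exact case count is an assumption made for illustration.

from statsmodels.stats.proportion import proportion_confint

exposed = 32     # persons reporting the event (illustrative)
cases = 16       # of whom this many met PTSD criteria (illustrative)
p_conditional = cases / exposed                    # ~0.50, cf. 49.0% in Table 12.2
ci_low, ci_high = proportion_confint(cases, exposed, alpha=0.05, method="wilson")
print(f"P(PTSD | exposure) = {p_conditional:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")

The wide interval for a cell this small is a reminder that the event-specific estimates, especially for rarer traumas, carry considerable sampling error.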
PTSD among American Vietnam veterans was estimated in the National Vietnam Veterans Readjustment Survey (NVVRS), a representative sample of veterans (Kulka et al., 1990). The lifetime prevalence of DSM-IIIR PTSD in male Vietnam theater veterans was 30.6%. A recent revisit of the survey adjusted the lifetime estimate to 18.7% (Dohrenwend et al., 2006). As to stressor severity, the evidence from civilian studies has been weaker than from veterans’ studies (Brewin, Andrews, & Valentine, 2000). Additionally, in the two population samples that we surveyed that used DSM-IV, we found that the higher PTSD risk associated with assaultive violence (vs. other event types) was observed only in females (Breslau et al., 1998; Breslau, Wilcox, Storr, Lucia, & Anthony, 2004b) (Table 12.3).
Table 12.3  Conditional Probability of PTSD: Sex-Specific Comparisons

                                      Detroit Area (a)       Baltimore (b)
                                      M (%)     F (%)        M (%)     F (%)
  Assaultive violence                 6.0       35.7         7.1       23.5
  Excluding rape/sexual assault       6.0       32.3         4.7       12.7
  Other injury or shock               6.6       5.4          7.9       5.2
  Learning about others               1.4       3.2          2.8       3.1
  Sudden unexpected death             12.6      16.2         9.2       8.8

(a) Breslau et al. (1998). (b) Breslau et al. (2004a).

Research on Risk Factors

Studies of risk factors have examined lists of variables that included sociodemographic factors together with personality traits and biographical events. The NVVRS included race, family religion, family socioeconomic status, educational attainment, marital status, child abuse, childhood behavioral problems, family mental-health problems, and history of mental-health problems (Kulka et al., 1990). A meta-analysis of a long list of risk factors for PTSD discovered heterogeneity between civilian and veteran studies as well as across methods (Brewin et al., 2000). However, three risk factors were uniform across populations and methods: psychiatric history, family psychiatric history, and early adversity (Brewin et al., 2000). Although the effect of each individual risk factor examined in the meta-analysis was relatively small, their sum might outweigh the impact of trauma severity (Brewin et al., 2000, p. 756).

Reports of high prevalence of PTSD among prisoners of war (50% or even higher in some subgroups) have not examined predispositions (Engdahl, Dikel, Eberly, & Blank, 1997; Goldstein, van Kammen, Shelly, Miller, & van Kammen, 1987). Little research effort has been devoted to the question of the
contribution of predispositions to the psychiatric outcomes of extreme stressors or across stressors’ magnitude. Does exposure to severe stressors override the effects of predispositions on posttrauma psychiatric disturbance? An exception is a 1981 publication by Helzer, based on data from a follow-up study of Vietnam veterans conducted in the early 1970s, that sheds light on this issue. Helzer (1981) examined the effects of antecedent factors (failure to graduate from high school, drug use) on depression across different levels of combat stress (measured by number of combat events). He reported that the influence of these antecedents was marked (and statistically significant) only at high levels of combat stress. A dose–response relationship between levels of combat exposure and psychopathology, reported in that study, was not accompanied by a corresponding decrease in the impact of predispositions. The opposite pattern was observed. A meta-analysis of risk factors for PTSD (or PTSD symptoms) by Ozer, Best, Lipsey, and Weiss (2003) concluded that ‘‘peritraumatic’’ responses (e.g., dissociation as the immediate response to the stressor) count the most. There is a conceptual problem in this analysis (and similar studies concerning negative appraisal as a risk factor, e.g., Ehlers & Clark, 2000) in that peritraumatic responses and appraisal might themselves be aspects of the outcome we wish to explain. Dissociations, negative appraisal, and PTSD are likely to be manifestations of the same psychological process or consequences of a common vulnerability.
Prior Trauma as Risk Factor

A frequently replicated epidemiological finding is the enhanced probability of PTSD in exposed persons who had experienced prior traumatic events. Studies of Vietnam veterans and general population samples have reported higher rates of prior trauma (including childhood maltreatment) among exposed persons who succumbed to PTSD than among exposed persons who did not (Bremner, Southwick, Johnson, Yehuda, & Charney, 1993; Breslau, Chilcoat, Kessler, & Davis, 1999; Galea et al., 2002; Yehuda, Resnick, Schmeidler, Yang, & Pitman, 1998). The finding has been interpreted as supporting a ''sensitization'' process, that is, greater responsiveness to subsequent stressors (Post & Weiss, 1998). This interpretation further highlights stressors as a cause of PTSD. Now stressors play two roles: They cause PTSD directly and, through a separate causal pathway, increase the vulnerability to PTSD in the future.

The evidence on prior trauma comes almost exclusively from cross-sectional studies and retrospective reports, a limitation that is generally acknowledged in the literature. A major limitation that has been overlooked is the failure to assess how persons had responded to the prior trauma—specifically, whether or
not they had developed PTSD in response to the prior trauma. Consequently, it is unclear whether prior trauma per se or, instead, prior PTSD predicts an elevated risk for PTSD following a subsequent trauma. Evidence that previously exposed persons are at increased risk for PTSD only if their prior trauma resulted in PTSD would not support the hypothesis that exposure to traumatic events increases the risk of (i.e., sensitizes to) the PTSD effects of a subsequent trauma, transforming persons with ‘‘normal’’ reactions to stressors into persons susceptible to PTSD. It might suggest instead that trauma precipitates PTSD in persons with a preexisting susceptibility that was already present before the prior trauma occurred.

Evidence that personal vulnerabilities, chiefly neuroticism, history of major depression and anxiety disorders, and family history of psychiatric disorders, increase the risk for PTSD has been consistently reported. There also is evidence that personal vulnerabilities might be stronger predictors of psychiatric response to traumatic events than trauma severity, especially in civilian samples.

We recently examined this question in our longitudinal epidemiological study of young adults (Breslau, Peterson, & Schultz, 2008). At baseline and at three reassessments over the following 10 years, respondents were asked about the occurrence of traumatic events and PTSD. Data from one follow-up assessment or more were available on 990 respondents (98.3% of the initial panel). Exposure to trauma and PTSD measured at baseline and at the 5-year follow-up were used to predict new exposure and PTSD during the respective subsequent periods: from baseline to the 5-year assessment and from the baseline and 5-year assessments to the 10-year assessment. Preexisting major depression and any anxiety disorder were included as covariates to control for their effects (Table 12.4). In this adjusted model the relative risk for PTSD following exposure to traumatic events in subsequent periods was significantly higher among trauma victims with PTSD in the preceding periods than in trauma victims who had not succumbed to PTSD.
Table 12.4 Prior Trauma and PTSD and the Subsequent Occurrence of Trauma and PTSD (n = 990)

                          First Follow-Up Period                      Second Follow-Up Period
                          n     Exposed (%)   PTSD Among              n     Exposed (%)   PTSD Among
                                              Exposed (%)                                 Exposed (%)
Prior PTSD                92    42.4          18.0                    105   60.0          19.1
Prior trauma/no PTSD      294   33.3          12.2                    386   41.5           6.3
No prior trauma           604   24.0           8.3                    419   27.4           6.1

From Breslau et al. (2008).
Table 12.5 Relative Risk for PTSD Following Exposure to Trauma Associated With Prior Trauma, Prior PTSD, and Covariates

Variable                          Bivariate Estimates,       Multivariable Model,
                                  OR (95% CI)                aOR (95% CI)
Prior PTSD                        3.01 (1.52, 5.97)*         2.68 (1.33, 5.41)*
Prior trauma/no PTSD              1.24 (0.65, 2.36)          1.22 (0.64, 2.34)
Female (vs. male)                 2.51 (1.25, 5.06)*         1.94 (0.93, 4.07)
White (vs. Black)                 0.61 (0.33, 1.12)          0.60 (0.32, 1.11)
College education (vs. less)      0.81 (0.41, 1.59)          1.00 (0.51, 1.96)
Preexisting major depression      2.72 (1.49, 4.99)*         2.09 (1.71, 3.75)*
Preexisting anxiety               2.35 (1.32, 4.20)*         1.65 (0.91, 2.97)

Estimates are from bivariate and multivariable generalized estimating equation (GEE) multinomial regressions. Each of the bivariate models and the multivariable model includes a term for time interval (suppressed). From Breslau et al. (2008). *p < 0.05.
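To make the type of model behind Table 12.5 concrete, the sketch below fits a repeated-measures logistic GEE in Python with statsmodels. It is illustrative only: the published analysis used a multinomial GEE, and the binary-outcome simplification, the data file, and every variable name here are assumptions for the example rather than the authors' code.

```python
# Minimal sketch of a GEE regression of the general kind summarized in Table 12.5.
# Assumed input: a long-format file with one row per respondent per follow-up interval,
# a binary indicator of PTSD following exposure, and the listed predictor columns.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("followup_intervals.csv")  # hypothetical data file

model = smf.gee(
    "ptsd_after_exposure ~ prior_ptsd + prior_trauma_no_ptsd + female + white"
    " + college + preexisting_mdd + preexisting_anxiety + C(interval)",
    groups="respondent_id",               # repeated observations within respondent
    data=df,
    family=sm.families.Binomial(),        # logistic link for a binary outcome
    cov_struct=sm.cov_struct.Exchangeable(),
)
result = model.fit()

print(np.exp(result.params))      # adjusted odds ratios
print(np.exp(result.conf_int()))  # 95% confidence intervals
```

Because the coefficients are on the log-odds scale, exponentiating them yields adjusted odds ratios comparable in spirit to the aOR column of the table.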
Odds ratios were 2.68 (95% confidence interval 1.33–5.41) and 1.22 (95% confidence interval 0.64–2.34), respectively, adjusted for sex, race, education, preexisting major depression and anxiety disorders, and time of assessment (Table 12.5). We concluded that there was no support in these data for the idea that traumatic events experienced in the past lurk inside, waiting to shape reactions to future traumatic events. The findings suggest that preexisting susceptibility to a pathological response to stressors accounts for the PTSD response to both the prior trauma and the subsequent trauma.

Our results had been foreshadowed by a 1987 Israeli study of acute combat stress reaction (CSR) among soldiers in the 1982 Lebanon War (Solomon, Mikulincer, & Jakob, 1987). The authors reported that, compared with new recruits who had not fought in a previous war, CSR occurred more frequently among soldiers who had experienced CSR in a previous war but not among soldiers who had fought in a previous war without experiencing CSR. They concluded that knowledge of the outcome of prior combat was essential for predicting soldiers' response to subsequent combat. Soldiers who suffered CSR in a previous war might have had a preexisting vulnerability that also accounted for their increased risk of CSR during the subsequent war.

Soldiers who had fought in a previous war but had not experienced CSR had a lower rate of CSR during the subsequent war than new recruits who had no war experience. It is tempting to interpret this observation as evidence of ‘‘inoculation.’’ However, the new recruits included soldiers who would have had CSR had they fought in a prior war. This undetected ‘‘vulnerable’’ subset would push up the rate of CSR in the group of new recruits as a whole.
Summarizing their vast research on how genetic and environmental factors combine to influence the risk of depression in adulthood, Kendler and Prescott (2006) observe that environmental risk factors have time-limited effects. ‘‘Although we are sure this is an oversimplification it does seem that what causes temporally stable liability to depression comes from our genes, whereas environmental factors create the large but brief spikes in risk that induces episodes in vulnerable individuals’’ (p. 343). They make a similar observation for antisocial behavior. It is reasonable to ask, Could this time-limited effect apply also to traumatic events and PTSD? I do not wish to ignore the caveat that Kendler and Prescott issued about oversimplification. They might have had in mind childhood events, such as childhood sexual abuse. However, with respect to these distant events, they have shown that they are entangled with one another and with genetic factors influencing depression and that the path to a recent major depression episode is indirect, through its effects on lifetime trauma, conduct disorder, and recent stressful life events.
Intelligence

Studies in Vietnam veterans have reported associations between intelligence test scores and the risk for PTSD (Macklin et al., 1998; McNally & Shin, 1995; Pitman, Orr, Lowenhagen, Macklin, & Altman, 1991). Evidence on the role of intelligence in children's psychiatric response to adversity was reported for a range of disorders and for PTSD (Fergusson & Lynskey, 1996; Silva et al., 2000). Several articles published in 2006 and 2007 reported on cognitive ability measured in early childhood and subsequent PTSD in general population samples (Breslau, Lucia, & Alvarado, 2006; Koenen, Moffitt, Poulton, Martin, & Caspi, 2007; Storr, Ialongo, Anthony, & Breslau, 2007) and on Vietnam veterans from the twin registry for whom predeployment test scores were available (Kremen et al., 2007).

Some of the studies found that the decrease in risk was conferred specifically by high IQ, rather than operating across the full range of IQ. For example, we found that an age-6 Wechsler Intelligence Scale for Children–Revised IQ above 115 was associated with a lower risk of subsequent exposure to trauma and, among those exposed, a markedly lower risk of PTSD (adjusted odds ratio = 0.21) (Breslau et al., 2006). Similarly, Gilbertson et al. (2006) reported that above-average cognitive function protects from chronic PTSD and that those with PTSD had average, rather than below-average, cognitive function. The mean IQ of PTSD veterans and their monozygotic twin brothers was 105, whereas the mean IQ of non-PTSD combat veterans and their monozygotic twin brothers was 118.

These studies perform two tasks. First, they dispel the notion that IQ deficits observed among patients with PTSD reflect stress-induced
neurotoxicity, the primary hypothesis in earlier PTSD studies (Bremner, 1999; Sapolsky, Uno, Rebert, & Finch, 1990). Observed cross-sectional associations with IQ do not reflect the effects of psychological trauma but are more likely to reflect preexisting differences. Second, these studies suggest that high IQ plays a protective role. The mechanisms by which high IQ deters the PTSD effects of trauma are unclear. Gilbertson et al. (2006) suggested that high IQ signals a general capacity to effectively and flexibly manipulate verbal information and, thus, a capacity to place traumatic experiences into meaningful concepts, which may reduce negative emotional impact (p. 493).
Neuroticism

Neuroticism is a personality trait that at the high end is a disposition to respond to stress with negative affect, depression, and anxiety and at the low end manifests as emotional stability and ‘‘normality.’’ An early study that called attention to neuroticism's salience in the psychiatric response to traumatic experiences reported on the survivors of the 1983 Australian bushfires. In contrast with the expectation that the intensity of the stressor would be the primary cause, neuroticism and history of predisaster disturbances emerged as stronger predictors of morbidity (McFarlane, 1988, 1989). Studies of Vietnam combat veterans reported that PTSD and PTSD symptoms were correlated with neuroticism (Casella & Motta, 1990; Hyer et al., 1994; Talbert, Braswell, Albrecht, Hyer, & Boudewyns, 1993). In a general population sample of young adults, neuroticism predicted both exposure to traumatic events and PTSD after exposure, controlling for other risk factors (Breslau, Davis, & Andreski, 1995; Breslau et al., 1991). In most of the studies neuroticism was measured after the trauma, but three studies measured neuroticism prior to the trauma and reported an association between neuroticism and PTSD or postdisaster disturbance (Alexander & Wells, 1991; Engelhard, van den Hout, & Kindt, 2003; Parslow, Jorm, & Christensen, 2006). Recently, prospective studies have reported that anxious/depressed mood, anxiety disorders, and difficult temperament measured in childhood predicted subsequent PTSD (Breslau et al., 2006; Koenen et al., 2007; Storr et al., 2007).

Research on neuroticism has demonstrated connections with neurophysiological substrates, in particular the lability of the autonomic nervous system. There is evidence supporting heritability and stability from childhood to adulthood. Genetic control of neuroticism has been reported in numerous studies since the 1970s. Molecular genetics studies, using both association and linkage methods, identified gene regions that are likely to influence variation in neuroticism (Fullerton et al., 2003; Lesch et al., 1996). A recent meta-analysis concluded that there is a strong association between a serotonin transporter promoter polymorphism (5-HTTLPR) and neuroticism, when
neuroticism is measured by the NEO Personality Inventory (Sen, Burmeister, & Ghosh, 2004). Analysis of the Virginia Twin Registry showed that neuroticism was only minimally changed following major depression episodes (Kendler & Prescott, 2006), suggesting that reported cross-sectional associations between neuroticism and disorders reflect primarily the effect of neuroticism. Kendler and Prescott also showed that genetic factors underlying neuroticism are largely shared with those that influence the liability for internalizing disorders, although PTSD was not included. There is evidence that neuroticism contributes to psychopathology even more broadly, including externalizing disorders and comorbidity between internalizing and externalizing disorders (Khan, Jacobson, Gardner, Prescott, & Kendler, 2005; Krueger & Markon, 2001). Placing neuroticism in a list of risk factors containing chiefly attributes of aggregates might obscure the potential status of neuroticism as diathesis. Neuroticism accounts for the process that is PTSD: the way in which a stressor is perceived and appraised and the characteristic features of the PTSD syndrome. As a propensity related to stress reactivity, high neuroticism predicts the repetitions of the memory of the past trauma in the present (ruminating and reexperiencing symptoms), the phobic avoidance, the dysphoria, and associated sleep and concentration problems that characterize PTSD. While it maps on the PTSD characteristic features, it simultaneously connects PTSD with the main body of neurobiological science.
Comorbidity

Studies of general population samples have confirmed earlier observations from clinical samples and samples of Vietnam veterans that persons diagnosed with PTSD have high rates of other psychiatric disorders. One explanation that has been proposed is that stressors that cause PTSD also cause other disorders. Because in PTSD there is always an identified etiological stressor, it is logical to ask, Could not the same stressor also have caused the comorbid disorder via a separate and distinct pathway (Yehuda, McFarlane, & Shalev, 1998)? The hypothesis that stressors increase the risk for other disorders, independent of their PTSD effects, would be supported by evidence of elevated incidence of other disorders in trauma victims who did not succumb to PTSD relative to persons who were not exposed to trauma. Conversely, evidence of an increased risk for the subsequent onset of major depression or substance-use disorders only in victims with PTSD, relative to persons who did not experience trauma, would suggest that PTSD might be the cause or, alternatively, that the two disorders share a common underlying vulnerability.
The uniqueness of PTSD in the DSM has methodological implications for evaluating this question. Because a link with a stressor is required for PTSD, the risk of PTSD in trauma victims is measured by a conditional probability (probability among those exposed to a stressor). In contrast, the risk for other psychiatric disorders following trauma is measured by a ratio of risks, or a relative risk, which is the standard epidemiological method for evaluating a suspected cause. The definition of depression or substance-use disorder requires no link with a stressor, and the risk for these disorders in trauma victims is evaluated relative to an unexposed reference group. Additionally, because we are interested in whether exposure per se, independent of PTSD, causes another disorder, victims are separated into two subsets, those with and those without PTSD. In our longitudinal study of young adults, the adjusted odds ratio for the first onset of major depression in persons with preexisting PTSD was 2.96 and in persons who were exposed to trauma but did not develop PTSD, 1.35 (not significant); the difference between the odds ratios was statistically significant (Table 12.6). Neither PTSD nor history of trauma exposure increased the risk for the onset of alcohol-use disorder. However, preexisting PTSD, but not history of trauma exposure, increased the risk for the subsequent onset of drug-use disorder. The difference between odds ratios associated with prior PTSD and prior exposure was statistically significant. A recent report from another longitudinal study on the incidence of drug-use disorders shows similar results. An increased 1-year incidence of drug disorder was found for persons with preexisting PTSD but not for persons with exposure alone (Reed, Anthony, & Breslau, 2007). To gain further understanding of the relationship between PTSD and these other disorders, we estimated associations in the reverse direction.
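Before turning to those reverse-direction results, the two measures just contrasted can be restated in explicit notation (a brief formal recap; the symbols are ours, not the chapter's):

```latex
% Risk of PTSD is defined only conditionally on exposure to a stressor:
\[
\Pr(\mathrm{PTSD} \mid \text{exposed to trauma})
\]
% Risk of another disorder D is evaluated as a relative risk against the unexposed,
% separately for exposed persons with and without PTSD:
\[
\mathrm{RR}_{\text{PTSD}} = \frac{\Pr(D \mid \text{exposed, PTSD})}{\Pr(D \mid \text{unexposed})},
\qquad
\mathrm{RR}_{\text{no PTSD}} = \frac{\Pr(D \mid \text{exposed, no PTSD})}{\Pr(D \mid \text{unexposed})}.
\]
```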
Table 12.6 Incidence and Relative Risk for Other Disorders in 10-Year Follow-Up in Detroit Area Study of Young Adults

                         PTSD                           Exposed/No PTSD               Not Exposed
                         %      aOR (95% CI)            %      aOR (95% CI)           %      aOR (95% CI)
Major depression(a,b)    38.5   2.96* (1.59–5.53)       19.5   1.35 (0.89–2.03)       17.1   —
Alcohol A/D              15.8   1.45 (0.67–3.17)        15.6   1.14 (0.71–1.85)       12.8   —
Drug A/D(a)              10.6   4.34* (1.63–11.53)       2.2   0.72 (0.25–2.05)        2.6   —

aOR, odds ratio adjusted for sex, race, and education; A/D, abuse or dependence. (a) Comparisons of aORs between PTSD and exposed/no PTSD are significant at *p < 0.05. Data on substance-use disorders are from Breslau et al. (2003). (b) Similar results on major depression from the 5-year follow-up are in Breslau et al. (2000).
We examined whether preexisting major depression and substance-use disorders increased the risk for new exposure to traumatic events or the conditional risk for PTSD. We found that preexisting major depression predicted an increased risk for subsequent exposure and for PTSD among persons exposed to trauma. Drug-use disorder was associated with an increased risk for neither subsequent exposure nor PTSD. Preexisting alcohol disorder was associated with an increased risk for PTSD following exposure but not for exposure itself (Table 12.7).

The prospective results, taken together, suggest different explanations across comorbid disorders. (1) The bidirectional relationship between major depression and PTSD, together with the evidence that preexisting major depression increased the likelihood of subsequent exposure and of PTSD following exposure, suggests a shared diathesis or shared environmental causes other than the traumatic event (for which there was no evidence). (2) In contrast, drug-use disorder, for which there was support in only one direction (from PTSD to drug-disorder onset), might be a complication of PTSD.

A general conclusion that can be drawn is that persons who experienced trauma and who did not develop PTSD (i.e., most of those exposed to traumatic events) are not at an elevated risk for major depression and drug-use disorders compared with unexposed persons. The excess incidence of these disorders in persons exposed to trauma is concentrated primarily in the small subset of exposed persons with PTSD. Reports from other studies on lifetime and current co-occurrence of other disorders with trauma and PTSD support this generalization (Breslau, Chase, & Anthony, 2002; North & Pfefferbaum, 2002).

Another approach to comorbidity that illuminates etiology is the application of quantitative models to large data sets on multiple disorders. Specific psychiatric disorders are understood as manifestations of a small number of liability factors.
Table 12.7 Risk for Exposure to Trauma and PTSD by Preexisting Disorders

                         Exposure in Total Sample       PTSD in Exposed
                         (n = 1,007)                    (n = 399)
Preexisting Diagnosis    HR (95% CI)                    HR (95% CI)
Major depression         2.0* (1.3–3.0)                 3.7* (2.0–6.7)
Drug A/D                 1.1 (0.7–1.7)                  1.1 (0.5–2.7)
Alcohol A/D              1.1 (0.8–1.6)                  2.1* (1.2–3.9)

HR, hazard ratio adjusted for sex, race, and education, from eight Cox proportional hazards models with time-dependent variables; CI, confidence interval. From ‘‘Estimating post-traumatic stress disorder in the community: Lifetime perspective and the impact of typical traumatic events,’’ by N. Breslau, E. L. Peterson, L. M. Poisson, L. R. Schultz, & V. C. Lucia, 2004a, Psychological Medicine, 34(5), 889–898.
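As an illustration of the modeling approach named in the Table 12.7 footnote, the sketch below fits a Cox proportional hazards model with time-dependent covariates using the Python lifelines package. The data layout, column names, and file are assumptions made for the example; this is not the authors' analysis code.

```python
# Minimal sketch of a Cox model with time-dependent covariates, of the general kind
# cited in the Table 12.7 footnote. All column and file names are hypothetical.
import pandas as pd
from lifelines import CoxTimeVaryingFitter

# Long-format data: one row per person per interval, with start/stop times, an event
# indicator (here, first exposure to trauma), and covariate values during the interval.
long_df = pd.read_csv("person_periods.csv")  # hypothetical data file

cols = [
    "person_id", "start_age", "stop_age", "exposed_to_trauma",
    "preexisting_mdd", "preexisting_drug", "preexisting_alcohol",
    "female", "white", "education",
]

ctv = CoxTimeVaryingFitter()
ctv.fit(
    long_df[cols],
    id_col="person_id",
    start_col="start_age",
    stop_col="stop_age",
    event_col="exposed_to_trauma",
)
ctv.print_summary()  # hazard ratios with 95% confidence intervals
```

The fitted coefficients are on the log-hazard scale, so exponentiating them gives hazard ratios analogous to the HR column of the table.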
These latent factors explain comorbidity by virtue of their impact on multiple disorders. Krueger and Markon (2006) presented results from a meta-analysis of data on 11 disorders from five population samples. The best-fitting model comprises internalizing and externalizing liability factors (correlated at 0.50), and the internalizing factor splits into two separable (but highly correlated) liabilities, labeled ‘‘distress’’ and ‘‘fear’’ (Figure 12.1). Distress is the liability for depressive disorders and generalized anxiety disorder; fear is the liability for panic and phobic disorders. Similar models were identified in previous studies by Krueger (1999), using phenotypic data, and by Kendler, Prescott, Myers, and Neale (2003), using twin designs. PTSD was not included in these analyses.

A factor analysis of multiple disorders that included PTSD by Slade and Watson (2006) found that PTSD loaded highly on the distress factor (Figure 12.2). In an earlier analysis by Watson (2005), PTSD also loaded on the distress factor together with depression and generalized anxiety disorder. However, its affinity to the distress factor was weaker than the affinity of those other disorders (Table 12.8). Watson cites evidence from an analysis of data on Gulf War veterans, suggesting that PTSD's affinity to the distress liability is due to a dysphoria factor in PTSD, which combines symptoms of emotional numbing with insomnia, irritability, and poor concentration that are prevalent in depression and anxiety disorders (Simms, Watson, & Doebbeling, 2002).
[Figure 12.1, a path diagram, is not reproduced here. In the model, correlated Internalizing and Externalizing liabilities (r = .50) underlie the disorders; Internalizing splits into Distress (major depression, dysthymia, generalized anxiety disorder) and Fear (agoraphobia, social phobia, specific phobia, panic disorder), and Externalizing underlies alcohol disorder, drug disorder, conduct disorder, and adult antisocial behavior.]

Figure 12.1 Path diagram for best-fitting meta-analysis model. Used with permission of Annual Reviews, Inc., from Annual Review of Clinical Psychology, article by R. Krueger and K. E. Markon, volume 2, 2006; permission conveyed through Copyright Clearance Center, Inc.
[Figure 12.2, a path diagram, is not reproduced here. In the model, Distress (major depression, dysthymia, posttraumatic stress, generalized anxiety, neurasthenia) and Fear (social phobia, panic disorder, agoraphobia, obsessive-compulsive) load on a higher-order Internalizing factor, which is correlated with an Externalizing factor underlying alcohol dependence and drug dependence.]

Figure 12.2 Best-fitting model of the structure of 10 DSM-IV disorders (Australian NSMHWB) (N = 10,641). From Tim Slade and David Watson, ‘‘The structure of common DSM-IV and ICD-10 mental disorders in the Australian general population,’’ Psychological Medicine, volume 36, issue 11, page 1597, 2006, © Cambridge Journals, reproduced with permission.
Table 12.8 Factor Loadings of Lifetime DSM-III-R Diagnoses: NCS Data (n = 5,877)

                                   Factor 1   Factor 2   Factor 3
Dysthymia                          .80        .01        –.13
Major depressive episode           .75        .00        .02
Generalized anxiety disorder       .62        –.01       .15
Posttraumatic stress disorder      .40        .15        .17
Alcohol dependence                 –.02       .76        –.05
Antisocial personality disorder    .01        .75        –.03
Drug dependence                    .01        .74        .04
Simple phobia                      –.02       –.04       .74
Agoraphobia                        .10        –.04       .68
Social phobia                      –.09       .08        .67
Panic disorder                     .31        –.06       .50
Bipolar disorder                   .33        .29        .29

Reproduced with permission from Watson, D. (2005). Rethinking the mood and anxiety disorders: A quantitative hierarchical model for DSM-V. Journal of Abnormal Psychology, 114(4), 522–536. Copyright © 2005 by the American Psychological Association. The use of APA information does not imply endorsement by APA.
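To show how a correlated-factors liability model of this kind can be written down and fit, here is a minimal sketch using the Python package semopy with lavaan-style model syntax. The variable names, the data file, and the choice of software are assumptions for illustration; none of the studies cited above used this code, and binary lifetime diagnoses would in practice call for a categorical estimator rather than the default.

```python
# Minimal sketch: a confirmatory factor model with correlated Distress, Fear, and
# Externalizing liabilities, in the spirit of Figures 12.1 and 12.2. Names are hypothetical.
import pandas as pd
from semopy import Model

data = pd.read_csv("lifetime_diagnoses.csv")  # hypothetical file, one row per respondent

model_desc = """
distress =~ major_depression + dysthymia + generalized_anxiety + ptsd
fear =~ social_phobia + specific_phobia + agoraphobia + panic_disorder
externalizing =~ alcohol_dependence + drug_dependence
distress ~~ fear
distress ~~ externalizing
fear ~~ externalizing
"""

model = Model(model_desc)
model.fit(data)
print(model.inspect())  # factor loadings and factor covariances
```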
What might these underlying liabilities be? Krueger (1999) points out that the two major spectra of disorders, internalizing and externalizing, are linked in the literature to the personality traits of neuroticism and disinhibition: neuroticism to internalizing disorders and neuroticism in the presence of high disinhibition to externalizing disorders.

Based on our findings, we have suggested the possibility of a common diathesis between PTSD and major depression and proposed that it might be a mistake to regard PTSD and major depression in ‘‘comorbid’’ cases as separate and distinct (Breslau, Davis, Peterson, & Schultz, 2000). As to the relationship of PTSD with substance-use disorders, the evidence suggests a different explanation. If there are common underlying liabilities, they are probably weaker. It is clear that we cannot conclude that the association of PTSD with alcohol- or drug-use disorder is environmental, with alcohol or drug involvement increasing the probability of exposure to traumatic events and indirectly increasing the risk for PTSD. Also, survival analysis with time-dependent covariates of the retrospective data gathered at baseline did not support a causal pathway from substance-use disorders to exposure. A recent study on another sample replicates these findings (Reed et al., 2007). The evidence in our prospective data that PTSD increased the risk for drug-use disorders, especially prescription medicines, would, if replicated, provide a part of the explanation. PTSD and substance-use disorder are probably connected by multiple pathways, including a more complex pattern of shared liabilities involving both neuroticism and disinhibition.
Conclusion

Findings from our prospective research help to rule out some of the potential pathways that might account for PTSD comorbidity. Trauma-exposed persons who did not succumb to PTSD (i.e., about 90% of those exposed) are not at a markedly increased risk for other disorders. PTSD following exposure to stressors might identify persons with preexisting liability to a range of disorders. The findings do not support the idea that trauma caused PTSD in some victims and major depression in others. They led us to conclude that the two disorders might have a shared diathesis and that, when observed together in ‘‘comorbid’’ cases, they are not distinct disorders with separate etiologies.

Multivariate analysis of psychiatric comorbidity illuminates etiology by seeking to identify core processes underlying multiple disorders. The liability constructs that emerge resemble personality traits linked to psychopathology. Neuroticism is a liability for internalizing disorders, a spectrum that contains PTSD. The construct of PTSD as a process with an inner logic has close affinity to neuroticism. The core dimension in neuroticism is the
individual’s propensity to respond to stressors. At the low end, it describes a normal response style. At the high end, it describes a predisposition for emotional instability, negative affect, and lability of the autonomic nervous system. Neuroticism is not merely a risk factor that, with other risk factors that are attributes of aggregates, increases the probability of PTSD. It is better conceived of as an underlying liability for PTSD and its association with other disorders, including depression and substance-use disorders. The possibility that another liability trait, extroversion (or disinhibition), comes into play in the comorbidity trajectory of PTSD, particularly with respect to substance-use disorders, fits within the general liability framework, in which the two traits are moderately correlated. Drug-use disorder in PTSD cannot be assumed to be external to this liability framework. It cannot be regarded as an environmental factor that increases the likelihood of exposure to traumatic events, indirectly raising the risk for PTSD. We found no evidence for this pathway in two prospective studies. What was left out in this examination of causes of PTSD is history and society. For that, we now have rich accounts that trace trauma theories, psychiatric observations, and policies, including the waxing and waning of attention to social factors and individual predispositions in cross-cultural and military psychiatry (Breslau, 2004, 2005; Jones & Wessely, 2005; Shephard, 2001; Young, 1995). Key factors accounting for variability in psychiatric casualties during warfare were morale, group cohesion, and leadership. Perhaps the most important and simplest historical lesson is contained in Shephard’s comment that living in a robust and self-confident culture helps.
References

Alexander, D. A., & Wells, A. (1991). Reactions of police officers to body-handling after a major disaster. A before-and-after comparison. British Journal of Psychiatry, 159, 547–555. American Psychiatric Association. (1980). Diagnostic and statistical manual of mental disorders (3rd ed.). Washington DC: Author. American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders: DSM-IV (4th ed.). Washington DC: Author. Andreasen, N. C. (1980). Post-traumatic stress disorder. In A. M. Freedman, H. I. Kaplan, & B. J. Sadock (Eds.), Comprehensive textbook of psychiatry (3rd ed.). Baltimore: Williams & Wilkins. Andreasen, N. C. (2004). Acute and delayed posttraumatic stress disorders: A history and some issues. American Journal of Psychiatry, 161(8), 1321–1323. Andrews, B., Brewin, C. R., Philpott, R., & Stewart, L. (2007). Delayed-onset posttraumatic stress disorder: A systematic review of the evidence. American Journal of Psychiatry, 164(9), 1319–1326.
Bernstein, K. T., Ahern, J., Tracy, M., Boscarino, J. A., Vlahov, D., & Galea, S. (2007). Television watching and the risk of incident probable posttraumatic stress disorder: A prospective evaluation. Journal of Nervous and Mental Disease, 195(1), 41–47. Blank, A. S. (1985). Irrational reactions to post-traumatic stress disorder and Viet Nam veterans. In S. Sonnenberg, A. S. Blank, Jr., & J. A. Talbott (Eds.), The trauma of war: Stress and recovery in Viet Nam veterans (p. xxi). Washington DC: American Psychiatric Press. Bremner, J. D. (1999). Does stress damage the brain? Biological Psychiatry, 45(7), 797–805. Bremner, J. D., Southwick, S. M., Johnson, D. R., Yehuda, R., & Charney, D. S. (1993). Childhood physical abuse and combat-related posttraumatic stress disorder in Vietnam veterans. American Journal of Psychiatry, 150(2), 235–239. Breslau, J. (2004). Cultures of trauma: Anthropological views of posttraumatic stress disorder in international health. Culture, Medicine and Psychiatry, 28(2), 113–126. Breslau, J. (2005). Response to ‘‘Commentary: Deconstructing critiques on the internationalization of PTSD’’. Culture, Medicine and Psychiatry, 29(3), 371–376. Breslau, N., Chase, G. A., & Anthony, J. C. (2002). The uniqueness of the DSM definition of post-traumatic stress disorder: Implications for research. Psychological Medicine, 32(4), 573–576. Breslau, N., Chilcoat, H. D., Kessler, R. C., & Davis, G. C. (1999). Previous exposure to trauma and PTSD effects of subsequent trauma: Results from the Detroit Area Survey of Trauma. American Journal of Psychiatry, 156(6), 902–907. Breslau, N., & Davis, G. C. (1987). Posttraumatic stress disorder. The stressor criterion. Journal of Nervous and Mental Disease, 175(5), 255–264. Breslau, N., Davis, G. C., & Andreski, P. (1995). Risk factors for PTSD-related traumatic events: A prospective analysis. American Journal of Psychiatry, 152(4), 529–535. Breslau, N., Davis, G. C., Andreski, P., & Peterson, E. (1991). Traumatic events and posttraumatic stress disorder in an urban population of young adults. Archives of General Psychiatry, 48(3), 216–222. Breslau, N., Davis, G. C., Peterson, E. L., & Schultz, L. (1997). Psychiatric sequelae of posttraumatic stress disorder in women. Archives of General Psychiatry, 54(1), 81–87. Breslau, N., Davis, G. C., Peterson, E. L., & Schultz, L. R. (2000). A second look at comorbidity in victims of trauma: The posttraumatic stress disorder–major depression connection. Biological Psychiatry, 48(9), 902–909. Breslau N, Davis GC, Schultz L. (2003). Posttraumatic Stress Disorder and the incidence of nicotine, alcohol and drug disorders in persons who have experienced trauma. Archives of General Psychiatry, 60, 289–294. Breslau, N., Kessler, R. C., Chilcoat, H. D., Schultz, L. R., Davis, G. C., & Andreski, P. (1998). Trauma and posttraumatic stress disorder in the community: The 1996 Detroit Area Survey of Trauma. Archives of General Psychiatry, 55(7), 626–632. Breslau, N., Lucia, V. C., & Alvarado, G. F. (2006). Intelligence and other predisposing factors in exposure to trauma and posttraumatic stress disorder: A follow-up study at age 17 years. Archives of General Psychiatry, 63(11), 1238–1245. Breslau, N., Peterson, E., & Schultz, L. (2008). A second look at prior trauma and the posttraumatic stress disorder-effects of subsequent trauma: A prospective epidemiological study. Archives of General Psychiatry, 65(4), 431–437. Breslau, N., Peterson, E. L., Poisson, L. M., Schultz, L. R., & Lucia, V. C. (2004a). 
Estimating post-traumatic stress disorder in the community: Lifetime perspective and the impact of typical traumatic events. Psychological Medicine, 34(5), 889–898.
Breslau, N., Wilcox, H. C., Storr, C. L., Lucia, V. C., & Anthony, J. C. (2004b). Trauma exposure and posttraumatic stress disorder: A study of youths in urban America. Journal of Urban Health, 81(4), 530–544. Brewin, C. R., Andrews, B., & Valentine, J. D. (2000). Meta-analysis of risk factors for posttraumatic stress disorder in trauma-exposed adults. Journal of Consulting and Clinical Psychology, 68(5), 748–766. Casella, L., & Motta, R. W. (1990). Comparison of characteristics of Vietnam veterans with and without posttraumatic stress disorder. Psychological Reports, 67(2), 595–605. Dohrenwend, B. P., Turner, J. B., Turse, N. A., Adams, B. G., Koenen, K. C., & Marshall, R. (2006). The psychological risks of Vietnam for U.S. veterans: A revisit with new data and methods. Science, 313(5789), 979–982. Ehlers, A., & Clark, D. M. (2000). A cognitive model of posttraumatic stress disorder. Behaviour Research and Therapy, 38(4), 319–345. Engdahl, B., Dikel, T. N., Eberly, R., & Blank, A., Jr. (1997). Posttraumatic stress disorder in a community group of former prisoners of war: A normative response to severe trauma. American Journal of Psychiatry, 154(11), 1576–1581. Engelhard, I. M., van den Hout, M. A., & Kindt, M. (2003). The relationship between neuroticism, pre-traumatic stress and post-traumatic stress: A prospective study. Personality and Individual Differences, 35, 381–388. Fergusson, D. M., & Lynskey, M. T. (1996). Adolescent resiliency to family adversity. Journal of Child Psychology and Psychiatry, 37(3), 281–292. Fullerton, J., Cubin, M., Tiwari, H., Wang, C., Bomhra, A., Davidson, S., et al. (2003). Linkage analysis of extremely discordant and concordant sibling pairs identifies quantitative-trait loci that influence variation in the human personality trait neuroticism. Amercian Journal of Human Genetics, 72(4), 879–890. Galea, S., Ahern, J., Resnick, H., Kilpatrick, D., Bucuvalas, M., Gold, J., et al. (2002). Psychological sequelae of the September 11 terrorist attacks in New York City. New England Journal of Medicine, 346(13), 982–987. Gilbertson, M. W., Paulus, L. A., Williston, S. K., Gurvits, T. V., Lasko, N. B., Pitman, R. K., et al. (2006). Neurocognitive function in monozygotic twins discordant for combat exposure: Relationship to posttraumatic stress disorder. Journal of Abnormal Psychology, 115(3), 484–495. Goldstein, G., van Kammen, W., Shelly, C., Miller, D. J., & van Kammen, D. P. (1987). Survivors of imprisonment in the Pacific theater during World War II. American Journal of Psychiatry, 144(9), 1210–1213. Goodwin, D. W., & Guze, S. B. (1984). Psychiatric diagnosis (3rd ed.). New York: Oxford University Press. Green, B. L., Lindy, J. D., & Grace, M. C. (1985). Posttraumatic stress disorder. Toward DSM-IV. Journal of Nervous and Mental Disease, 173(7), 406–411. Helzer, J. E. (1981). Methodological issues in the interpretations of the consequences of extreme situations. In B. S. Dohrenwend & B. P. Dohrenwend (Eds.), Stressful life events and their contexts: Monographs in psychosocial epidemiology (Vol. 2, pp. 108–129). New York: Prodist. Hyer, L., Braswell, L., Albrecht, B., Boyd, S., Boudewyns, P., & Talbert, S. (1994). Relationship of NEO-PI to personality styles and severity of trauma in chronic PTSD victims. Journal of Clinical Psychology, 50(5), 699–707. Jones, E., & Wessely, S. (2005). War syndromes: The impact of culture on medically unexplained symptoms. Medical History, 49(1), 55–78.
Kendler, K. S., & Prescott, C. A. (2006). Genes, environment, and psychopathology: Understanding the causes of psychiatric and substance use disorders. New York: Guilford Press. Kendler, K. S., Prescott, C. A., Myers, J., & Neale, M. C. (2003). The structure of genetic and environmental risk factors for common psychiatric and substance use disorders in men and women. Archives of General Psychiatry, 60(9), 929–937. Kessler, R. C., Chiu, W. T., Demler, O., Merikangas, K. R., & Walters, E. E. (2005). Prevalence, severity, and comorbidity of 12-month DSM-IV disorders in the National Comorbidity Survey Replication. Archives of General Psychiatry, 62(6), 617–627. Kessler, R. C., Sonnega, A., Bromet, E., Hughes, M., & Nelson, C. B. (1995). Posttraumatic stress disorder in the National Comorbidity Survey. Archives of General Psychiatry, 52(12), 1048–1060. Khan, A. A., Jacobson, K. C., Gardner, C. O., Prescott, C. A., & Kendler, K. S. (2005). Personality and comorbidity of common psychiatric disorders. British Journal of Psychiatry, 186, 190–196. Koenen, K. C., Moffitt, T. E., Poulton, R., Martin, J., & Caspi, A. (2007). Early childhood factors associated with the development of post-traumatic stress disorder: Results from a longitudinal birth cohort. Psychological Medicine, 37(2), 181–192. Kremen, W. S., Koenen, K. C., Boake, C., Purcell, S., Eisen, S. A., Franz, C. E., et al. (2007). Pretrauma cognitive ability and risk for posttraumatic stress disorder: A twin study. Archives of General Psychiatry, 64(3), 361–368. Krueger, R. F. (1999). The structure of common mental disorders. Archives of General Psychiatry, 56(10), 921–926. Krueger, R. F., & Markon, K. E. (2001). The higher-order structure of common DSM mental disorders: Internalization, externalization, and their connections to personality. Genetic and environmental relationships between normal and abnormal personality. Personality and Individual Differences, 30, 1245–1259. Krueger, R. F., & Markon, K. E. (2006). Reinterpreting comorbidity: A model based approach to understanding and classifying psychopathology. Annual Review of Clinical Psychology, 2, 111–133. Kulka, R. A., Schlenger, W. E., Fairbank, J. A., Hough, R. L., Jordan, B. K., Marmar, C. R., et al. (1990). Trauma and the Vietnam War generation: Report of findings from the National Vietnam Veterans Readjustment Study. New York: Brunner/Mazel. Lesch, K. P., Bengel, D., Heils, A., Sabol, S. Z., Greenberg, B. D., Petri, S., et al. (1996). Association of anxiety-related traits with a polymorphism in the serotonin transporter gene regulatory region. Science, 274(5292), 1527–1531. Lindy, J. D., Green, B. L., & Grace, M. C. (1987). The stressor criterion and posttraumatic stress disorder. Journal of Nervous and Mental Disease, 175(5), 269–272. Macklin, M. L., Metzger, L. J., Litz, B. T., McNally, R. J., Lasko, N. B., Orr, S. P., et al. (1998). Lower precombat intelligence is a risk factor for posttraumatic stress disorder. Journal of Consulting and Clinical Psychology, 66(2), 323–326. McFarlane, A. C. (1988). The longitudinal course of posttraumatic morbidity. The range of outcomes and their predictors. Journal of Nervous and Mental Disease, 176(1), 30–39. McFarlane, A. C. (1989). The treatment of post-traumatic stress disorder. British Journal of Medical Psychology, 62(Pt. 1), 81–90. McNally, R. J. (2003). Progress and controversy in the study of posttraumatic stress disorder. Annual Review of Psychology, 54, 229–252.
McNally, R. J., & Shin, L. M. (1995). Association of intelligence with severity of posttraumatic stress disorder symptoms in Vietnam combat veterans. American Journal of Psychiatry, 152(6), 936–938. Norris, F. H. (1992). Epidemiology of trauma: Frequency and impact of different potentially traumatic events on different demographic groups. Journal of Consulting and Clinical Psychology, 60(3), 409–418. North, C. S., & Pfefferbaum, B. (2002). Research on the mental health effects of terrorism. Journal of the American Medical Association, 288(5), 633–636. Ozer, E. J., Best, S. R., Lipsey, T. L., & Weiss, D. S. (2003). Predictors of posttraumatic stress disorder and symptoms in adults: A meta-analysis. Psychological Bulletin, 129(1), 52–73. Parslow, R. A., Jorm, A. F., & Christensen, H. (2006). Associations of pre-trauma attributes and trauma exposure with screening positive for PTSD: Analysis of a community-based study of 2,085 young adults. Psychological Medicine, 36(3), 387–395. Pitman, R. K., Orr, S. P., Lowenhagen, M. J., Macklin, M. L., & Altman, B. (1991). PreVietnam contents of posttraumatic stress disorder veterans’ service medical and personnel records. Comprehensive Psychiatry, 32(5), 416–422. Post, R. M., & Weiss, S. R. (1998). Sensitization and kindling phenomena in mood, anxiety, and obsessive–compulsive disorders: The role of serotonergic mechanisms in illness progression. Biological Psychiatry, 44(3), 193–206. Reed, P. L., Anthony, J. C., & Breslau, N. (2007). Incidence of drug problems in young adults exposed to trauma and posttraumatic stress disorder: Do early life experiences and predispositions matter? Archives of General Psychiatry, 64(12), 1435–1442. Resnick, H. S., Kilpatrick, D. G., Dansky, B. S., Saunders, B. E., & Best, C. L. (1993). Prevalence of civilian trauma and posttraumatic stress disorder in a representative national sample of women. Journal of Consulting and Clinical Psychology, 61(6), 984–991. Sapolsky, R. M., Uno, H., Rebert, C. S., & Finch, C. E. (1990). Hippocampal damage associated with prolonged glucocorticoid exposure in primates. Journal of Neuroscience, 10(9), 2897–2902. Sen, S., Burmeister, M., & Ghosh, D. (2004). Meta-analysis of the association between a serotonin transporter promoter polymorphism (5-HTTLPR) and anxiety-related personality traits. American Journal of Medical Genetics B Neuropsychiatric Genetics, 127(1), 85–89. Shephard, B. (2001). A war of nerves: Soldiers and psychiatrists in the twentieth century. Cambridge, MA: Harvard University Press. Silva, R. R., Alpert, M., Munoz, D. M., Singh, S., Matzner, F., & Dummit, S. (2000). Stress and vulnerability to posttraumatic stress disorder in children and adolescents. American Journal of Psychiatry, 157(8), 1229–1235. Simms, L. J., Watson, D., & Doebbeling, B. N. (2002). Confirmatory factor analyses of posttraumatic stress symptoms in deployed and nondeployed veterans of the Gulf War. Journal of Abnormal Psychology, 111(4), 637–647. Slade, T., & Watson, D. (2006). The structure of common DSM-IV and ICD-10 mental disorders in the Australian general population. Psychological Medicine, 36(11), 1593–1600. Solomon, Z., Mikulincer, M., & Jakob, B. R. (1987). Exposure to recurrent combat stress: Combat stress reactions among Israeli soldiers in the Lebanon war. Psychological Medicine, 17(2), 433–440.
Stein, M. B., Walker, J. R., Hazen, A. L., & Forde, D. R. (1997). Full and partial posttraumatic stress disorder: Findings from a community survey. American Journal of Psychiatry, 154(8), 1114–1119. Storr, C. L., Ialongo, N. S., Anthony, J. C., & Breslau, N. (2007). Childhood antecedents of exposure to traumatic events and posttraumatic stress disorder. American Journal of Psychiatry, 164(1), 119–125. Talbert, F. S., Braswell, L. C., Albrecht, J. W., Hyer, L. A., & Boudewyns, P. A. (1993). NEO-PI profiles in PTSD as a function of trauma level. Journal of Clinical Psychology, 49(5), 663–669. Terr, L. C., Bloch, D. A., Michel, B. A., Shi, H., Reinhardt, J. A., & Metayer, S. (1999). Children's symptoms in the wake of Challenger: A field study of distant-traumatic effects and an outline of related conditions. American Journal of Psychiatry, 156(10), 1536–1544. Watson, D. (2005). Rethinking the mood and anxiety disorders: A quantitative hierarchical model for DSM-V. Journal of Abnormal Psychology, 114(4), 522–536. Yehuda, R., & McFarlane, A. C. (1995). Conflict between current knowledge about posttraumatic stress disorder and its original conceptual basis. American Journal of Psychiatry, 152(12), 1705–1713. Yehuda, R., McFarlane, A. C., & Shalev, A. Y. (1998). Predicting the development of posttraumatic stress disorder from the acute response to a traumatic event. Biological Psychiatry, 44(12), 1305–1313. Yehuda, R., Resnick, H. S., Schmeidler, J., Yang, R. K., & Pitman, R. K. (1998). Predictors of cortisol and 3-methoxy-4-hydroxyphenylglycol responses in the acute aftermath of rape. Biological Psychiatry, 43(11), 855–859. Young, A. (1995). The harmony of illusions: Inventing post-traumatic stress disorder. Princeton, NJ: Princeton University Press. Young, A. (2001). Our traumatic neurosis and its brain. Science in Context, 14(4), 661–683. Zammit, S., Allebeck, P., David, A. S., Dalman, C., Hemmingsson, T., Lundberg, I., et al. (2004). A longitudinal study of premorbid IQ score and risk of developing schizophrenia, bipolar disorder, severe depression, and other nonaffective psychoses. Archives of General Psychiatry, 61(4), 354–360.
13
Causal Thinking for Objective Psychiatric Diagnostic Criteria
A Programmatic Approach in Therapeutic Context

donald f. klein
Introduction: Ascribing Disease and Illness

Terms such as disorder, illness, disease, dysfunction, and deviance embody the preconceptions of historical development (Klein, 1999). That individuals become ill for no apparent reason, suffering from pain, dizziness, malaise, rash, wasting, etc., has been known since prehistoric days. The recognition of illness led to the social definition of the patient and the development of various treatment institutions (e.g., nursing, medicine, surgery, quacks, and faith healers).

Illness is an involuntary affliction that justifies the sick, dependent role (Parsons, 1951). That is, because the sick have involuntarily impaired functioning, it is a reasonable social investment to exempt them (at least temporarily) from normal responsibilities. Illness implies that something has gone wrong. However, gaining exemption from civil or criminal responsibilities is often desired. Therefore, if no objective criteria are available, an illness claim can be viewed skeptically. By affirming involuntary affliction, diagnosis immunizes the patient against charges of exploitative parasitism. Therefore, illness may be considered a hybrid concept, with two components: (1) the necessary inference that something has actually, involuntarily, gone wrong (disease) and (2) the qualification that the result (illness) must be sufficiently major, according to current social values, to ratify the sickness exemption role. The latter component is related to the particular historical stage, cultural traditions, and values. This concept has been exemplified by the phrase ‘‘harmful dysfunction’’ (Wakefield, 1992). However, this does not mean that the illness concept is arbitrary since the inference that something has gone wrong is necessary. Beliefs as to just what has gone wrong (e.g., demon possession, bad air, bacterial infection) as well
as the degree of manifested dysfunction that warrants the sick role reflect the somewhat independent levels of scientific and social development (for further reference, see Lewis, 1967).

How can we affirm that something has gone wrong if there is no objective evidence? The common statistical definition of abnormality simply is ‘‘unusual.’’ Something is abnormal if it is rare. Although biological variability ensures that someone is at an extreme, there is a strong presumption that something has gone wrong if sufficiently extreme. For instance, hemoglobin of 5 g/100 ml exceeds normal biological variation, indicating that something has gone wrong. Therefore, infrequency (e.g., dextrocardia) usefully indicates that something is probably wrong but is not sufficient (e.g., left-handedness) or necessary (e.g., dental caries). A mysterious shift from well-being to pain and manifest dysfunction strongly indicates that something has gone wrong. That such distressing states may remit affirms that some repair has somehow occurred. What has gone wrong is a deviation from an implicit standard, formulated by the evolutionary theory of adaptive functions and dysfunctions (Millikan, 1993; Klein, 1993, 1999).

Medical diagnosis was placed on a firmer footing in the seventeenth century by Sydenham's concept of syndromes, forms of ill health comparable to the types of animal and vegetable species in terms of symptoms and course, for example, gout and rheumatoid arthritis. A symptom complex was more than a concatenation of symptoms and signs. It implied some common latent cause distinct from those supporting ordinary health, even if such causes were unknown. Kraepelin made use of syndromes in distinguishing dementia praecox from manic–depressive illness by initially arguing that the different symptom complexes provided firm prognostic differences. ‘‘Points of rarity’’ are not necessary to differentiate syndromes, and even if a latent cause is entirely categorical, its manifestations may not evidence bimodality (Murphy, 1964).
The Search for an Ideal Diagnostic Entity

Since around the middle of the nineteenth century, disease has become defined by objectively demonstrated etiology and pathology, thanks to crucial discoveries made by scientists such as Pasteur and Virchow. Causal analysis allowed diagnostic progression past simple syndromal definition by objectively elucidating necessary etiologies. Evident examples can be found among infectious diseases and avitaminoses. This is not the case in psychiatry.

Attempting to find objective differences (biomarkers) between normal subjects and subjects with various psychiatric syndromes has been the overriding focus of biological psychiatry. Beset by
lack of pathophysiological knowledge, this search has almost always yielded differences in biochemical or physiological measures that are artifactual, produced by prior or current treatment or by secondary complications (e.g., avitaminosis, alcohol abuse, head trauma). Worse, syndromal nonspecificity regularly occurs, even for familial genetic markers (e.g., catechol O-methyltransferase–inefficient alleles), which leads some to argue for syndromal amalgamation; this falsely implies that these syndromes are offshoots of a common causal pathophysiology. Thus, it is clear that psychiatric nosology lags behind general medical nosology. Diagnostic and Statistical Manual of Mental Disorders, third edition (DSM-III), disorders are syndromes defined by the consensus of clinical scientists. It can be argued that these polythetic categories are never simply related to etiology, pathophysiology, or laboratory tests.
Syndromal Heterogeneity

Psychiatric disorders are largely familial. Twin, adoption, and related study designs indicate that syndromal familiality is largely genetic (or due to gene-by-environment interactions). Therefore, the tremendous advances in molecular genetics and genomics raised hopes for objective diagnostic genetic tests, revealed by such methods as linkage and association studies. Unfortunately, increasing disappointment set in once molecular genetic research focused on highly familial but non-Mendelian syndromes. Despite remarkably significant statistical associations found in individual studies, there have been repeated failures of replication (Riley & Kendler, 2006). Further, as seen in the example of Huntington disease, gene identification does not necessarily lead to advances in treatment and/or more complete knowledge of pathophysiology.

The lack of replicability of many genomic studies as well as the low-magnitude effect estimates reported highlight the central problem of syndromal heterogeneity (Bodmer, 1981). Smoller and Tsuang (1998, p. 1152) clearly state the problem and suggest a straightforward solution:

With recent advances in molecular genetics, the rate-limiting step in identifying susceptibility genes for psychiatric disorders has become phenotype definition. The success of psychiatric genetics may require the development of a ‘‘genetic nosology’’ that can classify individuals in terms of the heritable aspects of psychopathology.

However, this may not be feasible. As Crow (2007, p. 13) states, ‘‘Recent meta-analyses have not identified consistent sites of linkage. The three largest studies of schizophrenia fail to agree on a single locus . . . there is no replicable support for any of the current candidate genes.’’ Although Crow's remarks
address psychoses, the general nosological applicability is clear. Repeated failures of replication highlight the difficulty of finding a consistent genetic etiology for syndromes. It is doubtful if the fundamentally genocentric nosology proposed by Smoller and Tsuang (1998) can be achieved. Nonetheless, their argument that syndromal heterogeneity invalidates genetic linkage and association studies is entirely reasonable. However, this sound observation has not dissuaded syndrome linkage genomic efforts.
Replacing Syndromes With Endophenotypes

One recent popular strategy has been to move from syndromes to endophenotypes for gene linkage partners. Endophenotypes are held to be superior to biomarkers by Gottesman and Gould (2003, p. 636):

Endophenotypes, measurable components unseen by the unaided eye along the pathway between disease and distal genotype . . . represent simpler clues to genetic underpinnings than the disease syndrome itself, promoting the view that psychiatric diagnoses can be decomposed or deconstructed. . . . In addition to furthering genetic analysis, endophenotypes can clarify classification and diagnosis.

However, Flint and Munafo (2006) criticize this optimistic assumption in their detailed meta-analytic review of human and animal data. Endophenotypes appear to be only on a par with genetic biomarkers, which blunts optimism. These conclusions as well as prior assumptions of enhanced endophenotype utility for genetic analysis are limited by the paucity of studies specifically addressing differential endophenotypic utility. Unfortunately, Flint and Munafo's critique has not, as yet, been widely discussed in the literature.
Conceptual Problems With Endophenotypes

The use of endophenotypes to circumvent the difficulty of linking genes to syndromes amounts to dismissing syndromes as meaningful. This presents conceptual difficulties. For example, Braff, Freedman, Schork, and Gottesman (2007) describe explicitly deconstructing syndromes into multiple independent genetic abnormalities that engender multiple independent functional abnormalities. However, if psychiatric syndromes are caused by the agglomeration of multiple small independent effects of many different genes, which may or may not be phenotypically evident, as well as multiple interactions with the fluctuating internal and external environment (Rutter, 2007), it is
surprising that there are any consistently evident symptom complexes at all. However, certain syndromes (e.g., mania, melancholia, depressive disorder, obsessive–compulsive disorder, and panic disorder) have been stereotypically described for centuries, albeit under different labels, in many places and languages.

Even more cogent, syndromal decomposition into a heap of independent dysfunctions, each underlying a particular syndromal facet, is inconsistent with total syndromal remission and recurrence. The observation of surprising remissions, periods of apparent health, and recurrences was facilitated by the long-term mental hospitals, where remission and discharge were notable events. Further, since there was often only one available hospital, relapses could be noted. Falret in 1854 (as reprinted in Pichot, 2006, p. 145) described ‘‘Circular insanity [Folie circulaire] . . . characterized by the successive and regular reproduction of the manic state, the melancholic state, and a more or less prolonged lucid interval.’’ This description anticipated Kraepelin's more inclusive concept of manic–depressive disorder. Both syndromal descriptions emphasized periods of remission as an essential diagnostic element. Remissions and relapses occur in many psychiatric and general medical illnesses (gout, intermittent porphyria, etc.). Since it is highly improbable that multiple independent causes should concertedly cease, the inference that complex syndromes have multiple independent causes is implausible.

These diverse syndromes are recognizable, familial, and extraordinarily different from ordinary health and behavior. This is consonant with Sydenham's hypothesis that each syndrome has a common underlying proximal cause, even if there is no common distal genetic defect. However, the frequent reliable recognition, since Sydenham, of quite distinct syndromes implies that multiple small genetic contributions become manifest by taking different routes to impairing, perhaps in several ways, a distinct evolved function, which may generate a distinct syndrome (Klein & Stewart, 2004).

The argument is not that all complex psychiatric presentations evidence periods of total remission; rather, it is logically incorrect to assume that a symptom complex must be due to a group of independent endophenotypes. ‘‘Comorbidity’’ suggests sequential and/or interactive causal processes. However, the argument that one aspect of a complex syndrome is likely to be the direct manifestation of an endophenotype is not logically supported and unlikely to pay off. For instance, the sudden onset of an apparently spontaneous panic attack causes an immediate flight to help. With the repetition of such attacks, chronic anticipatory anxiety often develops. Panic attacks are often followed by avoidant and dependent measures, misleadingly referred to as ‘‘agoraphobia.’’ Six weeks of imipramine treatment prevents spontaneous panic attacks. However, chronic anticipatory anxiety and phobic avoidances remit more
slowly after successful exposure experiences. This sequence of panic attacks, chronic anticipatory anxiety and phobic avoidance, as well as the relationship to imipramine treatment and real-life exposure treatment have been analyzed using structural equation modeling with data from two experimental trials (Klein, Ross, & Cohen, 1987). The variation in severity of anticipatory anxiety and phobic avoidance following panic disorder is clear. The relative independence of panic attack remissions and anxiety/phobic avoidance remissions indicates multiple processes. Perhaps independent functions regulating chronic anxiety and/or behavioral avoidance are rendered dysfunctional by recurrent panics. Therefore, genetic linkage studies of familial panic disorders, whose probands are largely agoraphobic, may be linking numerous distal contributors to several proximal dysfunctions. This may account for the largely inconsistent and negative results of such studies (Fyer et al., 2006).
Brain Imaging and Pathophysiology It is also hoped that the remarkable development of structural and functional brain imaging may bring about nosological improvement by objectively demonstrating specific brain dysfunctions. However, these technical triumphs present substantial practical and inferential problems. First, the statistical problems generated by relatively few individual subjects, each providing an enormous number of nonindependent data points, are both formidable and controversial. Thirion et al. (2007) suggest that 20– 27 subjects is the minimum sample size necessary for sensitive, reproducible analyses of functional magnetic resonance imaging (fMRI) data. Studies of this magnitude are quite rare because of daunting practical problems of recruitment, screening, and expense. Therefore, the necessary positive replications are exceptional. Second, the detection of a signal of increased perfusion is usually interpreted as increased brain activity; however, a distinction among neuronal firing, synaptic but nonspiking activity, and dendrite pile or glial activity remains unclear. The time window of functional imaging is much longer than the time interval of neuronal processes, while the spatial resolution is too coarse to delineate neuronal centers. Using complex, large anatomical structures to identify distinct brain processes implies a doubtful uniformity of function at micro levels. Third, whether the functional implications of increased perfusion are of brain excitation or inhibition (or other processes) has not yet been determined. An increase in brain activity might amount to stepping on either a brake or an accelerator (or another unknown process). There is no necessary parallel between increases in brain function and increases in psychological or
physiological functioning. Such relationships are hypotheses to be tested. However, the correlational, naturalistic setting impairs causal assertions. Finally, there are overriding inferential issues besides these technical and developmental problems. The outpouring of studies comparing syndromes with each other and with normal subjects has the same design structure as comparative biomarker studies. Therefore, the same problems are present as in biomarker studies (e.g., syndromal heterogeneity, artifact contamination, causal ambiguity, and lack of specificity). One justification for the nosological relevance of brain imaging argues that techniques such as fMRI combined with activation paradigms identify ‘‘brain circuits recruited during specific processes . . . in healthy people. . . . Understanding these processes in healthy people is a prerequisite for advancing research in psychiatric disorders where these capacities are affected’’ (Kupfer, First, & Regier, 2002, p. 46). This sequence, often referred to as ‘‘translational research,’’ assumes that basic research delineating normal physiology is a ‘‘prerequisite’’ for useful clinical research and improved care.
Therapeutics as a Guide to Pathophysiology History provides many examples of how practice precedes and enables theory. Artificial selection by culling unwanted hereditary traits and inbreeding desired traits was essential to Darwin’s formulation of evolution by natural selection. Remarkably, studies of pathology led to the discovery of unsuspected normal functions. Clinical studies of scurvy, beriberi, and pellagra led to treatment with nutritional supplements, discovery of specific vitamins, and discovery of enzymatic cofactors. The serendipitous observation of cowpox-induced immunity to smallpox led to vaccination, while the study of beverage contamination led to pasteurization. Germ theory, bacteriology, immunology, and other evolved mechanisms of resistance to infection followed. This list could be extended indefinitely. Empirical therapies often illuminate dysfunctions, thus bringing unknown normal functions into sight.
Clinical Psychopharmacology and Pathophysiology The psychopharmacological revolution was a clear case of unpredictable therapeutic advance well before relevant pathophysiological knowledge. All major psychotropic drug discoveries occurred when surprising clinical benefits were serendipitously observed during treatment for other purposes. The basic finding in the 1960s that psychotropic agents blocked both neuronal receptors and synaptic neurotransmitter reuptake led to a continuing explosion of discovery regarding neurotransmitters, synapses, receptors,
and neural transmission. It was apparent, however, that inhibiting reuptake or receptor blockade may not be therapeutically important since some effective drugs did not show this prerequisite (McGrath & Klein, 1983). In any case, since reuptake inhibition and receptor blockage are almost immediate effects, they could only be the first dominoes, while remissions took weeks to appear. Further, benefit ranged from negligible to remarkable, once again indicating syndromal heterogeneity. Chlorpromazine was a safe, presurgical antihistamine sedative whose antipsychotic properties were completely unsuspected. The antipsychotic action of chlorpromazine contradicted the conventional psychogenic wisdom of psychiatry and was greeted with profound skepticism, if not open derision. The clinical trial by random assignment to concurrent placebo and putative treatment groups was slowly adopted only after the late 1940s as a scientifically necessary part of general pharmacology. Its rapid acceptance into psychopharmacological trials in the 1950s, amplified by double-blind precautions, was in part to deal with frequent claims that the evidence for pharmacotherapeutic benefit was invalid or due to ‘‘chemical straight jackets.’’ The initial focus was to test the specificity and the somewhat ambiguously defined efficacy of the medication, that is, its activity. Statistical superiority of the randomized, double-blind evaluated medication under test to the parallel placebo group, as measured by average outcome scale scores, established specific drug activity. The 1962 Kefauver-Harris Food and Drug Administration (FDA) amendment required the demonstration of acute drug efficacy prior to marketing, without any stipulation concerning effect size, translation into clinical benefit, determination of who would benefit, or systematic attention to long-term maintenance of benefit or late-onset toxicities. The pursuit of short-term statistical superiority to placebo became industry’s search for the Holy Grail of marketability. This effort incurred several persisting ambiguities. If 60% of those treated with medication had substantial scale-measured improvements, while only 30% of those on placebo did so (assuming statistical significance), then the drug was not causal for about half of those who seemed to have a direct drug benefit. Identifying the individuals for whom specific beneficial drug action occurred remained obscure. Therefore, attempts to determine how a drug brought about specific benefits by studying those who got better while receiving the drug were handicapped by actually studying a causally heterogeneous mixture. Within this phenotypically homogeneous syndrome, suppose that the medication benefit occurred in only about 30%. Was response variation due to some irrelevant differences in drug pharmacokinetics, or might those 30% have a different cause for their proximal pathophysiology? Since even an apparently common syndrome can be induced by an array of possible
saboteurs, response to a particular medication may well depend on particular causal paths. An example is diabetes, recognized in ancient times as a syndrome of polydipsia and polyuria; the sweetness of the urine later distinguished diabetes mellitus from diabetes insipidus. More recently, these subsyndromes could be objectively distinguished by urinary sugar tests. Urinary sugar was then found to be due to abnormally high blood sugar. In psychiatry, such simple objective diagnostic facts are lacking, and the attempt to understand the causes of illness by studying ultimate distal genetic causes has faltered. In contrast, the observation that major psychotropic agents can specifically induce remissions suggests an experimental approach to proximal pathophysiological processes. However, this inference requires evidence that medications directly ameliorate proximal pathophysiology, rather than yield benefit by nonspecific compensation. For example, gastric acidity is reduced by calcium carbonate to approximately normal in patients with hyperacidity, thus decreasing symptoms. However, calcium carbonate also decreases gastric acidity below normal in subjects without hyperacidity. This action is compensatory rather than ameliorative of the hyperacidity's cause. Conversely, aspirin substantially decreases fever but does not lower normal body temperature. Consequently, it is considered an antipyretic rather than a poikilothermic agent. Its benefit comes from its direct effect on a particular febrile defense system, so normal homeothermy is not affected. Therefore, to distinguish compensatory drug benefits from those that affect underlying pathophysiology, the study of drug effects on normal subjects is telling. For instance, it has been known since the 1940s that amphetamines are stimulants, increasing feelings of vigor, elation, arousal, and positive mood in normals. One might reasonably expect that amphetamines should benefit major clinical syndromal depression. However, clinical experience and placebo-controlled trials (Satel & Nelson, 1989) indicated no such benefit. It was noted that depressed mood, as occurs in the medically ill, often improved with stimulants. This suggests that such depressed feelings lack continuity with syndromal depression. When imipramine was reported to benefit severe syndromal depression, it was assumed by those without direct experience that it was a superstimulant. It was astonishing that there was no resemblance to stimulants. Severely depressed patients slept and ate better, and sudden remissions (often described by patients with such phrases as ‘‘the veil has lifted’’) occurred only after weeks of treatment. This observation led to unsystematic attempts to evaluate whether imipramine elevated the mood of normal subjects, but these pilot efforts did not
show mood elevation. A properly controlled trial of chronic administration of clinically appropriate antidepressant doses to normal subjects was not feasible given their unpleasant anticholinergic side effects. It was stated that a battalion of Danish soldiers went through such a study without showing mood effects, but these findings were not published. Rapoport et al. (1980) studied enuresis. A placebo-controlled trial of lower doses of tricyclics in enuretic, nondepressed children demonstrated enuresis benefit but no effect on mood. In a 4-week placebo-controlled trial of 20mg of the selective serotonin reuptake inhibitor (SSRI) paroxetine, normal subjects showed no evidence of mood elevation (Knutson et al., 1999). Similarly, Adly, Straumanis, and Chesson (1992), in a placebo-controlled trial of fluoxetine, demonstrated benefit for migraine but no antidepressant effect. While other studies of antidepressants have found utility in pain management and cigarette cessation, among other benefits, induced elevated mood has not been reported in any subject lacking a prior affective disorder. These findings suggest that amphetamine’s effect on depressive feelings is compensatory rather than due to interaction with an underlying affective pathophysiology. Conversely, the mood effects of antidepressants required the presence of pathophysiology, thus obviating mood effects in normals. Given the complex pharmacodynamics, other effects in normal subjects, such as the fast onset of decreased irritability, are not surprising but are not associated with mood elevation. Other studies of major psychotropic drugs, given for periods approximating those of clinical trials, have also failed to show that the remarkable cognitive and affective effects manifest in patients have parallels in normal subjects. These include studies of antipsychotic drugs (Dimascio, Havens, & Klerman, 1963a, 1963b; de Visser, van der Post, Pieters, Cohen, & van Gerven, 2001), lithium (Judd, Hubbard, Janowsky, Huey, & Attewell, 1979), and a wide body of literature on SSRIs (Pace-Schott et al., 2001; Loubinoux et al., 2005; Dumont, de Visser, Cohen, & van Gerven, 2005). Monoamine oxidase inhibitors have been used to treat angina pectoris, but reports of these clinical treatments similarly do not note any mood shifts (although they have been noted in the treatment of chronic tuberculosis patients). The syndromal depression status of the tuberculosis patients, however, is unknown. It is particularly telling that Rosenzweig et al. (2002, p. 10) found that studies using standard psychometric tests ‘‘carried out in schizophrenic patients have failed to demonstrate any consistent effects of typical or atypical neuroleptics on psychomotor or cognitive functions.’’ Despite this finding, these agents have major clinical benefits for patients regarding both psychomotor and cognitive symptomatology. This implies that their specific therapeutic effect is due to normalizing one or more pathophysiological processes
that are not present in normal individuals, rather than by an extension of effects discernible in normal individuals. All of these observations, although not entirely definitive, consistently point to a lack of parallelism between the clinical benefits of major psychotropic drugs in patients and their effects on normal subjects. This is consonant with the view that these agents’ specific benefit is due to normalizing pathophysiology rather than some nonspecific compensatory action, as may be the case with sedatives and stimulants. A telling observation is that there is no illicit street-market demand for the major psychotropic agents, but there is a substantial demand for stimulants and sedatives.
The Import of Remission That the benefits of major psychopharmacological agents are tightly tied to their normalizing effects on underlying pathophysiology is also supported by their remarkably specific effects in certain syndromes, for example, retarded unipolar depressions, manic states, angry hyperactive paranoid states, panic disorder, and psychotic states approximating bipolar disorder. Under drug treatment, these syndromes frequently show a complete restitution to normal premorbid status. Strikingly, the natural history of each of these syndromes is also marked by episodes of apparently complete spontaneous remission, a parallel that can hardly be accidental. Specific drug-induced remission in these syndromes may be due to a normalizing interaction with episodically dysfunctional cybernetic feedback controls (e.g., reversible decreased negative feedback or pathologically induced positive feedback) that engender remitting syndromes (Klein, 1964a, 1988; Klein, Gittelman, Quitkin, & Rifkin, 1980). Patients who benefit from specific pharmacotherapy to remit may share a common pathophysiology that differs from those refractory to such treatment as well as those who respond during placebo treatment. Reducing syndrome heterogeneity by relating baseline and historical characteristics to specific treatment outcomes may yield useful practical clues to treatment choice and the delineation of clinically meaningful subsyndromes, as well as heuristic clues to the causation of pathophysiology. This is elaborated in the following section.
The Vicissitudes of Pharmacological Dissection There were early statistical attempts using multiple regression analysis to contrast the effects of different drugs from each other and placebo so as to identify subgroups with distinctive response patterns. These regularly failed
on replication. The pharmacological dissection approach has been most successful when applied to hypothesized subsyndromes, when a distinctive therapeutic response had been clinically (serendipitously) noted. Validation followed, through controlled treatment trials—for example, panic disorder (distinguished from other anxiety disorders [Klein, 1964b]), agoraphobia (distinguished from other phobias [Zitrin, Klein, & Woerner, 1978]), atypical depression (distinguished from other major depressions [Quitkin et al., 1989]), and schizophrenia with childhood asociality (distinguished from schizophrenic patients with superior antipsychotic benefit [Klein, 1967]). Notably, these subsyndromal distinctions were powerfully reinforced by differences in premorbid and early course. The frequent history of severe childhood separation anxiety disorder in hospitalized adult patients with panic disorder and agoraphobia prompted successful controlled clinical trials of imipramine in such children. With the development of the SSRIs, this therapeutic approach received widespread clinical acceptance, although the controlled evidence is not uniformly positive. However, studies reducing syndromal heterogeneity by pharmacological dissection failed to thrive as the two major funding sources turned away. The National Institute of Mental Health (NIMH) abandoned support for placebo-controlled studies of marketed drugs, arguing that this was the proper province of industry. The NIMH and academia were to focus on basic processes. Unfortunately, industry did not pursue such diagnostically informative studies since a statistically significant benefit indicated by an average scale outcome difference between drug and placebo was the efficacy requirement needed for the primary goal of FDA marketing approval. The broadest possible diagnostic indication allows the broadest marketing. Since profitability would drop by narrowing a broad syndrome to a subsyndrome, this is not a priority for industry. The unintended effect of the NIMH completely allocating placebo-controlled trials of marketed agents to industry was to prevent the development of clinical psychopharmacological approaches to improving nosology and elucidating pathophysiology.
Splitting and Lumping A distinction should be made between pharmacological dissection and pharmacological amalgamation. Dissection is not a high-level inference. If tricyclic antidepressants do not specifically benefit early-onset chronic atypical depression but instead specifically benefit late-onset periodic atypical depression, then there are likely to be different pathophysiologies underlying these symptomatically identical disorders (Stewart, McGrath, Quitkin, & Klein, 2007).
Because both enuresis and melancholia responded to imipramine, some thought that enuresis was covert depression, an example of amalgamation. Rapoport et al. (1980) and Jorgensen, Lober, Christiansen, and Gram (1980) effectively dispelled this notion by noting that, in their enuretic samples, there was no clinical depression or psychotropic response to imipramine. The anti-enuretic effects were almost immediate, and the dose and blood level necessary were far smaller than those required for depression. Since imipramine had a variety of distinct pharmacodynamic effects, pharmacological amalgamation is a questionable but testable hypothesis. In any case, it is logically and factually irrelevant to the validity of pharmacological dissection. Although the clinically useful subsyndromes discerned by pharmacological dissection have stimulated hypotheses about specific adaptive dysfunctions, they do not thereby provide objective diagnostic criteria.
Refining Remission Heterogeneity Among patients who remit during drug treatment, it is still not known if they benefit from a specific pharmacological action or just get better. Chassan (1967) addressed how one can tell whether a treatment intervention actually worked in an individual patient. He recommended ‘‘intensive design,’’ that is, repeating periods of intervening and nonintervening and evaluating whether the benefit synchronized with the intervention. This would be an alternative clinical trials design since if psychotropic drugs are discontinued immediately after remission, relapse rates are usually high. It is only among those whose specific benefit requires medication that double-blind placebo substitution should incur relapse. Therefore, the alternative design is to initially openly treat all patients with the medication under study, titrating for the individual’s optimal dose, until it is clear if the patient shows such a minor response that he or she could not be a treatment responder. These patients would leave the trial. Responders would be maintained on medication for a period but then randomly assigned, double-blind, to either placebo or continued medication. All patients would be closely followed for defined signs of worsening. A worsening rate higher in the placebo-substituted group than the medication-maintained group would be evidence of medication efficacy. Medication retreatment would be indicated for patients exhibiting signs of worsening. Those who worsened on placebo and then improved on medication retreatment are likely specific drug responders. Those who switched to placebo but continued to do well would be less likely to be specific medication responders. By sequentially repeating this process, nonspecific and specific responders would be progressively identified. Even without repetition, specific drug responder identification would be substantially enhanced. This design would define individuals who were very likely medication-specific
responders, likely nonspecific responders, and nonresponders. One concern might be that this design would fail if a drug were curative; in that case, worsening on placebo substitution in those remitted during drug treatment should not occur. (We would be grateful for such a design failure.) Other practical benefits are that all patients initially receive active treatment, which will foster recruitment since often patients will not risk being assigned to placebo. Also, patients will learn whether medication is necessary for them to remit. This design has been used successfully (McGrath et al., 2000).
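To make the logic of this placebo-substitution (intensive) design concrete, the following minimal simulation sketch in Python may help. The mixture of patient types, the response and relapse probabilities, and the sample size are purely illustrative assumptions, not estimates from any actual trial; the point is only to show how the design separates putative medication-specific responders from nonspecific responders.

import numpy as np

rng = np.random.default_rng(0)
n = 400

# Hypothetical mixture of patient types (illustrative proportions only).
true_type = rng.choice(["specific", "nonspecific", "nonresponder"],
                       size=n, p=[0.4, 0.3, 0.3])

# Phase 1: open treatment with dose titration; minimal responders exit the trial.
responder = true_type != "nonresponder"

# Phase 2: responders are randomized, double-blind, to placebo substitution or
# continued medication; relapse risk is assumed to rise only for the
# medication-specific subgroup switched to placebo.
placebo = rng.random(n) < 0.5
relapse_prob = np.where(true_type == "specific",
                        np.where(placebo, 0.8, 0.1),   # specific: relapse off drug
                        0.1)                           # nonspecific: low either way
relapse = responder & (rng.random(n) < relapse_prob)

print("relapse, placebo substitution :",
      round(relapse[responder & placebo].mean(), 2))
print("relapse, continued medication :",
      round(relapse[responder & ~placebo].mean(), 2))
# Responders who relapse on placebo and then remit again on retreatment are the
# candidates for medication-specific responder status; repeating the cycle
# sharpens that identification further.

Under these assumed parameters, the placebo-substituted arm shows markedly more worsening than the medication-maintained arm, which is the signature of specific drug benefit that the design is intended to expose.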
Combining Objective Measures With Intensive Design Modifying the usual clinical trial by randomized, double-blind placebo substitution in putative responders (intensive design) allows for the isolation of a clinically meaningful, experimentally defined subsyndrome (i.e., individuals who require specific medication for both remission and relapse prevention). Past experience indicates that medications that specifically induce remission, when continued, also prevent relapse. This again affirms that their specific therapeutic activity occurs by normalizing a dysfunction. Further, extending the intensive clinical trial design by including objective baseline measures can allow the isolation of objective diagnostic criteria for this subsyndrome. Also, finding objective dependent outcome measures (in those who require medication to improve) renders blinding unnecessary. Heuristically, if such objective measures also normalize during the course of syndrome remission, then they must be an integral part of the causation of dysfunction rather than simply a correlate. This suggests that embedding objective measures within intensive clinical trials of already known specific therapeutic agents would allow for the discovery of objective, clinically relevant, psychiatric diagnostic criteria. Even better, this experimental approach can isolate causation both of dysfunction and of the specific medication under study. Note that this requires a long-term programmatic approach that substantially differs from and exceeds current National Institutes of Health (NIH) road maps or DSM-V discussions.
Conclusion Since the current American Psychiatric Association (APA) DSM is primarily a diagnostic manual for practitioners, the threshold for including objective findings should depend on clearly demonstrated practical value related to differential diagnosis. This suggested approach to the objective investigation of the pathophysiologies manifested as psychiatric syndromes requires expensive, long-term, programmatic support. It is not likely to affect any
DSM for quite a while. The large difficulty is that neither NIH nor industry nor the APA DSM process supports such study designs, especially of marketed medications. Achieving the necessary long-term support may depend on the realization that current genomic and brain-imaging efforts are unlikely to succeed in resolving nosological ambiguities because syndrome and genetic heterogeneity defeats group contrast and correlative studies. Our suggestion is to substantially diminish heterogeneity by objectively identifying specific pharmacotherapeutic responders through intensive design. We argue that major psychotropic drug effects depend on normalization of proximal pathophysiology. Objective predictors of specific medication remission that are also specifically treatment-responsive must be causally relevant to the underlying pathophysiology; these predictors are central clues to both pathophysiology and drug response. Finally, studying known effective agents hastens this goal. This is worth emphasis as it affords a strong basis for programmatic support. Such specific objective signs would improve psychiatric differential diagnosis beyond both the current clinical consensus and biomarker approaches.
References Adly, C., Straumanis, J., & Chesson, A. (1992). Fluoxetine prophylaxis of migraine. Headache, 32, 101–104. Bodmer, W. F. (1981). Gene clusters, genome organization and complex phenotypes. When the sequence is known, what will it mean? American Journal of Human Genetics, 33, 664–682. Braff, D. L., Freedman, R., Schork, N. J., & Gottesman, I. I. (2007). Deconstructing schizophrenia: An overview of the use of endophenotypes in order to understand a complex disorder. Schizophrenia Bulletin, 33, 21–32. Chassan, J. B. (1967). Research design in clinical psychology and psychiatry. New York: Appleton-Century-Crofts. Crow, T. J. (2007). How and why genetic linkage has not solved the problem of psychosis: Review and hypothesis. American Journal of Psychiatry, 30, 13–21. de Visser, S. J., van der Post, J., Pieters, M. S., Cohen, A. F, & van Gerven, J. M. (2001). Biomarkers for the effects of antipsychotic drugs in healthy volunteers. British Journal of Clinical Pharmacology, 51, 119–132. Dimascio, A., Havens, L. L., & Klerman, G. L. (1963a). The psychopharmacology of phenothiazine compounds: A comparative study of the effects of chlorpromazine, promethazine, trifluoperazine and perphenazine in normal males. I. Introduction, aims and methods. Journal of Nervous and Mental Disease, 136, 15–28. Dimascio, A., Havens, L. L., & Klerman, G. L. (1963b). The psychopharmacology of phenothiazine compounds: A comparative study of the effects of chlorpromazine, promethazine, trifluoperazine, and perphenazine in normal males. II. Results and discussion. Journal of Nervous and Mental Disease, 136, 168–186.
Dumont, G. J., de Visser, S. J., Cohen, A. F., & van Gerven, J. M.; Biomarker Working Group of the German Association for Applied Human Pharmacology. (2005). Biomarkers for the effects of selective serotonin reuptake inhibitors (SSRIs) in healthy subjects. British Journal of Clinical Pharmacology, 59, 495–510. Flint, J., & Munafo, M. R. (2006). The endophenotype concept in psychiatric genetics. Psychological Medicine, 37, 163–180. Fyer, A. J., Hamilton, S. P., Durner, M., Haghighi, F., Heiman, G. A., Costa, R., et al. (2006). A third-pass genome scan in panic disorder: Evidence for multiple susceptibility loci. Biological Psychiatry, 60(4), 388–401. Gottesman, I. I., & Gould, T, D. (2003). The endophenotype concept in psychiatry: Etymology and strategic intentions. American Journal of Psychiatry, 160, 636–645. Jorgensen, O. S., Lober, M., Christiansen, J., & Gram, L. F. (1980). Plasma concentration and clinical effect in imipramine treatment of childhood enuresis. Clinical Pharmacokinetics, 5, 386–393. Judd, L. L., Hubbard, B., Janowsky, D. S., Huey, L. Y., & Attewell, P. A. (1979). The effect of lithium carbonate on affect, mood, and personality of normal subjects. Archives of General Psychiatry, 36, 860–866. Klein, D. F. (1964a). Behavioral effects of imipramine and phenothiazines: Implications for a psychiatric pathogenic theory and theory of drug action. In J. Wortis (Ed.), Recent advances in biological psychiatry (Vol. VII, pp. 273–287). New York: Plenum Press. Klein, D. F. (1964b). Delineation of two drug-responsive anxiety syndromes. Psychopharmacologia, 5, 397–408. Klein, D. F. (1967). Importance of psychiatric diagnosis in prediction of clinical drug effects. Archives of General Psychiatry, 16(1), 118–126. Klein, D. F. (1978). A proposed definition of mental illness. In R. Spitzer, D. F. Klein (Eds.), Critical Issues in Psychiatric Diagnosis (pp. 41–71). New York: Raven Press. Klein, D. F. (1988). Cybernetics, activation, and drug effects. Acta Psychiatrica Scandinavica Supplementum, 341, 126–137. Klein, D. F. (1993). False suffocation alarms, spontaneous panics, and related conditions; an integrative hypothesis. Archives of General Psychiatry, 50, 306–317. Klein, D. F. (1999). Harmful dysfunction, disorder, disease, illness, and evolution. Journal of Abnormal Psychology, 108, 421–429. Klein, D. F., Gittelman, R., Quitkin, F., & Rifkin, A. (Eds.). (1980). Diagnosis and drug treatment of psychiatric disorders: Adults and children (2nd ed.). Baltimore, MD: Williams & Wilkins. Klein, D. F., Ross, D. C., & Cohen, P. (1987). Panic and avoidance in agoraphobia: Application of PATH analysis to treatment studies. Archives of General Psychiatry, 44(3), 377–385. Klein, D. F., & Stewart, J. (2004). Genes and environment: Nosology and psychiatry. Neurotoxicity Research, 6(1), 11–15. Knutson, B., Wolkowitz, O. M., Cole, S. W., Chan, T., Moore, E. A., Johnson, R. C., et al. (1999). Selective alteration of personality and social behavior by serotonergic intervention. American Journal of Psychiatry, 155, 373–379. Kupfer, D. J., First, M. B., & Regier, D. A. (Eds.). (2002). A research agenda for DSM-V. Washington DC: American Psychiatric Association. Lewis, Aubrey (1967). The state of psychiatry: essays and addresses. London: Routledge and Kegan Paul. Loubinoux, I., Tombari, D., Pariente, J., Gerdelat-Mas, A., Franceries, X., Cassol, E., et al. (2005). Modulation of behavior and cortical motor activity in healthy subjects by a chronic administration of a serotonin enhancer. NeuroImage, 27, 299–313.
McGrath, P. J., & Klein, D. F. (1983). Heuristically important mood altering drugs. In J. Angst (Ed.), The origins of depression: current concepts and approaches (pp. 331–349). New York: Springer-Verlag. McGrath, P. J., Stewart, J. W., Petkova, E., Quitkin, F. M., Amsterdam, J. D., Fawcett, J., et al. (2000). Predictors of relapse during fluoxetine continuation or maintenance treatment of major depression. Journal of Clinical Psychiatry, 61, 518–524. Meehl, P. E. (1990). Appraising and amending theories: The strategy of lakatosian defense and two principles that warrant it. Psychological Inquiry, 1(2), 108–141. Meehl, P. E. (1992). Factors and taxa, traits and types, difference of degree and differences in kind. Journal of Personality, 60, 117–174. Murphy, E. A. (1964). One cause? Many causes? The argument from the bimodal distribution. J. chronic Dis., 17, 301–324. Pace-Schott, E. F., Gersh, T., Silvestri, R., Stickgold, R., Salzman, C., & Hobson, A. J. (2001). SSRI treatment suppresses dream recall frequency but increases subjective dream intensity in normal subjects. Sleep Research, 10, 129–142. Parsons, T. (1951). The social system. New York: Free Press. Pichot, P. (2006). Tracing the origins of bipolar disorder: From Falret to SM-IV and ICD-10. Journal of Affective Disorders, 96, 145–148. Preter, M., & Klein, D. F. (2008). Panic, suffocation false alarms, separation anxiety and endogenous opioids. Progress in Neuropsychopharmacology and Biological Psychiatry, 32, 603–612. Quitkin, F. M., McGrath, P. J., Stewart, J. W., Harrison, W., Wager, S. G., Nunes, E., et al. (1989). Phenelzine and imipramine in mood reactive depressives: Further delineation of the syndrome of atypical depression. Archives of General Psychiatry, 46(9), 787–793. Rapoport, J. L., Mikkelsen, E. J., Zavadil, A., Nee, L., Gruenau, C., Mendelson, W., et al. (1980). Childhood enuresis. II. Psychopathology, tricyclic concentration in plasma, and antienuretic effect. Archives of General Psychiatry, 37, 1146–1152. Riley, B., & Kendler, K. S. (2006). Molecular genetic studies of schizophrenia. European Journal of Human Genetics, 14, 669–680. Rosenzweig, P., Canal, M., Patat, A., Bergougnnan, L., Zieleniuk, I., & Bianchetti, G. (2002). A review of the pharmacokinetics, tolerability and pharmacodynamics of amisulpride in healthy volunteers. Human Psychopharmacology, 17, 1–13. Rutter, M. (2007). Gene–environment interdependence. Developmental Science, 10, 12–18. Satel, S. L., & Nelson, J. C. (1989). Stimulants in the treatment of depression: A critical overview. Journal of Clinical Psychiatry, 50, 241–249. Smoller, J. W., & Tsuang, M. T. (1998). Panic and phobic anxiety: Defining phenotypes for genetic studies. American Journal of Psychiatry, 155, 1152–1162. Stewart, J. W., McGrath, P. J., Quitkin, F. M., & Klein, D. F. (2007). Atypical depression: Current status and relevance to melancholia. Acta Psychiatrica Scandinavica Supplementum, 433, 58–71. Thirion, B., Pinel, P., Meriaux, S., Roche, A., Dehaene, S., & Poline, J. (2007). Analysis of a large fMRI cohort: Statistical and methodological issues for group analyses. NeuroImage, 35, 105–120. Wakefield, J. C. (1992). Disorder as harmful dysfunction: A conceptual critique of DSM-III-R’s definition of mental disorder. Psychological Review, 99(2), 232–247. Zitrin, C. M., Klein, D. F., & Woerner, M. G. (1978). Behavior therapy, supportive psychotherapy, imipramine, and phobias. Archives of General Psychiatry, 35(3), 307–316.
14 The Need for Dimensional Approaches in Discerning the Origins of Psychopathology
robert f. krueger and daniel goldman
Introduction The 2008 meeting of the American Psychopathological Association was framed by a very challenging topic: causality. Indeed, setting aside any possible application in understanding psychopathology, causality is a deep concept—a fact that has kept philosophers gainfully employed for some time now. One thing is clear, however, at least in the behavioral sciences: If one wants to make credible causal claims, it helps to be able to directly manipulate the variables of interest. Indeed, some would go so far as to say that causality cannot be inferred without this kind of experimental manipulation. Through manipulation, one can systematically vary a variable of interest, while holding others constant, including the observational conditions. Consider, for example, how this is conveyed to new students in the behavioral sciences in a very useful text by Stanovich (2007). Stanovich (2007) first reviews the classic observation that simply knowing that two things (A and B) tend to occur together more often than one would expect by chance (a correlation) is not enough evidence to conclude that those two things have some sort of causal relationship (e.g., A causes B). To really claim that A causes B, ‘‘the investigator manipulates the variable hypothesized to be the cause and looks for an effect on the variable hypothesized to be the effect while holding all other variables constant by control and randomization’’ (p. 102). The implications of this experimental perspective on causality for psychopathology research are readily apparent: The situation is nearly hopeless, at least in terms of getting at the original, antecedent, distal causes of psychopathology. It is axiomatically unethical to manipulate variables to enhance the likelihood of psychopathology; we cannot directly manipulate things to create psychopathology in persons who do not already suffer from psychopathology. This is not to say that, once psychopathology is present, experimental designs
are not fundamentally helpful in understanding the mechanisms underlying its expression. Indeed, the discipline of experimental psychopathology is founded on this premise, involving comparisons of the behaviors of persons with psychopathology and persons without psychopathology under precisely controlled conditions. Still, this approach discerns the mechanisms that are disturbed in psychopathology once it is present, as opposed to the reasons that those mechanisms became disturbed in the first place. In addition, preventive interventions designed to remove putative causes of psychopathology may be tested through experimental manipulations, but many major putative causal factors cannot be removed in this manner in humans (e.g., genes). Given this state of affairs, can we ever figure out ‘‘what causes psychopathology’’? In this chapter, we will argue that much can be learned about the origins and nature of psychopathology without worrying too much about claims of causality per se. Although much progress has been made in explicating tractable models for causal inference (see Chapters 3, 4 and 11; Pearl, 2000), our current thinking is that straightforward claims of causality (e.g., A causes disorder B) are unlikely to map empirical realities in most of psychopathology research. Moreover, this common-language understanding of the word ‘‘cause,’’ as implying a two-variable system of cause and effect, is almost certainly how the word would be interpreted if one were daring enough to use it in reference to conditions that antedate the development of psychopathology. This is in spite of the fact that statisticians and philosophers acknowledge that causality needs to be understood in terms of multiple, probabilistic causes (Holland, 1986). Most of psychopathology is probably too multifactorial to be reducible to straightforward, two-variable statements of cause and effect. For example, aggregate genetic influences on any form of psychopathology summarize the effects of numerous individual polymorphisms. Even if each of these polymorphisms could be individually manipulated in humans while holding the others constant (allowing causality in the experimental sense to be evaluated), each of these relationships is probabilistic and part of a larger system, as opposed to deterministic and part of a two-variable (one gene causes one disorder) system (Kendler, 2005). This multivariable situation at the genomic level then intersects with the ways in which genetic effects are contingent on environmental moderators (Rutter, 2006), thereby further reducing the veridicality of straightforward causal claims. Fortunately, considerations of the complexity of the mechanisms that are likely to underlie psychopathology do not render it empirically intractable. The process of discovering circumstances where psychopathology is more or less likely is empirically tractable, and this endeavor can be enhanced by evolving our conceptualization of psychopathology from an ‘‘either/or’’
issue to a matter of degree. In particular, if we cannot directly induce psychopathology in people, we are left with naturally occurring patterns of covariation as bedrock empirical observations for our science. Our argument is that these patterns will be more clearly revealed, and their implications for ameliorating psychopathology will be more readily understood, if we evolve our field to think of psychopathology as a matter of degree as opposed to a matter of kind. We also argue that, rather than thinking in terms of specific bivariate relations (e.g., gene A causes disorder B), it will be more generative to embrace the multivariate complexity of both psychopathology and its antecedents and work to model this complexity directly. In the course of this chapter, we will make this argument by first reiterating something that has been known for some time but that bears repeating: If one wants to accurately discern how strongly two things are related, it is helpful for statistical power to conceive of both things as continuous, dimensional variables, as opposed to discrete variables (Cohen, 1983). We will then turn from a discussion of statistical power to a discussion of the conundrums that have emerged from psychopathology research using polythetic categorical concepts: the interrelated problems of within-category heterogeneity and comorbidity among categories. Putting together the statistical advantages of dimensions with the conceptual conundrums generated by polythetic categories emphasizes the need to move toward a novel dimensional approach to classifying psychopathology. With this in mind, we will argue that the current zeitgeist is also optimal for a smooth transition from more categorical to more dimensional ways of thinking about psychopathology. We will then briefly describe some empirical research on psychopathology from a dimensional perspective, ways of integrating this research into the upcoming Diagnostic and Statistical Manual of Mental Disorders, fifth edition (DSM-5), and ways of enhancing the DSM-5 with dimensional concepts to frame the collection of future data. We will conclude with a discussion of how dimensional conceptualization is relevant not only to psychopathology itself but also to conceptualization of the etiology of psychopathology.
Enhancing the Power of Psychopathology Research With Dimensions In scientific disciplines where direct manipulation of critical variables is unethical or infeasible (e.g., psychopathology research, astrophysics), patterns of covariation take on fundamental importance. For example, physicists cannot induce or manipulate the formation of distant celestial bodies, but they can observe the behavior of such bodies using tools like telescopes.
They can then apply formal mathematics to model the resulting data, thereby instantiating scientific theories in empirical data. The situation is very similar in psychopathology research. We cannot induce psychopathology in human beings, but given a group of persons who differ in their psychopathology status, we can see if psychopathology status covaries with other variables (e.g., test performance, physiology, genes, family history, developmental antecedents). The definition of psychopathology status is a fundamental and historically vexing issue. For much of the history of our discipline, such definitions were highly chaotic because different investigators meant different things when they used the same label. The solution to this problem has been to implement a system that solves this definitional problem by providing consensus definitions that draw on the opinions of diverse experts. This is the diagnostic system that originated with DSM-III and has continued forward in much the same form in DSM-III-R and DSM-IV. The modern DSMs have been indispensable in psychopathology research because, to a large extent, we know what other researchers mean when they say they are studying a specific DSM diagnosis. Nevertheless, the modern DSMs embody an important assumption, namely, that all mental disorders are an either/or matter. Each diagnostic construct described in recent DSMs is a polythetic category. That is, for each diagnosis, multiple criteria are listed, and a certain combination of criteria indicates membership in a category of mental disorder, whereas not having those criteria indicates membership in the complementary nondisordered group. The practical needs for dichotomous categorical psychopathology labels (e.g., for third-party payment purposes) have been acknowledged elsewhere (First, 2005; Krueger & Markon, 2006b). However, if our goal is scientific—to understand the origins and nature of psychopathology—dimensional psychopathology constructs are indispensable (Helzer, Kramer, & Krueger, 2006). A fundamental reason for this is that dichotomous variables (e.g., presence vs. absence of a mental disorder) contain less information than variables that can take on more values (e.g., how much a research participant resembles a mental disorder prototype on a multipoint scale) (Kraemer, Noda, & O’Hara, 2004; MacCallum, Zhang, Preacher, & Rucker, 2002). This means that many more research participants are needed to discern the correlates of a dichotomous psychopathology construct, as opposed to a continuous psychopathology construct. The literal ‘‘costs’’ of dichotomization can be quite profound, if one thinks of the problem in terms of the finite amount of money available for research on psychopathology. We can therefore achieve greater research traction with less money by using dimensional constructs because we do not need as many research participants to ask key research questions.
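The cost of dichotomization can be illustrated with a small Monte Carlo sketch; the sample size, true correlation, and median-split rule below are arbitrary choices for illustration rather than values drawn from the cited literature. Splitting a continuous liability score into a present/absent "diagnosis" discards information and lowers the power to detect the same underlying association.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def detection_rate(dichotomize, n=100, r=0.3, reps=2000, alpha=0.05):
    """Monte Carlo power to detect a true correlation r between a risk factor
    and a continuous liability, with or without median-splitting the liability
    into a yes/no diagnosis."""
    hits = 0
    for _ in range(reps):
        x = rng.standard_normal(n)                                   # risk factor
        liability = r * x + np.sqrt(1 - r ** 2) * rng.standard_normal(n)
        if dichotomize:
            y = (liability > np.median(liability)).astype(float)     # case/noncase
        else:
            y = liability                                            # dimensional score
        r_obs, p = stats.pearsonr(x, y)
        hits += p < alpha
    return hits / reps

print("power, dimensional outcome  :", detection_rate(dichotomize=False))
print("power, dichotomized outcome :", detection_rate(dichotomize=True))

With these illustrative settings, the dichotomized outcome yields noticeably lower power at the same sample size, which is the practical meaning of the statistical argument above: more participants, and more money, are needed to learn the same thing from categorical outcomes.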
Limitations of Polythetic Categories: Heterogeneity and Comorbidity In addition to limitations of a more statistical nature, other conceptual problems have emerged from trying to use polythetic-categorical psychopathology constructs in research. The first of these problems is within-category heterogeneity. For specific diagnostic categories, the modern DSMs list symptoms that define the categories, and various combinations of symptoms are equally legitimate indicators of membership in the category. For example, DSM-IV defined obsessive–compulsive personality disorder (OCPD) as consisting of eight symptoms, with four of these symptoms needing to be present to meet the criteria for diagnosis. As a result, two different persons can both legitimately meet the criteria for OCPD in spite of having four entirely different symptoms. Nevertheless, a group of persons meeting the criteria for OCPD is meant to be interpreted as a homogenous group, in the sense that they have a single disorder that presumably has a coherent etiology, course, etc. A related problem is comorbidity, or the tendency for putatively distinct categorical disorders to co-occur more than one would expect by chance. Comorbidity among DSM-defined mental disorders is extensive and is not limited to pairs of disorders. Indeed, the phenomenon is probably better thought of as ‘‘multimorbidity’’ in the sense that two-variable patterns of co-occurrence do not capture the extent of it. That is, persons meeting the criteria for three or more diagnoses are not uncommon, and they carry much of the social burden (e.g., diminished educational and occupational attainment) of mental illness compared with persons who meet the criteria for only one diagnosis (Krueger & Markon, 2006a). The resultant conceptual problem for psychopathology research is clear. Much psychopathology research is organized around single DSM-defined categories, but if the persons whom we most need to understand are ‘‘multimorbid’’ and not well-conceptualized in terms of single categories, then the typical research strategy is not identifying the persons who actually carry the major social burden of mental illness.
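Returning to the within-category heterogeneity problem, the scope of the OCPD example above (eight criteria, any four sufficient) is easy to quantify; the short calculation below simply counts the qualifying symptom combinations.

from math import comb

# DSM-IV OCPD: 8 criteria, any 4 or more are sufficient for the diagnosis.
qualifying_profiles = sum(comb(8, k) for k in range(4, 9))
print(qualifying_profiles)  # 163 distinct presentations all receive the same label

# Two patients can even meet criteria with no overlapping symptoms at all,
# e.g., criteria {1, 2, 3, 4} versus criteria {5, 6, 7, 8}.

That 163 distinct symptom profiles share one diagnostic label makes plain why a group of diagnosed cases cannot be assumed to be etiologically homogeneous.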
The Zeitgeist: The Time Is Right for Dimensions in Psychopathology Research In sum, there are at least three major limitations of categorical psychopathology constructs: (1) reduced statistical power, (2) heterogeneity within categories, and (3) comorbidity among categories. We are neither the first to identify these problems nor the first to suggest that they signal the need
for new dimensional alternatives to categorical conceptions of psychopathology (see Kupfer, 2005). The challenge is how to incorporate dimensional concepts into psychopathology research without returning to the nosological chaos that reigned before the modern DSMs were created. In our view, there are two interrelated pathways to making psychopathology research more dimensional. First, dimensional research on psychopathology can proceed by drawing on, but not being wedded to, existing DSM constructs, and this work could eventuate in an empirically based dimensional nosology. Second, the DSM per se can be enhanced with dimensional concepts, thereby providing a uniform platform for further research. We will discuss both of these pathways in turn.
Examples of Existing Dimensional Research There are a number of recent examples of programmatic research on psychopathology from a dimensional perspective, and this work can be roughly divided into three themes: (1) research comparing dimensional and categorical accounts of psychopathology, (2) research modeling comorbidity among modern DSM constructs dimensionally, and (3) research modeling symptoms within psychopathological categories dimensionally.
Comparing Dimensional and Categorical Models Much of the historical discussion surrounding categorical and dimensional accounts of psychopathology has been framed in terms of practical considerations and disciplinary matters (e.g., clinicians need category labels to communicate, physicians are historically accustomed to using categorical diagnostic concepts), but it is now possible to move this discussion into a more empirical arena. Specifically, there are ways of asking if data are more supportive of a specific model of psychopathology as it occurs in nature, as well as ways of inquiring about hybrid conceptualizations involving both categories and dimensions. Seminal work in this area was pursued by Paul Meehl in his program of research aimed at developing what he termed ‘‘taxometric’’ methods (Meehl, 1992). Taxometric methods are ways of using data to ask if the variables in the data are indicative of a nonarbitrary and coherent category of persons. If such a category can be discerned, this group is referred to as a ‘‘taxon.’’ These methods have become popular in psychopathology research, and a recent special section of the Journal of Abnormal Psychology was devoted to discussing these methods and their application (Cole, 2004). Although taxometric methods have certain methodological strengths (e.g., the use of multiple
independent procedures or ‘‘multiple epistemic pathways’’ to converge on a taxonic conjecture), they have evolved to some extent outside the realm of mainstream statistics; some limitations of these approaches may be traced to this separate evolutionary path. For example, rather than formally parameterizing the dimensional alternative to the taxonic model as a comparison model, a dimensional structure is assumed when a taxonic model does not provide a good account of the data. Other approaches have therefore been pursued that originate more within the domain of traditional statistical inquiry, with its focus on explicit model parameterization and using data derived from samples to derive inferences about the situation in the broader population. Our recent work in this area has focused on ways of comparing categorical and dimensional accounts of the latent structure of a series of measured indicators of psychopathology via the comparison of latent class and latent trait models. In a latent class model, the observed data reflect the existence of a series of mutually exclusive groups of persons. For example, a set of signs and symptoms might delineate distinct groups of persons characterized by specific and distinct profiles on the measured indicators. In contrast, in a latent trait model, the same signs and symptoms delineate an underlying dimension, such that people are characterized by their position along that dimension, as opposed to being characterized by their membership in discrete groups. Recent methodological work has shown how these models can often be distinguished by their differential fit to the same data (Lubke & Neale, 2006; Markon & Krueger, 2006). In addition, we have applied these modeling comparisons to show that comorbidity among DSM-defined disorders involving substance dependence and antisocial behavior indicates a dimensionally organized ‘‘externalizing spectrum,’’ as opposed to membership in discrete classes of mental disorder (Krueger, Markon, Patrick, & Iacono, 2005; Markon & Krueger, 2005). Muthén (2006) has discussed related modeling developments and articulated models that represent hybrids of latent class and latent trait models. In these hybrid models, both categorical and continuous latent variables are accommodated, providing flexibility in terms of thinking of both categorical and continuous aspects of the organization of mental disorders. The general point is that categorical and continuous conceptions need no longer be adjudicated based on a priori preferences. Data can be brought to bear directly on these issues.
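As a rough illustration of how differential fit can adjudicate between the two structures, the sketch below simulates continuous symptom scores from a one-dimensional latent trait model and compares a one-factor (latent trait) model against a two-class mixture (latent class) model by BIC, using scikit-learn. This is only a schematic analogue of the cited work, which typically analyzes binary diagnostic indicators with dedicated latent-variable software; the loadings, noise level, and number of classes are arbitrary assumptions.

import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
n, p = 1000, 6

# Simulate symptom scores driven by a single continuous liability dimension.
theta = rng.standard_normal(n)
loadings = rng.uniform(0.5, 0.9, p)
X = np.outer(theta, loadings) + 0.6 * rng.standard_normal((n, p))

# Latent trait model: one-factor analysis; BIC built from the mean log-likelihood.
fa = FactorAnalysis(n_components=1).fit(X)
k_fa = 3 * p                                    # loadings + means + unique variances
bic_fa = -2 * fa.score(X) * n + k_fa * np.log(n)

# Latent class model: two-class Gaussian mixture with diagonal covariances.
gm = GaussianMixture(n_components=2, covariance_type="diag", random_state=0).fit(X)
bic_gm = gm.bic(X)

print(f"BIC, latent trait (1 factor) : {bic_fa:.1f}")
print(f"BIC, latent class (2 classes): {bic_gm:.1f}")  # lower BIC = preferred model

Because the data were generated from a dimensional structure, the one-factor model should achieve the lower BIC; fitting the same pair of models to data generated from genuinely discrete groups would reverse the verdict, which is the sense in which data, rather than a priori preference, can decide the question.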
Modeling Comorbidity Among Existing Constructs Dimensionally As described earlier, comorbidity has presented a vexing problem in psychopathology research. One approach to the problem is to attempt to unravel the
meaning of comorbidity by fitting explicit quantitative models to comorbidity data. Krueger and Markon (2006a) reviewed research in this vein and concluded that the existing literature supports a liability-spectrum model of comorbidity among DSM-defined mental disorders commonly observed in the population. In this model, comorbidity is neither an artifact nor a nuisance but, instead, a predictable consequence of the involvement of shared liability factors in multiple disorders. These shared factors are well-conceptualized as continuous dimensions of personality that confer risk for the development of psychopathology. In particular, a personality style characterized by emotional instability confers risk for a broad internalizing spectrum of unipolar mood and anxiety disorders; when this style is also accompanied by disinhibition, there is elevated risk for a broad externalizing spectrum of substance-use and antisocial-behavior disorders (Krueger, 2005). This research has mostly been confined to disorders that are prevalent in the general, community-dwelling population because epidemiological data provide the needed diversity of assessed constructs and sample sizes. The approach could be extended to samples (e.g., outpatient psychiatric) with a greater density of other kinds of psychopathology, and other putative spectra would likely be delineated as a result (e.g., a psychosis spectrum) (Wolf et al., 1988).
Modeling Symptoms Within Existing Constructs Dimensionally Although the DSM conceptualizes all mental disorders as categories, many instruments for assessing DSM constructs allow for the collection of symptom-level information. Data on symptoms can be analyzed in a dimensional manner by using statistical techniques designed to model relationships between symptoms and an underlying dimension; these are the aforementioned latent trait models, which are also known as item response theory (IRT) models. The alcoholism literature is one place where this approach has been used extensively in the last few years, although it has also been fruitful in studying symptoms within other constructs (e.g., unipolar depression) (Aggen, Neale, & Kendler, 2005). Earlier work on alcohol problems from a categorical perspective (e.g., using latent class models) (Heath et al., 1994) revealed latent groups that represented gradations on an underlying continuum. That is, the groups tended not to have unique profiles (e.g., a group with primarily one type of symptom as opposed to another) but, rather, were characterized by increasing probabilities of all types of symptoms. As a result, investigators in this literature have recently turned to IRT models because such models are well-suited to exactly this kind of dimensional latent structure.
In typical IRT models, symptoms of psychopathology are modeled in terms of two parameters: the strength of the relationship between the symptom and the underlying dimension of psychopathology, and the place along the dimension where the symptom is most relevant (a concept that can be understood as the severity of the symptom). These IRT models have yielded some basic insights about the nature of alcohol problems. First, the relatively good fit of IRT models indicates that alcohol problems are well conceptualized as lying along a dimension, with the presence of more severe symptoms (e.g., medical complications) indicating a higher probability that less severe symptoms (e.g., heavy use) are also present (Krueger et al., 2004; Saha, Chou, & Grant, 2006). Second, the arrangement of abuse and dependence symptoms across the alcohol-problems dimension does not align with the severity arrangement suggested in the DSM; some dependence symptoms can be quite mild, whereas some abuse symptoms can be quite severe (Langenbucher et al., 2004). These two insights provide an important impetus for changing the diagnostic criteria for alcohol diagnoses in DSM-5 to better map the continuum of severity as it occurs in nature (see Martin, Chung, Kirisci, & Langenbucher, 2006).

The utility of IRT models in alcoholism research suggests that such models might be useful in studying diverse psychopathological concepts. This is an empirical question that can be addressed by fitting these models to other data and interpreting the results. Such an endeavor might reveal that IRT models are not optimal for all psychopathological domains. This does not mean, however, that we should abandon dimensional concepts or statistical modeling as a means of instantiating theory in data. Rather, what might be needed are more subtle models, perhaps containing both categorical and dimensional elements (Muthén, 2006). Consider, for example, the possibility of uncovering a categorical distinction between having no psychopathological symptoms and having at least some symptoms, but a dimensional distinction within those persons who manifest at least one symptom. One interpretation of such modeling results is that, in nature, there exists a cusp: a categorical distinction between zero and at least one symptom. Drawing from this conceptualization, diagnostic criteria would need to be optimized for two purposes: (1) distinguishing between persons above or below the cusp and (2) distinguishing severity among persons beyond the cusp. Another interpretation, however, is that the symptom list is incomplete; the cusp is then an illusion created by failing to assess symptoms that occur below it. Because DSM conceptualizations of psychopathology have historically derived from clinical conceptualizations, milder symptoms may be omitted simply because they have not traditionally been observed in clinical settings. Nevertheless, such symptoms may be of
importance to society; "subclinical" psychopathology has very real public-health consequences (Horwath, Johnson, Klerman, & Weissman, 1994). Given this interpretation, the next step in the scientific process is to identify and assess symptoms less severe than those traditionally recognized in clinical settings and to see whether they lie along the same dimension as more severe symptoms.

The overall point is that conceptions of psychopathology are now amenable to a program of empirical research involving close links between methodological and substantive developments. Sorting out conceptual possibilities and working them through in data via the application of formal statistical models will tell us a great deal about the underlying nature of psychopathology.
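The two-parameter models described above can be written compactly; the notation here is standard IRT notation rather than anything specific to this chapter. In the two-parameter logistic form, the probability that person $i$ endorses symptom $j$ is

$$P(x_{ij} = 1 \mid \theta_i) \;=\; \frac{1}{1 + \exp\{-a_j(\theta_i - b_j)\}},$$

where $\theta_i$ is person $i$'s standing on the latent dimension (e.g., severity of alcohol problems), $a_j$ is the discrimination of symptom $j$ (the strength of its relationship to the dimension), and $b_j$ is its location (the point on the dimension at which the symptom is endorsed with probability .5, interpretable as its severity). The finding that some dependence symptoms are mild and some abuse symptoms severe corresponds to estimated $b_j$ values that do not sort neatly along the DSM abuse/dependence distinction.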
Augmenting DSM-5 With Dimensions

As can be seen from the foregoing sections, there is now a nontrivial corpus of research on psychopathology from a dimensional perspective. This seems particularly remarkable given the exclusively categorical nature of mental disorders as defined in the modern DSMs. In recognition of this burgeoning dimensional literature, the American Psychiatric Institute for Research and Education (APIRE) organized a meeting in July 2006 to discuss a research agenda for contemplating the inclusion of dimensions throughout the upcoming DSM-5 (Helzer et al., 2008). Although the primary sources should be consulted to understand the numerous ideas discussed at the meeting, a general consensus was that the DSM-5 could benefit from the explicit inclusion of dimensional elements in many areas of psychopathology.
Enhancing Future Inquiry: Separable Research and Official Nosology Streams

Although there is enthusiasm for dimensional concepts in the classification of psychopathology, many areas of psychopathology have not been extensively studied from a dimensional perspective due to the traditional categorical focus of the DSM. Moreover, categorical concepts have their place in any applied nosology, particularly for practical clinical purposes, such as providing descriptive labels to facilitate third-party payment. Hence, conversion of the DSM to an entirely dimensional system in the course of one revision is likely infeasible and may not be entirely desirable. With these considerations in mind, a dual-track strategy might be pursued. First, research on dimensional approaches can continue to flourish, separate from the DSM per se. Second, the DSM can be enhanced with dimensional concepts. In areas with
a rich history of dimensional research (e.g., personality disorders) (Widiger, Simonsen, Krueger, Livesley, & Verheul, 2005), this enhancement process will likely be more straightforward. For other areas, the process of dimensional augmentation might take longer, necessitating research that may not be directly based on categorical DSM concepts. Logically speaking, these separable streams (a research stream on dimensions of psychopathology and the DSM nosology per se) will intertwine. The important point in distinguishing the streams, however, is that the DSM need not be seen as a barrier to dimensionally oriented research, even in areas where practical considerations or a lack of relevant literature results in a more traditional categorical-polythetic classification scheme.
Etiologic Factors: Toward a Structural Perspective to Dovetail With the Phenotypic Structure of Psychopathology

We have argued that psychopathology research can benefit from incorporating dimensional constructs more extensively. Dimensionality extends to diverse aspects of a comprehensive nosology. Symptoms within diagnoses can be understood as indicators of underlying dimensions, and the arrangement of diagnostic concepts can be explored empirically by thinking of diagnoses as lying within dimensionally organized spectrums. In taking this approach, some shortcomings of a purely categorical nosology can be overcome. Statistical power is enhanced through the use of dimensional constructs. Within-category heterogeneity can be dealt with by isolating the correlates of dimensions in models that take into account the structural organization of psychopathology. For example, the unique correlates of a specific narrow syndrome (e.g., unipolar depression) can be identified by taking into account, or holding constant, the variance shared between that syndrome and neighboring syndromes within specific spectrums (e.g., anxiety disorders, understood as closely related dimensions within a broader spectrum of internalizing psychopathology). Comorbidity can be dealt with by modeling the natural tendency of disorders to co-occur within broader spectrums of variation.

So far, we have discussed only the internal, nosological aspects of this kind of dimensional-hierarchical perspective on psychopathology. We leave the reader with some thoughts about the flip side of this perspective: could a dimensional-hierarchical perspective also benefit the way we think about the causes of psychopathology (or at least its antecedents and correlates)? Often, putative causal factors have been studied and framed individually and thought of as dichotomous. For example, Kendler (2005) described
the history of the idea that there are "specific genes for specific psychopathologies," with its implication that causal genes will operate in a straightforward Mendelian manner (e.g., there is one relevant gene, it has two forms, mutated/disease-causing and nonmutated, and the etiologic effect of the mutated form is insensitive to environmental inputs). Although there are some human neuropsychiatric diseases whose etiology can be understood in this way (e.g., Huntington disease), Kendler (2005) concluded that genetic effects on most psychopathological conditions are not likely to be this straightforward. Rather, genetic effects on psychopathology are likely smaller, many genes are likely relevant, and these genes are likely sensitive to environmental inputs.

As with the complexity of psychopathological phenotypes, this etiologic complexity can also be usefully parsed using dimensional-structural approaches. Indeed, the dimensional structure of etiologic factors may resemble the structure of psychopathology itself, a finding that breaks down the conceptual barrier between "cause" (or etiology) and "effect" (psychopathological phenotypes). To pick one example, twin research on the externalizing spectrum shows that the genetic effects on individual DSM disorders involving antisocial behavior and substance dependence are largely (but not exclusively) shared, and this common genetic risk can be well modeled as a dimension (Krueger et al., 2002; Kendler, Prescott, Myers, & Neale, 2003; Young, Stallings, Corley, Krauter, & Hewitt, 2000). This genetic risk dimension represents the effects of numerous individual genes that act in concert to increase the probability of psychopathology; as such, it provides a compelling target for identifying specific genetic polymorphisms that increase the risk for psychopathology. This strategy appears to have greater traction for identifying relevant polymorphisms than a strategy aimed at detecting putatively separate and dichotomous genetic effects on putatively separate and dichotomous externalizing disorders (see Dick, 2007).

The general point is that dimensional thinking can extend usefully beyond nosology to encompass thinking about etiology. We look forward to seeing whether a dimensional perspective can get us closer to understanding not only what psychopathology is but also where it comes from.
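A compact way to express this kind of twin-based structural claim is a common-pathway sketch; the notation is ours, and the cited studies should be consulted for their exact model specifications. The liability $L_d$ to each externalizing disorder $d$ is written as loading on a shared latent factor $\eta$ that is itself decomposed into biometric components:

$$L_d = \lambda_d\,\eta + u_d, \qquad \eta = a\,A + c\,C + e\,E,$$

where $A$, $C$, and $E$ are additive-genetic, shared-environmental, and nonshared-environmental factors whose relative contributions are identified by comparing monozygotic and dizygotic twins, $\lambda_d$ is the loading of disorder $d$ on the shared liability, and $u_d$ is a disorder-specific residual (which may have its own genetic and environmental parts). The genetic risk dimension discussed above corresponds to the $A$ contribution to $\eta$; measured polymorphisms can then be evaluated against this continuous, shared dimension rather than against each dichotomous diagnosis in isolation.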
References

Aggen, S. H., Neale, M. C., & Kendler, K. S. (2005). DSM criteria for major depression: Evaluating symptom patterns using latent-trait item response models. Psychological Medicine, 35, 475–487.
Cohen, J. (1983). The cost of dichotomization. Applied Psychological Measurement, 7, 249–253.
Cole, D. A. (2004). Taxometrics in psychopathology research: An introduction to some of the procedures and related methodological issues. Journal of Abnormal Psychology, 113, 3–9.
Dick, D. M. (2007). Identification of genes influencing a spectrum of externalizing psychopathology. Current Directions in Psychological Science, 16(6), 331–335.
First, M. B. (2005). Clinical utility: A prerequisite for the adoption of a dimensional approach in DSM. Journal of Abnormal Psychology, 114, 560–564.
Heath, A. C., Bucholz, K. K., Slutske, W. S., Madden, P. A. F., Dinwiddie, S. H., Dunne, M. P., et al. (1994). The assessment of alcoholism in surveys of the general community: What are we measuring? Some insights from the Australian twin panel interview survey. International Review of Psychiatry, 6, 295–307.
Helzer, J. E., Kraemer, H. C., & Krueger, R. F. (2006). The feasibility and need for dimensional psychiatric diagnoses. Psychological Medicine, 36, 1671–1680.
Helzer, J. E., Kraemer, H. C., Krueger, R. F., Wittchen, H.-U., Sirovatka, P. J., & Regier, D. A. (Eds.). (2008). Dimensional approaches in diagnostic classification: Refining the research agenda for DSM-5. Arlington, VA: American Psychiatric Association.
Holland, P. W. (1986). Statistics and causal inference. Journal of the American Statistical Association, 81, 945–960.
Horwath, E., Johnson, J., Klerman, G. L., & Weissman, M. M. (1994). What are the public health implications of subclinical depressive symptoms? Psychiatric Quarterly, 65, 323–337.
Kendler, K. S. (2005). "A Gene for . . .": The nature of gene action in psychiatric disorders. American Journal of Psychiatry, 162, 1243–1252.
Kendler, K. S., Prescott, C., Myers, J., & Neale, M. C. (2003). The structure of genetic and environmental risk factors for common psychiatric and substance use disorders in men and women. Archives of General Psychiatry, 60, 929–937.
Kraemer, H. C., Noda, A., & O'Hara, R. (2004). Categorical versus dimensional approaches to diagnosis: Methodological challenges. Journal of Psychiatric Research, 38, 17–25.
Krueger, R. F. (2005). Continuity of axes I and II: Toward a unified model of personality, personality disorders, and clinical disorders. Journal of Personality Disorders, 19, 233–261.
Krueger, R. F., Hicks, B. M., Patrick, C. J., Carlson, S. R., Iacono, W. G., & McGue, M. (2002). Etiologic connections among substance dependence, antisocial behavior, and personality: Modeling the externalizing spectrum. Journal of Abnormal Psychology, 111, 411–424.
Krueger, R. F., & Markon, K. E. (2006a). Reinterpreting comorbidity: A model-based approach to understanding and classifying psychopathology. Annual Review of Clinical Psychology, 2, 111–133.
Krueger, R. F., & Markon, K. E. (2006b). Understanding psychopathology: Melding behavior genetics, personality, and quantitative psychology to develop an empirically based model. Current Directions in Psychological Science, 15, 113–117.
Krueger, R. F., Markon, K. E., Patrick, C. J., & Iacono, W. G. (2005). Externalizing psychopathology in adulthood: A dimensional-spectrum conceptualization and its implications for DSM-5. Journal of Abnormal Psychology, 114, 537–550.
Krueger, R. F., Nichol, P. E., Hicks, B. M., Markon, K. E., Patrick, C. J., Iacono, W. G., et al. (2004). Using latent trait modeling to conceptualize an alcohol problems continuum. Psychological Assessment, 16, 107–119.
Kupfer, D. J. (2005). Dimensional models for research and diagnosis: A current dilemma. Journal of Abnormal Psychology, 114, 557–559.
Langenbucher, J. W., Labouvie, E., Martin, C. S., Sanjuan, P. M., Bavly, L., Kirisci, L., et al. (2004). An application of item response theory analysis to alcohol, cannabis, and cocaine criteria in DSM-IV. Journal of Abnormal Psychology, 113, 72–80.
Lubke, G. H., & Neale, M. C. (2006). Distinguishing between latent classes and continuous factors: Resolution by maximum likelihood? Multivariate Behavioral Research, 41, 499–532.
MacCallum, R. C., Zhang, S., Preacher, K. J., & Rucker, D. D. (2002). On the practice of dichotomization of quantitative variables. Psychological Methods, 7, 19–40.
Markon, K. E., & Krueger, R. F. (2005). Categorical and continuous models of liability to externalizing disorders: A direct comparison in NESARC. Archives of General Psychiatry, 62, 1352–1359.
Markon, K. E., & Krueger, R. F. (2006). Information-theoretic latent distribution modeling: Distinguishing discrete and continuous latent variable models. Psychological Methods, 11, 228–243.
Martin, C. S., Chung, T., Kirisci, L., & Langenbucher, J. W. (2006). Item response theory analysis of diagnostic criteria for alcohol and cannabis use disorders in adolescents: Implications for DSM-5. Journal of Abnormal Psychology, 115, 807–814.
Meehl, P. E. (1992). Factors and taxa, traits and types, differences in degree and differences in kind. Journal of Personality, 60, 117–174.
Muthén, B. (2006). Should substance use disorders be considered as categorical or dimensional? Addiction, 101(Suppl. 1), 6–16.
Pearl, J. (2000). Causality: Models, reasoning, and inference. Cambridge: Cambridge University Press.
Rutter, M. (2006). Genes and behavior: Nature–nurture interplay explained. Malden, MA: Blackwell.
Saha, T. D., Chou, S. P., & Grant, B. F. (2006). Toward an alcohol use disorder continuum using item response theory: Results from the National Epidemiologic Survey on Alcohol and Related Conditions. Psychological Medicine, 36, 931–941.
Stanovich, K. E. (2007). How to think straight about psychology (8th ed.). Boston: Pearson.
Widiger, T. A., Simonsen, E., Krueger, R., Livesley, J. W., & Verheul, R. (2005). Personality disorder research agenda for the DSM-5. Journal of Personality Disorders, 19, 315–338.
Wolf, A. W., Schubert, D. S., Patterson, M. B., Grande, T. P., Brocco, K. J., & Pendleton, L. (1988). Associations among major psychiatric diagnoses. Journal of Consulting and Clinical Psychology, 56, 292–294.
Young, S. E., Stallings, M. C., Corley, R. P., Krauter, K. S., & Hewitt, J. K. (2000). Genetic and environmental influences on behavioral disinhibition. American Journal of Medical Genetics, 96, 684–695.