Screening for Depression in Clinical Practice
This material is not intended to be, and should not be considered, a substitute for medical or other professional advice. Treatment for the conditions described in this material is highly dependent on the individual circumstances. While this material is designed to offer accurate information with respect to the subject matter covered and to be current as of the time it was written, research and knowledge about medical and health issues is constantly evolving, and dose schedules for medications are being revised continually, with new side effects recognized and accounted for regularly. Readers must therefore always check the product information and clinical procedures with the most up-to-date published product information and data sheets provided by the manufacturers and the most recent codes of conduct and safety regulation. Oxford University Press and the authors make no representations or warranties to readers, express or implied, as to the accuracy or completeness of this material, including without limitation that they make no representations or warranties as to the accuracy or efficacy of the drug dosages mentioned in the material. The authors and the publishers do not accept, and expressly disclaim, any responsibility for any liability, loss, or risk that may be claimed or incurred as a consequence of the use and/or application of any of the contents of this material.
SCREENING FOR DEPRESSION IN CLINICAL PRACTICE: An Evidence-Based Guide
ALEX J. MITCHELL, MRCPsych, Consultant and Honorary Senior Lecturer, Department of Liaison Psychiatry, Leicester General Hospital and University of Leicester, UK
JAMES C. COYNE, PhD, Professor of Psychology, Department of Psychiatry, University of Pennsylvania Health System
2010
Oxford University Press, Inc., publishes works that further Oxford University’s objective of excellence in research, scholarship, and education. Oxford New York Auckland Cape Town Dar es Salaam Hong Kong Karachi Kuala Lumpur Madrid Melbourne Mexico City Nairobi New Delhi Shanghai Taipei Toronto With offices in Argentina Austria Brazil Chile Czech Republic France Greece Guatemala Hungary Italy Japan Poland Portugal Singapore South Korea Switzerland Thailand Turkey Ukraine Vietnam
Copyright 2010 by Oxford University Press, Inc. Published by Oxford University Press, Inc. 198 Madison Avenue, New York, New York 10016 www.oup.com Oxford is a registered trademark of Oxford University Press. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior permission of Oxford University Press. Mitchell, Alex J. Screening for depression in clinical practice: an evidence-based guide / by Alex J. Mitchell, James C. Coyne. p. ; cm. Includes bibliographical references and index. ISBN 978-0-19-538019-4 1. Depression, Mental—Diagnosis. 2. Primary care (Medicine) I. Coyne, James C., 1947– II. Title. [DNLM: 1. Depressive Disorder—diagnosis. 2. Primary Health Care. WM 171 C881s 2009] RC537.M5625 2009 616.850 27075—dc22 2009007863 9
Printed in the United States of America on acid-free paper
Contents
List of Contributors, xi Preface, xv Wayne Katon
1. Is the Syndrome of Depression a Valid Concept?, 3 Alex J. Mitchell and Mark Zimmerman What is Meant by Depression?, 3 Value and Validity of the Syndrome Concept, 7 Diagnostic Checklists (including DSM and ICD), 10 Unstructured (Unassisted) Clinician Diagnosis, 15 Structured and Semi-Structured Assisted Diagnostic Interviews, 19 Conclusion, 22 References, 24
2. Overview of Depression Scales and Tools, 29 Alex J. Mitchell Background, 29 The Classic Severity Scales (1960–1980), 36 The New Severity Scales (1981–2008), 39 The Future of Screening Scales, 44 References, 51
3. Why Do Clinicians Have Difficulty Detecting Depression?, 57 Alex J. Mitchell Introduction to the Problem of Over- and Under-Detection, 57 Predictors of Detection, 62 Patient and Clinician Influences on Detection, 66 Illness-Related Influences on Detection, 71 Conclusions, 74 References, 75
4. How Can Existing Mood Scales Be Improved? How to Test, Refine, and Improve Existing Scales, 83 Adam B. Smith Introduction, 83 The Rasch Model and Other Item Response Models, 86 Conclusion, 95 References, 96
5. How Do We Know When a Screening Test is Clinically Useful?, 99 Alex J. Mitchell How Do Clinicians Make a Diagnosis?, 99 Scientific Aspects of Diagnostic Accuracy, 103 Clinical Aspects of Diagnostic Accuracy, 105 Testing Screening via Implementation Studies, 109 Conclusions, 111 References, 111
6. Clinical Judgment and the Influence of Screening on Decision Making, 113 Howard N. Garb Introduction, 113 Research on Clinical Judgment, 114 The Limits of Screening, 119 References, 120
7. Implementing Screening as Part of Enhanced Care: Screening Alone is Not Enough, 123 Simon Gilbody and Dan Beck The Case for Screening, 123 Screening and Enhanced Care for Depression, 128 New and Additional Evidence Relating to Enhanced Care, 128 Is Screening a Necessary Intervention to Improve the Quality and Outcome of Care?, 129 To Screen or Not to Screen?, 136 References, 137
8. Technological Approaches to Screening and Case Finding for Depression, 143 William H. Rogers, Debra Lerner, and David A. Adler Technological Methods of Screening for Depression, 144 Ten Issues When Developing Computerized Screening for Depression, 147 Examples of Implementation of Computerized Screening for Depression, 150 Discussion, 153 Conclusion, 154 References, 154
9. Screening for Depression in Primary Care: Can It Become More Efficient?, 161 Kathryn M. Magruder and Derik E. Yeager Introduction, 161 Epidemiology of Depression in Primary Care, 162 Is Screening for Depression in Primary Care Worthwhile?, 165 Which Screening Tool Should Be Used?, 169 Implementing Screening in Primary Care, 178 What Developments Are on the Horizon?, 183 Conclusions, 185 References, 185
10. Screening for Depression in Medical Settings: Are Specific Scales Useful?, 191 Gordon Parker and Matthew Hyett An Introductory Logic, 191 Depression in the Medically Ill, 192 “False-Positive” Depression Reflecting Confounding by Physical Symptoms Associated with Medical Illness, 193 Screening Measures Used to Assess Depression in the Medically Ill, 194 Discussion, 198 References, 199
11. Screening for Depression in Medical Settings: The Case Against Specific Scales, 203 Fariba Babaei and Alex J. Mitchell Overview of Depression in Physical Disease, 203 Defining Somatic Symptoms, 205 Diagnostic Accuracy of Somatic Symptoms in Depression, 209 Evidence For and Against Somatic Symptoms when Diagnosing Comorbid Depression, 211 Implications for Screening, 217 References, 236
12. Screening for Depression in Neurologic Disorders, 241 Andres M. Kanner Depression in Stroke, 242 Depression in Multiple Sclerosis, 246 Depression in Epilepsy, 249 Depression in Parkinson’s Disease, 255 Conclusions, 258 References, 258
13. Screening for Depression in Cancer Care, 265 Linda E. Carlson, Sheena K. Clifford, Shannon L. Groff, Olga Maciejewski, and Barry D. Bultz Prevalence of Depression in Cancer Care, 265 Screening Methods for Depression, 266 Screening for Depression in Oncology, 267 Implementing Screening Programs in Oncology Settings, 276 Special Issues in Screening Cancer Patients, 292 Summary, Integration, Future Directions, 293 Acknowledgments, 294 References, 295
14. Screening for Depression in Perinatal Settings, 299 Jodi Barton and Philip Boyce Introduction: Perinatal Screening in Context, 299 Why Screen, and What Are We Screening For?, 301 Screening Practices in Perinatal Settings, 303 Screening Guidelines and Recommendations, 304 Evidence-Based Comparison of Screening Methods, 305 Implementation in Practice: Does Screening Make any Real-World Difference?, 310 Service Delivery and Treatment Implications, 311 Summary and Key Recommendations, 313 References, 314
15. Screening in Cardiovascular Care, 317 Brett D. Thombs and Roy C. Ziegelstein Depression in Cardiovascular Disease, 318 The Prevalence of Depression in Cardiovascular Disease, 319 Screening Instruments for Depression in Cardiovascular Care, 320 Recommendations for Evaluation and Treatment of Patients in Cardiovascular Care, 326 Conclusions, 328 References, 329
16. Screening in Diabetes Care: Detecting and Managing Depression in Diabetes, 335 Norbert Hermanns and Bernhard Kulzer Depression in Diabetes is a Major Health Problem, 337 Screening Tests, 340 Treatment Options, 343 Screening Program, 344 Conclusions for Clinical Practice, 345 References, 346
17. Commentary and Integration: Is it Time to Routinely Screen for Depression in Clinical Practice?, 349 James C. Coyne Integration: Deflating the Puffer Phenomenon and Making the Case Against Screening, 364 References, 366
Appendix, 371 Index, 385
List of Contributors
David Adler, Professor of Psychiatry and Medicine, Tufts University School of Medicine, and Senior Psychiatrist, Department of Psychiatry and ICRHPS, Tufts Medical Center
Fariba Babaei, Specialist Trainee in Psychiatry, Lincolnshire Partnership Trust, Grantham, UK
Jodi Barton, Research Co-ordinator, Westmead Perinatal Psychiatry & Clinical Research Unit, Westmead Hospital
Dan Beck, Research Fellow, Department of Health Sciences, University of York, UK
Philip Boyce, Professor of Psychiatry, Department of Psychological Medicine, University of Sydney, Westmead Hospital
Barry D. Bultz, Director, Department of Psychosocial Resources, Tom Baker Cancer Centre, and Head and Adjunct Professor, Division of Psychosocial Oncology, Department of Oncology, Faculty of Medicine, University of Calgary, Calgary, Alberta, Canada
Linda E. Carlson, Enbridge Research Chair in Psychosocial Oncology, Associate Professor, Division of Psychosocial Oncology, Department of Oncology, Faculty of Medicine, University of Calgary, and Clinical Psychologist, Tom Baker Cancer Centre, Calgary, Alberta, Canada
Sheena K. Clifford, Department of Psychosocial Resources, Tom Baker Cancer Centre, Alberta Cancer Board/Alberta Health Services, Calgary, Alberta, Canada
James C. Coyne, Director, Behavioral Oncology Program, Abramson Cancer Center, and Professor of Psychology, Department of Psychiatry, University of Pennsylvania School of Medicine
Howard N. Garb, Lackland Air Force Base
Simon Gilbody, Professor of Psychological Medicine and Health Services Research, Department of Health Sciences, University of York, UK
Shannon L. Groff, Department of Psychosocial Resources, Tom Baker Cancer Centre, Alberta Cancer Board
Norbert Hermanns, Head of the Research Institute of the Diabetes Academy Mergentheim
Matthew Hyett, Research Assistant, Black Dog Institute, Sydney, Australia
Andres M. Kanner, Department of Neurological Sciences, Rush Medical College, Rush Epilepsy Center, Rush University Medical Center, Chicago, IL
Wayne Katon, Professor and Vice Chair of Psychiatry and Behavioral Sciences, Director of Division of Health Services and Epidemiology, University of Washington Medical School, Seattle, WA
Bernhard Kulzer, Head of the Psychosocial Department of the Diabetes Centre Mergentheim
Debra Lerner, Associate Professor of Medicine and Psychiatry, Tufts University School of Medicine (TUSM), and Senior Researcher, ICRHPS, Tufts Medical Center
Olga Maciejewski, Department of Psychosocial Resources, Tom Baker Cancer Centre, Alberta Cancer Board/Alberta Health Services, Calgary, Alberta, Canada
Kathryn M. Magruder, Veterans Administration Medical Center, Charleston, SC, and Department of Psychiatry and Behavioral Sciences, Medical University of South Carolina, Charleston, SC
Alex J. Mitchell, Consultant in Liaison Psychiatry, Leicester General Hospital, Leicester, and Honorary Senior Lecturer in Liaison Psychiatry, Department of Cancer & Molecular Medicine, Leicester Royal Infirmary, UK
Gordon Parker, Scientia Professor, School of Psychiatry, University of New South Wales, Sydney, Australia, and Executive Director, Black Dog Institute
William Rogers, Senior Statistician, Institute of Clinical Research and Health Policy Studies (ICRHPS), Tufts Medical Center
Adam B. Smith, Lecturer in Quantitative Methods, Centre for Health and Social Care, Leeds Institute of Health Sciences, University of Leeds, UK
Brett D. Thombs, Department of Psychiatry, McGill University and Jewish General Hospital, Montreal, Quebec
Derik E. Yeager, Department of Biometry, Biostatistics, and Epidemiology, Medical University of South Carolina, Charleston, SC
Roy C. Ziegelstein, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD
Mark Zimmerman, Department of Psychiatry and Human Behavior, Brown University School of Medicine, Rhode Island Hospital, Providence, RI
Preface
Researchers became interested in screening patients for depression in primary care in the early 1980s because of evidence of poor recognition of depression by primary care physicians and gaps in adequacy of treatment.1 Because of extensive epidemiologic research as well as the development of antidepressant medications that have fewer side effects and evidence-based brief therapies, recognition rates of depression by primary care physicians have improved over the past two decades, with recent studies suggesting that as many as 50% to 65% of patients are accurately diagnosed.2 Most studies also show that greater severity of depression and increased functional impairment are associated with higher rates of recognition.3 A study by Rost and colleagues that examined recognition rates over a 6-month period rather than for just one visit also found higher rates of accurate diagnosis by primary care physicians.4 This latter study is important because primary care physicians often make diagnoses over time as they work up patients over several visits. 
Studies have also shown that a much higher percentage of patients in primary care are exposed to antidepressant medications than two decades ago.5 However, many gaps remain in the quality of care for depression in primary care: only 20% of patients receive the Health Employer Data and Information Set (HEDIS)-recommended three or more visits in the first 90 days after starting an antidepressant, and only 40% to 50% remain on medication at 6 months.6 Over the past 20 years (from the tricyclic era to the selective serotonin reuptake inhibitor era), studies have consistently reported that only 40% of patients started on antidepressants for major depression recover (a greater than 50% decrease in symptoms) by 4 to 6 months.7 Fewer than 10% of patients with major depression in primary care receive evidence-based psychotherapy.5 There is clearly room to improve the quality of care for patients with major depression, from screening and improved detection to healthcare models that provide enhanced exposure to evidence-based treatments.
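The two quality indicators quoted above reduce to simple rules. The sketch below encodes them as the text states them: a response is a greater than 50% fall from the baseline symptom score, and the visit indicator counts follow-up visits in the first 90 days after an antidepressant is started. The function names and the choice to count days 0 to 89 as “the first 90 days” are assumptions for illustration; this is not the official HEDIS measure specification.

```python
from __future__ import annotations

from datetime import date


def is_response(baseline_score: float, follow_up_score: float) -> bool:
    """Return True if symptoms fell by more than 50% from baseline.

    Exactly 50% does not count, matching "greater than 50%".
    """
    if baseline_score <= 0:
        raise ValueError("baseline score must be positive")
    reduction = (baseline_score - follow_up_score) / baseline_score
    return reduction > 0.5


def meets_visit_indicator(
    start: date, visits: list[date], min_visits: int = 3, window_days: int = 90
) -> bool:
    """Return True if at least `min_visits` visits fall in the first `window_days` days.

    Assumption (hypothetical boundary choice): day 0 is the day the
    antidepressant is started, so the window covers days 0 to window_days - 1.
    """
    in_window = [v for v in visits if 0 <= (v - start).days < window_days]
    return len(in_window) >= min_visits


if __name__ == "__main__":
    print(is_response(20, 9))   # 55% reduction -> True
    print(is_response(20, 10))  # exactly 50% -> False
    start = date(2024, 1, 1)
    visits = [date(2024, 1, 15), date(2024, 2, 10), date(2024, 3, 20)]
    print(meets_visit_indicator(start, visits))  # days 14, 40, 79 -> True
```

Keeping the window length and visit count as parameters makes it easy to see how sensitive an adherence estimate is to the exact definition used.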
One of the unexpected findings of increased interest by primary care physicians in the detection and treatment of depression is that approximately half of patients started on medication for depression actually meet DSM-IV criteria for minor depression.8 This is important because antidepressant-versus-placebo trials in minor depression have generally shown high rates of placebo response and a lack of active drug-versus-placebo differences.9 Screening for depression may actually increase the number of patients with minor depression who are potentially treated, because many patients cluster around the DSM-IV diagnostic threshold and, depending on the stressful life events of the past few days, may or may not meet criteria for major depressive disorder. Patients with minor depression or adjustment reactions to stressful life events must be distinguished from those with a history of major depression who have significant residual symptoms necessitating active treatment. For patients who have mild major depression, brief counseling, watchful waiting, and rescreening for depression 2 to 4 weeks later may allow better recognition of whether the patient needs treatment with medication or psychotherapy.
If screening for depression is to be integrated into primary care, healthcare organizations are faced with the decision about which screening tool is optimal. Primary care organizations, the American Psychiatric Association, and many research foundations have recommended the Patient Health Questionnaire (PHQ-9) as the optimal depression screening tool in primary care. The PHQ-9 has the advantage of also measuring the severity of depression (scores range from 0 to 27) and, at a cut-off score of 10 or above, has high sensitivity and specificity for the diagnosis of major depression compared with structured psychiatric interviews.10 The U.S. Preventive Services Task Force recommended routine depression screening in primary care in systems that have been reorganized to provide effective treatment for depression.11 This reflects the fact that studies that tested depression screening alone showed mild to modest improvement in the quality of depression treatment provided, but generally no effect on depression outcomes.12
What do we know about methods to organize care to improve outcomes of depression? Although screening for depression alone has not been shown to improve outcomes, when screening is paired with an organized system of depression care, multiple studies have shown that depression outcomes can be improved.13 The chapter by Gilbody reviews the recent meta-analysis of an intervention called “collaborative care.” A total of 37 randomized trials that compared collaborative care with usual primary care found that collaborative care was associated with a twofold increase in adherence to antidepressant medication and improvements in depression that lasted 2 to 5 years.13 The most successful collaborative care interventions included two core components. The first is a depression care manager who improves patient education and, through frequent telephone and/or in-person contacts, tracks depressive symptoms, side effects, and adherence to treatment.14 The care manager facilitates return appointments with the primary care doctor or, in some instances, a mental health specialty referral for patients with persistent symptoms, problematic side effects, or poor adherence.14 The second crucial component is supervision of the care manager by a psychiatrist who recommends changes in medication based on clinical response and side effects. Many recent collaborative care trials have also used psychologists’ skills to teach care managers motivational interviewing techniques and brief, evidence-based psychotherapies such as problem-solving therapy.15
In summary, this excellent book summarizes two decades of research on depression screening and quality-improvement efforts in primary care. We now have state-of-the-art depression screening tools, and research studies have shown that pairing depression screening with evidence-based models that enhance exposure to antidepressant medication and evidence-based psychotherapies can markedly improve depression outcomes for patients with major depression.
Wayne Katon
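As a worked illustration of PHQ-9 scoring, the sketch below sums nine item scores, each rated from 0 (“not at all”) to 3 (“nearly every day”) as on the published questionnaire, to give the 0 to 27 severity total, and flags a positive screen at the commonly cited cut-off of 10 or above. The function names are hypothetical, and the cut-off is left as a parameter because the most useful threshold can differ between settings.

```python
from __future__ import annotations

PHQ9_ITEM_COUNT = 9
ITEM_MIN, ITEM_MAX = 0, 3  # 0 = "not at all" ... 3 = "nearly every day"


def phq9_total(item_scores: list[int]) -> int:
    """Sum the nine PHQ-9 item scores into a 0-27 severity total."""
    if len(item_scores) != PHQ9_ITEM_COUNT:
        raise ValueError(f"expected {PHQ9_ITEM_COUNT} items, got {len(item_scores)}")
    for score in item_scores:
        if not ITEM_MIN <= score <= ITEM_MAX:
            raise ValueError(f"item score {score} outside {ITEM_MIN}-{ITEM_MAX}")
    return sum(item_scores)


def phq9_screen_positive(item_scores: list[int], cutoff: int = 10) -> bool:
    """Flag a positive screen when the total reaches the cut-off (default 10)."""
    return phq9_total(item_scores) >= cutoff


if __name__ == "__main__":
    answers = [2, 1, 1, 1, 1, 1, 1, 1, 1]
    print(phq9_total(answers))            # 10
    print(phq9_screen_positive(answers))  # True
```

A positive screen is only a prompt for clinical assessment; as noted above, many patients sit close to the diagnostic threshold.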
References
1. Zung WW, Magill M, Moore JT, et al. Recognition and treatment of depression in a family medicine practice. J Clin Psychiatry. 1983;44:3–6.
2. Katon WJ, Simon G, Russo J, et al. Quality of depression care in a population-based sample of patients with diabetes and major depression. Med Care. 2004;42:1222–1229.
3. Coyne JC, Schwenk TL, Fechner-Bates S. Nondetection of depression by primary care physicians reconsidered. Gen Hosp Psychiatry. 1995;17:3–12.
4. Rost K, Zhang ML, et al. Persistently poor outcomes of undetected major depression in primary care. Gen Hosp Psychiatry. 1998;20:12–20.
5. Olfson M, Marcus SC, Druss B, et al. National trends in the outpatient treatment of depression. JAMA. 2002;287:203–209.
6. Druss BG, Miller CL, Rosenheck RA, et al. Mental health care quality under managed care in the United States: a view from the Health Employer Data and Information Set (HEDIS). Am J Psychiatry. 2002;159:860–862.
7. Simon GE. Evidence review: efficacy and effectiveness of antidepressant treatment in primary care. Gen Hosp Psychiatry. 2002;24:213–224.
8. Katon W, Von Korff M, Lin E, et al. Collaborative management to achieve treatment guidelines. Impact on depression in primary care. JAMA. 1995;273:1026–1031.
9. Barrett JE, Williams JW Jr, Oxman TE, et al. Treatment of dysthymia and minor depression in primary care: a randomized trial in patients aged 18 to 59 years. J Fam Pract. 2001;50:405–412.
10. Kroenke K, Spitzer RL, Williams JB. The PHQ-9: validity of a brief depression severity measure. J Gen Intern Med. 2001;16:606–613.
11. Pignone MP, Gaynes BN, Rushton JL, et al. Screening for depression in adults: a summary of the evidence for the U.S. Preventive Services Task Force. Ann Intern Med. 2002;136:765–776.
12. Katon W, Gonzales J. A review of randomized trials of psychiatric consultation-liaison studies in primary care. Psychosomatics. 1994;35:268–278.
13. Gilbody S, Bower P, Fletcher J, et al. Collaborative care for depression: a systematic review and cumulative meta-analysis. Arch Intern Med. 2006;166:2314–2321.
14. Katon W, Unützer J. Collaborative care models for depression: time to move from evidence to practice. Arch Intern Med. 2006;166:2304–2306.
15. Unützer J, Katon W, Callahan CM, et al. Collaborative care management of late-life depression in the primary care setting: a randomized controlled trial. JAMA. 2002;288:2836–2845.
1 IS THE SYNDROME OF DEPRESSION A VALID CONCEPT?
Alex J. Mitchell and Mark Zimmerman
1. What is Meant by Depression?
2. Value and Validity of the Syndrome Concept
3. Diagnostic Checklists (including DSM and ICD)
4. Unstructured (Unassisted) Clinician Diagnosis
5. Structured and Semi-Structured Assisted Diagnostic Interviews
6. Conclusion
Context Depression is an everyday term, but if clinical management is to be empirically based, there needs to be a valid and reliable definition of the disorder that is distinct from normal sadness. The validity of the concept and all studies of screening for depression are hampered by the absence of a gold standard. Nevertheless, various thorough methods of assessment may help to improve the clinical utility of our concept of depression.
1. What is Meant by Depression?
This book is built around the premise that major depressive disorder (MDD) exists in a way that is recognizable time and again by clinicians around the world. Considerable effort has been expended in developing and refining methods to measure depression. This chapter takes a step back and asks whether this effort is built upon a solid foundation. This begins with an important question: What is the purpose of making a meaningful diagnosis in any field of medicine? We suggest it is primarily to gain consensus and knowledge that may help individuals and populations who have health-related “meetable unmet needs.” A medical diagnosis (spurious or not) has several other benefits (Textbox 1.1). It facilitates agreement with colleagues, it lends confidence to patients, it adds legitimacy to treatments, and it may allow the development of targeted interventions. Because many conditions can be successfully treated without knowing the true etiology or the precise diagnosis, the lack of a gold standard should not be a cause of therapeutic nihilism. Consider neurologists attempting to treat a midlife inherited chorea in 1862. Meticulous clinical method could bring some success despite the absence of a name and a description for another 10 years and the absence of a known etiology for another 110 years. Although many early treatments were based largely on placebo effects or environmental manipulation, once a definitive cause is found and the pathophysiologic mechanism is revealed, the potential for treatment becomes vast, whereas once it was small.
Textbox 1.1. Levels of Diagnostic Certainty in Psychiatry
Highest: Externally validated by a “perfect” biological test
High: Consensus expert panel performing longitudinal evaluation using all possible data
Medium to High: Structured or semi-structured interview performed by a trained interviewer or clinician
Low to High: Severity questionnaires rated by the patient or clinician
Low to Medium: Unstructured, unassisted interview performed by an interested clinician
Low: Unstructured, unassisted interview performed by an inexperienced (or uninterested) clinician
Yet there is an even more fundamental issue. Kraepelin believed the major psychiatric disorders were “natural disease entities” simply awaiting the discovery of a specific medical cause. After intensive effort, the search for fundamental causes was abandoned, and nosology came to be underwritten by the internal cohesion of symptoms and signs.1 What if depression has no single pathophysiologic explanation and is a complex manifestation of severe external stress?2 Would our concept be invalid, and would existing treatment be rendered obsolete overnight? Similarly, if severe stress and mild depression were closely related, then finding a test that separated them would be difficult to the point of impossibility (Fig. 1.1). After many decades of debate, it is not at all clear that depression is a discrete entity that justifies a categorical classification, as opposed to a continuum merging with normal healthy but unhappy people.3 In the continuum argument, the distribution of symptoms of depression would theoretically approximate a skewed normal or half-normal distribution with no point of rarity (Fig. 1.2).4 Cloninger stated that there is no empirical evidence for natural boundaries between major syndromes and that “no one has ever found a set of symptoms, signs, or tests that separate mental disorders fully into non-overlapping categories.”5 Yet all current diagnostic systems that include MDD appear to assume there is a distinct syndrome (depressive disorder as distinct from depressive symptoms) and try to suggest an optimal method to identify it (Fig. 1.3). Even if this approach were correct and the current nosology of DSM-IV entirely perfect, there would be a significant danger of over-relying on the concept of MDD to the exclusion of other under-researched forms. In other words, given recent evidence, psychiatrists would be well advised to pay as much attention to minor (mild and subsyndromal) disorders as diabetologists are now paying to impaired glucose tolerance.6
[Figure 1.1: two overlapping distributions labeled Normal Stress and Depressed; y-axis: Number of Individuals; x-axis: Score on Hypothetical Diagnostic Test; labels mark the Point of Partial Rarity, the Optimum Cut-off value, and the true-negative, true-positive, false-negative, and false-positive regions.]
Figure 1.1. Hypothetical distribution of test scores in two related conditions. Two distinct conditions should be separated by a point of rarity on at least one fundamental measure (see also Fig. AP.4).
[Figure 1.2: two histograms of HADS scores; the first is titled Distribution of HADS Scores in Cancer Outpatients (n=3071) and spans scores of zero to eighteen; the second spans scores of zero to thirty-six.]
Figure 1.2. Distribution of HADS scores in cancer outpatients (n = 3,071). This continuous distribution of HADS scores in primary and secondary (cancer) care illustrates a skewed normal distribution. Data from Thompson et al. Br J Psychiatry. 2001;179:317–323 and Sharpe et al. Br J Cancer. 2004;90:314–320.
[Figure 1.3: bar chart titled Distribution of DSM-IV Symptoms of Depression in Zurich Study; x-axis: number of symptoms, zero to nine; y-axis: 0 to 100.]
Figure 1.3. Distribution of DSM-IV symptoms from the Zurich study. The sample comprised 591 individuals originally selected in 1978 from the total population of 18- and 19-year-olds in Zurich, Switzerland, based on their scores on the Symptom Checklist-90 (SCL-90-R) (Derogatis, 1977). Two thirds of the sample were randomly selected from members of the total population who scored above the 85th percentile on the SCL-90-R, and one third was randomly selected from the remainder of the total population. Reprinted from Journal of Affective Disorders 62, Angst J, Merikangas KR, Multi-dimensional criteria for the diagnosis of depression, 7–15, Copyright (2001).
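Figure 1.1 marks an “optimum cut-off value” between two overlapping groups. One common way to locate such a cut-off, assumed here rather than taken from this chapter, is to maximize Youden’s J (sensitivity + specificity - 1) across candidate thresholds. The sketch below does this for hypothetical scores; treating scores at or above the cut-off as “positive” is a convention chosen for the example.

```python
from __future__ import annotations

from typing import NamedTuple


class CutoffResult(NamedTuple):
    cutoff: float
    youden_j: float
    sensitivity: float
    specificity: float


def sens_spec(cases: list[float], non_cases: list[float], cutoff: float) -> tuple[float, float]:
    """Sensitivity and specificity when scores >= cutoff are called positive."""
    true_pos = sum(score >= cutoff for score in cases)
    true_neg = sum(score < cutoff for score in non_cases)
    return true_pos / len(cases), true_neg / len(non_cases)


def best_cutoff(cases: list[float], non_cases: list[float]) -> CutoffResult:
    """Try every observed score as a cut-off and keep the highest Youden's J.

    Ties keep the lowest cut-off, because candidates are scanned in ascending order.
    """
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    best: CutoffResult | None = None
    for cutoff in sorted(set(cases) | set(non_cases)):
        sens, spec = sens_spec(cases, non_cases, cutoff)
        j = sens + spec - 1
        if best is None or j > best.youden_j:
            best = CutoffResult(cutoff, j, sens, spec)
    return best


if __name__ == "__main__":
    depressed = [7, 10, 11, 12, 14, 15]  # hypothetical scores
    stressed = [2, 3, 4, 5, 8, 9]        # hypothetical scores
    result = best_cutoff(depressed, stressed)
    print(result.cutoff, round(result.sensitivity, 2), result.specificity)  # 10 0.83 1.0
```

At the chosen cut-off the depressed person scoring 7 is a false negative: when distributions overlap, no single threshold separates the groups perfectly, which is the point the figure makes.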
2. Value and Validity of the Syndrome Concept
The concept of a syndrome is fundamental to diagnostic classification and may be valuable even if imperfect.7 Without the concept of a syndrome, a disorder would be defined by a single symptom or simple symptom count. A syndrome is a special collection of symptoms that cluster in a peculiar way determined by the underlying pathophysiology, even if that mechanism is unknown. Careful identification of many psychiatric syndromes and their relationships has formed a detailed family of mental disorders not dissimilar to the Linnaean taxonomy proposed by Carl Linnaeus (1707–1778). In defining clinical syndromes, we rely on certain essential or core symptoms occurring commonly in those with the disorder but rarely in those without (Textbox 1.2). By the same token, we often ignore other symptoms that occur without much discrimination. Hence, some symptoms
8
SCREENING FOR DEPRESSION IN CLINICAL PRACTICE
Textbox 1.2. Types of Validity Testing Content validity (Strength: Weak) The degree of measurement of all fields of interest Criterion validity (Strength: Strong) Agreement against a criterion that is external to the measuring instrument itself Construct validity (Strength: Moderate) Agreement with other measures consistent with theoretically derived hypotheses Procedural validity Strength: Weak) Agreement with an existing procedure
are more important diagnostically than others, but without large samples and rigorous examination, it isn’t obvious which ones these are. Further, life is rarely simple and rarely is any symptom both entirely unique to a psychiatric disorder and at the same time always manifest. If it were, then when this particular symptom was absent, we would know the disorder itself was impossible. We would therefore have a single question diagnostic test with perfect specificity (see Chapter 5). In MDD, DSM-IV suggests that the core features involve dysphoria (low mood) and anhedonia (loss of interest), and ICD-10 suggests that fatigue should also be an essential feature.8 In addition to these symptoms, aspects such as clinical significance, duration, disability, and distress have been added as a requirement in many diagnostic categories. We suggest it is no longer sufficient for an expert panel to mandate such features, no matter how logical it seems, because their predictive values will be uncertain until tested. In fact, all aspects of a definition (the symptoms, signs, associated features, and rules binding them together) should be amenable to clarification and empiric testing. If a syndrome is adopted too easily, the concept can become a pitfall, as Kendell and Jablensky explained: ‘‘Once a diagnostic concept such as syndrome has come into general use, it tends to become reified.’’9 In other words, its validity is assumed rather than tested. How, then, can a syndrome be tested and better tests developed? This is discussed in detail in Chapters 4 and 5, but in brief, accuracy is usually determined by validity and reliability. Reliability refers to the extent to which an observation yields the same results on repeated independent assessments. Essentially, this is a measure of consensus between assessors. Validity, derived from the Latin validus, meaning strong, refers to how well
1 IS THE SYNDROME OF DEPRESSION A VALID CONCEPT?
the instrument measures what it purports to measure (see Textbox 1.2). In essence this is a measure of truth—how much agreement is there with the actual disorder, assuming it could be defined by some criterion reference (or gold standard). In MDD there is no accepted gold standard,10 and therefore reliability and validity testing must be reduced to measures of agreement, where the critical question becomes: How good is the comparison? In medical specialties, aspects of the history such as nature of the chest pain have been subjected to diagnostic validity testing in a similar way to established investigations such as the electrocardiogram.11,12 In psychiatry (outside of organic brain disease), such objective tests are rarely if ever available. Many influences favor the adoption of a medical model in which an etiologic agent, a pathologic process, and symptoms and signs are assumed to be present even if unknown. This is often highly acceptable to patients, clinicians, and other interested parties (eg, the pharmaceutical industry), not least because stigma may be reduced and help-seeking and adherence encouraged. The flip side is that patient responsibility may be diminished and biologic treatments may be overprescribed. If the medical model of depression is correct, then eventually a definitive core disease process underlying depression will be found and a diagnostic test developed that (regardless of convenience) will enable current clinical diagnostic methods to be fully evaluated. If the medical model of depression is incorrect, then a definitive biologic test will never be developed, and we will continue to develop proxies of illness that may nevertheless correspond to important correlates of disorder and suffering, such as treatment response, course, and quality of life. 
The astute reader will probably conclude that measures of reliability and validity in psychiatry (and by implication diagnosis itself) are essentially all tests of agreement, albeit against different standards. Reliability is agreement with peers, and validity is agreement with an accepted method. As no group has yet found a robust biologic test for depression, most work has focused on attempts to improve the reliability of assessments conducted by researchers and clinicians. Often this involves refinement of the clinical interview using methods that assist the clinician. Semi-structured interviews provide questions that might best elicit symptoms, but the clinician retains the flexibility to deviate from them if necessary. Structured interviews provide questions that must be asked as written, purposely removing flexibility, with the useful benefit that clinical training is not a prerequisite and large population surveys using lay interviewers become possible. One level of assistance to clinicians that does not interfere with the clinical interview is the provision of symptom checklists, together with the rules for their combination (Textbox 1.3). This essentially forms the basis of ICD-10 and DSM-IV.
Textbox 1.3. Development of Diagnostic Checklists
1972 Feighner Diagnostic Criteria (FDC): Primary Depression
1978 Spitzer Research Diagnostic Criteria (RDC): Major Depressive Disorder
1980 Diagnostic and Statistical Manual III (DSM-III): Major Depressive Episode
1987 Diagnostic and Statistical Manual III-R (DSM-III-R): Major Depressive Episode
1990 International Classification of Diseases (ICD-10): Mild, Moderate, or Severe Depression
2000 Diagnostic and Statistical Manual IV (DSM-IV): Major Depressive Episode
2012 International Classification of Diseases (ICD-11); Diagnostic and Statistical Manual V (DSM-V)
3. Diagnostic Checklists (including DSM and ICD)
Diagnostic checklists are a list of features, together with the rules for making a particular diagnosis. If the criteria are monothetic, then all the items must be present; if polythetic, then only a proportion are required. If features are necessary, then those specific features must be present; if sufficient, then the presence of certain criteria alone is enough and no others are needed. Several checklists that generate diagnoses according to one or more systems of psychiatric diagnosis have been proposed (Textbox 1.4).13–15 Checklists leave the clinician to conduct the clinical interview in any way he or she feels appropriate. Advanced systems may use diagnostic algorithms that prioritize certain items and use more complex rules, such as “if x, then y.” DSM and ICD-10 use diagnostic checklists but also include some suggestions for the interview itself. That said, a diagnostic interview defined only by DSM-IV/ICD-10 lacks clearly defined probe questions, requiring clinicians to formulate their own approach. Although this adds to acceptability, it equally contributes to interrater variability.16 Some consider DSM and ICD distinct from other checklist methods because of the claim that DSM and ICD are operationalized, that is, each and every step is described and subject to unambiguous instructions as well as reliability or validity testing. This is probably not the case. Efforts to measure the reliability of DSM-IV have been published.17

Textbox 1.4. Checklists for Aiding Psychiatric Diagnosis
Lists of Integrated Criteria for the Evaluation of Taxonomy (LICET): LICET-D for depressive disorders assembles all criteria from 9 diagnostic systems.
Operational Criteria Checklist (OPCRIT): OPCRIT generates diagnoses according to 13 diagnostic systems and has been proposed to generate diagnoses directly from medical notes.
ICD-10 Symptom Checklist: Developed by Janca; takes about 15 minutes.
International Diagnostic Checklists (IDCL): Two 30-item lists, one for ICD-10 and one for DSM-IV.
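The combination rules described in this section (monothetic, polythetic, necessary, sufficient) can be made concrete in a few lines of code. The following is a minimal sketch, not a model of any published checklist: the `ChecklistRule` class, its field names, and the lettered example features are hypothetical, chosen only to show how the rule types differ in what they demand.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ChecklistRule:
    """Hypothetical diagnostic checklist: a list of features plus combination rules."""

    items: list[str]                                   # every feature on the checklist
    necessary: set[str] = field(default_factory=set)   # all must be present
    sufficient: set[str] = field(default_factory=set)  # if all present, rule is met on its own
    min_count: int | None = None                       # None = monothetic (every item required)

    def is_monothetic(self) -> bool:
        return self.min_count is None or self.min_count >= len(self.items)

    def is_met(self, present: set[str]) -> bool:
        found = present & set(self.items)  # ignore features that are not on the checklist
        # A fully present sufficient set settles the diagnosis without any other items.
        if self.sufficient and self.sufficient <= found:
            return True
        # Every necessary feature must be present, whatever the total count.
        if not self.necessary <= found:
            return False
        # Monothetic rules need all items; polythetic rules need at least min_count.
        threshold = len(self.items) if self.min_count is None else self.min_count
        return len(found) >= threshold


monothetic = ChecklistRule(items=["A", "B", "C"])
polythetic = ChecklistRule(items=["A", "B", "C", "D"], necessary={"A"}, min_count=3)
with_sufficient = ChecklistRule(items=["A", "B", "C", "D"], sufficient={"D"}, min_count=3)

print(monothetic.is_met({"A", "B"}))          # False: one item missing
print(polythetic.is_met({"A", "C", "D"}))     # True: 3 of 4, including necessary A
print(polythetic.is_met({"B", "C", "D"}))     # False: necessary A is absent
print(with_sufficient.is_met({"D"}))          # True: D alone is sufficient
```

Note that a rule such as “at least one of two core features” is neither necessary nor sufficient in this simple scheme; real systems like DSM-IV layer such “if x, then y” conditions on top of the basic count.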
ICD and DSM
The World Health Organization (WHO) introduced mental disorders in the sixth revision of the International Classification of Diseases (ICD-6) in 1948.18 The American Psychiatric Association Committee on Nomenclature and Statistics published the first edition of the Diagnostic and Statistical Manual: Mental Disorders (DSM-I) in 1952 (see Textbox 1.3).19 Current diagnostic classification manuals (DSM-IV and ICD-10) deliberately do not contain mutually exclusive diagnostic categories; rather, they contain overlapping areas. Indeed, if carefully applied, each diagnostic system yields a different number of cases, as illustrated by Erkinjuntti and colleagues (1997) for dementia20 and Furukawa and associates21 for depression. Of note, agreement between diagnostic systems examined in the same sample is often modest (Table 1.1). It was in the eighth revision of ICD (ICD-8) in 1967 and in the third edition of DSM (DSM-III) in 1980 that a systematic effort to improve the diagnosis and classification of mental disorders was made. Until then, textbooks containing descriptions of individual conditions were the main source of information, but naturally this led to numerous disputes. DSM and ICD go beyond textbook descriptions by providing a checklist of useful criteria and, importantly, suggesting a diagnostic threshold determined by specific symptoms, which usually have to fulfill both frequency (symptom count) and duration criteria. The key difference between a severity questionnaire and an operational method is that certain criteria are required in the latter, whereas severity questionnaires usually rely on symptom counts alone, without weighting of symptoms (see Appendix Table 1). This is not surprising, because most mood questionnaires were proposed by experts based on clinical experience alone, whereas careful field testing is needed to rank important items (see Chapter 4). Given this, it is remarkable that severity questionnaires may perform quite well against structured interviews. That said, questionnaires can also be constructed to follow the DSM diagnostic algorithm.22

Table 1.1. Clinician Agreement (Kappa) Using Different Diagnostic Systems for Depression
 | DSM | RDC | ICD-10 | FDC
DSR | 0.95 | 0.71 | 0.71 | 0.59
DSM | | 0.71 | 0.70 | 0.60
RDC | | | 0.74 | 0.77
ICD-10 | | | | 0.63
Adapted from Philipp M, Delmo CD, Buller R, et al. Differentiation between major and minor depression. Psychopharmacology. 1992;106:S75–S78.
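Because the agreement figures in Table 1.1, and throughout this chapter, are kappa coefficients, a minimal sketch of Cohen’s kappa for two raters may be useful. It shows why raw percentage agreement and kappa can diverge: kappa discounts the agreement that would be expected by chance from each rater’s marginal rates. The ten paired ratings below are invented for illustration and are not drawn from any study cited here.

```python
import math
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa for two raters assigning one categorical label to each case."""
    if not rater_a or len(rater_a) != len(rater_b):
        raise ValueError("ratings must be non-empty and paired case by case")
    n = len(rater_a)
    # Observed agreement: proportion of cases on which the two raters give the same label.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: for each label, multiply the two raters' marginal proportions, then sum.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_chance == 1.0:
        return math.nan  # both raters used a single identical label: kappa is undefined
    return (p_observed - p_chance) / (1.0 - p_chance)


# Ten hypothetical cases rated by two systems ("dep" = depressed, "not" = not depressed).
system_1 = ["dep"] * 4 + ["not"] * 6
system_2 = ["dep"] * 3 + ["not", "dep"] + ["not"] * 5

raw = sum(a == b for a, b in zip(system_1, system_2)) / len(system_1)
print(f"raw agreement {raw:.0%}, kappa {cohens_kappa(system_1, system_2):.2f}")
# raw agreement 80%, kappa 0.58
```

In this invented example the two systems agree on 8 of 10 cases, yet kappa is only about 0.58, because roughly half of that agreement would be expected by chance alone.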
Validation of the DSM-IV/ICD-10 Criteria for Depression
The criteria for major depression, minor depression, and dysthymia are shown in Table 1.2. Subsyndromal depression is not currently included in DSM-IV but can be considered present if there are at least two DSM-IV symptoms but the overall criteria for major or minor depression are not met.23 MDD is defined by depressed mood or loss of interest in nearly all activities for at least 2 weeks, accompanied by at least three or four additional symptoms (for a total of at least five). The criteria for minor depression are identical but require only two to four symptoms, and they require exclusion of previous major depression in an attempt to avoid confusion over residual symptomatology. Dysthymia is characterized by fewer symptoms than major depression (three or four) and a chronic course lasting at least 2 years. In ICD-10 the core symptoms of depression include decreased energy or increased fatigability in addition to low mood and loss of interest. Further, only four symptoms are required for a mild episode, and six (five in early versions) symptoms qualify as a moderate depressive episode. Thus, DSM-IV major depression is broadly analogous to the ICD-10 concept of moderate or severe depression. Both ICD and DSM suggest a minimum number of typical and associated symptoms and a minimum symptom duration of 2 weeks. In DSM-IV, but not in ICD-10, a third feature is added: that the disorder causes significant impairment in social, occupational, or other important areas of functioning. As a result, diagnoses based on ICD-10 and DSM-IV are often discordant.24–26

Table 1.2. Diagnostic Categories for Depressive Disorders
Diagnostic Category | DSM-IV Criteria | Symptom Duration
Major depression | 5 or more depressive symptoms, including depressed mood or anhedonia, causing significant impairment in social, occupational, or other important areas of functioning | 2 weeks
Minor depression (research criteria diagnosis) | 2–4 depressive symptoms, including depressed mood or anhedonia, causing significant impairment in social, occupational, or other important areas of functioning | 2 weeks
Dysthymia | 3 or 4 dysthymic symptoms, including depressed mood, poor appetite or overeating, insomnia or hypersomnia, low energy, low self-esteem, poor concentration or indecisiveness, and hopelessness | 2 years

Over the past 10 years there have been accumulating challenges to the diagnostic criteria in DSM-IV, including but not limited to MDD. Philipp and colleagues (1992) were among the first to show that the major depression concept may be too narrow.27 In a primary care study using DSM-III-R, MDD occurred in 17.4%, but the majority of depressed patients fell into the group of depression “not otherwise specified” (NOS). Adding the minor depression concept resulted in the reclassification of 38.3% of the NOS patients as minor depression. Data from the National Comorbidity Survey have shown that across the minor, major, and severe categories of depression (depending on the number of symptoms) there is a “monotonic” increase in a number of fundamental indices such as average number of episodes, impairment, comorbidity, and parental psychopathology,28 suggesting a continuum within depression rather than categorical groupings.
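The DSM-IV symptom-count rules summarized in Table 1.2, together with the definition of subsyndromal depression given above, can be sketched as a small classifier. This is a simplified illustration under stated assumptions: the symptom names are hypothetical labels for the nine DSM-IV items, compound items are treated as single symptoms, and exclusion criteria, dysthymia, and the ICD-10 rules are all omitted. It is not a validated diagnostic tool.

```python
from __future__ import annotations

# Hypothetical labels for the nine DSM-IV criterion symptoms (compound items kept whole).
DSM_IV_SYMPTOMS = {
    "depressed_mood", "anhedonia", "appetite_or_weight", "sleep", "psychomotor",
    "fatigue", "worthlessness_or_guilt", "concentration", "death_or_suicide",
}
CORE = {"depressed_mood", "anhedonia"}


def classify(symptoms: set[str], weeks: float, impaired: bool,
             prior_major: bool = False) -> str:
    """Simplified DSM-IV categories from Table 1.2 plus subsyndromal depression.

    Exclusion criteria (bereavement, substances, medical causes) and dysthymia are ignored.
    """
    unknown = symptoms - DSM_IV_SYMPTOMS
    if unknown:
        raise ValueError(f"not DSM-IV symptom labels: {sorted(unknown)}")
    count = len(symptoms)
    # Both major and minor depression need a core symptom, 2 weeks, and impairment.
    if symptoms & CORE and weeks >= 2 and impaired:
        if count >= 5:
            return "major depression"
        if count >= 2 and not prior_major:  # minor: 2-4 symptoms, no previous major episode
            return "minor depression"
    # Subsyndromal: at least two symptoms, but major/minor criteria not met.
    if count >= 2:
        return "subsyndromal depression"
    return "no depressive disorder"


print(classify({"depressed_mood", "sleep", "fatigue", "concentration",
                "worthlessness_or_guilt"}, weeks=3, impaired=True))    # major depression
print(classify({"anhedonia", "fatigue", "sleep"}, weeks=2, impaired=True))        # minor depression
print(classify({"fatigue", "sleep", "concentration"}, weeks=4, impaired=True))    # subsyndromal depression
```

The third example shows how the core-symptom requirement shapes the categories: three symptoms without low mood or anhedonia never reach minor depression, however long they last.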
Kendler and Gardner’s 1998 longitudinal analysis of the Virginia Twin Registry demonstrated that the presence of five or more symptoms of depression was not a more accurate definition of depression at 1-year follow-up than the presence of three or four symptoms.29 Additionally, there is little empirical support for the DSM-IV requirement for 2-week duration or, indeed, “clinically significant impairment.”30,31 In the Rhode Island MIDAS project, Zimmerman and colleagues (2006)32 conducted an in-depth analysis of symptoms for MDD by having trained raters administer a semi-structured interview to 1,523 psychiatric outpatients, 54.4% of whom had current MDD. They analyzed a 17-item bank of possible symptoms of depression, including the standard 9 DSM items but separating the compound criteria that encompass more than one symptom (eg, increased sleep or insomnia), along with non-DSM diagnostic items such as hopelessness, helplessness, and unreactive mood. The authors found that some items were rated more reliably than others; for example, suicidal ideas, plan, or attempt (suicidality) achieved almost perfect agreement, whereas raters often disagreed about what constituted psychomotor retardation (Textbox 1.5). The rank order of diagnostic weight (by individual item) for DSM-IV membership on logistic regression was depressed mood > anhedonia > sleep disturbance > concentration/indecision > worthlessness/excessive guilt > loss of energy > appetite/weight disturbance > psychomotor change > death/suicidal thoughts. Some items seemed redundant in making a diagnosis. Zimmerman’s group also looked at the validity of so-called core criteria.33 Only 1.5% of the 1,800 patients reported five or more criteria in the absence of low mood or loss of interest or pleasure. Twenty-five of these 27 patients reported depressed mood at a subthreshold level, often in partial remission. Thus, only a small handful of cases would be false positives if no core criteria existed. In another paper in the series, they found that few patients who met the symptom criteria for MDD were ruled out of the diagnosis by the other components of the diagnostic algorithm, thereby explaining why self-administered depression symptom questionnaires perform well as diagnostic proxies.34 Finally, they addressed the longstanding issue of applying some of the criteria in patients with comorbid medical illnesses because of symptom nonspecificity. Based on a series of cross-validated psychometric analyses, they developed an alternative set of diagnostic criteria for MDD that excluded somatic symptoms yet still showed a high level of concordance with the current DSM-IV definition.

Textbox 1.5. Inter-Rater Reliability in Eliciting Individual Symptoms of Depression
Symptom | Kappa
Suicidality | 0.94
Depressed mood | 0.92
Insomnia | 0.91
Anhedonia | 0.90
Decreased appetite | 0.89
Loss of energy | 0.88
Indecisiveness | 0.88
Thoughts of death | 0.86
Psychomotor agitation | 0.83
Feelings of worthlessness | 0.80
Increased weight | 0.79
Decreased concentration | 0.78
Excessive guilt | 0.76
Decreased weight | 0.69
Increased appetite | 0.63
Psychomotor retardation | 0.63
Hypersomnia | 0.54
4. Unstructured (Unassisted) Clinician Diagnosis
Clinician-based assessment has been poorly investigated compared with assisted methods of diagnosis. In fields of medicine where a robust external validation such as postmortem is available, routine diagnostic accuracy has often proven to be remarkably poor.35,36 It should be no surprise, then, if in the absence of a gold standard, health professionals have considerable difficulty making accurate and reliable diagnoses (see Table 1.2).37,38 Regarding missed diagnoses, one study suggested that only 26% were complete mistakes; 25% were underestimates of severity and 38% misidentifications. Conversely, regarding false-positive diagnoses, 35% were overestimates of severity, 24% misdiagnoses, and 41% complete errors. To compound this problem, 90% of psychiatrists do not routinely use case identification and severity measurement for depression (and more than half never do so).39 Most clinicians rely on their own abilities based on training received earlier in their career. On the other hand, clinician-based assessment is purported to be a gold standard in psychiatry if the clinician is given adequate time and resources. This was best conceptualized by Spitzer, who proposed the LEAD standard.40 LEAD is an acronym that stands for the Longitudinal evaluation performed by Expert clinicians who utilize All available Data. The LEAD standard is an important way of obtaining the most likely diagnosis by requiring clinicians to use a collateral history, hospital records, psychological evaluations, and laboratory results. 
However, uncertainty about who is ‘‘expert’’ and which data are mandatory, as well as availability, limits both the actuarial and practical value of this standard.41 A related clinical standard is the best estimates procedure (BEP), which is simpler than the LEAD.42 In the BEP, all available information is evaluated by experienced clinicians who assign a consensus ‘‘best-estimate diagnosis.’’ As with the LEAD standard, the number of clinicians and source of information should always be stated.
Accuracy of Psychiatrists’ Routine Diagnoses
The accuracy of psychiatrists’ diagnostic skills can be compared against BEP diagnoses and/or structured interviews. The value of BEP was investigated by
Kosten and Rounsaville (1992),43 who interviewed 475 subjects using the Schedule for Affective Disorders and Schizophrenia-Lifetime (SADS-L). Two psychologists independently evaluated and diagnosed the same subjects using the BEP. This yielded higher rates of major and minor depressive disorder, antisocial personality, alcoholism, and drug abuse than the routine interview alone, with a minimal rate of false positives. More recently, Taiminen and colleagues (2001)44 compared routine discharge diagnoses based on DSM-IV with BEP diagnoses in 116 first-admission patients with psychosis and severe affective disorder (Table 1.3). However, in this case the BEP included data from a Schedules for Clinical Assessment in Neuropsychiatry (SCAN) interview, setting an even more stringent gold standard. Diagnostic agreement was moderate (kappa 0.51), suggesting frequent errors in the routine diagnoses even when DSM-IV criteria were used. Of note, clinicians tended to miss depressive symptoms in psychotic patients, to overdiagnose psychotic symptoms in depressive patients, and to overlook earlier hypomanic or depressive episodes in depressive patients. Spitzer and colleagues (1999)45 evaluated the unassisted accuracy of mental health professionals (1 psychologist and 3 mental health social workers) in comparison with 62 primary care physicians (PCPs) using the depression scale of the Patient Health Questionnaire (PHQ-9). Accuracy was calculated in 585 cases who had both assessments within a 48-hour period. PCPs recognized 61% of cases thought to have major depression by mental health professionals and excluded 98% of cases thought not to have major depression. Accuracy in the other direction was not reported. Recently, Carballeira and colleagues from Switzerland (2007)46 studied 212 patients admitted to the internal medicine units of the University Hospitals of Geneva (Table 1.4).
Each patient completed the PHQ-9 and underwent a blind DSM-IV diagnostic assessment by a psychiatrist. Compared with the PHQ-9, psychiatrists recognized 50% of cases with major depression but only 22% of those with milder forms (minor depression). Rule-out accuracy was high but rule-in accuracy was poor, with a high rate of false positives. The authors also compared the psychiatrists’ diagnoses with those made by internists, finding a kappa agreement of only 0.20. This study is valuable because patient-rated symptoms have particular importance.47

Table 1.3. Diagnostic Accuracy of Primary Care Physicians Against CIDI
 | Gold Standard Depressed (CIDI) | Gold Standard Not Depressed (CIDI) |
Depressed (Unassisted Diagnosis) | 70 | 76 (false positives) | PPV 48%
Not Depressed (Unassisted Diagnosis) | 104 (false negatives) | 459 | NPV 81.5%
Total | Se 40.2% | Sp 85.8% |
Reprinted from General Hospital Psychiatry 21(2), Tiemens BG, VonKorff M, Lin EH, Diagnosis of depression by primary care physicians versus a structured diagnostic interview. Understanding discordance, 87–96, Copyright (1999).

Table 1.4. Diagnostic Accuracy of Psychiatrists vs. PHQ-9 (Patient-Rated)
Psychiatrist | PHQ-9 Mj Depressed | PHQ-9 No Mj Depressed | PHQ-9 Mn Depressed | PHQ-9 No Mn Depressed
Depressed (Unassisted Diagnosis) | 12 | 26 (false positives) | 5 | 30 (false positives)
Not Depressed (Unassisted Diagnosis) | 12 (false negatives) | 162 | 18 (false negatives) | 159
Total | Se 50% | Sp 86% | Se 22% | Sp 84%
PPV (Mj) 32%; PPV (Mn) 14%; NPV (Mj) 93%; NPV (Mn) 90%
Mj, major (DSM-IV); Mn, minor (DSM-IV). Reproduced from Carballeira et al. Criterion validity of the French version of Patient Health Questionnaire (PHQ) in a hospital department of internal medicine. Psychology and Psychotherapy: Theory, Research and Practice (2007), 80, 69–77.

Several groups have explored the accuracy of routine diagnoses against the Structured Clinical Interview for DSM Disorders (SCID), although few have used other methods such as the Composite International Diagnostic Interview (CIDI).48 Helzer and colleagues (1985)49 examined the level of agreement between a lay-administered Diagnostic Interview Schedule (DIS) in the Epidemiologic Catchment Area project and routine clinical diagnoses made by psychiatrists. Overall agreement between the DIS and the psychiatrists ranged from 79% to 96%, and specificities were all 90% or better. Anthony and associates (1985)50 compared DSM-III diagnoses made by the DIS with “standardized” DSM-III diagnoses by psychiatrists in the two-stage Baltimore Epidemiologic Catchment Area mental morbidity survey. There was considerable disagreement; the only category with even modest agreement was alcohol use disorder. Steiner and colleagues (1995)51 studied the relationship between diagnoses generated by the SCID and by unstructured psychiatric interviews. Diagnoses generated by researchers using the SCID and routinely by psychiatrists were compared for 100 patients. Overall agreement between the SCID diagnosis and the clinical diagnosis was low (kappa 0.30). Shear and coworkers (2000)52 examined 164 nonpsychotic patients at two community treatment facilities using the SCID and compared results to diagnoses obtained from clinician
records. The majority (59%) of patients met the SCID criteria for a primary depressive disorder. Agreement between the two sets of diagnoses was poor (kappa 0.24 overall and 0.33 for mood disorder). Overall, use of the SCID resulted in more diagnoses than the standard clinical procedures, particularly where comorbidity was present. Anxiety disorders, in particular, were much more likely to be overlooked by a clinical rater. One exception was “adjustment disorder,” which was diagnosed more frequently by clinicians than by SCID raters. In an important but small-scale study, Miller and colleagues (2001)53 compared three methods of diagnosis for 56 psychiatric inpatients against the LEAD criterion standard: unassisted clinical assessment, the SCID, and a structured Computer-Assisted Diagnostic Interview (CADI). Psychiatrists’ unassisted assessment had 54% agreement against LEAD (kappa 0.43), whereas the SCID and CADI had agreement above 85% (kappa 0.81). Between similarly trained colleagues, interrater agreement for unassisted clinicians was only 45.5% (kappa 0.24), meaning independent clinicians disagreed most of the time.54 In one of the largest studies of diagnostic accuracy, Kashner and coworkers (2003)55 looked at 294 newly enrolled adult psychiatric patients based on clinical records. Within 2 weeks of their primary evaluation, patients were randomly assigned to receive either a nurse-administered SCID with feedback to the attending psychiatrist or usual care. The kappa agreement between the SCID and chart diagnoses of MDD was 0.56 at baseline (unassisted), rising to 0.90 at the end of the study after results were fed back to clinicians. Against the SCID, clinicians underdiagnosed all psychiatric disorders (for example, missing over 60% of substance abuse disorders and anxiety disorders). However, unassisted clinicians also made several false-positive diagnoses, most commonly of schizophrenia, bipolar disorder, and MDD.
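The rule-in and rule-out figures reported in Tables 1.3 and 1.4 all derive from the four cells of a 2 × 2 table. The sketch below recomputes them from the cell counts; the `TwoByTwo` class is an illustrative helper, not part of any cited study, and the counts used are those reported in Table 1.3 (PCPs’ unassisted diagnosis against the CIDI).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Cell counts for an index diagnosis (rows) against a reference standard (columns)."""

    tp: int  # index positive, reference positive
    fp: int  # index positive, reference negative (false positive)
    fn: int  # index negative, reference positive (false negative)
    tn: int  # index negative, reference negative

    @property
    def sensitivity(self) -> float:  # proportion of true cases that are detected
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:  # proportion of non-cases correctly excluded
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:  # rule-in: chance a positive diagnosis is correct
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:  # rule-out: chance a negative diagnosis is correct
        return self.tn / (self.tn + self.fn)


# Cell counts reported in Table 1.3 (PCPs' unassisted diagnosis vs CIDI).
pcp_vs_cidi = TwoByTwo(tp=70, fp=76, fn=104, tn=459)
for metric in ("sensitivity", "specificity", "ppv", "npv"):
    print(f"{metric}: {getattr(pcp_vs_cidi, metric):.1%}")
# sensitivity: 40.2%
# specificity: 85.8%
# ppv: 47.9%
# npv: 81.5%
```

Running the same helper on the Table 1.4 major-depression counts (12, 26, 12, 162) gives approximately 50%, 86%, 32%, and 93%, matching the rounded figures in that table.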
Basco and associates (2003)56 interviewed 200 psychiatric outpatients and attempted to establish gold standard diagnoses based on the SCID, all medical records, and a follow-up interview with a psychiatrist or a psychologist trained in diagnostic procedures (in effect, the LEAD procedure). Agreement with this gold standard was 53% for routine diagnoses, 68% for the SCID, and 79% for the SCID plus chart review. Concordance was better for depression. Looking at the subset of patients examined by a psychiatrist, 70% of those thought by psychiatrists to have MDD actually did on the SCID (43 of 61 participants), but half of the SCID cases of MDD had not previously been recognized as such; they were typically assigned adjustment disorder, no clinical diagnosis, anxiety disorder, substance abuse, or bipolar disorder. The accuracy of unassisted clinical ability was examined for both rule-in and rule-out accuracy (Table 1.5). Psychiatrists were good at excluding depression but missed 50% of cases when attempting to rule in a diagnosis. In all groups, when discrepancies occurred, most were judged to be of substantial clinical importance.

Table 1.5. Diagnostic Accuracy of Psychiatrists vs. SCID Plus
 | SCID+ Standard Depressed | SCID+ Standard Not Depressed |
Depressed (Unassisted Diagnosis) | 17 | 7 (false positives) | PPV 76%
Not Depressed (Unassisted Diagnosis) | 17 (false negatives) | 155 | NPV 89%
Total | Se 50% | Sp 96% |
SCID+ refers to SCID, plus all medical records and a follow-up interview with a trained psychiatrist or a psychologist; see text. Basco et al. Methods to improve diagnostic accuracy in a community mental health setting. Am J Psychiatry. 2000;157(10):1599–1605.

Performance shows remarkable similarity to that of PCPs (see Table 1.4 for comparison). The kappa coefficients showed that administering the SCID, even without the benefit of a medical record review, improved accuracy beyond routine diagnosis alone, while adding information derived from the chart review resulted in an additional 25% improvement over and above the SCID alone. These findings are consistent with reports from other studies showing the advantage of diagnostic interviews over unstructured clinical interviews (see below).57,58 This study also investigated which competing diagnoses caused difficulty. Psychiatrists found separation of MDD versus obsessive-compulsive disorder and MDD versus dysthymia to be relatively straightforward but struggled with MDD versus adjustment disorder and MDD versus organic disorder, among others. Reasons for suboptimal accuracy are discussed in Chapter 3.
5. Structured and Semi-Structured Assisted Diagnostic Interviews
Semi-structured diagnostic interviews were introduced in the 1970s as a method that would allow lay interviewers to obtain psychiatric diagnoses close to those a psychiatrist would obtain.59,60 Rogers suggested that one third of clinical variability was due to idiosyncratic questioning and two thirds to interpretation of the information gleaned.60 The premise is that standardization forces an assessor to cover all the areas of psychopathology and provides consistency in the way questions are asked. The three main components of the structured interview are (1) use of the standardized language of clinical method, (2) a fixed sequence of inquiry, and (3) quantification of the responses.
However, assisted interviews have several significant limitations. First, they are time-consuming: the average time to administer the SCID is approximately 1 hour and 44 minutes, compared with about 40 minutes for a standard interview (Textbox 1.6). Second, they have modest acceptability to patients and staff, who often find these interviews restrictive (for staff) and repetitive (for patients).61 Third, and perhaps unexpectedly, diagnostic interviews can produce far from uniform results even in the same population. For example, 12-month prevalence rates of major depression in the United States using two instruments were 4.2%62 and 10.1%.63 Further, no before-and-after study or randomized trial has shown how much these methods can improve routine care. These cautions call into question the value of these instruments for clinical care, at least until further data are available.64 The most common instruments are summarized in Textbox 1.6. The SCID was developed alongside DSM-III-R.65 As with most instruments, raters must first be trained. Compared with the CIDI, the SCID requires the clinician to make more judgments about whether each criterion is met and whether all criteria taken together validate the clinical diagnosis. Numerous studies have evaluated interrater reliability for major depression using the SCID. One of the largest, from Williams and colleagues (1992),66 evaluated the ability of psychiatrists (n = 14), psychologists (n = 6), and master’s degree students (n = 4) to diagnose depression, finding a modest kappa agreement of 0.64. There are also several studies comparing the SCID and CIDI. In a sample of 325 patients from the National Comorbidity Survey, the sensitivity of the CIDI was 55% and specificity was 93.7% for lifetime major depression compared with the SCID (kappa 0.54).67 In the study by Basco and associates (2003) mentioned previously, the added value of SCID plus chart diagnoses suggests that the SCID can be improved using very experienced clinical raters, hence the need for a clinician-led assisted interview.

Textbox 1.6. Summary of Assisted Interviews
Instrument | Type | Recommended Use by | Generates | Duration
Partially Structured
PSE (Present State Examination)/SCAN | Semi-structured interview | Clinicians | ICD-10 and DSM-IV criteria | 45 minutes
SCID-I (Structured Clinical Interview for DSM-III-R) | Semi-structured interview | Trained interviewer and/or clinicians | DSM-IV | 1 hour and 44 minutes
Schedule for Affective Disorders and Schizophrenia (SADS) | Semi-structured interview | Trained interviewer and/or clinicians | RDC | 90 minutes
Fully Structured
CIDI (Composite International Diagnostic Interview) | Structured | Trained interviewer (clinician optional) | ICD-10 and DSM-III-R criteria | 75 minutes
M.I.N.I. (Mini-International Neuropsychiatric Interview) | Structured | Trained interviewer (clinician optional) | ICD-10 and DSM-IV criteria | 20 minutes
Diagnostic Interview Schedule (DIS) | Structured | Trained interviewer (clinician optional) | DSM-IV | 120 minutes
Interestingly, feedback of SCID results to psychiatrists can lead to more positive outcomes.68 Philipp and colleagues (1986) proposed a refinement to the SCID called the Polydiagnostic Interview (PODI).69 The advantage of this approach is that the PODI generates diagnoses according to several competing diagnostic checklists, including DSM-III-R, ICD-10, Research Diagnostic Criteria (RDC), and Feighner Diagnostic Criteria. The Present State Examination (PSE) is a semi-structured interview designed for use only by clinicians. The current 10th edition can generate both ICD-10 and DSM-IV diagnoses. A computer program derived from the PSE (CATEGO-5) has also been developed, as has a short version of the PSE. SCAN is a semi-structured interview based on the PSE and is also the product of a collaborative study between the World Health Organization (WHO) and the U.S. Alcohol, Drug Abuse, and Mental Health Administration (ADAMHA).70 Again, the PSE requires a thorough training course, making it expensive and time-consuming for many.
Fully Structured Assisted Interviews
The DIS was developed by the National Institute of Mental Health (NIMH), and its first version was released in 1978. It was an adaptation of the Renard Diagnostic Interview, which was designed to assess Feighner’s diagnostic criteria.
DIS-4 focuses on DSM-IV and is similar to the CIDI. It has been validated, but one study found low sensitivity of the DIS compared with the SCID.71
The CIDI was produced jointly by WHO and ADAMHA and is designed to enable a trained interviewer to arrive at either an ICD-10 or a DSM diagnosis in about 75 minutes. The CIDI is an amalgamation of two pre-existing instruments, the DIS and the PSE. It contains 276 symptom questions, many of which are probes to evaluate symptom severity, as well as questions assessing help-seeking and psychosocial impairment. A computerized version, CIDI 2.1, is available. The first field trials showed high interrater reliability but poor test–retest reliability for depressive disorders.72 Subsequent reliability studies (using slightly different versions of the CIDI) demonstrated high interrater reliability.73,74 One validity study used a clinician-scored DSM-III-R symptom checklist as the gold standard.75 Compared with this checklist, the CIDI had a sensitivity of 85% and a specificity of 98% (kappa 0.84). A second study compared the CIDI against the SCID-assisted LEAD (Longitudinal, Expert, All Data) procedure.76 It found a modest positive predictive value and a high negative predictive value (kappa 0.46).
The Mini-International Neuropsychiatric Interview (M.I.N.I.) is an abbreviated structured psychiatric interview that takes only 15 to 20 minutes to administer.77 It uses decision-tree logic to elicit all the symptoms listed in the DSM-IV and ICD-10 symptom criteria for 15 major Axis I diagnostic categories, one Axis II disorder, and suicidality. Several variants are available, including the M.I.N.I.-Screen, the M.I.N.I.-Plus, and the M.I.N.I.-Kid. The M.I.N.I. has been validated against the SCID Patient Version, the CIDI, and expert professional opinion.77
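Decision-tree logic of the kind described for the M.I.N.I. can be illustrated with the DSM-IV symptom criterion for a major depressive episode: at least five of nine symptoms, one of which must be depressed mood or loss of interest or pleasure. The Python sketch below is hypothetical. The item names and the `ask` callback are invented for illustration; it is not the M.I.N.I.’s actual wording or scoring, and it ignores the duration, impairment, and exclusion criteria that a full interview also checks. It shows only the gating step: if neither core symptom is endorsed, the remaining questions are skipped.

```python
from typing import Callable, Dict, List

# Hypothetical labels for the nine DSM-IV major depressive episode symptoms
CORE_ITEMS = ("depressed_mood", "loss_of_interest")
OTHER_ITEMS = (
    "appetite_or_weight_change",
    "sleep_disturbance",
    "psychomotor_change",
    "fatigue",
    "worthlessness_or_guilt",
    "poor_concentration",
    "thoughts_of_death",
)


def symptom_criterion(ask: Callable[[str], bool]) -> Dict[str, object]:
    """Gate on the two core symptoms, then count the remaining seven.

    `ask` returns True when the patient endorses an item. Only the symptom
    count is checked here; duration, impairment, and exclusions are not.
    """
    asked: List[str] = []
    count = 0
    for item in CORE_ITEMS:
        asked.append(item)
        if ask(item):
            count += 1
    if count == 0:
        # Decision-tree gate: no core symptom, so skip the rest of the module
        return {"asked": asked, "count": 0, "met": False}
    for item in OTHER_ITEMS:
        asked.append(item)
        if ask(item):
            count += 1
    return {"asked": asked, "count": count, "met": count >= 5}


# A patient who denies low mood but endorses loss of interest plus four others
endorsed = {"loss_of_interest", "sleep_disturbance", "fatigue",
            "worthlessness_or_guilt", "poor_concentration"}
result = symptom_criterion(lambda item: item in endorsed)
print(result["count"], result["met"])  # 5 True
```

The example also shows why asking about loss of pleasure matters: this patient denies low mood, so an assessment that stopped after the low-mood question would miss a case that meets the symptom criterion.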
6.
Conclusion
Some will find surprising the conclusion that the diagnosis of mental disorders is not based on a robust gold standard.78 Current evidence has repeatedly shown that unassisted psychiatric diagnoses are neither particularly reliable (when judged by repeat assessments) nor particularly valid (when judged by consensus methods or assisted interviews), especially when comorbidity is present.79 Miller and colleagues (2001)53 found that, when unassisted, clinicians evaluated an average of only 53% of the key criteria present in diagnostic algorithms (32% in the case of depression). Psychiatrists asked about low mood in 86% of cases but about loss of pleasure in only 8%.80 As awareness of these limitations grows, clinicians will increasingly be called on to use diagnostic aids routinely in clinical practice. If this occurs with proper diagnostic scrutiny (comparing accuracy with and without assistance head to head), psychiatric diagnosis will slowly move from being a nonscientific art based on overall clinical impression to a science in which the accuracy of each method, indeed of each question, is known. As Kendell and Jablensky9 observed: ‘‘Psychiatry is in the position—that most of medicine was in 200 years ago—of still having to define most of its disorders by their syndromes. Because of the consequent need to distinguish one disorder from another by differences between syndromes, the validity of diagnostic concepts remains an important issue in psychiatry. In this situation, to search for boundaries between syndromes and to use zones of rarity as criteria of validity is, we contend, the best strategy available to us.’’ Here Kendell and Jablensky highlight a fundamental problem in the search for accuracy: the possibility that many of our current diagnoses are labels of convenience, no more distinct from one another than short stature is from normal height. Like many conditions defined largely by phenotype alone, normal height has a Gaussian (normal) distribution that overlaps with those of many diseases and disorders that cause growth retardation. The result may be two distributions with significant overlap and no clear zone of rarity between them (see Fig. 1.1).
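The idea of a zone of rarity can be made concrete with a mixture of two normal distributions. In the hypothetical Python sketch below, all parameters are in arbitrary standard-deviation units and are not real height data. A dip between two peaks (a second mode) appears only when the components are far enough apart; for two equally common components with equal spread, the means must differ by more than two standard deviations. When the disordered group is rare, even that separation can yield a single peak with no point of rarity.

```python
import math
from typing import List, Tuple

# Each component is (weight, mean, standard deviation); weights should sum to 1.
Component = Tuple[float, float, float]


def normal_pdf(x: float, mu: float, sd: float) -> float:
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def mixture_pdf(x: float, components: List[Component]) -> float:
    return sum(w * normal_pdf(x, mu, sd) for w, mu, sd in components)


def count_modes(components: List[Component], lo: float = -5.0,
                hi: float = 8.0, steps: int = 2000) -> int:
    """Count local maxima of the mixture density on an evenly spaced grid."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    ys = [mixture_pdf(x, components) for x in xs]
    return sum(1 for i in range(1, steps) if ys[i - 1] < ys[i] > ys[i + 1])


print(count_modes([(0.5, 0.0, 1.0), (0.5, 1.5, 1.0)]))    # 1: heavy overlap
print(count_modes([(0.5, 0.0, 1.0), (0.5, 2.5, 1.0)]))    # 2: a dip appears
print(count_modes([(0.95, 0.0, 1.0), (0.05, 2.5, 1.0)]))  # 1: rare group hidden
```

Counting modes on a grid is a crude check chosen for transparency; it is only meant to show how overlap removes the boundary that a zone of rarity would provide.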
[Figure 1.4: bar chart comparing routine diagnoses, diagnoses based on the SCID, and diagnoses based on the SCID plus medical records, plotted against time required (minutes, 0–160) and agreement with the gold standard on specific diagnoses (kappa, 0.00–1.00).]
Figure 1.4. Time required to produce accurate diagnoses. Time requirement and reliability of routine diagnoses, SCID-based diagnoses, and diagnoses based on the SCID plus medical records for 200 outpatients with severe mental illness. Reprinted from Basco RM, Bostic JQ, Davies D, Rush AJ, Witte B, Hendrickse W, Barnett V. Methods to improve diagnostic accuracy in a community mental health setting. Am J Psychiatry. 2000 Oct;157(10): 1599–605 with permission.
DSM-III and ICD-8 were landmark publications that allowed us to scrutinize the mysterious process of psychiatric diagnosis. Each new release brings an incremental improvement. Although neither DSM nor ICD has been universally accepted (in one study, clinicians used DSM criteria in 23% of visits in which a psychosocial problem was recognized),81 they have had a beneficial influence.82 As these checklist-based diagnostic systems with rule-based criteria are field-tested, it becomes apparent that many of the suggested symptoms, combinations, and associated features are not particularly useful diagnostically. However, this could be seen as an advantage, since previously no attempt at all was made to revise mainstream psychiatric diagnoses. Finding out what doesn’t work may be as valuable as finding out what does. Beyond the checklist approach lie assisted interviews, which have a good evidence base for reliability, validity, or both. Still missing are formal implementation trials in which one group of clinicians is randomized to assisted interviews and another to diagnosis as usual, to discover whether clinical outcomes actually improve. Unfortunately, most of the assisted methods developed so far are too long for routine clinical use. Indeed, a rule of thumb in this field is that the more accurate the diagnostic method, the longer the time required, and this relationship may not be linear (Fig. 1.4). A key challenge for the future, therefore, is to develop reliable diagnostic methods brief enough to be routinely accepted in busy clinical practice, including primary and secondary care.
References
1. Jablensky A. Categories, dimensions and prototypes: critical issues for psychiatric classification. Psychopathology. 2005;38:201–205.
2. van Praag HM. Can stress cause depression? Prog Neuropsychopharmacol Biol Psych. 2004;28(5):891–907.
3. Parker G. Classifying depression: should paradigms lost be regained? Am J Psychiatry. 2000;157:1195–1203.
4. Sneath PHA. Some thoughts on bacterial classification. J Gen Microbiol. 1957;17:184–200.
5. Cloninger CR. A new conceptual paradigm from genetics and psychobiology for the science of mental health. Aust N Z J Psychiatry. 1999;33:174–186.
6. Lyness JM, Kim JH, Tang W, et al. The clinical significance of subsyndromal depression in older primary care patients. Am J Geriatr Psychiatry. 2007;15:214–223.
7. Angst J, Merikangas KR. Multi-dimensional criteria for the diagnosis of depression. J Affect Disord. 2001;62:7–15.
8. The ICD-10 classification of mental and behavioral disorders: diagnostic criteria for research, 10th edition. Geneva: World Health Organization, 1993.
9. Kendell R, Jablensky A. Distinguishing between the validity and utility of psychiatric diagnoses. Am J Psychiatry. 2003;160:4–12.
10. Aboraya A, Compton III W. Biological markers and external validators in psychiatry: progress report on the validity of psychiatric diagnoses. eCommunity Int J Mental Health Addiction. Nov. 7, 2004 [online].
11. Tierney W, Fitzgerald J, McHenry R, et al. Physicians’ estimates of the probability of myocardial infarction in emergency room patients with chest pain. Medical Decision Making. 1986;6(1):12–17.
12. Chun AA, McGee SR. Bedside diagnosis of coronary artery disease: A systematic review. Am J Med. 2004;117(5):334–343.
13. Pull CB, Pull MC, Pichot P. Integrated lists of taxonomic evaluation criteria: LICET-S and LICET-D. Acta Psychiatr Belg. 1984;84(4):297–309.
14. Mihalopoulos C, McGorry P, Roberts S, et al. The procedural validity of retrospective case note diagnosis. Aust N Z J Psychiatry. 2000;34(1):154–159.
15. Janca A, Hiller W. ICD-10 checklists: A tool for clinicians’ use of the ICD-10 classification of mental and behavioral disorders. Comprehensive Psychiatry. 1996;37(3):180–187.
16. Hamilton JD. Do we underutilise actuarial judgement and decision analysis? Evidence-Based Mental Health. 2001;4:102–103.
17. Holzer III CE, Nguyen HT, Hirschfeld RMA. Reliability of the diagnosis in mood disorders. Psychiatr Clin North Am. 1996;19(1):73–84.
18. Manual of the international classification of diseases, injuries and causes of death, 6th ed. Geneva: World Health Organization, 1948.
19. Diagnostic and statistical manual of mental disorders. Washington, DC: American Psychiatric Publishing, 1952.
20. Erkinjuntti T, Ostbye T, Steenhuis R, et al. The effect of different diagnostic criteria on the prevalence of dementia. N Engl J Med. 1997;337(23):1667–1674.
21. Furukawa TA, Anraku K, Hiroe T, et al. A polydiagnostic study of depressive disorders according to DSM-IV and 23 classical diagnostic systems. Psychiatry Clin Neurosci. 1999;53(3):387.
22. Zimmerman M, Chelminski I, McGlinchey JB, et al. Diagnosing major depressive disorder, VI. Performance of an objective test as a diagnostic criterion. J Nerv Ment Dis. 2006;194:565–569.
23. Diagnostic and statistical manual of mental disorders, 4th ed. Washington, DC: American Psychiatric Publishing, 1994.
24. Philipp M, Maier W, Delmo CD. The concept of major depression. I. Descriptive comparison of six competing operational definitions including ICD-10 and DSM-III-R. Eur Arch Psychiatry Clin Neurosci. 1991;240(4–5):258–265.
25. Andrews G, Slade T, Peters L, et al. Classification in psychiatry: ICD-10 versus DSM-IV. Br J Psychiatry. 1999;174(1):3–5.
26. Ravelli A, Bijl RV, Van Brink WD. Consequences of the use of different classification systems: A comparison of the DSM-III-R and the ICD-10 for depression. Int J Methods Psychiatric Res. 1999;8(4):192–203.
27. Philipp M, Delmo CD, Buller R, et al. Differentiation between major and minor depression. Psychopharmacology. 1992;106:S75–S78.
28. Kessler RC, Zhao S, Blazer DG, et al. Prevalence, correlates, and course of minor depression and major depression in the National Comorbidity Survey. J Affect Disord. 1997;45:19–30.
29. Kendler KS, Gardner CO Jr. Boundaries of major depression: an evaluation of DSM-IV criteria. Am J Psychiatry. 1998;155:172–177.
30. Spitzer RL, Wakefield JC. DSM-IV diagnostic criterion for clinical significance: does it help solve the false positives problem? Am J Psychiatry. 1999;156:1856–1864.
31. Beals J, Novins DK, Spicer P, et al., the AI-SUPERPFP Team. Challenges in operationalizing the DSM-IV clinical significance criterion. Arch Gen Psychiatry. 2004;61(12):1197–1207.
32. Zimmerman M, McGlinchey JB, Young D, et al. Diagnosing major depressive disorder, I. A psychometric evaluation of the DSM-IV symptom criteria. J Nerv Ment Dis. 2006;194:158–163.
33. Zimmerman M, McGlinchey JB, Young D, et al. Diagnosing major depressive disorder, IV. Relationship between number of symptoms and the diagnosis of disorder. J Nerv Ment Dis. 2006;194:450–453.
34. Zimmerman M, Chelminski I, McGlinchey JB, et al. Diagnosing major depressive disorder, VI. Performance of an objective test as a diagnostic criterion. J Nerv Ment Dis. 2006;194:565–569.
35. Lundberg GD. Low-tech autopsies in the era of high-tech medicine: continued value for quality assurance and patient safety. JAMA. 1998;280:1273–1274.
36. Mayeux R, Saunders AM, Shea S, et al. Utility of the apolipoprotein E genotype in the diagnosis of Alzheimer’s disease. Alzheimer’s Disease Centers Consortium on Apolipoprotein E and Alzheimer’s Disease. N Engl J Med. 1998;338(8):506–511.
37. Matarazzo JD. The reliability of psychiatric and psychological diagnosis. Clin Psychol Rev. 1983;3:103–145.
38. Tiemens BG, VonKorff M, Lin EH. Diagnosis of depression by primary care physicians versus a structured diagnostic interview. Understanding discordance. Gen Hosp Psychiatry. 1999;21(2):87–96.
39. Gilbody SM, House AO, Sheldon TA. Psychiatrists in the UK do not use outcomes measures: National survey. Br J Psychiatry. 2002;80:101–103.
40. Spitzer RL. Psychiatric diagnosis: Are clinicians still necessary? Comprehensive Psychiatry. 1983;24:399–411.
41. Antony MM, Barlow DH. Structured and semistructured diagnostic interviews. In Barlow DH, ed. Handbook of assessment and treatment planning for psychological disorders. New York: Guilford, 2002:3–37.
42. Leckman JF, Sholomskas D, Thompson WD, et al. Best estimate of lifetime psychiatric diagnoses. Arch Gen Psychiatry. 1982;39:879–883.
43. Kosten TA, Rounsaville BJ. Sensitivity of psychiatric diagnosis based on the best estimate procedure. Am J Psychiatry. 1992;149:1225–1227.
44. Taiminen T, Ranta K, Karlsson H, et al. Comparison of clinical and best-estimate research DSM-IV diagnoses in a Finnish sample of first-admission psychosis and severe affective disorder. Nord J Psychiatry. 2001;55(2):107–111.
45. Spitzer RL, Kroenke K, Williams JBW, et al. Validation and utility of a self-report version of PRIME-MD: the PHQ primary care study. JAMA. 1999;282:1737–1744.
46. Carballeira Y, Dumont P, Borgacci S, et al. Criterion validity of the French version of Patient Health Questionnaire (PHQ) in a hospital department of internal medicine. Psychol Psychotherapy Theory Res Pract. 2007;80:69–77.
47. Moller HJ. Rating depressed patients: observer- vs self-assessment. Eur Psychiatry. 2000;15(3):160–172.
48. Becker J, Kocalevent RD, Rose M, et al. Standardized diagnosing: Computer-assisted (CIDI) diagnoses compared to clinically-judged diagnoses in a psychosomatic setting. Psychotherapie Psychosomatik Medizinische Psychologie. 2006;56(1):5–14.
49. Helzer JE, Robins LN, McEvoy LT, et al. A comparison of clinical and diagnostic interview schedule diagnoses. Physician reexamination of lay-interviewed cases in the general population. Arch Gen Psychiatry. 1985;42:657–666.
50. Anthony JC, Folstein M, Romanoski AJ, et al. Comparison of the Lay Diagnostic Interview Schedule and a standardized psychiatric diagnosis. Experience in eastern Baltimore. Arch Gen Psychiatry. 1985;42(7):667–675.
51. Steiner J, Tebes J, Sledge W, et al. A comparison of the structured clinical interview for DSM-III-R and clinical diagnoses. J Nerv Ment Dis. 1995;183(6):365–369.
52. Shear MK, Greeno C, Kang J, et al. Diagnosis of nonpsychotic patients in community clinics. Am J Psychiatry. 2000;157:581–587.
53. Miller PR, Dasher R, Collins R, et al. Inpatient diagnostic assessments: 1. Accuracy of structured versus unstructured interviews. Psychiatry Res. 2001;105:265–272.
54. Miller PR. Inpatient diagnostic assessments: 2. Interrater reliability and outcomes of structured vs. unstructured interviews. Psychiatry Res. 2001;105:265–271.
55. Kashner TM, Rush AJ, Suris A, et al. Impact of structured clinical interviews on physicians’ practices in community mental health settings. Psychiatr Serv. 2003;54:712–718.
56. Basco RM, Bostic JQ, Davies D, et al. Methods to improve diagnostic accuracy in a community mental health setting. Am J Psychiatry. 2000;157(10):1599–1605.
57. Riskind JH, Beck AT, Berchick RJ, et al. Reliability of DSM-III diagnoses for major depression and generalized anxiety disorder using the Structured Clinical Interview for DSM-III. Arch Gen Psychiatry. 1987;44:817–820.
58. Williams JBW, Gibbon M, First MB, et al. The Structured Clinical Interview for DSM-III-R (SCID), II: multisite test–retest reliability. Arch Gen Psychiatry. 1992;49:630–636.
59. Robins L. National Institute of Mental Health diagnostic interview schedule: its history, characteristics, and validity. Arch Gen Psychiatry. 1981;38:381.
60. Rogers R. Handbook of diagnostic and structured interviewing. New York: Guilford Publications, 2001.
61. Gibson C. Semi-structured and unstructured interviewing: a comparison of methodologies in research with patients following discharge from an acute psychiatric hospital. J Psychiatric Mental Health Nursing. 1998;5(6):469–477.
62. Robins LN. Psychiatric disorders in America. 1991.
63. Kessler RC, McGonagle KA, Zhao S, et al. Lifetime and 12-month prevalence of DSM-III-R psychiatric disorders in the United States: results from the National Comorbidity Survey. Arch Gen Psychiatry. 1994;51:8.
64. Brugha TS, Bebbington PE, Jenkins R. A difference that matters: comparisons of structured and semi-structured psychiatric diagnostic interviews in the general population. Psychol Med. 1999;29:1013–1020.
65. Spitzer RL, Williams JB, Gibbon M, et al. The Structured Clinical Interview for DSM-III-R (SCID). I: History, rationale, and description. Arch Gen Psychiatry. 1992;49(8):624–629.
66. Williams JB, Gibbon M, First MB, et al. The Structured Clinical Interview for DSM-III-R (SCID), II: multisite test–retest reliability. Arch Gen Psychiatry. 1992;49:630–636.
67. Haro JM, Arbabzadeh-Bouchez S, Brugha TS, et al. Concordance of the Composite International Diagnostic Interview Version 3.0 (CIDI 3.0) with standardized clinical assessments in the WHO World Mental Health Surveys. Int J Methods Psychiatric Res. 2006;15(4):167–180.
68. Kashner TM, Rush AJ, Suris A, et al. Impact of structured clinical interviews on physicians’ practices in community mental health settings. Psychiatric Services. 2003;54(5):712–718.
69. Philipp M, Maier W. The polydiagnostic interview: a structured interview for polydiagnostic classification of psychiatric patients. Psychopathology. 1986;19:175–185.
70. Wing JK, Babor T, Brugha T, et al. SCAN. Schedules for Clinical Assessment in Neuropsychiatry. Arch Gen Psychiatry. 1990;47(6):589–593.
71. Murphy JM, Monson RR, Laird NM, et al. A comparison of diagnostic interviews for depression in the Stirling County Study: challenges for psychiatric epidemiology. Arch Gen Psychiatry. 2000;57:230–236.
72. Wittchen HU, Robins LN, Cottler LB, et al. Cross-cultural feasibility, reliability and sources of variance of the Composite International Diagnostic Interview (CIDI). The multicentre WHO/ADAMHA field trials. Br J Psychiatry. 1991;159:645–658.
73. Wittchen HU. Reliability and validity studies of the WHO-Composite International Diagnostic Interview (CIDI): a critical review. J Psychiatr Res. 1994;28:57–84.
74. Andrews G, Peters L. The psychometric properties of the Composite International Diagnostic Interview. Soc Psychiatry Psychiatr Epidemiol. 1998;33:80–88.
75. Janca A, Robins LN, Bucholz KK, et al. Comparison of Composite International Diagnostic Interview and clinical DSM-III-R criteria checklist diagnoses. Acta Psychiatr Scand. 1992;85:440–443.
76. Booth BM, Kirchner JE, Hamilton G, et al. Diagnosing depression in the medically ill: validity of a lay-administered structured diagnostic interview. J Psychiatric Res. 1998;32(6):353–360.
77. Sheehan DV, Lecrubier Y, Sheehan KH, et al. The Mini-International Neuropsychiatric Interview (M.I.N.I.): the development and validation of a structured diagnostic psychiatric interview for DSM-IV and ICD-10. J Clin Psychiatry. 1998;59(Suppl 20):22–57.
78. Kendell RE. Clinical validity. Psychol Med. 1989;19:45–55.
79. Zimmerman M, Mattia JI. Psychiatric diagnosis in clinical practice: is comorbidity being missed? Comprehensive Psychiatry. 1999;40:182–191.
80. Miller PR. Inpatient diagnostic assessments: 3. Causes and effects of diagnostic imprecision. Psychiatry Res. 2002;111:191–197.
81. Gardner W, Kelleher KJ, Pajer KA, et al. Primary care clinicians’ use of standardized psychiatric diagnoses. Child Care Health Development. 2004;30(5):401–412.
82. Toshiyuki S, Makoto T. Is DSM widely accepted by Japanese clinicians? Psychiatry Clin Neurosci. 2001;55:437–450.
2 OVERVIEW OF DEPRESSION SCALES AND TOOLS
Alex J. Mitchell
1. Background
2. The Classic Severity Scales (1960–1980)
3. The New Severity Scales (1981–2008)
4. The Future of Screening Scales
Context
A large number of depression tools have been published for detecting depression or rating its severity. Choosing between them is difficult without adequate information on their validity, reliability, and acceptability. Recently, ever-shorter versions of mood measures have been released. Is a shorter scale a better scale? It is important to study each method against the best available standard and, ideally, to compare scales head to head to judge the optimal scale for each situation.
1.
Background
Clinicians and researchers have developed a bewildering number of tools for the assessment of depression. These are most often questionnaires designed to help elicit symptoms of depression for the purposes of screening, diagnosis, and monitoring progress (Textbox 2.1). Although the terms screening, diagnosis, and case-finding are often used interchangeably, in an epidemiologic sense screening refers to the attempted detection of a disorder in people who have not sought testing or do not suspect they have a particular condition. A screening test is not usually intended to be diagnostic: those with suspicious findings may be referred for more definitive examination, a step perhaps better known as case-finding. This means a screening tool can favor negative predictive value (NPV) over positive predictive value (PPV) (see Chapter 5).
Textbox 2.1. Definitions of Screening and Related Procedures
Screening: ‘‘The systematic application of a test or inquiry, to identify individuals at sufficient risk of a specific disorder to warrant further actions among those who have not sought medical help for that disorder’’
Case-Finding: ‘‘The selected application of a test or inquiry, to identify those individuals with a suspected disorder and exclude those without a disorder, usually in a population who have sought medical help’’
Targeted (High-Risk) Case-Finding: ‘‘The highly selected application of a test or inquiry, to identify individuals at high risk of a specific disorder by virtue of known risk factors’’
Severity Assessment: ‘‘The application of a test or inquiry, to quantify the severity of a specific disorder’’
Adapted from Department of Health. Annual report of the National Screening Committee. London: DoH, 1997.
In both screening and case-finding, the test may be applied ‘‘routinely’’ to all cases or selectively to those thought to be at high risk. A screening test applied to many individuals should be as simple as possible to retain high uptake, and positive results must be paired with an acceptable next step.1 A case-finding measure may be more involved, but its acceptability must still be considered. Adoption of a test in clinical practice probably depends more on acceptability than on accuracy.2
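Why a screening tool can favor NPV over PPV follows from Bayes’ theorem: predictive values depend on how common the disorder is in the population tested, not only on the test’s sensitivity and specificity. The Python sketch below uses hypothetical values (sensitivity 0.90, specificity 0.80, and two illustrative prevalence figures), not figures for any particular scale.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) for a test applied at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# The same hypothetical test, applied where depression is uncommon vs. common
for label, prevalence in [("low prevalence", 0.05), ("high prevalence", 0.25)]:
    ppv, npv = predictive_values(0.90, 0.80, prevalence)
    print(f"{label}: PPV {ppv:.2f}, NPV {npv:.2f}")
# low prevalence: PPV 0.19, NPV 0.99
# high prevalence: PPV 0.60, NPV 0.96
```

In this illustration most positive results in the low-prevalence population are false positives, while a negative result is highly reliable; this is one reason a positive screen needs an acceptable next step such as a fuller assessment.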
Historical Aspects
During the past five decades there has been considerable effort to improve the methods used to detect and quantify depression (Textbox 2.2).3–6 Some scales, such as the Cronholm-Ottosson Depression Scale, have fallen into obscurity, while others, such as the Hamilton Depression Rating Scale and the Beck Depression Inventory, have each been cited over 10,000 times. Given that there are so many similar depression scales, it is not surprising that clinicians have trouble choosing between them. The American College of Psychology Consultants lists 213 psychologically oriented scales with variable validation and reliability data,7 simplified here to 50 depression scales (Textbox 2.3). Fortunately, this list may be distilled further to ten key depression instruments, five created before 1980 and five more modern ones (Tables 2.1 and 2.2). The classic scales are the Hamilton Depression Rating Scale (HAM-D), the Montgomery–Åsberg Depression Rating Scale (MADRS), the Beck Depression Inventory (BDI), the Zung Self-Rating Depression Scale (SDS), and the Center for Epidemiologic Studies Depression Scale (CES-D). The five key scales developed since 1980 are the Hospital Anxiety and Depression Scale (HADS), the Geriatric Depression Scale (GDS), the Edinburgh Postnatal Depression Scale (EPDS), the MOS 8-Item Depression Screener (Burnam Screen), and the Patient Health Questionnaire (PHQ-9). In addition, I have included the less well-known Major Depression Inventory (MDI) because it has a special role, facilitating a diagnosis based on both DSM-IV and ICD-10 criteria. Tools examining more general psychopathology are purposely omitted from this chapter even if they include a rating of depression. These include some seminal scales such as the General Health Questionnaire (GHQ) and the Hopkins Symptom Checklist (SCL) family (SCL-90, SCL-25, and SCL-8).8–10 To keep this chapter manageable, I will also not discuss reliability and validity data in detail; further information can be found in the relevant setting-specific chapters. A comparison of these key scales is shown in Appendix 1.
Textbox 2.2. Development of Major Depression Scales
1952: DSM-I published
1960: Hamilton Depression Scale (HAM-D)
1961: Beck Depression Inventory (BDI)
1965: Zung Self-Rating Depression Scale (SDS)
1968: DSM-II published
1977: Center for Epidemiologic Studies Depression Scale (CES-D)
1977: ICD-9 published
1979: Montgomery–Åsberg Depression Rating Scale (MADRS)
1980: DSM-III published
1980: The Bech–Rafaelsen Melancholia Scale (MES)
1982: Geriatric Depression Scale (GDS-30)
1983: Hospital Anxiety and Depression Scale (HADS)
1986: Abbreviated version of Geriatric Depression Scale (GDS-15)
1987: DSM-III-R published
1987: Edinburgh Postnatal Depression Scale (EPDS)
1987: Inventory to Diagnose Depression (IDD)
1988: MOS-8 Burnam Screen
1992: ICD-10 published
1994: DSM-IV published
1996: Revision of BDI to BDI-II
2001: Patient Health Questionnaire (PHQ)
2001: Major Depression Inventory (MDI)
DSM, Diagnostic and Statistical Manual of Mental Disorders; ICD, International Classification of Diseases.
Textbox 2.3. Listing of Depression Scales
Generic Scales
Beck Depression Inventory-Second Edition (BDI-II)
Brief Psychiatric Rating Scale (BPRS)
Brief Symptom Inventory (BSI)
Burns Depression Checklist (BDC)
Carroll Depression Scales-Revised (CDS-R)
Center for Epidemiological Studies Depression Scale (CES-D)
Depression Anxiety Stress Scales (DASS)
Depression Questionnaire (DQ)
Depression 30 Scale (D-30)
Diagnostic Interview Schedule (DIS-IV)
Diagnostic Inventory for Depression (DID)
Hamilton Depression Inventory (HDI)
Hamilton Rating Scale for Depression (HRSD)
Hopelessness Depression Symptom Questionnaire (HDSQ)
Hospital Anxiety and Depression Scale (HADS)
Inventory to Diagnose Depression (IDD)
Inventory of Depressive Symptomatology (IDS)
IPAT Depression Scale
Manual for the Diagnosis of Major Depression (MDMD)
Minnesota Multiphasic Personality Inventory 2 (MMPI-2) Depression Scale
Montgomery–Åsberg Depression Rating Scale (MADRS)
MOS 8-Item Depression Screener
Multiple Affect Adjective Checklist-Revised (MAACL-R)
Multiscore Depression Inventory for Adolescents and Adults (MDI)
Newcastle Scales
Positive and Negative Affect Scales (PANAS)
Primary Care Evaluation of Mental Disorders (PRIME-MD)
Profile of Mood States (POMS)
Raskin Three-Area Severity of Depression Scale
Revised Hamilton Rating Scale for Depression (RHRSD)
Reynolds Depression Screening Inventory (RDSI)
Rimon’s Brief Depression Scale (RBDS)
State Trait-Depression Adjective Check List (ST-DACL)
Symptom Checklist-90-Revised (SCL-90-R)
Zung Self-Rating Depression Scale (Zung SDS)
Special Population Scales
Aphasic Depression Rating Scale (ADRS)
Calgary Depression Scale for Schizophrenia (CDSS)
Children’s Depression Inventory (CDI)
The Children’s Depression Index (CDI)
Children’s Depression Rating Scale-Revised (CDRS-R)
Cornell Scale for Depression in Dementia (Cornell Scale)
Depression and Anxiety in Youth Scale (DAYS)
Depression Intensity Scale Circles (DISCs)
Depression Rating Scale (DRS)
Geriatric Depression Scale (GDS)
Kiddie-Schedule for Affective Disorders and Schizophrenia for School-Age Children-Present and Lifetime Version (K-SADS-PL)
Medical-Based Emotional Distress Scale (MEDS)
Multiscore Depression Inventory for Children (MDI-C)
Postpartum Depression Interview Schedule (PDIS)
Psychopathology Inventory for Mentally Retarded Adults (PIMRA)
Reynolds Adolescent Depression Scale (RADS)
Reynolds Child Depression Scale (RCDS)
Signs of Depression Scale (SDSS)
Stroke Aphasic Depression Questionnaire (SADQ)
Visual Analog Mood Scales (VAMS)
Youth Depression Adjective Checklist (Y-DACL)
Adapted from Nezu AM, Ronan GF, Meadows EA, eds. Practitioner’s guide to empirically-based measures of depression. Springer, 2007.
The Limitations of Severity Scales
Most mood scales have only an approximate relationship to the ICD and DSM criteria (see Textbox 2.2). None adheres strictly to these algorithmic criteria (including duration and function), and as such they do not produce operational diagnoses. Several early scales were developed to measure severity during treatment (see Sensitivity to Change below).11 Yet the value of a scale does not necessarily correspond to its original or intended use; for example, the EPDS may not be the optimal choice in perinatal settings and yet may be valuable elsewhere. The evaluation and refinement of existing scales is discussed in Chapter 4. It remains a significant limitation that only a small number of well-powered studies have compared multiple scales head to head.12,13 Most of these comparative studies suggest that severity scales provide somewhat distinct estimates of depression diagnosis and severity (a finding confirmed by Rasch analysis).14–16 For example, although all of them measure low mood, not all measure anhedonia, somatic symptoms, anxiety, suicidal ideation, or well-being. Depression scales are predominantly symptom counts over a narrowly defined period. They do not tend to measure chronicity or effect on daily function, so they should not be considered a precise measure of the burden of depression. Nor do they measure met or unmet needs or the desire for help. One fundamental issue is that it is not clear which of the many possible symptoms of depression are most important for diagnosis (see Chapter 1). For example, some symptoms appear more likely than others to be associated with greater severity and pervasiveness of depression.17 If some symptoms are more important than others, should the scale weight items differently? This has been tried, but without good validation and at the cost of significant scale complexity. A second unresolved issue is whether depression differs significantly by setting and by comorbid disease.
If one presupposes that there is one syndrome of depression manifest in all situations (eg, primary care, specialist care) and all medical conditions, then the role of any scale is simply to best identify and quantify these core symptoms. Although the ‘‘one size fits all’’ approach sounds unlikely, it is essentially the approach taken by DSM-IV and ICD-10, which do not attempt to define a syndrome of, say, ‘‘post-stroke depression’’ as opposed to uncomplicated depression in primary care. A number of very specific depression scales have been proposed to elicit special types of mood disorder. Examples are listed in Textbox 2.3 and include the Depression Scale in Schizophrenia (DEPS),18 the Cornell Scale for the Assessment of Depression in Dementia (CSDD),19 the post-stroke depression scale,20 the Stroke Aphasic Depression Questionnaire (SADQ),21 and the Aphasic Depression Rating Scale.22 The scientific basis for and against having special scales for medical settings is discussed in Chapters 10 and 11. The debate usually revolves around whether to keep or omit somatic items (see Appendix 2). A final limitation is the temptation to overrely on scales to improve quality of care. Numerous studies have explored this issue, which is discussed by Gilbody in Chapter 7.
Patient-Rated Versus Clinician-Rated Scales In the case of a mental illness where there is no foolproof gold standard, it is by no means clear whether patient-rated or clinician-rated measures are more useful.23 A list of such scales is shown in Textbox 2.4. Neither patient (self)-rated scales nor clinician-rated scales are inherently more sensitive to change or more accurate.24,25 A self-rated scale has certain benefits over interviewer-rated scales and clinical interviews in large population studies: it takes less time, does not require trained personnel, and its administration and scoring are probably more standardized.26 Clinician-rated scales, for their part, can directly augment a clinical interview, and where training is required, the skills of the clinician may also improve. The major advantage of interviewer-rated scales is that the experience of the interviewer comes into play. Faravelli and coworkers (1986)27 compared the distributions of three doctor-rated scales and three self-rated scales in a series of 100 depressed patients and noted that doctor-rated scales tend to be asymmetric toward the left, while self-rated scales tend to be asymmetric toward the right. This may result from the tendency of patients to judge their own condition as more severe than average, while doctors tend to rate severity as less than average. On the other hand, patients can underreport symptoms in some situations.28 Our advice is to choose the type of scale most suited to the purpose at hand.

Textbox 2.4. Major Clinician vs. Self-Report Scales
Clinician-Rated Protocols
Hamilton Rating Scale for Depression
Inventory of Depressive Symptomatology (IDS-C)
Manual for the Diagnosis of Major Depression
Montgomery–Åsberg Depression Rating Scale
Newcastle Scales
Raskin Three-Area Scale
Rimon's Brief Depression Scale
Self-Report Inventories
Beck Depression Inventory-Second Edition
Carroll Depression Scales-Revised
Center for Epidemiological Studies Depression Scale
Diagnostic Inventory for Depression
Hamilton Depression Inventory
Hopelessness Depression Symptom Questionnaire
Inventory to Diagnose Depression
Inventory of Depressive Symptomatology (IDS-SR)
IPAT Depression Scale
Minnesota Multiphasic Personality Inventory 2 Depression Scale
MOS 8-Item Depression Screener
Multiscore Depression Inventory for Adolescents and Adults
Positive and Negative Affect Scales
Revised Hamilton Rating Scale for Depression: Self-Report
Reynolds Depression Screening Inventory
State Trait-Depression Adjective Check Lists
Zung Self-Rating Depression Scale
Adapted from Nezu AM, Ronan GF, Meadows EA, eds. Practitioner's guide to empirically based measures of depression. Springer, 2007.
Sensitivity to Change In psychiatry the concept of sensitivity to change of mood was first used in psychometric research during the 1970s.29,30 Yet sensitivity to change is a phrase that has been variably defined in the literature and is poorly understood. Most consider sensitivity to change to be the ability of a severity scale to detect small changes in outcomes over time with repeated assessment. A more precise definition of sensitivity to change is the proportion of those who actually changed according to a gold standard (eg, responders) who were correctly identified by the instrument under study (Fig. 2.1). Specificity to change is also a useful concept: the proportion of those who actually did not change (eg, nonresponders) who are correctly identified as such by the instrument. That said, no group has yet documented specificity to change. The HAM-D has been the main comparator in most sensitivity to change papers.31 The HAM-D, MADRS, BDI, and HADS have all been compared head to head, but the results do not demonstrate any consistent superiority of one scale over another. Vermeersch and associates (2004)32 describe five factors that may influence the sensitivity of a scale: inclusion of irrelevant items, categorical items, items not conducive to detecting change, items assessing traits, and items susceptible to floor and ceiling effects. Fundamentally, scales with many items are more likely to be sensitive to subtle changes.
| | Gold standard: change | Gold standard: no change | |
|---|---|---|---|
| Instrument: change | A | B | PPV = A/(A + B) |
| Instrument: no change | C | D | NPV = D/(C + D) |
| Total | Se to change = A/(A + C) | Sp to change = D/(B + D) | |

Figure 2.1. Accuracy of change in 2 × 2 format.
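The four ratios in Figure 2.1 can be computed directly from paired judgments. The sketch below assumes each patient has already been classified twice, once by the gold standard (eg, responder or not) and once by the instrument under study; how "change" is defined on either side is left to the user. The names `ChangeTable` and `tabulate` are hypothetical, and the ten patients in the usage lines are invented purely for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ChangeTable:
    """Cell counts laid out as in Figure 2.1."""
    a: int  # instrument says change, gold standard says change
    b: int  # instrument says change, gold standard says no change
    c: int  # instrument says no change, gold standard says change
    d: int  # instrument says no change, gold standard says no change

    def sensitivity_to_change(self) -> float:
        return self.a / (self.a + self.c)

    def specificity_to_change(self) -> float:
        return self.d / (self.b + self.d)

    def ppv(self) -> float:
        return self.a / (self.a + self.b)

    def npv(self) -> float:
        return self.d / (self.c + self.d)


def tabulate(instrument_changed: Sequence[bool], gold_changed: Sequence[bool]) -> ChangeTable:
    """Cross-tabulate one (instrument, gold standard) judgment pair per patient."""
    if len(instrument_changed) != len(gold_changed):
        raise ValueError("need one pair of judgments per patient")
    a = b = c = d = 0
    for inst, gold in zip(instrument_changed, gold_changed):
        if inst and gold:
            a += 1
        elif inst:
            b += 1
        elif gold:
            c += 1
        else:
            d += 1
    return ChangeTable(a, b, c, d)


# Hypothetical data: 4 true responders and 6 true nonresponders.
gold = [True, True, True, True, False, False, False, False, False, False]
inst = [True, True, True, False, True, False, False, False, False, False]
table = tabulate(inst, gold)
print(round(table.sensitivity_to_change(), 2))  # 0.75
print(round(table.specificity_to_change(), 2))  # 0.83
```

Keeping the counts in one small object makes it easy to report all four ratios together, which matters because, as noted above, specificity to change is rarely reported alongside sensitivity.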
2. The Classic Severity Scales (1960–1980)
Hamilton Rating Scale for Depression (HAM-D)33 In 1953 Max Hamilton moved to Leeds, where he developed one of the best-known scales in psychiatry.34 The original HAM-D was developed to quantify severity after an interview had established a diagnosis of depression. Despite its age, the HAM-D remains the most commonly used scale in treatment studies, helped by the fact that it is in the public domain.34 Indeed, it may have been a victim of its own success, as independent groups have produced as many as 20 conflicting variations.35 The HAM-D is rather unusual in that it is designed to be administered by a trained clinician on the basis of the patient's clinical condition at the time of the interview. It requires a rather long semi-structured interview, taking 15 to 20 minutes, and is therefore probably not a good choice for screening in busy clinical settings. It was developed before DSM criteria were established for depression and differs significantly from the DSM approach, assessing only four of the nine DSM-IV criteria. It may favor somatic presentations, as eight items relate to six somatic symptoms: insomnia, psychomotor retardation, loss of appetite, loss of energy, loss of weight, and loss of libido. Other criticisms include the lack of a single unifying structure, differential item weighting, and limited interrater reliability (although this can be improved).36,37 In the past 5 years several shortened versions of the HAM-D have appeared, including a seven-item version and a six-item version.38–40 Using Rasch analysis, Bech and coworkers (1981)41,42 confirmed that six items associated with unidimensionality could be combined: depressed mood, guilt, work and interests, psychomotor retardation, psychic anxiety, and general somatic symptoms. Several versions provide standardized explicit scoring conventions and/or structured interview guidance.43
Montgomery–Åsberg Depression Rating Scale (MADRS)44 Montgomery and Åsberg45 published this 10-item scale in 1979, following earlier development of the Comprehensive Psychopathological Rating Scale (CPRS).46 Ratings of patients on the 65-item CPRS were used to identify the 17 most common symptoms in depression, which were field-tested in four antidepressant trials and then refined to the 10 items suggested to show the largest changes with treatment. However, it is a mistake to assume that the MADRS is necessarily the most sensitive to change (see above); indeed, a meta-analysis showed that the HAM-D has superior sensitivity to change.47 Like the HAM-D, the MADRS is a clinician-rated scale designed for a trained interviewer, although a self-rating form was later developed. It covers the clinical condition at the time of the interview and does not specify a timeframe over which the patient should be rated. The 10-item checklist consists of 1 observational item and 9 question items, which require about 15 minutes of additional interview time. The items are apparent sadness, reported sadness, inner tension, reduced sleep, reduced appetite, concentration difficulties, lassitude, inability to feel, pessimistic thoughts, and suicidal thoughts. These items cover all the DSM-IV criteria for major depression except psychomotor retardation or agitation.
Beck Depression Inventory (BDI)48 The original version of this scale was developed by Aaron Beck and colleagues at the University of Pennsylvania and first published in 1961.49 It can be administered by a trained professional or self-administered, and in its current form (BDI-II) it covers an explicit 2 weeks before the evaluation. The 21-item version requires 5 to 10 minutes. Each item is scored on a consistent scale of 0 to 3, with options presented in a multiple-choice format. A reading age of about 10 years is required for self-administration. The original publication mentions no timeframe; the BDI-IA revision specified 1 week, and the BDI-II extended this to 2 weeks to follow the DSM criteria for MDD more closely. The BDI-II (1996) also replaced body image change, weight loss, somatic preoccupation, and work difficulty with agitation, worthlessness, concentration difficulty, and loss of energy. The scale is considered to emphasize psychological items. In fact, there are eight "cognitive items" (pessimism, past failures, guilty feelings, punishment feelings, self-dislike, self-criticalness, suicidal thoughts or wishes, and worthlessness) and nine "somatic items" (crying, agitation, indecisiveness, loss of energy, change in sleep patterns, change in appetite, concentration difficulties, tiredness and/or fatigue, and loss of interest in sex). The remaining items are sadness, loss of pleasure, loss of interest, and irritability. The cognitive and somatic items, when considered as subscales, are typically moderately correlated.50 Recently Beck and associates developed the Beck Depression Inventory Fast Screen (BDI-FS) to address possible somatic contamination.51 It contains 7 of the 21 BDI-II items, assessing cognitive and affective aspects of depression in line with DSM-IV diagnostic criteria. It was developed to permit more rapid detection of depression in primary care and hospital settings. The original validation data were derived from two samples: 500 patients from four psychiatric outpatient facilities and 120 college students. Rasch analysis of the BDI has been reported.52 In one study, the BDI was administered to 660 adult patients with unipolar depression and examined using factor analysis; the BDI was internally consistent yet distinct in its severity ratings from the MADRS.53
The Zung Self-Rating Depression Scale (SDS)54 The Zung SDS is a 20-item scale in its original form that takes about 5 to 8 minutes to administer.55 It is the prototypical self-report depression scale. Of the 20 items, half are worded positively ("I feel hopeful about the future") and half negatively ("I feel downhearted and blue"). Each item is consistently rated on a 4-point Likert scale (a little of the time = 1; some of the time = 2; a good part of the time = 3; most of the time = 4). A meta-analysis summarized validity studies up to 1986.56 A large factor analysis in over 1,000 cancer patients showed a four-factor solution: a cognitive symptom factor, a depressed mood factor, and two somatic factors (eating-related and non–eating-related), accounting for 20%, 13%, 7%, and 8% of the variance on the Zung, respectively.57 Rasch analysis of the Zung SDS has been performed.58 Several short forms have been developed, including a 12-item,59 an 11-item,60 and a 10-item version.61
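Because half the Zung items are positively worded, a raw score only makes sense once those items are reverse-scored. A minimal sketch, assuming responses are already coded 1 to 4 as above: the helper name `zung_raw_score` is hypothetical, and the set of positively worded item numbers is not given in the text, so it is a parameter whose default reflects a commonly listed keying that should be checked against the published form before use.

```python
# ASSUMPTION: commonly listed positively worded Zung items; verify against the form in use.
POSITIVE_ITEMS = frozenset({2, 5, 6, 11, 12, 14, 16, 17, 18, 20})


def zung_raw_score(ratings: dict[int, int], positive_items=POSITIVE_ITEMS) -> int:
    """Sum 20 item ratings (each 1-4), reverse-scoring positively worded items.

    Returns a raw score from 20 to 80; higher means more depressive symptoms.
    """
    if set(ratings) != set(range(1, 21)):
        raise ValueError("expected ratings for items 1-20")
    total = 0
    for item, rating in ratings.items():
        if rating not in (1, 2, 3, 4):
            raise ValueError(f"item {item}: rating must be 1-4")
        # On a positively worded item, "most of the time" signals health,
        # so the rating is flipped: 1<->4 and 2<->3.
        total += (5 - rating) if item in positive_items else rating
    return total


# The most symptomatic possible pattern: every answer points toward depression.
worst = {i: (1 if i in POSITIVE_ITEMS else 4) for i in range(1, 21)}
print(zung_raw_score(worst))  # 80
```

Note that a respondent who ticks the same column for every item lands in the middle of the range, because the reversed items cancel out; this balanced wording is one rationale for mixing positive and negative items.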
The Center for Epidemiologic Studies Depression Scale (CES-D)62 This 20-item scale was originally developed from existing scales such as the BDI and Zung SDS as a screening instrument for community-based studies.63 It was designed at the U.S. National Institute of Mental Health (NIMH) with government rather than university funding. It bridged epidemiologic and clinical needs, was first used in an epidemiologic study of Kansas City,64 and became the most used depression scale in the 1990s. It includes items concerning low mood and loss of interest but not suicidal ideation. The original psychometric properties were based on three community samples and two psychiatric patient samples, comprising about 5,000 healthy individuals but only 70 adult psychiatric patients. Four of the 20 items are positively worded and reverse scored (negatively keyed). The CES-D is designed for self-completion, telephone administration, or web-based administration. The approach is mostly psychological, with some somatic items. The CES-D has four separate factors: low mood, somatic symptoms, positive affect, and interpersonal relations. A revised version, the CESD-R, has been published and is more in line with DSM. There are a variety of short forms, most notably several 10-item versions and a 5-item version.65 Recently, Rasch-modeled short forms have been reported in a general population,66 and a second model has been applied to a depressed population.67
3. The New Severity Scales (1981–2008)
Hospital Anxiety and Depression Scale (HADS)68 The HADS can be considered the first of a new generation of scales that were shorter, easier to score, and no less accurate than the first generation. It is a relatively brief self-administered rating scale of symptoms and functioning. Anxiety and depression are assessed as separate components, each with seven items rated from 0 (no problem) to 3. A cut-off of 7/8 on each subscale (ie, a score of 8 or more counts as positive) is usually recommended, although others have been used.69 Although the two component scores have often been added together to give a composite anxiety–depression (or emotional distress) score, this is not recommended by the authors. It is a fairly simple scale that does not include somatic and cognitive signs of depression. One limitation is that seven of the nine DSM criteria are not covered; another is that the reverse rating of some items, together with the mixed ordering of depression and anxiety questions, can cause confusion. The HADS excludes reduced appetite, weight loss, sleep disturbance, fatigue, and concentration difficulties, and also excludes guilt, worthlessness, and suicidality. Notably, it does not include a question on low mood per se. These choices may or may not be advantageous in general hospital and primary care settings (see Chapters 10 and 11 for discussion). Despite these limitations, the HADS has found an important place and has been used in impressive studies involving thousands of patients (Fig. 2.2).70–72 Good data are also available on values in nonclinical populations.73

[Figure 2.2: number of attendees (0 to 3,000) at each HADS-D score (zero to eighteen).]
Figure 2.2. Distribution of HADS-D scores in 18,414 primary care attendees. Adapted from Thompson C, Ostler K, Peveler RC, et al. Dimensional perspective on the recognition of depressive symptoms in primary care: The Hampshire Depression Project 3. Br J Psychiatry. 2001;179:317–323.
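A minimal sketch of HADS scoring that follows the authors' advice to keep the two subscales separate: it returns anxiety and depression totals but deliberately offers no combined score. It assumes ratings have already been converted to 0–3 symptom scores (so reverse-worded items are flipped beforehand), and because anxiety and depression items are mixed on the form, the item-to-subscale mapping is a parameter; the even-numbered mapping used in the example is hypothetical and stands in for the published key. The severity labels reuse the HADS-D bands from Table 2.1, and the function names are illustrative.

```python
# HADS-D bands from Table 2.1: (inclusive upper bound, label).
HADS_D_BANDS = ((7, "no depression"), (10, "mild"), (14, "moderate"), (21, "severe"))
CASENESS_CUTOFF = 8  # the usual 7/8 cut-off: 8 or more counts as positive


def hads_subscales(ratings: dict[int, int], depression_items: set[int]) -> tuple[int, int]:
    """Return (anxiety, depression) totals, each 0-21, from 14 ratings keyed 0-3."""
    if set(ratings) != set(range(1, 15)):
        raise ValueError("expected ratings for items 1-14")
    if len(depression_items) != 7 or not depression_items.issubset(ratings):
        raise ValueError("depression_items must name 7 of the 14 items")
    if any(r not in (0, 1, 2, 3) for r in ratings.values()):
        raise ValueError("each rating must already be keyed 0-3")
    depression = sum(r for item, r in ratings.items() if item in depression_items)
    anxiety = sum(ratings.values()) - depression
    return anxiety, depression


def hads_d_band(score: int) -> str:
    """Map a HADS-D total to the Table 2.1 severity label."""
    for upper, label in HADS_D_BANDS:
        if 0 <= score <= upper:
            return label
    raise ValueError("HADS-D scores run from 0 to 21")


# HYPOTHETICAL mapping for illustration only; use the published scoring key.
EXAMPLE_DEPRESSION_ITEMS = {2, 4, 6, 8, 10, 12, 14}
ratings = {i: (2 if i % 2 == 0 else 1) for i in range(1, 15)}
anxiety, depression = hads_subscales(ratings, EXAMPLE_DEPRESSION_ITEMS)
print(anxiety, depression, hads_d_band(depression))  # 7 14 moderate
```

Returning the two totals as a pair, rather than a single number, makes it harder for downstream code to add them into the composite distress score the authors advise against.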
Geriatric Depression Scale (GDS)74 In its original form the GDS consists of a simple list of 30 questions, all of which require a "yes" or "no" answer.75 However, a 15-item version is very commonly used. Ten of the items on the GDS-30 and five of the items on the GDS-15 are negatively keyed (ie, a "no" response is an endorsement of a depressive symptom). The GDS is a self-report instrument, and a telephone version has demonstrated good agreement with the self-report questionnaire. The GDS focuses on the psychological symptoms of depression, particularly changes in mood and thoughts. Few somatic items are included; specifically, sleep, appetite, gastrointestinal symptoms, autonomic symptoms, and sexual symptoms are not assessed. The GDS-30 covers five of the DSM-IV criteria using differing terminology (lowered mood, loss of interest, loss of energy, impaired concentration, and restlessness), and the GDS-15 covers three (lowered mood, loss of interest, and loss of energy). Questions about suicidal ideation were intentionally not included, and the yes/no scoring of items makes the GDS a poor choice for rating the burden or severity of depression. Rasch analysis of the GDS has been reported.76 In one study of 526 people over 65 receiving home care, the optimal cutoff on the GDS-15 was 5, which yielded a sensitivity of 71.8% and a specificity of 78.2%.77 A systematic review of the GDS found 42 studies, with a mean sensitivity of 0.753 and specificity of 0.770 for the GDS-30 and a sensitivity of 0.805 and specificity of 0.750 for the GDS-15.71 GDS versions showed significantly better validity indices than the "Yale-1-question" screen but were similar to the CES-D. Briefer 10-item, 5-item, and 4-item versions and even a 1-item version have been developed, but their value is currently uncertain.
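Scoring the GDS-15 is a count of depressive answers, where a "no" counts on the five negatively keyed items and a "yes" counts on the rest. A minimal sketch follows. The identity of the five keyed items is not given in the text, so it is a parameter whose default is a commonly listed keying to be verified against the form in use. The reading of "a cutoff of 5" as "5 or more is positive" is also an assumption. Both function names are hypothetical.

```python
# ASSUMPTION: commonly listed negatively keyed GDS-15 items; verify against the form in use.
NEGATIVELY_KEYED = frozenset({1, 5, 7, 11, 13})


def gds15_score(answers: dict[int, bool], negatively_keyed=NEGATIVELY_KEYED) -> int:
    """Count depressive answers (0-15); answers[item] is True for "yes"."""
    if set(answers) != set(range(1, 16)):
        raise ValueError("expected answers for items 1-15")
    if len(negatively_keyed) != 5:
        raise ValueError("the GDS-15 has five negatively keyed items")
    # A "yes" counts unless the item is negatively keyed, in which case a "no" counts.
    return sum(yes != (item in negatively_keyed) for item, yes in answers.items())


def gds15_positive(score: int, cutoff: int = 5) -> bool:
    """ASSUMPTION: a score at or above the cutoff is a positive screen."""
    return score >= cutoff


all_no = {i: False for i in range(1, 16)}
score = gds15_score(all_no)
print(score, gds15_positive(score))  # 5 True
```

Because the answers are binary and the count caps at 15, the result is a screening tally rather than a graded severity measure, consistent with the limitation noted above.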
The Edinburgh Postnatal Depression Scale (EPDS)78 Cox and colleagues developed this scale after noting that some women endorse somatic items on existing scales because of the physiologic changes of childbearing and because of normal postnatal sleep disturbance.79,80 The authors used clinical intuition to identify possible items from questionnaires such as the SAD and HAD scales and the BDI. Thirty items were initially tested, and the 13 items thought most likely to detect mothers with clinical depression were then tested on a sample of 60 postnatal women against the Clinical Interview Schedule. After factor analysis this was shortened to the final 10-item scale. Interestingly, the EPDS contains no specific item about mother–baby interaction or about irritability, which has allowed its use to expand beyond perinatal settings. Its appeal is enhanced by its simple Likert scoring, from 0 (symptom absent) to 3 (marked presence or change from usual state). It incorporates anxiety but not suicidality. Studies suggest that the EPDS includes three factors expressing euthymic mood, anxiety, and depression. Anxiety (items 3, 4, 5, 6, and 7), depression (items 8, 9, and 10), and anhedonia (items 1 and 2) are the main components of the questionnaire, accounting for 63% of the variance.81 A short five-item version of the EPDS was developed by using stepwise multiple regression to find the combination of items that explains the maximum proportion of the variance of the full-scale sum score in 2,730 women. The selected EPDS items were then correlated with the Hopkins Symptom Checklist (HSCL-25)82 for external validation. The five items were "I have felt sad or miserable," "I have been anxious or worried for no good reason," "I have been so unhappy that I have had difficulty sleeping," "I have blamed myself unnecessarily when things went wrong," and "I have looked forward with enjoyment to things." Rasch analysis of the EPDS suggested that a revised eight-item version (EPDS-8) might provide a more psychometrically robust scale.83 Recent mandated screening programs in Australia and the United States have recommended routine administration of the EPDS, although National Institute for Health and Clinical Excellence (NICE) guidance in the United Kingdom does not.
MOS 8-Item Depression Screener (Burnam Screen)84 This short tool was developed for use in the Medical Outcomes Study (MOS).85 It was essentially an adaptation of the CES-D, although two items related to the duration of symptoms (required for DSM diagnosis or caseness) were drawn from the DIS. The tool has only eight items, although items 7 and 8 are rather unwieldy single questions:
1. I felt depressed.
2. My sleep was restless.
3. I enjoyed life.
4. I had crying spells.
5. I felt sad.
6. I felt that people disliked me.
7. In the past year, have you had 2 weeks or more that you felt sad, blue, depressed, or lost pleasure in things that you usually cared about or enjoyed?
8. Have you had 2 years or more in your life when you felt depressed or sad most days, even if you felt okay sometimes? (If yes:) Have you felt depressed or sad much of the time in the past year?
Validation data were provided by two samples: 3,132 adults in the Los Angeles sample of the Epidemiologic Catchment Area (ECA) study and 525 adults from the Psychiatric Screening Questionnaire for Primary Care Patients (PSP) study. One limitation is that a complex scoring algorithm has been suggested. Additionally, in comparison with the Structured Clinical Interview for DSM-IV, the screen had a low positive predictive value (Tuunainen et al., 2001).86
The Patient Health Questionnaire (PHQ)87 The PHQ is the self-administered version of the Primary Care Evaluation of Mental Disorders (PRIME-MD) instrument, which was designed to diagnose specific disorders in primary care settings using DSM criteria.88 The full PRIME-MD has two components: a 1-page patient questionnaire (PQ) and a 12-page clinician evaluation guide (CEG). The PQ, which is completed by the patient before seeing the primary care physician (PCP), consists of 26 yes/no questions about symptoms present during the past month. The focus is on a depressive episode (the SCID focuses on depressive disorder). The depression module comprises nine questions (PHQ-9). The first two questions (known as the PHQ-2), which refer to the "cardinal" symptoms of anhedonia and depressed mood, can be administered separately as a screening tool. Each item rates how often the symptom has been present, from 0 (not at all) to 3 (nearly every day). When the items are summed linearly, a cutoff of 10 is suggested to represent mild depression. Alternatively, individual items can be combined according to a DSM-IV algorithm to generate a diagnosis of major or minor depression. The DSM-IV exclusion criteria for a depressive disorder are not included in the PHQ-9; therefore, a PHQ-9 diagnosis closely approximates but is not identical to a DSM-IV diagnosis. The PHQ-9 was validated in 6,000 patients in eight primary care clinics and seven obstetrics-gynecology clinics.89 The short version of the PHQ is almost as well known as the long version. The PHQ-2 is a two-item screen that uses the first two PHQ items, which ask about the frequency of loss of interest (question 1) and depressed mood (question 2) over the past 2 weeks, scoring each from 0 ("not at all") to 3 ("nearly every day"). A score of three points or more on this version of the PHQ-2 is sometimes recommended.81 An even simpler version calls for "yes" or "no" responses, with a "yes" to either question constituting a positive screen. The questions are: "Over the past month, have you often had little interest or pleasure in doing things?" (Yes/No) and "Over the past month, have you often been bothered by feeling down, depressed, or hopeless?" (Yes/No). A two-stage screen using the PHQ-2 and then the PHQ-9 has been investigated and is probably more efficient than either test alone. However, when the questionnaire is given by pen and paper, the time taken to check whether the PHQ-2 is positive may limit the efficiency saving.
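The two ways of reading the PHQ-9 described above, a linear total and a DSM-IV-style symptom count, can be sketched side by side, together with the PHQ-2 screen. The symptom-count rule used here is an assumption: it follows the PHQ authors' scoring guidance as it is commonly described. Under that rule, an item counts when rated at least "more than half the days" (2), except item 9, which counts if present at all, and at least one counted item must be item 1 or 2. Check it against the official instructions before relying on it. As the text notes, no DSM-IV exclusion criteria are applied. The function name `phq9_summary` is hypothetical.

```python
def phq9_summary(ratings: list[int]) -> dict:
    """Summarize nine PHQ-9 ratings (each 0-3) given in questionnaire order."""
    if len(ratings) != 9 or any(r not in (0, 1, 2, 3) for r in ratings):
        raise ValueError("the PHQ-9 needs nine ratings, each 0-3")
    total = sum(ratings)
    phq2 = ratings[0] + ratings[1]  # interest (item 1) + depressed mood (item 2)

    # ASSUMPTION: items 1-8 count at >= 2; item 9 (thoughts of death or
    # self-harm) counts at >= 1.
    counted = [r >= 2 for r in ratings[:8]] + [ratings[8] >= 1]
    n_symptoms = sum(counted)
    has_core = counted[0] or counted[1]
    if has_core and n_symptoms >= 5:
        syndrome = "major depressive syndrome"
    elif has_core and n_symptoms >= 2:
        syndrome = "other depressive syndrome"
    else:
        syndrome = "none"

    return {
        "total": total,                       # 0-27, read linearly
        "total_at_or_above_10": total >= 10,  # cutoff quoted in the text
        "phq2": phq2,
        "phq2_positive": phq2 >= 3,           # sometimes-recommended PHQ-2 cutoff
        "algorithm": syndrome,
    }


print(phq9_summary([2, 2, 2, 2, 2, 0, 0, 0, 0])["algorithm"])  # major depressive syndrome
```

A two-stage workflow would call this only for respondents whose first two answers already give `phq2 >= 3`; the example shows why the two readings can disagree, since a total of exactly 10 here also meets the symptom-count rule, whereas a spread of low ratings can reach a similar total without doing so.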
Major Depression Inventory (MDI)90 This self-rated questionnaire aims to help make a diagnosis of major depression according to either the DSM-IV or the ICD-10 criteria.91 It covers the previous 2 weeks and requires 5 to 10 minutes. It has 10 questions, although items 8 and 10 each have two subitems (a and b), so it can be considered a 12-item scale. Every item is rated from 0 (at no time) to 5 (all of the time), giving a total score from 0 to 50. An answer of "more than half of the time" to at least 5 of the 10 questions is indicative of major depression. A score of 4 or more on an item (ie, most of the time) counts toward the ICD-10 or DSM-IV algorithm. The ICD-10 algorithm requires a score of 4 or 5 on two of the three top items and on at least four of the remaining items. The DSM-IV algorithm requires a score of 4 or 5 on five of the nine items (item 4 being excluded), at least one of which must be either depressed mood or loss of interest. Few validation studies or translations of the MDI exist.92 A comparative study of the SDS and MDI in 89 patients with Parkinson's disease suggested that the MDI is superior to the SDS.93 The largest study compared the MDI in 1,093 persons also interviewed by psychiatrists using SCAN. With major depression according to SCAN as the index of validity, the specificity of the MDI was 0.22, the sensitivity 0.67, and kappa 0.25; with all depressive disorders as the index, the specificity was 0.44, the sensitivity 0.51, and kappa 0.33. More highly educated persons and those with reported disability were less likely to be false negatives.94
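The two MDI algorithms described above can be written as simple counting rules, which makes it easy to see that they can disagree for the same respondent. This is a sketch of the rules exactly as stated in the text, under three labeled assumptions. First, the "three top items" are items 1 to 3. Second, "depressed mood or loss of interest" are items 1 and 2. Third, items 8 and 10 take the higher of their two subitems. The last assumption is what keeps the 10-item total within 0 to 50. All function names are hypothetical.

```python
SUBITEM_KEYS = ("1", "2", "3", "4", "5", "6", "7", "8a", "8b", "9", "10a", "10b")
THRESHOLD = 4  # 4 = most of the time, 5 = all of the time


def collapse_subitems(raw: dict[str, int]) -> dict[int, int]:
    """Reduce the 12 MDI entries to 10 item scores (each 0-5).

    ASSUMPTION: items 8 and 10 take the higher of their a/b subitems.
    """
    if sorted(raw) != sorted(SUBITEM_KEYS):
        raise ValueError("expected the 12 MDI entries")
    if any(not 0 <= v <= 5 for v in raw.values()):
        raise ValueError("ratings run from 0 to 5")
    items = {int(key): value for key, value in raw.items() if key.isdigit()}
    items[8] = max(raw["8a"], raw["8b"])
    items[10] = max(raw["10a"], raw["10b"])
    return items


def meets_icd10(items: dict[int, int]) -> bool:
    """Two of the top three items, plus at least four of items 4-10, at 4 or 5."""
    top = sum(items[i] >= THRESHOLD for i in (1, 2, 3))
    rest = sum(items[i] >= THRESHOLD for i in range(4, 11))
    return top >= 2 and rest >= 4


def meets_dsm_iv(items: dict[int, int]) -> bool:
    """Five of nine items (item 4 excluded) at 4 or 5, including item 1 or 2."""
    hits = [i for i in range(1, 11) if i != 4 and items[i] >= THRESHOLD]
    return len(hits) >= 5 and (1 in hits or 2 in hits)


raw = {"1": 5, "2": 0, "3": 4, "4": 0, "5": 4, "6": 0, "7": 4,
       "8a": 1, "8b": 4, "9": 0, "10a": 0, "10b": 0}
items = collapse_subitems(raw)
print(sum(items.values()), meets_icd10(items), meets_dsm_iv(items))  # 21 False True
```

In the example, the respondent meets the DSM-IV rule but not the ICD-10 rule, because only three of items 4 to 10 reach the threshold.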
4. The Future of Screening Scales
The ideal scale is one that is very brief, highly acceptable, and very accurate when tested against an accepted reference standard. It may also be an advantage if it follows current diagnostic rules from ICD or DSM, is freely available, and is long enough to gauge severity and measure change. It is unclear whether one scale can fulfill all these purposes, but there is a trend to develop ever-shorter scales that attempt to retain high accuracy. All scales must balance acceptability against accuracy.
Improving Acceptability Following on from the originals, ever-shorter versions of every major scale have been released, usually comprising 10 items or fewer (Textbox 2.5). A good example is the 8-item Even Briefer Assessment Scale for Depression (EBAS DEP), derived from the 21-item Brief Assessment Scale.95 Of course, eight items might not be short enough for many settings, and in the extreme case single-item methods (applied by pen and paper, verbally, or in visual analog form) have been evaluated. The first "ultra-short" scales began to appear in the 1970s with early visual analog methods of rating mood.96 Just how good are these short and ultra-short scales?97 Whooley and colleagues (1997)98 compared the CES-D (20- and 10-item versions), the BDI (21- and 13-item versions), the Symptom-Driven Diagnostic System for Primary Care (SDDS-PC), and the MOS-8 against the Quick Diagnostic Interview Schedule for major depression. Using summary statistics (Table 2.3), the optimal tests appear to be MOS-8 > CES-D20 > CES-D10 > BDI-21 > BDI-13 > SDDS-PC, with the least accurate method being the PHQ-2. Even so, the PHQ-2 was good at excluding nondepressed cases, with a high negative predictive value. However, this finding does not allow for test efficiency, that is, correcting for the length of the scale. Such weighting requires an economic evaluation, and such studies are in progress. This finding has since been extended, showing that even single-item mood scales can be valuable, albeit as a form of rule-out (reassurance) for those who answer negatively.

Textbox 2.5. Short Versions of Rating Scales (10 items or fewer)
Ten items: EPDS-10 (original), SDS-10, CES-D 10, DEPS-10, MADRS-10 (original)
Nine items: PHQ-9, HDI-Short Form
Eight items: MOS-8, EPDS-8, PHQ-8, EBAS-Dep
Seven items: HADS-Depression, HADS-Anxiety, HAM-D-7, BDI-7, DADS-7, EPDS-7 (depression items)
Six items: EPDS-6, HAM-D-6, CES-D-6
Five items: EPDS-5, WHO-5, GDS-5, Emotion Thermometers
Four items: GDS-4
Three items: PHQ-2 + help question, EPDS-3 (anxiety items)
Two items: PHQ-2, Whooley/NICE 2 Questions, BDI-2, EPDS-2
One item: PHQ Q1, PHQ Q2, GDS-1, Distress Thermometer

Short methods improve acceptability, but there may be other techniques to improve uptake. A tool can be administered in the waiting room or by mail. Increasingly, questionnaires are becoming computerized and can be given using a Palm Pilot or tablet or over the Internet (this is discussed further in Chapter 8). The format of a questionnaire can be influential. For example, a single-item visual analog item takes no more time than a verbal item but can quantify a symptom. The seven-item version of the emotion thermometers tested in cancer and cardiovascular settings is shown in Appendix Figure 5.

Table 2.1. Conventional Cutoff Scores for Different Severities of Depression
| Scale | Abbreviation | No depression (asymptomatic and subsyndromal) | Mild | Moderate | Severe depression |
|---|---|---|---|---|---|
| Hamilton Depression Scale | HAM-D | 0 to 7 | 8 to 13 | 14 to 18 | 19 to 63 |
| Beck Depression Inventory | BDI | 0 to 9 | 10 to 16 | 17 to 29 | 30 to 63 |
| Beck Depression Inventory II | BDI-II | 0 to 13 | 14 to 19 | 20 to 28 | 29 to 63 |
| Geriatric Depression Scale (original) | GDS-30 | 0 to 9 | 10 to 19 | 20 to 30 | 20 to 30 |
| Zung Self-Rating Depression Scale | SDS | 0 to 49 | 50 to 59 | 60 to 69 | 70 to 80 |
| Hospital Anxiety and Depression Scale | HADS-D | 0 to 7 | 8 to 10 | 11 to 14 | 15 to 21 |
| Montgomery–Åsberg Depression Rating Scale | MADRS | 0 to 6 | 7 to 19 | 20 to 34 | 35 to 60 |
| Center for Epidemiologic Studies Depression Scale | CESD | 0 to 15 | 16 to 20 | 21 to 26 | 27 to 60 |
| Edinburgh Postnatal Depression Scale | EPDS | 0 to 9 | 9 to 12 | 13 to 30 | 13 to 30 |
| Patient Health Questionnaire | PHQ-9 | 0 to 5 | 6 to 9 | 10 to 19 | 20 to 27 |
| Patient Health Questionnaire (remapped to DSM-IV) | PHQ-9 | 0 to 9 | 10 to 16 | 17 to 22 | 23 to 27 |
| Major Depression Inventory | MDI | 0 to 13 | 14 to 19 | 20 to 26 | 27 to 50 |

Table 2.2. Summary of Scale Properties
| Year | Scale | Abbreviation | Original items | Max score | Rater | Copyright | Duration | Time frame | Cites per year | Suicidality included? | Somatic bias (most to least) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1960 | Hamilton Depression Scale | HAM-D | 21 | 63 | Clinician | Public domain | 15 min | Past week | 237 | Yes | #1 |
| 1961 | Beck Depression Inventory | BDI | 21 | 63 | Patient | Harcourt Assessment | 10 min | Past few days (BDI); last 2 weeks (BDI-II) | 225 | Yes | #6 |
| 1965 | Zung Self-Rating Depression Scale | SDS | 20 | 80 | Patient | Public domain | 5–8 min | Past several days | 84 | Yes | #5 |
| 1977 | Center for Epidemiologic Studies Depression Scale | CESD | 20 | 60 | Patient | Public domain | 4–5 min | Past week | 256 | No | #7 |
| 1979 | Montgomery–Åsberg Depression Rating Scale | MADRS | 10 | 60 | Observer | Copyright | 10 min | Current | 107 | Yes | #4 |
| 1982 | Geriatric Depression Scale (original) | GDS-30 | 30 | 30 | Patient | Public domain | 10 min | Past week | 94 | No | #10 |
| 1983 | Hospital Anxiety and Depression Scale | HADS | 14 | 42 | Patient | NFER-Nelson | 5 min | Past week | 195 | No | #6 |
| 1986 | Geriatric Depression Scale (modified) | GDS-15 | 15 | 15 | Patient | Public domain | 5 min | Past week | 31 | No | #8 |
| 1987 | Edinburgh Postnatal Depression Scale | EPDS | 10 | 30 | Patient | Copyright | 2–4 min | Past week | 50 | No | #11 |
| 1988 | MOS-8 Burnam Screen | MOS-8 | 8 | 20 | Patient | RAND Corporation | 2–5 min | 2 weeks and 2 years | 12 | No | #9 |
| 2001 | Patient Health Questionnaire | PHQ | 9 | 27 | Patient | Public domain | 3–5 min | 2 weeks | 53 | Yes | #2 |
| 2001 | Major Depression Inventory | MDI | 10 | 60 | Patient | Elsevier | 1–2 min | 2 weeks | 7 | Yes | #3 |

Table 2.3. Accuracy of Various Depression Scales Head to Head
| Questionnaire | Sensitivity | Specificity | PPV | NPV | PSI | Youden | FC (%) | AUC |
|---|---|---|---|---|---|---|---|---|
| PHQ2 | 0.96 | 0.57 | 0.33 | 0.98 | 0.31 | 0.53 | 63.99 | 0.82 |
| SDDS-PC | 0.96 | 0.51 | 0.30 | 0.98 | 0.28 | 0.47 | 59.14 | 0.86 |
| MOS-8 | 0.93 | 0.72 | 0.42 | 0.98 | 0.40 | 0.65 | 75.75 | 0.89 |
| CESD20 | 0.93 | 0.69 | 0.40 | 0.98 | 0.38 | 0.62 | 73.32 | 0.89 |
| CESD10 | 0.90 | 0.72 | 0.41 | 0.97 | 0.38 | 0.62 | 75.19 | 0.87 |
| BDI21 | 0.89 | 0.64 | 0.35 | 0.96 | 0.31 | 0.53 | 68.47 | 0.87 |
| BDI13 | 0.92 | 0.61 | 0.34 | 0.97 | 0.31 | 0.53 | 66.42 | 0.86 |
PSI, predictive summary index; PPV, positive predictive value; NPV, negative predictive value; FC, fraction correct; AUC, area under the curve. Data from Whooley MA, Avins AL, Miranda J, et al. Case-finding instruments for depression. J Gen Intern Med. 1997;12(7):439.
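Two of the summary columns in Table 2.3 are simple functions of the others: the Youden index is sensitivity + specificity − 1, and the predictive summary index (PSI) is PPV + NPV − 1. The sketch below transcribes the table's rows and recomputes both, which is a quick consistency check on the published figures. The helper names are illustrative. Ranking is done on the values rounded to two decimals so that tied rows keep their listed order.

```python
# Rows transcribed from Table 2.3 (Whooley et al., 1997):
# (sensitivity, specificity, PPV, NPV, published Youden, published PSI)
TABLE_2_3 = {
    "PHQ2":    (0.96, 0.57, 0.33, 0.98, 0.53, 0.31),
    "SDDS-PC": (0.96, 0.51, 0.30, 0.98, 0.47, 0.28),
    "MOS-8":   (0.93, 0.72, 0.42, 0.98, 0.65, 0.40),
    "CESD20":  (0.93, 0.69, 0.40, 0.98, 0.62, 0.38),
    "CESD10":  (0.90, 0.72, 0.41, 0.97, 0.62, 0.38),
    "BDI21":   (0.89, 0.64, 0.35, 0.96, 0.53, 0.31),
    "BDI13":   (0.92, 0.61, 0.34, 0.97, 0.53, 0.31),
}


def youden_index(sensitivity: float, specificity: float) -> float:
    """J = sensitivity + specificity - 1 (0 = no better than chance)."""
    return sensitivity + specificity - 1


def predictive_summary_index(ppv: float, npv: float) -> float:
    """PSI = PPV + NPV - 1."""
    return ppv + npv - 1


def ranked_by_youden(table: dict) -> list[str]:
    # sorted() is stable, so rows tied at two decimals stay in table order.
    return sorted(table, key=lambda name: -round(youden_index(*table[name][:2]), 2))


for name, (se, sp, ppv, npv, j_pub, psi_pub) in TABLE_2_3.items():
    j = youden_index(se, sp)
    psi = predictive_summary_index(ppv, npv)
    print(f"{name:8s} Youden {j:.2f} (table {j_pub:.2f})  PSI {psi:.2f} (table {psi_pub:.2f})")

print(ranked_by_youden(TABLE_2_3))
```

Ranking by the Youden index alone places MOS-8 first and SDDS-PC last; other columns, such as FC and AUC, can order the middle rows differently, which is why the choice of summary statistic matters when comparing short scales.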
Improving Accuracy Algorithmm Approaches In clinical practice, prevalence is typically low (between 10% and 30%), and therefore a high negative predictive value is relatively easy to achieve but a high positive predictive value is difficult. For example, if one applied a screening test with 80% sensitivity and specificity to a sample of 1,000 individuals with a 20% rate of depression, the positive predictive value would be 0.50 and the negative predictive value 0.94 (overall accuracy ¼ 0.80 by fraction correct) (see Appendix Table Single 3). Given that only 50% of those with a positive result would actually have depression, what would happen if you applied a second test to those who scored positive but relied on the results from the first screen for those who scored negative? This is illustrated in Appendix Figure 3. From Appendix Table MultiStep 3 providing the second instrument’s sensitivity and specificity of 80% held for the filtered population, the positive predictive value rises to 0.67 at a cost of a small fall in the negative predictive value to 0.85 (overall accuracy ¼ 0.83). In short, applying a second step to those who screen positive in step 1 favors specificity at a cost of sensitivity but with a gain in overall accuracy. This example of the application of two tests with 80% sensitivity and specificity might
be unrealistic in clinical practice. Often, different test performances are achievable at each step. A difficult question is whether it is better to apply the instrument with high sensitivity or the one with high specificity in step 1 or in step 2. The answer from Table AP.4 is that it is best to apply the most accurate instrument first, where clinically possible (although in screening the reverse often occurs). If both instruments have the same combined value but different sensitivity and specificity values, the optimal yield can be calculated. The rule of thumb for a two-step approach in a low-prevalence setting is to avoid pairing two instruments that both favor sensitivity, particularly when the second-step instrument has high sensitivity, as this may produce low overall yields. Practical applications of two-step approaches have recently been described.99,100
Weighting Specific Items
In the future, the weighting of specific symptoms of depression will be re-examined in each setting. The current concept of depression is that certain essential core symptoms define the disorder and others contribute to severity.101–104 This may or may not hold true. A scientific understanding of optimal depression items has appeared only in the past 3 years. Zimmerman and colleagues have re-examined the traditional symptoms of depression to discover whether all the conventional symptoms listed in DSM-IV or ICD-10 contribute to a diagnosis of depression. The difficulty with this method is that there is no accepted gold standard (see Chapter 1). One way around this problem is simply to examine how closely a definition restricted to certain symptoms agrees with full DSM-IV (or ICD-10) criteria. Zimmerman and colleagues proposed combining two core symptoms and three psychological symptoms: depressed mood, lack of interest, worthlessness, poor concentration, and thoughts of death.
Against full DSM-IV criteria, this abbreviated checklist had a sensitivity of 93.7%, specificity of 94.8%, positive predictive value of 95.5%, and negative predictive value of 91.6%. Andrews and associates (2007)105 replicated this finding using data from the 10,641 respondents to the Australian National Survey of Mental Health and Well-Being, who were assessed with the 12-month version of the Composite International Diagnostic Interview. In this study, sensitivity was 92.9%, specificity 99%, positive predictive value 94%, and negative predictive value 99.7%. Another method is to start with short versions and add only items that prove useful. Brody and colleagues (1998)106 found that adding four follow-up questions on sleep disturbance, appetite, anhedonia, and self-esteem to the two-question PRIME-MD markedly improved specificity while maintaining sensitivity. Future developments will also take into account aspects of depression not measured by symptom counts alone, for example, tools that measure duration, impact, function, and desire for professional help.
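The predictive-value arithmetic in the two-step example under Improving Accuracy can be reproduced with a short Python sketch. The names `Counts`, `one_step`, and `two_step` are hypothetical, chosen for illustration; no particular screening package is implied. The two-step function assumes that the second instrument keeps its sensitivity and specificity among step-1 positives and that its errors are independent of the first test's errors. This is an idealization, not something the Appendix tables establish.

```python
# Sketch: expected 2x2 counts and predictive values for a one-step screen and
# a two-step (serial) screen in which only step-1 positives are re-tested.
# Assumption (not from the source): the second test keeps its sensitivity and
# specificity in the filtered group, with errors independent of step 1.
from dataclasses import dataclass


@dataclass
class Counts:
    tp: float  # depressed, screen positive
    fp: float  # not depressed, screen positive
    tn: float  # not depressed, screen negative
    fn: float  # depressed, screen negative

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    @property
    def fraction_correct(self) -> float:
        total = self.tp + self.fp + self.tn + self.fn
        return (self.tp + self.tn) / total


def one_step(n: float, prevalence: float, sens: float, spec: float) -> Counts:
    """Expected counts when a single test is applied to n people."""
    cases = n * prevalence
    non_cases = n - cases
    return Counts(tp=cases * sens, fp=non_cases * (1 - spec),
                  tn=non_cases * spec, fn=cases * (1 - sens))


def two_step(first: Counts, sens2: float, spec2: float) -> Counts:
    """Re-test step-1 positives only; step-1 negatives keep their result."""
    return Counts(
        tp=first.tp * sens2,
        fp=first.fp * (1 - spec2),
        # Step-1 negatives plus step-1 false positives cleared by step 2.
        tn=first.tn + first.fp * spec2,
        # Step-1 misses plus step-1 true positives missed by step 2.
        fn=first.fn + first.tp * (1 - sens2),
    )


if __name__ == "__main__":
    s1 = one_step(n=1000, prevalence=0.20, sens=0.80, spec=0.80)
    s2 = two_step(s1, sens2=0.80, spec2=0.80)
    for label, c in (("one step", s1), ("two steps", s2)):
        print(f"{label}: PPV={c.ppv:.2f} NPV={c.npv:.2f} FC={c.fraction_correct:.2f}")
```

Run as a script, it prints PPV 0.50, NPV 0.94, and fraction correct 0.80 for the single test, matching the worked example. For the two-step pathway it prints PPV 0.80, NPV 0.91, and fraction correct 0.90. These figures are more optimistic than the Appendix figures quoted earlier, which is a reminder that the benefit of a second step depends heavily on how the second instrument actually performs in the filtered group. Under independence, the serial rule's overall sensitivity is the product of the two sensitivities (0.8 × 0.8 = 0.64). Its overall specificity is 1 − (1 − 0.8)(1 − 0.8) = 0.96. This is why the pathway trades sensitivity for specificity.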
References
1. Wittkampf KA, van Zwieten M, Smits FT, et al. Patients’ view on screening for depression in general practice. Fam Pract. 2008;25:438–444. 2. Jepson R, Clegg A, Forbes C, et al. The determinants of screening uptake and interventions for increasing uptake: a systematic review. Health Technol Assess. 2000;4:14. 3. Grinker RR Sr, Miller J, Sabshin M, et al. The phenomena of depressions. New York: Hoeber, 1961. 4. Nezu AM, Ronan GF, Meadows EA, et al. Practitioners’ guide to empirically based measures of depression. Kluwer Academic/Plenum Publishers, 2000. 5. Williams JW, Pignone M, Ramirez G, et al. Identifying depression in primary care: a literature synthesis of case-finding instruments. Gen Hosp Psychiatry. 2002;24(4):225–237. 6. Gilbody S, Sheldon T, House A. Screening and case-finding instruments for depression: a meta-analysis. Can Med Assoc J. 2008;178:997–1003. 7. http://www.mentaltests.com/cms/mentaltests_list. 8. Parloff MB, Kelman HC, Frank JD. Comfort, effectiveness, and self-awareness as criteria of improvement in psychotherapy. Am J Psychiatry. 1954;111:343–351. 9. Derogatis LR, Lipman RS, Covi L. SCL-90: An outpatient psychiatric rating scale, preliminary report. Psychopharmacol Bull. 1973;9:13–28. 10. Fink P, Ornbol E, Hansen MS, et al. Detecting mental disorders in general hospitals by the SCL-8 scale. J Psychosom Res. 2004;56(3):371–375. 11. Demyttenaere K, De Fruyt J. Getting what you ask for: On the selectivity of depression rating scales. Psychotherapy Psychosomatics. 2003;72(2):61–70. 12. Ruhe HG, Dekker JJ, Peen J, et al. Clinical use of the Hamilton Depression Rating Scale: is increased efficiency possible? A post hoc comparison of Hamilton Depression Rating Scale, Maier and Bech subscales, Clinical Global Impression, and Symptom Checklist-90 scores. Comprehensive Psychiatry. 2005;46(6):417–427. 13. Leentjens AF, Lousberg R, Verhey FRJ.
The psychometric properties of the Hospital Anxiety and Depression Scale in patients with Parkinson’s disease. Acta Neuropsychiatr. 2001;13:83–85. 14. Richter P, Werner J, Heerlein A, et al. On the validity of the Beck Depression Inventory. A review. Psychopathology. 1998;31(3):160–168. 15. Shafer AB. Meta-analysis of the factor structures of four depression questionnaires: Beck, CES-D, Hamilton, and Zung. J Clin Psychol. 2005;62(1):123–146. 16. Uher R, Farmer A, Maier W, et al. Measuring depression: comparison and integration of three scales in the GENDEP study. Psychol Med. 2008;38(2):289–300. 17. Faravelli C, Servi P, Arends JA, et al. Number of symptoms, quantification, and qualification of depression. Comprehensive Psychiatry. 1996;37(5):307–315. 18. Huttunen J, Taiminen T, Kähkönen J, et al. Depression Scale (DEPS) in schizophrenia. Acta Psychiatr Scand. 1999;99(3):220–222. 19. Alexopoulos GS, Abrams RC, Young RC, et al. Cornell Scale for Depression in Dementia. Biol Psychiatry. 1988;23(3):271–284. 20. Gainotti G, Azzoni A, Razzano C, et al. The Post-Stroke Depression Rating Scale: a test specifically devised to investigate affective disorders of stroke patients. J Clin Exp Neuropsychol. 1997;19(3):340–356.
21. Leeds L, Meara RJ, Hobson JP. The utility of the Stroke Aphasia Depression Questionnaire (SADQ) in a stroke rehabilitation unit. Clin Rehab. 2004;18(2):228–231. 22. Benaim C, Cailly B, Perennou D, et al. Validation of the Aphasic Depression Rating Scale. Stroke. 2004;35:1692. 23. Clements KM, Murphy JM, Eisen SV, et al. Comparison of self-report and clinician-rated measures of psychiatric symptoms and functioning in predicting 1-year hospital readmission. Administration And Policy In Mental Health And Mental Health Services Research. 2006;33(5):568–577. 24. Moller HJ. Rating depressed patients: observer- vs self-assessment. Eur Psychiatry. 2000;15(3):160–172. 25. Rush AJ, Carmody TJ, Ibrahim HM, et al. Comparison of self-report and clinician ratings on two inventories of depressive symptomatology. Psychiatr Serv. 2006;57(6):829–837. 26. Biggs JT, Wylie LT, Ziegler VE. Validity of the Zung Self-Rating Depression Scale. Br J Psychiatry. 1978;132:381–385. 27. Faravelli C, Albanesi G, Poli E. Assessment of depression: a comparison of rating scales. J Affect Disord. 1986;11:245–253. 28. Hunt M, Auriemma J, Cashaw ACA. Self-report bias and underreporting of depression on the BDI-II. J Personality Assess. 2003;80(1):26–30. 29. Vaughan M, Krawiecka M. Sensitivity to change in symptoms of new scales for rating chronic psychotic patients. Int Pharmacopsychiatry. 1979;14(3):121–126. 30. Maier W, Philipp M, Demuth W, et al. Reliability, validity, transferability and sensitivity to change of 3 rival observer rating-scales for the severity of depression (HAM-D, MADRS, BRMS). Int J Neurosci. 1986;31(1–4):288. 31. Bagby RM, Ryder AG, Schuller DR, et al. The Hamilton Depression Rating Scale; has the gold standard become a lead weight? Am J Psychiatry. 2004;161:2163–2177. 32. Vermeersch DA, Whipple JL, Lambert MJ, et al. Outcome questionnaire: Is it sensitive to changes in counselling center clients? J Counsel Psychol. 2004;51(1):38–49. 33. Hamilton M.
A rating scale for depression. J Neurol Neurosurg Psychiatry. 1960;23:56–62. 34. http://healthnet.umassmed.edu/mhealth/HamD.pdf. 35. Zitman FG, Mennen MF, Griez E, et al. The different versions of the Hamilton Depression Rating Scale. Psychopharmacology. 1990;9:28–34. 36. Bagby RM, Ryder AG, Schuller DR, et al. The Hamilton Depression Rating Scale: has the gold standard become a lead weight? Am J Psychiatry. 2004;161:2163–2177. 37. Williams JB. A structured interview guide for the Hamilton Depression Rating Scale. Arch Gen Psychiatry. 1988;45:742–747. 38. Khullar A, McIntyre RS. An approach to managing depression. Defining and measuring outcomes. Can Fam Physician. 2004;50:1374–1380. 39. McIntyre RS, Konarski JZ, Mancini DA, et al. Measuring the severity of depression and remission in primary care: validation of the HAMD-7 scale. Can Med Assoc J. 2005;173:1327–1334. 40. Bobes J, Bulbena A, Luque A, et al. The sufficiency of the HAM-D6 as an outcome instrument in the acute therapy of antidepressants in the outpatient setting. Int J Psychiatry Clin Practice. 2007;11(2):146–150. 41. Bech P, Gram LF, Dein E, et al. Quantitative rating of depressive states. Acta Psychiatr Scand. 1975;51:161–170.
42. Bech P, Allerup P, Gram LF, et al. The Hamilton Depression Scale: evaluation of objectivity using logistic models. Acta Psychiatr Scand. 1981;63:290–299. 43. Kalali A, Williams JBW, Kobak KA, et al. The new GRID HAM-D: pilot testing and international field trials. Int J Neuropsychopharmacol. 2002;5:S147–S148. 44. Montgomery SA, Åsberg M. A new depression scale designed to be sensitive to change. Br J Psychiatry. 1979;134:382–389. 45. http://www.neurotransmitter.net/depressionscales.html. 46. Asberg M, Montgomery SA, Perris C, et al. A comprehensive psychopathological rating scale. Acta Psychiatr Scand Suppl. 1978;271:5–27. 47. Carroll BJ, Wilson WH. HAM-D and MADRS as depression change measures. In: New Clinical Drug Evaluation Unit (NCDEU) Program Abstracts, 40th Annual Meeting, 2000. Rockville, MD: National Institute of Mental Health, poster number 9. 48. Beck AT, Ward CH, Mock J, et al. An inventory for measuring depression. Arch Gen Psychiatry. 1961;4:561–571. 49. http://harcourtassessment.com/haiweb/cultures/en-us/productdetail.htm?pid=015-8018-370. 50. Storch EA, Roberti JW, Roth DA. Factor structure, concurrent validity, and internal consistency of the Beck Depression Inventory-Second Edition in a sample of college students. Depression Anxiety. 2001;19(3):187–189. 51. Beck AT, Steer RA, Brown GK. BDI-II fast screen for medical patients manual. London: The Psychological Corporation, 2000. 52. Bouman TK, Kok AR. Homogeneity of Beck’s Depression Inventory (BDI): Applying Rasch analysis in conceptual exploration. Acta Psychiatr Scand. 1987;76(5):568–573. 53. Uher R, Farmer A, Maier W, et al. Measuring depression: comparison and integration of three scales in the GENDEP study. Psychol Med. 2008;38:289–300. 54. Zung WW. A self-rating depression scale. Arch Gen Psychiatry. 1965;12:63–70. 55. http://healthnet.umassmed.edu/mhealth/ZungSelfRatedDepressionScale.pdf. 56. Lambert MJ, Hatch DR, Kingston MD, et al.
Zung, Beck, and Hamilton Rating Scales as measures of treatment outcome: a meta-analytic comparison. J Consulting Clin Psychol. 1986;54(1):54–59. 57. Passik SD, Lundberg JC, Rosenfeld B, et al. Factor analysis of the Zung Self-Rating Depression Scale in a large ambulatory oncology sample. Psychosomatics. 2000;41:121–127. 58. Hong S, Min SY. Mixed Rasch modeling of the Self-Rating Depression Scale incorporating latent class and Rasch rating scale models. Educational and Psychological Measurement. 2007;67(2):280–299. 59. Hulstijn EM, Deelman BG, de Graaf A, et al. The Zung-12: a questionnaire for depression in the elderly. Tijdschr Gerontol Geriatr (Netherlands). 1992;23:85–93. 60. Dugan W, McDonald MV, Passik SD, et al. Use of the Zung Self-Rating Depression Scale in cancer patients: feasibility as a screening tool. Psychooncology. 1998;7(6):483–493. 61. Tucker MA, Ogle SJ, Davison JG, et al. Validation of a brief screening test for depression in the elderly. Age Ageing. 1987;16(3):139–144. 62. Radloff LS. The CES-D scale: a self-report depression scale for research in the general population. Appl Psychol Meas. 1977;1:385–401. 63. http://www.mdlogix.com/cesdr.htm.
64. Markush RE, Favero RV. Epidemiologic assessment of stressful life events, depressed mood, and psychophysiological symptoms: A preliminary report. In: Dohrenwend BS, Dohrenwend BP, eds. Stressful life events: their nature and effects. New York: Wiley, 1974:171–190. 65. Furukawa T, Anraku K, Hiroe T, et al. Screening for depression among first-visit psychiatric patients: Comparison of different scoring methods for the Center for Epidemiologic Studies Depression Scale using receiver operating characteristic analyses. Psychiatry Clin Neurosci. 1997;51:71–78. 66. Cole JC, Rabin AS, Smith TL, et al. Development and validation of a Rasch-derived CES-D short form. Psychol Assess. 2004;16(4):360–372. 67. Chan KS, Orlando M, Ghosh-Dastidar B, et al. The interview mode effect on the Center for Epidemiological Studies Depression (CES-D) scale: an item response theory analysis. Med Care. 2004;42:281–289. 68. Zigmond AS, Snaith RP. The Hospital Anxiety and Depression Scale. Acta Psychiatr Scand. 1983;67:361–370. 69. Bjelland I, Dahl AA, Tangen Haug T, et al. The validity of the Hospital Anxiety and Depression Scale. An updated literature review. J Psychosom Res. 2002;52:69–77. 70. Sharpe M, Strong V, Allen K, et al. Major depression in outpatients attending a regional cancer centre: screening and unmet treatment needs. Br J Cancer. 2004;90:314–320. 71. Martin CR, Thompson DR, Barth J. Factor structure of the Hospital Anxiety and Depression Scale in coronary heart disease patients in three countries. J Eval Clin Pract. 2008;14(2):281–287. 72. Thompson C, Ostler K, Peveler RC, et al. Dimensional perspective on the recognition of depressive symptoms in primary care: The Hampshire Depression Project 3. Br J Psychiatry. 2001;179:317–323. 73. Crawford JR, Henry JD, Crombie C, et al. Normative data for the HADS from a large non-clinical sample. Br J Clin Psychol. 2001;40:429–434. 74. Yesavage JA, Brink TL, Rose TL, et al.
Development and validation of a geriatric depression screening scale: a preliminary report. J Psychiatr Res. 1983;17:37–49. 75. www.stanford.edu/~yesavage/GDS.html. 76. Tang WK, Wong E, Chiu HFK. The Geriatric Depression Scale should be shortened: results of Rasch analysis. Int J Geriatr Psychiatry. 2005;20(8):783–789. 77. Marc LG, Raue PJ, Bruce ML. Screening performance of the 15-item Geriatric Depression Scale in a diverse elderly home care population. Am J Geriatr Psychiatry. 2008;16(11):914–921. 78. Cox JL, Holden JM, Sagovsky R. Detection of postnatal depression: development of the 10-item Edinburgh Postnatal Depression Scale. Br J Psychiatry. 1987;150:782–786. 79. Wancata J, Alexandrowicz R, Marquart B, et al. The criterion validity of the Geriatric Depression Scale: a systematic review. Acta Psychiatr Scand. 2006;114(6):398–410. 80. www.aap.org/practicingsafety/Toolkit_Resources/Module2/EPDS.pdf. 81. Cox J, Holden J. Perinatal mental health—A guide to the EPDS. RCPsych Publications, 2003. 82. Chabrol H, Teissedre F. Relation between the Edinburgh Postnatal Depression Scale scores at 2–3 days and 4–6 weeks postpartum. J Reprod Infant Psychol. 2004;22:33–39.
83. Hesbacher PT, Rickels K, Morris RJ, et al. Psychiatric illness in family practice. J Clin Psychiatry. 1980;41:6–10. 84. Burnam MA, Wells KB, Leake B, et al. Development of a brief screening instrument for detecting depressive disorders. Med Care. 1988;26:775–789. 85. Pallant JF, Miller RL, Tennant A. Evaluation of the Edinburgh Post Natal Depression Scale using Rasch analysis. BMC Psychiatry. 2006;6:28. 86. www.patient.co.uk/showdoc/40025272/. 87. Spitzer RL, Williams JB, Kroenke K, et al. Utility of a new procedure for diagnosing mental disorders in primary care. The PRIME-MD 1000 study. JAMA. 1994;272:1749–1756. 88. Tuunainen A, Langer RD, Klauber MR, Kripke DF. Short version of the CES-D Burnam screen for depression in reference to the structured psychiatric interview. Psychiatry Res. 2001;103:261–270. 89. Kroenke K, Spitzer RL, Williams JBW. The PHQ-9: validity of a brief depression severity measure. J Gen Intern Med. 2001;16(9):606–613. 90. Kroenke K, Spitzer RL, Williams JB. The Patient Health Questionnaire-2: validity of a two-item depression screener. Med Care. 2003;41:1284–1292. 91. http://www.gp-training.net/protocol/psychiatry/who/mdi.doc. 92. Fountoulakis KN, Iacovides A, Kleanthous S, et al. Reliability, validity and psychometric properties of the Greek translation of the Major Depression Inventory. BMC Psychiatry. 2003;3:2. 93. Bech P, Wermuth L. Applicability and validity of the MDI in patients with Parkinson’s Disease. Nord J Psychiatry. 1998;52:305–309. 94. Forsell Y. The Major Depression Inventory versus schedules for clinical assessment in neuropsychiatry in a population sample. Soc Psychiatry Psychiatric Epi. 2005;40(3):209–213. 95. Weyerer S, Killmann U, Ames D, et al. The Even Briefer Assessment Scale for Depression (EBAS DEP): its suitability for the elderly in geriatric care in English- and German-speaking countries. Int J Geriatr Psychiatry. 1999;14(6):473–480. 96. Folstein M.
Reliability, validity, and clinical application of visual analog mood scale. Psychol Med. 1973;3:479. 97. Blank K, Gruman C, Robison JT. Case-finding for depression in elderly people: balancing ease of administration. J Gerontol A Biol Sci Med Sci. 2004;59:M378–M384. 98. Whooley MA, Avins AL, Miranda J, et al. Case-finding instruments for depression. J Gen Intern Med. 1997;12(7):439. 99. Thombs BD, Ziegelstein RC, Whooley MA. Optimizing detection of major depression among patients with coronary artery disease using the Patient Health Questionnaire: Data from the Heart and Soul Study. J Gen Intern Med. 23(12): 2014–2017. 100. Bech P, Rasmussen N, Olsen R, et al. The sensitivity and specificity of the MDI using the Present State Examination as the index of diagnostic validity. J Affect Disord. 2001;66:159–164. 101. Mitchell AJ, Baker-Glenn EA, Park B, et al. Can the distress thermometer be improved by additional mood domains? Part II: What is the Optimal Combination of Thermometers? Psychooncology. 2009 [e-pub March 18]. 102. Evans KR, Sills T, DeBrota DJ, et al. An item response analysis of the Hamilton Depression Rating Scale using shared data from two pharmaceutical companies. J Psychiat Res. 2004;38:275–284.
103. Maier W, Philipp M. Improving the assessment of severity of depressive states: a reduction of the Hamilton Depression Scale. Pharmacopsychiatry. 1985;18:114–115. 104. Gibbons RD, Clark D, Kupfer DJ. Exactly what does the Hamilton Depression Rating Scale measure? J Psychiat Res. 1993;27:259–273. 105. Andrews G, Slade T, Sunderland M, et al. Issues for DSM-V: Simplifying DSM-IV to enhance utility: the case of major depressive disorder. Am J Psychiatry. 2007;164:1784–1785. 106. Brody DS, Hahn SR, Spitzer RL, et al. Identifying Patients With Depression in the Primary Care Setting: A More Efficient Method. Arch Intern Med. 1998;158:2469–2475.
3 WHY DO CLINICIANS HAVE DIFFICULTY DETECTING DEPRESSION?
Alex J. Mitchell
1. Introduction to the Problem of Over- and Under-Detection
2. Predictors of Detection
3. Patient and Clinician Influences on Detection
4. Illness-Related Influences on Detection
5. Conclusions
Context
Hundreds of studies reveal that most cases of depression remain undetected and untreated. Yet there is growing concern that efforts to increase detection of depression lead to unacceptable numbers of people who are not depressed nonetheless being given a diagnosis and receiving medication. What factors underlie false-positive and false-negative errors? How might clinicians and services address these detection errors?
1. Introduction to the Problem of Over- and Under-Detection
Only about half of primary care practitioners (PCPs) feel confident in diagnosing depression or assessing suicide risk.1–6 Yet underdetection is by no means confined to PCPs7–13 or to depression.14,15 Convincing data show that clinicians in all medical specialties have difficulty recognizing mental disorders, including depression, anxiety, delirium, and dementia.16,17 Less discussed in the literature, but increasingly recognized as important, is the issue of overdetection. In this chapter I will review the predictors of diagnostic errors (false positives and false negatives) with
reference to depression in primary care. I will focus on two essential barriers to correct identification: communication and illness complexity. To discuss errors in recognition meaningfully, it is important first to establish baseline rates of depression. Prevalence exerts a powerful influence upon detection accuracy, not least because clinicians usually have a higher index of suspicion for high-risk patients. The World Health Organization (WHO) study on Psychological Problems in General Health Care (PPGHC), conducted across 14 countries, found that 26% of individuals visiting their PCP had at least one psychiatric disorder as defined by ICD-10 criteria.18 Fourteen percent had major depression. Almost identical rates were reported from the European Study of the Epidemiology of Mental Disorders (ESEMeD).19,20 Among older people, the point prevalence of major depression is lower in rural than in urban primary care practices (8.3% versus 14.8%).21 Further, combining a 14% rate of major depression with a 10% rate of minor depression gives a combined rate approaching 25%.22
How Many Cases of Depression Are Detected in Routine Care?
Approximately 100 studies of the unassisted recognition rate of depression in primary care have been published, but only a third have used a robust semi-structured interview as a gold standard.23 Of these, at least 10 had samples of more than 1,000, and 17 examined clinicians’ ability both to rule in and to rule out a diagnosis (see Table 3.1). From these studies, PCPs’ pooled sensitivity is 48% and specificity 70%. At a prevalence of 16%, the positive predictive value (PPV) is 21.4% and the negative predictive value (NPV) is 87.4%. In a low-risk sample where the prevalence is 10%, the PPV becomes 14% and the NPV 92%. This is best illustrated in a Bayesian plot of conditional probabilities (Fig. 3.1). Expressed descriptively, at a prevalence of 16% an average PCP would correctly identify 8 out of 16 depressed cases, missing 8 true positives. He or she would correctly reassure 57 out of 84 non-cases but falsely diagnose 27 people as depressed (Fig. 3.2). Thus, the number of correctly identified people per 100 screened would be 64 (the number needed to screen would be 3.5 to correctly identify one true case or non-case). Out of every five people thought to be depressed, only one would be a true case (PPV = 21.4%). Out of every 10 people thought to be well, approximately 9 would be correctly reassured (NPV = 87.4%). In a low-risk sample (such as a rural practice) where the prevalence is 10%, an average PCP would correctly identify 5 out of 10 cases, missing 5 true positives, and would correctly reassure 60 out of 90 non-cases, falsely diagnosing 30 people as depressed. In a high-risk sample (such as patients with
known physical disease), at a prevalence of 25%, Bayesian analysis suggests that an average PCP would correctly identify 12 out of 25 cases, missing 13 true positives, and would correctly reassure 50 out of 75 non-cases, falsely diagnosing 25 people as depressed.
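The natural-frequency figures above follow from Bayes' theorem. The Python sketch below recomputes them for any prevalence; the helper name `per_100` is hypothetical. Two assumptions should be flagged. First, the pooled sensitivity of 48% and specificity of 70% are taken at face value. Second, the "number needed to screen" is assumed to be the reciprocal of the fraction correct minus the fraction incorrect. With 64 correct per 100, this gives 1/(0.64 − 0.36) ≈ 3.6, close to the figure of 3.5 quoted. The worked figures and Figure 3.2 are rounded and may rest on slightly different pooled estimates, so the output need not match them to the decimal.

```python
# Sketch: per-100 outcomes of unassisted recognition at a given prevalence.
# Assumption: "number needed to screen" = 1 / (fraction correct - fraction
# incorrect). Sensitivity and specificity are inputs, not fitted values.


def per_100(prevalence: float, sens: float, spec: float) -> dict:
    cases = 100 * prevalence
    non_cases = 100 - cases
    tp = cases * sens            # correctly diagnosed
    fn = cases - tp              # missed cases (false negatives)
    tn = non_cases * spec        # correctly reassured
    fp = non_cases - tn          # falsely diagnosed (false positives)
    fc = (tp + tn) / 100         # fraction correct
    return {
        "correctly_diagnosed": tp,
        "missed": fn,
        "correctly_reassured": tn,
        "false_positives": fp,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "fraction_correct": fc,
        # Reciprocal of (fraction correct - fraction incorrect).
        "number_needed_to_screen": 1 / (fc - (1 - fc)),
    }


if __name__ == "__main__":
    for prev in (0.10, 0.16, 0.25):
        r = per_100(prev, sens=0.48, spec=0.70)
        print(
            f"prev {prev:.0%}: found {r['correctly_diagnosed']:.1f}, "
            f"missed {r['missed']:.1f}, reassured {r['correctly_reassured']:.1f}, "
            f"false+ {r['false_positives']:.1f}, PPV {r['ppv']:.1%}, "
            f"NPV {r['npv']:.1%}, NNS {r['number_needed_to_screen']:.1f}"
        )
```

Looping over the three prevalences shows the pattern the text describes. As prevalence falls from 25% to 10%, PPV falls, NPV rises, and false positives per 100 increase.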
Figure 3.1. Bayesian plot of conditional pre-test/post-test probabilities. [Line plot; x-axis: pre-test probability (0 to 1); y-axis: post-test probability (0 to 1); curves: unassisted attempt to rule in depression, unassisted attempt to rule out depression, baseline probability.]
Figure 3.2. Rates of correct and incorrect identification per 100 selected cases in primary care. [Stacked bars for depressed and non-depressed patients at prevalences of 10%, 16%, and 25%; segments: false negatives, correctly diagnosed, correctly reassured, and false positives (%); x-axis: 0 to 100.]
Clinicians do less well with minor and mild depression, a problem shared by those using screening tools.24 Underrecognition translates into undertreatment, as recognized patients are more likely to be offered mental health interventions.25 Data from ESEMeD show that only 15.1% of those with an identified mood disorder and 23.2% of those with an anxiety disorder received either drug or psychological treatment.26 Maginn and colleagues (2004)27 found that PCPs recorded active management of a psychological problem in 37% of the patients whom they rated as cases. Of these, 24% were prescribed psychoactive drug treatment, 5% were referred to psychiatric or psychological services, and 3% were offered both drug and psychological treatments. Surprisingly, only 5% were offered a follow-up appointment with their PCP. Wittchen and colleagues28 found somewhat more favorable rates of conversion to treatment in a large study of 20,421 primary care patients in Germany. After correctly identifying depression (according to the ICD-10 definition), doctors prescribed drug treatment in 60.8%, prescribed non-drug treatment in 24.9%, and referred the patient to a mental health specialist in 10%. The take-home message from the large ESEMeD, PPGHC, and INSERM studies is that approximately 20% of recognized patients are typically offered treatment.
Textbox 3.1. Case History: An Example of a Difficult Case?
A previously well 58-year-old man comes to see his GP for the first time soon after discharge from hospital following a dominant-hemisphere stroke, which has left him with difficulty walking and finding words. His main complaints are physical, notably discomfort on walking, fatigue, loss of appetite, and insomnia. His GP is not sure whether he is depressed but asks about low mood and loss of interest. Mood has indeed been low since the stroke and motivation is poor, but interest, weight, and concentration are preserved. There is no hopelessness or guilt, and there are no suicidal thoughts.
Understanding Detection Errors
To go beyond raw rates of detection accuracy, detailed studies examining the types of diagnostic error are needed. Tiemens and colleagues (1999)12 found that only 26% of missed cases (false negatives) were complete omissions, while 25% were underestimates of severity (eg, diagnosing subthreshold instead of mild depression) and 38% were misidentifications. Conversely, of false-positive diagnoses, 35% were overestimates of severity, 24% were misdiagnoses, and 41% were complete errors. Diagnostic errors are illustrated in Figure 3.3,
using data from Wittchen and colleagues (2002).16 It can be seen that, for both true cases and true non-cases, about 25% of ratings are uncertain, which is an area for improvement. This uncertainty also helps explain the considerable variation between recognition studies, as these possible cases are sometimes counted among those detected and sometimes among those missed. In the MAGPIE study, Bushnell and associates (2004)29 found that 38% of depression cases were not recognized. The reasons were failure to categorize the patient’s psychological issues as clinically significant (23.4%), recognition of clinical significance without a particular diagnosis (7.1%), and an explicit diagnosis of something other than depression (7.7%). What, then, distinguishes one clinician from another? Rogers (2001)30 suggested several types of common clinical error when attempting to make a psychiatric diagnosis: idiosyncratic language in clinical questioning, idiosyncratic coverage in clinical questioning, idiosyncratic sequence of clinical questioning, idiosyncratic recording of responses, and idiosyncratic rating of severity.
Figure 3.3a and 3.3b. Severity estimates by general practitioners of nondepressed and depressed patients. [Bar charts, panels (a) and (b); x-axis: GP rating (not currently ill, borderline case, mild case, moderate case, severe case, very severe case); y-axis: 0 to 60.] Adapted from Wittchen HU, Kessler RC, Beesdo K, et al. Generalized anxiety and depression in primary care: prevalence, recognition, and management. J Clin Psychiatry. 2002;63(suppl 8):24–34.
2. Predictors of Detection
There have been several impressive studies of the factors that influence correct detection, although few have examined what influences clinicians’ willingness to look for symptoms of depression. Borowsky and colleagues (2000)31 conducted a large study involving 19,309 patients from 349 PCPs in Boston, Chicago, and Los Angeles. All were screened with the MOS eight-item Burnam screen for depression, and 1,610 underwent the Diagnostic Interview Schedule (DIS) for DSM-III. Of the patients, 661 were depressed, although only 70 had current major depression. Physicians were less likely to detect depression in African Americans, men, and those younger than 35 years, and more likely to detect depression when comorbid hypertension or diabetes was present. Hickie and colleagues (2001)32 studied a large sample of 46,515 patients attending 386 PCPs; 56% of cases were not recognized. This is probably the most comprehensive study of predictors of recognition available. Patients were more likely to be assessed psychologically if they were middle-aged, female, Australian-born, unemployed, or single, or if they presented with mainly psychological symptoms or for psychological reasons. Doctor characteristics
associated with willingness to assess were being over 35 years old, having an interest in mental health, having had previous mental health training, being in part-time practice, seeing fewer than 100 patients per week, and working in regional centers. Thompson and colleagues (2001)33 examined recognition among 156 PCPs in the United Kingdom, involving 18,414 individuals. The prevalence of depression was 20% based on a 7/8 cutoff on the HADS depression subscale (HADS-D). The mean recognition sensitivity was 36% and the recognition specificity was 91.5% (Fig. 3.4). Women and unemployed people were more likely to be detected, while the elderly and retired were more likely to be missed. However, these relationships were confounded by the severity of depression or anxiety: increased anxiety improved recognition of depression.
Figure 3.4. Burden and detection of depression by Hampshire (U.K.) general practitioners: proportion of cases recognized and missed at each HADS-D score (8 to 21). Overall, 36% of depression (blue) was detected and 64% was missed (red); 72.6% of all omissions occurred at a HADS-D score of between 8 and 10. Adapted from Thompson C, Ostler K, Peveler RC, et al. Dimensional perspective on the recognition of depressive symptoms in primary care: The Hampshire Depression Project 3. Br J Psychiatry. 2001;179:317–323.
Wittchen and colleagues (2002)16 conducted a large nationwide study of PCP recognition in Germany, recruiting 20,421 patients attending 633 PCPs. Counting a doctor's rating of definite or probable depression as recognition, 75% of all DSM and 59% of all ICD-10 diagnoses were recognized by the treating physician, albeit with an 11.7% false-positive rate. Multiple logistic regression revealed that recognition was associated with prior treatment episodes, an increasing number of depressive symptoms, older patient age, more than 5 years of practice experience, and the presence of psychomotor retardation. In the MAGPIE study from New Zealand, 63.7% of patients with a CIDI-diagnosed disorder were recognized as having psychological problems, although only 40% were recognized as having a clinically significant psychological problem and only 33.8% were given an explicit diagnosis.29 Among patients seen five or more times during the previous year, these recognition figures rose to 80.2%, compared with 28.8% among patients not seen in the previous year. Maginn and associates (2004)27 examined PCP recognition of distress in South London. Overall, PCPs identified 65% of cases, but Black African patients were less likely to be detected or treated than Black Caribbean and White English patients. Willingness to talk to the doctor about psychological problems was the main predictor of detection. Ethnicity did not independently predict detection, but Black African individuals were less likely to talk to their PCP about psychological problems. Worryingly, Black African individuals with detected distress were only about half as likely to be offered treatment as White English cases (22% versus 41%). Pfaff and Almeida (2005)34 found that 39.9% of patients (87/218) were correctly classified as depressed by their PCP. Older patients were more likely to be incorrectly classified as “not depressed” by their PCP when they were born outside of Australia or New Zealand, did not smoke or use sleeping tablets, acknowledged milder levels of depression, and presented with primarily somatic complaints. Aragones and colleagues (2004)35 interviewed 209 Zung-positive patients and 97 Zung-negative patients with the SCID.
Detection was associated with educational level, severity of the depression, level of impairment, and the complaint of explicit psychological symptoms. Antidepressant treatment was associated with marital status, severity of and impairment from the depression, frequency of visits to the family physician, and the patient’s complaint of psychological symptoms. Aragones and colleagues went on to study predictors of false-positive diagnoses (2006)36 and found that PCPs had a false-positive diagnosis rate of nearly 50%. Factors independently associated with overdiagnosis were higher symptom levels on the Zung Self-Rating Depression Scale (SDS), a lower Global Assessment of Functioning score, a previous history of depression, and the absence of generalized anxiety. Nuyen and colleagues (2005)37 found that among 191 depressed primary care patients diagnosed using the CIDI, 28.8% were recognized and recorded by PCPs over the same period. Patients without chronic somatic comorbidity, with a lower educational level, with less severe depression, and with fewer PCP contacts were all significantly more likely to go undiagnosed. Verhaak and coworkers (2006)38 conducted a survey of primary care contacts of patients with a DSM-IV diagnosis of affective disorder,
anxiety disorder, or alcohol abuse. Forty percent visited their PCP but received only a somatic diagnosis, and 50% were given a psychological or social diagnosis at least once during 1 year. The chance of receiving a psychological diagnosis from the PCP increased with the number of PCP contacts. Patients who were given a psychological or social diagnosis by their PCP had higher GHQ scores, lower mental functioning scores on the SF-36, and far more visits to their PCP than those not diagnosed as psychologically ill. Finally, patients given such a diagnosis tended to express slightly more confidence in their PCP. McCall and colleagues (2007)39 looked at predictors of recognition of distress in Australian primary care. Twenty-eight PCPs completed a clinical audit on 868 of their patients, who also completed the GHQ-28. PCPs correctly identified 43% of GHQ-positive cases as having distress. The rate of correct recognition varied considerably between individual PCPs, from 4% to 100%. Correct recognition was associated with years of experience as a PCP, older patient age, and greater severity of distress. Clearly, there is wide variation in the ability of GPs to diagnose mental health problems, due in part to differences in knowledge, skills, and attitudes (Textbox 3.2).40,41 Most clinicians have difficulty recalling the current criteria for major depression.42 Further, only one third claim to make diagnoses based on validated criteria.43 Self-confident, outgoing physicians with high academic ability appear to make more accurate diagnoses;44 yet the same formula would apply equally to psychiatrists’ ability to detect physical illness. One apparently simple solution is to increase the length of the consultation.
There is reasonably good evidence that short appointments impair detection in difficult cases.45 However, paradoxically, lengthening the consultation may not improve recognition.46 Verhaak and colleagues (2007)47 found that, in general, healthcare system characteristics do affect PCPs’ performance in psychosocial care. PCPs’ workload was not related to their awareness of psychological problems and was hardly related to their communication, except that PCPs who subjectively experienced a lack of time were less patient-centered (Textbox 3.3).48
Textbox 3.2. Possible Barriers to Recognition (Diagnostic Barriers)
Patient Related
Younger patient
Male gender
Reluctance to seek help
Reluctance to disclose symptoms
Disclosure of only somatic symptoms
Low awareness of emotional symptoms
Fear of stigma/label of mental illness
Clinician Related
Low clinician confidence and skills
Low therapeutic alliance
Low consultation time
Single appointment only
Low index of suspicion
Rare inquiry about depressive symptoms
Caution re: stigma of mental illness
Textbox 3.3. Basic Patient-Centered Interviewing Method
Step 1. Welcoming
Welcome the patient
Introduce self and identify specific role
Ensure patient comfort and privacy
Step 2. Set agenda
Indicate time available and objective
Summarize what is already known and others involved
Indicate own needs
Clarify what patient wants to discuss
Step 3. Non-focused interviewing
Open-ended beginning question: “How have things been recently?”
Attentive (active) listening (with prompts): “That sounds difficult”
Observe nonverbal cues
Step 4. Focused interviewing
Obtain description of main problem and secondary problems
Clarify the development and context of the problems
Ask about emotional and functional impact of the problems
Step 5. Transition to agreed action
Give brief summary and check accuracy
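Bayes’ theorem can turn the Hampshire figures reported by Thompson and colleagues (prevalence 20%, recognition sensitivity 36%, specificity 91.5%) into predictive values. These show what a GP’s judgment means for an individual patient. The sketch below is illustrative only. The function name predictive_values is hypothetical. It treats HADS-D caseness at the 7/8 cutoff as the reference standard. The predictive values it prints are derived here and were not reported in the study.

```python
def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) via Bayes' theorem; all inputs are proportions in [0, 1]."""
    true_pos = sensitivity * prevalence               # cases the GP recognizes
    false_neg = (1 - sensitivity) * prevalence        # cases the GP misses
    true_neg = specificity * (1 - prevalence)         # non-cases correctly reassured
    false_pos = (1 - specificity) * (1 - prevalence)  # non-cases labeled depressed
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hampshire Depression Project figures quoted in the text (HADS-D 7/8 cutoff).
ppv, npv = predictive_values(sensitivity=0.36, specificity=0.915, prevalence=0.20)
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")  # PPV = 51.4%, NPV = 85.1%
```

Under these assumptions, only about half of the patients a GP judged to be depressed would be HADS-D cases. About 15% of those judged not depressed would in fact be cases. Re-running the function with a lower prevalence shows how quickly the positive predictive value falls.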
3. Patient and Clinician Influences on Detection
Do Patients Volunteer Symptoms of Depression?
It should be no surprise that recognition of distress and depression is linked with the number of symptoms reported during a consultation.49 Recognition is
facilitated when patients report psychological symptoms of anxiety or depression early in the consultation.50 Patients who normalize or minimize their symptoms are less likely to be identified.51 It has been reported that detection rates may reach 100% in those who spontaneously complain of emotional problems.52 However, patients do not usually complain of “depression,” and patients’ views about their depressive symptoms differ significantly from conventional medical views.53,54 Many groups have noted that patients with depression often present with physical symptoms rather than psychological complaints, and that the depression is consequently less likely to be recognized.56–62 Perhaps 60% to 70% of patients with depression and anxiety have predominantly somatic presentations.63,64 Such patients tend to be older and have less severe depression, but not necessarily more comorbid physical illness. Many authors have shown that patients are often reluctant to discuss emotional issues with health professionals.65–67 Patients have their own readiness to disclose.68 Indeed, willingness to discuss emotional issues may be one of the strongest predictors of detection.69 White and Hispanic patients appear more likely than African American patients to communicate with a clinician about depression.70 However, most patients will discuss psychological symptoms if asked.71,72 Reassuringly, Davenport and associates (1987)73 found some association between severity of distress and spontaneous verbal cues, but the correlation is far from perfect, and such cues are easily overlooked. O’Connor and colleagues (2001)74 examined 1,021 older patients in Melbourne, Australia.
Symptom disclosure was associated with higher depression scores, previous contact with a psychiatrist, and female gender; even so, 48% of people with an ICD-10 moderate or severe depressive episode had not reported any current complaints to their doctor at the time of the interview. In the MAGPIE study, 30% of all primary care patients (and 37% of those with current psychological symptoms) did not disclose their psychological problems spontaneously; younger patients, those consulting more frequently, and those with greater psychiatric disability were more likely to report nondisclosure.75 However, in this study, reported nondisclosure did not influence detection rates. Verhaak and colleagues76 collected comprehensive data on detection rates from consultations across 10 European countries and found low rates of spontaneous emotional complaints. What, then, are the reasons for not discussing emotional difficulties? The most frequently given reasons in the MAGPIE study were the belief that the PCP is not the “right” person to talk to (33.8%) or that mental health problems should not be discussed at all (27.6%). In a survey of primary care attendees who were high scorers on the GHQ, more than 75% had not mentioned any emotional problems during a consultation.77 Thirty-six percent felt they were able to cope without emotional help, but 45% gave reasons including
psychological embarrassment and hesitation to trouble the doctor, and a further 19% were deterred by the doctor’s interview behaviors (see below). Thirty-nine percent felt there was little the doctor could do to help with their emotional problems. In a study by Del Piccolo and associates (1998),78 about two thirds of patients with stressful life events and social problems had mentioned them to their PCP. A positive attitude toward confiding and the presence of emotional distress were the best predictors of confiding. In women, past confiding and a long-standing relationship with the PCP were also important. Pollock79 summarized the difficulty: medical consultations are difficult encounters for most patients, who often strive to protect their privacy and personal integrity by “maintaining face,” but this in turn may impede the diagnostic process.
Do Clinicians Ask About Depression?
Communication behaviors of clinicians have been much discussed. Individual clinicians differ in their communicative style, some being more patient-centered than others, but most adjust their style according to the situation, such as illness severity.79–81 In a large study recording responses of PCPs to standardized patients, biomedical inquiry and explanation, nonspecific acknowledgment, and reassurance were common, whereas empathy, expressions of uncertainty, and exploration of psychosocial factors and emotions were uncommon.82 Yet in consultations about psychosocial issues, doctors show more emotional behavior, ask more questions, and give less information than in other consultations.83,84 Feldman and colleagues (2007)85 found that history taking about depression was directly associated with the likelihood of a chart diagnosis of depression and with the provision of minimally acceptable initial depression care. When PCP decisions about late-life depression were monitored, a treatment decision was recorded in about 5% of visits, a deferred or monitor-only decision in about a third of visits, and no decision in about half of visits.86 A number of authors have commented on suboptimal communication strategies among clinicians.88 Inadequate interview and diagnostic skills influence detection.89,90 For example, clinicians appear to miss most cues and concerns and to adopt behaviors that discourage disclosure.91,92 More sophisticated analysis using video recordings of consultations is revealing.
In one of the best examples, Deveugele and colleagues (2004)93 analyzed 2,095 consultations from 168 PCPs using the Roter Interaction Analysis System. Clinicians differed markedly in their psychosocial and emotional communication. Some
studies attempt to go further and uncover an explicit link with detection. In a seminal study by Marks and associates (1979),94 a research psychiatrist made detailed observations of 2,098 interviews carried out by 55 PCPs. The authors found that PCPs with a better conceptual understanding of mental illness made more accurate diagnoses of the patient’s condition. They also noted that PCPs with an interest in psychological medicine, those with higher levels of empathy, and those who asked about social and family problems diagnosed psychiatric illness more accurately. Badger and colleagues (1994)95 found two communication behaviors that predicted successful recognition of depression: the proportion of the interview devoted to emotional issues and the use of broad, open-ended psychosocial questions. Carney and coworkers (1999)96 found that PCPs who recognized depression asked twice as many questions about feelings and affect as those who did not. In a series of interviews, Rost and colleagues (2000)97 found that physicians and patients discussed depression in 47.9% of untreated patients. Chronic physical comorbidity decreased the odds that physicians and untreated patients discussed depression as a possible diagnosis. Interestingly, PCPs who prefer psychotherapy to antidepressant treatment also appear more accurate in diagnosing depression.98 There are a number of important barriers to detection, including clinician attitude (Textbox 3.4). Saltini and associates (2004)99 found that although occupational, financial, and housing problems and life events of loss were the most important predictors of the GHQ-12 case definition, PCPs gave significantly more importance to psychiatric treatment, psychopharmacological drug use, and chronic illness. Travado and colleagues6 found that low psychosocial orientation and burnout symptoms were associated with lower confidence in communication skills and higher expectations of a negative outcome after physician–patient communication.
Textbox 3.4. Top 10 GP-Perceived Barriers to Dealing with Depression
1. Lack of access to mental health specialists (51.4%)
2. Lack of time (50.6%)
3. Poor reimbursement for depression treatment (50.4%)
4. Distracted by other presenting problems (39.4%)
5. Patient reluctant to be referred to a specialist (37.3%)
6. Workload prevents adequate attention to depression (32.3%)
7. Patient/family reluctance to accept diagnosis of depression (21.7%)
8. Patient inability/unwillingness to discuss depressive symptoms (16.2%)
9. Lack of accessible assessment tools for depression (15.9%)
10. Patient reluctant to begin antidepressant medications (8.6%)
Adapted from Richards JC, Ryan P, McCabe MP, et al. Barriers to the effective management of depression in general practice. Aust N Z J Psychiatry. 2004;38:795–803.
In a study of 50 PCPs and 473 patients in Portland, Oregon, routine office visits were audiotaped and analyzed for communication behaviors and emotional tone using the Roter Interaction Analysis System.100 Physicians with more positive attitudes toward psychosocial aspects of patient care had more psychosocial discussions in visits. A large-scale practice audit in Australia found that PCPs with a declared interest in mental health and those who had obtained mental health training were more likely to see patients with depression and more likely to provide appropriate mental health assessment and treatment. Some studies implicate insufficient undergraduate and postgraduate training,101 insufficient time devoted to adequate diagnostic assessment, and a failure to acquire new knowledge relevant to providing treatment. Three recent observational studies have examined physician habits in relation to late-life depression. In a study based in nine primary care clinics involving 1,023 individuals, Fischer and colleagues (2003)102 found that, compared with younger depressed patients, physicians were only 6% as likely to ask older depressed patients about suicide risk and only about one fifth as likely to ask whether they felt depressed. Tai-Seale and colleagues (2005)103 observed 389 elderly patients and 33 physicians using video recordings of their clinical interactions. Physicians assessed depression in only 14% of the visits and used validated tools only three times. Depression assessment was more likely in visits that covered multiple topics, contrary to the “crowding-out” hypothesis. Tai-Seale and colleagues (2007)104 observed 35 PCPs interviewing 366 of their elderly patients.
Discussion of mental health topics occurred in only 22% of visits despite a high prevalence of depression, and a typical mental health discussion lasted approximately 2 minutes.104 Adelman and colleagues (2008)105 audiotaped 482 follow-up visits at three sites. Depression was discussed in 7.3% of these visits. Of the visits in which depression was discussed, physicians raised the topic in 41%, patients in 48%, and accompanying persons in 10%. The topic of depression was raised almost exclusively in the first 2.5 years of the patient–physician relationship. Physicians with some geriatric training were more likely to discuss depression. However, it is important to remember that patient and clinician communication are reciprocally related. Patients’ perceptions of how the PCP related to them in the consultation correlate with reduction in symptom severity 3 months later.106 Goldberg and colleagues (1993)107 found that patient cues were influenced by the PCP’s behavior. Cues increased with patient-centered behaviors such as empathic statements or directive questioning about psychological issues, and decreased with medical questions and other doctor-led
behaviors. Similarly, others have found that patients’ willingness to disclose information is related to physician facilitation, and that patients’ emotional expression is associated with a warm and empathetic attitude on the part of the physician.108 Physicians may signal to patients, wittingly or unwittingly, how emotional problems will be addressed, influencing how patients perceive their interactions with physicians regarding emotional problems. Del Piccolo and coworkers (2000)109 also found that the proportion of cues given by patients was related to the PCP’s verbal behavior, increasing with closed psychosocial questions and decreasing with the use of active interview techniques. In fact, patients with detected distress gave more cues, often with psychological content, whereas patients with undetected distress gave mainly cues related to their lifestyle and life episodes. Recently, an international study by Verhaak and colleagues (2007)76 found that eye contact, empathy, and asking questions about psychological or social topics were associated with greater awareness of patients’ psychological problems. Another important predictor of diagnostic sensitivity (recognition) is the amount of contact with the patient.110,111 In the MAGPIE study from New Zealand, 80.2% of cases seen five or more times during the previous year were correctly identified, compared with 28.8% of those not seen in the previous year. Over time, fewer cases remain undetected: only 30% at 1 year and 14% at the end of 3 years.112,113 Using patient self-report regarding the adequacy of diagnosis and treatment, Jackson and colleagues114 found that the cumulative recognition rate was a modest 56% for major depression and 20% for minor depression, even after 5 years.
4. Illness-Related Influences on Detection
There is some evidence that clinicians find mental illness difficult to deal with and awkward to diagnose. For example, PCPs in the United States appear reluctant to code patients as depressed.115 Somatic complaints thought to have a psychological basis are also perceived as difficult.116,117 In a study of 500 primary care visits, clinicians perceived 15% as difficult. These encounters were more likely to involve a mental disorder, more than five somatic symptoms, more severe symptoms, poorer functional status, more unmet expectations, less satisfaction with care, and higher use of health services.118 Interestingly, clinicians with poorer psychosocial attitudes perceived three times as many encounters as difficult. In the same study, the authors showed that a 2-hour physician workshop, followed by information provided before each visit, reduced the difficulty physicians perceived in the encounter.119
Table 3.1. Large-Scale International Studies on Mood Disorder Recognition and Treatment
Study
Setting
Sample
Instrument
Prevalence of Mood Disorders
Recognition in Primary Care
% Offered Antidepressants
Institut National de la Santé et de la Recherche Médicale (INSERM) study
Paris, France, 1996–97
2,419 patients (aged 18–70 years); 238 were found to be depressed and were followed up for 6 months.
MINI
Major depression (14.0%), minor depression (3.1%), and dysthymia (2.1%)
Major depression (21%)
European Study of the Epidemiology of Mental Disorders (ESEMeD)
Community study in Belgium, France, Germany, Italy, the Netherlands, and Spain
21,425 non-institutionalized adults aged 18 years and older (including those aged 65 years and older)
WMH-CIDI
World Health Organization study on Psychological Problems in General Health Care (PPGHC)
14 countries worldwide
26,422 consecutive patients (aged 15–65 years)
General Health Questionnaire (GHQ-12)
Lifetime prevalence rates of 13.4% for major depression and 4.4% for dysthymia were reported. Mental disorders (24%) Major depression (13.7%) Minor depression (3.6%) Dysthymia (3.6%)
Major depression (26%) Any mental disorder (58%) Not examined
Major depression (15%) Any mental disorder (54%)
Major depression (15%)
Major depression (21.2%)
Figure 3.5. Detection sensitivity (proportion of cases recognized) by severity of depression according to HADS-D score (8 to 21). Adapted from Thompson C, Ostler K, Peveler RC, et al. Dimensional perspective on the recognition of depressive symptoms in primary care: The Hampshire Depression Project 3. Br J Psychiatry. 2001;179:317–323.
Most depressions in primary care are mild to moderate in severity (90% have a score of 8 to 13 on the HADS-D), and the detection of mild disorders is a challenge because their symptoms do not differ greatly from those of healthy but stressed individuals.120,121 Thompson and colleagues (2001)33 examined the relationship between severity of depression on the HADS-D and the proportion of cases detected (Fig. 3.5). Generally, greater severity of depression is associated with greater recognition. However, because mild depression makes up most of the burden, 50% of all correct recognition occurs at HADS-D scores between 8 and 10. Further, many cases feature physical or mental comorbidities such as anxiety, and comorbidity may decrease recognition.122 In primary care only about 10% of all depressions lack comorbidity (5% of those with major depression). About 50% have physical comorbidity and an overlapping 70% to 80% have psychiatric comorbidity (of which 40% to 50% is anxiety). Patients with anxiety or chronic mixed anxiety and depression were less likely to be offered active treatment than those considered to have depression.123 One hypothesis is that somatic complaints, particularly in late-life depression, might cause the clinician to focus on physical rather than mental symptoms. Many clinicians have been taught to take an exclusive approach and ignore such complaints, but accumulating evidence suggests that this is probably incorrect and that somatic symptoms should be “counted” toward depression even when another physical
illness such as stroke or Parkinson’s disease is present. This is discussed further in Chapters 10 and 11. However, this “crowding-out” hypothesis has been refuted. For example, Ani and coworkers (2008)124 found that comorbidity had no effect on recognition accuracy. Pfaff and Almeida (2005)125 found that predictors of detection included concomitant polypharmacy (implying higher comorbidity), as well as higher CES-D scores, presentation with psychological complaints, and a higher risk of suicide. O’Connor and associates (2001)126 found that comorbid pain positively influenced detection of late-life depression. Similarly, Borowsky and associates (2000)31 found superior detection of depression when comorbid diabetes or hypertension was present. Other factors associated with detection were previous psychiatric consultation, number of years as a patient, severity of depression, and disclosure of depression to the physician. Indeed, the co-occurrence of MDD and anxiety might actually facilitate recognition of depression127 or of psychiatric caseness.128–130 When faced with ambiguity and diagnostic difficulties, some evidence suggests that only a minority of clinicians choose to explore the issues in more detail.131
5. Conclusions
Depression is often a complex comorbid presentation associated with frequent primary care attendance.132 Recognition of depression in primary care and hospital settings is poor. It is worth remembering, however, that depression is a relatively uncommon reason for presentation in primary care: at least six out of seven unselected patients do not have depression. In primary care, time and resources are limited, and hence psychological therapies or even structured self-help programs are often not available. The most plausible factor explaining undertreatment is underrecognition. Antidepressants are typically the treatment of choice for clinicians but not for patients, and hence managing depression can be seen as difficult.133 Against this background, only about half of true cases are diagnosed and perhaps a quarter treated. Conversely, about 70% of noncases are correctly reassured. Two major factors appear to influence detection: how the person with depression describes his or her symptoms and how the clinician interviews the patient. The nature of the therapeutic relationship is important. Even with frequent contact, a therapeutic relationship that the clinician (or patient) regards as unhelpful is likely to decrease the recognition rate. Discussion of emotional distress in primary care is also linked with high patient satisfaction.134 Additional factors, such as the skill of the clinician and the use of tools, may also play a role (see Chapter 7). There are certainly many potential barriers to successful diagnosis and treatment.135 Mental health skills training has been
effective in improving recognition and management of somatizing and depressed patients by PCPs, but it remains uncertain whether this translates into improved clinical outcomes.136–138 Interventions are likely to be most successful where problems are most serious. For example, Shapiro and colleagues (1987)139 conducted a randomized clinical trial involving 1,242 patients attending inner-city PCPs, in which GHQ scores were fed back to the clinicians. Results showed marked increases in detection, but only among the elderly, African Americans, and men. Clinicians should have a high index of suspicion in frequent attendees, those with serious or chronic illness, and those who have persistent but unexplained pain. High vigilance is warranted in patients with such somatic symptoms, in men, and in younger patients.140,141 Ultimately, it is useful to reflect on patients’ opinions on the importance of primary care for depression.142 The four needs patients rate as most important are the clinician’s interpersonal skills, the ability to recognize depression, the effectiveness of treatment, and problems associated with treatment.
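The summary figures above imply that depression diagnosed by clinicians will include many false positives. Those figures are: about half of true cases recognized, about 70% of noncases correctly reassured, and at most one depressed patient in every seven unselected attendees. The sketch below works through the arithmetic under stated assumptions. It uses exactly these rounded values and a hypothetical cohort of 700 unselected primary care patients, chosen so that the counts come out as whole numbers. recognition_counts is an illustrative helper, not a published method.

```python
from fractions import Fraction


def recognition_counts(n_patients: int, prevalence: Fraction,
                       sensitivity: Fraction, specificity: Fraction) -> dict[str, Fraction]:
    """Expected 2x2 cell counts for unassisted clinician recognition.

    Exact fractions are used so the counts are not blurred by float rounding.
    """
    depressed = n_patients * prevalence
    not_depressed = n_patients - depressed
    return {
        "true_pos": depressed * sensitivity,             # depressed and recognized
        "false_neg": depressed * (1 - sensitivity),      # depressed but missed
        "true_neg": not_depressed * specificity,         # noncases correctly reassured
        "false_pos": not_depressed * (1 - specificity),  # noncases labeled depressed
    }


# Assumed inputs: rounded figures from the text, hypothetical cohort of 700.
counts = recognition_counts(700, prevalence=Fraction(1, 7),
                            sensitivity=Fraction(1, 2), specificity=Fraction(7, 10))
for cell, value in counts.items():
    print(f"{cell}: {value}")  # true_pos: 50, false_neg: 50, true_neg: 420, false_pos: 180

ratio = counts["false_pos"] / counts["true_pos"]
ppv = counts["true_pos"] / (counts["true_pos"] + counts["false_pos"])
print(f"false positives per true positive: {float(ratio):.1f}")  # 3.6
print(f"positive predictive value: {float(ppv):.1%}")           # 21.7%
```

With these inputs there are 180 false positives for every 50 true positives, so only about one in five patients given a diagnosis of depression would actually be depressed. In this setting one in seven is an upper bound on prevalence, so the ratio in unselected attendees is likely to be at least this unfavorable.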
References
1. Callahan CM, Nienaber NA, Hendrie HC, et al. Depression of elderly outpatients: Primary care physicians’ attitudes and practice patterns. J Gen Intern Med. 1992;7(1):26–31.
2. Kaplan MS, Adamek ME, Martin JL. Confidence of primary care physicians in assessing the suicidality of geriatric patients. Int J Geriatr Psychiatry. 2001;16(7):728–734.
3. Gallo JJ, Ryan SD, Ford DE. Attitudes, knowledge, and behavior of family physicians regarding depression in late life. Arch Fam Med. 1999;8:249–256.
4. Shao W, Williams J, Lee S, et al. Knowledge and attitudes about depression among non-generalists and generalists. J Fam Pract. 1997;44:161–168.
5. Feldman MD, Franks P, Duberstein PR, et al. Let’s not talk about it: Suicide inquiry in primary care. Ann Fam Med. 2007;5(5):412–418.
6. Travado L, Grassi L, Gil F, et al., and the Southern European Psycho-Oncology Study (SEPOS) Group. Physician-patient communication among Southern European cancer physicians: The influence of psychosocial orientation and burnout. Psychooncology. 2005;14(8):661–670.
7. Plummer SE, Gournay K, Goldberg D, et al. Detection of psychological distress by practice nurses in general practice. Psychol Med. 2000;30(5):1233–1237.
8. Cape J, Morris E, Adams N, et al. Identification of psychological morbidity in older people in primary care by practice nurses. Aging Ment Health. 2003;7(6):446–451.
9. Ryan H, Schofield P, Cockburn J, et al. How to recognize and manage psychological distress in cancer patients. Eur J Cancer Care. 2005;14(1):7–15.
10. Liu SI, Mann A, Cheng A, et al. Identification of common mental disorders by general medical doctors in Taiwan. Gen Hosp Psychiatry. 2004;26(4):282–288.
11. Matarazzo JD. The reliability of psychiatric and psychological diagnosis. Clin Psychol Rev. 1983;3:103–145.
12. Tiemens BG, VonKorff M, Lin EH. Diagnosis of depression by primary care physicians versus a structured diagnostic interview. Understanding discordance. Gen Hosp Psychiatry. 1999;21(2):87–96.
13. Smith MV, Rosenheck RA, Cavaleri MA, et al. Screening for and detection of depression, panic disorder, and PTSD in public-sector obstetric clinics. Psychiatr Serv. 2004;55:407–414.
14. Ormel J, Koeter MWJ, van den Brink W, et al. Recognition, management and course of anxiety and depression in general practice. Arch Gen Psychiatry. 1991;48:700–706.
15. Norton J, De Roquefeuil G, Boulenger JP, et al. Use of the PRIME-MD Patient Health Questionnaire for estimating the prevalence of psychiatric disorders in French primary care: comparison with family practitioner estimates and relationship to psychotropic medication use. Gen Hosp Psychiatry. 2007;29(4):285–293.
16. Wittchen HU, Kessler RC, Beesdo K, et al. Generalized anxiety and depression in primary care: prevalence, recognition, and management. J Clin Psychiatry. 2002;63(suppl 8):24–34.
17. Jackson JL, Passamonti M, Kroenke K. Outcome and impact of mental disorders in primary care at 5 years. Psychosom Med. 2007;69(3):270–276.
18. Ustun TB, Von Korff M. Primary mental health services. In: Ustun TB, Sartorius N, eds. Mental illness in general health care: an international study. Chichester, UK: John Wiley & Sons; 1995:347–360.
19. Alonso J, Angermeyer MC, Bernert S, et al. Prevalence of mental disorders in Europe: results from the European Study of the Epidemiology of Mental Disorders (ESEMeD) project. Acta Psychiatr Scand Suppl. 2004;420:21–27.
20. Alonso J, Lépine J-P. Overview of key data from the European Study of the Epidemiology of Mental Disorders (ESEMeD). J Clin Psychiatry. 2007;68(suppl 2):3–9.
21. Friedman B, Conwell Y, Delavan RL. Correlates of late-life major depression: A comparison of urban and rural primary care patients. Am J Geriatr Psychiatry. 2007;15(1):28–41.
22. Licht-Strunk E, van der Kooij KG, van Schaik DJF. Prevalence of depression in older patients consulting their general practitioner in The Netherlands. Int J Geriatr Psychiatry. 2005;20(11):1013–1019.
23. Mitchell AJ, Vaze A, Rao S. Meta-analysis of unassisted recognition of depression in primary care: importance of false positives and false negatives. Lancet. 2009 (in press).
24. Lyness JM, Noel TK, Cox C, et al. Screening for depression in elderly primary care patients. A comparison of the Center for Epidemiologic Studies-Depression Scale and the Geriatric Depression Scale. Arch Intern Med. 1997;157(4):449–454.
25. Greer J, Halgin R, Harvey E. Global versus specific symptom attributions: predicting the recognition and treatment of psychological distress in primary care. J Psychosom Res. 2004;57:521–527.
26. Alonso J, Angermeyer MC, Bernert S, et al. Prevalence of mental disorders in Europe: results from the European Study of the Epidemiology of Mental Disorders (ESEMeD) project. Acta Psychiatr Scand Suppl. 2004;420:21–27.
27. Maginn S, Boardman AP, Craig TKL, et al. The detection of psychological problems by general practitioners. Influence of ethnicity and other demographic variables. Soc Psychiatry Psychiatr Epidemiol. 2004;39:464–471.
28. Wittchen HU, Höfler M, Meister W. Prevalence and recognition of depressive syndromes in German primary care settings: poorly recognized and treated? Int Clin Psychopharmacol. 2001;16(3):121–135.
29. Bushnell J. Frequency of consultations and general practitioner recognition of psychological symptoms. Br J Gen Pract. 2004;54(508):838–842.
30. Rogers R. Handbook of diagnostic and structured interviewing. New York: Guilford Publications; 2001.
31. Borowsky SJ, Rubenstein LV, Meredith LS, et al. Who is at risk of nondetection of mental health problems in primary care? J Gen Intern Med. 2000;15(6):381–388.
32. Hickie IB, Davenport TA, Scott EM, et al. Unmet need for recognition of common mental disorders in Australian general practice. Med J Aust. 2001;175:S18–S24.
33. Thompson C, Ostler K, Peveler RC, et al. Dimensional perspective on the recognition of depressive symptoms in primary care. The Hampshire Depression Project 3. Br J Psychiatry. 2001;179:317–323.
34. Pfaff JJ, Almeida OP. A cross-sectional analysis of factors that influence the detection of depression in older primary care patients. Aust N Z J Psychiatry. 2005;39(4):262–265.
35. Aragones E, Pinol JL, Labad A, et al. Detection and management of depressive disorders in primary care in Spain. Int J Psychiatry Med. 2004;34(4):331–343.
36. Aragones E, Pinol JL, Labad A. The overdiagnosis of depression in non-depressed patients in primary care. Fam Pract. 2006;23(3):363–368.
37. Nuyen J, Volkers AC, Verhaak PFM, et al. Accuracy of diagnosing depression in primary care: the impact of chronic somatic and psychiatric co-morbidity. Psychol Med. 2005;35(8):1185–1195.
38. Verhaak PFM, Schellevis FG, Nuijen J, et al. Patients with a psychiatric disorder in general practice: determinants of general practitioners’ psychological diagnosis. Gen Hosp Psychiatry. 2006;28:125–132.
39. McCall L, Clarke D, Trauer T, et al. Predictors of accuracy of recognition of emotional distress in general practice. Prim Care Community Psychiatry. 2007;12(1):1–5.
40. Millar T, Goldberg DP. Link between the ability to detect and manage emotional disorders: a study of general practitioner trainees. Br J Gen Pract. 1991;41:357–359.
41. Davenport TA, Hickie IB, Naismith SL, et al. Variability and predictors of mental disorder rates and medical practitioner responses across Australian general practices. Med J Aust. 2001;175:S37–S41.
42. Rapp S, Davis K. Geriatric depression: physicians’ knowledge, perceptions and diagnostic practices. Gerontologist. 1989;29:252–257.
43. Williams JW Jr, Rost K, Dietrich AJ, et al. Primary care physicians’ approach to depressive disorders: effects of physician specialty and practice structure. Arch Fam Med. 1999;8(1):58–67.
44. Goldberg D, Steele J, Johnson A, et al. Ability of primary care physicians to make accurate ratings of psychiatric symptoms. Arch Gen Psychiatry. 1982;39:829–833.
45. Hutton C, Gunn J. Do longer consultations improve the management of psychological problems in general practice? A systematic literature review. BMC Health Serv Res. 2007;7:71.
46. Howie JG, Porter AM, Heaney DJ, et al. Long to short consultation ratio: a proxy measure of quality of care for general practice. Br J Gen Pract. 1991;41:48–54.
47. Verhaak PFM, Van Den Brink-Muinen A, Bensing JM, et al. Demand and supply for psychological help in general practice in different European countries: access to primary mental health care in six European countries. Eur J Public Health. 2004;14(2):134–140.
48. Zantinge EM, Verhaak PFM, de Bakker DH, et al. The workload of general practitioners does not affect their awareness of patients’ psychological problems. Patient Educ Couns. 2007;67(1–2):93–99.
49. Kruse J, Schmitz N, Woller W, et al. Why does the general practitioner overlook psychological disorders in his patient? Determinants of physicians’ identification with psychological disorders. Psychother Psychosom Med Psychol. 2004;54(2):45–51.
50. Tylee A, Freeling P, Kerry S, et al. How does the content of consultations affect the recognition by general practitioners of major depression in women? Br J Gen Pract. 1995;45:575–578.
51. Kessler D, Lloyd K, Lewis G, et al. Cross sectional study of symptom attribution and recognition of depression and anxiety in primary care. BMJ. 1999;318:436–439.
52. Weich S, Lewis G, Mann AH, et al. The somatic presentation of psychiatric morbidity in general practice. Br J Gen Pract. 1995;45:143–147.
53. Yeung A, Chang D, Gresham RL, et al. Illness beliefs of depressed Chinese American patients in primary care. J Nerv Ment Dis. 2004;192(4):324–327.
54. Cornford CS, Hill A, Reilly J. How patients with depressive symptoms view their condition: a qualitative study. Fam Pract. 2007;24(4):358–364.
55. Bridges KW, Goldberg DP. Somatic presentation of DSM-III psychiatric disorders in primary care. J Psychosom Res. 1985;29:563–569.
56. Susman JL, Crabtree BF, Essink G. Depression in rural family practice: easy to recognize, difficult to diagnose. Arch Fam Med. 1995;4:427–431.
57. Sartorius N, Ustun TB, Lecrubier Y, et al. Depression comorbid with anxiety: results from the WHO study on psychological disorders in primary health care. Br J Psychiatry. 1996;168(suppl 30):38–43.
58. Freeling P, Rao BM, Paykel ES, et al. Unrecognised depression in general practice. BMJ. 1985;290:1880–1883.
59. Tylee AT, Freeling P, Kerry S. Why do general practitioners recognize major depression in one woman patient yet miss it in another? Br J Gen Pract. 1993;43:327–330.
60. Tylee A, Freeling P, Kerry S, et al. How does the content of consultations affect the recognition by general practitioners of major depression in women? Br J Gen Pract. 1995;45:575–578.
61. Coulehan JL, Schulberg HC, Block MR, et al. Medical comorbidity of major depressive disorder in a primary medical practice. Arch Intern Med. 1990;150:2363–2367.
62. Freeling P, Rao BM, Paykel ES, et al. Unrecognized depression in general practice. BMJ. 1985;290:1880–1883.
63. Keeley RD, Smith JL, Nutting PA, et al. Does a depression intervention result in improved outcomes for patients presenting with physical symptoms? J Gen Intern Med. 2004;19:615–623.
64. Vuorilehto M, Melartin T, Isometsa E. Depressive disorders in primary care: recurrent, chronic, and co-morbid. Psychol Med. 2005;35(5):673–682.
65. Priest RG, Vize C, Roberts A, et al. Lay people’s attitudes to treatment of depression: results of opinion poll for Defeat Depression Campaign just before its launch. BMJ. 1996;313:858–859.
66. Prior L, Wood F, Lewis G, et al. Stigma revisited, disclosure of emotional problems in primary care consultations in Wales. Soc Sci Med. 2003;56(10):2191–2200.
67. Cape J, McCulloch Y. Patients’ reasons for not presenting emotional problems in general practice consultations. Br J Gen Pract. 1999;49(448):875–879.
68. Leaf PJ, Livingston MM, Tischler GL, et al. Contact with health professionals for the treatment of psychiatric and emotional problems. Med Care. 1985;23:1322–1337.
69. Maginn S, Boardman AP, Craig TKJ, et al. The detection of psychological problems by general practitioners: influence of ethnicity and other demographic variables. Soc Psychiatry Psychiatr Epidemiol. 2004;39(6):464–471.
70. Probst JC, Laditka SB, Moore CG, et al. Race and ethnicity differences in reporting of depressive symptoms. Adm Policy Ment Health Ment Health Serv Res. 2007;34(6):519–529.
71. Williams JW Jr, Mulrow CD, Kroenke K, et al. Case-finding for depression in primary care: a randomized trial. Am J Med. 1999;106:36–43.
72. Simon GE, Von Korff M, Piccinelli M, et al. An international study of the relation between somatic symptoms and depression. N Engl J Med. 1999;341:1329–1335.
73. Davenport S, Goldberg D, Millar T. How psychiatric disorders are missed during medical consultations. Lancet. 1987;330(8556):439–441.
74. O’Connor DW, Rosewarne R, Bruce A. Depression in primary care. 1: Elderly patients’ disclosure of depressive symptoms to their doctors. Int Psychogeriatr. 2001;13(3):359–365.
75. Bushnell J, McLeod D, Dowell A, et al. Do patients want to disclose psychological problems to GPs? Fam Pract. 2005;22(6):631–637.
76. Verhaak PFM, Bensing JM, Van den Brink-Muinen A. GP mental health care in 10 European countries: patients’ demands and GPs’ responses. Eur J Psychiatry. 2007;21(1):7–16.
77. Cape J, McCulloch Y. Patients’ reasons for not presenting emotional problems in general practice consultations. Br J Gen Pract. 1999;49(448):875–879.
78. Del Piccolo L, Saltini A, Zimmermann C. Which patients talk about stressful life events and social problems to the general practitioner? Psychol Med. 1998;28(6):1289–1299.
79. Pollock K. Maintaining face in the presentation of depression: constraining the therapeutic potential of the consultation. Health (London). 2007;11(2):163–180.
80. Zandbelt LC, Smets EMA, Oort FJ, et al. Determinants of physicians’ patient-centred behaviour in the medical specialist encounter. Soc Sci Med. 2006;63(4):899–910.
81. Del Piccolo L, Mazzi M, Saltini A, et al. Inter- and intra-individual variations in physicians’ verbal behaviour during primary care consultations. Soc Sci Med. 2002;55(10):1871–1885.
82. Epstein RM, Hadee T, Carroll J, et al. “Could this be something serious?” Reassurance, uncertainty, and empathy in response to patients’ expressions of worry. J Gen Intern Med. 2007;22(12):1731–1739.
83. Deveugele M, Derese A, De Bacquer D, et al. Is the communicative behavior of GPs during the consultation related to the diagnosis? A cross-sectional study in six European countries. Patient Educ Couns. 2004;54(3):283–289.
84. Deveugele M, Derese A, De Maeseneer J. Is GP-patient communication related to their perceptions of illness severity, coping and social support? Soc Sci Med. 2002;55(7):1245–1253.
85. Feldman MD, Franks P, Epstein RM, et al. Do patient requests for antidepressants enhance or hinder physicians’ evaluation of depression? A randomized controlled trial. Med Care. 2006;44(12):1107–1113.
86. Watts SC, Bhutani GE, Stout IH, et al. Mental health in older adult recipients of primary care services: is depression the key issue? Identification, treatment and the general practitioner. Int J Geriatr Psychiatry. 2002;17:427–437.
87. Saltini A, Mazzi MA, Del Piccolo L, et al. Decisional strategies for the attribution of emotional distress in primary care. Psychol Med. 2004;34(4):729–739.
88. Maguire P. Improving the recognition of concerns and affective disorders in cancer patients. Recent Advances in Clinical Psychiatry. 1992;7:15–30.
89. Goldberg DP, Jenkins L, Millar T, et al. The ability of trainee general practitioners to identify psychological distress among their patients. Psychol Med. 1993;23:185–193.
90. Tobin M, Hickie I, Urbanc A. Increasing general practitioner skills with patients with serious mental illness. Aust Health Rev. 1997;20:55–67.
91. Zimmermann C, Del Piccolo L, Finset A. Cues and concerns by patients in medical consultations: a literature review. Psychol Bull. 2007;133(3):438–463.
92. Deveugele M, Derese A, De Maeseneer J. Is GP-patient communication related to their perceptions of illness severity, coping and social support? Soc Sci Med. 2002;55(7):1245–1253.
93. Deveugele M, Derese A, De Bacquer D, et al. Is the communicative behavior of GPs during the consultation related to the diagnosis? A cross-sectional study in six European countries. Patient Educ Couns. 2004;54(3):283–289.
94. Marks JN, Goldberg DP, Hillier VF. Determinants of the ability of general practitioners to detect psychiatric illness. Psychol Med. 1979;9(2):337–353.
95. Badger LW, deGruy FV, Hartman MA, et al. Psychosocial interest, medical interviews, and the recognition of depression. Arch Fam Med. 1994;3:899–907.
96. Carney PA, Eliassen MS, Wolford GL, et al. How physician communication influences recognition of depression in primary care. J Fam Pract. 1999;48(12):958–964.
97. Rost K, Nutting P, Smith J, et al. The role of competing demands in the treatment provided primary care patients with major depression. Arch Fam Med. 2000;9:150–154.
98. Dowrick C, Gask L, Perry R, et al. Do general practitioners’ attitudes towards depression predict their clinical behaviour? Psychol Med. 2000;30:413–419.
99. Saltini A, Mazzi MA, Del Piccolo L, et al. Decisional strategies for the attribution of emotional distress in primary care. Psychol Med. 2004;34(4):729–739.
100. Levinson W, Roter D. Physicians’ psychosocial beliefs correlate with their patient communication skills. J Gen Intern Med. 1995;10(7):375–379.
101. A report of the Joint Consultative Committee. Primary care psychiatry: the last frontier. Canberra: Royal Australian College of General Practitioners and Royal Australian and New Zealand College of Psychiatrists; 1997.
102. Fischer LR, Wei F, Solberg LI, et al. Treatment of elderly and other adult patients for depression in primary care. J Am Geriatr Soc. 2003;51(11):1554–1562.
103. Tai-Seale M, Bramson R, Drukker D, et al. Understanding primary care physicians’ propensity to assess elderly patients for depression using interaction and survey data. Med Care. 2005;43(12):1217–1224.
104. Tai-Seale M, McGuire T, Colenda C, et al. Two-minute mental health care for elderly patients: inside primary care visits. J Am Geriatr Soc. 2007;55(12):1903–1911.
105. Adelman RD, Greene MG, Friedmann E, et al. Discussion of depression in follow-up medical visits with older patients. J Am Geriatr Soc. 2008;56(1):16–22.
106. Cape J. Patient-rated therapeutic relationship and outcome in general practitioner treatment of psychological problems. Br J Clin Psychol. 2000;39(4):383–395.
107. Goldberg D, Jenkins L, Millar T, et al. The ability of trainee general practitioners to identify psychological distress among their patients. Psychol Med. 1993;23:185–193.
108. Ishikawa H, Takayama T, Yamazaki Y, et al. The interaction between physician and patient communication behaviors in Japanese cancer consultations and the influence of personal and consultation characteristics. Patient Educ Couns. 2002;46(4):277–285.
109. Del Piccolo L, Saltini A, Zimmermann C, et al. Differences in verbal behaviours of patients with and without emotional distress during primary care consultations. Psychol Med. 2000;30(3):629–643.
110. Nuyen J, Volkers AC, Verhaak PFM, et al. Accuracy of diagnosing depression in primary care: the impact of chronic somatic and psychiatric co-morbidity. Psychol Med. 2005;35:1185–1195.
111. Verhaak PFM, Schellevis FG, Nuijen J, et al. Patients with a psychiatric disorder in general practice: determinants of general practitioners’ psychological diagnosis. Gen Hosp Psychiatry. 2006;28:125–132.
112. Rost K, Zhang M, Fortney J, et al. Persistently poor outcomes of undetected major depression in primary care. Gen Hosp Psychiatry. 1998;20:12–20.
113. Kessler D, Bennewith O, Lewis G, et al. Detection of depression and anxiety in primary care: follow-up study. BMJ. 2002;325:1016–1017.
114. Jackson JL, Passamonti M, Kroenke K. Outcome and impact of mental disorders in primary care at 5 years. Psychosom Med. 2007;69(3):270–276.
115. Rost K, Smith R, Matthews DB, et al. The deliberate misdiagnosis of major depression in primary care. Arch Fam Med. 1994;3(4):333–337.
116. Hahn SR. Physical symptoms and physician-experienced difficulty in the physician-patient relationship. Ann Intern Med. 2001;134(9):897–904.
117. Carson AJ, Stone J, Warlow C, et al. Patients whom neurologists find difficult to help. J Neurol Neurosurg Psychiatry. 2004;75(12):1776–1778.
118. Jackson JL, Kroenke K. Difficult patient encounters in the ambulatory clinic: clinical predictors and outcomes. Arch Intern Med. 1999;159:1069–1075.
119. Jackson JL, Kroenke K, Chamberlin J. Effects of physician awareness of symptom-related expectations and mental disorders: a controlled trial. Arch Fam Med. 1999;8(2):135–142.
120. Olfson M, Gilbert T, Weissman M, et al. Recognition of emotional distress in physically healthy primary care patients who perceive poor physical health. Gen Hosp Psychiatry. 1995;17:173–180.
121. Perez Stable E, Miranda J, Munoz RF. Depression in medical outpatients: underrecognition and misdiagnosis. Arch Intern Med. 1990;150:1083–1088.
122. Schwenk TL, Coyne JC, Fechner-Bates S. Differences between detected and undetected patients in primary care and depressed psychiatric patients. Gen Hosp Psychiatry. 1996;18:407–415.
123. Hyde J, Evans J, Sharp D, et al. Deciding who gets treatment for depression and anxiety: a study of consecutive GP attenders. Br J Gen Pract. 2005;55(520):846–853.
124. Ani C, Bazargan M, Hindman D, et al. Depression symptomatology and diagnosis: discordance between patients and physicians in primary care settings. BMC Fam Pract. 2008;9:1.
125. Pfaff JJ, Almeida OP. A cross-sectional analysis of factors that influence the detection of depression in older primary care patients. Aust N Z J Psychiatry. 2005;39(4):262–265.
126. O’Connor DW, Rosewarne R, Bruce A. Depression in primary care. 2: General practitioners’ recognition of major depression in elderly patients. Int Psychogeriatr. 2001;13(3):367–374.
127. Coyne JC, Schwenk TL, Fechner-Bates S. Nondetection of depression by primary care physicians reconsidered. Gen Hosp Psychiatry. 1995;17:3–12.
128. Ormel J, Van den Brink W, Koeter MW, et al. Recognition, management and outcome of psychological disorders in primary care: a naturalistic follow-up study. Psychol Med. 1990;20:909–923.
129. Pini S, Berardi D, Rucci P, et al. Identification of psychiatric distress by primary care physicians. Gen Hosp Psychiatry. 1997;19:411–418.
130. Pini S, Perkonigg A, Tansella M, et al. Prevalence and 12-month outcome of threshold and sub-threshold mental disorders in primary care. J Affect Disord. 1999;56:37–48.
131. Seaburn DB, Morse D, McDaniel SH, et al. Physician responses to ambiguous patient symptoms. J Gen Intern Med. 2005;20(6):525–530.
132. Menchetti M, Cevenini N, De Ronchi D, et al. Depression and frequent attendance in elderly primary care patients. Gen Hosp Psychiatry. 2006;28(2):119–124.
133. van Schaik DJF, Klijn AFJ, van Hout HPJ, et al. Patients’ preferences in the treatment of depressive disorder in primary care. Gen Hosp Psychiatry. 2004;26(3):184–189.
134. Gross R, Brammli-Greenberg S, Tabenkin H, et al. Primary care physicians’ discussion of emotional distress and patient satisfaction. Int J Psychiatry Med. 2007;37(3):331–345.
135. Simon GE. Evidence review: efficacy and effectiveness of antidepressant treatment in primary care. Gen Hosp Psychiatry. 2002;24:213–224.
136. Gask L, McGrath G, Goldberg D, et al. Improving the psychiatric skills of established general practitioners: evaluation of group teaching. Med Educ. 1987;21:362–368.
137. Gask L, Usherwood T, Thompson H, et al. Evaluation of a training package in the assessment and management of depression in primary care. Med Educ. 1998;32:190–198.
138. Kaaya S, Goldberg D, Gask L. Management of somatic presentations of psychiatric illness in general medical settings: evaluation of a new training course for general practitioners. Med Educ. 1992;26:138–144.
139. Shapiro S, German PS, Skinner EA, et al. An experiment to change detection and management of mental morbidity in primary care. Med Care. 1987;25:327–339.
140. Gallo JJ, Rabins PV. Depression without sadness: alternative presentations of depression in late life. Am Fam Physician. 1999;60:820–826.
141. Gallo JJ, Rabins PV, Anthony JC. Sadness in older persons: 13-year follow-up of a community sample in Baltimore, Maryland. Psychol Med. 1999;29:341–350.
142. Cooper LA, Brown C, Vu HT, et al. Primary care patients’ opinions regarding the importance of various aspects of care for depression. Gen Hosp Psychiatry. 2000;22(3):163–173.
4 HOW CAN EXISTING MOOD SCALES BE IMPROVED? HOW TO TEST, REFINE, AND IMPROVE EXISTING SCALES
Adam B. Smith
1. Introduction
2. The Rasch Model and Other Item Response Models
3. Conclusion
Context
Many scales and tools have been developed on the basis of expert opinion. Several methods are available for field testing such tools so that their diagnostic potential can be gauged more accurately. Promising new methods, including item banks and computer-adaptive tests, are under development to maximize the efficiency of screening tools for depression.
1. Introduction
Various methods are available to diagnose psychiatric disorders (see Chapter 2), but in the absence of a formal semi-structured psychiatric assessment, which remains impractical, the most commonly used method for assessing and screening for emotional distress is still the self-completed questionnaire.1 There have been many hundreds of validation studies comparing these severity questionnaires against clinical judgment, semi-structured interviews, DSM and ICD criteria, and, of course, one another. Across primary care, community, and specialist settings their accuracy is almost universally imperfect, and further refinement is required. When judged by their ability to improve the detection and quality of care for depression, these instruments have shown only modest efficacy.2 A recent review by Gilbody and colleagues3 found that screening and case-finding instruments were associated with a modest increase in the recognition of depression by clinicians (relative risk [RR] 1.27, 95% confidence interval [CI] 1.02 to 1.59) and only a borderline, nonsignificant effect on the overall management of depression (RR 1.30, 95% CI 0.97 to 1.76). Seven studies provided data on the impact of screening on depression outcomes, but there was no evidence of an effect (standardized mean difference –0.02, 95% CI –0.25 to 0.20). No doubt part of the problem lies with the organizational elements that may (or may not) accompany screening, and part lies with clinicians’ willingness to treat a probable case. However, some of the blame also lies with the instruments themselves, most of which were developed on the basis of expert opinion rather than through a scientific process.
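To make the reading of such intervals concrete, the sketch below computes a relative risk and its 95% CI for a single hypothetical two-arm study. It is written in Python, and both the counts and the function name `relative_risk_ci` are invented for illustration. It uses the common large-sample standard error of ln(RR). Pooled estimates in a meta-analysis such as Gilbody and colleagues’ are combined across studies with additional weighting, which this sketch does not attempt.

```python
import math

def relative_risk_ci(events_a, total_a, events_b, total_b, z=1.96):
    """Relative risk of group A versus group B with a log-scale 95% CI.

    Uses the usual large-sample standard error of ln(RR) for two
    independent proportions; all counts must be positive.
    """
    risk_a = events_a / total_a
    risk_b = events_b / total_b
    rr = risk_a / risk_b
    se_log_rr = math.sqrt(1 / events_a - 1 / total_a + 1 / events_b - 1 / total_b)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

# Hypothetical trial: depression recognized in 60 of 200 screened patients
# versus 47 of 200 patients assessed as usual.
rr, lo, hi = relative_risk_ci(60, 200, 47, 200)
# Significant at the 5% level only if 1 lies outside the interval
excludes_one = hi < 1.0 or lo > 1.0
print(f"RR {rr:.2f}, 95% CI {lo:.2f} to {hi:.2f}, excludes 1: {excludes_one}")
```

With these made-up counts the point estimate is above 1, but the interval still spans 1. This is the same pattern as the management result quoted above.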
Tool Development
The quantitative methods used to evaluate the diagnostic accuracy of severity scales are discussed in Chapter 5. However, the evaluation of scales should be viewed in the wider context of tool development (Table 4.1). In the preclinical phase a tool is developed. In the case of depression this often involves borrowing from existing scales, and it is usually done by consensus rather than by scientific testing. In phases I and II preliminary testing occurs, ideally in a clinically representative sample with several competing comparison groups. These diagnostic validity studies do not prove that the tool is useful, only that it is potentially accurate.
Table 4.1. Stages in the Evaluation of the Screening Tool
Stage | Purpose | Description
Preclinical | Tool development | Here the aim is to develop a screening method that is likely to help in the detection of the underlying disorder, either in a specific setting or in all settings. Issues of acceptability of the tool to both patients and staff must be considered for implementation to be successful.
Phase I screen | Early diagnostic validity testing in a selected sample and refinement of tool | The aim is to evaluate the early design of the screening method against an established (ideally accurate) standard, known as the criterion reference. In early testing the tool may be refined by selecting the most useful aspects and deleting redundant ones, making the tool as efficient (brief) as possible while retaining its value.
Phase II screen | Diagnostic validity in a representative sample | The aim is to assess the refined tool against a criterion (gold standard) in a real-world sample in which the comparator subjects may represent several competing conditions that could otherwise complicate differential diagnosis.
Phase III screen | Screening randomized controlled trial; clinicians using vs. not using a screening tool | This is an important step in which the tool is evaluated clinically. One group with access to the new method is compared with a second group (ideally selected in a randomized fashion) who make assessments without the tool. This is akin to randomized controlled trials for drugs, and the outcome of interest is the number of additional cases correctly diagnosed or ruled out compared with assessment as usual.
Phase IV screen | Screening implementation studies using real-world outcomes | In this last step the screening tool or method is introduced clinically but monitored to discover its effect on important patient outcomes such as new identifications, new cases treated, and new cases entering remission. In short, the question is how much the tool influences patient outcomes and how well clinicians accept it (uptake).
After Mitchell AJ. Psycho-Oncol. 2008;17:S141.
Given a sufficient sample, a tool may be refined by field testing; this is the subject of the remainder of this chapter. Ultimately, the value of a tool must be proven in the clinical environment by comparison against either an established tool or clinical skills alone. The acceptability and availability of the tool will influence its uptake as much as its efficacy does. Because there is a large number of imperfect but widely used instruments, many of them could be refined. Options include adding or removing items, changing how items are weighted in scoring, or possibly changing the diagnostic algorithm. There have been recent attempts to improve the efficacy of screening instruments using modern psychometrics, most notably Rasch models. These models belong to a family of measurement models developed in educational psychology and are increasingly employed in test development and refinement in medicine. It is frequently found that conventional instruments can be shortened without significantly decreasing screening efficacy. Occasionally this shortening is dramatic, reducing an instrument to one-half or even one-quarter of its original length. Yet it should be acknowledged that the ability of these adapted instruments to identify levels of a key outcome variable, such as “distress warranting intervention,” remains less than perfect. Combining items drawn from a number of emotional distress instruments into an item bank may improve screening efficacy while minimizing the number of questions patients must answer, thereby reducing patient burden. Item banks such as these have already been successfully developed for assessing emotional distress in a psychiatric population.4,5 So have computer-adaptive tests, which tailor the questions presented to each patient’s previous responses. This chapter describes the Rasch model and its application to mental health research in more detail.
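The adaptive idea can be illustrated with a minimal sketch in Python. It assumes a small bank of dichotomous items already calibrated on a Rasch logit scale. Several details are illustrative assumptions, not the method of any published depression item bank: the item names, their difficulties, the four-item stopping rule, and expected a posteriori (EAP) scoring with a standard normal prior. At each step the sketch presents the unused item with the greatest Fisher information at the current estimate. For a Rasch item this information is P(1 − P), which peaks where the item’s difficulty equals the person’s level.

```python
import math

def rasch_p(theta, b):
    """Probability of endorsing a dichotomous item (Rasch model, logit scale)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_information(theta, b):
    """Fisher information of a Rasch item at theta; largest when b == theta."""
    p = rasch_p(theta, b)
    return p * (1.0 - p)

def eap_estimate(responses):
    """EAP trait estimate from (difficulty, endorsed) pairs with a N(0, 1) prior."""
    grid = [i / 10.0 for i in range(-40, 41)]  # -4 to +4 logits in steps of 0.1
    weights = []
    for theta in grid:
        w = math.exp(-0.5 * theta * theta)  # unnormalised standard normal prior
        for b, endorsed in responses:
            p = rasch_p(theta, b)
            w *= p if endorsed else 1.0 - p  # local independence: multiply item likelihoods
        weights.append(w)
    return sum(t * w for t, w in zip(grid, weights)) / sum(weights)

def next_item(theta, bank, administered):
    """Choose the unused item that is most informative at the current estimate."""
    unused = [name for name in bank if name not in administered]
    return max(unused, key=lambda name: item_information(theta, bank[name]))

def run_cat(bank, answer, max_items=4):
    """Administer up to max_items items; answer(name) returns 1 (endorse) or 0."""
    theta, administered, responses = 0.0, [], []
    for _ in range(min(max_items, len(bank))):
        name = next_item(theta, bank, administered)
        administered.append(name)
        responses.append((bank[name], answer(name)))
        theta = eap_estimate(responses)
    return theta, administered

# Hypothetical calibrated bank: item name -> difficulty in logits
bank = {"item_a": -2.0, "item_b": -1.0, "item_c": 0.0, "item_d": 1.0, "item_e": 2.0}
theta_hat, asked = run_cat(bank, answer=lambda name: 1 if bank[name] <= 0.5 else 0)
print(asked, round(theta_hat, 2))
```

Operational CAT systems add formal stopping rules (for example, a target standard error), item exposure control, and content balancing. The sketch omits all of these.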
2. The Rasch Model and Other Item Response Models
In classical test theory, item difficulty (eg, the probability of subjects responding “yes” or “no” to items or selecting a category from a number of response options) is calculated from the number or proportion of responses in the sample.6 The major drawback of this approach is that the estimate of item difficulty is sample dependent. The “endorsability” of any given item will be higher if the sample is drawn from a more able population (eg, a healthier population) than if it is drawn from a less able one. The same problem applies to estimating “person ability” (eg, quality of life, physical health). Any estimate of an individual’s ability on a latent (ie, not directly observable) trait will depend on the range of difficulties of the items presented. Rasch models7 overcome this problem of sample dependency by estimating person ability and item difficulty independently of one another.8 The raw scores are sufficient statistics for these parameters. A person’s total raw score across the items contains all the information needed to estimate that person’s ability. Likewise, the total number of endorsements an item receives across the sample contains all the information needed to estimate its difficulty.8 To achieve this separation of item and person parameter estimates, Rasch models rely on two assumptions: unidimensionality and local independence. Rasch models assume that a single latent trait or construct underlies the data being investigated (eg, mathematical knowledge, physical health). This assumption is then tested using fit statistics and/or principal components analysis of residuals.
Local independence is related to unidimensionality. It refers to the assumption that the single latent trait accounts for all the systematic variance in the data, so that any association between items in a data set should disappear once the latent trait is controlled for.9 It is possible to have unidimensionality without local independence; however, if local independence holds, the data set must also be unidimensional. If these assumptions are met, the log-odds of a person endorsing an item can be expressed as the difference between the person’s ability and the item’s difficulty. Unlike in classical test theory, the person ability and item difficulty parameters are placed on a common scale of log-odds units (“logits”). These estimates are independent of both the particular items and the sample employed.
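The sketch below implements the dichotomous Rasch model, ln[P/(1 − P)] = θ − b, in Python. Here θ is the person’s location and b is the item’s difficulty, both in logits. The two item difficulties and the two samples of trait levels are hypothetical. The trait is taken to be depression severity, so a higher θ means a greater probability of endorsement. The sketch illustrates three points from the text:

- the classical “proportion endorsing” of an item shifts with the sample;
- the log-odds gap between two items is the same for every person, whichever sample that person comes from;
- under local independence, the probability of a whole response pattern is the product of the item probabilities.

```python
import math

def rasch_p(theta, b):
    """P(endorse) for trait level theta and item difficulty b: ln(P/(1-P)) = theta - b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def logit(p):
    return math.log(p / (1.0 - p))

def proportion_endorsing(thetas, b):
    """Classical-test-theory view of difficulty: expected endorsement rate in a sample."""
    return sum(rasch_p(t, b) for t in thetas) / len(thetas)

def pattern_probability(theta, difficulties, responses):
    """Local independence: given theta, item responses are independent, so probabilities multiply."""
    prob = 1.0
    for b, x in zip(difficulties, responses):
        p = rasch_p(theta, b)
        prob *= p if x == 1 else 1.0 - p
    return prob

# Hypothetical items: a milder symptom and a more severe symptom (logits)
b_mild, b_severe = -0.5, 1.5
# Hypothetical trait levels (eg, depression severity) in two different samples
community = [-2.0, -1.5, -1.0, -0.5, 0.0]
clinic = [0.0, 0.5, 1.0, 1.5, 2.0]

# The classical difficulty of the same item differs between samples ...
print(proportion_endorsing(community, b_mild), proportion_endorsing(clinic, b_mild))

# ... whereas the log-odds gap between the items equals b_severe - b_mild for everyone
gaps = [logit(rasch_p(t, b_mild)) - logit(rasch_p(t, b_severe)) for t in community + clinic]
print([round(g, 6) for g in gaps])
```

In practice, Rasch software estimates θ and b from observed data. Here they are fixed by hand only to expose the structure of the model.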
Assessing the Rasch Model A fundamental criterion underlying these models is unidimensionality—that is, a single latent trait should explain the variance in the data. In the absence of unidimensionality, constituent parts of an instrument cannot be summed to create a summary index. Unidimensionality can be assessed through principal components analysis, where the first factor extracted corresponds to the Rasch ‘‘factor,’’ or latent trait.10 Any additional factors extracted can be investigated to confirm whether these form true factors or random noise. In addition to this, unidimensionality can be assessed using fit statistics. Both item fit and person fit to the Rasch model can be evaluated. Fit statistics have an expected value of 1.0 and can range from 0 to infinity. Deviations in excess of the expected value can be interpreted as noise or lack of fit between the items and the model, whereas values significantly lower than the expected value can be interpreted as item redundancy or overlap. Identifying misfitting items allows those items adding noise to the analysis to be removed from a scale. The suggested limits for fit statistics are between 0.7 and 1.3, with those items with fit statistics greater than 1.3 being identified as misfitting.11,12 A similar analysis may also be applied to the response categories and thresholds (ie, the point at which response to categories is equally probable between categories). Within the Rasch model the average level of the latent trait (‘‘ability’’) should increase monotonically across categories. Disordering of categories, where the average level of the latent trait does not increase in this manner, may interfere with measurement precision. Therefore, disordered response categories may be collapsed or items removed to improve fit to the Rasch model.9 Finally, an additional requirement for Rasch models is item invariance— that is, item parameter estimates should be independent of the sample used. 
Item invariance may be evaluated by testing for differential item functioning (DIF) across defined subgroups (eg, gender, diagnosis). When items fit the model, an interval scale is produced in which adjacent scores are equally spaced. This has important implications for measurement, because equal differences in score then correspond to equal intervals along the latent trait, allowing meaningful comparisons of change.13 Recent work has suggested that changes of around 0.5 logits may represent a clinically meaningful difference.14
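The item fit statistics described above can be sketched in a few lines. The example below is a minimal illustration, assuming dichotomous items and person measures and item locations that have already been estimated (the estimation step itself is not shown). It computes the two mean-square statistics commonly reported by Rasch software, infit (information-weighted) and outfit (the unweighted mean of squared standardized residuals), and applies the 0.7 to 1.3 limits quoted above. The function names and toy data are hypothetical.

```python
import math


def rasch_prob(theta, b):
    """Probability that a person at theta endorses a dichotomous item at b (logits)."""
    return 1.0 / (1.0 + math.exp(b - theta))


def item_fit(responses, thetas, difficulties):
    """Return (infit, outfit) mean-square statistics for each item.

    responses[n][i] is the 0/1 response of person n to item i. thetas and
    difficulties are assumed to be already-estimated logit values.
    """
    fits = []
    for i, b in enumerate(difficulties):
        sq_resid_sum = 0.0  # sum of squared raw residuals (infit numerator)
        var_sum = 0.0       # sum of model variances (infit denominator)
        z2_sum = 0.0        # sum of squared standardized residuals (outfit)
        for n, theta in enumerate(thetas):
            p = rasch_prob(theta, b)
            var = p * (1.0 - p)
            resid = responses[n][i] - p
            sq_resid_sum += resid ** 2
            var_sum += var
            z2_sum += resid ** 2 / var
        fits.append((sq_resid_sum / var_sum, z2_sum / len(thetas)))
    return fits


def flag_items(fits, lower=0.7, upper=1.3):
    """Label items using the 0.7-1.3 mean-square limits."""
    labels = []
    for infit, outfit in fits:
        if max(infit, outfit) > upper:
            labels.append("misfit (noise)")
        elif min(infit, outfit) < lower:
            labels.append("overfit (possible redundancy)")
        else:
            labels.append("acceptable")
    return labels


# Toy data: four people and two items, both items located at 0 logits.
thetas = [-2.0, -1.0, 1.0, 2.0]
difficulties = [0.0, 0.0]
# Item 0 is endorsed only by the two higher scorers (a perfectly ordered pattern);
# item 1 is endorsed only by the two lower scorers (the reverse of the model).
responses = [[0, 1], [0, 1], [1, 0], [1, 0]]
fits = item_fit(responses, thetas, difficulties)
for (infit, outfit), label in zip(fits, flag_items(fits)):
    print(f"infit={infit:.2f} outfit={outfit:.2f} {label}")
```

The perfectly ordered pattern of item 0 is more deterministic than the model predicts, so both statistics fall well below 0.7; the reversed pattern of item 1 pushes both well above 1.3. With only four respondents the values are illustrative only.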
Features of the Rasch Model
The Rasch model is more accurately described as a family of models. Rasch’s original dichotomous model7 has been extended to incorporate polytomous data, that is, data from questionnaires with more than two response options. Popular models within health research are the Rating Scale Model15 and the Partial Credit Model.16 In the Rasch model, the estimates of person ability (or person measure) and of item location (or difficulty) lie along the same continuum (eg, depression). For instance, Figure 4.1 shows a “person-item map” from an item bank developed for assessing emotional distress in cancer patients.17 The left side of the map represents the distribution of person measures along the continuum, and the right side shows the locations of the items.
As discussed above, the Rasch model describes a probabilistic relationship between a person’s measure and the item location. For instance, from Figure 4.1, the Rasch model allows us to state that a patient with a level of distress around –1 logits will be more likely to endorse items at a corresponding level, such as General Health Questionnaire (GHQ) item 1 (“concentration”) and MHI item 1 (“nervousness”), as well as items below this point, but will be less likely to endorse items further along the latent trait, such as Patient Health Questionnaire (PHQ) item 9 (“suicidal ideation”). This analysis can be extended to the thresholds between each response category (Fig. 4.2).
An additional important feature of Rasch models is that they can equate different questionnaires completed by different subgroups of patients, provided that a common subset of items exists that all patients have completed. This process enables a range of items measuring the same latent trait to be collated into an item bank. An item bank may help improve static questionnaires by including fewer and more relevant questions, which could cover a broader and more representative spectrum of the latent trait (for assessment) or be more focused on discrete areas of clinical interest, such as clinical thresholds (for screening).
It also paves the way for the development of computer-adaptive testing,18 creating programs that tailor questions to individual patients based on their previous responses, allowing an accurate assessment of the patient (eg, level of psychological distress) with fewer questions. Taken together, Rasch models offer a number of advantages, including improving existing measures, reducing the number of items in questionnaires, and allowing the development of item banks and computer-adaptive tests.
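The probabilistic relationship between person measures and item locations can be written down directly. The sketch below assumes the standard dichotomous Rasch model, P(endorse) = exp(θ − b) / (1 + exp(θ − b)), and the Partial Credit Model for polytomous items, in which each item has its own adjacent-category thresholds (in the Rating Scale Model the threshold spacing is shared across items). It also includes a check for non-increasing thresholds, one common way of detecting the category disordering mentioned above, and a bisection search for Rasch-Thurstone thresholds: the logit at which the probability of responding in category k or above is 50%, the quantity plotted in Figure 4.2. All item locations and thresholds in the example are hypothetical, not values read from the figures.

```python
import math


def dichotomous_prob(theta, b):
    """Dichotomous Rasch model: probability of endorsing an item located at b."""
    return 1.0 / (1.0 + math.exp(b - theta))


def pcm_category_probs(theta, thresholds):
    """Partial Credit Model probabilities for categories 0..len(thresholds).

    thresholds[k - 1] is the logit at which categories k - 1 and k are
    equally probable (an adjacent-category threshold).
    """
    log_numerators = [0.0]  # category 0 has an empty sum
    for delta in thresholds:
        log_numerators.append(log_numerators[-1] + (theta - delta))
    peak = max(log_numerators)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in log_numerators]
    total = sum(exps)
    return [e / total for e in exps]


def thresholds_ordered(thresholds):
    """True if adjacent-category thresholds increase (no disordering)."""
    return all(lo < hi for lo, hi in zip(thresholds, thresholds[1:]))


def thurstone_threshold(thresholds, k, lo=-20.0, hi=20.0, tol=1e-9):
    """Logit at which P(response >= k) = 0.5, found by bisection."""
    def p_at_least(theta):
        return sum(pcm_category_probs(theta, thresholds)[k:])

    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if p_at_least(mid) < 0.5:
            lo = mid  # cumulative probability rises with theta, so move up
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical item locations (logits), evaluated for a person at -1 logit.
person = -1.0
for name, b in {"concentration": -1.0, "suicidal ideation": 2.5}.items():
    print(f"{name}: P(endorse) = {dichotomous_prob(person, b):.3f}")

# A hypothetical four-category item with ordered thresholds.
thresholds = [-1.5, 0.0, 1.5]
print([round(p, 3) for p in pcm_category_probs(0.0, thresholds)])
print(thresholds_ordered(thresholds), thresholds_ordered([0.0, -1.5, 1.5]))
print([round(thurstone_threshold(thresholds, k), 2) for k in (1, 2, 3)])
```

For a dichotomous item the Rasch-Thurstone threshold coincides with the item location; for polytomous items it generally differs from the adjacent-category thresholds, which is why maps such as Figure 4.2 label them separately.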
Application of the Rasch Model to Mental Health Measures
In traditional test theory, questionnaires are often designed and validated using techniques such as factor analysis. In addition to the sample dependence of these approaches as described above, rating scales produce ordinal data that do not meet the assumptions behind factor analyses, potentially leading to misinterpretation of results.19 Furthermore, these ordinal scales are often summed to produce total scores that are assumed to meet the criteria of interval scales; frequently these assumptions are not tested.13 A number of studies have recently described the application of Rasch models to mental health instruments to overcome the shortcomings of traditional test theory and design.
Figure 4.1. Item-Person Map for Item Bank. (Person measures are shown on the left and item locations on the right, on a common logit scale.)
Figure 4.2. Rasch-Thurstone Thresholds for Item Bank. (Map of 50% cumulative probabilities for the item response categories, in logits.)
Unidimensionality, Item Reduction, and Differential Item Functioning
The Rasch model has been applied to a number of mental health instruments, including the Beck Depression Inventory (BDI),20 the Zung Self-Rating Depression Scale,21 the Geriatric Depression Scale (GDS),22 and the Symptom Checklist (SCL-90 and SCL-90R) (see Table 4.2).23 This section discusses the application of the model to four of the most commonly used mental health instruments: the Center for Epidemiologic Studies Depression Scale (CES-D),24 the Hospital Anxiety and Depression Scale (HADS),25 the Hamilton Depression Scale (HAM-D),26 and the Edinburgh Postnatal Depression Scale (EPDS).27 These four instruments have been well validated using traditional test theory (reliability and validity studies and factor analyses), yet there has been little previous evidence to support the assumption that they are unidimensional.
Table 4.2. Examples of Rasch-Refined Mood Scales
Scale     Original Length  Rasch-Derived Length  Unidimensionality Shown       Reference
CES-D     20 items         13 items              Yes                           Covic et al. (2007)29
HADS      14 items         11 items              Yes                           Smith et al. (2006)31
EPDS      10 items         8 items               Yes                           Pallant et al. (2006)33
Hamilton  17 items         6 items               No                            Licht et al. (2005)35
Beck      21 items         Not changed           No                            Bouman & Kok (1987)20
Zung SDS  20 items         Not changed           Yes                           Hong & Min (2007)21
GDS       15 items         11 items              Yes                           Tang et al. (2005)22
SCL90     92 items         63 items              Yes (for nonpsychotic items)  Olsen et al. (2004)23
SCL25     25 items         8 items               Yes                           Fink et al. (1995)47
Stansbury and colleagues28 applied the Rasch model to the full CES-D completed by a large community sample of elderly participants. Four of the positively worded items were identified as misfitting and removed. The remaining 16 items formed a unidimensional structure that was verified using confirmatory factor analysis. Removing the misfitting items also reduced the floor effects that had been observed in this sample. In a sample of patients with rheumatoid arthritis, Covic and colleagues29 showed that three additional items (appetite, restlessness, sadness) misfitted the Rasch model; the resulting 13-item CES-D demonstrated good internal validity. In contrast to these two studies, Pickard and colleagues30 found no misfit for the CES-D in primary care patients, although in stroke patients misfit was reported for three items that were not positively worded. Additionally, four items from this instrument demonstrated differential item functioning between the two patient samples.
Rasch studies of the HADS with cancer patients31 and with patients attending an outpatient musculoskeletal rehabilitation program32 showed that the full instrument is broadly unidimensional, although the individual subscales contained misfitting items. Similarly, an analysis of the EPDS recommended that the original 10-item form be reduced to eight items to produce a unidimensional instrument.33 In addition to identifying misfit, Rasch models have also been used to develop short forms of these standard instruments.
For instance, a 10-item version of the CES-D has been validated using both Rasch and traditional test methods,34 as has a 6-item version of the HAM-D.35 Licht and colleagues35 used Mokken and Rasch analysis to compare the unidimensionality of the Bech-Rafaelsen Melancholia Scale (MES) and the 17-item HAM-D in 1,629 patients with a major depressive episode. Unidimensionality of the HAM-D-17 could not be confirmed; however, the HAM-D-6 and the MES did fulfill the criteria for unidimensionality.
There have also been recent attempts to apply Rasch models to a standardized psychiatric interview schedule for major depression.36 A modified SCID interview was used with a large sample of twins from the Virginia Twin Registry (n = 2,163). Participants were asked whether they had experienced any of the 14 disaggregated DSM-III-R criteria for major depression. The Rasch model was used to derive liability thresholds (the point at which there is a 50% probability of a given diagnostic category being endorsed) for the 10 symptom criteria for major depression. The results showed uneven spacing between liability thresholds: “depressed mood” was easiest to endorse (–1.8 logits), and “suicidal ideation,” at the other end of the latent trait (2.5 logits), was hardest to endorse. This suggests a tentative link between the latent trait as measured by the Rasch model and that derived from a formal psychiatric interview. Other, more general distress and psychopathology tools have also been tested using Rasch models; for example, the 90-item SCL and the 25-item SCL-25 have been improved.23
Clinical Testing and Clinical Impact
Ultimately, any tool (original or adapted) should be field tested, even if the refinement is minor. In a robust test of a newly developed tool (take the hypothetical example of a CES-D-Revised), the new scale would be compared with the original scale and with unassisted clinical diagnosis, all against a robust gold standard such as the SCID for DSM-IV major depression. Any additional detection beyond that of the unassisted clinician would suggest that the scale is clinically useful; any additional detection beyond that achieved by the original scale would suggest that the new scale is an improvement. If the new version is shorter, both accuracy and efficiency may be enhanced, and hence acceptability increased. If the new version is longer, accuracy may be improved at the expense of efficiency, and clinical judgment is then required to decide which version is more useful. Unfortunately, very few well-designed validation studies of this kind exist.
A few studies have employed Rasch models to assess the impact of misfit, and of the subsequent removal of misfitting items, on the diagnostic accuracy of mental health measures. Smith and colleagues31 applied the Rasch model both to the full 14-item HADS25 and to its 7-item anxiety and depression subscales. In addition to completing the HADS, a subset of the cancer patients had also received a psychiatric assessment using either the Present State Examination (PSE)37 or the Schedules for Clinical Assessment in Neuropsychiatry (SCAN; World Health Organization).38 Three items from the full HADS were identified as misfitting the Rasch model, in addition to one misfitting item from the subscales.
Removal of these items had little or no impact on the sensitivity, specificity, or area under the curve (AUC) of the scales. Similarly, Tang and colleagues22 identified four items from the GDS that did not fit the Rasch model. The GDS data were derived from a community sample of patients with pneumoconiosis who had also received a structured psychiatric interview to diagnose depressive disorders. Once again, removing the misfitting items did not affect the AUC, sensitivity, or specificity.
Item Banking and Computer-Adaptive Testing
Because the Rasch model can derive item locations for different instruments and allows evaluation of whether these items form a unidimensional construct, it creates the opportunity to generate item banks. Various methods exist for item banking39; a frequently employed approach is common item equating,10 in which patients complete a core set of questionnaires. Additional items or instruments may be added by anchoring the locations of the core set of items; typically, patients in this scenario will have completed the core set of items along with further items. The benefit of item banking is that patients do not have to complete all the questionnaires, which reduces not only patient burden but also the cost of developing the item bank.
After item banks are developed, two further steps can be taken: (1) the development of multiple fixed short forms derived from the item bank (see Ware and associates40 for an example of the development of a short form of the headache impact scale) and (2) the development of computer-adaptive tests. Computer-adaptive tests (eg, Wainer18) tailor the items presented to the patient on the basis of his or her previous responses. They generally present an initial item aimed at the average level of the latent trait in the target population (eg, average level of depression); subsequent questions are easier or harder to endorse, depending on the responses so far. At each step the patient’s level of the latent trait (eg, depression) is re-estimated, until a predetermined number of questions has been presented or the standard error of the estimate falls below a predetermined level. Computer-adaptive test systems can provide a greater level of precision in estimating the latent trait and may be designed either to allow a broad assessment of, for instance, depression, or specifically to present more questions around diagnostic categories. Another benefit of these systems is that patients need to complete fewer questions for the same or greater level of accuracy. The development of item banks and computer-adaptive tests has been progressing apace in fields such as physical health,41 although in mental health this area is still in its infancy.
However, an item bank has recently been developed for assessing psychological distress in cancer patients.17 A large sample of cancer patients completed the HADS25 in addition to a variety of other instruments, including the GHQ-12,42 BDI,43 PHQ-9,44 and Spielberger State-Trait Anxiety Inventory (STAI).45 Common item equating, with the HADS as the anchor, was used to create the item bank. Once misfitting items had been removed, the initial 83 items were reduced to a unidimensional item bank of 63 items with good internal reliability (Cronbach’s alpha = 0.84). The item-person map (see Fig. 4.1) demonstrated good face validity: questions concerning suicidal ideation were hardest to endorse, whereas questions concerning fatigue and energy were easiest to endorse. Further analysis of the map revealed that items tended to be targeted at moderate to high levels of distress, indicating a floor effect at low levels of distress that may require additional items.
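The idea behind common item equating can be illustrated with a simple mean-shift link. This is a sketch under stated assumptions, not the procedure used in the study cited above: it assumes two separate Rasch calibrations that share some anchor items and places the second calibration on the bank’s scale by adding the mean difference in anchor-item locations. Anchored calibration, in which the core items are fixed at their bank values during estimation, is the alternative described earlier and is not shown. The function name, item labels, and logit values are all hypothetical.

```python
def link_calibrations(bank, new):
    """Place a new calibration onto the bank scale using common anchor items.

    bank, new: dicts mapping item name -> location in logits. Returns the
    merged bank, the linking shift, and each anchor's displacement after
    linking (large displacements suggest an unstable anchor or DIF).
    """
    anchors = sorted(set(bank) & set(new))
    if not anchors:
        raise ValueError("no common items: the calibrations cannot be linked")
    shift = sum(bank[item] - new[item] for item in anchors) / len(anchors)
    displacement = {item: new[item] + shift - bank[item] for item in anchors}
    merged = dict(bank)
    for item, location in new.items():
        if item not in merged:  # anchor items keep their bank values
            merged[item] = location + shift
    return merged, shift, displacement


# Hypothetical values: the bank holds three HADS items; a second sample
# completed the same items plus one PHQ item.
bank = {"hads_a1": -0.4, "hads_d1": 0.2, "hads_d2": 1.1}
new = {"hads_a1": -0.9, "hads_d1": -0.4, "hads_d2": 0.7, "phq_9": 2.0}
merged, shift, displacement = link_calibrations(bank, new)
print(f"shift = {shift:.2f}, phq_9 on bank scale = {merged['phq_9']:.2f}")
print({item: round(d, 2) for item, d in displacement.items()})
```

Keeping the anchor displacements alongside the shift is a deliberate choice: a mean shift hides anchors that have drifted, so inspecting them is a cheap check that the common items really behave the same way in both samples.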
Computer-adaptive tests have already been developed for use with psychiatric populations to identify emotional distress.4,5 Fliege and associates4 developed a system for measuring depression (“D-CAT”) in a psychosomatic patient sample. Patients completed 11 mental health questionnaires, whose items were subsequently rated by expert reviewers as to whether they were indicative of depressive symptomatology. A total of 320 items from the original questionnaires produced an item bank of 64 items. A simulation study using patients’ actual responses demonstrated that levels of depression could be estimated reliably from six items. Scores generated by the D-CAT fell within 2 standard deviations of the sample mean and correlated well with the overall item bank and with two standard mental health measures (BDI and CES-D). Most recently, Gibbons and colleagues46 developed a computer-adaptive test derived from the 626-item Mood and Anxiety Spectrum Scales (MASS), designed to identify anxiety and mood disorders in patients attending outpatient clinics. The study demonstrated that the number of items presented to patients could be reduced to 24 to 30 without a loss of information, a substantial reduction in both administration time and patient burden.
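The adaptive procedure described in this section (start near the population average, choose the next item according to the current estimate, re-estimate, and stop when the standard error is small enough or an item limit is reached) can be sketched as follows. This is a minimal illustration under several assumptions not taken from the studies above: dichotomous Rasch items, selection of the unused item with maximum Fisher information (for Rasch items, the one located closest to the current estimate), and expected a posteriori (EAP) estimation with a standard normal prior evaluated on a grid. The item bank, function names, and simulated respondent are hypothetical.

```python
import math

GRID = [i / 10.0 for i in range(-60, 61)]  # quadrature points, -6 to +6 logits


def rasch_p(theta, b):
    """Dichotomous Rasch probability of endorsing an item located at b."""
    return 1.0 / (1.0 + math.exp(b - theta))


def eap_estimate(administered):
    """EAP estimate and posterior SD from a list of (difficulty, response) pairs.

    Assumes a standard normal prior on theta.
    """
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)  # unnormalized prior density
        for b, x in administered:
            p = rasch_p(t, b)
            w *= p if x == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)


def run_cat(bank, answer, se_target=0.5, max_items=10):
    """Administer items adaptively; stop when SE < se_target or max_items is reached.

    bank maps item name -> difficulty (logits); answer(item) returns 0 or 1.
    """
    theta, se = 0.0, float("inf")  # begin at the assumed population mean
    remaining = dict(bank)
    administered, asked = [], []
    while remaining and len(asked) < max_items and se >= se_target:
        # Rasch item information p(1 - p) peaks where difficulty equals theta.
        item = min(remaining, key=lambda name: abs(remaining[name] - theta))
        b = remaining.pop(item)
        administered.append((b, answer(item)))
        asked.append(item)
        theta, se = eap_estimate(administered)
    return theta, se, asked


# Hypothetical 21-item bank spanning -2.5 to +2.5 logits, and a simulated
# respondent who endorses every item located below 1.0 logit.
bank = {f"item{k}": (k - 10) / 4.0 for k in range(21)}
theta, se, asked = run_cat(bank, lambda item: 1 if bank[item] < 1.0 else 0,
                           se_target=0.5, max_items=8)
print(f"theta = {theta:.2f}, SE = {se:.2f}, items asked = {asked}")
```

With these settings the eight-item limit is reached before the standard error falls below 0.5, because each Rasch item contributes at most 0.25 units of information; loosening the target or raising the limit shows the trade-off between test length and precision.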
3.
Conclusion
Despite the intuitive appeal and ease of use of brief self-report instruments to screen for depressive disorders, the performance of a number of commonly employed instruments remains highly variable. Many instruments have been comprehensively validated by traditional test methods, but questions remain about unidimensionality, floor and ceiling effects, and instrument performance across different groups of patients. Rasch models7 have the potential to address these issues, generating instruments whose item estimates are independent of the sample and providing the basis for item banks and computer-adaptive tests. Although item banking is a relatively new area of development in health measurement, the U.S. National Institutes of Health has recently provided major funding for the Patient-Reported Outcomes Measurement Information System (PROMIS) initiative, one goal of which is to produce computer-adaptive tests for the clinical research community (http://nihroadmap.nih.gov/clinicalresearch/promis.asp). The next step in the development of the item bank will be to develop computer-adaptive testing systems. An important corollary will be to continue mapping the item bank, in particular levels of emotional distress, to psychiatric diagnoses of clinical anxiety and major depression and to clinical guidelines. This will not only provide a potentially more sensitive instrument for assessing and screening for distress, but will also assist in tailoring the management of distress and associated interventions to individual patients.
References
1. Wright AF. Should general practitioners be testing for depression? Br J Gen Pract. 1994;44(380):132–135.
2. Gilbody S, House AO, Sheldon TA. Screening and case finding instruments for depression. Cochrane Database of Systematic Reviews. 2005, Issue 4.
3. Gilbody S, Sheldon T, House A. Screening and case-finding instruments for depression: a meta-analysis. CMAJ. 2008;178:997–1003.
4. Fliege H, Becker J, Walter OB, et al. Development of a computer-adaptive test for depression (D-CAT). Qual Life Res. 2005;14:2277–2291.
5. Walter OB, Becker J, Bjorner JB, et al. Development and evaluation of a computer adaptive test for ‘Anxiety’ (Anxiety-CAT). Qual Life Res. 2007;16:S143–S155.
6. Suen HK. Principles of test theories. Hillsdale, NJ: Lawrence Erlbaum Associates, 1990.
7. Rasch G. Probabilistic models for some intelligence and attainment tests. Chicago: The University of Chicago Press, 1960/1980.
8. Wright BD, Masters G. Rating scale analysis. Chicago: MESA Press, 1982.
9. Bond TG, Fox CM. Applying the Rasch model: fundamental measurement in the human sciences. Mahwah, NJ: Lawrence Erlbaum Associates, 2001.
10. Linacre JM. A user’s guide to WINSTEPS/MINISTEPS Rasch-model computer programs. 2007.
11. Lai JS, Cella D, Chang CH, et al. Item banking to improve, shorten and computerize self-reported fatigue: an illustration of steps to create a core item bank from the FACIT-Fatigue Scale. Qual Life Res. 2003;12(5):485–501.
12. Wright BD, Linacre JM, Gustafson J-E, et al. Reasonable mean-square fit values. Rasch Measurement Transactions. 1994;8:370.
13. Stucki G, Daltroy L, Katz JN, et al. Interpretation of change scores in ordinal clinical scales and health status measures: the whole may not equal the sum of the parts. J Clin Epidemiol. 1996;49:711–717.
14. Lai JS, Eton DT. Clinically meaningful gaps. Rasch Measurement Transactions. 2002;15:850.
15. Andrich D. A rating formulation for ordered response categories. Psychometrika. 1978;43:561–573.
16. Masters GN. A Rasch model for partial credit scoring. Psychometrika. 1982;47:149–174.
17. Smith AB, Rush R, Velikova G, et al. The initial development of an item bank to assess and screen for psychological distress in cancer patients. Psychooncology. 2007;16:724–732.
18. Wainer H. Computerized adaptive testing: a primer. Hillsdale, NJ: Lawrence Erlbaum Associates, 1990.
19. Schumacker RE, Linacre JM. Factor analysis and Rasch. Rasch Measurement Transactions. 1996;9:470.
20. Bouman TK, Kok AR. Homogeneity of Beck’s Depression Inventory (BDI): applying Rasch analysis in conceptual exploration. Acta Psychiatr Scand. 1987;76(5):568–573.
21. Hong S, Min SY. Mixed Rasch modeling of the Self-Rating Depression Scale incorporating latent class and Rasch rating scale models. Educ Psych Measure. 2007;67(2):280–299.
22. Tang WK, Wong E, Chiu HF, et al. The Geriatric Depression Scale should be shortened: results of Rasch analysis. Int J Geriatr Psychiatry. 2005;20:783–789.
23. Olsen LR, Mortensen EL, Bech P. The SCL-90 and SCL-90R versions validated by item response models in a Danish community sample. Acta Psychiatr Scand. 2004;110(3):225–229.
24. Radloff LS. The CES-D scale: A self-report depression scale for research in the general population. Applied Psych Measure. 1977;384–401.
25. Zigmond AS, Snaith RP. The hospital anxiety and depression scale. Acta Psychiatr Scand. 1983;67:361–370.
26. Hamilton M. A rating scale for depression. J Neurol Neurosurg Psychiatry. 1960;23:56–62.
27. Cox JL, Holden JM, Sagovsky R. Detection of postnatal depression. Development of the 10-item Edinburgh Postnatal Depression Scale. Br J Psychiatry. 1987;150:782–786.
28. Stansbury JP, Ried LD, Velozo CA. Unidimensionality and bandwidth in the Center for Epidemiologic Studies Depression (CES-D) Scale. J Pers Assess. 2006;86:10–22.
29. Covic T, Pallant JF, Conaghan PG, et al. A longitudinal evaluation of the Center for Epidemiologic Studies-Depression scale (CES-D) in a rheumatoid arthritis population using Rasch analysis. Health Qual Life Outcomes. 2007;5:41.
30. Pickard AS, Dalal MR, Bushnell DM. A comparison of depressive symptoms in stroke and primary care: applying Rasch models to evaluate the Center for Epidemiologic Studies-Depression scale. Value Health. 2006;9:59–64.
31. Smith AB, Wright EP, Rush R, et al. Rasch analysis of the dimensional structure of the Hospital Anxiety and Depression Scale. Psychooncology. 2006;15:817–827.
32. Pallant JF, Tennant A. An introduction to the Rasch measurement model: an example using the Hospital Anxiety and Depression Scale (HADS). Br J Clin Psychol. 2007;46:1–18.
33. Pallant JF, Miller RL, Tennant A. Evaluation of the Edinburgh Postnatal Depression Scale using Rasch analysis. BMC Psychiatry. 2006;6:28.
34. Cole JC, Rabin AS, Smith TL, et al. Development and validation of a Rasch-derived CES-D short form. Psychol Assess. 2004;16:360–372.
35. Licht RW, Qvitzau S, Allerup P, et al. Validation of the Bech-Rafaelsen Melancholia Scale and the Hamilton Depression Scale in patients with major depression; is the total score a valid measure of illness severity? Acta Psychiatr Scand. 2005;111:144–149.
36. Aggen SH, Neale MC, Kendler KS. DSM criteria for major depression: evaluating symptom patterns using latent-trait item response models. Psychol Med. 2005;35:475–487.
37. Wing J, Cooper JE, Sartorius N. The description of psychiatric symptoms: an introduction manual for the PSE and CATEGO System. Cambridge: Cambridge University Press, 1974.
38. World Health Organization. Mental health: new understanding, new hope. Geneva, Switzerland: WHO, 1993.
39. Wolfe EW. Equating and item banking with the Rasch model. J Applied Measure. 2000;1(4):409–434.
40. Ware JE Jr, Kosinski M, Bjorner JB, et al. Applications of computerized adaptive testing (CAT) to the assessment of headache impact. Qual Life Res. 2003;12(8):935–952.
41. Rose M, Bjorner JB, Becker J, et al. Evaluation of a preliminary physical function item bank supported the expected advantages of the Patient-Reported Outcomes Measurement Information System (PROMIS). J Clin Epidemiol. 2008;61:17–33.
42. Goldberg DP, Hillier VF. A scaled version of the General Health Questionnaire. Psychol Med. 1979;9:139–145.
43. Beck AT, Ward CH, Mendelson M, et al. An inventory for measuring depression. Arch Gen Psychiatry. 1961;4:561–571.
44. Kroenke K, Spitzer RL, Williams JB. The PHQ-9: validity of a brief depression severity measure. J Gen Intern Med. 2001;16:606–613.
45. Spielberger CD. Manual for the State-Trait Anxiety Inventory (STAI). Palo Alto, CA: Consulting Psychologists Press, 1983.
46. Gibbons RD, Weiss DJ, Kupfer DJ, et al. Using computerized adaptive testing to reduce the burden of mental health assessment. Psychiatr Serv. 2008;59(4):361–368.
5 HOW DO WE KNOW WHEN A SCREENING TEST IS CLINICALLY USEFUL?
Alex J. Mitchell
1. How Do Clinicians Make a Diagnosis?
2. Scientific Aspects of Diagnostic Accuracy
3. Clinical Aspects of Diagnostic Accuracy
4. Testing Screening via Implementation Studies
5. Conclusions
Context
There is no shortage of suggested methods to screen for depression, including clinical interviews. Assuming these are applied to a group containing patients with and without depression, how do we decide which methods are optimal? In addition, how can tests be compared, and how can they be combined? This chapter discusses the methods used to compare scales and tools.
1.
How Do Clinicians Make a Diagnosis?
The terms diagnosis and screening both refer to the application of an agreed method to confirm those with a condition and to exclude those without it (for discussion, see Chapter 2). When attempting to separate depressed from non-depressed individuals, there is always an overlap of symptoms (or biological markers) (see Chapter 1, Fig. 1); therefore, a perfect test is unobtainable with current methods. Testing may be focused on those at high risk of the condition (such as screening for depression after myocardial infarction) or applied to a wider population (screening for depression in all primary care patients). The former is a high-prevalence setting, which favors the ability to confirm a condition, whereas the latter is a low-prevalence setting, which favors the ability to refute a condition.
It is often forgotten that the clinical process of making a diagnosis is itself a form of screening. Here the tool is the clinician’s clinical skill and the sample is all patients seen by the clinician. If a clinician is attuned to the concept of depression, has a high index of suspicion, and asks the right questions, then he or she is likely to have high personal diagnostic accuracy. If the clinician lacks confidence, experience, and training, he or she is less likely to make a correct diagnosis (see Table 5.1 and Chapter 3). Some literature suggests that the added value of screening tools for depression is apparent only in the latter situation. A diagnostic test for depression is designed to help the clinician elicit and weigh symptoms and signs to make a diagnosis. How, then, is this achieved, and how does a screening test work in scientific terms?
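The effect of setting on rule-in and rule-out performance can be made concrete with a little arithmetic. The sketch below holds a test’s sensitivity and specificity fixed and computes the positive and negative predictive values at two prevalences. The figures used (sensitivity and specificity of 80%, prevalences of 10% and 40%) are hypothetical, chosen only to contrast a low-prevalence setting such as unselected primary care with a higher-prevalence, high-risk group.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


for setting, prevalence in [("low prevalence", 0.10), ("high prevalence", 0.40)]:
    ppv, npv = predictive_values(0.80, 0.80, prevalence)
    print(f"{setting} ({prevalence:.0%}): PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

With the same test, the PPV rises from about 0.31 to 0.73 as prevalence increases, while the NPV falls from about 0.97 to 0.86; this is the sense in which high-prevalence settings favor confirming, and low-prevalence settings favor refuting, a condition.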
Case Example
Consider the case illustrated in Textbox 5.1. A man who suffered a stroke 2 months previously now complains of five troubling symptoms. Assuming these symptoms are elicited correctly, is he clinically depressed? Could the somatic symptoms be features of stroke rather than depression (see Chapters 10 and 11)? Five symptoms may immediately sound sufficient for a diagnosis, but not all symptoms qualify under DSM-IV or ICD-10. For example, loss of drive is not a qualifying feature and therefore, under these guidelines, must be ignored. This leaves four qualifying symptoms and only one DSM-IV core symptom, which is insufficient for a DSM-IV-based diagnosis of major depression. Under ICD-10, however, he does have two core features and two associated features, although only at a level designated as a mild depressive episode. Thus, clinicians who use a strict operational checklist approach may or may not diagnose depression in this case, depending on which system they apply. In fact, research suggests that fewer than one in five psychiatrists would take this strict operational approach, and fewer still use validated questionnaires such as the Patient Health Questionnaire (PHQ)-9. Most trained psychiatrists rely on their own clinical skills. Similarly, in primary care, in a survey of 2,500 Australian primary care practitioners (PCPs), Krupinski and Tiller (2001)1 found that 28% asked about at least five of the nine standard DSM-IV symptoms. The two symptoms most frequently asked about were sleep disturbance (cited by 86.8%) and loss of appetite (cited by 55.6%). Only 0.2% of this sample said they would make a diagnosis using a rating scale.
Table 5.1. Levels of Diagnostic Confidence
                                          Prior Experience & Training   No Prior Experience & Training
Use a checklist or screening tool         i. Trained, Assisted          ii. Untrained, Assisted
Do not use a checklist or screening tool  iii. Trained, Unassisted      iv. Untrained, Unassisted
Textbox 5.1. Case History: Post-Stroke Depression?
A previously well 58-year-old man who suffered a dominant hemisphere stroke 2 months previously is referred to an outpatient psychiatry clinic. He reports that he has had five symptoms (low mood, loss of drive, low energy, poor appetite, and insomnia) for the past 3 weeks. He has no other symptoms on detailed questioning.
Symptom                                 ICD-10       DSM-IV
Persistent sadness or low mood          Yes (core)   Yes (core)
Loss of interests or pleasure           Yes (core)   Yes (core)
Fatigue or low energy                   Yes (core)   Yes
Disturbed sleep                         Yes          Yes
Poor concentration or indecisiveness    Yes          Yes
Low self-confidence                     Yes          No
Poor or increased appetite              Yes          No
Suicidal thoughts or acts               Yes          Yes
Agitation or slowing of movements       Yes          Yes
Guilt or self-blame                     Yes          Yes
Significant change in weight            No           Yes
Toward Evidence-Based Diagnosis
Are ICD and DSM right to place more weight on some symptoms than others? If so, there must be evidence that specific symptoms have more diagnostic importance than others, which in turn requires that these methods have been subjected to comparative diagnostic validity testing. Most clinicians (psychiatrists and non-psychiatrists alike) use their own clinical acumen to make a diagnosis without using any specific tool, although they may have personal experience of the diagnostic importance of specific symptoms. Even those using DSM-IV still have to use clinical judgment, because DSM contains no recommended structured questions.2 Conventional clinical method relies on experience and pattern recognition, whereas actuarial judgment uses decision theory informed by empirically established tests.3 In both cases, reaching a diagnosis means narrowing down a long list of possibilities in light of accumulating clinical evidence. However, in the former case it is difficult to check for inaccuracy, whereas in the latter case there is an attempt to diagnose on the basis of calculated probabilities. The standard model for this task is Bayes’ theorem, which calculates the post-test probability in relation to the baseline probability (Fig. 5.1). The baseline (pre-test) probability of the condition is the local prevalence of the disease, and the post-test probability is the probability of disease given new information, such as a positive test result.4 Before assuming that assisted methods (eg, screening) are helpful, it is worth checking the evidence base for unassisted detection (see Chapter 3).
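Bayes’ theorem, as used here, can be applied through likelihood ratios: the pre-test odds multiplied by the likelihood ratio of the observed result give the post-test odds. The sketch below is a minimal illustration; the function name is hypothetical, and the 10% pre-test probability and 80% sensitivity and specificity are hypothetical values, not figures from the text.

```python
def post_test_probability(pre_test, sensitivity, specificity, positive=True):
    """Post-test probability of the condition after a positive or negative result.

    Uses the odds form of Bayes' theorem: post-test odds = pre-test odds x LR,
    where LR+ = Se / (1 - Sp) and LR- = (1 - Se) / Sp.
    """
    if not (0 < pre_test < 1 and 0 < specificity < 1):
        raise ValueError("pre-test probability and specificity must lie strictly between 0 and 1")
    if positive:
        likelihood_ratio = sensitivity / (1 - specificity)
    else:
        likelihood_ratio = (1 - sensitivity) / specificity
    pre_odds = pre_test / (1 - pre_test)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


prevalence = 0.10  # local prevalence used as the pre-test probability
print(f"after a positive screen: {post_test_probability(prevalence, 0.8, 0.8):.2f}")
print(f"after a negative screen: {post_test_probability(prevalence, 0.8, 0.8, positive=False):.3f}")
```

Here a positive result raises the probability from 10% to about 31% (LR+ = 4), and a negative result lowers it to about 3% (LR− = 0.25). The post-test probability after a positive result is the same quantity as the positive predictive value at that prevalence.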
Textbox 5.2. Definitions of Measures of Diagnostic Accuracy
Cells a to d refer to the 2 × 2 table in Figure 5.2.
Sensitivity (Se): A measure of accuracy defined as the proportion of patients with disease in whom the test result is positive: a/(a + c)
Specificity (Sp): A measure of accuracy defined as the proportion of patients without disease in whom the test result is negative: d/(b + d)
Positive Predictive Value (PPV): A measure of rule-in accuracy defined as the proportion of true positives in those with a positive screening result: a/(a + b)
Negative Predictive Value (NPV): A measure of rule-out accuracy defined as the proportion of true negatives in those with a negative screening result: d/(c + d)
Youden’s J: A composite of overall accuracy using sensitivity and specificity that is unaffected by prevalence: sensitivity + specificity – 1
Predictive Summary Index (PSI): A composite of overall accuracy using all positive and negative screens that reflects the prevalence: PPV + NPV – 1
Kappa: An index that compares the observed agreement against that which might be expected by chance. Kappa can be thought of as the chance-corrected proportional agreement: (Observed agreement – Chance agreement)/(1 – Chance agreement)
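The Textbox 5.2 formulas translate directly into code. The sketch below assumes the cell labels of Figure 5.2 (a = true positives, b = false positives, c = false negatives, d = true negatives); the class name `TwoByTwo` is hypothetical. Kappa is computed here for agreement between the test and the gold standard, with chance agreement taken from the table margins. The example uses the “persistent low mood” counts from Table 5.2.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """2 x 2 counts: a = TP, b = FP, c = FN, d = TN (hypothetical helper)."""
    a: int
    b: int
    c: int
    d: int

    @property
    def n(self) -> int:
        return self.a + self.b + self.c + self.d

    def sensitivity(self) -> float:
        return self.a / (self.a + self.c)

    def specificity(self) -> float:
        return self.d / (self.b + self.d)

    def ppv(self) -> float:
        return self.a / (self.a + self.b)

    def npv(self) -> float:
        return self.d / (self.c + self.d)

    def youden_j(self) -> float:
        return self.sensitivity() + self.specificity() - 1

    def predictive_summary_index(self) -> float:
        return self.ppv() + self.npv() - 1

    def kappa(self) -> float:
        # Observed agreement: test and gold standard agree (cells a and d).
        observed = (self.a + self.d) / self.n
        # Chance agreement from the test (row) and gold-standard (column) margins.
        p_test_pos = (self.a + self.b) / self.n
        p_case = (self.a + self.c) / self.n
        chance = p_test_pos * p_case + (1 - p_test_pos) * (1 - p_case)
        return (observed - chance) / (1 - chance)


if __name__ == "__main__":
    # Persistent low mood: 186 of 200 cases and 144 of 800 non-cases positive.
    low_mood = TwoByTwo(a=186, b=144, c=14, d=656)
    print(f"Se={low_mood.sensitivity():.2f} Sp={low_mood.specificity():.2f} "
          f"PPV={low_mood.ppv():.2f} NPV={low_mood.npv():.2f}")
    # prints "Se=0.93 Sp=0.82 PPV=0.56 NPV=0.98"
```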
5 HOW DO WE KNOW WHEN A SCREENING TEST IS CLINICALLY USEFUL?
Figure 5.1. Decision Theory. [Decision tree comparing two strategies. Screen: condition present (prevalence) and test positive (sensitivity), giving treated condition = prevalence × sensitivity; condition present and test negative (1 – sensitivity), giving untreated condition = prevalence × (1 – sensitivity); no condition (1 – prevalence) and test positive (1 – specificity), giving false positive = (1 – prevalence) × (1 – specificity); no condition and test negative (specificity), giving healthy = (1 – prevalence) × specificity. Don’t screen: untreated condition = prevalence; healthy = 1 – prevalence.]
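The leaf probabilities of the Figure 5.1 tree can be computed directly. This is a minimal sketch, assuming (as the tree does) that every true case with a positive screen is treated and that no case is treated without screening; the function name `decision_tree_outcomes` is hypothetical.

```python
from __future__ import annotations


def decision_tree_outcomes(prevalence: float, sensitivity: float,
                           specificity: float) -> dict[str, dict[str, float]]:
    """Expected proportion of patients reaching each leaf of the Figure 5.1 tree."""
    screen = {
        # Condition present and test positive: detected and treated.
        "treated condition": prevalence * sensitivity,
        # Condition present but test negative: missed (false negative).
        "untreated condition": prevalence * (1 - sensitivity),
        # Condition absent but test positive.
        "false positive": (1 - prevalence) * (1 - specificity),
        # Condition absent and test negative.
        "healthy": (1 - prevalence) * specificity,
    }
    dont_screen = {
        # Without screening, every true case stays untreated in this model.
        "untreated condition": prevalence,
        "healthy": 1 - prevalence,
    }
    return {"screen": screen, "don't screen": dont_screen}


if __name__ == "__main__":
    outcomes = decision_tree_outcomes(prevalence=0.20, sensitivity=0.80, specificity=0.70)
    for strategy, leaves in outcomes.items():
        for leaf, proportion in leaves.items():
            print(f"{strategy:12} {leaf:20} {proportion:.2f}")
```

With these hypothetical values, screening moves 0.16 of the population from untreated to treated cases at the cost of 0.24 false positives.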
2. Scientific Aspects of Diagnostic Accuracy
Attempts to distinguish patients with a condition from those without on the basis of a test or clinical method are most simply represented by a 2 × 2 table that generates sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) (Textbox 5.2).5 It is critical to understand the difference between reading vertically down the columns and reading horizontally across the rows (Figure 5.2). Vertically, the denominator is the number of cases with or without the condition, a number that is unknown to the clinician but is known in a research setting with a gold standard. Horizontally, the denominator is the number of positive or negative screens, a number that is known to clinicians, which is why PPV and NPV describe the proportions of interest in the real world. The relationship between these variables is complex. In real life, the predictive values of a test vary with the baseline prevalence of the condition. Put simply, it is easy to spot cases when nothing but cases exist (prevalence = 100%); conversely, it is hard when the prevalence is low.6 Rule-in and rule-out accuracy are essentially independent, although a test may perform well in both directions. Rule-in accuracy is best measured by the PPV, but a high specificity also implies there are few false positives, and hence any positive result will suggest a true case.7 Rule-out accuracy is best measured by the NPV, where the denominator is all who test negative, but again if the sensitivity is high, there will be few false-negative results, and hence any negative result implies a true non-case.

                 Gold Standard:     Gold Standard:
                 Disorder           No Disorder
Test +ve         A                  B                  PPV = A/(A + B)
Test –ve         C                  D                  NPV = D/(C + D)
Total            Se = A/(A + C)     Sp = D/(B + D)
Figure 5.2. Generic 2 × 2 Table.

Optimal accuracy is often achieved by choosing one test for rule-in (case-finding) and another for rule-out, but not uncommonly only a single test can be applied and it must perform as well as possible in both directions. In this situation summary accuracy statistics are useful. The simplest are Youden’s J and the predictive summary index, which combine sensitivity with specificity and PPV with NPV, respectively (in each case the sum minus 1).8 The fraction correct (true positives plus true negatives divided by all cases and non-cases) is also useful, as it can easily be used to compare different methods. All such methods work well for binary (yes/no) tests or when the optimal cutoff is known. However, where performance varies by cutoff threshold, plotting sensitivity against 1 – specificity for each cutoff generates a receiver operating characteristic (ROC) curve, and the area under the curve gives a measure of overall performance. Where results from multiple studies need to be combined, each with different sensitivity and specificity values, they can be summarized in a summary receiver operating characteristic (sROC) curve.9 Additionally, when false positives and false negatives carry very different costs, a cutoff may be chosen that favors rule-in or rule-out accuracy.
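The construction of an ROC curve, its area, and a Youden-optimal cutoff can be sketched as follows. This is illustrative only: it assumes higher scores indicate depression and that a score at or above the cutoff counts as positive; the function names are hypothetical, and the scores in the usage example are invented, not data from this chapter. The trapezoidal area counts tied scores as half-concordant, matching the usual Mann–Whitney interpretation of the AUC.

```python
def _se_sp(case_scores, noncase_scores, cutoff):
    """Sensitivity and specificity when score >= cutoff counts as positive."""
    se = sum(s >= cutoff for s in case_scores) / len(case_scores)
    sp = sum(s < cutoff for s in noncase_scores) / len(noncase_scores)
    return se, sp


def roc_points(case_scores, noncase_scores):
    """(1 - specificity, sensitivity) pairs from the strictest to the most lenient cutoff."""
    cutoffs = sorted(set(case_scores) | set(noncase_scores), reverse=True)
    points = [(0.0, 0.0)]  # a cutoff above every score: nobody screens positive
    for cutoff in cutoffs:
        se, sp = _se_sp(case_scores, noncase_scores, cutoff)
        points.append((1 - sp, se))
    return points


def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def youden_best_cutoff(case_scores, noncase_scores):
    """Cutoff maximizing Youden's J = Se + Sp - 1 (the lowest such cutoff wins ties)."""
    best_cutoff, best_j = None, float("-inf")
    for cutoff in sorted(set(case_scores) | set(noncase_scores)):
        se, sp = _se_sp(case_scores, noncase_scores, cutoff)
        j = se + sp - 1
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


if __name__ == "__main__":
    # Invented questionnaire scores, for illustration only.
    cases = [7, 9, 11, 12, 14, 15, 16, 18]
    noncases = [2, 3, 4, 5, 6, 8, 9, 10]
    print(round(auc_trapezoid(roc_points(cases, noncases)), 3))
    print(youden_best_cutoff(cases, noncases))
```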
Likelihood Ratios
Likelihood ratios can be clinically useful because they do not vary with prevalence and because they can be calculated for several levels of test result. The positive likelihood ratio is the probability of a positive result in a patient with the disorder divided by the probability of a positive result in a patient without it: sensitivity/(1 – specificity). The negative likelihood ratio is the probability of a negative result in a patient with the disorder divided by the probability of a negative result in a patient without it: (1 – sensitivity)/specificity. A nomogram (Fig. 5.3) has been developed for use with likelihood ratios to determine the post-test probability of disease if the pre-test probability and the likelihood ratio for the specific test are known. A likelihood ratio greater than 1 produces a post-test probability that is higher than the pre-test probability, and a likelihood ratio less than 1 produces one that is lower.
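The nomogram performs the odds form of Bayes’ theorem graphically: post-test odds = pre-test odds × likelihood ratio. A minimal sketch of that arithmetic follows, using the “persistent low mood” sensitivity (0.93) and specificity (0.82) from Table 5.2 as example inputs; the function names are hypothetical. A specificity of exactly 0 or 1 would make a likelihood ratio undefined or infinite, which the sketch does not handle.

```python
from __future__ import annotations


def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) for a binary test."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


def apply_likelihood_ratio(pre_test_probability: float, lr: float) -> float:
    """Convert probability to odds, multiply by the likelihood ratio, convert back."""
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * lr
    return post_test_odds / (1 + post_test_odds)


if __name__ == "__main__":
    lr_pos, lr_neg = likelihood_ratios(0.93, 0.82)
    print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
    # Pre-test probability 0.20, the prevalence used in Table 5.2.
    print(f"after a positive result: {apply_likelihood_ratio(0.20, lr_pos):.2f}")
    print(f"after a negative result: {apply_likelihood_ratio(0.20, lr_neg):.2f}")
```

The post-test probabilities (about 0.56 after a positive result and 0.02 after a negative one) agree with the PPV and 1 – NPV for low mood in Table 5.2, as they must, since both routes apply the same theorem.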
Figure 5.3. Likelihood Ratio Nomogram. [Three vertical scales: Pre-Test Probability (%) from 0.1 to 99, Likelihood Ratio from 0.0005 to 2000, and Post-Test Probability (%) from 99 to 0.1.]
3. Clinical Aspects of Diagnostic Accuracy
The best way to understand the clinical applicability of a screening test is to consider the example listed in Textbox 5.1. The patient complains of five symptoms and has data from a single Hospital Anxiety and Depression Scale–Depression (HADS-D) rating. Are these symptoms likely to be symptoms of depression, or do they occur in people with stroke who are not depressed? The diagnostic impact of each piece of information can be evaluated scientifically, provided its rate of occurrence is known in both groups (Textbox 5.3 lists these rates). The occurrence rate in the depressed sample is in fact the sensitivity of each specific item. Thus, the symptom with optimal sensitivity is “persistent low mood.” Specificity is derived from the non-occurrence in non-depressed subjects, and in this case the optimal specificity is a HADS score of 9 or above, closely followed by poor appetite. However, does this mean these are the best “tests” for this condition?
Textbox 5.3. Post-Stroke Depression: Symptom Counts
A previously well 58-year-old man who suffered a dominant hemisphere stroke 2 months previously is referred to an outpatient psychiatry clinic. He reports that he has had five symptoms (low mood, loss of drive, low energy, poor appetite, and insomnia) for the last 3 weeks. His score on the HADS depression scale is 9 out of 21. Out of the last 100 patients seen in this clinic, 54 were depressed.

Patient’s Symptoms        % of Depressed Stroke Patients    % of Nondepressed Stroke Patients
                          from Previous Studies             from Previous Studies
Persistent low mood       93%                               18%
Loss of drive             88%                               30%
Low energy                87%                               32%
Disturbed sleep           83%                               32%
Poor appetite             45%                               11%
HADS score 9 or above     60%                               9%
Pre-Test–Post-Test Change
As previously noted, raw sensitivity and specificity figures are of only moderate use by themselves. More useful are the PPV and NPV, which can be calculated from the above data. The data from Textbox 5.3 are reproduced in detail in Table 5.2. From this hypothetical study of 1,000 people after stroke, we see the complexity of deciding upon the optimal test. Persistent low mood is the symptom with the highest sensitivity and NPV. Thus, if low mood is absent, there is a 98% probability that the patient is not depressed, on the basis of this symptom alone. This alone improves upon the pre-test probability of non-depression (0.80) by 0.18 (the pre–post gain) (Fig. 5.4). Similarly, if all five symptoms listed are present, there is an 88% chance of major depression, a large pre–post gain over the prevalence of 0.20. This “and” combination differs from asking whether any one of the five symptoms is present, which is an “or” combination.
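The pre–post gains quoted above follow directly from the Table 5.2 counts. The sketch below, with a hypothetical function name, assumes gain is measured as post-test minus pre-test probability: PPV minus prevalence for a positive result, and NPV minus (1 – prevalence) for a negative result.

```python
from __future__ import annotations


def pre_post_gains(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    """Return (rule-in gain, rule-out gain) from 2 x 2 counts."""
    n = tp + fp + fn + tn
    prevalence = (tp + fn) / n
    ppv = tp / (tp + fp)   # probability of depression after a positive result
    npv = tn / (tn + fn)   # probability of no depression after a negative result
    rule_in_gain = ppv - prevalence
    rule_out_gain = npv - (1 - prevalence)
    return rule_in_gain, rule_out_gain


if __name__ == "__main__":
    # Persistent low mood: 186/200 cases and 144/800 non-cases positive.
    _, low_mood_rule_out = pre_post_gains(tp=186, fp=144, fn=14, tn=656)
    # All five symptoms: 56/200 cases and 8/800 non-cases positive.
    all_five_rule_in, _ = pre_post_gains(tp=56, fp=8, fn=144, tn=792)
    print(f"low mood, rule-out gain: {low_mood_rule_out:.2f}")  # 0.18
    print(f"all five, rule-in gain:  {all_five_rule_in:.2f}")
```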
Table 5.2. Summary of Diagnostic Accuracy Results from a Hypothetical Study of Post-Stroke Depression

Patient’s Symptoms                          Dep   TP   Se    Non-dep  TN   Sp    PPV   NPV   Youden  PSI   FC    UI+   UI–
Single symptoms
  Persistent low mood                       200   186  0.93  800      656  0.82  0.56  0.98  0.75    0.54  0.84  0.52  0.80
  Loss of drive                             200   176  0.88  800      560  0.70  0.42  0.96  0.58    0.38  0.74  0.37  0.67
  Low energy                                200   174  0.87  800      544  0.68  0.40  0.95  0.55    0.36  0.72  0.35  0.65
  Disturbed sleep                           200   166  0.83  800      544  0.68  0.39  0.94  0.51    0.33  0.71  0.33  0.64
  Poor appetite                             200    90  0.45  800      712  0.89  0.51  0.87  0.34    0.37  0.80  0.23  0.77
Composite measures
  All five symptoms                         200    56  0.28  800      792  0.99  0.88  0.85  0.27    0.72  0.85  0.25  0.84
  PHQ-2 (Q1 or Q2 positive)                 200   160  0.80  800      560  0.70  0.40  0.93  0.50    0.33  0.72  0.32  0.65
  HADS: score 9 or above                    200   130  0.60  800      728  0.91  0.64  0.91  0.51    0.56  0.86  0.39  0.83
  Algorithm: PHQ-2, then HADS if positive   200    96  0.48  800      778  0.97  0.81  0.88  0.45    0.70  0.87  0.39  0.86

Sample size = 1,000; prevalence = 0.20. Dep, depressed after stroke (n); Non-dep, non-depressed after stroke (n); TP, true positives; TN, true negatives; Se, sensitivity; Sp, specificity; PSI, predictive summary index; FC, fraction correct; UI+, positive utility index; UI–, negative utility index.
Figure 5.4. Conditional probabilities graph of pre-test post-test gain from a hypothetical diagnostic test. [Post-test probability (y-axis, 0 to 1) plotted against prevalence or prior probability (x-axis, 0 to 1); the point of maximum gain is marked.]
Surely, then, the five-symptom method is the best method to identify post-stroke depression? In the real world, the situation is more complex than it first appears because all five symptoms are positive in only 28% of true cases.
Clinical Utility of a Discriminating Test
Even when a test has a high PPV or NPV, a correction is needed for how often that test result occurs in each respective population. Thus, in this example, if a combination of five symptoms occurs, then it is 88% likely that major depression is present; however, this combination is actually uncommon, occurring in only 28% of true cases. For the clinician, any test with a high PPV will be devalued if it occurs rarely in true cases. Clinically relevant rule-in accuracy (also known as the positive utility index) is the product of the PPV and sensitivity. Thus, the positive utility index for all five symptoms is 0.88 × 0.28 = 0.25. A similar calculation applies for ruling out a diagnosis. For example, the symptom “loss of drive” has a high NPV but is negative in only 70% of non-depressed stroke patients. Thus, its corrected rule-out value is given by the negative utility index (the product of the NPV and specificity), 0.96 × 0.70 = 0.67. Utility index scores can be converted into qualitative grades as follows: excellent ≥ 0.81, good ≥ 0.64, satisfactory ≥ 0.49, and poor < 0.49.
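The utility index calculation and grading can be expressed as a small helper. This sketch assumes the grade thresholds are inclusive lower bounds (≥ 0.81, ≥ 0.64, ≥ 0.49), consistent with “poor < 0.49”; the function names are hypothetical.

```python
def positive_utility_index(ppv: float, sensitivity: float) -> float:
    """Rule-in utility: PPV weighted by how often the test is positive in true cases."""
    return ppv * sensitivity


def negative_utility_index(npv: float, specificity: float) -> float:
    """Rule-out utility: NPV weighted by how often the test is negative in non-cases."""
    return npv * specificity


def utility_grade(ui: float) -> str:
    """Qualitative grade for a utility index (thresholds assumed inclusive)."""
    if ui >= 0.81:
        return "excellent"
    if ui >= 0.64:
        return "good"
    if ui >= 0.49:
        return "satisfactory"
    return "poor"


if __name__ == "__main__":
    ui_pos = positive_utility_index(ppv=0.88, sensitivity=0.28)   # all five symptoms
    ui_neg = negative_utility_index(npv=0.96, specificity=0.70)   # loss of drive
    print(f"{ui_pos:.2f} {utility_grade(ui_pos)}")  # 0.25 poor
    print(f"{ui_neg:.2f} {utility_grade(ui_neg)}")  # 0.67 good
```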
In this example, the most useful population-based rule-in test is low mood, although it is only a “satisfactory” test. The most useful rule-out test is the algorithm approach, which can be graded as an “excellent” rule-out test. Algorithm approaches are worth examining in a bit more detail.
Algorithm Approaches
In this example, three questionnaire approaches are shown. The PHQ-2 achieves modest sensitivity and specificity and identifies 80% of all true cases. The HADS-D has excellent specificity and NPV and thus could be used as a rule-out test. Indeed, it could be combined with a high cutoff (eg, 15v16) as a good rule-in test, leaving a cohort scoring 9 to 15 as diagnostically uncertain and requiring a second-stage test. The HADS can also be combined with another questionnaire, in this case the PHQ-2 (see Appendix Fig. 2). This is a basic algorithm approach in which a second test is applied only to those who are positive on the first step. This two-step strategy reduces the false positives, improving the PPV and specificity but at the expense of sensitivity and NPV. In low-prevalence conditions, the overall gain in accuracy may be worth the effort of the extra step. Thus, the two-step strategy improves on the 0.40 PPV from the PHQ-2 alone to 0.81 but reduces the NPV from 0.93 to 0.88. However, there is an overall gain in accuracy, from 72% to 87% correctly identified (fraction correct). Clinicians may use their own clinical method as an algorithm, for example, offering a follow-up interview to those who are suspected of having a disorder on initial examination. The algorithm often offers a potential economic and efficiency advantage over a conventional approach: the majority of patients receive a simple, inexpensive screening test and a minority receive a lengthier case-finding test. However, the algorithm approach is efficient only where the prevalence of a condition is very low (or very high, in which case the second step is applied to those who screen negative to reduce the false negatives). As the prevalence approaches 0.50, the yield of the two-step approach converges on that of the one-step approach. The gain is also greatest when the accuracy of the single-step approach is lowest (see Appendix Tables 3 and 4 for more details).
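The two-step strategy (a patient counts as positive only if positive on both tests) can be approximated from each test’s sensitivity and specificity if the two tests are assumed to be conditionally independent, that is, their errors are unrelated within cases and within non-cases. That assumption is ours, not the chapter’s, and questionnaires that tap overlapping symptoms are unlikely to satisfy it fully. With the PHQ-2 and HADS values from Table 5.2 and a prevalence of 0.20, the sketch gives a sensitivity of 0.48, specificity of about 0.97, PPV of about 0.82, and NPV of about 0.88, close to the algorithm row of that table. The function names are hypothetical.

```python
from __future__ import annotations


def serial_positive(se1: float, sp1: float, se2: float, sp2: float) -> tuple[float, float]:
    """Second test given only to first-test positives; positive overall only if both are.

    Assumes the two tests are conditionally independent given true status.
    """
    sensitivity = se1 * se2               # a case must be detected by both tests
    specificity = sp1 + (1 - sp1) * sp2   # a non-case is cleared by either test
    return sensitivity, specificity


def predictive_values(prevalence: float, se: float, sp: float) -> tuple[float, float]:
    """PPV and NPV from sensitivity, specificity, and prevalence."""
    tp = prevalence * se
    fp = (1 - prevalence) * (1 - sp)
    tn = (1 - prevalence) * sp
    fn = prevalence * (1 - se)
    return tp / (tp + fp), tn / (tn + fn)


if __name__ == "__main__":
    prevalence = 0.20
    one_step = predictive_values(prevalence, 0.80, 0.70)    # PHQ-2 alone
    se2, sp2 = serial_positive(0.80, 0.70, 0.60, 0.91)      # PHQ-2, then HADS
    two_step = predictive_values(prevalence, se2, sp2)
    print(f"one step: PPV {one_step[0]:.2f}, NPV {one_step[1]:.2f}")
    print(f"two step: Se {se2:.2f}, Sp {sp2:.2f}, "
          f"PPV {two_step[0]:.2f}, NPV {two_step[1]:.2f}")
```

Rerunning `predictive_values` over a range of prevalences is one way to see how the two-step advantage depends on prevalence.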
A practical example of an algorithm approach to the detection of depression can be found in Thombs and colleagues (2008).10
4. Testing Screening via Implementation Studies
Even a test of high predictive value and high utility index cannot be assumed to be beneficial. Guidelines from the U.K. National Screening Committee are helpful here (Textbox 5.4). Ultimately, the case for a screening test has to be proven in an implementation study. This has two important parts: the feasibility of the tool in a clinical setting and the added value of the tool beyond what could be achieved without it.

Textbox 5.4. U.K. National Screening Committee Guidelines
The condition should:
  Be an important health issue
  Have a well-understood natural history, with a detectable risk factor or disease marker
  Have cost-effective primary preventions implemented
The screening tool should:
  Be a valid tool with a known cutoff
  Be acceptable to the public
  Have agreed diagnostic procedures
The treatment should:
  Be effective, with evidence of benefits of early intervention
  Have adequate resources
  Have appropriate policies as to who should be treated
The screening program should:
  Show evidence that benefits of screening outweigh risks
  Be acceptable to public and professionals
  Be cost-effective (and have ongoing evaluation)
  Have quality-assurance strategies in place
Adapted from UK National Screening Committee Criteria for appraising the viability, effectiveness and appropriateness of a screening programme. Available at: http://www.nsc.nhs.uk/pdfs/criteria.pdf
Feasibility of Depression Screening
Feasibility asks whether a tool is practical enough, both in application and in scoring, to gain acceptance by healthcare professionals and patients. This has rarely been studied in relation to depression severity scales. Bermejo and associates (2005)11 looked at attitudes to the PHQ-9 in general practice in Germany. This study enrolled 1,034 patients from 17 PCPs; both patients and healthcare professionals were asked about acceptability. Patients found the instrument highly acceptable, but 62.5% of the PCPs thought it was too long and 37.5% thought it was too time-consuming, even though it typically took 1 to 2 minutes. Half of the PCPs rated the PHQ as an impediment to daily practice and 75% thought it was impractical, compared with only 25% of patients. One proxy for feasibility is the willingness of clinicians to use the test: any screening roll-out will be compromised if front-line staff find the tool too difficult to administer or score.
Added Value
Demonstrating the possible benefit of a screening tool is akin to demonstrating benefit from a new medicine. Ideally, a randomized controlled trial using representative clinicians and patients takes place. In such a trial, one group (arm 1) uses its clinical skills without being influenced by awareness of the study (the Hawthorne effect), and the other group (arm 2) uses its clinical skills plus the screening tool or method. The advantage of this design is that the results reveal the unassisted detection rate (arm 1) as well as the added value beyond usual care (the difference between arm 2 and arm 1). Possible stages of tool development are discussed in Chapter 4. Ideally, implementation should not stop with demonstration of superior detection; rather, it should attempt to demonstrate further patient benefits, such as better quality of care and greater resolution of depression. This is discussed further in Chapter 7.
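The key outcome of such a trial, the added detection beyond usual care, is a difference in proportions between the two arms. A minimal sketch follows, using a simple Wald 95% confidence interval; the function name and the counts in the example are hypothetical. A trial that randomizes clinicians rather than patients would also need to account for clustering, which this sketch ignores.

```python
from __future__ import annotations

import math


def added_detection(detected_usual: int, n_usual: int,
                    detected_screen: int, n_screen: int,
                    z: float = 1.96) -> tuple[float, float, tuple[float, float]]:
    """Return (unassisted detection rate, added value, Wald CI for the added value).

    Arm 1 = usual care (unassisted); arm 2 = usual care plus the screening tool.
    """
    p_usual = detected_usual / n_usual
    p_screen = detected_screen / n_screen
    difference = p_screen - p_usual
    standard_error = math.sqrt(p_usual * (1 - p_usual) / n_usual
                               + p_screen * (1 - p_screen) / n_screen)
    ci = (difference - z * standard_error, difference + z * standard_error)
    return p_usual, difference, ci


if __name__ == "__main__":
    # Hypothetical trial: 50/100 depressed patients recognized in arm 1, 70/100 in arm 2.
    rate, added, (low, high) = added_detection(50, 100, 70, 100)
    print(f"unassisted {rate:.2f}; added {added:.2f} (95% CI {low:.2f} to {high:.2f})")
```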
5. Conclusions
Although depression is one of the world’s most prevalent disorders and antidepressants are the most commonly prescribed class of drug, the science of diagnosing depression has been hampered by the paucity of simple studies documenting the rate of symptoms and signs in depressed and non-depressed subjects. Once these data become available, calculating the diagnostic value of specific symptoms (both individually and in combination) becomes straightforward. Better data exist for depression severity scales and other assisted methods. Beyond this, further implementation studies are required in which the true benefit to patients of all proposed diagnostic methods is compared with conventional unassisted approaches.
References
1. Krupinski J, Tiller J. The identification and treatment of depression by general practitioners. Aust N Z J Psychiatry. 2001;35:827–832.
2. Steiner JL, Tebes JK, Sledge WH, et al. A comparison of the structured clinical interview for DSM-III-R and clinical diagnoses. J Nerv Ment Dis. 1995;183:365–369.
3. Steadman HJ, Silver E, Monahan J, et al. A classification tree approach to the development of actuarial violence risk assessment tools. Law Hum Behav. 2000;24:83–100.
4. Elstein AS, Schwarz A. Clinical problem solving and diagnostic decision making: selective review of the cognitive literature. BMJ. 2002;324:729–732.
5. Yerushalmy J. Statistical problems in assessing methods of medical diagnosis, with special reference to X-ray techniques. Public Health Rep. 1947;62:1432–1449.
6. Whiting P, Rutjes AWS, Dinnes J, et al. Development and validation of methods for assessing the quality of diagnostic accuracy studies. Health Technol Assess. 2004;8(25):1–234.
7. Sackett DL, Haynes RB. The architecture of diagnostic research. BMJ. 2002;324:539–541.
8. Youden WJ. Index for rating diagnostic tests. Cancer. 1950;3:32–35.
9. Macaskill P. Empirical Bayes estimates generated in a hierarchical summary ROC analysis agreed closely with those of a full Bayesian analysis. J Clin Epidemiol. 2004;57:925–932.
10. Thombs BD, Ziegelstein RC, Whooley MA. Optimizing detection of major depression among patients with coronary artery disease using the Patient Health Questionnaire: data from the Heart and Soul Study. J Gen Intern Med. 2008;23(12):2014–2017.
11. Bermejo I, Niebling W, Mathias B, et al. Patients’ and physicians’ evaluation of the PHQ-D for depression screening. Primary Care & Community Psychiatry. 2005;10(4):125–131.
6 CLINICAL JUDGMENT AND THE INFLUENCE OF SCREENING ON DECISION MAKING
Howard N. Garb
1. Introduction
2. Research on Clinical Judgment
3. The Limits of Screening
Context
How do clinicians arrive at diagnostic decisions? In most cases the decision is made not by following formal criteria but by intuition. In addition, routine interviews are often narrow, and the feedback gleaned from patients is inadequate. Yet it is not clear whether screening helps or hinders clinical judgment. It might be that only clinicians who have low confidence in their interviewing and diagnostic skills are open to using, and are actually helped by, diagnostic tools.
1. Introduction
To provide a theoretical framework for understanding why it is difficult for physicians to detect depression in primary care settings, a broad array of research in the mental health fields can be described. For example, more than 1,000 studies have been conducted on clinical judgment in the area of mental health practice,1,2 and the results from these studies can be used to illuminate the challenges physicians face in judging whether a patient is clinically depressed and can benefit from treatment. In this chapter, results on clinical judgment will be described. A second topic, the limits of screening, will also be briefly discussed. Results from research on clinical judgment would seem to indicate that screening should be of value. Yet, as noted in Chapter 7, stand-alone screening programs have added little or nothing to outcomes. Reasons for this unexpected result will be explored.
* The views expressed in this article are those of the author and are not the official policy of the Department of Defense or the United States Air Force.
2. Research on Clinical Judgment
Three topics will be discussed: (1) narrowness of interviews, (2) nature of patient feedback, and (3) the cognitive processes of clinicians.
Narrowness of Interviews
Depression goes undetected because in many cases physicians do not ask patients if they have symptoms of a depressive mood disorder.3 To place this in context, mental health professionals also often do not ask patients about important symptoms and behaviors. Failure to inquire about depression in primary care settings can be viewed in the broader context of failure to inquire about important symptoms and events in mental health settings. Research on clinical judgment has demonstrated that lack of comprehensiveness is often a problem for interviews conducted in clinical practice. For example, in one study,4 mental health professionals saw patients in routine clinical practice, and afterwards research investigators conducted semi-structured interviews with the patients. Remarkably, the mental health professionals had evaluated only about 50% of the symptoms that were recorded using the semi-structured interviews. Similarly, a number of studies have found that mental health professionals often do not ask about important events when formulating a case history. For example, in a study by Malone and associates (1995),5 clinicians at a psychiatric hospital failed to document a history of suicidal behavior for 12 of 50 patients who had such a history. This is important because past suicidal behavior is one of the best predictors of suicide. In another study,6 26 of 69 psychiatric inpatients reported on a research questionnaire that they had been victims of severe physical abuse by family members or partners during the past year. The abuse had been documented in medical charts for only nine of these patients. To give one more example, in another study a computer interview was used to collect a psychiatric history.7 The computer interview obtained important history information that mental health professionals had not obtained in the course of their routine work. This was especially true for information about criminal history (26% of patients), amnesic blackouts after drinking heavily (23%), repeatedly being fired from jobs (17%), recent drug abuse (10%), and debts (10%). Another type of error that occurs when evaluations of psychopathology are not comprehensive is called diagnostic overshadowing. Diagnostic overshadowing is said to occur when clinicians make one or two diagnoses but overlook other disorders.8,9 For example, when diagnoses are made by mental health professionals, mental disorders tend to be missed among clients with mental retardation,10,11 alcohol and drug abuse are often underdiagnosed among clients presenting with psychiatric problems,12 and diagnoses of personality disorder are often missed among clients with an Axis I disorder (eg, among clients with obsessive-compulsive disorder).13 If mental health professionals fail to ask about important emotional and behavioral problems and overlook mental disorders, it is not surprising that physicians who are not trained in psychiatry do the same. Since patients in primary care settings almost always present with physical complaints, we should not be surprised when diagnostic overshadowing occurs and physicians do not explore other possible problems.
Nature of Patient Feedback
Another reason why physicians may have difficulty detecting depression in primary care settings is that they are unlikely to receive accurate feedback. If a patient with clinically significant depression presents with a medical problem and the physician misses the diagnosis, it is unlikely that the physician will later learn that the diagnosis of depression was missed. One of the most surprising findings on clinical judgment is that it can be very difficult to learn from clinical experience. Training is often positively related to validity, but experience is not.14,15 Thus, once physicians and mental health professionals complete residency or graduate-school levels of training, the amount of experience they gain is weakly related, or even negatively related, to the accuracy of judgments and treatment outcomes. In a review of the literature on the relationship between clinical experience and quality of healthcare,16 physicians who had been in practice longer were found to be at risk for providing lower-quality care. A decreasing level of performance (or treatment) was associated with increasing years in practice for all outcomes assessed in 32 of 62 studies. In the other studies, decreasing performance was associated with increasing experience for some outcomes but not for others (13 of 62 studies), no association was observed in 13 of 62 studies, mixed results were obtained in 3 of 62 studies, and an increasing level of performance with increasing years in practice for all outcomes was obtained in 1 of 62 studies.
Similarly, in routine clinical practice in the mental health fields, professionals with extensive clinical experience are typically no more accurate than other clinicians. For example, in one study,17 different participants (eg, marital therapists, undergraduates) viewed videotaped conversations of 10 married couples and predicted which couples were likely to divorce in the future. Attitudes about marriage, but not amount of clinical experience, were related to the validity of predictions.
The Cognitive Processes of Clinicians
It is likely that depression often goes undetected in primary care settings not only because interviews are narrow and feedback is inadequate, but also because the cognitive processes of clinicians are fallible. The primacy effect, confirmatory hypothesis testing, cognitive heuristics, and causal reasoning are described in this section. One possible reason physicians fail to diagnose depression is that they make judgments too quickly. The tendency to make judgments quickly, sometimes after collecting relatively few data, is called the primacy effect. It is characteristic of social judgments made in everyday situations as well as of clinical judgments made in mental health settings.1,18 For example, Gauron and Dickinson reported that psychiatrists who observed a videotaped interview routinely formed diagnostic impressions within 30 to 60 seconds.19 Similarly, Kendell found that psychiatrists are often ready to make a diagnosis for a patient within a few minutes.20 It is worth asking whether physicians in primary care settings also tend to reach conclusions surprisingly quickly, and whether this is one reason they miss diagnoses of depression. Another reason depression may go undetected is that physicians may rely on confirmatory hypothesis testing. Confirmatory hypothesis testing refers to a tendency to seek, use, and remember information that is likely to confirm, but not refute, a hypothesis. Research on clinical judgment indicates that mental health professionals tend to seek and remember information that supports a hypothesis, and this leads them to neglect alternative hypotheses. For example, in an especially well-designed study,21 psychology graduate students watched a videotape of an initial psychotherapy session. They listed questions they would like to ask the client portrayed in the videotape, and they described their reasons for wanting to ask the client these questions.
An independent panel of psychologists coded each question as being likely to elicit information that could confirm or disconfirm their hypothesis. The style of hypothesis testing was confirmatory 64% of the time, neutral 21% of the time, and disconfirmatory 15% of the time. These results, along with results from other studies, provide insight into why clinicians do not routinely consider alternative hypotheses.
Cognitive heuristics are simple rules that describe how judgments are made. Made famous by Daniel Kahneman and Amos Tversky, cognitive heuristics describe cognitive processes that allow us to process vast amounts of information efficiently.22 However, these same cognitive processes also sometimes cause us to make characteristic types of mistakes. Cognitive heuristics include the affect, representativeness, and availability heuristics. The affect heuristic refers to the fact that people often make judgments and decisions based, in part, on their feelings. “Snap judgments” and judgments based on “gut instinct” or intuition are often described by the affect heuristic. Kahneman believes that the formulation of the affect heuristic is “probably the most important development in the study of judgment heuristics in the past few decades.”23, p. 703 But how does the affect heuristic relate to the detection of depressive disorders in primary care settings? For whatever reasons, in many cases, physicians’ reliance on affect and intuition does not allow them to detect depression in these settings. The representativeness heuristic is said to be descriptive of a clinician’s cognitive processes when a judgment is made by deciding if a patient is representative of a category.24 For example, when a screening instrument indicates that a patient may be depressed and physicians must decide if treatment for depression is required, the physicians may compare the patient to (a) patients they have worked with who have been clinically depressed, (b) their concept of the “typical” person with clinically significant depression, or (c) a theoretical standard that serves to define clinically significant depression.
The representativeness heuristic is often descriptive of how judgments are made in everyday life,25 and it is even descriptive of how many mental health professionals make diagnoses.26 Since the representativeness heuristic is often descriptive of how people make judgments, it is likely also to be descriptive of physicians in primary care settings. If they are not comparing patients to appropriate exemplars, stereotypes, or prototypes, then this may explain why they are having difficulty with this task. The third heuristic, the availability heuristic, describes how clinicians’ judgments are influenced by the ease with which events or particular patients can be remembered. For example, the ease with which information is remembered can be related to its recency or its vividness. The point to be understood here is that memory is fallible. We are unable to remember all of the patients we have seen. Because memory is selective, cognitive efficiency is enormously enhanced, but learning from experience becomes difficult. One more feature of the cognitive processes of clinicians will be described. A major finding on clinical judgment in recent years is that causal reasoning underlies the manner in which mental health professionals make many different types of judgments, including treatment decisions, predictions, and diagnoses.27,28
With regard to treatment decisions, Witteman and Koele addressed the following questions: “What explains which treatment is proposed to a (depressed) patient? Is it the patient characteristics, such as her or his specific symptoms, social context, and seriousness of the disorder, or is it the theoretical background of the proposing psychotherapist?”29, p. 100 For a group of 56 therapists, treatment plans were highly variable, and Witteman and Koele concluded, “The best explanations of the treatment proposals seemed to be the therapist’s theory-inspired interpretations of the patient complaints.”29, p. 100 Causal reasoning also underlies how mental health professionals make predictions. In one study, mental health professionals working in a psychiatric emergency room predicted whether patients would become violent in the next 6 months.30 Mulvey and Lidz observed: “Clinicians did not appear to be making simple ‘yes’ or ‘no’ judgments of dangerousness. Rather, they seemed to be making contextualized judgments regarding future violence. Instead of stating whether they thought someone was highly likely or unlikely to be involved in violence, the clinicians instead gave what we called ‘conditional judgments’ regarding future violence. . . . In other words, they saw the violence as dependent upon certain conditions in the person’s life.”30, p. S108
Thus, clinicians will frequently make predictions by formulating case conceptualizations. Finally, when clinicians make diagnoses, they are influenced not only by diagnostic criteria but also by their implicit causal theories.27,31 Clinicians weigh diagnostic criteria more heavily when the criteria describe symptoms and behaviors that are part of their implicit causal model for a disorder,27 even though, when using DSM, clinicians are supposed to weigh each criterion equally. Similarly, mental health professionals’ implicit theories influenced their memories of their clients’ mental status: causally central symptoms were recalled more often than causally peripheral symptoms and isolated symptoms. In addition, false memories of a patient having symptoms the patient did not really have were most likely to occur for symptoms that were causally central to clinicians’ theories of different disorders. The finding that causal reasoning underlies different types of clinical judgments is important for helping us understand the actions of physicians in primary care settings. To understand the etiology and course of a patient’s physical complaint, physicians should understand the effect of depression. For some patients, vague physical complaints, fatigue, and aches and pains are highly correlated with depression and anxiety. To the extent that physicians recognize this, they will become more adept at detecting depression. Thus, to some degree, to bring about change in primary care settings, we must be concerned with the implicit causal theories of physicians.
3. The Limits of Screening
The use of screening questionnaires can help physicians overcome some problems but not others. Screening questionnaires can compensate for interviews that are not comprehensive, and they can help physicians overcome some counterproductive cognitive processes, such as diagnostic overshadowing and confirmatory hypothesis testing. In particular, screening questionnaires prompt physicians to consider alternative hypotheses: results from a screening questionnaire can lead a physician to consider whether a patient is depressed, a hypothesis the physician might otherwise never entertain. Given everything we know about clinical judgment, it is somewhat surprising that the use of screening questionnaires has not been related to improved clinical outcomes. A number of reasons can be given for why this is the case; two will be described here. First, some patients overreport symptoms while others underreport them. This can occur if a patient misunderstands an item or wants to create an impression of being healthy or of being impaired. To the extent that symptoms are overreported or underreported on screening instruments, we should not expect better clinical outcomes. Second, even with the use of screening questionnaires, physicians must still rely on clinical judgment. Thus, if a patient tests positive for depression on a screening instrument, physicians must rely on their clinical judgment to determine whether the patient’s responses indicate a need for treatment or are a false positive. If someone is clinically depressed, physicians will need to determine whether he or she may have a bipolar disorder (and should not be treated with an antidepressant). They must also determine whether the patient is at serious risk for suicide.
If physicians are not making the right judgments when a patient tests positive (eg, making a referral to a mental health professional, providing treatment for depression, making a differential diagnosis of bipolar disorder), then the use of screening questionnaires will not lead to improved clinical outcomes. This is a challenging task for physicians, in part because they will not receive feedback on the validity of their judgments or the utility of their decision making, and in part because they are unlikely to have specialized training in mental health diagnosis and treatment. It is also challenging because, when patients complete questionnaires inquiring about mental health symptoms, false positives are common, usually because patients (and everyone else) will sometimes interpret items in an idiosyncratic manner.32 In conclusion, we are faced with a dilemma. Clinical judgment is fallible, and the use of screening questionnaires has not been related to improved clinical outcomes. However, the use of screening tools should help to improve clinical judgment, and, much of the time, an optimal strategy will be to conduct screening and then rely on clinical judgment. Although a large body of research describes errors and mistakes in clinical judgment, such judgment can still be of considerable value, if only to review responses on a screening questionnaire with a patient so as to better understand how the patient interpreted the items. In addition, it may be that screening assists underconfident clinicians in making the diagnosis but could be unhelpful to clinicians already skilled in making that diagnosis.
References
1. Garb HN. Studying the clinician: judgment research and psychological assessment. Washington, DC: American Psychological Association, 1998.
2. Garb HN. Clinical judgment and decision making. Annu Rev Clin Psychol. 2005;1:67–89.
3. Nichols GA, Brown JB. Following depression in primary care: do family practice physicians ask about depression at different rates than internal medicine physicians? Arch Fam Med. 2000;9:478–482.
4. Miller PR, Dasher R, Collins R, et al. Inpatient diagnostic assessments: 1. Accuracy of structured vs. unstructured interviews. Psychiatry Res. 2001;105:255–264.
5. Malone KM, Szanto K, Corbitt EM, et al. Clinical assessment versus research methods in the assessment of suicidal behavior. Am J Psychiatry. 1995;152:1601–1607.
6. Cascardi M, Mueser KT, DeGiralomo J, et al. Physical aggression against psychiatric inpatients by family members and partners. Psychiatr Serv. 1996;47:531–533.
7. Carr AC, Ghosh A, Ancill RJ. Can a computer take a psychiatric history? Psychol Med. 1983;13:151–158.
8. Jopp DA, Keys CB. Diagnostic overshadowing reviewed and reconsidered. Am J Ment Retard. 2001;106:416–433.
9. Reiss S, Szyszko J. Diagnostic overshadowing and professional experience with mentally retarded persons. Am J Ment Defic. 1983;87:396–402.
10. Mason J, Scior K. Diagnostic overshadowing amongst clinicians working with people with intellectual disabilities in the UK. J Appl Res Intellect Disabil. 2004;17:85–90.
11. Spengler PM, Strohmer DC, Prout HT. Testing the robustness of the overshadowing bias. Am J Ment Retard. 1990;95:204–214.
12. Drake RE, Osher FC, Noordsy DL, et al. Diagnosis of alcohol use disorders in schizophrenia. Schizophr Bull. 1990;16:57–67.
13. Tenney NH, Schotte CKW, Denys DAJP, et al. Assessment of DSM-IV personality disorders in obsessive-compulsive disorder: comparison of clinical diagnosis, self-report questionnaire, and semi-structured interview. J Personal Disord. 2003;17:550–561.
14. Garb HN. Clinical judgment, clinical training, and professional experience. Psychol Bull. 1989;105:387–396.
15. Garb HN, Schramke CJ. Judgment research and neuropsychological assessment: a narrative review and meta-analyses. Psychol Bull. 1996;120:140–153.
16. Choudhry NK, Fletcher RH, Soumerai SB. Systematic review: the relationship between clinical experience and quality of health care. Ann Intern Med. 2005;142:260–273.
17. Ebling R, Levenson RW. Who are the marital experts? J Marriage Fam. 2003;65:130–142.
18. Ambady N, Rosenthal R. Thin slices of expressive behavior as predictors of interpersonal consequences: a meta-analysis. Psychol Bull. 1992;111:256–274.
19. Gauron EF, Dickinson JK. Diagnostic decision making in psychiatry. Arch Gen Psychiatry. 1966;14:225–232.
20. Kendell RE. Psychiatric diagnoses: a study of how they are made. Br J Psychiatry. 1973;122:437–445.
21. Haverkamp BE. Confirmatory bias in hypothesis testing for client-identified and counselor self-generated hypotheses. J Couns Psychol. 1993;40:303–315.
22. Tversky A, Kahneman D. Judgment under uncertainty: heuristics and biases. Science. 1974;185:1124–1131.
23. Kahneman D. A perspective on judgment and choice: mapping bounded rationality. Am Psychol. 2003;58:697–720.
24. Kahneman D, Slovic P, Tversky A, eds. Judgment under uncertainty: heuristics and biases. New York: Cambridge University Press, 1982.
25. Gilovich T, Griffin D, Kahneman D, eds. Heuristics and biases. New York: Cambridge University Press, 2002.
26. Garb HN. The representativeness and past-behavior heuristics in clinical judgment. Prof Psychol Res Pr. 1996;27:272–277.
27. Kim NS, Ahn W. Clinical psychologists’ theory-based representations of mental disorders predict their diagnostic reasoning and memory. J Exp Psychol Gen. 2002;131:451–476.
28. Wakefield JC, Kirk SA, Pottick KJ, et al. Disorder attribution and clinical judgment in the assessment of adolescent antisocial behavior. Soc Work Res. 1999;23:227–238.
29. Witteman C, Koele P. Explaining treatment decisions. Psychother Res. 1999;9:100–114.
30. Mulvey EP, Lidz CW. Clinical prediction of violence as a conditional judgment. Soc Psychiatry Psychiatr Epidemiol. 1998;33:S107–S113.
31. Pottick KJ, Kirk SA, Hsieh DK, et al. Judging mental disorder in youths: effects of client, clinician, and contextual differences. J Consult Clin Psychol. 2007;75:1–8.
32. Nease DE, Klinkman MS, Aikens JE. Depression case findings in primary care: a method for the mandates. Int J Psychiatry Med. 2006;36:141–151.
7 IMPLEMENTING SCREENING AS PART OF ENHANCED CARE: SCREENING ALONE IS NOT ENOUGH
Simon Gilbody and Dan Beck
1. The Case for Screening
2. Screening and Enhanced Care for Depression
3. New and Additional Evidence Relating to Enhanced Care
4. Is Screening a Necessary Intervention to Improve the Quality and Outcome of Care?
5. To Screen or Not to Screen?
Context
There are conflicting conclusions and policy recommendations relating to the effects of screening on the outcome of depression, but what does the latest evidence suggest? The best available information to date indicates that screening alone is not a sufficient intervention to improve the quality and outcomes of care for depression. What is less clear is whether screening is a necessary condition for enhanced and improved quality of care and, given additional components, to what extent screening programs can improve the quality of routine care.
1. The Case for Screening
Depression is the most common mental health problem and is associated with decrements in functioning and quality of life comparable to those of other chronic physical diseases.1 The prevalence, chronicity, and burden of suffering are such that the World Bank has predicted that depression will become the second leading cause of global disability by 2020.2 The economic consequences of depression are also profound, with the healthcare costs, welfare costs, and losses to productivity amounting to £9 billion ($20 billion) in the United Kingdom3 and $53 billion in the United States.4 Depression is most commonly encountered in primary care and in hospital settings, yet it often goes unrecognized by healthcare professionals.5–7 This has led to calls to implement screening programs to aid in the detection and management of this problem.8,9 The rationale and evidence base to support screening for depression are the focus of the present book and are discussed extensively in other chapters (see Chapters 2, 4, and 9). In the United States, screening has shifted from being an intervention that was not initially supported in national policy recommendations10 to being one that is regarded as being of proven effectiveness.11 An evolution in thinking has occurred that places screening at the center of mental health policy and practice, based upon the general assumption that screening will logically lead to improvements in the quality and outcome of care. Some have termed this the screening–detection–treatment–improvement paradigm.12,13 Recently, screening for common mental health problems has become the cornerstone of the president’s agenda to improve the mental health of the U.S. population.14
Arguments For and Against Screening
Screening has a long and honorable tradition in helping to improve the health and well-being of populations and individuals.15 However, screening is a “special case” in the armory of healthcare interventions, since testing and treatment may be offered to people who do not necessarily know they have a condition or who do not specifically ask for help with that problem.16 Screening programs have also been implemented in the past without due consideration of their effectiveness, their ethical and clinical implications, and their impact on finite healthcare resources.17 Consequently, clear criteria have evolved that must be satisfied before screening programs are adopted (see Chapter 5).18 In the case of depression, screening is just one of a range of possible interventions that might be offered to improve care for depression at a population level,19 and the implementation of screening programs should be supported by sound clinical and economic evidence.20 The relative merits of screening for depression more generally have been reviewed by Gilbody and colleagues20 and by Palmer and Coyne.13 Gilbody and colleagues used a set of analytic principles laid down by the World Health Organization18 and adopted by the U.K. National Screening Committee.21 In their analysis, they argued that the relative merits of screening programs are sometimes overstated, and that convincing evidence that screening substantially influences the outcomes of depression is difficult to find. The principal concerns that have been highlighted are that screening for depression uncovers a substantial body of undetected psychological need that is not currently well met within existing healthcare systems. Much of this represents short-term and self-limiting distress, the natural history of which is not readily influenced by active intervention.22 In addition, the common belief that unrecognized depression is as responsive to the evidence-supported interventions currently used for already recognized depression (antidepressants and brief psychotherapy) is not necessarily true: unrecognized depression may be more difficult to treat because it tends to be mild or atypical. Most importantly, they highlighted the relative lack of evidence from randomized controlled trials to show that the introduction of screening programs for depression makes any substantial difference to the outcomes of depression itself.23 There is also a dearth of economic data to inform this population-level policy intervention. It is this area of supportive epidemiologic and economic evidence that has produced the greatest amount of debate and controversy, which we will review in more detail within this chapter. Two strategies have been scrutinized and variously rejected10,24,25 or advocated.11,26,27 The first is the use of screening as a “stand-alone” quality improvement strategy. The second is the use of screening within a more general enhancement of the care for depression in non-specialist settings. Let us examine each of these strategies in turn to establish whether screening is a sufficient or a necessary condition for improving the quality and outcome of care for depression.
Is Screening a Sufficient Intervention to Improve the Quality and Outcome of Care?
The effectiveness of screening for depression was first addressed with reference to the research literature in the 1990s. The first evidence synthesis was conducted by the U.S. Agency for Health Care Policy and Research, which looked at the evidence to support various aspects of the management of depression in primary care settings, including screening.28 This review examined the totality of research and came down firmly against screening. On the basis of a review of the literature published in May 1993, the U.S. Preventive Services Task Force (USPSTF) concluded that there was “sufficient evidence to exclude screening for depression in the primary care setting” (a “grade D” recommendation). This research highlighted that screening instruments did not generally improve the detection rate or management of depression. The evidence they reviewed related primarily to the use of screening programs as a “stand-alone” measure. A similar conclusion was reached in a 2001 evidence review,24 also published under the auspices of the Cochrane Collaboration (first in 2005,23 and updated again in 2008).29,30 The most recent version of this review of “stand-alone” depression screening, which now includes 16 primary trials of the effectiveness of screening strategies (more than 5,000 patients), concluded, “There is substantial evidence that routinely administered case finding/screening questionnaires for depression have minimal impact on the detection, management or outcome of depression by clinicians.” The most important finding from the Cochrane reviews29,30 has been the consistent demonstration that screening had minimal impact on the actual outcomes of depression when screened populations were followed up over time. This review concurs with the first USPSTF review,11 and an overall summary diagram of the lack of effect of simple screening strategies, based on the Cochrane review, is shown in Figure 7.1. A review conducted at around the same time as the first Cochrane review, to provide updated guidance to the USPSTF,11 examined a similar body of research and found a similar lack of effect of stand-alone screening strategies. However, this review was altogether more positive about screening (Textbox 7.1). The reasons for this shift in the USPSTF recommendation deserve detailed examination; they relate to the additional consideration of screening alongside “additional enhancements of care.”
Figure 7.1 (forest plot; SMD values below zero favor screening, values above zero favor control)
Study               Depression outcomes, SMD (95% CI)
Bergus 2005         –0.29 (–1.40, 0.82)
Callaghan 1994      –0.05 (–0.97, 0.86)
Johnstone 1976      –0.77 (–1.54, 0.00)
Lewis GHQ 1996       0.10 (–0.09, 0.29)
Lewis PRQ 1996      –0.06 (–0.25, 0.13)
Whooley 2000        –0.16 (–0.72, 0.39)
Williams 1999       –0.22 (–0.81, 0.37)
Overall             –0.03 (–0.16, 0.10)
Figure 7.1. Summary of random effects meta-analysis of the effect of simple screening/case-finding instruments on the outcome of depression at follow-up (adapted from references 23, 29, and 30).
Textbox 7.1. Current Policy Recommendations on Screening for Depression
U.K. National Institute for Clinical Excellence31
“Screening should be undertaken in primary care and general hospital settings for depression in high-risk groups: for example, those with a past history of depression, significant physical illnesses causing disability, or other mental health problems such as dementia.”
Review of reviews to inform practice and policy in Australia and New Zealand32
“Brief self-report instruments have acceptable psychometric properties and are practical for use in general practice settings. Screening increases the recognition and diagnosis of depression and, when integrated with a commitment to provide coordinated and prompt follow up of diagnosis and treatment, clinical outcomes are improved. Although controversial, the evidence is now in favour of the appropriate use of screening tools in primary care.”
U.S. Preventive Services Task Force11
“The USPSTF found good evidence that screening improves the accurate identification of depressed patients in primary care settings and that treatment of depressed adults identified in primary care settings decreases clinical morbidity. Trials that have directly evaluated the effect of screening on clinical outcomes have shown mixed results. Small benefits have been observed in studies that simply feed back screening results to clinicians. Larger benefits have been observed in studies in which the communication of screening results is coordinated with effective follow-up and treatment. The USPSTF concluded the benefits of screening are likely to outweigh any potential harms.”
Strength of recommendation: B (“there is at least fair evidence that the intervention improves important health outcomes and that the benefits outweigh the harms”)
Canadian Task Force on Preventive Health Care27
“The CTFPHC concludes that there is fair evidence to recommend screening adults for depression in primary care settings since screening improves health outcomes when linked to effective follow-up and treatment.”
Strength of recommendation: B (“there is fair evidence to recommend the clinical preventive action”)
“The CTFPHC concludes that there is insufficient evidence to recommend for or against screening adults for depression in primary care settings where effective follow-up and treatment are not available.”
Strength of recommendation: I (“insufficient evidence [in quantity and/or quality] to make a recommendation, however other factors may influence decision-making”)
2. Screening and Enhanced Care for Depression
The major shift between the recommendations produced in 1996 and 2003 turns upon a change in the scope of the evidence review and the inclusion criteria that were set.25 In contrast to earlier reviews, the USPSTF in its updated report reviewed both stand-alone screening programs and those embedded within enhancements of care. An example of such an enhanced care study was that conducted by Wells and colleagues (the Partners in Care study),33 which provided practice-level enhancements in the quality of care for depression, including structured psychotherapy or medication management, clinician education and consultation/liaison, treatment guidelines, and structured follow-up. Recruitment to this trial was by screening, and the trial was therefore considered by the USPSTF as evidence to support the effectiveness of screening in practice. This study showed strongly positive results on the outcomes of depression and was included in a summary meta-analysis (accounting for 33% of the overall weight of evidence). On the basis of this evidence, the USPSTF concluded that “benefits have been observed in studies in which the communication of screening results is coordinated with effective follow-up and treatment.” A subsequent 2005 review published by the Canadian Task Force on Preventive Health Care (CTFPHC)27 made a nearly identical recommendation, highlighting the ineffectiveness of stand-alone screening and the effectiveness of screening plus enhanced care. A similar recommendation was made in the United Kingdom in guidance offered by the U.K. National Institute for Clinical Excellence (NICE) (see Textbox 7.1).31
3. New and Additional Evidence Relating to Enhanced Care
The specific recommendations made by the USPSTF, CTFPHC, and NICE relating to screening plus enhanced care fit into a much wider body of research relating to organizational enhancements to the process of care for depression.34 The enhancement of primary care for depression is an active area of research, and a substantial body of evidence now shows that this is an effective intervention.35 The most recent review of this topic pooled data from over 30 randomized trials, based on over 12,000 patients with depression, and showed that enhanced care is effective in the short and medium term.35 The finding that enhanced or collaborative care is effective is now a consistent one that has been supported in several independently conducted meta-analyses (see Bower and Gilbody36 for an overview of reviews in this area). In the aforementioned Partners in Care study, the benefits of an enhanced care intervention persisted for up to 5 years.37 However, while the effectiveness of enhanced care is now beyond reasonable doubt, the USPSTF review included only four (references 38–41) of the 36 trials of enhanced care that were summarized in the largest and most comprehensive review to date. From these four studies, the U.S. and Canadian reports drew quite specific conclusions about the effectiveness of screening (the topic of their review) rather than about the effectiveness of enhanced and collaborative care in general.25 Many studies of enhanced care do not use screening as an entry criterion or component of quality improvement, but these were not reviewed by the USPSTF. This is not just of academic interest, since many healthcare systems have clearly taken the positive endorsement of screening within enhancements of care as an endorsement of screening per se. In the United Kingdom, for example, financial inducements have been introduced to encourage primary care physicians to screen for depression, without any requirement that further enhancements in the quality of care be introduced.20 Clearly, the specific question of the relative contribution of screening to the effectiveness of quality improvement strategies is important from a policy and practice perspective. To what extent is screening the critical component in determining the quality of depression care?
4. Is Screening a Necessary Intervention to Improve the Quality and Outcome of Care?
What remains unclear from the preceding discussion and the work of the USPSTF is whether screening is a necessary component or condition for effective enhanced care, and whether enhancements of care without screening are in themselves ineffective. Recent research has emerged to answer this question, which was not effectively addressed by the USPSTF11 or by the subsequent review by the CTFPHC.27 The overall effectiveness of enhanced care for depression has most recently been reviewed by Gilbody and colleagues,35 who found that collaborative care strategies improved depression outcomes in the short and medium term, at levels far beyond conventional thresholds of statistical significance. This dataset provides a more comprehensive body of research within which to examine whether screening is a necessary ingredient of effective enhanced care for depression. Among enhanced care studies as a whole, the authors found a moderate pooled standardized effect size of 0.25 for enhanced care compared to usual care (95% confidence interval 0.18 to 0.32). They also found significant between-study variation in the magnitude of effect size (that is, heterogeneity). When conducting a meta-analysis, the most rigorous approach to heterogeneity is to seek to explain or explore its causes.42 This approach can provide useful insights into mechanisms of effect and variations in treatment response according to the population under study or the intervention under evaluation. This information is often of interest to clinicians and policymakers charged with implementing or interpreting research evidence. One technique that can be used is regression modeling, whereby the relationship between study-level design variables and a dependent variable (study effect size) is examined (this is termed meta-regression42,43). This technique was applied to the dataset of enhanced care for depression by Bower and colleagues to identify some of the “active ingredients” in enhanced care for depression.44 Among 34 studies, there was substantial variation in the content and intensity of collaborative care. Some studies, such as the previously discussed Partners in Care study,39 provided relatively intensive packages of enhanced care, including face-to-face clinician education, computerized decision support, individualized treatment algorithms, the active support of a nurse case manager, and regular consultation/liaison with a specialist mental health clinician (psychologist or psychiatrist). This study39 accounted for 30% to 47% of the weighted information in the meta-analyses produced on behalf of the USPSTF.11 In contrast, the collaborative care review by Gilbody and colleagues also included less intensive packages of care, such as simple telephone follow-up by practice nurses.45 Bower and colleagues44 used meta-regression to examine the relative contributions of various aspects of the content of enhanced care interventions to improving depression outcomes within the dataset of collaborative care studies. They specified, and were able to find sufficient study-level information on, eight aspects of care and study design, including the method of recruitment: whether by screening or by clinician referral of already recognized depression. Stratification according to this variable showed that the majority of studies used screening, but that 12 collaborative care studies did not.45–55 A stratified meta-analysis according to this variable is shown in Figure 7.2, and the methods of patient recruitment (by screening or by other means) are detailed in Table 7.1.
From this stratified analysis, it is evident that the majority of studies were positive, and that screening studies showed the most strongly positive effect size (standardized mean difference for screening studies = 0.30, 95% confidence interval 0.21 to 0.38), while non-screening studies were still significantly positive, but with a less pronounced effect (standardized mean difference for non-screening studies = 0.15, 95% confidence interval 0.03 to 0.26). When the difference between these two effect sizes was tested using logistic meta-regression,56 this trend was positive but nonsignificant (difference in standardized mean differences = 0.15, 95% confidence interval –0.03 to 0.29, p = 0.09). Of particular interest from the point of view of the present chapter was the fact that several additional study-level variables were also related to the magnitude of effect size in collaborative care, and that three of these predictive covariates were either strongly significant (p < 0.05) or more significant than screening (p < 0.1). These were better antidepressant concordance, having a trained case manager, and regular and planned supervision of case managers.
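Meta-regression of the kind described above can be sketched as an inverse-variance weighted least squares regression of study effect sizes on a study-level covariate. The example below is a minimal illustration under stated assumptions, not Bower and colleagues’ analysis: it uses a single 0/1 covariate (1 = patients recruited by screening), treats any between-study variance tau² as already known, and runs on invented effect sizes and standard errors rather than the data in Figure 7.2. The function name meta_regression_binary is hypothetical. With one binary covariate, the fitted slope is simply the difference between the weighted mean effect sizes of the two groups of studies.

```python
import math


def meta_regression_binary(effects, std_errors, covariate, tau2=0.0):
    """Weighted least squares meta-regression on one 0/1 study-level covariate.

    Each study is weighted by 1 / (se^2 + tau2). With tau2 = 0 this is a
    fixed-effect meta-regression; a positive tau2 (treated as known here)
    gives a simple random effects version.
    Returns (intercept, slope, slope_se).
    """
    w = [1 / (se ** 2 + tau2) for se in std_errors]
    sw = sum(w)
    x_bar = sum(wi * x for wi, x in zip(w, covariate)) / sw
    y_bar = sum(wi * y for wi, y in zip(w, effects)) / sw
    sxx = sum(wi * (x - x_bar) ** 2 for wi, x in zip(w, covariate))
    sxy = sum(wi * (x - x_bar) * (y - y_bar)
              for wi, x, y in zip(w, covariate, effects))
    slope = sxy / sxx                  # difference associated with the covariate
    intercept = y_bar - slope * x_bar  # pooled effect when covariate = 0
    slope_se = math.sqrt(1 / sxx)      # valid if weights are exact inverse variances
    return intercept, slope, slope_se


# Invented illustrative data (NOT the studies in Figure 7.2).
EFFECTS = [0.10, 0.20, 0.15, 0.35, 0.25, 0.30]
STD_ERRORS = [0.10, 0.12, 0.08, 0.09, 0.11, 0.07]
SCREENED = [0, 0, 0, 1, 1, 1]  # 1 = patients recruited by screening

if __name__ == "__main__":
    intercept, slope, se = meta_regression_binary(EFFECTS, STD_ERRORS, SCREENED)
    z = slope / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    print(f"non-screening SMD {intercept:.2f}; difference {slope:.2f} "
          f"(95% CI {slope - 1.96 * se:.2f} to {slope + 1.96 * se:.2f}), p = {p:.2f}")
```

Here the intercept plays the role of the pooled effect in non-screening studies and the slope the screening-versus-non-screening difference, the quantity the chapter reports as 0.15 (p = 0.09). In practice tau² would itself be estimated (for example by a method-of-moments or restricted maximum likelihood estimator); a larger tau² shrinks the weights and widens the interval around the slope.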
[Figure 7.2: forest plot of standardized depression outcomes (95% CI) by study; x-axis, standardized depression outcomes, –1.5 to 1.5. Studies in which patients were referred by clinicians: Wilkinson 1993, Mann 1998, Peveler 1999, Akerblad 2003, Brook 2003, Katon 1995, Katon 1996, Finley 2000, Hunkeler 2000, Datto 2003, Dietrich 2004, Capoccia 2004; subtotal 0.15 (0.03, 0.26). Studies in which patients were identified by screening: Blanchard 1995, Araya 2003, Bosmans 2006, Callahan 1994, Katon 1999, Coleman 1999, Wells-medication 2000, Simon 2000, Katzelnick 2000, Wells-therapy 2000, Unutzer 2001, Katon 2001, Rost 2001b, Rost 2001a, Oslin 2003, Swindle 2003, Rickles 2004, Adler 2004, Bruce 2004, Simon 2004b, Katon 2004, Jarjoura 2004, Simon 2004a, Wang 2007; subtotal 0.30 (0.21, 0.38). Overall: 0.25 (0.18, 0.32).]
Figure 7.2. Enhanced care for depression: a random effects meta-analysis of 36 studies, comparing depression outcomes at 6 months in studies that use screening to recruit patients, versus those where clinicians recruit patients with recognized depression. (Re-analysis of data from Bower P, Gilbody SM, Richards D, et al. Collaborative care for depression: making sense of complex interventions through systematic review and meta-regression. Br J Psychiatry. 2006;189:484–493.)
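Figure 7.2 is described as a random effects meta-analysis, and its subtotals are reported as standardized mean differences with 95% confidence intervals. A minimal sketch of how such pooled estimates are commonly computed is shown below, using the DerSimonian–Laird estimator of between-study variance. The caption does not state which estimator was used, so this choice, the function names, and the example inputs are assumptions for illustration. A small helper back-calculates an approximate standard error from a reported 95% interval.

```python
import math
from typing import List, Tuple

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def se_from_ci(lower: float, upper: float) -> float:
    """Approximate standard error from a symmetric 95% confidence interval."""
    return (upper - lower) / (2.0 * Z_95)


def dersimonian_laird(effects: List[float], std_errors: List[float]) -> Tuple[float, float, float, float]:
    """Random-effects pooled estimate with the DerSimonian-Laird tau^2 estimator.

    Returns (pooled_effect, pooled_se, tau2, cochran_q).
    """
    k = len(effects)
    w = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)  # truncated at zero
    # Random-effects weights add tau^2 to each study's sampling variance
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    pooled_se = math.sqrt(1.0 / sum(w_re))
    return pooled, pooled_se, tau2, q


# Hypothetical, deliberately heterogeneous studies
pooled, pooled_se, tau2, q = dersimonian_laird([0.0, 0.5, 1.0], [0.1, 0.1, 0.1])
lower, upper = pooled - Z_95 * pooled_se, pooled + Z_95 * pooled_se
print(f"pooled SMD = {pooled:.2f} (95% CI {lower:.2f} to {upper:.2f}), tau^2 = {tau2:.2f}")
```

For example, `se_from_ci(0.21, 0.38)` gives roughly 0.043 for the screening subtotal; because published intervals are rounded, such back-calculated values are only approximate.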
Table 7.1. Study Details and Method of Patient Recruitment from Studies of Collaborative or Enhanced Care for Depression
Study name | Reference | Setting | Sample size | Patient population | Recruitment method
Adler 2004 | 62 | US | 533 | Adults with major depression or dysthymia (DSM-IV) | Screening of primary care attenders using the Primary Care Screener for Affective Disorders (PC-SAD)
Akerblad 2003 | 46 | Sweden | 1,031 | Adults with major depression and an indication for antidepressants | Physician referral, no screening
Araya 2003 | 63 | Chile | 240 | Women with major depression | Screening of primary care attenders using the GHQ-12 (score of 5 or more on two occasions)
Blanchard 1995 | 64 | UK | 96 | Elderly with depression warranting clinical intervention | Elderly nursing home residents screening positive with a diagnostic depression scale (DPDS)
Brook 2003 | 47 | Netherlands | 147 | Adults with depressive complaints, prescribed a new antidepressant | Physician referral, no screening
Bruce 2004 | 65 | US | 598 | Elderly with major depression, dysthymia, and minor depression | Elderly patients screening positive using the CES-D (score > 20) or responding positively to a question on previous history of depression
Callahan 1994 | 66 | US | 175 | Elderly with newly diagnosed depression | Elderly patients screening positive using the CES-D (score > 20)
Capoccia 2004 | 48 | US | 74 | Adults with depression, prescribed a new antidepressant | Physician referral of new episode of depression, no screening
Coleman 1999 | 67 | US | 169 | Depressed frail elderly | Frail older adults who screened positive on a predictive index of hospitalization; CES-D used as a screening instrument integrated into chronic care clinics
Datto 2003 | 49 | US | 61 | Adults with depressive symptoms | Physician referral of patients with depression, no screening
Dietrich 2004 | 68 | US | 405 | Adults with major depression and dysthymia (DSM-IV), starting/changing treatment | Physician referral of patients with depression; no screening as method of recruitment, but patients had to score SCL-20 > 0.5 at enrollment
Finley 1999 | 51 | US | 125 | Adults with current major depression, prescribed a new antidepressant | Physician referral of patients already prescribed antidepressants
Hunkeler 2000 | 52 | US | 302 | Adults with major depression or dysthymia, prescribed a new antidepressant | Physician referral of patients with a new diagnosis of depression and prescribed an antidepressant
Jarjoura 2004 | 69 | US | 121 | Adults with major depression not currently in treatment | Screening for inclusion using the PRIME-MD
Katon 1995 | 53 | US | 217 | Adults with depression, prescribed a new antidepressant | Physician referral of patients with definite or probable depression
Katon 1996 | 53 | US | 153 | Adults with depression, prescribed a new antidepressant | Physician referral of patients with definite or probable depression
Katon 1999 | 70 | US | 228 | Adults at high risk of persistent depression, recurrent depression, or dysthymia | Telephone screening using the SCID
Katon 2001 | 71 | US | 386 | Adults, prescribed a new antidepressant, at high risk of relapse | Telephone screening using the SCID
Katon 2004 | 72 | US | 329 | Adults with diabetes with depressive symptoms | Telephone screening using the PHQ-9 (score ≥ 10)
Katzelnick 2000 | 38 | US | 407 | Adults, high utilizers of services, with depressive symptoms | Two-stage telephone screening procedure with the SCID and Hamilton Depression Rating Scale
Mann 1998 | 54 | UK | 419 | Adults with depression | Primary care physician referral; patients currently with a diagnosis of, and in receipt of care for, depression
Oslin 2003 | 73 | US | 97 | Adults with depression or dysthymia, at-risk drinking | Primary care screening with the CES-D (score > 15)
Peveler 1999 | 45 | UK | 160 | Diagnosis of depression, prescribed a new antidepressant | Physician referral; patients with a new diagnosis of depression commencing antidepressant medication
Rickles 2005 | 74 | US | 63 | Prescribed a new antidepressant | Patients with a newly initiated prescription of antidepressant medication
Rost 2001 | 41 | US | 243 | Adults with major depression, prescribed a new antidepressant, recently treated | Two-stage screening procedure using the WHO-CIDI administered by practice nurses
Rost 2002b | 41 | US | 189 | Adults with major depression, prescribed a new antidepressant, beginning new episode | Two-stage screening procedure using the WHO-CIDI administered by practice nurses
Simon 2000 | 75 | US | 392 | Adults with depression, prescribed a new antidepressant | Patients identified from computerized records with a new diagnosis of depression and commencing antidepressant medication
Simon 2004a | 76 | US | 402 | Adults with depression, prescribed a new antidepressant | Patients identified from computerized records with a new diagnosis of depression and commencing antidepressant medication; no screening
Simon 2004b | 76 | US | 393 | Adults with depression, prescribed a new antidepressant | Patients identified from computerized records with a new diagnosis of depression and commencing antidepressant medication; no screening
Swindle 2003 | 77 | US | 268 | Adults with major depression, dysthymia, or partially remitted major depression | Primary care patients screening positive with the PRIME-MD
Unutzer 2001 | 78 | US | 1,801 | Elderly with major depression, dysthymia, or both | Patients screened face to face or by phone from primary care lists or attendance using the CIDI
Wells 2000a | 39 | US | 867 | Adults with major depression or dysthymia | Consecutive primary care attenders screened using the CIDI
Wells 2000b | 39 | US | 932 | Adults with major depression or dysthymia | Consecutive primary care attenders screened using the CIDI
Whooley 2000 | 40 | US | 331 | Elderly with depressive symptoms | Consecutive elderly primary care attenders screened using the GDS (score ≥ 6)
Wilkinson 1993 | 55 | UK | 61 | Adults with depression, prescribed a new antidepressant | Physician referral of patients with already diagnosed depression
Adapted from Gilbody SM, House AO, Sheldon TA. Screening and case-finding instruments for depression: a Cochrane systematic review and exploration of heterogeneity. CMAJ. 2008;178:1023–1024; and Gilbody S, Bower P, Fletcher J, et al. Collaborative care for depression: a cumulative meta-analysis and review of longer-term outcomes. Arch Intern Med. 2006;166:2314–2321.
The review by Bower and colleagues44 provides a richer and more complete dataset than the USPSTF review with which to examine the relative contribution of screening to the effectiveness of enhanced care. However, their approach has several limitations. The most important is that, despite being based on randomized studies, the exploratory comparison within a meta-regression is observational and is therefore susceptible to confounding (alternative explanations for observed effects and relationships).56 In this case, the use of screening could be confounded by other design-level variables (such as increased intensity of care). Bower and colleagues44 sought to address this limitation by conducting a multivariate analysis of these data to adjust for other potentially confounding covariates. Several of the positive associations found in univariate meta-regression (such as those highlighted above) ceased to be significant in the multivariate analysis. The only study-level variable that remained significant after adjustment for other potentially confounding variables was the mental health background of the case manager (p = 0.03). Screening, in contrast, became less significant (p = 0.19) when other variables were accounted for. The most likely conclusion is that the effect of screening is weak and potentially confounded by other study-level variables. Screening as a recruitment strategy is therefore unlikely to be an independently significant predictor of the effectiveness of enhanced care strategies. One might go further and suggest that good-quality collaborative care is likely to be effective whether or not screening is used.
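How adjustment can remove an apparent screening effect is easiest to see with a small numerical sketch. The data below are entirely hypothetical (they are not the Bower et al. dataset): each study's effect depends only on whether the case manager has a mental health background, but screening studies happen to be more likely to have such case managers. A univariable weighted regression then attributes part of that effect to screening, and the association disappears once both covariates are entered together. Equal study weights are assumed for simplicity, and the small linear solver is included only to keep the example self-contained.

```python
from typing import List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def weighted_meta_regression(effects, weights, covariates):
    """Weighted least squares via the normal equations; `covariates` is a list of columns.

    Returns [intercept, coefficient_1, coefficient_2, ...].
    """
    rows = [[1.0] + [col[i] for col in covariates] for i in range(len(effects))]
    p = len(rows[0])
    xtwx = [[sum(w * r[j] * r[k] for w, r in zip(weights, rows)) for k in range(p)]
            for j in range(p)]
    xtwy = [sum(w * r[j] * y for w, r, y in zip(weights, rows, effects)) for j in range(p)]
    return solve(xtwx, xtwy)


# Hypothetical study-level data (six studies, equal weights)
screening = [1, 1, 1, 0, 0, 0]
mh_case_manager = [1, 1, 0, 1, 0, 0]
effect = [0.1 + 0.2 * mh for mh in mh_case_manager]  # effect driven only by case manager
weights = [1.0] * 6

univariable = weighted_meta_regression(effect, weights, [screening])
adjusted = weighted_meta_regression(effect, weights, [screening, mh_case_manager])
print(f"screening coefficient, unadjusted: {univariable[1]:.3f}")
print(f"screening coefficient, adjusted:   {adjusted[1]:.3f}")
```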
5. To Screen or Not to Screen?
Despite the apparently differing conclusions and policy recommendations relating to screening for depression, an evidence-based consensus seems to be emerging that screening on its own is an ineffective strategy. This conclusion should not be surprising: the quality of care for depression is often poor,57,58 and adding screening is likely only to identify an unmet need without offering anything to improve the management and outcome of the condition. It has been discussed elsewhere that screening identifies a qualitatively different population of people with depression from those who are already identified and managed in primary care (what Goldberg calls “conspicuous psychiatric morbidity”59). The people identified by screening programs tend to have less severe psychopathology and a better outcome, and they are generally reluctant to take antidepressants and less likely to benefit from medical or psychosocial interventions (see Palmer and Coyne13 for review). Low expectations and poor outcomes of screening strategies have led to a more fundamental rethinking of how care for depression is organized and delivered.58 A direct result of the failure of the screening–detection–treatment–improvement paradigm12 has been the emergence of organizational enhancements of care, such as collaborative care.60,61
The conclusion that should be drawn from the re-analysis of existing studies of collaborative care in the present chapter is that this strategy is generally effective, but that the assumption that screening is a key element of effective enhancement might not be true. This is not merely an epidemiologic point about causal inference and confounding; it matters to practitioners and policymakers, for two main reasons. First, policymakers have readily picked up on the positive endorsement of screening from bodies such as the USPSTF and NICE without reading the small print. Quality enhancement strategies have sometimes begun and ended with screening, without the implementation of wider enhancements of care. Screening is a quick and easy policy to implement, measure, and reward. The experience in the United Kingdom is that screening and case-finding are financially rewarded without any explicit requirement that the process of care be improved any further.20 Second, for those who do choose to follow the evidence and implement collaborative care, many decisions must be made in the design of effective care systems. The use of screening as a point of entry to enhanced care raises a number of ethical and logistical issues.13 Screening usually identifies an unmet need and creates an increased demand for care. If this demand is not met, screening itself might do more harm than good. Services will therefore have to be planned to meet this need (and expectation of care) from within finite healthcare resources.
Ultimately, the most rigorous way to establish whether screening is a necessary or active component of enhanced care would be a randomized controlled trial of enhanced care with screening versus identical enhanced care without screening. To date (and to our knowledge) there are no such trials, and it is debatable whether any will ever be conducted. In the interim, it is clear that screening alone is not a sufficient intervention to improve the quality and outcomes of care for depression. What is less clear is whether screening is a necessary condition for enhanced, higher-quality care for this important condition.
References
1. Wells KB, Stewart A, Hays RD, et al. The functioning and well-being of depressed patients. Results from the Medical Outcomes Study. JAMA. 1989;262(7):914–919.
2. Murray CJ, Lopez AD. The global burden of disease: a comprehensive assessment of mortality and disability from disease, injuries and risk factors in 1990. Boston: Harvard School of Public Health on behalf of the World Bank, 1996.
3. Thomas C, Morris S. Cost of depression among adults in England in 2000. Br J Psychiatry. 2003;183:514–519.
4. Greenberg PE, Kessler RC, Birnbaum HG, et al. The economic burden of depression in the United States: how did it change between 1990 and 2000? J Clin Psychiatry. 2003;64:1465–1475.
5. Cepoiu M, McCusker J, Cole MG, et al. Recognition of depression by non-psychiatric physicians: a systematic literature review and meta-analysis. J Gen Intern Med. 2008;23:25–36.
6. Simon G, Von Korff M. Recognition and management of depression in primary care. Arch Fam Med. 1995;4:99–105.
7. Katon W, Ciechanowski P. Impact of major depression on chronic medical illness. J Psychosom Res. 2002;53:859–863.
8. Wright A. Should general practitioners be testing for depression? Br J Gen Pract. 1994;44:132–135.
9. Sharp LK, Lipsky MS. Screening for depression across the lifespan: a review of measures for use in primary care settings. Am Fam Physician. 2002;66:1001–1008.
10. U.S. Preventive Services Task Force. Guide to clinical preventive services, 2nd ed. Alexandria, VA: International Medical Publishing, 1996.
11. Pignone MP, Gaynes BN, Rushton JL, et al. Screening for depression in adults: a summary of the evidence for the U.S. Preventive Services Task Force. Ann Intern Med. 2002;136:765–776.
12. Klinkman MS, Coyne JC, Gallo S, et al. False positives, false negatives and the validity of the diagnosis of major depression in primary care. Arch Fam Med. 1998;7:451–461.
13. Palmer SC, Coyne JC. Screening for depression in medical care: pitfalls, alternatives, and revised priorities. J Psychosom Res. 2003;54(4):279–287.
14. New Freedom Commission on Mental Health. Achieving the promise: transforming mental health care in America. Final report. Rockville, MD: DHHS Pub. No. SMA03–3832, 2003.
15. Cochrane AL, Holland WW. Validation of screening procedures. Br Med Bull. 1971;27:3–8.
16. Mant D, Fowler G. Mass screening: theory and ethics. BMJ. 1990;300:916–918.
17. Stewart-Brown S, Farmer A. Screening could seriously damage your health. BMJ. 1997;314:533–534.
18. Wilson JM, Jungner G. Principles and practice of screening for disease. World Health Organization Public Health Paper 34. Geneva: World Health Organization, 1968.
19. Gilbody S, Whitty P, Grimshaw JG, et al. Improving the recognition and management of depression in primary care. Effective Health Care Bulletin, University of York. 2002;7(5).
20. Gilbody S, Sheldon T, Wessely S. Should we screen for depression? BMJ. 2006;332(7548):1027–1030.
21. National Screening Committee. The UK National Screening Committee’s criteria for appraising the viability, effectiveness and appropriateness of a screening programme (available at http://www.nsc.nhs.uk/pdfs/criteria.pdf). London: HMSO, 2003.
22. Oxman TE, Sengupta A. Treatment of minor depression. Am J Geriatr Psychiatry. 2002;10:256–264.
23. Gilbody SM, House AO, Sheldon TA. Screening and case finding for depression. The Cochrane Library (Issue 4). Chichester: Wiley Publishing, 2005.
24. Gilbody SM, House AO, Sheldon TA. Routinely administered questionnaires for depression and anxiety: a systematic review. BMJ. 2001;322:406–409.
25. Coyne JC, Palmer SC, Sullivan PA. Screening for depression in adults. Ann Intern Med. 2003;138(9):767–768.
26. AHCPR Depression Guideline Panel. Depression in primary care: detection, diagnosis, and treatment. Technical report. Number 5. Rockville, MD: US Department of Health and Human Services, Public Health Service, 2000.
27. MacMillan HL, Patterson CJS, Wathen CN, and The Canadian Task Force on Preventive Health Care. Screening for depression in primary care: recommendation statement from the Canadian Task Force on Preventive Health Care. CMAJ. 2005;172(1):33–35.
28. Agency for Health Care Policy Research. Depression in primary care. Washington DC: US Department of Health and Human Services, 1993.
29. Gilbody SM, House AO, Sheldon TA. Screening and case-finding instruments for depression: a Cochrane systematic review and exploration of heterogeneity. CMAJ. 2008;178:1023–1024.
30. Beck D, Gilbody SM. Screening and case finding for depression. The Cochrane Library (Issue 4). Chichester: Wiley Publishing, 2008.
31. National Institute for Clinical Excellence. Depression: core interventions in the management of depression in primary and secondary care. London: HMSO, 2004.
32. Hickie IB, Davenport TA, Ricci CS. Screening for depression in general practice and related medical settings. Med J Aust. 2002;177(7 Suppl):S111–S116.
33. Wells KB. The design of Partners in Care: evaluating the cost effectiveness of improving care for depression in primary care. Soc Psychiatry Psychiatr Epidemiol. 1999;34:20–29.
34. Gilbody S, Whitty P, Grimshaw J, et al. Educational and organizational interventions to improve the management of depression in primary care: a systematic review. JAMA. 2003;289:3145–3151.
35. Gilbody S, Bower P, Fletcher J, et al. Collaborative care for depression: a cumulative meta-analysis and review of longer-term outcomes. Arch Intern Med. 2006;166:2314–2321.
36. Bower P, Gilbody S. Managing common mental health disorders in primary care: conceptual models and evidence base. BMJ. 2005;330:839–842.
37. Wells K, Sherbourne C, Schoenbaum M, et al. Five-year impact of quality improvement for depression: results of a group-level randomized controlled trial. Arch Gen Psychiatry. 2004;61:378–386.
38. Katzelnick DJ, Simon GE, Pearson SD, et al. Randomized trial of a depression management program in high utilizers of medical care. Arch Fam Med. 2000;9:345–351.
39. Wells KB, Sherbourne C, Schoenbaum M, et al. Impact of disseminating quality improvement programs for depression in managed primary care: a randomized controlled trial. JAMA. 2000;283:212–220.
40. Whooley MA, Stone B, Soghikian K. Randomized trial of case-finding for depression in elderly primary care patients. J Gen Intern Med. 2000;15:293–300.
41. Rost K, Nutting PA, Smith J, et al. Improving depression outcomes in community primary care practice: a randomised trial of the QuEST intervention. J Gen Intern Med. 2001;16:143–149.
42. Thompson S. Why sources of heterogeneity in meta-analysis should be investigated. In: Chalmers I, Altman DG, eds. Systematic reviews. London: BMJ, 1995.
43. Thompson SG, Higgins JP. How should meta-regression analyses be undertaken and interpreted? Stat Med. 2002;21:1559–1573.
44. Bower P, Gilbody SM, Richards D, et al. Collaborative care for depression: making sense of complex interventions through systematic review and meta-regression. Br J Psychiatry. 2006;189:484–493.
45. Peveler R, George C, Kinmonth AL, et al. Effect of antidepressant drug counselling and information leaflets on adherence to drug treatment in primary care: randomised controlled trial. BMJ. 1999;319:612–615.
46. Akerblad AC, Bengtsson F, Ekselius L, et al. Effects of an educational compliance enhancement programme and therapeutic drug monitoring on treatment adherence in depressed patients managed by general practitioners. Int Clin Psychopharmacol. 2003;18:347–354.
47. Brook O, van Hout H, Nieuwenhuyse H, et al. Impact of coaching by community pharmacists on drug attitude of depressive primary care patients and acceptability to patients; a randomized controlled trial. Eur Neuropsychopharmacol. 2003;13:1–9.
48. Capoccia K, Boudreau D, Blough D, et al. Randomized trial of pharmacist interventions to improve depression care and outcomes in primary care. Am J Health Syst Pharm. 2004;61:364–372.
49. Datto CJ, Thompson R, Horowitz D, et al. The pilot study of a telephone disease management program for depression. Gen Hosp Psychiatry. 2003;25:169–177.
50. Dietrich AJ, Oxman TE, Williams JW Jr, et al. Going to scale: re-engineering systems for primary care treatment of depression. Ann Fam Med. 2004;2(4):301–304.
51. Finley P, Rens H, Gess S, et al. Case management of depression by clinical pharmacists in a primary care setting. Formulary. 1999;34:864–870.
52. Hunkeler EM, Meresman JF, Hargreaves WA, et al. Efficacy of nurse telehealth care and peer support in augmenting treatment of depression in primary care. Arch Fam Med. 2000;9:700–708.
53. Katon W, Robinson P, Von Korff M, et al. A multifaceted intervention to improve treatment of depression in primary care. Arch Gen Psychiatry. 1996;53(10):924–932.
54. Mann A, Blizard R, Murray J. An evaluation of practice nurses working with general practitioners to treat people with depression. Br J Gen Pract. 1998;48:875–879.
55. Wilkinson G, Allen P, Marshall E. The role of the practice nurse in the management of depression in general practice: treatment adherence to antidepressant medication. Psychol Med. 1993;23:229–237.
56. Higgins JPT, Thompson SG. Controlling the risk of spurious findings from meta-regression. Stat Med. 2004;23:1663–1682.
57. Katon W, von Korff M, Lin E, et al. Adequacy and duration of antidepressant treatment in primary care. Med Care. 1992;30:67–76.
58. Katon W, Von Korff M, Lin E, et al. Population-based care of depression: effective disease management strategies to decrease prevalence. Gen Hosp Psychiatry. 1997;19:169–178.
59. Goldberg D. The detection of psychiatric illness by questionnaire. Oxford: Oxford University Press, 1972.
60. Simon G. Collaborative care for depression. BMJ. 2006;332:249–250.
61. Unutzer J, Schoenbaum M, Druss BG, et al. Transforming mental health care at the interface with general medicine: report for the President’s Commission. Psychiatr Serv. 2006;57:37–47.
62. Adler DA, Bungay KM, Wilson IB, et al. The impact of a pharmacist intervention on 6-month outcomes in depressed primary care patients. Gen Hosp Psychiatry. 2004;26(3):199–209.
63. Araya R, Rojas G, Fritsch R, et al. Treating depression in primary care in low-income women in Santiago, Chile: a randomised controlled trial. Lancet. 2003;361:995–1000.
64. Blanchard MR, Waterreus A, Mann AH. The effect of primary care nurse intervention upon older people screened as depressed. Int J Geriatr Psychiatry. 1995;10:289–298.
65. Bruce M, Ten Have T, Reynolds C, et al. Reducing suicidal ideation and depressive symptoms in depressed older primary care patients. JAMA. 2004;291(9):1081–1091.
66. Callahan C, Hendrie H, Dittus R, et al. Improving treatment of late life depression in primary care: a randomized clinical trial. J Am Geriatr Soc. 1994;42:839–846.
67. Coleman EA, Grothaus LC, Sandhu N, et al. Chronic care clinics: a randomized controlled trial of a new model of primary care for frail older adults. J Am Geriatr Soc. 1999;47:775–783.
68. Dietrich AJ, Oxman TE, Williams JW, et al. Re-engineering systems for the treatment of depression in primary care: cluster randomised controlled trial. BMJ. 2004;329:602–609.
69. Jarjoura D, Polen A, Baum E, et al. Effectiveness of screening and treatment for depression in ambulatory indigent patients. J Gen Intern Med. 2004;19(1):78–84.
70. Katon W, Von Korff M, Lin E, et al. Stepped collaborative care for primary care patients with persistent symptoms of depression: a randomized trial. Arch Gen Psychiatry. 1999;56:1109–1115.
71. Katon W, Rutter C, Ludman EJ, et al. A randomized trial of relapse prevention of depression in primary care. Arch Gen Psychiatry. 2001;58:241–247.
72. Katon WJ, Von Korff M, Lin EHB, et al. The Pathways Study: a randomized trial of collaborative care in patients with diabetes and depression. Arch Gen Psychiatry. 2004;61:1042–1049.
73. Oslin D, Sayers S, Ross J, et al. Disease management for depression and at risk drinking via telephone in an older population of veterans. Psychosom Med. 2003;65:931–937.
74. Rickles N, Svarstad BL, Statz-Paynter JL, et al. Pharmacist telemonitoring of antidepressant use: effects on pharmacist–patient collaboration. J Am Pharm Assoc. 2005;45:344–353.
75. Simon G, Von Korff M, Rutter C, et al. Randomised trial of monitoring, feedback and management of care by telephone to improve treatment of depression in primary care. BMJ. 2000;320:550–554.
76. Simon GE, Ludman EJ, Tutty S, et al. Telephone psychotherapy and telephone care management for primary care patients starting antidepressant treatment: a randomized controlled trial. JAMA. 2004;292(8):935–942.
77. Swindle R, Rao J, Helmy A, et al. Integrating clinical nurse specialists into the treatment of primary care patients with depression. Int J Psychiatry Med. 2003;33(1):17–37.
78. Unutzer J, Katon W, Williams J, et al. Improving primary care for depression in late life: the design of a multicenter randomized trial. Med Care. 2001;39(8):785–799.
8 TECHNOLOGICAL APPROACHES TO SCREENING AND CASE FINDING FOR DEPRESSION
William H. Rogers, Debra Lerner, and David A. Adler
1. Technological Methods of Screening for Depression
2. Ten Issues When Developing Computerized Screening for Depression
3. Examples of Implementation of Computerized Screening for Depression
4. Discussion
5. Conclusion
Context
What are the strengths and weaknesses of computer-based and other automated methods of detecting depression? Two promising technologies make use of the Internet and speech recognition. Whatever technology is used, each method needs to be assessed rigorously, using the same high standards that have been applied to pencil-and-paper tests.
We are in the midst of a technological revolution that will inevitably transform psychiatric clinical practice. A consensus for routine depression screening is building,1,2 and at the same time methods by which it could be accomplished are emerging. The hope is that the right technology can provide an easy, inexpensive, valid, and reliable public health approach to depression screening. Computerized assessment is well accepted in diverse fields, and the use of Internet-based survey technology has grown exponentially.3–7 The strengths and limitations of computerized assessments are addressed regularly in the literature.3–11 For example, such assessments have been shown to improve data quality while reducing both cost and the time needed to score, analyze, and report results. Increasingly, as depressive disorders have been recognized as highly prevalent and associated with significant morbidity, multiple screeners using an array of technological advances have been developed2,12–33 (Table 8.1 lists selected studies).34–49 This chapter reviews the technologies that are currently available for automated depression screening and discusses them in terms of the criteria that should govern their adoption.
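Table 8.1 summarizes each computerized method largely in terms of sensitivity (Se), specificity (Sp), positive and negative predictive values (PPV, NPV), and agreement (kappa) against a reference interview. As a reminder of how these figures relate to one another, here is a minimal sketch that computes them from a 2 × 2 table of screener result against reference diagnosis. The counts are hypothetical, the class and function names are illustrative, and the kappa shown is Cohen's kappa, treating the screener and the reference standard as two raters.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    tp: int  # screener positive, reference diagnosis present
    fp: int  # screener positive, reference diagnosis absent
    fn: int  # screener negative, reference diagnosis present
    tn: int  # screener negative, reference diagnosis absent


def screening_accuracy(t: TwoByTwo) -> dict:
    """Se, Sp, PPV, NPV, and Cohen's kappa for a screener vs. a reference standard."""
    n = t.tp + t.fp + t.fn + t.tn
    observed = (t.tp + t.tn) / n
    # Chance agreement expected from the screener and reference marginal totals
    expected = ((t.tp + t.fp) * (t.tp + t.fn) + (t.fn + t.tn) * (t.fp + t.tn)) / n ** 2
    return {
        "sensitivity": t.tp / (t.tp + t.fn),
        "specificity": t.tn / (t.tn + t.fp),
        "ppv": t.tp / (t.tp + t.fp),
        "npv": t.tn / (t.tn + t.fn),
        "kappa": (observed - expected) / (1 - expected),
    }


# Hypothetical validation sample of 200 patients
result = screening_accuracy(TwoByTwo(tp=40, fp=20, fn=10, tn=130))
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

Unlike sensitivity and specificity, PPV and NPV depend on the prevalence of depression in the sample, which is one reason the same instrument can report quite different predictive values in different settings.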
1. Technological Methods of Screening for Depression
The growing list of technologies can be classified along several dimensions. Perhaps the most important of these is adaptive vs. non-adaptive. In adaptive testing, an approach pioneered by the Educational Testing Service,50 a computer uses a preprogrammed algorithm to decide which question to ask next, given the responses so far.3,9,48,49,51–55 Paper-and-pencil is the classical non-adaptive technology: everyone gets the same paper with the same questions in the same order.
Technological modality is a second dimension. Currently available technologies include the phone, the Internet, and hand-held electronic devices.5 Phone-based methods can be split into several groups: agent-administered computer-assisted telephone interviewing (CATI), speech recognition, and touch-tone entry. Phone screening can also be classified as inbound (the patient initiates the call to a toll-free number) or outbound (the system initiates the call). Hand-held devices could include tablets such as personal digital assistants, game consoles, modern cell phones, or “electronic paper.” Internet-based screeners (eg, Patient Health Questionnaire-9 [PHQ-9], Zung Self-Rating Depression Scale)13,20 can be implemented through standard web browsers, at public kiosks, or through connected hand-held devices. In this chapter, all of these methods are classified together under the term “Internet,” because they follow a common approach of visually presenting the screener or monitoring instrument and taking responses through interaction with that visual image. A computer is always involved in presenting the data and recording the responses. One can even envision the day when more futuristic technologies such as eye-tracking equipment, brain scans, blood tests, or electrical system monitors for depression will be available.
Two basic premises underlie our discussion:
1. There is no fail-proof methodology. No single technology guarantees success, but some technologies have inherent failures.
2. Implementation and circumstances matter. A technology that performs well in one setting (eg, Internet screening at home) may be unacceptable in another (eg, automated screening on a desktop computer in a physician’s waiting room).
In the current marketplace, there are no full-service automated systems that are embedded in an electronic medical record.
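The adaptive approach described at the start of this section can be illustrated with the simplest kind of branching rule: a short gate of items is asked first, and the remaining items are presented only if the gate score is high enough. The sketch below is loosely modeled on two-stage use of a brief and a longer questionnaire; the item identifiers, the 0–3 item scoring, and the gate threshold of 3 are assumptions chosen for illustration, not a validated instrument. Full computerized adaptive testing typically goes further and chooses each next item from a statistical model of the responses so far.

```python
from typing import Callable, List, Tuple

# Hypothetical item identifiers; a real screener would use validated item wording.
GATE_ITEMS = ["gate_1", "gate_2"]
FOLLOW_UP_ITEMS = [f"item_{i}" for i in range(3, 10)]  # item_3 .. item_9
GATE_THRESHOLD = 3  # assumed cut-off on the summed gate items (each scored 0-3)


def run_gated_screen(ask: Callable[[str], int]) -> Tuple[List[str], int, bool]:
    """Ask the gate items, then the follow-up items only if the gate is positive.

    `ask` returns a 0-3 score for an item id (e.g., from a web form or phone keypad).
    Returns (items asked, total score, whether follow-up items were presented).
    """
    asked: List[str] = []
    total = 0
    for item in GATE_ITEMS:
        total += ask(item)
        asked.append(item)
    if total < GATE_THRESHOLD:
        return asked, total, False  # negative gate: stop early
    for item in FOLLOW_UP_ITEMS:
        total += ask(item)
        asked.append(item)
    return asked, total, True


# Simulated respondents: answers are looked up from a dict (missing items score 0)
low = {"gate_1": 1, "gate_2": 0}
high = {"gate_1": 2, "gate_2": 2, "item_3": 1, "item_4": 1}
for answers in (low, high):
    asked, total, followed_up = run_gated_screen(lambda item: answers.get(item, 0))
    print(len(asked), total, followed_up)
```

The design choice here is that respondents with a negative gate answer only two questions, which is the main practical attraction of adaptive administration: shorter sessions for most people, while those who screen positive still receive the full item set.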
Table 8.1. Technological Methods of Depression Screening: Summary of Studies

Technological Method | Author/Publication | Sample/Setting | Accuracy of Computerized Method | Comment

Mental Health-Based Studies
Computer voice recognition: VIDAS | Gonzalez (2007), Hisp J Behav Sci | English- and Spanish-speaking patients, n = 217, visual | CES-D 20, alpha = 0.87–0.91 computer/written; CES-D vs. BDI-2: r = 0.74–0.86; ROC CES-D (cut point of 16) vs. CIDI-SF: Se: 0.88–1.0; Sp: 0.42–0.20; PPV: 0.61–0.28; NPV: 0.77–1.0 | Computerized CES-D speech recognition vs. written acceptable in both English and Spanish speakers; visual somewhat better than aural
Computer vs. IVR telephone | Kobak (1997), Psychiatr Serv | CMHC, n = 51 | PRIME-MD IVR/Desktop and SCID-IV for MDD: Kappa 0.49/0.27; Se: 0.77/0.77; Sp: 0.75/0.50; PPV: 0.87/0.77; NPV: 0.77/0.69; similar prevalence rates | IVR vs. Desktop of PRIME-MD and compared to phone SCID-IV, Ham D-17 and chart Dx, both acceptable
Computer voice recognition | Munoz (1999), J Consult Clin Psychol | Women’s health clinic, n = 104 English- and Spanish-speaking women | CES-D 20 and MDD (18-item DIS Mood questions) Screener: K = 0.82/0.89 for current/lifetime MDD, computer vs. interview; K = 0.81/0.75 computer vs. interview of MDD vs. PRIME-MD; Se: 0.89/0.91 current/lifetime MDD; Sp: 0.93/0.91 current/lifetime MDD | Voice recognition of CES-D and MDD screener to clinician interview of both plus PRIME-MD yielded comparable results

Population-Based Studies
IVR using telephone keypad | Baer (1995), JAMA | Midwest Univ. and NE high-tech firm; n = 1,812; 1,597/1,812 Zung completers | Zung (SDS)-20 found acceptable by subjects | No direct comparison with other forms of screening
Computer touchscreen | Lin (2007), BMC Psychiatry | Taiwanese volunteers, n = 579 | ISP-D for MDD vs. MINI (N = 55): Kappa 0.80; Se: 0.82; Sp: 0.73; PPV: 0.67; NPV: 0.86 | Internet-based Self-assessment Program for Depression (ISP-D) is a reliable and valid online tool for assessing depression, with excellent retest reliability
Computer | Patton (1999), Soc Psychiatry Psychiatr Epidemiol | Australian HS students; n = 2,032; 65 of 1,729 completers with MDD | Computerized CIS-R compared with live CIDI 2–9 weeks later; CIS-R/CIDI Se: 0.97; Sp: 0.18; PPV: 0.49; NPV: 0.91 | Students favorable to computer

Medical-Based Studies
Computer touchscreen | Allenby (2002), Eur J Cancer Care | Australian amb. oncology center; n = 450, median age 61 | BDI-2, Cancer Needs Questionnaire; EORTC QLQ-C30 | No direct comparison with other forms of screening. Acceptable to patients
Computer touchscreen | Bliven (2001), Quality of Life Research | Cardiac OPD, n = 55 | SF-36, 8 subscales/Seattle Angina Quest.; SF-36 computer/written r = 0.54–0.76; SF-MH mean scale computer/written: 66.19/65.77; r = 0.54 | Compared computer to written; 82% preferred computer
Computer touchscreen | Cull (2001), Br J Cancer | Outpatient chemotherapy patients, n = 172 | MHI-5 >10, Hospital Anxiety and Depression Scale (HADS) >8, computer vs. Present State Exam (PSE) diagnosis of MDD: Se: 0.85; Sp: 0.71; PPV: 0.47; NPV: 0.26 | Two screeners (HADS and MHI-5) 2–4 weeks apart compared to an in-person interview using the PSE within a week
Computer touchscreen | Kurt (2004), Computer Methods in Biomedicine | Pts. >65, PCP office; n = 240; 68/240 participated | CES-D 20 (or 35) and GDS (Geriatric Depression Scale) computer/written: BL reliability: 0.74/0.72; F/Up reliability: 0.61/0.83 | Patients favorable to computer
Computer touchscreen | Sharpe (2004), Br J Cancer | Cancer center, n = 5,613; 891/3,938 HADS completers, score >14; 196/570 interviewed had MDD | Comparison of Hospital Anxiety and Depression Scale (HADS) with DSM-IV SCID clinician telephone interview | No direct comparison with other forms of screening
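The accuracy figures in Table 8.1 (Se, Sp, PPV, NPV, and kappa) are all computed from a 2-by-2 cross-tabulation of screener result against a reference diagnosis such as the SCID or CIDI. The minimal sketch below restates the standard formulas; the class name and the counts in the example are hypothetical and are not taken from any study in the table.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Screener result vs. reference-standard diagnosis (hypothetical counts)."""
    tp: int  # screen positive, reference positive
    fp: int  # screen positive, reference negative
    fn: int  # screen negative, reference positive
    tn: int  # screen negative, reference negative

    @property
    def n(self):
        return self.tp + self.fp + self.fn + self.tn

    def sensitivity(self):
        """Proportion of true cases the screener flags."""
        return self.tp / (self.tp + self.fn)

    def specificity(self):
        """Proportion of non-cases the screener clears."""
        return self.tn / (self.tn + self.fp)

    def ppv(self):
        """Proportion of positive screens that are true cases."""
        return self.tp / (self.tp + self.fp)

    def npv(self):
        """Proportion of negative screens that are truly non-cases."""
        return self.tn / (self.tn + self.fn)

    def kappa(self):
        """Cohen's kappa: agreement beyond what chance alone would produce."""
        observed = (self.tp + self.tn) / self.n
        screen_pos = (self.tp + self.fp) / self.n
        ref_pos = (self.tp + self.fn) / self.n
        expected = screen_pos * ref_pos + (1 - screen_pos) * (1 - ref_pos)
        return (observed - expected) / (1 - expected)
```

For example, `TwoByTwo(tp=40, fp=10, fn=10, tn=40)` has sensitivity, specificity, PPV, and NPV all equal to 0.80, but a kappa of only 0.60, because with these marginal totals 50% agreement would be expected by chance alone. This is why kappa values in the table run lower than the other indices.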
8 TECHNOLOGICAL APPROACHES TO SCREENING AND CASE FINDING
2. Ten Issues When Developing Computerized Screening for Depression
With this in mind, we now consider the issues that arise regarding the use of automated screeners in general and depression-monitoring instruments specifically.
Quality Control and Accuracy
The first question raised in any discussion of automation is its accuracy. Technology-based methods are applied more consistently, which yields more comparable and interpretable data.3,6,17,20,47,56–66 No interviewer bias is introduced. Clinician interviews and agent-administered phone CATI depend on a human being, and a clinician or an agent speaks and listens differently every time. Paper-and-pencil screeners, as well as automated electronic surveys, eliminate this source of variation. If this advantage is pursued, agreement with known standards can be improved beyond what is possible with a clinician or agent. Although the technology already exists, accuracy still rests on the craftsmanship of the instrument (eg, inaccurate or poorly designed programming will result in poor-quality data).
Error Control
Evidence to date is that different data collection methods do not change the probability that the answer is recorded as intended.7 In paper-and-pencil screeners, respondents can make stray marks that scanners cannot easily interpret; these can be reduced to acceptable levels by providing clear instructions, with examples, on how to mark answers. In speech recognition systems, respondents can speak responses outside the answer set, but this can be reduced by wording questions so that they prompt an in-range response and by challenging responses that do not seem to be within range.36 For both of these systems, human post-response review of questionable responses is desirable: scanners can detect stray marks, and voice recognizers can identify problematic voice input. With these measures, very low error rates (eg, over 99.5% correct) are possible. Without them, error rates are still low but errors do occur (eg, over 98% correct). Numerous data companies report error control checks within these ranges and better. Nominal error rates for touch-tone and for the Internet and related technologies such as kiosks and hand-held devices are low because these systems enforce a single answer. However, this does not mean that such devices are free of error. Error rates on the Internet are low if the respondent can see all the responses and no default choices are premarked. Several studies have found that Internet surveys and mail are equivalent.67 If the respondent has to click a mouse to see all the responses, the results will be biased. For touch-tone interactive voice response (IVR), elderly respondents and those whose touch-tone buttons are in the receiver are likely to have high error rates, but no further identification of errors is possible without a very laborious review of each response, a practice suitable for banking but not for screening questionnaires. Touch-tone also invites cognitive errors because verbal responses must be converted to numerical form before they can be entered. Most studies have concluded that touch-tone is not equivalent to mail.67
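The two error-control measures described above, accepting only in-range answers and routing doubtful input to human review, can be sketched in a few lines. The answer coding below is modeled on the PHQ-9’s four response options, and the recognizer-confidence threshold is a hypothetical value, not one taken from any cited system. The second function shows why per-item accuracy matters: assuming errors are independent across items, a 9-item screener recorded at 99.5% per-item accuracy is completely error-free only about 95.6% of the time (0.995^9), falling to about 83.4% at 98% (0.98^9).

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical coding of a four-point item (PHQ-9-style response options).
VALID_ANSWERS = {"not at all": 0, "several days": 1,
                 "more than half the days": 2, "nearly every day": 3}
REVIEW_THRESHOLD = 0.80  # hypothetical recognizer-confidence cutoff


@dataclass
class ItemResult:
    value: Optional[int]  # coded answer, or None if nothing usable was captured
    needs_review: bool    # flag for human post-response review
    reprompt: bool        # ask again, restating the allowed answers


def code_spoken_answer(text: str, confidence: float) -> ItemResult:
    """Map a recognized utterance to the answer set, challenging out-of-range input."""
    value = VALID_ANSWERS.get(text.strip().lower())
    if value is None:
        # Outside the answer set: re-ask rather than guess.
        return ItemResult(value=None, needs_review=False, reprompt=True)
    # In range but possibly misheard: keep it, but queue it for a human check.
    return ItemResult(value=value, needs_review=confidence < REVIEW_THRESHOLD,
                      reprompt=False)


def prob_error_free(per_item_accuracy: float, n_items: int) -> float:
    """Chance a whole screener is recorded without error, assuming independent items."""
    return per_item_accuracy ** n_items
```

A touch-tone or web form gets the first check for free by offering only valid choices; the review flag is the part that, as noted above, has no practical equivalent for touch-tone.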
Honesty
Research has shown repeatedly that respondents, including those with depression, are more honest with computers or mail than with live interviewers, which translates into better accuracy.59,60,64,68–70
Physical Clues
Conversely, human interpreters, and especially clinician interviewers, are best at dealing with clues such as crying, gaps in speech, or slurred, sped-up, or slowed speech that might have important implications in the screening process.4 Voice recognition systems could in principle be trained to detect these cues, but to our knowledge this has not yet been done, and such a system would never be as good as a trained clinician meeting with depressed individuals.
Performance
Case-specific performance data are key to successful use of an automated system, given the potential time savings.7,20 Physicians can use the results most efficiently if patient-specific reports of positive predictive value (PPV) and negative predictive value (NPV) are included. In one of the few studies addressing depression, Kobak and colleagues,20 using a computerized PRIME-MD, reported a PPV of 0.87 and a sensitivity for touch-tone and IVR of 0.84 to 0.88. The cost of untreated depression is high, particularly among employed patients,71–74 so automated screening will normally be cost-effective compared with the haphazard approach characteristic of population screening. If the screener cannot find cases (poor sensitivity or low NPV), then other case-finding tools may need to be used anyway.
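Patient-specific PPV and NPV cannot be read off a screener’s sensitivity and specificity alone; they also depend on how common depression is among the people being screened. The minimal sketch below applies Bayes’ theorem; the sensitivity, specificity, and prevalence values are illustrative only and are not taken from any cited study.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) for a screener used where the given prevalence holds."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# The same hypothetical screener (Se 0.85, Sp 0.85) in two settings.
for setting, prev in [("low-prevalence setting", 0.10), ("high-prevalence setting", 0.40)]:
    ppv, npv = predictive_values(0.85, 0.85, prev)
    print(f"{setting}: prevalence {prev:.0%}, PPV {ppv:.2f}, NPV {npv:.2f}")
```

With identical sensitivity and specificity, PPV rises from about 0.39 at 10% prevalence to about 0.79 at 40%, while NPV falls from about 0.98 to about 0.89. A PPV reported in one clinic therefore may not carry over to another.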
Workload Considerations
A highly effective automated system that is used to screen all individuals routinely has the potential to generate many possible or probable cases very quickly. For example, as found in studies by Sharpe30 and Cull40 and their colleagues, if every attendee at a regional cancer center is assessed, about 20% might be flagged as high scorers on a depression scale. Even with a second filter such as a request for help, a large number of people may need to be seen. The potential benefit of a high yield of true cases might come at the expense of a large number (in absolute terms) of false positives, each of whom has raised expectations on the basis of the first-stage alert and needs follow-up. Alternatively, fear of workload may defeat the screening process itself. When the PPV falls much below 70%, physicians may choose to ignore screening results on the grounds that following up 30% or more who are false positives is too much work.7,75 Although PPV and sensitivity are affected by response errors, they are more influenced by the screening instrument itself. The balance between them is implementation-specific. In general, demanding criteria for diagnosing depression will result in good PPV but poor sensitivity.27
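The workload trade-off can be made concrete by estimating, for a given number of attendees, how many would be flagged and how many of those flags would be false positives, before and after a second filter such as a request for help. The sketch below uses hypothetical prevalence, accuracy, and filter-retention values, chosen so that roughly 20% of attendees are flagged; the 70% PPV level is the figure quoted above, used here as an illustrative acceptance rule.

```python
PPV_FOLLOW_UP_THRESHOLD = 0.70  # level below which results may be ignored (from the text)


def screening_workload(n_screened, prevalence, sensitivity, specificity):
    """Expected true and false positives flagged by a single-stage screener."""
    cases = n_screened * prevalence
    non_cases = n_screened - cases
    return {"true_pos": cases * sensitivity,
            "false_pos": non_cases * (1 - specificity)}


def apply_second_filter(counts, keep_cases, keep_non_cases):
    """Hypothetical second stage (e.g. 'do you want help?') retaining given fractions."""
    return {"true_pos": counts["true_pos"] * keep_cases,
            "false_pos": counts["false_pos"] * keep_non_cases}


def summarize(counts):
    """Return (number flagged, PPV, whether PPV reaches the threshold)."""
    flagged = counts["true_pos"] + counts["false_pos"]
    ppv = counts["true_pos"] / flagged
    return flagged, ppv, ppv >= PPV_FOLLOW_UP_THRESHOLD


stage1 = screening_workload(1000, prevalence=0.10, sensitivity=0.85, specificity=0.87)
stage2 = apply_second_filter(stage1, keep_cases=0.60, keep_non_cases=0.30)
for label, counts in [("screener only", stage1), ("with second filter", stage2)]:
    flagged, ppv, acceptable = summarize(counts)
    print(f"{label}: {flagged:.0f} flagged, {counts['false_pos']:.0f} false positives, "
          f"PPV {ppv:.2f}, meets 70% threshold: {acceptable}")
```

With these illustrative inputs, about 202 of 1,000 attendees are flagged, 117 of them false positives (PPV about 0.42). The second filter cuts the caseload to about 86 but still leaves the PPV near 0.59, below the level at which physicians may start ignoring results.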
Acceptability
A system is useful only if subjects are willing to use it; acceptability is a prerequisite for implementing any automated screening system. Most of the evidence to date suggests that patients accept automated screening in principle, compared with visits to mental health specialists.3,6,20,24,30,40 A number of national studies have had excellent response rates with no notable item nonresponse on depression screening questions.38,47,76–78 With respect to the technologies, the survey response literature has some lessons to teach. Touch-tone IVR is more technically demanding for respondents than speech recognition, and touch-tone response rates are lower. The Internet (and associated device-related technologies) is generally regarded as usable, but not every home has a computer, and in many businesses personal computer use is restricted or frowned upon.4 In addition, many people have privacy worries about the Internet, and in some businesses these are justified.79 Some degree of computer skill and literacy is necessary.38 The impact of age cohort, gender, and cultural issues requires further study. All of this suggests that alternatives to the Internet will remain useful. Combination approaches involving the Internet, phone, and either outbound calling or mail achieve the best coverage.67
Prices
As a general rule, prices are highly implementation-dependent, and a bid is necessary to know what the price will be. However, some general principles apply. Paper-and-pencil surveys depend on a combination of mailing costs and processing costs.80,81 Very efficient high-end scanners are available, but they must still be fed. Even a “free” screener that is entered by fax machine in a doctor’s office costs more than $3 when the cost of handing out the survey, collecting the response, and feeding it into a fax machine is counted. If mail is involved, back-end duties can be handled by clerks, but this cost reduction is more than offset by the price of mailing.6,67,76 The traditional methods of screening, such as paper surveys and scanning, are suited only to large-scale data-collection systems with central mail processing facilities and are difficult to manage in smaller settings. For Internet screeners and voice recognition or touch-tone, the marginal cost of the screener ranges from nothing to a dollar, but there are fixed costs associated with developing and fielding the system.82–84 Such costs are typically between $10,000 and $25,000.12
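The price comparison above is essentially a fixed-versus-marginal-cost calculation. Using the figures quoted in this section (at least $3 per paper screener; $0 to $1 marginal cost and $10,000 to $25,000 fixed cost for an electronic system), the break-even volume is the fixed cost divided by the saving per screener. The sketch ignores maintenance, mailing, and discounting, which a real bid would include.

```python
import math
from typing import Optional


def break_even_volume(fixed_cost: float, electronic_marginal: float,
                      paper_unit_cost: float) -> Optional[int]:
    """Smallest screener volume at which the electronic system is no more expensive."""
    saving_per_screener = paper_unit_cost - electronic_marginal
    if saving_per_screener <= 0:
        return None  # no per-screener saving, so the fixed cost is never recovered
    return math.ceil(fixed_cost / saving_per_screener)


# The two ends of the ranges quoted in the text, against a $3 paper screener.
for fixed, marginal in [(10_000, 0.0), (25_000, 1.0)]:
    n = break_even_volume(fixed, marginal, paper_unit_cost=3.0)
    print(f"fixed ${fixed:,}, marginal ${marginal:.2f}: break-even at {n:,} screeners")
```

At the low end, about 3,334 screeners recover a $10,000 investment; at the high end, 12,500 are needed. This is one way to see why small practices may find the fixed cost harder to justify than large systems do.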
Availability
All of the methods except scanned paper-and-pencil surveys can be processed immediately, with real-time feedback to respondents about what to do. Patients often have time to consider the possibilities at times of day when physicians are not available (eg, the middle of the night). Results are immediately available without transcription error.47
Embedding in a System
To be useful, a screening system needs to be embedded in a healthcare system that can deal with the information.3,7,20,85,86 Unless the results are available and retrievable, they are useless. This very important issue is mostly beyond the scope of this chapter, but technology has some impact. A mailed and scanned questionnaire cannot be acted on in a timely way, whereas all of the electronic methods can be followed up with questions about context (Did someone important to you die recently? Are you thinking of taking your life soon?). In principle, the results can be transmitted to an electronic medical record (EMR) or to physician e-mail, if the setting allows. Contextual data such as medications could also be drawn from an EMR. In the current environment, embedding screeners is still a custom operation: EMR systems are not currently sold with a depression screener or monitoring website included.
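The follow-up logic described above (contextual questions after screening, then delivery of the result to the EMR or the physician) can be sketched as a small routing function. Everything here is hypothetical: the cutoff of 10 is the threshold commonly used for the PHQ-9 total score, the field names are invented, and the destination labels stand in for whatever interface a given EMR or e-mail system actually provides.

```python
from dataclasses import dataclass, field
from typing import List

POSITIVE_CUTOFF = 10  # assumption: PHQ-9-style total score threshold


@dataclass
class ScreenResult:
    patient_id: str
    total_score: int
    recent_bereavement: bool = False  # "Did someone important to you die recently?"
    suicidal_intent: bool = False     # "Are you thinking of taking your life soon?"


@dataclass
class Routing:
    destinations: List[str] = field(default_factory=list)
    urgent: bool = False


def route_result(result: ScreenResult) -> Routing:
    """Decide where a completed screen goes; destinations are placeholder labels."""
    routing = Routing(destinations=["emr_record"])  # every result is stored
    if result.suicidal_intent:
        # A safety concern overrides the score: alert a clinician immediately.
        routing.urgent = True
        routing.destinations.append("clinician_alert")
    if result.total_score >= POSITIVE_CUTOFF:
        routing.destinations.append("physician_inbox")
        if result.recent_bereavement:
            # Context the physician needs before interpreting a positive screen.
            routing.destinations.append("bereavement_note")
    return routing
```

For example, `route_result(ScreenResult("p2", 14, recent_bereavement=True))` stores the result, sends it to the physician, and attaches the bereavement context, while a low score with a yes to the suicide question is still routed as urgent.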
3. Examples of Implementation of Computerized Screening for Depression
Whether a system is actually acceptable in practice depends on both the technology and the context. All of the technologies have been shown to be acceptable in some context (see Table 8.1 for selected studies discussed below). For example, in our prior work, most patients in primary care offices were willing to fill out a two-page depression screener that was immediately scanned.26 We are now using web-based touchscreen methodology to screen employed individuals for depression in workplace settings (Fig. 8.1). The study by Baer and associates,13 using IVR with telephone keypad response, was one of the first to demonstrate the use and acceptability of fully automated technology for confidential mass depression screening. Two recent studies, by Gonzalez and associates36 using computer voice recognition and by Lin and coworkers35 using computer touchscreen, found good psychometric properties for well-accepted depression screeners compared with standardized diagnostic in-person interviews. Kobak and associates,19,20,47,61 in a series of studies, demonstrated the acceptability and equivalence of all three forms of depression screening (clinician interview by telephone, phone IVR, and computer touchscreen). Kurt and colleagues22 found similar results for a computer-assisted assessment of depression in geriatric primary care patients. Even in a minority population, Munoz and associates24 met no resistance to depression screening with computerized voice-recognition technology. In non-mental health outpatient settings, Allenby and colleagues12 in oncology and Bliven and associates80 in cardiology found high degrees of acceptability for computer-assisted technology in screening for psychosocial distress. Sharpe and colleagues30 applied touchscreen technology and found no resistance to screening for depression and anxiety in a regional ambulatory cancer center. Cull and colleagues40 used touchscreen technology to administer the Mental Health Index and Hospital Anxiety and Depression Scale to develop a depression screening algorithm with adequate psychometric properties among outpatient cancer patients.
Figure 8.1.a Work and Health Initiative depression pre-screener.
Figure 8.1.b Work and Health Initiative depression pre-screener.
Figure 8.1.c Sample electronic WHI Patient Depression Report.
4. Discussion
Automated methods for both general health and depression-specific screening are here to stay. They produce more accurate answers, are better suited to evidence-based medicine, and are less expensive than person-dependent paper-and-pencil methods or mail. Electronic methods are also superior to paper and pencil because they produce timely answers and can explore some of the follow-up issues, such as more detail about suicidal ideation or how the patient fits into the care process. While mental health clinicians’ face-to-face observations of patients can identify verbal and nonverbal depressive cues and lead to a more immediate response, most individuals with depression are not seen in the mental health specialty sector. However, gaps in the evidence and barriers to effective widespread use remain. Once a screening context is established, some methods that are acceptable in principle may become unacceptable in practice. For example, most patients would feel uncomfortable conducting a phone interview while sitting in a crowded waiting room, or taking an Internet-based screener on a home computer known to be infected with a virus. On the other hand, the same patients might feel comfortable taking a phone interview at home or completing an Internet-based screener on a computer in a private room off the waiting area at the doctor’s office. Several groups have studied implementation in a range of settings, focusing on acceptability and accuracy (see Table 8.1). In general, these pilot projects find that depressed patients are able to complete both computer (desktop and web) and telephone screeners accurately and find them acceptable alternatives to both paper-and-pencil and clinician interviews. Just as with conventional methods, there is no one-size-fits-all answer: multiple modalities are needed to meet varied patient and provider needs.
Solution modality by itself (eg, Internet, phone, or tablet) is not the answer; much of the value lies in the craft with which it is executed. Good-quality solutions are available in all three modalities, but so are poor ones. The choice depends on purpose. If technology such as computer-adaptive testing is to be applied to population screening, a multi-tiered approach can improve accuracy. For example, a general mental health prescreening can efficiently reduce the number of individuals who are then followed with a diagnosis-specific pre-screener, reserving full screening for at-risk populations and for following patients known to have a depressive disorder. With respect to acceptability, the evidence to date suggests that automated depression screening via web, computer, telephone, or (soon) tablet does not meet with reluctance from those screened. With respect to follow-up, however, the story may differ. In most health risk-appraisal systems, patients and providers can ignore a positive depression screener. On the other hand, a positive screener can lead to overreaction. Work needs to be done on the back end of a positive screener to identify cases that are appropriate for follow-up. Careful thought needs to be given to how results will be handled with providers, what follow-up would be cost-effective, and who will deliver follow-up services. Nonetheless, without an electronic system, there is no mechanism to help the care system address these issues. The marketplace will continue to define and redefine solutions that are available and affordable. We have raised a set of questions that should be asked of such systems and put them into two categories: concerns that are frequently raised but usually do not turn out to be important issues (eg, accuracy and acceptability) and concerns that have often led to existing systems working less well than they could and that need to be addressed in every implementation (eg, privacy, follow-up, and the interface of automated results to the physician–patient relationship).
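The multi-tiered approach described in the discussion above (a brief prescreen, with a fuller instrument given only to those who screen positive) can be summarized numerically. Under the simplifying assumption that the two stages make errors independently given true status, serial testing gives a combined sensitivity of Se1 × Se2 and a combined specificity of 1 - (1 - Sp1)(1 - Sp2), while only the first-stage positives need the second instrument. All operating characteristics below are hypothetical.

```python
def serial_screening(prevalence, se1, sp1, se2, sp2):
    """Two-stage screening in which stage 2 is given only to stage-1 positives.

    Assumes the stages' errors are conditionally independent given true status.
    """
    # Fraction of everyone screened who screens positive at stage 1.
    stage1_positive_rate = prevalence * se1 + (1 - prevalence) * (1 - sp1)
    return {
        "fraction_needing_stage2": stage1_positive_rate,
        # A case is detected only if both stages flag it.
        "sensitivity": se1 * se2,
        # A non-case is cleared if either stage clears it.
        "specificity": 1 - (1 - sp1) * (1 - sp2),
    }


result = serial_screening(prevalence=0.10, se1=0.95, sp1=0.70, se2=0.85, sp2=0.85)
for key, value in result.items():
    print(f"{key}: {value:.3f}")
```

With these illustrative inputs, only about 37% of those screened go on to the second stage, and specificity rises from 0.70 to about 0.955 at the cost of some sensitivity (from 0.95 to about 0.81). This trade-off is one reason that, as noted above, back-end decisions about which positives to follow up matter as much as the front-end technology.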
5. Conclusion
Thirty years of research has led to the conclusion that the benefits of automated methods outweigh their limitations in general,3,6,7 for mental health issues,3,20,58,61,62,64,68,87 and specifically for depression screening.13,15,16,20,24,35,36,47,88,89 In the absence of information about a particular implementation and its setting, one cannot say that it is automatically worthwhile or unacceptable. However, one can say that pencil-and-paper screeners will be effective only under a limited set of conditions that avoid the costs and delays commonly associated with mail. The two most promising technologies seem to be the Internet (using web browsers and/or hand-held devices) and speech recognition. Whatever technology is used, there needs to be a good fit between the technology and the system within which it is deployed.86 Acceptability depends on context; accuracy depends on craft. The system needs to connect the patient to a physician and support that physician with the correct information.
References
1. Agency for Health Care Policy and Research. Depression in primary care: detection and diagnosis. Rockville, MD, 1993.
2. U.S. Preventive Services Task Force. Guide to clinical preventive services, 2nd ed. Baltimore: Williams & Wilkins, 1996.
3. Berger M. Computer-assisted clinical assessment. Child Adolesc Mental Health. 2006;11(2):64–75.
4. Butcher JN, Perry J, Hahn J. Computers in clinical assessment: historical developments, present status, and future challenges. J Clin Psychol. 2004;60(3):331–345.
5. Dillman DA. Mail and Internet surveys: the tailored design method, 2nd ed. Hoboken, NJ: John Wiley & Sons, 2007:352–412.
6. Epstein J, Klinkenberg WD. From Eliza to Internet: a brief history of computerized assessment. Computers in Human Behavior. 2001;17:295–314.
7. Garb HN. Computer-administered interviews and rating scales. Psychol Assess. 2007;19(1):4–13.
8. Buchanan T, Smith JL. Using the Internet for psychological research: personality testing on the World Wide Web. Br J Psychol. 1999;90(Pt 1):125–144.
9. Revicki DA, Cella DF. Health status assessment for the twenty-first century: item response theory, item banking and computer adaptive testing. Qual Life Res. 1997;6(6):595–600.
10. Truell AD, Bartlett JE, Alexander MW. Response rate, speed, and completeness: a comparison of Internet-based and mail surveys. Behav Res Methods Instrum Comput. 2002;34(1):46–49.
11. Schleyer TK, Forrest JL. Methods for the design and administration of web-based surveys. J Am Med Inform Assoc. 2000;7(4):416–425.
12. Allenby A, Matthews J, Beresford J, et al. The application of computer touch-screen technology in screening for psychosocial distress in an ambulatory oncology setting. Eur J Cancer Care (Engl). 2002;11(4):245–253.
13. Baer L, Jacobs DG, Cukor P, et al. Automated telephone screening survey for depression. JAMA. 1995;273(24):1943–1944.
14. Beck AT, Steer RA, Garbin MG. Psychometric properties of the Beck Depression Inventory: twenty-five years of evaluation. Clin Psychol Rev. 1988;8:77–100.
15. Gonzalez GM, Spiteri CB, Knowlton JP. An exploratory study using computerized speech recognition for screening depressive symptoms. Computers in Human Behavior. 1995;11(1):85–93.
16. Carr AC, Ancill RJ, Ghosh A, et al. Direct assessment of depression by microcomputer. A feasibility study. Acta Psychiatr Scand. 1981;64(5):415–422.
17. Carr AC, Ghosh A, Ancill RJ. Can a computer take a psychiatric history? Psychol Med. 1983;13(1):151–158.
18. Klinkman MS, Coyne JC, Gallo S, et al. Case finding instruments to be used to improve physician detection of depression in primary care. Arch Fam Med. 1997;6:567–573.
19. Kobak KA, Reynolds WM, Rosenfeld R, et al. Development and validation of a computer-administered version of the Hamilton Depression Rating Scale. Psychol Assess. 1990;2:56–63.
20. Kobak KA, Taylor LVH, Dottl SL, et al. Computerized screening for psychiatric disorders in an outpatient community mental health clinic. Psychiatr Serv. 1997;48(8):1048–1057.
21. Kroenke K, Spitzer RL, Williams JB. The PHQ-9: validity of a brief depression severity measure. J Gen Intern Med. 2001;16(9):606–613.
22. Kurt R, Bogner HR, Straton JB, et al. Computer-assisted assessment of depression and function in older primary care patients. Comput Methods Programs Biomed. 2004;73(2):165–171.
23. Mulrow CD, Williams JW Jr, Gerety MB, et al. Case-finding instruments for depression in primary care settings. Ann Intern Med. 1995;122(12):913–921.
24. Munoz RF, McQuaid JR, Gonzalez GM, et al. Depression screening in a women’s clinic: using automated Spanish- and English-language voice recognition. J Consult Clin Psychol. 1999;67(4):502–510.
25. Patton GC, Coffey C, Posterino M, et al. A computerised screening instrument for adolescent depression: population-based validation and application to a two-phase case-control study. Soc Psychiatry Psychiatr Epidemiol. 1999;34(3):166–172.
26. Rogers WH, Wilson IB, Bungay KM, et al. Assessing the performance of a new depression screener for primary care (PC-SAD(c)). J Clin Epidemiol. 2002;55(2):164–175.
27. Rogers WH, Adler DA, Bungay KM, et al. Depression screening instruments make good severity measures in a cross-sectional analysis. J Clin Epidemiol. 2005;58:370–377.
28. Schade CP, Jones ER Jr, Wittlin BJ. A ten-year review of the validity and clinical utility of depression screening. Psych Serv. 1998;49(1):55–61.
29. Schwenk TL. Screening for depression in primary care: a disease in search of a test. J Gen Intern Med. 1996;11:437–439.
30. Sharpe M, Strong V, Allen K, et al. Major depression in outpatients attending a regional cancer centre: screening and unmet treatment needs. Br J Cancer. 2004;90(2):314–320.
31. Spitzer RL, Kroenke K, Williams JB. Validation and utility of a self-report version of PRIME-MD: the PHQ primary care study. Primary Care Evaluation of Mental Disorders. Patient Health Questionnaire. JAMA. 1999;282(18):1737–1744.
32. Valenstein M, Vijan S, Zeber JE, et al. The cost-utility of screening for depression in primary care. Ann Intern Med. 2001;134(5):345–360.
33. Whooley MA, Avins AL, Miranda J, et al. Case-finding instruments for depression. Two questions are as good as many. J Gen Intern Med. 1997;12(7):439–445.
34. Kim H, Bracha Y, Tipnis A. Automated depression screening in disadvantaged pregnant women in an urban obstetric clinic. Arch Womens Ment Health. 2007;10(4):163–169.
35. Lin CC, Bai YM, Liu CY, et al. Web-based tools can be used reliably to detect patients with major depressive disorder and subsyndromal depressive symptoms. BMC Psychiatry. 2007;7:12.
36. Gonzalez GM, Carter C, Blanes E. Bilingual computerized speech recognition screening for depression symptoms: comparing aural and visual methods. Hispanic Journal of Behavioral Sciences. 2007;29(2):156–180.
37. Fann J, Berry DL, Wolpin SE, et al. Feasibility of depression screening using the PHQ-9 administered on a touchscreen computer. Psychooncology. 2006;15(1):S18–S18.
38. Ekman A, Dickman PW, Klint A, et al. Feasibility of using web-based questionnaires in large population-based epidemiological studies. Eur J Epidemiol. 2006;21(2):103–111.
39. Hyler SE, Gangure DP, Batchelder ST. Can telepsychiatry replace in-person psychiatric assessments? A review and meta-analysis of comparison studies. CNS Spectr. 2005;10(5):403–413.
40. Cull A, Gould A, House A, et al. Validating automated screening for psychological distress by means of computer touchscreens for use in routine oncology practice. Br J Cancer. 2001;85(12):1842–1849.
41. Houston TK, Cooper LA, Vu HT, et al. Screening the public for depression through the internet. Psychiatr Serv. 2001;52(3):362–367.
42. Leon AC, Kelsey JE, Pleil A, et al. An evaluation of a computer-assisted telephone interview for screening for mental disorders among primary care patients. J Nerv Ment Dis. 1999;187(5):308–311.
43. Brodey BB, Rosen CS, Brodey IS, et al. Reliability and acceptability of automated telephone surveys among Spanish- and English-speaking mental health services recipients. Ment Health Serv Res. 2005;7(3):181–184.
44. Mitchell AM, Mittelstaedt ME, Schott-Baer D. Postpartum depression: the reliability of telephone screening. MCN Am J Matern Child Nurs. 2006;31(6):382–387.
45. Ogles BM, France CR, Lunnen KM, et al. Computerized depression screening and awareness. Community Ment Health J. 1998;34(1):27–38.
46. Fliege H, Becker J, Walter OB, et al. Development of a computer-adaptive test for depression (D-CAT). Qual Life Res. 2005;14(10):2277–2291.
47. Kobak KA, Mundt JC, Greist JH, et al. Computer assessment of depression: automating the Hamilton Depression Rating Scale. Drug Inf J. 2000;34:145–156.
48. Gibbons RD, Weiss DJ, Kupfer DJ, et al. Using computerized adaptive testing to reduce the burden of mental health assessment. Psychiatr Serv. 2008;59(4):361–368.
49. Gardner W, Shear K, Kelleher KJ, et al. Computerized adaptive measurement of depression: a simulation study. BMC Psychiatry. 2004;4:13.
50. Educational Testing Services. Educational testing services. [Web document], 2000. Accessed 7-30-2007.
51. Green B, Bock R, Humphreys L, et al. Technical guidelines for assessing computerized adaptive tests. J Educ Measure. 1984;21:347–360.
52. Sands WA, Waters BK, McBride JR. Computerized adaptive testing: from inquiry to operation. Washington, DC: APA Books, 1997.
53. Wainer H, Dorans NL. Computerized adaptive testing: a primer. Hillsdale, NJ: Erlbaum Associates, 2000.
54. Ware JE Jr, Bjorner JB, Kosinski M. Practical implications of item response theory and computerized adaptive testing: a brief summary of ongoing studies of widely used headache impact scales. Med Care. 2000;38(9 Suppl):II73–II82.
55. Weiss DJ. Adaptive testing by computer. J Consult Clin Psychol. 1985;53(6):774–789.
56. Baer L, Brown-Beasley MW, Sorce J, et al. Computer-assisted telephone administration of a structured interview for obsessive-compulsive disorder. Am J Psychiatry. 1993;150(11):1737–1738.
57. Buchanan T. Online assessment: desirable or dangerous. Professional Psychology: Research and Practice. 2002;33:148–154.
58. Carr AC, Ghosh A. Accuracy of behavioural assessment by computer. Br J Psychiatry. 1983;142:66–70.
59. Erdman HP, Klein MH, Greist JH. Direct patient computer interviewing. J Consult Clin Psychol. 1985;53(6):760–773.
60. Erdman HP, Greist JH, Gustafson DH, et al. Suicide risk prediction by computer interview: a prospective study. J Clin Psychiatry. 1987;48(12):464–467.
61. Kobak KA, Greist JH, Jefferson JW, et al. Computer-administered clinical rating scales. A review. Psychopharmacology (Berl). 1996;127:291–301.
62. Peters L, Andrews G. Procedural validity of the computerized version of the Composite International Diagnostic Interview (CIDI-Auto) in the anxiety disorders. Psychol Med. 1995;25(6):1269–1280.
63. Robins L, Helzer J, Cottler L, et al. NIMH Diagnostic Interview Schedule, Version III Revised (DIS-III-R). St. Louis, MO: Washington University, 1989.
64. Rosenfeld R, Dar R, Anderson D, et al. A computer-administered version of the Yale-Brown Obsessive-Compulsive Scale. Psychol Assess. 1992;4:329–332.
65. Shaffer D, Fisher P, Lucas CP, et al. NIMH Diagnostic Interview Schedule for Children Version IV (NIMH DISC-IV): description, differences from previous versions, and reliability of some common diagnoses. J Am Acad Child Adolesc Psychiatry. 2000;39(1):28–38.
66. Wilson FR, Genco KT, Yager GG. Assessing the equivalence of paper-and-pencil versus computerized tests: Demonstration of a promising technology. Computers in Human Behavior. 1985;1:265–275.
67. Rodriguez HP, von GT, Rogers WH, et al. Evaluating patients’ experiences with individual physicians: a randomized trial of mail, internet, and interactive voice response telephone administration of surveys. Med Care. 2006;44(2):167–174.
68. Davis LJ Jr, Hoffmann NG, Morse RM, et al. Substance Use Disorder Diagnostic Schedule (SUDDS): the equivalence and validity of a computer-administered and an interviewer-administered format. Alcohol Clin Exp Res. 1992;16(2):250–254.
69. Millstein S. Acceptability and reliability of sensitive information collected via computer interview. Educational and Psychological Measurement. 1987;47:523–533.
70. Rosenman SJ, Levings CT, Korten AE. Clinical utility and patient acceptance of the computerized composite international diagnostic interview. Psychiatr Serv. 1997;48(6):815–820.
71. Adler DA, McLaughlin TJ, Rogers WH, et al. Job performance deficits due to depression. Am J Psychiatry. 2006;163(9):1569–1576.
72. Greenberg PE, Kessler RC, Birnbaum HG, et al. The economic burden of depression in the United States: how did it change between 1990 and 2000? J Clin Psychiatry. 2003;64(12):1465–1475.
73. Kessler RC, Berglund P, Demler O, et al. The epidemiology of major depressive disorder: results from the National Comorbidity Survey Replication (NCS-R). JAMA. 2003;289(23):3095–3105.
74. Wang PS, Patrick A, Avorn J, et al. The costs and benefits of enhanced depression care to employers. Arch Gen Psychiatry. 2006;63(12):1345–1353.
75. Grove WM, Zald DH, Lebow BS, et al. Clinical versus mechanical prediction: a meta-analysis. Psychol Assess. 2000;12(1):19–30.
76. Selim AJ, Berlowitz DR, Fincke G, et al. The health status of elderly veteran enrollees in the Veterans Health Administration. J Am Geriatr Soc. 2004;52(8):1271–1276.
77. Tarlov AR, Ware JE Jr, Greenfield S, et al. The Medical Outcomes Study. An application of methods for monitoring the results of medical care. JAMA. 1989;262(7):925–930.
78. Wells KB, Burnam MA, Camp P. Severity of depression in prepaid and fee-for-service general medical and mental health specialty practices. Med Care. 1995;33(4):350–364.
79. Kilbourne AM, McGinnis GF, Belnap BH, et al. The role of clinical information technology in depression care management. Adm Policy Ment Health. 2006;33(1):59–69.
80. Bliven BD, Kaufman SE, Spertus JA. Electronic collection of health-related quality of life data: validity, time benefits, and patient preference. Qual Life Res. 2001;10(1):15–22.
81. Radosevich DM, Werni TL. A practical guide for implementing, analyzing, and reporting outcomes measurements. Health Outcomes Institute, 1998.
82. Rind DM, Kohane IS, Szolovits P, et al. Maintaining the confidentiality of medical records shared over the Internet and the World Wide Web. Ann Intern Med. 1997;127(2):138–141.
83. Soetikno R, Young HS, Keefe EB. Role of emerging technology in the era of cost containment. Am J Gastroenterol. 1997;92:1038–1040.
84. Subramanian AK, McAfee AT, Getzinger JP. Use of the World Wide Web for multisite data collection. Acad Emerg Med. 1997;4(8):811–817.
8 TECHNOLOGICAL APPROACHES TO SCREENING AND CASE FINDING
159
85. Barak A. Psychological applications on the Internet: a discipline on the threshold of a new millennium. Applied and Preventive Psychology. 1999;8(4):231–245. 86. Blumenthal D, Glaser JP. Information technology comes to medicine. N Engl J Med. 2007;356(24):2527–2534. 87. Skinner HA, Allen BA. Does the computer make a difference? Computerized versus face-to-face versus self-report assessment of alcohol, drug, and tobacco use. J Consult Clin Psychol. 1983;51(2):267–275. 88. Greist JH, Gustafson DH, Stauss FF, et al. A computer interview for suicide-risk prediction. Am J Psychiatry. 1973;130(12):1327–1332. 89. Kobak KA, Reynolds WM, Griest JH. Computerized and clinician assessment of depression and anxiety: respondent evaluation and satisfaction. J Pers Assess. 1994;63(1):173–180.
9 SCREENING FOR DEPRESSION IN PRIMARY CARE: CAN IT BECOME MORE EFFICIENT?
Kathryn M. Magruder and Derik E. Yeager

1. Introduction
2. Epidemiology of Depression in Primary Care
3. Is Screening for Depression in Primary Care Worthwhile?
4. Which Screening Tool Should Be Used?
5. Implementing Screening in Primary Care
6. What Developments Are on the Horizon?
7. Conclusions
Context
Screening for depression has been so widely advocated that the burden of proof has shifted to skeptics who argue against it. Yet only recently has enough evidence accrued to judge the advantages and disadvantages of screening dispassionately. Here we discuss the evidence that specific tools and strategies improve the outcome of depression screening in primary care.
1. Introduction
In 1978, the Institute of Medicine defined primary care as “care that is accessible, comprehensive, coordinated, continuous, and accountable.”1 While the definition has evolved over time,2 these fundamental characteristics remain valid today. Part of the primary care mission is to serve as the first line for detecting common mental disorders, including depression, and for either treating them or referring patients on. The inclusion of first-line mental health services distinguishes primary care (including outpatient clinics in managed care organizations, community hospitals, Veterans Administration hospitals, teaching institutions, and other medical centers) from care in more specialized clinical settings. The comprehensiveness of primary care and its providers’ obligation to deliver first-line care make it a logical and appropriate venue for mental health screening.3
Complicating the issue, however, are the time constraints on primary care providers. Although the average patient visit in the United States lasts about 20 minutes,4 the number of services recommended for that short period is daunting. It is therefore imperative that these recommended services, in particular preventive health services, be provided as efficiently as possible. Services that cannot be delivered efficiently within the busy, fast-paced world of primary care are at risk of being omitted, and this is especially true for preventive mental health services. Screening for depression is such a service; it is therefore critical that primary care providers use the best and most efficient depression screening approaches available.
In this chapter, we will address issues related to screening for depression in the primary care context. We will start by briefly reviewing the epidemiology of depression as it relates to primary care. Next, we will critically examine how the World Health Organization’s screening criteria apply to depression. Then we will review published screening tools and their attributes for use in primary care settings. Last, we will discuss future directions, including additional ways that screening for depression in primary care can be made more efficient and thus more effective and more widely implemented.
2. Epidemiology of Depression in Primary Care
Population Prevalence of Depression
The National Comorbidity Survey Replication (NCS-R), conducted among adults 18 years and older, found a 12-month prevalence of 9.5% for any DSM-IV mood disorder, with 6.7% for major depression and 1.5% for dysthymia.5 In this survey, 19.5% of major depression cases in the community were classified as mild, 50.1% as moderate, and 30.4% as serious.5 Thus, about 80% of those with major depressive disorder have moderate to serious symptoms, and those who seek health services are likely to be at the more severe end of the spectrum. In a six-country European epidemiologic study of mental disorders, major depression was the single most common disorder assessed, with a 12-month prevalence of 3.9%.6 Wittchen and Jacobi7 conducted a meta-analysis of 27 studies with data on the prevalence of mental disorders in European countries. The 12-month prevalence of major depression ranged from 3.1% to 10.1%, with a median of 6.9%. Depression thus appears to be among the most prevalent mental disorders, a worldwide problem affecting approximately 5% to 10% of adults in a given year.
Primary Care Prevalence of Depression
An early compendium of studies showed that pre-DSM-III-R depression prevalence in primary care ranged from 4.8% to 8.6%.8 More recently, one of the most comprehensive assessments of mental disorders in primary care was conducted by the World Health Organization in 15 cities in 14 countries.9 Using the Composite International Diagnostic Interview (CIDI) to assess DSM-III-R and ICD-10 conditions, this study found a 24% prevalence of current psychiatric disorders, with substantial variation by country.9 Prevalence estimates for major depression ranged from 2.6% in Nagasaki, Japan, to an exceptionally high 29.5% in Santiago de Chile, more than 12 percentage points above the next highest site (16.9% in Manchester, England). The overall prevalence of ICD-10 major depression was 10.4%. Prevalence varies considerably within a city or country according to the characteristics of a primary care clinic (eg, inner-city clinics serving disadvantaged patients may have higher depression prevalence), so these findings cannot be generalized as national primary care prevalences. Nonetheless, this important international study has helped to establish the importance of depression in primary care settings throughout the world.
A number of studies have found substantial prevalence and morbidity of subthreshold disorders. For example, in a study of 619 primary care patients, Backenstrass and associates10 found a prevalence of 4.6% for major depression, 6.2% for minor depression, and 9.1% for nonspecific depression symptoms. Levels of disability were graded by severity, highest for major depression and lowest for nonspecific depression symptoms.10 Thus, these “sub-major” forms of depression are not without associated morbidity.
Primary Care Is the “De Facto” Mental Health Treatment System
Primary care has been termed the de facto mental health treatment system because as many people with mental disorders receive treatment in general medical settings as in mental health specialty settings.11,12 From Epidemiologic Catchment Area (ECA) data, it has been estimated that only 45% of those with unipolar major depression used any health service in the prior 12 months; 27.8% sought care in the specialty mental health sector and 25.3% in the general medical sector.11 Paralleling the ECA findings, NCS-R data showed that 51.6% of those who met criteria for major depression received some health services for depression in the past 12 months, with 27.2% receiving care in the general medical sector.13 That study also examined symptom severity with respect to treatment and found that only 12.8% of those treated in the general medical sector were mild cases; all others were moderate or more severe. It has been estimated that 50% to 80% of depression management occurs in primary care. Harman and colleagues14 found that among older adults, 64% of depression visits occurred in primary care, yet these represented only 3% of all primary care visits by older adults; by contrast, 26% of depression visits occurred in psychiatric care, where they represented 58% of all psychiatric visits by older adults. Because depression accounts for such a small share of primary care visits, the index of suspicion for depression is likely to be low in primary care settings. An analysis of National Ambulatory Medical Care Survey data showed that the average primary care physician had 10.33 antidepressant medication visits per week, compared with 11.04 for the average psychiatrist.15 Although psychiatrists have slightly more antidepressant medication visits, primary care physicians likely initiate more antidepressant prescriptions but conduct fewer monitoring visits, whereas psychiatrists have fewer initiating visits but more monitoring visits.
Unassisted Recognition of Depression in Primary Care
Ironically, although general medical settings are a primary venue for treating mental disorders, a very large percentage of such disorders go unrecognized by primary care providers and therefore untreated. Some reports suggest that fewer than 50% of people with depression are diagnosed as such in primary care settings.16–18 The WHO primary care study found that, overall, 54.2% of those who met criteria for depression (ICD F32/33) were recognized by their treating physician as having a psychological illness, ranging from 19.3% in Nagasaki to 74.0% in Santiago de Chile.19 Thus, depression is relatively common in primary care settings, but many cases go unrecognized. It is therefore not surprising that a number of screening tools have been developed to help providers recognize and diagnose depression. Yet there are other issues to consider before initiating screening programs.
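To make the recognition gap concrete, the short Python sketch below applies the WHO study’s overall figures (10.4% prevalence of ICD-10 major depression, 54.2% recognition) to a hypothetical panel of 1,000 primary care attendees. It assumes, for illustration only, that the prevalence and recognition figures describe the same population and apply uniformly across a practice, which the city-to-city variation reported above shows is not strictly true. The function name is hypothetical.

```python
def recognition_gap(panel_size: int, prevalence: float, recognition_rate: float) -> dict[str, float]:
    """Expected depression cases, recognized cases, and missed cases in a panel.

    prevalence and recognition_rate are proportions between 0 and 1.
    Assumes (illustratively) that both rates apply uniformly to the panel.
    """
    if panel_size < 0 or not (0.0 <= prevalence <= 1.0) or not (0.0 <= recognition_rate <= 1.0):
        raise ValueError("panel_size must be >= 0 and rates must lie in [0, 1]")
    cases = panel_size * prevalence
    recognized = cases * recognition_rate
    return {"cases": cases, "recognized": recognized, "missed": cases - recognized}


# Overall WHO primary care figures quoted in this section (illustrative use only).
gap = recognition_gap(panel_size=1000, prevalence=0.104, recognition_rate=0.542)
print(f"cases={gap['cases']:.0f}, recognized={gap['recognized']:.0f}, missed={gap['missed']:.0f}")
```

With these inputs the sketch prints cases=104, recognized=56, missed=48: under these assumptions, nearly half of the depressed patients in such a panel would pass through unrecognized.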
3. Is Screening for Depression in Primary Care Worthwhile?
Screening is an important aspect of prevention and early intervention for many diseases and conditions, including depression. WHO describes 10 criteria for initiating a screening program. Below, we discuss each criterion along with the issues that should be considered for clinically effective depression screening. Because our focus is primary care, we consider these criteria in that context.
The Condition Should Be an Important Health Problem
With a depression prevalence of approximately 5% to 10% worldwide and 5% to 20% in primary care settings, depression is considered an important health problem. In addition to personal suffering, people with depression have significantly worse functioning. Building on the landmark publication on worldwide disability,20 Ustun and associates21 updated earlier data and estimated that depression was the fourth leading cause of global disease burden in the year 2000. The burden of depression on the healthcare system is equally significant. Average 6-month medical costs for U.S. primary care patients diagnosed with depression or anxiety were approximately twice those for patients with subthreshold depression or anxiety or no disorder ($2,390 vs. $1,248),22 and national annual medical costs of depression have been estimated at approximately $26 billion (1990 dollars).23 For the most part, this burden falls on primary care in terms of recognition and treatment,24 including antidepressant prescribing.25,26 At the societal level, the burden of depression is also great: patients need not receive a clinical diagnosis of depression to experience impaired functioning,27 missed workdays (at an annual national cost of $17 billion),23 and disability days,28 with impairment equal to or greater than that found with other chronic conditions such as diabetes, arthritis, gastrointestinal disturbances, lung disturbances, bronchitis, emphysema, and back problems.29 Thus, there is no doubt that depression is an important health and public health problem at all levels.
There Should Be a Treatment for the Condition
A number of effective treatments exist for depression, including cognitive-behavioral therapy and medications. In fact, the robust research base for these treatments has prompted a proliferation of treatment guidelines that offer primary care providers practical approaches to implementing these evidence-based practices (see, for example, the depression guidelines on the Agency for Healthcare Research and Quality website).30
Facilities for Diagnosis and Treatment Should Be Available
Although availability varies by setting, more and more primary care practitioners are recognizing their role as first-line responders for depression diagnosis and treatment. Additionally, many primary care practices include mental health specialists on staff (eg, a psychiatric nurse specialist), are aligned with mental health specialists (ie, have a ready referral source), or are part of larger healthcare organizations that incorporate mental health services (eg, HMOs, the U.S. Veterans Health Administration). Thus, when a screen is positive and a diagnosis of depression is made, treatment is typically available within the practice or through a referral network.
There Should Be a Latent Stage of the Disease
Although the diagnosis of depression depends on the presence of symptoms, the disorder can be considered to have a latent stage in the following sense. Depression is often not detected clinically, patients do not spontaneously report symptoms to providers, and patients themselves may not be aware that their symptoms constitute depression. From NCS-R data, it has been estimated that there is a delay of approximately 8 years between the onset of depression and first receipt of professional help.31 Additionally, longstanding depression is associated with disability as well as psychiatric and medical comorbidities, which early detection and intervention may prevent.
There Should Be a Test or Examination for the Condition
As detailed in the next section, a number of adequate depression screening tools exist, including standard screeners (eg, the Zung Self-Rating Depression Scale [SDS]),32 short screeners (eg, the Medical Outcomes Study Depression Screen [MOS-D]),33 and ultra-brief screeners (eg, the Patient Health Questionnaire [PHQ]-2).34 In addition, there are diagnostic interviews suitable for use in primary care, such as the depression module of the Mini International Neuropsychiatric Interview (M.I.N.I.),35 the Primary Care Evaluation of Mental Disorders (PRIME-MD),36 and the Symptom-Driven Diagnostic System for Primary Care (SDDS-PC).37
The Test Should Be Acceptable to the Population
Screens for depression are generally acceptable both to participants and to the staff who administer them.38,39 Diagnostic tools are lengthier and may be more difficult for some patients; however, they are considered acceptable in terms of risk and time. Certainly, relative to other recommended primary care screenings (eg, colonoscopy), screening for depression is noninvasive, brief, and well tolerated by patients, and its results are relatively easy to interpret. However, whereas screenings such as colonoscopy and mammography require only a referral from the primary care provider, depression screening typically requires more clinician (nurse or physician) time to administer, interpret, and assess, and (if positive) to treat or refer. Thus, the screening burden falls more heavily on clinicians than on patients, and this may well influence acceptability in clinical practice (Fig. 9.1).

Figure 9.1. Screening burden by task. (Flow chart pairing each screening task, from screening, scoring, and review of results through a second-stage screen and diagnostic work-up to psychoeducation, watchful waiting, referral, or treatment, with who bears its burden: the patient, primary care (PC) staff, and/or the primary care physician (PCP).)
The Natural History of the Disease Should Be Adequately Understood
Depression is known to follow a course of exacerbations and remissions. Persistent depression is a risk factor for disability,40 medical and psychiatric comorbidities,5 and suicide.41 There is evidence that early recognition and effective treatment of depression can alter its trajectory by reducing disability and premature mortality,42 promoting remission, and preventing relapse.43 There is also evidence that early recognition and effective treatment can improve patient outcomes such as social functioning and productivity44 and can reduce absenteeism.45 “Sub-major” depression is often considered an integral part of the natural course of major depression and is sometimes referred to as its prodromal phase.46 Research has demonstrated that both subthreshold and subsyndromal depression are associated with increased functional disability47 and have a negative impact on quality of life.48 Data from a randomized trial of older adults (PROSPECT) show that patients initially presenting with sub-major depression were five times more likely to have major depression after 1 year.47 Identifying these patients may therefore help broaden depression treatment toward a more preventive approach,49 allowing patients to benefit from improved functional and quality-of-life outcomes and to receive more intensive assessment and symptom monitoring that hastens recognition of major depressive disorder. Patients presenting with sub-major depression may, in fact, benefit from treatment. Seligman and colleagues50 followed “at-risk” university students and found that those randomized to weekly cognitive-behavioral therapy workshop meetings had significantly fewer depressive symptoms after 8 weeks.
There Should Be an Agreed Policy on Whom to Treat
Policies may vary from site to site, with some advocating treatment for minor depression and for adjustment disorder with depressed mood. All clinical practice guidelines advocate treating patients who meet criteria for a diagnosis of major depression. Several groups have shown that patients whose depression goes unrecognized have milder forms of the disorder with less disability.51 Whether to treat those with “sub-major” depression is, to some extent, a resource issue. Some have advocated low-cost, low-intensity, nontraditional treatments (eg, bibliotherapy, web-based self-help) so that therapeutic intensity and cost are aligned with symptom severity.52 While there may be benefit in treating these sub-major conditions, such policy decisions should not compromise a system’s capacity to treat other important conditions.
The Total Cost of Finding a Case Should Be Economically Balanced in Relation to Medical Expenditure as a Whole
Economics favor screening for depression: screening instruments are relatively short and inexpensive, structured diagnostic assessments for follow-up can be administered in-house, and treatment costs are relatively moderate compared with the medical and psychiatric comorbidities that are apt to develop when depression goes untreated. In a cost-utility study, Valenstein and coworkers53 concluded that one-time screening for depression is cost-effective and that more frequent screening is likely to become more cost-effective as treatments improve.
Case-Finding Should Be a Continuous Process
Several studies have shown that depression can occur throughout the lifespan.5,54 Furthermore, depression may be present for many years before it is detected. It therefore makes sense to have a system in place for periodic screening throughout the lifespan.
4. Which Screening Tool Should Be Used?
Primary care providers have a great deal to consider when selecting a screening instrument, and there are many tools from which to choose, each with its own set of attributes. Time is of obvious importance in primary care, and typically the provider time to administer a screening tool and score it (rather than patient time) is a key consideration. In the quest for brevity, screening tools have evolved from standard screeners to short screeners to ultra-brief screeners. Below, we consider a number of published screening tools organized by administration time. In addition to time, we also consider scope of use, administration/scoring, and performance.
Standard Screeners
In a recent article, Mitchell and Coyne55 defined a “standard” screening tool as one that contains 15 or more items and takes, on average, more than 5 minutes to complete. Many of these screeners could equally be called traditional: the Beck Depression Inventory (BDI),56 the Zung SDS,32 and the Center for Epidemiologic Studies Depression Scale (CES-D)57 were introduced in 1961, 1965, and 1977, respectively. They have since been translated into dozens of languages and used in virtually every health setting, including primary care and specialty clinics, as well as in research. Table 9.4 provides details about the administration, scoring, and psychometric performance of five “standard” depression screeners: the BDI,56 CES-D,57 Geriatric Depression Scale (GDS),58 Inventory for Depression (ID),59 and Zung SDS.32 The BDI, CES-D, and GDS are available in multiple, typically shorter, versions. Some of these screeners offer situational advantages; for example, BDI and Zung SDS scores provide an estimate of symptom severity, and the GDS was designed specifically for use with geriatric patients. These characteristics (and others, such as whether a tool can be self-administered and the time frame of symptoms it covers) must be taken into account when selecting a screening tool. In general, all five screeners are well suited for use in primary care settings: they are easy to administer and score, and they offer reasonable accuracy. Nevertheless, standard-length screeners may seem cumbersome to busy primary care providers who prefer shorter alternatives.
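Table 9.4 gives the Zung SDS score range as 25–100. That range matches the usual conversion of the raw total of 20 items (each scored 1 to 4, so 20 to 80) into an index by multiplying by 1.25. The Python sketch below assumes that conversion and the severity cut points in Table 9.4. It also assumes that the caller passes item scores that are already keyed, with positively worded items reverse-scored, because item-level keying is not described here. Both function names are hypothetical.

```python
def zung_sds_index(item_scores: list[int]) -> float:
    """Convert 20 keyed SDS item scores (each 1-4) to the 25-100 index.

    Assumes positively worded items have already been reverse-scored.
    """
    if len(item_scores) != 20:
        raise ValueError("the Zung SDS has 20 items")
    if any(score not in (1, 2, 3, 4) for score in item_scores):
        raise ValueError("each item must be scored 1, 2, 3, or 4")
    raw_total = sum(item_scores)  # ranges from 20 to 80 after validation
    return raw_total * 1.25       # rescales 20-80 onto 25-100


def zung_severity(index: float) -> str:
    """Map an SDS index to the severity bands listed in Table 9.4."""
    if index >= 70:
        return "severe"
    if index >= 60:
        return "moderate"
    if index >= 50:
        return "mild"
    return "below cut point"


example = [2] * 12 + [3] * 8  # hypothetical keyed responses; raw total 48
idx = zung_sds_index(example)
print(idx, zung_severity(idx))
```

This example prints 60.0 moderate. Because raw totals are integers, the index moves in steps of 1.25 and never lands between two bands.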
Table 9.4. Standard Depression Screening Instruments Commonly Used in Primary Care

BDI [60–64]
Scope of use: Depression only*; severity of symptoms today
Administration: 7, 13, or 21 items*; 2–5 min to complete; literacy: easy; scoring: simple; can be self-administered
Scoring: score range 0–63; usual cut points 10–19 (mild), 20–29 (moderate), ≥30 (severe)
Performance: sensitivity 97% [63], 89% (81–95) [64]; specificity 99% [63], 64% (59–68) [64]; efficiency 0.99 [63]; false positive 0.01 [63]; false negative 0.00 [63]; LR+ 4.2 (1.2; 13.6) [61], 2.5 [64]; LR− 0.17 (0.1; 0.3) [61], 0.17 [64]; PPV 84.0% [63], 29.6% (10.7; 57.6) [62]; AUC (95% CI) 0.87 (0.82–0.91) [64]
Reference: Original citation: Beck AT, Ward CH, Mock J, et al. An inventory for measuring depression. Arch Gen Psychiatry. 1961;4:561–571. www.psychcorpcenter.com/content/bdi-ll.htm

CES-D [60–64]
Scope of use: Depression only; frequency of symptoms in the past week
Administration: 10 or 20 items; 2–5 min to complete; literacy: easy; scoring: simple; can be self-administered
Scoring: score range 0–60; usual cut point 16
Performance: sensitivity 81% [63], 93% (85–97) [64]; specificity 72% [63], 69% (65–74) [64]; efficiency 0.72 [63]; false positive 0.27 [63]; false negative 0.01 [63]; LR+ 3.3 (2.5; 4.4) [61], 3.0 [64]; LR− 0.24 (0.2; 0.3) [61], 0.10 [64]; PPV 13.0% [63], 24.8% (20; 30.6) [62]; AUC (95% CI) 0.89 (0.85–0.92) [64]
Reference: Original citation: Radloff L. The CES-D scale: A self-report depression scale for research in the general population. Appl Psychol Meas. 1977;1:385–401. www.mhhe.com/hper/health/personalhealth/labs/stress/activ2-2.html

GDS [60, 62]
Scope of use: Depression only; endorsement of symptoms (yes/no) in the past week
Administration: 15 or 30 items; 2–5 min to complete; literacy: easy; scoring: simple
Scoring: score range 0–30; usual cut point 11
Performance: LR+ 3.3 (2.4; 4.7) [62]; LR− 0.16 (0.1; 0.3) [62]; PPV 24.8% (19.4; 32) [62]
Reference: Original citation: Yesavage JA, Brink TL, Rose TL, et al. Development and validation of a geriatric depression screening scale: a preliminary report. J Psychiatr Res. 1982–83;17(1):37–49. www.stanford.edu/~yesavage/GDS.html

ID [60, 61]
Scope of use: Depression only; symptoms recently
Administration: 15 items; 2–5 min to complete; literacy: easy
Scoring: score range 0–15; usual cut point 10
Reference: Original citation: Popoff LM. A simple method for diagnosis of depression by the family physician. Clinical Medicine. 1969 March:24–29.

SDS [60–63]
Scope of use: Depression only; frequency of symptoms recently
Administration: 20 items; 2–5 min to complete; literacy: easy; scoring: simple; can be self-administered
Scoring: score range 25–100; usual cut points 50–59 (mild), 60–69 (moderate), ≥70 (severe)
Performance: sensitivity 100% [63]; specificity 71% [63]; efficiency 72% [63]; false positive 0.28 [63]; false negative 0.00 [63]; LR+ 3.3 (1.3; 8.1) [62]; LR− 0.35 (0.2; 0.8) [62]; PPV 15.0% [63], 24.8% (11.5; 44.8) [62]
Reference: Original citation: Zung WW. A self-rating depression scale. Arch Gen Psychiatry. 1965;12:63–70. fpinfo.medicine.uiowa.edu/calculat.htm

AUC, area under the curve; CI, confidence interval; LR, likelihood ratio; PPV, positive predictive value. Adapted from General Hospital Psychiatry, 24/4, Williams JW, Pignone M, Ramirez G, Perez Stellato C, Identifying depression in primary care: a literature synthesis of case-finding instruments, 225–237, Copyright (2002), with permission from Elsevier.

Short Screeners
Short screeners, defined as consisting of 5 to 14 items and taking between 2 and 5 minutes to complete,55 include the Hospital Anxiety and Depression Scale (HADS),65 MOS-D,33 and PHQ34 (Table 9.5). Many authors consider the diagnostic performance of these intermediate-length screeners to range from modest to good.55,64,66 Despite their combination of diagnostic performance and brevity, a national U.K. survey found that they continue to be underused in primary and secondary care settings.67 This underuse may have spurred the development of even shorter screeners.
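The likelihood ratios listed alongside sensitivity and specificity in Tables 9.4 and 9.5 follow from two standard identities: LR+ = sensitivity / (1 − specificity) and LR− = (1 − sensitivity) / specificity. The Python sketch below applies them to the reference-64 pairs for the BDI, CES-D, and MOS-D. Figures the tables attribute to other references come from different samples, so they need not agree with these pairs. The function name is illustrative.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) from sensitivity and specificity given as proportions."""
    if not (0.0 <= sensitivity <= 1.0):
        raise ValueError("sensitivity must lie in [0, 1]")
    if not (0.0 < specificity < 1.0):
        # LR+ divides by (1 - specificity) and LR- divides by specificity,
        # so both endpoints would cause a division by zero.
        raise ValueError("specificity must lie strictly between 0 and 1")
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Sensitivity/specificity pairs attributed to reference 64 in Tables 9.4 and 9.5.
REF64_PAIRS = {
    "BDI": (0.89, 0.64),
    "CES-D": (0.93, 0.69),
    "MOS-D": (0.93, 0.72),
}

for name, (sens, spec) in REF64_PAIRS.items():
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(f"{name}: LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
```

After rounding, the loop prints LR+ values of 2.5, 3.0, and 3.3 and LR− values of 0.17, 0.10, and 0.10. These match the reference-64 entries in the tables.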
Ultra-short/Ultra-brief Screeners
What is the minimum number of items required to screen effectively for depression? In the quest to reduce screening time, several new screening instruments with four or fewer questions have been published. Mitchell and Coyne have defined ultra-short/ultra-brief screeners as consisting of four or fewer items and taking less than 2 minutes to complete (Table 9.6).55 Whooley and colleagues64 reported data supporting a two-item screener, and the U.S. Veterans Administration has adopted a four-item screener to satisfy a 1998 universal depression screening mandate. A meta-analysis of 22 studies that assessed the accuracy of ultra-short screeners for depression in primary care found that diagnostic rule-in accuracy increases with the number of items, with two- and three-item screeners offering the greatest accuracy (80%) and one-item screeners providing very poor accuracy (30%).55 No four-item screeners met the inclusion criteria for this analysis. The authors concluded that while two- and three-item screeners can help providers identify 8 out of 10 depression cases, they usually do so at the expense of a high false-positive rate. They therefore argue for a two-stage screening approach when an ultra-brief screener is employed.
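A sensitive screener can still produce many false positives, and Bayes’ theorem shows why: at the prevalences typical of primary care, non-cases vastly outnumber cases. The Python sketch below counts expected outcomes among 1,000 screened patients at prevalences of 5%, 10%, and 20%. It uses the PRIME-MD (PHQ-2) sensitivity (96%) and specificity (57%) listed in Table 9.6. As a simplification, it assumes that sensitivity and specificity do not change with prevalence. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningCounts:
    """Expected 2x2 counts for one round of screening."""
    true_pos: float
    false_pos: float
    false_neg: float
    true_neg: float

    @property
    def ppv(self) -> float:
        # Share of positive screens that are true cases.
        return self.true_pos / (self.true_pos + self.false_pos)

    @property
    def npv(self) -> float:
        # Share of negative screens that are true non-cases.
        return self.true_neg / (self.true_neg + self.false_neg)


def expected_counts(n: int, prevalence: float, sensitivity: float, specificity: float) -> ScreeningCounts:
    """Expected outcomes when n patients are screened (rates as proportions)."""
    cases = n * prevalence
    non_cases = n - cases
    return ScreeningCounts(
        true_pos=cases * sensitivity,
        false_pos=non_cases * (1.0 - specificity),
        false_neg=cases * (1.0 - sensitivity),
        true_neg=non_cases * specificity,
    )


# PRIME-MD (PHQ-2) sensitivity and specificity from Table 9.6.
SENS, SPEC = 0.96, 0.57
for prevalence in (0.05, 0.10, 0.20):
    c = expected_counts(1000, prevalence, SENS, SPEC)
    print(f"prevalence {prevalence:.0%}: TP {c.true_pos:.0f}, FP {c.false_pos:.0f}, "
          f"PPV {c.ppv:.2f}, NPV {c.npv:.3f}")
```

At 10% prevalence the sketch gives about 96 true positives and 387 false positives per 1,000 screened, a PPV of roughly 0.20. This is consistent with the argument above for following an ultra-brief screen with a second stage.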
Two-Stage Approaches Another approach that may offer advantages in some situations or practices is the use of a two-stage process. Screening followed by a standardized diagnostic assessment has often been used in research projects for efficient identification of potential subjects who meet criteria for major depression. The approach enables investigators to avoid conducting diagnostic assessments on all subjects, yet has the advantage of having screening information available on all subjects, with diagnostic data on those above a certain screening threshold. While in theory any screener could be combined with any acceptable diagnostic assessment, two instruments that ‘‘package’’ both screening and diagnosis, the SDDS-PC and PRIME-MD, were developed in the late 1990s specifically for use in primary care settings.36, 37, 68, 69 These instruments were intended for both clinical and research purposes. They were both designed to
Table 9.5. Short Depression Screening Instruments Commonly Used in Primary Care Scope of Use
Administration
Scoring
Performance
Reference
HADS
Anxiety and depression Severity of symptoms in the past week
14 items £2 min to complete Literacy: Difficult Scoring: Simple
Score range: 0–21 Usual cut point: 11
LRþ: 7.0 (2.9; 11.2)62 LR: 0.3 (0.3; 0.4)62 PPV: 41.3% (22.6; 52.8)62
Original citation: Zigmond AS, Snaith RP. The Hospital Anxiety and Depression Scale. Acta Psychiatr Scand 1983;67:361–370. www.clinical-supervision.com/ hads.htm
MOS-D60,61,64
Depression only Frequency of symptoms in the past week
8 items <2 min to complete Literacy: Average
Score range: 0–1 (logistic regression) Usual cut point: 0.06
Sensitivity: 93% (86–97)64 Specificity: 72% (68–76)64 LRþ: 3.364 LR: 0.1064 AUC (95% CI): 0.89 (0.85–0.91)64
Original citation: Burnam MA, Wells KB, Leake B, et al. Development of a brief screening instrument for detecting depressive disorders. Medical Care. 1988;26,775–789.
Depression only Frequency of symptoms in the past 2 weeks
9 items <2 min to complete Literacy: Average Scoring: Simple
Diagnosis: Score range: 0–9 Usual cut point: 5 symptoms Severity: Score range: 0–27 Usual cut point: 0–4 (none), 5–9 (mild), 10–14 (moderate), 15–19 (major), 20 (severe)
LRþ: 12.2 (8.4; 18)62 LR: 0.28 (0.2; 0.5)62 PPV: 55% (45.7; 64.3)62
Original citation: Kroenke K, Spitzer RL, Williams JBW. The PHQ-9: Validity of a brief depression severity measure. J Gen Intern Med. 2001;16:606–613. www.depression primarycare.org/ap1.html
60,62
PHQ60,62
Can be self-administered
Can be self-administered
LR, likelihood ratio; PPV, positive predictive value. Adapted from General Hospital Psychiatry, 24/4, Williams JW, Pignone M, Ramirez G, Perez Stellato C, Identifying depression in primary care: a literature synthesis of case-finding instruments, 225–237, Copyright (2002), with permission from Elsevier.
Table 9.6. Ultra-Short Depression Screening Instruments Commonly Used in Primary Care
Columns: scope of use; administration; scoring; performance [60]; reference [63]

PRIME-MD (PHQ-2) [60–63]
Multiple components with depression category; presence of symptoms in the past month
2 items; 1–2 min to complete; literacy: average; scoring: complex; can be self-administered
Score range: 0–2; usual cut point: 1 [60]
Sensitivity: 96%; specificity: 57%; efficiency: 0.59 [63]; FP: 0.41 [63]; FN: 0.00 [63]; LR+: 2.7 (2.0; 3.7) [62]; LR−: 0.14 (0.1; 0.3) [62]; PPV: 21.3% (16.7; 27) [62]
Original citations: Spitzer RL, Williams JB, Kroenke K, et al. Utility of a new procedure for diagnosing mental disorders in primary care. The PRIME-MD 1000 study. JAMA. 1994;272:1749–1756. Kroenke K, Spitzer RL, Williams JBW. The Patient Health Questionnaire-2: validity of a two-item depression screener. Med Care. 2003;41:1284–1292.

SDDS-PC [60–62, 64]
Multiple components with depression category; presence of symptoms in the past month
5 items; 1–2 min to complete; literacy: easy; scoring: complex; can be self-administered
Score range: 0–5 [60]; usual cut point: 2 [60]
Sensitivity: 96% [64]; specificity: 51% [64]; LR+: 3.5 (2.4; 5.1) [62]; LR−: 0.2 (0.1; 0.4) [62]; PPV: 25.9% (19.4; 33.8) [62]; AUC (95% CI): 0.86 (0.82–0.89) [64]
Original citation: Broadhead WE, Leon AC, et al. Development and validation of the SDDS-PC screen for multiple mental disorders in primary care. Arch Fam Med. 1995;4:211–219.

AUC, area under the curve; CI, confidence interval; FN, false negative; FP, false positive; LR, likelihood ratio; PPV, positive predictive value. Adapted from General Hospital Psychiatry, 24/4, Williams JW, Pignone M, Ramirez G, Perez Stellato C, Identifying depression in primary care: a literature synthesis of case-finding instruments, 225–237, Copyright (2002), with permission from Elsevier.
9 SCREENING FOR DEPRESSION IN PRIMARY CARE
take minimum clinician time and still provide multiple psychiatric diagnoses in primary care. Both instruments have a quick screen (sometimes referred to as stem questions) for multiple psychiatric disorders, followed by specific disorder modules when the quick screen indicates them. Both instruments include major depression. The time burden falls mainly on patients for the quick screen and on clinicians for the disorder modules, and the modules are needed only for the subset of patients with a high likelihood of disorder. Practices interested in only a single disorder can select the screening questions and module for that disorder. Notably, the developers of the PRIME-MD went on to develop the PHQ (with slightly improved sensitivity and specificity for major depression) because the PRIME-MD was still considered too long to be clinically useful.36
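The likelihood ratios in the screening-instrument tables follow directly from sensitivity and specificity: LR+ = sensitivity/(1 − specificity) and LR− = (1 − sensitivity)/specificity. Multiplying the pre-test odds of depression by an LR gives the post-test odds. The Python sketch below illustrates both steps; the function names are ours, the MOS-D and PHQ-2 inputs come from the tables, and the 10% pre-test probability is an illustrative assumption rather than a figure from the source studies. Because some pooled LRs (ref. 62) come from different studies than the sensitivity and specificity figures, recomputed values will not always match the tabulated ones.

```python
"""Likelihood ratios and post-test probability for a screener (illustrative sketch)."""


def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) for a screener with the given sensitivity and specificity."""
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Update a pre-test probability with a likelihood ratio (Bayes' rule in odds form)."""
    pre_test_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


if __name__ == "__main__":
    # MOS-D: sensitivity 93%, specificity 72% (tabulated LR+ 3.3, LR- 0.10).
    lr_pos, lr_neg = likelihood_ratios(0.93, 0.72)
    print(f"MOS-D: LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")

    # Assumed 10% pre-test probability, updated with the tabulated PHQ-2 LRs.
    for label, lr in (("positive", 2.7), ("negative", 0.14)):
        p = post_test_probability(0.10, lr)
        print(f"PHQ-2 {label} screen: post-test probability {p:.1%}")
```

Under the assumed 10% pre-test probability, a positive PHQ-2 raises the probability of depression to roughly 23% and a negative one lowers it to about 1.5%, which illustrates why a positive ultra-short screen is a prompt for further assessment rather than a diagnosis.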
Screening for General Emotional Distress
One fundamental issue is whether screening should aim to identify distress rather than depression alone. Several popular tools screen for nonspecific psychiatric distress, including the General Health Questionnaire (GHQ),70 the Hopkins Symptom Checklist (HSCL),71 the World Health Organization Well-Being Scale (WHO-5),72 and the Emotional State Questionnaire-2 (EST-Q2)73 (Table 9.7). A prospective cohort study found that the WHO-5, a well-being screener, performed better in a primary care setting than the GHQ-12, the PHQ-9, or unaided physician diagnosis, with the CIDI as the gold standard for detecting depression.74 Although this approach is broad, brevity can be achieved by exploiting shared symptomatology and diagnostic comorbidity. The screener's specificity for any single disorder then matters less: it is left to the provider to distinguish, for example, major depression from posttraumatic stress disorder or other anxiety disorders. Because first-line primary care treatments for many of these disorders are similar (eg, pharmacotherapy with selective serotonin reuptake inhibitors), this approach could work reasonably well in primary care.
Screening for Multiple Disorders
For many providers, it may be worthwhile to implement a screener that covers many disorders with only one or two items per disorder. Means-Christensen and coworkers75 tested such an approach with the Anxiety and Depression Detector (ADD) and found that screening simultaneously for panic disorder, posttraumatic stress disorder, social phobia, generalized anxiety disorder, and major depression saved time while maintaining screener performance. The SDDS-PC and PRIME-MD (mentioned above) cover multiple disorders (including major depression) that are prevalent and often undetected in primary care settings. They also cover suicidality, an important consideration regardless of diagnosis.

Table 9.7. General Psychiatric Screening Instruments Commonly Used in Primary Care
Columns: scope of use; administration; scoring; performance; reference

WHO-5
Measures degree of well-being
5 items
Sensitivity: 94%; specificity: 65%; false negative: 0.06; PPV: 0.37; NPV: 0.98; LR+: 2.69; LR−: 0.09
Original citations: Bech P, Gudex C, Johansen KS. The WHO (Ten) Well-Being Index: validation in diabetes. Psychother Psychosom. 1996;65:183–190. Bech P, Olsen LR, Kjoller M, et al. Measuring well-being rather than the absence of distress symptoms: a comparison of the SF-36 Mental Health subscale and the WHO-Five Well-Being Scale. Int J Methods Psychiatr Res. 2003;12:85–91.

GHQ [60, 61, 63]
General psychiatric distress; frequency of symptoms in the past week
12, 28, or 30 items; 2–10 min to complete; literacy: easy; can be self-administered
Score range: 0–28; usual cut point: 4
Sensitivity: 76% [63]; specificity: 74% [63]; efficiency: 0.74 [63]; false positive: 0.25 [63]; false negative: 0.01 [63]; PPV: 13.0% [63]
Original citation: Goldberg DP. The detection of psychiatric illness by questionnaire. London: Oxford University Press; 1972.

HSCL [60, 61]
General distress; frequency of symptoms in the past week
13 or 25 items; 2–5 min to complete; literacy: average
Score range: 25–100; usual cut point: 43
Original citation: Derogatis LR, Lipman RS, Rickels K, Uhlenhuth EH, Covi L. The Hopkins Symptom Checklist (HSCL): a self-report symptom inventory. Behav Sci. 1974;19(1):1–15.

EST-Q2 [76]
Detection of symptoms characteristic of depressive and anxiety disorders during the past 4 weeks
28 items (depression subscale: 8 items); time to complete: unknown; literacy: unknown
Score range: 0–112; depression subscale score range: 0–32; usual cut point: >11
Sensitivity: 81% [76]; specificity: 81% [76]; false positive: 0.19 [76]; false negative: 0.19 [76]; PPV: 0.44 [76]; NPV: 0.96 [76]; LR+: 4.3 [76]; LR−: 0.23 [76]
Original citation: Aluoja A, Shlik J, Vasar V, Luuk K, Leinsalu M. Development and psychometric properties of the Emotional State Questionnaire, a self-report questionnaire for depression and anxiety. Nord J Psychiatry. 1999;53:443–449.

LR, likelihood ratio; NPV, negative predictive value; PPV, positive predictive value. Adapted from General Hospital Psychiatry, 24/4, Williams JW, Pignone M, Ramirez G, Perez Stellato C, Identifying depression in primary care: a literature synthesis of case-finding instruments, 225–237, Copyright (2002), with permission from Elsevier.
Severity Rating
Depression screening instruments have uses beyond case-finding. Certain instruments can also be used to monitor symptom levels (eg, frequency, severity) in at-risk patients or to evaluate treatment response and effectiveness. The most valuable instruments in these situations are those that provide severity levels (eg, Zung SDS). The practice of “watchful waiting” (see Fig. 9.1) involves following patients whose symptoms are subthreshold or otherwise insufficient for a clinical diagnosis of depression, yet suggest an increased risk of developing depression in the future. In this scenario, depression screeners can be administered repeatedly to monitor symptom levels and to detect changes and patterns over time (much as prostate-specific antigen levels are monitored). Patients who have been clinically diagnosed with depression and are receiving treatment can be given screeners routinely, both to assess treatment effectiveness and to determine whether additional interventions are required (the U.S. Preventive Services Task Force recommends the PHQ-9 for this purpose).
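To make serial monitoring concrete, the Python sketch below maps PHQ-9 totals to the severity bands published by Kroenke et al. (2001): 0–4 none, 5–9 mild, 10–14 moderate, 15–19 moderately severe, 20–27 severe. It then reports the band at each visit in a series. The rule that flags a move into a higher band is a hypothetical illustration of watchful waiting, not a validated clinical threshold, and the function and class names are ours.

```python
from dataclasses import dataclass

# PHQ-9 severity bands (Kroenke et al., 2001), ordered from least to most severe.
# Each entry is (lowest total score in the band, band label).
PHQ9_BANDS = [
    (0, "none"),
    (5, "mild"),
    (10, "moderate"),
    (15, "moderately severe"),
    (20, "severe"),
]


def phq9_band(score: int) -> int:
    """Return the index into PHQ9_BANDS for a PHQ-9 total (0-27)."""
    if not 0 <= score <= 27:
        raise ValueError(f"PHQ-9 total must be between 0 and 27, got {score}")
    index = 0
    for i, (lower_bound, _) in enumerate(PHQ9_BANDS):
        if score >= lower_bound:
            index = i
    return index


def phq9_severity(score: int) -> str:
    """Return the severity label for a PHQ-9 total."""
    return PHQ9_BANDS[phq9_band(score)][1]


@dataclass
class Visit:
    label: str
    score: int


def summarize_course(visits: list[Visit]) -> list[str]:
    """Describe each visit and flag any move into a higher severity band.

    The flagging rule is a hypothetical illustration, not a clinical guideline.
    """
    lines = []
    previous_band = None
    for visit in visits:
        band = phq9_band(visit.score)
        line = f"{visit.label}: {visit.score} ({phq9_severity(visit.score)})"
        if previous_band is not None and band > previous_band:
            line += " <- moved to a higher band; review"
        lines.append(line)
        previous_band = band
    return lines


if __name__ == "__main__":
    course = [Visit("baseline", 7), Visit("week 4", 12), Visit("week 12", 4)]
    for line in summarize_course(course):
        print(line)
```

Comparing band indices rather than raw scores keeps the flag insensitive to small fluctuations within a band; a practice could equally track absolute score changes.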
5. Implementing Screening in Primary Care
Implementation of a screening strategy must take into account both the performance characteristics of the screening instrument and the clinical context. In addition to overall staffing patterns and the underlying nonpsychiatric case mix, a key contextual issue is the estimated prevalence of depression in the clinic population. Prevalence, together with the instrument's sensitivity and specificity, allows one to estimate resource use under various implementation strategies. Such exercises can help identify the most parsimonious approach under different scenarios. Table 9.1a/b illustrates a one-stage screening approach using an instrument with sensitivity and specificity both of 80%. We present the results of using this instrument under two prevalence scenarios, 5% and 10% (see Appendix Tables 3 and 4 for additional scenarios). Assuming 5% prevalence, if 1,000 patients were screened for major depression, 230 would screen positive, but only 40 would actually have major depression (positive predictive value [PPV] 17.4%). That means that 190 false-positive patients would undergo diagnostic assessment for major depression, an excess diagnostic burden of 19% (190/1,000). As the table shows, PPV increases and excess diagnostic burden declines as prevalence increases.

Table 9.1a/b. Sample Performance Yields for Single-Stage Screening in Primary Care Setting

9.1a Prevalence: Low (5%, or 50 MDD cases)
              Gold standard MDD+    Gold standard MDD−    Total
Screen +      40 true positive      190 false positive    230 screen positive
Screen −      10 false negative     760 true negative     770 screen negative
Total         50 MDD positive       950 MDD negative      1,000 total sample
PPV: 40/230 = 17.4%. For every 100 subjects who screen positive, only approximately 17 would be depressed. Excess diagnostic burden: 190/1,000 = 19%. Diagnostic assessment would be performed on 190 patients who were not depressed.

9.1b Prevalence: Average (10%, or 100 MDD cases)
              Gold standard MDD+    Gold standard MDD−    Total
Screen +      80 true positive      180 false positive    260 screen positive
Screen −      20 false negative     720 true negative     740 screen negative
Total         100 MDD positive      900 MDD negative      1,000 total sample
PPV: 80/260 = 30.8%. For every 100 subjects who screen positive, only approximately 31 would be depressed. Excess diagnostic burden: 180/1,000 = 18%. Diagnostic assessment would be performed on 180 patients who were not depressed.

n = 1,000; screener sensitivity 80%, specificity 80%

Table 9.2a/b illustrates a two-stage approach using an initial screener with sensitivity of 95% and specificity of 60%, followed by a second screener with sensitivity of 80% and specificity of 80%. Assuming prevalence of 5%, of the 1,000 patients screened in the first stage, 428 will screen positive, of whom 48 are true positives (PPV 11.2%). In stage two, these 428 are screened again, yielding 38 true positives among 114 screen positives, a more favorable PPV of 33.3%. Combining stages 1 and 2 yields 38 true positives (76% sensitivity) and 874 true negatives (92% specificity), with a PPV of 33% and a negative predictive value (NPV) of 99%.

Table 9.2a/b. Sample Performance Yields for Two-Stage Screening in Primary Care Setting

9.2a Depression Prevalence: Low (5%, or 50 MDD cases)
Stage I
              Gold standard MDD+    Gold standard MDD−    Total
Screen +      48 true positive      380 false positive    428 screen positive
Screen −      2 false negative      570 true negative     572 screen negative
Total         50 MDD positive       950 MDD negative      1,000 total sample
PPV: 48/428 = 11.2%. For every 100 subjects who screen positive, approximately 11 would be depressed.

Stage II
              Gold standard MDD+    Gold standard MDD−    Total
Screen +      38 true positive      76 false positive     114 screen positive
Screen −      10 false negative     304 true negative     314 screen negative
Total         48 MDD positive       380 MDD negative      428 total sample
PPV: 38/114 = 33.3%. For every 100 subjects who screen positive, approximately 33 would be depressed. Overall excess diagnostic burden: 76/1,000 = 7.6%. Diagnostic assessment would be performed on 76 patients who were not depressed.

9.2b Depression Prevalence: Average (10%, or 100 MDD cases)
Stage I
              Gold standard MDD+    Gold standard MDD−    Total
Screen +      95 true positive      360 false positive    455 screen positive
Screen −      5 false negative      540 true negative     545 screen negative
Total         100 MDD positive      900 MDD negative      1,000 total sample
PPV: 95/455 = 20.9%. For every 100 screen positives, approximately 21 would be depressed.

Stage II
              Gold standard MDD+    Gold standard MDD−    Total
Screen +      76 true positive      72 false positive     148 screen positive
Screen −      19 false negative     288 true negative     307 screen negative
Total         95 MDD positive       360 MDD negative      455 total sample
PPV: 76/148 = 51.4%. For every 100 screen positives, approximately 51 would be depressed. Overall excess diagnostic burden: 72/1,000 = 7.2%. Diagnostic assessment would be performed on 72 patients who were not depressed.

n = 1,000; stage I screener sensitivity 95%, specificity 60%; stage II screener sensitivity 80%, specificity 80%
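The counts in Tables 9.1 and 9.2 can be reproduced from prevalence, sensitivity, and specificity alone. The Python sketch below does this for the one-stage and two-stage designs, sending only stage I positives to stage II and counting stage I negatives as final negatives. It rounds each cell half up to a whole patient (so 47.5 becomes 48, as in Table 9.2a); that rounding rule is our assumption about how the tables were built, and the helper names are hypothetical.

```python
import math
from dataclasses import dataclass


def whole(x: float) -> int:
    """Round half up to a whole number of patients (47.5 -> 48, as in Table 9.2a)."""
    return math.floor(x + 0.5)


@dataclass
class TwoByTwo:
    """Screen result versus gold-standard diagnosis for one screening step."""
    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def screen_positive(self) -> int:
        return self.tp + self.fp

    @property
    def ppv(self) -> float:
        return self.tp / self.screen_positive


def screen(cases: int, non_cases: int, sens: float, spec: float) -> TwoByTwo:
    """Apply a screener with the given sensitivity and specificity to a group."""
    tp = whole(cases * sens)
    fp = whole(non_cases * (1.0 - spec))
    return TwoByTwo(tp=tp, fp=fp, fn=cases - tp, tn=non_cases - fp)


def two_stage(n: int, prevalence: float,
              stage1: tuple[float, float],
              stage2: tuple[float, float]) -> tuple[TwoByTwo, TwoByTwo, TwoByTwo]:
    """Screen everyone with stage I, then rescreen only the stage I positives.

    Returns the stage I table, the stage II table, and the combined result
    (stage I negatives count as final negatives).
    """
    cases = whole(n * prevalence)
    first = screen(cases, n - cases, *stage1)
    second = screen(first.tp, first.fp, *stage2)
    overall = TwoByTwo(tp=second.tp, fp=second.fp,
                       fn=first.fn + second.fn, tn=first.tn + second.tn)
    return first, second, overall


if __name__ == "__main__":
    n = 1000
    for prevalence in (0.05, 0.10):
        cases = whole(n * prevalence)
        single = screen(cases, n - cases, 0.80, 0.80)
        print(f"{prevalence:.0%} one-stage: {single.screen_positive} positive, "
              f"PPV {single.ppv:.1%}, excess burden {single.fp / n:.1%}")
        first, second, overall = two_stage(n, prevalence, (0.95, 0.60), (0.80, 0.80))
        sens = overall.tp / cases
        spec = overall.tn / (n - cases)
        npv = overall.tn / (overall.tn + overall.fn)
        print(f"{prevalence:.0%} two-stage: stage I PPV {first.ppv:.1%}, "
              f"stage II PPV {second.ppv:.1%}, sensitivity {sens:.0%}, "
              f"specificity {spec:.0%}, NPV {npv:.0%}")
```

At 5% prevalence this reproduces the figures quoted in the text: 230 one-stage positives with a PPV of 17.4%, and a two-stage result of 76% sensitivity, 92% specificity, a stage II PPV of 33.3%, and an NPV of about 99%.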
Table 9.3 assigns time costs to the various screening tasks, as well as to diagnostic assessment of screen-positive patients. In this table, we estimate patient, staff, and clinician time under the various screening scenarios (prevalence of 5%, 10%, and 20%; one- and two-stage screening approaches). We assume the same sensitivity and specificity of the screening instruments as in Tables 9.1 and 9.2. In a sample of 1,000 patients where the prevalence of depression is 5%, we estimate the burden in patient time for a single-stage screener to be 6,600 minutes. This is based on an estimate of 2 minutes per patient for the initial screen, with 20 additional minutes for each screen-positive patient. We estimate 2,000 minutes of non-physician staff time (based on 2 minutes per patient) and 4,600 minutes of clinician time (based on 20 minutes per screen-positive patient). In the single-screener example, patient burden and provider burden increase with increasing prevalence. We provide similar time estimates for the two-stage screening approach. Here, patient burden decreases (because the initial screener takes half the time of the more comprehensive second-stage screener), staff burden decreases slightly at prevalences of 5% and 10% but increases slightly at 20% prevalence, and provider time decreases substantially (because there are fewer false positives to evaluate). In these examples, we have emphasized tangible costs and have not estimated the costs of non-detection (false negatives) or of treatment.

Table 9.3. Screening and Diagnosis Time Burden for Patients, Staff, and Providers
Time burden in minutes; n = 1,000

MDD prevalence                      5%                  10%                 20%

A. Single-stage screening approach (sensitivity 80%, specificity 80%)
Screening (patient)                 2,000 (1000×2)      2,000 (1000×2)      2,000 (1000×2)
Scoring (staff)                     2,000 (1000×2)      2,000 (1000×2)      2,000 (1000×2)
Screening yield                     23.0% (230/1000)    26.0% (260/1000)    32.0% (320/1000)
Diagnostic interview (patient)      4,600 (230×20)      5,200 (260×20)      6,400 (320×20)
Diagnostic interview (provider)     4,600 (230×20)      5,200 (260×20)      6,400 (320×20)
Positive predictive value           17.4% (40/230)      30.8% (80/260)      50.0% (160/320)
Total time, patient                 6,600               7,200               8,400
Total time, staff                   2,000               2,000               2,000
Total time, provider                4,600               5,200               6,400

B1. Two-stage screening approach, stage I (sensitivity 95%, specificity 60%)
Screening (patient)                 1,000 (1000×1)      1,000 (1000×1)      1,000 (1000×1)
Scoring (staff)                     1,000 (1000×1)      1,000 (1000×1)      1,000 (1000×1)
Screening yield                     42.8% (428/1000)    45.5% (455/1000)    51.0% (510/1000)

B2. Two-stage screening approach, stage II (sensitivity 80%, specificity 80%)
Screening (patient)                 856 (428×2)         910 (455×2)         1,020 (510×2)
Scoring (staff)                     856 (428×2)         910 (455×2)         1,020 (510×2)
Screening yield                     26.6% (114/428)     32.5% (148/455)     42.4% (216/510)
Diagnostic interview (patient)      2,280 (114×20)      2,960 (148×20)      4,320 (216×20)
Diagnostic interview (provider)     2,280 (114×20)      2,960 (148×20)      4,320 (216×20)
Positive predictive value           33.3% (38/114)      51.4% (76/148)      70.4% (152/216)

Totals, stages I and II combined
Total time, patient                 4,136               4,870               6,340
Total time, staff                   1,856               1,910               2,020
Total time, provider                2,280               2,960               4,320
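The arithmetic behind Table 9.3 can be sketched the same way. The per-patient minutes below (2 minutes for the single-stage screener, 1 and 2 minutes for the two stages, 20 minutes per diagnostic interview) are the chapter's assumptions; the function names and the half-up rounding of patient counts are ours.

```python
import math

# Time assumptions from Table 9.3 (minutes per patient).
SINGLE_STAGE_SCREEN_MIN = 2    # one-stage screener: patient time, and equal staff scoring time
STAGE_I_SCREEN_MIN = 1         # brief stage I screener
STAGE_II_SCREEN_MIN = 2        # stage II screener, given only to stage I positives
DIAGNOSTIC_INTERVIEW_MIN = 20  # per screen-positive patient (patient and provider time)


def whole(x: float) -> int:
    """Round half up to a whole number of patients."""
    return math.floor(x + 0.5)


def positives(cases: int, non_cases: int, sens: float, spec: float) -> tuple[int, int]:
    """Return (true positives, false positives) for one screening step."""
    return whole(cases * sens), whole(non_cases * (1.0 - spec))


def single_stage_minutes(n: int, prevalence: float) -> dict[str, int]:
    """Patient, staff, and provider minutes for the 80%/80% one-stage approach."""
    cases = whole(n * prevalence)
    tp, fp = positives(cases, n - cases, 0.80, 0.80)
    screening = n * SINGLE_STAGE_SCREEN_MIN
    interviews = (tp + fp) * DIAGNOSTIC_INTERVIEW_MIN
    return {"patient": screening + interviews, "staff": screening, "provider": interviews}


def two_stage_minutes(n: int, prevalence: float) -> dict[str, int]:
    """Minutes for stage I (95%/60%) followed by stage II (80%/80%) on stage I positives."""
    cases = whole(n * prevalence)
    tp1, fp1 = positives(cases, n - cases, 0.95, 0.60)
    tp2, fp2 = positives(tp1, fp1, 0.80, 0.80)
    screening = n * STAGE_I_SCREEN_MIN + (tp1 + fp1) * STAGE_II_SCREEN_MIN
    interviews = (tp2 + fp2) * DIAGNOSTIC_INTERVIEW_MIN
    return {"patient": screening + interviews, "staff": screening, "provider": interviews}


if __name__ == "__main__":
    for prevalence in (0.05, 0.10, 0.20):
        one = single_stage_minutes(1000, prevalence)
        two = two_stage_minutes(1000, prevalence)
        print(f"{prevalence:.0%}: single-stage {one}; two-stage {two}")
```

Running it reproduces the totals in Table 9.3, for example 6,600/2,000/4,600 patient/staff/provider minutes for the single-stage approach and 4,136/1,856/2,280 for the two-stage approach at 5% prevalence. Changing the minute constants is a quick way to test how sensitive the comparison is to local staffing assumptions.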
6. What Developments Are on the Horizon?
Opinions about the appropriateness of screening for depression in primary care have shifted over the past two decades. As more effective treatments have become available to primary care providers, and as providers have become more aware of the importance of recognizing and treating depression, opinion has moved toward advocating routine screening in primary care settings. Many patients, however, are still unwilling to accept a diagnosis of depression or treatment for it. Clinicians need to explain screening and diagnostic results in a non-stigmatizing way, offer educational information, and motivate patients to accept treatment. Building depression treatment capacity within primary care may increase patient acceptance of both the diagnosis and the treatment, because treatment in primary care is seen as less stigmatizing, more timely, and better integrated into overall healthcare. Over the past two decades, remarkable progress has been made in screening for depression in primary care. This progress is reflected in the change in U.S. Preventive Services Task Force guidance,30 which recommends screening adults “in clinical practices with systems in place to assure accurate diagnosis, effective treatment, and follow-up.” It is also reflected in the many guidelines for detecting and treating depression in clinical practice, and in the tools developed to support them. Clear advances have been made in reducing the burden of screening tools, so that some instruments with excellent performance characteristics are as short as two questions. With depression increasingly accepted as a treatable illness by both patients and providers, parallel gains are needed in implementing screening and early detection. Further improvements in the benefit–cost ratio of screening will have to come from better treatment outcomes or shorter screening time.
Psychometricians will be hard-pressed to develop screening tools briefer than the current ultra-brief instruments described in Table 9.6, but new methods might be able to focus on those who most need help. For example, New Zealand researchers found that following the two PRIME-MD screening items with the question “Is this something with which you would like help?” improved specificity from 78% to 89%, with sensitivity unchanged at 96% (CIDI as the gold standard).77 In theory, this would improve efficiency by selecting patients who are likely to accept treatment for depression. Another possibility is to reduce clinician and staff time by changing the screening modality. For example, waiting rooms could contain carrels with computers where patients update their histories and answer screening questions; notable results could be flagged and printed for clinicians to address with the patient in the exam room. Patients could likewise complete these updates and self-screens on their home computers, with results sent automatically and confidentially to providers. Automated computer reminders for clinicians to perform depression (and other) screens could also improve efficiency, as would using trained nurses to administer the screens and flag positive results for the provider (see Chapter 8 for further discussion). Depending on the practice, a two-stage screening process is another possibility, using extremely brief first-level screens followed by more intensive second-level diagnostic assessments when indicated. Such second-level assessments could even include self-administered instruments that are considered diagnostic in nature, such as the PRIME-MD or the SDDS-PC. Considering depression screening in the context of other psychiatric illness may broaden our notion of screening effectiveness. For example, depressive symptoms frequently co-occur with generalized anxiety, post-traumatic stress, and substance use disorders.
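The practical effect of the added help question can be gauged by recomputing PPV from the reported sensitivity (96%) and specificities (78% and 89%). The Python sketch below assumes a 10% prevalence, the “average” scenario used in Tables 9.1b and 9.2b, not the prevalence observed in the New Zealand study; the function names are ours.

```python
def ppv(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Positive predictive value from prevalence, sensitivity, and specificity."""
    true_positive_share = prevalence * sensitivity
    false_positive_share = (1.0 - prevalence) * (1.0 - specificity)
    return true_positive_share / (true_positive_share + false_positive_share)


def screen_positives_per_1000(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Expected number of positive screens per 1,000 patients screened."""
    return 1000 * (prevalence * sensitivity + (1.0 - prevalence) * (1.0 - specificity))


if __name__ == "__main__":
    prevalence = 0.10  # illustrative assumption, not the study prevalence
    for label, specificity in (("two questions", 0.78), ("plus help question", 0.89)):
        print(
            f"{label}: PPV {ppv(prevalence, 0.96, specificity):.1%}, "
            f"{screen_positives_per_1000(prevalence, 0.96, specificity):.0f} positives per 1,000"
        )
```

At the assumed 10% prevalence, PPV rises from about 33% to about 49%, and the number of patients needing a diagnostic interview per 1,000 screened falls from about 294 to about 195, with no loss of sensitivity.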
False-positive screening results for depression may be less worrisome because such patients may have one of these three disorders. Thus, a positive screen, though false positive for depression, may in essence correctly identify patients in need of mental health treatment, even if not for depression. Timing is yet another way to improve efficiency. Screening less often (eg, every 2 to 5 years instead of every year) would reduce the cost of screening itself, but at the expense of a lower detection rate. Approaches could be developed that use patient profiles to target screening to those at highest risk. Similarly, prior screening results (eg, subthreshold scores, positive screens for other mental health conditions, or answers to highly predictive questions) could be used to generate a screening frequency algorithm. In an era of electronic medical records and computer-generated clinical reminders, developing and implementing such frequency algorithms based on individual profiles may not be as far away as it once seemed.
7. Conclusions
Screening for depression in primary care has changed radically in the past 20 years. With improvements in depression treatment, reduced stigma, wider acceptance of depression as a treatable illness, and more efficient screening tools, primary care providers have embraced the notion that they are responsible for recognizing and treating this condition. Fortunately, providers have many excellent screening tools from which to choose. For additional efficiencies to be realized, advances in technology (eg, computerized screening and scoring), along with improved treatment outcomes, will be needed to shift the benefit–cost ratio of depression screening further in its favor.
References
1. Institute of Medicine (IOM). A manpower policy for primary health care. Washington, DC: National Academy of Sciences; 1978.
2. Starfield B. Primary care: concept, evaluation, and policy. New York: Oxford University Press; 1992.
3. Culpepper L. The active management of depression. J Fam Pract. 2002;51:769–776.
4. Mechanic D, McAlpine DD, Rosenthal M. Are patients’ office visits with physicians getting shorter? N Engl J Med. 2001;344(3):198–204.
5. Kessler RC, Chiu WT, Demler O, et al. Prevalence, severity, and comorbidity of 12-month DSM-IV disorders in the National Comorbidity Survey Replication. Arch Gen Psychiatry. 2005;62(6):617–627.
6. Alonso J, Angermeyer MC, Bernert S, et al. 12-Month comorbidity patterns and associated factors in Europe: results from the European Study of the Epidemiology of Mental Disorders (ESEMeD) project. Acta Psychiatr Scand. 2004;109(s420):28–37.
7. Wittchen H-U, Jacobi F. Size and burden of mental disorders in Europe – a critical review and appraisal of 27 studies. Eur Neuropsychopharmacol. 2005;15(4):357–376.
8. Depression Guideline Panel. Depression in primary care, vol 1. Detection and diagnosis. Clinical Practice Guideline, No. 5. Rockville, MD: DHHS Pub Hlth Serv; 1993. AHCPR Publication No. 93-0550.
9. Goldberg D, Lecrubier Y. Chapter 4.1. Form and frequency of mental disorders across centres. In: Mental illness in general health care: an international study. Chichester: John Wiley and Sons; 1995.
10. Backenstrass M, Frank A, Joest K, et al. A comparative study of nonspecific depressive symptoms and minor depression regarding functional impairment and associated characteristics in primary care. Compr Psychiatry. 2006;47(1):35–41.
11. Regier D, Narrow W, Rae D, et al. The de facto US mental and addictive disorders service system. Epidemiologic Catchment Area prospective 1-year prevalence rates of disorders and services. Arch Gen Psychiatry. 1993;50:85–94.
12. Regier D, Goldberg I, Taube C. The de facto US mental health services system: a public health perspective. Arch Gen Psychiatry. 1978;35(6):685–693.
13. Kessler RC, Berglund P, Demler O, et al. The epidemiology of major depressive disorder: results from the National Comorbidity Survey Replication (NCS-R). JAMA. 2003;289(23):3095–3105.
14. Harman JS, Veazie PJ, Lyness JM. Primary care physician office visits for depression by older Americans. J Gen Intern Med. 2006;21(9):926–930.
15. Pincus HA, Tanielian TL, Marcus SC, et al. Prescribing trends in psychotropic medications: primary care, psychiatry, and other medical specialties. JAMA. 1998;279(7):526–531.
16. Bridges K, Goldberg D. Somatic presentation of DSM III psychiatric disorders in primary care. J Psychosom Res. 1985;29:563–569.
17. Magruder-Habib K, Zung W, Feussner J. Improving physicians’ recognition and treatment of depression in general medical care: results of randomized clinical trial. Med Care. 1990;28(3):239–250.
18. Wilson D, Widmer R, Cadoret R, et al. Somatic symptoms: a major feature of depression in a family practice. J Affect Disord. 1983;5:299–307.
19. Ustun T, Von Korff M. Chapter 4.3. Primary mental health services: access and provision of care. In: Mental illness in general health care: an international study. Chichester: John Wiley and Sons; 1995.
20. Murray C, Lopez A, eds. The global burden of disease: a comprehensive assessment of mortality and disability from diseases, injuries and risk factors in 1990 and projected to 2020. Cambridge, MA: Harvard University Press on behalf of the World Health Organization and the World Bank; 1996.
21. Ustun TB, Ayuso-Mateos JL, Chatterji S, et al. Global burden of depressive disorders in the year 2000. Br J Psychiatry. 2004;184:386–392.
22. Simon G, Ormel J, VonKorff M, et al. Health care costs associated with depressive and anxiety disorders in primary care. Am J Psychiatry. 1995;152(3):352–357.
23. Greenberg P, Stiglin L, Finkelstein S, et al. Depression: a neglected major illness. J Clin Psychiatry. 1993;54(11):419–424.
24. Manderscheid RW, Rae DS, Narrow WE, et al. Congruence of service utilization estimates from the Epidemiologic Catchment Area Project and other sources. Arch Gen Psychiatry. 1993;50(2):108–114.
25. Beardsley RS, Gardocki GJ, Larson DB, et al. Prescribing of psychotropic medication by primary care physicians and psychiatrists. Arch Gen Psychiatry. 1988;45(12):1117–1119.
26. Simon GE, VonKorff M, Wagner EH, et al. Patterns of antidepressant use in community practice. Gen Hosp Psychiatry. 1993;15(6):399–408.
27. Wells KB, Stewart A, Hays RD, et al. The functioning and well-being of depressed patients. Results from the Medical Outcomes Study. JAMA. 1989;262(7):914–919.
28. Broadhead WE, Blazer DG, George LK, et al. Depression, disability days, and days lost from work in a prospective epidemiologic survey. JAMA. 1990;264(19):2524–2528.
29. Wells KB, Golding J, Burnam MA. Psychiatric disorder and limitations in physical functioning in a general population. Am J Psychiatry. 1988;145:712–717.
30. Guide to clinical preventive services. AHRQ Publication No. 06-0588. Rockville, MD: Agency for Healthcare Research and Quality; 2006. Available at: http://www.ahrq.gov/clinic/pocketgd.htm.
31. Wang PS, Berglund P, Olfson M, et al. Failure and delay in initial treatment contact after first onset of mental disorders in the National Comorbidity Survey Replication. Arch Gen Psychiatry. 2005;62(6):603–613.
32. Zung WW. A self-rating depression scale. Arch Gen Psychiatry. 1965;12:63–70.
33. Burnam MA, Wells KB, Leake B, Landsverk J. Development of a brief screening instrument for detecting depressive disorders. Med Care. 1988;26:775–789.
34. Kroenke K, Spitzer RL, Williams JBW. The Patient Health Questionnaire-2: validity of a two-item depression screener. Med Care. 2003;41:1284–1292.
35. Lecrubier Y, Sheehan DV, Weiller E, Amorim P, Bonora I, Harnett Sheehan K, Janavs J, Dunbar GC. The Mini International Neuropsychiatric Interview (MINI). A short diagnostic structured interview: reliability and validity according to the CIDI. Eur Psychiatry. 1997;12:224–231.
36. Spitzer RL, Williams JB, Kroenke K, et al. Utility of a new procedure for diagnosing mental disorders in primary care. The PRIME-MD 1000 study. JAMA. 1994;272:1749–1756.
37. Broadhead WE, Leon AC, et al. Development and validation of the SDDS-PC screen for multiple mental disorders in primary care. Arch Fam Med. 1995;4:211–219.
38. Bermejo I, Niebling W, Berger M, et al. Patients’ and physicians’ evaluation of the PHQ-D for depression screening. Primary Care and Community Psychiatry. 2005;10(4):125–131.
39. Loerch B, Szegedi A, Kohnen R, et al. The primary care evaluation of mental disorders (PRIME-MD), German version: a comparison with the CIDI. J Psychiatr Res. 2000;34(3):211–220.
40. Ormel J, Von Korff M, Oldehinkel A, et al. Onset of disability in depressed and nondepressed primary care patients. Psychol Med. 1999;29:847–853.
41. American Psychiatric Association. Diagnostic and statistical manual of mental disorders. 4th ed., text rev. Washington, DC: American Psychiatric Association; 2000.
42. Sherman L. Depression and medical illness. Audio Digest Psychiatry. 2004;33(16):1–6.
43. Halfin A. Depression: the benefits of early and appropriate treatment. Am J Manag Care. 2007;13:S92–S97.
44. Coulehan J, Schulberg H, Block M, et al. Treating depressed primary care patients improves their physical, mental, and social functioning. Arch Intern Med. 1997;157:1113–1120.
45. Rost K, Smith J, Dickinson M. The effect of improving primary care depression management on employee absenteeism and productivity. A randomized trial. Med Care. 2004;42:1202–1210.
46. Eaton WW, Badawi M, Melton B. Prodromes and precursors: epidemiologic data for primary prevention of disorders with slow onset. Am J Psychiatry. 1995;152(7):967–972.
47. Lyness JM, Heo M, Datto CJ, et al. Outcomes of minor and subsyndromal depression among elderly patients in primary care settings. Ann Intern Med. 2006;144(7):496–504.
48. Wells KB, Burnam MA, Rogers W, et al. The course of depression in adult outpatients. Arch Gen Psychiatry. 1992;49:788–794.
49. Cuijpers P, Smit F. Subthreshold depression as a risk indicator for major depressive disorder: a systematic review of prospective studies. Acta Psychiatr Scand. 2004;109(5):325–331.
50. Seligman MEP, Schulman P, DeRubeis RJ, et al. The prevention of depression and anxiety. Prevention & Treatment. 1999;2(1).
51. Simon G, Goldberg D, Tiemens B, et al. Outcomes of recognized and unrecognized depression in an international primary care study. Gen Hosp Psychiatry. 1999;21(2):97–105.
52. Magruder KM, Calderone GE. Public health consequences of different thresholds for the diagnosis of mental disorders. Compr Psychiatry. 2000;41(2, Suppl 1):14–18.
53. Valenstein M, Vijan S, Zeber JE, et al. The cost-utility of screening for depression in primary care. Ann Intern Med. 2001;134(5):345–360.
54. Pálsson S, Östling S, Skoog I. The incidence of first-onset depression in a population followed from the age of 70 to 85. Psychol Med. 2001;31:1159–1168.
55. Mitchell AJ, Coyne JC. Do ultra-short screening instruments accurately detect depression in primary care? A pooled analysis and meta-analysis of 22 studies. Br J Gen Pract. 2007;57:144–151.
56. Beck AT, Ward CH, Mock J, Erbaugh J. An inventory for measuring depression. Arch Gen Psychiatry. 1961;4:561–571.
57. Radloff LS. The CES-D scale: a self-report depression scale for research in the general population. Appl Psychol Meas. 1977;1:385–401.
58. Yesavage JA, Brink TL, Rose TL, Lum O, Huang V, Adey M, Leirer VO. Development and validation of a geriatric depression screening scale: a preliminary report. J Psychiatr Res. 1982–83;17(1):37–49.
59. Popoff LM. A simple method for diagnosis of depression by the family physician. Clinical Medicine. 1969 March:24–29.
60. Williams JW, Pignone M, Ramirez G, Perez Stellato C. Identifying depression in primary care: a literature synthesis of case-finding instruments. Gen Hosp Psychiatry. 2002;24(4):225–237.
61. Mulrow CD, Williams JW, Gerety MB, Ramirez G, Montiel OM, Kerber C. Case-finding instruments for depression in primary care settings. Ann Intern Med. 1995;122(12):913–921.
62. Nease DE Jr, Malouin JM. Depression screening: a practical strategy. (Applied evidence: research findings that are changing clinical practice). J Fam Pract. 2003;52(2):118(8).
63. McAlpine DD, Wilson AR. Screening for depression in primary care: what do we still need to know? Depress Anxiety. 2004;19(3):137–145.
64. Whooley M, Avins A, Miranda J, et al. Case-finding instruments for depression: two questions are as good as many. J Gen Intern Med. 1997;12(7):439–445.
65. Zigmond AS, Snaith RP. The hospital anxiety and depression scale. Acta Psychiatr Scand. 1983;67:361–370.
66. Schade CP, Jones ER Jr, Wittlin BJ. A ten-year review of the validity and clinical utility of depression screening. Psychiatr Serv. 1998;49(1):55–61.
67. Gilbody S, Whitty P, Grimshaw J, et al. Improving the recognition and management of depression in primary care. Effective Health Care Bull. 2002;7(5).
68. Weissman MM, Broadhead WE, Olfson M, et al. A diagnostic aid for detecting (DSM-IV) mental disorders in primary care. Gen Hosp Psychiatry. 1998;20(1):1–11.
69. Weissman M, Olfson M, Leon AC, et al. Brief diagnostic interviews (SDDS-PC) for multiple mental disorders in primary care: a pilot study. Arch Fam Med. 1995;4(3):220–227.
70. Goldberg DP. The detection of psychiatric illness by questionnaire. London: Oxford University Press; 1972.
71. Derogatis LR, Lipman RS, Rickels K, Uhlenhuth EH, Covi L. The Hopkins Symptom Checklist (HSCL): a self-report symptom inventory. Behav Sci. 1974;19(1):1–15.
72. Bech P, Olsen LR, Kjoller M, Rasmussen NK. Measuring well-being rather than the absence of distress symptoms: a comparison of the SF-36 Mental Health subscale and the WHO-Five Well-Being Scale. Int J Methods Psychiatr Res. 2003;12:85–91.
73. Aluoja A, Shlik J, Vasar V, Luuk K, Leinsalu M. Development and psychometric properties of the Emotional State Questionnaire, a self-report questionnaire for depression and anxiety. Nord J Psychiatry. 1999;53:443–449.
9 SCREENING FOR DEPRESSION IN PRIMARY CARE
189
74. Henkel V, Mergl R, Kohnen R, et al. Identifying depression in primary care: a comparison of different methods in a prospective cohort study. BMJ. 2003;326(7382):200–201. 75. Means-Christensen AJ, Sherbourne CD, Roy-Byrne PP, Craske MG, and Stein MB. Using five questions to screen for five common mental disorders in primary care: diagnostic accuracy of the Anxiety and Depression Detector. General Hospital Psychiatry 2006; 28(2): 108–111. ¨ o¨pik P, Aluoja A, Kalda R, et al. Screening for depression in primary care. Fam Pract. 76. O 2006;23(6):693–698. 77. Arroll B, Goodyear-Smith F, Kerse N, et al. Effect of the addition of a ‘‘help’’ question to two screening questions on specificity for diagnosis of depression in general practice: diagnostic validity study. BMJ. 2005;331:884–887.
This page intentionally left blank
10 SCREENING FOR DEPRESSION IN MEDICAL SETTINGS: ARE SPECIFIC SCALES USEFUL? Gordon Parker and Matthew Hyett
1. An Introductory Logic
2. Depression in the Medically Ill
3. “False-Positive” Depression Reflecting Confounding by Physical Symptoms Associated with Medical Illness
4. Screening Measures Used to Assess Depression in the Medically Ill
5. Discussion
Context
There are two broad strategies for screening and quantifying depression in medical settings. The first relies upon measures developed in psychiatric samples. The second concedes that symptoms are substantially different in the medically ill and develops customized scales. Here we discuss the merits of several specific scales for measuring depression in medical settings and make the case for scales tailored to specific populations. A subsequent chapter (Babaei and Mitchell) presents a contrasting position.
1. An Introductory Logic
There are two broad strategies for screening and quantifying depression in medical settings. The first approach involves using measures developed in psychiatric samples and assuming that their relevance holds. The second approach is to concede that there are intrinsic limitations to extrapolating those “general” measures to medically ill populations. In the former case the hypothesis is that symptoms of depression are essentially the same when depression occurs with and without physical illness. In the latter case the hypothesis is that the symptoms are substantially different.
Two key concerns support pursuing the latter. Firstly, the former approach assumes some constancy to the nature of depression across differing psychiatric and medical settings. Depression, however, is difficult enough to define in psychiatric patient samples. Even setting aside the debate as to whether depression is best viewed as comprising a set of subtypes or modeled along a continuum, quantifying clinical depression remains problematic, as detailed elsewhere in this book. Over the past few decades, clinical depression has most commonly been viewed as synonymous with major depression. However, numerous studies have shown comparable symptomatic distress and disability in major depression, minor depression, and even subsyndromal depression.1,2 This raises an obvious question: Can imposing a cutoff score on a dimensional measure of depression accurately distinguish true cases from true non-cases in a psychiatric sample? Further, assuming that a cutoff is derived with an acceptable classification rate, can we extrapolate decision rules derived from psychiatric samples to screen and quantify depression caseness in the medically ill? Measures that have been widely used for decades (such as the Zung and the Beck Depression Inventory) generate widely differing cutoff scores across psychiatric, general practice, and medical settings. There would therefore appear to be quantitative and possibly qualitative differences in the nature of depression in medical contexts, making the extrapolation of general measures problematic.
The second issue of concern is a methodologic one. Many measures used to assess depression in psychiatric samples weight features such as fatigue, anergia, anhedonia, and loss of interest, as well as appetite and sleep changes.
However, it is quite possible for nondepressed patients with a medical illness to rate positively on such items purely as a consequence of their physical problem or of the drugs being used to treat the medical condition, or even of being hospitalized. Such confounding clearly risks false-positive scores, which then will inflate case identification and severity estimates. This issue also requires some consideration.
2. Depression in the Medically Ill
The 12-month prevalence and odds of major depression are high in individuals with chronic medical conditions, and major depression is associated with significant increases in health resource utilization, lost productivity, and functional disability.3 Those with a medical illness may have a co-occurring depressive illness (melancholic or nonmelancholic) that is similar in all regards to the depressive conditions observed in a psychiatric context. However, many with a medical illness will instead have a grief-like reaction to the medical illness per se. Here, rather than experiencing the primary defining feature of depression (a loss of self-esteem or self-worth), as might be expected in clinical depression, they may be grieving the loss of their previous healthy role while having no impairment of self-esteem. In addition, medical illness itself can cause psychological features approaching the phenomenology of depression. Cassell4 has emphasized (i) disconnection from the usual world, (ii) a loss of the sense of indestructibility or omnipotence, (iii) a loss of competence and completeness of reason, and (iv) a loss of control of the sufferer’s world. He notes that, as illness deepens, medically ill people become more and more withdrawn from their usual world, their previous interests, friends, and families, reflecting that “We exist to the extent that we are connected.” When medically ill patients experience such feelings, they will frequently develop irritability, anxiety, fear, and even depression. The disconnection can occur rapidly after events such as a myocardial infarction or severe trauma, or gradually following the development of a chronic disease or long-term illness.4 The loss of the sense of omnipotence is commonly handled by denial and/or disavowal as the individual seeks to preserve his or her intactness. The loss of control, in which patients perceive themselves as helpless, can be one of the most distressing of human experiences. According to Cassell,4 such features are the experience of illness itself. While they sometimes approximate depressive phenomenology, they can be distinguished by careful clinical inquiry, though not always by simple screening measures. In essence, there is a distinction between the experiential components of illness and depression. Thus, in screening for depression in the medically ill, there is a need to ensure that items are not confounded by questions that risk false-positive responses from those with a nondepressive illness.
3. “False-Positive” Depression Reflecting Confounding by Physical Symptoms Associated with Medical Illness
As noted earlier, individuals with many medical conditions might be expected to report features such as loss of interest, anergia, and sleep and appetite disturbances, which, if secondary to the medical illness and not a reflection of depression, will tend to inflate depression estimates. A number of options have been proposed to redress such confounding influences. Several authors5 have argued for an inclusive approach. Here, every relevant depressive symptom is counted even if secondary to the illness or its management, with or without subsequent adjustment of threshold scores to calibrate caseness estimates. A contrasting exclusive approach6 ignores features common to those with medical illness. A third, substitutive approach7 involves substituting psychological symptoms (eg, tearfulness and social withdrawal) for vegetative symptoms (eg, weight loss, appetite and sleep disturbance, fatigue, and concentration difficulties). Fourthly, both DSM-III-R and DSM-IV decision rules allow an etiologic approach, whereby symptoms are counted only if they are judged as not being caused by a general medical condition. This last approach requires the rater to make interpretative (and thus subjective) judgments. Common sense would suggest advantages in having measures of depression in medical settings that assess items defining depression per se and that are unlikely to be confounded by aspects of the medical illness or of its treatment. Such an approach therefore rejects the use of general depression measures and argues for consideration only of measures that have been designed to preempt confounding influences. We now review measures that have been widely advanced and/or specifically designed for measuring depression in medical settings.
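The four counting rules differ only in which endorsed symptoms are allowed to count toward caseness. The following Python sketch applies each rule to one invented patient. The `Symptom` fields, the symptom list, and the `SUBSTITUTES` pairing (weight loss with tearfulness, fatigue with social withdrawal) are hypothetical illustrations, not items or rules taken from DSM or from any published instrument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symptom:
    name: str
    somatic: bool                # hypothetical flag: could arise from the physical illness itself
    judged_due_to_illness: bool  # rater's etiologic judgment (subjective by design)


# Hypothetical pairing for the substitutive approach: a somatic item is replaced
# by a psychological alternative that is assessed separately from `symptoms`.
SUBSTITUTES = {
    "weight loss": "tearfulness",
    "fatigue": "social withdrawal",
}


def count_inclusive(symptoms):
    """Count every symptom present, whatever its presumed cause."""
    return len(symptoms)


def count_exclusive(symptoms):
    """Ignore symptoms common in medical illness (the somatic items)."""
    return sum(1 for s in symptoms if not s.somatic)


def count_substitutive(symptoms, endorsed_substitutes):
    """Drop somatic items, crediting each only if its psychological substitute is endorsed."""
    count = 0
    for s in symptoms:
        if not s.somatic:
            count += 1
        elif SUBSTITUTES.get(s.name) in endorsed_substitutes:
            count += 1
    return count


def count_etiologic(symptoms):
    """Count a symptom only if the rater judges it not caused by the medical condition."""
    return sum(1 for s in symptoms if not s.judged_due_to_illness)


# Invented example patient
patient = [
    Symptom("depressed mood", somatic=False, judged_due_to_illness=False),
    Symptom("fatigue", somatic=True, judged_due_to_illness=True),
    Symptom("weight loss", somatic=True, judged_due_to_illness=True),
    Symptom("insomnia", somatic=True, judged_due_to_illness=False),
]
```

For this patient the inclusive count is 4, the exclusive count 1, and the substitutive (with social withdrawal endorsed) and etiologic counts 2 each, so the choice of rule alone can move the same person across a symptom-count threshold.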
4. Screening Measures Used to Assess Depression in the Medically Ill
The Hospital Anxiety and Depression Scale
The seven-item depression subscale of the Hospital Anxiety and Depression Scale (HADS8) is one of the most commonly used research measures of depression in the medically ill. As the authors judged anhedonia to be a central feature of depression and a predictor of antidepressant drug response, five of its seven items assess anhedonia (eg, “I still enjoy the things I used to enjoy”; “I feel cheerful”; “I have lost interest in my appearance”; “I look forward with enjoyment to things”; and “I can enjoy a good book or program”), suggesting some redundancy. For this dimensional measure, the authors suggested that a score of 11 or more indicates a definite case of depression, while noncases score less than 8 and doubtful cases score in the 8 to 10 range. While this scale is widely used, Hermann9 noted that there is “still no comprehensive documentation of its psychometric properties,” and its validity has been challenged in both medically unwell and psychiatric patients.10 In the latter study, the positive predictive value (PPV) of the HADS showed poor discrimination, being only 17% in medically ill patients at a cutoff of 8 and rising to just 25% at a cutoff of 11. Moreover, a recent review of the validity of the HADS11 identified differing optimal cutoffs across differing primary care populations, suggesting that its case-finding ability is dependent on sample characteristics. For instance, its use in general practice settings revealed areas under the curve in the range of 0.84 to 0.96, whereas its translation to more specific medical settings (eg, stroke clinics) yields uncharacteristically low case-finding cutoffs (ie, 4). Thus, the validity of the HADS as a measure of depression across divergent medical settings lacks support.
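The banding the HADS authors proposed can be written as a small function. This is a sketch assuming the usual HADS item scoring of 0 to 3 per item (a subscale range of 0 to 21), which this chapter does not state; the thresholds are parameters because, as noted above, optimal cutoffs vary by setting.

```python
def hads_depression_band(score, doubtful_from=8, definite_from=11):
    """Band a HADS depression-subscale total (assumed: seven items scored 0-3, total 0-21)."""
    if not 0 <= score <= 21:
        raise ValueError("HADS depression subscale total must lie between 0 and 21")
    if score >= definite_from:
        return "definite case"
    if score >= doubtful_from:
        return "doubtful case"
    return "non-case"
```

Passing `doubtful_from=4, definite_from=4` shows how a setting-specific recalibration, such as the low cutoff reported in stroke clinics, changes classification without changing the questionnaire.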
The Beck Depression Inventory for Primary Care (BDI-PC)
This seven-item measure12 was developed for primary care (and therefore for medical settings) by removing somatic items from the well-established Beck Depression Inventory.13 Sadness and loss of pleasure (anhedonia) were included on an a priori basis, as at least one of these symptoms is necessary for a DSM-IV diagnosis of major depression. Suicidal ideation was also chosen on an a priori basis, being judged an important clinical indicator of suicidal risk. The remaining four items (pessimism, past failure, self-dislike, and self-criticalness) were derived empirically from data obtained in a study of 500 psychiatric patients. A cutoff score of 4 or more is used to define depression caseness, with sensitivity and specificity quantified at 82% to 99% across medical inpatient and outpatient samples.14–16 In a head-to-head comparison of the BDI-PC and HADS depression measures, the former was superior in distinguishing depressed from nondepressed patients referred to a consultation-liaison service.12
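Scoring the BDI-PC amounts to summing seven item ratings and comparing the total with the caseness cutoff. The sketch below assumes each item is rated 0 to 3, as in the parent Beck Depression Inventory; the item labels are paraphrased from the description above, and the function name `bdi_pc_screen` is hypothetical.

```python
# Item labels paraphrased from the chapter's description (not the official wording)
BDI_PC_ITEMS = (
    "sadness", "loss of pleasure", "suicidal ideation",
    "pessimism", "past failure", "self-dislike", "self-criticalness",
)


def bdi_pc_screen(ratings, cutoff=4):
    """Return (total, screen_positive) for a dict mapping item -> rating (assumed 0-3)."""
    missing = set(BDI_PC_ITEMS) - set(ratings)
    if missing:
        raise ValueError(f"missing items: {sorted(missing)}")
    if any(not 0 <= ratings[item] <= 3 for item in BDI_PC_ITEMS):
        raise ValueError("each item is assumed to be rated 0-3")
    total = sum(ratings[item] for item in BDI_PC_ITEMS)
    return total, total >= cutoff


# Invented respondent: three items endorsed, total of 4
ratings = dict.fromkeys(BDI_PC_ITEMS, 0)
ratings.update({"sadness": 2, "pessimism": 1, "self-dislike": 1})
```

This respondent is positive at the recommended cutoff of 4 but negative at the cutoff of 5 that the DMI studies' ROC analysis suggested, a reminder that a one-point threshold change can alter case status.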
The Depression in the Medically Ill (DMI) Scales
These scales were developed by our research team17 with the objective of producing a valid measure of depression in the medically ill by focusing on cognitive symptoms. In contrast to Beck’s strategy of stripping somatic items from an accepted measure of depression, we adopted a “bottom-up” approach of specifically studying those with medical illness to generate possible salient constructs. In essence, we selected 81 items assessing the impact of a medical illness on the individual,4 as well as items capturing cognitive aspects of depression (eg, anhedonia, self-reproach, nonreactive mood). Subjects scored items on a three-point scale (“not true at all,” “true to some degree,” “very true”) for the previous 2 to 3 days. The initial study population comprised inpatients and outpatients of a large Sydney teaching hospital being treated for a primary medical condition. A research psychiatrist subsequently (i) made a dimensional estimate of any depression and (ii) judged whether there was any current depression of clinical significance (ie, major depression or an adjustment disorder with depressed mood). A number of the subjects also completed the HADS and BDI-PC measures so that the comparative properties of the measures could be examined. We refined the initial 81-item measure by removing items affirmed by both depressed and nondepressed subjects. We also deleted items that, while weighted to depression, had a low prevalence (eg, suicidal ideation), resulting in a final set of 16 items. Of interest, the measure did not appear limited to assessing depressed mood, as it included a brooding item and two items having anxiety connotations (fearfulness and insecurity). The internal consistency of the derived DMI-16 was high (alpha = 0.95), and its total score correlated highly with the BDI-PC (0.80) and HADS (0.72) measures. When correlated against depression severity as estimated by the psychiatrist, the DMI-16 returned a high coefficient (0.74), slightly in excess of the BDI-PC (0.68) and superior to the HADS (0.54). A receiver operating characteristic (ROC) analysis derived a cutoff score of 18 or more with both high sensitivity (100%) and high specificity (96%). Of the 29 subjects who received a psychiatrist-rated judgment of clinically significant depression, the DMI-16 cutoff discriminated highly (kappa = 0.91) and was superior to the BDI-PC (0.68) and the HADS (0.57).
In a second study18 involving a larger sample of hospitalized medically ill patients, we derived a briefer version (the DMI-10) and further examined its properties (along with the DMI-16) in comparison to the BDI-PC and HADS measures. Although anhedonia is included in the DMI-16, it was excluded from the DMI-10 because it was affirmed by a significant number of nondepressed medically ill subjects. Analysis against clinically judged caseness established similar overall classification rates for the DMI-10 and DMI-18 measures, comparable to that derived for the BDI-PC but superior to the HADS measure. In this study, the formally recommended HADS cutoff of 8 or more for a probable case was also the optimal cutoff suggested by our ROC analysis using clinical judgment as the criterion. The recommended HADS cutoff of 11 or more for a definite case, however, showed low sensitivity. Our ROC analysis of the BDI-PC established a cutoff score of 5 or more, close to its recommended cutoff score of 4 or more.
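The chapter does not state which criterion the ROC analyses used to pick a cutoff. One common choice is Youden's J (sensitivity + specificity − 1). The following sketch assumes that criterion and uses invented scores and case labels to show how a cutoff is selected by scanning every observed score.

```python
def youden_cutoff(scores, is_case):
    """Return (cutoff, sensitivity, specificity) maximising Youden's J,
    where a score >= cutoff counts as screen-positive. Ties keep the lowest cutoff."""
    n_cases = sum(is_case)
    n_noncases = len(is_case) - n_cases
    if n_cases == 0 or n_noncases == 0:
        raise ValueError("need at least one case and one non-case")
    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, c in zip(scores, is_case) if c and s >= cutoff)
        tn = sum(1 for s, c in zip(scores, is_case) if not c and s < cutoff)
        sens, spec = tp / n_cases, tn / n_noncases
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, cutoff, sens, spec)
    return best[1], best[2], best[3]


# Invented scores and clinician caseness labels
scores = [1, 4, 6, 8, 10, 12, 13, 15, 17, 20]
is_case = [False, False, False, False, False, True, False, True, True, True]
```

On these made-up data the selected cutoff is 12 (sensitivity 1.0, specificity about 0.83). Other criteria, such as requiring a minimum sensitivity for a screening role, can select a different cutoff from the same data.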
In a third development study,19 the capacity of the DMI-10 to screen for depression in a general practice setting (where it might be assumed that the majority of subjects would have a primary medical illness) was again supported. The DMI-10 measure is shown in Table 10.1.
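The internal consistency reported for the DMI-16 (alpha = 0.95) is presumably Cronbach's alpha, α = k/(k − 1) × (1 − Σ item variances / total-score variance). A minimal sketch, assuming that standard formula (the chapter does not give it) and small invented response matrices:

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a matrix with one row per respondent and one column per item."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*responses))                       # transpose to per-item columns
    item_var_sum = sum(variance(col) for col in items)  # sample variance of each item
    total_var = variance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Invented data: every respondent rates all items identically, then a less consistent pair
consistent = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [0, 0, 0]]
mixed = [[2, 1], [1, 2], [3, 3], [0, 0]]
```

The first matrix yields 1.0 and the second about 0.89. Using sample variances for both items and totals keeps the ratio consistent.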
Parsimonious Screening
Chochinov and colleagues20 compared four screening measures in a sample of inpatients with advanced cancer who were receiving palliative care. A single item (“Are you depressed?”) was reported to have perfect sensitivity and specificity, with the authors concluding that this question provides a “reliable and remarkably accurate screen.” However, as responses to the outcome measure (Research Diagnostic Criteria status) and to the single predictor question could both reflect the same subjective response bias (ie, affirming or denying depression), this study risks a tautologic bias. A subsequent meta-analysis showed more modest results.21 The need for economical, accurate measures encouraged development of a four-item screener (the Brief Case Finder for Depression [BCD]22), assessing whether depressed mood or “restless and disturbed nights” were present, together with items assessing inability to overcome difficulties and/or dissatisfaction with life. This measure also tends to be overly inclusive, because its broad questioning generates many false positives; however, its sensitivity and negative predictive power appear adequate for ruling out those who are not depressed.23

Table 10.1. Ten-Item Depression in the Medically Ill Screening Measure (DMI-10)
DMI-10 Depression Self-Report Questionnaire
Please consider the following questions and rate how true each one is in relation to how you have been feeling lately (ie, in the last 2 to 3 days) compared to how you usually or normally feel. Please tick (✓) the most relevant option.
Response options for each question: Not True | Slightly | Moderately | Very True
1. Are you stewing over things?
2. Do you feel more vulnerable than usual?
3. Are you being self-critical and hard on yourself?
4. Are you feeling guilty about things in your life?
5. Do you find that nothing seems to be able to cheer you up?
6. Do you feel as if you have lost your core and essence?
7. Are you feeling depressed?
8. Do you feel less worthwhile?
9. Do you feel hopeless or helpless?
10. Do you feel more distant from other people?
Adapted from www.blackdoginstitute.org.au/docs/DMI-10.pdf
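The BCD trade-off (many false positives, yet useful for ruling out) follows from how predictive values depend on sensitivity, specificity, and prevalence. The sketch below applies Bayes' theorem; the input values are invented for illustration and are not the published BCD figures.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a screen applied where the given prevalence holds."""
    tp = sensitivity * prevalence              # true positives per person screened
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    fn = (1 - sensitivity) * prevalence        # false negatives
    tn = specificity * (1 - prevalence)        # true negatives
    return tp / (tp + fp), tn / (tn + fn)


# Invented values for a broad, highly sensitive screener
ppv, npv = predictive_values(sensitivity=0.95, specificity=0.60, prevalence=0.20)
```

With these inputs about 37% of screen-positives are true cases, while about 98% of screen-negatives are true non-cases, which is the profile of a rule-out screen. The same calculation helps explain the low HADS PPV figures quoted earlier.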
The Primary Care Evaluation of Mental Disorders (PRIME-MD)
The PRIME-MD24 assesses four domains of mental disorders commonly observed in general population settings: mood, anxiety, somatoform, and alcohol disorders. Its two-tier assessment structure allows patients who score positively on the patient questionnaire (PQ) to then receive a physician-administered structured interview (the Clinician Evaluation Guide [CEG]) involving modularized DSM-III-R criteria. A patient who scores positively on either of two depressive symptoms (demoralization and/or anhedonia) on the PQ is subsequently assessed for more specific criteria. Because of the length of administration time, self-report measures (the PRIME-MD Patient Health Questionnaire [PHQ]) were subsequently designed,25 including the PHQ-9 measure of depressive status. Standard DSM-IV major depressive disorder criteria apply, and recognition of symptomatology is comparable to, if not slightly more sensitive than, that of the original PRIME-MD.25 In the initial PHQ primary care study,25 sensitivity and specificity were 73% and 98%, respectively, for the self-report version, compared with 57% and 94% for the original clinician-administered version. More specifically, at a cutoff score of 10 or more, the PHQ-9 showed a sensitivity and a specificity of 88% each for meeting diagnostic criteria for major depression.26 However, diagnostic concordance of the PHQ-9 with DSM-IV criteria, while higher than that of both the HADS and the WHO Wellbeing Index (WBI-5),27 is still relatively low (kappa = 0.56). Compared with physicians’ diagnoses, however, the PHQ-9 performed better (sensitivities of 98% and 40%, respectively).28 Thus, the PHQ-9 appears to be somewhat more accurate29 than the HADS and physicians’ diagnoses, though comparable to more general measures of well-being in primary care populations.
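Kappa expresses agreement beyond chance, which is why a screen can agree with diagnostic criteria for most patients yet return only a moderate kappa such as the 0.56 reported for the PHQ-9. A minimal sketch of Cohen's kappa for two binary classifications, using invented data:

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two binary classifications of the same patients (True = case)."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("need two non-empty classifications of equal length")
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    p_a, p_b = sum(rater_a) / n, sum(rater_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)  # agreement expected by chance alone
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Invented example: a screen versus a diagnostic interview for ten patients
interview = [True, True, True, True, False, False, False, False, False, False]
screen = [True, True, True, False, True, False, False, False, False, False]
```

Here the screen and the interview agree on 8 of 10 patients (80%), but chance agreement is 52%, so kappa is only about 0.58.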
5. Discussion
The capacity for medical illness and/or hospitalization to distort the assessment of depression in the medically ill argues against use of any general depression measure, and we have therefore not reviewed studies using such measures other than the PRIME-MD. The PRIME-MD does risk confounding by medical illness nuances but has the advantage of delivering DSM case status decisions, although the risk is that the intrinsic limitations of such diagnoses in medically ill groups may go unrecognized. We therefore take as a given that any valid depression measure excludes items that can be confounded by illness or hospitalization, and have focused on relevant measures. Two of these, the HADS and the BDI-PC, have adopted the exclusive approach by effectively removing potentially confounding items from established depression measures. In developing the DMI measures, we adopted a different, “bottom-up” approach of examining the properties of items capturing the world of medically ill patients (both depressed and nondepressed).
While the HADS measure has long been in use, it has been criticized for the lack of studies examining its psychometric properties and even for its validity. Its focus on anhedonia respects that construct’s utility in psychiatric subjects, but, as we established a high rate of affirmation of anhedonia among nondepressed medically ill subjects,18 that construct may not be as central to depression as imagined. Our finding18 of low sensitivity for the HADS in diagnosing definite depression is of concern if the aim of the screening measure is to prioritize detection of those with probable or definite depression. In our initial DMI study,17 we established that the DMI-16 had high internal consistency and was distinctly superior to the HADS and somewhat superior to the BDI-PC when compared against a psychiatrist’s independent clinical judgment of depression severity and case status. These findings were essentially confirmed in our second study,18 where we again compared the three relevant measures.
Any measure of depression in the medically ill needs to be acceptable, brief, and minimally intrusive. The last issue is worthy of consideration. We deleted a provisional item assessing suicidal ideation, as it proved intrusive to a number of our medically ill subjects. However, we demonstrated that its omission (in the final DMI measures) was not of concern, as all those admitting to suicidal ideation scored above the cutoff on the DMI-16.
Our studies of the three principal candidate screening measures (HADS, BDI-PC, and DMI) suggest that the BDI-PC and DMI measures are roughly comparable, and superior to the HADS, in their capacity to separate depressed and nondepressed individuals in medical settings. We would recommend the use of both the BDI-PC and the DMI-10.
References
1. Kessler R. Prevalence, correlates, and course of minor depression and major depression in the National Comorbidity Survey. J Affect Disord. 1997;45:14–30.
2. Cuijpers P, Smit F. Subthreshold depression as a risk indicator for major depressive disorder: a systematic review of prospective studies. Acta Psychiatr Scand. 2004;109(5):325–331.
3. Egede LE. Major depression in individuals with chronic medical disorders: prevalence, correlates and association with health resource utilization, lost productivity and functional disability. Gen Hosp Psychiatry. 2007;29(5):409–416.
4. Cassell EJ. Reactions to physical illness and hospitalisation. In Usdin G, Lewis JM, eds. Psychiatry in general medical practice. New York: McGraw Hill, 1979.
5. Cohen-Cole SA, Brown FN, McDaniel JS. Diagnostic assessment of depression in the medically ill. In Stoudemire A, Fogel B, eds. Psychiatric care of the medical patient. New York: Oxford University Press, 1993:53–70.
6. Plumb MM, Holland J. Comparative studies of psychological function in patients with advanced cancer-I. Self-reported depressive symptoms. Psychosom Med. 1977;39(4):264–276.
7. Endicott J. Measurement of depression in patients with cancer. Cancer. 1984;53(10 Suppl):2243–2249.
8. Zigmond AS, Snaith RP. The Hospital Anxiety and Depression Scale. Acta Psychiatr Scand. 1983;67(6):361–370.
9. Hermann C. International experiences with the Hospital Anxiety and Depression Scale: a review of validation data and clinical results. J Psychosom Res. 1997;42(1):17–41.
10. Silverstone PH. Poor efficacy of the Hospital Anxiety and Depression Scale in the diagnosis of major depressive disorder in both medical and psychiatric patients. J Psychosom Res. 1994;38(5):441–450.
11. Bjelland I, Dahl AA, Haug TT, et al. The validity of the Hospital Anxiety and Depression Scale: An updated literature review. J Psychosom Res. 2002;52(2):69–77.
12. Beck AT, Guth D, Steer RA, et al. Screening for major depression disorders in medical inpatients with the Beck Depression Inventory for Primary Care. Behav Res Ther. 1997;35(8):785–791.
13. Beck AT, Beck RW. Screening depressed patients in family practice—rapid technique. Postgrad Med. 1972;52(6):81–85.
14. Beck AT, Steer RA, Ball R, et al. Use of Beck anxiety and depression inventories for primary care with medical outpatients. Assessment. 1997;4(Suppl 3):211–219.
15. Steer RA, Cavalieri DO, Leonard DM, et al. Use of the Beck Depression Inventory for Primary Care to screen for major depressive disorders. Gen Hosp Psychiatry. 1999;21(2):106–111.
16. Winter LB, Steer RA, Jones-Hicks L, et al. Screening for major depression disorders in adolescent medical outpatients with the Beck Depression Inventory for Primary Care. J Adolesc Health. 1999;24(6):389–394.
17. Parker G, Hilton T, Hadzi-Pavlovic D, et al. Screening for depression in the medically ill: the suggested utility of a cognitive-based approach. Aust N Z J Psychiatry. 2001;35(4):474–480.
18. Parker G, Hilton T, Bains J, et al. Cognitive-based measures screening for depression in the medically ill: the DMI-10 and the DMI-18. Acta Psychiatr Scand. 2002;105(6):419–426.
19. Parker G, Hilton T, Hadzi-Pavlovic D, et al. Clinical and personality correlates of a new measure of depression: a general practice study. Aust N Z J Psychiatry. 2003;37(1):104–109.
20. Chochinov HM, Wilson KG, Enns M, et al. “Are you depressed?” Screening for depression in the terminally ill. Am J Psychiatry. 1997;154(5):674–676.
21. Mitchell AJ. Are one or two simple questions sufficient to detect depression in cancer and palliative care? A Bayesian meta-analysis. Br J Cancer. 2008;98(12):1934–1943.
22. Clarke DM, McKenzie DP, Marshall RJ, et al. The construction of a brief case-finding instrument for depression in the physically ill. Integr Psychiatry. 1994;10:117–123.
23. Jefford M, Mileshkin L, Richards K, et al. Rapid screening for depression—validation of the Brief Case-Finder for Depression (BCD) in medical oncology and palliative care patients. Br J Cancer. 2004;91(5):900–906.
24. Spitzer RL, Williams JBW, Kroenke K, et al. Utility of a new procedure for diagnosing mental disorders in primary care: The PRIME-MD 1000 study. JAMA. 1994;272(22):1749–1756.
25. Spitzer RL, Kroenke K, Williams JBW. Validation and utility of a self-report version of PRIME-MD: The PHQ primary care study. JAMA. 1999;282(18):1737–1744.
26. Kroenke K, Spitzer RL, Williams JBW. The PHQ-9: Validity of a brief depression severity measure. J Gen Intern Med. 2001;16(9):606–613.
27. World Health Organization (WHO). Wellbeing measures in primary health care: The DepCare Project. WHO, Regional Office for Europe, Copenhagen: 1998.
28. Löwe B, Spitzer RL, Gräfe K, et al. Comparative validity of three screening questionnaires for DSM-IV depressive disorders and physicians’ diagnoses. J Affect Disord. 2004;78(2):131–140.
29. Wittkampf KA, Naeije L, Schene AH, et al. Diagnostic accuracy of the mood module of the Patient Health Questionnaire: a systematic review. Gen Hosp Psychiatry. 2007;29(5):388–395.
11 SCREENING FOR DEPRESSION IN MEDICAL SETTINGS: THE CASE AGAINST SPECIFIC SCALES Fariba Babaei and Alex J. Mitchell
1. Overview of Depression in Physical Disease
2. Defining Somatic Symptoms
3. Diagnostic Accuracy of Somatic Symptoms in Depression
4. Evidence For and Against Somatic Symptoms when Diagnosing Comorbid Depression
5. Implications for Screening
Context
The prevailing view for detecting mood disorders in the presence of physical disease is to exclude somatic symptoms that might contaminate a diagnosis (see Parker and Hyett, Chapter 10, for a presentation of this point of view). This chapter examines whether this approach is beneficial, with a view to deciding whether new depression scales for each physical disorder (each excluding somatic symptoms) are required.
1. Overview of Depression in Physical Disease There is a bidirectional relationship between depression and physical illness. New evidence suggests that among depressed individuals presenting in primary care, most have at least one comorbid psychiatric condition and at least one physical condition.1,2 At least 75% of elderly depressed patients in primary care also have a known physical illness, and in 30–50% this is of high severity.3–6 In one study only 10% of elderly depressed patients in primary care had pure depression with no comorbidity.7 Thus, comorbid depression should be considered ‘‘normal’’ in primary care. Some evidence suggests that 203
204
SCREENING FOR DEPRESSION IN CLINICAL PRACTICE
those with comorbidity are less likely to have depression treatment initiated by their primary care practitioner.8 They are also less likely to recover from depression.9 Specific conditions such as speech disorders, arthritis, and dermatologic problems have been linked with worse outcomes of depression.10,11 The exact relationship of depression and comorbidities is complex. In one of the largest studies, Egede (2007)12 examined data from 30,801 adults captured in the 1999 Household National Health Interview Survey. The community prevalence of major depression was 4.7% in those without chronic medical illness but 7.7%, 9.8%, and 12% in those with one, two, or three or more chronic disorders, respectively (Fig. 11.1). Major depression was associated with significant increases in utilization, lost productivity, and functional disability. Patients with chronic medical illness and comorbid depression (and anxiety) also have significantly higher numbers of medical symptoms, even controlling for severity of disease.13 Around one in four people in the general population have functional disability, but in those with depression and medical comorbidity, at least three out of four have functional limitations.14 18 16 14 12 10 8 6 4 2
[Figure 11.1: bar chart of the 12-month prevalence (%) of major depression by chronic medical condition; vertical axis 0–18%.]
Figure 11.1. 12-month prevalence of major depression in community population. Data from Egede LE. Major depression in individuals with chronic medical disorders: prevalence, correlates and association with health resource utilization, lost productivity and functional disability. Gen Hosp Psychiatry. 2007;29(5):409–416.
A population survey in New Zealand found that a quarter of people with chronic physical conditions suffered from a comorbid mental disorder, compared with 15% of the population without chronic conditions.15 Further, those with a mental disorder had higher rates of chronic pain, cardiovascular disease, high blood pressure, and respiratory conditions, as well as of the risk factors smoking, overweight/obesity, and hazardous alcohol use. In a primary care survey of 6,641 patients with multiple physical disorders, Nuyen and colleagues (2006)16 used morbidity data recorded by Dutch general practitioners to examine both psychiatric and physical comorbidities. The top three conditions linked with lifetime depression were schizophrenia, anxiety disorders, and substance abuse. The top three medical disorders were Parkinson’s disease, male genital problems, and stroke. Physical disease is also strongly linked with suicide. Juurlink and associates (2004)17 examined 1,354 provincial coroners’ records of Ontario residents 66 years or older who committed suicide between 1992 and 2000. Their prescription records during the preceding 6 months were compared with those of living matched controls (1:4) to determine the presence or absence of 17 illnesses potentially associated with suicide. Conditions associated with suicide are shown in Figure 11.2. Compared with patients with no identified illness, for example, patients with three illnesses had about a threefold increase in the estimated relative risk of suicide, and patients with five illnesses had about a fivefold increase in risk.
[Figure 11.2: bar chart of suicide risk by medical and psychiatric condition; scale 0–10.]
Figure 11.2. Suicide risk in medical and psychiatric disorders. Reprinted with permission from Juurlink DN, Herrmann N, Szalai JP. Medical illness and the risk of suicide in the elderly. Arch Intern Med. 2004;164:1179–1184.
2. Defining Somatic Symptoms
Defining and Eliciting “Somatic Symptoms”
What exactly is meant by “somatic symptoms”? At face value the answer seems obvious: somatic symptoms are physical complaints relating to bodily sensations. These would include aches, low energy, fatigability, muscle weakness, leaden paralysis, and gastrointestinal symptoms (low appetite and weight loss). Pain, sexual dysfunction, and sleep disturbance are certainly core somatic symptoms, but are they strictly bodily sensations? For example, pain may be defined as “an unpleasant sensory and emotional experience associated with actual or potential tissue damage, or described in terms of such damage.”18 Thus pain (and sleep) may have both physical and psychological aspects. Even more difficult to classify, but still conventionally regarded as somatic, are concentration problems, agitation/retardation, and changes in arousal. This short list is not exhaustive. Less common somatic symptoms of depression might include shortness of breath, dry mouth, constipation or diarrhea, urinary frequency or hesitancy, menstrual disturbances, dizziness, changes in libido, palpitations, increased sweating, flushing, blurred vision, tremor, pins and needles, restless legs, and rash. Indeed, any bodily sensation might be included, although some symptoms may be due to medication rather than the underlying depression. One study examined how reliably clinicians elicit somatic compared with nonsomatic symptoms. In the Rhode Island MIDAS project, Zimmerman and colleagues (2006)19 conducted an in-depth analysis of symptoms of major depressive disorder, rated by trained raters administering a semi-structured interview to 1,523 psychiatric outpatients. They analyzed a 17-item bank of possible symptoms of depression, including the standard 9 DSM items but separating the compound criteria that encompass more than one symptom (eg, increased sleep OR insomnia), along with non-DSM diagnostic items such as hopelessness, helplessness, and unreactive mood. The authors found that some items were rated more reliably than others; for example, suicidal ideas/plan/attempt (suicidality) achieved almost perfect agreement, whereas raters often disagreed about what constituted psychomotor retardation (Textbox 11.1). There was no overall pattern indicating that somatic symptoms were rated more or less reliably than nonsomatic symptoms.
Textbox 11.1. Inter-Rater Reliability in Eliciting Individual Symptoms of Depression
Symptom | Kappa
Suicidality | 0.94
Depressed mood | 0.92
Insomnia | 0.91
Anhedonia | 0.90
Decreased appetite | 0.89
Loss of energy | 0.88
Indecisiveness | 0.88
Thoughts of death | 0.86
Psychomotor agitation | 0.83
Feelings of worthlessness | 0.80
Increased weight | 0.79
Decreased concentration | 0.78
Excessive guilt | 0.76
Decreased weight | 0.69
Increased appetite | 0.63
Psychomotor retardation | 0.63
Hypersomnia | 0.54
Bold text indicates somatic symptoms.
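Kappa corrects the raw proportion of agreement for the agreement two raters would reach by chance alone. The sketch below assumes the two-rater Cohen’s kappa for a single symptom rated present or absent; the function name and the counts are hypothetical illustrations, not MIDAS data.

```python
def cohens_kappa(both_present: int, only_a: int, only_b: int, both_absent: int) -> float:
    """Cohen's kappa for two raters who each rate one symptom as present or absent.

    only_a: rater A says present, rater B says absent; only_b: the reverse.
    """
    n = both_present + only_a + only_b + both_absent
    observed = (both_present + both_absent) / n    # raw proportion of agreement
    a_present = (both_present + only_a) / n        # rater A's "present" rate
    b_present = (both_present + only_b) / n        # rater B's "present" rate
    # Agreement expected by chance if the two raters rated independently
    expected = a_present * b_present + (1 - a_present) * (1 - b_present)
    return (observed - expected) / (1 - expected)


# Hypothetical: 100 jointly rated interviews with 90% raw agreement
kappa = cohens_kappa(both_present=40, only_a=5, only_b=5, both_absent=50)
print(f"kappa = {kappa:.2f}")
```

In this made-up example, 90% raw agreement corresponds to a kappa of about 0.80, because the raters would agree roughly half the time by chance.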
Somatic Symptoms in Current Diagnostic Systems and Scales
Somatic items are included in both ICD-10 and DSM-IV; indeed, ICD-10 includes fatigue as a core feature. Neither ICD-10 nor DSM-IV gives clear guidance about how to judge these specific symptoms when depression and physical disease coexist (Table 11.1). As many as 70% of patients with depression (and, to a lesser extent, anxiety) present with somatic symptoms as their first complaint. Emotional symptoms are less likely to be mentioned if the interviewer does not specifically ask about them.20 That said, physical complaints are seldom attributed to psychological causes, and clinical examination usually focuses on physical disorders that produce somatic symptoms.21 Thus, somatic symptoms may indicate major depression or an underlying physical disorder. Particular difficulty arises when major depression occurs in the context of a comorbid physical disorder; in this situation it is unclear how to judge the significance of somatic symptoms.22,23 In an attempt to improve upon the discriminatory value of the Beck Depression Inventory (BDI) and the Zung Self-Rating Depression Scale, questionnaires such as the Hospital Anxiety and Depression Scale (HADS) and the General Health Questionnaire (GHQ-12) omit most somatic symptoms of depression in favor of cognitive aspects.24 The items most commonly omitted are fatigue and changes in appetite and weight.25 In this approach somatic symptoms are assumed to contaminate a diagnosis of comorbid depression. The concern is that somatic symptoms may lead to overdiagnosis of depression because they do not discriminate between the possible causes of the symptoms.26 One way to investigate this is to compare the ability of somatic items to distinguish between healthy controls and those with major depression. A second method is to compare the ability of somatic items to distinguish between those with uncomplicated major depression and those with comorbid major depression and physical illness. A third method is to compare those with comorbid depression and those with physical illness alone. We consider each of these in turn below.

Table 11.1. Somatic Symptoms of Depression in ICD and DSM
Core Symptom | ICD-10 | DSM-IV | Somatic or Nonsomatic
Persistent sadness or low mood | Yes (core) | Yes (core) | Nonsomatic
Loss of interests or pleasure | Yes (core) | Yes (core) | Nonsomatic
Fatigue or low energy | Yes (core) | Yes | Somatic
Disturbed sleep | Yes | Yes | Somatic
Poor concentration or indecisiveness | Yes | Yes | Somatic
Low self-confidence | Yes | No | Nonsomatic
Poor or increased appetite | Yes | No | Somatic
Suicidal thoughts or acts | Yes | Yes | Nonsomatic
Agitation or slowing of movements | Yes | Yes | Somatic
Guilt or self-blame | Yes | Yes | Nonsomatic
Significant change in weight | No | Yes | Somatic
3. Diagnostic Accuracy of Somatic Symptoms in Depression
Given the almost endless list of possible somatic symptoms, it is important first to establish which, if any, are of diagnostic significance in primary depression, and then in the diagnosis of comorbid depression and physical illness. For example, Chochinov and associates (1994)27 compared results from semistructured diagnostic interviews in 130 patients receiving palliative care. Diagnoses according to the Research Diagnostic Criteria (RDC) were compared with diagnoses made according to Endicott’s revised criteria, which replace the somatic symptoms (change in weight or appetite, sleep disturbance, loss of energy, and reduced concentration) with the nonsomatic alternatives depressed appearance, social withdrawal, brooding, self-pity or pessimism, and lack of reactivity. The authors found that including somatic symptoms in the diagnostic criteria increased the rates of diagnosis, but only when these symptoms were used in conjunction with a low-threshold approach. Similarly, Dugan and coworkers (1998)28 analyzed the Zung Self-Rating Depression Scale both with and without somatic items and reported 5% more false positives when somatic items were included in measuring depression in cancer. However, to confirm or refute this effect, a diagnostic validity study is needed in which somatic symptoms are added to or removed from the model to examine the effect on the accuracy of ruling the condition in or out against the gold standard. Only once this information is gathered can a decision be made whether to include or exclude the somatic symptoms. A slightly more sophisticated approach uses somatic symptoms only if they are caused by depression (Textbox 11.2). In reality this etiologic approach is challenging, because the cause of a specific symptom is usually impossible to establish except in the crudest terms. One reason for uncertainty is that the rate of somatic complaints in each subgroup is unclear.
For example, although somatic symptoms are certainly common in depressed patients, they also appear to be common in the general population: more than 75% of respondents in one community study reported at least one somatic complaint during the previous 30 days.29 The most common symptoms were tiredness (50%), headache (42%), and lower back pain (35%).
Textbox 11.2. Approaches to Somatic Symptoms of Depression
Inclusive: The inclusive approach uses all of the symptoms of depression, regardless of whether they may or may not be secondary to a physical illness. This approach is used in the Schedule for Affective Disorders and Schizophrenia (SADS) and the Research Diagnostic Criteria.
Etiologic: The etiologic approach attempts to assess the origin of each symptom and counts a symptom of depression only if it is clearly not the result of the physical illness. This is proposed by the Structured Clinical Interview for DSM and Diagnostic Interview Schedule (DIS), as well as the DSM-III-R/IV.
Substitutive: The substitutive approach assumes somatic symptoms are a contaminant and replaces these with additional cognitive symptoms. However, it is not clear what specific symptoms should be substituted.
Exclusive: The exclusive approach eliminates somatic symptoms but without substitution. There is concern that this might lower sensitivity, with an increased likelihood of missed cases (false negatives).
Adapted from Trask PC. Assessment of depression in cancer patients. J Natl Cancer Inst Monogr. 2004;32:80–92.
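The four approaches in Textbox 11.2 differ only in which reported symptoms are allowed to count. The sketch below is a simplification under stated assumptions: the nine DSM-IV criteria are collapsed into hypothetical labels, the threshold is fixed at five counted symptoms including depressed mood or loss of interest, and the somatic/nonsomatic split follows Table 11.1. The substitutive branch pairs each somatic criterion that Endicott’s criteria replace with one substitute, in the order the text lists them; that pairing, and keeping psychomotor change, are our assumptions. In practice the etiologic judgment is clinical; here it is simply passed in.

```python
CORE = {"depressed mood", "loss of interest"}
NONSOMATIC = CORE | {"worthlessness/guilt", "thoughts of death/suicide"}
# Hypothetical pairing of each replaced somatic criterion with an Endicott substitute
ENDICOTT = {
    "appetite/weight change": "depressed appearance",
    "sleep disturbance": "social withdrawal",
    "loss of energy": "brooding/self-pity/pessimism",
    "poor concentration": "lack of reactivity",
}
SOMATIC = set(ENDICOTT) | {"psychomotor change"}
DSM_ITEMS = NONSOMATIC | SOMATIC  # nine simplified DSM-IV criteria


def counted(present, approach, due_to_illness=frozenset()):
    """Return the symptoms that a given approach counts toward the diagnosis."""
    present = set(present)
    if approach == "inclusive":      # count every criterion, whatever its cause
        return present & DSM_ITEMS
    if approach == "etiologic":      # drop criteria judged due to the physical illness
        return (present & DSM_ITEMS) - set(due_to_illness)
    if approach == "exclusive":      # drop all somatic criteria, no replacement
        return present & NONSOMATIC
    if approach == "substitutive":   # swap the four replaced criteria for substitutes
        allowed = NONSOMATIC | {"psychomotor change"} | set(ENDICOTT.values())
        return present & allowed
    raise ValueError(f"unknown approach: {approach}")


def meets_threshold(present, approach, due_to_illness=frozenset(), threshold=5):
    """At least `threshold` counted symptoms, including one core symptom."""
    items = counted(present, approach, due_to_illness)
    return len(items) >= threshold and bool(items & CORE)


# Hypothetical patient with a physical illness
patient = {
    "depressed mood", "loss of interest", "sleep disturbance", "loss of energy",
    "appetite/weight change", "social withdrawal", "brooding/self-pity/pessimism",
    "lack of reactivity",
}
illness_related = {"loss of energy", "appetite/weight change"}
for approach in ("inclusive", "etiologic", "exclusive", "substitutive"):
    n = len(counted(patient, approach, illness_related))
    print(approach, n, meets_threshold(patient, approach, illness_related))
```

With a fixed threshold, the exclusive approach leaves fewer items with which to reach five, which is the loss of sensitivity the textbox warns about; the substitutive approach recovers the count only if the substitute symptoms are actually present.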
However, only about one third of patients with somatic symptoms seek medical help. From the reverse perspective, mood disorders are a common finding in those with somatic symptoms, accounting for approximately 30% of patients presenting with physical complaints.30 In the Epidemiologic Catchment Area (ECA) study, the presence of physical symptoms was associated with at least a twofold increase in anxiety or depressive disorders.31,32 In the HUNT-II study, which surveyed all inhabitants of Nord-Trøndelag County in Norway, women reported a mean of 3.8 somatic symptoms and men 2.9.33 There was a linear association between the number of somatic symptoms and the total HADS score. Gerber and associates (1992)34 showed that sleep disturbance, fatigue, more than three complaints, nonspecific musculoskeletal complaints, back pain, shortness of breath, amplified complaints, and vaguely stated complaints distinguished between depressed and nondepressed patients in a general medical primary care practice. Better evidence was recently reported from the Rhode Island MIDAS project. Zimmerman and colleagues (2006)35 found that, on logistic regression, the ranked order of diagnostic weight (by individual item) for DSM-IV membership was depressed mood > anhedonia > sleep disturbance > concentration/indecision > worthlessness/excessive guilt > loss of energy > appetite/weight disturbance > psychomotor change > death/suicidal thoughts. In the 8.9% who fulfilled only the minimum DSM-IV criteria for major depressive disorder (exactly five features), increased weight, decreased weight, and indecisiveness rarely influenced diagnostic classification; across the whole sample they influenced diagnosis in only about 1% of cases. More detailed analysis of the MIDAS project was recently reported by Mitchell and colleagues (2008).36 We found that somatic symptoms had value in both ruling in and ruling out primary depression (Fig. 11.3). When ruling in depression (case-finding), the most successful single symptoms were psychomotor retardation, diminished interest/pleasure, indecisiveness, depressed mood, and worthlessness. When ruling out depression (reassurance), the most successful symptoms were depressed mood, diminished drive, loss of energy, diminished interest/pleasure, and diminished concentration. It may therefore be concluded that psychomotor retardation, loss of energy, and diminished concentration do indeed help clinicians diagnose uncomplicated depression. What is the evidence that somatic symptoms assist in a diagnosis of comorbid depression?
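Figure 11.3 expresses each symptom’s accuracy as added value over the base rate. A minimal sketch, assuming rule-in added value = PPV − prevalence and rule-out added value = NPV − (1 − prevalence). The figure’s axis abbreviates the latter as NPV − Prev; using 1 − prevalence (the pre-test probability of being a non-case) as the baseline an NPV must beat is our assumption. The function name and counts are hypothetical.

```python
def added_value(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy of one symptom against the reference diagnosis, from a 2x2 table.

    tp: symptom present, depressed     fp: symptom present, not depressed
    fn: symptom absent, depressed      tn: symptom absent, not depressed
    """
    n = tp + fp + fn + tn
    prevalence = (tp + fn) / n
    ppv = tp / (tp + fp)   # probability of depression when the symptom is present
    npv = tn / (tn + fn)   # probability of no depression when the symptom is absent
    return {
        "prevalence": prevalence,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": ppv,
        "npv": npv,
        "rule_in_added_value": ppv - prevalence,
        "rule_out_added_value": npv - (1 - prevalence),
    }


result = added_value(tp=60, fp=20, fn=40, tn=380)  # hypothetical counts, prevalence 20%
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

In this hypothetical case the symptom adds more for ruling in (PPV 0.75 against a 0.20 base rate) than for ruling out (NPV about 0.90 against a 0.80 base rate).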
4. Evidence For and Against Somatic Symptoms when Diagnosing Comorbid Depression
Evidence from Comparative Studies of Primary Depression versus Secondary Depression
Lipsey and colleagues (1986)37 compared the depressive symptoms of 43 patients with post-stroke depression with those of 43 patients with functional major depression. They concluded that the depressive syndrome profiles in the two groups were similar; only two symptoms differed significantly: slowness was more common, and lack of interest/concentration less common, in post-stroke patients. Simon and associates (2005)38 examined the validity of the DSM-IV depression criteria in 235 individuals with medical comorbidities, including diabetes, ischemic heart disease, or chronic obstructive lung disease, versus 204 depressed subjects without those conditions. At the midpoint of the depression severity scale, patients with medical comorbidity had a 54% probability of reporting fatigue compared with 45% in those without comorbidity. All four somatic symptoms showed robust improvement with treatment, and this improvement did not differ significantly between patients with and without medical comorbidity. The authors found only limited evidence that fatigue, changes in weight or appetite, psychomotor agitation/retardation, and sleep disturbance are less valid indicators of depression in patients with chronic medical illness.
[Figure 11.3: paired bars of rule-in added value (PPV − Prev) and rule-out added value (NPV − Prev) for individual symptoms; scale −0.10 to 0.50.]
Figure 11.3. Added value in diagnosing primary depression. Adapted from Mitchell AJ, McGlinchey JB, Young D, et al. Accuracy of specific symptoms in the diagnosis of major depressive disorder in psychiatric out-patients: data from the MIDAS project. Psychol Med. Nov. 12, 2008:1–10.
Pickard and associates (2006)39 used Rasch methods to compare symptoms of depression in 32 subjects with post-stroke depression versus 366 depressed primary-care patients. They found that four items demonstrated statistically significant differential item functioning (DIF): “my sleep was restless,” “I felt that people disliked me,” “I did not feel like eating,” and “I had crying spells.” Each of these items showed a logit difference of approximately 0.5 or more between the two groups. Overall, however, the authors found few differences between groups. Van Wilgen and associates (2006)40 analyzed the influence of somatic symptoms on the Center for Epidemiologic Studies Depression Scale (CES-D) in 509 patients with oropharyngeal, gynecologic, colorectal, and breast cancer after treatment versus a control group of 223 depressed patients without cancer. They concluded that the incidence of somatic morbidity differs between cancer types, but that somatic items do not interfere with the outcome of depression as measured with the CES-D. Interestingly, some cancer groups showed less somatic morbidity than the comparison group (colorectal cancer), while others showed more (oral/oropharyngeal, breast). In the analyses of the CES-D with and without the somatic domain, the prevalence of depressive symptoms with the somatic domain was lower for the cancer groups. Ehrt and colleagues (2007)41 compared the individual depressive symptoms of 145 depressed patients with Parkinson’s disease and 100 depressed patients without Parkinson’s disease using item scores on the Montgomery-Åsberg Depression Rating Scale. Depressed patients with Parkinson’s disease showed significantly less reported sadness, less anhedonia, fewer feelings of guilt, and slightly less loss of energy, but more concentration problems, than depressed control subjects.
Thus, some but not all somatic symptoms were increased in comorbid groups. The results of this study support the hypothesis that the depression profile in Parkinson’s disease differs to a certain extent from that in patients with major depression who do not have Parkinson’s disease. Yates and colleagues (2007),42 reporting for STAR*D, analyzed the effect of specific somatic symptoms in separating primary depression from depression with comorbid physical disease. Clearly, if somatic symptoms were overrepresented in the comorbid group, the classic view that somatic symptoms may contaminate a diagnosis of depression in physical disease would be supported. Two somatic symptoms occurred in 80% or more of those with uncomplicated depression, and four occurred in 80% or more of those with comorbid depression. The two most common were impaired concentration (91%) and fatigue (87%). Although somatic symptoms were common in patients with both depression and physical ill health, they were also common in patients without comorbidity. In particular, impaired concentration and fatigue occurred in approximately 90% of both groups. Other studies have examined this issue in relation to comorbid depression versus healthy controls.
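Pickard and associates’ DIF finding is expressed in logits. As a sketch, assuming the dichotomous Rasch model (items with several response categories would need a polytomous extension such as the rating scale model), a 0.5-logit difference in an item’s difficulty between groups changes the probability that equally depressed respondents endorse it. The severity and difficulty values are hypothetical.

```python
import math


def rasch_probability(theta: float, difficulty: float) -> float:
    """Dichotomous Rasch model: P(endorse) = exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


theta = 0.0               # hypothetical depression severity, in logits
difficulty_group1 = 0.0   # hypothetical item difficulty in one group
difficulty_group2 = -0.5  # same item, 0.5 logits easier to endorse in the other group

p1 = rasch_probability(theta, difficulty_group1)
p2 = rasch_probability(theta, difficulty_group2)
print(f"group 1: {p1:.2f}, group 2: {p2:.2f}")
```

For respondents of identical severity, endorsement rises from 50% to about 62%, so summed scale scores would differ between groups for reasons unrelated to depression.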
Evidence from Comparative Studies of Comorbid Depression versus Healthy Controls
Aikens and associates (1999)43 evaluated depressive symptoms in 105 patients with multiple sclerosis and compared the results with 80 healthy controls as well as three other comparison groups: diabetes (n = 71), chronic pain (n = 80), and psychiatric patients with depressive disorder (n = 37). They evaluated the appropriateness of omitting somatic items from the original BDI when assessing depressive symptoms in multiple sclerosis. They suggested that somatic items functioned quite normally in this group, with psychometric indices comparable to those observed in psychiatric and nonpsychiatric samples, and recommended against dropping items from the original BDI for routine depression assessment in multiple sclerosis samples. Guo and colleagues (2006)44 studied a small sample of 33 cancer patients, 13 patients with major depression without cancer, and 12 normal comparison subjects. The authors examined which items of the Hamilton Rating Scale for Depression (HAM-D) would optimize the diagnosis of depression among cancer patients. Their final model contained six HAM-D items combining somatic and nonsomatic content (late insomnia, agitation, psychic anxiety, diurnal mood variation, depressed mood, and genital symptoms). At a cutoff of 6, the sensitivity was 81.3% and the specificity 87.5%. However, in this study certain somatic items, including middle insomnia, retardation, somatic symptoms (gastrointestinal and general), and loss of weight, were not discriminatory. Holzapfel and associates (2008)45 compared depressed patients with (n = 113) and without (n = 137) chronic heart failure on individual DSM-IV depressive symptoms, as measured with the Patient Health Questionnaire (PHQ-9). Among the patients meeting the criteria for major depressive disorder, those with heart failure reported significantly lower levels of depressed mood (p = 0.006) and worthlessness/guilt (p = 0.019) than those without.
No significant differences were found for sleep disturbance, loss of energy, change in appetite, poor concentration, psychomotor agitation/retardation, and suicidal thoughts (Fig. 11.4).
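Sensitivity and specificity, such as the 81.3% and 87.5% that Guo and colleagues reported at a cutoff of 6, do not by themselves say how often a positive screen is correct; that also depends on prevalence. The sketch below applies Bayes’ theorem to those two figures; the prevalences are hypothetical, and estimates from a sample of this size are imprecise.

```python
def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[float, float]:
    """PPV and NPV implied by sensitivity, specificity, and prevalence (Bayes' theorem)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


# Guo and colleagues' cutoff of 6: sensitivity 81.3%, specificity 87.5%
for prevalence in (0.10, 0.20, 0.40):  # hypothetical prevalences
    ppv, npv = predictive_values(0.813, 0.875, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.2f}, NPV {npv:.2f}")
```

At an assumed prevalence of 10%, fewer than half of positive screens would be true cases despite these fairly high accuracy figures.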
[Figure 11.4: differences in severity of individual depressive symptoms between patients with and without chronic heart failure (CHF), scale −1.0 to +1.0; symptoms shown: loss of interest, depressed mood, sleep disturbance, loss of energy, change in appetite, worthlessness/feelings of guilt, weak concentration, psychomotor agitation/retardation, suicidal thoughts.]
Figure 11.4. Differences in severity of individual depression symptoms in patients with major depressive disorder with and without chronic heart failure. Data from Holzapfel N, Müller-Tasch T, Wild B, et al. Depression profile in patients with and without chronic heart failure. J Affect Disord. 2008;1:53–62.
Evidence from Comparative Studies of Comorbid Depression versus Physical Illness Alone
Symptom profiles of depressed and nondepressed patients with cancer were examined by Chen and Chang (2004),46 who recruited 121 hospitalized patients with breast, esophageal, and head and neck cancer. Using a HADS-D cutoff score of 11, 30 patients were classified as depressed and 91 as nondepressed. Depressed patients showed significantly higher occurrence rates than nondepressed patients of insomnia (83% versus 62%), pain (83% versus 55%), anorexia (63% versus 42%), fatigue (67% versus 32%), and wound/pressure sores (30% versus 13%). A significant chi-squared statistic with Yates correction (χ² = 10.74, p = 0.001) indicated an association between multiple symptoms and depression in this sample: patients simultaneously experiencing multiple symptoms (insomnia, pain, anorexia, and fatigue) had a significantly higher risk of being depressed. Both groups showed similar rankings of symptom occurrence rates.
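The reported χ² of 10.74 concerns the multiple-symptom analysis, whose cell counts are not given here. To show how a Yates-corrected chi-squared test on a 2×2 table works, the sketch below instead uses fatigue, with counts back-calculated approximately from the rounded percentages above (about 20 of 30 depressed and 29 of 91 nondepressed patients). The function name is hypothetical, and the result is an illustration, not a figure reported by the study.

```python
import math


def yates_chi_square(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Yates-corrected chi-squared and p-value (1 df) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    # Continuity correction: shrink |ad - bc| by n/2, but not below zero
    corrected = max(0.0, abs(a * d - b * c) - n / 2)
    chi2 = n * corrected ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: depressed, nondepressed; columns: fatigue present, fatigue absent
chi2, p = yates_chi_square(a=20, b=10, c=29, d=62)
print(f"chi-squared = {chi2:.2f}, p = {p:.4f}")
```

With these approximate counts, fatigue alone gives a corrected χ² of roughly 9.9.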
Evidence from Noncomparative Studies (eg, Rasch Analysis)
Stein and coworkers (1996)47 found that somatic items of depression were less sensitive than nonsomatic items in the diagnosis of post-stroke depression. In this study 189 persons with unilateral ischemic or embolic cerebrovascular accident were interviewed by a psychologist 4 weeks or more after stroke, using the BDI and the HAM-D. Findings suggested that the most discriminating individual symptoms of post-stroke depression were nonsomatic. Somatic items from both scales were significantly less specific than nonsomatic items when diagnosing post-stroke depression; they were neither specific to post-stroke depression nor added incremental validity over nonsomatic symptoms. Kathol and colleagues (1990)48 investigated the relation of scores on the HAM-D and BDI to the presence or absence of criteria-based diagnoses of depression in cancer. Rates of major depression diagnosed in 152 cancer patients differed by as much as 13% depending on the diagnostic system used. The BDI and the HAM-D were useful tools for screening patients with depressive symptoms but frequently misclassified those who did not have major depression according to one or more of the criteria-based diagnostic systems. Kalichman and colleagues (2000)49 examined the overlap between somatic symptoms of depression and of HIV disease in 357 people living with HIV/AIDS, directly comparing the diagnostic use of the BDI and the CES-D in this single sample. A factor analysis entering the six depression factor scores from the BDI and CES-D showed that HIV symptoms were most strongly associated with the somatic depression symptom factors of both scales.
In other words, the findings suggested that depression scales that include somatic symptoms will inflate depression scores in people living with HIV infection, and that available methods for distinguishing overlapping symptoms should be employed when assessing this group. Leentjens and coworkers (2001)50 assessed the sensitivity of individual depressive symptoms and their relative contribution to the diagnosis of depressive disorder, made with the Structured Clinical Interview for DSM Disorders (SCID), in 149 patients with Parkinson’s disease. Applying the HAM-D and the Montgomery-Åsberg Depression Rating Scale, they showed that only two somatic symptoms, early morning awakening and reduced appetite, had good discriminative properties. They therefore concluded that the core symptoms were most important in distinguishing depressed from nondepressed Parkinson’s disease patients. Akechi and associates (2003)51 used data from 220 cancer patients with major depression to examine the intercorrelations among the DSM-IV somatic and nonsomatic symptom criteria, and whether the presence of an individual somatic symptom could discriminate the severity of major depression. Appetite changes and diminished ability to think, but not sleep disturbance or fatigue, were significantly associated with nonsomatic symptoms. These associations persisted after adjusting for physical functioning and pain. Only patients with appetite changes showed greater severity of depression. De Coster and colleagues (2005)52 studied 206 patients with first-ever stroke using the SCID for DSM-IV and the HAM-D. In a discriminant analysis, HAM-D item scores correctly classified 88.3% of patients as depressed or nondepressed. Depressed mood discriminated best between depressed and nondepressed stroke patients, but many psychological symptoms, such as hypochondriasis, lack of insight, and feelings of guilt, were not very sensitive. In contrast, somatic symptoms such as reduced appetite, psychomotor retardation, and fatigue had high discriminative properties.
5. Implications for Screening
Somatic symptoms have a role in the diagnosis of uncomplicated depression, but their role in comorbid depression has been the subject of considerable confusion. Two early studies suggested that including somatic symptoms in scales might result in overdiagnosis of comorbid depression in cancer (low specificity and low positive predictive value). Since then, our search revealed six studies comparing primary and secondary depression, three studies comparing comorbid depression with healthy controls, but only one study comparing comorbid depression with physical illness alone. In the first group of studies, somatic symptoms were certainly common in patients with comorbid depression, but they were also common in those with uncomplicated depression, less common in patients with physical illness alone, and least common in healthy controls. Taking the example of cancer, individuals with cancer undergoing active treatment clearly have numerous somatic symptoms. Indeed, compared with healthy controls, individuals with cancer have higher levels of all somatic symptoms rated by items 14 to 21 on the BDI, with the exception of loss of libido.53 However, such differences are easy to overestimate: individuals with comorbid and uncomplicated depression have an even higher rate of somatic symptoms. Overall, somatic symptoms did not prove to be insignificant in either primary or secondary depression. Indeed, of the symptoms that might discriminate depressed patients with and without comorbid physical illness, several nonsomatic items such as guilt appear to be better discriminators than somatic symptoms (see Fig. 11.4). Thus, the formulation of custom secondary depression scales by indiscriminately omitting somatic items does not appear to be justified. That said, it is possible that certain medical disorders might be atypical and feature somatic symptoms that have special significance. For example, van Wilgen and colleagues
Table 11.2. Systematic Review of Comparative Studies Examining Value of Somatic Symptoms in Comorbid Depression

Year: 1997
Reference: Suh T, Gallo JJ. Symptom profiles of depression among general medical service users compared with speciality mental health service users. Psychol Med. 1997;27(5):1051–1063.
Method: ECA (Epidemiologic Catchment Area) program: series of epidemiologic surveys conducted by collaborators (1980–1984) at 5 sites in the US. ECA data include both community and institutional populations interviewed in person. Measurement strategy: standardized and generally pre-coded questions as part of a highly structured interview administered by an agency lay interviewer with DIS (Diagnostic Interview Schedule) training. Logistic regression models were used to implement item response theory in the framework of the symptom criteria of major depression in DSM-III.
Setting: 4,931 and 363 household respondents from 3 ECA sites (Baltimore, Durham, and Los Angeles) who used the general medical sector or speciality mental health services, respectively, within 6 months of interview.
Results (Description): (1) Except for gender, there were significant differences between the two groups in sociodemographic factors (p < 0.001). (2) Speciality mental health service users were more likely to report all the depression symptoms. (3) After holding constant the level of depression, general medical users were less likely to report dysphoria (OR = 0.49; 95% CI = 0.33–0.72) and worthlessness/sinfulness/guilt (OR = 0.55; 95% CI = 0.35–0.86) but were more likely to report fatigue (OR = 1.82; 95% CI = 1.17–1.83).
Supports Unique Scales? (Yes, no, uncertain): Uncertain

Year: 2001
Reference: Leentjens AFG, Marinus J, Van Hilten JJ, et al. The contribution of somatic symptoms to the diagnosis of depressive disorder in Parkinson’s disease. J Neuropsychiatry Clin Neurosci. 2003;15:74–77.
Method: DSM-IV diagnosis of depressive disorder was considered the gold standard. All patients completed the Hamilton Rating Scale for Depression (HAM-D) and 111 patients completed the MADRS, which were highly significant and used as symptom checklists. The contribution of the individual items of these scales to the diagnosis of “depressive disorder” was calculated by discriminant analysis. Then, a correlation coefficient with this discriminant function was obtained for each of the individual items on these scales to reflect the relative strength of association of each symptom with the discriminant function. Wilks’ lambda was calculated as a test of the discriminant function. Physical disability and cognitive status were rated according to the Hoehn and Yahr staging system (I–V) and Mini-Mental State Examination, respectively.
Setting: 169 patients with primary PD, as defined by the United Kingdom Parkinson’s Disease Society Brain Bank (UKPDS-BB), were referred from the neurologic outpatient department for a protocolized mental status examination. 20 (11.8%) were excluded because of dementia.
Results (Description): Using the HAM-D, suicidality was the best discriminator between depressed and nondepressed patients, followed, in descending order, by feelings of guilt, psychic anxiety, reduced appetite, depressed mood, and reduction of work and interest. Most somatic items had low discriminative properties, but reduced appetite and early-morning wakening (or late insomnia) had relatively high discriminative properties. On the MADRS, the two “core” symptoms of depression, depressed mood and anhedonia, had the highest correlation coefficients. Somatic items as well as the item “concentration difficulties” had low correlation coefficients. However, reduced appetite was a relatively important indicator of depression. Following a post hoc analysis,
Supports Unique Scales? (Yes, no, uncertain): No

Year: 1990
Reference: Kathol RG, Mutgi A, Williams J, et al. Diagnosis of major depression in cancer patients according to four sets of criteria. Am J Psychiatry. 1990;147:1021–1024.
DSM-III, RDC (Research Diagnostic Criteria), all symptoms were recorded regardless of etiology. DSM-III-R, only symptoms that had no definite relationship with physical condition. Endicott criteria: to identify depression retrospectively, t-test and w2 square test were used to assess differences in parametric and nonparametric scores, respectively.
In an investigation of the treatment of depression in patients with terminal solid tumors, 152 of 808 patients (age 16–88, 59% female) reported symptoms of depression during clinical evaluation or screening with the Hamilton scale and/or Beck inventory. All of them had potentially fatal solid tumors at different stages.
it was discovered that after excluding the somatic items of the HAMD (items 4, 5, 6, 8, 11– 14, and 16), 86.6% of patients were correctly classified as depressed or nondepressed. After excluding the somatic items of the MADRS (items 4– 7), 88.3% of the patients were classified correctly. One third of patients had major depression according to one or more of diagnostic systems. BDI total score of <11 predicted 7% major depression according to DSM-III, RDC, and the Endicott criteria but not DSM-III-R. At BDI and Hamilton scale scores of 11–25 and 15–19 respectively, the percentage of major depression dropped substantially. The correlation between psychological items (1–14) and somatic symptoms
No
Table 11.2. (Continued) Year
Reference
Method
Setting
Results (Description)
Supports Unique Scales? (Yes, no, uncertain)
(15–21) was 0.55. Somatic symptoms were less discriminating. Hamilton scale was also comparable to BDI, and in fact it allowed a greater number to be assessed with the positive predictive value. 2006
Pickard AS, Dalal MR, Bushnell DB. Comparison of depressive symptoms in stroke and primary care: Applying Rasch models to evaluate the Center for Epidemiologic StudiesDepression Scale. Value in Health. 2006;9(1):59–64.
Center for Epidemiologic Studies-Depression scale (CES-D) (a 20-item scale) as a measure of depression Depression = CES-D score 16 or higher. After informed consent, participants completed a screening questionnaire that included the CES-D.
Two data sources were analyzed: (1) 32 depressed patients who were 3 months poststroke, from two large teaching hospitals in Edmonton, Canada, age 18 or more, and (2) 366 depressed primary-care patients for which data on USA-based primary-care patients with depression were obtained from the Longitudinal Investigation of Depression Outcomes (LIDO) study, age 18–75.
Misfitting items—that is, MNSQ higher than 1.40—in poststroke depression included ‘‘my sleep was restless,’’ ‘‘I had crying spells,’’ ‘‘people were unfriendly,’’ and ‘‘I felt just as good as other people.’’ No items misfit the scale in the primary care-based depression group. Four items demonstrated statistically significant DIF: ‘‘my sleep was restless,’’ ‘‘I felt that people disliked me,’’ ‘‘I did not feel like eating,’’
No
(Continued )
Table 11.2. (Continued) Year
Reference
Method
Setting
Results (Description)
Supports Unique Scales? (Yes, no, uncertain)
2004
Chen ML, Chang H-K. Physical symptom profiles of depressed and nondepressed patients with cancer. Palliative Med. 2004;18(8):712– 718.
Depression was measured using the Hospital Anxiety and Depression Scale. Occurrence of symptoms was evaluated with the Patient Disease Symptom/ Sign Assessment Scale.
121 hospitalized patients with breast, esophageal, and head and neck cancer
and ‘‘I had crying spells.’’ Each of these items identified with statistically significant DIF demonstrated a logit difference of approximately 0.5 or more across the two groups. Using the HADS-D cutoff score of 11, 30 patients (25%) were classified as depressed and 91 (75%) as nondepressed. The Mann/Whitney test indicated that depressed patients had a significantly higher number of symptoms (p = 0.001). Depressed patients showed a significantly higher occurrence rate (p < 0.05) than that of nondepressed patients on the following symptoms: insomnia (83% vs. 62%), pain (83% vs. 55%), anorexia (63% vs. 42%), fatigue (67% vs. 32%), and wound/ pressure sore (30% vs. 13%).
No
Table 11.2. (Continued) Year
Reference
Method
Setting
Results (Description)
Supports Unique Scales? (Yes, no, uncertain)
2000
Kalichman SC, Rompa D, Cage M. Distinguishing between overlapping somatic symptoms of depression and HIV disease in people living with HIV-AIDS. J Nerv Ment Dis. 2000;188(10):662–670.
Beck Depression Inventory (BDI). The BDI consists of 21 items that reflect cognitive, affective, behavioral, and somatic symptoms of depression over the previous 7 days. Center for Epidemiological Studies Depression Scale (CESD) is a 20-item scale that assesses symptoms of depression over the previous 7 days. Anxiety: To assess anxiety in the current study, we used the 20-item Trait-Anxiety scale from the State-Trait Anxiety Inventory (Spielberger et al., 1983). Future Pessimism: We developed a six-item scale
Participants were 242 (68%) men, 110 (31%) women, and 5 (1%) transgender persons living with HIV-AIDS. The majority of the sample was African-American (76%), with 19% white participants, 2% Hispanic, and the remaining 3% of other ethnic backgrounds. They were recruited from AIDS service organizations, healthcare providers, social service agencies, community residences for people living with HIV-AIDS, and infectious disease clinics.
Factor scores were computed for BDI Self-Defeating Thoughts, BDI Affective Symptoms, and BDI Somatic Symptoms. We found that the strongest degree of association with HIV symptoms occurred for BDI items involving ability to work (r = 0.42), sleep (r = 0.37), fatigue (r = 0.41), appetite (r = 0.34), and worry about health (r = 0.31). The BDI items with the strongest associations with HIV symptoms were therefore those items reflecting somatic complaints. For the CES-D, HIV symptoms were significantly correlated with depression items indicating fatigue (r = 0.43), sleep
Uncertain
(Continued )
Table 11.2. (Continued) Year
Reference
Method
Setting
Results (Description)
Supports Unique Scales? (Yes, no, uncertain)
to assess future pessimism. To assess symptoms of obsessive-compulsiveness, we used six items from the Obsessive-Compulsive scale of the Schedule for Nonadaptive Personality (SNAP; Clark, 1993).
1986
Lipsey JR, Spencer WC, Rabins PV, et al. Phenomenological comparison of poststroke depression and functional depression. Am J Psychiatry, 1986;143:527–529.
Structured clinical interviews. Only patients fulfilling DSM-III criteria for major depression were included in this study. The major instrument used during examination of all patients was modified PSE
(r = 0.40), appetite (r = 0.30), not being able to shake the blues (r = 0.34), feeling bothered (r = 0.33), feeling depressed (r = 0.31), and lack of concentration (r = 0.32). CESD items that did not reflect somatic complaints were also closely associated with HIV symptoms. Results of a factor analysis entering the six depression factor scores from the BDI and CES-D showed that HIV symptoms were most strongly associated with the somatic depression symptom factors of the BDI and CESD. The 43 poststroke patients were 23 acutely ill inpatients,14 patients admitted for rehabilitation following acute stroke, and 6 referred to outpatient clinic for poststroke depression. The 43
Poststroke patients had similar Hamilton depression scores to those of functional depression. PSE syndrome profiles were remarkably similar between groups. Of the 17 depression symptoms, only 2 were significantly
No
Table 11.2. (Continued) Year
Reference
Method
Setting
Results (Description)
Supports Unique Scales? (Yes, no, uncertain)
2008
Holzapfel N, Mu¨llerTasch T, Wild B, et al. Depression profile in patients with and without chronic heart failure. J Affect Disorders. 2008;1:53–62.
(Present State Examination) with 59 items specifically related to anxiety and depression. Depressed patients with chronic heart failure (CHF; n = 113) and without CHF (n = 137) were compared with respect to severity of individual DSM-IV depressive symptoms, as measured with the PHQ-9. Of all patients, only those who met DSM-IV diagnostic criteria for major depressive disorder or other depressive disorders and were able to complete the study questionnaire were included in the study. Statistical method: ANCOVAs with
functionally depressed patients were from inpatient admissions for major depression to the hospital. Of a total of 921 patients from a CHF and a psychosomatic outpatient clinic of the Medical Hospital at the University of Heidelberg, 137 met DSM-IV diagnostic criteria for major depressive disorder and 113 for other depressive disorders. Depressed patients with CHF (n = 113) and without CHF (n = 137).
different: slowness was more frequent and lack of interest and concentration was less frequent in poststroke patients. The 677 patients from the CHF outpatient clinic ranged in age from 16 to 90 years. 42 patients (6.2%) met the diagnostic criteria for major depressive disorder, and 71 patients (10.5%) met the diagnostic criteria for other depressive disorders according to the PHQ-9 diagnostic algorithm. 244 patients had evidence of CHF in their medical history and record. The age range was 16 to 79 years. 248 patients from the psychosomatic outpatient clinic participated in this study. 95 patients (38.9%) met the diagnostic criteria for
No
(Continued )
Table 11.2. (Continued) Year
Reference
Method
Setting
Results (Description)
Supports Unique Scales? (Yes, no, uncertain)
major depressive disorder, and 42 patients (17.2%) met the diagnostic criteria for other depressive disorders according to the PHQ-9 diagnostic algorithm.
sociodemographic characteristics as covariates were performed separately for patients with major depressive disorder and other depressive disorders. 2006
Ehrt U, Brønnick K, Leentjens AFG, et al. Depressive symptom profile in Parkinson’s disease: a comparison with depression in elderly patients without Parkinson’s disease. Int J Geriatr Psychiatry. 2006;21(3):252–258.
We compared the individual depressive symptoms of 145 nondemented depressed patients with Parkinson’s disease (PD) and 100 depressed patients without PD by comparing item scores on the ˚ sberg Montgomery-A Depression Rating Scale by way of MANCOVA. Dementia was diagnosed according to DSM-IIIR
PD patients included in this study came from two different cross-sectional studies: a community study in Norway and an outpatient study in the Netherlands. PD patients from both populations were included in the study if they did not suffer from dementia and had at least mild depressive symptoms, which was operationalized as a
Patients with PD had less reported sadness, slightly less loss of energy, more concentration problems, fewer feelings of guilt, and lower anhedonia.
Uncertain
Table 11.2. (Continued) Year
Reference
Method
Setting
Results (Description)
Supports Unique Scales? (Yes, no, uncertain)
1994
Chochinov HM, Wilson KG, Enns M, et al. Prevalence of depression in the terminally ill: effects of diagnostic criteria and symptom threshold judgments. Am J Psychiatry. 1994;151:537–540.
(Stavanger) or DSM-IV (Maastricht) after a clinical examination that included an interview with a caregiver in addition to cognitive testing.
MADRS sum score 7 (Snaith et al., 1986). The control group consisted of 100 consecutive patients referred to the old age psychiatry outpatient clinic at Stavanger University Hospital, Norway, suffering from at least mild depressive symptoms with a MADRS score 7. In both populations, cognition was assessed with the MiniMental State Examination (MMSE).
Semi-structured diagnostic interviews were conducted with 130 patients receiving palliative care. Diagnoses according to the RDC (Research Diagnostic Criteria) were compared with diagnoses according to Endicott’s revised criteria (which involve replacing somatic
130 inpatients from two hospital-based palliative care services with solid tumors were interviewed. Mean age 71.5(SD = 11.0).
A low-threshold (less stringent) diagnostic approach greatly increased the overall prevalence of major and minor depressive episodes with both the RDC and the Endicott criteria. With high thresholds, the RDC and the Endicott criteria were
Uncertain
(Continued )
Table 11.2. (Continued) Year
Reference
Method
Setting
Results (Description)
Supports Unique Scales? (Yes, no, uncertain)
equivalent, whereas with low thresholds the Endicott substitutions identified fewer cases of major (but not minor) depression.
symptoms [change in weight or in appetite, sleep disturbance, loss of energy, and reduced concentration] with nonsomatic alternatives [depressed appearance, social withdrawal, brooding, self-pity or pessimism, and lack of reactivity]) when either a low-severity or a highseverity threshold for classifying RDC criterion A symptoms was used. 2006
Vilalta-Franch J, GarreOlmo J, Lo´pez-Pousa S, et al. Comparison of different clinical diagnostic criteria for depression in Alzheimer’s disease. Am J Geriatr Psychiatry. 2006;14(7):589–597.
This was a cross-sectional, observational study of 491 patients with probable Alzheimer’s disease. Depression was diagnosed using five classification systems (ICD-10, DSM– IV, Cambridge Examination for Mental Disorder of the Elderly
The patients who completed the baseline visit of the EDAC study from 1998–2003 where CAMDEX and NPI were used
The prevalence of depression was 4.9% (95% CI: 3.2–7.1) according to ICD-10 criteria; 9.8% (95% CI: 7.3–12.6) according to CAMDEX; 13.4% (95% CI: 10.6–16.6) according to DSM–IV; 27.4% (95% CI: 23.6–31.5) according to PDC-dAD criteria; and 43.7% (95% CI: 39.4–48.2)
Uncertain
Table 11.2. (Continued) Year
Reference
Method
Setting
Results (Description)
Supports Unique Scales? (Yes, no, uncertain)
when using the screening questions from the NPI depression subscale. The level of agreement between the classification systems was low to moderate (<0.52). The characteristics associated with the most diagnostic disagreement were loss of confidence or self-esteem and irritability.
[CAMDEX], Provisional Diagnostic Criteria for depression in AD [PDCdAD], Neuropsychiatric Inventory [NPI]). Cognitive function was assessed by MMSE and CAMCOG.
1996
Stein PN, Sliwinski MJ, Gordon WA. Discriminative properties of somatic and nonsomatic symptoms for post-stroke depression. Clin Neuropsychologist. 1996;10:141–148.
Mood evaluation comprised the Beck Depression Inventory (BDI) and the Hamilton Rating Scale for Depression (HRSD).
Average age = 67. Patients were from three hospitals in New York City. At least 4 weeks after a unilateral cerebrovascular accident (CVA) of ischemic and/or embolic origin without past history of psychiatric illness, neurologic disease, and substance abuse.
Somatic items from both scales were significantly less specific when diagnosing poststroke depression than were the nonsomatic (intrapsychic) items. Somatic symptoms were neither specific to poststroke depression nor added incremental validity over nonsomatic symptoms for diagnosing poststroke depression.
Yes
(Continued )
Table 11.2. (Continued) Year
Reference
Method
Setting
Results (Description)
Supports Unique Scales? (Yes, no, uncertain)
2005
Simon GE, Von Korff M. Medical comorbidity and validity of DSM-IV depression criteria. Psychol Med. 2006;36:27–36.
Telephone assessments at baseline, 2 months, and 6 months included the Structured Clinical Interview for DSM-IV and other measures of depression severity and functional status. Item Response Theory analyses compared patterns of depressive symptoms across groups and specifically evaluated somatic symptoms (fatigue, change in weight or appetite, psychomotor agitation/ retardation, and sleep disturbance) as indicators of depression.
At staff-model clinics of Group Health Cooperative (GHC), a prepaid health plan serving 450,000 members in western Washington state. Computerized records were used to identify all adult health-plan members filling new (no more than last 150 days) prescriptions for antidepressants from primary care physicians, those with visit diagnoses of depression, to exclude patients with diagnoses of bipolar disorder or psychotic disorder and identify patients with specific comorbid medical conditions.
Overall item response analysis indicated differential item functioning between groups (w2 = 33.7, df = 18, p = 0.017). Two of eight item-level comparisons were statistically significant; one in the predicted direction (patients with comorbidity reported more fatigue at low levels of depression: w2 =17.9, df = 1, p < 0.001) and one in the opposite direction from predicted (patients with comorbidity reported less psychomotor agitation/ retardation at low levels of depression : w2 = 8.0, df = 1, p = 0.005). Observed differences were modest: at the midpoint of the depression severity scale, patients with medical comorbidity had a 54% probability of reporting fatigue compared to 45% in those without comorbidity.
No
Table 11.2. (Continued) Year
Reference
Method
Setting
Results (Description)
Supports Unique Scales? (Yes, no, uncertain)
2005
de Coster E, Leentjens AFG, Lodder J, et al. The sensitivity of somatic symptoms in post-stroke depression: a discriminant analytic approach. Int J Geriatr Psychiatry. 2005;20:358–362.
Structured Clinical Interview for DSM-IV within 1 month after the stroke to confirm or reject the diagnosis of major depressive disorder. Severity of depression was measured with the Hamilton Depression Rating Scale (HAM-D). At the follow-up interviews at 3, 6, 9, and 12 months, depression was diagnosed using a two-step procedure. First, three psychiatric rating scales for depression (BDI, HADS, SCL-90). Then patients who exceeded the previously defined cutoff value on at least one of these scales were called in and reinterviewed using the SCID and HAM-D.
From 1/9/1997 to 1/9/ 1999, all eligible patients with an acute first-ever clinical presentation of cerebral infarction who were seen in the Accident and Emergency Department and the Department of Neurology of Maastricht University Hospital were entered in a prospective stroke registry. The only inclusion criterion was an ischemic stroke.
Wilks’ lambda, as a test of discriminant function, was highly significant (p < 0.001). In total 88.3% of the patients were correctly classified as depressed or nondepressed. In this discriminant model, as expected, depressed mood was the best discriminator between depressed and nondepressed patients, followed by reduced appetite, thoughts of suicide, psychomotor retardation, psychic anxiety, and fatigue.
No
(Continued )
Table 11.2. (Continued) Year
Reference
Method
Setting
Results (Description)
Supports Unique Scales? (Yes, no, uncertain)
2003
Akechi T, Nakano T, Akizuki N, et al. Somatic symptoms for diagnosing major depression in cancer patients. Psychosomatics. 2003;44:244–248.
A computerized database was used to identify patients with major depression. The database included demographic factors, medical factors such as performance status and pain, and psychiatric diagnoses based on a structured clinical interview based on the DSM-IV criteria.
220 of a total of 1,721 cancer patients referred to the Psychiatry Divisions at National Cancer Center Hospital and National Cancer Center Hospital East in Japan between 1996 and 1999 were reviewed.
The results of the logistic regression analyses demonstrated that weight loss or appetite change and a diminished ability to think or concentrate were positively associated with a diminished interest or pleasure after adjusting for possible physical confounders. Patients with weight loss or appetite change showed a significantly higher severity of major depression than those without this symptom (p < 0.003), while patients with the other three somatic symptoms did not (sleep disturbance, fatigue, diminished ability to think).
Uncertain
2006
Guo Y, Musselman D, Manatunga A. The diagnosis of major depression in patients with cancer:
SCID-32 and the dimensional 21-item HAM-D were administered to all in study
Study subjects with cancer were recruited from outpatients and inpatients at Emory University.
The HAM-D items on weight loss was not different in two groups. In the cancer patients, 6 of the 21 HAM-D items
No
Table 11.2. (Continued) Year
Reference
Method
Setting
Results (Description)
Supports Unique Scales? (Yes, no, uncertain)
a comparative approach. Psychosomatics. 2006;47:376–384.
2006
van Wilgen CP, Dijkstra PU, Stewart RE, et al. Measuring somatic symptoms with the CES-D to assess depression in cancer patients after treatment: comparison among patients with oral/ oropharyngeal, gynecological, colorectal, and breast cancer.
participants by a single rater who was either a master’slevel nurse or a fourth-year psychiatry resident. Final psychiatric diagnoses were provided by consensus of the research team, comprising the aforementioned individuals and two board-certified psychiatrists. One-way analysis of variance (ANOVA) was used to compare the continuous variables. The CES-D, which contains 20 items divided in four domains, was administered to patients at least 1 year after the first cancer treatment (and to a control group). Patients with tumor recurrences were excluded. The CES-D in cancer patients has a good internal
Healthy comparison subjects were recruited from Emory and the surrounding community by advertisements or word of mouth.
were significantly associated with an increased probability of major depression: depressed mood (p < 0.004), late insomnia (p < 0.022), agitation (p < 0.008), psychic anxiety (p < 0.001), genital symptoms (p < 0.017), and diurnal variation (p < 0.046).
The data of comparison subjects and patients with cancer were obtained from the hospital or health center database. The comparison group was matched for gender and age with the cancer group and lived in the same area as the patients with cancer.
With ANOVA, the cancer group scored significantly higher than the control group on the domain of Somatic Retarded Activity and Depressed Affect. The four cancer groups and the comparison group differed significantly on Total score, Somatic Retarded Activity, and Depressed Affect scores.
No
(Continued )
Table 11.2. (Continued) Year
Reference
Method
Setting
Results (Description)
Supports Unique Scales? (Yes, no, uncertain)
1999
Psychosomatics. 2006;47:465–470.
consistency (a = 0.89) and the test–retest reliability was 0.51 (p < 0.001).
Aikens JE, Reinecke MA, Pliskin NH, et al. Assessing depressive symptoms in multiple sclerosis: is it necessary to omit items from the original Beck Depression Inventory? J Behav Med. 1999;22(2):127–142.
Poser criteria (Poser et al., 1983) for definite or probable multiple sclerosis (MS). Mean duration since MS diagnosis was 11.0 years with moderate disability due to MS. Beck Depression Inventory was used to diagnose depression and also
The correlations between the domains of Somatic Retarded Activity and Depressed Affect were significant for the control group (0.54; p < 0.01) and the cancer group (0.66; p < 0.01). The cancer groups, except the oral/oropharyngeal patients, and comparison group showed significantly higher incidences of depression symptoms without the Somatic items as compared with the CES-D with Somatic items. MS sample was recruited from the Neurology Department at the University of Chicago. Healthy control (HC) subjects comprised 49 students from the University of Chicago and 39 subjects from the community.
Relative scores for the eight somatic BDI items were analyzed by multivariate analysis of variance with demographic variables and BDI total as covariates. The only significant difference was MS > HC (item 15). On raw scores, MS patients exceeded HCs on items 15
No
Table 11.2. (Continued) Year
Reference
Method
Setting
Results (Description)
Supports Unique Scales? (Yes, no, uncertain)
computed with Mohr et al.’s proposed 18-item BDI modification (BDI18), as well as the cognitive/affective (items 1 13) and somatic (items 14 – 21) BDI subscales suggested by Cavanaugh et al. (1983). MS severity (MS sample only) was assessed with the widely used Expanded Disability Status Scale (EDSS; Kurtzke, 1983), administered by a neurologist.
and 21 (sexual disinterest), but this was attributable to the low HC item endorsement. There were no other differences on somatic items or item – total correlations.
236
SCREENING FOR DEPRESSION IN CLINICAL PRACTICE
(2006)40 found that some specific cancers showed less somatic morbidity, while others featured more. Similarly, Ehrt and associates (2007)41 found that depressed patients with Parkinson’s disease had less loss of energy but more concentration problems than depressed control subjects. Ultimately, any new scales should be tested head to head with existing tools (see Chapter 4 for further discussion). To date, attempts to produce custom scales based on exclusion of somatic items have not proven their superiority in well-designed implementation studies showing superiority of new over old. We therefore conclude that based on the evidence to date, including somatic symptoms does not lead to many false-positive diagnoses when attempting to diagnose depression in the context of physical disease (Table 11.2). Indeed, the systematic exclusion of somatic symptoms might cause an under-recognition of major depression. Given the limited evidence in specific areas, we suggest that further studies are required in relation to minor depression and subsyndromal forms.
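The false-positive argument above turns on specificity and positive predictive value (PPV). The short Python sketch below is illustrative only: the counts, prevalence, sensitivity, and specificity values are hypothetical and are not taken from any study in Table 11.2, and the names `ScreeningResult` and `ppv_from_rates` are introduced here for illustration. It computes sensitivity, specificity, and PPV from a 2×2 table against a reference diagnosis, then uses Bayes' theorem to show how a fall in specificity (for example, if somatic items were often endorsed by nondepressed medically ill patients) lowers PPV at a fixed prevalence.

```python
from dataclasses import dataclass


@dataclass
class ScreeningResult:
    """Hypothetical 2x2 counts: screening scale result vs. reference diagnosis."""
    true_pos: int   # scale positive, depressed on interview
    false_pos: int  # scale positive, not depressed
    false_neg: int  # scale negative, depressed
    true_neg: int   # scale negative, not depressed

    @property
    def sensitivity(self) -> float:
        # Proportion of depressed patients the scale detects.
        return self.true_pos / (self.true_pos + self.false_neg)

    @property
    def specificity(self) -> float:
        # Proportion of nondepressed patients the scale correctly rules out.
        return self.true_neg / (self.true_neg + self.false_pos)

    @property
    def ppv(self) -> float:
        # Proportion of scale-positive patients who are truly depressed.
        return self.true_pos / (self.true_pos + self.false_pos)


def ppv_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """PPV via Bayes' theorem from sensitivity, specificity, and prevalence."""
    true_pos_rate = sensitivity * prevalence
    false_pos_rate = (1 - specificity) * (1 - prevalence)
    return true_pos_rate / (true_pos_rate + false_pos_rate)


if __name__ == "__main__":
    # Hypothetical sample of 200 patients, 40 of whom are depressed (prevalence 0.20).
    table = ScreeningResult(true_pos=32, false_pos=24, false_neg=8, true_neg=136)
    print(f"sensitivity={table.sensitivity:.2f} "
          f"specificity={table.specificity:.2f} PPV={table.ppv:.2f}")

    # Same sensitivity and prevalence, lower specificity: PPV falls.
    for spec in (0.85, 0.70):
        print(f"specificity {spec:.2f}: PPV {ppv_from_rates(0.80, spec, 0.20):.2f}")
```

With these made-up figures, a drop in specificity from 0.85 to 0.70 lowers PPV from about 0.57 to 0.40. Whether somatic items actually reduce specificity in a given illness is the empirical question the studies in Table 11.2 address; the sketch only shows why that question matters.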
References
1. Niles BL, Mori DL, Lambert JF, et al. Depression in primary care: Comorbid disorders and related problems. J Clin Psychol Med Settings. 2005;12(1):71–77.
2. Dwight-Johnson M, Sherbourne CD, Liao D, et al. Treatment preferences among depressed primary care patients. J Gen Intern Med. 2000;15(8):527–534.
3. Berardi D, Menchetti M, De Ronchi D, et al. Late-life depression in primary care: A nationwide Italian epidemiological survey. J Am Geriatr Soc. 2002;50(1):77–83.
4. Wells KB, Rogers W, Burnam A, et al. How the medical comorbidity of depressed patients differs across health care settings: results from the Medical Outcomes Study. Am J Psychiatry. 1991;148:1688–1696.
5. Yates WR, Mitchell J, Rush AJ, et al. Clinical features of depressed outpatients with and without co-occurring general medical conditions in STAR*D. Gen Hosp Psychiatry. 2004;26(6):421–429.
6. Aragones E, Pinol JL, Labad A. Depression and physical comorbidity in primary care. J Psychosom Res. 2007;63(2):107–111.
7. Vuorilehto M, Melartin T, Isometsa E. Depressive disorders in primary care: recurrent, chronic, and co-morbid. Psychol Med. 2005;35(5):673–682.
8. Nuyen J, Spreeuwenberg PM, Van Dijk L, et al. The influence of specific chronic somatic conditions on the care for co-morbid depression in general practice. Psychol Med. 2008;38(2):265–277.
9. Cole MG, Bellavance F. Depression in elderly medical inpatients: a meta-analysis of outcomes. Can Med Assoc J. 1997;157:1055–1060.
10. Oslin DW, Datton CJ, Kallan MJ, et al. Association between medical comorbidity and treatment outcomes in late-life depression. J Am Geriatr Soc. 2002;50:823–828.
11. Bogner HR, Cary MS, Bruce ML, et al. The role of medical comorbidity in outcome of major depression in primary care: the PROSPECT study. Am J Geriatr Psychiatry. 2005;13(10):861–868.
12. Egede LE. Major depression in individuals with chronic medical disorders: prevalence, correlates and association with health resource utilization, lost productivity and functional disability. Gen Hosp Psychiatry. 2007;29(5):409–416.
13. Katon W, Lin EHB, Kroenke K. The association of depression and anxiety with medical symptom burden in patients with chronic medical illness. Gen Hosp Psychiatry. 2007;29(2):147–155.
14. Egede LE. Diabetes, major depression, and functional disability among US adults. Diabetes Care. 2004;27(2):421–428.
15. Scott KM, Browne MAO, McGee MA, et al. Mental-physical comorbidity in Te Rau Hinengaro: The New Zealand Mental Health Survey. Aust N Z J Psychiatry. 2006;40(10):882–888.
16. Nuyen J, Schellevis FG, Satariano WA, et al. Comorbidity was associated with neurologic and psychiatric diseases: A general practice-based controlled study. J Clin Epidemiol. 2006;59(12):1274–1284.
17. Juurlink DN, Herrmann N, Szalai JP. Medical illness and the risk of suicide in the elderly. Arch Intern Med. 2004;164:1179–1184.
18. International Association for the Study of Pain, Subcommittee of Taxonomy. Pain terms: a current list with definitions and notes on usage. Part II. Pain. 1979;6:249–252.
19. Zimmerman M, McGlinchey JB, Young D, et al. Diagnosing major depressive disorder, I. A psychometric evaluation of the DSM-IV symptom criteria. J Nerv Ment Dis. 2006;194:158–163.
20. Simon GE, Von Korff M, Piccinelli M, et al. An international study of the relation between somatic symptoms and depression. N Engl J Med. 1999;341:1329–1335.
21. Marple RL, Kroenke K, Lucey CR, et al. Concerns and expectations in patients presenting with physical complaints: frequency, physician perceptions and actions, and two-week outcome. Arch Intern Med. 1997;157:1482–1488.
22. Cavanaugh SV. Depression in the medically ill: critical issues in diagnostic assessment. Psychosomatics. 1995;36:48–59.
23. Koenig HG, George LK, Peterson BL, et al. Depression in medically ill hospitalized older adults: prevalence, characteristics, and course of symptoms according to six diagnostic schemes. Am J Psychiatry. 1997;154:1376–1383.
24. Cavanaugh S, Clark D, Gibbons R. Diagnosing depression in the hospitalized medically ill. Psychosomatics. 1983;24:809–815.
25. Plumb M, Holland J. Comparative studies of psychological function in patients with advanced cancer: 2. Interviewer-rated current and past psychological symptoms. Psychosom Med. 1981;43:243–254.
26. Kathol RG, Mutgi A, Williams J, et al. Diagnosis of major depression in cancer patients according to four sets of criteria. Am J Psychiatry. 1990;147:1021–1024.
27. Chochinov HM, Wilson KG, Enns M, et al. Prevalence of depression in the terminally ill: effects of diagnostic criteria and symptom threshold judgments. Am J Psychiatry. 1994;151:537–540.
28. Dugan W, McDonald MV, Passik SD, et al. Use of the Zung Self-Rating Depression Scale in cancer patients: feasibility as a screening tool. Psychooncology. 1998;7:483–493.
29. Eriksen HR, Svendsrød R, Ursin G, et al. Prevalence of subjective health complaints in the Nordic European countries in 1993. Eur J Public Health. 1998;8:294–298.
238
SCREENING FOR DEPRESSION IN CLINICAL PRACTICE
30. Kroenke K, Jackson J, Chamberlin J. Depressive and anxiety disorders in patients presenting with physical complaints: clinical predictors and outcome. Am J Med. 1997;103:339–347.
31. Kroenke K, Price R. Symptoms in the community. Prevalence, classification and psychiatric comorbidity. Arch Intern Med. 1993;153:2474–2480.
32. Simon GE, Von Korff M. Somatization and psychiatric disorder in the NIMH Epidemiologic Catchment Area study. Am J Psychiatry. 1991;148(11):1494–1500.
33. Haug TT, Mykletun A, Dahl AA. The association between anxiety, depression, and somatic symptoms in a large population: The HUNT-II Study. Psychosom Med. 2004;66:845–851.
34. Gerber PD, Barrett JE, Barrett JA, et al. The relationship of presenting physical complaints to depressive symptoms in primary care patients. J Gen Intern Med. 1992;7:170–173.
35. Zimmerman M, McGlinchey JB, Young D, et al. Diagnosing major depressive disorder, III. Can some symptoms be eliminated from the diagnostic criteria? J Nerv Ment Dis. 2006;194:313–317.
36. Mitchell AJ, McGlinchey JB, Young D, et al. Accuracy of specific symptoms in the diagnosis of major depressive disorder in psychiatric out-patients: data from the MIDAS project. Psychol Med. Nov. 12, 2008:1–10.
37. Lipsey JR, Spencer WC, Rabins PV, et al. Phenomenological comparison of poststroke depression and functional depression. Am J Psychiatry. 1986;143:527–529.
38. Simon G, Von Korff M. Medical co-morbidity and validity of DSM-IV depression criteria. Psychol Med. 2006;36:27–36.
39. Pickard AS, Dalal MR, Bushnell DB. Comparison of depressive symptoms in stroke and primary care: applying Rasch models to evaluate the Center for Epidemiologic Studies-Depression Scale. Value Health. 2006;9(1):59–64.
40. van Wilgen CP, Dijkstra PU, Stewart RE, et al. Measuring somatic symptoms with the CES-D to assess depression in cancer patients after treatment: comparison among patients with oral/oropharyngeal, gynecological, colorectal, and breast cancer. Psychosomatics. 2006;47:465–470.
41. Ehrt U, Brønnick K, Leentjens AFG, et al. Depressive symptom profile in Parkinson’s disease: a comparison with depression in elderly patients without Parkinson’s disease. Int J Geriatr Psychiatry. 2006;21(3):252–258.
42. Yates WR, Mitchell J, Rush AJ, et al. Clinical features of depression in outpatients with and without co-occurring general medical conditions in STAR*D. J Clin Psychiatry. 2007;9:7–15.
43. Aikens JE, Reinecke MA, Pliskin NH, et al. Assessing depressive symptoms in multiple sclerosis: is it necessary to omit items from the original Beck Depression Inventory? J Behav Med. 1999;22(2):127–142.
44. Guo Y, Musselman D, Manatunga A. The diagnosis of major depression in patients with cancer: a comparative approach. Psychosomatics. 2006;47:376–384.
45. Holzapfel N, Müller-Tasch T, Wild B, et al. Depression profile in patients with and without chronic heart failure. J Affect Disord. 2008;1:53–62.
46. Chen ML, Chang H-K. Physical symptom profiles of depressed and nondepressed patients with cancer. Palliat Med. 2004;18(8):712–718.
47. Stein PN, Sliwinski MJ, Gordon WA. Discriminative properties of somatic and nonsomatic symptoms for post-stroke depression. Clin Neuropsychol. 1996;10:141–148.
48. Kathol RG, Mutgi A, Williams J, et al. Diagnosis of major depression in cancer patients according to four sets of criteria. Am J Psychiatry. 1990;147:1021–1024.
49. Kalichman SC, Rompa D, Cage M. Distinguishing between overlapping somatic symptoms of depression and HIV disease in people living with HIV-AIDS. J Nerv Ment Dis. 2000;188(10):662–670.
50. Leentjens AFG, Marinus J, Van Hilten JJ, et al. The contribution of somatic symptoms to the diagnosis of depressive disorder in Parkinson’s disease. A discriminant analytic approach. J Neuropsychiatry Clin Neurosci. 2003;15:74–77.
51. Akechi T, Nakano T, Akizuki N, et al. Somatic symptoms for diagnosing major depression in cancer patients. Psychosomatics. 2003;44:244–248.
52. de Coster E, Leentjens AFG, Lodder J, et al. The sensitivity of somatic symptoms in post-stroke depression: a discriminant analytic approach. Int J Geriatr Psychiatry. 2005;20:358–362.
53. Wedding U, Koch A, Rohrig B, et al. Requestioning depression in patients with cancer: Contribution of somatic and affective symptoms to Beck’s Depression Inventory. Ann Oncol. 2007;18(11):1875–1881.
12 SCREENING FOR DEPRESSION IN NEUROLOGIC DISORDERS
Andres M. Kanner
1. Depression in Stroke
2. Depression in Multiple Sclerosis
3. Depression in Epilepsy
4. Depression in Parkinson’s Disease
5. Conclusions
Context
Depression appears to be particularly common in several neurologic disorders, including epilepsy, stroke, dementias, Parkinson’s disease, Huntington’s disease, and multiple sclerosis. There is some evidence that the “depression” associated with each neurologic disorder differs in its symptoms and course, which suggests that it may be useful to have depression scales validated separately for each disorder. Yet most instruments appear to yield comparable, acceptable sensitivities and specificities. Head-to-head comparisons of scales and implementation studies are needed to resolve this issue.
Depressive disorders are a common psychiatric comorbidity of neurologic disorders, the principal ones being epilepsy, stroke, dementias, Parkinson’s disease (PD), essential tremor, Huntington’s disease, migraine, and multiple sclerosis (MS).1 These depressive disorders are typically assumed to be a complication of the neurologic disorder. However, data published in the past 15 years have suggested a bidirectional relation between depression and stroke,2–4 epilepsy,5–7 dementia,8–10 and PD.11,12 In other words, not only are patients with these neurologic conditions at greater risk of developing depression, but patients with depression are at greater risk of developing one of these disorders.
Early identification of comorbid depressive disorders is essential, given their negative impact on quality of life and on the course and treatment response of most of these neurologic disorders. Unfortunately, depression often goes unrecognized and hence untreated. The use of screening instruments by neurologists may help remedy this problem, but several caveats need to be considered. First, the clinical presentation of comorbid depressive disorders may differ in several ways from that of primary depression, as in depression in epilepsy.13 Second, several somatic and cognitive symptoms (eg, fatigue, poor concentration, and slowed thinking) are common to both primary depression and most neurologic disorders, so a high score may reflect such symptoms rather than a depressive episode per se. Third, most available screening instruments for depression were developed for primary mood disorders and hence may yield false-positive or false-negative findings. Fourth, cognitive deficits related to the underlying neurologic disorder may limit the patient’s ability to complete self-report screening instruments reliably without help. In such cases, examiner-administered instruments may yield more accurate data. The aim of this chapter is to provide a practical review of the prevalence and clinical manifestations of depression in four major neurologic disorders (stroke, epilepsy, PD, and MS) and of its impact on the course of these diseases and on quality of life. In addition, the chapter reviews the literature on the screening instruments frequently used to identify depression in these disorders.
1. Depression in Stroke
Epidemiologic Considerations
The prevalence of post-stroke depression (PSD) has ranged from 30% to 50% in several cross-sectional studies.14–17 In a review of the literature, Robinson14 calculated a pooled prevalence of all types of PSD of 31.8% (range 30% to 44%) across four community-based studies. Prevalence ranged from 25% to 47% in studies carried out in acute-care hospitals and from 35% to 72% in studies done in rehabilitation hospitals. As stated above, a bidirectional relation has been identified between depressive disorders and stroke. Five studies have investigated the impact of depression on the risk of stroke in large cohorts ranging from 1,703 to 6,675 subjects.2–4,18,19 Four found that depression increased the risk of developing stroke after controlling for other stroke risk factors.2,3,18,19 For example, Larson and colleagues3 followed 1,703 subjects for 13 years; patients with a depressive disorder or depressive symptoms had a relative risk of 2.67 (confidence interval [CI] 1.08–6.63) of developing a stroke after controlling for vascular risk factors (ie, hypertension, diabetes, hyperlipidemia, heart disease, and tobacco use). May and colleagues2 followed 2,201 men aged 45 to 59 years for 14 years; those with significant symptoms of depression had a relative risk of 3.36 (CI 1.29–8.71) of a fatal stroke. The increased risk of stroke in patients with depression may be mediated directly, through effects on coagulation and on central nervous system vascular parameters, and indirectly, by increasing the risk of cardiovascular disease, hypertension, cardiac arrhythmias, and diabetes.14
Clinical Manifestations
PSD can present as a major depressive episode or as minor depression. Various investigators have proposed the existence of another type of PSD, referred to as vascular depression: a late-onset (after age 65) depressive disorder identified in patients who may have had overt or silent strokes or bilateral subcortical white matter ischemic disease. The occurrence of PSD peaks between the third and sixth months after the stroke. For example, in a study of 100 stroke patients followed for 18 months, symptoms of PSD were identified in 46% of patients during the first 2 months, while only 12% first developed symptoms 12 months after their stroke.20 Among patients with early-onset PSD, symptoms of depression persisted at 12 and 18 months. The course of PSD can also be lengthy. For example, symptoms of major depression identified in 27% of stroke patients persisted for approximately 1 year, while symptoms of minor depression in 20% of stroke patients lasted more than 2 years.21,22 The duration of PSD symptoms appears to depend on the vascular territory of the stroke, with longer durations in patients with middle cerebral artery strokes than in those with posterior circulation strokes. In one study, 82% of patients with middle cerebral artery strokes remained symptomatic at the 6-month follow-up visit, versus 20% of those with posterior circulation strokes.23 At 12 and 24 months, none of the patients with posterior circulation strokes had symptoms, but 62% of those with middle cerebral artery strokes did. In general, the clinical manifestations of PSD are similar to those of primary late-onset depression, with the caveat that psychomotor retardation may be more frequent among patients with PSD.
Indeed, Lipsey and associates24 found that slowness/psychomotor retardation was one of the symptoms distinguishing PSD patients from patients with idiopathic depression, who in turn reported more anhedonia and more difficulty concentrating. Likewise, neurovegetative symptoms (eg, changes in sleep, appetite, and sexual drive) and fatigue are common to depressive and neurologic disorders, and PSD can worsen their severity. In a study by Fedoroff and associates,16 disturbances of sleep, libido, and energy were significantly more frequent among depressed than nondepressed stroke patients at initial evaluation and at 3, 6, 12, and 24 months. The severity of PSD has been found to correlate with the degree of impairment in activities of daily living (ADLs) during both the acute and chronic phases. Furthermore, cognitive disturbances associated with the underlying stroke, such as aphasia and even dementing processes, may delay recognition of a depressive disorder. Gainotti and colleagues25 have also suggested that patients with PSD are more likely than patients with idiopathic depression to present with catastrophic reactions, emotionalism, and diurnal mood variation, though these findings have not been confirmed by other investigators.
Screening Instruments
Most of the screening instruments used in stroke patients have not been validated for PSD.26 There is therefore a concern that neurovegetative symptoms could lead to false-positive diagnoses of depression. However, this does not appear to be the case. For example, in a study of 142 consecutive stroke patients followed for 2 years, Paradiso and coworkers27 identified 26 who met DSM-IV criteria for major depression during their hospitalization. Excluding the vegetative symptoms did not change the sensitivity of the diagnosis, and specificity fell only to 98%.27 Thus, the diagnosis of depression can be based solely on the “psychological symptoms.” On the other hand, neurovegetative symptoms can also help distinguish depressed from nondepressed stroke patients. For example, in a study of 206 patients with a first stroke, de Coster and colleagues28 administered the Structured Clinical Interview for DSM-IV (SCID) and the Hamilton Depression Scale (HAM-D, which includes seven items that identify neurovegetative symptoms); 32% of patients met the criteria for PSD. A discriminant model based on HAM-D item scores was highly significant and correctly classified 88.3% of patients as depressed or nondepressed.28 As expected, “depressed mood” discriminated best between depressed and nondepressed stroke patients, but some somatic symptoms, such as reduced appetite, psychomotor retardation, and fatigue, also had good discriminative properties.
Among the screening instruments developed to identify idiopathic depression, the self-rating scales most frequently used in studies of PSD include the Beck Depression Inventory (BDI), the Hospital Anxiety and Depression Scale (HADS), the Zung Scale (ZS), the Geriatric Depression Scale (GDS), and the Center for Epidemiologic Studies-Depression scale (CES-D). The HAM-D is one of the most frequently used examiner-rated scales. In a study of 202 consecutive patients, Aben and colleagues29 found that the sensitivity of the BDI and the HADS ranged between 80% and 90%, while their specificity was 60%; the HAM-D yielded a sensitivity of 78.1% and a specificity of 74.6%. In a study of 40 elderly stroke patients, 17 of whom were found to be depressed, the GDS and the ZS had the highest sensitivity, and the ZS had the highest positive predictive value (93%).30 The GDS was also found to be useful in the Perth Community Stroke Study, but the HADS was not.30 Similarly, O’Rourke and associates31 did not find the HADS adequate for identifying anxiety or PSD 6 months after a stroke in a study of 105 consecutive patients. Among other scales, Healey and colleagues32 tested the sensitivity and specificity of the Brief Assessment Schedule Depression Cards (BASDEC) and the Beck Depression Inventory-Fast Screen (BDI-FS) for identifying PSD in 49 elderly stroke patients (mean age 78.8 ± 6.8 years). With a cutoff score of 7 or more, the BASDEC yielded a sensitivity of 100% and a specificity of 95% for detecting major depression, whereas the BDI-FS (cutoff score of 4 or more) had a sensitivity of 71% and a specificity of 74%. When patients with minor depression were included in the analyses, the sensitivity of the BASDEC fell to 69% while its specificity remained high (97%), and the sensitivity of the BDI-FS fell to 62% while its specificity remained almost unchanged at 78%.
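Each sensitivity/specificity pair above comes from cross-tabulating screen results at a chosen cutoff against a reference diagnosis such as the SCID. The Python sketch below shows that calculation, assuming that a score at or above the cutoff counts as a positive screen (the "7 or more" convention used for the BASDEC). The function name and the toy scores are hypothetical and are not taken from any of the studies cited.

```python
from dataclasses import dataclass


@dataclass
class ScreenAccuracy:
    sensitivity: float  # proportion of true cases who screen positive
    specificity: float  # proportion of non-cases who screen negative


def accuracy_at_cutoff(scores, is_case, cutoff):
    """Sensitivity and specificity of a rating scale at one cutoff.

    A patient screens positive when score >= cutoff. `is_case` holds the
    reference-standard diagnosis (e.g. from a structured interview).
    """
    if len(scores) != len(is_case):
        raise ValueError("scores and is_case must be the same length")
    tp = fn = tn = fp = 0
    for score, case in zip(scores, is_case):
        positive = score >= cutoff
        if case:
            if positive:
                tp += 1  # true positive
            else:
                fn += 1  # missed case
        elif positive:
            fp += 1  # false alarm
        else:
            tn += 1  # correctly screened negative
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one non-case")
    return ScreenAccuracy(tp / (tp + fn), tn / (tn + fp))


# Hypothetical scores for eight patients; True marks a reference-standard case.
scores = [9, 7, 3, 8, 2, 5, 1, 7]
is_case = [True, True, True, False, False, False, False, False]
result = accuracy_at_cutoff(scores, is_case, cutoff=7)
print(f"sensitivity {result.sensitivity:.2f}, specificity {result.specificity:.2f}")
```

Raising the cutoff in this sketch trades sensitivity for specificity, which is why accuracy figures such as those for the BASDEC and BDI-FS are only meaningful alongside the cutoff used.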
The Post-Stroke Depression Scale (PSDS) is one of the few scales developed specifically to identify PSD.33 It has 10 items: depressed mood, guilt feelings, thoughts of death or suicide, neurovegetative symptoms, apathy and loss of interest, anxiety, catastrophic reactions, hyperemotionalism, anhedonia, and diurnal mood variations. The investigators administered the PSDS and the HAM-D to 124 stroke patients, 45 of whom met DSM-III-R criteria for major depression and 47 for minor depression. Scores were compared with those obtained on the same scales by 17 psychiatric patients also diagnosed with major depression according to DSM-III-R criteria. The investigators suggested that the PSDS demonstrated a continuum between major and minor forms of PSD. In contrast to other authors, they concluded that in stroke patients, a DSM-III-based diagnosis of major PSD could be partly inflated by symptoms (such as apathy and neurovegetative symptoms) that are typical of major depression in a patient free from brain damage but that could be due to the brain lesion in a stroke patient. Few published studies have reported data on the use of the PSDS.
Aphasia can make PSD very difficult for clinicians to identify, particularly with screening instruments. Sutcliffe and Lincoln34 developed the Stroke Aphasic Depression Questionnaire (SADQ) to detect depressed mood in aphasic patients in the community. They administered the SADQ, the HADS, and the Wakefield Depression Inventory to 70 stroke patients who had been discharged from hospital. The SADQ was also administered to 17 aphasic patients on two occasions 4 weeks apart. SADQ scores were significantly related to the other measures of depression (r = 0.22–0.52, p < 0.05), and a shortened 10-item version showed higher validity (r = 0.32–0.67, p < 0.01). Test–retest analysis indicated that the SADQ is reliable over a 4-week interval (r = 0.72, p < 0.001). Sackley and colleagues35 found that the SADQ yielded a sensitivity of 77% and a specificity of 78% in a study of 88 stroke patients, and other investigators have also confirmed the utility of this scale.36
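Sensitivity and specificity do not by themselves tell a clinician how likely a positive screen is to be a true case; that also depends on how common depression is in the population screened. A minimal Python sketch applying Bayes' rule to the SADQ figures reported by Sackley and colleagues; the prevalence values of 30% and 10% are assumptions chosen for illustration, not findings of that study.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) of a screen given its accuracy and the case prevalence.

    All three inputs are proportions between 0 and 1.
    """
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie between 0 and 1")
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    # Bayes' rule: P(case | positive screen) and P(non-case | negative screen).
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# SADQ sensitivity 77% and specificity 78%; the prevalences are assumed.
for prevalence in (0.30, 0.10):
    ppv, npv = predictive_values(0.77, 0.78, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.2f}, NPV {npv:.2f}")
```

Under these assumptions about 60% of positive screens would be true cases at a 30% prevalence, falling to about 28% at 10%, while the negative predictive value rises.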
Impact of Post-stroke Depression on the Course of the Stroke
PSD has been found to have a negative impact on the recovery of cognitive function, the recovery of the ability to perform ADLs, and mortality. With respect to cognitive function, Starkstein and coworkers37 demonstrated that patients with major PSD had significantly more cognitive deficits than nondepressed patients with left-hemisphere strokes of similar location and size; this was not the case for right-hemisphere strokes. In a follow-up study of 140 patients, Robinson and colleagues38 also found that major PSD was associated with greater cognitive impairment 2 years after a stroke. Regarding the recovery of ADLs, Parikh and colleagues39 found that in-hospital PSD was the most important variable predicting poor recovery of ADLs over a 2-year period; in-hospital ADL scores, by contrast, were not associated with 2-year recovery. The negative impact of PSD on the course of stroke is also reflected in a higher mortality risk: in a study of 976 stroke patients followed for 1 year, those with PSD had 50% higher mortality than those without.40 An example of a screening implementation study and yield for PSD is discussed in Chapter 5 and illustrated in the Appendix Fig Ap2.
2. Depression in Multiple Sclerosis
Epidemiologic Aspects
A review of various studies has indicated that depressive symptoms are present in approximately 80% of all patients with MS, and that psychopharmacotherapy is indicated in 20% of these patients. Lifetime prevalence rates of major depressive disorder have been estimated at 10% to 60%.41–43
In a population-based study, Patten and associates44 found a lifetime prevalence of major depression of 22.8% in MS patients, compared with 16% in the general population.45 Being a woman younger than 35 years, having a family history of major depression, and experiencing high levels of stress were identified as risk factors. In a subsequent population-based survey of 115,071 individuals, MS was identified in 322 people and major depression in 9,010.46 The annual prevalence of major depression was 15.7% in those with MS, compared with 7.4% in the non-MS population. As in the previous study, younger age and female sex were each associated with a higher rate of depression, with a greater prevalence in MS patients aged 18 to 45 years (25.7%) than in those over 45 (8.4%). This study also demonstrated a higher prevalence of depression among MS patients than among those with other chronic medical illnesses. Point prevalence rates in MS patients, ranging from 27% to 54%, are also significantly higher than in the general population. Likewise, the prevalence of bipolar disorder is significantly greater among MS patients than in the general population,47–49 with one study finding a 13% prevalence among 100 patients.49
Clinical Manifestations
Depression in MS may be indistinguishable from primary mood disorders. However, as with PSD, symptoms related to the underlying neurologic process (eg, cognitive, neurovegetative, and somatic symptoms) can be confused with symptoms of a depressive disorder. This is the case with fatigue, which has been identified in up to 90% of MS patients and is not necessarily associated with a mood disorder.50 On the other hand, the impact of depressive disorders on fatigue was illustrated in a study in which global fatigue severity decreased significantly as comorbid depression improved with cognitive–behavioral therapy, sertraline, or supportive group therapy.51 As in other neurologic disorders, suicidal ideation is a serious problem in MS and has been identified in up to 22% of patients.52 The rate of completed suicide in MS has been reported to be 7.5 times higher than expected in the general population, although reviews of the literature have found conflicting results.52–54 The risk appears to be greatest in the first 5 years after diagnosis and in patients aged 40 to 49 years.54 Just as in idiopathic depression, comorbid anxiety combined with depression increases the likelihood of suicidality.55 For example, in a study of 152 MS patients, anxiety symptoms were present in 25% and depression in 14%. Among those with combined anxiety and depression, 64% had suicidal thoughts, compared with 33% of those with anxiety alone and 43% of those with depression alone.56
Screening Instruments
Investigators and clinicians have used screening instruments developed for primary depression. An ongoing debate, however, has centered on whether the somatic symptoms common to MS and depression yield false-positive diagnoses of depression, and some authors have questioned whether certain somatic symptoms need to be excluded when these instruments are used in MS. For example, in the study by Patten and associates cited above,46 excluding cognitive symptoms and potentially confounding symptoms of fatigue lowered the overall prevalence of depression in both the MS (from 15.7% to 6.8%) and non-MS populations (from 7.4% to 3.2%). On the other hand, in a study of 42 patients with MS and depression, Moran and Mohr57 found that successful treatment of depression improved scores on all 21 items of the BDI-II, including those dealing with somatic symptoms, but on only 12 of the 17 items of the HAM-D. These authors endorsed the use of the BDI-II in its original form. Indeed, population-based studies have found the BDI to be a good screening instrument for depression in MS.58,59 Furthermore, in a study of 46 newly diagnosed MS patients, Sullivan and associates60 found that a BDI cutoff score of 13 yielded a sensitivity of 71% and a specificity of 79% for major depression. The CES-D has also been found to be a valuable screening instrument in population-based studies.61,62 For example, Chwastiak and coworkers61 found clinically significant depressive symptoms (CES-D score 16 or above) in 41.8% of 739 patients with MS; 29.1% had moderate to severe depression (score 21 or above). Of note, patients with advanced MS were much more likely to have clinically significant depressive symptoms than those with minimal disease. The impact of cognitive impairment on the recognition of depressive symptoms has also been a source of concern.
To investigate this potential problem, Gold and associates63 administered the HADS to 80 MS patients with cognitive dysfunction, established with the Symbol Digit Modalities Test (SDMT), and to 107 unimpaired patients. The HADS showed good internal consistency and test–retest reliability, and the pattern and magnitude of its correlations with other health status measures supported its validity. Of note, cognitively impaired patients had significantly higher scores on the depression and anxiety subscales. Abbreviated screening instruments have also been found to be effective. Benedict and colleagues64 investigated the validity of the BDI-FS in 54 consecutive MS patients; 48 caregivers or other informants were interviewed using the Neuropsychiatric Inventory (NPI). BDI-FS scores correlated significantly with other self-report measures of depression and with informant-reported dysphoria, and discriminated MS patients undergoing treatment for a depressive disorder from untreated MS patients. In a study of 260 MS patients, 26% of whom met criteria for major depression on the major depression module of the SCID, Mohr and associates65 investigated the sensitivity of two questions, one about depressed mood and the other about inability to experience pleasure. The two questions identified 99% (95% CI 91–100%) of cases.
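Two of the screening rules in this section can be written down directly. The Python sketch below encodes the two-question screen studied by Mohr and associates, assuming (the text above does not say so explicitly) that endorsing either question counts as a positive screen, together with the CES-D thresholds used by Chwastiak and coworkers (16 or above for clinically significant symptoms, 21 or above for moderate to severe depression). The function names and band labels are hypothetical.

```python
def two_question_screen(depressed_mood: bool, anhedonia: bool) -> bool:
    """Positive screen if either question is endorsed (assumed scoring rule)."""
    return depressed_mood or anhedonia


def cesd_band(total: int) -> str:
    """Label a CES-D total score (20 items scored 0-3, so totals run 0-60)."""
    if not 0 <= total <= 60:
        raise ValueError("CES-D totals range from 0 to 60")
    if total >= 21:
        # Scores of 21 or above also exceed the 16-point threshold.
        return "moderate to severe"
    if total >= 16:
        return "clinically significant"
    return "below threshold"


print(two_question_screen(depressed_mood=False, anhedonia=True))
print(cesd_band(18))
```

Because the assumed two-question rule is an OR of two items, it favors sensitivity over specificity, which is consistent with the very high case detection reported above.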
Impact of Depression on Quality of Life in Multiple Sclerosis
Depression has been found to affect quality of life significantly and independently. Indeed, depression has consistently been a stronger predictor of poor quality of life than MS severity as measured by the Expanded Disability Status Scale (EDSS), which correlates only modestly with quality of life.66–70 Depression is associated not only with lower overall quality of life but also with sexual dysfunction and health distress beyond that accounted for by disability status.66,67 For example, in a study of 136 MS patients, the 22.8% with a history of major depression had significantly lower quality-of-life scores for energy, mental health, sexual and cognitive functioning, and overall quality of life than patients who had never had major depressive disorder.67
3. Depression in Epilepsy
Epidemiologic Aspects
The prevalence of depression in epilepsy is higher than in matched healthy controls, ranging from 3% to 9% in patients with controlled epilepsy to 20% to 55% in patients with recurrent seizures.71–75 For example, in a study of 155 patients with epilepsy identified from two large primary care practices in the United Kingdom, 33% of patients with recurrent seizures and 6% of those in remission had depression.72 A recent large population-based study used the Canadian Community Health Survey (CCHS 1.2) to compare the prevalence of psychiatric comorbidity in community-dwelling persons with and without epilepsy, and found a relatively high lifetime prevalence of mood disorders in those with epilepsy.74 The CCHS included administration of the World Mental Health Composite International Diagnostic Interview to a sample of 36,984 subjects, 0.6% of whom had epilepsy. The lifetime prevalence of major depressive disorder was 17.4% (95% CI 10.0–24.9) in patients with epilepsy versus 10.7% (95% CI 10.2–11.2) in the general population. Patients with epilepsy also had a 24.4% (95% CI 16.0–32.8) lifetime prevalence of any mood disorder, versus 13.2% (95% CI 12.7–13.7) in the general population. The lifetime prevalence of suicidal ideation was almost twice as high in patients with epilepsy (25%; 95% CI 17.4–32.5) as in the general population (13.3%; 95% CI 12.8–13.8).
As in the case of stroke, a bidirectional relation has been identified between depressive disorders and epilepsy: three population-based studies have demonstrated that depressive disorders can precede the onset of epilepsy.5–7 The first was a Swedish population-based case-control study in which depression preceding the seizure disorder was seven times more frequent among patients with new-onset epilepsy than among age- and sex-matched controls.5 When analyses were restricted to cases with partial epilepsy, depression was 17 times more common among cases than among controls. The second population-based study included all adults living in Olmsted County, MN, who were aged 55 years or older at the onset of their epilepsy.6 The investigators found that a diagnosis of depression preceding the first seizure was 3.7 times more frequent among cases than among controls after adjusting for medical therapies for depression. As in the Swedish study,5 this increased risk was greater among cases with partial-onset seizures. The third study, carried out in Iceland, investigated the role of specific symptoms of depression in predicting the development of unprovoked seizures or epilepsy in a population-based sample of 324 children and adults aged 10 years and older with a first unprovoked seizure or newly diagnosed epilepsy and 647 controls.7 Major depression was associated with a 1.7-fold increased risk of developing epilepsy, while a history of attempted suicide was 5.1 times more common among cases than among controls.
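Case-control comparisons such as the Swedish and Olmsted County studies are conventionally summarized as an odds ratio: the odds of prior depression among people with epilepsy divided by the same odds among controls. A minimal Python sketch of the crude odds ratio with a Woolf (log-scale) 95% confidence interval; the counts are invented for illustration, and the published estimates may have come from matched or adjusted models rather than this simple calculation.

```python
import math


def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls, z=1.96):
    """Crude odds ratio and Woolf confidence interval from a 2x2 table.

    "Exposed" here means a history of depression before the index date.
    """
    cells = (exposed_cases, unexposed_cases, exposed_controls, unexposed_controls)
    if min(cells) <= 0:
        raise ValueError("all four cells must be positive for the Woolf interval")
    ratio = (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)
    # Standard error of log(OR) is the square root of the summed reciprocals.
    se_log = math.sqrt(sum(1 / n for n in cells))
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper


# Hypothetical counts: 20 of 100 cases and 10 of 200 controls had prior depression.
ratio, lower, upper = odds_ratio(20, 80, 10, 190)
print(f"OR {ratio:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

Matching on age and sex, as in the Swedish study, would normally call for a matched analysis (for example, conditional logistic regression) rather than this crude estimate.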
Clinical Manifestations
Depression in epilepsy can mimic any of the mood disorders included in the DSM-IV classification. In a significant percentage of patients, however, depressive episodes have an atypical presentation that fails to meet criteria for any DSM (III, III-R, or IV) Axis I category.71 Symptoms or episodes of depression can be classified according to their temporal relation with seizures. They can occur before a seizure (preictal), as an expression of the seizure itself (ictal), after a seizure (postictal, a period that may extend up to 120 hours), or, most commonly, independently of seizures (interictal). Peri-ictal symptoms often go unrecognized by clinicians, which helps explain the scarcity of data on their prevalence and response to treatment.
Preictal Symptoms Preictal symptoms generally manifest as a cluster of dysphoric symptoms lasting hours, or even 1 to 3 days, prior to the onset of a seizure. In one study, Blanchet and Frommer76 examined mood changes over 56 days in 27 patients who were asked to rate their mood on a daily basis. Changes in mood were noted by 22 patients during the 72 hours preceding the seizure, consisting primarily of symptoms of dysphoria, anxiety, and irritability. Ictal Symptoms Ictal symptoms of depression are those expressed during a simple partial seizure.77–79 One study estimated that psychiatric symptoms occur in 25% of auras, with 15% of these involving affect or mood changes—depression symptoms ranked second after anxiety/fear as the most common type of ictal affect in this study.77 Ictal symptoms of depression are usually brief and stereotypical, develop out of context, and are affiliated with other ictal phenomena. Feelings of anhedonia (inability to experience pleasure in anything), guilt, and suicidal ideation represent the most prevalent symptoms. Ictal symptoms of depression are often followed by an alteration of consciousness as the ictus evolves from a simple to a complex partial seizure. Postictal Symptoms Postictal symptoms of depression existing in patients with epilepsy have long been identified yet have been systematically investigated in only one study at the Rush Epilepsy Center80 in 100 consecutive patients with poorly controlled partial seizure disorders. The postictal period was defined as the 72 hours that followed recovery of consciousness from a seizure or cluster of seizures, and symptoms were identified with a 42-item questionnaire. The questions on depressive symptoms were intended to target anhedonia, irritability, poor frustration tolerance, feelings of hopelessness and helplessness, suicidal ideation, feelings of guilt, self-deprecation, and crying bouts. 
Five neurovegetative symptoms that are common postictally (including fatigue and changes in patterns of sleep, appetite, and sexual drive) were investigated but not classified as symptoms of depression, so as not to inflate the prevalence of postictal depression. Only symptoms that patients consistently identified in the postictal periods of more than 50% of their seizures were analyzed; this restricted the analysis to postictal symptoms of habitual occurrence. The typical duration of each symptom was estimated. Symptoms that also occurred during the interictal period were identified, and their severity during interictal and postictal periods was compared. Forty-three patients (43%) habitually experienced a median of five postictal symptoms of depression (range, two to nine). Thirty-five patients reported at least two postictal symptoms lasting a minimum of 24 hours, and 13 of
these patients experienced a cluster of at least seven symptoms, lasting 24 hours or longer, that mimicked a major depressive episode. Two thirds of the symptoms had a median duration of 24 hours or longer. Postictal suicidal ideation was identified in 13 patients: 8 experienced both passive and active suicidal thoughts, while 5 reported only passive suicidal ideation. Ten of these 13 patients (77%) had a past history of either major depression or bipolar disorder, a highly significant association. The presence of postictal suicidal ideation was also significantly associated with a history of psychiatric hospitalization. Postictal symptoms of depression often occurred together with other psychiatric symptoms: 23 patients had concurrent postictal symptoms of anxiety, and 7 had a combination of postictal symptoms of depression, psychosis, and anxiety.
Episodes of Depression During Interictal Periods
Interictal depression is the most commonly recognized manifestation of mood disorders among patients with epilepsy. As stated above, depressive episodes may be identical to any of the mood disorders described in the DSM-IV classification (eg, major depression, dysthymic disorder, bipolar disorder). However, many cases of interictal depression fail to meet criteria for any of the DSM mood disorders. For example, Mendez and colleagues81 found that 50% of depressive episodes had to be classified as atypical depression according to DSM-III-R criteria. Investigators broadly agree that interictal depression in people with epilepsy most frequently manifests as a pleomorphic cluster of depressive, irritable, and anxious symptoms together with neurovegetative symptoms.82–84 It follows a chronic course interrupted by recurrent symptom-free periods lasting hours to several days.
This presentation most closely resembles dysthymic disorder; the term “dysthymic-like disorder of epilepsy” (DLDE) has therefore been suggested.82 In a study of 97 consecutive patients whose depressive episodes were severe enough to warrant pharmacotherapy, DLDE was identified in 69 (71%).82 Overall, these depressive disorders were milder than a major depressive episode, yet they caused sizeable disruptions in patients’ daily activities, social relations, and quality of life. In 1923, Kraepelin84 described interictal depressive episodes in patients with epilepsy and noted the pleomorphic character of their symptoms. Six decades later, Blumer expanded on Kraepelin’s observations and coined the term “interictal dysphoric disorder” (IDD) for this type of depressive disorder.83 Blumer proposed that IDD consists of the following eight intermittent affective-somatoform symptoms: irritability, depressive moods, anergia, insomnia, pains, anxiety, phobic fears, and
euphoric moods. In his view, the presence of three of these symptoms was enough to be associated with significant disability. Roughly one third to one half of patients with epilepsy who seek medical care present a clinical picture compatible with IDD severe enough to warrant pharmacologic treatment. IDD tends to develop 2 years or more after the onset of epilepsy. The suicide rate in depressed patients with epilepsy is five times that expected in the general population, rising to 25 times in patients with partial seizures of temporal lobe origin.85 In a review of the literature, Gilliam and Kanner86 concluded that suicide has one of the highest standardized mortality ratios (SMRs) of all causes of death in persons with epilepsy. Furthermore, Robertson87 reviewed 17 studies of mortality in epilepsy and found that suicide occurred 10 times more frequently than in the general population. In a population-based incidence cohort study from Iceland, Rafnsson and associates88 found that suicide had the highest SMR (5.8) of all causes of death.
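Blumer’s counting rule can be expressed as a short check. The Python sketch below is illustrative only: the symptom labels are free-text strings chosen here, the helper names (`idd_symptom_count`, `meets_idd_threshold`) are hypothetical, and it is not a validated diagnostic or screening instrument. It simply counts how many of the eight listed symptoms are reported and compares the count with the three-symptom threshold attributed to Blumer.

```python
from __future__ import annotations

# Blumer's eight intermittent affective-somatoform symptoms, as listed above.
IDD_SYMPTOMS = frozenset({
    "irritability",
    "depressive moods",
    "anergia",
    "insomnia",
    "pains",
    "anxiety",
    "phobic fears",
    "euphoric moods",
})

# Three symptoms, per the account of Blumer's view given in the text.
IDD_THRESHOLD = 3


def idd_symptom_count(reported: set[str]) -> int:
    """Count reported symptoms that appear on Blumer's IDD list."""
    normalized = {s.strip().lower() for s in reported}
    return len(IDD_SYMPTOMS & normalized)


def meets_idd_threshold(reported: set[str], threshold: int = IDD_THRESHOLD) -> bool:
    """True when at least `threshold` IDD symptoms are reported."""
    return idd_symptom_count(reported) >= threshold


print(meets_idd_threshold({"Irritability", "insomnia", "pains"}))  # True
print(meets_idd_threshold({"irritability", "headache"}))           # False
```

Because the symptoms are intermittent, a real assessment would also need to ask over what period they recur; the sketch ignores timing entirely.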
Screening Instruments for Depressive Disorders in Epilepsy
A six-item screening instrument, the Neurological Disorders Depression Inventory for Epilepsy (NDDI-E), was recently validated to identify major depressive episodes specifically in patients with epilepsy.89 It was constructed to minimize confounding by antiepileptic drug adverse events and by epilepsy-related cognitive problems, which affect other instruments. It takes less than 3 minutes to complete. A score of 14 or higher suggests a major depressive disorder, with a sensitivity of 81% and a specificity of 90%. Two self-rating instruments developed to identify symptoms of depression in the general population, the BDI-II and the CES-D, have also recently been found valid in patients with epilepsy.90 To date, they have been the instruments most frequently used in research studies. Jones and colleagues90 found a mean sensitivity of 93% and specificity of 80% for both instruments, with a very high negative predictive value (0.98) but a much lower positive predictive value (0.47). Ettinger and associates75 investigated symptoms of depression with the CES-D among 775 people with epilepsy, 395 people with asthma, and 362 healthy controls identified from a cohort of 85,358 adults aged 18 years and older. People with epilepsy had symptoms of depression significantly more frequently (36.5%) and more severely than people with asthma (27.8%) and healthy controls (11.8%). Of note, 38.5% of people with epilepsy and 43.7% of people with asthma whose CES-D scores suggested a depressive disorder had never been evaluated for depression. The same
group of investigators used the Mood Disorder Questionnaire (MDQ) to compare lifetime rates of bipolar symptoms and of past diagnoses of bipolar I and II disorder among subjects who reported having epilepsy, migraine, asthma, or diabetes mellitus and a healthy comparison group.91 Bipolar symptoms, evident in 12.2% of epilepsy patients, were 1.6 to 2.2 times more common in subjects with epilepsy than in those with migraine, asthma, or diabetes mellitus, and 6.6 times more common than in the healthy comparison group. Of the patients with epilepsy who screened positive for bipolar symptoms, 49.7% had received a physician diagnosis of bipolar disorder, nearly twice the rate seen in the other disorders. However, 26.3% of MDQ-positive epilepsy subjects carried a diagnosis of unipolar depression, and 25.8% had neither a unipolar nor a bipolar diagnosis. These data call into question the reliability of this instrument for identifying bipolar disorders in patients with epilepsy. The HADS is an attractive screening instrument because it identifies symptoms of depression and anxiety at once.92 Self-rating instruments with many items targeting somatic symptoms, such as the PHQ-9, may be problematic and yield false-positive diagnoses of depression.93 Among examiner-administered instruments, the HAM-D may be less useful in patients with epilepsy, because a high score may reflect the sedative adverse effects of antiepileptic drugs rather than an underlying mood disorder.94
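The gap between the high negative predictive value and the low positive predictive value reported by Jones and colleagues follows from Bayes’ theorem: predictive values depend on the prevalence of depression in the screened population, not only on sensitivity and specificity. The Python sketch below (the function name `predictive_values` is hypothetical) computes both values. The study’s prevalence is not given in this text; the 16% used here is an assumed figure, chosen because with it the formula yields values close to those reported.

```python
from __future__ import annotations


def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) of a screening test using Bayes' theorem."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    if not (0.0 <= sensitivity <= 1.0 and 0.0 <= specificity <= 1.0):
        raise ValueError("sensitivity and specificity must lie in [0, 1]")
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


# Sensitivity and specificity as reported by Jones and colleagues; the 16%
# prevalence is a hypothetical value chosen for illustration only.
ppv, npv = predictive_values(0.93, 0.80, 0.16)
print(f"PPV {ppv:.2f}, NPV {npv:.2f}")  # PPV 0.47, NPV 0.98

# The same test applied to populations with other (hypothetical) prevalences.
for prevalence in (0.05, 0.10, 0.30):
    ppv, npv = predictive_values(0.93, 0.80, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.2f}, NPV {npv:.2f}")
```

At lower prevalences the positive predictive value falls while the negative predictive value rises, which is why a negative screen is reassuring but a positive screen calls for confirmatory assessment.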
Impact of Depressive Disorder on Treatment of the Seizure Disorder and Quality of Life
Several studies have demonstrated the negative impact of depressive disorders on the quality of life of patients with epilepsy.95–97 For example, in a study of 56 patients with epilepsy carried out in Germany by Lehrner and associates,95 depression was the single strongest predictor of every domain of health-related quality of life (HRQOL). The association persisted after controlling for seizure frequency, seizure severity, and other psychosocial variables. In a study of 257 patients with epilepsy by Perrine and coworkers,96 a “mood factor” had the highest correlations with the QOLIE-89 scales and was the strongest predictor of poor quality of life in regression analyses. Gilliam and associates97 investigated the variables responsible for poor quality of life, measured with the QOLIE-89, in 194 adult patients with refractory partial epilepsy. Patients averaged 9.7 seizures per month (range 0.3 to 51), but neither seizure type nor seizure frequency correlated with QOLIE-89 scores. Symptoms of depression and neurotoxicity from antiepileptic drugs were the only
independent variables significantly associated with poor QOLIE-89 summary scores. Depressive disorders have also been found to worsen the response to pharmacologic and surgical treatments.98–100 In a study of 780 patients with new-onset epilepsy, Hitiris and colleagues98 found that individuals with a history of psychiatric disorders, particularly depression, were almost three times less likely to become seizure-free on antiepileptic medications (median follow-up, 79 months) than patients without such a history. Similarly, among 121 patients who underwent temporal lobectomy, Anhoury and associates99 reported worse postsurgical seizure outcomes in patients with a psychiatric history than in those without. In a study of 100 patients who had a temporal lobectomy and were followed for a mean of 8.8 ± 3.3 years, Kanner and colleagues100 investigated whether a lifetime history of depression predicts postsurgical seizure outcome. Their multivariate logistic regression model included as covariates a lifetime history of depression, cause of temporal lobe epilepsy (ie, mesial temporal sclerosis, lesional, or idiopathic), duration of seizure disorder, occurrence of generalized tonic–clonic seizures, and extent of resection of mesial temporal structures. A lifetime history of depression and a smaller resection of mesial temporal structures were the only independent predictors of persistent auras in the absence of disabling seizures. A lifetime history of depression also predicted failure to achieve freedom from disabling seizures in univariate, but not multivariate, analyses. These three studies raise the question of whether a history of depression may be a marker of a more severe form of epilepsy.
4. Depression in Parkinson’s Disease
A review of the literature shows that depression is relatively common in PD patients: major depression has been identified in 5% to 25% of patients and minor depression in 25% to 50%.101–104 As with other neurologic disorders, the reported prevalence of depression in PD varies with the patient population, with community-based studies yielding lower rates than hospital-based studies. In one population-based study of 245 PD patients, 45.5% reported a mild form of depression, whereas in another community sample, 19.6% met criteria for moderate to severe depression.101 Other independent reviews put the prevalence of some form of depression in PD at around 45%.103,104 In contrast, comorbid bipolar disorder appears to be rare.105 Recent studies have identified a variety of nonmotor symptoms that precede the typical manifestations of PD, including constipation, loss of smell, sleep disturbances such as rapid-eye-movement sleep behavior
disorder, and depression. Three population-based studies suggest a bidirectional relation between PD and depression.11,12,106 In the first study, all subjects diagnosed with depression between 1975 and 1990 were included and matched with subjects born in the same year who had never been diagnosed with depression.11 Among the 1,358 depressed subjects, 19 developed PD, and among the 67,570 nondepressed subjects, 259 developed PD. In multivariable analysis, people with depression were about three times more likely to develop PD than nondepressed people (hazard ratio 3.13 [95% CI 1.95–5.01]). In the second study, investigators compared the frequency of depression before the onset of PD in patients who went on to develop PD with that in a matched control population, using data from an ongoing general practice-based register covering 105,416 people.12 Among patients who went on to develop PD, 9.2% had a history of depression, compared with 4.0% of the control population, yielding an odds ratio of 2.4 (95% CI 2.1–2.7). A third population-based study, using linkage of public hospital registers from 1977 to 1993, compared the risk of developing PD among patients with affective disorders and two groups of medically ill patients, one with osteoarthritis and the other with diabetes.106 A total of 164,385 patients entered the study base. The risk of receiving a diagnosis of PD was significantly higher in patients with affective disorder than in those with osteoarthritis (odds ratio 2.2 [95% CI 1.7–2.8]) or diabetes (odds ratio 2.2 [95% CI 1.7–2.9]).
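Part of the arithmetic behind these ratios can be reproduced from the raw figures. The Python sketch below (the helper names `risk_ratio` and `odds_ratio` are hypothetical) computes a crude ratio of cumulative proportions from the counts in the first study and an unadjusted odds ratio from the percentages in the second. The crude ratio is not the same quantity as the reported hazard ratio of 3.13, which comes from a multivariable time-to-event analysis; the sketch only illustrates the calculation and computes no confidence intervals.

```python
def risk_ratio(cases_exposed: int, n_exposed: int,
               cases_unexposed: int, n_unexposed: int) -> float:
    """Crude ratio of cumulative proportions (no adjustment for time or covariates)."""
    return (cases_exposed / n_exposed) / (cases_unexposed / n_unexposed)


def odds_ratio(p_exposed_cases: float, p_exposed_controls: float) -> float:
    """Exposure odds ratio from the proportion exposed among cases and controls."""
    odds_cases = p_exposed_cases / (1.0 - p_exposed_cases)
    odds_controls = p_exposed_controls / (1.0 - p_exposed_controls)
    return odds_cases / odds_controls


# First study: 19 of 1,358 depressed and 259 of 67,570 nondepressed subjects developed PD.
print(f"crude risk ratio {risk_ratio(19, 1_358, 259, 67_570):.2f}")  # crude risk ratio 3.65
# Second study: 9.2% of future PD patients vs 4.0% of controls had prior depression.
print(f"odds ratio {odds_ratio(0.092, 0.040):.1f}")  # odds ratio 2.4
```

The unadjusted odds ratio matches the reported 2.4, whereas the crude risk ratio overstates the adjusted hazard ratio, a reminder that crude and adjusted estimates answer different questions.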
Clinical Manifestations
Depression in PD often begins late in life, in contrast to primary major depression, which is more likely to appear before the age of 40. It may present as major or minor depressive disorder, though often with certain differences. For example, dysphoric symptoms, including irritability, sadness, and pessimism, are more frequent in PD patients, whereas feelings of failure, guilt, and self-blame are less frequent. Although suicidal ideation appears to be more frequent in PD patients, they are less likely to complete suicide.107 Anxiety symptoms have been identified in two thirds of PD patients with depressive symptoms; conversely, 97% of PD patients with an anxiety disorder have been found to exhibit depressive symptoms as well.
Screening Instruments
The screening instruments used for depression in PD patients have included the BDI, HADS, GDS, and Montgomery-Åsberg Depression Rating Scale (MADRS).108–111 As with other neurologic disorders, the presence of somatic
symptoms resulting from PD may lead to false-positive diagnoses of depression. Leentjens and colleagues108 found that lower cutoff scores on the HAMD-17 (11/12) and MADRS (14/15) yielded maximal sensitivity, but at the expense of low specificity, while maximum discrimination between depressed and nondepressed PD patients was reached at cutoff scores of 13/14 and 14/15, respectively. Likewise, the same group found high sensitivity and low specificity at a BDI cutoff of 8/9, whereas cutoffs of 16/17 or higher yielded low sensitivity and high specificity.109 Silberman and colleagues reported similar findings.110 Mondolo and associates111 investigated the validity of the HADS and GDS in PD and found that maximum discrimination between depressed and nondepressed PD patients was reached at a cutoff score of 10/11 for both instruments. High specificity and positive predictive value were reached at cutoff scores of 12/13 for the GDS and 11/12 for the HADS. Tumas and associates112 evaluated 50 consecutive patients with PD using the Unified Parkinson’s Disease Rating Scale (UPDRS), the 15-item GDS, and the BDI against DSM-IV criteria. The GDS-15 (cutoff 8/9) performed better than the BDI (cutoff 17/18) and the UPDRS in screening for depression in PD, and depression was not related to the severity of parkinsonian symptoms.
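The “a/b” cutoff notation used above means that scores of a or below screen negative and scores of b = a + 1 or above screen positive. The Python sketch below computes sensitivity and specificity at each candidate cutoff and picks the one with the largest Youden index (sensitivity + specificity − 1). Youden’s index is an assumption here: it is one common way to define maximum discrimination, but the studies cited may have used other criteria. The scores are invented for illustration, and the function names (`sens_spec`, `best_cutoff`) are hypothetical.

```python
from __future__ import annotations


def sens_spec(depressed: list[int], not_depressed: list[int],
              cutoff: int) -> tuple[float, float]:
    """Sensitivity and specificity for an 'a/b' cutoff where a == cutoff:
    scores <= a screen negative, scores >= a + 1 screen positive."""
    sensitivity = sum(score > cutoff for score in depressed) / len(depressed)
    specificity = sum(score <= cutoff for score in not_depressed) / len(not_depressed)
    return sensitivity, specificity


def best_cutoff(depressed: list[int],
                not_depressed: list[int]) -> tuple[int, float, float]:
    """Return (a, sensitivity, specificity) for the cutoff maximizing
    Youden's J = sensitivity + specificity - 1; ties keep the lowest a."""
    if not depressed or not not_depressed:
        raise ValueError("both groups need at least one score")
    best: tuple[float, int, float, float] | None = None
    for a in sorted(set(depressed) | set(not_depressed)):
        sensitivity, specificity = sens_spec(depressed, not_depressed, a)
        j = sensitivity + specificity - 1.0
        if best is None or j > best[0]:
            best = (j, a, sensitivity, specificity)
    assert best is not None  # guaranteed by the non-empty check above
    _, a, sensitivity, specificity = best
    return a, sensitivity, specificity


# Hypothetical scale scores, for illustration only.
depressed = [10, 12, 14, 15, 17, 19, 21, 24]
not_depressed = [3, 5, 6, 8, 9, 11, 12, 13, 16]
a, sens, spec = best_cutoff(depressed, not_depressed)
print(f"cutoff {a}/{a + 1}: sensitivity {sens:.2f}, specificity {spec:.2f}")
# cutoff 13/14: sensitivity 0.75, specificity 0.89
```

Lowering the cutoff in this toy data raises sensitivity at the cost of specificity, mirroring the trade-off described for the HAMD-17, MADRS, and BDI.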
Impact of Depression on the Course and Quality of Life in Parkinson’s Disease
Depression in PD patients has been associated with more rapid deterioration of motor and cognitive functions, especially executive function, and with a greater likelihood of psychotic symptoms and physical disability.113–119 This impact is evident even when the depressive disorder occurs in the early stage of the disease. For example, Ravina and associates119 identified a depressive disorder in 114 (27.6%) of 413 patients during an average follow-up of 14.6 months. Forty percent of these patients were neither treated with antidepressants nor referred for further psychiatric evaluation. Depression significantly predicted greater impairment in activities of daily living (ADLs) and an increased need for symptomatic therapy of PD (hazard ratio 1.86; 95% CI 1.29–2.68). The cognitive disturbances of depressed PD patients include poor insight and judgment and problems with planning, whereas memory impairment is seen less frequently. Neuropsychological tests of executive functioning have demonstrated significant deficits in PD patients, indicating frontal-subcortical impairment that may also be involved in the development of mood disorders. In a study that compared neuropsychological test performance among PD patients with major depression,
nondepressed PD patients, patients with primary major depression, and healthy controls, depressed PD patients exhibited impairments in set shifting and concept formation that were unique to this group.115 However, cognitive deterioration in depressed PD patients may be mitigated by treating the depressive disorder. In a longitudinal study that compared cognitive performance in depressed and nondepressed PD patients followed for 3 to 4 years, cognitive functions deteriorated more quickly in the depressed group.118 However, depressed PD patients who were treated showed an attenuated decline in cognitive scores (11%) compared with untreated patients. As in stroke and epilepsy, depression has also been found to have a negative impact on quality of life in PD. For example, in a multicenter study in six countries conducted by the Global Parkinson’s Disease Survey Steering Committee, data were obtained from 2,020 PD patients and 687 caregivers, and depression was the most significant predictor of poor health-related quality of life.120 Patients often failed to recognize their depression: only 1% reported feeling depressed, whereas 50% were considered depressed by the study criterion of a BDI score above 10. Furthermore, in a community-based study of 228 people with PD, depression was the factor most closely related to poor quality of life, while disease stage, disease duration, and cognitive impairment had a lesser impact.121 Others have confirmed these findings.122
5. Conclusions
Depressive disorders are a common comorbidity in neurologic disorders and have significant negative effects on their course and response to treatment. Despite their relatively high prevalence, they often go unrecognized and untreated, a problem that can be mitigated with screening instruments. While it would be ideal to have depression screening instruments validated for each neurologic disorder, the available instruments appear, for the most part, to yield acceptable sensitivities and specificities and should be used not only in research studies but also in clinical practice. It is important to remember, however, that these are screening instruments; the diagnosis must be confirmed with a formal psychiatric evaluation.
References
1. Kanner AM. Depression and the risk of neurological disorders. Lancet. 2005;366(9492):1147–1148.
2. May M, McCarron P, Stansfeld S, et al. Does psychological distress predict the risk of ischemic stroke and transient ischemic attack? The Caerphilly Study. Stroke. 2002;33:7–12.
3. Larson SL, Owens PL, Ford D, et al. Depressive disorder, dysthymia, and risk of stroke: thirteen-year follow-up from the Baltimore Epidemiologic Catchment Area Study. Stroke. 2001;32:1979–1983.
4. Jonas BS, Mussolino ME. Symptoms of depression as a prospective risk factor for stroke. Psychosom Med. 2000;62:463–471.
5. Forsgren L, Nystrom L. An incident case-referent study of epileptic seizures in adults. Epilepsy Res. 1990;6:66–81.
6. Hesdorffer DC, Hauser WA, Annegers JF, et al. Major depression is a risk factor for seizures in older adults. Ann Neurol. 2000;47:246–249.
7. Hesdorffer DC, Hauser WA, Olafsson E, et al. Depression and suicide attempt as risk factors for incident unprovoked seizures. Ann Neurol. 2006;59(1):35–41.
8. Modrego PJ, Ferrandez J. Depression in patients with mild cognitive impairment increases the risk of developing dementia of Alzheimer type: a prospective cohort study. Arch Neurol. 2004;61:1290–1293.
9. Kessing LV, Andersen PK. Does the risk of developing dementia increase with the number of episodes in patients with depressive disorder and in patients with bipolar disorder? J Neurol Neurosurg Psychiatry. 2004;75:1662–1666.
10. Dal Forno G, Palermo MT, Donohue JE, et al. Depressive symptoms, sex, and risk for Alzheimer’s disease. Ann Neurol. 2005;57(3):381–387.
11. Leentjens AFG, Van den Akker M, Metsemakers JFM, et al. Higher incidence of depression preceding the onset of Parkinson’s disease: a register study. Mov Disord. 2003;18:414–418.
12. Nilsson FM, Kessing LV, Bolwig TG. Increased risk of developing Parkinson’s disease for patients with major affective disorder: a register study. Acta Psychiatr Scand. 2001;104:380–386.
13. Kanner AM. Depression in epilepsy: prevalence, clinical semiology, pathogenic mechanisms and treatment. Biol Psychiatry. 2003;54:388–398.
14. Robinson RG. Poststroke depression: prevalence, diagnosis, treatment and disease progression. Biol Psychiatry. 2003;54:376–387.
15. Eastwood MR, Rifat SL, Nobbs H, et al. Mood disorder following cerebrovascular accident. Br J Psychiatry. 1989;154:195–200.
16. Fedoroff JP, Starkstein SE, Parikh RM, et al. Are depressive symptoms non-specific in patients with acute stroke? Am J Psychiatry. 1991;148:1172–1176.
17. Burvill PW, Johnson GA, Jamrozik KD, et al. Prevalence of depression after stroke: the Perth Community Stroke Study. Br J Psychiatry. 1995;166:320–327.
18. Colantonio A, Kasl SV, Ostfeld AM. Depressive symptoms and other psychosocial factors as predictors of stroke in the elderly. Am J Epidemiol. 1992;136:884–894.
19. Everson SA, Roberts RE, Goldberg DE, et al. Depressive symptoms and increased risk of stroke mortality over a 29-year period. Arch Intern Med. 1998;158:1133–1138.
20. Berg A, Palomaki H, Lehtihalmes M, et al. Poststroke depression: an 18-month follow-up. Stroke. 2003;34:138–143.
21. Morris PLP, Robinson RG, Raphael B. Prevalence and course of depressive disorders in hospitalized stroke patients. Int J Psychiatry Med. 1990;20:349–364.
22. Robinson RG, Price TR. Post-stroke depressive disorders: a follow-up study of 103 outpatients. Stroke. 1982;13:635–641.
23. Starkstein SE, Robinson RG, Berthier ML, et al. Depressive disorders following posterior circulation as compared with middle cerebral artery infarcts. Brain. 1988;111:375–387.
24. Lipsey JR, Robinson RG, Pearlson GD, et al. Nortriptyline treatment of post-stroke depression: a double-blind study. Lancet. 1984;1(8372):297–300.
25. Gainotti G, Azzoni A, Marra C. Frequency, phenomenology and anatomical-clinical correlates of major post-stroke depression. Br J Psychiatry. 1999;175:163–167.
26. Salter K, Bhogal SK, Foley N, et al. The assessment of poststroke depression. Top Stroke Rehabil. 2007;14(3):1–24.
27. Paradiso S, Ohkubo T, Robinson RG. Vegetative and psychological symptoms associated with depressed mood over the first two years after stroke. Int J Psychiatry Med. 1997;27(2):137–157.
28. de Coster L, Leentjens AF, Lodder J, et al. The sensitivity of somatic symptoms in post-stroke depression: a discriminant analytic approach. Int J Geriatr Psychiatry. 2005;20(11):1103–1104.
29. Aben I, Verhey F, Lousberg R, et al. Validity of the Beck Depression Inventory, Hospital Anxiety and Depression Scale, SCL-90, and Hamilton Depression Rating Scale as screening instruments for depression in stroke patients. Psychosomatics. 2002;43(5):386–393.
30. Johnson G, Burvill PW, Anderson CS, et al. Screening instruments for depression and anxiety following stroke: experience in the Perth community stroke study. Acta Psychiatr Scand. 1995;91(4):252–257.
31. O’Rourke S, MacHale S, Signorini D, et al. Detecting psychiatric morbidity after stroke: comparison of the GHQ and the HAD Scale. Stroke. 1998;29(5):980–985.
32. Healey AK, Kneebone II, Carroll M, et al. A preliminary investigation of the reliability and validity of the Brief Assessment Schedule Depression Cards and the Beck Depression Inventory-Fast Screen to screen for depression in older stroke survivors. Int J Geriatr Psychiatry. 2008;23(5):531–536.
33. Gainotti G, Azzoni A, Razzano C, et al. The Post-Stroke Depression Rating Scale: a test specifically devised to investigate affective disorders of stroke patients. J Clin Exp Neuropsychol. 1997;19(3):340–356.
34. Sutcliffe LM, Lincoln NB. The assessment of depression in aphasic stroke patients: the development of the Stroke Aphasic Depression Questionnaire. Clin Rehabil. 1998;12(6):506–513.
35. Sackley CM, Hoppitt TJ, Cardoso K. An investigation into the utility of the Stroke Aphasic Depression Questionnaire (SADQ) in care home settings. Clin Rehabil. 2006;20(7):598–602.
36. Bennett HE, Thomas SA, Austen R, et al. Validation of screening measures for assessing mood in stroke patients. Br J Clin Psychol. 2006;45(Pt 3):367–376.
37. Starkstein SE, Robinson RG, Price TR. Comparison of patients with and without poststroke major depression matched for age and location of lesion. Arch Gen Psychiatry. 1988;45:247–252.
38. Robinson RG, Starr LB, Kubos KL, et al. A two-year longitudinal study of post-stroke mood disorders: findings during the initial evaluation. Stroke. 1983;14:736–744.
39. Parikh RM, Robinson RG, Lipsey JR, et al. The impact of post-stroke depression on recovery in activities of daily living over two-year follow-up. Arch Neurol. 1990;47:785–789.
40. Wade DT, Legh-Smith J, Hewer RA. Depressed mood after stroke: a community study of its frequency. Br J Psychiatry. 1987;151:200–205.
41. Kanner AM. Depression in neurologic disorders. Cambridge: Cambridge Medical Communications, 2005.
42. Feinstein A. Multiple sclerosis and depression. In: The clinical neuropsychiatry of multiple sclerosis. Cambridge: Cambridge University Press, 1999:26–50.
43. Minden SL, Orav JO, Reich P. Depression in multiple sclerosis. Gen Hosp Psychiatry. 1987;9:426–434.
44. Patten SB, Metz LM, Reimer MA. Biopsychosocial correlates of lifetime major depression in a multiple sclerosis population. Mult Scler. 2000;6(2):115–120.
45. Kessler RC, McGonagle KA, Zhao S, et al. Lifetime and 12-month prevalence of DSM-III-R psychiatric disorders in the United States: results from the National Comorbidity Survey. Arch Gen Psychiatry. 1994;51:8–19.
46. Patten SB, Beck CA, Williams JV, et al. Major depression in multiple sclerosis: a population-based perspective. Neurology. 2003;61(11):1524–1527.
47. Minden SL, Schiffer RB. Affective disorders in multiple sclerosis: review and recommendations for clinical research. Arch Neurol. 1990;47:98–104.
48. Schiffer RB, Wineman M, Weitkamp LR. Association between bipolar affective disorder and multiple sclerosis. Am J Psychiatry. 1986;143:94–95.
49. Joffe RT, Lippert GP, Gray TA, et al. Mood disorder and multiple sclerosis. Arch Neurol. 1987;44:376–378.
50. Krupp LB, Alvarez LA, LaRocca NG, et al. Fatigue in multiple sclerosis. Arch Neurol. 1988;45:435–437.
51. Mohr DC, Hart SL, Goldberg A. Effects of treatment for depression on fatigue in multiple sclerosis. Psychosom Med. 2003;65(4):542–547.
52. Sadovnick AD, Eisen K, Ebers GC, et al. Cause of death in patients attending multiple sclerosis clinics. Neurology. 1991;41:1193–1196.
53. Stenager EN, Stenager E. Suicide and patients with neurological diseases: methodologic problems. Arch Neurol. 1992;49:1296–1303.
54. Stenager EN, Stenager E, Koch-Henriksen N, et al. Suicide and multiple sclerosis: an epidemiological investigation. J Neurol Neurosurg Psychiatry. 1992;55:542–545.
55. Jacobs DG, Jamison KR, Baldessarini RJ, et al. Suicide: clinical/risk management issues for psychiatrists. CNS Spectrums. 2000;5:32–48.
56. Feinstein A, O’Connor P, Gray T, Feinstein K. The effects of anxiety on psychiatric morbidity in patients with multiple sclerosis. Mult Scler. 1999;5:323–326.
57. Moran PJ, Mohr DC. The validity of Beck Depression Inventory and Hamilton Rating Scale for Depression items in the assessment of depression among patients with multiple sclerosis. J Behav Med. 2005;28(1):35–41.
58. McGuigan C, Hutchinson M. Unrecognised symptoms of depression in a community-based population with multiple sclerosis. J Neurol. 2006;253(2):219–223.
59. Gottberg K, Einarsson U, Fredrikson S, et al. Population-based study of depressive symptoms in multiple sclerosis in Stockholm County: association with functioning and sense of coherence. J Neurol Neurosurg Psychiatry. 2007;78(1):60–65.
60. Sullivan MJ, Weinshenker B, Mikail S, et al. Screening for major depression in the early stages of multiple sclerosis. Can J Neurol Sci. 1995;22(3):228–231.
61. Chwastiak L, Ehde DM, Gibbons LE, et al. Depressive symptoms and severity of illness in multiple sclerosis: epidemiologic study of a large community sample. Am J Psychiatry. 2004;161(8):1504.
62. Patten SB, Lavorato DH, Metz LM. Clinical correlates of CES-D depressive symptom ratings in an MS population. Gen Hosp Psychiatry. 2005;27(6):439–445.
63. Gold SM, Schulz H, Mönch A, et al. Cognitive impairment in multiple sclerosis does not affect reliability and validity of self-report health measures. Mult Scler. 2003;9(4):404–410.
64. Benedict RH, Fishman I, McClellan MM, et al. Validity of the Beck Depression Inventory-Fast Screen in multiple sclerosis. Mult Scler. 2003;9(4):393–396.
65. Mohr DC, Hart SL, Julian L, et al. Screening for depression among patients with multiple sclerosis: two questions may be enough. Mult Scler. 2007;13(2):215–219.
66. Amato MP, Ponziani G, Rossi F, et al. Quality of life in multiple sclerosis: the impact of depression, fatigue and disability. Mult Scler. 2001;7:340–344.
67. Wang JL, Reimer MA, Metz LM, et al. Major depression and quality of life in individuals with multiple sclerosis. Int J Psychiatry Med. 2000;30:309–317.
68. Fruehwald S, Loeffler-Stastka H, Eher R, et al. Depression and quality of life in multiple sclerosis. Acta Neurol Scand. 2001;104:257–261.
69. Lobentanz IS, Asenbaum S, Vass K, et al. Factors influencing quality of life in multiple sclerosis patients: disability, depressive mood, fatigue and sleep quality. Acta Neurol Scand. 2004;110(1):6.
70. Benito-León J, Morales JM, Rivera-Navarro J. Health-related quality of life and its relationship to cognitive and emotional functioning in multiple sclerosis patients. Eur J Neurol. 2002;9(5):497–502.
71. Kanner AM, Balabanov A. Depression in epilepsy: how closely related are these two disorders? Neurology. 2002;58(suppl 5):S27–S39.
72. Jacoby A, Baker GA, Steen N, et al. The clinical course of epilepsy and its psychosocial correlates: findings from a UK community study. Epilepsia. 1996;37(2):148–161.
73. O’Donoghue MF, Goodridge DM, Redhead K, et al. Assessing the psychosocial consequences of epilepsy: a community-based study. Br J Gen Pract. 1999;49(440):211–214.
74. Tellez-Zenteno JF, Patten SB, Wiebe S. Psychiatric comorbidity in epilepsy: a population-based analysis. Epilepsia. 2007;48(12):2336–2344.
75. Ettinger A, Reed M, Cramer J; Epilepsy Impact Project Group. Depression and comorbidity in community-based patients with epilepsy or asthma. Neurology. 2004;63(6):1008–1014.
76. Blanchet P, Frommer GP. Mood change preceding epileptic seizures. J Nerv Ment Dis. 1986;174:471–476.
77. Williams D. The structure of emotions reflected in epileptic experiences. Brain. 1956;79:29–67.
78. Weil A. Depressive reactions associated with temporal lobe uncinate seizures. J Nerv Ment Dis. 1955;121:505–510.
79. Daly D. Ictal affect. Am J Psychiatry. 1958;115:97–108.
80. Kanner AM, Soto A, Gross-Kanner H. Prevalence and clinical characteristics of postictal psychiatric symptoms in partial epilepsy. Neurology. 2004;62:708–713.
81. Mendez MF, Cummings J, Benson D, et al. Depression in epilepsy: significance and phenomenology. Arch Neurol. 1986;43:766–770.
82. Kanner AM, Kozak AM, Frey M. The use of sertraline in patients with epilepsy: is it safe? Epilepsy Behav. 2000;1(2):100–105.
83. Blumer D, Altshuler LL. Affective disorders. In: Engel J, Pedley TA, eds. Epilepsy: a comprehensive textbook, vol. II. Philadelphia: Lippincott-Raven, 1998:2083–2099.
84. Kraepelin E. Psychiatrie, vol. 3. Leipzig: Johann Ambrosius Barth, 1923.
85. Robertson M. Carbamazepine and depression. Int Clin Psychopharmacol. 1987;2:23–35.
86. Gilliam F, Kanner AM. The treatment of depression in epilepsy. Epilepsy Behav. 2002;3:6.
87. Robertson MM. Suicide, parasuicide, and epilepsy. In: Engel J, Pedley TA, eds. Epilepsy: a comprehensive textbook. Philadelphia: Lippincott-Raven, 1997.
88. Rafnsson V, Olafsson E, Hauser WA, et al. Cause-specific mortality in adults with unprovoked seizures. A population-based incidence cohort study. Neuroepidemiology. 2001;20(4):232–236.
89. Gilliam FG, Barry JJ, Meador KJ, et al. Rapid detection of major depression in epilepsy: a multicenter study. Lancet Neurology. 2006;5(5):399–405.
90. Jones JE, Hermann BP, Woodard JL, et al. Screening for major depression in epilepsy with common self-report depression inventories. Epilepsia. 2005;46(5):731–735.
91. Ettinger AB, Reed ML, Goldberg JF, et al. Prevalence of bipolar symptoms in epilepsy vs other chronic health disorders. Neurology. 2005;65(4):535–540.
92. Snaith RP, Zigmond AS. Hospital Anxiety and Depression Scale. Acta Psychiatr Scand. 1983;67:361–370.
93. Gilbody S, Richards D, Brealey S, et al. Screening for depression in medical settings with the Patient Health Questionnaire: a diagnostic meta-analysis. J Gen Intern Med. 2007;11:1596–1602.
94. Hamilton M. A rating scale for depression. J Neurol Neurosurg Psychiatry. 1960;23:56–62.
95. Lehrner J, Kalchmayr R, Serles W, et al. Health-related quality of life (HRQOL), activity of daily living (ADL) and depressive mood disorder in temporal lobe epilepsy patients. Seizure. 1999;8(2):88–92.
96. Perrine K, Hermann BP, Meador KJ, et al. The relationship of neuropsychological functioning to quality of life in epilepsy. Arch Neurol. 1995;52(10):997–1003.
97. Gilliam F, Kuzniecky R, Faught E, et al. Patient-validated content of epilepsy-specific quality-of-life measurement. Epilepsia. 1997;38(2):233–236.
98. Hitiris N, Mohanraj R, Norrie J, et al. Predictors of pharmacoresistant epilepsy. Epilepsy Res. 2007;75(2–3):192–196.
99. Anhoury S, Brown RJ, Krishnamoorthy ES, et al. Psychiatric outcome after temporal lobectomy: a predictive study. Epilepsia. 2000;41:1608–1615.
100. Kanner AM, Byrne R, Smith MC, et al. Does a lifetime history of depression predict a worse postsurgical seizure outcome following a temporal lobectomy? [abstract] Ann Neurol. 2006;60:(S:10):19.
101. Tandberg E, Larsen JP, Aarsland D, et al. The occurrence of depression in Parkinson’s disease: a community-based study. Arch Neurol. 1996;53:175–179.
102. Schrag A, Jahanshahi M, Quinn NP. What contributes to depression in Parkinson’s disease? Psychol Med. 2001;31:65–73.
103. Gotham AM, Brown RG, Marsden CD. Depression in Parkinson’s disease: a quantitative and qualitative analysis. J Neurol Neurosurg Psychiatry. 1986;49:381–389.
104. Cummings JL. Depression and Parkinson’s disease: a review. Am J Psychiatry. 1992;149:443–454.
105. Cannas A, Spissu A, Floris GL, et al. Bipolar affective disorder and Parkinson’s disease: rare, insidious and often unrecognized association. Neurol Sci. 2002;23:S67–68.
106. Reijnders JS, Ehrt U, Weber WE, et al. A systematic review of prevalence studies of depression in Parkinson’s disease. Mov Disord. 2008;23(2):183–189.
107. Burn DJ. Beyond the iron mask: towards a better recognition and treatment of depression associated with Parkinson’s disease. Mov Disord. 2002;17:445–454.
108. Leentjens AF, Verhey FR, Lousberg R, et al. The validity of the Hamilton and Montgomery-Asberg depression rating scales as screening and diagnostic tools for depression in Parkinson’s disease. Int J Geriatr Psychiatry. 2000;15:644–649.
109. Leentjens AF, Verhey FR, Luijckx GJ, et al. The validity of the Beck Depression Inventory as a screening and diagnostic instrument for depression in patients with Parkinson’s disease. Mov Disord. 2000;15(6):1221–1224.
110. Silberman CD, Laks J, Capitão CF, et al. Recognizing depression in patients with Parkinson’s disease: accuracy and specificity of two depression rating scale. Arq Neuropsiquiatr. 2006;64(2B):407–411.
111. Mondolo F, Jahanshahi M, Granà A, et al. Evaluation of anxiety in Parkinson’s disease with some commonly used rating scales. Neurol Sci. 2007;28(5):270–275.
112. Tumas V, Rodrigues GGR, Farias TLA, et al. The accuracy of diagnosis of major depression in patients with Parkinson’s disease: a comparative study among the UPDRS, the Geriatric Depression Scale and the Beck Depression Inventory. Arquivos De Neuro-Psiquiatria. 2008;66(2):152–156.
113. Starkstein SE, Petracca G, Chemerinski E, et al. Depression in classic versus akinetic-rigid Parkinson’s disease. Mov Disord. 1998;13:29–33.
114. Starkstein SE, Bolduc PL, Preziosi TJ, et al. Cognitive impairments in different stages of Parkinson’s disease. J Neuropsychiatry. 1989;1:243–248.
115. Kuzis G, Sabe L, Tiberti C, et al. Cognitive function in major depression and Parkinson’s disease. Arch Neurol. 1997;54:982–986.
116. Starkstein SE, Bolduc PL, Mayberg HS, et al. Cognitive impairments and depression in Parkinson’s disease: a follow-up study. J Neurol Neurosurg Psychiatry. 1990;53:597–602.
117. Weintraub D, Moberg PJ, Duda JE, et al. Effect of psychiatric and other nonmotor symptoms on disability in Parkinson’s disease. J Am Geriatr Soc. 2004;52:784–788.
118. Starkstein SE, Mayberg HS, Leiguarda R, et al. A prospective longitudinal study of depression, cognitive decline, and physical impairments in patients with Parkinson’s disease. J Neurol Neurosurg Psychiatry. 1992;55:377–382.
119. Ravina B, Camicioli R, Como PG, et al. The impact of depressive symptoms in early Parkinson disease. Neurology. 2007;69(4):E2–3.
120. The Global Parkinson’s Disease Survey Steering Committee. Factors impacting on quality of life in Parkinson’s disease: results from an international survey. Mov Disord. 2002;17:60–67.
121. Kuopio AM, Marttila RJ, Helenius H, et al. The quality of life in Parkinson’s disease. Mov Disord. 2000;15:216–213.
122. Schrag A, Jahanshahi M, Quinn N. What contributes to quality of life in patients with Parkinson’s disease? J Neurol Neurosurg Psychiatry. 2000;69:308–312.
13 SCREENING FOR DEPRESSION IN CANCER CARE
Linda E. Carlson, Sheena K. Clifford, Shannon L. Groff, Olga Maciejewski, and Barry D. Bultz
1. Prevalence of Depression in Cancer Care
2. Screening Methods for Depression
3. Screening for Depression in Oncology
4. Implementing Screening Programs in Oncology Settings
5. Special Issues in Screening Cancer Patients
6. Summary, Integration, Future Directions
7. Acknowledgments
Context
There is increasing awareness of the importance of screening for depression and distress in oncology settings. Researchers have devised quick, simple methods of assessing symptoms in a wide range of patients that are acceptable to both patients and providers, and have introduced computerized systems that make it possible to screen large numbers of patients efficiently. A large body of data on implementing screening in cancer care suggests that screening can stimulate discussion of psychosocial and mental health issues between patients and oncology staff, but whether screening affects patient outcomes is still unclear.
1. Prevalence of Depression in Cancer Care
As in many other medical populations, people with cancer are susceptible to clinical depression. More than many other illnesses, cancer is associated with a poor prognosis and is in many ways synonymous with fear.
In the past, cancer was referred to as the “Big C” and discussed in hushed tones,1 and remnants of these attitudes persist in many communities and countries worldwide. Hence, beyond the burden of the tumor and its treatment, the psychological toll of cancer is significant. Two thorough reviews of the prevalence of depression in cancer have been published in the past few years,2,3 and in addition to several general reviews,4–6 summaries are available for specific types of cancer (prostate,7 pancreatic,8 advanced disease9) and specific patient groups (children10 and the elderly11). Taking methodologic issues into consideration, the point prevalence of major depressive disorder and depressive symptoms comorbid with cancer is most commonly cited as between 10% and 25%.2 This rate varies considerably depending on how depression is measured (standard clinical interview or questionnaire), how it is conceptualized, the criteria used to define it, the types of patients assessed (cancer type, demographics, inpatients versus outpatients), and the point in the cancer treatment trajectory at which assessment occurs.3 Massie3 summarized 88 papers investigating depression prevalence in cancer patients: the highest rates of depression were found in head and neck, pancreatic, breast, and lung cancer patients (up to 50% of all patients), with lower rates generally reported in colon cancer, gynecologic cancer, and lymphomas (8% to 25%). The cancers with higher rates of depression generally have poorer prognoses (pancreatic, lung) or involve disfiguring treatments (head and neck, breast), which may explain these differences.
2. Screening Methods for Depression
Methods used for assessment and screening of depression in cancer care are varied. The gold standard for assessment is still considered to be a clinical interview based on DSM-IV or ICD-10 criteria for depression, ideally the Structured Clinical Interview for DSM-IV (SCID). However, long structured or semi-structured interviews are not practical in most clinical settings. The usual caveats for measuring depression in somatic illness, as summarized in Chapter 11, also apply to cancer patients, since many symptoms of depression are also common symptoms of cancer or results of treatments such as surgery, chemotherapy, radiation therapy, and hormone therapy.12 For example, sleep is often impaired by chemotherapy drugs or steroids, and fatigue is a common consequence of many cancer treatments. In addition, weight loss is common because of nausea and vomiting, and immunotherapies such as interferon-alpha are known to cause depressive affect and sad mood. Given this extensive overlap of somatic symptoms, several diagnostic approaches have been developed to handle them, commonly referred to as inclusive, etiologic, substitutive, or exclusive.12,13 The inclusive approach counts all symptoms of depression, whether or not they may be secondary to the cancer, while the etiologic approach includes only those thought not to result from a physical illness (the approach used in the DSM and SCID). The substitutive approach replaces symptoms that may be related to the disease (eg, fatigue) with additional cognitive symptoms such as hopelessness or pessimism. Finally, the exclusive approach simply eliminates the most common physical symptoms, fatigue and appetite/weight changes, from the diagnostic criteria. Each approach has pros and cons. The etiologic approach is preferred by some,14 but it relies on inferring causality, and the accuracy of that inference varies from assessor to assessor. Future studies should assess which approach yields the most accurate case finding in patients living with cancer.
In addition to interview methods, self-report tools are often used to screen for depression, and they have several advantages in low-resource environments. Some evidence also suggests that self-report methods allow early detection of symptoms that even trained clinicians would miss.12 The most common such instruments are the Hospital Anxiety and Depression Scale (HADS), the Beck Depression Inventory (BDI), and the Center for Epidemiologic Studies Depression Scale (CES-D), all of which are discussed in detail in Chapter 4 and elsewhere. Research in general practice has evaluated these tools against the gold-standard structured clinical interview with some success.15 More recently, shorter one- to four-item instruments (such as the two-item version of the Patient Health Questionnaire [PHQ-2], the PRIME-MD, and the World Health Organization [WHO] two-item scale) have been evaluated.16 However, because the prevalence of depression is typically higher in cancer patients, these results cannot simply be extrapolated to oncology and require further study (see below).
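To make the differences between the inclusive, etiologic, substitutive, and exclusive approaches concrete, the sketch below counts the symptoms a single hypothetical patient would be credited with under each one. It is an illustration only: the symptom labels, the mapping of somatic symptoms to cognitive substitutes, and the DSM-IV-style rule of at least five symptoms including depressed mood or loss of interest are simplifying assumptions, and the etiologic approach depends on an assessor's judgment of which symptoms are due to the illness, which is simply passed in as data here.

```python
# Illustrative comparison of four ways to handle somatic symptoms when counting
# depressive symptoms in cancer patients. All labels and rules are assumptions.

# Nine DSM-IV-style criterion symptoms (labels are hypothetical shorthand).
DSM_SYMPTOMS = frozenset({
    "depressed_mood", "anhedonia", "appetite_weight", "sleep", "psychomotor",
    "fatigue", "worthlessness_guilt", "concentration", "suicidal_thoughts",
})
CORE = frozenset({"depressed_mood", "anhedonia"})

# Exclusive approach: fatigue and appetite/weight change are dropped entirely.
EXCLUDED = frozenset({"fatigue", "appetite_weight"})

# Substitutive approach: hypothetical cognitive stand-ins for somatic symptoms.
SUBSTITUTIONS = {"fatigue": "pessimism", "appetite_weight": "hopelessness"}


def counted_symptoms(present, approach, attributed_to_illness=frozenset()):
    """Return the symptoms credited toward the diagnosis under one approach."""
    present = set(present)
    criterion = present & DSM_SYMPTOMS
    if approach == "inclusive":
        # Count every criterion symptom, whatever its cause.
        return criterion
    if approach == "etiologic":
        # Count only symptoms the assessor judges NOT to be caused by the illness.
        return criterion - set(attributed_to_illness)
    if approach == "exclusive":
        # Remove the most common physical symptoms from the criteria.
        return criterion - EXCLUDED
    if approach == "substitutive":
        # Each replaced symptom counts only if its cognitive substitute is present.
        targets = (SUBSTITUTIONS.get(s, s) for s in DSM_SYMPTOMS)
        return {t for t in targets if t in present}
    raise ValueError(f"unknown approach: {approach!r}")


def meets_threshold(counted, min_symptoms=5):
    """Assumed rule: at least min_symptoms symptoms, including a core symptom."""
    return len(counted) >= min_symptoms and bool(set(counted) & CORE)


# Hypothetical patient; the assessor attributes three symptoms to the cancer.
patient = {"depressed_mood", "fatigue", "appetite_weight", "sleep",
           "concentration", "hopelessness"}
judged_somatic = {"fatigue", "appetite_weight", "sleep"}

for approach in ("inclusive", "etiologic", "exclusive", "substitutive"):
    counted = counted_symptoms(patient, approach, judged_somatic)
    print(f"{approach:>12}: {len(counted)} symptoms, "
          f"threshold met: {meets_threshold(counted)}")
```

With this invented patient, the inclusive approach credits five symptoms and meets the threshold, whereas the etiologic (two), exclusive (three), and substitutive (four) approaches do not, which shows how the choice of approach alone can change case status.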
3. Screening for Depression in Oncology
Conventional Mood Severity Scales
Few studies in oncology populations have compared the sensitivity and specificity of short questionnaires for depression against a gold-standard clinical interview. Studies of the HADS (the most commonly used tool in oncology) that reported both sensitivity and specificity are summarized in Table 13.1. Some investigators found the HADS to be a useful instrument in this context. For example, Razavi and colleagues17 compared the HADS to a psychiatric interview in 210 inpatients with cancer using receiver operating characteristic (ROC) analyses. They determined the HADS cutoff scores that maximized sensitivity and specificity (see Chapter 5 for a discussion of ROC curves). The area under the curve (AUC) estimates how well the instrument as a whole discriminates cases from noncases relative to the criterion measure, ranging from 0.5 (no better than chance) to 1 (perfect discrimination). Razavi and colleagues17 found that, in relation to an interview-based diagnosis of major depressive disorder, a cutoff score of 19 on the HADS total score gave 70% sensitivity and 75% specificity. When the outcome was adjustment disorders and major depressive disorders together, a cutoff score of 12v13 on the HADS gave 75% sensitivity and 75% specificity. Other studies have found similar results (see Table 13.1).
Three more recent cross-cultural studies, one in Southern Europe,18 one in Japan,19 and one in Australia,20 compared the HADS with either an ICD-10 diagnostic interview or a psychiatrist's interview-based diagnosis. All found the HADS to have relatively good predictive value. In the Japanese study, a HADS cutoff of 8v9 gave a sensitivity of 0.92 and a specificity of 0.57 against a diagnosis of either adjustment disorder or major depression. The AUC for the HADS in the Italian study was higher at 0.89, with the best cutoff identified as 9v10, giving a sensitivity of 0.86 and a specificity of 0.82 against a diagnosis of any ICD-10 anxiety, adjustment, or major depressive disorder. High values were maintained when the criterion was restricted to anxiety and adjustment disorders, with an AUC of 0.86, sensitivity of 0.83, and specificity of 0.82 for a HADS cutoff of 10 or more. For mood disorders alone, a HADS total score cutoff of 15 or more was associated with an even higher AUC of 0.96, a sensitivity of 0.85, and a specificity of 0.96. In general, higher cutoffs can be used to maximize positive predictive value, but at the expense of negative predictive value. Not all studies found both high specificity and high sensitivity for the HADS.
The Australian study, which included only breast cancer patients, found that the recommended cutoff of 10v11 had good specificity (0.97) but low sensitivity (0.16) for detecting major and minor depression.20 The cutoff that best balanced the two indices in that study was 7v8 on the depression subscale, which gave a sensitivity of 0.46 and a specificity of 0.94, still not as good as in the Italian sample. Similarly, Hall and associates21 found that in women with early-stage breast cancer, neither the anxiety nor the depression subscale of the HADS provided adequate sensitivity, although specificity was high (see Table 13.1). The Australian study also tested the BDI against a psychiatric interview20 and found it to perform better than the HADS: a cutoff of 5 gave a sensitivity of 0.73 and a specificity of 0.74, correctly classifying about three quarters of patients with respect to major and minor depression. Berard and coworkers22 also used the BDI but found that a much higher cutoff of 14 best identified cases of major depression. In summary, mood severity scales such as the HADS and BDI can classify patients as depressed against an interview criterion with some success, but the optimal cutoff scores vary considerably across patient populations, making it difficult to know which values to apply.
Table 13.1. Cutoff Points of the HADS and BDI to Maximize Sensitivity and Specificity in Cancer Patients
Reference | Population | Measure | Criterion | Cutoff | SE (%) | SP (%)
HADS full-scale studies
Grassi et al., 2007 (ref. 18) | 109 patients with mixed diagnoses | HADS | ICD-10 psychiatric interview – anxiety, adjustment, or major depressive disorders | 10 (full scale) | 86 | 82
Grassi et al., 2007 (ref. 18) | 109 patients with mixed diagnoses | HADS | Adjustment or major depression | 16 (full scale) | 85 | 96
Razavi et al., 1990 (ref. 17) | 210 cancer inpatients with mixed diagnoses | HADS | Endicott criteria – major depression | 19 (full scale) | 70 | 75
Razavi et al., 1990 (ref. 17) | 210 cancer inpatients with mixed diagnoses | HADS | Adjustment and major depression | 13 (full scale) | 75 | 75
Hopwood et al., 1991 (ref. 56) | 81 patients with metastatic breast cancer | HADS | Clinical interview schedule for DSM-III – affective disorders | 11 (full scale) | 75 | 75
Ibbotson et al., 1994 (ref. 57) | 513 outpatients with mixed cancer diagnoses | HADS | Psychiatric Assessment Schedule interview for DSM-III – GAD plus major depression (no AD) | 14 (full scale) | 80 | 76
HADS subscale studies
Berard et al., 1998 (ref. 22) | 100 patients with breast, head and neck cancer, and lymphoma | HADS | Structured psychiatric interview for DSM-IV – major depression only | 8 (depression) | 71 | 95
Hall et al., 1999 (ref. 21) | 266 women with early-stage breast cancer | HADS | Present State Examination for DSM-III – major depression and GAD separately | 11 (depression) | 14 | 98
Hall et al., 1999 (ref. 21) | 266 women with early-stage breast cancer | HADS | Present State Examination for DSM-III – major depression and GAD separately | 11 (anxiety) | 24 | 97
Hall et al., 1999 (ref. 21) | 266 women with early-stage breast cancer | HADS | Present State Examination for DSM-III – major depression and GAD separately | 8 (depression) | 33 | 93
Hall et al., 1999 (ref. 21) | 266 women with early-stage breast cancer | HADS | Present State Examination for DSM-III – major depression and GAD separately | 8 (anxiety) | 64 | 84
Lloyd-Williams et al., 2001 (ref. 58) | 100 patients with metastatic cancer – mixed diagnoses | HADS | Present State Examination for ICD-10 – major depression | 19 (full scale) | 68 | 67
Lloyd-Williams et al., 2001 (ref. 58) | 100 patients with metastatic cancer – mixed diagnoses | HADS | Present State Examination for ICD-10 – major depression | 11 (depression) | 54 | 74
Lloyd-Williams et al., 2001 (ref. 58) | 100 patients with metastatic cancer – mixed diagnoses | HADS | Present State Examination for ICD-10 – major depression | 10 (anxiety) | 59 | 68
Akizuki et al., 2003 (ref. 19) | 275 breast, lung, lymphoma, and leukemia patients | HADS | Psychiatric interview based on DSM-IV – adjustment disorders and major depression | 9 (depression) | 92 | 57
Love et al., 2004 (ref. 20) | 227 women with metastatic breast cancer | HADS | Monash Interview for Liaison Psychiatry based on DSM-IV – adjustment disorder and major depression | 8 (depression) | 46 | 94
BDI studies
Berard et al., 1998 (ref. 22) | 100 patients with breast, head and neck cancer, and lymphoma | BDI | Major depression only | 16 | 86 | 95
Love et al., 2004 (ref. 20) | 227 women with metastatic breast cancer | BDI | Adjustment disorder and major depression | 5 | 73 | 74
AD, affective disorders; BDI, Beck Depression Inventory; GAD, generalized anxiety disorder; HADS, Hospital Anxiety and Depression Scale; SE, sensitivity; SP, specificity.
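The ROC logic behind cutoffs like those in Table 13.1 can be sketched in a few lines of Python. The scores and interview diagnoses below are invented toy data, not values from any cited study. A score at or above the cutoff counts as screen-positive, so a split written as 12v13 in this chapter corresponds to `cutoff=13`. Choosing the cutoff that maximizes Youden's index (sensitivity + specificity - 1) is one common convention and is an assumption here, since the studies do not all report their selection rule. The AUC is computed as the probability that a randomly chosen case scores higher than a randomly chosen noncase, with ties counted as one half, which equals the area under the empirical ROC curve.

```python
def sens_spec(scores, is_case, cutoff):
    """Sensitivity and specificity when score >= cutoff counts as screen-positive."""
    tp = sum(1 for s, c in zip(scores, is_case) if c and s >= cutoff)
    fn = sum(1 for s, c in zip(scores, is_case) if c and s < cutoff)
    tn = sum(1 for s, c in zip(scores, is_case) if not c and s < cutoff)
    fp = sum(1 for s, c in zip(scores, is_case) if not c and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores, is_case):
    """Empirical AUC: chance a case outscores a noncase (ties count one half)."""
    cases = [s for s, c in zip(scores, is_case) if c]
    noncases = [s for s, c in zip(scores, is_case) if not c]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in cases for b in noncases)
    return wins / (len(cases) * len(noncases))


def best_cutoff(scores, is_case):
    """Candidate cutoff with the highest Youden index (first one wins on ties)."""
    def youden(cutoff):
        se, sp = sens_spec(scores, is_case, cutoff)
        return se + sp - 1
    return max(sorted(set(scores)), key=youden)


# Invented HADS total scores with interview diagnoses (True = case).
scores = [3, 5, 8, 9, 11, 12, 13, 15, 18, 22]
is_case = [False, False, False, True, False, False, True, True, True, True]

cut = best_cutoff(scores, is_case)
se, sp = sens_spec(scores, is_case, cut)
print(f"best split {cut - 1}v{cut}: sensitivity {se:.2f}, specificity {sp:.2f}")
print(f"AUC {auc(scores, is_case):.2f}")
```

On this toy sample the best split is 12v13 (sensitivity 0.8, specificity 1.0) and the AUC is 0.92. With real data the chosen cutoff depends heavily on the sample, which is one reason published HADS cutoffs differ so much across populations.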
The Distress Paradigm
In recent years, with increased attention to the “patient experience,” the field of psychosocial oncology has grown significantly. Once considered an add-on, psychosocial oncology is increasingly seen as a clinical necessity within any cancer care delivery system. An emerging movement in cancer care seeks to recognize a broadly defined emotional disturbance associated with cancer, which has been given the general term “distress.”23 Distress has recently been discussed as the “sixth vital sign in cancer care,”24–26 with the aim of raising awareness of distress and normalizing it to the point that routine assessment becomes mandatory. National Comprehensive Cancer Network (NCCN) guidelines recommend screening at the first appointment and regularly thereafter, as needed, throughout the course of treatment. Distress is a somewhat difficult term to apply clinically, because mental health training does not recognize it as a diagnostic category, and it is difficult to argue for specific symptoms associated with a diagnosis of “distress.” However, the term has great face validity, carries little stigma, and appeals to common sense, as most individuals have a good idea of what feeling distressed entails. The concept of “distress” was intended to destigmatize emotional reactions to cancer by providing a commonsense term without the negative linguistic baggage of clinical terms such as depression. Emotional distress refers primarily to a composite of anxiety, depression, and adjustment disorders related to the cancer experience. The NCCN Distress Management Panel has defined distress as:
. . . a multi-determined unpleasant emotional experience of a psychological (cognitive, behavioral, emotional), social, and/or spiritual nature that may interfere with the ability to cope effectively with cancer, its physical symptoms and its treatment.
Distress extends along a continuum, ranging from common normal feelings of vulnerability, sadness and fears to problems that can become disabling, such as depression, anxiety, panic, social isolation, and spiritual crisis.27
Brief Symptom Inventory (BSI)-18
Distress in cancer populations has been assessed using a number of measures, most notably the 18-item Brief Symptom Inventory (BSI-18)28,29 and the Distress Thermometer (DT).27 The BSI-18 belongs to the family of instruments developed by Derogatis28 and was shortened from the Symptom Checklist-90 (SCL-90)30 and the 53-item Brief Symptom Inventory,31 both of which have undergone extensive psychometric validation. The BSI-18 consists of 18 items that load on three subscales: depression, anxiety, and somatization (with a possible fourth factor consisting of a single item assessing suicide). To determine the utility of the BSI-18 for identifying cases of distress in cancer patients, Zabora and coworkers29 evaluated its sensitivity and specificity against the longer BSI. Cutoff scores were estimated from the distribution of standardized T-scores, with the 25th percentile used as the cutoff point for positive case identification. For men, the 25th percentile on the BSI-18 fell at a score of 10 on the Global Severity Index (GSI), and for women it fell at 13. Against the original 53-item BSI, these cutoffs had a sensitivity of 91.2% and a specificity of 92.6%, and they were therefore recommended as cutoffs for distress in cancer populations. However, ROC analyses were not performed in this study, which raises the question of whether these cutoff values actually achieve the optimal balance between sensitivity and specificity.
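The sex-specific caseness rule described above can be written as a small helper. It is a hypothetical sketch, not a validated BSI-18 scoring routine: it takes an already computed GSI score and applies the cutoffs reported by Zabora and coworkers (10 for men, 13 for women), treating a score at or above the cutoff as a case, which matches the "10 or more for males and 13 or more for females" criteria used in later DT validation studies. The function names and the invented records are illustrative.

```python
from collections import Counter

# Sex-specific GSI cutoffs for BSI-18 caseness reported by Zabora and coworkers.
GSI_CUTOFF = {"male": 10, "female": 13}


def is_bsi18_case(gsi_score, sex):
    """Hypothetical helper: case if the GSI score meets the cutoff for this sex."""
    return gsi_score >= GSI_CUTOFF[sex]


def caseness_by_group(records):
    """records: (group, gsi_score, sex) tuples -> {group: proportion meeting caseness}."""
    totals, cases = Counter(), Counter()
    for group, gsi_score, sex in records:
        totals[group] += 1
        cases[group] += is_bsi18_case(gsi_score, sex)  # True counts as 1
    return {group: cases[group] / totals[group] for group in totals}


# Invented records; "site_A" and "site_B" are placeholder groups, not real data.
records = [
    ("site_A", 14, "male"), ("site_A", 9, "female"),
    ("site_A", 15, "female"), ("site_A", 6, "male"),
    ("site_B", 4, "male"), ("site_B", 11, "male"),
    ("site_B", 3, "male"), ("site_B", 12, "female"),
]
print(caseness_by_group(records))
```

A GSI score of 12 therefore counts as a case for a man but not for a woman. Tallying flags by group yields caseness rates of the kind reported in large BSI-18 surveys, although the two invented groups here say nothing about real cancer types.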
Two large-scale studies of the BSI-18 in cancer patients, with more than 4,000 and 3,000 patients, respectively, used the cutoff scores detailed above29 and documented clinically significant distress in approximately 35% to 38% of all patients.32,33 In both studies the highest rates of distress were found in lung, pancreatic, and head and neck cancer patients, with lower rates in prostate and gynecologic cancer patients.32,33 This mirrors the pattern of depression rates across cancer types described earlier.3
Distress Thermometer
Recently there has been a great deal of interest in using the DT in cancer screening, fueled primarily by the NCCN's recommendation of it in its distress screening guidelines. Several studies have now evaluated the DT using ROC analyses to identify the cutoff scores on its 0-to-10 scale that maximize sensitivity and specificity (Table 13.2, adapted from Mitchell34). A recent comprehensive review by Mitchell34 assessed the accuracy of the DT as investigated in 19 different studies, as well as other short screening methods (fewer than five questions) in oncology. Overall, across six studies of 2,816 patients, the DT detected depression with a sensitivity of 81% and a specificity of 60%. For broadly defined distress, the corresponding values in nine studies of 1,447 patients were 77% and 66%. Four studies of the DT in 2,215 patients used HADS anxiety as the criterion measure and found a sensitivity of 77% and a specificity of 57%. In detecting depression, distress, and anxiety, the positive predictive value of the DT was much lower than its negative predictive value; that is, the DT was good at ruling out noncases but less accurate at identifying true cases. Mitchell therefore concluded that ultra-short measures cannot be used alone to diagnose anxiety or depression in cancer patients, but can serve well as a first-line screen to rule out depression.
Table 13.2. Cutoff Values of the Distress Thermometer (DT) to Maximize Sensitivity and Specificity in Cancer Patients
Reference
Population
Criterion
Akizuki et al., 200319
275 breast, lung, lymphoma, and leukemia patients
Psychiatric interview based on DSM-IV – adjustment disorders and major depression
Patrick-Miller et al., 200459
1,272 outpatients with mixed cancers
HADS (cutoff not stated)
Hoffman et al., 200460
68 outpatients with mixed cancers
BSI caseness (t-score = 63)
Akizuki et al., 200561
295 mixed cancer and patients preparing for stem cell transplants
Jacobsen et al., 200535
380 patients with mixed diagnoses
Gil et al., 200536
312 patients with mixed diagnoses
Ransom et al., 200662
DT Cutoff
SE (%)
SP (%)
PPV (%)
NPV (%)
84
61
77
71
79
62
26
95
5
59
71
57
73
Clinical interview for DSM-IV – major depression and adjustment disorders
4
81
82
80
83
HADS >14 BSI-18 >=10 for males, >=13 for females HADS total >=14
4
77
68
44
90
4
66
79
56
85
491 patients awaiting bone marrow transplant
CES-D >=16
5
80
70
46
91
Mehnert et al., 200663
475 outpatients with mixed cancers
HADS anxiety >=8
5
78 83
45 37
69 42
56 80
Adams et al., 200664
340 outpatients with mixed cancers
HADS anxiety >=8 HADS depression >=8
4
91 89
63 57
37 19
97 98
4
Not Stated
HADS depression >=8
(Continued)
Table 13.2. (Continued)
Reference
Population
Criterion
Andritsch et al., 200665
128 outpatients receiving chemotherapy
Ohno et al., 200666 Kumar et al., 200667 Ozalp et al., 200768 Gessler et al., 200669 Grassi et al., 200718
160 outpatients with mixed cancers 145 palliative care patients
HADS anxiety >=8 HADS depression >=8 HADS total >14
182 outpatients with mixed cancers 152 outpatients with mixed cancers 109 outpatients with mixed diagnoses
SE (%)
SP (%)
PPV (%)
NPV (%)
78 80 93
65 64 31
38 35 41
92 93 89
73
52
46
77
4
74
50
47
76
HADS total >15
4
83
76
57
92
ICD-10 psychiatric interview – anxiety, adjustment, or major depressive disorders
4
80
75
69
84
ICD-10 adjustment disorders, affective disorders, and anxiety HADS total >14
DT Cutoff 4 Not specified 5
BSI, Brief Symptom Inventory; CES-D, Center for Epidemiologic Studies Depression Scale; DSM-IV, Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition; HADS, Hospital Anxiety and Depression Scale; ICD-10, International Classification of Diseases, 10th Revision; NPV, negative predictive value; PPV, positive predictive value; SE, sensitivity; SP, specificity.
More specifically, one of the larger studies of the DT35 validated it against the HADS and BSI-18 in five American comprehensive cancer centers, asking 380 patients to complete the DT, the problem checklist, the HADS, and the BSI-18. ROC analysis of the DT against both criteria gave an AUC of 0.80 against a HADS cutoff of 15 or more and 0.78 against the BSI-18 cutoffs of 10 or more for males and 13 or more for females, values in the range that characterizes good overall test accuracy, with a DT score of 4 or more identified as the best cutoff. Patients with DT scores of 4 or more were more likely to be women, to have poorer performance status, and to report practical, family, emotional, and physical problems, demonstrating the concurrent validity of the instrument. Cross-cultural validation of the DT has also been undertaken. For example, in Japan researchers assessed the validity of the DT and the HADS against psychiatrist diagnoses of DSM-IV major depression and adjustment disorders in a sample of 275 patients.19 They forward- and back-translated the term “distress” to find the appropriate Japanese equivalent.
Using ROC analysis, they determined that the DT cutoff that best balanced sensitivity and specificity for detecting adjustment disorders and major depression was 4 or more, with a sensitivity of 84% and a specificity of 61%. They justified the lower specificity on the grounds that, in detecting depression, it is more important to overidentify potential cases than to miss troubled individuals. A multicenter European study with patients from Italy, Portugal, Spain, and Switzerland assessed both the DT and a similar scale, the Mood Thermometer (MT), designed to assess depressed mood in cancer patients.36 A convenience sample of 312 cancer outpatients completed the DT, MT, and HADS. The DT was more strongly associated with HADS anxiety scores than with depression scores, whereas the MT was related to both HADS anxiety and depression scores and correlated more highly with HADS scores than the DT did. ROC analyses found that a DT cutoff of 4 or more best balanced sensitivity (66%) and specificity (79%) for general psychosocial morbidity (HADS cutoff of 14 or more), while a cutoff of 5 or more identified more severe cases (HADS cutoff of 19 or more: sensitivity 70%, specificity 73%). On the MT, sensitivity and specificity for general psychosocial morbidity were 85% and 72% at a cutoff of 3 or more, and a score of 4 or more gave a sensitivity of 78% and a specificity of 77% for detecting more severe cases.
Finally, another Italian study used an ICD-10 diagnostic interview as the gold standard. Grassi and associates18 administered the DT and the HADS to 109 participants and again conducted ROC analyses against the formal psychiatric diagnoses. The most efficient DT cutoff for balancing sensitivity and specificity was again 4 or more. Other studies published since 2006 also found high sensitivity and specificity against instruments such as the HADS, but with lower positive predictive values (see Table 13.2). Hence, there is general agreement in North America, Europe, and Asia that DT scores of 4 or 5 and above indicate levels of distress or depression that are generally accepted as troubling and warrant some form of intervention. The DT can serve as a useful tool for accurately ruling out individuals who are unlikely to require intervention, but it is less accurate at ruling in true cases of distress. It may best be implemented as a first step, followed by a more comprehensive assessment of those who score above the cutoff to determine appropriate referrals.
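Why the DT rules out better than it rules in follows from Bayes' theorem: predictive values depend on prevalence as well as on sensitivity and specificity. The sketch below applies the pooled depression figures quoted from Mitchell's review (81% sensitivity, 60% specificity). The prevalence values are assumptions chosen to span the 10% to 25% range cited earlier for depression in cancer; they are not results from the review. The second function gives the share of all screened patients who would screen positive and so go on to the fuller assessment recommended above.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a screen applied at a given prevalence (Bayes' theorem)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


def screen_positive_rate(sensitivity, specificity, prevalence):
    """Share of all screened patients who would be referred for fuller assessment."""
    return sensitivity * prevalence + (1 - specificity) * (1 - prevalence)


# Pooled DT accuracy for depression as quoted from Mitchell's review.
SE, SP = 0.81, 0.60

# Assumed prevalences spanning the 10%-25% range cited for depression in cancer.
for prevalence in (0.10, 0.20, 0.25):
    ppv, npv = predictive_values(SE, SP, prevalence)
    referred = screen_positive_rate(SE, SP, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.0%}, NPV {npv:.0%}, "
          f"screen-positive {referred:.0%}")
```

At an assumed prevalence of 20%, the PPV is only about 34% while the NPV is about 93%, and roughly 48% of patients would screen positive, which is why a positive DT is best treated as a trigger for further assessment rather than as a diagnosis.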
4. Implementing Screening Programs in Oncology Settings
In recent years, there has been considerable interest in computerizing the administration and scoring of short screening questionnaires in oncology37 to save time and staff resources (see Chapter 4). This began primarily with longer assessments of quality of life, a construct that encompasses much more than distress or depression, including physical, social, role, and emotional functioning and common health-related symptoms. This literature is nonetheless relevant, because the same technology has since been applied to screening with shorter instruments. In such studies, the selected questionnaire is typically completed on a computerized interface and scored immediately, and a report is generated and presented to treatment staff to inform subsequent clinical decisions. For example, in a randomized crossover study of touchscreen versus paper completion of two quality-of-life questionnaires, participants preferred the touchscreen by a ratio of 2:1, a preference that held within all demographic subgroups. The benefits of the touchscreen for providers were identified as automatic and immediate collection and sharing of data, automatic scoring, online availability of information, cost and time savings, and printouts available for immediate placement in patients’ charts.38 In another study of the feasibility of collecting standardized self-reported quality-of-life and psychosocial needs data via a touchscreen computer, 99% of patients reported that the touchscreen was easy to use.39 In the Netherlands, Detmar and colleagues40,41 reported that physicians found that quality-of-life summary information provided a useful overall impression of their patients’ functional health and symptom experience while improving the efficiency of the clinical encounter. Patients were also largely satisfied with the computerized intervention. A recent study administered the HADS online to 3,071 patients attending a cancer facility for follow-up care in a variety of clinics; 85% of all patients were able to complete the questionnaires.42 Patients who were female, younger than 65 years, and more severely ill were the most distressed. In a series of studies in which patients completed computerized quality-of-life assessments immediately before their appointments and oncologists immediately received quality-of-life summary information, our group established that both physicians and patients readily accepted computerized quality-of-life data, in breast cancer and in pain and palliative care.43,44
Our current work with distress screening has built on this foundation and taken a phased approach. Phase I was a baseline cross-sectional assessment of the current level of psychosocial distress in patients and of their awareness and use of psychosocial resources.33 Results in a sample of almost 3,000 patients, highly representative of the overall patient population, confirmed the findings of other studies: 38% scored above the BSI-18 cutoff for distress, that is, “caseness” as defined by Zabora and colleagues.29 Cases were more likely to be on active treatment, have a diagnosis other than prostate cancer, belong to an ethnic minority, or come from a low-income family.
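The immediate scoring and reporting step shared by these computerized systems can be sketched in a few lines. This is a hypothetical illustration, not the software used in any of the cited studies. It assumes the standard HADS structure (seven anxiety and seven depression items, each scored 0–3), flags a HADS total of 14 or more (the general psychosocial morbidity threshold cited earlier in this chapter) or a DT score of 4 or more, and formats a one-line summary for the chart. The field names, the choice to flag on either criterion, and the report wording are all assumptions.

```python
"""Hypothetical sketch of immediate scoring and reporting for a screening
battery (HADS + Distress Thermometer). Not the software from any cited study."""
from dataclasses import dataclass
from typing import List

HADS_TOTAL_CUTOFF = 14  # general psychosocial morbidity threshold cited in the text
DT_CUTOFF = 4           # DT cutoff most often reported in the text


@dataclass
class ScreeningResponse:
    patient_id: str             # hypothetical identifier field
    hads_anxiety: List[int]     # 7 items, each scored 0-3
    hads_depression: List[int]  # 7 items, each scored 0-3
    dt_score: int               # Distress Thermometer, 0-10


def _check_items(items: List[int], name: str) -> None:
    """Reject incomplete or out-of-range item responses before scoring."""
    if len(items) != 7 or any(not 0 <= x <= 3 for x in items):
        raise ValueError(f"{name} must contain 7 items scored 0-3")


def score(resp: ScreeningResponse) -> dict:
    """Sum subscales, compute the total, and flag scores at or above either cutoff."""
    _check_items(resp.hads_anxiety, "HADS anxiety")
    _check_items(resp.hads_depression, "HADS depression")
    if not 0 <= resp.dt_score <= 10:
        raise ValueError("DT score must be 0-10")
    anx = sum(resp.hads_anxiety)
    dep = sum(resp.hads_depression)
    total = anx + dep
    return {
        "patient_id": resp.patient_id,
        "hads_anxiety": anx,
        "hads_depression": dep,
        "hads_total": total,
        "dt": resp.dt_score,
        "flag": total >= HADS_TOTAL_CUTOFF or resp.dt_score >= DT_CUTOFF,
    }


def report(result: dict) -> str:
    """One-line summary intended for placement in the chart before the visit."""
    status = "ABOVE CUTOFF: consider further assessment" if result["flag"] else "below cutoffs"
    return (f"{result['patient_id']}: HADS-A {result['hads_anxiety']}, "
            f"HADS-D {result['hads_depression']} (total {result['hads_total']}), "
            f"DT {result['dt']} -> {status}")


if __name__ == "__main__":
    r = score(ScreeningResponse("P001", [2, 1, 2, 1, 1, 2, 1], [1, 1, 2, 1, 0, 1, 1], 5))
    print(report(r))
    # P001: HADS-A 10, HADS-D 7 (total 17), DT 5 -> ABOVE CUTOFF: consider further assessment
```

Validating item ranges before scoring matters in practice, because a skipped or mis-keyed item would otherwise silently lower a total and could hide a case.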
In Phase II of the program, we updated the screening battery to include the DT and replaced the BSI-18 with a new tool, the Psychological Screen for Cancer (PSSCAN).45 The PSSCAN was developed for screening for depression and anxiety in clinical practice and as a research tool; its Part C is a reasonable proxy for the BSI-18, which we chose to replace because of copyright-associated costs. The entire battery consists of the DT, a modified problem checklist,27 the PSSCAN, 10-point scales for fatigue and pain, and nutrition questions. Phase II, which has recently been completed, included a three-armed randomized controlled trial of the effect of three different levels of screening in lung and breast cancer outpatients. Outcomes were distress, common problems, anxiety and depression, and awareness and use of psychosocial resources 3 months after initial screening, which occurred during the first oncology appointment. The three conditions evaluated were minimal screening (DT only), full screening, and full screening plus personalized triage. In the triage condition, if the patient chose to be contacted, a staff member phoned within a specified time period to discuss and arrange referral options. A total of 1,141 patients enrolled in this study (89% accrual), and 90% of them provided data at the 3-month follow-up. Preliminary results confirm levels of distress and common problems similar to those identified in Phase I33 and suggest that patients with high distress who accepted referrals to psychosocial services showed significantly greater decreases in anxiety and depression over time than those who did not accept referrals. Compared with baseline data collected 3 years earlier, the program also increased overall awareness of the services available to patients, as well as uptake of services among those who received the triage intervention.
Evaluating Efficacy of Screening Programs in Clinical Oncology Practice
Despite great enthusiasm for developing questionnaires to detect emotional complications of cancer, few groups have implemented a successful screening program for mood disorders, and even fewer have systematically evaluated the efficacy of such programs. Table 13.3 summarizes the studies to date that have longitudinally evaluated the impact of psychosocial screening on patient outcomes. Most recent studies have used computerized screening techniques, but one earlier study implemented distress screening over the telephone and subsequently evaluated its impact on quality of life. Maunsell and colleagues46 randomized women newly diagnosed with nonmetastatic breast cancer to receive either usual care or monthly telephone distress screening followed by triage. Distress levels decreased over the following year in both groups. The authors concluded that the minimal psychosocial intervention all participants received as part of their initial cancer care may itself have been effective in reducing distress, leaving little room for additional gain from screening. Our early work used a computerized version of the EORTC QLQ-C30 to screen for quality of life in lung cancer patients. Using a sequential cohort design, patients were assigned either to a usual-care control group, who completed the paper version of the EORTC QLQ-C30 after the clinic appointment, or to an experimental group, who completed the questionnaire before their first clinic appointment, with feedback to staff.
Patients in both groups reported being equally satisfied with their treatment, but timely provision of quality-of-life information in the experimental group resulted in more discussion of quality-of-life issues and more actions taken by oncologists regarding these issues.47 Velikova and associates48 randomly assigned 28 oncologists treating 286 cancer patients to an intervention group who received feedback of results, an attention-control group who completed questionnaires without feedback, and a control group with no questionnaires. Patients completed the EORTC QLQ-C30 and the HADS online before their appointment. Oncologists who received the quality-of-life reports asked more about emotional problems, work-related issues, and daily activities, and on average more issues were discussed without extending the length of the consultation. Another group further investigated the utility of providing summary reports of quality of life and depression to the oncology team for a randomly chosen two thirds of patients, with referral to appropriate psychosocial resources.49 In the intervention arm, a nurse was also present during the consultation and formulated an individualized management plan based on the issues raised and on prespecified expert psychosocial algorithms. Six months after randomization, there were no significant differences between the two arms in any domain or in satisfaction with care. However, the most striking finding was that among patients who were moderately or severely depressed at baseline on the BDI, appropriate triage did result in decreased depression 6 months later compared with the group whose results were not shared with the healthcare team.49 Similarly, Detmar and colleagues50 randomly assigned patients in palliative care to complete computerized quality-of-life assessments, with the graphical presentation of results either provided or not provided to physicians. For patients whose physicians did receive the results, more health-related quality-of-life issues were discussed, and more quality-of-life issues were identified by physicians. Boyes and associates, in Australia,51 had patients complete a computerized version of the HADS while waiting to see their oncologist at each visit. Responses were scored immediately, and summary reports were placed in each patient’s file before the appointment. There were no effects on subsequent anxiety, depression, or perceived needs among those who received the intervention. However, it is possible that the oncologists were not using the report, as only three intervention patients reported that their oncologist discussed the feedback report with them. Most recently, Rosenbloom and colleagues52 randomly assigned 213 patients with metastatic breast, lung, or colorectal cancer to usual care, quality-of-life assessment, or assessment followed by a structured interview (with presentation of symptoms to the treating nurse). There were no differences among the three conditions in patient outcomes, clinical management, or patient satisfaction.
In summary, studies that implemented distress screening and then evaluated its effect on subsequent patient outcomes have shown that such interventions can lead to more discussion of psychosocial issues between patients and oncology staff, but there is limited evidence that this produces better patient outcomes in the longer term. Screening alone appears insufficient to improve outcomes for patients; ideally, it should be accompanied by triage and referral to services with proven efficacy in treating psychosocial distress, and by training for oncology staff in how to make these referrals.
Table 13.3. Summary of Efficacy Studies of Psychosocial Screening in Cancer Populations
Maunsell et al., 1996 (ref. 46). Randomized trial of a psychological distress screening program after breast cancer: Effects on QL
Study design: Randomized controlled trial comparing usual care (control) with a telephone distress-screening intervention. Women in both groups received a brief psychosocial intervention from a social worker at initial treatment.
Sample: 251 women newly diagnosed with nonmetastatic breast cancer (89% of the total population seen at a regional cancer center).
Methods: The experimental group had monthly telephone distress screening with the 20-item GHQ for 12 months, with additional psychosocial intervention offered to those with high distress. Outcomes were assessed 3 and 12 months later.
Measures: Baseline: social support (SSQ), marital satisfaction (LWMAT), stressful life events (LES). Primary outcome: Psychiatric Symptom Index (PSI). Other outcomes: overall health perception (one question); health worry (one question); role performance (leisure, home, social, physical; CHALS); visits to social workers and other healthcare professionals.
Results: Distress levels decreased over the study period across groups. No between-group differences were observed in distress, physical health, functional status, social and leisure activities, return to work, or marital satisfaction. Use of outside co-interventions was similar between groups.
Conclusions/Comments: This distress-screening program did not improve QL among women who received a minimal psychosocial intervention as part of their initial cancer care. That intervention alone may be effective in reducing distress, making it difficult to obtain additional benefit from a screening program.
Taenzer et al., 2000 (ref. 47). Impact of computerized QL screening on physician behavior and patient satisfaction in lung cancer outpatients
Study design: Sequential cohort study. Patients were recruited sequentially, first into a control group and then into the experimental group: the first 26 were assigned to the control group and the next 27 to the experimental group.
Sample: 57 patients diagnosed with lung cancer of any stage, out of 170 seen in the lung clinic (33.5%). Groups did not differ on demographic variables.
Methods: Control group: after the standard clinic appointment, completed the PDIS, the paper-and-pencil EORTC QLQ-C30, and an exit interview. Experimental group: completed the computerized EORTC QLQ-C30, and a printed report of results was provided to their nurse and physician during the clinic appointment; after the appointment, completed the PDIS and exit interview.
Measures: Paper-and-pencil EORTC QLQ-C30; computerized EORTC QLQ-C30; PDIS. Exit interview: structured interview documenting patients’ perception of whether QL concerns indicated on the EORTC QLQ-C30 were addressed during the clinic appointment. Medical record audit: total number of QLQ-C30 categories charted and total number of actions taken, recorded by a research assistant blinded to study condition.
Results: EORTC QLQ-C30: groups did not differ in the number of items endorsed. PDIS: both groups were equally satisfied with their clinic visit; satisfaction scores were very high. Exit interview: the experimental group indicated that significantly more quality-of-life items were discussed during their clinic appointment than did the control group (48.9% vs. 23.6%; t = 3.95, p < 0.01). Medical record audit: actions regarding a greater number of QL categories were indicated in the charts of patients in the experimental group than in the control group.
Conclusions/Comments: The tool was effective in detecting an increased number of QL concerns during the clinic appointment. There was also a trend toward more QL concerns being charted and more actions being taken to address them (differences not significant). Limitations: generalizability is limited by the small sample and nonrandomized design.
McLachlan et al., 2001 (ref. 49). Randomized trial of coordinated psychosocial interventions based on patient self-assessments vs. standard care to improve the psychosocial functioning of patients with cancer
Study design: Randomized controlled trial. Patients were stratified by clinic of origin (e.g., lung, breast); within each clinic, two thirds were assigned to the intervention arm and one third to the control arm.
Sample: 450 cancer patients. Inclusion criteria: diagnosis of cancer; attending a medical oncology clinic; not attending for the very first consultation; fluent in English; ECOG status ≤ 2; age ≥ 18; adequate follow-up scheduled at the institution; completion of 90% of prestudy items.
Methods: Patients completed questionnaires on a touchscreen computer before the appointment and were randomly assigned to the intervention or control group in a 2:1 ratio. Intervention group: a printed summary of results was presented at the clinic appointment, and a coordination nurse was present at the clinic visit; after the visit, the nurse formulated a personalized care plan based on the summary report. Control group: the summary report was not made available during the clinic visit. Follow-up at 2 and 6 months; satisfaction with care received was assessed at 6 months.
Measures: CNQ short form; EORTC QLQ-C30; BDI short form; patient satisfaction at 6 months (satisfaction with medical staff, information provision, and overall satisfaction; 1–4 Likert scale). Primary outcome: difference between the 2 arms in changes from baseline in psychological needs and information needs measured by the CNQ at 2 months. Secondary outcomes: differences in other domains of the CNQ, EORTC QLQ-C30, and depression.
Results: Response rates were 86% at 2 months and 71% at 6 months. Across groups, 63% of offered services were not accepted by patients. The intervention group showed greater benefit than the control group with respect to psychological and health information needs at 2 months, but there were no differences at 6 months. No significant differences in secondary outcomes at 2 months and no significant differences in satisfaction with care.
Conclusions/Comments: No meaningful changes from baseline in QOL between the intervention and usual-care groups at 2 and 6 months. Both groups endorsed the feasibility of touchscreen technology. Standardized QOL assessments before clinic appointments facilitate patient–healthcare team communication about QOL issues.
Detmar et al., 2002 (ref. 50). HRQL assessments and patient–physician communication: a randomized controlled trial
Study design: Prospective, longitudinal, randomized crossover trial. Physicians were initially randomized to intervention or control, and 10 consecutive patients were recruited for each physician. The first study visit was a baseline; the intervention was introduced at the second visit and continued until the fourth visit. At the midpoint, physicians were crossed over: those in the control group moved to the intervention group and vice versa.
Sample: 10 physicians, 273 cancer patients. Inclusion criteria: after receiving two cycles of chemotherapy.
Methods: Intervention group: patients completed the EORTC QLQ-C30 in the waiting room before each visit. Responses were optically scanned into a computer, and a graphic summary profile was printed out and given to patients; a copy was also placed in the medical record. Physicians were trained in how to interpret the questionnaire results.
Measures: Patient–physician communication: all visits were audiotaped and content-analyzed; a score (0–12) of all health-related QL topics discussed was the primary study outcome. Physicians’ awareness of patient HRQL: at the first and fourth visits, both patients and physicians completed the COOP/WONCA charts. Patient management: medical records and audiotapes were used to score how many HRQL actions a physician took per patient. Patients’ self-reported HRQL: the SF-36 was administered to all patients at the first and fourth visits. Patient and physician evaluation of the intervention: after the fourth visit, patients in the intervention group completed a satisfaction survey and brief phone interview; physicians underwent a semistructured interview.
Results: Patient–physician communication: higher in the intervention group than in the control group; 12 HRQL issues were discussed more frequently. Physicians’ awareness of patient HRQL: no significant differences between groups in physician–patient agreement in ratings on the COOP/WONCA charts. Patient management: significantly more patients in the intervention group (23%) than in the control group (16%) received counseling from the physician on how to manage their health problems. Patient and physician satisfaction: both patients and physicians reported high satisfaction. Patients’ HRQL: no group differences in SF-36 scales at the fourth visit; the intervention group reported significantly greater improvement over time in mental health and role functioning than the control group. Consultation duration and evaluation of the intervention: no significant differences in visit duration; patients and physicians both gave positive feedback about the summary report.
Conclusions/Comments: Significant increase in discussion of HRQL topics; the intervention had only a modest effect on patient management activities. Most patients and all physicians reported that the HRQL summary report helped patient–physician communication, and they recommended continued use of the intervention as standard care in outpatient clinics. Limitations: large number of tests performed; limited physician sample; the crossover design allowed carryover and contamination effects. Physicians initially assigned to the intervention group tended to discuss HRQL issues more frequently even when in the control condition.
Velikova et al., 2004 (ref. 70). Measuring QL in routine oncology practice improves communication and patient well-being: a randomized controlled trial
Study design: Prospective randomized controlled trial with repeated measures. Intervention group: completion of a touchscreen QL questionnaire plus feedback of results to physicians. Attention-control group: completion of the QL questionnaire on a touchscreen computer, with no feedback to physicians. Control group: no touchscreen measurement of HRQL before clinic encounters. Randomization was 2:1:1 in favor of the intervention group, stratified by cancer site.
Sample: 28 oncologists; 286 cancer patients. Inclusion criteria: commencing treatment, expected to attend clinic at least three times, fluent in English, not taking part in other HRQL studies, not exhibiting overt psychopathology.
Methods: Patients were randomly assigned, and their clinic encounters were tape-recorded. Patients in the intervention and attention-control groups completed touchscreen questionnaires before each clinic encounter. Outcome questionnaires were provided to all patients on paper to complete at home and return by mail. Outcomes were assessed after the baseline encounter, after three study encounters (2–3 months), after 4 months, and at the end of the study (approx. 6 months).
Measures: Intervention questionnaires: EORTC QLQ-C30; HADS. Outcome measures: FACT-G (v4), primary outcome. Process-of-care measures: audiotaped encounters were analyzed for content of any quality-of-life issues included in the EORTC QLQ-C30. The content was presented as a list of binary variables (topic discussed or not) and as combined scores of EORTC symptoms (0–7) and functional issues (0–5) discussed. The combined scores were used as the study’s primary outcome.
Results: EORTC QLQ-C30: a significant overall between-group effect on well-being. FACT-G: scores improved in the intervention group vs. the control group, but not vs. the attention-control group; the attention-control group scored significantly better than the control group. Process of care: the number of EORTC symptoms mentioned was higher in the intervention group than in the control group. Chronic nonspecific symptoms (sleep, changes in appetite, fatigue) were discussed more often without prolonging the encounters. Physicians used the HRQL data 64% of the time.
Conclusions/Comments: Chronic symptoms were discussed more often because of the intervention. The intervention had a positive impact on patients’ well-being. Routine repeated measurement of HRQL may lead to improvements in emotional well-being in some patients.
Boyes et al., 2006 (ref. 51). Does routine assessment and real-time feedback improve cancer patients’ psychosocial well-being?
Study design: Two-group study in which alternate consenting patients were assigned to treatment and control groups. Patients were assessed at the first visit and at the three following consecutive visits.
Sample: 95 cancer patients. Eligibility criteria: aged 18 or older, attending a first consultation, receiving active treatment after the first visit, considered by the oncologist to be emotionally and physically able to participate.
Methods: Patients were alternately assigned by computer to the intervention (n = 42) or control group (n = 38). Both groups completed a 15- to 20-min survey on a touchscreen computer. Results for the intervention group were made available to physicians; results for the control group were not. An acceptability survey was administered to both patients and oncologists.
Measures: Demographics and cancer characteristics (13 items). Physical symptoms: 12 symptoms associated with chemotherapy and the extent to which they interfere with patients’ daily routine (1–3 scale). HADS. SCNS: patients’ level of need for help in 4 domains: psychological (8 items), health systems and information (13), patient care and support (7), and physical and daily living (3).
Results: The intervention group reported fewer debilitating physical symptoms than the control group. HADS: anxiety scores decreased in both groups from baseline to final follow-up, but the change did not differ significantly between groups; depression scores decreased in the intervention group and increased in the control group from baseline to final follow-up, but again the change did not differ significantly between groups. SCNS: both groups reported moderate to high need for help, which decreased from baseline to follow-up; differences between groups were not significant. Only 3 patients reported that their doctor discussed the report with them, but 50% of physicians reported providing feedback to patients based on the report.
Conclusions/Comments: Overall, patients were functioning well at baseline, which limited the opportunity to detect changes. Both patients and clinicians provided positive feedback. Although clinicians were involved in developing the report and provided positive feedback, they reported that it rarely contributed to their decision making, which may be an important implication for future training.
Rosenbloom et al., 2007 (ref. 52). Assessment is not enough: a randomized controlled trial of the effects of HRQL assessment on QL and satisfaction in oncology clinical practice
Study design: Randomized clinical trial, stratified by primary cancer. Control group: data not shared with the treatment nurse. Assessment control: baseline, 1-month, and 2-month FACT-G scores were shared with the treatment nurse. Structured interview and discussion condition: structured interview about responses to the FACT-G at baseline and at 1 and 2 months.
Sample: 213 patients. Eligibility: advanced breast, lung, or colorectal cancer; receiving chemotherapy; life expectancy of at least 6 months. Exclusion: brain metastases.
Methods: Control group: FLIC at baseline, 3, and 6 months; FACT at 6 months; data not shared with the treatment nurse. Assessment control: FACT-G and FLIC at baseline and at 1, 2, 3, and 6 months; baseline, 1-month, and 2-month FACT-G scores were shared with the treatment nurse. Structured interview and discussion condition: FACT-G and FLIC at baseline and at 1, 2, 3, and 6 months, plus a structured interview about responses to the FACT-G at baseline and at 1 and 2 months.
Measures: FACT-G: 5 subscales plus an additional rating of each symptom (better than, worse than, or as expected); a rating of “worse than” triggered the structured interview to focus on the indicated symptom. FLIC: HRQL outcomes. Brief POMS-17: distress outcomes. PSQ-III: 2 subscales, general satisfaction and satisfaction with communication. Clinical treatment changes: items completed by the treatment nurse at baseline, 3, and 6 months, including supportive medication changes, supportive care changes, referral to supportive services, other clinical changes, and changes in chemotherapy dose as a result of reported side effects or treatment toxicity.
Results: Negative mood and age were the two significant differences between groups at baseline and were used as covariates. No significant differences in satisfaction or HRQL over time across groups; satisfaction and HRQL did not change over the study period. No significant differences in clinical treatment changes among the 3 groups.
Conclusions/Comments: No impact of the intervention, even among the most distressed patients. Providing HRQL assessment and structured feedback of results to nursing staff before the clinic visit did not improve patient outcomes, clinical management, or satisfaction.
BDI, Beck Depression Inventory; CHALS, Canada Health and Activity Limitation Survey; CNQ, Cancer Needs Questionnaire; COOP, Dartmouth Primary Care Cooperative Information Health Assessment; ECOG, Eastern Cooperative Oncology Group performance status; EORTC QLQ-C30, European Organization for Research and Treatment of Cancer Quality of Life Questionnaire C30; FACT-G, Functional Assessment of Cancer Therapy-General; FLIC, Functional Living Index-Cancer; GHQ, General Health Questionnaire; HADS, Hospital Anxiety and Depression Scale; HRQL, health-related quality of life; LES, Life Experiences Scale; LWMAT, Locke-Wallace Marital Adjustment Test; PDIS, Patient-Doctor Interaction Scale; POMS, Profile of Mood States; PSQ-III, Medical Outcomes Study Patient Satisfaction Questionnaire-III; QL, quality of life; SCNS, Supportive Care Needs Survey; SF-36, Medical Outcomes Study 36-Item Short Form Health Survey; SSQ, Social Support Questionnaire; WONCA, World Organization Project of National Colleges and Academics.
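Several trials in Table 13.3 allocated patients unequally within strata: 2:1 within each clinic in McLachlan et al., and 2:1:1 stratified by cancer site in Velikova et al. One common way to achieve such ratios is stratified permuted-block randomization, sketched below. This is an assumption-laden illustration, not a description of the procedures those trials used: the arm names, the seed, and the fixed block size (equal to the sum of the ratio weights) are hypothetical, and real trials often vary block sizes so that upcoming assignments are harder to predict.

```python
"""Sketch of stratified permuted-block randomization with unequal allocation.
Arm names, seed, and block size are illustrative assumptions."""
import random
from collections import defaultdict
from typing import Dict, List


class StratifiedBlockRandomizer:
    def __init__(self, ratio: Dict[str, int], seed: int = 0) -> None:
        # One block holds each arm as many times as its weight, e.g.
        # {"intervention": 2, "attention_control": 1, "control": 1} -> block of 4.
        self.template: List[str] = [arm for arm, weight in ratio.items() for _ in range(weight)]
        self.rng = random.Random(seed)
        # Separate queue of not-yet-used assignments for each stratum (e.g. cancer site).
        self.pending: Dict[str, List[str]] = defaultdict(list)

    def assign(self, stratum: str) -> str:
        """Return the next arm for a newly enrolled patient in this stratum."""
        queue = self.pending[stratum]
        if not queue:
            block = list(self.template)
            self.rng.shuffle(block)  # random order within the block
            queue.extend(block)
        return queue.pop(0)


if __name__ == "__main__":
    randomizer = StratifiedBlockRandomizer(
        {"intervention": 2, "attention_control": 1, "control": 1}, seed=42)
    breast = [randomizer.assign("breast") for _ in range(8)]
    lung = [randomizer.assign("lung") for _ in range(4)]
    for name, arms in (("breast", breast), ("lung", lung)):
        print(name, {arm: arms.count(arm) for arm in sorted(set(arms))})
```

Because each stratum draws from its own queue of shuffled blocks, the allocation ratio is met exactly after every completed block within every stratum, so arms stay balanced across cancer sites even if one site enrolls far more patients than another.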
5. Special Issues in Screening Cancer Patients
In healthcare, not only asking about but also acting upon a patient’s most intimate and complex concerns requires a change in practice and in assumptions, incorporating the full biopsychosocial model. Psychosocial screening is an opportunity for patients and their support persons to understand the relevance and importance of emotional well-being. One of the primary goals of screening in cancer care is to provide programs that contribute to the normalization and treatment of distress, as implied by the adoption of “distress” as the “sixth vital sign in cancer care.”25,26 One implication of adopting this model is that distress should be assessed, at a minimum, upon entry into the system and monitored at regular intervals throughout the treatment program. Physicians and other members of the healthcare team also require training in how to access and act upon the information provided through psychosocial screening. Currently, patients experience service delivery in many different and often inconsistent ways; the hope is that routine screening will provide a more streamlined, consistent, meaningful, and proactive experience. Successful integration into the complex cancer care system is an ongoing process that demands collaboration, integration of theory into practice, flexibility, and communication. In this environment, connection, understanding, and transparency with representatives from all levels are essential. These include nurses, oncologists, booking clerks, receptionists, patient records staff, managers, administrators, information technology services, and program planners. From a clinical perspective, patient presentation is diverse.
Patients who are less distressed at their initial screening sometimes show significant distress at follow-up assessment in various areas, including resource needs, depression, anxiety, and coping; these needs must be acted upon appropriately whenever they arise. Conversely, patients who are extremely distressed at their first screening often report feeling less distressed at later intervals. In a combined clinical and research setting, it is essential that clinical staff be available when a patient identifies a need, so that research findings are acted upon ethically when such needs emerge.
Defining, understanding, accommodating, and advocating for the needs of people living with cancer and their support persons is the foundation that drives service delivery. In terms of emotional needs, patients have said that having psychosocial support as a core component of their medical appointment is important and that it helps them feel cared about. Patients have also said that they appreciate confidentiality and a private space in which to answer sensitive questions discreetly; the physical setting of screening should therefore accommodate these privacy needs. Most, if not all, cancer patients seen for first consultations in cancer clinics bring at least one support person with them. Providing an environment that includes those most important to patients, both in physical space and in the screening process, is an essential part of complete care. In preliminary planning, the choice of screening space and technology must accommodate physical needs and mobility, including access for people in wheelchairs and on stretchers, if necessary. The technology chosen to present the application has to be both psychologically and physically accessible for people with disabilities. At the same time, it has to be efficient and relevant to clinical staff, providing useful real-time feedback that can be used in the clinical encounter. Research to date has pointed to simple touchscreen computer programs as the best way to balance these needs of patients and families with the needs of the healthcare system. In addition, because cancer centers serve a predominantly older population (average patient age is in the mid-60s), supports have to be put in place for patients who are not comfortable using computers.
Finally, given the diversity of the patient population, screening programs should offer versions of the program in multiple languages or have translators present to assist patients who do not speak the dominant language, ensuring that they can benefit equally from the opportunities screening provides.
6. Summary, Integration, Future Directions
There is increasing awareness of the value and importance of screening for depression and distress in oncology settings, based on research that has consistently documented substantial rates of psychological morbidity in a range of patients, using both conventional measures of depression and anxiety and more recently introduced short screening tools for psychosocial distress. Researchers have devised quick and simple methods for assessing symptoms in a wide range of patients that are acceptable to both patients and providers, and have introduced computerized systems that make it possible to screen large numbers of patients quickly and provide immediate feedback regarding depression, distress, and quality of life. Despite these advances, little evaluation of the actual downstream impact of these programs on patients has been conducted, and most of the work done to date has not resulted in clearly demonstrable benefits. As a result, screening has yet to be implemented in routine clinical practice. A 2005 survey of all NCCN member institutions in the United States treating adults found that, of 15 responding centers, 8 (53%) conducted routine distress screening for at least some patient groups and 4 (27%) were pilot-testing screening strategies.53 However, only 20% of surveyed member institutions screened all patients, as the NCCN guidelines recommend. In addition, 37.5% of the institutions that conducted screening relied only on interviews, rather than validated screening tools, to identify distressed patients. Moreover, the fiscal costs of implementing screening have not been compared with the potential benefits. Potential cost savings from distress screening may include reduced use of inappropriate and expensive resources, such as emergency room visits or unnecessary chemotherapy, which may be used inappropriately to treat anxiety (see Carlson and Bultz54,55 for reviews of medical cost offset). Policymakers may require some form of economic analysis of psychosocial screening before large-scale implementation becomes common. The high levels of distress documented in many cancer patients may serve as a call to action and spur future research and program development. Ethically, it can be argued that the documented prevalence of distress and depression in these patients can no longer be ignored. Recognizing distress as the sixth vital sign in cancer care requires service providers to assess and treat this problem, giving it the same importance as the treatment of physical illness. Given the high prevalence of distress, cancer must be considered a biopsychosocial illness whose emotional sequelae often include treatable symptoms of depression and anxiety.
The treatment and research team must determine how to identify and treat those in need of such care most reliably and efficiently. The several efficacy studies conducted to date that have directly assessed the potential benefit to patients of screening with feedback to the medical team have produced inconsistent results. It appears that screening alone is not sufficient to alleviate patient problems: some form of training must be provided to the care team to stimulate appropriate action on identified problems, and ideally the required psychosocial services must be available for patients in need. Further research is critically needed to determine how best to act on the information provided by patient screening in order to optimize patient outcomes.
7. Acknowledgments
Dr. Linda Carlson is supported by the Enbridge Endowed Research Chair in Psychosocial Oncology, funded by Enbridge Inc., the Canadian Cancer Society Alberta/NWT Division, and the Alberta Cancer Foundation. This program of research has been funded by the Public Health Agency of Canada and by the Alberta Cancer Board Bridge and Pilot Funding and Research Initiatives Programs.
References
1. Sontag S. Illness as metaphor and AIDS and its metaphors. New York: Picador USA, 2001.
2. Pirl WF. Evidence report on the occurrence, assessment, and treatment of depression in cancer patients. J Natl Cancer Inst Monogr. 2004;32:32–39.
3. Massie MJ. Prevalence of depression in patients with cancer. J Natl Cancer Inst Monogr. 2004;32:57–71.
4. Massie MJ, Popkin MK. Depressive disorders. In: Holland J, ed. Psycho-Oncology. New York: Oxford University Press, 1998:518–540.
5. Sellick SM, Crooks DL. Depression and cancer: An appraisal of the literature for prevalence, detection, and practice guideline development for psychological interventions. Psychooncology. 1999;8:315–333.
6. Bottomley A. Depression in cancer patients: A literature review. Eur J Cancer Care (Engl). 1998;7:181–191.
7. Bennett G, Badger TA. Depression in men with prostate cancer. Oncol Nurs Forum. 2005;32:545–556.
8. Boyd AD, Riba M. Depression and pancreatic cancer. J Natl Compr Canc Netw. 2007;5:113–116.
9. Potash M, Breitbart W. Affective disorders in advanced cancer. Hematol Oncol Clin North Am. 2002;16:671–700.
10. Dejong M, Fombonne E. Depression in paediatric cancer: An overview. Psychooncology. 2006;15:553–566.
11. Kua J. The prevalence of psychological and psychiatric sequelae of cancer in the elderly: How much do we know? Ann Acad Med Singapore. 2005;34:250–256.
12. Trask PC. Assessment of depression in cancer patients. J Natl Cancer Inst Monogr. 2004;32:80–92.
13. Newport DJ, Nemeroff CB. Assessment and treatment of depression in the cancer patient. J Psychosom Res. 1998;45:215–237.
14. Rodin G, Craven J, Littlefield C. Depression in the medically ill: An integrated approach. New York: Brunner/Mazel, 1991.
15. Klinkman MS, Coyne JC, Gallo S, et al. Can case-finding instruments be used to improve physician detection of depression in primary care? Arch Fam Med. 1997;6:567–573.
16. Mitchell AJ, Coyne JC. Do ultra-short screening instruments accurately detect depression in primary care? A pooled analysis and meta-analysis of 22 studies. Br J Gen Pract. 2007;57:144–151.
17. Razavi D, Delvaux N, Farvacques C, et al. Screening for adjustment disorders and major depressive disorders in cancer in-patients. Br J Psychiatry. 1990;156:79–83.
18. Grassi L, Sabato S, Rossi E, Marmai L, Biancosino B. J Affect Disord. 2009;114:193–199.
19. Akizuki N, Akechi T, Nakanishi T, et al. Development of a brief screening interview for adjustment disorders and major depression in patients with cancer. Cancer. 2003;97:2605–2613.
20. Love AW, Grabsch B, Clarke DM, et al. Screening for depression in women with metastatic breast cancer: A comparison of the Beck Depression Inventory Short Form and the Hospital Anxiety and Depression Scale. Aust N Z J Psychiatry. 2004;38:526–531.
21. Hall A, A’Hern R, Fallowfield L. Are we using appropriate self-report questionnaires for detecting anxiety and depression in women with early breast cancer? Eur J Cancer. 1999;35:79–85.
22. Berard RM, Boermeester F, Viljoen G. Depressive disorders in an out-patient oncology setting: Prevalence, assessment, and management. Psychooncology. 1998;7:112–120.
23. Holland JC. How’s your distress? A simple intervention addressing the emotional impact of cancer can help put the “care” back in caregiving. Oncology (Williston Park). 2007;21:530.
24. Holland JC, Bultz BD, National Comprehensive Cancer Network (NCCN). The NCCN guideline for distress management: A case for making distress the sixth vital sign. J Natl Compr Canc Netw. 2007;5:3–7.
25. Bultz BD, Carlson LE. Emotional distress: The sixth vital sign, future directions in cancer care. Psychooncology. 2006;15:93–95.
26. Bultz BD, Carlson LE. Emotional distress: The sixth vital sign in cancer care. J Clin Oncol. 2005;23:6440–6441.
27. National Comprehensive Cancer Network, Inc. Practice guidelines in oncology, v.1.2002: Distress management. National Comprehensive Cancer Network, Inc; 2002; version 1.
28. Derogatis LR. Brief Symptom Inventory 18: Administration, scoring and procedures manual. Minneapolis, MN: NCS Pearson Inc, 2001.
29. Zabora J, Brintzenhofe-Szoc K, Jacobsen P, et al. A new psychosocial screening instrument for use with cancer patients. Psychosomatics. 2001;42:241–246.
30. Derogatis LR. SCL-90-R: Administration, scoring and procedures manual-II, 2nd ed. Baltimore, MD: Clinical Psychometric Research, 1983.
31. Derogatis LR. Brief Symptom Inventory: Administration, scoring, and procedures manual. National Computer Systems, Inc, 1993.
32. Zabora J, Brintzenhofe-Szoc K, Curbow B, et al. The prevalence of psychological distress by cancer site. Psychooncology. 2001;10:19–28.
33. Carlson LE, Angen M, Cullum J, et al. High levels of untreated distress and fatigue in cancer patients. Br J Cancer. 2004;90:2297–2304.
34. Mitchell AJ. Pooled results from 38 analyses of the accuracy of Distress Thermometer and other ultra-short methods of detecting cancer-related mood disorders. J Clin Oncol. 2007;25:4670–4681.
35. Jacobsen PB, Donovan KA, Trask PC, et al. Screening for psychologic distress in ambulatory cancer patients. Cancer. 2005;103:1494–1502.
36. Gil F, Grassi L, Travado L, et al; Southern European Psycho-Oncology Study Group. Use of distress and depression thermometers to measure psychosocial morbidity among Southern European cancer patients. Support Care Cancer. 2005;13:600–606.
37. Wright EP, Selby PJ, Crawford M, et al. Feasibility and compliance of automated measurement of quality of life in oncology practice. J Clin Oncol. 2003;21:374–382.
38. Velikova G, Wright EP, Smith AB, et al. Automated collection of quality-of-life data: A comparison of paper and computer touch-screen questionnaires. J Clin Oncol. 1999;17:998–1007.
39. Allenby A, Matthews J, Beresford J, et al. The application of computer touch-screen technology in screening for psychosocial distress in an ambulatory oncology setting. Eur J Cancer Care (Engl). 2002;11:245–253.
40. Detmar SB, Muller MJ, Wever LD. The patient-physician relationship. Patient-physician communication during outpatient palliative treatment visits: An observational study. JAMA. 2001;285:1351–1357.
41. Detmar SB, Aaronson NK. Quality of life assessment in daily clinical oncology practice: A feasibility study. Eur J Cancer. 1998;34:1181–1186.
42. Strong V, Waters R, Hibberd C, et al. Emotional distress in cancer patients: The Edinburgh cancer centre symptom study. Br J Cancer. 2007;96:868–874.
43. Taenzer PA, Speca M, Atkinson MJ, et al. Computerized quality of life screening in an oncology clinic. Cancer Pract. 1997;5:168–175.
44. Carlson LE, Speca M, Hagen N, et al. Computerized quality-of-life screening in a cancer pain clinic. J Palliat Care. 2001;17:46–52.
45. Linden W, Yi D, Barroetavena MC, et al. Development and validation of a psychosocial screening instrument for cancer. Health Qual Life Outcomes. 2005;3:54.
46. Maunsell E, Brisson J, Deschenes L, et al. Randomized trial of a psychologic distress screening program after breast cancer: Effects on quality of life. J Clin Oncol. 1996;14:2747–2755.
47. Taenzer P, Bultz BD, Carlson LE, et al. Impact of computerized quality of life screening on physician behaviour and patient satisfaction in lung cancer outpatients. Psychooncology. 2000;9:203–213.
48. Velikova G, Brown JM, Smith AB, et al. Computer-based quality of life questionnaires may contribute to doctor-patient interactions in oncology. Br J Cancer. 2002;86:51–59.
49. McLachlan SA, Allenby A, Matthews J, et al. Randomized trial of coordinated psychosocial interventions based on patient self-assessments versus standard care to improve the psychosocial functioning of patients with cancer. J Clin Oncol. 2001;19:4117–4125.
50. Detmar SB, Muller MJ, Schornagel JH, et al. Health-related quality-of-life assessments and patient-physician communication: A randomized controlled trial. JAMA. 2002;288:3027–3034.
51. Boyes A, Newell S, Girgis A, et al. Does routine assessment and real-time feedback improve cancer patients’ psychosocial well-being? Eur J Cancer Care (Engl). 2006;15:163–171.
52. Rosenbloom SK, Victorson DE, Hahn EA, et al. Assessment is not enough: A randomized controlled trial of the effects of HRQL assessment on quality of life and satisfaction in oncology clinical practice. Psychooncology. 2007;16:1069–1079.
53. Jacobsen PB, Ransom S. Implementation of NCCN distress management guidelines by member institutions. J Natl Compr Canc Netw. 2007;5:99–103.
54. Carlson LE, Bultz BD. Efficacy and medical cost offset of psychosocial interventions in cancer care: Making the case for economic analyses. Psychooncology. 2004;13:837–849.
55. Carlson LE, Bultz BD. Benefits of psychosocial oncology care: Improved quality of life and medical cost offset. Health Qual Life Outcomes. 2003;1:8.
56. Hopwood P, Howell A, Maguire P. Screening for psychiatric morbidity in patients with advanced breast cancer: Validation of two self-report questionnaires. Br J Cancer. 1991;64:353–356.
57. Ibbotson T, Maguire P, Selby P, et al. Screening for anxiety and depression in cancer patients: The effects of disease and treatment. Eur J Cancer. 1994;30A:37–40.
58. Lloyd-Williams M, Friedman T, Rudd N. An analysis of the validity of the hospital anxiety and depression scale as a screening tool in patients with advanced metastatic cancer. J Pain Symptom Manage. 2001;22:990–996.
59. Patrick-Miller LJ, Broccoli TL, Much JK. Validation of the Distress Thermometer: A single item screen to detect clinically significant psychological distress in ambulatory oncology patients. J Clin Oncol. 2004;24:Abstr 6024.
60. Hoffman BM, Zevon MA, D’Arrigo MC, et al. Screening for distress in cancer patients: The NCCN rapid-screening measure. Psychooncology. 2004;13:792–799.
61. Akizuki N, Yamawaki S, Akechi T, et al. Development of an impact thermometer for use in combination with the Distress Thermometer as a brief screening tool for adjustment disorders and/or major depression in cancer patients. J Pain Symptom Manage. 2005;29:91–99.
62. Ransom S, Jacobsen PB, Booth-Jones M. Validation of the Distress Thermometer with bone marrow transplant patients. Psychooncology. 2006;15:604–612.
63. Mehnert A, Muller D, Lehmann C. Die deutsche Version des NCCN Distress-Thermometers: Empirische Prüfung eines Screening-Instruments zur Erfassung psychosozialer Belastung bei Krebspatienten [The German version of the NCCN Distress Thermometer: Empirical evaluation of a screening instrument for assessing psychosocial distress in cancer patients]. [in German with English translation by author]. Zeitschrift für Psychiatrie, Psychologie und Psychotherapie. 2006;54:213–223.
64. Adams CA, Carter GL, Clover KA. Concurrent validity of the Distress Thermometer with other validated measures of psychological distress. Psychooncology. 2006;15:S105.
65. Andritsch E, Ladinek V, Zlokikovits S. Identifying symptom burden and distress of cancer patients with chemotherapy: A pilot study for an Austrian sample. Psychooncology. 2006;15:S158.
66. Ohno T, Noguchi W, Nakayama Y, et al. How do we interpret the answer “neither” when physicians ask patients with cancer “are you depressed or not?” J Palliat Med. 2006;9:861–865.
67. Kumar TM, Venkateswaran C, Bostock N. Screening for psychosocial distress: Cross-cultural issues. Psychooncology. 2006;15:S692.
68. Ozalp E, Cankurtaran ES, Soygur H, et al. Screening for psychological distress in Turkish cancer patients. Psychooncology. 2007;16:304–311.
69. Gessler SF, Lowe J, Daniells E. UK validation of the Distress Thermometer. Psychooncology. 2006;15:S107.
70. Velikova G, Booth L, Smith AB, et al. Measuring quality of life in routine oncology practice improves communication and patient well-being: A randomized controlled trial. J Clin Oncol. 2004;22:714–772.
14 SCREENING FOR DEPRESSION IN PERINATAL SETTINGS
Jodi Barton and Philip Boyce
1. Introduction: Perinatal Screening in Context
2. Why Screen, and What Are We Screening For?
3. Screening Practices in Perinatal Settings
4. Screening Guidelines and Recommendations
5. Evidence-Based Comparison of Screening Methods
6. Implementation in Practice: Does Screening Make any Real-World Difference?
7. Service Delivery and Treatment Implications
8. Summary and Key Recommendations
Context
Implementing screening in perinatal settings poses a potentially complex set of issues, but screening is nonetheless increasingly being recommended and even mandated. When should screening occur: during pregnancy, postpartum, or both? What instrument should be used? How acceptable is screening to mothers? What difference does screening make to the management of postpartum depression? This chapter presents an evidence-based approach to all aspects of perinatal screening.
1. Introduction: Perinatal Screening in Context
Over the past 20 years there has been considerable interest in psychiatric disorders arising during pregnancy and following childbirth. Most of the attention has focused on depressive disorders arising within the first 3 months to 1 year after childbirth, commonly referred to as postnatal or postpartum depression. Pregnancy was once thought to protect against depressive symptoms; however, women are just as likely to experience depressive symptoms while pregnant as during the postpartum period.1,2 The mean prevalence of antenatal depression is between 10.7%3 and 12%,4 with increasing prevalence and severity2 through the second and third trimesters. This is comparable with the 10% to 15% of women who develop postpartum depression.5 Although DSM-IV officially recognizes postpartum depression only through a postpartum-onset specifier for episodes of major depression that begin within 4 weeks after delivery, growing knowledge of depression during the antenatal period has highlighted the equal importance of its early recognition and treatment. Whatever the DSM-IV specifier, depression at this time has been granted considerable importance because of its potential adverse effects on child development and on maternal morbidity and mortality,6 and because of the treatment challenges inherent in pregnant and breastfeeding women.7 Even though the consequences of postpartum depression have been recognized, the illness itself is frequently not identified; it has been estimated that between 50%8 and 75%9 of women suffering from postpartum depression will have it identified and potentially treated. More recent work has focused attention on depression during pregnancy, so-called antenatal depression. However, the validity of measuring depression during pregnancy and in the postpartum period is not clear, especially the boundary between depressive symptoms and clinically significant depressive disorder. The timing of onset is also important: an apparently perinatal episode may more accurately represent the continuation of a depressive episode that began before conception.
The perinatal period is technically defined as the period between 154 days (22 weeks) of gestation and 28 days postpartum.10 Although the DSM-IV definition of postpartum depression requires onset within 4 weeks of parturition, symptoms of depression often develop much later, within the first year of the infant’s life. In practice, healthcare providers manage the entire antenatal period and the first year after delivery under a broader perinatal umbrella. Women have more contact with healthcare professionals during this time than at any other time in adulthood, which is why it is considered an opportune time both to identify those at risk of developing depression (so that prevention can take place) and to detect depression (so that early intervention can be instituted). This has encouraged the development of a variety of screening strategies, discussed in this chapter, to identify risk and detect disorder.
2. Why Screen, and What Are We Screening For?
Screening for a disorder (or for a marker of disorder, such as HbA1c for diabetes) enables health practitioners to provide early intervention to reduce or eliminate negative outcomes. Screening for depression during the perinatal period permits both obstetric and mental health clinicians to identify women who are experiencing depression or anxiety associated with childbearing, or to attempt to identify women at risk of developing depression. Early intervention strategies can then be targeted at the women most in need of additional support, potentially ameliorating the negative effects that maternal depression can have on the development of the infant and on the mother–infant relationship. Targeting those who may need intervention also makes more effective use of the physical and staffing resources available to clinical care providers. There are two predominant approaches to screening in the perinatal period: one aims to detect an occult disorder, and the other aims to identify risk factors for the disorder. Further predictive methods are under development to determine who is at risk of future episodes of depression. The timing of screening must also be considered, which further complicates the choice of strategy. The approaches that have been taken and reported are as follows.
Screening for Depressive Symptoms in the Postpartum Period
Screening is usually conducted at routine postpartum checkups or at a 6-week postpartum “well-baby” health checkup using the Edinburgh Postnatal Depression Scale (EPDS).11 It usually takes place in general practices, pediatric practices, or maternal and child healthcare centers. An EPDS cutoff score of 12 in the postpartum period is suggested to indicate that major depression12 is likely to be present and is typically used to trigger further assessment, referral, and treatment.13
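As a minimal sketch of the cutoff rule described above, the snippet below totals a set of EPDS item scores and checks the total against the cutoff. It assumes the standard EPDS layout of 10 items, each scored 0 to 3 (so totals range from 0 to 30), that item scores have already been coded (including any reverse-scored items), and that a total at or above the cutoff counts as a positive screen. The function names and the responses are hypothetical, the default cutoff of 12 is taken from the text, and local protocols may set a different threshold.

```python
from typing import Sequence

EPDS_ITEM_COUNT = 10  # the EPDS has 10 items
EPDS_ITEM_MAX = 3     # each item is scored 0-3


def epds_total(item_scores: Sequence[int]) -> int:
    """Sum pre-coded EPDS item scores after basic range checks (hypothetical helper)."""
    if len(item_scores) != EPDS_ITEM_COUNT:
        raise ValueError(f"expected {EPDS_ITEM_COUNT} item scores, got {len(item_scores)}")
    for score in item_scores:
        if not 0 <= score <= EPDS_ITEM_MAX:
            raise ValueError(f"item score out of range 0-{EPDS_ITEM_MAX}: {score}")
    return sum(item_scores)


def triggers_further_assessment(total: int, cutoff: int = 12) -> bool:
    """Assumed rule: a total at or above the cutoff prompts further assessment.

    A positive screen is not a diagnosis; it only flags the need for clinical evaluation.
    """
    return total >= cutoff


# Usage with made-up responses
responses = [2, 1, 2, 1, 1, 2, 1, 1, 1, 0]
total = epds_total(responses)
print(total, triggers_further_assessment(total))  # prints: 12 True
```

Keeping the cutoff as a parameter reflects the point made elsewhere in this chapter that agreed cutoff scores, particularly antenatally, have not been firmly established.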
Screening for Depressive Symptoms in the Antenatal Period
Screening occurs during pregnancy using questionnaires such as the EPDS, the Beck Depression Inventory II (BDI-II), the Postpartum Depression Screening Scale (PDSS), the Center for Epidemiologic Studies Depression Scale (CES-D),14 and the PRIME-MD Patient Health Questionnaire (PHQ).15 However, none of these questionnaires has been suitably validated for this purpose in this population, and cutoff scores that accurately predict major or minor depression have not been adequately established. Further, treatment options during pregnancy are limited, especially for major depression. Screening is usually conducted in conjunction with antenatal care visits at clinics and in general practice.
Screening for Depressive Symptoms/Psychological Distress During the Antenatal Period for Those at Risk of Developing Postnatal Depression
There is little evidence that psychological distress during pregnancy robustly predicts depression in the postpartum period. Screening is usually conducted using the instruments and methods listed above. Further, effective interventions to prevent the development of postnatal depression have yet to be clearly identified, and it remains unclear whether individually targeted or general risk reduction should be implemented.
Screening for Psychosocial Risk Factors During the Antenatal Period for Risk of Developing Postnatal Depression
The Holy Grail of screening in perinatal psychiatry has been to develop instruments that identify significant risk factors that reliably predict subsequent postpartum depression. A series of predictive tools have been developed, and significant risk factors for depression in the postpartum period have been identified.16–18 Although it seems reasonable to generalize these findings to the antenatal period, this would need to be investigated appropriately. Some obstetric care models already screen for risk factors such as domestic violence as part of routine care. This screening strategy is similarly encumbered by inadequate resources to routinely follow up women considered to be at risk. The evaluation and validation of instruments to screen for antenatal risk factors for postnatal depression is an ongoing endeavor. The merit of screening for risk factors remains questionable given that “most risk factors have poor discriminatory power, or poor positive predictive value”19 (p. 176). Even if there is a strong association between a risk factor and a potential disease outcome, it does not automatically “follow that the risk factor provides a basis for an effective prediction rule for individual patients”20 (p. 2616). We need to differentiate between the statistical risk associated with a population-derived risk factor and the clinical risk pertinent to the current status of the individual patient, because risk factors for depression are dynamic rather than static. It is not yet clear whether we are screening for current psychological distress, major or minor depression, or the presence of risk factors that may predict future depressive episodes.
The objectives of screening in this clinical population require clarification and strategic development before routine depression screening, using instruments such as the EPDS, becomes an integral part of obstetric care.
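The point about poor positive predictive value can be made concrete with Bayes' rule. The sketch below computes the positive and negative predictive values (PPV and NPV) of a binary screen or risk factor from its sensitivity, its specificity, and the prevalence of the condition in the screened population. The sensitivity and specificity of 0.70 are hypothetical values chosen only for illustration; the prevalence of 0.13 simply falls within the 10% to 15% range for postpartum depression cited earlier in this chapter.

```python
def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a binary screen applied at a given prevalence."""
    for name, p in (("sensitivity", sensitivity),
                    ("specificity", specificity),
                    ("prevalence", prevalence)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be a proportion between 0 and 1")
    # Expected proportions of the screened population in each cell of the 2x2 table
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)  # P(condition | flagged)
    npv = true_neg / (true_neg + false_neg)  # P(no condition | not flagged)
    return ppv, npv


# Hypothetical risk factor: sensitivity 0.70, specificity 0.70, prevalence 0.13
ppv, npv = predictive_values(0.70, 0.70, 0.13)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")  # PPV = 0.26, NPV = 0.94
```

In this illustration roughly three in four women flagged by the risk factor would not in fact have the condition, whereas a negative result looks reassuring largely because most women in the population are not depressed. The same arithmetic explains why a brief screen can achieve a high negative predictive value when prevalence is low.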
3. Screening Practices in Perinatal Settings
Since the advent of routine perinatal depression screening, mixed evidence has emerged about its utility, in part because women identified as at risk often refused the treatment offered.9,21 Reasons for treatment refusal are discussed below. Attempts to predict later risk have shown only moderate sensitivity and specificity and do not always capture the women most at risk, as these women often do not participate in the screening process and/or refuse subsequent intervention. The majority of studies that have attempted to identify risk factors for perinatal depression have been conducted in postnatal women.22 Many studies also do not take into account racial and cultural variations that are likely to entail different levels of risk and different risk factors. The generalizability of these findings to the antenatal population and to different cultures needs to be established through thorough investigation. The cost-effectiveness of both screening and its outcomes needs to be established before widespread screening is implemented.23 Screening for depression in perinatal care often becomes the responsibility of obstetric care providers such as midwives, maternal and child health nurses,24,25 general practitioners,26 and obstetricians. Such screening is in addition to the other important health issues managed at busy antenatal clinics and needs to be backed up with adequate training to improve clinicians’ skill base, confidence, and subsequent willingness to implement routine screening. Ideally, mental health services for childbearing women would be co-located with obstetric services; however, this is rarely the case, particularly in busy public hospital settings where time and space are at a premium. Mental health services are not usually located in the primary care facilities that many women use for their obstetric care, so the onus remains with the primary care provider.
In practice, screening is best conducted by primary antenatal care providers with adequate mental health training and awareness, because of their proximity to perinatal women during this time. Although this may be a practical approach, introducing depression screening into already busy and demanding obstetric practices can be problematic, especially in the antenatal setting. Screening can be routine, with all women screened at every visit to their practitioner, or strategic, implemented at specific visits (eg, 6 weeks postpartum). The optimal time to screen antenatally has not been established; however, most investigators have screened late in the second trimester or early in the third trimester. The EPDS, the BDI-II, and the PDSS have shown the greatest utility and predictive validity to date.8 An alternative strategy, recommended by the National Institute for Health and Clinical Excellence (NICE) guidelines,27 is the use of two or three simple targeted interview questions aimed at identifying key DSM-IV diagnostic criteria: namely, whether the woman has been bothered by feeling down, depressed, or hopeless; whether she has been bothered by having little pleasure or interest in normal activities; and whether she would like to receive further help. Because individuals may endorse the first two questions without in fact being subjectively bothered by these symptoms, the third question, whether the woman would like further help with the way she has been feeling, has been suggested (the “help question”). The Patient Health Questionnaire (PHQ-2) has been developed for this purpose.28 This approach is simple, sensitive,29 and fast, and it also lets the clinician conserve resources by not referring women who do not, at that time, want or need further assistance from mental health services. The PHQ-2 screening strategy has sensitivity and specificity equivalent to those of the EPDS in both the antenatal and the postnatal period, and it is highly accurate in ruling out depression: its negative predictive value is between 97% and 99%.30 It is also appropriate for women with low levels of education, as it is not limited by literacy levels. Pregnant or postnatal patients may also feel more attended to when the clinician asks about their well-being than when they are given a pen-and-paper questionnaire to complete. Studies to date show that although obstetric care providers recognize the importance and impact of mental health problems, they also feel they lack adequate knowledge about how to recognize and manage perinatal depression and about where to refer women for specialized psychiatric help.
They often feel screening is difficult to carry out in everyday practice and question whether it leads to better outcomes.31 Practitioner education is a critical element in the implementation of any screening program, as it will ensure more accurate detection, confident independent practice, and potentially the capacity to streamline referrals to psychiatric services.32
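The question-based approach described above reduces to a simple decision rule. The sketch below encodes it under stated assumptions: a “yes” to either of the first two questions is treated as a positive screen, and the help question is used only to decide whether referral is offered now. The class, function, and outcome labels are hypothetical, and the rule illustrates the logic of the approach rather than a validated clinical protocol.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    NEGATIVE = "screen negative"
    OFFER_REFERRAL = "screen positive; help requested: offer further assessment"
    NO_REFERRAL_NOW = "screen positive; help not wanted at this time"


@dataclass(frozen=True)
class ScreeningAnswers:
    down_depressed_hopeless: bool   # bothered by feeling down, depressed, or hopeless?
    little_interest_pleasure: bool  # bothered by little interest or pleasure in activities?
    wants_help: bool                # the "help question"


def triage(answers: ScreeningAnswers) -> Outcome:
    # Assumption: endorsing either of the first two questions is a positive screen.
    if not (answers.down_depressed_hopeless or answers.little_interest_pleasure):
        return Outcome.NEGATIVE
    # The help question separates women who want further help now from those who do not.
    if answers.wants_help:
        return Outcome.OFFER_REFERRAL
    return Outcome.NO_REFERRAL_NOW


print(triage(ScreeningAnswers(True, False, True)).value)
# prints: screen positive; help requested: offer further assessment
```

Consulting the help question only after a positive screen mirrors the resource-saving rationale in the text: women who screen positive but do not want help at that time are not automatically referred.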
4. Screening Guidelines and Recommendations
The National Screening Committee (NSC) criteria appraise “the viability, effectiveness and appropriateness of a screening programme.” Screening for depression in the perinatal period has been evaluated against these criteria, and significant deficits have been found33 (Textbox 14.1 and Table 14.1). Current screening initiatives do not meet most of the criteria needed to warrant routine screening in national health services. Gaynes and associates23 found similar deficits in the U.S. context and highlighted the need for thorough research in this population. The existing evidence is simply too sparse to adequately inform clinicians and clinical policy makers about the most appropriate screening methods, whether screening is cost-effective, and whether screening leads to better outcomes for perinatal women and their families.
Textbox 14.1. Comparison of Perinatal Screening against National Screening Committee Criteria
The Condition
  Important health problem
  Adequately understood and detected
  Cost-effective primary prevention available
The Test
  Validated screening instrument
  Known and agreed cutoff score
  Acceptability
  Agreed policy of diagnostic investigation and treatment options for positive screens
Treatment
  Evidence of effective early intervention
  Agreed policy on availability of effective treatment
  Optimal condition management prior to the implementation of screening
Screening Program
  RCT evidence of reduction of morbidity/mortality
  Clinically, socially, and ethically acceptable to health professionals and consumers
  Benefits outweigh risk of harm
  Cost-effectiveness and value of screening
  Quality assurance and monitoring
  Adequate staff and facilities
  Cost-effectiveness in comparison to existing management options
  Informed decision making for consumer
  Justifiable screening criteria and cutoffs for treatment eligibility
[ [ [ [ [
[ [
Either no clear evidence or criteria not met [ Clear evidence and criteria met
5. Evidence-Based Comparison of Screening Methods
The NSC criteria for screening state that a "screening test should be safe, simple, precise and validated; a suitable cut-off value should be defined and agreed" before any screening program is routinely implemented. Defining and diagnosing a psychiatric disorder is not a simple process, and it is not aided by definite, measurable biomarkers. This leaves more room for subjective bias and variable interpretation, whether questionnaires or interviews are used to screen for depression (see Chapter 2). Further, the treatment of depression during pregnancy and in breastfeeding mothers is not simple; thus, better care may not necessarily follow better identification.
Table 14.1. Screening Guidelines and Recommendations for Best Practice by Country of Origin
National guideline | Date of release | Country of origin | Intention | Selected recommendations
Evaluation of screening for postnatal depression against the NSC handbook criteria33 | August 2001 | UK | To evaluate screening initiatives against current national guidelines | Many national criteria not met, particularly with regard to cost-effectiveness and outcomes from screening. Insufficient evidence to draw substantial conclusions, though concerns raised about national screening initiatives already implemented.
Antenatal care: routine care for the healthy pregnant woman34 | October 2003 | UK | To provide a national clinical framework for best practice in routine antenatal care. Covers all aspects of antenatal care; psychiatric assessment considered as a single element. | Women should be assessed and interviewed for a history of psychiatric disorder. Women should not be screened routinely with the Edinburgh Postnatal Depression Scale (EPDS) to predict risk of developing postnatal depression. Women should not be offered antenatal education interventions to reduce perinatal or postnatal depression.
Postnatal depression and puerperal psychosis. A national clinical guideline35 | June 2002 | Edinburgh, UK | To provide evidence for clinicians and health consumers about the screening for and prevention and management of postnatal depression | There is no evidence to support routine screening in the antenatal period to predict the development of postnatal depression. The EPDS should be offered as part of a screening program for postnatal depression at 3 weeks and 6 months postpartum. The EPDS is not a diagnostic tool; diagnosis requires clinical evaluation.
U.S. Preventive Services Task Force36 | May 2002 | US | To provide evidence for routine screening for depression (not specifically postnatal) in primary practice | Recommends screening adults for depression in clinical practices that have systems in place to assure accurate diagnosis, effective treatment, and follow-up. Some evidence of cost-effectiveness provided.
Senate Select Committee on Mental Health37 | April 2006 | Australia | Report of a wide-ranging inquiry into the national mental health strategy and the achievement of its objectives. Intended as recommendations for strategic reform. | That a national strategy for perinatal health services be developed, including early identification, intervention, prevention, and education and support of all new parents. Recommendations developed subsequent to submission of findings of the "beyondblue" postnatal depression program.
Short depression screening questionnaires (with 10 items or fewer) have become a popular method of screening for depression in the perinatal period, and a range of questionnaires has been tested. The most commonly used is the EPDS, which Cox and colleagues11 originally designed as a detection tool to help health visitors assess the mental health of new mothers during home visits. Since then, screening for postpartum depression has gained substantial momentum and validation. The EPDS is short and easy to administer and score, has reasonable predictive validity in the postnatal context, and has good face validity with consumers. It is not the only screening instrument in use, but it is the most widely used and the most extensively tested, and so provides the strongest data on its utility. Screening for depression during pregnancy is a more recent initiative that makes convenient use of routine obstetric care. Any screening instrument must have not only construct validity but also face validity; that is, it must be acceptable to the population in which it is to be used. This also applies to interview questions used to screen for depressive symptoms or distress, which the National Institute for Health and Clinical Excellence (NICE) guidelines advocate in preference to screening questionnaires. There is currently not enough evidence on the comparative validity of interview and questionnaire approaches to establish that either is superior, or indeed whether routine screening should be conducted at all.
Reviews of screening instruments for postpartum depression have found that the EPDS, the BDI-II, and the PDSS38 have greater sensitivity and specificity in the perinatal population than other measures that have been tested.8,23 The advantage of the EPDS over the other two measures is its brevity: it has only 10 items, compared with 21 on the BDI-II and 35 on the PDSS, which makes it easier to complete and score. A methodologic problem in the validation of many questionnaires is that they have not been validated in the intended population or against a gold-standard clinical interview. The review studies and their findings, including the cutoff score ranges used, are summarized in Table 14.2. Higher cutoff scores were usually used to detect major depression only, whereas lower cutoffs were used to detect possible major or minor depression. Lowering the cutoff increases sensitivity, but it also increases the number of false positives and so reduces specificity, that is, the instrument's ability to correctly identify those who truly do not have depressive symptoms. Clinicians need to clarify their screening objectives to decide whether a higher or lower cutoff best meets their needs. The choice of instrument is best dictated by the clinical population: ideally, the chosen instrument has been adequately validated in that population, with particular attention to the appropriate cutoff score for each culture.
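The cutoff trade-off described above can be made concrete with a short calculation. The following Python sketch is illustrative only: the function name `sens_spec_at_cutoff` and the ten scores and interview diagnoses are hypothetical, invented to show the mechanics, and are not data from any validation study. A score at or above the cutoff is treated as a positive screen.

```python
from typing import Sequence, Tuple

# HYPOTHETICAL data: ten made-up EPDS scores and the matching diagnosis
# from a reference-standard clinical interview (True = depressed).
HYPOTHETICAL_SCORES = [4, 7, 9, 10, 11, 12, 13, 15, 18, 21]
HYPOTHETICAL_DEPRESSED = [False, False, False, False, True,
                          False, True, True, True, True]


def sens_spec_at_cutoff(scores: Sequence[int],
                        depressed: Sequence[bool],
                        cutoff: int) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when score >= cutoff is a positive screen."""
    if len(scores) != len(depressed):
        raise ValueError("scores and depressed must have the same length")
    tp = fn = tn = fp = 0
    for score, has_depression in zip(scores, depressed):
        positive = score >= cutoff
        if has_depression:
            if positive:
                tp += 1  # depressed and screened positive
            else:
                fn += 1  # depressed but missed by the screen
        else:
            if positive:
                fp += 1  # not depressed but screened positive
            else:
                tn += 1  # not depressed and screened negative
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    for cutoff in (10, 13):
        sens, spec = sens_spec_at_cutoff(HYPOTHETICAL_SCORES,
                                         HYPOTHETICAL_DEPRESSED, cutoff)
        print(f"cutoff {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

With this made-up sample, the lower cutoff of 10 catches every depressed woman (sensitivity 1.00) but flags two who are not depressed (specificity 0.60), while the higher cutoff of 13 misses one depressed woman (sensitivity 0.80) and produces no false positives (specificity 1.00).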
There have been recent initiatives to incorporate screening for psychosocial risk factors for depression in the perinatal setting, in addition to screening for depressive symptoms.39 Many health services already routinely screen for known risk factors such as family violence and financial difficulties as part of routine antenatal clinic intake interviews. While it is important to know which risk factors are relevant for a depressed woman and likely to be contributing to her symptoms, we suggest caution in using this additive approach as a means of detecting women who may be depressed. Studies that have evaluated psychosocial risk screening instruments have so far either shown poor sensitivity17 or provided no evidence of their predictive value.40 Dichotomizing risk factors for depression into a categorical yes or no, as can happen with risk factor screening strategies, may oversimplify their impact on psychological well-being. Risk factors are dimensional in nature and are perhaps best considered on a continuum, such as the number of significant life events or the adequacy of social support. There is also no evidence to indicate at what point risk for depression becomes clinically significant: How many risk factors need to be present? How severe do they need to be?
Table 14.2. Summary of Perinatal Depression Screening Instrument Sensitivity, Specificity, Cutoff, and Positive Predictive Value (PPV) Ranges
Instrument
EPDS
Time of Screening Antenatal 28–34/40 weeks Postnatal 4 days to 12 weeks
BDI
Postnatal
BDI-II
Postnatal
PDSS CES-D
Postnatal Postnatal
Depression Screened For
Cutoff Range
PPV (%)*
Sensitivity Range
Specificity Range
Major
12–15
8–35
1.0
0.79–0.96
Major/minor
11–14
0.57–0.71
0.72–0.95
Major
10–13
0.75–1.0
0.7–0.99
Major/minor
10–13 8–14 11–21 10 21 15 81 16–21
0.44–0.81 0.23–0.79 0.32–0.68 0.48 0.56 0.57 0.94 0.6–0.43
0.77–0.92 0.43–0.96 0.88–0.99 0.86 1.0 0.97 0.98 0.92–0.97
Major Major/minor Major Major/minor Major Major/minor
19–92
34–53 74–100 33–88 53
*At 13% prevalence rate estimate. BDI, Beck Depression Inventory; BDI-II, Beck Depression Inventory-II; CES-D, Center for Epidemiologic Studies Depression Scale; EPDS, Edinburgh Postnatal Depression Scale; PDSS, Postpartum Depression Screening Scale. Data from references 8, 12, 22, 23, 41, 42.
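The PPV column depends on the prevalence assumed in the footnote. Bayes' theorem links PPV to sensitivity (Se), specificity (Sp), and prevalence (p). The worked numbers below use Se = 0.75, Sp = 0.90, and the footnote's 13% prevalence; these are illustrative values chosen for the arithmetic, not results from any particular study.

```latex
\[
\mathrm{PPV} = \frac{\mathrm{Se}\,p}{\mathrm{Se}\,p + (1-\mathrm{Sp})(1-p)}
= \frac{0.75 \times 0.13}{0.75 \times 0.13 + 0.10 \times 0.87}
= \frac{0.0975}{0.1845} \approx 0.53
\]
```

Even with fairly good specificity, nearly half of the positive screens in this illustration would be false positives, and the PPV falls further as the assumed prevalence falls.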
6. Implementation in Practice: Does Screening Make Any Real-World Difference?
Consider the case of a woman who is 28 weeks pregnant and scores 21 on the EPDS, clearly indicating psychological distress and possibly a depressive episode. What then? Is there somewhere to refer her? Are appropriate treatments available? Will she have prompt access to support and treatment? Are treatment facilities adequately resourced and staffed by appropriately trained personnel? Suppose she is referred for further assessment and perhaps treatment, but declines the services offered. Many women identified by the EPDS as probably depressed in the beyondblue depression screening study declined follow-up care. This is a common finding in both research and clinical care, and it reflects many women's tendency to mask their distress with stoicism in an effort to "stay strong" for themselves and their baby or family, or to dismiss their distress and cope as best they can. Some women also decline psychiatric care for fear of mandatory reporting to social service agencies (where such protocols exist), which is a particular concern for women with severe mental illness (Textbox 14.2). Table 14.3 outlines the potential outcomes of screening against the true diagnosis of depression. The inherent inaccuracy of depression screening produces high numbers of false positives, which in turn lead to inefficient use of available resources in both psychiatric and obstetric settings.
The World Mental Health Survey noted that "a meaningful number of services are going to those without apparent needs. Such potential diversion of limited treatment resources to individuals without apparent needs would be of concern in view of the magnitude of unmet needs for patients with clearly defined and serious disorders."43 Not only does this reflect the limitations of depression screening strategies, but it also calls into question the merit of screening, especially when effective treatment options for pregnant or postpartum women are not clearly defined. Targeted multilevel screening is recommended to make the most efficient use of the health resources available (Fig. 14.1). A multilevel strategy also permits the detection of women with different health risk profiles44 and may assist in assessing their individual clinical risk and management needs.
Textbox 14.2. Reasons for Refusal of Treatment for Perinatal Depressive Symptoms
• Lack of knowledge of condition and resources
• Cultural factors
• Somatized distress and help seeking for treatment of physical condition
• Denial
• Accepted as a normal part of being a mother
• Don't want to be a burden
• Fear of loss of child through social services
• Lack time or willingness to attend appointments
• Health professionals normalizing/dismissing depressive symptoms
• Time constraints in primary care
Dennis CL, Chung-Lee L. Postpartum depression help-seeking barriers and maternal treatment preferences: a qualitative systematic review. Birth. 2006;33:323–331.
Table 14.3. Possible Reasons for False Positives and Negatives in Screening
Positive screen, depressed: TRUE POSITIVE
• Full diagnostic assessment
• Referral for treatment
• Potential for appropriate perinatal psychiatric care, if pathways to care are appropriately established and resourced
Positive screen, non-depressed: FALSE POSITIVE
• False referral
• Ineffective use of clinical resources
• Inappropriate labeling
Negative screen, depressed: FALSE NEGATIVE
• Not offered follow-up assessment or treatment
• Fall through the gaps of clinical care
• Increased risk for mother and infant on social, emotional, and cognitive level
• May not be seeking help for psychological distress or masking symptoms (stoicism)
Negative screen, non-depressed: TRUE NEGATIVE
• Not offered follow-up
• Clinical resources not required
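The resource implications of false positives can be sketched by estimating how many women would land in each outcome cell for a screened cohort. The following Python sketch is an assumption-laden illustration: `ScreeningOutcomes` and `expected_outcomes` are hypothetical names, and the inputs (1,000 women, the 13% prevalence used in the Table 14.2 footnote, sensitivity 0.75, specificity 0.80) are illustrative rather than drawn from a specific study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningOutcomes:
    """Expected number of women in each cell of a 2 x 2 screening table."""
    true_positive: float
    false_positive: float
    false_negative: float
    true_negative: float


def expected_outcomes(n: int, prevalence: float,
                      sensitivity: float, specificity: float) -> ScreeningOutcomes:
    """Split a screened cohort of size n into expected TP/FP/FN/TN counts."""
    depressed = n * prevalence       # women who truly have depression
    not_depressed = n - depressed    # women who do not
    return ScreeningOutcomes(
        true_positive=depressed * sensitivity,
        false_negative=depressed * (1 - sensitivity),
        false_positive=not_depressed * (1 - specificity),
        true_negative=not_depressed * specificity,
    )


if __name__ == "__main__":
    # Illustrative inputs only (hypothetical, not from a specific study).
    outcomes = expected_outcomes(n=1000, prevalence=0.13,
                                 sensitivity=0.75, specificity=0.80)
    positives = outcomes.true_positive + outcomes.false_positive
    print(f"True positives:  {outcomes.true_positive:.1f}")
    print(f"False positives: {outcomes.false_positive:.1f}")
    print(f"Share of positive screens that are false: "
          f"{outcomes.false_positive / positives:.0%}")
```

With these inputs the sketch reports 97.5 expected true positives against 174.0 false positives, so about 64% of positive screens would be false; each of those would consume assessment capacity in the way described for the false-positive cell.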
7. Service Delivery and Treatment Implications
Beyond screening and accurate detection of depressive symptoms in the perinatal setting, the NSC screening program criteria require evidence of effective early intervention, agreed-on policies on the availability of effective treatment, and optimal condition management to be in place before screening is implemented. The overall focus on prevention and early intervention has placed great emphasis on the perinatal period as a seemingly ideal time to provide interventions to prevent postnatal depression, because of the high level of contact that women have with their healthcare providers during this period. Effective preventive strategies offered to women at high risk should, in theory, prevent both the emergence of depression and its consequences for the infant's social, emotional, and cognitive development. Such interventions have so far included psychoeducation about risk factors and symptoms,45 psychotherapy,21 and interpersonal therapy,46 delivered both individually and in groups, as well as increased community care and interventions designed to act directly on the attachment relationship between mother and infant. Meta-analyses conclude that psychosocial interventions designed to prevent postpartum depression do not reduce the number of women who go on to develop depression,47 and although intensive, professional postpartum support is effective in treating postpartum depression, there is no substantive evidence of the cost-effectiveness of any of these interventions.48
While there is a significant need to identify and eliminate barriers to treatment, we must also focus on providing effective, consumer-friendly treatment that is readily available to those who need it and choose to take it up. Perinatal psychiatry services need to provide evidence-based treatments that are safe for both mother and baby. A combination of inpatient mother–baby, outpatient, and outreach services running in parallel with obstetric care would be the optimal service model. There are few specialist perinatal psychiatry facilities in public health settings, and pathways to such facilities are not always clear. The facilities that are available vary with service models and resources, so it is important for obstetric care providers to know what resources exist locally and how to obtain prompt access to them, in order to help women who are depressed and ensure optimal condition management.
Figure 14.1. Perinatal depression screening model.
Antenatal (targeted interview asked at each visit)
• "Have you noticed any change in your mood or the way you feel about things since you became pregnant?"
• If yes: "Have you been depressed or anxious before?" "Is the way you are feeling causing you distress?" "Would you like further help with the way you are feeling?"
• If the answers indicate no problem: no intervention; educate about mental health maintenance strategies for new mothers; provide information on available resources.
• Otherwise, assessment: risk factor assessment and symptom screening using the EPDS, BDI-II, or PDSS.
  • EPDS < 10 (or other appropriate antenatal cutoff score if an alternate instrument is used), negative for risk factors: no intervention; education and information as above.
  • EPDS ≥ 10 and < 15 (or other appropriate antenatal cutoff score if an alternate instrument is used), positive for risk factors: monitoring. Ongoing monitoring for change, with antenatal staff repeating screening at subsequent antenatal visits; educate about mental health maintenance strategies for new mothers; provide information on available resources.
  • EPDS ≥ 15 (or other appropriate antenatal cutoff score if an alternate instrument is used), positive for risk factors: referral for diagnostic assessment, follow-up, and treatment, if appropriate, by psychiatry and/or perinatal mental health personnel, as determined by available local resources.
6 weeks postnatal
• Assessment: symptom screening using the EPDS.
  • EPDS ≥ 12: referral for diagnostic assessment, follow-up, and treatment, if appropriate, by psychiatry and/or perinatal mental health personnel, as determined by available resources.
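The two-stage logic of Figure 14.1 can be expressed as a small triage function. This Python sketch is hypothetical: `Action`, `antenatal_action`, and `postnatal_action` are illustrative names, and the EPDS cutoffs are those drawn in the figure. Behaviors the figure leaves unspecified (mismatched score and risk-factor combinations, and what follows a postnatal EPDS below 12) are assumptions flagged in the comments. It is not a validated clinical decision rule, and it does not replace the full diagnostic interview the chapter calls for.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    NO_INTERVENTION = "no intervention; education and information on resources"
    MONITOR = "ongoing monitoring; repeat screening at later antenatal visits"
    REFER = "refer for diagnostic assessment, follow-up and treatment"


# EPDS cutoffs as drawn in Figure 14.1. Other instruments (BDI-II, PDSS)
# would need their own validated antenatal cutoffs.
ANTENATAL_MONITOR_CUTOFF = 10
ANTENATAL_REFER_CUTOFF = 15
POSTNATAL_REFER_CUTOFF = 12


def antenatal_action(proceed_to_assessment: bool,
                     epds: Optional[int],
                     risk_factors_positive: bool) -> Action:
    """Map the antenatal pathway of Figure 14.1 to an action.

    proceed_to_assessment: True if the targeted interview identified a problem
    and the woman wants further help. The figure does not say exactly how the
    follow-up answers combine, so they are collapsed into one flag here.
    """
    if not proceed_to_assessment:
        return Action.NO_INTERVENTION
    if epds is None:
        raise ValueError("an EPDS score is required once assessment is triggered")
    # ASSUMPTION: a score of 15 or more triggers referral regardless of
    # risk-factor status (the figure pairs it with positive risk factors).
    if epds >= ANTENATAL_REFER_CUTOFF:
        return Action.REFER
    # ASSUMPTION: the figure pairs EPDS 10-14 with positive risk factors;
    # here either one alone is treated as enough to trigger monitoring.
    if epds >= ANTENATAL_MONITOR_CUTOFF or risk_factors_positive:
        return Action.MONITOR
    return Action.NO_INTERVENTION


def postnatal_action(epds: int) -> Action:
    """Six-week postnatal EPDS screen: refer at a score of 12 or above."""
    # ASSUMPTION: the figure shows no pathway for scores below 12.
    if epds >= POSTNATAL_REFER_CUTOFF:
        return Action.REFER
    return Action.NO_INTERVENTION
```

Keeping the cutoffs as named constants makes it straightforward to substitute thresholds validated for the local population and instrument, which the chapter stresses can differ between settings and cultures.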
8. Summary and Key Recommendations
Repeated antenatal screening that begins with the NICE approach of two or three key interview questions first helps the clinician detect whether there is a problem. Asking targeted interview questions at each antenatal visit promotes communication and rapport between the mother and her healthcare provider and permits monitoring over time. It also establishes whether the woman wants further help, at that time, with any emotional distress she may be experiencing, thus conserving health resources, time, and effort. Secondary screening with a severity scale then permits the clinician to gauge the severity of any symptoms the woman may be experiencing, alongside assessment of her individual risk factors for depression. Whether there is an optimal cutoff score is yet to be resolved.23 Referral to appropriate services to diagnose and treat mental illness will depend on the resources available, which vary with service models and with the trained personnel and facilities on hand. In every case, however, a full diagnostic interview should be conducted before management plans are formulated and implemented.
Postpartum screening is more straightforward. Screening at a well-baby checkup between 2 weeks and 6 months postpartum8 is recommended, using a questionnaire such as the EPDS, followed by a clinical interview to confirm or refute the diagnosis of major depression. Scores of 12 and over on the EPDS are predictive of symptoms of postpartum depression severe enough to necessitate referral for diagnosis and/or treatment.12,13,49,50 Psychotherapeutic and pharmacologic treatments are both effective in the treatment of postpartum depression. As discussed earlier, symptoms of postpartum depression often develop much later in the infant's first year of life than the 4-week postpartum onset period defined in DSM-IV. Clinicians need to be mindful of this and ask patients about their emotional and mental health since the birth of their baby each time they see them. Staying vigilant and sensitive to women's mental health status provides the greatest opportunity for detecting and treating depression.
References
1. Dennis CL, Chung-Lee L. Postpartum depression help-seeking barriers and maternal treatment preferences: a qualitative systematic review. Birth. 2006;33:323–331.
2. Evans J, Heron J, Francomb H, et al. Cohort study of depressed mood during pregnancy and after childbirth. Br Med J. 2001;323:257–260.
3. Dennis CL, Ross LE, Grigoriadis S. Psychosocial and psychological interventions for treating antenatal depression. Cochrane Database Syst Rev. 2007:CD006309.
4. Bennett H, Einarson A, Taddio A, et al. Prevalence of depression during pregnancy: systematic review. Obstet Gynecol. 2004;103:698–709.
5. O'Hara MW, Swain AM. Rates and risk of postpartum depression – a meta-analysis. Int Rev Psychiatry. 1996;8:37–54.
6. Oates M. Perinatal psychiatric disorders: a leading cause of maternal morbidity and mortality. Br Med Bull. 2003;67:219–229.
7. Riecher-Rossler A, Hofecker FM. Postpartum depression: do we still need this diagnostic term? Acta Psychiatr Scand Suppl. 2003;418:51–56.
8. Boyd RC, Le HN, Somberg R. Review of screening instruments for postpartum depression. Arch Womens Ment Health. 2005;8:141–153.
9. Thio IM, Oakley Browne MA, Coverdale JH, et al. Postnatal depressive symptoms go largely untreated: a probability study in urban New Zealand. Soc Psychiatry Psychiatr Epidemiol. 2006;41:814–818.
10. Australian Institute of Health & Welfare. Perinatal Period. NPDD Committee, 2005.
11. Cox J, Holden J, Sagovsky R. Detection of postnatal depression: development of the 10-item Edinburgh Postnatal Depression Scale. Br J Psychiatry. 1987;150:782–786.
12. Eberhard-Gran M, Eskild A, Tambs K, et al. Review of validation studies of the Edinburgh Postnatal Depression Scale. Acta Psychiatr Scand. 2001;104:243–249.
13. Leverton TJ, Elliott SA. Is the EPDS a magic wand? 1. A comparison of the Edinburgh Postnatal Depression Scale and health visitor report as predictors of diagnosis on the Present State Examination. J Reprod Infant Psychol. 2000;18:279–296.
14. Radloff LS. The CES-D Scale: a self-report depression scale for research in the general population. Appl Psychol Meas. 1977;1.
15. Spitzer RL, Williams JB, Kroenke K, et al. Validity and utility of the PRIME-MD patient health questionnaire in assessment of 3000 obstetric-gynecologic patients: the PRIME-MD Patient Health Questionnaire Obstetric-Gynecology Study. Am J Obstet Gynecol. 2000;183:759–769.
16. Appleby L, Gregoire A, Platz C, et al. Screening women for high risk of postnatal depression. J Psychosom Res. 1994;38:539–545.
17. Austin MP, Hadzi-Pavlovic D, Saint K, et al. Antenatal screening for the prediction of postnatal depression: validation of a psychosocial pregnancy risk questionnaire. Acta Psychiatr Scand. 2005;112:310–317.
18. Cooper PJ, Murray L, Hooper R, et al. The development and validation of a predictive index for postpartum depression. Psychol Med. 1996;26(3):627–634.
19. Rockhill B, Kawachi I, Colditz G. Individual risk prediction and population-wide disease prevention. Epidemiol Rev. 2000;22:176–180.
20. Ware JH. Statistics and medicine: the limitations of risk factors as prognostic tools. N Engl J Med. 2006;355:2615–2618.
21. Carter FA, Carter JD, Luty SE, et al. Screening and treatment for depression during pregnancy: a cautionary note. Aust N Z J Psychiatry. 2005;39(4):255–261.
22. Austin MP, Lumley J. Antenatal screening for postnatal depression: a systematic review. Acta Psychiatr Scand. 2003;107(1):10–17.
23. Gaynes BN, Gavin N, Meltzer-Brody S, et al. Perinatal depression: prevalence, screening accuracy, and screening outcomes. Evidence Report: Technology Assessment (Summary). 2005;119:1–8.
24. Buist A, Condon J, Brooks J, et al. Acceptability of routine screening for perinatal depression. J Affect Disord. 2006;93:233–237.
25. Massoudi P, Wickberg B, Hwang P. Screening for postnatal depression in Swedish child health care. Acta Paediatr. 2007;96:897–901.
26. Seehusen DA, Baldwin LM, Runkle GP, et al. Are family physicians appropriately screening for postpartum depression? J Am Board Fam Pract. 2005;18:104–112.
27. National Institute for Health and Clinical Excellence. Antenatal and postnatal mental health: clinical management and service guidance. In: NICE Clinical Guideline. London, 2007.
28. Kroenke K, Spitzer RL, Williams JB. The Patient Health Questionnaire-2: validity of a two-item depression screener. Med Care. 2003;41:1284–1292.
29. Whooley MA, Avins AL, Miranda J, et al. Case-finding instruments for depression. Two questions are as good as many. J Gen Intern Med. 1997;12:439–445.
30. Bennett IM, Coco A, Coyne JC, et al. Can the burden of screening for depression in pregnancy and postpartum be reduced? Efficiency of a two-question pre-screen: an IMPLICIT network study. J Am Board Fam Med. 2008;21(4):317–325.
31. LaRocco-Cockburn A, Melville J, Bell M, et al. Depression screening attitudes and practices among obstetrician-gynecologists. Obstet Gynecol. 2003;101:892–898.
32. Coleman VH, Morgan MA, Zinberg S, et al. Clinical approach to mental health issues among obstetrician-gynecologists: a review. Obstet Gynecol Surv. 2006;61:51–58.
33. Shakespeare J. Evaluation of screening for postnatal depression against the NSC handbook criteria. United Kingdom, 2001:1–21.
34. National Collaborating Centre for Women's and Children's Health. Antenatal care: routine care for the healthy pregnant woman. London, 2003:1–304.
35. Scottish Intercollegiate Guidelines Network. Postnatal depression and puerperal psychosis. A national clinical guideline. Edinburgh, 2002.
36. Pignone MP, Gaynes BN, Rushton JL, et al. Screening for depression in adults: a summary of the evidence for the U.S. Preventive Services Task Force. Ann Intern Med. 2002;136:765–776.
37. Senate Select Committee on Mental Health. A national approach to mental health – from crisis to community. Canberra, Australia, 2006:1–33.
38. Beck CT, Gable RK. Postpartum Depression Screening Scale: development and psychometric testing. Nurs Res. 2000;49:272–282.
39. Matthey S, Phillips J, White T, et al. Routine psychosocial assessment of women in the antenatal period: frequency of risk factors and implications for clinical services. Arch Womens Ment Health. 2004;7:223–229.
40. Blackmore ER, Carroll J, Reid A, et al. The use of the Antenatal Psychosocial Health Assessment (ALPHA) tool in the detection of psychosocial risk factors for postpartum depression: a randomized controlled trial. J Obstet Gynaecol Can. 2006;28:873–878.
41. Adouard F, Glangeaud-Freudenthal NM, Golse B. Validation of the Edinburgh Postnatal Depression Scale (EPDS) in a sample of women with high-risk pregnancies in France. Arch Womens Ment Health. 2005;8:89–95.
42. Adewuya AO, Ola BA, Dada AO, et al. Validation of the Edinburgh Postnatal Depression Scale as a screening tool for depression in late pregnancy among Nigerian women. J Psychosom Obstet Gynaecol. 2006;27:267–272.
43. Wang PS, Aguilar-Gaxiola S, Alonso J, et al. Use of mental health services for anxiety, mood, and substance disorders in 17 countries in the WHO world mental health surveys. Lancet. 2007;370:841–850.
44. Harrington AR, Greene-Harrington CC. Healthy Start screens for depression among urban pregnant, postpartum and interconceptional women. J Natl Med Assoc. 2007;99:226–231.
45. Lumley J, Austin MP. What interventions may reduce postpartum depression. Curr Opin Obstet Gynecol. 2001;13:605–611.
46. Spinelli MG. Interpersonal psychotherapy for depressed antepartum women: a pilot study. Am J Psychiatry. 1997;154:1028–1030.
47. Dennis CL. Psychosocial and psychological interventions for prevention of postnatal depression: systematic review. BMJ. 2005;331:15.
48. Brugha TS, Wheatley S, Taub NA, et al. Pragmatic randomized trial of antenatal intervention to prevent post-natal depression by reducing psychosocial risk factors. Psychol Med. 2000;30:1273–1281.
49. Leverton TJ, Elliott SA. Is the EPDS a magic wand? 2. 'Myths' and the evidence base. J Reprod Infant Psychol. 2000;18:297–307.
50. McQueen K, Montgomery P, Lappan-Gracon S, et al. Evidence-based recommendations for depressive symptoms in postpartum women. J Obstet Gynecol Neonatal Nurs. 2008;37:127–136.
15 SCREENING IN CARDIOVASCULAR CARE
Brett D. Thombs and Roy C. Ziegelstein
1. Depression in Cardiovascular Disease
2. The Prevalence of Depression in Cardiovascular Disease
3. Screening Instruments for Depression in Cardiovascular Care
4. Recommendations for Evaluation and Treatment of Patients in Cardiovascular Care
5. Conclusions
Context
There is great interest in screening for depression in cardiovascular settings but little evidence that implementing screening will affect depression or cardiac outcomes, despite the epidemiologic evidence that depression predicts cardiac events and mortality. In October 2008, after this chapter was accepted, the American Heart Association (AHA) Working Group published a Scientific Advisory recommending that all patients with cardiovascular disease be screened for depression, although this recommendation was not based on a systematic review of the evidence. Several weeks after the Scientific Advisory was released, a systematic review of depression screening in cardiovascular care was published; it did not find evidence that patients with cardiovascular disease would benefit from screening for depression. The authors of the review noted that no published trials have assessed whether screening for depression improves depressive symptoms or cardiac outcomes in patients with cardiovascular disease, suggesting that the recommendations of the AHA Scientific Advisory were premature.
1. Depression in Cardiovascular Disease
High rates of depression were first documented among patients with cardiovascular disease (CVD) in the late 1960s. Early research on depression in CVD focused on patients with acute myocardial infarction (AMI) and conceptualized depression as an acute reaction to a catastrophic medical event.1–4 In the 1990s, groundbreaking work by Frasure-Smith and colleagues5,6 demonstrated a connection between major depression during hospitalization for AMI and subsequent mortality. Since then, many other studies have identified major depression or depressive symptoms as risk factors for mortality and recurrent cardiac events among patients with AMI or unstable angina pectoris (together known as acute coronary syndromes [ACS]), even after controlling for other known risk factors, although not all studies have reported a significant association.7–10 Other studies have reported that depression among patients with ACS is related to decreased quality of life11,12 and to poor adherence to secondary prevention behaviors, including smoking cessation, taking prescribed medications, exercising, and attending cardiac rehabilitation.13 Less research on the relationship between depression and mortality has been done in other CVD patient groups, although similar links have been reported in patients with congestive heart failure (CHF).14–17 However, authors of systematic reviews and meta-analyses do not all agree that the evidence is robust enough to establish depression as a risk factor for mortality in CVD independent of other risk factors and cardiac disease severity, and some have pointed to methodologic limitations in study designs, including inadequate control for exactly these factors.7–10 In addition, anxiety and self-reported quality of life, which overlap substantially with depression, have also been shown to be important predictors of outcomes among patients with CVD.18,19
Only one trial, the ENhancing Recovery in Coronary Heart Disease (ENRICHD) trial, which enrolled over 2,000 patients, has been designed to test whether treating depression in post-AMI patients would reduce mortality risk. It did not find that patients randomized to the cognitive–behavioral therapy (CBT) treatment group fared better in terms of mortality than patients in the usual-care control group,20 although secondary analyses indicated lower mortality among patients who received CBT and whose depression improved, and among patients who were treated with sertraline because of severe depression or an initially poor response to CBT.21,22 The decision to screen for depression among patients in cardiovascular care, however, should not depend on whether treatment of depression improves cardiac outcomes or overall mortality. Depression is a chronic, disabling condition that has been shown to have a major impact on quality of life in CVD,23 even after controlling for standard somatic measures, such as the degree of heart failure or the severity of an index myocardial infarction.24,25 Indeed, for many patients with CVD, quality of life is as important as survival.23
Screening is indicated if a disease or condition is an important health problem; if its presence would not be readily detected without screening; if it is prevalent in the population; if cost-efficient screening mechanisms with good performance characteristics (eg, sensitivity and specificity) exist and are available; if effective treatments are available; and if failure to identify and treat it would have important negative consequences. Ideally, screening methods should carry a minimal risk of false-positive results, which might lead to unnecessary diagnostic testing, the adverse effects and costs of inappropriate treatment, and the sequelae of being incorrectly labeled.26–28 The American College of Cardiology/American Heart Association (ACC/AHA) Guidelines for the Management of Patients with ST-Elevation Myocardial Infarction (2004)29 designate as class I (ie, procedure or treatment is useful/effective) the recommendation that "the psychosocial status of the patient should be evaluated, including inquiries regarding symptoms of depression, anxiety, or sleep disorders and the social support environment" (p. e153), and the ACC/AHA 2007 Guidelines for the Management of Patients with Unstable Angina/Non-ST-Elevation Myocardial Infarction30 designate as class IIa (ie, recommendation in favor of treatment or procedure being useful) the recommendation that "it is reasonable to consider screening UA/NSTEMI patients for depression and refer/treat when indicated" (p. e96). Neither recommendation, however, describes procedures for assessing depression, and no guidelines recommend for or against depression screening for patients with other cardiovascular diagnoses. In most centers, screening for depression is not yet part of standard cardiac care,31 and the merit of routinely screening every patient is still debated.
The objective of this chapter is to provide an overview of key issues related to the implementation of depression screening as standard care. The chapter reviews the prevalence of depression in cardiovascular care and available depression screening tools and makes recommendations on how screening, treatment, and follow-up programs may be best integrated into cardiovascular care.
2. The Prevalence of Depression in Cardiovascular Disease
Several questions related to the prevalence of depression in cardiovascular care have a direct impact on the likelihood that screening can be implemented in a cost-effective and efficient manner that produces beneficial results. For instance, is depression sufficiently prevalent among patients with CVD to warrant the time and cost involved in implementing a screening program? Among which patients, and at what point in the disease process? Is depression mostly a phenomenon related to a life-threatening event like an AMI? Will it resolve on its own even without specific treatment?
Comorbid major depression is present in approximately 1 in 5 patients with cardiovascular disease,1,32 a substantially higher rate than the estimated 5% prevalence in the general population33 or the 5% to 10% among patients in primary care.34 A recent systematic review reported that rates of major depression among patients hospitalized with AMI ranged from 16% to 27%.32 A similar prevalence (14% to 27%) was reported across a wider spectrum of CVD, including hospitalized patients with AMI or unstable angina, outpatients and inpatients with coronary artery disease, and patients after coronary artery bypass graft surgery.1 Studies of inpatients and outpatients with CHF and of patients with cardiomyopathy have reported similar prevalence rates of 14% to 21%.14,15,35,36 These rates include only major depression, but minor depression and subsyndromal symptoms of depression are also highly prevalent among patients with CVD and have been associated with risk of future cardiac events and mortality among post-AMI patients.37–39

The Beck Depression Inventory (BDI)40 is the most commonly used assessment tool in studies of depression in CVD. Based on the standard cutoff of 10 or greater, between 20% and 37% of hospitalized post-AMI patients have symptoms of at least mild to moderate severity,32 consistent with rates reported among patients with implantable cardioverter defibrillators (ICDs).41–43 Patients with CHF may have even higher rates of depressive symptoms based on a BDI score of 10 or more (30% to 51%), although their rates of major depression are similar.14,35,44,45 In some patients, minor depressive symptoms may be a reaction to the acute event, but the majority of patients who are depressed in the hospital remain depressed months after discharge.32 Recent research indicates that the trajectory of depressive symptoms over time, rather than in-hospital symptom levels after an AMI alone, may play a role in long-term health. Patients who have high levels of depressive symptoms during hospitalization for AMI, but whose symptoms resolve fairly rapidly, are not at greater risk of negative health outcomes. Patients whose symptoms persist or increase after discharge, on the other hand, tend to have worse outcomes.12,46,47

Thus, the evidence suggests that high rates of depression and/or subsyndromal symptoms of depression are present among most CVD patient groups. Levels of depressive symptoms change over time for individual patients, but, overall, depression is not a transient phenomenon related to acute events; rather, depression and subsyndromal depressive symptoms tend to persist.
3. Screening Instruments for Depression in Cardiovascular Care
Many potential screening instruments have been developed and tested in various patient populations. A reasonable question is whether health professionals who work in cardiovascular care need to select a screening tool that has been validated specifically for cardiovascular care, or whether one screening instrument is as good as any other for use with CVD patients. Many different depression screening instruments have been validated and tested against diagnostic criteria in primary care settings. A few of the better-recognized instruments include the BDI40 or its revised version, the BDI-II,48 the Patient Health Questionnaire-9 (PHQ-9),49 the Patient Health Questionnaire-2 (PHQ-2),50 the Center for Epidemiologic Studies Depression Scale (CES-D),51 and the General Health Questionnaire (GHQ).52 Fewer depression screening tools have been validated specifically against a "gold standard" structured diagnostic interview in cardiovascular care.53 In primary care, however, there is little evidence that any particular instrument performs better than the others. A systematic review found that although results were inconsistent across studies that used the same instrument, there were no systematic differences between instruments, and that brief two- or three-item tools appeared to perform as well as longer instruments for screening purposes.54 Median sensitivity and specificity across 38 studies of 16 different case-finding instruments in primary care patients were 85% and 74%, respectively, only slightly better than the values reported in a meta-analysis of brief two- or three-item screeners (pooled sensitivity 74%, specificity 75%),55 although this comparison rests on different sets of studies and samples rather than on head-to-head comparisons in the same settings.
Both brief and longer screening tools, however, tend to produce many false positives: approximately 50% of positive screens are false positives when the prevalence of depression is 20%, and 60% to 70% when the prevalence is 10%.54,55 Thus, positive screens must be confirmed by a diagnostic interview.56,57 Table 15.1 shows instruments with published data on diagnostic accuracy for major depression among patients with CVD, compared against a structured interview such as the Structured Clinical Interview for DSM,58,59 the Diagnostic Interview Schedule,60 or the Composite International Diagnostic Interview.61 Sensitivity is the proportion of patients with major depression who screened positive, and specificity is the proportion of patients without major depression who screened negative. The positive predictive value (PPV) is the proportion of patients with positive screens who were also diagnosed with major depression in a structured clinical interview, and the negative predictive value (NPV) is the proportion of patients with negative screens who did not receive a major depression diagnosis in that interview (see Chapter 5 for further discussion). Where the original studies did not report PPV, NPV, and/or 95% confidence intervals, these values were estimated for Table 15.1 from the available prevalence, sensitivity, and specificity data.
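Estimating PPV and NPV from prevalence, sensitivity, and specificity is an application of Bayes' theorem. The Python sketch below shows that arithmetic for point estimates only (it does not reproduce the confidence intervals); the function name `predictive_values` is hypothetical. Applied to two rows of Table 15.1, it gives the tabulated point estimates after rounding.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a screen used where major depression has the given prevalence.

    All inputs and outputs are proportions between 0 and 1.
    """
    true_pos = prevalence * sensitivity                # depressed, screen-positive
    false_pos = (1 - prevalence) * (1 - specificity)   # not depressed, screen-positive
    true_neg = (1 - prevalence) * specificity          # not depressed, screen-negative
    false_neg = prevalence * (1 - sensitivity)         # depressed, screen-negative
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Frasure-Smith, 1995: BDI 10 or above, 15% depressed, sensitivity 82%, specificity 78%
ppv, npv = predictive_values(0.82, 0.78, 0.15)
print(f"PPV {ppv:.0%}, NPV {npv:.0%}")  # PPV 40%, NPV 96%

# Freedland, 2003: BDI 10 or above in hospitalized CHF, 20% depressed, sensitivity 88%, specificity 58%
ppv, npv = predictive_values(0.88, 0.58, 0.20)
print(f"PPV {ppv:.0%}, NPV {npv:.0%}")  # PPV 34%, NPV 95%
```

Because 1 - PPV is the share of positive screens that are false positives, the same function shows why that share grows as prevalence falls.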
Table 15.1. Summary of Studies of Performance Characteristics of Depression Screening Tools in Cardiovascular Disease

| Study Author, Year | Patient Group | Study Site | n | Mean Age (Years) | Males (%) | % Depressed | Instrument/Cutoff | Derivation of Cutoff | Sensitivity (%) (95% CI) | Specificity (%) (95% CI) | Positive Predictive Value (%) (95% CI) | Negative Predictive Value (%) (95% CI) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Frasure-Smith, 1995 (refs 6, 63) | Post-AMI | Canada | 218 | 60 | 78 | 15% | BDI 10 | Standard | 82 (68–94) | 78 (71–83) | 40 (27–51) | 96 (93–99) |
| Gutierrez, 1999 (ref 81) | Outpatient CHF | Canada | 40 | 70 | 50 | 15% | BDI 13 | Standard | 83 (53–100) | 94 (86–100) | 71 (37–100) | 97 (89–100) |
| Strik, 2001 (ref 64) | Post-AMI | Netherlands | 206 | 60 | 76 | 11% | BDI 10 | ROC | 82 (66–98) | 79 (73–85) | 37 (21–45) | 98 (96–100) |
| | | | | | | | HADS 13 | ROC | 90 (77–100) | 84 (79–90) | 45 (31–59) | 99 (96–100) |
| | | | | | | | HADS-D 4 | ROC | 85 (70–100) | 75 (69–81) | 32 (21–43) | 98 (96–100) |
| | | | | | | | SCL-90-D 25 | ROC | 96 (87–100) | 74 (68–80) | 37 (26–48) | 96 (93–99) |
| Freedland, 2003 (ref 35) | Hospitalized CHF | US | 613 | 66 | 49 | 20% | BDI 10 | Standard | 88 (81–93) | 58 (54–62) | 34 (28–38) | 95 (93–97) |
| Dickens, 2004 (ref 70) | Post-AMI | UK | 314 | 58 | 63 | 21% | HADS 17 | ROC | 88 (80–96) | 85 (80–89) | 60 (50–70) | 96 (94–99) |
| McManus, 2005 (ref 66) | CHD | US | 1,024 | 67 | 82 | 22% | CES-D-10 10 | Standard | 76 (70–81) | 79 (76–82) | 50 (45–56) | 92 (90–94) |
| | | | | | | | PHQ-9 10 | Standard | 54 (47–61) | 90 (88–92) | 59 (53–67) | 87 (85–90) |
| | | | | | | | PHQ-2 3 | Standard | 39 (33–46) | 92 (90–94) | 58 (50–65) | 84 (82–87) |
| | | | | | | | 2-item screen | Standard | 90 (86–94) | 69 (66–73) | 45 (40–49) | 96 (95–98) |
| Denollet, 2006 (ref 82) | Post-AMI | Netherlands | 176 | 60 | 76 | 11% | SAD4 3 | Upper tertile | 95 (85–100) | 68 (60–74) | 28 (17–37) | 99 (97–100) |
| Huffman, 2006 (ref 69) | Post-AMI | US | 131 | 62 | 80 | 13% | 2 items from BDI | ROC | 94 (83–100) | 76 (68–84) | 37 (23–52) | 99 (97–100) |
| Low, 2007 (ref 65) | ACS | Canada | 119 | 63 | 75 | 6% | BDI-II 14 | Standard | 86 (59–100) | 89 (82–94) | 33 (11–55) | 99 (95–100) |
| | | | | | | | GDS 11 | Standard | 100 | 85 (77–91) | 29 (11–47) | 100 |
| Stafford, 2007 (ref 78) | CAD | Australia | 193 | 64 | 81 | 18% | HADS-D 6 | ROC | 80 (69–91) | 82 (76–86) | 49 (38–60) | 95 (91–97) |
| | | | | | | | PHQ-9 6 | ROC | 83 (71–93) | 79 (73–83) | 46 (36–56) | 95 (92–98) |
| Frasure-Smith, 2008 (ref 18) | ACS | Canada | 804 | 60 | 81 | 7% | BDI-II 14 | Standard | 91 (84–98) | 78 (74–80) | 24 (17–29) | 99 (98–100) |
| | | | | | | | HADS-A 8 | Standard | 84 (74–94) | 62 (58–65) | 14 (10–18) | 98 (97–99) |
ACS, acute coronary syndrome; AMI, acute myocardial infarction; BDI, Beck Depression Inventory; BDI-II, Beck Depression Inventory-II; CAD, coronary artery disease; CES-D-10, 10-item version of the Center for Epidemiologic Studies Depression Scale; CHD, coronary heart disease; CHF, congestive heart failure; DMI-10, Depression in the Medically Ill 10-item measure; DMI-18, Depression in the Medically Ill 18-item measure; GDS, Geriatric Depression Scale; HADS, Hospital Anxiety and Depression Scale, total score; HADS-A, Anxiety Subscale of the Hospital Anxiety and Depression Scale; HADS-D, Depression Subscale of the Hospital Anxiety and Depression Scale; PHQ-2, Patient Health Questionnaire-2; PHQ-9, Patient Health Questionnaire-9; ROC, receiver operating characteristic curve analysis; SAD4, Symptoms of Anxiety-Depression index; SCL-90-D, Depression Subscale of the Symptom Checklist 90.
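The summary of Table 15.1 later in this section uses 80% sensitivity and specificity as an example benchmark. The hypothetical helpers below check the table rows against that benchmark and group qualifying rows by instrument and cutoff, to see whether any combination holds up in more than one study. The rows are the sensitivity and specificity point estimates transcribed from Table 15.1; a cutoff of `None` marks screens reported without a numeric cutoff, and the threshold is a parameter chosen for illustration, not a clinical standard.

```python
from collections import defaultdict

# (study, instrument, cutoff, sensitivity %, specificity %) point estimates from Table 15.1.
TABLE_15_1 = [
    ("Frasure-Smith 1995", "BDI", 10, 82, 78),
    ("Gutierrez 1999", "BDI", 13, 83, 94),
    ("Strik 2001", "BDI", 10, 82, 79),
    ("Strik 2001", "HADS", 13, 90, 84),
    ("Strik 2001", "HADS-D", 4, 85, 75),
    ("Strik 2001", "SCL-90-D", 25, 96, 74),
    ("Freedland 2003", "BDI", 10, 88, 58),
    ("Dickens 2004", "HADS", 17, 88, 85),
    ("McManus 2005", "CES-D-10", 10, 76, 79),
    ("McManus 2005", "PHQ-9", 10, 54, 90),
    ("McManus 2005", "PHQ-2", 3, 39, 92),
    ("McManus 2005", "2-item screen", None, 90, 69),
    ("Denollet 2006", "SAD4", 3, 95, 68),
    ("Huffman 2006", "2 items from BDI", None, 94, 76),
    ("Low 2007", "BDI-II", 14, 86, 89),
    ("Low 2007", "GDS", 11, 100, 85),
    ("Stafford 2007", "HADS-D", 6, 80, 82),
    ("Stafford 2007", "PHQ-9", 6, 83, 79),
    ("Frasure-Smith 2008", "BDI-II", 14, 91, 78),
    ("Frasure-Smith 2008", "HADS-A", 8, 84, 62),
]


def meeting_benchmark(rows, threshold=80):
    """Rows whose sensitivity and specificity (in %) both reach the threshold."""
    return [row for row in rows if row[3] >= threshold and row[4] >= threshold]


def replicated(rows, threshold=80):
    """Instrument/cutoff pairs that meet the benchmark in two or more studies."""
    studies = defaultdict(set)
    for study, instrument, cutoff, _, _ in meeting_benchmark(rows, threshold):
        studies[(instrument, cutoff)].add(study)
    return {pair: names for pair, names in studies.items() if len(names) >= 2}


for study, instrument, cutoff, sens, spec in meeting_benchmark(TABLE_15_1):
    print(f"{study}: {instrument} {cutoff} (sensitivity {sens}%, specificity {spec}%)")
print(replicated(TABLE_15_1))  # {}
```

With these rows, six instrument/cutoff combinations reach 80% on both measures, each in a single study; the two HADS total-score results qualify at different cutoffs (13 and 17), so none replicates. Lowering the threshold to 78% makes BDI 10 and BDI-II 14 qualify in two studies each.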
As shown in Table 15.1, some studies used receiver operating characteristic (ROC) curve analysis62 to derive cutoff scores in an exploratory fashion, whereas others used established cutoff scores based on published results from other patient groups or on guidelines from the screening tool developers. Overall, consistent with reviews of screening in primary care,54,55 there were few major differences in sensitivity or specificity between instruments, and the rate of false positives was high across studies. In studies that used established cutoff scores, the BDI,6,35,63,64 BDI-II,18,65 and Geriatric Depression Scale65 generally performed reasonably well. The standard BDI cutoff of 10 or above produced good sensitivity and specificity for diagnosing major depression after AMI,6,63,64 but the same cutoff yielded poor specificity (58%) in a sample of 613 hospitalized heart failure patients.35 Cutoff thresholds developed for primary care patients also yielded poor sensitivity for the PHQ-2 (3 or more) and PHQ-9 (10 or more) in a study by McManus and colleagues.66 Results from that study, however, were consistent with findings reported by Stafford and associates78 that the PHQ-9 was more accurate when a lower cutoff of 6 or greater was used. In studies that used ROC curve analysis, the same patient data were used both to set cutoff levels and to test their accuracy. This matters because ROC curve analysis generates a menu of all sensitivity and specificity combinations across the range of possible cutoff scores, from which researchers select the combination that, in their judgment, maximizes diagnostic utility. Like any exploratory data analysis technique, however, ROC curve analysis capitalizes on chance and often overemphasizes idiosyncratic characteristics of a given set of patients or particularities of the diagnostic process in a given study.
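The derivation step can be sketched as follows. The Python below scans every observed score as a candidate cutoff and keeps the one that maximizes Youden's J (sensitivity + specificity - 1). Youden's J is an assumption here: it is one common way to pick a point on the ROC curve, not necessarily the criterion used by the studies in Table 15.1. The scores are a small invented data set, built only to show how a cutoff that looks excellent in the sample that produced it can perform worse in an independent sample.

```python
def sens_spec(scores, has_depression, cutoff):
    """Sensitivity and specificity when a score >= cutoff counts as a positive screen."""
    pairs = list(zip(scores, has_depression))
    tp = sum(1 for s, d in pairs if d and s >= cutoff)
    fn = sum(1 for s, d in pairs if d and s < cutoff)
    tn = sum(1 for s, d in pairs if not d and s < cutoff)
    fp = sum(1 for s, d in pairs if not d and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def youden_cutoff(scores, has_depression):
    """Try every observed score as a cutoff; keep the one with the largest Youden's J.

    Ties are resolved in favor of the lowest cutoff reaching the maximum.
    """
    best_cutoff, best_j = None, float("-inf")
    for cutoff in sorted(set(scores)):
        sens, spec = sens_spec(scores, has_depression, cutoff)
        j = sens + spec - 1
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff


# Hypothetical derivation sample: 5 patients with and 10 without interview-confirmed depression.
derive_scores = [8, 12, 14, 15, 18] + [2, 3, 4, 5, 6, 7, 9, 10, 11, 13]
derive_labels = [True] * 5 + [False] * 10
cutoff = youden_cutoff(derive_scores, derive_labels)
print(cutoff, sens_spec(derive_scores, derive_labels, cutoff))  # 12 (0.8, 0.9)

# Hypothetical independent validation sample, scored with the cutoff derived above.
valid_scores = [9, 10, 13, 16] + [3, 5, 6, 8, 11, 12]
valid_labels = [True] * 4 + [False] * 6
print(sens_spec(valid_scores, valid_labels, cutoff))  # sensitivity 0.5, specificity 5/6 (about 0.83)
```

In this toy example, the cutoff of 12 gives 80% sensitivity and 90% specificity in its derivation sample but only 50% sensitivity in the second sample, which is the kind of shrinkage cross-validation is meant to expose.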
Cutoffs derived from ROC curve analysis therefore may not generalize well to other samples, and cross-validation is necessary before such cutoffs can be accepted as useful for practice.67,68 This is particularly true for small samples: among the studies in Table 15.1 that used ROC curve analysis, diagnostic characteristics were based on between 17 (Huffman and colleagues69) and 65 (Dickens and colleagues70) patients with major depression. The need for cross-validation of derived cutoff scores is illustrated by the large discrepancy between the cutoffs for the total score of the Hospital Anxiety and Depression Scale (HADS) reported by Dickens70 and Strik64 and their coworkers. The two studies obtained approximately equal sensitivity and specificity, but Dickens and colleagues found a HADS score of 17 or above to be the most accurate, whereas Strik and associates used a score of 13 or greater.

The HADS depression subscale (HADS-D) has been used more frequently than the total HADS in studies of post-AMI depression. A concern with the HADS-D, however, is that, based on weighted prevalence across studies, it classifies fewer patients as possible or probable cases (15.5% with a cutoff of 8 or more; 7.3% with a cutoff of 11 or more) than the actual rate of major depression in CVD patients, whereas a BDI score of 10 or above classifies a greater proportion (31.1%).32 A screening cutoff that flags fewer patients than actually have the disorder necessarily misses some of them. Use of instruments like the HADS that inquire only about nonsomatic symptoms has been justified by claims that screening tools covering the full range of symptoms (eg, BDI, PHQ-9) are likely to be biased in CVD patients because somatic symptoms of depression overlap with those of CVD itself. These alternative approaches, however, have been based on face validity rather than on empirical evidence that existing methods are biased or that the alternatives increase accuracy.71 Furthermore, across cultures, the majority of primary care patients with depression present primarily with somatic symptoms,72 and depression treatment affects somatic and nonsomatic symptoms similarly in patients with and without chronic medical illness.73 We recently compared BDI responses from a sample of hospitalized post-AMI patients with those from a matched sample of psychiatric outpatients, using rigorous techniques for detecting potential bias74 due to somatic symptom over-endorsement, and did not find that total BDI scores of the post-AMI patients were affected by somatic symptom endorsement any more than those of the nonmedical psychiatric outpatients (submitted for review). One possible explanation relates to the overt, as opposed to covert, nature of the assessment of depressive symptoms, which has been shown to influence responses to self-report questionnaires.75 Hospitalized post-AMI patients who are tired or not eating well, for instance, may not endorse these symptoms because they know they are being asked about depression and may attribute the symptoms to the cardiac event or the hospitalization itself, although this has not been demonstrated.
In summary, Table 15.1 shows that many screening tools have been used in cardiovascular care, but few have achieved good sensitivity and specificity (using a standard of 80%, for example) in more than one sample of CVD patients with the same cutoff threshold. Cutoff scores of 10 or above on the BDI and 14 or above on the BDI-II are reasonably sensitive and specific, although the BDI cutoff is less specific among patients with CHF. None of the tools and cutoffs tested is convincingly superior to the others, and more research with larger samples from multiple centers is needed before we can be confident that published cutoffs for other available instruments will work efficiently in cardiovascular care. Given the lack of evidence that any single instrument performs consistently well across multiple samples, or that any instrument has a clear performance advantage over the others, other factors, such as an instrument's brevity, readability, and comprehensibility, should be weighed. Healthcare workers in cardiovascular care settings have limited time with each patient to focus on his or her emotional health. In addition, CVD patients, particularly those in acute care, may have difficulty with some instrument formats. Some screening tools, such as the BDI, are long and include response options that vary across items, increasing complexity for patients and staff. Instruments that require simple yes-or-no responses or estimates of symptom frequency based on numeric ratings or visual-analogue scales may be easier for staff to administer or for patients to complete independently.54

The PHQ-949 is a nine-item patient-completed measure of depressive symptoms that replicates the symptom criteria of the DSM-IV; a score of 10 or above has been shown to be highly sensitive (88%) and specific (88%) for detecting DSM-IV-defined major depression among primary care patients. The PHQ-250 is an even briefer, two-item measure that is also sensitive (83%) and specific (92%) for major depression in primary care. Research on the accuracy of the PHQ-9 in identifying ICD-10-defined depression is limited, although it performed better than two other measures in a study of medical outpatients.76 Recently, a National Heart, Lung, and Blood Institute (NHLBI) working group report made recommendations, for research purposes, on the assessment and treatment of depression in patients with CVD. The report recommended a screening algorithm in which the PHQ-2 is administered first, followed by the PHQ-9 if one or both PHQ-2 items are positive for depression.77 Although a cutoff of 10 or greater is used in primary care, this cutoff was not sensitive among patients in cardiovascular care in one study,66 whereas a lower threshold of 6 worked well in another.78 Thus, until accurate cutoffs are verified for patients with CVD, one potential strategy would be to follow the NHLBI recommendations using the lower threshold (6 or more) on the PHQ-9 (Fig. 15.1).
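The PHQ-2 then PHQ-9 sequence can be sketched in Python as below. Assumptions: each PHQ-2 item is recorded as a simple positive/negative answer (how an item is judged positive is left to the local protocol), PHQ-9 items are scored 0 to 3 as on the published questionnaire, and the cutoff defaults to the lower threshold of 6 discussed above. The helper names are hypothetical, and the returned strings stand in for local workflow steps.

```python
PHQ9_ITEM_COUNT = 9
PHQ9_MAX_ITEM_SCORE = 3  # items run from 0 ("not at all") to 3 ("nearly every day")


def phq9_total(item_scores):
    """Validate nine PHQ-9 item scores and return their sum (0 to 27)."""
    if len(item_scores) != PHQ9_ITEM_COUNT:
        raise ValueError(f"expected {PHQ9_ITEM_COUNT} item scores, got {len(item_scores)}")
    if any(not 0 <= score <= PHQ9_MAX_ITEM_SCORE for score in item_scores):
        raise ValueError("each PHQ-9 item must be scored from 0 to 3")
    return sum(item_scores)


def two_stage_screen(phq2_positive_items, phq9_items=None, phq9_cutoff=6):
    """Hypothetical PHQ-2 -> PHQ-9 sequence; returns the next workflow step.

    phq2_positive_items: two booleans, True when that PHQ-2 item is positive.
    phq9_items: nine item scores, needed only after a positive PHQ-2.
    phq9_cutoff: 6 is the lower threshold suggested for CVD patients;
                 10 is the usual primary care threshold.
    """
    if len(phq2_positive_items) != 2:
        raise ValueError("the PHQ-2 has exactly two items")
    if not any(phq2_positive_items):
        return "PHQ-2 negative: ongoing assessment/care"
    if phq9_items is None:
        return "PHQ-2 positive: administer PHQ-9"
    if phq9_total(phq9_items) >= phq9_cutoff:
        return "PHQ-9 positive: confirm with clinical interview"
    return "PHQ-9 negative: ongoing assessment/care"


responses = [1, 1, 1, 1, 1, 1, 0, 0, 0]  # hypothetical answers, total score 6
print(two_stage_screen([True, False], responses))                  # positive at cutoff 6
print(two_stage_screen([True, False], responses, phq9_cutoff=10))  # negative at cutoff 10
```

Passing `phq9_cutoff=10` reproduces the primary care threshold, which makes it easy to see how the same responses can screen positive at 6 and negative at 10.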
4. Recommendations for Evaluation and Treatment of Patients in Cardiovascular Care

Practical Recommendations

Screening for depression in primary care is recommended by the U.S. Preventive Services Task Force (USPSTF) when systems are in place to ensure accurate diagnosis, effective treatment, and follow-up.56 Many patients with depression can be successfully managed by their primary care provider. Most primary care providers have, or should have, experience treating patients with many forms of depression, but cardiologists' comfort with, and experience in, caring for patients with mood disorders generally varies more. The triage and treatment of patients with cardiac disease and comorbid depression must therefore be individualized in every instance. Psychiatric or psychological consultation (or advice) should be considered when (1) depression is suspected or diagnosed, (2) none of the patient's "front-line" care providers are able to manage the condition, and (3) the patient wishes to receive this form of specialist help. In addition, psychiatric consultation should be considered when diagnostic uncertainty, a history of mania or psychosis, substance abuse, or suicide risk is present.79,80

Figure 15.1 (decision flowchart, summarized as text):
PHQ-2 positive? No: ongoing assessment/care. Yes: administer the PHQ-9.
PHQ-9 positive? No: ongoing assessment/care. Yes: clinical interview.
Clinical interview positive? No: ongoing assessment/care. Yes: check for severe or complex symptoms.
Severe or complex symptoms (severe symptoms, manic symptoms, psychosis, suicide risk, substance abuse)? Yes: refer for psychiatric evaluation and treatment. No: informed patient preference and management in the cardiovascular care clinic, choosing among cognitive-behavioral therapy (CBT), psychopharmacology, combined CBT and psychopharmacology, or watchful waiting/self-help; if the chosen strategy includes CBT, refer to a CBT provider.
Ongoing follow-up in the cardiovascular care clinic: symptom monitoring, assessment of the effectiveness of the management strategy, and re-evaluation of the management strategy.

Figure 15.1. Recommended decision process for screening for depression in cardiovascular care, showing recommended screening, treatment, and follow-up decisions and strategies. In addition to the strategies presented in the figure, health-promoting practices that benefit most cardiovascular care patients, such as maximizing social support and making healthy lifestyle choices (eg, regular exercise), should be emphasized. These recommendations may be of particular benefit to patients with minor depression, who may be able to make lifestyle changes that improve mood. Patients with severe depression, on the other hand, are unlikely to be able to make lifestyle changes without depression treatment, which should be prioritized.
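The post-interview branches of Figure 15.1, together with the referral triggers listed in the text, can be expressed as a small triage function. This is a hypothetical sketch: the flag vocabulary, the `InterviewResult` type, and the strategy strings are illustrative assumptions, and a real service would record these findings in its own clinical system.

```python
from dataclasses import dataclass, field
from typing import Optional

# Referral triggers: the "severe or complex symptoms" named in Figure 15.1, plus
# diagnostic uncertainty, which the text also lists as a reason for consultation.
REFERRAL_TRIGGERS = {
    "severe symptoms",
    "manic symptoms",
    "psychosis",
    "suicide risk",
    "substance abuse",
    "diagnostic uncertainty",
}

# Management options offered in the cardiovascular care clinic (Figure 15.1).
STRATEGIES = {
    "CBT",
    "psychopharmacology",
    "combined CBT and psychopharmacology",
    "watchful waiting/self-help",
}


@dataclass
class InterviewResult:
    interview_positive: bool                         # depression confirmed at clinical interview
    findings: set[str] = field(default_factory=set)  # flags noted by the interviewer


def triage(result: InterviewResult, preferred_strategy: Optional[str] = None) -> list[str]:
    """Return next steps for one patient after the clinical interview."""
    if not result.interview_positive:
        return ["ongoing assessment/care"]
    if result.findings & REFERRAL_TRIGGERS:
        return ["refer for psychiatric evaluation and treatment"]
    if preferred_strategy is None:
        # Informed patient preference: present the options before choosing one.
        return ["discuss management options with the patient: " + ", ".join(sorted(STRATEGIES))]
    if preferred_strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {preferred_strategy!r}")
    steps = [f"manage in cardiovascular care clinic: {preferred_strategy}"]
    if "CBT" in preferred_strategy:
        steps.append("refer to CBT provider")
    steps.append("ongoing follow-up: monitor symptoms, assess and re-evaluate the strategy")
    return steps


print(triage(InterviewResult(True, {"suicide risk"})))
print(triage(InterviewResult(True), "combined CBT and psychopharmacology"))
```

Keeping the triggers in one set makes the referral rule explicit and easy to audit; a clinic adopting something similar would substitute its own criteria.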
Barriers to Implementation

Consistent with the USPSTF guidelines for primary care, we recommend that screening be considered in cardiovascular care settings only when personnel and resources are available to ensure appropriate diagnosis, treatment, and follow-up.56 We recognize that personnel who are adequately trained and experienced in diagnosing and treating depression may not be available in many cardiovascular care settings, and that specific mental health resources may not be readily available either. We also recognize that even when cardiovascular care providers are adequately trained and experienced in diagnosing and treating depression, attention in a busy cardiology practice typically focuses on issues considered more central to cardiovascular care. Given these realities, some may argue that it is reasonable to administer a PHQ-9 and to base treatment on the results, since a score of 10 or more on the PHQ-9 has approximately 90% specificity for major depression.80 Based on the sensitivity (54%) and specificity (90%) reported by McManus and associates66 for CVD patients and assuming a 20% rate of major depression, however, more than 40% of patients (approximately 43%) treated for depression under this protocol would be treated inappropriately, because they would not have major depression. Evidence does not suggest that treating patients with subsyndromal depressive symptoms with selective serotonin reuptake inhibitors (SSRIs) is helpful, so this strategy could expose many patients to potential harm without established benefit.
Although the harms of treatment in non-cases are not well documented, potential negative consequences include the cost of treatment, drug side effects, drug–drug interactions, and the potentially adverse effects of being incorrectly labeled.56 Given that time constraints are likely to be a formidable barrier to screening for many cardiologists, alternative strategies, such as using trained nursing or social work personnel to assist with assessment, may be considered. However, when resources are insufficient to provide accurate diagnosis, treatment, and follow-up, either in the cardiovascular care setting or through referral, screening is unlikely to benefit patients and may actually cause harm.
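The protocol argument above can be checked with expected counts. The sketch below assumes 1,000 screened patients, a 20% prevalence of major depression, and the PHQ-9 sensitivity (54%) and specificity (90%) from McManus and associates, and counts how many treated patients would not have major depression; `screen_counts` is a hypothetical helper.

```python
def screen_counts(n_screened, prevalence, sensitivity, specificity):
    """Expected (true positives, false positives, missed cases) for one round of screening."""
    depressed = n_screened * prevalence
    not_depressed = n_screened - depressed
    true_pos = depressed * sensitivity             # correctly flagged
    missed = depressed - true_pos                  # depressed but screen-negative
    false_pos = not_depressed * (1 - specificity)  # flagged without major depression
    return true_pos, false_pos, missed


true_pos, false_pos, missed = screen_counts(1000, prevalence=0.20, sensitivity=0.54, specificity=0.90)
treated = true_pos + false_pos  # under this protocol, every positive screen is treated
print(f"treated: {treated:.0f}, without major depression: {false_pos:.0f} ({false_pos / treated:.0%})")
# treated: 188, without major depression: 80 (43%)
print(f"missed cases: {missed:.0f} of {1000 * 0.20:.0f}")
# missed cases: 92 of 200
```

The same counts show the other side of the protocol: of the 200 patients with major depression, about 92 would screen negative at this cutoff and go untreated.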
5. Conclusions
In summary, there is no evidence from research with primary care or CVD patients that any single screening tool works consistently better than the others. Without evidence that any instrument is superior, considerations such as brevity, user-friendliness, and correspondence with current DSM-IV criteria suggest that the PHQ instruments are a reasonable choice for clinical screening. Future research with large patient samples from multiple centers is needed to verify the best cutoffs for cardiovascular care. Consistent with USPSTF guidelines for primary care, screening for depression may be considered in cardiovascular care settings where resources are available to provide accurate diagnosis, treatment, and follow-up services.
References 1. Rudisch B, Nemeroff CB. Epidemiology of comorbid coronary artery disease and depression. Biol Psychiatry. 2003;54:227–240. 2. Hackett TP, Cassem NH, Wishnie HA. The coronary-care unit. An appraisal of its psychologic hazards. N Engl J Med. 1968;279:1365–1370. 3. Cassem NH, Hackett TP. Psychiatric consultation in a coronary care unit. Ann Intern Med. 1971;75:9–14. 4. Dreyfuss F, Dasberg H, Assael MI. The relationship of myocardial infarction to depressive illness. Psychother Psychosom. 1969;17:73–81. 5. Frasure-Smith N, Lesperance F, Talajic M. Depression following myocardial infarction. Impact on 6-month survival. JAMA. 1993;270:1819–1825. 6. Frasure-Smith N, Lesperance F, Talajic M. Depression and 18-month prognosis after myocardial infarction. Circulation. 1995;91:999–1005. 7. van Melle JP, de Jonge P, Spijkerman TA, et al. Prognostic association of depression following myocardial infarction with mortality and cardiovascular events: A metaanalysis. Psychosom Med. 2004;66:814–822. 8. Barth J, Schumacher M, Herrmann-Lingen C. Depression as a risk factor for mortality in patients with coronary heart disease: A meta-analysis. Psychosom Med. 2004;66:802–813. 9. Sorensenf C, Friis-Hasche E, Haghfelt T, et al. Postmyocardial infarction mortality in relation to depression: A systematic critical review. Psychother Psychosom. 2005;74:69–80. 10. Nicholson A, Kuper H, Hemingway H. Depression as an aetiologic and prognostic factor in coronary heart disease: A meta-analysis of 6362 events among 146,538 participants in 54 observational studies. Eur Heart J. 2006;27:2763–2774. 11. Parashar S, Rumsfeld JS, Spertus JA, et al. Time course of depression and outcome of myocardial infarction. Arch Intern Med. 2006;166:2035–2043. 12. Thombs BD, Ziegelstein RC, Stewart DE, et al. Usefulness of persistent symptoms of depression to predict physical health status 12 months after an acute coronary syndrome. Am J Cardiol. 2008;101:15–19. 13. 
Kronish IM, Rieckmann N, Halm EA, et al. Persistent depression affects adherence to secondary prevention behaviors after acute coronary syndromes. J Gen Intern Med. 2006;21:1178–1183. 14. Jiang W, Alexander J, Christopher E, et al. Relationship of depression to increased risk of mortality and rehospitalization in patients with congestive heart failure. Arch Intern Med. 2001;161:1849–1856. 15. Faris R, Purcell H, Henein MY, et al. Clinical depression is common and significantly associated with reduced survival in patients with non-ischaemic heart failure. Eur J Heart Fail. 2002;4:541–551.
330
SCREENING FOR DEPRESSION IN CLINICAL PRACTICE
16. Friedmann E, Thomas SA, Liu F, et al. Relationship of depression, anxiety, and social isolation to chronic heart failure outpatient mortality. Am Heart J. 2006;152:940.e1–940.e8. 17. Jiang W, Kuchibhatla M, Cuffe MS, et al. Prognostic value of anxiety and depression in patients with chronic heart failure. Circulation. 2004;110:3452–3456. 18. Frasure-Smith N, Lesperance F. Depression and anxiety as predictors of 2-year cardiac events in patients with stable coronary artery disease. Arch Gen Psychiatry. 2008;65:62–71. 19. Faller H, Stork S, Schowalter M, et al. Is health-related quality of life an independent predictor of survival in patients with chronic heart failure? J Psychosom Res. 2007;63:533–538. 20. Berkman LF, Blumenthal J, Burg M, et al. Effects of treating depression and low perceived social support on clinical events after myocardial infarction: The Enhancing Recovery in Coronary Heart Disease Patients (ENRICHD) randomized trial. JAMA. 2003;289:3106–3116. 21. Taylor CB, Youngblood ME, Catellier D, et al. Effects of antidepressant medication on morbidity and mortality in depressed patients after myocardial infarction. Arch Gen Psychiatry. 2005;62:792–798. 22. Carney RM, Blumenthal JA, Freedland KE, et al. Depression and late mortality after myocardial infarction in the Enhancing Recovery in Coronary Heart Disease (ENRICHD) study. Psychosom Med. 2004;66:466–474. 23. Rumsfeld JS, Ho PM. Depression and cardiovascular disease: A call for recognition. Circulation. 2005;111:250–253. 24. Muller-Tasch T, Peters-Klimm F, Schellberg D, et al. Depression is a major determinant of quality of life in patients with chronic systolic heart failure in general practice. J Card Fail. 2007;13:818–824. 25. Dickens CM, McGowan L, Percival C, et al. Contribution of depression and anxiety to impaired health-related quality of life following first myocardial infarction. Br J Psychiatry. 2006;189:367–372. 26. Wilson JM, Jungner G. 
Principles and practices of screening for disease. Geneva: World Health Organization, 1968. 27. Magruder KM, Norquist GS, Feil MB, et al. Who comes to a voluntary depression screening program? Am J Psychiatry. 1995;152:1615–1622. 28. Greenfield SF, Reizes JM, Magruder KM, et al. Effectiveness of community-based screening for depression. Am J Psychiatry. 1997;154:1391–1397. 29. Antman EM, Anbe DT, Armstrong PW, et al. ACC/AHA guidelines for the management of patients with ST-elevation myocardial infarction; A report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines (committee to revise the 1999 guidelines for the management of patients with acute myocardial infarction). J Am Coll Cardiol. 2004;44:E1-E211. 30. Anderson JL, Adams CD, Antman EM, et al. ACC/AHA 2007 guidelines for the management of patients with unstable angina/non-ST-elevation myocardial infarction: A report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines (writing committee to revise the 2002 guidelines for the management of patients with unstable Angina/Non-ST-elevation myocardial infarction) developed in collaboration with the American College of Emergency Physicians, the Society for Cardiovascular Angiography and Interventions, and the Society of Thoracic Surgeons endorsed by the American Association of Cardiovascular and Pulmonary Rehabilitation and the Society for Academic Emergency Medicine. J Am Coll Cardiol. 2007;50:e1–e157.
31. Ziegelstein RC, Kim SY, Kao D, et al. Can doctors and nurses recognize depression in patients hospitalized with an acute myocardial infarction in the absence of formal screening? Psychosom Med. 2005;67:393–397.
32. Thombs BD, Bass EB, Ford DE, et al. Prevalence of depression in survivors of acute myocardial infarction. J Gen Intern Med. 2006;21:30–38.
33. Blazer DG, Kessler RC, McGonagle KA, et al. The prevalence and distribution of major depression in a national community sample: The National Comorbidity Survey. Am J Psychiatry. 1994;151:979–986.
34. Pignone MP, Gaynes BN, Rushton JL, et al. Screening for depression in adults: A summary of the evidence for the U.S. Preventive Services Task Force. Ann Intern Med. 2002;136:765–776.
35. Freedland KE, Rich MW, Skala JA, et al. Prevalence of depression in hospitalized patients with congestive heart failure. Psychosom Med. 2003;65:119–128.
36. Poole NA, Morgan JF. Validity and reliability of the Hospital Anxiety and Depression Scale in a hypertrophic cardiomyopathy clinic: The HADS in a cardiomyopathy population. Gen Hosp Psychiatry. 2006;28:55–58.
37. Bush DE, Ziegelstein RC, Tayback M, et al. Even minimal symptoms of depression increase mortality risk after acute myocardial infarction. Am J Cardiol. 2001;88:337–341.
38. Lesperance F, Frasure-Smith N, Juneau M, et al. Depression and 1-year prognosis in unstable angina. Arch Intern Med. 2000;160:1354–1360.
39. Frasure-Smith N, Lesperance F, Juneau M, et al. Gender, depression, and one-year prognosis after myocardial infarction. Psychosom Med. 1999;61:26–37.
40. Beck AT, Steer RA. Manual for the revised Beck Depression Inventory. San Antonio, TX: Psychological Corporation, 1987.
41. Luyster FS, Hughes JW, Waechter D, et al. Resource loss predicts depression and anxiety among patients treated with an implantable cardioverter defibrillator. Psychosom Med. 2006;68:794–800.
42. Friedmann E, Thomas SA, Inguito P, et al. Quality of life and psychological status of patients with implantable cardioverter defibrillators. J Interv Card Electrophysiol. 2006;17:65–72.
43. Simson U, Perings C, Plaskuda A, et al. Impact of attachment style, social support and the number of implantable cardioverter defibrillator (ICD) discharges on psychological strain of ICD patients. Psychother Psychosom Med Psychol. 2006;56:493–499.
44. Gottlieb SS, Khatta M, Friedmann E, et al. The influence of age, gender, and race on the prevalence of depression in heart failure patients. J Am Coll Cardiol. 2004;43:1542–1549.
45. Jiang W, Kuchibhatla M, Clary GL, et al. Relationship between depressive symptoms and long-term mortality in patients with heart failure. Am Heart J. 2007;154:102–108.
46. de Jonge P, van den Brink RH, Spijkerman TA, et al. Only incident depressive episodes after myocardial infarction are associated with new cardiovascular events. J Am Coll Cardiol. 2006;48:2204–2208.
47. Kaptein KI, de Jonge P, van den Brink RH, et al. Course of depressive symptoms after myocardial infarction and cardiac prognosis: A latent class analysis. Psychosom Med. 2006;68:662–668.
48. Beck AT, Steer RA, Brown GK. Manual for the Beck Depression Inventory-II. San Antonio, TX: Psychological Corporation, 1996.
49. Kroenke K, Spitzer RL, Williams JB. The PHQ-9: Validity of a brief depression severity measure. J Gen Intern Med. 2001;16:606–613.
50. Kroenke K, Spitzer RL, Williams JB. The Patient Health Questionnaire-2: Validity of a two-item depression screener. Med Care. 2003;41:1284–1292.
51. Radloff LS. The CES-D scale: A self-report depression scale for research in the general population. Appl Psychol Meas. 1977;1:385–401.
52. Goldberg DP, Gater R, Sartorius N, et al. The validity of two versions of the GHQ in the WHO study of mental illness in general health care. Psychol Med. 1997;27:191–197.
53. Thombs BD, Magyar-Russell G, Bass EB, et al. Performance characteristics of depression screening instruments in survivors of acute myocardial infarction: Review of the evidence. Psychosomatics. 2007;48:185–194.
54. Williams JW Jr, Pignone M, Ramirez G, et al. Identifying depression in primary care: A literature synthesis of case-finding instruments. Gen Hosp Psychiatry. 2002;24:225–237.
55. Mitchell AJ, Coyne JC. Do ultra-short screening instruments accurately detect depression in primary care? A pooled analysis and meta-analysis of 22 studies. Br J Gen Pract. 2007;57:144–151.
56. U.S. Preventive Services Task Force. Screening for depression: Recommendations and rationale. Ann Intern Med. 2002;136:760–764.
57. MacMillan HL, Patterson CJ, Wathen CN, et al. Screening for depression in primary care: Recommendation statement from the Canadian Task Force on Preventive Health Care. CMAJ. 2005;172:33–35.
58. Spitzer R, Williams J, Gibbons M. Structured clinical interview for DSM-III-R, patient version. New York: Biometrics Research Department, New York State Psychiatric Institute, 1988.
59. First MB, Spitzer RL, Gibbon M, et al. Structured clinical interview for DSM-IV Axis I disorders. New York: Biometrics Research Unit, New York Psychiatric Institute, 1995.
60. Robins LN, Helzer JE, Croughan J, et al. National Institute of Mental Health Diagnostic Interview Schedule. Its history, characteristics, and validity. Arch Gen Psychiatry. 1981;38:381–389.
61. Wittchen HU. Reliability and validity studies of the WHO-Composite International Diagnostic Interview (CIDI): A critical review. J Psychiatr Res. 1994;28:57–84.
62. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982;143:29–36.
63. Frasure-Smith N, Lesperance F, Talajic M. Depression after myocardial infarction: Response. Circulation. 1998;97:707–708.
64. Strik JJ, Honig A, Lousberg R, et al. Sensitivity and specificity of observer and self-report questionnaires in major and minor depression following myocardial infarction. Psychosomatics. 2001;42:423–428.
65. Low GD, Hubley AM. Screening for depression after cardiac events using the Beck Depression Inventory-II and the Geriatric Depression Scale. Soc Indic Res. 2007;82:527–543.
66. McManus D, Pipkin SS, Whooley MA. Screening for depression in patients with coronary heart disease (data from the Heart and Soul Study). Am J Cardiol. 2005;96:1076–1081.
67. Charlson ME, Ales KL, Simon R, et al. Why predictive indexes perform less well in validation studies. Is it magic or methods? Arch Intern Med. 1987;147:2155–2161.
68. Dawes RM, Faust D, Meehl PE. Clinical versus actuarial judgment. Science. 1989;243:1668–1674.
69. Huffman JC, Smith FA, Blais MA, et al. Rapid screening for major depression in post-myocardial infarction patients: An investigation using Beck Depression Inventory II items. Heart. 2006;92:1656–1660.
70. Dickens CM, Percival C, McGowan L, et al. The risk factors for depression in first myocardial infarction patients. Psychol Med. 2004;34:1083–1092.
71. Simon GE, Von Korff M. Medical co-morbidity and validity of DSM-IV depression criteria. Psychol Med. 2006;36:27–36.
72. Simon GE, Von Korff M, Piccinelli M, et al. An international study of the relation between somatic symptoms and depression. N Engl J Med. 1999;341:1329–1335.
73. Simon GE, Von Korff M, Lin E. Clinical and functional outcomes of depression treatment in patients with and without chronic medical illness. Psychol Med. 2005;35:271–279.
74. Jones RN. Identification of measurement differences between English and Spanish language versions of the Mini-Mental State Examination. Detecting differential item functioning using MIMIC modeling. Med Care. 2006;44:S124–S133.
75. Hunt M, Auriemma J, Cashaw AC. Self-report bias and underreporting of depression on the BDI-II. J Pers Assess. 2003;80:26–30.
76. Lowe B, Grafe K, Zipfel S, et al. Diagnosing ICD-10 depressive episodes: Superior criterion validity of the Patient Health Questionnaire. Psychother Psychosom. 2004;73:386–390.
77. Davidson KW, Kupfer DJ, Bigger JT, et al. Assessment and treatment of depression in patients with cardiovascular disease: National Heart, Lung, and Blood Institute working group report. Psychosom Med. 2006;68:645–650.
78. Stafford L, Berk M, Jackson HJ. Validity of the Hospital Anxiety and Depression Scale and Patient Health Questionnaire-9 to screen for depression in patients with coronary artery disease. Gen Hosp Psychiatry. 2007;29:417–424.
79. Fancher T, Kravitz R. In the clinic. Depression. Ann Intern Med. 2007;146:ITC5-1–ITC5-16.
80. Whooley MA. Depression and cardiovascular disease: Healing the broken-hearted. JAMA. 2006;295:2874–2881.
81. Gutierrez RC. Assessing depression in patients with congestive heart failure. Can J Cardiovasc Nurs. 1999;10:29–36.
82. Denollet J, Strik JJ, Lousberg R, et al. Recognizing increased risk of depressive comorbidity after myocardial infarction: Looking for 4 symptoms of anxiety-depression. Psychother Psychosom. 2006;75:346–352.
16 SCREENING IN DIABETES CARE: DETECTING AND MANAGING DEPRESSION IN DIABETES
Norbert Hermanns and Bernhard Kulzer
1. Depression in Diabetes is a Major Health Problem
2. Screening Tests
3. Treatment Options
4. Screening Program
5. Conclusions for Clinical Practice
Context
The analysis of depression screening in diabetes according to the four criteria of the United Kingdom’s National Screening Committee shows that both screening tests and treatment options are available. However, the results of the Cochrane meta-analysis of depression screening in primary care settings indicate that implementing depression screening requires a structured approach that links these two components. A stepped-care approach comprising verification of positive screening results, treatment options, assessment of response to treatment, and adaptation of treatment may yield favorable results in terms of both reducing depression and cost-effectiveness.

The association between diabetes and distress has long been recognized. In 1685 Thomas Willis, a British physician, suggested that diabetes might be a consequence of prolonged sorrow.1 In the middle of the 20th century Alexander2 regarded diabetes as one of the seven major psychosomatic diseases. In more recent years these historical observations have been supported by growing empirical evidence of a special relationship between emotional distress and diabetes. A meta-analysis of depression and diabetes onset showed that the presence of depressive symptoms increased the risk of developing diabetes by 37%.3 However, the effect is bidirectional.4,5 Meta-analytic findings suggest that the comorbidity of depression and diabetes is frequent: approximately one third of diabetic patients report symptoms of depression, and a smaller group, about 10% of diabetic patients, meet the criteria for clinical depression.6 In diabetes care settings the recognition rate of depression in diabetic patients is disappointingly low, ranging between 20% and 50%.7 Even in more specialized diabetes care settings approximately 50% of depressed diabetic patients remain undetected (Fig. 16.1).8–10 Thus, there are strong and compelling arguments in favor of depression screening in diabetes, and such screening is also recommended by several guidelines for diabetes care. However, there are also arguments against depression screening. Studies analyzing the effectiveness of depression screening in primary care settings do not uniformly support its large-scale implementation.11 Increasingly, depression screening in different medical conditions must be justified with regard to its effectiveness and its ethical and clinical implications, and it must be specified whether routine screening or more selective case-finding is warranted.12 Screening for depression potentially exposes both false positives and true positives (cases that would otherwise have gone unrecognized) to stigmatization and potential discrimination by health insurance companies or employers. Thus, the potential benefits of screening for a specific condition have to be balanced against its disadvantages. The U.K. National Screening Committee specified criteria for screening that should help to ensure that any screening program does more good than harm.13 It established criteria pertaining to the condition (it should be a major health problem), the screening tests (they should have sufficient screening performance), the treatment options (they should be available for those detected), and the screening program (it should be of proven benefit).
This chapter will analyze the rationale for depression screening in diabetic patients according to these broad criteria.
Figure 16.1. Detection rates of depression in diabetic patients. (Stacked bar chart of the percentage of depressed patients detected vs. not detected in the studies of Rubin (7), Pouwer (9), Hermanns (8), and Katon (10); two of the bars refer to subthreshold depression.)
1. Depression in Diabetes is a Major Health Problem
The relevance of depression in diabetes can be demonstrated with regard to the frequency of depression in diabetes and its impact on the prognosis, quality of life, and healthcare costs of diabetic patients.
Prevalence of Depression in Diabetes
Depression is a frequent comorbid condition in diabetes. A meta-analysis demonstrated that 31.0% of diabetic patients described themselves as having elevated depressive symptoms, compared with 14% of nondiabetic subjects. Depression diagnosed by mental health specialists occurred in 11.4% of diabetic patients, compared with 5.0% of nondiabetic subjects. Minor and subsyndromal depressions are about twice as common as major depression in diabetes.6 Of 100 unselected diabetic patients, approximately 10 to 12 meet the diagnostic criteria for clinical depression and a further 20 suffer from mild or subthreshold depression.
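The "further 20" can be reconstructed from the meta-analytic rates above with a single subtraction. This worked example assumes (an assumption of this sketch, not a statement of the meta-analysis) that patients who meet interview criteria are also counted among those reporting elevated symptoms:

```latex
\underbrace{31.0\%}_{\text{elevated symptoms}} \;-\; \underbrace{11.4\%}_{\text{clinical depression}} \;=\; 19.6\% \;\approx\; 20 \text{ of every } 100 \text{ diabetic patients}
```

Under that assumption, roughly 20 subthreshold cases against 10 to 12 clinical cases is in line with minor and subsyndromal depression being about twice as common as major depression.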
Prognostic Relevance of Depression
The comorbidity of depression and diabetes must be taken seriously because of its implications for the prognosis and quality of life of affected diabetic patients.14–17 There is evidence that depression may impair effective diabetes self-care. Diabetic patients with higher depression scores showed higher rates of nonadherence to oral antidiabetic medication, less exercise, a less healthy diet, and less glucose monitoring.18,19 A meta-analysis found a significant association between depression and glycemic control; a subanalysis showed that this relationship was even stronger when only patients who fully met the diagnostic criteria for depression were analyzed.17 Depression in people with diabetes is also a risk factor for late complications and functional disability. A prospective study with 7 years of follow-up demonstrated that the hazard of macrovascular complications was more than three times higher if depressive symptoms were reported at baseline.20 The hazard ratios for microvascular complications and functional disability were 8.6 and 6.9, respectively, if minor depression was present. There was only a small difference between mild and more severe depression with regard to the risk of late complications.21,22 An epidemiologic analysis of the NHANES II study also revealed that depression is a risk factor for increased mortality in diabetic patients: depressed diabetic patients had a mortality rate 54% higher than nondepressed diabetic patients.23 Katon and colleagues24 found a relative risk for mortality of 1.67 in diabetic patients with minor depression and a hazard ratio of 2.67 in diabetic patients with major depression. In summary, there seems to be no safe threshold for depression, as even mild depressive symptoms appear to have a negative impact on prognosis.
Depression and Quality of Life
Diabetes care guidelines define an optimal quality of life as one of the primary objectives of diabetes therapy. Depression in diabetes not only has adverse somatic consequences but also impairs quality of life in diabetic patients (Table 16.1). According to an Australian survey, depression in diabetic patients was associated with poorer quality of life in all eight quality-of-life dimensions (physical functioning, role limitations due to physical health, bodily pain, general health, vitality, social functioning, role limitations due to emotional health, and mental health).25
Table 16.1. Quality of Life and Depression in Diabetes.
Functional limitation | No major depression and no diabetes (%) | Major depression only (%) | Diabetes only (%) | Diabetes and major depression (%) | P value
Difficulty walking 12 city blocks | 10.9 | 26.7 | 39.0 | 60.2 | <0.0001
Difficulty climbing 10 steps | 8.1 | 20.1 | 30.7 | 51.7 | <0.0001
Difficulty standing on feet for 2 h | 13.2 | 30.1 | 40.0 | 61.6 | <0.0001
Difficulty sitting for 2 h | 7.3 | 23.9 | 17.1 | 35.9 | <0.0001
Difficulty stooping, bending, or kneeling | 15.9 | 35.3 | 44.0 | 59.7 | <0.0001
Difficulty reaching over head | 5.5 | 18.0 | 17.0 | 32.0 | <0.0001
Difficulty grasping small objects | 5.4 | 14.4 | 18.9 | 30.9 | <0.0001
Difficulty lifting 10 pounds | 7.1 | 19.5 | 25.4 | 49.5 | <0.0001
Difficulty pushing or pulling heavy objects | 10.2 | 26.2 | 32.2 | 55.5 | <0.0001
Difficulty shopping | 5.1 | 17.3 | 20.5 | 39.8 | <0.0001
Difficulty visiting friends | 4.0 | 16.8 | 15.6 | 34.7 | <0.0001
Difficulty watching television or listening to music to relax | 2.1 | 12.2 | 7.6 | 22.3 | <0.0001
Overall functional disability | 24.5 | 51.3 | 58.1 | 77.8 | <0.0001
Source: Egede LE. Effects of depression on work loss and disability bed days in individuals with diabetes. Diabetes Care. 2004;27:1751–1753.
Clinical depression and depressive symptoms (including subsyndromal depression) are also associated with higher diabetes-related distress. In a clinical survey, only 14.7% of patients with low or no depression reported a high level of diabetes-related distress, whereas 56.3% of patients with subthreshold depression and 73.6% of those with clinical depression did so.8 Fisher and colleagues26 found a strong association between the presence of subthreshold depression, a high level of diabetes-related distress, and metabolic as well as behavioral risk factors. Although the causal relationships among depression, diabetes-related distress, and impaired quality of life are not fully understood, there seems to be a syndrome of depressed mood, reduced well-being, and diabetes-related distress that is a major barrier to attaining an optimal quality of life, the ultimate treatment goal in diabetes.
Socioeconomic Aspects of Depression in Diabetes
Depression in diabetes also has socioeconomic implications. In addition to the poorer outcome of depressed diabetic patients, the costs associated with their treatment are significantly higher than those for nondepressed diabetic patients.27,28 There is evidence that glycemic control is more impaired, diabetes self-care is more reduced, and the likelihood of adverse outcomes such as functional disability, comorbidities, higher healthcare costs, or even mortality is significantly increased in depressed diabetic patients. These adverse outcomes of depressed mood are evident in patients with minor or subthreshold depression as well as in patients with clinical depression.
2. Screening Tests
Screening tests for depression in diabetes should have sufficient screening performance, but they should also be simple to administer and acceptable to both healthcare professionals and patients.13
Screening Performance
There are many validated questionnaires available to screen for depression or to assess depressive symptoms. In principle, any depression scale used for screening in the general population could be used in diabetic patients. Evidence specific to diabetic patients is available for the Beck Depression Inventory (BDI)8,29 and the Center for Epidemiological Studies Depression Scale (CES-D).8 The 9-item Patient Health Questionnaire (PHQ-9) has been used in diabetic patients,19 but its screening performance has not yet been assessed in this population. The WHO-5 questionnaire30 and the Problem Areas in Diabetes Questionnaire (PAID),8 which assesses diabetes-related distress, have also been used for depression screening in diabetic patients. These two questionnaires measure a broader aspect of emotional status (psychological well-being and diabetes-related distress, respectively) than the depression-specific questionnaires. The screening performance of a questionnaire is evaluated by its sensitivity and specificity, both of which depend on the cutoff score chosen to define a positive screening result. For clinical practice, the positive (PPV) and negative (NPV) predictive values are also of considerable interest, since the PPV tells the healthcare professional what proportion of patients who screen positive are truly depressed. A low PPV means that a high proportion of positive screens are false positives. Table 16.2 summarizes the performance of the above-mentioned instruments in detecting clinical depression.
Table 16.2. Screening Performance of Depression Screening Tools Used in Diabetic Patients.
Questionnaire (reference) | Cutoff | Sensitivity | Specificity | Positive predictive value | Negative predictive value
BDI (41) | 12 | 90.0% | 84% | 59% | 97%
BDI (8) | 11 | 87% | 81% | 66% | 83%
CES-D (8) | 23 | 79.2% | 89% | 54% | 96%
WHO-5 (30) | ≤12 | 100% | 78% | 45% | 100%
PAID (8) | 38 | 81.1% | 74% | 34% | 96%
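The relationships among these four measures can be made concrete with a 2 × 2 table that cross-tabulates a questionnaire result against a diagnostic interview. The Python sketch below uses hypothetical counts (500 patients, 100 of them depressed), chosen so that sensitivity and specificity resemble the BDI (8) row of Table 16.2; the counts, the class name `TwoByTwo`, and its property names are illustrative assumptions, not data from any cited study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Questionnaire result cross-tabulated against a diagnostic interview."""
    true_pos: int   # screen positive, depressed on interview
    false_pos: int  # screen positive, not depressed
    false_neg: int  # screen negative, depressed
    true_neg: int   # screen negative, not depressed

    @property
    def sensitivity(self) -> float:
        # share of depressed patients that the questionnaire flags
        return self.true_pos / (self.true_pos + self.false_neg)

    @property
    def specificity(self) -> float:
        # share of nondepressed patients that the questionnaire clears
        return self.true_neg / (self.true_neg + self.false_pos)

    @property
    def ppv(self) -> float:
        # share of positive screens that are truly depressed
        return self.true_pos / (self.true_pos + self.false_pos)

    @property
    def npv(self) -> float:
        # share of negative screens that are truly not depressed
        return self.true_neg / (self.true_neg + self.false_neg)


# Hypothetical sample of 500 patients (100 depressed); not from any cited study.
table = TwoByTwo(true_pos=87, false_pos=76, false_neg=13, true_neg=324)
print(f"sens={table.sensitivity:.2f} spec={table.specificity:.2f} "
      f"PPV={table.ppv:.2f} NPV={table.npv:.2f}")
```

Even with 87% sensitivity and 81% specificity, almost half of the positive screens in this hypothetical sample are false positives (PPV of about 0.53). That is the practical meaning of a PPV near 50%.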
Depression questionnaires such as the BDI and CES-D showed high sensitivity and specificity. Their PPVs were higher than 50% and their NPVs higher than 80%. The less depression-specific questionnaires, WHO-5 and PAID, had sensitivity comparable to that of the depression questionnaires but lower specificity. Because these questionnaires measure a broader range of emotional states (psychological well-being and diabetes-related distress), they may yield lower specificity and rather low PPVs (less than 50%). Figure 16.2 summarizes the screening performance of the different tools, expressed as the positive likelihood ratio. As expected, the depression-specific questionnaires had the highest positive likelihood ratios, followed by the less depression-specific questionnaires (well-being and diabetes-related distress). The advantages of all these questionnaires are that they are easy to administer and to evaluate. Furthermore, all of them can not only screen for clinical depression but also quantify subthreshold emotional problems. Well-being and diabetes-related distress questionnaires may match the expectations of diabetic patients seeking medical treatment better than depression questionnaires do, because patients may expect to be asked about diabetes-related problems or well-being rather than about depressed feelings and suicidal intentions; however, this advantage is offset by their somewhat lower screening performance. The low screening performance of verbally asked questions31 may be explained by diabetic patients’ reduced readiness to speak about emotional problems when they are asked directly about depressed feelings.
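The positive likelihood ratio in Figure 16.2 follows directly from sensitivity and specificity: LR+ = sensitivity / (1 − specificity), that is, how many times more likely a positive result is in a depressed patient than in a nondepressed one. A minimal Python sketch using the values as printed in Table 16.2 (the function and dictionary names are ours):

```python
# Sensitivity and specificity as printed in Table 16.2 (reference numbers in parentheses).
TABLE_16_2 = {
    "BDI (41)": (0.900, 0.84),
    "BDI (8)": (0.87, 0.81),
    "CES-D (8)": (0.792, 0.89),
    "WHO-5 (30)": (1.00, 0.78),
    "PAID (8)": (0.811, 0.74),
}


def positive_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """LR+ = P(positive | depressed) / P(positive | not depressed)."""
    if specificity >= 1.0:
        raise ValueError("LR+ is undefined when specificity is 100%")
    return sensitivity / (1.0 - specificity)


for name, (sens, spec) in TABLE_16_2.items():
    print(f"{name:11s} LR+ = {positive_likelihood_ratio(sens, spec):.1f}")
```

The computed values for CES-D (8), WHO-5 (30), and PAID (8) (7.2, 4.5, and 3.1) match Figure 16.2; BDI (8) comes out at 4.6 rather than 4.7, presumably because the published sensitivity and specificity are rounded.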
Acceptability of Screening
The acceptability of screening for patients is determined by the time needed to complete the questionnaire and the complexity of the questions.
Figure 16.2. Positive likelihood ratio of different screening tests. (Bar chart; y-axis: positive likelihood ratio. CES-D (8): 7.2; BDI (29): 5.6; BDI (8): 4.7; WHO-5 (30): 4.5; PAID (8): 3.1; verbally asked questions (31): 2.7.)
For health professionals, the time needed to score and interpret the result is also important. The above-mentioned questionnaires take 1 to 5 minutes to complete, so they could be completed in the waiting room. Some people may find questionnaires that measure the quantity and intensity of negative emotions difficult to complete. Ideally, the purpose of the questionnaire and its findings would be discussed individually with each patient. The pros and cons of screening questionnaires are summarized in Textbox 16.1. From a clinician’s perspective, the discriminatory value of a screening tool, in particular its PPV, plays a decisive role.32 If the PPV is low, the healthcare professional has to deal with numerous false positives. Where the prevalence of a condition is low, most tests will yield a low PPV (Fig. 16.3). There are several possible solutions to the low-PPV problem. One is to add a second screen for those who initially screen positive (Appendix Table 4). A second option is to choose a higher cutoff, and a third is to screen only high-risk patients, among whom the prevalence of depression is by definition higher. The likelihood of depression is not equally distributed among diabetic patients: risk factors such as female gender, lack of social support, younger age, and low socioeconomic status are associated with a higher risk of depression.32 In diabetic patients there are additional risk factors showing a substantial association with clinical or subthreshold depression: late complications (especially neuropathy and, in men, erectile dysfunction), the need for insulin therapy in type 2 diabetes, poor glycemic control, and hypoglycemia problems.15–17,33
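The first remedy, a second screen applied only to those who screen positive, can be sketched by counting a patient as positive only when both tests are positive. The Python sketch below assumes that the two tests err independently given true depression status (conditional independence), which is often optimistic for two questionnaires that measure similar symptoms. The sensitivities and specificities are hypothetical, and the 11% prevalence is taken loosely from the interview-based rate cited earlier in this chapter.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Share of positive screens that are true cases (Bayes' theorem)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def serial_positive(sens1: float, spec1: float,
                    sens2: float, spec2: float) -> tuple[float, float]:
    """Combined sensitivity and specificity when a positive result requires
    BOTH tests to be positive, assuming conditional independence."""
    combined_sens = sens1 * sens2
    combined_spec = 1.0 - (1.0 - spec1) * (1.0 - spec2)
    return combined_sens, combined_spec


PREVALENCE = 0.11        # roughly the interview-based rate in diabetes
FIRST = (0.90, 0.80)     # hypothetical first-stage questionnaire
SECOND = (0.85, 0.85)    # hypothetical second-stage instrument

single_ppv = ppv(*FIRST, PREVALENCE)
both_sens, both_spec = serial_positive(*FIRST, *SECOND)
two_stage_ppv = ppv(both_sens, both_spec, PREVALENCE)

print(f"one stage:  sens={FIRST[0]:.3f} PPV={single_ppv:.2f}")
print(f"two stages: sens={both_sens:.3f} spec={both_spec:.2f} PPV={two_stage_ppv:.2f}")
```

Under these assumptions the PPV roughly doubles (from about 0.36 to about 0.76), but sensitivity falls from 0.90 to about 0.765. The loss of true cases is the main price of two-stage screening.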
TextBox 16.1. Pros and Cons of Screening vs. Routine Detection
Pros
• Easy to administer
• Easy to evaluate
• Time saving if done during waiting time
• Measures of subclinical depression, well-being, or diabetes-related distress
Cons
• Requires literacy of the patient
• Scoring is sometimes complex (templates needed)
• Feedback and discussion of test results require communication skills
• Cutoff scores sometimes have to be adapted to the setting
Figure 16.3. Positive predictive value and population prevalence. (Stacked bar chart of true depressed vs. false positive cases as shares of all positive screens; y-axis: positive predictive value; x-axis: population prevalence of a condition, from 30% down to 5%.)
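Figure 16.3 follows from Bayes' theorem: for a fixed sensitivity and specificity, the share of positive screens that are true cases falls as prevalence falls. The figure does not state the sensitivity and specificity it assumes, so the Python sketch below assumes 90% and 80%. With these assumed values the PPV drops from about 66% at 30% prevalence to about 19% at 5% prevalence.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Bayes' theorem: proportion of positive screens that are truly depressed."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


SENSITIVITY, SPECIFICITY = 0.90, 0.80  # assumed values, not taken from the chapter

prevalences = (0.30, 0.25, 0.20, 0.15, 0.10, 0.05)
ppvs = [ppv(SENSITIVITY, SPECIFICITY, p) for p in prevalences]
for p, value in zip(prevalences, ppvs):
    print(f"prevalence {p:>4.0%}: true depressed {value:6.1%}, "
          f"false positive {1 - value:6.1%}")
```

The screening test itself is identical in every row; only the population changes. This is why restricting screening to high-risk patients raises the PPV.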
3. Treatment Options
The third criterion of the U.K. National Screening Committee for the evaluation of depression screening refers to the treatment options for the screened condition.13 For the ethical evaluation of depression screening in diabetic patients, it is important that screening does not merely lead to an additional diagnosis of depression but that effective treatment options are available. An additional diagnosis of depression without an effective treatment option could stigmatize the patient and create a risk of discrimination by insurance companies or employers.12 Fortunately, treatment options are available, including nonspecific interventions such as diabetes education and counseling on diabetes-related problems, as well as more specific antidepressive treatment strategies.
Nonspecific Interventions
Diabetes education has proven effective in treating subthreshold as well as clinical depression. In two studies the rate of subthreshold depression dropped from 38% to 13% 6 months after diabetes education34 and from 28% to 18% 1 year after diabetes education.35 In randomized controlled trials that evaluated more specific treatments such as nortriptyline, fluoxetine, or cognitive–behavioral therapy in diabetic patients with major or clinical depression, diabetes education was frequently used as a “placebo treatment.” The remission rate of major depression after diabetes education was 37% in the control groups of the fluoxetine36 and cognitive–behavioral therapy37 trials and 41% in the control group of the nortriptyline trial.38 In summary, diabetes education, which provides diabetic patients with the skills and knowledge to cope better with diabetes-related challenges, can halve the rate of subthreshold depression and even reduce the rate of major depression by more than one third.
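The "halving" in this summary is an average over two fairly different results. A relative reduction is (rate before − rate after) / rate before. Applying it to the two studies cited above gives the following (a minimal Python sketch; the function name is ours):

```python
def relative_reduction(before: float, after: float) -> float:
    """Relative reduction of a rate; 0.5 means the rate was halved."""
    if before <= 0:
        raise ValueError("baseline rate must be positive")
    return (before - after) / before


# Rates of subthreshold depression before and after diabetes education
studies = {"6 months (ref. 34)": (0.38, 0.13), "1 year (ref. 35)": (0.28, 0.18)}
for label, (before, after) in studies.items():
    print(f"{label}: {relative_reduction(before, after):.0%} relative reduction")
```

The first study shows a reduction of about two thirds and the second a reduction of about one third, so "halve" is best read as a rough midpoint.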
Specific Antidepressive Interventions
Specific antidepressive treatments consist of antidepressant medication and psychotherapy. In diabetic patients the antidepressant drugs nortriptyline38 and fluoxetine36 reduced the rates and severity of major depression to less than 50%. Cognitive–behavioral therapy focuses on replacing the diabetic patient’s dysfunctional attitudes and negative cognitions with more appropriate perspectives and cognitions. In one study, cognitive–behavioral therapy led to remission of major depression in 70% of patients.37 Thus, all specific antidepressive treatments were effective in reducing depression in diabetic patients. However, only cognitive–behavioral therapy had an additional beneficial effect on glycemic control;37 fluoxetine had no additional beneficial effect on glycemic control,36 and nortriptyline led to a slight deterioration in glycemic control.38
4. Screening Program
Implementation of a large-scale screening program for depression in diabetes should be justified by high-quality randomized trials demonstrating that such a program in diabetic patients reduces the morbidity of depression.39 Furthermore, data showing the cost-effectiveness of screening would strengthen arguments for implementation of depression screening, since all screening programs will have an impact on finite healthcare resources.
Effect of Depression Screening on Morbidity
There are no meta-analytic findings based on randomized controlled trials about the effectiveness of depression screening in diabetes. We therefore have to rely on a Cochrane review of the efficacy of depression screening in primary care settings,11 which is where most diabetic patients are treated. Depression screening can identify unrecognized cases if clinicians receive selective feedback on patients with elevated depression scores. However, if there is no strategy for dealing with patients who screen positive or are identified as depressed, the effect on depression management is not substantial. In the field of diabetes, the only randomized controlled trial is the Pathways study,19 which showed a beneficial effect of a structured screening and treatment program for depression in diabetic patients. In a two-stage process, positive screening results on the PHQ-9 were confirmed by a second diagnostic procedure using the Hopkins Symptom Checklist (SCL-90). Depressed diabetic patients in the intervention group were offered a choice of antidepressant medication or problem-solving therapy. If depression did not improve within 10 to 12 weeks, the initial treatment was either intensified or changed. If patients did not respond to the intensification or treatment switch, they were referred to a specialized mental health service. This stepped-care approach was compared with a control condition in which patients were simply informed about their depression and asked to speak with their primary care physician about depression treatment. The intervention group had a significantly greater reduction of depression (–40%) than the control group (–12%). In summary, there is evidence that depression management in diabetic patients is effective. The stepped-care approach described, comprising screening, an offer of treatment options, and assessment of treatment response, has the potential to reduce the morbidity of depression in diabetes.
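The Pathways stepped-care sequence can be summarized as a short series of steps: screen, confirm, treat, review, escalate. The Python sketch below models the escalation logic only and is illustrative. The step names and the `stepped_care_path` function are hypothetical, and clinical details such as instruments, cutoffs, and the choice between medication and problem-solving therapy are deliberately left out.

```python
from enum import Enum, auto


class Step(Enum):
    SCREEN = auto()               # first-stage questionnaire (PHQ-9 in Pathways)
    CONFIRM = auto()              # second diagnostic procedure
    INITIAL_TREATMENT = auto()    # medication or problem-solving therapy
    INTENSIFY_OR_SWITCH = auto()  # no improvement at review (about 10-12 weeks)
    SPECIALIST_REFERRAL = auto()  # specialized mental health service
    CONTINUE = auto()             # improvement seen: continue current treatment
    EXIT = auto()                 # not eligible for the depression treatment pathway


def stepped_care_path(screen_positive: bool, confirmed: bool,
                      improved_at_review: list[bool]) -> list[Step]:
    """Return the steps a patient passes through.

    improved_at_review[i] records whether depression improved at the i-th
    treatment review; reviews beyond the available escalation steps are ignored.
    """
    path = [Step.SCREEN]
    if not screen_positive:
        return path + [Step.EXIT]
    path.append(Step.CONFIRM)
    if not confirmed:
        return path + [Step.EXIT]
    path.append(Step.INITIAL_TREATMENT)
    escalation = [Step.INTENSIFY_OR_SWITCH, Step.SPECIALIST_REFERRAL]
    for improved, next_step in zip(improved_at_review, escalation):
        if improved:
            return path + [Step.CONTINUE]
        path.append(next_step)
    return path


# A patient who does not improve on the first treatment but does after escalation
print([step.name for step in stepped_care_path(True, True, [False, True])])
```

A plain list of steps keeps the logic transparent. A real care-management system would also need to record timing, treatment choices, and symptom scores at each review.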
Cost-Effectiveness of Depression Screening and Intervention
Healthcare resources are finite, so the cost-effectiveness of depression screening in diabetes is a matter of debate. In the Pathways study, a cost-effectiveness analysis40 showed that within 2 years the intervention group gained 61 additional depression-free days per patient. The cost analysis showed that this stepped-care approach led to an annual net reduction of $314 in total healthcare costs, even taking into account the costs of depression screening ($27) and antidepressant treatment ($545). In summary, there are promising results suggesting that depression screening implemented through a stepped-care approach is effective in reducing both depression and healthcare costs, but further research is clearly required.
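In health economics, a strategy that both lowers costs and improves outcomes is conventionally called "dominant": there is no cost-effectiveness trade-off to weigh. The Python sketch below classifies a strategy from the signs of its incremental cost and incremental effect. Applied to the Pathways figures, it uses only the signs, because the $314 is an annual cost difference whereas the 61 depression-free days accrued over 2 years. The function name and labels are illustrative.

```python
def classify_strategy(delta_cost: float, delta_effect: float) -> str:
    """Compare a new strategy with usual care.

    delta_cost:   change in cost per patient (negative = savings)
    delta_effect: change in health effect per patient (positive = better)
    """
    if delta_cost <= 0 and delta_effect >= 0:
        # cheaper (or equal) and at least as effective
        return "dominant" if (delta_cost < 0 or delta_effect > 0) else "equivalent"
    if delta_cost >= 0 and delta_effect <= 0:
        # more expensive (or equal) and no more effective
        return "dominated"
    # Remaining quadrants involve a trade-off; delta_effect cannot be zero here.
    return f"trade-off: ICER = {delta_cost / delta_effect:.2f} per unit of effect"


# Pathways: net cost change of -$314 per year, +61 depression-free days
print(classify_strategy(-314, 61))
```

An incremental cost-effectiveness ratio (ICER) is only needed when a strategy costs more and does more good, or costs less and does less good.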
5. Conclusions for Clinical Practice
The analysis of depression screening in diabetes according to the four criteria of the U.K. National Screening Committee showed that depression in diabetes is a major healthcare problem warranting attention. Screening tests are available, although their accuracy could be improved, and treatment options are available. However, the results of the Cochrane meta-analysis of depression screening in primary care settings indicate that depression screening needs to be implemented within a structured approach with defined treatment options and monitoring of treatment response; otherwise, depression screening alone will not reduce morbidity. A stepped-care approach comprising verification of positive screening results, treatment options, assessment of response to treatment, and adaptation of treatment showed favorable results with regard to reducing depression, as well as to cost-effectiveness and even mortality. For the clinical management of depression in diabetic patients, the flowchart in Figure 16.4 could provide some guidance. Depression screening for diabetic patients seems to be appropriate if there is a clinical impression of low mood or if risk factors are present (eg, history of depression, late complications). It is not yet clear whether routine screening of an unselected population with diabetes is worthwhile. Treatment as usual seems indicated if depression screening is negative and the negative predictive value (NPV) is high. For those with positive screening results, a follow-up assessment and a clinical interview for clinical depression should be performed. If the depression screening is positive but the criteria for clinical depression are not met, it seems appropriate to offer help for any related concerns. If diabetes-related problems are involved, the patient may benefit from improved diabetes management, which reduces diabetes-related distress. The patient's psychological well-being should be monitored to find out whether depressive symptoms improve, deteriorate, or remain stable.
Figure 16.4. Flow diagram of depression management in diabetes care.
Depression and poor quality of life are common in diabetic patients. Since subthreshold disorders, clinical depression, and distress all have a negative impact on quality of life as well as on the course of diabetes, depressive symptoms deserve attention in clinical care. The timely identification of patients with subthreshold or clinical depression enables effective management of this comorbidity in diabetes. This is important so that these patients can achieve the ultimate goals of diabetes therapy: an optimal quality of life and the fewest possible diabetes complications.
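The decision flow recommended in these conclusions (and summarized in Figure 16.4) can be sketched as a routine that returns the suggested actions for one patient. This is an illustrative encoding only. The function and argument names are hypothetical, and a negative screen is assumed to come from an instrument with a high NPV. The "continue and monitor" branch after a sufficient response is an assumption based on the figure labels rather than on the text.

```python
from typing import Optional


def depression_care_actions(
    screen_positive: bool,
    meets_criteria: bool = False,
    diabetes_related_problems: bool = False,
    response_sufficient: Optional[bool] = None,
) -> list[str]:
    """Return suggested actions; response_sufficient is None until response is assessed."""
    if not screen_positive:
        return ["treatment as usual"]
    actions = ["follow-up assessment and clinical interview (differential diagnosis, eg, anxiety, dementia)"]
    if not meets_criteria:
        # Positive screen without clinical depression
        actions.append("offer help for related concerns")
        if diabetes_related_problems:
            actions.append("improve diabetes management to reduce diabetes-related distress")
        actions.append("monitor psychological well-being")
        return actions
    actions.append("specific antidepressive treatment (eg, medication, CBT)")
    if response_sufficient is False:
        actions.append("increase dosage or change treatment")
    elif response_sufficient:
        actions.append("continue treatment and monitor")
    return actions


print(depression_care_actions(True, meets_criteria=False, diabetes_related_problems=True))
```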
References
1. Willis T. Pharmaceutice rationalis sive diatriba de medicamentorum operationibus in humano corpore. Oxford, 1675.
2. Alexander F. Psychosomatic medicine. New York, Norton, 1950.
3. Knol MJ, Twisk JW, Beekman AT, et al. Depression as a risk factor for the onset of type 2 diabetes mellitus. A meta-analysis. Diabetologia. 2006;49:837–845.
4. Hermanns N, Kubiak T, Kulzer B, et al. Emotional changes during experimentally induced hypoglycaemia in type 1 diabetes. Biol Psychol. 2003;63:15–44.
5. Hermanns N, Scheff C, Kulzer B, et al. Association of glucose levels and glucose variability with mood in type 1 diabetic patients. Diabetologia. 2007;50:930–933.
6. Anderson RJ, Freedland KE, Clouse RE, et al. The prevalence of comorbid depression in adults with diabetes: A meta-analysis. Diabetes Care. 2001;24:1069–1078.
7. Rubin RR, Ciechanowski P, Egede LE, et al. Recognizing and treating depression in patients with diabetes. Curr Diab Rep. 2004;4:119–125.
8. Hermanns N, Kulzer B, Krichbaum M, et al. How to screen for depression and emotional problems in patients with diabetes: comparison of screening characteristics of depression questionnaires, measurement of diabetes-specific emotional problems and standard clinical assessment. Diabetologia. 2006;49:469–477.
9. Pouwer F, Beekman AT, Lubach C, et al. Nurses' recognition and registration of depression, anxiety and diabetes-specific emotional problems in outpatients with diabetes mellitus. Patient Educ Couns. 2006;60:235–240.
10. Katon WJ, Simon G, Russo J, et al. Quality of depression care in a population-based sample of patients with diabetes and major depression. Med Care. 2004;42:1222–1229.
11. Gilbody S, House AO, Sheldon TA. Screening and case finding instruments for depression. Cochrane Database Syst Rev. 2005:CD002792.
12. Gilbody S, Sheldon T, Wessely S. Should we screen for depression? BMJ. 2006;332:1027–1030.
13. The UK's National Screening Committee's criteria for appraising the viability, effectiveness and appropriateness of a screening programme. 2003. Available at: http://www.nsc.nhs.uk/pdfs/criteria.pdf.
14. Peyrot M. Depression: A quiet killer by any name. Diabetes Care. 2003;26:2952–2953.
15. Peyrot M, Rubin RR. Levels and risks of depression and anxiety symptomatology among diabetic adults. Diabetes Care. 1997;20:585–590.
16. de Groot M, Anderson RJ, Freedland KE, et al. Association of depression and diabetes complications: A meta-analysis. Psychosom Med. 2001;63:619–630.
17. Lustman PJ, de Groot M, Anderson RJ, et al. Depression and poor glycemic control. Diabetes Care. 2000;23:934–942.
18. Ciechanowski PS, Katon WJ, Russo JE. Depression and diabetes: impact of depressive symptoms on adherence, function, and costs. Arch Intern Med. 2000;160:3278–3285.
19. Katon WJ, Von Korff M, Lin EH, et al. The Pathways Study: a randomized trial of collaborative care in patients with diabetes and depression. Arch Gen Psychiatry. 2004;61:1042–1049.
20. Black SA, Markides KS, Ray LA. Depression predicts increased incidence of adverse health outcomes in older Mexican Americans with type 2 diabetes. Diabetes Care. 2003;26:2822–2828.
21. Egede LE. Diabetes, major depression, and functional disability among U.S. adults. Diabetes Care. 2004;27:421–428.
22. Pouwer F, Beekman ATF, Nijpels G, et al. Rates and risks for co-morbid depression in patients with type 2 diabetes mellitus: results of a community based study. Diabetologia. 2003;46:892–898.
23. Zhang X, Norris SL, Gregg EW, et al. Depressive symptoms and mortality among persons with and without diabetes. Am J Epidemiol. 2005;161:652–660.
24. Katon W, Cantrell CR, Sokol MC, et al. Impact of antidepressant drug adherence on comorbid medication use and resource utilization. Arch Intern Med. 2005;165:2497–2503.
25. Goldney RD, Phillips PJ, Fisher LJ, et al. Diabetes, depression, and quality of life: a population study. Diabetes Care. 2004;27:1066–1070.
26. Fisher L, Skaff MM, Mullan JT, et al. Clinical depression versus distress among patients with type 2 diabetes: Not just a question of semantics. Diabetes Care. 2007;30:542–548.
27. Egede LE. Effects of depression on work loss and disability bed days in individuals with diabetes. Diabetes Care. 2004;27:1751–1753.
28. Egede LE, Zheng D, Simpson K. Comorbid depression is associated with increased health care use and expenditures in individuals with diabetes. Diabetes Care. 2002;25:464–470.
29. Lustman PJ, Clouse RE, Griffith LS, et al. Screening for depression in diabetes using the Beck Depression Inventory. Psychosom Med. 1997;59:24–31.
30. Awata S, Bech P, Yoshida S, et al. Reliability and validity of the Japanese version of the World Health Organization-Five Well-Being Index in the context of detecting depression in diabetic patients. Psychiatry Clin Neurosci. 2007;61:112–119.
31. Arroll B, Khin N, Kerse N. Screening for depression in primary care with two verbally asked questions: cross-sectional study. BMJ. 2003;327:1144–1146.
32. Carter RM, Wittchen HU, Pfister H, et al. One year prevalence of subthreshold and threshold DSM-IV generalized anxiety disorder in a nationally representative sample. Depress Anxiety. 2001;13:78–88.
33. Hermanns N, Kulzer B, Krichbaum M, et al. Affective and anxiety disorders in a German sample of diabetic patients: prevalence, comorbidity and risk factors. Diabet Med. 2005;22:293–300.
34. Peyrot M, Rubin RR. Persistence of depressive symptoms in diabetic adults. Diabetes Care. 1999;22:448–452.
35. Hermanns N, Kulzer B, Kubiak T, et al. Course of depression in type 2 diabetes [abstract]. Diabetes. 2004;53:A16.
36. Lustman PJ, Freedland KE, Griffith LS, et al. Fluoxetine for depression in diabetes: a randomized double-blind placebo-controlled trial. Diabetes Care. 2000;23:618–623.
37. Lustman PJ, Griffith LS, Freedland KE, et al. Cognitive-behavior therapy for depression in type 2 diabetes mellitus: a randomized, controlled trial. Ann Intern Med. 1998;129:613–621.
38. Lustman PJ, Griffith LS, Clouse RE, et al. Effects of nortriptyline on depression and glycemic control in diabetes: results of a double-blind, placebo-controlled trial. Psychosom Med. 1997;59:241–250.
39. Jones LE, Doebbeling CC. Depression screening disparities among veterans with diabetes compared with the general veteran population. Diabetes Care. 2007;30:2216–2221.
40. Simon GE, Katon WJ, Lin EH, et al. Cost-effectiveness of systematic depression treatment among people with diabetes mellitus. Arch Gen Psychiatry. 2007;64:65–72.
41. Lustman PJ, Clouse RE, Griffith LS, et al. Screening for depression in diabetes using the Beck Depression Inventory. Psychosom Med. 1997;59:24–31.
17 COMMENTARY AND INTEGRATION: IS IT TIME TO ROUTINELY SCREEN FOR DEPRESSION IN CLINICAL PRACTICE?
James C. Coyne
We were pleased that we were able to convince such talented authors to contribute chapters to this volume. We hope that their contributions will serve to redefine key issues in the implementation of screening programs for depression in clinical settings. The chapters are quite varied but are notable for their balanced, evidence-based recommendations and for their skepticism about introducing screening into routine care unless there is a substantial infusion of resources. Taken together, the chapters provide a foundation for critiquing screening programs as they are currently being implemented. Screening has become the most commonly adopted enhancement of care for depression, even if questions can be raised about the fidelity with which it is being implemented. Yet the enthusiasm for screening is not based on the accumulation of compelling new evidence, but rather on a reframing of the question of its efficacy and of the evidence mustered to answer it. The crucial question has shifted from “Does routine screening improve patient outcomes?” to “Can screening be used to improve outcomes when there is a substantial effort made to ensure adequate treatment and follow up?”1 This important difference has been downplayed in endorsements of screening. And yet, stand-alone screening programs are simply not effective in improving the management of depression in primary care (see Chapter 7). Moreover, including screening as a component of more comprehensive enhancements of care may not be necessary to improve outcomes.
One can readily find a basis in this volume for questioning the wisdom of stand-alone screening initiatives and for raising doubts about whether routine screening is acceptable and sustainable in non-mental-health medical settings. I will highlight these points while providing a more general commentary on the preceding chapters. One goal is to alert readers tempted by enthusiasm about screening to some frustrations and disappointments that await them if they proceed with a screening program without additional resources. I acknowledge that I am going beyond the conclusions of many chapters. However, almost all of them limit their endorsement of screening to settings where supports are in place for absorbing the effects of screening and ensuring that it has its intended effect. Unfortunately, such settings are far less common than presumed. So the question becomes, “What are the implications of routine screening being implemented without such support?” In the preface, Katon notes the inadequacy of routine care for depression. Most primary care patients discontinue antidepressant treatment shortly after it is initiated.2 Only 20% to 30% of depressed persons treated exclusively in primary care settings receive adequate care and follow-up.3 Berndt and colleagues4 estimate that 40% of depressed patients receive treatment with little or no benefit over what would be obtained by remaining on a waiting list, which represents about 20% of the total cost of treating depression. Katon also notes the underappreciated problem of overtreatment of depression, that is, the prescribing of treatment to patients who are not depressed or not likely to benefit (see also Chapters 3 and 5).
Over the past 20 years, rates of treatment of depression have doubled to quadrupled in most Westernized countries, largely due to increases in the prescription of antidepressants to persons who are mildly depressed or not at all depressed.5–7 Increasingly, rates of antidepressant prescription equal or exceed the estimated prevalence of depression,8 even though the most depressed persons in the community still go untreated.9 Zimmerman and Mitchell, in Chapter 1, question the validity of the diagnosis of major depression. There is no gold standard for diagnosis, and arbitrary decisions are involved in presumptive gold standards such as diagnosis based on a semi-structured interview using formal diagnostic criteria. They note often-overlooked differences between DSM-IV and ICD-10 criteria. In DSM-IV, major depression requires five symptoms. The more nuanced ICD-10 criteria distinguish between mild depression, requiring only four symptoms, and moderate depression, requiring six symptoms. U.S. practice guidelines10 do not distinguish degrees of severity when recommending that a diagnosis of major depression indicates a need for treatment. This contrasts with U.K. recommendations, which encourage watchful waiting and nonpharmacologic intervention for mild and moderate depression.
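The practical effect of the different thresholds quoted above can be shown with a small sketch. It applies only the symptom counts given in the text. The real criteria also involve core symptoms, duration, and exclusions, which this hypothetical `severity_labels` function deliberately ignores.

```python
def severity_labels(n_symptoms: int) -> tuple[bool, str]:
    """Simplified symptom-count thresholds as quoted in the text.

    Returns (meets DSM-IV major depression count, ICD-10 severity label).
    Core-symptom, duration and exclusion rules are not modeled.
    """
    dsm_iv_major = n_symptoms >= 5
    if n_symptoms >= 6:
        icd10 = "moderate or more severe"
    elif n_symptoms >= 4:
        icd10 = "mild"
    else:
        icd10 = "below threshold"
    return dsm_iv_major, icd10


for n in range(3, 8):
    major, icd = severity_labels(n)
    print(f"{n} symptoms: DSM-IV major depression={major}, ICD-10 {icd}")
```

Under these counts, a patient with five symptoms meets DSM-IV major depression but is only mildly depressed by ICD-10. U.S. guidelines would point such a patient toward treatment, whereas U.K. recommendations would favor watchful waiting.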
Zimmerman and Mitchell propose that the justification for a diagnosis lies in its identifying “meetable unmet needs.” Does major depression satisfy such a criterion? Patten11 recently raised the question of whether the diagnosis of major depression is overinclusive as an indicator of addressable clinical need, singling out community-based studies that used lay interviews and that produced high estimates of the prevalence of depression and low rates of its treatment (see also Brugha and colleagues12). The somewhat arbitrary nature of the criteria for depression sets the stage for psychiatrists not relying on them in any systematic fashion when interviewing patients and making diagnoses. Psychiatrists may be prone to making invidious comparisons between their own and primary care physicians' diagnostic skills. However, Zimmerman and Mitchell show that psychiatrists typically inquire only about depressed mood and not about anhedonia, and that 90% of psychiatrists do not use formal criteria for case identification or assessment of severity. Mitchell, in Chapter 2, provides a historical overview of existing mood scales. His list is exhaustive, but only a small handful of these scales are in wide use. He notes that scales can be applied to the separate tasks of screening, diagnosing, and monitoring clinical improvement. While it is tempting to expect a single scale to perform all of these tasks well, that expectation is unrealistic. One issue that arises in evaluating screening scales is whether it is an advantage for scale items to conform closely to diagnostic criteria. Presumably a scale such as the nine-item Patient Health Questionnaire (PHQ-9), which is directly modeled on such criteria, should be more efficient, but that has not generally proven to be the case.
Yet such scales, originally designed as screening instruments, are increasingly being promoted and accepted both as diagnostic instruments suitable for making treatment decisions13 and as the gold standard against which physician detection of depression is compared.14 However, such scales do not consider exclusion criteria for major depression. When administered in a self-report format, they make no provision for answering patients' questions about what is meant by particular items, probing their responses, or asking clarifying questions. Screening instruments need to be acceptable to clinicians and patients, and this criterion in turn needs to be balanced against the validity of the instrument. Seemingly minor changes in the burden on patients completing screening instruments, or on clinicians scoring them, can make large differences in their acceptability. Bermejo and coworkers15 found that after participating in a screening study, 62.5% of primary care physicians found the PHQ-9 too long and 37.5% found it too time-consuming, even though it typically took less than 2 minutes to complete. Half of the physicians rated the PHQ-9 as an impediment to daily practice, and 75% thought it was impractical. Kessler and Wang16 report a physician's objection to screening: “You are proposing we use half or more of our appointment to ask patients a set of questions about things that they usually are not here to discuss and usually will not generate a positive finding?” Shorter scales should be more acceptable than longer ones, and the literature repeatedly suggests that ultrashort, one- or two-item scales can be sufficiently valid to serve as screens. Mitchell and Coyne17 conducted a systematic review and meta-analysis of such ultrashort screening scales. They concluded that these scales were better at ruling out the need for further assessment than at ruling it in; that is, their negative predictive value is substantially higher than their positive predictive value. Even so, despite the high negative predictive value of a two-item screen, over 25% of patients who are depressed will be missed. Bennett and colleagues18 examined ultrashort screening scales as a basis for deciding whether to administer a longer screening scale. While a small advantage was found, this strategy carries the risk that patients will decline a second questionnaire or that harried clinicians will accept the simple screen as confirmation of a diagnosis. There are inherent limitations to the performance of screening scales: there is always some trade-off between sensitivity and specificity, and between validity and acceptability. Furthermore, any existing, and perhaps any conceivable, self-report scale is going to require a clinical interview if underidentification or overidentification of depression is to be minimized. With properly cross-validated cut points, many scales appear to perform about the same.19 Consumers of the screening literature should be suspicious of claims to the contrary.
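The two-stage strategy examined by Bennett and colleagues can be reasoned about with standard serial-testing arithmetic. A patient counts as screen-positive only if both stages are positive, and a patient who declines the second questionnaire is effectively screened out. The sketch below assumes the two scales err independently given true depression status. That is an optimistic simplification, because real scales are correlated. The accuracy figures in the usage lines are hypothetical, chosen only for illustration.

```python
def two_stage_accuracy(sens1, spec1, sens2, spec2, completion=1.0):
    """Overall (sensitivity, specificity) of a serial two-stage screen.

    Only stage-1 positives are offered the stage-2 scale; `completion` is the
    fraction of them who complete it, and non-completers count as negative.
    Assumes conditional independence of the two scales' errors.
    """
    sensitivity = sens1 * completion * sens2
    false_positive_rate = (1 - spec1) * completion * (1 - spec2)
    return sensitivity, 1 - false_positive_rate


# Hypothetical figures: an ultrashort first stage followed by a longer scale.
print(two_stage_accuracy(0.90, 0.65, 0.85, 0.85))
print(two_stage_accuracy(0.90, 0.65, 0.85, 0.85, completion=0.8))
```

Serial testing buys specificity at the cost of sensitivity. Every patient who declines the second questionnaire lowers sensitivity further, which matches the risk noted in the text.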
It is quite common to find claims amounting to a home-court advantage: the cut points of an instrument favored by a particular investigator are adjusted to produce better performance in the specific sample under study than fixed, well-validated cut points on an established scale. Such findings capitalize on chance, including sampling error. A meta-analysis found that studies tailoring cut points to particular samples produce spurious estimates of the performance of instruments.20 Zimmerman and Mitchell's doubts about the interviewing practices of psychiatrists do not take primary care physicians off the hook. Mitchell, in Chapter 3, documents a persistent failure of primary care physicians to detect depression, despite campaigns and educational efforts to promote detection. Mitchell provides pooled estimates of a sensitivity of 48% and a specificity of 70% for detecting depression. Assume that the prevalence of depression in primary care is 10%, consistent with a large body of research. Figure 3.3 in Mitchell's chapter shows that at that prevalence, these pooled sensitivity and specificity figures translate into physicians correctly identifying 4.8% of patients as depressed, missing (and thus falsely reassuring) 5.2% who are depressed, correctly reassuring 60.5% that they are not depressed, and falsely identifying 29.5% as depressed when they are not. The remainder of Mitchell's chapter reviews factors associated with whether depression is detected in a primary care visit. Most primary care physicians cannot recall the formal criteria for depression, and few use formal interviews or screening instruments. Yet screening programs typically require primary care clinicians to conduct a follow-up interview with patients who screen positive to confirm a diagnosis. There are reasons to doubt that clinicians could be prodded into doing this or assisted in doing so: simple efforts at guideline implementation or educational interventions to improve physician recognition of depression are notoriously ineffective.21,22 Valenstein and associates23 investigated the effects of providing interview guides and other aids for physicians participating in an implementation of the PRIME-MD,24 which consists of a coordinated patient screening questionnaire and a Clinician Evaluation Guide (CEG). Despite the provision of support staff, only 21% of positive screens were followed up. Use of the PRIME-MD fell off sharply after the added support was withdrawn, and soon neither the questionnaire nor the CEG was being used at all. These authors provided more resources and support to the clinical setting than would likely be available in settings implementing routine screening. Even so, their results are quite consistent with other studies of stand-alone screening, which uniformly yield little or no effect on the interventions offered and no significant effect on patient depression outcomes.25–27 The authors also provide a potential insight into why there is little or no benefit to screening. Clinicians may see little need to follow up on positive screens if patients do not appear distressed, and they do not want to use any formal algorithm to determine a diagnosis when depression is suspected.
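The 2 × 2 arithmetic behind Mitchell's Figure 3.3, cited above, can be reproduced with a few lines of code. The function below (name hypothetical) splits all patients into the four cells and derives the positive and negative predictive values. With the rounded 70% specificity, the true-negative and false-positive shares come out at 63% and 27%. The 60.5% and 29.5% quoted from the figure would correspond to a specificity of about 67%, so the figure may rest on a slightly different specificity estimate. The true-positive and missed shares (4.8% and 5.2%) must sum to the 10% prevalence either way.

```python
def screening_table(prevalence: float, sensitivity: float, specificity: float) -> dict:
    """Percentage of all patients in each cell of the 2 x 2 table, plus PPV and NPV."""
    tp = 100 * prevalence * sensitivity              # depressed, detected
    fn = 100 * prevalence * (1 - sensitivity)        # depressed, missed
    tn = 100 * (1 - prevalence) * specificity        # not depressed, correctly reassured
    fp = 100 * (1 - prevalence) * (1 - specificity)  # not depressed, falsely identified
    return {"TP": tp, "FN": fn, "TN": tn, "FP": fp,
            "PPV": tp / (tp + fp), "NPV": tn / (tn + fn)}


for cell, value in screening_table(0.10, 0.48, 0.70).items():
    print(cell, round(value, 3))
```

Even with a modest specificity, the low prevalence means most physician-identified "cases" are false positives (PPV about 15%), while the NPV stays above 90%.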
Moreover, clinicians notified of patients who screened positive often preferred watchful waiting to active intervention with the mildly depressed patients who were identified.27 Thus, the likelihood that a positive screen will become an adequately treated case is lower than expected. This is illustrated in Figures AP.1-3 in the Appendix. Asked to nominate barriers to the detection of depression, primary care physicians emphasize structural and organizational issues: half of them endorsed lack of time, lack of reimbursement for depression treatment, and lack of access to specialist care. Turning to patient factors, Mitchell notes that most primary care patients who spontaneously complain of depressive symptoms are detected, and most will volunteer symptoms when asked directly. However, prominent among the reasons for nondisclosure are patients' beliefs that they can handle the symptoms on their own and that, if they need professional assistance, primary care is not the place to obtain it. Who are the currently untreated depressed primary care patients who would be identified with screening? Ethnic minorities and patients with less education are particularly likely to go undetected,28 but severity of depression is a crucial determinant of detection.29,30 Coyne and associates31 found that over half of undetected patients had one or no symptoms beyond the five required for a diagnosis and were less likely to have a past history of treatment. These results suggested that if primary care physicians were to improve their detection, they would have to become more willing to make a diagnosis on the basis of fewer symptoms and pay more attention to mild symptoms in high-functioning patients. Von Korff32 notes, “we need to be circumspect about concluding that unrecognized, undiagnosed, and untreated primary care patients with mental disorders necessarily indicate poor quality of care” (p. 295). Smith, in Chapter 4, provides an excellent review of innovations in psychometrics that can be used to improve the efficiency and validity of existing mood scales. With the development of the Rasch model as a basis for constructing and refining instrumentation, the classical test theory dictum that “longer is better” no longer holds. Smith notes that many of the barriers to routine screening for depression realizing its potential are organizational, or concern whether clinicians are able and willing to change their existing practices to ensure quality of care. However, to the extent that inadequacies of existing scales burden patients and clinicians, there is room to increase the sustainability and effectiveness of routine screening by refining the scales. As he notes, existing scales can very often be pared down to a quarter or a third of their current length with no loss in validity, and even with some improvement. Furthermore, large databanks of past patients' responses to individual items can be used to create algorithms for computerized adaptive testing tailored to new individual patients, with the next item presented to each patient determined by that patient's accumulating responses.
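The adaptive-testing idea described above can be sketched with the Rasch model. At each step, the item chosen is the one carrying the most information at the patient's current severity estimate. The estimate is then updated by expected a posteriori (EAP) scoring on a grid. This is a textbook-style illustration, not the algorithm of any particular chapter or product. The item bank, difficulties, prior, grid, and stopping rule are all hypothetical choices.

```python
import math


def p_endorse(theta: float, b: float) -> float:
    """Rasch model: probability that a person at severity theta endorses an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def information(theta: float, b: float) -> float:
    """Fisher information of a Rasch item at theta: P * (1 - P)."""
    p = p_endorse(theta, b)
    return p * (1.0 - p)


def eap(responses, grid):
    """EAP severity estimate and posterior SD, with a standard normal prior on a grid."""
    weights = []
    for t in grid:
        w = math.exp(-0.5 * t * t)  # unnormalized N(0, 1) prior
        for b, endorsed in responses:
            p = p_endorse(t, b)
            w *= p if endorsed else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)


def adaptive_screen(item_bank, answer, max_items=10, se_target=0.4):
    """Give items one at a time, always choosing the unused item most informative
    at the current estimate; stop at max_items or when the posterior SD is small."""
    grid = [i / 10 for i in range(-40, 41)]  # theta from -4.0 to 4.0
    remaining = dict(item_bank)              # item id -> difficulty
    responses, asked = [], []
    theta, se = 0.0, float("inf")
    while remaining and len(asked) < max_items and se > se_target:
        item = max(remaining, key=lambda k: information(theta, remaining[k]))
        b = remaining.pop(item)
        responses.append((b, answer(item)))
        asked.append(item)
        theta, se = eap(responses, grid)
    return theta, se, asked


# Hypothetical five-item bank and a respondent who endorses every item shown.
bank = {"q1": -2.0, "q2": -1.0, "q3": 0.0, "q4": 1.0, "q5": 2.0}
print(adaptive_screen(bank, lambda item: True, max_items=3))
```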
Demonstrating the remarkable power of adaptive testing, Gibbons and colleagues33 administered 616 items from the Mood and Anxiety Spectrum Scales (MASS) to 800 outpatients from a mood and anxiety treatment program. They used post hoc simulation to develop a computerized adaptive testing (CAT) version of the scales. On average, the 616 items were reduced by more than 95%, to 24, and the CAT version still correlated 0.95 with the original MASS. Despite the promise of new psychometric approaches for refining screening instruments, we should not be under the illusion that they can entirely overcome the instruments' limitations. Santor and Coyne34 used such methods with the CES-D in a sample of 528 primary care patients, split into a study group and a cross-validation group. A reduction of the scale from 20 items to 9 was possible, and the positive predictive value was raised by 30%. However, even with these improvements, a proportion of patients screening positive would not be found to be depressed, and a proportion of depressed patients would be missed. Gilbody and Beck, in Chapter 7, declare the conclusion of their systematic review in their title: “Implementing Screening as Part of Enhanced Care: Screening Alone is Not Enough.” The 2002 U.S. Preventive Services Task Force (USPSTF) report was pivotal in revising the recommendations concerning screening for depression. It was based on expanding the inclusion criteria to cover studies of screening in the context of more general quality improvement in depression care. The single decisive study was Wells and colleagues' Partners in Care,35 an ambitious effort involving resources such as personnel to administer and score screening instruments, training materials and academic detailing, depression management specialists, and initiatives to ensure the scheduling of follow-up appointments. It also provided consultations and training with mental health professionals, ready access to antidepressants and psychotherapy, and in-kind resources from participating practices. Gilbody and Beck note that prior to the USPSTF report, stand-alone screening programs were not recommended because of a lack of evidence for their effectiveness. These authors take up the issue of whether programmatic enhancements of care for depression are effective, and conclude that they are indeed modestly effective. However, they next identified studies relevant to answering the critical question not adequately addressed in the USPSTF report: whether screening is a necessary component of enhanced care, in terms of its added value for effectiveness. A stratified analysis of studies with and without screening revealed a slight advantage for programs that included screening, but the difference was modest. The standardized mean difference was 0.15, accounting for less than 1% of the variance in outcomes. Yet none of these studies involved randomized comparisons of screening versus the same program without screening, and reliance on screening to select patients covaried with other factors.
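The "less than 1% of the variance" figure follows from converting the standardized mean difference to a correlation. The sketch below uses the common conversion r = d / √(d² + 4), which assumes two groups of equal size. The function name is hypothetical.

```python
import math


def d_to_variance_explained(d: float) -> float:
    """Proportion of outcome variance explained by group membership.

    Converts a standardized mean difference d to a point-biserial r with
    r = d / sqrt(d**2 + 4), which assumes two equal-sized groups.
    """
    r = d / math.sqrt(d ** 2 + 4)
    return r ** 2


print(f"{d_to_variance_explained(0.15):.2%}")  # about 0.56%
```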
Meta-regression analyses of sources of heterogeneity in outcomes examined several key program elements. Three of them had greater impact than whether patients were screened or referred: whether the care manager was trained, whether this care manager received regular supervision from mental health professionals, and whether the enhanced care intervention targeted increased guideline-concordant treatment with antidepressants. Additional analyses reported elsewhere36 differentiated among outcomes of screening. They found that including screening in an enhancement of depression care had nonsignificant effects on physician recognition of depression or provision of any treatment for it, and no effect on the use of antidepressants or changes in patient depression scores. Rogers, Lerner, and Adler, in Chapter 8, discuss the use of technology to reduce the burden of screening on patients and clinicians and to improve its efficiency. At one end of the range is telephone screening by a human interviewer. Next come automated telephone response systems, in which the interviewer's voice is prerecorded and patient responses are captured by voice recognition technology or the telephone keypad. At the other end are personal digital assistants (PDAs), touchscreens, and the Internet. Rogers and colleagues also distinguish between nonadaptive and adaptive applications. Nonadaptive applications basically replace conventional pencil-and-paper screens with telephone or electronic presentations of the same items. Adaptive applications use a large bank of other patients' responses to items, and the power of computers, to tailor the selection of items presented to individual patients based on their previous item responses. Essentially, adaptive testing streamlines and individualizes screening in a way that cannot readily be accomplished with pencil-and-paper screening. There are high hopes that such technology will extend the reach and sustainability of routine screening for depression by improving its acceptability, efficiency, and accuracy. Where resources are available, clinical settings are rushing to obtain touchscreens and PDAs. Yet Rogers and colleagues note that evaluations of nonadaptive technologies have been largely limited to two questions: their acceptability to patients and clinicians, and whether their results are comparable with those of conventional pencil-and-paper screening. Evidence consistently indicates that such technologies are convenient and acceptable to patients and comparable in their results to conventional screening. They may even have advantages for some patients who find a more impersonal assessment more acceptable, and more conducive to honesty, than completing a screen in front of a clinic staff member. Adaptive testing, however, requires a large database to evaluate the performance of individual items and to develop algorithms for selecting the items to be administered in an individual screening. The requisite item banks are only now being assembled, and validated algorithms are not yet readily available for clinical applications.
Perhaps the efficiency and automated screening results afforded by technological aids can free up clinical time and other resources for assessment and treatment. The same technologies used in screening could also be used efficiently to monitor the progress of individual depressed patients and to tailor adjustments in their treatment to maximize improvement, including prompting clinicians when follow-up is needed. This is an ambitious but potentially realizable goal with emerging technologies.37 Chapter 9, “Screening for Depression in Primary Care: Can It Become More Efficient?” by Magruder and Yeager, is this volume’s most upbeat chapter about the prospects of screening for depression, but it still qualifies its optimism with the assumption that screening is implemented in clinical contexts where resources are available to resolve positive screens and where adequate treatment and follow-up are ensured. The chapter articulates general standards for evaluating whether screening is worthwhile, drawing on the World Health Organization’s criteria for the implementation of screening. Magruder and Yeager’s optimism is also predicated on progress in technologies for screening, for monitoring clinical change, and for automating recontact of patients for follow-up.
Parker and Hyett (Chapter 10) and Babaei and Mitchell (Chapter 11) take opposing sides in the debate over whether screening for depression should accommodate physical comorbidity. Parker and Hyett question whether depression seen in psychiatric contexts is consistent with depression seen in general medical contexts, and argue that screening instruments for general medical settings therefore need to accommodate the overlap between the symptoms of physical conditions and those of depression. They suggest that screening instruments originally developed in specialty mental health contexts may be inadequate for general medical settings without substantial modification. Importantly, Parker and Hyett argue that an excess of false positives will accumulate in screening if the influence of confounding medical conditions is not taken into account. As alternatives to an “inclusive” approach to diagnosis and assessment, in which symptoms that might be attributable to physical health conditions are nonetheless counted toward depression, two approaches exist. The first, exclusionist approach eliminates consideration of possibly confounded symptoms; the second, substitutive approach replaces symptoms suspected of being confounded with other symptoms. Parker and Hyett identify three screening instruments that adopt an exclusionist approach to test construction: the seven-item Beck Depression Inventory for Primary Care (BDI-PC), the Hospital Anxiety and Depression Scale (HADS), and a new scale developed by Parker and colleagues, the DMI. The BDI-PC was constructed by excluding the more somatically oriented items from the existing scale, which already had a heavily cognitive emphasis. The HADS, like the Edinburgh Postnatal Depression Scale (EPDS), was constructed to avoid not only somatic items but also formal psychiatric symptoms, using deliberately colloquial wording intended to destigmatize the items and thereby make them more acceptable to nonpsychiatric patients. The logic is appealing, and as a result, both the HADS and the EPDS have been widely recommended and implemented.
Yet both are as highly correlated with conventional scales as their respective reliabilities allow, and in head-to-head comparisons, neither has been shown to be consistently superior to conventional scales. Colloquial wording may itself be a source of problems, because such items lack specificity and may resist precise translation, as with the “butterflies in the stomach” item of the HADS or the “Things have been getting on top of me” item of the EPDS. Different cut points obtained in apparently similar populations have been a source of complaints about both scales,38 and it is not clear whether this is entirely due to “home-court advantage.” Parker and Hyett’s well-reasoned arguments for constructing scales with an exclusionist strategy have not translated into better performance. Parker’s own DMI adopted a “bottom-up” approach to scale construction, using a medically ill population and emphasizing cognitive symptoms. It remains to be seen whether its apparent advantages persist when cutoffs are fixed and tested in head-to-head comparisons in new populations.
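Why cut points should be fixed before they are tested can be illustrated with a toy calculation. The Python sketch below picks the cut point that maximizes Youden’s J (sensitivity + specificity - 1) in one small derivation sample, then applies that fixed cut point to a second sample. Both samples are invented for illustration and are not data from the HADS, EPDS, or DMI literature; Youden’s J is only one of several criteria used to choose cut points, and "score at or above the cut point counts as positive" is an assumed convention.

```python
def sens_spec(scores, is_case, cutoff):
    """Sensitivity and specificity when a score >= cutoff counts as a positive screen."""
    case_scores = [s for s, c in zip(scores, is_case) if c]
    noncase_scores = [s for s, c in zip(scores, is_case) if not c]
    sensitivity = sum(s >= cutoff for s in case_scores) / len(case_scores)
    specificity = sum(s < cutoff for s in noncase_scores) / len(noncase_scores)
    return sensitivity, specificity


def youden_cutoff(scores, is_case):
    """Observed score that maximizes Youden's J = sensitivity + specificity - 1."""
    def youden_j(cutoff):
        sensitivity, specificity = sens_spec(scores, is_case, cutoff)
        return sensitivity + specificity - 1
    return max(sorted(set(scores)), key=youden_j)


# Invented toy samples: the first five patients are interview-confirmed cases,
# the last six are not.
derivation_scores = [9, 11, 12, 14, 15, 2, 4, 5, 7, 8, 10]
validation_scores = [7, 9, 10, 13, 16, 3, 5, 6, 9, 10, 11]
is_case = [True] * 5 + [False] * 6

cut = youden_cutoff(derivation_scores, is_case)
for label, scores in (("derivation", derivation_scores), ("validation", validation_scores)):
    sensitivity, specificity = sens_spec(scores, is_case, cut)
    print(f"{label}: cut point {cut}, sens {sensitivity:.2f}, spec {specificity:.2f}")
print("validation sample's own best cut point:", youden_cutoff(validation_scores, is_case))
# derivation: cut point 9, sens 1.00, spec 0.83
# validation: cut point 9, sens 0.80, spec 0.50
# validation sample's own best cut point: 7
```

In this toy example the cut point that looks nearly ideal in its own sample (J of about 0.83) drops to J = 0.30 when fixed and reapplied, and the second sample would "prefer" a different cut point altogether, which mirrors the complaint about different cut points in apparently similar populations.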
Babaei and Mitchell acknowledge that substantial physical comorbidity occurs in general medical care and, further, that the association between physical health problems and depression is likely to be bidirectional. Physical conditions may be associated with longer depressive episodes, and untreated depression may slow functional recovery from the physical conditions. They articulate three strategies for investigating the role of somatic symptoms both in diagnosing depression and in selecting item content for screening instruments: comparing somatic items’ discrimination between healthy controls and those with major depression; comparing somatic items’ ability to distinguish between patients with uncomplicated major depression and those with comorbid major depression and physical health problems; and comparing patients with comorbid depression with those with physical illness alone. With regard to the role of somatic symptoms in diagnosing depression in populations that were not physically ill, Babaei and Mitchell found that reports of single somatic symptoms are common among nondepressed persons, but that these symptoms nonetheless contribute to both ruling in and ruling out depression. Concerning whether somatic items perform differently in physically ill populations, they cite evidence that somatic items function similarly in both populations (see also Thombs and colleagues20). Finally, they report that while individual somatic symptoms may be common among depressed and nondepressed patients with physical illness, taken together, such symptoms are still more elevated among physically ill patients with comorbid depression. Overall, Babaei and Mitchell conclude that somatic symptoms have a role in detecting and diagnosing depression among both otherwise healthy and physically ill patients. Scales constructed to exclude such symptoms may have no advantage and may actually be at a disadvantage.
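Babaei and Mitchell’s point that a symptom can be common among nondepressed people and still help both rule in and rule out depression follows from likelihood ratios. The Python sketch below assumes, purely for illustration, a single somatic item (say, fatigue) endorsed by 90% of depressed and 40% of nondepressed patients, and a 10% pretest probability of depression; none of these figures comes from the chapter.

```python
def likelihood_ratios(p_endorse_depressed, p_endorse_nondepressed):
    """Positive and negative likelihood ratios for a single yes/no item."""
    lr_pos = p_endorse_depressed / p_endorse_nondepressed              # sens / (1 - spec)
    lr_neg = (1 - p_endorse_depressed) / (1 - p_endorse_nondepressed)  # (1 - sens) / spec
    return lr_pos, lr_neg


def post_test_probability(pretest, lr):
    """Update a probability with a likelihood ratio via odds (Bayes' theorem)."""
    odds = pretest / (1 - pretest)
    post_odds = odds * lr
    return post_odds / (1 + post_odds)


# Hypothetical endorsement rates and pretest probability (illustrative only).
lr_pos, lr_neg = likelihood_ratios(0.90, 0.40)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")                   # LR+ = 2.25, LR- = 0.17
print(f"endorsed: {post_test_probability(0.10, lr_pos):.3f}")      # endorsed: 0.200
print(f"not endorsed: {post_test_probability(0.10, lr_neg):.3f}")  # not endorsed: 0.018
```

Under these assumed rates, endorsing the item doubles the probability of depression (10% to 20%), while not endorsing it lowers the probability to under 2%, even though 40% of nondepressed patients endorse it. Combining several somatic items is not as simple as multiplying their likelihood ratios, which would assume the items are conditionally independent; correlated symptoms need a joint model.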
The remaining five chapters (Chapters 12 through 16) discuss screening for depression in specialty medical settings or among patients with specific physical illnesses. Calls for introducing routine screening in these contexts are most often based on generalizations from evidence obtained in primary care populations. Such calls are further bolstered by the belief that depression and other mental disorders have a heightened prevalence in particular medical settings or populations. Many of these claims have been deflated by methodologically superior studies with representative samples and diagnoses based on semi-structured diagnostic interviews. For example, claims by a former president of the American Psychiatric Association39 that half of all cancer patients have a psychiatric disorder stretch the definition of psychiatric disorder and are seemingly contradicted by findings that cancer patients are no more likely than other medically ill populations to be depressed.40,41 Similarly, major depression during pregnancy and the postpartum period has important implications for mothers, their infants, and the family, and so should be of particular concern, but it appears that major depression is no more common among pregnant and postpartum women than among age-matched control women.42 The inevitable deflation of such claims can lead to a backlash and a withdrawal of the resources needed to deal with the depression that is found in these settings. Calls for introducing screening for depression into specialty settings are often based on associations between depression and adverse health outcomes, such as reinfarctions and mortality among post-myocardial infarction patients,43 and on the presumption that improving depression outcomes will also improve these physical health outcomes. Yet, however well intended, such claims have yet to be supported by evidence that treating depression produces demonstrable changes in physical health.20 Perhaps the failure to obtain the expected gains reflects care for depression that is not adequate to produce sufficient change in depression itself. Regardless, here too the deflation of unrealistic claims may diminish interest in treating depression and lead to a withdrawal of the resources needed for adequate care. Enhancements of care for depression, such as collaborative care, that have proven necessary for screening to have its intended effect in primary care may face formidable challenges in any effort to import and sustain them in tertiary specialty medical settings. Importantly, specialty physicians may not be prepared to make the investment of time, resources, and collaboration that is expected of primary care physicians. They may see diagnosing and treating depression as beyond their competence or priorities, especially when they are better trained to deal with life-threatening specialty conditions. They may also be discouraged by their limited success in making referrals for depression care and by the small proportion of referrals that are actually completed.
Perhaps effective enhancement of care for depression in many specialty settings will have to involve not just collaborative care but also the integration into the setting of mental health professionals who take primary responsibility for diagnosing and treating depression.
Andres Kanner (Chapter 12) discusses screening in neurologic and rehabilitation settings, focusing on four major neurologic disorders: stroke, epilepsy, Parkinson’s disease, and multiple sclerosis. Some consistent themes emerge. First, depression is not only common but is likely to have a bidirectional relationship with these disorders, sometimes emerging as a prodrome before the neurologic disorder becomes evident. Second, with the bulk of depression remaining undetected in neurology settings, Kanner advocates screening, but with some important cautions. For each of these disorders, idiosyncratic depressive symptoms associated with the particular disorder can be used to argue that specific depressive symptoms reflect underlying neurologic deficits, and to allow for depressive syndromes, present to varying degrees, that are not captured in ICD-10 or DSM-IV nosology.
In a recent paper, Kanner44 asks whether neurologists should be trained to recognize depression, a question whose very framing implies that neurologists do not universally agree on the answer. He explains that the question needs to be asked because, over the past 5 to 10 years, the residency training of neurologists has excluded any psychiatric training. He also cites research demonstrating that undetected depression among neurology patients often persists and that patient referrals for depression are often not completed. Furthermore, he cites unpublished survey data indicating that most neurologists do not screen for depression but would do so if presented with convincing data that treating depression would improve their patients’ medical adherence and quality of life.
Carlson and colleagues (Chapter 13) identify unique aspects of screening for depression in the cancer care setting. There, screening is understood within a distress paradigm rather than as the detection of depression as a psychiatric disorder. According to some advocates, psychological distress has become the “sixth vital sign,”45 requiring routine assessment, preferably at every visit. The bulk of the empirical work on screening for distress or depression in cancer care consists of studies of the calibration, validation, or acceptability of screening instruments. Carlson and her colleagues in Calgary and Sharpe and his colleagues in Edinburgh have systematic research programs underway evaluating the implementation of screening, but these programs are exceptional in moving the field toward the obvious next step: evaluating the benefits of screening beyond scale performance and acceptability. There are signs that the advocacy of routine screening for distress in cancer care is not being heeded.
Jacobsen and Ransom46 found that only 3 of the 15 cancer centers of the National Comprehensive Cancer Network (NCCN), which has promulgated recommendations for routine screening, had actually implemented these procedures. Mitchell and associates47 surveyed workers in cancer care in the United Kingdom and found that fewer than 10% relied on a standardized questionnaire; most preferred to rely on clinical skills or on recall of the two questions of the PHQ-2. Yet Mitchell’s48 Bayesian analyses found that while two questions were adequate for ruling out depression, more extensive interviewing is necessary to confirm a diagnosis. Implementation of a screening program in one major cancer center shifted the population receiving psychosocial services from one consisting largely of breast cancer patients who sought services on their own or were referred by cancer care professionals toward an increasing proportion of head and neck cancer patients,49 before the program was discontinued. A positive screen warrants a discussion of a patient’s sources of distress, but many of these sources are not cancer-related and are not best addressed by cancer care professionals. Thus, even when a positive screen on a distress measure does not represent a psychiatric disorder for which there are empirically based guidelines, it will often require a time-consuming discussion that does not lead to treatment. A false-positive screen often cannot be dispensed with by brief reassurance and may instead require an extended discussion of nonmedical issues. Mitchell48 found that, in contrast to most nurses, most oncologists were not prepared to give patients sufficient time to discuss their distress. While such issues do not rule out screening programs, they do suggest the need for considerable planning, communication with affected staff, and piloting before implementation. Garssen and de Kok50 propose that simply asking cancer patients what services they want is preferable to formal screening. Despite these challenges, two promising projects have demonstrated that effective care for depression can be provided to cancer patients. Dwight-Johnson and associates51 showed that a collaborative care model could be used to improve the treatment and outcome of depression among low-income Latina patients with breast and ovarian cancer. The collaborative care team involved psychiatrists and bilingual master’s-level social workers, who had to address numerous barriers to the women’s engagement in effective care. Strong and colleagues52 showed that trained cancer care nurses could have similar effects in a different health care environment, in Scotland. The nurses provided psychoeducation, problem-solving therapy, and consultation with patients’ oncologists and primary care physicians.
Screening for depression may be most enthusiastically promoted in oncologic settings, but the most ambitious efforts to implement it have been made in perinatal settings (Boyce and Barton, Chapter 14). Still, systematic data from controlled trials showing that screening improves depression outcomes for pregnant and postpartum women are lacking.
Accumulating data suggest that implementation of screening programs meets with resistance from clinicians and women alike and does not yield a substantial increase in the uptake of treatment, much less improvements in outcomes.53,54 Many maternal care providers are uncomfortable treating pregnant women for depression without consultation,55 preferring to refer them to mental health professionals rather than overseeing treatment themselves.56 The literature advocating screening of women during pregnancy and the postpartum period seems to downplay the immense barriers women face in obtaining uninterrupted, quality care for depression during these periods. Many women abruptly stop existing antidepressant treatment when planning to become pregnant or on learning that they are pregnant,55 often without medical consultation, leaving them at an estimated 43% risk of relapse.57 Pregnant women may be reluctant to take medication of any kind but are particularly apprehensive about antidepressants.58 There is evidence for long-term effects of in utero exposure to antidepressant medications, but the absolute risk is considered quite low.59
Boyce and Barton review the literature, ranging from available instruments and screening practices to the paucity of evidence that screening makes any difference in clinical outcomes. Most instruments lack adequate validation in perinatal settings. Some instruments, such as the EPDS and the Postpartum Depression Screening Scale (PDSS), have names that suggest particular appropriateness for perinatal settings, but this has not been substantiated in head-to-head comparisons with more generic screening instruments. Boyce notes that some initially appealing notions in the literature on perinatal screening for depression warrant critical scrutiny. The first, endorsed by the U.K. National Institute for Clinical Excellence (NICE),60 is that screening can be efficiently accomplished with three items, two concerning the core symptoms of depression and the third asking whether help is wanted. The rationale is that the third question addresses the problem that many pregnant and postpartum women do not want help, so that a negative response allows attention to be directed away from them. As noted above, a systematic review17 recommends against such ultra-short screening instruments except as a means of ruling out the need for further examination. The performance of the “help question” has not been formally evaluated. Although it might be attractive in other clinical contexts, it poses an obvious problem with pregnant or nursing women. Such women are often averse to taking antidepressants. However, a personalized risk assessment with their maternal care provider might lead women to decide to initiate or resume treatment, particularly after delivery. A negative response to the “help question” effectively rules out such discussions. The second notion discussed by Boyce and Barton is that, with prevention of depression as a goal, screening should target not just current depression but also risk factors such as low social support.
The authors dismiss this idea because “most risk factors have poor discriminatory power, or poor positive predictive value.” To these objections one could add that preventive interventions require treating many “at-risk” persons who would not develop the disorder anyway.
Hermanns and Kulzer’s coverage of the detection and management of depression in diabetes care (Chapter 16) reiterates some points raised in the chapters on neurologic, rehabilitation, and cardiac settings but also introduces some new considerations. While good arguments can be made that the prevalence of major depression is likely to be high among persons with diabetes, prevalence estimates obtained with research diagnostic interviews in representative populations suggest that the comorbidity of diabetes and major depression is well within the range seen for other chronic medical conditions.41 There may nonetheless be a bidirectional association between major depression and diabetes,61 and particularly between major depression and diabetes control and related complications. Hermanns and Kulzer suggest that the possibility that major depression is more common in persons with diabetes who have poor glycemic control or serious diabetic complications provides a rationale for depression screening that specifically targets these higher-risk patient populations. In the Pathways study, Katon and colleagues62 demonstrated the effectiveness of collaborative care for improving the outcome of major depression among patients with diabetes. However, some caution should be exercised in generalizing from this study to diabetes care in specialty settings. The study was conducted in primary care, where patients are more likely to be older and not insulin dependent. More importantly, primary care physicians are more likely than diabetologists to assume responsibility for depression care and to collaborate actively with depression care managers in the manner required by such interventions. Nonetheless, the results of the Pathways study point to the possibility of “bundling” improvements in care for both diabetes and major depression, with nurse specialists providing monitoring and follow-up care for both conditions.
Thombs and Ziegelstein (Chapter 15) provide an overview of screening for depression in cardiovascular care. Provocative findings that depression following a myocardial infarction is an independent risk factor for reinfarction and death63 have captured the attention of behavioral medicine and mental health professionals. The authors note that a number of organizations have endorsed screening for depression, and most recently the American Heart Association (AHA) has updated its recommendations and provided specific instructions on how screening should be conducted.64 They also review evidence concerning the prevalence of major depression in cardiovascular disease, the performance of screening instruments in cardiovascular care, and recommendations for the evaluation and treatment of depressed patients in cardiovascular care. They note that, in addition to studies of performance in primary care, a variety of instruments have been tested specifically with cardiac patients.
Few instruments have been validated in more than one sample, and there is no convincing evidence of the superiority of any instrument. The authors provide strong documentation of the home-court advantage effect hinted at in studies with other populations. They ultimately fall back on the criteria of ease of administration and scoring and, consistent with a National Heart, Lung, and Blood Institute Working Group recommendation, endorse the PHQ-9 until contradicted by new data. They similarly endorse USPSTF guidelines for screening but reiterate that resources must be available for treatment and follow-up. A systematic review of screening for depression in cardiovascular settings65 coincidentally appeared at the same time that the AHA statement64 was released. The contrast between the systematic review and the AHA recommendations was striking. The authors of the systematic review were unable to find sufficient evidence for or against routine screening for depression to make a recommendation. Nonetheless, they expressed concern that “the adoption of depression screening in cardiovascular care settings would likely be unduly resource-intensive and would not be likely to benefit patients in the absence of significant changes in current models of care.” The AHA statement, by contrast, declared that the opportunity “should not be missed” to screen for depression in patients with cardiovascular disease across the settings in which they are treated. It further stated that qualified professionals should follow up with patients who screen positive and monitor their treatment, and it concluded by noting that “Coordination of care between healthcare providers is essential in patients with combined medical and mental health diagnoses.” Yet what evidence is there that the introduction of screening will improve cardiac outcomes, or, for that matter, even depression outcomes as hoped? What evidence is there that cardiologists are willing or able to diagnose or initiate treatment of depression, and if they are not up to these tasks, where are the mental health professionals in cardiac care to fill in? Taken seriously, the AHA statement seems to call implicitly for an integrated system of care for depression that shows no sign of being developed within cardiac care settings, yet the statement expresses no recognition that such a fundamental reorganization of care is needed or possible.
Integration: Deflating the Puffer Phenomenon and Making the Case Against Screening
Goldner66 has described the puffer phenomenon in psychiatric epidemiology: a spurious initial inflation of prevalence or incidence rates that is corrected in later studies that are more methodologically sophisticated and draw on more representative samples. Puffer phenomena in screening studies are not limited to exaggerated estimates of prevalence, however, but extend to estimates of unmet clinical need, the accuracy and efficiency of screening instruments, the acceptability of screening to clinicians and patients, the resulting uptake of clinical services, the effectiveness with which those services will be delivered, and, ultimately, the yield in terms of improved patient outcomes. The various chapters in this book and the material introduced in the present chapter amply demonstrate the pervasiveness of puffer phenomena in the case made for routine screening for depression. Exaggerated estimates of the potential benefits of screening begin with rates of clinically significant depression produced by interviews administered by lay interviewers in community and clinical settings. That much of the depression identified in this way also turns out to be undetected and untreated reflects, at least in part, on the validity of these estimates, not just on the well-documented inadequacies of routine care for depression. Advocates of screening further inflate the presumed prevalence of depression by expanding the range of depressive phenomena to include elevated scores on depression scales, various ill-defined subclinical states, and minor depression. The question needs to be refocused: not whether such conditions are associated with impairment, but whether they are effectively and best addressed in general medical settings with the intervention most likely to be offered there, prescription of an antidepressant.
Repeatedly seen across chapters were claims of the superiority of particular screening instruments, usually based on findings in a single sample that are not replicated in subsequent samples. Against such claims, the message of this volume might be that cut points that have not been cross-validated should be disallowed. Overall, across general medical populations there is little evidence of the superiority of any particular instrument and little support for the intuitively appealing notion that an instrument with somatically oriented items removed will perform better than a conventional instrument that includes them. An overview of the large literature on the performance of screening instruments conveys how difficult it is to infer, from estimates of sensitivity and specificity, how instruments will perform at the prevalence of depression found in particular populations. Because predictive values depend on prevalence even when sensitivity and specificity are fixed, assuming that the prevalence of clinically significant depression is 20% or more, rather than the more realistic 9% to 12%, can yield markedly distorted estimates of false positives and false negatives. The bulk of the empirical literature concerning screening does not stop with evidence-based estimates of the performance of screening instruments or with comparisons between the instruments and rates of unassisted detection of depression by primary care physicians, but rather proceeds to project how the introduction of screening will improve detection and promote treatment of depression.
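The dependence of screening yield on prevalence can be made explicit with expected counts. Holding sensitivity and specificity fixed (both assumed to be 0.80 here purely for illustration, not values for any particular instrument), the Python sketch below computes expected true and false positives per 1,000 patients screened at an assumed prevalence of 20% versus 10%, the latter within the more realistic range cited above.

```python
from dataclasses import dataclass


@dataclass
class ExpectedYield:
    """Expected 2x2 counts for a screen applied to a population."""
    true_pos: float
    false_pos: float
    false_neg: float
    true_neg: float

    @property
    def ppv(self) -> float:
        """Positive predictive value: share of positive screens that are true cases."""
        return self.true_pos / (self.true_pos + self.false_pos)

    @property
    def npv(self) -> float:
        """Negative predictive value: share of negative screens that are true noncases."""
        return self.true_neg / (self.true_neg + self.false_neg)


def expected_yield(sensitivity, specificity, prevalence, n=1000):
    cases = n * prevalence
    noncases = n - cases
    return ExpectedYield(
        true_pos=cases * sensitivity,
        false_pos=noncases * (1 - specificity),
        false_neg=cases * (1 - sensitivity),
        true_neg=noncases * specificity,
    )


for prevalence in (0.20, 0.10):
    y = expected_yield(0.80, 0.80, prevalence)  # illustrative accuracy values
    print(f"prevalence {prevalence:.0%}: TP {y.true_pos:.0f}, FP {y.false_pos:.0f}, "
          f"PPV {y.ppv:.2f}, NPV {y.npv:.2f}")
# prevalence 20%: TP 160, FP 160, PPV 0.50, NPV 0.94
# prevalence 10%: TP 80, FP 180, PPV 0.31, NPV 0.97
```

Under these assumptions, false positives merely equal true positives at 20% prevalence but outnumber them by more than two to one at 10%, while negative predictive value stays high in both cases. That asymmetry is consistent with the finding, reported earlier in this chapter, that a two-question approach may suffice to rule depression out but not to confirm it.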
Missing from these projections of improved detection and treatment, however, is any consideration of whether physicians would actually offer treatment to otherwise undetected cases of depression, whether their “nondetection” registers a judgment that such depression is not appropriate for treatment, and whether patients would accept treatment. The large gaps reported between rates of detection under naturalistic conditions and actual treatment rates should give pause to anyone assuming that undetected depression necessarily represents a missed opportunity for effective intervention. Also missing from most enthusiastic claims for screening are data showing that detected cases of depression do better than undetected cases, or even that the treatment offered to detected patients will be adequate and appropriate and lead to improved outcomes. Available assessments of the quality of routine care for depression in general medical settings are a cause for great pessimism. It is unfortunate that sentiment in favor of routine screening for depression in general medical settings is so strong that any expression of skepticism is held to a higher burden of proof than unsubstantiated claims for its benefits.67 Skepticism is quickly countered by appeals to practice guidelines and their advocates and to “everybody knows” clinical wisdom. Nonetheless, the implications of the basic data seem obvious. Introduction of screening without substantial resources to enhance routine care does not appreciably improve outcomes, and analyses of the literature concerning enhancement of care for depression in primary care do not indicate any independent contribution of a screening component. Furthermore, well-resourced collaborative care interventions, with or without screening, still produce only modest improvements in the quality of care, and these improvements often are not sustained beyond the implementation phase.68 The presumed contribution of screening needs to be evaluated in the context of rapidly evolving conditions in the larger clinical and community environment. Most importantly, one must consider the escalating rates of antidepressant prescription, which often exceed reasonable estimates of the prevalence of depression, with much of this medication being prescribed to persons who are not depressed. Finally, there is persistent evidence that the intensity of treatment of depression in the community is too low to yield clinically significant improvements in depression outcomes.
Are there risks to implementing routine screening for depression? First, there is the risk that screening will consume scarce resources and aggravate existing problems. More patients will be inappropriately prescribed antidepressants, and physicians’ already nonspecific diagnostic and prescribing practices will be amplified if screening instruments identify as depressed patients whom neither the physicians nor the patients themselves would have identified. The resources consumed by these ineffective efforts might come at the expense of the already poor monitoring and follow-up of patients who are known to be depressed. All of these problems would be compounded in specialty settings, where physicians are less inclined and less prepared to diagnose and treat depression and where pressing, well-defined medical issues compete with depression care.
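The gap between detecting depression and benefiting from its treatment can be expressed as a simple cascade, in which each step retains only a fraction of the patients who reached the previous one. The stage names in the Python sketch below follow the questions raised above (whether treatment is offered, accepted, and adequate), but every proportion in `STAGES` is a hypothetical placeholder chosen for easy arithmetic, not an estimate from the literature.

```python
# Hypothetical stage retention proportions: illustrative placeholders only.
STAGES = [
    ("screen positive", 0.80),
    ("offered treatment", 0.50),
    ("accepted treatment", 0.60),
    ("received adequate treatment", 0.40),
]


def care_cascade(n_cases, stages):
    """Return (stage, expected count) after applying each stage's retention in turn."""
    remaining = float(n_cases)
    counts = []
    for stage, proportion in stages:
        remaining *= proportion
        counts.append((stage, remaining))
    return counts


for stage, count in care_cascade(100, STAGES):
    print(f"{stage:<28}{count:6.1f}")
```

With these placeholder values, fewer than 10 of every 100 true cases would reach adequate treatment, even though 80 screen positive. The multiplicative structure is the point: improving the later stages of care can matter as much as, or more than, improving detection at the first.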
Should we do nothing if we do not introduce routine screening for depression? The most urgent task is to correct existing problems, namely the inadequate treatment of depression in the primary care setting. Patten69 has produced mathematical models that strongly indicate that improving the outcome of depression for patients who are already being treated is more cost-effective than bringing more patients into this inadequate treatment. Misplaced confidence in the power of screening alone to affect depression outcomes will delay recognition of this problem and divert existing resources from its solution.
References
1. Pignone MP, Gaynes BN, Rushton JL, et al. Screening for depression in adults: a summary of the evidence for the U.S. Preventive Services Task Force. Ann Intern Med. 2002;136:765–776.
2. Mojtabai R, Olfson M. National patterns in antidepressant treatment by psychiatrists and general medical providers: results from the National Comorbidity Survey Replication. J Clin Psychiatry. 2008;69:1064–1074.
17 COMMENTARY AND INTEGRATION
3. Fernandez A, Haro JM, Martinez-Alonso M, et al. Treatment adequacy for anxiety and depressive disorders in six European countries. Br J Psychiatry. 2007;190:172–173.
4. Berndt ER, Bir A, Busch SH, et al. The medical treatment of depression, 1991–1996: productive inefficiency, expected outcome variations, and price indexes. J Health Economics. 2002;21:373–396.
5. Berardi D, Menchetti M, Cevenini N, et al. Increased recognition of depression in primary care: comparison between primary-care physician and ICD-10 diagnosis of depression. Psychotherapy and Psychosomatics. 2005;74:225–230.
6. Esposito E, Wang JL, Adair CE, et al. Frequency and adequacy of depression treatment in a Canadian population sample. Can J Psychiatry. 2007;52:780–789.
7. Mojtabai R. Increase in antidepressant medication in the US adult population between 1990 and 2003. Psychotherapy and Psychosomatics. 2008;77:83–92.
8. Beck CA, Patten SB, Williams JVA, et al. Antidepressant utilization in Canada. Soc Psychiatry Psychiatr Epidemiol. 2005;40:799–807.
9. Kessler RC, Merikangas KR, Wang PS. Prevalence, comorbidity, and service utilization for mood disorders in the United States at the beginning of the twenty-first century. Ann Rev Clin Psychol. 2007;3:137–158.
10. Depression Guideline Panel. Depression in primary care: Vol. 2. Treatment of major depression (Clinical Practice Guideline No. 5, AHCPR Publication No. 93–0551). Rockville, MD: Department of Health and Human Services, Public Health Service, Agency for Health Care Policy and Research; 1993.
11. Patten SB. Major depression prevalence is very high, but the syndrome is a poor proxy for community populations' clinical treatment needs. Can J Psychiatry. 2008;53:411–418.
12. Brugha TS, Jenkins R, Taub N, et al. A general population comparison of the Composite International Diagnostic Interview (CIDI) and the Schedules for Clinical Assessment in Neuropsychiatry (SCAN). Psychol Med. 2001;31:1001–1013.
13. Spitzer RL, Kroenke K, Williams JBW. Validation and utility of a self-report version of PRIME-MD: the PHQ primary care study. JAMA. 1999;282:1737–1744.
14. Norton J, De Roquefeuil G, Boulenger JP, et al. Use of the PRIME-MD Patient Health Questionnaire for estimating the prevalence of psychiatric disorders in French primary care: comparison with family practitioner estimates and relationship to psychotropic medication use. Gen Hosp Psychiatry. 2007;29:285–293.
15. Bermejo I, Frey C, Kriston L, et al. Stability of the effects of guideline training in primary care on the identification of depressive disorders. Primary Care & Community Psychiatry. 2007;12:99–107.
16. Kessler RC, Wang PS. The descriptive epidemiology of commonly occurring mental disorders in the United States. Ann Rev Public Health. 2008;29:115–129.
17. Mitchell AJ, Coyne JC. Do ultra-short screening instruments accurately detect depression in primary care? A pooled analysis and meta-analysis of 22 studies. Br J Gen Practice. 2007;57:144–151.
18. Bennett IM, Coco A, Coyne JC, et al. Efficiency of a two-item pre-screen to reduce the burden of depression screening in pregnancy and postpartum: an IMPLICIT network study. J Am Board Family Med. 2008;21:317–325.
19. Williams JW, Pignone M, Ramirez G, et al. Identifying depression in primary care: a literature synthesis of case-finding instruments. Gen Hosp Psychiatry. 2002;24:225–237.
20. Thombs BD, Fuss S, Hudson M, et al. High rates of depressive symptoms among patients with systemic sclerosis are not explained by differential reporting of somatic symptoms. Arthritis Rheumatism. 2008;59:431–437.
SCREENING FOR DEPRESSION IN CLINICAL PRACTICE
21. Gilbody S, Whitty P, Grimshaw J, et al. Educational and organizational interventions to improve the management of depression in primary care: a systematic review. JAMA. 2003;289:3145–3151.
22. Hodges B, Inch C, Silver I. Improving the psychiatric knowledge, skills, and attitudes of primary care physicians, 1950–2000: a review. Am J Psychiatry. 2001;158:1579–1586.
23. Valenstein M, Dalack G, Blow F, et al. Screening for psychiatric illness with a combined screening and diagnostic instrument. J Gen Intern Med. 1997;12:679–685.
24. Spitzer RL, Williams JBW, Kroenke K, et al. Utility of a new procedure for diagnosing mental disorders in primary care: the PRIME-MD-1000 study. JAMA. 1994;272:1749–1756.
25. Lewis G, Sharp D, Bartholomew J, et al. Computerized assessment of common mental disorders in primary care: effect on clinical outcome. Family Practice. 1996;13:120–126.
26. Magruder-Habib K, Zung WWK. Improving physicians' recognition and treatment of depression in general medical care: results from a randomized clinical trial. Medical Care. 1990;28:239–250.
27. Swindle RW, Rao JK, Helmy A, et al. Integrating clinical nurse specialists into the treatment of primary care patients with depression. Int J Psychiatry Med. 2003;33:17–37.
28. Borowsky SJ, Rubenstein LV, Meredith LS, et al. Who is at risk of nondetection of mental health problems in primary care? J Gen Intern Med. 2000;15:381–388.
29. Demyttenaere K, Bruffaerts R, Posada-Villa J, et al. Prevalence, severity, and unmet need for treatment of mental disorders in the World Health Organization World Mental Health Surveys. JAMA. 2004;291:2581–2590.
30. Schwenk TL, Coyne JC, Fechner-Bates S. Differences between detected and undetected patients in primary care and depressed psychiatric patients. Gen Hosp Psychiatry. 1996;18:407–415.
31. Coyne JC, Schwenk TL, Fechner-Bates S. Nondetection of depression by primary care physicians reconsidered. Gen Hosp Psychiatry. 1995;17:3–12.
32. Von Korff M. Case definitions in primary care: the need for clinical epidemiology. Gen Hosp Psychiatry. 1992;14:293–295.
33. Gibbons RD, Weiss DJ, Kupfer DJ, et al. Using computerized adaptive testing to reduce the burden of mental health assessment. Psychiatr Serv. 2008;59:361–368.
34. Santor DA, Coyne JC. Shortening the CES-D to improve its ability to detect cases of depression. Psychol Assess. 1997;9:233–243.
35. Wells KB, Sherbourne C, Schoenbaum M, et al. Impact of disseminating quality improvement programs for depression in managed primary care: a randomized controlled trial. JAMA. 2000;283:212–220.
36. Gilbody S, Sheldon T, House A. Screening and case-finding instruments for depression: a meta-analysis. Can Med Assoc J. 2008;178:997–1003.
37. Unick GJ, Shumway M, Hargreaves W. Are we ready for computerized adaptive testing? Psychiatr Serv. 2008;59:369.
38. Matthey S, Henshaw C, Elliott S, et al. Variability in use of cut-off scores and formats on the Edinburgh Postnatal Depression Scale: implications for clinical and research practice. Arch Womens Mental Health. 2006;9:309–315.
39. Riba M. Identifying depression, distress, & anxiety in cancer patients [Online]. Physicians Weekly [serial online]. 1997 [cited 2005 Jul 05];22(28):[4 screens]. Available from: http://www.physweekly.com/article.asp?IssueID=260&ArticleID=2426.
40. Coyne JC, Palmer SC, Shapiro PJ, et al. Distress, psychiatric morbidity, and prescriptions for psychotropic medication in a breast cancer waiting room sample. Gen Hosp Psychiatry. 2004;26:121–128.
41. Evans DL, Charney DS, Lewis L, et al. Mood disorders in the medically ill: scientific review and recommendations. Biol Psychiatry. 2005;58:175–189.
42. Vesga-Lopez O, Blanco C, Keyes K, et al. Psychiatric disorders in pregnant and postpartum women in the United States. Arch Gen Psychiatry. 2008;65:805–815.
43. Frasure-Smith N, Lesperance F. Reflections on depression as a cardiac risk factor. Psychosom Med. 2005;67:S19–S25.
44. Kanner AM. Should neurologists be trained to recognize and treat comorbid depression of neurologic disorders? Yes. Epilepsy & Behavior. 2005;6:303–311.
45. Bultz BD, Carlson LE. Emotional distress: the sixth vital sign in cancer care. J Clin Oncol. 2005;23:6440–6441.
46. Jacobsen PB, Ransom S. Implementation of NCCN distress management guidelines by member institutions. Journal of the National Comprehensive Cancer Network. 2007;5:99–103.
47. Mitchell A, Kaar S, Coggan C, et al. Acceptability of common screening methods used to detect distress and related mood disorders: preferences of cancer specialists and non-specialists. Psychooncology. 2008;17:226–236.
48. Mitchell AJ. Are one or two simple questions sufficient to detect depression in cancer and palliative care? A Bayesian meta-analysis. Br J Cancer. 2008;98:1934–1943.
49. Zabora JR, Diaz L, Loscalzo MJ, et al. Psychosocial screening goes mainstream: a prospective problem-solving system as an essential element of comprehensive cancer care: background and rationale. Psychooncology. 2003;12(Suppl. 4):S71.
50. Garssen B, de Kok E. How useful is a screening instrument? Psychooncology. 2008;17:726–728.
51. Dwight-Johnson M, Ell K, Lee PJ. Can collaborative care address the needs of low-income Latinas with comorbid depression and cancer? Results from a randomized pilot study. Psychosomatics. 2005;46:224–232.
52. Strong V, Waters R, Hibberd C, et al. Management of depression for people with cancer (SMaRT oncology 1): a randomised trial. Lancet. 2008;372:40–48.
53. Mitchell AJ, Coyne JC. Screening for postnatal depression: barriers to success. Br J Obstet Gynaecol. 2009;116:11–14.
54. Von Ballestrem CL, Strauss M, Kachele H. Contribution to the epidemiology of postnatal depression in Germany: implications for the utilization of treatment. Arch Womens Mental Health. 2005;8:29–35.
55. Wisner KL, Zarin DA, Holmboe ES, et al. Risk-benefit decision making for treatment of depression during pregnancy. Am J Psychiatry. 2000;157:1933–1940.
56. Dietrich AJ, Williams JW, Ciotti MC, et al. Depression care attitudes and practices of newer obstetrician-gynecologists: a national survey. Am J Obstet Gynecol. 2003;189:267–273.
57. Cohen LS, Altshuler LL, Harlow BL, et al. Relapse of major depression during pregnancy in women who maintain or discontinue antidepressant treatment. JAMA. 2006;295:499–507.
58. Sleath B, West S, Tudor G, et al. Ethnicity and depression treatment preferences of pregnant women. J Psychosom Obstetr Gynecol. 2005;26:135–140.
59. Alwan S, Reefhuis J, Rasmussen SA, et al. Use of selective serotonin-reuptake inhibitors in pregnancy and the risk of birth defects. N Engl J Med. 2007;356:2684–2692.
60. National Institute for Clinical Excellence. Depression: core interventions in the management of depression in primary and secondary care. London: HMSO; 2004.
61. Rubin RR, Peyrot M. Was Willis right? Thoughts on the interaction of depression and diabetes. Diabetes Metab Res Rev. 2002;18:173–175.
62. Katon WJ, Von Korff M, Lin EHB, et al. The Pathways study: a randomized trial of collaborative care in patients with diabetes and depression. Arch Gen Psychiatry. 2004;61:1042–1049.
63. Frasure-Smith N, Lesperance F, Talajic M. Depression following myocardial infarction: impact on 6-month survival. JAMA. 1993;270:1819–1825.
64. Lichtman JH, Bigger JT, Blumenthal JA, et al. Depression and coronary heart disease: recommendations for screening, referral, and treatment. Circulation. 2008;118:1768–1775.
65. Thombs BD, de Jonge P, Coyne JC, et al. Depression screening and patient outcomes in cardiovascular care: a systematic review. JAMA. 2008;300:2161–2171.
66. Goldner EM. Is it time to revise our understanding and management of depression? Can J Psychiatry. 2008;53:409–410.
67. Palmer SC, Coyne JC. Screening for depression in medical care: pitfalls, alternatives, and revised priorities. J Psychosom Res. 2003;54:279–287.
68. Gilbody S, Bower P, Fletcher J, et al. Collaborative care for depression: a cumulative meta-analysis and review of longer-term outcomes. Arch Intern Med. 2006;166:2314–2321.
69. Patten SB. A framework for describing the impact of antidepressant medications on population health status. Pharmacoepidemiology and Drug Safety. 2002;11:549–559.
Appendix Table AP.1. Symptoms of Depression from 11 Popular Scales (Ordered by Frequency of Symptom)
Column groups: Reference; Classic Scales; New Scales.
Symptom (problem with . . .) | ICD-10 (MDI) | HAM-D-21 | BDI-II | Zung | CES-D | MADRS | GDS-15 | HADS | EPDS | MOS-8 | DSM-IV (PHQ9)
Low mood | Yes | Yes | Yes (sadness) | Yes (blue) | Yes | Yes (sadness) | Yes | No | Yes | Yes | Yes
Sleep disturbance | Yes | Yes | Yes | Yes | Yes | Yes | No | No | Yes | Yes | Yes
Interest/pleasure | Yes | No | Yes | Yes | No | Yes | No | Yes | Yes | Yes | Yes
Energy | Yes | Yes | Yes | Yes | Yes | No | Yes | No | No | No | Yes
Thoughts of death or self-harm | Yes | Yes | Yes | Yes | No | Yes | No | No | Yes | No | Yes
Agitation
Yes
Yes
Yes
Yes
No
No
Yes
No
No
Yes
Confidence/ self-esteem Guilt
Yes
No
Yes
Yes
No
No
Yes
Yes
No
Yes
Yes (worthless) No
No
Yes
Yes (worthless) No
Yes (tension) No
No
No
Yes
No
Yes
Yes
Yes
Yes
No
No
Yes (blame) No
Yes (worthless) Yes
No
Yes
Yes No No
Yes No Yes
Yes Yes No
No Yes No
No Yes Yes
No No No
No No Yes
Yes No Yes
No Yes Yes
No Yes No
Yes No No
Concentration/ indecisiveness Retardation Crying Anxiety/ fearful
(continued)
Table AP.1. (Continued)
Column groups: Reference; Classic Scales; New Scales.
Symptom (problem with . . .) | ICD-10 (MDI) | HAM-D-21 | BDI-II | Zung | CES-D | MADRS | GDS-15 | HADS | EPDS | MOS-8 | DSM-IV (PHQ9)
Appetite | Yes | No | Yes | No | Yes | No | No | No | No | No | Yes
Hopelessness | No | No | Yes | Yes* | Yes* | No | No | Yes* | No | No | No
Irritability | No | No | Yes | Yes | Yes (bothered) | No | No | No | No | No | No
Loss of libido | No | Yes | Yes | Yes | No | No | No | No | No | No | No
Lassitude | No | No | No | Yes | Yes | Yes | No | No | No | No | No
Weight change | No | No | No | Yes | No | No | No | No | No | No | Yes
Sense of humor/laughter | No | No | No | No | No | No | No | Yes | Yes | No | No
Activities, work | No | Yes | No | No | No | No | Yes | No | No | No | No
Diurnal mood variation | No | Yes | No | Yes | No | No | No | No | No | No | No
Satisfaction/quality of life | No | No | No | Yes | No | No | Yes* | No | No | No | No
Helplessness | No | No | No | No | No | No | Yes | No | No | No | No
Punishment feelings | No | No | Yes | No | No | No | No | No | No | No | No
Loneliness | No | No | No | No | Yes | No | No | No | No | No | No
Difficulty coping | No | No | No | No | No | No | No | No | Yes | No | No
Constipation | No | No | No | Yes | No | No | No | No | No | No | No
*Reverse keyed or alternate wording.
Table AP.2. Somatic and Nonsomatic Symptoms of Depression from 11 Popular Mood Scales
Column groups: Reference; Classic Scales; New Scales.
Symptom (problem with . . .) | ICD-10 (MDI) | HAM-D-21 | BDI-II | Zung | CES-D | MADRS | GDS-15 | HADS | EPDS | MOS-8 | DSM-IV (PHQ9)
Sleep disturbance | Yes | Yes | Yes | Yes | Yes | Yes | No | No | Yes | Yes | Yes
Energy Agitation
Yes Yes
Yes Yes
Yes Yes
Yes Yes
Yes No
Yes No
No Yes
No No
No No
Yes Yes
Concentration/ indecisiveness Retardation Appetite Loss libido Lassitude Weight change Activities, work Diurnal mood variation Constipation Low mood
Yes
No
Yes
Yes
Yes
No Yes (tension) Yes
No
No
No
No
Yes
Yes Yes No No No No No
Yes No Yes No No Yes Yes
Yes Yes Yes No No No No
No No Yes Yes Yes No Yes
No Yes No Yes No No No
No No No Yes No No No
No No No No No Yes No
Yes No No No No No No
No No No No No No No
No No No No No No No
Yes Yes No No Yes No No
No Yes
No Yes
No No
No Yes
No Yes
No Yes
No Yes
No Yes (sadness) Yes Yes
No Yes
Yes Yes
Yes Yes (blue) Yes Yes
No Yes
Interest/pleasure Thoughts of death or self-harm
No Yes (sadness) Yes Yes
No No
Yes No
Yes Yes
Yes No
Yes Yes
No No
(Continued)
Table AP.2. (Continued)
Column groups: Reference; Classic Scales; New Scales.
Symptom (problem with . . .) | ICD-10 (MDI) | HAM-D-21 | BDI-II | Zung | CES-D | MADRS | GDS-15 | HADS | EPDS | MOS-8 | DSM-IV (PHQ9)
Confidence/ self-esteem Guilt
Yes
No
Yes
Yes
No
No
No
Yes
Yes
No
Yes
Yes (worthless) No
No
Yes
Yes (worthless) No
No
No
Crying Anxiety/fearful Hopelessness Irritability
No No No No
No Yes No No
Yes No Yes Yes
Yes No Yes* Yes
No No No No
No Yes No No
No Yes Yes* No
Yes No No No
No No No No
Sense humor/ laughter Satisfaction/quality of life
No
No
No
No
Yes Yes Yes* Yes (bothered) No
Yes (blame) Yes Yes No No
Yes (worthless) Yes
No
No
Yes
Yes
No
No
No
No
No
Yes
No
No
Yes*
No
No
No
No
No
No
No
No
No
No
Yes
No
No
No
No
No
No
Yes
No
No
No
No
No
No
No
No
No No 1.2
No No 1.75
No No 0.77
No No 1.125
Yes No 0.71
No No 1
No No 0.4
No No 0.5
No Yes 0.125
No No 0.33
No No
#3
#1
Neutral Neutral
#4
Neutral Neutral
Neutral Neutral
#3
#4
#1
#2
Helplessness Punishment feelings Lonely Difficulty coping Somatic/ Psychological Ratio: Most Somatic Scale Most Psychological Scale
Table AP.3. Statistical Summary of Accuracy from Hypothetical Single-Step Diagnostic Tests (n = 1,000)
Test (Sensitivity:Specificity) | Depressed (n) | TP | FN | Nondepressed (n) | TN | FP | PPV | NPV | PSI | Youden | UI+ | UI- | FC
Prevalence 0.10
Single step 90:90 | 100 | 90 | 10 | 900 | 810 | 90 | 0.50 | 0.99 | 0.49 | 0.80 | 0.45 | 0.89 | 0.90
Single step 80:80 | 100 | 80 | 20 | 900 | 720 | 180 | 0.31 | 0.97 | 0.28 | 0.60 | 0.25 | 0.78 | 0.80
Single step 70:70 | 100 | 70 | 30 | 900 | 630 | 270 | 0.21 | 0.95 | 0.16 | 0.40 | 0.14 | 0.67 | 0.70
Single step 60:60 | 100 | 60 | 40 | 900 | 540 | 360 | 0.14 | 0.93 | 0.07 | 0.20 | 0.09 | 0.56 | 0.60
Single step 50:50 | 100 | 50 | 50 | 900 | 450 | 450 | 0.10 | 0.90 | 0 | 0 | 0.05 | 0.45 | 0.50
Single step 90:80 | 100 | 90 | 10 | 900 | 720 | 180 | 0.33 | 0.99 | 0.32 | 0.70 | 0.30 | 0.79 | 0.81
Single step 90:70 | 100 | 90 | 10 | 900 | 630 | 270 | 0.25 | 0.98 | 0.23 | 0.60 | 0.23 | 0.69 | 0.72
Single step 90:60 | 100 | 90 | 10 | 900 | 540 | 360 | 0.20 | 0.98 | 0.18 | 0.50 | 0.18 | 0.59 | 0.63
Single step 80:90 | 100 | 80 | 20 | 900 | 810 | 90 | 0.47 | 0.98 | 0.45 | 0.70 | 0.38 | 0.88 | 0.89
Single step 80:70 | 100 | 80 | 20 | 900 | 630 | 270 | 0.23 | 0.97 | 0.20 | 0.50 | 0.18 | 0.68 | 0.71
Single step 80:60 | 100 | 80 | 20 | 900 | 540 | 360 | 0.18 | 0.96 | 0.15 | 0.40 | 0.15 | 0.58 | 0.62
Single step 60:80 | 100 | 60 | 40 | 900 | 720 | 180 | 0.25 | 0.95 | 0.20 | 0.40 | 0.15 | 0.76 | 0.78
Prevalence 0.20
Single step 90:90 | 200 | 180 | 20 | 800 | 720 | 80 | 0.69 | 0.97 | 0.67 | 0.80 | 0.62 | 0.88 | 0.90
Single step 80:80 | 200 | 160 | 40 | 800 | 640 | 160 | 0.50 | 0.94 | 0.44 | 0.60 | 0.40 | 0.75 | 0.80
Single step 70:70 | 200 | 140 | 60 | 800 | 560 | 240 | 0.37 | 0.90 | 0.27 | 0.40 | 0.26 | 0.63 | 0.70
Single step 60:60 | 200 | 120 | 80 | 800 | 480 | 320 | 0.27 | 0.86 | 0.13 | 0.20 | 0.16 | 0.51 | 0.60
Single step 50:50 | 200 | 100 | 100 | 800 | 400 | 400 | 0.20 | 0.80 | 0 | 0 | 0.10 | 0.40 | 0.50
Single step 90:80 | 200 | 180 | 20 | 800 | 640 | 160 | 0.53 | 0.97 | 0.50 | 0.70 | 0.48 | 0.78 | 0.82
Single step 90:70 | 200 | 180 | 20 | 800 | 560 | 240 | 0.43 | 0.97 | 0.39 | 0.60 | 0.39 | 0.68 | 0.74
Single step 90:60 | 200 | 180 | 20 | 800 | 480 | 320 | 0.36 | 0.96 | 0.32 | 0.50 | 0.32 | 0.58 | 0.66
Single step 80:90 | 200 | 160 | 40 | 800 | 720 | 80 | 0.67 | 0.95 | 0.61 | 0.70 | 0.53 | 0.85 | 0.88
Single step 80:70 | 200 | 160 | 40 | 800 | 560 | 240 | 0.40 | 0.93 | 0.33 | 0.50 | 0.32 | 0.65 | 0.72
Single step 80:60 | 200 | 160 | 40 | 800 | 480 | 320 | 0.33 | 0.92 | 0.26 | 0.40 | 0.27 | 0.55 | 0.64
Single step 60:80 | 200 | 120 | 80 | 800 | 640 | 160 | 0.43 | 0.89 | 0.32 | 0.40 | 0.26 | 0.71 | 0.76
Prevalence 0.50
Single step 90:90 | 500 | 450 | 50 | 500 | 450 | 50 | 0.90 | 0.90 | 0.80 | 0.80 | 0.81 | 0.81 | 0.90
Single step 80:80 | 500 | 400 | 100 | 500 | 400 | 100 | 0.80 | 0.80 | 0.60 | 0.60 | 0.64 | 0.64 | 0.80
Single step 70:70 | 500 | 350 | 150 | 500 | 350 | 150 | 0.70 | 0.70 | 0.40 | 0.40 | 0.49 | 0.49 | 0.70
Single step 60:60 | 500 | 300 | 200 | 500 | 300 | 200 | 0.60 | 0.60 | 0.20 | 0.20 | 0.36 | 0.36 | 0.60
Single step 50:50 | 500 | 250 | 250 | 500 | 250 | 250 | 0.50 | 0.50 | 0 | 0 | 0.25 | 0.25 | 0.50
Single step 90:80 | 500 | 450 | 50 | 500 | 400 | 100 | 0.82 | 0.89 | 0.71 | 0.70 | 0.74 | 0.71 | 0.85
Single step 90:70 | 500 | 450 | 50 | 500 | 350 | 150 | 0.75 | 0.88 | 0.63 | 0.60 | 0.68 | 0.61 | 0.80
Single step 90:60 | 500 | 450 | 50 | 500 | 300 | 200 | 0.69 | 0.86 | 0.55 | 0.50 | 0.62 | 0.51 | 0.75
Single step 80:90 | 500 | 400 | 100 | 500 | 450 | 50 | 0.89 | 0.82 | 0.71 | 0.70 | 0.71 | 0.74 | 0.85
Single step 80:70 | 500 | 400 | 100 | 500 | 350 | 150 | 0.73 | 0.78 | 0.51 | 0.50 | 0.58 | 0.54 | 0.75
Single step 80:60 | 500 | 400 | 100 | 500 | 300 | 200 | 0.67 | 0.75 | 0.42 | 0.40 | 0.53 | 0.45 | 0.70
Single step 60:80 | 500 | 300 | 200 | 500 | 400 | 100 | 0.75 | 0.67 | 0.42 | 0.40 | 0.45 | 0.53 | 0.70
TP, true positive; FN, false negative; TN, true negative; FP, false positive; PPV, positive predictive value; NPV, negative predictive value; PSI, predictive summary index; UI+, positive utility index; UI-, negative utility index; FC, fraction correct.
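Every column of Table AP.3 follows from prevalence, sensitivity (Se), and specificity (Sp). The sketch below reproduces a row; it assumes the definitions that the tabulated values are consistent with (PSI = PPV + NPV - 1, Youden = Se + Sp - 1, UI+ = Se x PPV, UI- = Sp x NPV, FC = (TP + TN)/n). The function and field names are illustrative, not taken from the source.

```python
from dataclasses import dataclass


@dataclass
class Accuracy:
    """Expected 2 x 2 counts and summary indices for one screening test."""
    tp: float
    fn: float
    tn: float
    fp: float
    ppv: float
    npv: float
    psi: float
    youden: float
    ui_pos: float
    ui_neg: float
    fc: float


def single_step(n: int, prevalence: float, se: float, sp: float) -> Accuracy:
    depressed = n * prevalence
    nondepressed = n - depressed
    tp = se * depressed            # depressed and screen positive
    fn = depressed - tp            # depressed but screen negative (missed)
    tn = sp * nondepressed         # not depressed and screen negative
    fp = nondepressed - tn         # not depressed but screen positive
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return Accuracy(
        tp=tp, fn=fn, tn=tn, fp=fp, ppv=ppv, npv=npv,
        psi=ppv + npv - 1,         # predictive summary index
        youden=se + sp - 1,        # Youden's index
        ui_pos=se * ppv,           # positive utility index (UI+)
        ui_neg=sp * npv,           # negative utility index (UI-)
        fc=(tp + tn) / n,          # fraction correct
    )


row = single_step(1000, 0.10, 0.90, 0.90)   # first row of Table AP.3
print(round(row.ppv, 2), round(row.npv, 2), round(row.fc, 2))
```

The counts are expected values, so other inputs can give fractional TP or FP; rounding them reproduces the whole numbers shown in the table.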
Table AP.4. Statistical Summary of Accuracy from Hypothetical Two-Step (Algorithm) Diagnostic Tests (n = 1,000)
Combined test (Step i. Sensitivity:Specificity; Step ii. Sensitivity:Specificity) | Depressed (n) | TP | FN | Nondepressed (n) | TN | FP | PPV | NPV | PSI (NPV+PPV-1) | Youden | UI+ | UI- | FC
Prevalence 0.10
Combined i.90:90 ii.90:90 | 100 | 81 | 19 | 900 | 891 | 9 | 0.90 | 0.98 | 0.88 | 0.80 | 0.73 | 0.97 | 0.97
Combined i.80:80 ii.90:90 | 100 | 72 | 28 | 900 | 882 | 18 | 0.80 | 0.97 | 0.77 | 0.70 | 0.58 | 0.95 | 0.95
Combined i.80:80 ii.80:80 | 100 | 64 | 36 | 900 | 864 | 36 | 0.64 | 0.96 | 0.60 | 0.60 | 0.41 | 0.92 | 0.93
Combined i.70:70 ii.70:70 | 100 | 49 | 51 | 900 | 819 | 81 | 0.38 | 0.94 | 0.32 | 0.40 | 0.18 | 0.86 | 0.87
Combined i.60:60 ii.60:60 | 100 | 36 | 64 | 900 | 756 | 144 | 0.20 | 0.92 | 0.12 | 0.20 | 0.07 | 0.77 | 0.79
Combined i.80:90 ii.80:90 | 100 | 64 | 36 | 900 | 891 | 9 | 0.88 | 0.96 | 0.84 | 0.63 | 0.56 | 0.95 | 0.96
Combined i.80:70 ii.90:60 | 100 | 72 | 28 | 900 | 792 | 108 | 0.40 | 0.97 | 0.37 | 0.60 | 0.29 | 0.85 | 0.86
Combined i.90:60 ii.80:70 | 100 | 72 | 28 | 900 | 792 | 108 | 0.40 | 0.97 | 0.37 | 0.60 | 0.29 | 0.85 | 0.86
Combined i.80:70 ii.60:90 | 100 | 48 | 52 | 900 | 873 | 27 | 0.64 | 0.94 | 0.58 | 0.45 | 0.31 | 0.92 | 0.92
Combined i.60:90 ii.80:70 | 100 | 48 | 52 | 900 | 873 | 27 | 0.64 | 0.94 | 0.58 | 0.45 | 0.31 | 0.92 | 0.92
Prevalence 0.20
Combined i.90:90 ii.90:90 | 200 | 81 | 119 | 800 | 792 | 8 | 0.91 | 0.87 | 0.78 | 0.40 | 0.37 | 0.86 | 0.87
Combined i.80:80 ii.90:90 | 200 | 72 | 128 | 800 | 784 | 16 | 0.82 | 0.86 | 0.68 | 0.34 | 0.29 | 0.84 | 0.86
Combined i.80:80 ii.80:80 | 200 | 64 | 136 | 800 | 768 | 32 | 0.67 | 0.85 | 0.52 | 0.28 | 0.21 | 0.82 | 0.83
Combined i.70:70 ii.70:70 | 200 | 49 | 151 | 800 | 728 | 72 | 0.40 | 0.83 | 0.23 | 0.16 | 0.10 | 0.75 | 0.78
Combined i.60:60 ii.60:60 | 200 | 36 | 164 | 800 | 672 | 128 | 0.22 | 0.80 | 0.02 | 0.02 | 0.04 | 0.68 | 0.71
Combined i.80:90 ii.80:90 | 200 | 64 | 136 | 800 | 792 | 8 | 0.89 | 0.85 | 0.74 | 0.31 | 0.28 | 0.84 | 0.86
Combined i.80:70 ii.90:60 | 200 | 72 | 128 | 800 | 704 | 96 | 0.43 | 0.85 | 0.27 | 0.24 | 0.15 | 0.74 | 0.78
Combined i.90:60 ii.80:70 | 200 | 72 | 128 | 800 | 704 | 96 | 0.43 | 0.85 | 0.27 | 0.24 | 0.15 | 0.74 | 0.78
Combined i.80:70 ii.60:90 | 200 | 48 | 152 | 800 | 776 | 24 | 0.67 | 0.84 | 0.50 | 0.21 | 0.16 | 0.81 | 0.82
Combined i.60:90 ii.80:70 | 200 | 48 | 152 | 800 | 776 | 24 | 0.67 | 0.84 | 0.50 | 0.21 | 0.16 | 0.81 | 0.82
Prevalence 0.50
Combined i.90:90 ii.90:90 | 500 | 405 | 95 | 500 | 495 | 5 | 0.99 | 0.84 | 0.83 | 0.80 | 0.80 | 0.83 | 0.90
Combined i.80:80 ii.90:90 | 500 | 360 | 140 | 500 | 490 | 10 | 0.97 | 0.78 | 0.75 | 0.70 | 0.70 | 0.76 | 0.85
Combined i.80:80 ii.80:80 | 500 | 320 | 180 | 500 | 480 | 20 | 0.94 | 0.73 | 0.67 | 0.60 | 0.60 | 0.70 | 0.80
Combined i.70:70 ii.70:70 | 500 | 245 | 255 | 500 | 455 | 45 | 0.84 | 0.64 | 0.49 | 0.40 | 0.41 | 0.58 | 0.70
Combined i.60:60 ii.60:60 | 500 | 180 | 320 | 500 | 492 | 8 | 0.96 | 0.61 | 0.56 | 0.34 | 0.34 | 0.60 | 0.67
Combined i.80:90 ii.80:90 | 500 | 320 | 180 | 500 | 495 | 5 | 0.98 | 0.73 | 0.72 | 0.63 | 0.63 | 0.73 | 0.82
Combined i.80:70 ii.90:60 | 500 | 360 | 140 | 500 | 440 | 60 | 0.86 | 0.76 | 0.62 | 0.60 | 0.62 | 0.67 | 0.80
Combined i.90:60 ii.80:70 | 500 | 360 | 140 | 500 | 440 | 60 | 0.86 | 0.76 | 0.62 | 0.60 | 0.62 | 0.67 | 0.80
Combined i.80:70 ii.60:90 | 500 | 240 | 260 | 500 | 485 | 15 | 0.94 | 0.65 | 0.59 | 0.45 | 0.45 | 0.63 | 0.73
Combined i.60:90 ii.80:70 | 500 | 240 | 260 | 500 | 485 | 15 | 0.94 | 0.65 | 0.59 | 0.45 | 0.45 | 0.63 | 0.73
TP, true positive; FN, false negative; TN, true negative; FP, false positive; PPV, positive predictive value; NPV, negative predictive value; PSI, predictive summary index; UI+, positive utility index; UI-, negative utility index; FC, fraction correct.
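The two-step rows can be approximated by treating the algorithm as serial testing: step ii is applied only to step-i positives, so a case is detected only if it is positive at both steps, while a negative at either step ends the pathway. The sketch below assumes the two steps err independently given true depression status, which is an assumption of this illustration rather than a statement from the source; under it, combined Se = Se(i) x Se(ii) and combined Sp = 1 - (1 - Sp(i)) x (1 - Sp(ii)). The function name is hypothetical.

```python
def serial_two_step(n, prevalence, step1, step2):
    """Expected counts when step ii is applied only to step-i positives.

    step1 and step2 are (sensitivity, specificity) pairs. Assumes the two
    steps' errors are independent given true status (illustrative assumption).
    """
    se1, sp1 = step1
    se2, sp2 = step2
    se = se1 * se2                    # a case must screen positive at both steps
    sp = 1 - (1 - sp1) * (1 - sp2)    # a non-case is a false positive only if both steps err
    depressed = n * prevalence
    nondepressed = n - depressed
    tp = se * depressed
    tn = sp * nondepressed
    fn = depressed - tp
    fp = nondepressed - tn
    return {"se": se, "sp": sp, "tp": tp, "fn": fn, "tn": tn, "fp": fp,
            "ppv": tp / (tp + fp), "npv": tn / (tn + fn)}


r = serial_two_step(1000, 0.10, (0.90, 0.90), (0.90, 0.90))
print(round(r["tp"]), round(r["tn"]), round(r["ppv"], 2))
```

At prevalence 0.10 this reproduces, for example, TP = 81 and TN = 891 for i.90:90 ii.90:90, and TP = 72 and FP = 108 for i.80:70 ii.90:60. Under the same model the prevalence 0.20 rows would have twice the TP of the 0.10 rows (162 rather than 81 for i.90:90 ii.90:90), so the printed 0.20 counts are worth checking against it before reuse.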
[Figure AP.1 flowchart: unselected population, N = 1,000 (depression n = 100; no depression n = 900); screening method #1 (Se 90%, Sp 90%) gives screen-positive possible cases (TP = 90, FP = 90; PPV 50%) and screen-negative possible non-cases (TN = 810, FN = 10; NPV 98.8%); 50% of those identified are offered and accept treatment (adequate treatment TP = 45; inappropriate treatment FP = 45); cumulative recognition and treatment yields are shown.]
Figure AP.1. Low-risk single screen yield. In this scenario 50% of those who screen positive are actually depressed, although 99% of those who screen negative are not depressed. Nine percent of the sample are possible false alarms, or an "excess diagnostic burden." Assuming 50% of those identified accept treatment, 45% of depressed patients receive adequate treatment, but an equivalent raw number of nondepressed subjects receive inappropriate treatment. In all, 55% of depressed individuals have unmet needs, and 855 individuals have no unmet needs.
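The treatment arithmetic in the caption can be made explicit. This sketch assumes, as the caption does, that true and false positives accept treatment at the same rate and that screen negatives receive no treatment; "unmet needs" is taken here to mean depressed people who are either missed or identified but untreated. The function name is illustrative.

```python
def treatment_yield(tp, fp, fn, tn, accept_rate):
    """Split screening results by whether treatment is offered and accepted.

    Assumes screen positives (true or false) accept treatment at the same
    rate and that screen negatives are not treated.
    """
    treated_tp = tp * accept_rate          # depressed and adequately treated
    treated_fp = fp * accept_rate          # not depressed but treated
    unmet = fn + (tp - treated_tp)         # depressed: missed, or found but untreated
    no_unmet = tn + (fp - treated_fp)      # not depressed and not treated
    return {
        "adequate": treated_tp,
        "inappropriate": treated_fp,
        "unmet_share": unmet / (tp + fn),  # as a fraction of all depressed
        "no_unmet": no_unmet,
    }


print(treatment_yield(tp=90, fp=90, fn=10, tn=810, accept_rate=0.5))
```

With the figure's counts and a 50% acceptance rate this gives 45 adequately treated, 45 inappropriately treated, 55% of depressed people with unmet needs, and 855 people with no unmet needs, matching the caption.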
[Figure AP.2 flowchart: all stroke patients, N = 1,000 (depression n = 100; no depression n = 900); step 1 PHQ-2 (Q1 or Q2 positive; Se 80%, Sp 70%), step 2 HADS-D (cut-off 9v10; Se 60%, Sp 91%) applied to step-1 positives; probable depression TP = 48, FP = 24; treatment uptake and cumulative recognition and treatment yields are shown.]
Figure AP.2. Low-risk two-step screen yield. In this scenario the first-step screen yields a positive predictive value (PPV) of only 23%, but adding the second step improves this to 66%. However, if only one in four of those identified is offered and accepts treatment, the treatment yield is weakened: effectively, the PPV becomes 33% and the negative predictive value 72%.
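The PPV gain described in the caption is a sequential application of Bayes' theorem: the post-test probability after step 1 becomes the pretest probability for step 2. A minimal sketch using odds and the positive likelihood ratio, LR+ = Se/(1 - Sp), assuming the figure's input data (10% prevalence; step 1 Se 80%, Sp 70%; step 2 Se 60%, Sp 91%) and assuming step 2 performs as well among step-1 positives as it would in an unselected sample:

```python
def post_test_probability(pretest, se, sp):
    """Probability of depression after a positive result (Bayes' theorem in odds form)."""
    lr_pos = se / (1 - sp)                      # positive likelihood ratio
    post_odds = pretest / (1 - pretest) * lr_pos
    return post_odds / (1 + post_odds)


prevalence = 0.10
after_step1 = post_test_probability(prevalence, se=0.80, sp=0.70)
# Step 2 is applied only to step-1 positives, so its pretest probability
# is the step-1 PPV.
after_step2 = post_test_probability(after_step1, se=0.60, sp=0.91)
print(f"{after_step1:.0%} {after_step2:.0%}")
```

This prints 23% and 66%, the two PPVs given in the caption.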
[Figure AP.3 flowchart: selected population, N = 1,000 (depression n = 200; no depression n = 800); screening method #1 (Se 80%, Sp 90%; TP = 160, FP = 80, FN = 40, TN = 720; PPV 67%, NPV 95%) followed by screening method #2 among screen positives (PPV 95%); each branch is divided into those who want help and those who reject help; cumulative yield: helped = 203, not helped = 797.]
Figure AP.3. Two-step screen yield including desire for help in a medium-prevalence sample. In this scenario the first-step screen yields a positive predictive value (PPV) of 67% (given a baseline prevalence of 20%), but adding the second step improves this to 95%. However, only about half of those identified as depressed actually want and accept professional help, and about 15% of those without depression also want help. Thus, desire for help with psychosocial problems does not map exactly onto the presence of distress.
[Figure AP.4 diagram: overlapping test-score distributions for healthy and depressed individuals; an optimum cut-off value divides the test score axis into test negative (true negatives, false negatives) and test positive (false positives, true positives).]
Figure AP.4. Conceptual overlap of test scores in healthy and depressed individuals.
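The trade-off in Figure AP.4 can be simulated with two overlapping score distributions. The normal distributions, means, and standard deviations below are hypothetical and chosen only for illustration; maximising Youden's index (Se + Sp - 1) is one common way, among several, to define an "optimum" cut-off.

```python
from statistics import NormalDist

# Hypothetical score distributions, for illustration only (not any real scale).
healthy = NormalDist(mu=8, sigma=4)
depressed = NormalDist(mu=16, sigma=4)


def accuracy_at(cutoff):
    """Return (sensitivity, specificity) when scores >= cutoff count as positive."""
    se = 1 - depressed.cdf(cutoff)   # depressed people at or above the cut-off
    sp = healthy.cdf(cutoff)         # healthy people below the cut-off
    return se, sp


def optimum_cutoff(candidates):
    """Cut-off maximising Youden's index, Se + Sp - 1."""
    return max(candidates, key=lambda c: sum(accuracy_at(c)) - 1)


best = optimum_cutoff(range(0, 25))
se, sp = accuracy_at(best)
print(best, round(se, 2), round(sp, 2))
```

With equal spreads the Youden-optimal cut-off falls midway between the two means (12 here). Moving it upward trades sensitivity (more false negatives) for specificity (fewer false positives), and moving it downward does the reverse.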
Revised Emotion Thermometers Scale (7 items)
Instructions: In the first four columns, please mark the number (0–10) that best describes how much emotional upset you have been experiencing in the past two weeks, including today. In the next three columns, please indicate how much impact this has had on you.
Emotional Upset (each rated 0–10)
1. Distress: 0 = None, 10 = Extreme
2. Anxiety: 0 = None, 10 = Extreme
3. Depression: 0 = None, 10 = Extreme
4. Anger: 0 = None, 10 = Extreme
Emotional Impact (each rated 0–10)
5. Duration: 0 = Just today, 10 = 10+ months
6. Burden: 0 = No effect on me, 10 = Cannot function at all
7. Need Help: 0 = Can manage myself, 10 = Desperately
Figure AP.5. Emotion Thermometers. Source: Adapted from the NCC Distress Thermometers, Alex Mitchell.
Edinburgh Postnatal Depression Scale (EPDS)
Name:
Address:
Your Date of Birth:
Phone:
Please check the answer that comes closest to how you have felt in the past 7 days, not just how you feel today. Here is an example, already completed.
I have felt happy: Yes, all the time / Yes, most of the time / No, not very often / No, not at all
This would mean: "I have felt happy most of the time" during the past week. Please complete the other questions in the same way.
In the past 7 days:
1. I have been able to laugh and see the funny side of things
   As much as I always could / Not quite so much now / Definitely not so much now / Not at all
2. I have looked forward with enjoyment to things
   As much as I ever did / Rather less than I used to / Definitely less than I used to / Hardly at all
*3. I have blamed myself unnecessarily when things went wrong
   Yes, most of the time / Yes, some of the time / Not very often / No, never
4. I have been anxious or worried for no good reason
   No, not at all / Hardly ever / Yes, sometimes / Yes, very often
*5. I have felt scared or panicky for no very good reason
   Yes, quite a lot / Yes, sometimes / No, not much / No, not at all
*6. Things have been getting on top of me
   Yes, most of the time I haven't been able to cope at all / Yes, sometimes I haven't been coping as well as usual / No, most of the time I have coped quite well / No, I have been coping as well as ever
*7. I have been so unhappy that I have had difficulty sleeping
   Yes, most of the time / Yes, sometimes / Not very often / No, not at all
*8. I have felt sad or miserable
   Yes, most of the time / Yes, quite often / Not very often / No, not at all
*9. I have been so unhappy that I have been crying
   Yes, most of the time / Yes, quite often / Only occasionally / No, never
*10. The thought of harming myself has occurred to me
   Yes, quite often / Sometimes / Hardly ever / Never
Source: Reprinted, with permission, from Cox JL, Holden JM, Sagovsky R. 1987. Detection of postnatal depression: Development of the 10-item Edinburgh Postnatal Depression Scale. British Journal of Psychiatry 150:782–786.
Figure AP.6. Edinburgh Postnatal Depression Scale (EPDS). © 1987 The Royal College of Psychiatrists. The Edinburgh Postnatal Depression Scale may be photocopied by individual researchers or clinicians for their own use without seeking permission from the publishers. The scale must be copied in full and all copies must acknowledge the following source: Cox, J.L., Holden, J.M., & Sagovsky, R. (1987). Detection of postnatal depression. Development of the 10-item Edinburgh Postnatal Depression Scale. British Journal of Psychiatry, 150, 782-786. Written permission must be obtained from the Royal College of Psychiatrists for copying and distribution to others or for republication (in print, online or by any other medium). Translations of the scale, and guidance as to its use, may be found in Cox, J.L. & Holden, J. (2003) Perinatal Mental Health: A Guide to the Edinburgh Postnatal Depression Scale. London: Gaskell.
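The form does not show how responses are scored. The sketch below follows the conventional EPDS scoring of Cox, Holden, and Sagovsky (1987): in the published scale, items 1, 2, and 4 score 0 to 3 from the top option down, and the asterisked items 3 and 5 to 10 are reverse scored, so totals range from 0 to 30. The function name and input format are illustrative, and no cut-off score is applied here.

```python
# In the published EPDS, items 3 and 5-10 (asterisked) are reverse scored.
REVERSE_SCORED = {3, 5, 6, 7, 8, 9, 10}


def epds_total(choices):
    """Total EPDS score (0-30).

    choices maps each item number (1-10) to the position of the ticked
    option: 0 = first (top) option through 3 = last (bottom) option.
    """
    if set(choices) != set(range(1, 11)):
        raise ValueError("all 10 items must be answered")
    total = 0
    for item, position in choices.items():
        if position not in (0, 1, 2, 3):
            raise ValueError(f"item {item}: option position must be 0-3")
        # Unstarred items score 0 (top) to 3 (bottom); starred items the reverse.
        total += 3 - position if item in REVERSE_SCORED else position
    return total


# Example: top option ticked for every item.
print(epds_total({item: 0 for item in range(1, 11)}))
```

Ticking the top option everywhere scores 21, because the top options of the reverse-scored items describe the most distress; this is why the scoring direction has to be tracked item by item.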
Index
Note: Page numbers followed by f denote figures, t denote tables, and b denote boxes.
Accuracy. See Diagnostic accuracy Activities of daily living (ADLs), 244, 246, 257 Acute myocardial infarction (AMI). See Cardiovascular care Adaptive technology. See Technological approaches Adjustment disorders, 18–19, 168, 195, 268, 271, 275 Affect heuristic, 117 African Americans detection and, 62, 75 readiness to disclose, 67 Agency for Health Care Policy and Research (U.S.), 125 Agency for Healthcare Research and Quality, 165 Alcohol, Drug Abuse, and Mental Health Administration, U.S. (ADAMHA), 21–22 Alcohol and drug abuse underdiagnosis of, 115 Algorithms in adaptive testing, 144, 354–356 clinician avoidance of, 353 for CVD patients, 326 in diagnostic accuracy, 22, 49–50, 109 in diagnostic checklists, 10 in DSM-IV, 43–44 in MDI, 43–44 in mood scales, 33 psychosocial, 291 in questionnaires, 12, 14–15 screening frequency, 184 for touchscreen technology use, 152 American College of Cardiology, 319 American Heart Association (AHA), 317, 319, 363–364 Anxiety and Depression Detector (ADD), 175 Aphasic Depression Rating Scale, 33
Appendices, 371–384 Edinburgh Postnatal Depression Scale, 384f emotion thermometers, 383f low-risk single screen yield, 379f low-risk two-step screen yield, 380f overlap of test scores, 382f somatic/nonsomatic symptoms and scales, 373–374t statistical accuracy, 375–378t symptoms and scales, 371–372t two-step screen yield in medium-prevalence sample, 381f Area under the curve (AUC), 93, 268, 275 Availability heuristic, 117 Axis I disorders epilepsy diagnosis and, 250 screening instruments, 22 underdiagnosis of, 115 Axis II disorders screening instruments, 22 Barriers, diagnostic, 65–66b, 69b Bayes’ theorem, 59, 59f, 102, 360 Bech-Rafaelsen Melancholia Scale (MES), 92 Beck Depression Inventory (BDI) BDI-FS (Fast Screen), 38, 245, 248–249 BDI-PC (Primary Care), 195–196, 198–199, 357 comparisons to other scales, 39, 41, 45t future developments, 44, 48 history of, 30, 31, 31b, 37–38 sensitivity to change, 35 Best estimates procedure (BEP), 15–16 Bipolar disorders among MS patients, 247 antidepressants and, 119 epilepsy and, 254 false-positive diagnosis and, 18
Bipolar disorders (continued) Parkinson’s disease and, 255 suicidal ideation in, 252 Black African patients (U.K.) detection and, 64 Brief Assessment Schedule Depression Cards (BASDEC), 245 Brief Finder for Depression (BCD), 196–197 Brief Symptom Inventory (BSI)-18, 272–277 CADI (Computer-Assisted Diagnostic Interview), 18 Canadian Community Health Survey (CCHS 1.2), 249 Canadian Task Force on Preventive Health Care (CTFPHC), 127b, 128–129 Cancer care, 265–294 future developments, 293–294 HADS and, 6f implementation of screening programs in, 276–292 evaluating efficacy in, 278–292, 279–290t PHQ and, 94, 360 prevalence of depression in, 265–266 Rasch Models and, 92, 93 SCID and, 266–267 screening in oncology settings, 267–276 BSI-18, 272 conventional mood severity scales, 267–271, 269–270t distress paradigm, 271 distress thermometers, 272–276, 273–274t screening methods, 266–267 special issues in, 292–293 technological approaches and, 152 Cardiovascular care, 317–329 evaluation and treatment recommendations, 326–328 barriers to implementation, 328 decision process, 327f HAM-D and, 217 PHQ and, 214, 324–329, 363 prevalence of depression in, 319–320 SCID and, 217, 244–245 screening instruments in, 320–326, 322–323t performance characteristics, 322–323 Case-finding, definition of, 29–30 CAT. See Computer-adaptive testing CATEGO-5 software, 21 CATI (computer-assisted telephone interview), 144, 147 CBT. See Cognitive-behavioral therapy CEG (Clinician Evaluation Guide, PRIME-MD), 42, 198, 353
INDEX Centre for Epidemiologic Studies Depression Scale (CES-D). See also specific research and studies comparisons to other scales, 41–42, 45b future developments, 44, 48 history of, 31, 31b, 39 Checklists. See Diagnostic checklists Chronic heart failure, 214, 215f. See also Cardiovascular care CIDI. See Composite International Diagnostic Interview Clinical judgment, 113–120 research on, 114–119 cognitive processes of clinicians, 116–119 narrowness of interviews, 114–115 patient feedback, 115–116 screening limitations, 119–120 Clinician Evaluation Guide (CEG) (PRIME-MD), 42, 198, 353 Clinician rated scales, 34–35, 34b Cochrane Collaboration evidence review, 125–126, 335, 344, 345 Cognitive-behavioral therapy (CBT) in cardiovascular care, 318, 327f in diabetes care, 343–344, 346f global fatigue severity and, 247 in primary care, 168 Cognitive heuristics, 117 Collaborative care. See Enhanced care Comorbid depression and somatic symptoms, 211–217 comparative studies on, 218–235t healthy controls versus, 214 noncomparative studies on, 216–217 physical illness alone versus, 214–215, 215f primary versus secondary depression, 211–214, 212f Composite International Diagnostic Interview (CIDI) CCHS and, 249–250 explanation of, 20–22 in MAGPIE study, 64 in primary care, 163, 175, 183 in studies on diagnostic accuracy, 16t, 17, 21, 50, 321, 322–323t Comprehensive Psychopathological Rating Scale (CPRS), 37 Computer-adaptive testing (CAT) algorithms in, 144, 354–356 based on Rasch Model, 93–95 elderly and, 293 in oncology settings, 276–278 studies on, 355–356 Computer-Assisted Diagnostic Interview (CADI), 18
Computerization issues, 147–150. See also Technological approaches acceptability, 149 availability, 150 embedding in systems, 150 error control, 147–148 examples of, 150–152 honesty, 148 performance, 148 physical clues, 148 price, 149–150 quality control and accuracy, 147 workload considerations, 148–149 Confirmatory hypothesis testing, 116 Conspicuous psychiatric morbidity, 136 Continuum argument, 5 Cornell Scale for the Assessment of Depression in Dementia (CSDD), 33 Coronary heart disease. See Cardiovascular care CPRS (Comprehensive Psychopathological Rating Scale), 37 Cronholm-Ottosson Depression Scale, 30 Crowding-out hypothesis, 73–74 Decision process in cardiovascular care, 327f clinical diagnosis, 103f Depression, 3–24 definition of, 3–7 DSM-IV symptoms in Zurich study, 7f HADS scores in cancer outpatients, 6f psychiatric diagnostic certainty levels, 4b test score distribution, 5f diagnostic checklists, 10–15 DSM, 11–15 DSM-IV, 9–11, 11b history of, 10b ICD-10, 9–15, 11b, 21 for psychiatry, 11b structured/semi-structured diagnosis fully structured assisted interviews, 20b, 21–22 interviews, 19–22 partially structured assisted interviews, 20b unstructured clinician diagnosis, 15–19 diagnostic accuracy in primary care, 16t diagnostic accuracy in routine diagnoses, 15–19 diagnostic accuracy of psychiatrists, 17t, 19t validity of syndrome concept, 7–10 validity testing, 8b Depression in the Medically Ill (DMI) Scales, 195–199, 197t, 357 Depression Scale in Schizophrenia (DEPS), 33
Detection, 57–75 barriers to, 69b influences on, 66–71 clinician communication, 68–71, 69b illness related, 71–74, 73f patient self-report, 66–68 over/under diagnosis, 57–62 detection errors, 60–62, 61–62f in primary care, 58–60, 59f predictors of, 62–66 recognition barriers, 65–66b recognition sensitivity, 63f Diabetes care, 335–346 depression as major health problem in prevalence of, 336f, 337 prognostic relevance of, 337–338 quality of life issues, 338–339, 336–337t socioeconomic aspects of, 339, 342 screening programs, 344–345 clinical management flow diagram, 346f cost-effectiveness of, 345 effect on morbidity, 344–345 screening tests, 340–343, 341t acceptability of, 341, 343f performance of, 340–341 routine detection versus, 342b treatment options, 343–344 nonspecific interventions, 343 specific antidepressive interventions, 344 Diagnosis, definition of, 29–30 Diagnostic accuracy. See also Diagnostic methods algorithms in, 22, 49–50, 109 comparison of screening tools, 49t computerization and, 147 improving accuracy, 49–50 measures of, 102b in primary care, 16t of psychiatrists, 17t, 19t in routine diagnosis, 15–19 somatic symptoms, 209–211 approaches to, 210b, 212f studies on, 16–17, 16t, 17–20, 17t, 19t, 23f Diagnostic and Statistical Manual (DSM). See also specific research and studies comparisons to other scales, 20b, 21–22, 36–44, 45t DSM-III-R checklists, 22 history of, 24 DSM-IV algorithms in, 43–44 checklists, 9–15, 11b decision-tree logic and, 22
Diagnostic and Statistical Manual (DSM) (continued) history of, 21, 24, 31, 31b limitations of, 33 MDD characteristics, 5, 7–8 somatic symptoms, 208t validation of criteria, 11–15, 12t in Zurich study, 7f future developments, 44, 50 history of, 11–12 in studies on diagnostic accuracy, 16–17 Diagnostic barriers, 65–66b Diagnostic certainty, levels of, 4b Diagnostic checklists algorithms in, 10 DSM-IV, 9–15, 11b history of, 10b ICD-10, 9–15, 11b, 21 for psychiatry, 11b Diagnostic Interview Schedule (DIS), 17, 20–22, 42, 62, 210 Diagnostic methods, 99–111 clinical diagnosis, 99–103 accuracy measures, 102b case examples, 100–101, 100t, 101b, 106b, 107t decision theory, 103f evidence-based, 101–103 diagnostic accuracy, clinical aspects, 105–109 algorithm approaches, 109 pre/post testing, 106–108, 107t, 108f rule-in accuracy, 108–109 diagnostic accuracy, scientific aspects, 103–105 likelihood ratios, 104–105, 105f 2 × 2 table, 104f implementation studies, 109–111 added value, 111 feasibility, 110–111 UK guidelines, 110b Diagnostic overshadowing, 115 Differential item functioning (DIF), 36, 87, 91–93, 213 DIS. See Diagnostic Interview Schedule Distress as “vital sign,” 292 Distress thermometers, 272–276, 273–274t DMI Scale. See Depression in the Medically Ill DSM. See Diagnostic and Statistical Manual Dysthymia criteria for, 12t, 13 MDD versus, 19 prevalence in primary care, 162 Dysthymic-like disorder of epilepsy (DLDE), 252
ECA study (Epidemiologic Catchment Area), 17, 42, 210 Edinburgh Postnatal Depression Scale (EPDS) acceptability of, 357 examples of, 384 history of, 31, 41–42 in perinatal care, 33, 301–304, 308–310, 313, 362 Rasch analysis on, 42, 91 EDSS (Extended Disability Status Scale), 249 Educational Testing Service, 144 Elderly cancer care and, 265 CES-D and, 92 comorbid depression and, 203–204 computer-adaptive tests and, 293 detection and, 63, 64, 68, 70, 73, 75 IVR methods and, 148 as stroke patients, 245 Electronic medical records (EMR), 150, 184 Emotional State Questionnaire-2 (EST-Q2), 175 Emotion thermometers, 49, 275–276, 383f Enhanced care, 123–137 screening and arguments for and against, 124–125 comparison of studies, 131f evidence for, 128–129 outcome improvement, 125–127, 126f, 129, 136 recommendations for, 128 studies and patient population, 132–135t Enhancing Recovery in Coronary Heart Disease (ENRICHD) trial, 318 EORTC QLQ C-30, 278, 291 Epidemiologic Catchment Area (ECA) study, 17, 42, 210 Epilepsy depression in, 249–255 clinical manifestations, 250–253 epidemiologic aspects, 249–250 quality of life issues, 254–255 screening instruments, 253–254 HAM-D and, 254 Erectile dysfunction, 342 Ethnic groups. See specific groups Etiologic approach characteristics of, 194 definition of, 210b somatic symptoms and, 209–210, 266–267 European Study of the Epidemiology of Mental Disorders (ESEMeD), 58, 60, 72t Even Briefer Assessment Scale for Depression (EBAS DEP), 44
Exclusive approach characteristics of, 193 definition of, 210b in screening tools, 198–199 somatic symptoms and, 73–74, 266–267, 357 Extended Disability Status Scale (EDSS), 249 Factor analyses, 88, 91 False-positive depression, 193–194 Feighner Diagnostic Criteria (FDC), 10, 21 Fluoxetine, 343–344 General Health Questionnaire (GHQ), 31, 65, 67–69, 75, 88, 94, 175, 208, 321 Geriatric Depression Scale (GDS) cutoff scores and, 324 history of, 31, 40–41 in neurologic disorders, 244–245 PD and, 256–257 in primary care, 169 Rasch analysis on, 41, 91, 93 Global Parkinson’s Disease Survey Steering Committee, 258 HADS. See Hospital Anxiety Depression Scale Hamilton, Max, 36 Hamilton Depression Rating Scale (HAM-D) in cancer care, 214 in cardiovascular care, 217 epilepsy and, 254 HAMD-17, 257 history of, 30–31, 36–37 in neurologic disorders, 244–245, 247, 257 Rasch analysis on, 91–92, 216 sensitivity to change, 35 Hawthorne effect, 111 Heart disease. See Cardiovascular care Hermanns, Norbert, 362–363 Hispanics collaborative care and, 223 readiness to disclose, 67 History taking diagnosis and, 68 importance of, 114–115 HIV/AIDS depression and, 216 Hopkins Symptom Checklist (SCL), 31, 41–42, 91, 93, 175, 344–345 Hospital Anxiety Depression Scale (HADS). See also specific research and studies in cancer care, 6f history of, 31, 39–40 in primary care, 40f sensitivity to change, 35, 37
Household National Health Interview Survey, 204 HSCL-25. See Hopkins Symptom Checklist (SCL) Human bias, 147 HUNT-II study (Norway), 210 Hyett, Matthew, 356–357 Ictal symptoms, 250–251 Inclusive approach characteristics of, 193 definition of, 210b somatic symptoms and, 266–267, 357 Institute of Medicine, 161–162 Institut National de la Santé et de la Recherche Médicale (INSERM) study, 60, 72t Interactive voice recognition (IVR) methods, 148, 149, 152 Interictal dysphoric disorder (IDD), 252–253 Interictal symptoms, 250–253 International Classification of Diseases (ICD). See also specific research and studies comparisons to other scales, 20b, 21–22, 43–44 future developments, 10b, 44, 50 history of, 11–12, 24, 31, 31b ICD-10 assessment tools and, 22 diagnostic checklists, 9–15, 11b, 21 MDD characteristics, 8 somatic symptoms, 208t validation of criteria, 13 limitations of, 33 International Diagnostic Checklists (IDCL), 11b Internet-based screening. See Technological approaches Interviews, diagnostic, 19–22 fully structured assisted interviews, 20b, 21–22 narrowness of, 114–115 partially structured assisted interviews, 20b patient-centered, 66b research on, 114–115 Intuition, 41, 113, 117 Item banking, 85–86, 88, 89f, 93–95 IVR (interactive voice recognition) methods, 148, 149, 152 LEAD (Longitudinal evaluation performed by Expert clinicians who utilize All available Data) standard, 15, 18, 22 Likert scoring, 41 Lists of Integrated Criteria for the Evaluation of Taxonomy (LICET), 11b MAGPIE study on amount of patient contact, 71 on detection, 61, 64 patient disclosure, 67–68
Magruder-Habib, K., 356 Major Depression Inventory (MDI), 31, 43–44 Major depressive disorder (MDD). See also specific aspects and research characteristics of, 5, 7–8 criteria for, 12–15, 12t, 14b as disorder, 3–5, 7 syndrome concept and, 7–9 prevalence rates of, 163, 204f unstructured clinician diagnosis in, 15–19 MASS (Mood and Anxiety Spectrum Scales), 95, 354 MDQ (Mood Disorder Questionnaire), 254 Medical settings, 191–199 DMI and, 195–199, 197t false-positive depression, 193–194 medically ill patients, 192–193 depression in, 192–193 HADS and, 194, 195–196 parsimonious screening, 196–197 PRIME-MD and, 198 Men detection and, 62, 75 diabetes and, 342 erectile dysfunction, 342 stroke and, 243 Mental disorders detection of, 57 ESEMeD studies on, 58, 60, 72t Rasch Model on, 88–91, 89–90f underdiagnosis of retardation, 115 WHO classification of, 11 WHO studies on, 163, 164, 356 Mental Health Index and Hospital Anxiety and Depression Scale, 152 Mental retardation underdiagnosis of, 115 MES (Bech-Rafaelsen Melancholia Scale), 92 Meta-regression, 130, 136, 355 MIDAS project (Rhode Island), 13–15, 14b, 207, 210–211 Mild depression, 5, 43, 60, 73, 338, 350 M.I.N.I. (Mini-International Neuropsychiatric Interview), 20, 22, 166 Minor depression. See also specific diseases criteria for, 12–13, 12t detection of, 43, 60, 71, 192 diagnostic rates of, 16, 58, 163 treatment for, 168 Monothetic diagnostic checklists, 10 Montgomery, S. A., 37 Montgomery-Åsberg Depression Rating Scale (MADRS), 30–31, 35, 37–38, 213, 216, 256–257
Mood and Anxiety Spectrum Scales (MASS), 95, 354 Mood Disorder Questionnaire (MDQ), 254 Mood scales, 83–96 algorithms in, 33 Rasch Model, 86–95 assessment of, 87 clinical testing, 93 computer-adaptive tests, 93–95 features of, 87–88 instruments based on, 91–93, 91t item banking, 93–95 mental health measures and, 88–91, 89–90f tool development, 84–85t, 84–86 Mood Thermometer (MT), 275–276 MOS 8-Item Depression Screener (Burnam Screen) acceptability of, 44–45, 48 detection and, 62–63 history of, 31, 42 MOS-D, 169, 172 Multiple sclerosis (MS) depression in, 246–249 clinical manifestations, 247 epidemiologic aspects, 246–247 quality of life issues, 249 screening instruments, 248–249 SCID and, 249 National Ambulatory Medical Care Survey, 164 National Comorbidity Survey, 13, 21, 162, 164, 166 National Comprehensive Cancer Network (NCCN), 271, 272, 294, 360 National Heart, Lung, and Blood Institute (NHLBI), 326, 363 National Institute for Health and Clinical Excellence (NICE) (U.K.), 42, 127b, 128, 137, 303–304, 308, 313, 362 National Institute of Mental Health (NIMH), 21, 39 National Institutes of Health (U.S.), 95 National Screening Committee (NSC) (U.K.) on depression and diabetes, 345 on perinatal screening, 304–308, 305b, 306–307t, 311 screening definitions, 30b screening guidelines, 109–110, 110b, 124–125, 336 on treatment options, 343 National Study of Medical Care Outcomes (MOS), 42. See also MOS 8-item Depression Screener National Survey of Mental Health and Well-Being (Australia), 50
Natural disease entities, 4 Negative predictive value (NPV), 29–30 Negative Predictive Value, definition of, 102 Neurological Disorders Depression Inventory for Epilepsy (NDDI-E), 253 Neurologic disorders, 241–258 epilepsy, 249–255 clinical manifestations, 250–253 epidemiologic aspects, 249–250 quality of life issues, 254–255 screening instruments, 253–254 GDS and, 244–245 HAM-D and, 244–245, 248, 257 multiple sclerosis, 246–249 clinical manifestations, 247 epidemiologic aspects, 246–247 quality of life issues, 249 screening instruments, 248–249 Parkinson’s disease, 255–258 clinical manifestations, 205, 256 comparative studies on, 213, 216, 236 quality of life issues, 257–258 screening instruments, 44, 256–257, 361 stroke, 242–246 clinical manifestations, 243–244 epidemiologic aspects, 242–243 PSD and, 246 screening instruments, 106b, 244–246 Neuropsychiatric Inventory (NPI), 248 NHANES II study, 337 Nortriptyline, 343–344 Not otherwise specified (NOS), 13 Oncological settings. See Cancer care Operational Criteria Checklist (OPCRIT), 11b Parkinson’s disease (PD) depression in, 255–258 clinical manifestations, 205, 256 comparative studies on, 213, 216, 236 quality of life issues, 257–258 screening instruments, 44, 256–257, 361 MADRS and, 213, 216, 256–257 SCID and, 216 SDS and, 44 studies on, 205, 213, 216, 236 Partial Credit Model, 88 Partners in Care study, 128–130, 355 Pathways study, 344–345, 363 Patient Health Questionnaire (PHQ) in cancer care, 360 item banking and, 94 in cardiovascular care, 214, 324–329, 363 in diabetes care, 340, 344–345
in medical settings, 198 PHQ-2 acceptability of, 48 in perinatal settings, 304 as short screener, 170–175, 267 in studies on diagnostic accuracy, 109 PHQ-9 as computerized tool, 144, 148 for general emotional distress, 175 history of, 31, 42–43 as self-rating instrument, 254 in studies on diagnostic accuracy, 16–17, 17t, 100–101 in studies on feasibility, 110–111 treatment effectiveness and, 177 in primary care, 198, 321, 351–352 Rasch analysis on, 88 Patient-Reported Outcomes Measurement Information System (PROMIS), 95 PDSS (Postpartum Depression Screening Scale), 301, 303, 308, 362 Peri-ictal symptoms, 250 Perinatal settings, 299–314 comparison of methods, 305, 308–309, 309t EPDS and, 33, 301–304, 308–310, 314, 362 guidelines and recommendations, 304–305, 305b, 306–307t implementation in practice, 310–311 false screening outcomes, 311t perinatal screening model, 312f refusal of treatment, 303, 310b purpose of screening in antenatal period, 301–302 in postpartum period, 301 recommendations for, 313–314 screening practices in, 303–304 service delivery and treatment implications, 311, 313 Person-item map, 89f Perth Community Stroke Study, 245 PHQ. See Patient Health Questionnaire Polydiagnostic Interview (PODI), 21 Polythetic diagnostic checklists, 10 Positive predictive value (PPV), 30 Positive Predictive Value, definition of, 102 Positive utility index, 108 Postictal symptoms, 251–252 Postpartum Depression Screening Scale (PDSS), 301, 303, 308, 362 Post-stroke depression (PSD). See also Cardiovascular care clinical manifestations, 243–244 epidemiologic considerations, 242–243 recovery and risks, 246
Post-stroke depression (PSD) (continued) screening instruments, 33, 106b, 244–246 studies on, 106–109, 211–213, 216 Post-Stroke Depression Scale (PSDS), 245 Predictive Summary Index, definition of, 102 Pregnancy. See Perinatal settings Preictal symptoms, 251 Primacy effect, 116 Primary care, 161–184. See also Clinical judgment CIDI and, 163, 175, 183 detection in, 58–60, 59f epidemiology of depression in, 162–164 population prevalence of, 162–163 primary care as de facto treatment system, 163–164 primary care prevalence of, 163 unassisted recognition, 164 future developments, 182–184 GDS and, 169 HADS and, 40f importance of screening in, 165–168 acceptable diagnostic tools, 166–167, 167f cost-effectiveness of, 168 facility availability, 166 as health problem, 165 history of disease, 167–168 in latent stage, 166 lifespan screening, 168 screening tools, 166 treatable conditions, 165 treatment policies, 168 M.I.N.I. and, 166 PHQ and, 198, 321, 351–352 screening strategy implementation, 177–182 one-stage approach, 177–182, 178t time burden, 181–182, 181t two-stage approach, 178–182, 179–180t screening tools, 169–177, 170–171t, 173–174t, 176t for general emotional distress, 175 for multiple disorders, 175, 177 one-stage approach, 172, 175 severity ratings, 177 short screeners, 169, 172 standard screeners, 169 ultra-short/brief screeners, 172 unstructured clinician diagnosis in, 16t Primary Care Evaluation of Mental Disorders (PRIME-MD), 42, 183, 197–198, 353. See also Patient Health Questionnaire Primary care physician (PCP). See Primary care Problem Areas in Diabetes Questionnaire (PAID), 340, 341
PROSPECT trial, 167–168 PSE (Present State Examination)/SCAN, 20–22, 93 Psychiatric Screening Questionnaire for Primary Care Patients (PSP) study, 42 Psychological Problems in General Health Care study (PPGHC) on detection, 58, 60, 72t Psychological Screen for Cancer (PSSCAN), 277–278 QOLIE-89, 254–255 Quality of life issues in diabetes care, 338–339, 338–339t in epilepsy, 254–255 in MS, 249 in PD, 257–258 Rasch analysis on BDI, 38 on EPDS, 42, 91 on GDS, 41, 91, 93 on HAM-D, 37 on SDS, 38 Rasch Model, mood scales, 86–95 assessment of, 87 clinical testing, 93 computer-adaptive tests, 93–95 features of, 87–88 instruments based on, 91–93, 91t item banking, 93–95 mental health measures and, 88–91, 89–90f Rating Scale, 88 Receiver operating characteristic (ROC) analyses, 196, 267–268, 272, 275–276, 324 Refusal of treatment, 303, 310b Renard Diagnostic Instruments, 21 Representativeness heuristic, 117 Research Diagnostic Criteria (RDC), 10, 20–21, 209 Roter Interactional Analysis System, 68–69, 70 Routine screening in clinical practice, 349–366 Babaei on, 356, 358 Barton on, 361, 362 Beck on, 354 Bermejo on, 351 Boyce on, 361, 362 Carlson on, 360 de Kok on, 361 Dwight-Johnson on, 361 Garssen on, 361 Gibbons on, 354 Gilbody on, 354
Hermanns on, 362–363 Hyett on, 356–357 Jacobsen on, 360 Kanner on, 359–360 Katon on, 350, 363 Kessler on, 351–352 Kulzer on, 362–363 Magruder on, 356 Mitchell on, 350–351, 352–353, 356, 358, 360–361 Parker on, 356–357 Patten on, 351 puffer phenomenon and, 364–366 Ransom on, 360 Rogers on, 355–356 Smith on, 354 Strong on, 361 Thombs on, 363 USPSTF report and, 354–355 Valenstein on, 353 Von Korff on, 354 Wang on, 351–352 Wells on, 355 Yeager on, 356 Ziegelstein on, 363 Zimmerman on, 350–351 SADQ (Stroke Aphasic Depression Questionnaire), 33, 245–246 Scales and tools, 29–50. See also specific scales and tools classic Severity Scales BDI, 37–38 CES-D, 39 HAM-D, 36–37 MADRS, 37 SDS, 38 cutoff scores for varying severity, 45t future developments accuracy comparison, 49t improving acceptability, 44, 48–49 improving accuracy, 49–50 generic scales, 32b history of, 30–32 new Severity Scales EPDS, 41–42 GDS, 40–41 HADS, 39–40 MDI, 43–44 MOS, 42 PHQ, 42–43 patient-rated versus clinician-rated scales, 34–35 scale properties, 46–47t
screening procedures definition of, 29–30 sensitivity to change of mood and, 35–36, 36f severity scales, 33–34 short versions of rating scales, 48b special population scales, 32b Schedule for Affective Disorders and Schizophrenia (SADS), 20, 208 Schedule for Affective Disorders and Schizophrenia-Lifetime (SADS-L), 16 Schedules for Clinical Assessment in Neuropsychiatry (SCAN), 16, 44 SCID. See Structured Clinical Interview for DSM Disorders SCL. See Hopkins Symptom Checklist SCL-90-R (Symptom Checklist-90), 272 Screening, definition of, 29–30 Screening algorithms. See Algorithms Screening-detection-treatment-improvement paradigm, 124, 137 Screening procedures, definition of, 29–30 SDDS-PC. See Symptom-Driven Diagnostic System for Primary Care SDMT (Symbol Digit Modalities Test), 248 SDS. See Zung Self-Rating Depression Scale Self-reporting efficacy of, 95, 267, 352 influences on, 66–68, 325 by medically ill patients, 197t patient-rated scales, 34–35, 34b, 38, 40–41, 242, 267 recognition rate and, 71 recommendations on, 127b studies on computerization of, 276–277 studies on screening instruments, 198, 248–249 Sensitivity (Se), definition of, 102 Sensitivity to change, 35–36 Severity Scales (1960–1980). See also specific scales and tools BDI, 37–38 CES-D, 39 HAM-D, 36–37 MADRS, 37 SDS, 38 Severity Scales (1981–2008). See also specific scales and tools EPDS, 41–42 GDS, 40–41 HADS, 39–40 MDI, 43–44 MOS, 42 PHQ, 42–43 Severity scales, limitations of, 33–34
Software, 21, 22, 151–152f Somatic symptoms, 203–236 comorbid depression and, 211–217 comparative studies on, 218–235t healthy controls versus, 214 noncomparative studies on, 216–217 physical illness alone versus, 214–215, 215f primary versus secondary depression, 211–214, 212f definition of, 205–209 diagnostic systems and scales in, 208–209 depression in physical disease, 203–205 inter-rater reliability and, 207b suicide risk and, 205, 206f, 207 diagnostic accuracy, 209–211 approaches to, 210b, 212f presentation rates of, 67 screening implications, 217, 236 Special population scales, 32b Specificity (Sp), definition of, 102 Specificity to change, 35 Spielberger State-Trait Anxiety Inventory (STAI), 94 Standardized mortality rate (SMR), 253 STAR*D, 213–214 Stroke. See also Post-stroke depression (PSD) depression and, 205, 217, 242–246 clinical manifestations, 243–244 epidemiologic aspects, 242–244 screening instruments, 33, 92, 217, 244–246 studies on, 217, 359 Stroke Aphasic Depression Questionnaire (SADQ), 33, 245–246 Strong, V., 361 Structured Clinical Interview for DSM Disorders (SCID) in cancer care, 266–267 in cardiovascular care, 217, 244–245 in detection, 64 explanation of, 43 MS and, 249 PD and, 216 Rasch analysis on, 92–93 in studies on diagnostic accuracy, 17–22, 19t, 23f Sub-major depression, 163, 167–168 Substitutive approach characteristics of, 196–194 definition of, 210b somatic symptoms and, 266–267, 357 Subsyndromal depression, 167, 192, 236, 320, 328, 337, 339 Sufficient diagnostic checklists, 10
Suicide bipolar disorders and, 252 clinical judgment and, 114–115 epilepsy and, 250, 252, 253 MIDAS project and, 13–14, 14b MS and, 247 Parkinson’s disease and, 256 physical disease and, 205, 206f, 207 under/over detection of risk, 57, 70, 74 Summary receiver operator characteristic curve (sROC), 104 Symbol Digit Modalities Test (SDMT), 248 Symptom Checklist-90 (SCL-90-R), 272 Symptom-Driven Diagnostic System for Primary Care (SDDS-PC), 44, 48, 166, 172, 175–176, 183 Targeted (high-risk) case-finding, 30 Technological approaches, 143–154 computerization issues, 147–150 acceptability, 149 availability, 150 embedding in systems, 150 error control, 147–148 honesty, 148 performance, 148 physical clues, 148 price, 149–150 quality control and accuracy, 147 workload considerations, 148–149 implementation of computerized screening, 150–152 methods of, 144–146, 145–146t Telephone technology in assessment, 39, 40, 130, 144, 147, 149, 152, 153, 278, 355–356 Thermometers distress assessment, 272–276, 273–274t emotional assessment, 49, 275–276, 383f Thombs, B. D., 358, 363 Treatment refusal, 303, 310b Tversky, Amos, 117 Unassisted diagnosis. See Unstructured clinician diagnosis Unemployed people, detection and, 62–63 Unified Parkinson’s Disease Rating Scale (UPDRS), 257 United States economic costs of depression in, 123–124, 165 mandated screening programs in, 42 NCCN member survey in, 294 PCPs and detection in, 71 prevalence rates in, 21 time spent per patient visit in, 162
Unstructured clinician diagnosis diagnostic accuracy in primary care, 16t of psychiatrists, 17t, 19t in routine diagnoses, 15–19 U.S. Preventive Services Task Force (USPSTF) guidelines for cardiovascular care, 328–329, 363–364 guidelines for enhanced care, 128–130, 136, 137 quality improvement in care, 354–355 on screening in primary care settings, 125–126, 127b, 177, 182–183, 326 Validus (validity), 8–9 Virginia Twin Registry, 13, 92–93 Wakefield Depression Inventory, 246 Well-Being Scale (WHO-5), 175, 198, 340, 341 Whites (U.K.) detection and, 64 Whites (U.S.) readiness to disclose, 67 Willis, Thomas, 335 Women. See also Perinatal settings breast cancer and screening, 268, 278 collaborative care and, 361 confiding in PCPs, 68 detection and, 63
Distress Thermometers and, 272, 275 EPDS and, 41–42 during pregnancy, 358–359, 361–362 Work and Health Initiative depression software, 151–152f World Bank, 123 World Health Organization (WHO). See also Composite International Diagnostic Interview (CIDI) analytic principles of, 124–125 criteria for screening programs, 165–168 on mental disorders, 11 Psychological Problems in General Health Care study, 58 SCAN instrument, 21, 93 study on mental disorders in primary care, 163, 164, 354 two-item scales, 267 Well-Being Scale (WHO-5), 175, 198, 340, 341 World Mental Health Survey, 310–311 Youden’s J in accuracy testing, 102, 104 Zung Self-Rating Depression Scale (SDS), 31, 38–39, 44, 64, 91, 144, 169, 177, 208–209, 244–245